Evanescent coupling of nonlinear integrated cavities for all-optical reservoir computing

We consider theoretically a network of evanescently coupled optical microcavities to implement a space-multiplexed optical neural network in an integrated nanophotonic circuit. Nonlinear photonic network integrations based on evanescent coupling ensure a highly dense integration, reducing the chip footprint by several orders of magnitude compared to commonly used designs based on long waveguide connections while allowing the processing of optical signals with bandwidth in a practical range. Different nonlinear effects inherent to such microcavities are studied for realizing an all-optical autonomous computing substrate based on the reservoir computing concept, and their contribution to computing performance is demonstrated. We provide an in-depth analysis of the impact of basic microcavity parameters on the computational metrics of the system, namely, the dimensionality and the consistency. Importantly, we find that differences between frequencies and bandwidths of supermodes formed by the evanescent coupling are the determining factor of the reservoir’s dimensionality and scalability. The network’s dimensionality can be improved with frequency-shifting nonlinear effects such as the Kerr effect, while two-photon absorption has the opposite effect. Finally, we demonstrate in simulation that the proposed reservoir is capable of solving the Mackey–Glass prediction and the optical signal recovery tasks at gigahertz timescale.


Introduction
Artificial neural networks (ANNs) are a computing approach different from conventional algorithmic concepts.They are inspired by the most basic principles of human brains, where numerous neurons are connected with synapses to create a network.Currently, ANNs are mostly implemented on the von Neumann architecture, i.e. on digital computers, and, while successful in many fields, this approach has its own drawbacks, mainly a data bottleneck inherent to the architecture as well as significant power requirements.In addition, the latency between an input and a computed output presents a major concern in some applications.One example is optical communications, where an exponentially growing demand for internet connectivity requires increasingly faster processing of optical signals [1].A conventional approach is based on the processing with digital algorithms [2][3][4] or more recent ANN-inspired solutions [5].However, at higher bandwidths, efficient processing becomes complicated.The development of ANNs in a physical layer can help overcome this challenge [6,7].Besides analog electronics [8], the manipulation of optical signals is highly relevant when a large bandwidth operation is important, as was demonstrated in the field of microwave photonics [9].For implementing such computing systems, integrated photonics is a promising platform that provides compactness and intrinsic nonlinear effects necessary for efficient and scalable ANN in hardware [10,11].
Reservoir computing (RC) has attracted a lot of interest in the in-hardware computing field as it provides an implementation-friendly ANN topology with competitive performance and a simplistic training procedure [12].RC on integrated photonic platforms has been demonstrated in simulations [13,14] and experimentally [15,16].These works considered optical microcavities or splitters as neurons with long intra-neural connections, i.e. with a non-negligible coupling delay.This was required to match the system's internal transient timescales to available electronics [15].However, these delay lines are costly in terms of chip footprint and optical losses and limit scalability.For example, a 280 ps delay requires 2 cm of silicon waveguide [15], which results in approximately one neuron per square millimeter of the chip surface.Coupling without delays has been demonstrated in an RC based on nonlinear photonic crystal (PhC) cavities, which was applied to the generation of optical signals modulated at hundreds of gigahertz [17]; however, if such a network could be applied to the processing of optical signals with a typical bandwidth of tens of gigahertz was not clear.Another delay-free RC was implemented in a large-area PhC cavity [18].Even though this design drastically reduced the chip footprint, its nonlinear operation is unlikely.
This study considers evanescent coupling between adjacent cavities as an alternative to create maximally compact integrated all-optical computing systems.We propose an RC consisting of a rectangular grid of evanescently coupled nonlinear microcavities and study the impact of cavity parameters and reservoir geometry on its basic computational properties, namely the dimensionality [19,20] and the consistency [21].We found that a longer photon lifetime allows for better scalability due to more variability between coupling-induced supermodes of the system.Importantly, we found that the nonlinear effects can impact computing performance in opposite directions, where two-photon absorption (TPA) reduces the reservoir's dimensionality, while free carrier dispersion (FCD) and the optical Kerr effect increase the dimensionality.Moreover, unlike TPA, a sufficiently strong FCD and Kerr effect induce chaotic dynamics in this purely dissipative network without gain.Finally, we evaluate the RC's performance in Mackey-Glass prediction and optical signal recovery tasks.For simulations we used cavity and waveguide parameters accepted as standard for GaAs integrated photonic circuits.We found that such integrated photonic reservoirs show a comparable performance to other implementations found in the literature, which shows the great potential of this maximum integration-density approach.

System of coupled nonlinear microcavities
The backbone of our photonic RC is an N ∥ × N ⊥ square grid of evanescently coupled nonlinear cavities (see figure 1(a)).One side of the grid is coupled to the input waveguide according to cavity-waveguide coupling coefficient κ.Through this waveguide, an optical input s(t) is injected.Assuming intra-cavity and cavity-waveguide coupling strengths are weak, the coupled-mode theory can be applied [22].For a purely linear case, the electric field of microcavity modes a(t) can be described with where ω are the cavities' resonant frequencies, Γ o are intrinsic optical losses, Γ e are optical losses to waveguide, ⊙ is the Hadamard product, Mµ , Mκ are evanescent and waveguide-assisted cavity-cavity coupling matrices, K i is an input waveguide-cavity coupling matrix (see appendix A).Here, ] (section 7.5) if the kth and mth cavities are evanescently coupled and For simplicity, we assume that k can vary to some degree, e.g.due to fabrication tolerances.The phase terms φ km and φ i k are defined by a chip geometry and are considered independent and identically distributed (i.i.d.).The strength of intra-cavity and the cavity-waveguide coupling is weak when fulfilling Evanescent coupling can only induce an interaction between cavities when resonances are sufficiently aligned.For that reason, unless stated otherwise, we assume ω k = ω 0 .Moreover, the direction of cavity modes should be taken into account.For instance, each microring resonator (MRR) resonance is doubly degenerate with counter-propagating modes.If an MRR-based reservoir is implemented exactly as in figure 1(a), both modes of each cavity can be excited, and a would have 2N components.In PhC cavities, however, resonances are nondegenerate and, therefore, a has N components.For simplicity, we transform the MRR reservoir such that only one mode of each MRR cavity is excited (see figure 1(b)).
In order to extract signals from cavities, they are coupled to independent waveguides, outputs of which are z(t) = K o ⊙ a(t), where K o is a vector of output waveguide coupling coefficients, for which we assume a uniformity according to K o k = κ o .These waveguides introduce an additional optical loss, which has to be included in Γ e .We consider three nonlinear effects commonly present in integrated photonic microresonators: TPA, FCD and the Kerr effect.TPA can be described by adding to the right side of equation ( 1) [24].Here, β 2 is a TPA coefficient, c the speed of light, n the refractive index, and V TPA is a nonlinear effective volume [25] where max and U(r) is a cavity's normal mode, normalized such that 0.5 ´ε0 ε r |U(r)| 2 dr = 1.
FCD causes a microcavity's resonance to shift by ∆ω k as a function of free electron density N k [26].This effect adds a term i(dω/dN)N k a k to the right side of equation (1) [27].Such free electrons could be generated via a linear photon absorption or TPA.The former is possible in direct-bandgap semiconductors but causes high optical loss independent of the input power, unlike with TPA, where it linearly increases with the optical energy stored in a cavity.Considerable losses in systems without gain make large networks unrealistic.We, therefore, exclude linear photon absorption as a sustainable effect producing nonlinearity and only consider TPA for free electron generation.The density of TPA-generated electrons is computed by [24] dN where Γ c is a free electron recombination rate, V c is a volume across which electrons spread through diffusion, and I k = |a k | 2 is the optical intensity inside the kth cavity.The Kerr effect causes a resonance frequency shift in response to a stronger electric field inside a cavity and is described by adding i(dω/dI)I k a k to the right side of equation (1).While the impact of the Kerr effect is typically weaker than FCD, it is still useful to be considered as an isolated case, as, unlike FCD with TPA, it provides a resonance frequency shift without an additional optical loss.

A photonic reservoir computer based on evanescently coupled nonlinear cavities
An RC is often introduced as an echo state network (ESN) [28] (see figure 1(c)): where x n and u n are vectors of neuron activations and inputs at a time step n ∈ [1, 2, . . .T], α is a leaking rate, Ŵ and Ŵ in are a recurrent and an input weight matrices.The ESN's output is defined as y n = Ŵ out x n , where Ŵ out is an output weight matrix.The readout training for software-based models is typically performed using ridge regression [28], and here, we use the same approach for our simulation.Assume an ESN is being trained to solve a task with input signals u n and the ideal target outputs y tgt n .First, compute update equations for T steps to obtain a set of ESN state vectors x n , which are consequently appended to create a matrix X.Similarly, from y tgt n , one creates Ŷ tgt .The optimal output weight matrix is then given by where ⟨. . .⟩ ij is an average over all matrix indices ij, ζ is a regularization constant and Î is the identity matrix, superscript (. . . ) T is the matrix transpose operation.
In the previous section, we proposed a photonic system that receives optical signal s(t) as an input and provides a set of optical signals z(t).We suggest to treat this system as a reservoir computer.In the integrated photonic hardware, readout weights could be realized based on tunable attenuators with transmission w k , phase-shifters according to φ k and a combiner tree to implement a linear combination of z k (t) that creates the output optical signal y(t): This way, the conversion of optical input s(t) to optical output y(t) is done fully optically, as electronics only control attenuators and phase shifters that are constant during a computation.

Linear regime
We first consider the case when the input signal is not sufficiently strong to induce a nonlinear response by the microcavities.An important value for the following discussion is the input signal's bandwidth BW, which we define as the span between its spectral half-power positions.As we will show, the computational properties of such photonic reservoirs depend mostly on the microcavities' spectral characteristics.In that regard, an important consequence when evanescently coupling microcavities is the formation of supermodes with split resonances (see figure 2).Consider the linear part of equation ( 1): eigenvalues of which are λ E k .Then, the average supermode bandwidth is Γ S /2π = −2Re λ E k k /2π.Importantly, the coupling-induced supermode splitting increases the reservoir's response bandwidth significantly beyond the bandwidth of a single resonator.We define the reservoir bandwidth (RBW) as the maximum distance between the network's supermodes measured at half intensity.For example, without the input waveguide, two identical independent cavities with resonance frequencies at ω 0 and coupled with rate µ become split ω 0 ± |µ|, leading to RBW ≈ (2|µ| + Γ S )/2π.For multiple identical cavities in a chain, equation ( 9) is a tridiagonal Toeplitz matrix, the eigenvalues of which are given by [29] meaning that RBW ≈ (4|µ| + Γ S )/2π.For non-trivial connectivity architectures, an analytical derivation of RBW becomes non-trivial.However, in numerical simulations, we found that supermode resonances stay within a ω 0 ± π|µ| interval (see figure 2).Therefore, we assume that RBW ≈ (2π|µ| + Γ S )/2π for such an evanescently coupled microcavity system.Furthermore, as will be shown later, in the cases we are interested in Γ S ≪ |µ|, and hence RBW ≈ |µ|.
An important computing metric of reservoirs is the dimensionality D, which corresponds to the systems' degrees of freedom.Since reservoirs, and ANNs in general, perform a high-dimensional expansion of input signals [28], a higher dimensionality usually leads to a better computing capacity.To determine the dimensionality, we generate a random sequence s0 , where √ I 0 (t n ) and φ 0 (t n ) are randomly sampled according to a white noise distribution.Then, an 8th-order Butterworth low-pass filter with a given bandwidth BW is applied on s0 (t n ) to obtain s 0 (t n ), which is then linearly interpolated to s 0 (t).This signal is used to modulate an optical carrier that is injected into the reservoir as an input.Here and throughout the article, differential equations are integrated using the DifferentialEquations.jllibrary [30] in the Julia programming language [31].The photonic reservoir's dimensionality is then determined by applying the principal component analysis (PCA) [32] on z n .Importantly, for the PCA, we split electric fields into a real and an imaginary part, which implies that the maximum dimensionality our reservoir can have is 2N.It is, therefore, convenient to introduce a normalized dimensionality D = D/2N.
At this stage, it is beneficial to consider the supermode concept in more detail.A supermode is an independent, i.e. an orthogonal oscillation of the coupled system's electric fields.Each supermode can be considered a virtual microcavity that performs pass-band filtering at its resonant frequency and bandwidth.Crucially, the coupling-induced splitting of supermode resonances makes their frequencies vary, promising a high dimensionality.To start, a supermode needs to receive input.For that, we consider the impact of the injection BW on the reservoir's dimensionality.As we see in figure 3(a), when BW ≪ RBW, the dimensionality is low.In this case, a large number of supermodes are off-resonance relative to the injected field, and hence, they cannot be excited.With an increasing BW, more and more supermodes are able to interact with the injected information.As a consequence, the reservoir's dimensionality increases until BW = RBW, when all supermodes become excited.Accordingly, the physics of supermodes in coupled arrays results in a necessary bandwidth-matching condition for maximizing the system's dimensionality: BW ⩾ RBW.Importantly, here, the network was driven with white noise, i.e. by a signal with a flat spectrum.However, in realistic and application-relevant situations, such a signal is unlikely.Certain supermodes will be excited less than others and accordingly contribute less to the network's dimensionality.
The microcavity network's dimensionality relies upon individual supermodes being driven by different aspects of an input signal, which requires supermodes to be sufficiently separated.Therefore, to quantify this metric, we consider an average supermode spacing RBW/N, normalized by the average supermode bandwidth which can be considered a measure of spectral span available for individual supermodes.In figure 3(b), we see that the dimensionality is low for F Γ ≪ 1, and for F Γ > 1, it is maximized.This threshold may change depending on system parameters; however, we find that it is typically comparable to unity.Therefore, a second requirement for a high dimensionality is F Γ ⪆ 1.Again, this is not a sufficient condition since it does not ensure that all supermodes are separated.Spacing between supermodes is typically inhomogeneous, and supermodes can potentially overlap, even if F Γ is large.This effect can be mitigated through a stronger coupling to the input waveguide κ i , which can shift resonances or increase the bandwidth of some supermodes, as seen in figure 2 for a 4 × 3 reservoir with and without the input waveguide.In figure 3(c), we see that this approach can be effective when N ⊥ is low, i.e. when overall coupling to the input waveguide is relatively stronger for the supermode splitting.
From F Γ a scaling behavior of the dimensionality can be determined.Since F Γ ∝ N −1 , a larger number of cavities leads to a lower F Γ , which leads to a lower D. This, however, can be compensated by lowering optical losses, as F Γ ∝ (Γ S ) −1 .Indeed, in figure 4(a), we see that with a high µ/Γ o (which is analogous to F Γ ), the dimensionality can reach the theoretical limit.Furthermore, the dimensionality varies for a given number of cavities depending on their geometric arrangement.A minor fluctuation of the dimensionality is due to a seemingly random nature of supermode formation-supermode frequencies of a 24 × 1 grid are different from a 12 × 2 grid.However, there are also outliers with a considerably lower dimensionality.In these cases, N ⊥ is high, which allows supermodes to localize spatially further from the input waveguide.As a result, these supermodes become too weakly coupled to the input and, as a consequence, contribute less to the system's dimensionality (see figure 4(b)).In addition to that, a lower N ⊥ allows for a higher dimensionality even at a low F Γ , likely due to the impact of the input waveguide discussed above.

Nonlinear regime
In order to induce a nonlinear response from the photonic system, an optical power of at least P NL needs to be injected into the cavities.This power depends on the material, cavity parameters and which particular nonlinearity is to be leveraged.We define P NL as an input power at which a nonlinear effect equals a related linear effect.TPA increases losses inside the cavities according to Γ TPA , and it is natural to place P NL where these nonlinear losses start matching linear optical losses, i.e.Γ TPA ≈ Γ S .FCD and the Kerr effect cause resonance shift ∆ω, which, in turn, is natural to be compared to an average bandwidth of supermodes, hence also Γ S .Therefore, an optical loss rate is the normalization factor determining the cavity array's nonlinearity power threshold for all considered nonlinear effects.For a single cavity injected with a monochromatic wave, one can determine P NL analytically (see supplementary material).The result is shown in figure 5(a) for various integrated platforms for MRR and PhC cavities based on data obtained from literature (see table 2), and the horizontal axis shows an operating frequency that is BW/2.Assume a cavity with a bandwidth Γ S and an electron relaxation rate Γ c .This cavity can receive most of the optical power of an injected signal if BW ⩽ Γ S /2π.However, a low Γ S corresponds to a higher loaded Q-factor, allowing for a stronger nonlinearity and, consequently, a lower P NL .We therefore assume that BW ≈ Γ S /2π.Additionally, the scale of BW relative to Γ c is also important.When BW ≪ Γ c , the generated free electrons recombine too fast, and FCD becomes negligible, whereas TPA becomes the dominant nonlinearity.Then, P NL is computed w.r.t.Γ TPA ; see the dashed line in figure 5(a).When BW is comparable to Γ c , FCD becomes stronger than TPA.Then, P NL is computed w.r.t.∆ω, and this case is shown as a solid line in figure 5(a).Finally, when BW ≫ Γ c , the electron density cannot react to the changes to the input signal in time, and FCD becomes irrelevant, see dotted line in figure 5(a).Compared to MRRs, PhC cavities require less power and allow a faster operation on the same material, as a smaller mode volume corresponds to a stronger light confinement and a stronger nonlinearity, while a higher surface area enhances Γ c [34].
For a network comprised of a large number of cavities, analytical treatment is not immediately tractable.We, therefore, resort to a numerical approach.Consider a reservoir with TPA as a nonlinearity.Since its supermodes are split due to evanescent coupling and, as we found in the previous section, Γ S ≪ RBW, a low-bandwidth excitation of this nonlinearity is unsuitable.Therefore, the input changes in time and modulates Γ TPA k on a potentially individual resonator level.In order to then compare it to the scalar Γ S , we consider its standard deviation across many input samples σ t [Γ TPA k ].Another natural option could be an average over samples.However, consider a case when TPA is dominant and , then the loss terms in equation ( 1) for the kth cavity is Here, ⟨Γ TPA k ⟩ t only increases the linear loss; hence, the kth cavity is close to linear.However, averaging would still classify the cavity as a nonlinear since We then compute ⟨σ t [Γ TPA k ]⟩ k to obtain a scalar TPA rate for a given input power P. Repeating this process for a range of P, we obtain a largely monotonic curve P(⟨σ t [Γ TPA k ]⟩ k ), through which we can determine the input power necessary to induce a particular level of TPA-induced nonlinearity.In that case, P NL = P(Γ S ).For FCD and the Kerr effect, obtaining a scalar characteristic capturing the network's relevant response is identical, except that ∆ω k is used instead of Γ TPA k .In figure 5(b), we see that P NL scales linearly with the number of cavities provided N ⊥ is limited.Indeed, for large N ⊥ , some supermodes can localize far from the input waveguide and couple to input much weaker than those near it.This creates a nonlinearity imbalance in the reservoir, and to reach ⟨σ t [Γ TPA k ]⟩ k = Γ S , more optical power is needed.
Nonlinearity can result in a reservoir operating in a chaotic regime, which, for most purposes, is harmful to computation.Chaos corresponds to the situation where a network's state is sensitive to infinitesimal changes in the input.Hence, the unavoidable presence of noise in hardware would render computations not reproducible for even perfectly identical inputs.To characterize this effect, we carry out a consistency analysis [20].Consistency is a property describing the reproducibility of a dynamical system's (here, the reservoir's) responses under an injection of slightly different input signals [21].We generate a set of inputs in the form s n (t) = s 0 (t) + ∆s n (t) where s 0 (t) and ∆s n (t) are values sampled from a complex-valued white noise distribution filtered with a Butterworth 8th-order pass-band filter.s 0 (t) is the 'true input' , while ∆s n (t) emulates the impact of noise in an experiment, and we attain a signal-to-noise ratio of 20 dB by scaling the amplitude of ∆s n (t).Each s n (t) is injected into a separateidentical reservoir.The absolute value of correlation between trajectories of a kth cavity of ith and jth reservoirs z i k (t) and z j k (t) is then computed [43] where E[A] is the expected value of A. The consistency for this pair of reservoirs is a root mean square (RMS) across all their cavities γ ij = ⟨(γ ij k ) 2 ⟩ k , and, equally, the consistency of system γ is the RMS of γ ij of all combinations of i and j.It is typically desirable that the consistency is high [44].
In figure 6, we show the consistency and the dimensionality of reservoirs with different nonlinearities as a function of input power.Noteworthy, we find that a stronger TPA reduces the dimensionality while the consistency remains constant and close to unity.The drop of the dimensionality is a consequence of the F Γ reduction, as increasing TPA-induced losses increase Γ S .A high consistency could be explained by the negative-feedback nature of TPA, where the system's response to a stronger input is an increased damping that often stabilizes the system.According to the data in figure 6, FCD and the Kerr effect increase the dimensionality, but eventually, the system loses consistency for too high input power.Generally, the consistency decreases when ⟨σ t [∆ω k ]⟩ k approaches Γ S , i.e. when input power surpasses P NL .The effect causing the dimensionality increase is the nonlinear shift of individual supermodes, which otherwise might overlap in the linear regime.We found that, generally, other system parameters have a relatively small impact compared to P NL .
Interestingly, the nonlinear shift of supermode frequencies also compensates to some degree the negative impact on the dimensionality caused by a mismatch between BW and RBW (see figure 6(d)).This could be explained by the fact that a cavity might belong to multiple supermodes.When a cavity resonance shifts, it also causes multiple supermodes to shift, even those that are not excited in a linear regime.If the shift is strong enough, these supermodes can end up inside the input bandwidth and receive input.Moreover, as BW reduces, a stronger nonlinearity is needed to move resonances into the input bandwidth, and at some point, the concept of supermodes stops being applicable.
To conclude, nonlinearity, in general, has a non-trivial effect on a reservoir.Importantly, this sensitively depends on the particular nonlinear effect that is leveraged.On one hand, the dimensionality is reduced by TPA but increased by FCD and the Kerr effect.On the other hand, using TPA, the reservoir stays consistent, while FCD and the Kerr effect harm the consistency.However, in the case of FCD and Kerr nonlinearities, the dimensionality increase comes before the consistency loss.Hence, there is an interval of optimal input power that allows for a better dimensionality with a high consistency.

Computing performance
We evaluate the performance when applying an evanescently coupled cavity array reservoir to computing tests.Here, we consider the Mackey-Glass prediction task, a commonly used benchmark in the RC literature, and the optical signal recovery task that demonstrates a relevant practical application of such a system [1].

Mackey-Glass prediction task
The Mackey-Glass equation is a first-order time-delayed differential equation: where we choose τ = 17, α = 0.2, g = 10 and γ = 0.1, with which ξ(t) exhibits a moderately chaotic behavior [45].Integrating equation ( 14) with the Euler method with a timestep of 0.17 and downsampling the result with a ratio of 3/0.17, we obtain a discrete series ξ(t n ).The task is to predict y tgt n = ξ(t n+δ ) with ξ(t n ) as an input and a given δ > 0. Both memory and nonlinearity are required for a successful prediction, which made this task a popular RC benchmark.The prediction performance is measured with the normalized root mean square error (NRMSE) [28]: Nevertheless, the reservoir used is time-continuous; we then convert ξ(t n ) into a continuous signal by a zero-hold interpolation.This signal is used to modulate the amplitude of the optical carrier that is injected into the reservoir as an input.To find the output z k (t) are sampled at t n .Due to the nature of ξ(t n ), the Fourier spectrum of optical input is uneven (see figure 7(a)).As a result, the excitation of supermodes is not uniform, with a non-trivial impact on the reservoir.To visualize the impact of nonlinearities, we consider a linearized Jacobian of the reservoir where the last three terms correspond to TPA, FCD and the Kerr effect, with δ km as the Dirac delta.In figure 8(d), we show trajectories of Ĵ eigenvalues of the system while driven by the input.As expected, TPA increases eigenloss and modifies eigenfrequencies, most likely due to modifications to the spatial shape of the supermodes.With FCD, these modifications become notably stronger; in the case of the Kerr effect they are the dominant feature.This behavior is notably different from ESNs [46].The inequality of nonlinearity strength is also evident.A supermode near a Fourier peak with a strong connection to an input waveguide is Here, BW < RBW, which goes against the conclusions of section 4. High peaks in the input spectra correspond to a dominating harmonic (ξ(t) largely oscillates, see figure 8(c)); however, for a close prediction, it is important to also capture high-frequency irregularities despite their weak amplitude.
subject to several times stronger nonlinearity than others.This poses a challenge for FCD and the Kerr effect, as a nonlinearity concentration in a few supermodes leads the reservoir to chaos before nonlinearity in other supermodes becomes strong enough.
Besides the obvious relevance of the NRMSE for performance, the output signal power also becomes a concern in noisy systems.Even though noise was not present in simulations, we also consider a power penalty as an additional performance metric: L p = 10 log(⟨|y(t)| 2 ⟩ t /⟨|s(t)| 2 ⟩ t ), where ⟨. . .⟩ t is an average over time.
Figure 8(a) shows a considerable improvement with all three nonlinearities.However, with frequency-shifting nonlinearities, the performance is lost after the normalized average input power approaches unity.This is consistent with section 5, as the training becomes ineffective with a loss of consistency.However, in figure 8(b), we see that before the consistency is lost, the output power is higher than in the TPA case, which is likely due to an increase in dimensionality (see figure 6).For an 8 × 3 reservoir with TPA and FCD, an NRMSE of 1.5 • 10 −1 was achieved (see figure 8(c)), while for TPA 8 • 10 −2 , considering power penalty restriction.For comparison, a performance baseline is NRMSE(u n , u n+3 ) ≈ 1.1, an all-optical time-multiplexed reservoir with 330 virtual neurons based on a semiconductor laser achieved NRMSE of 0.23 [47], a simulated time-multiplexed reservoir based on a nonlinear MRR with 25 virtual nodes had NRMSE of 7.2 • 10 −2 [48].However, an ESN with 400 neurons outperforms by a large margin, predicting approximately 20 steps ahead with an NRMSE of 1.2 • 10 −4 [49].Nevertheless, the proposed reservoir performs such computation fully-optically in real-time.
Simulations have shown that prediction performance improves with the number of cavities (see figure 9) for N ⊥ = 1 or 3 and for all three nonlinearities.With FCD and the Kerr effect, N ⊥ = 1 and N ⊥ = 3 reservoirs have shown similar performance, while with TPA N ⊥ = 3 was better than N ⊥ = 1.The lowest error with an NRMSE of 0.037 and L p of −25 dB is achieved for 48 cavities with N ⊥ = 3.We also note the importance of the evanescent coupling by comparing our system with a set of independent cavities, a linear regime of which was considered by [50].For a fair comparison, resonances of cavities were set to resonances of supermodes of a corresponding N × 1 reservoir.With all nonlinearities, coupled cavities provided considerably better performance, likely due to a nonlinearity-induced supermode mixing.
The result in figure 8(a) seemingly contradicts the conclusions in section 5: TPA reduces the dimensionality but improves the prediction accuracy.The answer is that the dimensionality is not a definitive performance metric.In the case of a highly nonlinear task of Mackey-Glass prediction, it is likely that nonlinear transformations of neuron trajectories are beneficial to the prediction accuracy, even at the cost of dimensionality.

Optical signal recovery
An optical signal propagating through an optical fiber is subject to chromatic dispersion and the Kerr nonlinearity, among other effects.Electric field E(z, t) in the fiber can be described with a nonlinear Schrödinger equation [51]: where we choose α loss = 0.2 dB km −1 , β 2 = 21.7 ps 2 km −1 , γ = 1.3 W −1 km −1 , E(0, t) = s 0 (t) and s 0 (t) is the transmitted signal.Because of distortion, after propagating through a fiber of length L, the signal at the  1), but for TPA-only dω/dN = 0, while for the Kerr effect, β2 = 0 and dω/dI < 0 is chosen arbitrarily.The regularization constant was increased until the power penalty reached −25 dB.(c) Three-step prediction with TPA+FCD case at an optimal input power, here NRMSE is 0.15 and Lp ≈ −22 dB.(d) Trajectories of linearized Jacobian J km for (c).receiver side E(L, t) = s(t) differs from s 0 (t).Examples of the distortion are shown in figures 10(a) and (b).If s 0 (t) is a carrier of encoded symbols, the reservoir's task is to recover them from s(t).
We generate a random sequence s 0 (t n ) of symbols encoded with on-off keying at 25 GHz with eight samples per bit and filter with a first-order low-pass filter with 0.6 • 25 GHz cutoff frequency (see figure 10(a)).Then equation ( 17) is solved to obtain distorted signal samples s(t n ), which are linearly interpolated to s(t) and injected into the reservoir.The reservoir parameters are the same as in the Mackey-Glass task (see table 1) except for µ = 40 GHz, chosen according to the s(t) spectrum (see figure 10(b)) and nonlinear effects disabled.As in the previous section, µ is increased for N ∥ × 1 reservoirs to  equalize RBW.For RC training, we set y tgt (t) = s 0 (t − τ ), where τ is an output delay and sample s 0 (t), s(t) and z k (t) once per bit in its middle (see dots in figure 10(a)).For RC testing, y(t) and y tgt (t) are similarly sampled to produce y n and y tgt n and a linear classifier C is used.Performance is measured with the symbol error rate (SER): SER(y, where N t is the number of transmitted symbols.However, SER cannot be lower than 1/ min c n 1[C(y tgt n ) = c] , which in our case is approximately 2/N t .The presence of an output delay is motivated by group velocity dispersion that causes different input signal frequencies to arrive at a receiver at different times.By delaying y tgt (t), we allow the reservoir to accumulate more information about the current symbol before producing an output.Simulations show that it leads to a better separation of output samples at a given power penalty.For used fiber parameters, a delay of 4 bits is adequate and is used throughout the section (see figure 11).
A larger reservoir can handle a distortion from a longer fiber (see figure 12).The performance of an N ∥ × 1 grid of cavities was comparable to an N ∥ × 3 grid and outperformed independent cavities.

Discussion
The proposed setup can successfully be used as an RC.However, there are notable differences that need discussion.
First, an ESN connection matrix is typically random and sparse [28], and two neurons are unlikely to have a connection equally strong in both directions.In the photonic case, if two cavities are evanescently coupled, the connection is reciprocal and coupling coefficients are negative complex conjugate of each other (see figure 13).
Moreover, in ESNs, a nonlinear function encompasses the input and the neuron connection terms (see equation ( 6)).In our reservoir, however, a nonlinearity depends only on the current reservoir state (see equation ( 1)).Nonlinearities themselves are different as well.Typically, in ESNs, a sigmoid or hyperbolic tangent is used.It adds a saturation of neuron activation, which is, effectively, an increase of loss.During a typical computation, ESN's linearized Jacobian eigenvalues largely move toward the z-plane origin (corresponding to an increase of loss) and go back with, perhaps, slight frequency deviations [46].In the photonic case with TPA as the only nonlinearity, the behavior is similar, but with FCD and the Kerr effect, the trajectories of eigenvalues change considerably (see figure 7(b)).Nonetheless, all three nonlinearities positively contribute to performance (see figure 8(a)).
Section 4 proposed a relation between cavity parameters and their effect on reservoir dimensionality.Whether these requirements could be met with the available technology needs to be discussed.First, consider that Another requirement demands the reservoir to remember previous inputs: an optical timescale should be larger than an input signal timescale, i.e.
which is already satisfied by equation (18).For example, consider a reservoir with 20 cavities.To process a signal with BW = 20 GHz, an approximate lower bound for Γ o would be 2π GHz, corresponding to an intrinsic Q-factor of ω 0 /Γ o ≈ 0.2 • 10 6 .This value is rather high but not impossible.An intrinsic Q-factor of 0.7 • 10 6 was demonstrated in passive GaAs PhCs, with 10 6 considered achievable [36].In an AlGaAs MRR, a Q-factor of 1.5 • 10 6 was demonstrated [52] and in a silicon PhC, a Q-factor of more than 11 • 10 6 [53].The low-loss silicon nitride allowed for an even higher Q-factor of 37 • 10 6 in an MRR [54].Next, consider a nonlinearity timescale.TPA and the Kerr effect respond almost instantaneously to a change of electric field, similar to an ESN.However, FCD is also affected by the free electron recombination rate (see figure 5(a)), which should be comparable to a rate at which an optical signal in a cavity changes, i.e. half of its bandwidth.In our case, each cavity belongs to multiple supermodes, and thus, its bandwidth approaches the RBW, which is close to BW.The electron recombination rate of silicon MRRs was shown to be approximately 2 GHz [55], i.e.BW ⪅ 4 GHz would be supported.The recombination can be accelerated to almost 20 GHz with a reverse-biased p-i-n junction embedded in a cavity [55].Due to a higher surface area, in PhCs, the recombination can reach 10 GHz for silicon [56] and 100 GHz for GaAs [36].
Until this point, we assumed that cavities' resonances are aligned ∀k = 1..N ω k = ω 0 .However, this might not be the case due to fabrication tolerances.For directly coupled cavities to interact, resonances should be sufficiently close.A weak intercavity interaction leads to two issues, the first being a dimensionality reduction when some cavities are not coupled to the input waveguide directly (e.g. for N ⊥ > 1).Since such cavities can only receive signals through other cavities, without an interaction, they stop receiving signals and hence cannot contribute to the dimensionality, which is demonstrated in figure 14 for 8 × 3 and 24 × 1 linear MRR reservoirs.As a resonance disorder is increased, at first, D increases as the supermode overlap is alleviated but decreases afterward.The 24 × 1 reservoir is affected less than the 8 × 3 one since in the former, 12 cavities are coupled to the waveguide (due to MRR mode direction matching, see figure 1(b)), but only 4 in the latter.If MRR mode direction was not considered and both MRR modes were excited, all cavities of a 24 × 1 MRR reservoir are coupled to a waveguide.In that case, a further increase of resonance disorder simply increases F Γ and D will eventually reach unity.A similar outcome is expected for a PhC reservoir.
The second issue is the weakening of supermode mixing, i.e. delocalization.When cavities interact well, supermodes span multiple cavities; otherwise, they isolate in individual cavities.Because of that, a nonlinearity of an individual cavity can only affect a single supermode.As shown in figure 8(d), nonlinearity could not contribute to computing performance as much in this case.Here, N ⊥ > 1 cases could be affected to a lesser degree than N ⊥ = 1, since for N ⊥ > 1, each cavity has 2-4 neighboring cavities, while only 1-2 for N ⊥ = 1 and the probability of resonance mismatch with all neighbors is lower with more neighbors.The weakening of supermode mixing is shown in the inset of figure 14.Each heatmap represents a matrix Λ where Λ km is an intensity of mth supermode in the kth cavity.For clarity, each mth column of Λ is normalized to unity.We see that when σ k [ω k /2π] < 0.5µ, supermodes span multiple cavities but only a few otherwise.The resonance disorder can be mitigated with the thermo-optic effect, but for more cavities, it becomes challenging.In addition to that, a close proximity of cavities results in a strong thermal crosstalk.However, the resonance disorder can be less severe when cavities are spatially close [57].

Implementation
The proposed system can be implemented in an integrated photonic platform, for instance silicon-on-insulator.The advantage would be a very compact geometry and relatively large nonlinearity, where three nonlinearities considered (TPA, Kerr and free carrier-related) will contribute.On this platform, MRR with a radius of 5 µm could be implemented [58], allowing for a neuron density of 10 4 neurons mm −2 .Alternatively, silicon nitride could be used, offering much lower losses but requiring a larger chip footprint [59].The nonlinear response will be pure Kerr and strong enough if the supermode lifetimes are adjusted to large values through coupling to input and outputs.Both platforms offer thermo-electric control for the readout, while integrated detectors are available with the silicon platform.Other technologies could be considered, particularly those based on the hybrid integration of materials such as III-V on silicon.
In a fully integrated system, cavities would also be coupled to readout waveguides.This, however, can be challenging to realize for cavities inside the grid since the compact cavity arrangement leaves no space for such waveguides.One could consider using N ∥ × 1 or 2 × N ⊥ grids, where all cavities are accessible.An alternative could be using multiple photonic layers demonstrated in the silicon nitride-on-silicon platform [60] or three-dimensional waveguides [61].
Apart from the internal layer of an RC, the readout's scalability also requires consideration.A common strategy for coherent signals is Mach-Zehnder interferometers with two phase shifters, typically based on the thermo-optic effect (TO).Such phase shifters require a constant supply of electrical current, and their state-of-the-art chip footprint is 109 × 21 µm [62].If we were to estimate the neuron density of an RC with a readout, it would be limited to 200 neurons mm −2 .There are a few alternatives to TO, for example, phase shifters on the micro-electro-mechanical systems platform, although they do not reduce the footprint by themselves with state-of-the-art 100 × 45 µm 2 , which is twice as wide as TO-based ones.However, such shifters do not suffer from thermal crosstalk, which would allow for a more compact design.They also require three orders of magnitude less power [62].Phase shifters based on plasmonic nonlinear polymers are even more compact, with the device length being 29 µm [63].The footprint of phase change materials is just a few square microns [64], which, combined with non-volatility, makes them a very attractive option for a readout implementation, even with limited switching resolution [65].

Conclusion
We have applied a network of evanescently coupled nonlinear nanophotonic cavities for an RC.Such coupling allows for a lower chip footprint by several orders of magnitude, excluding readout.We have derived general conditions under which such a system attains a high dimensionality, an important property for RC.The role of nonlinear effects present in integrated microcavities, namely TPA, FCD and the Kerr effect, were studied.While their impact on the reservoir dimensionality and the consistency is different, all three nonlinearities positively contributed to the performance in the Mackey-Glass prediction task.The reservoir was also successful in the recovery of an OOK-encoded optical signal.In both cases, the reservoir has shown a fully optical autonomous real-time computation in the gigahertz range.
Several potential improvements to this design could be proposed.For example, one could use multiple cavity resonances to operate on multiple channels simultaneously.Another option is to overcome the dimensionality limitation due to a finite Q-factor.If a reservoir has an upper limit of N cavities, one could consider using two such reservoirs, with one receiving a delayed copy of the input.This way, the dimensionality of the setup could be expected to increase two-fold.The goal is to find the matrix representation of a system with n + 1 cavities based on a system with n cavities.We first eliminate the flux between the cavities Consequently, new outputs are

Appendix B. Comparing evanescent and dissipative coupling
Consider a linear case of two evanescently coupled lossless cavities.Their state will be described with an equation [23]: where µ is a coupling coefficient and * is a complex conjugation.Then a depending on φ 12 and φ 21 there could be an interaction ranging from the one similar to evanescent coupling to its total absence, i.e. light escapes from cavities to the input waveguide.In practice, however, phases are effectively random.As a result, we choose the evanescent coupling as a neuron connectivity method.

Appendix C. Estimation of reference power for a single cavity
Assuming a monochromatic excitation with an input power P NL and a steady state, the power that the cavity receives should be equal to the power lost: (C.1) For a TPA-dominated case, the reference power is found when Γ TPA (a) = Γ S , then For an FCD-dominated case in a steady state, assuming that a detuning does not impact the power absorbed by the cavity:

Figure 1 .
Figure 1.(a) Proposed photonic reservoir computer, (b) MRR mode direction matching and (c) echo state network.

Figure 2 .
Figure 2. Supermodes of evanescently coupled microcavities for various grid geometries.Bottom figure shows a histogram of supermode frequencies.

Figure 3 .
Figure 3.The dimensionality of a linear photonic reservoir w.r.t.(a) input signal bandwidth, (b) average supermode spacing and (c) input waveguide coupling strength.Reservoir parameters are given in table 1, and by default, BW > RBW, but in (a) µ increased to 80 GHz to increase the upper limit of the dimensionality, and in (b), each point corresponds to specific µ and Γ o .In (c) the legend shows different grid geometries.

Figure 4 .
Figure 4. (a) Scalability of linear reservoir dimensionality w.r.t.optical losses.Dimensionalities of all possible combinations of N ∥ and N ⊥ for given N are shown as histograms.Reservoir parameters are shown in table 1, with µ = 80 GHz, reduced ko and BW > RBW.(b) Impact of F Γ and grid geometry on the dimensionality.Both figures use the same dataset.

Figure 5 .
Figure 5. (a) Comparison of material platforms w.r.t.PNL and an operating frequency of a single MRR or PhC cavity, details in text.Here, ALD is atomic layer deposition.For references, see table 2. (b) Scaling of PNL w.r.t. the number of cavities.Reservoir parameters are given in table 1; BW matches RBW.

Figure 6 .
Figure 6.Effect of (a) TPA, (b) TPA and FCD and (c) the Kerr effect on reservoir's dimensionality and consistency.Reservoir parameters are given in table 1.(d) Each point corresponds to an input power at which γ = 0.98.

Figure 7 .
Figure 7.The normalized Fourier transform of the input signal (blue) with reservoir resonances overlaid (orange).Here, BW < RBW, which goes against the conclusions of section 4. High peaks in the input spectra correspond to a dominating harmonic (ξ(t) largely oscillates, see figure8(c)); however, for a close prediction, it is important to also capture high-frequency irregularities despite their weak amplitude.

Figure 8 .
Figure8.Impact of nonlinearities on (a) prediction error and (b) power penalty on a 3-step Mackey-Glass prediction.For all cases, GaAs MRR parameters are used (see table1), but for TPA-only dω/dN = 0, while for the Kerr effect, β2 = 0 and dω/dI < 0 is chosen arbitrarily.The regularization constant was increased until the power penalty reached −25 dB.(c) Three-step prediction with TPA+FCD case at an optimal input power, here NRMSE is 0.15 and Lp ≈ −22 dB.(d) Trajectories of linearized Jacobian J km for (c).

Figure 9 .
Figure 9. Scalability of three-step prediction performance with various geometries and (a) TPA (b) TPA with FCD and (c) the Kerr effect.The regularization constant was increased for each point until the power penalty reached −25 dB.The minimum NRMSE reached (a) 0.037, (b) 0.070 and (c) 0.087.

Figure 10 .
Figure 10.(a) Time traces of the real part of the original signal (dashed line) and distorted signal after 150 km of optical fiber (solid line).Black dots show bits' middles and act as the training target; blue and orange are only for visualization and show distorted signal samples, with each color corresponding to each symbol.(b) Normalized Fourier spectrum of the distorted signal after 150 km of fiber (blue) with 8 × 3 reservoir resonances overlaid (orange).(c) Constellation of distorted signal samples after 150 km of fiber, colors correspond to (a).

Figure 11 .
Figure 11.(a) Effect of output delay (shown on legend) on output symbol separability measured with NRMSE at a given power penalty, controlled with regularization parameter.Here, L = 150 km and the signal is processed with an 8 × 3 linear reservoir.(b) and (c) Constellations of output symbols at Lp ≈ −20 dB with 0-and 4-bit delay, respectively.Figure 10(c) shows the constellation of input signal samples.

Figure 12 .
Figure 12.Performance scaling of optical signal recovery w.r.t.reservoir size.The reservoir is (a) N independent MRRs, (b) N ∥ × 1 grid and (c) N ∥ × 3 grid of linear MRRs.The regularization parameter was increased for each computation until the power penalty reached −25 dB.

Figure 13 .
Figure 13.Neuron connectivity graph of (a) a typical ESN and (b) a photonic RC with a grid of cavities.Connection strength is represented by arrow thickness.

Figure 14 .
Figure 14.Impact of cavity resonance frequency disorder on the dimensionality.Reservoir parameters are given in table 1, BW > RBW.Each dot corresponds to a N ∥ × N ⊥ (shown in legend) linear MRR reservoir with resonances ω k ∈ N (ω0, σω) with various σω.Insets represent supermode intensity distribution over cavities Λ.

T n 1 e 1 e 1 −* 2 e 2 −e −iβd κ * 1 Figure A1 .
Figure A1.Scheme of waveguide-cavity interaction.A rectangle with an represents a row of n cavities coupled to the waveguide.

Table 1 .
Default reservoir parameters.Most parameters are omitted as they only impact the required input power.

Table 2 .
t. PNL and an operating frequency of a single MRR or PhC cavity, details in text.Here, ALD is atomic layer deposition.For references, see table 2. (b) Scaling of PNL w.r.t. the number of cavities.Reservoir parameters are given in table 1; BW matches RBW.References used in figure 5.