Neural networks are one of the disruptive computing concepts of our time. However, they fundamentally differ from classical, algorithmic computing. These differences result in equally fundamental, severe and relevant challenges for neural network computing using current computing substrates. Neural networks call for parallelism across the entire processor and for a co-location of memory and arithmetic, i.e. for architectures beyond von Neumann. Parallelism in particular has made photonics a highly promising platform, yet until now scalable and integrable concepts have been scarce. Here, we demonstrate for the first time how a fully parallel and fully implemented photonic neural network can be realized by spatially multiplexing neurons across the complex optical near-field of a semiconductor multimode laser. Discrete spatial sampling defines ∼90 nodes on the surface of a large-area vertical cavity surface emitting laser that is optically injected with the artificial neural network's input information. Importantly, all neural network connections are realized in hardware, and our processor produces results without pre- or post-processing. Input and output weights are realized via the complex transmission matrix of a multimode fiber and a digital micro-mirror array, respectively. We train the readout weights to perform 2-bit header recognition, a 2-bit XOR logical function and 2-bit digital to analog conversion, and obtain error rates below 0.9 × 10⁻³ and of 2.9 × 10⁻² for digit recognition and XOR, respectively. Finally, the digital to analog conversion can be realized with a standard deviation of only 5.4 × 10⁻². Crucially, our proof-of-concept system is scalable to much larger sizes and to bandwidths in excess of 20 GHz.
Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 license. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
1. Introduction
Artificial neural networks (ANNs) were conceptualized as early as 1943, with the original motivation of providing a 'logical calculus of the ideas immanent in nervous activity'. In line with this spirit, nonlinear nodes standing in as neurons form the ANN with the help of connections, much like synapses, dendrites and axons connect biological neurons. Such a parallel architecture distributes computing across a usually large number of simple nonlinear nodes, and programming becomes optimizing the network's topology. Besides several important breakthroughs, such as error back-propagation to optimize, i.e. train, an ANN, it was the increasing availability of powerful computing hardware that turned ANNs into a technology of primary societal, scientific and economic relevance.
High-performance computing hardware is therefore a must for ANNs. The reason lies in a core aspect of networks: the number of possible connections scales quadratically with the number of network nodes, and every new computational result requires querying all network connections. Integration of parallel ANNs in electronic circuits is highly challenging due to: (a) the strong energy penalty for switching the large number of signaling wires; and (b) a fundamental incompatibility with two-dimensional (2D) integration. Thus, it has so far been impossible to integrate parallel and large ANNs, and application specific circuits still heavily rely on serial routing.
Photonics is a promising alternative as it mitigates much of the energy cost associated with information transduction. This potential advantage over electronics was recognized decades ago [7, 8] and has been detailed elsewhere. Reducing energy deposition into the substrate could enable novel three-dimensional integration to ultimately realize scalable integration of parallel ANNs. Optical ANNs already outperformed their electronic counterparts at an earlier stage, and their superior scaling has recently been demonstrated again. Finally, ANNs implemented in silicon photonics [14–16] promise ultra-high speed, ultra-low latency and the benefits of integration leveraging the tools of silicon and 2D lithography.
However, the vast majority of large scale and parallel photonic ANN demonstrations are not standalone and autonomous. Most proofs of concept lack one or several of an ANN's constituents, require substantial involvement of a traditional computer in the creation of the ANN's state, are not parallel, or run on substrates too exotic for realistic integration in mid-term technological platforms. Here, we address all these points and experimentally demonstrate a fully parallel photonic reservoir computer [17, 18] with ∼90 neurons that we train and operate in realtime with minimal interference from a control computer. Neurons are defined at periodic and adjacent spatial sampling positions across the complex multimode field of an injection locked large area vertical cavity surface emitting laser (LA-VCSEL). The LA-VCSEL has a diameter of ∼20 µm, lases around 920 nm and follows design principles developed for high bandwidth and high efficiency VCSEL arrays. It was fabricated via standard ultraviolet contact mask lithography, dry (inductively coupled plasma-reactive ion etching) mesa etching, and thin-film metal evaporation and lift-off methods in a university cleanroom [20, 21]. All the ANN's connections are implemented in photonic hardware. The complex transfer matrix of a multimode fiber (mm-fiber) couples injected information, which is Boolean encoded on a digital micro-mirror device (DMD), to the LA-VCSEL. Intra-cavity fields and carrier diffusion intrinsic to LA-VCSELs recurrently connect the photonic neurons. Finally, trainable readout weights are encoded on a second DMD, which is imaged onto a large-area detector to directly provide the ANN's computational result.
We operate our recurrent photonic ANN in its steady state, and its bandwidth is limited by the input DMD's frame rate to ∼130 inferences per second. However, such semiconductor lasers can be modulated extremely fast, often in excess of 20 GHz [20, 25], which therefore corresponds to the potential intrinsic speed of our system. We train the readout weights to perform 2-bit header recognition, a 2-bit XOR and 2-bit digital to analog conversion (DAC). For the 2-bit header recognition our system achieves an excellent error rate smaller than 0.9 × 10⁻³; for the more demanding XOR it is still a respectable 2.9 × 10⁻². Finally, the digital to analog conversion can be realized with a standard deviation of only 5.4 × 10⁻². In this context it is important to highlight the fully analog nature of our photonic ANN.
2. A photonic ANN in a complex and continuous nonlinear medium
Parallelism in our photonic ANN is based on spatial mode multiplexing, and the correspondence between the ANN's conceptual sections and the relevant photonic hardware is schematically illustrated in figure 1(a). The collimated beam of a tunable external cavity laser (ECL, Toptica CTL 950) illuminates an area of DMDa (Vialux XGA 0.7" V4100) with a Gaussian cross-section of 90 pixels in diameter. A DMD comprises mirrors which can be discretely switched between ±12°, but only mirrors in the −12° orientation inject information into the ANN. Input information is therefore a Boolean matrix. Within the illuminated area we define a ring that is constantly active. This DC-locking signal stabilises the LA-VCSEL's potentially autonomous free-running dynamics and hence ensures a consistent response. Inside the remaining illuminated area we define n sub-sections, much like wedges of a pie, with which we encode input information for digit d in an n-bit binary representation.
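The wedge-based encoding can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `input_mask`, the placement of the always-on ring at the outer edge of the illuminated disc, the wedge ordering (counter-clockwise, most significant bit first) and the default ring width are hypothetical choices, not details given in the text.

```python
import numpy as np

def input_mask(d, n_bits, size=90, ring_width=10):
    """Boolean DMD frame: always-on ring plus n_bits pie wedges encoding digit d.

    Assumptions (not from the paper): the ring sits at the outer edge of the
    illuminated disc and wedges are ordered counter-clockwise, MSB first.
    """
    radius = size / 2.0
    # Pixel-centre coordinates relative to the centre of the illuminated disc
    y, x = np.mgrid[0:size, 0:size] - (size - 1) / 2.0
    r = np.hypot(x, y)
    theta = np.mod(np.arctan2(y, x), 2 * np.pi)

    # DC-locking ring: constantly active, independent of the input digit
    ring = (r <= radius) & (r > radius - ring_width)

    # Wedge index of every pixel and the binary digits of d (MSB first)
    wedge = np.minimum((theta / (2 * np.pi) * n_bits).astype(int), n_bits - 1)
    bits = np.array([(d >> (n_bits - 1 - i)) & 1 for i in range(n_bits)], dtype=bool)

    # A pixel inside the ring is active only if its wedge encodes a 1
    inner = r <= radius - ring_width
    return ring | (inner & bits[wedge])

mask = input_mask(d=9, n_bits=4)   # 9 = 1001 in binary: first and last wedge active
```

Each digit then switches on a different number of mirrors, so the injected optical power depends on the digit being encoded.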
In figure 1(a), input information corresponds to injecting digit d = 9 in an n = 4 bit representation. The spatially modulated beam is collected by a mm-fiber that implements complex input weights through optical mode mixing induced by birefringence and scattering. Based on the fiber's V number, we estimate that our input fiber supports between 30 and 40 modes, i.e. an input matrix of equal size.
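The mode-count estimate can be reproduced with the standard step-index relations V = π·d·NA/λ and M ≈ V²/2 (large-V approximation, both polarizations). The sketch below uses the 25 µm core diameter and the ∼920 nm wavelength quoted in the text; the numerical aperture of 0.10 is an assumption, since it is not stated here.

```python
import math

def v_number(core_diameter_um, numerical_aperture, wavelength_um):
    """Normalized frequency V of a step-index fibre."""
    return math.pi * core_diameter_um * numerical_aperture / wavelength_um

def step_index_mode_estimate(v):
    """Large-V approximation of the number of guided modes (both polarizations)."""
    return v ** 2 / 2

# NA = 0.10 is an assumed value; core diameter and wavelength follow the text
v = v_number(core_diameter_um=25.0, numerical_aperture=0.10, wavelength_um=0.92)
modes = step_index_mode_estimate(v)
print(f"V = {v:.2f}, estimated modes = {modes:.0f}")
```

Under these assumptions the estimate falls inside the quoted range of 30 to 40 modes.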
The near-field at the mm-fiber's output facet is injected into the LA-VCSEL. Semiconductor lasers react highly nonlinearly to optical injection, and previously we showed that this nonlinear response can be harvested to implement the nonlinear nodes of an ANN [26, 27]. The feasibility of this concept was shown numerically for 33 GHz. Here, however, each neuron is physically implemented in parallel via the LA-VCSEL's complex and high-dimensional multimode field. Optical diffraction of the LA-VCSEL's intra-cavity field as well as carrier diffusion induce interactions between different locations, i.e. coupling between our photonic nodes. Dynamics induced by optical injection are governed by an equation of the type
where τ is the system's response time and the state variable describes the LA-VCSEL's state at time t, distributed across its surface. This state comprises the local optical field amplitude, the optical phase and the electronic carriers divided between spin up and spin down populations. While equation (1) in its full expression is highly non-trivial, structurally it is comparable to the concept of Neural Ordinary Differential Equations, and we leverage this state to implement a recurrent neural network. As we operate orders of magnitude slower than any of a LA-VCSEL's inherent time scales (nanoseconds and below for relaxation oscillations and photon lifetimes), we only need to consider its steady state.
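The steady-state argument can be illustrated with a minimal sketch. It assumes a generic leaky rate equation τ·dx/dt = −x + tanh(W·x + drive) as a stand-in: the tanh nonlinearity, the random coupling matrix `W`, the function name `relax` and all parameter values are illustrative assumptions and not the laser model of equation (1). When the input is held much longer than τ, the state relaxes to a fixed point that depends only on the input, not on the initial condition.

```python
import numpy as np

def relax(x0, drive, W, tau=1.0, dt=0.05, steps=2000):
    """Euler-integrate tau * dx/dt = -x + tanh(W @ x + drive) for steps * dt time units."""
    x = x0.copy()
    for _ in range(steps):
        x += dt / tau * (-x + np.tanh(W @ x + drive))
    return x

rng = np.random.default_rng(0)
n = 90                                  # number of nodes, as in the text
W = rng.normal(size=(n, n))
W *= 0.5 / np.linalg.norm(W, 2)         # spectral norm 0.5: contraction, unique fixed point
drive = rng.normal(size=n)              # constant input held for 100 response times

x_a = relax(np.zeros(n), drive, W)      # start from rest
x_b = relax(rng.normal(size=n), drive, W)  # start from a random state

# Residual of the fixed-point condition x = tanh(W x + drive)
residual = np.max(np.abs(-x_a + np.tanh(W @ x_a + drive)))
```

Both runs end in the same state, which is why a slowly driven system can be treated as a static nonlinear map from input to state.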
The spatially distributed emission power represents the neurons' readout states. This spatially continuous multimode optical field varies on micrometer length scales, which allows for ∼60 features to fit inside the LA-VCSEL's aperture. To implement programmable readout weights we image the laser's near-field onto DMDb (Vialux XGA 0.7" D4100), which applies Boolean output weights to the emission power distribution. Mirrors of DMDb therefore define individual neurons through spatial sampling, and here we over-sample by a factor of ∼1.5, which results in ∼90 photonic neurons. The weighted state is converted into the output signal by an optical detector (det).
We image DMDb onto the detector, and hence the optical fields of individual DMD mirrors do not spatially overlap. This makes the detected signal proportional to the weighted sum of powers. Finally, the output is normalized by its standard deviation and its mean is subtracted.
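The readout described here amounts to a Boolean-weighted sum of neuron powers followed by standardization. A minimal sketch, in which the function name `readout`, the array shapes and the random test data are assumptions made for illustration:

```python
import numpy as np

def readout(power, weights):
    """Detector signal per sample: Boolean-weighted sum of neuron powers, standardized.

    power:   (samples, neurons) non-negative emission powers
    weights: (neurons,) Boolean mirror states (True = light reaches the detector)
    """
    y = power @ weights.astype(float)   # incoherent sum: powers add, fields do not interfere
    return (y - y.mean()) / y.std()     # subtract the mean, divide by the standard deviation

rng = np.random.default_rng(0)
power = rng.random((200, 90))           # synthetic stand-in for measured neuron powers
weights = rng.random(90) < 0.5          # random Boolean mirror configuration
y_out = readout(power, weights)
```

Because the weights are Boolean and the detector sums powers, all readout weights are non-negative before normalization.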
Evolutionary learning is used to optimize only the readout weights. The input weights and the internal coupling of the LA-VCSEL do not partake in task-specific optimization, and we therefore implement a reservoir computer. The output is recorded for a fixed number of discrete samples, which therefore sets our learning batch size, and we possess a labeled target value for each sample. We compute the mean square error at each learning epoch k, and at the transition from epoch k to k + 1 the single Boolean weight at position l(k) is inverted. If this weight inversion was beneficial, i.e. if it reduced the error, we keep this particular modification; if not, we revert the weight to its previous state. In previous experiments on a spatial light modulator based setup we found that for random selection of l(k) the error converges exponentially on average, and we have analytically shown that this form of convergence is task independent.
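The learning rule above can be sketched as a greedy single-weight flip. This is a simulation under stated assumptions, not the experimental procedure: the neuron states are synthetic random numbers, the target is generated from a hypothetical hidden mirror configuration, and the names `normalized_output`, `mse` and `train` are illustrative.

```python
import numpy as np

def normalized_output(states, w):
    """Boolean-weighted sum of neuron powers, standardized to zero mean and unit std."""
    y = states @ w.astype(float)
    return (y - y.mean()) / (y.std() + 1e-12)   # small offset guards against zero std

def mse(states, w, target):
    return np.mean((normalized_output(states, w) - target) ** 2)

def train(states, target, epochs=2000, seed=0):
    rng = np.random.default_rng(seed)
    n = states.shape[1]
    w = rng.random(n) < 0.5            # random Boolean starting weights
    err = mse(states, w, target)
    history = [err]
    for k in range(epochs):
        l = rng.integers(n)            # random position l(k)
        w[l] = ~w[l]                   # invert a single Boolean weight
        new_err = mse(states, w, target)
        if new_err < err:
            err = new_err              # beneficial: keep the modification
        else:
            w[l] = ~w[l]               # not beneficial: revert
        history.append(err)
    return w, np.array(history)

rng = np.random.default_rng(1)
states = rng.random((200, 90))         # synthetic stand-in for measured neuron powers
hidden = rng.random(90) < 0.5          # hypothetical mirror configuration defining the target
target = normalized_output(states, hidden)
w, history = train(states, target)
```

By construction the recorded error never increases from one epoch to the next, since every non-beneficial flip is undone.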
3. Experimental setup
The experimental setup is schematically illustrated in figure 1(b). The output of our fiber-coupled ECL is collimated via a 7.5 mm focal length lens (Thorlabs AC050-008-B-ML). The surface of DMDa is imaged via two lenses (Thorlabs AC254-150-B-M, C110TMD-B) onto a 25 µm core diameter, one meter long mm-fiber (Thorlabs M67L01). DMDa displays input information with a frame rate of ∼130 frames per second, which currently sets the speed of our photonic ANN. Using a 25 mm focal length lens (AC127-025-B-ML) in combination with a 10× near-infrared optimized microscope objective (MO, Olympus LMPLN10XIR), the mm-fiber's near-field at its output is imaged on top of our LA-VCSEL. The LA-VCSEL is imaged onto a CMOS camera (IDS U3-3482LI-H) using a 100 mm focal length lens (AC254-100-B-M). Within the same optical path we placed a non-polarizing 50/50 beam-splitter (Thorlabs CCM1-BS014/M), and the reflected signal is collected by a single mode fiber connected to an optical spectrum analyzer (OSA, Yokogawa AQ6370D). Additional non-polarizing 90/10 (Thorlabs BSX10R) and 50/50 (Thorlabs CCM1-BS014/M) beam-splitters direct optical injection and the LA-VCSEL's emission. Finally, we image the LA-VCSEL onto DMDb with a 100 mm focal length lens (Thorlabs AC254-100-B-M), resulting in a magnification of 5.6, and an additional lens (Thorlabs AC254-045-B-M) focuses the reflected signal onto a detector (Thorlabs PM100A, S150C).
The LA-VCSEL design and fabrication are optimized to improve operation efficiency and bandwidth. Our device wafers are produced by metal-organic vapor-phase epitaxy (MOVPE) on heavily n-doped (001) surface oriented GaAs substrates. The epitaxial VCSEL structure consists of an n-doped GaAs buffer layer, followed by a Si-doped (n-doped) 37-period hybrid (two section) bottom distributed Bragg reflector (DBR), a half-wavelength (λ/2 optically thick) cavity, a C-doped (p-doped) 14.5-period top DBR, and heavily p-doped Al0.1Ga0.9As and GaAs current spreading layers each λ/2 optically thick. The bottom n-doped DBR is composed of 33 periods of AlxGa1-xAs with x = 0.05 high refractive index layers and x = 1.0 low refractive index layers, followed by four periods of x = 0.05 and x = 0.92 layers. The top p-doped DBR is similar but with alternating x = 0.1 and x = 0.92 AlGaAs layers. All DBRs include linearly graded interfaces between the alternating low and high refractive index layers to reduce series resistance. The optical cavity consists of five compressively strained ∼4.2 nm thick InGaAs quantum wells (QWs) surrounded by six strain-compensating ∼5.1 nm thick GaAsP barrier layers that are in tension. From double crystal x-ray diffraction measurements on QW calibration structures the net active region strain is close to zero (as designed). The ensemble of QWs and barriers is centered in the λ/2 optical cavity and the remaining layers of the optical cavity (on both sides of the QWs and barrier layers) are AlGaAs layers stepped from x = 0.4 to x = 0.92, surrounded by 20 nm thick (before oxidation) Al0.98Ga0.02As layers and followed by linearly graded AlxGa1-xAs layers decreasing in mole fraction (x) from x = 0.92 toward x = 0.1 in the top p-doped DBR and from x = 0.92 toward x = 0.05 in the bottom n-doped DBR. The two x = 0.98 layers reside inside the optical cavity.
The two x = 0.98 layers are selectively thermally oxidized during device processing to form wave-guiding and current confining oxide apertures with tapered ends (that minimize scattering losses). The estimated oxidation length (from the top mesa edge inward) is roughly 9 µm, based on a series of selective thermal oxidation experiments on cleaved slivers of the VCSEL material, which yield a plot of the oxidation length (in from a mesa edge) versus the oxidation time. On resonance (in our mode simulations) the optical field intensity nodes that define the λ/2 optical cavity fall outside the two x = 0.98 layers. We fabricate top-surface-emitting VCSELs on quarter wafer pieces with high frequency co-planar ground-signal-ground (GSG) contact pads suitable for on-wafer testing via a standard GSG electrical probe head with a pin to pin spacing of 150 µm, using previously reported fabrication methods.
4. Optical injection and ANN states
Optical injection into multimode LA-VCSELs has been extensively studied [33, 34], yet here we injection-lock such a device to a complex optical input field for the first time. Figure 2(a) shows the device's optical response under DC injection using DC-lock rings of different widths. The top, middle and bottom rows were recorded using a ring with a thickness of 15, 5 and 0 DMD mirrors, respectively. The bottom row therefore corresponds to the laser without optical injection. The table shows the configuration displayed on DMDa (left column), the spatial distribution of the optical injection (center column) and the injected LA-VCSEL's near-field (right column).
We bias the laser at 1.28 times its lasing threshold current, where it emits a total of P = 1.68 mW. The injection laser was tuned to the wavelength at which the VCSEL is most susceptible to optical injection locking. The free-running laser shows the expected multimode optical spectrum, and its near-field corresponds to high-order Laguerre–Gaussian modes. With a DC-lock ring width of 5 pixels, the laser's emission spectrum as well as its spatial emission profile are already notably perturbed. Widening the DC-lock ring to 15 pixels increases the injection power, and the device is then fully locked to the injected field; its near-field significantly differs both from its free-running near-field and from the optically injected profile. This already demonstrates the non-trivial and non-uniform transformation of optically injected patterns. Figure 2(b) shows the resulting photonic ANN state for the example of injecting the four possible configurations d ∈ {0, 1, 2, 3} of 2-bit symbols. Individual responses significantly differ for each case, which is a prerequisite for differentiating individual digits. Finally, we found a DC-lock ring width of 10 pixels to be the smallest width resulting in a stable LA-VCSEL near-field. We used this width for all the following computing experiments, as the smallest width leaves the maximum input dimensionality available for encoding input information.
5. ANN performance of the LA-VCSEL
As a starting point for our photonic ANN's performance evaluation we trained a single output classifier to identify one of the four digits in a 2-bit header. Training sequences consisted to 50% of the digit to be classified and to 50% of the remaining three digits. The system was trained until the training error dropped below a target threshold, and figure 3(a) shows the resulting convergence.
For all four tasks our system converged to below the targeted training error, however starting from strongly differing initial performances. We attribute this to the injection conditions resulting from the different input configurations, for which the injected power is lowest for d = 0 and highest for d = 3. Additionally, for the case of classifying d = 0 the LA-VCSEL ANN needs to invert the power dependency: the digit with the lowest injection power needs to be transformed into the highest power in the ANN's output. This observation is consistent with the initial classification errors, where two of the digits start from comparable levels and one digit starts from the lowest level.
After successful training, the system was evaluated with an independent test sequence comprising 1000 samples repeated ten times. Average testing errors correspond to the filled diamonds in figure 3(a). Again, they reflect the stability of the different locking conditions. The strongest injection (d = 3) produces a testing error nearly identical to the training error, and lower injection powers result in a stronger divergence between training and testing error. Figure 3(b) shows the temporally resolved testing sequences for two of the digits. The output in the upper panel clearly shows stronger undulations for one of the two cases. The lower panel shows the error as lines, while the stars show the classification performance after thresholding. Stars in figure 3(b) located at amplitude 1 are therefore samples for which our photonic ANN misclassified the corresponding digit. The number of misclassified samples normalized by the test sequence's length gives the symbol error rate (SER); two of the four per-digit values are SER = 2.5 × 10⁻³ and SER = 1.1 × 10⁻³. On average we therefore obtain an SER smaller than 0.9 × 10⁻³.
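The SER computation can be sketched as follows. The threshold value of 0.5, the function name `symbol_error_rate` and the toy data are assumptions for illustration, since the threshold used in the experiment is not given here.

```python
import numpy as np

def symbol_error_rate(output, target, threshold=0.5):
    """Threshold the analog output and count disagreements with the Boolean target.

    Returns the SER and a per-sample error flag (1 = misclassified), which
    corresponds to the stars at amplitude 1 described in the text.
    """
    decided = output > threshold            # threshold value is an assumption
    errors = decided != target.astype(bool)
    return errors.mean(), errors.astype(int)

output = np.array([0.9, 0.1, 0.6, 0.2])     # toy analog outputs
target = np.array([1, 0, 0, 0])             # toy labels: 1 = digit to be recognized
ser, flags = symbol_error_rate(output, target)
```

With a test sequence of 1000 samples repeated ten times, i.e. 10 000 samples in total, the smallest non-zero SER that can be resolved is 10⁻⁴.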
As a final test on the 2-bit task we train the LA-VCSEL ANN to (a) implement an XOR logic operation; and (b) implement a DAC. The XOR task is a classical benchmark to demonstrate that an ANN is capable of implementing nonlinear transformations. As such it ultimately motivated the development of deep ANNs, since single layer perceptrons fail at this task. DAC poses a similar, yet slightly easier challenge to an ANN. Figure 4(a) shows the error of the XOR task, with the error shown as a line and, as previously, the misclassifications after thresholding as stars. The obtained SER for the XOR task is 2.9 × 10⁻². Finally, the performance in DAC is shown in figure 4(b), with the red line as the ideal case. We find that the system excellently approximates the output target with a standard deviation of 5.4 × 10⁻².
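Why XOR requires a nonlinear transformation can be checked numerically. The sketch below fits a linear readout (with bias) to the raw bits by least squares, then repeats the fit with one additional nonlinear feature, the product of the two bits. The product feature is an illustrative stand-in for a nonlinear node and is not the laser's actual response.

```python
import numpy as np

bits = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
xor = np.array([0.0, 1.0, 1.0, 0.0])

def best_linear_fit(features, target):
    """Least-squares linear readout with a bias term; returns the fitted outputs."""
    A = np.column_stack([features, np.ones(len(features))])
    coeffs, *_ = np.linalg.lstsq(A, target, rcond=None)
    return A @ coeffs

# Linear readout on the raw bits: the best fit is the constant 0.5 for all inputs
linear = best_linear_fit(bits, xor)

# Adding one nonlinear feature (b1 * b2) makes XOR exactly reproducible
nonlinear = best_linear_fit(np.column_stack([bits, bits[:, 0] * bits[:, 1]]), xor)
```

The linear fit cannot separate the two classes at all, whereas a single nonlinear feature already suffices, which is why a linear readout has to rely on the nonlinearity of the nodes it samples.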
6. Conclusion
We demonstrated a spatially-extended ANN based on an injection locked LA-VCSEL with ∼90 artificial photonic neurons. All ANN connections are implemented in photonic hardware, and the network's operation is fully parallel. Trainable readout connections are realized using a DMD, while static input connections are obtained via the complex transfer matrix of a mm-fiber. Finally, the information input was provided via another DMD.
We trained and tested our photonic ANN's performance on tasks involving sequences of 2-bit symbols, i.e. pattern classification, the XOR task and DAC. Our LA-VCSEL based ANN shows excellent performance in all of them, achieving long-term stability of the trained states during the much longer testing sequences. Finally, our demonstration is only a proof of concept. The ANN's present speed is orders of magnitude below its potential bandwidth, and the speed of convergence can most likely be significantly improved with more advanced training concepts.
Our photonic ANN's future potential relies on the fact that it is capable of fully parallel computing in the multi-GHz range, a potential speed-up by around seven orders of magnitude that would enable direct applications in ultra-fast systems. The realtime frame rate of DMDs is limited, making information injection the current speed bottleneck. In the future we will address this limitation by exploring alternative systems with the same functionality, e.g. ferroelectric spatial light modulators, which readily enable higher frame rates, or spatial light modulators based on arrays of vertical microcavities, which promise phase modulation at GHz bandwidths. Crucially, such improvement is compatible with scaling up the number of output channels through the creation of multiple, spatially non-overlapping copies of the LA-VCSEL's near-field. Notably, the size of the input, the photonic ANN or the output does not reduce the system's speed, owing to complete photonic parallelism. All weights implemented in our system are passive, so the system operates continuously at the DC power levels of its components. Multi-GHz inference rates with a 100 TBit/s input data rate at 1 to 10 W power consumption are therefore realistic, which is orders of magnitude beyond any electronic special purpose ANN computing hardware.
Data availability statement
The data generated and/or analysed during the current study are not publicly available for legal/ethical reasons but are available from the corresponding author on reasonable request.
Acknowledgments
The authors acknowledge the support of the Region Bourgogne Franche-Comté. This work is supported by the EUR EIPHI program (Contract No. ANR-17-EURE-0002), by the Volkswagen Foundation (NeuroQNet I&II), by the French Investissements d'Avenir program, project ISITE-BFC (contract ANR-15-IDEX-03), by the German Research Foundation (Deutsche Forschungsgemeinschaft, DFG) via the Collaborative Research Center (Sonderforschungsbereich, SFB) 787, and by the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie Grant Agreements Nos. 713694 (MULTIPLY) and 860830 (POST DIGITAL).