Letter

Usage of machine-learning algorithms in inverse problem of light self-focusing in isotropic chiral medium with cubic nonlinearity

Published 21 June 2022 © 2022 Astro Ltd
Citation: N Yu Kuznetsov et al 2022 Laser Phys. Lett. 19 085401. DOI 10.1088/1612-202X/ac7135

Abstract

We demonstrate the efficiency of convolutional artificial neural networks in finding the nonlinearity parameters that are proportional to the local and non-local cubic dielectric susceptibilities of a medium and to the intensity of the incident radiation, and that fully describe the self-focusing of elliptically polarized laser beams. It is shown that the predictions of the neural network can be improved by using complex structured light, so that the error is lowered to a few percent.

1. Introduction

While propagating inside a medium with cubic nonlinearity, an elliptically polarized laser beam with intensity greater than a certain threshold value undergoes self-focusing, which is caused by the gradient of the refractive index induced by the electric field of the beam itself. In the almost 60 years since self-focusing was discovered, it has been studied in many papers (see, e.g., the reviews [1, 2]). In particular, it was recently shown [3–8] that during propagation of an initially uniformly polarized light beam, complex spatial distributions of the electric field vector can arise, including ones containing polarization singularities [9, 10], that is, points in space where at least one of the characteristics of the polarization ellipse cannot be defined.

The development of finite-difference and finite-element numerical schemes for solving partial differential equations, together with the growth of computational power, allows one to simulate light–matter interaction in various nonlinear media with high precision. However, experimental determination of the nonlinearity parameters of the media, which is vital for applying the theoretical results, is difficult in some cases, and the interpretation of the emerging intensity and polarization patterns in the light beam may be challenging. Recently, new methods based on machine-learning techniques, predominantly artificial neural networks, have been introduced to analyze solutions of nonlinear optical problems that include these hard-to-determine parameters. For example, neural networks of various architectures were used in [11] as an alternative to classical finite-difference schemes for calculating the nonlinear dispersion of laser pulses in optical fiber. It was shown that in some cases neural networks provide an increase in computational speed and are able to predict unknown parameters of the nonlinear response of the fiber. A similar problem is solved in [12] by means of so-called physics-informed neural networks. The authors show that their method has a significant advantage in computational efficiency over traditionally used finite-difference schemes. A simpler fully connected neural network is used in [13] to predict the spectral characteristics of a laser pulse that has passed through a nonlinear optical fiber, as well as the nonlinear properties of the fiber itself. However, to the best of our knowledge, these and other papers do not make use of neural networks that would account for the spatial distribution of the electric field of the laser beam and the possible dependence of the nonlinear optical response on its polarization state.

The goal of the present paper is to determine the parameters that characterize the nonlinear optical response of an isotropic chiral medium by analyzing the transverse profiles of light beams after their nonlinear diffraction or self-focusing inside the medium. To solve this problem, we use an artificial neural network with a convolutional architecture. The choice of this architecture is motivated by the evident similarity between the considered problem and the methods of computer vision, which nowadays are almost entirely based on convolutional neural networks. Whether numerically computed or experimentally measured, the transverse profiles of the light beams are essentially two-dimensional arrays of real or, in more general cases, complex numbers with a high degree of spatial correlation. A standard raster image has a similar mathematical representation, differing only in the number of so-called input channels: each pixel is usually given by three independent numbers (red, green, blue in the RGB color scheme, etc), while in our problem the number of input channels is four, namely the real and imaginary parts of the two complex amplitudes of the vectorial light beam with right-hand and left-hand circular polarization, respectively.
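As a rough illustration of this channel layout, the sketch below packs two complex fields into four real-valued channels. The function name `to_channels`, the channel order, and the nested-list representation are assumptions made for illustration only; the published code may organize the data differently.

```python
def to_channels(a_plus, a_minus):
    """Pack two complex 2D fields into four real-valued channels.

    Assumed (hypothetical) channel order: Re A+, Im A+, Re A-, Im A-.
    """
    if len(a_plus) != len(a_minus):
        raise ValueError("fields must have the same number of rows")
    channels = [[], [], [], []]
    for row_p, row_m in zip(a_plus, a_minus):
        if len(row_p) != len(row_m):
            raise ValueError("fields must have the same number of columns")
        channels[0].append([z.real for z in row_p])
        channels[1].append([z.imag for z in row_p])
        channels[2].append([z.real for z in row_m])
        channels[3].append([z.imag for z in row_m])
    return channels


# Toy 2x2 fields standing in for the 256x256 beam profiles.
a_plus = [[1 + 2j, 0j], [0.5j, 1 + 0j]]
a_minus = [[0.5 + 0j, 1j], [0j, 0.25 - 0.25j]]
channels = to_channels(a_plus, a_minus)
```

The result has the same "channels first" shape as an RGB image with one extra channel, which is what makes image-oriented convolutional layers directly applicable.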

2. Formulation of the problem

Let an elliptically polarized beam propagating along the Oz axis fall normally onto the flat boundary of the nonlinear medium, which lies in the plane z = 0. The transverse structure of the beam is assumed to be rather general, and the slowly varying complex amplitudes $E_{\pm}(x,y)$ of its circularly polarized components have the following dependence on the transverse coordinates x and y in the plane z = 0:

Equation (1)

Here, $E_0$ is the global amplitude of the beam and $A_{\pm}(x/w,y/w,z/L_d)$ are dimensionless complex amplitudes that depend on the normalized spatial coordinates, where w is the characteristic beam width and $L_d$ is its diffraction length in free space. For simplicity, we will further use the symbols x, y, and z to denote these normalized coordinates (i.e., use a system of units in which $w = L_d = 1$). The self-focusing of elliptically polarized light in an isotropic chiral medium is described by the following pair of parabolic partial differential equations for the slowly varying amplitudes $A_\pm(x,y,z)$:

Equation (2)

where $\Delta_\perp = \partial_x^2 + \partial_y^2$ is a transverse Laplace operator. The constants $a_\pm$ and $b_\pm$ are proportional to the factor $E_0^2$, and also depend on the parameters of local and nonlocal nonlinear optical response of the medium.

The task of the developed neural network is to determine the parameters $a_\pm$ and $b_\pm$ by analyzing the transverse distributions of intensity and polarization of the light beam that has passed through the nonlinear medium (in the plane z = 1) for input beams of certain types. To create the datasets for neural network training, we solved the system (2) numerically by standard finite-difference methods, calculating the transverse distributions on a $900\times 900$ grid at equidistant z-coordinates in the range $[0; 1]$. The parameters $a_\pm$ and $b_\pm$ were randomly chosen from the ranges $[0; 2]$ and $[0; 10]$, respectively. From the complete dataset used for neural network training and testing, we excluded the combinations of $a_\pm$ and $b_\pm$ that led to growth of the dimensionless power of the beam $|A_+|^2+|A_-|^2$ up to 100 or higher in at least one point of the transverse cross-section. These situations were treated as the development of nonlinear collapse, typical for self-focusing problems, which can no longer be correctly described by the system (2).
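The sampling and rejection steps described above can be sketched as follows. This is a minimal illustration, not the authors' code: the uniform distribution, the function names, and the list-of-lists field representation are assumptions, and the finite-difference solver that would produce the fields is not shown.

```python
import random

A_RANGE = (0.0, 2.0)        # range of a_+ and a_- stated in the text
B_RANGE = (0.0, 10.0)       # range of b_+ and b_- stated in the text
COLLAPSE_THRESHOLD = 100.0  # limit on |A+|^2 + |A-|^2 stated in the text


def sample_parameters(rng):
    """Draw (a_plus, a_minus, b_plus, b_minus).

    A uniform distribution is an assumption; the text only says "randomly".
    """
    return (rng.uniform(*A_RANGE), rng.uniform(*A_RANGE),
            rng.uniform(*B_RANGE), rng.uniform(*B_RANGE))


def has_collapsed(a_plus_field, a_minus_field, threshold=COLLAPSE_THRESHOLD):
    """Return True if |A+|^2 + |A-|^2 reaches the threshold at any grid point."""
    for row_p, row_m in zip(a_plus_field, a_minus_field):
        for z_p, z_m in zip(row_p, row_m):
            if abs(z_p) ** 2 + abs(z_m) ** 2 >= threshold:
                return True
    return False
```

In a full pipeline, `has_collapsed` would be applied to every stored cross-section of a simulated case, and the whole case would be dropped from the dataset if any of them triggers it.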

To analyze the effectiveness of our neural network, we considered four types of the input beams:

  • (a)  
    Elliptically polarized Gaussian
    Equation (3)
  • (b)  
    Elliptically polarized super-Gaussian
    Equation (4)
  • (c)  
    Linearly polarized vortex
    Equation (5)
  • (d)  
    Poincaré beam
    Equation (6)

The amplitudes of all beams were normalized in such a way that the maximal intensities at the boundary of the medium (z = 0) were equal. Figure 1 shows the intensity and polarization distributions for each type of beam in the planes z = 0 (a, d, g, j) and z = 1 (the remaining subfigures) for different values of the medium nonlinearity parameters. Red (empty) ellipses denote right-hand elliptical polarization of the beam ($|A_+| \gt |A_-|$) at the points corresponding to the centers of the ellipses. Left-hand polarization ($|A_-| \gt |A_+|$) is denoted by blue (filled) ellipses. The ellipses are all drawn to the same scale, and their sizes are proportional to the sizes of the polarization ellipses of the propagating light. The orientation of the ellipses is given by the angle $\textrm{Arg}(A_+A_-^*)/2$. The values of the parameters $a_\pm$ and $b_\pm$ are given above each distribution. It can be seen that all four types of beams become inhomogeneously polarized (subfigures (b), (c), (e), (f), (h), (i), (k), and (l)), even the Gaussian and super-Gaussian ones, which initially have homogeneous polarization. The polarization of the output beams can be linear, elliptical, and even circular at certain points of the cross-sections. In some cases, the handedness of rotation of the electric field strength vector is reversed in certain regions of the cross-section of the transmitted beam.
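The handedness rule and the orientation angle used for the ellipses can be evaluated at a single point as in the sketch below. The function names are hypothetical; the handling of the case $|A_+| = |A_-|$ (labeled linear) follows from the equality of the two circular components and is not spelled out in the text.

```python
import cmath


def handedness(a_plus, a_minus):
    """Classify the local polarization by comparing |A+| and |A-|."""
    if abs(a_plus) > abs(a_minus):
        return "right"
    if abs(a_minus) > abs(a_plus):
        return "left"
    # Equal magnitudes: linear polarization (undefined if both are zero).
    return "linear"


def ellipse_orientation(a_plus, a_minus):
    """Orientation angle Arg(A+ A-*)/2 of the polarization ellipse, in radians."""
    return cmath.phase(a_plus * a_minus.conjugate()) / 2
```

For example, `handedness(2 + 0j, 1 + 0j)` gives `"right"`, matching the red (empty) ellipses in figure 1.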

Figure 1. Typical spatial distributions of intensity (shown in color) and polarization (shown by ellipses) of the input beams (a), (d), (g), and (j) and the transformed output beams (the remaining subfigures). Red (empty) ellipses correspond to right-hand polarization, blue (filled) ellipses to left-hand polarization. The profiles (a)–(c) show the evolution of the Gaussian beam, (d)–(f) the super-Gaussian beam, (g)–(i) the vortex beam, and (j)–(l) the Poincaré beam.

An important feature of the beams (5) and (6) is the presence of optical singularity points in their cross-sections: a phase singularity (the point of zero intensity) in the vortex beam (figure 1(g)) and a polarization singularity (the point of pure circular polarization) in the Poincaré beam (figure 1(j)). The incident vortex beam is linearly polarized, but the direction of electric field oscillation is not the same at different points of its cross-section. The Poincaré beam is inhomogeneously elliptically polarized: its cross-section contains points with all possible polarization states, including points of linear polarization and a circular polarization singularity point on its axis. In the plane z = 1, the intensity and polarization distributions of these beams become even more complex (figures 1(h), (i), (k), and (l)) than those of regular beams and, consequently, provide much more information on the nonlinear light–matter interaction during self-focusing.

Training a neural network amounts to finding a large number of parameters that describe the strengths of the connections between artificial neurons. However, the choice of the large-scale architecture of the network, described by the so-called hyperparameters, is also important. The hyperparameters include the number of layers of artificial neurons and the dimensions of these layers; the activation functions of the neurons, which relate their input and output signals; the learning rate in the stochastic gradient descent algorithm and its modifications; and so on. Of course, predictions of the neural network made only on the data used for its training are not conclusive, as they could merely be the result of 'remembering' the correct matches between input and output data, and thus have no predictive power. For this reason, a commonly used technique is to apply two datasets to tune the neural network: the training dataset and the test dataset. However, using only training and test datasets, one could unwittingly choose hyperparameters that are suitable specifically for these two sets but less applicable to the general problem. To avoid this, we used three sets: the training set, which is explicitly used to tune the neural network; the validation set, used for comparing and choosing the best architecture of the network; and the test set, on which the final conclusion about the effectiveness of the neural network is based.

For each incident beam (3)–(6), we solved the system (2) numerically for 2048 different combinations of the nonlinearity parameters $a_{\pm}$ and $b_{\pm}$. The training set consisted of 1280 simulated cases, the validation set contained 512 other cases, and the remaining 256 cases were used for the final test of the neural network. The input and output distributions of the electric field were given by four $256 \times 256$ arrays of real numbers, corresponding to the values of the real and imaginary parts of the complex amplitudes $A_\pm$ at the nodes of a uniform square grid. We tested two variants of the neural network. The first one operated only on the transformed beams and had four input channels, so it had to determine the transverse profiles of the incident beams implicitly in the learning process. The second variant operated on both incident and transformed beams and had eight input channels. These two approaches should be theoretically equivalent, but it is not guaranteed that the optimal parameters will be found in a practical training routine. However, we found that the differences in prediction accuracy between the two variants are smaller than the precision with which that accuracy is measured, and thus in the main part of the paper only the four-input-channel network is used, as the simpler and less resource-intensive one.
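The three-way split described above can be sketched as follows. The split sizes come from the text; the random shuffling, the seed, and the names `SPLIT_SIZES` and `split_indices` are assumptions for illustration.

```python
import random

# Sizes stated in the text: 1280 + 512 + 256 = 2048 simulated cases.
SPLIT_SIZES = {"train": 1280, "validation": 512, "test": 256}


def split_indices(n_cases=2048, seed=0):
    """Shuffle case indices and cut them into disjoint train/validation/test sets."""
    if sum(SPLIT_SIZES.values()) != n_cases:
        raise ValueError("split sizes must add up to the number of cases")
    indices = list(range(n_cases))
    random.Random(seed).shuffle(indices)  # shuffling is an assumption
    splits = {}
    start = 0
    for name, size in SPLIT_SIZES.items():
        splits[name] = indices[start:start + size]
        start += size
    return splits
```

Because the three index lists are disjoint, a case used to fit the weights or to choose hyperparameters never appears in the final test.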

To evaluate the prediction accuracy of the network, we have used the mean relative error $\epsilon = \langle\delta/\sigma\rangle$ of the prediction of the different nonlinear coefficients and the error of their prediction relative to the mean values $\epsilon_m = \langle\delta\rangle/ \langle\sigma\rangle$. Here,

Equation (7)

Equation (8)

where the angle brackets denote averaging over all parameter sets of the medium in the dataset of interest, $a^{\prime}_\pm$ and $b^{\prime}_\pm$ are the predicted values of these parameters, while $a_\pm$ and $b_\pm$ are their true values. The errors for the training, validation, and test datasets have been evaluated separately. The contributions of the individual prediction errors to the mean value $\epsilon_m$ do not depend on the magnitude of the nonlinear response coefficients. Conversely, the contribution to the value of $\epsilon$ is much greater for media with small magnitudes of the coefficients $a_\pm$, $b_\pm$ than for media with significant nonlinearity.
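The difference between the two error measures can be seen in a small numerical sketch. It assumes that δ is the Euclidean distance between the predicted and true parameter vectors $(a_+, a_-, b_+, b_-)$ and that σ is the Euclidean norm of the true vector; these definitions are assumptions and should be checked against equations (7) and (8). All function names are hypothetical.

```python
import math


def delta(pred, true):
    """Assumed absolute error: Euclidean distance between parameter vectors."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)))


def sigma(true):
    """Assumed scale: Euclidean norm of the true parameter vector."""
    return math.sqrt(sum(t ** 2 for t in true))


def mean_relative_error(preds, trues):
    """epsilon = <delta / sigma>: average of per-case relative errors."""
    ratios = [delta(p, t) / sigma(t) for p, t in zip(preds, trues)]
    return sum(ratios) / len(ratios)


def error_relative_to_mean(preds, trues):
    """epsilon_m = <delta> / <sigma>: mean error over mean scale."""
    mean_delta = sum(delta(p, t) for p, t in zip(preds, trues)) / len(trues)
    mean_sigma = sum(sigma(t) for t in trues) / len(trues)
    return mean_delta / mean_sigma


# Two toy cases with the same absolute error (0.1) but different nonlinearity.
trues = [(1.0, 1.0, 1.0, 1.0), (0.1, 0.1, 0.1, 0.1)]
preds = [(1.1, 1.0, 1.0, 1.0), (0.2, 0.1, 0.1, 0.1)]
```

With these toy values, `mean_relative_error` is about 0.275, dominated by the weakly nonlinear case, while `error_relative_to_mean` is about 0.091, which illustrates why ε weights media with small coefficients more heavily.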

3. Architecture of the neural network and the prediction results

The data used by the neural network to predict the parameters of a nonlinear medium form a rather large array, consisting of the solutions of the system (2) with the initial conditions (3)–(6). Strong correlation between neighboring elements of the array makes it natural to use a convolutional architecture, since it has proven to be a useful approach in the conceptually similar field of computer vision. The final output of the artificial neural network is four numbers, $a_\pm$ and $b_\pm$, which are not mutually connected. For this reason, the last layers of the network have a fully connected architecture, since it does not imply any correlation between these four output values.

We used three convolutional layers with a $4 \times 4$ kernel [14], stride 4, and no padding, successively lowering the field dimensions from 256 to 64, 16, and 4 points along each spatial axis. The $4\times4$ tensor obtained from these transformations was then flattened to a one-dimensional vector and processed by a final fully connected block of the artificial neural network, also consisting of three layers: two hidden layers with 40 and 60 neurons, respectively, and an output layer of four neurons. These dimensions of the convolutional and fully connected layers, along with the activation functions (hyperbolic tangent (tanh) for the convolutional layers and Rectified Linear Unit [15] for the fully connected ones), were chosen as a result of a large number of experiments with the test and validation datasets with the lasso algorithm [16] of the Optuna Python library. An overall scheme of the neural network is shown in figure 2.
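The reduction of the spatial size from 256 to 4 follows from the standard output-size formula for a strided convolution, as the short sketch below checks. The helper names are hypothetical, and the number of channels per layer is not modeled because it is given only in figure 2.

```python
def conv_output_size(size, kernel=4, stride=4, padding=0):
    """Spatial output size of a convolution: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


def spatial_sizes(input_size=256, n_layers=3):
    """Sizes along one axis before and after each convolutional layer."""
    sizes = [input_size]
    for _ in range(n_layers):
        sizes.append(conv_output_size(sizes[-1]))
    return sizes


sizes = spatial_sizes()  # sizes along one axis for the three layers
```

With a kernel equal to the stride and no padding, each layer tiles the field into non-overlapping $4 \times 4$ patches, so `sizes` is `[256, 64, 16, 4]` and the flattened vector has 16 entries per output channel.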

Figure 2. Neural network diagram. Convolutional layers are depicted in blue; fully connected ones in red. The numbers in their lower parts denote the spatial dimensions of the corresponding layers, and the numbers in the upper parts indicate the number of channels for the convolutional layers and the total number of neurons for the fully connected ones.

The neural network was trained with the Adam [17] optimization algorithm. After the training cycle, the network with the best result on the validation dataset, which was not used during the training phase, was chosen. Then we checked the quality of this network's predictions on a third (test) dataset of nonlinear medium parameters. The full source code of the artificial neural network in Python, along with all datasets used, the training routine, and brief documentation, is published at https://github.com/unerriar/igm-selffocus-nn in the form of a public Google Colaboratory notebook and is available without any license restrictions. The results of training four neural networks and their subsequent predictions on the validation and test datasets, with simulations of the Gaussian, super-Gaussian, vortex, and Poincaré beams in the medium with Kerr nonlinearity, are shown in figures 3 and 4.

Figure 3. Mean relative errors $\epsilon = \langle\delta/\sigma\rangle$ of the artificial neural network predictions in different datasets and incident radiation modes.

Figure 4. Errors of the artificial neural network predictions $\epsilon_m = \langle\delta\rangle/\langle\sigma\rangle$ relative to the mean true values of the nonlinear parameters in different datasets and incident radiation modes.

The mean relative error and the error relative to the mean, evaluated for the neural network predictions on the test datasets, are the most representative, since these data have not been used in the training process either explicitly or implicitly. As can be seen in figures 3 and 4, which summarize the values of $\epsilon$ and $\epsilon_m$, the neural network predictions for an elliptically polarized beam with a regular cross-section of Gaussian or super-Gaussian shape turned out to be rather mediocre. This can be attributed to the fact that the amplitude ratio $|A_+|/|A_-|$ and the phase difference between the circularly polarized components $A_\pm$ are constant at all points of the z = 0 plane. In this case, different sets of $a_\pm$, $b_\pm$ values can lead to nearly the same overall effects. Even though the constancy of the amplitude ratio can be disturbed by diffraction, a situation may arise in which these diffraction effects are mitigated in the bulk of the medium by the nonlinear cross-interaction of the circularly polarized components of the electric field. In this case, neither a human nor an artificial neural network will easily be able to distinguish significant features in the distributions of the intensity and polarization parameters of the output radiation that can be attributed to the influence of one particular nonlinear coefficient.

In contrast to the uniformly polarized beams, the singular ones (5) and (6) have in their cross-sections regions with very different ratios $|A_+|/|A_-|$ and phase differences between the $A_{\pm}$ components. Thus, it appears very unlikely that different sets of the parameters characterizing the nonlinear response of the medium would give similar results in the plane z = 1 for all these regions simultaneously. This explains the large (approximately tenfold) increase in the prediction accuracy of the neural network shown in figures 3 and 4, where the errors $\epsilon$ and $\epsilon_m$ for the two singular beams, vortex and Poincaré, are seen to be much smaller than those for the regular (Gaussian and super-Gaussian) beams. The variation of the ratio $|A_+|/|A_-|$ and of the phase difference between $A_{\pm}$ in the plane z = 0 can also be responsible for the less significant (approximately one and a half times) but still apparent advantage of the Poincaré beam over the vortex one, which is also non-uniformly polarized. The complex distribution of polarization in the cross-sections of these beams in the plane of incidence provides variability and informativeness of the characteristic features of diffraction and of the nonlinear cross-interaction of the circularly polarized components of the propagating wave.

The high accuracy of the neural network predictions obtained with beams containing a polarization singularity demonstrates that convolutional neural networks are, in principle, promising for predicting the nonlinear response coefficients when the incident radiation contains regions with sufficiently diverse polarization properties in its cross-section. The significantly inferior result obtained with the uniformly polarized beams should thus be attributed to a large extent to the lack of informative features in these radiation modes and not (at least not only) to the inability of the convolutional neural network to distinguish them.

The error distribution in the $a_+a_-$ and $b_+b_-$ spaces is depicted in figures 5 (Poincaré beam) and 6 (Gaussian beam). In these diagrams, in the planes $a_+ a_-$ (a) and $b_+b_-$ (b), the points whose coordinates correspond to the parameters predicted by the neural network are marked by red dots. From each of these, a black arrow is drawn so that its head indicates the point corresponding to the true values of the nonlinear parameters of the medium, i.e., those that were used to obtain the distribution of complex amplitudes of the circularly polarized components of the electric field submitted to the input layer of the neural network in the corresponding simulation. The length of the arrow is thus equal to the 'projection' of the absolute error δ of the prediction of the parameters of the nonlinear medium onto the subspace $a_+a_-$ (a) or $b_+b_-$ (b), respectively.
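The arrow lengths in such scatter plots can be computed as in the following sketch, which assumes ordinary Euclidean distances in each two-dimensional subspace. The function name and the parameter ordering `(a_plus, a_minus, b_plus, b_minus)` are hypothetical.

```python
import math


def arrow_lengths(pred, true):
    """Lengths of the error arrows in the a+a- and b+b- planes.

    pred, true: tuples ordered as (a_plus, a_minus, b_plus, b_minus).
    Each arrow runs from the predicted point to the true point.
    """
    length_a = math.hypot(pred[0] - true[0], pred[1] - true[1])
    length_b = math.hypot(pred[2] - true[2], pred[3] - true[3])
    return length_a, length_b


# Toy prediction and true values, chosen only to give round numbers.
length_a, length_b = arrow_lengths((1.3, 0.4, 5.0, 5.0), (1.0, 0.0, 2.0, 1.0))
```

If δ is taken as the Euclidean distance in the full four-dimensional parameter space, then the squares of the two arrow lengths add up to the square of δ.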

Figure 5. Scatter plots of the errors in the spaces of the nonlinear medium parameters for the test dataset of the incident Poincaré beam. Red dots correspond to the parameters $a_\pm^{\prime}, b_\pm^{\prime}$ of the nonlinear medium, predicted by the neural network. The arrowheads (black) depict the true values $a_\pm$, $b_\pm$ of these parameters.

Figure 6. Scatter plots of the errors in the spaces of the nonlinear medium parameters for the test dataset of the incident Gaussian beam. Red dots correspond to the parameters $a_\pm^{\prime}, b_\pm^{\prime}$ of the nonlinear medium predicted by the neural network. The arrowheads (black) depict the true values $a_\pm$, $b_\pm$ of these parameters. Blue dashed line shows the $b_+ = 2b_-$ region.

For the Poincaré beam, this error is very low relative to the range of $a_\pm$ and $b_\pm$ values being studied. The relative error is greater for the coefficients $a_\pm$, which describe the action of each circularly polarized component $A_{\pm}$ on itself (figure 5(a)), than for the coefficients $b_\pm$, which are responsible for their mutual interaction (figure 5(b)). This can be attributed to the smaller dispersion of the parameters $a_\pm$ in the data used. The analogous diagrams for the incident Gaussian beam are shown in figure 6. The large errors in the prediction of the $a_\pm$ and $b_\pm$ coefficients are attributable to the fact that different nonlinear coefficients can influence the overall beam structure in the same way because of the uniform polarization state of the incident beam. This is clearly illustrated by figure 6(b), in which one can distinctly observe an area of the largest errors lying near the line $b_+ = 2b_-$. The appearance of this area must be attributed to the fact that, for the chosen polarization state of the incident beam, its right-hand ($E_+$) circularly polarized component is two times greater than its left-hand one ($E_-$). So, with $b_+ \approx 2b_-$, the numerical values of the right-hand sides of the first and second equations of the system (2) will have nearly the same ratio as the polarization components of the incident beam. This leads to propagation of the beam without significant disturbance of its polarization homogeneity in the cross-section, which in turn significantly complicates distinguishing any features characteristic of particular polarization components in the profile of the beam.

Such a distribution of the high-error area is observed in all datasets when uniformly polarized beams of Gaussian or super-Gaussian shape are used, which means that the key role in the increase of the neural network efficiency with the singular beams belongs specifically to the polarization (and not the intensity) profile of the incident radiation. Thus, to obtain the most accurate predictions, one must use non-uniformly polarized beams. The radiation modes containing polarization singularities satisfy this criterion and, moreover, the heterogeneity of their polarization is very stable with respect to small perturbations of the field and can be easily achieved in an experiment [18–22]. This makes such beams candidates for use in problems related to neural networks in the field of nonlinear polarization optics.

Unlike the homogeneously polarized beams, a singular beam propagates with destructive interference of individual spatial Fourier harmonics near its axis. Because of this effect, the values of the nonlinear medium parameters that force the beam to collapse are higher than those for Gaussian (and even more so super-Gaussian) beams of commensurable width and peak intensity [23]. This can be clearly seen in the error scatter plot (figure 6(b)), which contains no red dots (and accordingly no arrows) in the area of large values of $b_\pm$. With $b_+b_-\gt c \approx 30$, a beam with Gaussian shape is nearly guaranteed to collapse at $0 \lt z \leqslant 1$, and thus such values of $a_\pm$ and $b_\pm$ are excluded from further consideration.

In the case of Poincaré beams, the empty area takes up a much smaller part of the scatter plot of the $b_\pm$ coefficient prediction error (figure 5(b)) and corresponds to the values $b_+b_- \gt c_1\approx 60$. For comparison, the vortex beam collapses when $b_+b_-\gt c_2~\approx 80$, because it has singularities in both of its circularly polarized components in the plane of incidence, while the Poincaré beam has a singularity only in its right-hand ($A_+$) circularly polarized component. For the super-Gaussian beam, dots and arrows vanish from the error scatter plot at values of $b_+b_-$ greater than 40–50, which is larger than the corresponding threshold for the Gaussian mode. However, this is due to the width parameter of the super-Gaussian beam (4) being $\sqrt{2}$ times smaller than that of the three other modes. This choice of the width parameter in (4) was made because, with the same width as in (3), (5), and (6), the super-Gaussian mode collapses for nearly any values of the nonlinear coefficients owing to the wide plateau in its intensity profile. So one more advantage of using singular beams to determine the nonlinear response parameters of a medium is the ability to study a wider range of these parameters without the risk of going beyond the range of applicability of the parabolic approximation of diffraction theory (2).

4. Conclusion

We have demonstrated that convolutional neural networks can be effectively applied to problems of self-interaction of inhomogeneously polarized structured light in a medium with cubic nonlinearity of the optical response. Comparing the results of four neural networks trained to predict the nonlinear characteristics of a medium from a known solution of the self-focusing problem for elliptically polarized beams, we have shown that the use of complex structured light significantly lowers (down to nearly 1%) the error of the prediction of the parameters characterizing the medium nonlinearity. It also widens the range of values of these parameters that are accessible for study; the parameters depend linearly on the local and non-local cubic susceptibilities and on the peak intensity of the incident beam. Using light with complex structure allows one to distinguish informative features in the intensity and polarization distributions of the transmitted radiation more effectively and lowers the prediction errors for the nonlinear parameters of a medium when solving the inverse problem.

The study has been carried out with the financial support of the Russian Foundation for Fundamental Investigations (20-32-90123), the Foundation for the Advancement of Theoretical Physics and Mathematics 'Basis', and the Non-commerce foundation for the development of science and education 'Intellect'.
