Region-specified inverse design of absorption and scattering in nanoparticles by using machine learning

Alex Vallone; Nooshin M Estakhri; Nasim Mohammadi Estakhri

doi:10.1088/2515-7647/acc7e5

1. Introduction

The quest for computational inverse design of photonic structures is an exploration of many frontiers. A successful inverse design method seeks to generate physical structures that can achieve a pre-defined set of outcomes depending on the design goals, or else to determine the limits of the system response based on available physical parameters [1]. While several analytical and semi-analytical solutions have been devised for photonic structures, it is not always possible to construct an analytical solution, especially for more complex structures. If an analytical solution to the inverse problem is not accessible, the inverse design turns into an optimization problem that searches for the set of physical parameters maximizing the objective [2]. This process may entail optimizing a fixed number of parameters, or treating the entire device footprint as the parameter space and optimizing the topology of the structure.

Topology optimization was first introduced in the design of mechanical structures [3] and relies on iterative refinement of the distribution of the material inside a specified design area. For photonic structures, the design process adjusts the permittivity distribution inside a fixed region, typically accompanied by a set of additional rules such as binarization, increased robustness, avoiding unrealistic materials, or removing small geometrical features that may not be feasible from a fabrication point of view [4, 5]. The outcomes of topology optimization are rather sophisticated structures, and it is quite challenging to establish a physical insight into the complex relationship between the input and output of the device. As such, the process is highly application-oriented, and the physical shape of the device may change drastically depending on the set objectives [5]. Nevertheless, the design space is very large and topology optimization has found interesting applications in the inverse design of photonic structures, especially to decrease the physical footprint of the device by using the full parameter space. A number of recent examples include grating couplers [6], wavelength demultiplexers [7], hyperlenses [8], beam steering metasurfaces [9], resonators [10], nonlinear nanophotonic devices [11], polarization splitters [12, 13], and equation solving elements [14].

On the other side of the spectrum are problems that are constructed based on a limited number of variables in predetermined geometries. In such cases, relying on the physical insights of the designer, the general form of the device is already known, and a few parameters are tuned to enhance the device's performance. While the number of parameters may not be as large as in topology optimization scenarios, simultaneously satisfying multiple objectives within a typically constrained parameter space is not a simple task [1]. Nonetheless, parameter optimization and tuning have been extensively explored in the inverse design of electromagnetic devices [15–19]. Indeed, given the complexity of typical photonic structures, the number of tuning parameters may not be small even for a predetermined geometry. Consequently, bio-inspired inverse design optimization algorithms such as genetic algorithms and particle swarm optimization have gained a lot of attention, particularly in the field of electromagnetic engineering, due to the possibility of working with a moderate to large number of parameters [20–24]. Genetic algorithms are also capable of avoiding local minima and maxima, a feature that allows for the inverse design of highly complex structures, comparable with topology-optimized structures [25, 26].

Photonic inverse design using machine learning is conceptually different from the previously discussed traditional techniques. In contrast to iterative optimization approaches, here a neural network is trained to learn the dynamics of the system and subsequently to model a physical system or design a device with the desired performance [27, 28]. It is worth mentioning that the recent rapid advances in using machine learning for the modeling and design of nano-optic structures are undeniably rooted in increased computation capacity and availability of high-performance hardware, allowing for extensive computations at the training stage. To date, machine learning algorithms have been applied to several forward and inverse photonic problems including modeling lossless particles [29], design of chiral metamaterials [30], design and characterization of optical elements for metasurfaces [31], inverse design [32–36] and response prediction [37–39] in one-dimensional (1D) photonic crystals, inverse design of multilayered nanostructures [40], modeling and design of electric and magnetic dipole response [41], modeling three-dimensional nanostructures [42], and dielectric metasurface design [43]. Interestingly, the machine learning-based design approach is not necessarily a blind data-driven method, and information about the physics of the problem may also be included in the model [44, 45]. Despite these advantages, data-driven approaches require a large amount of good-quality training data [40, 46] for fast convergence, and the computational cost at the data generation stage, which requires full-wave electromagnetic simulation, can be high.

In this work, we use convolutional neural networks (CNNs) to design multilayered nanoparticles using a small training dataset. While the training dataset is generated analytically here (see section 3), for devices and structures for which an analytical or numerically fast approach is not accessible, generating the training dataset itself is a computationally expensive part of the inverse design. Therefore, it is crucial to devise and investigate techniques that rely on smaller training datasets. The particle geometry is designed to simultaneously control the relative levels of (i) absorption to maximum dipolar absorption, and (ii) absorption to scattering. Using a region-specified training, the same model is re-used to design particles across the 350–700 nm wavelength spectrum. The structure of the manuscript is as follows: section 2 gives a brief illustration of the electromagnetic problem under study and the formulation, section 3 describes the configuration and properties of our neural network and the training process, and the results are discussed in section 4.

2. Electromagnetic modeling

The structure of the multilayered spherical nanoparticle is illustrated in figure 1(a). The core and outer shell are made of silicon dioxide [47], and the inner shell is a plasmonic material (silver, with parameters adapted from [48]). Through the inverse design, three scaling parameters ${\alpha _1}$ , ${\alpha _2}$ , and ${\alpha _3}$ are determined, which are related to the radius of each interface as ${r_1} = {\alpha _1}{\alpha _2}{\alpha _3} \times 700\,{\text{nm}}$ , ${r_2} = {\alpha _1}{\alpha _2} \times 700\,{\text{nm}}$ , and ${r_3} = {\alpha _1} \times 700\,{\text{nm}}$ . This allows us to normalize the design parameters as $0 < {\alpha _2},{\alpha _3} < 1$ and $0 < {\alpha _1} < 0.2$ , which gives a maximum particle diameter of 280 nm. The scaling parameters may alternatively be written in terms of the radius of each interface as ${\alpha _1} = {{{r_3}} \mathord{\left/ \right. } {700\,{\text{nm}}}}$ , ${\alpha _2} = {{{r_2}} \mathord{\left/ \right. } {{r_3}}}$ , ${\alpha _3} = {{{r_1}} \mathord{\left/ \right. } {{r_2}}}$ . The choice of ${\alpha _{1,\max }} = 0.2$ , and hence of the 280 nm maximum diameter, is intentional and allows us to quantify the importance of the quality of the training dataset (more discussion in section 3). In summary, we pick a suitable maximum diameter such that we can achieve a rich and highly varying scattering response near 350 nm and a featureless scattering response closer to 700 nm.
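
The mapping between the scaling parameters and the interface radii can be sketched in a few lines of Python. This is a minimal illustration of the relations above and not the code used in this work; the function names and the constant `BASE_NM` are hypothetical.

```python
BASE_NM = 700.0  # normalization length appearing in the definitions of r1, r2, r3


def radii_from_alphas(alpha1, alpha2, alpha3):
    """Return the interface radii (r1, r2, r3) in nm, innermost first."""
    r3 = alpha1 * BASE_NM   # outer radius of the particle
    r2 = alpha2 * r3        # outer radius of the plasmonic shell
    r1 = alpha3 * r2        # radius of the core
    return r1, r2, r3


def alphas_from_radii(r1, r2, r3):
    """Inverse mapping: recover (alpha1, alpha2, alpha3) from radii in nm."""
    return r3 / BASE_NM, r2 / r3, r1 / r2
```

Because ${\alpha _2}$ and ${\alpha _3}$ are ratios of adjacent radii, any point in the normalized parameter box automatically satisfies ${r_1} < {r_2} < {r_3}$ , so no separate ordering constraint is needed.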

The particles are illuminated with a plane wave and the first ten Mie scattering coefficients (five transverse magnetic, TM, and five transverse electric, TE) are calculated. Mie theory solves the scattering of a plane wave from a spherical particle through spherical harmonics expansion [49]. Here we follow the notation used in [50, 51]. The scattering and absorption cross sections of the particle are then given by

$\begin{equation}\begin{gathered} {\sigma _{{\text{scs}}}} = \frac{{{\lambda ^2}}}{{2\pi }}\sum\limits_{n = 1}^N {\left( {2n + 1} \right)\left[ {{{\left| {C_n^{{\text{TM}}}} \right|}^2} + {{\left| {C_n^{{\text{TE}}}} \right|}^2}} \right]} \hfill \\ {\sigma _{{\text{abs}}}} = - \frac{{{\lambda ^2}}}{{2\pi }}\sum\limits_{n = 1}^N {\left( {2n + 1} \right)\left[ {\operatorname{Re} \left( {C_n^{{\text{TM}}}} \right) + \operatorname{Re} \left( {C_n^{{\text{TE}}}} \right) + {{\left| {C_n^{{\text{TM}}}} \right|}^2} + {{\left| {C_n^{{\text{TE}}}} \right|}^2}} \right]} , \hfill \\ \end{gathered} \end{equation} \tag{ 1 }$

in which ${\sigma _{{\text{scs}}}}$ is the total scattering cross section, ${\sigma _{{\text{abs}}}}$ is the total absorption cross section, $N$ is set at five, $\lambda$ is the operation wavelength, and $C_n^{{\text{TM}}}$ and $C_n^{{\text{TE}}}$ are the nth TM and TE Mie scattering coefficients, respectively. At optical frequencies, both these cross sections are typically very small as they scale with ${\lambda ^2}$ . As such, relevant quantities that are typically used to gain better insight into the strength of absorption and scattering are dimensionless scattering and absorption efficiencies, defined as

$\begin{equation}{Q_{{\text{scs}}}} = \frac{{{\sigma _{{\text{scs}}}}}}{{\pi r_3^2}},\,\;{Q_{{\text{abs}}}} = \frac{{{\sigma _{{\text{abs}}}}}}{{\pi r_3^2}},\end{equation} \tag{ 2 }$

which are normalized to the physical cross section of the particle [52].
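
As an illustration of equations (1) and (2), the sums can be evaluated directly once the Mie coefficients are known. The sketch below assumes the coefficients have already been computed by a separate Mie solver (not shown) and are passed in as sequences ordered from $n = 1$ ; the function names are hypothetical.

```python
import math


def cross_sections(c_tm, c_te, wavelength):
    """Evaluate equation (1): return (sigma_scs, sigma_abs).

    c_tm and c_te hold the complex TM and TE coefficients for n = 1..N.
    The result has the units of wavelength squared.
    """
    prefactor = wavelength ** 2 / (2 * math.pi)
    scs_sum = 0.0
    abs_sum = 0.0
    for n, (tm, te) in enumerate(zip(c_tm, c_te), start=1):
        tm, te = complex(tm), complex(te)
        mag2 = abs(tm) ** 2 + abs(te) ** 2
        scs_sum += (2 * n + 1) * mag2
        abs_sum += (2 * n + 1) * (tm.real + te.real + mag2)
    return prefactor * scs_sum, -prefactor * abs_sum


def efficiencies(sigma_scs, sigma_abs, r3):
    """Evaluate equation (2): normalize to the geometric cross section."""
    area = math.pi * r3 ** 2
    return sigma_scs / area, sigma_abs / area
```

In this notation a coefficient equal to $-1$ has $\operatorname{Re}(C) + {\left| C \right|^2} = 0$ and therefore contributes to scattering only, which is a convenient sanity check for the sign convention of the absorption term.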

We should point out that even a very large scattering (or absorption) efficiency does not necessarily imply that the particle is a good scatterer (or absorber). Indeed, a large efficiency simply indicates that the particle scatters (or absorbs) well compared to its physical size [53]. In this work, we aim to design particles with high overall absorption (modeling good sensors, emitters, or receivers) independent of their physical size. At the same time, we aim to control the ratio between absorbed and scattered powers to implement structures ranging from low-scattering absorbers to bright (high-scattering) absorbers. For this purpose, we define two new metrics

$\begin{equation}{\sigma _{{\text{Norm}}}} = \frac{{{\sigma _{{\text{abs}}}}}}{{{\sigma _{{\text{1, abs}-\text{max}}}}}},\;{\sigma _{{\text{Ratio}}}} = \frac{{{\sigma _{{\text{abs}}}}}}{{{\sigma _{{\text{scs}}}}}}.\end{equation} \tag{ 3 }$

The first metric, ${\sigma _{{\text{Norm}}}}$ , measures the level of absorption compared to the maximum attainable absorption from a dipole at the same frequency ( ${\sigma _{{\text{1, abs}-\text{max}}}} = {{3{\lambda ^2}} \mathord{\left/ \right. } {8\pi }}$ ). This metric, while normalized, is independent of the size of the particle and provides a physically meaningful measure to assess the power absorbed by the structure. The second metric is simply the ratio between the two cross sections. For small dielectric particles, the scattering coefficients for n > 1 are typically negligible, and a trade-off is present between the scattered and absorbed powers [51, 54, 55]. By increasing the size of the particle, incorporating plasmonic materials, and carefully arranging the contribution of higher-order scattering modes, in principle, it should be possible to break this trade-off and control these ratios at will [51]. However, while it has been proven theoretically that such particles do not violate any physical bounds, designing one is not a straightforward problem and requires an optimization approach. Here we use a CNN, as detailed in the next section, to model this inverse problem and design such particles. The choice of a neural network (as opposed to an iterative search algorithm) is rooted in several practical considerations: First, after the training stage, the inverse designs are immediately accessible through the trained neural networks. While the accuracy of the design is not typically as good as an iterative model, the speed is considerably higher. Second, after generating the training dataset, continued access to a photonic simulator is not required and the network itself will mimic the physical system. Third, by using a region-specified design, our goal is to encapsulate the localized wavelength-dependent behaviors of the desired metrics in a one-time training process. 
Upon this one-time training, all the designs presented throughout the paper are immediately accessible with no additional computational cost. This also allows us to easily quantify the importance of the quality of the training dataset to achieve good designs with a neural network (see section 4). Finally, the specific choice of a 1D CNN is further rooted in the nature of the optimized metrics and, again, in the adoption of region-specified design. CNNs are typically used in image and pattern recognition applications due to their ability to identify features of a response [56]. As such, CNNs can be useful in capturing the correlations and features of the spectral response [28]. The analytical calculations of generalized optimal low-scattering absorbers indicate that such an optimal condition is associated with aligned and balanced resonances of scattering coefficients [51]. As we are interested in identifying these features over different wavelength ranges, a CNN is used here.

The three-layer particle considered here includes a plasmonic material and the overall diameter can be as large as 280 nm. Therefore, we expect to achieve a reasonable scattering (and absorption) response across 350–700 nm, especially closer to the lower wavelength bound. As we get closer to 700 nm, the particle size is too small compared with the wavelength to support multiple scattering harmonics. This gradual evolution allows us to observe the trend of the inverse design based on the quality of the training data. Naturally, we expect to see better results close to 350 nm. We also note that the choice of a three-layer plasmonic nanoparticle allows for a variety of characteristics such as strong light–matter interaction at the subwavelength scale, reasonable control over scattering and absorption properties across a wide range of wavelengths, realistic choice of materials, and realistic fabrication constraints. On the other hand, the same framework matches other relevant scenarios such as radio frequency antennas or optical nano-antennas, where the scattering response is a combination of only a few harmonics (i.e. dipole, quadrupole, etc).

Finally, we also note that since the denominator of ${\sigma _{{\text{Ratio}}}}$ is the scattering cross section, which can be very small, this metric can reach very large values for particles with small scattering. Due to their typically small absorption, these particles are not of interest to us and are removed from the training data, as detailed in the next section.
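
The two metrics of equation (3) and the removal of particles with extreme ${\sigma _{{\text{Ratio}}}}$ can be sketched as follows. This is an illustrative outline only: the function names are hypothetical, and the cutoff `ratio_cap` is a placeholder value, since the actual threshold is not stated here.

```python
import math


def dipole_abs_max(wavelength):
    """Maximum dipolar absorption cross section, 3*lambda^2/(8*pi)."""
    return 3 * wavelength ** 2 / (8 * math.pi)


def design_metrics(sigma_abs, sigma_scs, wavelength):
    """Evaluate equation (3): return (sigma_norm, sigma_ratio)."""
    sigma_norm = sigma_abs / dipole_abs_max(wavelength)
    sigma_ratio = sigma_abs / sigma_scs
    return sigma_norm, sigma_ratio


def keep_particle(ratio_spectrum, ratio_cap=50.0):
    """Keep a particle only if its ratio never exceeds the cap.

    ratio_cap is a hypothetical threshold chosen for illustration.
    """
    return max(ratio_spectrum) <= ratio_cap
```

For a single dipolar channel with $C_1^{{\text{TM}}} = - 1/2$ , equation (1) gives ${\sigma _{{\text{abs}}}} = {\sigma _{{\text{scs}}}} = 3{\lambda ^2}/8\pi$ , so both metrics evaluate to one. This is the conjugate-matched dipole, and it makes the absorption–scattering trade-off of a purely dipolar particle explicit.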

3. Data generation and network configuration

For the dataset, we used Mie theory to model 2310 different particles (which corresponds to approximately 13 samples for each scaling factor ${\alpha _{1,2,3}}$ ). After removing particles with extremely high ${\sigma _{{\text{Ratio}}}}$ , 1452 particles are selected as a suitable dataset. Each datapoint has two input channels of ${\sigma _{{\text{Norm}}}}$ and ${\sigma _{{\text{Ratio}}}}$ , defined in equation (3), across 200 evenly spaced wavelengths. The outputs of each datapoint are the three scaling parameters ( ${\alpha _1}$ , ${\alpha _2}$ , and ${\alpha _3}$ ). All datapoints are augmented to include two additional input functions, one for each metric, that specify the wavelength range of interest [57]. As discussed in the next section, for all the designs we assume that these wavelength ranges are the same, i.e. ${\sigma _{{\text{Norm}}}}$ and ${\sigma _{{\text{Ratio}}}}$ are optimized over the same wavelength range indicated by $R\left( \lambda \right)$ . Each $R\left( \lambda \right)$ is of a random length and is positioned to start at a random wavelength. At each point, $R\left( \lambda \right)$ is either one or zero depending on whether the wavelength lies inside or outside the design region. Metrics in the original two channels are multiplied by their respective $R\left( \lambda \right)$ to zero out the contribution from outside the specific design region [57]. Each datapoint is combined with 25 $R\left( \lambda \right)$ functions from a pre-defined pool containing 300 (not necessarily unique) functions. This expands our dataset from 1452 datapoints to 36 300. Overall, 90% of datapoints are used for training and the remaining 10% are used for testing. The minimum bandwidth of each $R\left( \lambda \right)$ in the pool is 3% and the maximum bandwidth is 50% of the overall maximum bandwidth (i.e. 350 nm). In addition, there is a 10% chance to generate $R\left( \lambda \right)$ with only zeros and an additional 10% chance to generate $R\left( \lambda \right)$ with only ones.
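
The region functions and the augmentation step described above can be sketched as follows. This is a minimal sketch under stated assumptions: bandwidths are discretized as a number of grid points out of 200, the channel ordering in `augment` is arbitrary, and all names are hypothetical rather than taken from the implementation used in this work.

```python
import random

N_POINTS = 200  # wavelength samples per spectrum


def random_region(rng, n_points=N_POINTS, min_frac=0.03, max_frac=0.50,
                  p_zeros=0.10, p_ones=0.10):
    """Draw one binary region function R(lambda) as a list of 0/1 values."""
    u = rng.random()
    if u < p_zeros:
        return [0] * n_points            # all-zero region
    if u < p_zeros + p_ones:
        return [1] * n_points            # full-band region
    # Contiguous window of random width and random start position.
    width = rng.randint(round(min_frac * n_points), round(max_frac * n_points))
    start = rng.randint(0, n_points - width)
    return [1 if start <= i < start + width else 0 for i in range(n_points)]


def augment(sigma_norm, sigma_ratio, region):
    """Mask both metrics with R and append R once per metric (4 channels)."""
    masked_norm = [s * r for s, r in zip(sigma_norm, region)]
    masked_ratio = [s * r for s, r in zip(sigma_ratio, region)]
    return [masked_norm, masked_ratio, list(region), list(region)]


rng = random.Random(0)
pool = [random_region(rng) for _ in range(300)]   # pre-defined pool of 300
picks = rng.choices(pool, k=25)                   # 25 regions per datapoint
```

With 25 region functions per particle, the 1452 retained particles yield $1452 \times 25 = 36\,300$ datapoints, matching the dataset size quoted above.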

The data we are using is spectral, so we used a residual 1D CNN with a 1D convolutional layer, five residual blocks (ResBlock1D), and a flatten layer followed by a dense network. Each residual block consists of three convolutions with Leaky ReLU (Leaky Rectified Linear Unit) activations and a skip connection. Also, each residual block maintains its input size as the output size, and hence each one is followed by a max pooling layer, cutting the size in half each time. This is done to take the original size of 200 wavelengths down to a final size of 12 before flattening for the dense network. The model is trained and optimized using Adam with the mean squared error as the loss function. A batch size of 128 and an initial learning rate of $5 \times {10^{ - 3}}$ are used. The learning rate is decayed exponentially every 1000 steps with a rate of 0.92 [58, 59]. Figures 1(b) and 2 illustrate the network configuration and the training convergence, respectively.
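
Two pieces of arithmetic in this configuration can be checked with a short sketch: the sequence length after repeated halving and the decayed learning rate. Both functions are hypothetical helpers. The sketch assumes floor division in the pooling and a staircase decay (the rate changes only at multiples of 1000 steps); neither detail is stated above.

```python
def pooled_lengths(n_points=200, n_pools=4):
    """Sequence length after each halving (floor division assumed)."""
    sizes = [n_points]
    for _ in range(n_pools):
        sizes.append(sizes[-1] // 2)
    return sizes


def learning_rate(step, lr0=5e-3, decay=0.92, every=1000, staircase=True):
    """Exponentially decayed learning rate at a given optimizer step."""
    exponent = step // every if staircase else step / every
    return lr0 * decay ** exponent
```

Under these assumptions, four halvings take the length through 200, 100, 50, 25, and 12, which reproduces the final size of 12 quoted above; a fifth halving would give 6. The default `n_pools=4` is therefore chosen only to match the stated final size.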

**Figure 2.** Training convergence of the model: (a) training, and (b) validation. Mean squared error is used as the loss function: ${\text{Loss}} = \frac{1}{n}\sum\limits_{i = 1}^n {{{\left( {{y_{i{\text{,true}}}} - {y_{i{\text{,predicted}}}}} \right)}^2}}$ , where n is number of outputs (400 points) and y refers to either of the two metrics.

4. Results and discussions

In the first step, we evaluate the performance of the trained network for physical test samples (figure 3). Afterwards, we investigate inverse design for non-physical test samples (figures 4–6). For physical test samples, the inputs of the CNN are ${\sigma _{{\text{Norm}}}}$ and ${\sigma _{{\text{Ratio}}}}$ for actual three-layer particles. We randomly pick eight such particles (as explained below), so a set of $\alpha$ parameters that reproduces each input is guaranteed to exist. For non-physical test samples, on the other hand, we do not start from any specific particle. On the contrary, we assume ${\sigma _{{\text{Norm}}}}$ and ${\sigma _{{\text{Ratio}}}}$ values that might be physically inaccessible for particles with the characteristics shown in figure 1(a). In this case, we expect the network to find the closest possible solution to such non-physical inputs [57].

**Figure 3.** Inverse design performance of *physical* test samples. (a)–(h) ${\sigma _{{\text{Norm}}}}$ (red-dashed) and ${\sigma _{{\text{Ratio}}}}$ (blue-dashed) calculated based on the predicted values of ${\alpha _1}$ , ${\alpha _2}$ , and ${\alpha _3}$ . The solid lines indicate the desired response.

Going back to physical test samples, the scattering responses of eight randomly chosen particles with $0 < {\alpha _2},{\alpha _3} < 1$ and $0 < {\alpha _1} < 0.2$ are calculated using Mie theory. The results, along with eight different $R\left( \lambda \right)$ functions are fed into the inverse network shown in figure 1(b) to generate the predicted values of ${\alpha _1}$ , ${\alpha _2}$ , and ${\alpha _3}$ . Table 1 reports the actual and predicted values of these three parameters. We note that electromagnetic problems are highly non-unique and different particles can create similar scattering and absorption responses over limited bandwidths [40]. As such, the values reported in table 1 are not necessarily the best measures to evaluate the performance of the network and the metrics must be directly compared.

Table 1. Scaling parameters for figure 3. Actual values correspond to solid lines and predicted values correspond to dashed lines in figure 3.

| Panel in figure 3 | (a) | (b) | (c) | (d) | (e) | (f) | (g) | (h) |
|---|---|---|---|---|---|---|---|---|
| ${\alpha _1}$ : actual, predicted | 0.118, 0.126 | 0.171, 0.171 | 0.135, 0.141 | 0.157, 0.159 | 0.134, 0.167 | 0.181, 0.192 | 0.107, 0.095 | 0.088, 0.066 |
| ${\alpha _2}$ : actual, predicted | 0.736, 0.723 | 0.441, 0.436 | 0.558, 0.544 | 0.424, 0.39 | 0.51, 0.422 | 0.382, 0.417 | 0.072, 0.058 | 0.23, 0.105 |
| ${\alpha _3}$ : actual, predicted | 0.078, 0.08 | 0.705, 0.719 | 0.223, 0.12 | 0.422, 0.42 | 0.926, 0.927 | 0.48, 0.477 | 0.453, 0.411 | 0.6, 0.513 |

Subsequently, the predicted values are used to calculate the metrics ${\sigma _{{\text{Norm}}}}$ and ${\sigma _{{\text{Ratio}}}}$ across the 350–700 nm range using Mie theory. The results are summarized in figure 3. First, we chose $R\left( \lambda \right)$ to sweep the entire wavelength range in five equal steps of 70 nm width (figures 3(a)–(e)). Then, the effect of changing bandwidth is investigated in the last three panels (figures 3(f)–(h)), where the bandwidth takes the values 52.5 nm, 35 nm, and 17.5 nm. Investigating figure 3, we observe a good performance from the trained model, with slight peak shifts in (a), (b) and (f). The shift is more apparent when the scattering/absorption spectrum has closely packed features, as in panel (f). As expected, the predictions are less accurate when ${\sigma _{{\text{Norm}}}}$ is small. In such cases, ${\sigma _{{\text{Ratio}}}}$ is very sensitive to small changes in absorption and scattering, which degrades the predictions. These values, however, are not of interest to us as we aim to design particles with reasonably high absorption levels.

Next, we use the same model, which has been trained in the previous step, to inverse design particles with different levels of ${\sigma _{{\text{Norm}}}}$ and ${\sigma _{{\text{Ratio}}}}$ near shorter wavelengths (figure 4). The function $R\left( \lambda \right)$ is set to one for wavelengths between 350 and 367.5 nm (5% of the overall bandwidth) and is set to zero everywhere else. In these examples, the absorption metric ${\sigma _{{\text{Norm}}}}$ is set to one, indicating a rather highly absorptive particle. This particle would essentially absorb the same level of power as an ideal dipole absorber (a conjugate matched dipole) which operates at the same wavelengths [51]. ${\sigma _{{\text{Ratio}}}}$ is also varied between 0.25 and 4.75 (see the caption of figure 4 for more details). The smaller values of ${\sigma _{{\text{Ratio}}}}$ indicate that the majority of the interaction of the incident wave with the particle is toward scattering. Although the particle is a good absorber, it still scatters a significant amount of power, which in the sensor/antenna design realm, would be considered as wasted power. Larger values of ${\sigma _{{\text{Ratio}}}}$ , on the other hand, indicate that the particle mainly absorbs the incident power, without any additional and unnecessary scattering [60, 61]. While ${\sigma _{{\text{Ratio}}}}$ may be chosen as large as desired and without violating the optical theorem [51, 62], the physical size of the particle imposes a limit on the maximum number of scattering harmonics that may be realistically excited. As such, we set the maximum value of ${\sigma _{{\text{Ratio}}}}$ at around 5.
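
A flat, non-physical objective of this kind can be assembled as in the sketch below. The sketch assumes that the 200 wavelength samples include both endpoints of the 350–700 nm range and that the network input uses the same four-channel layout as the training data; the function names are hypothetical.

```python
def wavelength_grid(n_points=200, lo=350.0, hi=700.0):
    """Evenly spaced wavelengths in nm, endpoints included (assumed)."""
    step = (hi - lo) / (n_points - 1)
    return [lo + i * step for i in range(n_points)]


def flat_objective(norm_target, ratio_target, band, grid):
    """Build the input channels for constant targets inside `band` (nm)."""
    lo, hi = band
    region = [1 if lo <= w <= hi else 0 for w in grid]
    norm = [norm_target * r for r in region]     # zero outside the band
    ratio = [ratio_target * r for r in region]
    return [norm, ratio, region, list(region)]


grid = wavelength_grid()
objective = flat_objective(1.0, 2.75, (350.0, 367.5), grid)
```

On this grid the 350–367.5 nm band covers 10 of the 200 samples, consistent with 5% of the overall bandwidth.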

Figure 4 summarizes the performance of the model in this scenario. The gradual increase in the level of ${\sigma _{{\text{Ratio}}}}$ can be clearly observed between different cases. An intuitive measure of relative error is also defined as the mean value of the difference between the ideal and inverse designed metrics, also reported in figure 4.
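
One possible reading of this error measure is sketched below: the mean absolute difference between the desired and achieved metric, evaluated only at wavelengths inside the design region. The use of the absolute value, the restriction to the region, and the absence of any further normalization are assumptions of this sketch, and the function name is hypothetical.

```python
def mean_abs_error(target, achieved, region):
    """Mean absolute difference over points where region is nonzero."""
    diffs = [abs(t - a) for t, a, r in zip(target, achieved, region) if r]
    if not diffs:
        raise ValueError("region selects no wavelength points")
    return sum(diffs) / len(diffs)
```

For example, `mean_abs_error([1, 1, 1, 1], [1, 2, 3, 9], [1, 1, 1, 0])` ignores the last point and averages the differences 0, 1, and 2.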

Next, we evaluate the performance of the model for the same trend of desired metrics over 437.5–455 nm (figures 5(a)–(h)) and 595–612.5 nm (figures 5(i)–(l)), each covering 5% of the overall bandwidth but now moving toward longer wavelengths. Inspecting figure 5, the importance of the quality of training data is evident. As mentioned before, for the training of this model, we have used particles with a maximum radius of 140 nm. For such particles, the scattering and absorption values are typically smaller and featureless at longer wavelengths. In particular, for small values of scattering, ${\sigma _{{\text{Ratio}}}}$ does not follow a meaningful physical behavior and yields unsuitable training data. Consequently, the relative error at longer wavelengths is higher and the model simply cannot predict a meaningful response. We emphasize that the model was trained with only 1306 different particles, and increasing the amount of training data and the maximum radius can, in principle, improve the performance at longer wavelengths.

**Figure 5.** Inverse design performance of *non-physical* test samples. (a)–(l) ${\sigma _{{\text{Norm}}}}$ (red-dashed) and ${\sigma _{{\text{Ratio}}}}$ (blue-dashed) calculated based on the predicted values of ${\alpha _1}$ , ${\alpha _2}$ , and ${\alpha _3}$ . The solid lines indicate the desired response. ${\sigma _{{\text{Norm}}}}$ is fixed at 1 in all panels, and ${\sigma _{{\text{Ratio}}}}$ is set to (a) 0.25, (b) 0.75, (c) 1.5, (d) 2, (e) 2.75, (f) 3.25, (g) 4, (h) 4.75, (i) 0.25, (j) 1.25, (k) 2.25, and (l) 3.25. The relative prediction error is indicated at the top of each panel.

Comparing the results in figure 3 with figures 4 and 5, an immediate observation is the importance of choosing the design goal [57]. The model performs well when provided with physically accessible metrics (as in figure 3), yet for non-physical flat ${\sigma _{{\text{Norm}}}}$ and ${\sigma _{{\text{Ratio}}}}$ functions, even across a small bandwidth of 17.5 nm, the performance deteriorates. This is understandable by inspecting the width of typical resonance peaks in these figures (e.g. figures 4(c) and 5(a)). To accommodate the possibility of a near-resonance response without enforcing the presence of a resonance at the design point, we next slightly modify the metrics to include different relative slopes. For each pair of desired ${\sigma _{{\text{Norm}}}}$ and ${\sigma _{{\text{Ratio}}}}$ , nine different objectives are generated, where:

(i) ${\sigma _{{\text{Norm}}}}$ and ${\sigma _{{\text{Ratio}}}}$ are both flat,
(ii) ${\sigma _{{\text{Norm}}}}$ is flat and ${\sigma _{{\text{Ratio}}}}$ has a positive slope,
(iii) ${\sigma _{{\text{Norm}}}}$ is flat and ${\sigma _{{\text{Ratio}}}}$ has a negative slope,
(iv) ${\sigma _{{\text{Norm}}}}$ has a positive slope and ${\sigma _{{\text{Ratio}}}}$ is flat,
(v) ${\sigma _{{\text{Norm}}}}$ has a negative slope and ${\sigma _{{\text{Ratio}}}}$ is flat,
(vi) ${\sigma _{{\text{Norm}}}}$ has a negative slope and ${\sigma _{{\text{Ratio}}}}$ has a positive slope,
(vii) ${\sigma _{{\text{Norm}}}}$ has a positive slope and ${\sigma _{{\text{Ratio}}}}$ has a negative slope,
(viii) ${\sigma _{{\text{Norm}}}}$ and ${\sigma _{{\text{Ratio}}}}$ both have positive slopes,
(ix) ${\sigma _{{\text{Norm}}}}$ and ${\sigma _{{\text{Ratio}}}}$ both have negative slopes.
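
The nine combinations listed above are the Cartesian product of three slope choices (flat, positive, negative) for each metric and can be generated as in the following sketch. The linear ramp shape and the relative slope magnitude `rel_slope` are assumptions made for illustration, and the function names are hypothetical.

```python
from itertools import product


def sloped_target(mean_value, slope_sign, n_points, rel_slope=0.2):
    """Linear ramp across the design region, centered on mean_value.

    slope_sign is 0 (flat), +1 (positive) or -1 (negative).
    rel_slope is a hypothetical total swing relative to mean_value.
    """
    if n_points == 1:
        return [mean_value]
    half = 0.5 * rel_slope * mean_value
    return [mean_value + slope_sign * half * (2 * i / (n_points - 1) - 1)
            for i in range(n_points)]


def nine_objectives(norm_mean, ratio_mean, n_points, rel_slope=0.2):
    """Map (norm_sign, ratio_sign) to a (norm_target, ratio_target) pair."""
    return {
        (s_norm, s_ratio): (
            sloped_target(norm_mean, s_norm, n_points, rel_slope),
            sloped_target(ratio_mean, s_ratio, n_points, rel_slope),
        )
        for s_norm, s_ratio in product((0, 1, -1), repeat=2)
    }
```

Centering each ramp on its mean keeps the average of the objective equal to the originally requested value, so sloped and flat variants of one objective remain directly comparable.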

We then pick the best overall performance out of the nine variants of the same objective. In this example, ${\sigma _{{\text{Norm}}}}$ is again set around one (with a slope), maintaining a high absorption level, and ${\sigma _{{\text{Ratio}}}}$ is again between 0.25 and 4.75 (with a slope, see the caption of figure 6 for more details) across the same three wavelength ranges as before. Figure 6 reports the best overall inverse designs along with the desired response and their relative slopes (each panel is the best out of the nine objectives listed above). Inspecting figures 6(a)–(j), it can be seen that the addition of artificial slopes to the objective clearly improves the performance. For longer wavelengths (figures 6(k)–(t)), the results are still affected by the quality of training data over these wavelengths, yet importantly, they are also improved compared to figure 5.
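
The selection step amounts to designing a particle for each variant, scoring it, and keeping the lowest-error result. In the sketch below, `design_fn` stands in for the trained inverse network and `error_fn` for a forward Mie evaluation followed by an error measure; both are hypothetical callables supplied by the user, and the criterion actually used to rank the variants may differ.

```python
def pick_best(objectives, design_fn, error_fn):
    """Return (key, design, error) for the variant with the lowest error.

    objectives: dict mapping a variant label to its target spectra.
    design_fn:  callable, target -> predicted design parameters.
    error_fn:   callable, (target, design) -> scalar error.
    """
    best = (None, None, float("inf"))
    for key, target in objectives.items():
        design = design_fn(target)
        err = error_fn(target, design)
        if err < best[2]:
            best = (key, design, err)
    return best
```

Each call to `design_fn` is a single forward pass of the already trained network, so scoring all nine variants adds no retraining cost.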

**Figure 6.** Inverse design performance of *sloped non-physical* test samples. (a)–(t) ${\sigma _{{\text{Norm}}}}$ (red-dashed) and ${\sigma _{{\text{Ratio}}}}$ (blue-dashed) calculated based on the predicted values of ${\alpha _1}$ , ${\alpha _2}$ , and ${\alpha _3}$ . The solid lines indicate the desired response. ${\sigma _{{\text{Norm}}}}$ is fixed around 1 in all panels, and ${\sigma _{{\text{Ratio}}}}$ is varied. Three different wavelength ranges are considered: (a)–(j) 350–367.5 nm, (k)–(p) 437.5–455 nm, and (q)–(t) 595–612.5 nm. For each panel, the average of ${\sigma _{{\text{Ratio}}}}$ is set to: (a) 0.25, (b) 0.75, (c) 1.25, (d) 1.75, (e) 2.25, (f) 2.75, (g) 3.25, (h) 3.75, (i) 4.25, (j) 4.75, (k) 0.25, (l) 1.25, (m) 2.25, (n) 3.25, (o) 3.75, (p) 4.75, (q) 0.25, (r) 1.25, (s) 2.75, and (t) 4.75.

5. Conclusions

Machine learning has been used for scattering simulation and inverse design [29, 63], multipole placement [41, 57], and scattering reduction in nanoparticles [46]. Here, we adapted the region-specified design approach to design highly absorptive nanoparticles with different levels of relative scattering. We investigated the case of ∼1400 datapoints for the training since the dataset generation is typically a computationally expensive task for electromagnetic problems. Our results indicate that if the training data is of good quality, the model can successfully accomplish the goals of the inverse design. We also showed that the performance of the model can be considerably improved by choosing objectives that mimic the physical response of the system. Our results are relevant to the design of nanoparticles for biological, chemical, and optical applications as well as the design of low-scattering absorbers and antennas.

Acknowledgments

This material is based upon work supported by the National Science Foundation under Grant No. 2138869 and the Chapman Faculty Opportunity Fund (2021).

Data availability statement

The data that support the findings of this study are available upon reasonable request from the authors.
