Deep Molecular Dreaming: Inverse machine learning for de-novo molecular design and interpretability with surjective representations

Computer-based de-novo design of functional molecules is one of the most prominent challenges in cheminformatics today. As a result, generative and evolutionary inverse designs from the field of artificial intelligence have emerged at a rapid pace, with aims to optimize molecules for a particular chemical property. These models 'indirectly' explore the chemical space by learning latent spaces, policies or distributions, or by applying mutations to populations of molecules. However, the recent development of the SELFIES string representation of molecules, a surjective alternative to SMILES, has made other techniques possible. Based on SELFIES, we therefore propose PASITHEA, a direct gradient-based molecule optimization that applies inceptionism techniques from computer vision. PASITHEA exploits the use of gradients by directly reversing the learning process of a neural network, which is trained to predict real-valued chemical properties. Effectively, this forms an inverse regression model, which is capable of generating molecular variants optimized for a certain property. Although our results are preliminary, we observe a shift in distribution of a chosen property during inverse-training, a clear indication of PASITHEA's viability. A striking property of inceptionism is that we can directly probe the model's understanding of the chemical space it was trained on. We expect that extending PASITHEA to larger datasets, molecules and more complex properties will lead to advances in the design of new functional molecules as well as the interpretation and explanation of machine learning models.


Introduction
The de-novo design of new functional chemical compounds can bring enormous scientific and technological advances. For this reason, researchers in cheminformatics have developed a plethora of A.I. methodologies for the challenging inverse molecular design task [5,6]. They include deep learning techniques such as variational autoencoders (VAE) [7,8,9] and generative adversarial networks (GAN), as well as reinforcement learning (RL) and genetic algorithms (GA).
Figure 1: Deep dreaming is well known for creating new dream-like images [3]. To generate this image, we used the following GitHub repo [4].
These methods belong to a category with one particular attribute: the model indirectly optimizes molecules for a target property. For example, VAEs and GANs learn to mimic a distribution of molecules from a training set, constructing a latent space that is then scanned to find molecules that optimize an objective function. In the case of RL, the agent learns from rewards in the environment in order to build a policy for generating molecules, which is subsequently used to maximize an objective function. Finally, in GAs, the population is optimized iteratively by applying mutations and selections. In all of these cases, the optimization process does not directly maximize the objective function in a gradient-based way.
Here, we present preliminary results for PASITHEA, a new generative model for molecules inspired by inceptionism techniques [2] in computer vision. PASITHEA is a gradient-based method that optimizes a discrete molecular structure for a target property. We train a neural network to predict chemical properties using a molecular string representation. We then invert the training of the network to generate new variants of molecules. This approach has two significant novelties:
• Molecules are directly optimized to a given objective function, sidestepping the learning of distributions and policies, or the application of mutations to a population.
• We can analyse what the regression network has learned about the chemical property by probing its inverse training with test molecules. This may allow us to explain the neural network's understanding of chemistry.
Furthermore, in contrast to most explorative methods such as RL or GAs, PASITHEA does not require expensive function evaluations, such as quantum chemistry calculations, during optimization. Provided that a pre-calculated dataset is available, this is an important advantage, since costly chemical properties can be optimized directly.
This method is made possible by the application of SELFIES, a 100% robust molecular string representation [1]. In contrast to SMILES, for which a large fraction of generable strings do not map to valid molecular graphs, SELFIES is a surjective map between molecular strings and molecular graphs. That is, for every SELFIES string, there exists a valid molecular graph, and every molecular graph can be represented by SELFIES.

Figure 2: (a) Training of the property-prediction network. (b) Deep dreaming; model weights and biases are suspended.

Methodology
Inceptionism [3,2] has drawn considerable attention as an artistic method for rendering images. By using a neural network trained to classify an image (e.g., as a dog, car or house), the network can perform deep 'dreaming' on an image in order to mutate it gradually to fit a different class while retaining features of the original image. For example, it may enhance animal features in the image of a chemistry lab while the general structure of a lab is still visible (Figure 1). The rendered images have dream-like properties that make them a popular artistic style in the media [3].
We generalize this methodology to the inverse-design task of functional molecules. PASITHEA uses a fully-connected neural network consisting of four layers, each with 500 nodes, and takes as input the one-hot encoding of the SELFIES representation of each molecular graph.
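The input encoding described above can be sketched as follows. This is a minimal illustration, not the authors' code: the toy alphabet, the maximum length, the `[nop]` padding token and both helper names are assumptions made here; a real alphabet would be collected from the dataset.

```python
import re

def split_selfies_tokens(selfies_string):
    """Split a SELFIES string such as '[C][=O]' into its bracketed tokens."""
    return re.findall(r"\[[^\]]*\]", selfies_string)

def one_hot_encode(selfies_string, alphabet, max_length, pad_token="[nop]"):
    """Return a max_length x len(alphabet) one-hot matrix as nested lists."""
    tokens = split_selfies_tokens(selfies_string)
    if len(tokens) > max_length:
        raise ValueError("molecule is longer than max_length")
    # Pad short molecules so that every input has the same shape.
    tokens = tokens + [pad_token] * (max_length - len(tokens))
    index = {token: i for i, token in enumerate(alphabet)}
    matrix = []
    for token in tokens:
        row = [0.0] * len(alphabet)
        row[index[token]] = 1.0
        matrix.append(row)
    return matrix

# Hypothetical toy alphabet (assumption, not taken from the paper).
ALPHABET = ["[nop]", "[C]", "[N]", "[O]", "[F]"]
encoding = one_hot_encode("[C][C][O]", ALPHABET, max_length=4)
# A fully connected network takes the flattened matrix as its input vector.
flat_input = [value for row in encoding for value in row]
```

Each row holds exactly one 1.0, so the flattened vector has `max_length * len(alphabet)` entries.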
Prior to deep dreaming, the network learns to predict a specific real-valued property for each molecule in a given dataset (e.g., the logarithm of the partition coefficient, or logP) from the molecular graph. The training involves the standard feedforward and backpropagation process. For a set of fixed inputs and outputs, the network iteratively improves its predictions by updating the weights through mini-batch gradient descent (Figure 2a). In deep dreaming, an input molecule with a property value predicted by the network is incrementally modified to a similar molecule with the desired value. The weights and biases of each layer of the network are now fixed and the neural network is no longer adjusting its logP prediction for each molecule. Through backpropagation, we minimize the error between the predicted property of each input molecule and the desired target property (Figure 2b). The resulting error is used to compute the gradient with respect to the one-hot encoding of the input, which gradually transforms the input into a molecule that matches the target property. Each increment of the one-hot encoding corresponds to a potential transformation of the input molecule.
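One way to write the two phases side by side is given below. The notation is introduced here for illustration and is not taken from the paper: f_θ is the network with weights and biases θ, x the one-hot input, y the reference property value, y* the desired target and η a learning rate. The squared error is an assumption, since the text does not name the loss function.

```latex
\begin{align}
\text{training:} \quad & \theta \leftarrow \theta - \eta \, \nabla_{\theta} \bigl( f_{\theta}(x) - y \bigr)^{2}, \qquad x \text{ fixed}, \\
\text{dreaming:} \quad & x \leftarrow x - \eta \, \nabla_{x} \bigl( f_{\theta}(x) - y^{*} \bigr)^{2}, \qquad \theta \text{ fixed}.
\end{align}
```

The two updates differ only in which quantity is held constant and which receives the gradient.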
Once the loss function has been minimized, the gradient evaluates approximately to zero, which terminates the training. In this process, the same standard feedforward and backpropagation algorithm is used, but the input molecule is adjusted while the weights and biases remain constant.
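The inverse-training loop described above can be sketched in NumPy. This is only an illustration under several assumptions: a single tanh hidden layer replaces the four 500-node layers, random weights stand in for a trained predictor, the loss is a squared error, and the learning rate, tolerance and all function names are hypothetical.

```python
import numpy as np

def predict(x, w1, b1, w2, b2):
    """Forward pass of a small fully connected network with one tanh layer."""
    hidden = np.tanh(w1 @ x + b1)
    return float(w2 @ hidden + b2), hidden

def input_gradient(x, target, w1, b1, w2, b2):
    """Squared error and its gradient with respect to the input x only."""
    prediction, hidden = predict(x, w1, b1, w2, b2)
    error = prediction - target
    # Chain rule: output -> tanh layer -> input; the weights are never updated.
    grad_x = 2.0 * error * (w1.T @ (w2 * (1.0 - hidden ** 2)))
    return error ** 2, grad_x

def dream(x0, target, params, learning_rate=0.01, max_steps=2000, tol=1e-6):
    """Gradient descent on the input with frozen weights and biases."""
    x = x0.copy()
    losses = []
    for _ in range(max_steps):
        loss, grad_x = input_gradient(x, target, *params)
        losses.append(loss)
        if np.linalg.norm(grad_x) < tol:  # gradient close to zero: stop
            break
        x -= learning_rate * grad_x
    return x, losses

rng = np.random.default_rng(0)
n_inputs, n_hidden = 20, 16
# Random weights stand in for a trained property predictor (assumption).
params = (
    rng.normal(0.0, n_inputs ** -0.5, (n_hidden, n_inputs)),
    np.zeros(n_hidden),
    rng.normal(0.0, n_hidden ** -0.5, n_hidden),
    0.0,
)
x_start = rng.uniform(0.0, 1.0, n_inputs)
x_dreamed, losses = dream(x_start, target=0.5, params=params)
```

Only `x` changes inside `dream`; `params` is read but never written, which mirrors the fixed weights and biases of the dreaming phase.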

Results
Finding the best numerical conversion from the SELFIES string into an appropriate input for deep dreaming is the subject of ongoing study. When a one-hot encoding is taken as input, the encoding changes from a vector of binary variables to a vector of real numbers from the first iteration onward. Because of this mix of vector representations during training, the model has difficulty converging. Therefore, taking one of several possible approaches, we introduce noise into the one-hot encoding used as input to deep dreaming. Every zero in the one-hot encoding is replaced with a random number between zero and a specific upper bound, which is typically set to a value between 0.5 and 0.9. Using this method, we observe an incremental optimization for each given molecular input, as required.
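The noise step described above can be sketched as follows. The uniform distribution and the function name are assumptions; the text only states that each zero becomes a random number between zero and the upper bound.

```python
import numpy as np

def add_noise_to_one_hot(one_hot, upper_bound, rng):
    """Replace every zero with a uniform draw from [0, upper_bound); keep ones."""
    one_hot = np.asarray(one_hot, dtype=float)
    noise = rng.uniform(0.0, upper_bound, size=one_hot.shape)
    return np.where(one_hot == 0.0, noise, one_hot)

rng = np.random.default_rng(42)
clean = np.array([[0, 1, 0, 0],
                  [0, 0, 0, 1],
                  [1, 0, 0, 0]])
noisy = add_noise_to_one_hot(clean, upper_bound=0.9, rng=rng)
```

Because the upper bound stays below 1, the largest entry in each row is unchanged, so the noisy input still corresponds to the same starting molecule.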
Another important contribution to the model is the application of SELFIES. Deep dreaming requires a continuous space in which all points are valid, a criterion met by the recently developed SELFIES, whose strings are proven to be 100% valid [1]. The traditional SMILES representation can be problematic when the deep dreaming model passes through an invalid structure on the way from one molecule to the next. For example, in the transition from a string containing a ring, "CCCC1CCCC1CC", to a string without a ring, "CCCCCCCCCC", the model is likely to produce strings resembling "CCCC1CCCCCC", which does not correspond to a valid molecular graph. In this case, the transformation may reveal the network's understanding of string syntax in relation to logP, but not of molecular structure in relation to logP, since the string does not correspond to a valid molecule. In contrast, the SELFIES representation enforces a constraint on the syntax that prevents the model from producing such invalid structures, which yields a complete optimization sequence that maps directly to valid molecules. Our findings here highlight only one of the many potential applications of SELFIES.
Our experiments indicate that deep dreaming achieves both a direct, gradient-based design of novel functional molecules and the explainability of neural networks for molecules. A simple four-layer neural network, with no added components, suffices for our results, and we did not require an exhaustive search for the ideal training hyperparameters. In this analysis, PASITHEA is trained to predict the logarithm of the partition coefficient (logP), obtained from the RDKit library [20], on a set of the smallest 10,000 molecules in the QM9 dataset. The logP, which measures the lipophilicity of a molecule, is an important property of drug molecules and an indicator of drug-likeness [21].
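The three SMILES strings from the ring example above can be checked with a few lines of plain Python. This is a deliberately simplified check and not a SMILES parser: it handles only single-digit ring-bond labels and does not handle `%nn` labels. The function name is hypothetical.

```python
def unmatched_ring_labels(smiles):
    """Return single-digit ring-bond labels that are opened but never closed.

    Simplified check: digits inside bracket atoms are skipped and
    two-digit '%nn' ring labels are not supported.
    """
    open_labels = set()
    in_bracket = False
    for char in smiles:
        if char == "[":
            in_bracket = True
        elif char == "]":
            in_bracket = False
        elif char.isdigit() and not in_bracket:
            # A label's first occurrence opens a ring, the second closes it.
            if char in open_labels:
                open_labels.remove(char)
            else:
                open_labels.add(char)
    return sorted(open_labels)

for smiles in ["CCCC1CCCC1CC", "CCCCCCCCCC", "CCCC1CCCCCC"]:
    print(smiles, unmatched_ring_labels(smiles))
```

Only the intermediate string reports a dangling label, which is the syntactic failure that SELFIES rules out by construction.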
We demonstrate how PASITHEA transforms molecules in a stepwise, quasi-continuous fashion and shifts the distribution of logP in the molecular dataset toward set targets. These logP targets are set high in order to observe a rightward shift in logP distribution and similarly set low to observe a shift in the opposite direction. With logP targets much further from the central tendency in the distribution, surpassing the highest and lowest values in the dataset, we observe a more pronounced shift in distribution during training. We then analyse what PASITHEA has learned regarding the relationship between logP and molecular structure.

Evolution of individual molecules
Of particular interest is the gradual progression of each molecule through inverse training. Over hundreds of training epochs, the gradient with respect to the input SELFIES produces minor adjustments in the molecule that accumulate into a pronounced transmutation (Figure 3). These adjustments are stepwise in terms of the discrete, textual strings that represent the molecules, but continuous in terms of the real-valued one-hot encodings.
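The contrast between continuous encodings and stepwise strings can be made concrete with a small decoding sketch. It assumes that the real-valued encoding is mapped back to a string by taking the largest entry at each position, which the text does not state explicitly; the alphabet, the numbers and the function name are made up for illustration.

```python
def decode_to_selfies(encoding, alphabet, pad_token="[nop]"):
    """Map each row of a real-valued encoding to its highest-scoring token."""
    tokens = []
    for row in encoding:
        best = max(range(len(row)), key=row.__getitem__)
        if alphabet[best] != pad_token:
            tokens.append(alphabet[best])
    return "".join(tokens)

ALPHABET = ["[nop]", "[C]", "[N]", "[O]"]  # hypothetical toy alphabet
before = [[0.1, 0.9, 0.3, 0.2], [0.0, 0.8, 0.6, 0.1]]
nudged = [[0.1, 0.9, 0.3, 0.2], [0.0, 0.7, 0.69, 0.1]]   # same largest entries
flipped = [[0.1, 0.9, 0.3, 0.2], [0.0, 0.6, 0.75, 0.1]]  # second row flips
```

Under this rule, `before` and `nudged` decode to the same string, while `flipped` decodes to a different one: the molecule changes only when a gradient step moves a different entry to the top of its row.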

Shift in distribution
In order to observe a large-scale pattern over the entire dataset, we disregard the intermediate molecules and restrict our analysis to the initial and fully-optimized molecules. We take a sample of the smallest 10,000 molecules from the QM9 dataset and apply deep dreaming to each molecule. From these results, there is a clear shift in the distribution of logP values in the set of molecules as they transmute toward a given target value (Figure 4). Although the learning rate has little effect on the quality of training, adding more noise to the one-hot encoded inputs (higher upper-bound values in Figure 4) has a large influence on the shifts in the distribution curves. We furthermore observe from the distribution shifts in Figure 4 that some generated molecules have logP values beyond the lowest and highest values in the original dataset. For instance, in Figure 4a, the left tail of the left-shifted (green) distribution extends beyond the left tail of the original (red) distribution, and the right tail of the right-shifted (blue) distribution extends beyond the right tail of the original distribution. A quantitative account of the distributions in Figure 4a is given in Figure 5: the maximum logP in the right-shifted distribution exceeds that in the original dataset, and the minimum in the left-shifted distribution falls below that in the original dataset. PASITHEA thus generates novel molecules with properties outside the limits of the original training set of molecules, which attests to the large potential of this method.

Interpretable ML in physics
Our approach to de-novo molecular generation does not require domain knowledge, nor is the design of PASITHEA influenced by domain knowledge. However, this knowledge is useful when applied to individual molecular evolutions in deep dreaming [22]. In particular, we take interest in the recent progress in the machine-assisted discovery of concepts in the natural sciences [23,24,25,26]. These lines of research use machine learning techniques to draw conclusions about the underlying processes of a particular physical system, which are often mathematical models with tunable parameters that are responsive to input observations. This approach differs from research in machine-assisted de-novo molecular generation [7], where the focus lies in producing optimization methods that can navigate a massive chemical search space. Our approach may close the gap between these lines of research. By inverting the training, we achieve both molecular generation and insights into how the network produces each molecular transformation, such as the 'strategy' employed to optimize molecules by appending nitrogen atoms. Although PASITHEA does not model the behaviour of molecules in the physical sense, it does model the transition rules required for molecular optimization; there is potential in rigorously quantifying these transition rules. Specifically, the viability of inceptionism in recovering the thermodynamic principles of physics [23] attests to the potential for chemistry.

Interpretable ML in chemistry
Inspired by explainable representations in image recognition [27] and the rediscovery of concepts in physics [23], we can understand the internal molecular representation by inverting it. To do so, we probe the neural network with specific test molecules and observe patterns in how it changes them. For example, the composition of atoms after inverse training follows a predictable pattern, such as the addition of a few non-carbon atoms, namely fluorine and nitrogen. Take, for example, the transmutations of the simplest molecules in the QM9 dataset (Figure 6), which suggest that PASITHEA interprets these non-carbon atoms as correlated with lower logP values. A similar trend persists for more complex molecules, in which more than one atom may be replaced with nitrogen (Figure 3a), though to a lesser extent for fluorine. The intermediate states during continuous transformation offer additional insights into the network's understanding of the chemical property. In particular, when observing a single test molecule, there are instances where an additional iteration of inverse training transforms the molecule with the same 'strategy' used in previous iterations. The neural network appears to persist with a single strategy until the training terminates. We demonstrate this behaviour in Figure 3b, which shows a gradual reduction in length, and in Figure 3a, which shows an initial molecule containing a single nitrogen atom, an intermediate molecule containing two nitrogen atoms, and a final molecule containing three nitrogen atoms. These cases indicate that the network is charting deliberate, non-arbitrary paths toward the target logP; it has a non-trivial understanding of the features corresponding to higher and lower logP values.
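Compositional patterns of this kind could be tallied automatically along the lines of the sketch below. It is a toy token counter, not a chemistry toolkit: it ignores implicit hydrogens and skips structural tokens by name. The two SELFIES strings are made up for illustration and are not molecules from the paper; both function names are hypothetical.

```python
import re
from collections import Counter

def element_counts(selfies_string):
    """Tally element symbols over the tokens of a SELFIES string (toy parser)."""
    counts = Counter()
    for token in re.findall(r"\[([^\]]*)\]", selfies_string):
        # Skip structural tokens such as [Ring1] or [=Branch1].
        if "Ring" in token or "Branch" in token:
            continue
        match = re.search(r"[A-Z][a-z]?", token)
        if match:
            counts[match.group()] += 1
    return counts

def composition_change(before, after):
    """Per-element difference in atom counts between two SELFIES strings."""
    change = Counter(element_counts(after))
    change.subtract(element_counts(before))
    return {element: n for element, n in change.items() if n != 0}

# Made-up strings for illustration only; not molecules from the paper.
start, end = "[C][C][C][N]", "[N][C][N][N]"
print(composition_change(start, end))
```

Applied to the initial and final molecule of each dreaming run, such a tally would show which elements the network tends to add or remove when moving toward a given target.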

Comparison to VAEs
A simple 4-layer network highlights one key difference between PASITHEA and other optimization methods: we perform reverse-differentiation directly on the molecular representation, which is a one-hot encoding of SELFIES. Let us compare this approach with the related concept of variational autoencoder [7]. In VAEs, a latent space is learned by encoding and decoding molecules. After the reconstruction, another neural network can then optimize in the newly created latent space. In this case, since the prediction network is applied to the latent space, the basis for gradient computation lies in the latent space, not in the molecular representation itself.
The direct reversibility on the basis of model weights is important in the context of machine learning interpretability. Our goal is to understand directly what a neural network learns about a specific molecular property. We believe that probing the regression neural network with test molecules, without a detour over some specific latent spaces, is the most direct way to understand what the model has learned.

Outlook
We propose a direct, gradient-based property-optimization method that offers insights into the network's understanding of structure-property relationships. In the immediate future, we will verify our results on larger datasets, such as PubChem, and on more complex molecules. Furthermore, we plan to test PASITHEA on molecular properties that require expensive quantum chemistry calculations. We see much potential in discovering other 'strategies' the network may use in order to optimize molecules with different properties.
There is also work to be done to add transparency [22] to our approach. There are many possible directions, including exploring other surjective string representations that may be more suitable to the task of deep dreaming, and comparing other reverse-differentiable machine learning architectures that may be capable of a similar 'dreaming' process. Ultimately, our work can be used to find the underlying rules the neural network discovers in order to optimize a property, conjointly offering insights into how the network makes its predictions for interpretability and suggesting ways in which a human can use these rules in order to generate new and useful chemical compounds for explainability [22].