Disentangling Multiple Emitting Components in Molecular Observations with Nonnegative Matrix Factorization

Molecular emission from the galactic and extragalactic interstellar medium ( ISM ) is often used to determine the physical conditions of the dense gas. However, even from spatially resolved regions, the observed molecules do not necessarily arise from a single component. Disentangling multiple gas components is often a degenerate problem in radiative transfer studies. In this paper, we investigate the use of the nonnegative matrix factorization ( NMF ) approach as a means to recover gas components from a set of blended line intensity maps of molecular transitions that may trace different physical conditions. We run a series of experiments on synthetic data sets designed to replicate conditions in two very different environments: galactic pre-stellar cores and the ISM in high-redshift galaxies. We ﬁ nd that the NMF algorithm often recovers the multiple components resembling those used in the data-generating process, provided that the different components have similar column densities. When NMF fails to recover all the individual components it does however group together the most similarly emitting ones. We further found that initialization and regularisation are key factors in the ef ﬁ ciency of the NMF algorithm.


INTRODUCTION
Understanding the chemical and physical processes regulating the interstellar medium (ISM) is necessary for a thorough understanding of many astronomical processes, such as star formation (van Dishoeck & Blake 1998) and galaxy quenching (Colombo et al. 2020).As molecular lines are the primary means through which the conditions of the interstellar medium can be probed, their interpretation is of vital importance for accomplishing these goals.
Radiative-transfer modelling is the framework through which radiation emitted by the ISM can be connected back to the physical conditions of the gas it is emitted from.Through computational radiativetransfer modelling codes, such as RADEX (van der Tak et al. 2007), the strength of molecular transitions can be used to constrain the chemistry, temperature, density and other astrophysical conditions of the emitting interstellar medium.This, in turn, then enables a better understanding of the local environment in which the ISM is embedded.
Unfortunately, the radiative-transfer modelling problem needing solving when analysing molecular lines is typically degenerate.Because of the large number of free parameters in radiative-transfer models and their similar influence on intensities, usually multiple set of input parameters are capable of reproducing any given set of molecular line intensities.In Tunnard & Greve (2016), it was found that radiative transfer modeling was typically incapable of recovering parameters to better than half a dex.Although such degeneracies can be partially addressed through using rigorous statistical methodologies (Tunnard et al. 2015;James et al. 2021) or through incorporating astrochemical knowledge (Viti 2017;de Mijolla et al. 2019), the interpretation of molecular lines still remains difficult.
Notably, the analysis of emission arising from blended components is particularly challenging.In sub-mm astronomy, the spatial resolution limits of modern tele-scopes mean that we typically do not observe flux from a single gas component but rather from multiple nonresolved components, each with unique physical conditions.This is particularly the case for extra-galactic observations where objects as large as giant molecular clouds, or even whole sections of a galaxy, will be contained in a single beam.When this is the case, interpreting the physical conditions of the ISM becomes especially difficult as it is unclear what fraction of the measured emission originates from each component or even how many components are responsible for the observed emission.Moreover, as observers increasingly turn their attention towards the analysis of external galaxies, where multiple blended components are expected, mitigating this blending is likely to become an ever more pressing concern.
To address the difficulties associated with the radiative-transfer problem, there is value in approaching the task of disentangling gas components within observations from a more data-driven perspective.In this study, we investigate the use of data-driven approaches for separating the flux emitted by molecular lines into the individual gas components within observations.More precisely, we investigate the use of NMF (Lee & Seung 1999) which is a matrix-factorization algorithm used to decompose a non-negative matrix into a product of non-negative matrices.In NMF, components are estimated globally from all pixels available.As there will typically be far fewer components than pixels, NMF can then exploit the redundancy across pixels to constrain the possible shapes of the components.
Our work is not the first attempt at using data-driven approaches to understand the ISM.For example, there has been a long line of research into using Principal Component Analysis to interpret observations of the ISM (Lo et al. 2009;Gratier, Pierre et al. 2017;Neufeld et al. 2007).Perhaps more similar to our work, there have also been efforts into using NMF to interpret interstellar spectra.For example, NMF has been used to analyze interstellar bands (Berné et al. 2007;Foschino et al. 2019).Efforts at using NMF in astronomy date as far back as 1997, where positive matrix factorization, a precursor to NMF, was used to interpret molecular observations (Juvela et al. 1996a).
Specifically, in this study we attempt to answer the question of whether NMF can be used to recover gas components from a set of blended line intensity maps of transitions tracing different physical conditions.Here, unlike previous approaches such as those presented in Boulais et al. (2021); Juvela et al. (1996a), our interest is less about identifying co-moving parcels of gas within observations and more about recovering compo-nents tracing a common physical environment even if they may not have originated from the same spatial location.Whereas previous approaches have operated on the spectral datacube where frequency information is available and only made use of a small number of linetransitions, here we operate on the derived line-intensity maps for a larger number of lines and in doing so ignore the spectral shape of line-transitions.This means that our factorization ignore the radial velocity of spectral lines and only use the molecular composition when constructing components.Our hope is then for NMF components to group together all emission arising from gas under similar physical conditions leading to NMF components capturing the archetypical signatures of physical environments in observations.

DATA MODEL
We begin by detailing a model for the blending of components which relates line intensities and spatial locations of components in a given region to the intensities measured in line-intensity maps.
Let us represent a set of molecular line intensity-maps, observed over a region of N x × N y pixels, by a nonnegative matrix X of shape N M × N p whose entries contain the intensities for N M molecular line transitions measured at the N p = N x × N y pixels.We assume that multiple components contribute additively to the intensities such that the intensity at every pixel is the sum of the intensities of the separate components.We represent the strength of emission of the components in the various line-transitions by a (non-negative) matrix W true of shape N M × N C whose N C columns each represent the intensities emitted in the N M molecular transitions for a component.
Similarly, we encode the spatial contributions of each of the N C components to the N p pixels into a (nonnegative) matrix V true of shape N C × N p .Spatial contributions are factors between 0 and 1 indicating how much a component contributes to a pixel.The intensity of a given line at a given pixel will simply be the sum of the emission in that line for all components multiplied by their contribution to that same pixel.That is to say the line-intensity maps can be written as where Σ is an additional noise term chosen to be Gaussian.
So far, we have neglected the impact of convolution by a telescope beam factor on observations.Provided that all line intensities are downsampled to a common spatial resolution, convolution can be modeled through a matrix multiplication by a convolution matrix K of shape N p × N p .The data-generating process then becomes X = W true V true K + Σ which can be written using the associativity property of matrix multiplication as X = W true V conv + Σ where V conv = V true K. Hence, including convolution by a beam-filling factor does not change the original model beyond the replacement of V true with V conv .
A number of conditions must be met for our model of line-intensity maps to be accurate.Firstly, molecular transitions must not be optically thick so as for intensities to be (near) additive.Secondly, all components must be purely emitting rather than absorbing as we assume all contributions are positive.Finally, as explained in the previous paragraph, transitions must be preliminarily downsampled to a common spatial resolution for the convolution to be parameterizable by a matrix K.

NON-NEGATIVE MATRIX FACTORIZATION
In this study, we consider the problem of recovering a set of blended components W true and their convolved spatial locations V conv from intensity maps X, under the assumption that X = W true V conv + Σ.Although many algorithms could be relevant for this task, here we focus on investigating the non-negative matrix factorization algorithm (NMF) (Juvela et al. 1996b).Given a matrix X, the NMF algorithm aims to find non-negative matrices W and V such that X ≈ W V , where W and V are matrices of shape N M × N N M F and N N M F × N p and N N M F is the number of components used.In the context of molecular observations, N N M F is the number of physically distinct emitting components assumed to exist within telescope beam.It is a user-defined parameter for the algorithm which should ideally be equal to N C the true number of components so as for the dimensions of W and V to match those of W true and V true .NMF falls within the realm of blind-source separation algorithms which is to say that it is an algorithm making minimal assumptions on the contents of matrices W and V .
Formally, the NMF algorithm proceeds by determining non-negative W and V minimizing a loss function where ∥A∥ is a matrix-norm which is often chosen to be the Frobenius norm ∥A∥ 2 F = i,j A 2 ij .Minimizing the loss function is a non-convex optimization problem which is typically approached using iterative solvers.For this paper, we use the scikit-learn implementation of NMF making use of a coordinate-descent solver (Cichocki & Phan 2009) with W and V initialized as random matrices.
Matrix factorization is an ill-posed problem.Indeed, if X = W V , then it would also be the case that X = W P P −1 V for any invertible matrix P (of shape N C ×N C ) and so if W and V are solutions to the matrix factorization problem W P and P −1 V will also be solutions, provided they are non-negative matrices.Because of this, there is no formal guarantees that the NMF algorithm applied to line-intensity maps X will converge towards W true and V conv , even in the limit of infinite data.
In NMF, to further constrain the matrix factorization problem towards desirable solutions, it is common to add to the loss function additional regularization terms on W and V .In such cases, the loss function minimized by NMF becomes: In addition to regularization, the initialization of matrices W and V can impact the convergence of NMF and hence the matrices retrieved by the algorithm.In this project, we initialize the algorithm with non-negative random matrices as defined in the scikit-learn NMF documentation.
It is also worth mentioning that the NMF algorithm, like many other matrix factorization algorithms, is most effective when there are significantly fewer components than molecular lines; that is to say when N C << N M .This is because NMF relies on finding a set of components capturing the most information about observations X which requires grouping together correlated molecular lines.In the limit where the number of components matches the number of molecular lines observed (N C = N M ), the NMF loss is trivially minimized by setting W to the identity matrix (i.e. one component reconstruct each line) and V to be equal to X.Such a solution perfectly reconstructs observations but brings no useful information about the correlations amongst molecular lines.

SYNTHETIC DATA GENERATION
In order to assess the use of NMF in interpreting molecular line-intensity maps, we test its effectiveness at retrieving components from synthetic line-intensity maps.Synthetic data allow us to have exact knowledge of the ground-truth components and therefore evaluate how closely they are recovered by NMF.Our aim when constructing these synthetic observations is to approximately reproduce components and observations found within real-world datasets.Our aim is not to build extremely accurate simulations.As such, we have striven for simplicity over complexity in our data-generating process.
We construct synthetic line-intensity maps of transitions of CO, CS, HCN and HCO + (see Section 5), using numpy (Harris et al. 2020), by first using chemical and radiative transfer models to produce emission profiles (W true ) from chosen physical conditions.We then specify an arbitrary spatial location of the emission of each component (V true ).Finally, we select a noise level (Σ) and convolution due to the beam-filling factor (K) to produce the maps (X) following Eq 1 as discussed in Section 2.
The matrix W true contains the emission of each component for the molecular lines considered in X.For our experiments, we select a temperature and density for each component and the intensity of molecular lines are then determined using the radiative transfer modelling code RADEX (van der Tak et al. 2007).RADEX requires a gas density and temperature as well as the column density of the emitting species.Since the emission is proportional to the column density, and we choose the noise level, the absolute value of the column density is completely arbitrary and has no effect on the experiments.However, the differences in column density between components is observationally important so we use an astrochemical model to obtain reasonable values.
To do this, we use the astrochemical code UCLCHEM (Holdship et al. 2017) to find the steady state abundances of our species for the chosen temperature and density of each component, assuming standard Milky Way values for the cosmic-ray ionization rate and UV field of, respectively, 1.3×10 −17 s −1 and 1 Draine (equivalent to 2.74×10 −3 erg/s/cm 2 ).We convert abundances to molecular column densities by finding the factor that would convert the component with the highest CO abundance to a CO column density of 10 19 cm −2 .We then scale the other components by the same factor so that their column density ratios are equal to their abundance ratios.Therefore, whilst our absolute column density values are arbitrary, their ratios are informed by chemistry.
The matrix V true encodes how the emission of each component varies spatially from the peak value given by RADEX.We model the spatial contribution of a component at every pixel as a 2d Gaussian whose dimensions correspond to the x and y coordinates of the pixel.That is to say, the contribution at a pixel v xy is v xy = A×N (µ, σ) evaluated at the x,y coordinates of the pixel center.In this data-generating process, the mean µ of a component controls its location, the covariance σ its extent and the amplitude A its relative strength.For all components, the amplitude A is set such that v xy = 1 at the peak location of the component.The entries of V true are then the contribution v xy of the N p pixels for the N D components.
The telescope beam-response is approximated by a Gaussian kernel K.After creating noiseless line-intensity maps according to W true V true K, in our final step gaussian noise N is added to line-intensity maps such that every map has a signal-to-noise-ratio of 100 with gaussian noise independently, identically, and uniformly added to all pixels.

EXPERIMENTS
We now present our experiments on applying NMF to synthetic observations (with results plotted using matplotlib (Hunter 2007)).All experiments are carried out using a common blueprint for the characteristics/properties of the line-intensity map.We choose to produce synthetic observations of two environments: protostellar cores and high redshift galaxies.We arbitrarily pick the molecular transitions from the observational study of the nearby galaxy NGC 1068 from Viti et al. (2014).While the choice of this particular sample of lines was arbitrary, we note that it contains some of the most observed molecules in both galactic as well as extragalactic environments.All line-intensity maps are constructed on a 50 × 50 dimensional grid (i.e.50 × 50 pixels), for the transitions observed in Viti et al. (2014), to which Gaussian noise is added such that maps have a mean per-pixel SNR of 100 and to which a convolution with a Gaussian kernel of width 1/10 of the overall width of maps has been applied.Although this blueprint is only one amongst many possible parametrizations, we found the conclusions of our study to be robust to changes to the blueprint.
In the following experiments, before application of NMF, we always apply a series of preprocessing steps to the line intensity maps.These preprocessing steps would likely need to be reproduced when working on real observations.As a first step, in order to satisfy the non-negativity requirement of NMF, all negative-entries in the line-intensity maps are zeroed-out.As a second step, so as for the NMF loss to equally weight all lineintensity maps, we rescale line-intensity maps prior to running NMF through multiplication by a scaling factor such that, after multiplication, all line-intensity maps have an equal average flux of unity.After the NMF algorithm has finished running, this scaling factor is erased from the NMF components through dividing entries in W by the scaling factor associated to their transition.
As described in Section 3, there are two regularisation terms that can be used to control how the algorithm converges.To investigate the sensitivity of our approach to this regularisation, we try 3 different regularisation approaches (regularise W on its own, V on its own, or both W and V simultaneously) and for each of these regularisation approaches we try 21 different values of α.
Whilst testing the sensitivity to regularisation is important, there is a further confounding factor.The result of NMF varies depending on the initialisation of matrices W and V .To capture this variation, we repeat each regularisation set up 300 times with different random matrix intialisations which is achieved by passing init="random" in the scikit-learn NMF implementation.This allows us to be sure that where one regularisation set up performs better than another, it is not due to random fluctuations because we can test whether it is consistent across 300 repeats.
Overall, each experiment is performed with 3 different regularisation approaches at 21 different regularisation strengths repeated 300 times each for a total of 18,900 runs.To analyse this population of NMF runs, we have created a goodness-of-fit metric condensing all information about each individual run into a single number quantifying the level of agreement between recovered and true components.This metric is defined as the mean-absolute error (MAE) between the true components emission profiles W true and those retrieved by the NMF run (W ).However, since components returned by NMF can only be correct up to a scaling factor, we rescale the emission profiles to have an average flux (averaged over all transitions) of unity before calculating the MAE.This definition of MAE means that the line intensities (W ) of components retrieved by a run with a MAE of 0.1 will on average differ from the ground-truth intensities (W true ) by 10% of the average line-intensity of their closest matching ground-truth component.Finally, since any of the three NMF components could correspond to any of the three true components, we choose the permutation that gives the lowest MAE.We show in Appendix 6.1 the pseudo-code summarising the overall procedure for generating results as described in this section.

Protostellar environments
In this section we present insights derived from applying NMF to the first of two sets of mock lineintensity maps X proto , which models emission from blended protostellar environments.The maps in X proto are generated from three components, each meant to approximate a gas component found in star forming regions.Generated components are i) An extended "low density" component (T=20 K and n=10 4 cm −3 ), ii) a "medium density" component (T=10 K and n=10 5 cm −3 ), and iii) a "dense" warmer component (T=100 K and n=10 7 cm −3 ).These three idealized components may, for example, be representing the surround- ing ISM, the envelope, and the more central compact region of a protostellar core.The synthetic-line intensity maps (X proto ) as well as emission profiles and convolved contributions of emitting components are shown in Figure 1.From this Figure, we see some moderate spatial overlap between components as well as some components emitting more strongly than others.Both of these factors make it difficult, at least visually, to recover components from the line-intensity maps motivating the use of specialized retrieval algorithms.
We show in Figure 2 outputs obtained from running a population of NMF runs on X proto .Runs in the population use three NMF components, the same number as the number of chemical components used in the modelling, which means that the algorithm should have enough capacity to fit all components.Figure 2a (on the left) characterises the goodness of fit of NMF across this population of runs.Each panel of the figure represents a binned density plot of runs in which the x-axis coordinates quantify the amplitude of the regularisation term and the y-axis coordinate the goodness of fit of the retrieved components as defined by the MAE metric introduced in Section 5.The binned density plot captures the probability distribution of the goodness-of-fit metric at a given regularisation value (i.e. at a fixed x-axis value).Each panel can be viewed as a succession of vertically displayed histograms capturing the probability distribution in the MAE metric for a regularisa-tion level specified by the axis.Here, the probability distributions capture the variability which arises due to the random initialisation for the initial guesses of matrices W and V .Each panel considers a different type of regularisation: i) only on W (top panel), ii) only on V (central panel), or iii) on both W and V (bottom panel).Figure 2 b and c focus-in on two runs within the population and compare the relative line strengths (W ) and spatial contributions (V ) of these runs to those of the ground-truth components.The locations of these runs within the density plots are shown by markers #1 and #2.These two runs are selected for further study from the population of NMF runs because they represent two extremes (in terms of agreement with W true ) amongst all runs.
We find an excellent agreement between the components from run #1, one of the better fitting runs in the population, and the true components.Although there are minor mismatches between true and recovered components, such as most of the HCN(1-0) originating from the diffuse (top-panel) true component being mistakenly attributed to the middle-panel retrieved component, the recovered components are by and large in excellent agreement with the true gas components.However other runs in the population differ more strongly from the true components.Run #2 is an extreme example of such a run with one of the lowest y-values across the population.This run exhibits a much higher level of disagreement with the true components, with in particular its top-panel component not matching well any of the true components.The disagreement observed stems from the run learning a different decomposition into components than that used to generate the data.In this decomposition, the top panel component captures emission from both the top and bottom panel of the true components.Such a decomposition, while different from the data-generating decomposition, was found to match the lower MAE runs (e.g.run#1) in terms of its quality of fit to the line-intensity maps.The decomposition found in run#2 is thus an equally valid decomposition that cannot be ruled-out from studying the quality of the match between X proto and the factorized approximation W V .
By comparing the distribution of the y-coordinates of runs in the models population to the y-values of runs #1 and #2, we can understand how effective and reliable NMF is at recovering components closely matching with the underlying component.We observe that the vast majority of runs, irrespective of regularisation, have y-values comparable to #1, indicating that most runs retrieve components matching well with the underlying components.While poor-match runs comparable to #2 do exist, these runs are relatively rare.This seems to suggest that, at least for the considered dataset, NMF is an effective, albeit imperfect, algorithm for recovering the underlying components.
We can also study the impact of hyperparameters on NMF results.Regularization appears to have a beneficial effect on the agreement between recovered and ground-truth components with even small amounts of regularization on W being sufficient to steer the factorization away from the higher MAE solutions.Whilst regularization on V does not have as strong of an impact, it also improves the match between true and recovered gas components as can be seen from how the best performance is obtained with high-levels of regularization on both W and V .For now we do not discuss how the choice of the number of NMF components affects results (but such a discussion can be found in Section 5.3).However, it is worth acknowledging that we expect performance to degrade when using a sub-optimal number of components.

High-z galaxy (three-component fits)
Disentangling emission arising from blended components is of course a problem that exacerbates the further away the object is.In this section, we therefore study the performance of the NMF algorithm on a synthetic dataset approximating line-intensity maps from a highredshift galaxy X gal .Unlike in the previous experiment, where components had only moderate spatial overlap, in this scenario all components peak in the same location -the center of the galaxy -but have different spatial extents.This high-overlap between components complicates the retrieval process as there are fewer pixels which can clearly be attributed back to a unique component.This dataset thus constitutes a more challenging test for NMF but an important one as the NMF algorithm will only be practical if it can reliably produce useful retrievals across a range of different scenarios.
The components used in this run are a less dense component (T=10 K and n=10 4 cm −3 ), a medium density moderately extended and warm component (T=30 K and n=10 5 cm −3 ), and a dense compact hotter central component (T=50 K and n=10 7 cm −3 ).We note that, for a spatially unresolved high redshift galaxy, our three gas components are definitely more arbitrary in nature than in our previous example.Nevertheless, we may identify the first component with the average cold molecular gas, typical of a Giant Molecular Cloud, the second component with dense molecular gas heated by either nearby star formation or an AGN, and the third component with compact very dense gas found in the proximity of the nucleus of a galaxy.Once again, we validate the algorithm through plotting i) line-intensity maps and associated components, ii) density plots showing the MAEs for a population of three-component runs, and iii) the retrieved components for two runs whose locations in the scatter plot are shown by markers #1 and #2.These plots are shown in Figure 3 and 4. NMF runs use three NMF components to fit the line-intensity maps which is a number equal to the number of components within observations and as many as was used in the previous experiment.
We can see from these figures that there is a relatively good level of agreement between the true components and those recovered in runs #1 and #2, with runs having at most a MAE of 0.2 (which represents an average of 20% difference in intensity between the true line intensities and those produced by the NMF algorithm).This suggests that for this high-overlap scenario, NMF can approximately recover the underlying components.However, the mismatch between true and recovered components appears larger here (at least in terms of contributions) than in the previous low-overlap scenario (Section 5.1).In particular, the algorithm seems to struggle with decomposing the fully overlapping central component, favouring a decomposition into spatially non-overlapping components, despite the high overlap betweenthe underlying components.This can be seen by how, in both runs, emission arising from more extended components in the central portion of the observations is misattributed to the less extended retrieved component (bottom panel).This in turn then leads to a mismatch between the true and retrieved emission profile for this less extended components as well as the creation of donut-like holes in the spatial profiles of the other more extended components (at the location of misattributed emission).
Overall, these results highlight the presence of degeneracies when doing retrievals of observations containing highly overlapping components.When such high-spatial overlap components exist, the NMF algorithm struggles to recover the proper overlap between components.Furthermore, no choice of regularisation appears capable of lifting these degeneracies.Nevertheless, recovered components still match fairly well with observations.

High-z galaxy (two-component fits)
In real-world applications, the number of components used in the NMF retrieval will need to be estimated from observations.However, because of differences in physical conditions and chemical make-up, different spatial locations will emit in different molecular transitions leading to the existence of a higher number of components.In such cases, to maintain the regime of fewer NMF components than observed line-transitions, NMF retrievals must be run with fewer NMF components than required to fully capture observations.The aim is then not to retrieve the true components but instead to retrieve useful components.We examine here how NMF behaves in this regime.
To do this, we run NMF on our high-z line intensity maps but using only two components instead of the three components actually required for reproducing the lineintensity maps.This provides a test for the behaviour of the NMF algorithm when data is more complex than the available number of components.Figure 5 shows the output of such two-component runs.Here, we do not show density plots since our goodness of fit metric is not applicable when the number of retrieved and ground truth gas components differ.
Comparing these runs to their three-component counterparts (Figure 4) reveals that the two-component runs seem to only differ from the three-component runs in their treatment of the top and middle-panel components.Because the two-component runs are incapable of fully capturing the emission from all three components, they group together the top and middle-panel components and instead return a component whose emission profile is approximately a mixture of the emission from these two components.On the other hand, the lowestpanel component of the two-component runs, characterizing the densest component, is almost identical to the analogous component in the three-component runs.In fact, similarly to the three-component runs, it (mistakenly) captures emission from other gas-phases within the central region leading to a similar mismatch between retrieved and true dense component and a similar (incorrect) donut shape spatial emission in other components.
Conceptually, it appears that because the twocomponent NMF runs did not have enough capacity to reproduce all three components, they selectively grouped the two most similar components -the top and middle panel components -into a single "archetypical component" capturing the average emission across both phases whilst dedicating the remaining component to the densest component because it emitted more uniquely.Such behaviour suggests that, when lacking enough components to fit observations, the NMF algorithm will fall-back on grouping together the most similar components into archetypical components.This means that the algorithm could allow for synthesizing highly-complex observations into a manageable number of components for human interpretation.

High-z galaxy -dependence on the number of components
In the previous section we studied the fit to our high-z environment X gal , retrieved by a two-component NMF model.We found that the two-component model did a good job at reproducing line-intensity maps despite these technically containing three components.In this section we explicitly study the effect of varying the number of components N C used in the NMF algorithm on the quality of fit to line-intensity maps.It is worth noting that whereas previous sections have evaluated the NMF algorithm by assessing the quality of its retrieved components, here we evaluate NMF by assessing how well it reproduces the line-intensity maps (irrespective of whether this emission arises from components resembling the true gas phases).This better matches the reality of fitting an unknown number of components where the only data we have are the line-intensity maps.
To show this, in Figure 6 we plot the mean-squared error (MSE) between the NMF fits to X gal (obtained as W V ) and X gal for different values of N C , the number of components in the NMF (shown on the x-axis).Here, to calculate the MSE, we first normalise each line-intensity map in order to prevent any single line-intensity map from dominating over others in the calculation of the MSE, then flatten the line-intensity maps into a 1D vector and finally calculate M SE = 1 N N i=1 (x i − xi ) 2 .We also show, as a baseline, the MSE between X gal and the noiseless version of X gal which represents some ideal MSE value for a model reproducing the ground-truth gas phases and none of the noise.
Interestingly, we find that a two-component NMF fit, similar to that explored in the previous subsection, does a good job at matching the emission in observations X gal Normalized Flux despite using fewer components than the number of gas phases present in the observations.This is particularly interesting because it may be conceptually similar to analysing an emission map as a set of archetypical components (gas originating from a starburst ring; dense gas surrounding an AGN; etc) as it is commonly done, rather than considering its true physical gas components.In fact, we find that the two-component fit has a lower MSE than the baseline representing the MSE solely due to noise.This is due to the NMF decomposition overfitting the noise in the line-intensity maps.From this figure, we further find that increasing the number of components in the NMF algorithm leads to further overfitting to the noise in observations, as evidenced by the seemingly linear decrease in MSE with increasing dimensionality of N C .
Since the noise level can be estimated from the map itself, this provides a method for estimating the number of components one should use to analyse observations.A similar plot can be produced for a real lineintensity map and the optimal number of components is the number that causes the MSE of the NMF algorithm to reach the true noise level.Crucially, this would be the number of gas components that are sufficiently distinct for the algorithm to separate, and not the true number of physical components.However, this would be useful since, even when analysing archetypical components, there is often a question of how many to include.This approach should be further tested by analysing low resolution maps of objects where high resolution imaging has uncovered distinct components.Figure 6.Mean-squared error (MSE) between the NMF fits to X gal (obtained as W V and denoted as X(NC )) and X gal for varying number of NMF components NC (blue) where MSE is plotted on log-scale.As a baseline we also plot the MSE between X gal and the noiseless version of X gal (green).

Discussion and conclusions
We have run a series of experiments aiming at recovering the underlying components within observations of molecular line-intensity maps.These experiments were run on a set of synthetic datasets designed to replicate conditions expected from real observations while also capturing some diversities of environments where the algorithm could be applicable.Because NMF ignores the spatial locations of pixels and would perform equally well if pixels were scrambled, we did not put a strong emphasises on reproducing the geometrical structure of real molecular maps and instead focused on reproducing the expected overlap between components.To capture the diversity of environments we built two synthetic line-intensity maps on which to test our observations: one with a moderate overlap between components and a second dataset with a high overlap.We validated the effectiveness of NMF on these synthetic environments through studying the algorithm's ability to recover the components used in the data-generating process.Because the convergence of the NMF algorithm was found to be sensitive to hyperparameters, we have characterised the performance across a range of hyperparameters rather than for a single set of hyperparameters.
The results of our investigations were as follows.We found that the NMF algorithm often recover components resembling those used in the data-generating process.We found that when the NMF algorithm lacked the capacity to reproduce all components, it would group together the most similarly emitting components into one archetypical component.However, in general, the matrix-factorization problem solved by NMF did not have a unique solution and so its convergence to a set of components closely resembling the underlying components was not guaranteed.Instead, initialisation and regularisation played a crucial role towards influencing the type of solution the algorithm converged towards, with sometimes very different solutions depending on the hyperparameters.This sensitivity of NMF to model hyperparameters is obviously not ideal as it means that the algorithm can sometimes recover components that match poorly with observations.In general we also found that the NMF algorithm struggled with recovering the proper spatial overlap of overlapping components.
Our experiments provide a proof-of-concept towards the applicability of NMF for analysing line-intensity maps.However, some differences do still exist between real data and the data within our experiments.One such difference is that our mock line-intensity maps shared a common beam size.When working with real data, it will be crucial to down-sample observations so as to guarantee that parcels of emitting gas have the same spatial extent across all line transition maps.Another difference is that because of effects such as optical depth and differences in local physical conditions, it is unlikely that real data will be perfectly characterised as having arisen from a small number of components.Nevertheless, as demonstrated by our experiment in which we used a smaller number of NMF components than the number of components, even when the data-generating process is not perfectly satisfied by observations, NMF still retains the ability to uncover meaningful components in the form of archetypical components.
A more important weakness of this work is that our experiments only considered scenarios in which observations were created from components of similar column densities and thus in which all components contributed more or less equally to the line-intensity maps.If components have dissimilar column densities then it becomes much more challenging for NMF to recover the fainter components.This is because as the lower column density components emit noticeably more weakly, correctly fitting them does not contribute much to the minimisation of the loss function.For synthetic observations containing low-column density components, in order to recover the weakest components it is crucial that the signal-to-noise within observations be high enough for these weakest components to not be drowned by noise.
We also wish to emphasise that our work provides only one, amongst potentially many, approaches for incorporating matrix factorization constraints into the component retrieval process.A potential extension of this work would be to combine NMF with radiative-transfer codes to enforce on retrieved components both the matrix factorization constraints of NMF and the physical realism constraints of radiative-transfer codes.
Regardless of the exact approach taken, the large degeneracies associated with radiative transfer currently make it difficult to constrain the properties of the ISM especially at high-redshift.Combining the information available across all spatial locations during inference, as is done in NMF and other matrix factorization approaches, is, we believe, a promising approach to address such degeneracies.However, whilst NMF is useful for understanding the relationship between lines within observations, on its own it is not sufficient for fully constraining components.

Figure 2
Figure 2. a) Binned density plots characterizing the performance of a population of three-component NMF runs on Xproto in which x-axis coordinates quantify the amplitude of the regularization term and y-axis coordinates the goodness of fit of the retrieved components as defined by the MAE metric introduced in Section 5.Each panel considers a different type of regularization: i) only on W (top panel), ii) only on V (central panel), or iii) on both W and V (bottom panel).b) Emission profiles of the true components alongside those of two runs in the population whose location in the scatter-plot are represented by markers #1 & #2.c) Convolved spatial contributions of the same two runs.

Figure 3 .
Figure 3. Synthetic line-intensity maps of high-redshift galaxy (X gal : bottom) alongside the emission profile (Wtrue : top-right) and convolved (unitless) spatial contribution (Vconv : top-left) of each components contributing to the maps.

Figure 4
Figure 4. a) Binned density plots characterizing the performance of a population of three-component NMF runs on X gal in which x-axis coordinates quantify the amplitude of the regularization term and y-axis coordinates the goodness of fit of the retrieved components as defined by the MAE metric introduced in Section 5.Each panel considers a different type of regularization: i) only on W (top panel), ii) only on V (central panel), or iii) on both W and V (bottom panel).b) Emission profiles of the true components alongside those of two runs in the population whose location in the scatter-plot are represented by markers #1 & #2.c) Convolved spatial contributions of the same two runs.

Figure 5 .
Figure 5. Emission profiles and (unitless) convolved spatial contributions of the true components alongside those of two random runs in a population of two-component runs fitted to X gal .