First results from SMAUG: Insights into star formation conditions from spatially-resolved ISM properties in TNG50

Physical and chemical properties of the interstellar medium (ISM) at sub-galactic ($\sim$kpc) scales play an indispensable role in controlling the ability of gas to form stars. As part of the SMAUG (Simulating Multiscale Astrophysics to Understand Galaxies) project, in this paper, we use the TNG50 cosmological simulation to explore the physical parameter space of 8 resolved ISM properties in star-forming regions to constrain the areas of this hyperspace over which most star-forming environments exist. We deconstruct our simulated galaxies spanning a wide range of mass (M$_\star = 10^{7-11}$ M$_\odot$) and redshift ($0 \leq z \leq 3$) into kpc-sized regions, and statistically analyze the gas/stellar surface densities, gas metallicity, vertical stellar velocity dispersion, epicyclic frequency and dark-matter volumetric density representative of each region in the context of their star formation activity and galactic environment (radial galactocentric location). By examining the star formation rate (SFR) weighted distributions of these properties, we show that stars primarily form in two spatially distinct environmental regimes, which are brought about by an underlying bi-component radial SFR surface density profile in galaxies. We examine how the relative prominence of these two regimes depends on host galaxy mass and cosmic time. We also compare our findings with those from integral field spectroscopy observations and achieve a good overall agreement. Further, using dimensionality reduction, we characterise the aforementioned hyperspace to reveal a high-degree of multicollinearity in relationships amongst ISM properties that drive the distribution of star formation at kpc-scales. Based on this, we show that a reduced 3D representation underpinned by a multi-variate radius relationship is sufficient to capture most of the variance in the original 8D space.


INTRODUCTION
Local characteristics of the ISM drive a complex interplay of processes at the sub-galactic scale which regulate the star formation and feedback in a galaxy (e.g., Leroy et al. 2008Leroy et al. , 2013Shi et al. 2018;Dey et al. 2019;Sun et al. 2020). Gravitational instabilities on kiloparsec (kpc) scales heavily influence the lifecycle and properties of giant molecular bm2900@columbia.edu clouds, thereby setting up star formation and controlling its efficiency in different regions of the galaxy (e.g., Elmegreen 1987Elmegreen , 1991Kim & Ostriker 2001;Kim et al. 2002;Kim & Ostriker 2006;Dobbs 2008;Dobbs et al. 2011;Bournaud et al. 2010;Renaud et al. 2013). Feedback on these scales influences the dynamical state of the gas by limiting its evolution toward high densities and dispersing the clouds (e.g., Hopkins et al. 2012a;Kim et al. 2013;Kim & Ostriker 2015;Semenov et al. 2017Semenov et al. , 2018. Additionally, hydrodynamical interaction between the hot and cold ISM on kpc-scales facilitates the acceleration of galactic outflows (e.g., Hopkins et al. 2012b;Muratov et al. 2015;Anglés-Alcázar et al. 2017;Kim & Ostriker 2018) that subsequently suppress star formation through the depletion of cold gas, and prevention of future gas cooling and accretion (e.g., Bouché et al. 2010;Davé et al. 2012;Lilly et al. 2013;Forbes et al. 2014a;Rodríguez-Puebla et al. 2016). As such, by virtue of controlling the incidence and effects of star formation, physical properties of the interstellar medium on kpc-scales play a vital role in modulating the overall baryon cycle of galaxies (Naab & Ostriker 2017;Somerville & Davé 2015).
Deciphering the link between star formation and galactic structure is a multi-scale problem, ranging from ∼100 pc scale of molecular cloud collapse to the 10 5−6 pc scale of the circumgalactic medium. Due to the steep computational challenge of simulating the vast range of scales involved, modern simulations of galaxy formation must implement only approximate sub-grid treatments of small-scale physical processes, smoothing over much of the complexity at or below cloud-scales (Naab & Ostriker 2017). Consequently, while results from contemporary large-scale cosmological simulations have been shown to match the integrated stellar mass abundances and star formation rates (SFRs) of observed galaxies (see Somerville & Davé 2015, for a review), they invoke simplified treatments of the underlying, often less-understood, small-scale processes such as star formation and ISM physics, making the treatment less realistic. Understanding the conditions that govern the onset of star formation, and their interrelationships, is therefore, of fundamental importance to the modern theory of galaxy formation and evolution (Morselli et al. 2019;Trayford & Schaye 2019;Orr et al. 2018;Chruslinska & Nelemans 2019). The conditions in which stars form is a crucial ingredient that must be firmly constrained in order to improve our models of star formation and feedback. Beyond star formation, characterising the physical properties of galaxies in a spatially-resolved manner also has ramifications for our understanding of how stellar populations vary in their compositions across galactic scales as well as the kinematic evolution and dynamical state of galaxies.
It has long been known that properties of the birth sites of stars can differ substantially between regions not only within a galaxy, but also between galaxies of different global phenotypes (masses, morphologies, etc.). Across large parts of a galaxy, star-forming gas can show a range of densities and metallicities, often correlated with the environment and changing with galactocentric radial location (e.g., Pagel & Edmunds 1981;Koda et al. 2009;Heyer & Dame 2015). Past observations capable of directly probing cloud-scale quantities like velocity dispersions and surface densities for cold gas -such as those with the Atacama Large Millimeter/submillimeter Array (ALMA) and the Northern Extended Millimeter Array (NOEMA) -have indeed revealed sizeable deviations amongst the properties of star forming molecular clouds in different sites, namely, the Galactic centre (Shetty et al. 2012;Battersby et al. 2020), outer parts of our Galaxy (Rice et al. 2016;Miville-Deschênes et al. 2016) and local star-forming galaxies (Schinnerer et al. 2013;Sun et al. 2018). Along the same lines, merging and starburst galaxies have been shown to boast higher densities, line-widths and diluted metallicities (Cortijo-Ferrero et al. 2017;Elmegreen et al. 2016;Irwin 1994) than most other environments. Nevertheless, results in this area have been limited to a handful of physical parameters, and to small samples of galaxies lacking diversity in their global properties Bolatto et al. 2008;Wong et al. 2011;Druard et al. 2014;Faesi et al. 2018).
In the last decade, our understanding of both gas and stellar properties on resolved (∼kpc) scales has greatly benefitted from the advent of integral field spectroscopy. Owing to the use of wide-field multiplexed integral field units (IFUs) in large galaxy surveys -namely, the Calar Alto Legacy Integral Field Area Survey (CALIFA; Sánchez et al. 2012), the Sydney-AAO Multi-object Integral field spectrograph Galaxy Survey (SAMI; Croom et al. 2012), and Mapping Nearby Galaxies at Apache Point Observatory survey (MaNGA; Bundy et al. 2015) -we now have simultaneous photometric+spectroscopic information across relatively large radial extents of low-redshift galaxies for a statistically significant sample with varied structural and environmental properties. Recent works from these surveys have explored local ionization states of galactic regions and the different processes responsible for them, the presence of resolved scaling relations down to kpc-scales, their comparison with global counterparts, and the role of star-formation and dynamical processes in giving rise to these relations (see Sánchez 2020;Sanchez et al. 2020, and references therein). Furthermore, the combination of IFU surveys with millimeter-wave interferometry, e.g., EDGE-CALIFA 1 (Bolatto et al. 2017) and ALMaQUEST 2 (Lin et al. 2019) has now poised us to better understand the kinematical properties of molecular gas and its role in the fueling and quenching of star formation on resolved scales.
On the theoretical front, to investigate star formation and its effects at a higher level of spatial detail, intermediate-scale simulations capable of resolving supernova feedback (for e.g., the TIGRESS framework by Kim & Ostriker 2017; see also Gatto et al. 2017;Kannan et al. 2020) are now helping to close the gap between stellar ( pc) and cosmological scales ( Mpc). In general, these are vertically stratified "tall-box" simulations representative of specific local star-forming regions within a galaxy with domain sizes of ∼kpc and ∼pcscale resolution. In such simulations, the adopted models allow for a more comprehensive and self-consistent evolution of a self-gravitating multiphase ISM with an explicit treatment of star formation and supernova feedback and very few a-priori assumptions. The ISM content and disc gravity within such a framework are parameterized by a set of physical properties (e.g., gas/stellar surface density, dark matter density, gas metallicity, stellar scale height/vertical velocity dispersion etc.) which are representative of the patch being simulated, and play a central role in governing the process of star formation. Hence, performing a systematic exploration of these parameters in these simulations comprises a vital effort towards achieving a thorough insight into how diverse galactic environments influence the formation of stars, and consequently stellar feedback and outflow properties. Unfortunately though, given the high dimensionality of this parameter space, and the immense computational cost involved in conducting a sweep through it, this problem does not lend itself well to a brute-force approach and requires a better sampling scheme to pick out the most essential initial conditions.
As part of the SMAUG consortium 3 , we undertake in this paper the task of generating and statistically surveying the multi-dimensional parameter space of the aforementioned local physical properties of star-forming sites in a largevolume cosmological simulation. Specifically, we use the IllustrisTNG50 simulation to do a coarse-grained (∼kpcscale) exploration of the gas surface density, gas metallicity, stellar surface density, stellar vertical velocity dispersion, epicyclic frequency, and dark-matter volumetric density in a statistically significant sample of galaxies across a wide range of galaxy mass as well as redshift. While these properties may not potentially be exhaustive in describing the conditions of star formation -either because other properties have a less obvious connection to star formation or because the timescales for their evolution are short compared to the star formation time -we have aimed to examine the most common quantities upon which contemporary first principle based models are built (Table 1 summarises the list). We study these physical parameters in the context of ongoing star formation (via star formation surface density) and the galactic location/environment (radial galacto-centric distance) of each site. In doing this study, our goal is to uncover the region(s) of the parameter space over which most star-forming environments exist, or in other words, identify the birth-conditions in which most stars in the universe form. Alongside this, our work will enable the recognition of meaningful initial conditions for future local/tall-box simulations, and help devise an optimal strategy for their exploration. Doing so would enable us to understand the link between star formation and outflows directly, which is a key element of the SMAUG project's larger goal to develop and implement advanced sub-grid recipes, such as for star-formation, winds and black hole feeding, that are built on the results of simulations that explicitly resolve small-scale physics, thereby eliminating the necessity of tuning to observations (see Smith et al. 2020;Kim et al. 2020;Angles-Alcazar et al. 2020;Li et al. 2020a,b;Fielding et al. 2020;Pandya et al. 2020, for other work in SMAUG). Lastly, the results of our work will also be a useful research tool for pursuing in detail future lines of inquiry including but not limited to investigating the universality of resolved scaling relations in galaxies, and the 3 https://www.simonsfoundation.org/flatiron/ center-for-computational-astrophysics/galaxy-formation/smaug intricate link between local star-formation histories and the scaling laws that control star formation within galaxies. This paper is organised as follows: Section 2 provides a brief description of several features of the IllustrisTNG simulations that are most pertinent to this study. Section 3 describes our detailed methodology for generating the ISM physical parameter space. In Section 4, we present a picture of the parameter space in one dimension, focusing mainly on the distributions of all properties, as well as their dependence on galaxy masses and cosmic time. Section 5 portrays a multi-dimensional view of all of the parameters, where we briefly explore resolved scaling relationships amongst properties, and later, describe the results of dimensionality reduction conducted on the hyper-parameter-space. In Section 6, we make a qualitative comparison of our simulation results with resolved observations from the MaNGA IFU survey and report our findings. Finally, we summarise the conclusions of our work in Section 7.

THE TNG SIMULATIONS
The IllustrisTNG project is a suite of gravomagnetohydrodynamic simulations (Marinacci et al. 2018;Nelson et al. 2018;Springel et al. 2018;Naiman et al. 2018;Pillepich et al. 2018a;Nelson et al. 2019a) consisting of three separate cosmological volumes (TNG300, TNG100, and TNG50) run using the moving-mesh code AREPO (Springel et al. 2001) at distinct mass resolutions. The physical model employed is described in Pillepich et al. 2018b , and includes subgrid treatments of star formation (Springel & Hernquist 2003), metal enrichment from stellar evolution (Naiman et al. 2018), ideal magneto-hydrodynamics (Pakmor & Springel 2013), SN-winds and AGN feedback (Weinberger et al. 2017), and has been shown to yield results that agree with observations over a diverse range of galaxy properties. For this work, we primarily use the highest resolution realisation of the smallest volume box in the suite, i.e. TNG50-1. Here, we present a brief summary of the key features of TNG50-1 (hereafter identified as TNG50; Pillepich et al. 2019;Nelson et al. 2019b), and refer the interested reader to the papers mentioned in this section for further details on the TNG suite.
TNG50 has a uniformly-sampled domain volume of roughly 50 3 Mpc 3 with 2 × 2160 3 initial resolution elements, and mass resolution of 8.5 × 10 4 and 4.5 × 10 5 solar masses for baryons and dark matter respectively. The co-moving gravitational softening lengths for dark matter and collisionless star particles is 290 pc, while the gas gravitational softening length is adaptive and set by its cell size, with a floor at 74 pc. Both of these values are considerably smaller than the analysis scale we are interested in (i.e. ∼kpc), and hence ensure ample resolving power. At these values, TNG50 provides an exceptional combination of volume alongside resolution that allows us to meaningfully investigate spatiallyresolved star formation in this study.

Star Formation in TNG50
The process of star formation from dense gas in all TNG simulations is governed by an updated Springel & Hernquist (2003) (hereafter SH03) sub-grid model of star formation, which uses a specified density threshold as a criterion for star formation to set in. Gas less dense than the threshold value n th = 0.13 cm −3 (in physical units) is considered non starforming, and its behavior is driven purely by hydrodynamics (in addition to gravity) based on an ideal-gas equation of state. Whereas, above this density value, the model treats the inter-stellar medium as an admixture of two phases of gas i) a cold, dense star-forming cloud phase, and ii) a hot, ionised phase. Of these, the star-forming phase stochastically turns into stars on a timescale set by the local cold gas density. This conversion of gas into stars is then regulated by the heating of the ISM due to supernova feedback (assumed to be instantaneous in this case) resulting from the death of a fraction of the formed stars. Finally, in this self-regulated regime, the model describes the bulk properties of this multiphase highdensity gas, such as its pressure and temperature, in terms of an effective equation of state as a function of density.
The key way in which the model incorporated in the TNG simulations differs from the original SH03 model is in its use of a different initial mass function (Chabrier 2003instead of Salpeter 1955) as well as a softer equation of state. Additionally, on account of the numerical challenge of forming stars due to the very high resolution and consequently extremely short MHD timesteps in the TNG50 simulation, the model implemented therein employs a steeper relationship between the equilibrium star formation timescale and gas density (t ∝ n −1 ) for the densest gas in the simulation ( 230 n th ) as opposed to the canonical scaling (∝ n −1/2 ) set by the observed Kennicutt-Schmidt relation (Schmidt 1959;Kennicutt 1998) used in other TNG boxes. This change was made for numerical efficiency reasons alone, and was found to have no significant impact on the overall gas content or galaxy properties in the simulation (for more details, see Nelson et al. 2019b).
Utilizing the TNG simulations, a number of studies seeking to understand star formation and its implications have reported promising outcomes. E.g., Tacchella et al. (2019) investigate the connection between galaxy morphology and star formation, finding that the morphology of galaxies exhibits only a weak correlation with their star formation activity, which matches the observed correlation. Additionally, Donnari et al. (2019) have characterised the star formation activity of simulated galaxies, and shown that the slope and normalisation of the star-forming main sequence in TNG are in excellent agreement with observations at z = 0. Using an improved framework to estimate neutral gas abundances, Diemer et al. (2019) demonstrated a good agreement between the simulated and observed galaxy HI size-mass relationship as well as the overall gas fractions. And most recently, Nelson et al. (2019b) have described how the subgrid input parameters in TNG successfully give rise to a realistic multi-phase structure and diverse properties of feedbackdriven galactic outflows. These studies, combined with the multitude of results on other aspects of galaxy formation, provide a solid empirical validation to the TNG model.

METHODS AND ANALYSIS 3.1. Galaxy Sample Selection
The large volume of TNG50 provides a statistically sizeable sample of galaxies at a resolution capable of discerning the internal structure of galaxies. Haloes and their substructure are identified in the simulation using the standard Friends of Friends (FoF; Turner & Gott 1976) and SUBFIND (Springel et al. 2001;Dolag et al. 2009) algorithms. The FoF algorithm identifies collective groups of dark matter particles (aka halos) based on their physical proximity, whereafter the SUBFIND algorithm identifies gravitationally self-bound associations of all resolution elements combined within each halo (aka subhalos). The gas, stars, black holes and dark matter associated with the most massive subhalo within a FoF halo are considered as belonging to the central galaxy, while the rest of the self-gravitating substructures, when present, are classified as satellite galaxies. At the present epoch, the simulation contains a total of ∼96,000 galaxies with M 10 5 M with a variety of morphologies, sizes and formation histories (e.g., Pillepich et al. 2019;Joshi et al. 2020;Pulsoni et al. 2020). For our specific analysis, we construct a sample consisting of both centrals and satellites as follows: 1. We select galaxies with stellar masses M in the range 10 7−11 M at redshifts z = {0, 0.5, 1, 2, 3}. The lower limit of 10 7 M is chosen to ensure measurement robustness in view of the mass resolution of the simulation. In this mass range, we find a range of morphologies, from the extended, actively star-forming discs to quenched (non star-forming) elliptical systems, similar to those observed in the real universe.
2. For the selected galaxies, we implement a minimum cut on the number of particles contained within the galaxy to be at 100 each for the gas, stars and dark matter, such that all three components of the galaxies are sufficiently resolved. At the standard TNG50 resolution, this corresponds to a minimum M 10 6.9 M and M dm 10 7.7 M . We also impose a minimum total star formation rate threshold of 5 ×10 −4 M yr −1 within a spherical volume of twice the 3D stellar half-mass radius (R 1/2 ) centered at the centre of mass of each galaxy. Even though the latter measure leads to an exclusion of galaxies that are fully or almost-fully quenched, it does not adversely impact this study given that such galaxies are not expected to contribute much in terms of actively star forming regions.
3. Lastly, due to our inclusion of satellite galaxies in the sample, we take precaution to remove any misidentified subhalos such as clumps or fragments in the outer parts of a halo that did not arise from standard cosmological processes of structure-formation and collapse, using the SubhaloFlag in the simulation (see Nelson et al. 2019a for more details). Galactocentric radius, R kpc NOTE-All spatial quantities here are expressed in physical units (not co-moving), and property roles are defined in the context of their contribution to the tall-box simulation physics.
Finally, our sample contains 10394, 13806, 16663, 21039 and 20630 galaxies respectively at z = {0, 0.5, 1, 2, 3}. In totality, our selected sample at each of the redshifts encompasses >80% of the total instantaneous star formation occurring in the simulation volume. For most of the analysis in this paper, we will focus on z = 0 galaxies with stellar mass in the range 10 9−10 M , but also explore the variation in resulting trends with galaxy stellar mass and redshift.

Galaxy Data Processing
Since our goal is to characterise the birth environments of stars in the context of their corresponding ISM properties, we set up a multi-dimensional physical parameter space by conducting a coarse-grained measurement of these properties from individual star-forming regions within our galaxy sample. In a nutshell, we divide each galaxy into spatiallyresolved regions by projecting it onto a 2-dimensional image grid (in the manner of Diemer et al. 2018), where each pixel represents an ISM patch, and subsequently extract the physical parameters of interest as column-integrated quantities from the image pixels so produced. We further elaborate on the principal facets of our analysis below.
Particle Smoothing: Due to the discretised nature of the simulation volume and a finite mass resolution, each cell/particle in it represents an unresolved entity that should ideally be spatially distributed. For investigating spatially resolved properties, as we do in this paper, it is thus important that we alleviate the effects of this coarse sampling so as to avoid biasing our quantitative analysis or making it dependent on the choice of the analysis scale adopted.
In this work, we utilise a smoothing scheme where we re-sample each star, gas and dark matter particle such that it gives rise to a finitely extended distribution of sub-particles. The smoothing length σ, which governs the spatial extent within which the sub-particles originating from a parent particle are distributed, is determined adaptively for each parti-cle based on the local density of the corresponding particle type. This translates into the radius encompassing a fixed number of nearest neighbours N ngb , which we choose to be 32 and 64 for stars and dark matter respectively. For gas, we use one-third of the distance to the 32nd nearest neighbour. We note that our choice of N ngb here is determined based on a visual conciliation between noticeable pixelation effect and fading of resolved structure in galaxies. Using this smoothing length as the standard deviation, we then convolve each particle with a discrete 2D Gaussian kernel of size 6σ on a side and resolution 1 kpc centred on its position, giving us a collection of sub-particles that inherit the physical properties of their parent particle in a manner that conserves the extensive properties (mass, energy, momentum) of the parent particle. From here on, we use these re-sampled sub-particles in lieu of the original particles for further analysis.
Galaxy rotation and coarse-graining: For each galaxy, we first compile the list of both the parent (original) gas/star/dark-matter and corresponding sub-particles comprised within it. We then calculate the angular momentum of the galaxy in its centre of mass rest frame, taken here to be that of all the star-forming parent gas particles within 2×R 1/2 , and use it to perform a rotation on the galaxy such that the cartesian z-axis is aligned with the direction of the calculated angular momentum vector. This operation transforms the galaxies with(without) rotational symmetry to a face-on(random) orientation, which is then spatially binned using a grid of predetermined size and resolution along the x-y plane to create an image representation for each of the desired physical quantities. In this study, we choose a fixed pixel scale (grid resolution) of 1 kpc, which closely emulates the domain size of local ISM simulations, as well as the sampling scale of modern IFU surveys. The overall size of the square image is given by L image ≈ 2 × max(2R 1/2 , R 95,sfr ), where R 95,sfr is the radius within which 95% of all starforming parent gas particles of the galaxy are enclosed. This kpc on a side, which is twice the radius encompassing 95% of all star-forming gas particles in this galaxy. The white circles denote twice the stellar half-mass radius at ≈8.5 kpc.
criterion allows us to include most of the star formation in each galaxy while also accounting for a significant fraction of its visible stellar component. A discussion on convergence with different values of pixel scale and simulation resolution is presented in Appendix A. Since we do not impose any restrictions on max(2R 1/2 , R 95,sfr ) to assume an integer or half-integer value, we do not expect the center of the image grid to coincide with the coordinate center of the galaxy.

Generation of Property Maps
To obtain the desired physical property maps for a galaxy, we utilise the correspondingly binned gas, star and dark matter sub-particles gravitationally bound to the subhalo (as determined by SUBFIND) that lie within a vertical column of height z = ±20 kpc (unless otherwise noted) relative to the projection plane. This value is arbitrarily chosen, and was selected to minimise the contamination from the hot gaseous corona as well as from halo stars, while preserving the contribution from the diffuse stellar component associated with disc galaxies as well as accommodating galaxies that do not have a well-defined rotation axis. Below, we provide the exact definitions used to calculate local ISM properties of individual pixels (also see Table 1 for a summary list): Gas surface density Σ g . Sum of the masses of all gas (sub-)particles contained within the column divided by the area of the pixel. For non star-forming gas, we only include the fraction of mass present as neutral hydrogen (although this generally includes HI + H 2 , the simulation does not differentiate between the two). For gas cells below the density threshold, the simulation computes this fraction using the atomic network of (Katz et al. 1996) and a density-based self shielding prescription (Rahmati et al. 2013) in the presence of a time-dependent uniform ultraviolet background (Faucher-Giguère et al. 2009).
Stellar surface density Σ . Sum of the masses of all star particles present within the column divided by the pixel area.
Gas metallicity Z g . Mass-weighted mean metallicity of the same gas as used for the calculation of Σ g .
Star formation surface density Σ sfr . Sum of the star formation rate of all gas particles contained within the column divided by the area of the pixel.
Stellar vertical velocity dispersion σ . Mass-weighted standard deviation of the z-velocity of all the star particles present within the column.
Galacto-centric radius R. Measured based on the number of star-forming gas sub-particles (N sfg ) inside the pixel. For pixels with N sfg ≤ 1 , R is assigned to be the Euclidean distance between the galaxy centre and the centre of the pixel, whereas when N sfg > 1, R is the mean euclidean distance between the galaxy centre and the SFR-weighted mean 2D Figure 2. Independently normalised 1D probability distributions for the measured physical properties of all star-forming pixels from the sample of ≈1900 galaxies with M = 10 9−10 M at z = 0 in TNG50. Grey curves depict the unweighted distributions for each property using only the pixels/regions with finite amount of star formation in them while black curves depict the corresponding Σ sfr -weighted distributions. For completeness, we also show in blue the unweighted property distributions obtained from the full pixel dataset inclusive of non-star-forming pixels.
position coordinates of the gas sub-particles in the pixel.
Epicyclic frequency κ. Calculated using the following expression (simplified from Eq. 3.83-84 in Binney & Tremaine 2008) involving the galactocentric radius of the pixel R and the circular velocity at that location v c (R): Here, v c (R) is due to the total mass enclosed within a spherical volume of radius R, and defined as GM (R)/R.
Dark-matter volumetric density ρ dm . Sum of the masses of all dark-matter sub-particles within a column of height z = ±h ,z divided by the volume of the column. Here, h ,z denotes the stellar half-mass height associated with the pixel.
As an example, we show in Figure 1 images of the aforementioned physical properties for a galaxy with high gasmass fraction. After generating such images for our entire sample of galaxies, we record the values for all pixels obeying Σ sfr > 0 (hereafter dubbed 'star-forming regions') to obtain the final 8D parameter-space. We note that due to the presence of out-of-equilibrium and merging galaxies in the simulation volume, not all star-forming pixels in our sample are inherited from dynamically stable discs. Nevertheless, we do not exclude such pixels from our analysis as we are interested in exploring all types of star-forming environments in this study. We also, once again, remind the reader that there are potentially additional local properties that may be important in describing star formation, but in choosing the aforementioned quantities, we have attempted to identify those that are commonly discussed in the context of star formation on kpc-scales. It is indeed expected that not all of these quantities would be mutually independent and encode unique information, and we address this aspect in a later section of this paper (see §5.1).

PHYSICAL PROPERTIES OF STAR-FORMING REGIONS
One of the primary goals of our study is to understand the dominant regime of star formation in the universe by way of characterising the underlying physical properties of the ISM. A starting point for doing so would be to summarise the statistical characteristics of the properties themselves measured from our large sample of galaxies. Thereafter, one can glean insight into the physical processes driving the shapes of the distribution functions, as well as parse the degeneracies between them, by exploring how the distributions evolve as galaxies evolve in the redshift and stellar mass space.
Thus, in this section, we begin by presenting probability distributions for the measured physical properties weighted by star formation and highlight their salient features in §4.1. We then explore how these distributions change as a function of stellar mass and redshift of the parent system in §4.2 and §4.3. We also present composite distributions of ISM conditions that have given rise to stars throughout cosmic time in §4.4. Lastly, in §4.5, we investigate the underlying origin of the bimodality seen in the distribution functions presented in §4.1.

Distribution Functions for Low-redshift Galaxies
In order to discern which regions of the ISM physical parameter space support most of the overall amount of star formation, it is instructive to look at the distributions of these parameters weighted by the star formation rate of the corresponding pixels. Due to the fixed physical size of all pixels, this is equivalent to weighting by Σ sfr . In Figure 2, we show independently-normalised one-dimensional distributions of the properties of all star-forming regions belonging to our fiducial sample (which are galaxies with M = 10 9−10 M at z = 0). The grey curve in each panel depicts the unweighted distribution, and in black we show the same distributions weighted by Σ sfr . We observe that the weighted radius distribution prefers lower values while all other distributions are shifted towards higher values compared to their corresponding unweighted counterparts. This trend is reflective of the intuitive notion that the denser, inner regions of galaxies are more conducive to the formation of stars on account of the gas being dense enough to cool and collapse. More notably, we find that unlike the unweighted distributions, many of the weighted distributions -all except gas surface density and stellar vertical velocity dispersion -exhibit a strong bimodality. This suggests that star formation in our sample of galaxies is neither agnostic to the properties of the ISM nor does it favour a specific range of values, but instead preferentially occurs in two distinct environmental regimes. Later in this section, we investigate the origin of this feature from radial star formation surface density profiles of the overall population (see §4.5).

Dependence on Galaxy Stellar Mass
In view of the large dynamic range of galaxy masses present in the simulation, we now look at how properties of the ISM in star-forming regions differ between galaxies with different stellar masses. Figure 3 shows the Σ sfr -weighted distributions of ISM properties of regions drawn from present day (z = 0) galaxies in four equal-sized bins of galaxy stellar mass. In each panel, the curves become darker with increasing stellar mass from 10 7 to 10 11 M . We find three broad features to be apparent.
First, we notice that the distributions of gas surface density, stellar surface density, and SFR surface density all exhibit a subtle shift towards higher densities and SFR values for higher mass galaxies. While this trend is mostly manifest in the tails (particularly, on the high-Σ g/ /sfr end), the peak values of the distributions do not significantly vary amongst galaxies of different masses. Given that we are solely looking at actively star-forming regions of the galaxies, the concurrence in the behaviours of Σ g and Σ sfr is consistent with, and ensues from the fact that the rate of star formation in galaxies is fundamentally governed by the density of gas on subgalactic scales. In line with this, the two are directly related by construction in the star formation model of IllustrisTNG ( §2.1). Additionally, the lack of strong variation in the resolved star formation rate (alongside gas and stellar density) distributions with galaxy mass confirms that star formation, being an inherently small-scale process, is rather impervious to the overall gravitational potential of the galaxy, but is instead strongly influenced by the local gravity set by Σ g and Σ Second, the mean stellar vertical velocity dispersion of star-forming regions monotonically increases as a function of galaxy mass, with the distributions themselves progressively broadening. This dependence can be ascribed to the fact that galaxies with a more massive stellar component require larger dispersions to maintain vertical dynamical equi- librium against the deepening of their gravitational potential wells. Our finding in this case is also corroborated by the results by Pillepich et al. 2019, where they show that the median 3D velocity dispersion and scale height of stellar discs of star-forming TNG50 galaxies indeed increase as a function of mass regardless of the sample redshift. Similarly, the peak gas metallicity also shows an increasing trend with stellar mass (Tremonti et al. 2004), albeit with a gradual translation of the distribution as a whole to higher values. Starforming gas in higher mass galaxies is expected to be on average more enriched than in lower-mass galaxies owing to a longer corresponding history of star formation and deeper potential, and consequentially, greater metal production and retention. In contrast, lower mass galaxies have a higher gas fraction relative to their stellar material, thus making the metal content more dilute compared to their high-mass counterparts. Interestingly, the variation in the shapes of Z g distributions closely follows as those of the corresponding Σ distributions, indicating that the increase in metallicity is systematically linked to the bias towards higher stellar densities in more massive galaxies, hence explaining the presence of a local mass-metallicity relationship (cf. §5.1). Finally, from the quantities exhibiting bimodally-shaped distributions, we find that at a given redshift (here, z = 0), star formation almost exclusively takes place in the high-DM density innermost regions of low stellar mass galaxies, while in the higher mass galaxies, a gradual suppression of this concentrated star formation paves way for relatively more diffuse star formation in lower density regions. These two regimes are roughly equally populated for galaxies with M 10 9 M , above and below which star formation is prevalent in separate sets of parameters. This trend appears due in part to galaxies being more extended at larger masses, hence availing more area for star formation to happen at large radii. This size increase effect is reflected in the translation occurring in the peak positions of the R distribution towards larger values for larger masses. However, other factors could also potentially be at play, namely, an increasing prevalence of central AGN-feedback in more massive galaxies (Bongiorno et al. 2016;Kauffmann et al. 2003;Wang & Kauffmann 2008) as well as mass-dependent secular transformation processes leading to a decline in the central gas supply that give rise to a quiescent dense centre surrounded by a more extended gas-rich annulus where star formation mainly occurs (Forbes et al. 2014b;Kormendy & Bender 2012).

Evolution with Redshift
Having looked at how the local property distributions transform with host galaxy stellar mass, we now explore how these properties vary in similar-mass galaxies as a function of cosmic time. Figure 4 shows independently-normalised property distributions for star-forming patches from galaxies at five different epochs from z = 3 to present with the host mass fixed in the range 10 10−11 M at each epoch. The curves get darker towards lower values of redshift. The panels show that the bimodally distributed quantities favour lower density regions at later times compared to regions within similarly massive hosts at higher redshifts, albeit maintaining a similar overall range of values. At fixed mass, galaxies at higher redshifts are more compact potentially giving rise to the aforementioned trend.
Another notable feature appears in the case of stellar velocity dispersion, where a discernible overall shift occurs in the distributions from higher to lower values, while their width remains mostly unaffected. As expected, this trend can be ascribed to the fact that galaxies at low redshifts, especially star-forming, have a higher degree of rotational support and relatively thinner discs. Figure 5. Stellar-mass-weighted distributions for the ISM birth conditions in TNG50 for all stars formed during 10 ≥ z ≥ 0 in galaxies with stellar mass in the range 10 7−11 M (black), compared with the corresponding Σ sfr -weighted distributions of star-forming regions properties in galaxies with M = 10 10−11 M at z = 1 (pink; similar to that in Figure 3). All curves are independently-normalised.
Lastly, we see that the shape of the metallicity distribution mildly changes from being bimodal for galaxies at high redshifts to unimodal at low redshifts. As evident from the corresponding Σ distributions, this pattern can be ascribed to the removal of high-density, high-metallicity gas from the central regions of galaxies. However, the range of local metallicity values for our galaxy sample does not substantially evolve with redshift. Considering that the total gas fraction of galaxies appreciably varies with redshift at fixed stellar mass (Santini et al. 2014), the constancy in local metallicity distributions demonstrates that the chemical evolution in galaxies is predominantly driven through outflows and less so via gas inflow and stellar evolution (Torrey et al. 2019). The redshift dependence of the integrated mass-metallicity relationship (MZR) must thus result from galaxy populations losing gas while sampling from an underlying unevolving distribution of local gas metallicity.

Star formation across Cosmic Time
In the preceding subsections, we looked at the full expanse of conditions under which star formation occurs in the universe at fixed time, and explored how these conditions depend on galaxy mass and epoch. It is then natural to ask: what are the distributions of stellar birth conditions that have collectively given rise to all the stars that have ever been formed? To answer this, we now look at resolved ISM property distributions of star-forming regions across a very wide window of cosmic time. In Figure 5, black curves indicate independently-normalised stellar-mass-weighted distributions obtained from a composite dataset of pixels from galaxies with M = 10 7−11 M at 18 different snapshotswith a roughly uniform spacing of 0.1 in scale factor -between z = 0 and 10. Each set of property values (corre-sponding to a pixel) is weighted by the mass of newly-formed stars contributed by the associated star-forming region. Assuming that the distribution of star-forming region properties varies weakly enough with time, we calculate the stellarmass contribution for each pixel to be its instantaneous star formation rate (same as Σ sfr due to unit pixel size) times the inter-snapshot duration ∆t sf . More precisely, for snapshot i, ∆t i sf = t i−1 − t i+1 /2, where t i is the lookback time associated with snapshot i. For the last snapshot (corresponding to z = 0), it is t i−1 − t i /2.
The property distributions have an overall strong resemblance to the distributions we saw in §4.3 for galaxies with M = 10 10−11 M at z = 1 (also shown in Figure 5 in pink). In all of the panels barring σ , the similarity is apparent both in the locations of the peaks as well as their amplitudes signifying a connection between local ISM conditions contributing new stellar mass in the universe to the conditions sustaining star-formation in massive galaxies at z = 1. Previously published work on the global star formation histories of galaxies and the evolution of star formation efficiencies have shown that galaxies with halo masses in the range 10 11.5 −10 12.2 M have the highest star formation efficiency at every epoch, and are responsible for making most of the stars in the universe (Behroozi et al. 2013a). From the stellar mass-halo mass relationship (Moster et al. 2010), this is roughly equivalent to the stellar mass range 10 10 − 10 11 M , in keeping with our result. Moreover, it has been estimated from both observations and simulations that galaxies in this mass range build up ∼ 80-90 % of their stellar mass at z 2 (mostly in their discs; Tacchella et al. 2019;Behroozi et al. 2013b) and have a mass-weighted mean stellar age of ∼7 Gyrs, corresponding to z 1 (Behroozi et al. 2013b). This again, is well-reflected in our current findings. Nonetheless, due to the influence of stellar mass formed along the entire cosmic star formation history -which includes stars formed at earlier times and in galaxies with lower stellar masses -the overall distributions here are somewhat broader than the ones we saw in the preceding sections for fixed mass and time.

The Origin of Bimodality
To obtain some insight into the source of bimodality in several of the ISM properties we have seen thus far, we now examine the spatial distribution of star formation. Specifically, since the bimodality in the distributions arises from weighting by Σ sfr , we look into how the mutual variation between star formation and ISM properties can generate bimodality in the respective distributions of those properties. In the preceding sections, we saw a concomitant evolution of bimodalities in multiple quantities, hinting at a common underlying reason for their appearance. Based on this, we choose only one of the parameters -the galactocentric radius R -in this section for illustrative purposes, and expect our inferences to apply equivalently to the other bimodally-distributed quantities, namely, Σ , Z g , Σ sfr , κ, and ρ dm .
In Figure 6 we show the sample-averaged Σ sfr (R) profile constructed using the measured values of R and Σ sfr of all star-forming pixels from our fiducial sample of galaxies 4 . The profile appears to be comprised of two disparate components, which we demonstrate below to be responsible for the two distinct peaks in the distributions of quantities. To ascertain the exact shape of the profile, we fit it with a twocomponent analytical model consisting of an inner plus an outer exponential component as: where Σ E1 (Σ E2 ) are the normalisations, and R E1 (R E2 ) are the scale lengths associated with the inner(outer) exponentials. To calculate the fit parameters, we utilise MATLAB's 'trust-region' algorithm (a kind of non-linear least squares formulation) with bisquare weights. The data points used for fitting are obtained by binning the sample values at a uniform interval of 0.5 kpc, while our choice of the model itself is motivated by commonly used profiles in the literature to describe azimuthally-averaged radial distributions of HI gas and star formation rates in galaxies. We exclude from our fitting procedure any data points with relative standard error of mean values exceeding 10%. Figure 6 top panel depicts the fitting procedure results for the fiducial galaxy sample. The overall best fit profile is represented as a solid crimson line with the individual components plotted as yellow (inner exponential) and purple (outer exponential) curves.
As our next step, we seek to combine the two-component fit we obtained with the unweighted distribution (or the num-4 It is important to clarify here that due to the exclusion of pixels with no finite amount of star formation, the profile shown in Figure 6 does not represent a true radially-averaged density profile, in a way that would be constructed using a stacked galaxy sample. Here, unlike the total pixel population in a galaxy, the number of star-forming-only pixels per radial bin does not scale with the bin radius in a linear fashion. Figure 6. Top: A two-component fit to the average radial star formation rate surface density profile (black markers) generated using the sample of pixels with Σ sfr > 0 belonging to TNG50 galaxies in the mass range 10 9−10 M at z = 0. The best fit curves are shown in yellow (inner exponential), purple (outer exponential) and crimson (total). Error-bars on the markers indicate relative standard error of mean, and the abscissa limits do not contain data points that were excluded from the fit. Bottom: The two components making up the average star formation rate profile result in a bimodal shape of the galactocentric radius distribution of star-forming regions (cf. panel 7 in Figure 2).
ber distribution) of R corresponding to the pixel sample (cf. logR panel in Figure 2). For this purpose, we obtain a func- tional approximation of the unweighted distribution by fitting it to a log-normal distribution of the form (3) Here, N (R) denotes the number of pixels as a function of radius, N tot is the total number of pixels, and µ and σ are the mean and standard deviation of the distribution respectively.
Equipped with a functional form for both the average radial Σ sfr -profile as well the unweighted distribution, we then proceed to derive the resultant weighted distribution as where Σ sfr,tot is the sum of Σ sfr values of all pixels in the fiducial dataset. As shown in Figure 6 bottom panel, the distribution so obtained not only reproduces the double-peaked structure of the original weighted distribution, but also, the two individual modes present can be separately recovered from the convolution of the unweighted distribution with the "inner-" and "outer-" component of the Σ sfr (R)-profile respectively. The multiplicity of modes in the weighted distribution is therefore a natural outcome of the occurrence of multiple exponential scale lengths in the corresponding average Σ sfr (R)-profile, with the scale length values also governing the positions of the modes.  Figure 8. Average radial star formation rate surface density profiles for the TNG50 fiducial galaxy sample separated into three different star-formation-size bins containing roughly equal number of galaxies.
Having discerned the origins of bimodality in the ensemble property distributions, we now examine whether, and to what degree, the dichotomy in star formation conditions arises from two distinct populations of galaxies contributing exclusively to either peaks. In other words, could the bimodality result from an admixture of galaxies undergoing inside-out quenching producing the "outer" mode with the ones with mostly central star formation manifesting as the "inner" mode? Or perhaps a population of small galaxies with steep exponential profiles mixed with a separate population of large galaxies with shallow profiles? To this end, we assimilate the pixels corresponding to each galaxy separately and calculate the relative fraction of the galaxy's total star formation contained within the two modes of the ensemble distribution. To separate the two modes, we utilise the local minimum between the peaks as the separation radius, which for our fiducial sample turns out to be at logR 0.19. We use this simplified criterion instead of a Gaussian mixture model (GMM) for peak separation as GMMs cannot be applied to density distributions when sample weights are in consideration. Figure 7 shows the distribution of the fractional amount of integrated star formation rate of each galaxy that is contained within the "inner" mode (generated by the inner exponential component) of the weighted ensemble distribution of Figure 6. As can be gleaned from the figure, the distribution spans the full range from 0 to 1 indicating that galaxies do not in fact fall into two separate classes, viz., starforming cores and rings, contributing exclusively to either of the two modes of star formation at the ensemble level. While this is true, this does not deliver a clear conclusion about whether multiple components exist within the star-formation rate profiles of individual galaxies.
The existence of a sharp transition in the slope of the ensemble Σ sfr -profile may arise from two plausible scenarios: i) a population of small galaxies with steep profiles contribut- . Corner plot depicting scaling relations between properties of kiloparsec-sized star-forming regions in TNG50. Densities are derived using the collective data corresponding to all star-forming pixels from our fiducial sample. The diagonal panels show 1D Σ sfr -weighted histograms for each quantity, while coloured contours represent Σ sfr -weighted cumulative joint density fractions at 10, 30, 50, 70, 90 and 99% from innermost to the outermost.
ing to the inner exponential mixed with a population of large galaxies with shallow profiles producing the outer exponential, or ii) the preponderance of galaxies among the entire sample having the presence of two different scale lengths individually. To ascertain this, we investigate the shape of the average star-formation rate profile for our fiducial galaxy sample separated by their sizes as shown in Figure 8. Galaxies are split into three equally populated bins of star formation size, which we define to be the galactocentric radius of the farthest star-forming pixel in each galaxy. The panels confirm that galaxies of different sizes all have the presence of a broken star formation rate profile, albeit with a more pronounced transition (or elbow) in bigger galaxies on account of their outer components having greater scale lengths, and hence, shallower slopes relative to the inner component. Combined with the inference from the previous figure, this finding suggests that galaxies in general provide a finite contribution to the two modes of star-formation that are present in the ensemble distribution. Additionally, the overall skewness of the distribution in Figure 7 towards lower values implies that the vast majority of galaxies belonging to the fiducial sample exhibit a greater amount of star formation in their diffuse outskirts also at the individual galaxy level, which is analogous to and confirms the population-wide trend noted in §4.

Resolved Galaxy Scaling Relations
Having thus far examined and discussed the statistical nature of our physical parameter space one quantity at a time, we now look into the mutual relationships amongst these spatially-resolved ISM properties in a pairwise fashion, more commonly known as resolved scaling relationships, predicted by TNG50.
In Figure 9, we present the joint distribution functions of all possible pairs of physical properties measured for starforming regions belonging to the galaxies in our fiducial sample. The diagonal panels show one-dimensional Σ sfrweighted probability density distributions for the property labeled on the corresponding abscissa (same as in Figure 2), while each of the off-diagonal panels show the Σ sfr -weighted two-dimensional cumulative density contours (upto 99% represented by the outermost contour) for the property pair indicated by the corresponding axes labels. In the following discussion, our use of the term linear is meant to indicate linearity in log-space.
Our results give rise to spatially-resolved counterparts to several canonical global galaxy scaling relationships (Tremonti et al. 2004;Noeske et al. 2007;Schmidt 1959;Kennicutt 1998), namely, the mass-metallicity relation (Σ − Z g ) (Sánchez et al. 2019(Sánchez et al. , 2017Barrera-Ballesteros et al. 2016), star formation main sequence (Σ −Σ sfr ) (Abdurro 'uf & Akiyama 2018;Liu et al. 2018;Cano-Díaz et al. 2016) and the Schmidt star-formation law (Σ g − Σ sfr ) (Calzetti et al. 2018;Roychowdhury et al. 2015;Leroy et al. 2013;Schruba et al. 2010;Bigiel et al. 2008). Apart from these, the figure demonstrates a widespread presence of linear or nearlinear relationships between multiple other quantities. The existence of some correlations such as those with galactocentric radius is intuitively expected on account of structural and dynamical considerations as most galaxies are known to have well-defined density and metallicity profiles. However, other correlations (for e.g. metallicity vs. stellar velocity dispersion) have seemingly less obvious physical origins and warrant detailed investigation in a separate future study (P. Torrey et al., in preparation).
We notice that some relationships, specifically the Schmidt law and mutual relations between stellar mass density, darkmatter density, radius and the epicyclic frequency are extremely tight with negligible scatter. In the case of the former, the low scatter indicates that in TNG50, star formation is not only very closely related to the mass of the gas (due to SH03), but is also well-sampled on kpc-scales across the entire range of Σ g . The latter set of relationships simply reflect the empirically known Σ and ρ dm radial profiles, as well as the definition used for κ in our analysis ( §2). Contrary to this, relationships involving gas density (with the exception of the Schmidt law) exhibit little to no correlation. These characteristics also come up in our subsequent analysis in §5.2.
In a similar vein of contrasting features, we find that while a majority of the scaling relationships have a monotonic and manifestly linear shape, others (most notably, all panels representing stellar velocity dispersion) hint at a break or a turnover. Finally, in many of the two-dimensional distributions, the underlying bimodality in the physical quantities is manifested as two distinct clouds, which in some cases devi-ate from one another in terms of their slope, thereby giving rise to the aforementioned broken scaling relationships.
The ubiquity of linear correlations in Figure 9 points to a high degree of redundancy in the overall parameter space, where multiple parameters encode shared information amongst them and lessen the effective degrees of freedom or "axes of variance" available. This information, in principle, should therefore be accessible using a lower-dimensional representation of the same space. Motivated by this feature, we subsequently embark on a search for a reduced representation of the ISM hyper-parameter space using the commonly used technique of principal component analysis for dimensionality-reduction.

Characterising the Hyperspace of ISM Properties
Even though each star-forming region in our analysis is represented by a set of multiple physical parameters, not all of them are expected to be equally informative. Sometimes, relationships between parameters exist (as evident in §5.1) thereby lowering the degrees of freedom needed to account for the information contained in the original space, also known as the intrinsic dimensionality of the dataset. In order to better scrutinise what these relationships are, and their implications for the conditions in which star formation takes place, we now conduct a statistical characterisation of our measured 8D parameter space by means of lowering its dimensionality using the technique of principal component analysis (PCA). Through this exercise, we seek to answer the question: Is there a simplified meaningful representation of the underlying distribution of star formation in the ISM, and if yes, how many controlling parameters are required for such a representation to work?

Principal Component Analysis
Principal component analysis is a widely-used nonparametric analysis technique to reveal low-dimensional representations of structures underlying complex highdimensional datasets. It does so by quantifying the importance of each original dimension for describing the information content (or variance) contained within the data. Mathematically, PCA aims to re-express a given dataset with a new set of orthogonal basis -constructed through a linear combination of its original basis -that minimises noise and whose axes (known as principal components or PCs) preserve most of the variance within the dataset. This essentially amounts to the diagonalisation of the covariance matrix of the dataset (which carries information about the redundancy between parameters and overall noise in the data) using eigenvalue decomposition. The PCs thus obtained are naturally uncorrelated, and are traditionally expressed as a rank-ordered set based on their corresponding variances. Accordingly, the leading component PC 0 is aligned with the direction of largest variance in the data, followed by PC 1 , PC 2 and so forth. A lower (say, k) dimensional hyperplane approximation to the initial space is then achieved by defining a threshold variance and keeping only the first k PCs required to capture that amount of variance. A common practice in the application of PCA is to transform and standardise the data to bring all variables on the same footing in terms of their magnitude and scales. Given the vastly different units of measurements and dynamic ranges associated with ISM properties within our dataset, we apply a logarithmic transformation to our entire dataset. In doing so, we alleviate the impact of skewness of the distributions in the linear space, and attain a more practically useful sampling resolution across the dynamic range of all the properties. In addition to this transformation, we also standardise our data such that each original dimension is rescaled to have a mean of zero and unit variance. This is done to avoid variables with larger scales from dominating the covariance structure of the dataset and biasing the directions of the PCs. Taking this data standardisation step then makes the PCA procedure equivalent to diagonalising the correlation matrix instead of the covariance matrix.
In our study, we use the correlation-matrix-based PCA implementation available in MATLAB, which uses a singularvalue decomposition algorithm in lieu of the more traditional eigenvalue decomposition to compute the principal component axes. Through the rest of this section, we will refer to the individual star-forming pixels as samples, and the corresponding ISM properties as features. The dataset derived from our fiducial sample of galaxies and analysed herein constitutes a total of 589,822 samples each associated with an 8-dimensional feature vector.

PCA Results
The principal components we get from our analysis represent a new set of axes (obtained by a generalised rotation of the initial basis) onto which the original variables can be  projected. By virtue of being a simple transformation of the original coordinates, the principal component space is expected to capture the most important characteristics (such as linear correlations) of the feature space. It is therefore instructive to look at the distributions of these transformed coordinates, known as component "scores", shown here in Figure 10. As evident from the figure, the weighted distribution of the leading component PC 0 has a bimodal shape while the other PCs have distributions unimodal in nature. This indicates that PC 0 by itself is able to capture the distinction between the two star formation regimes exhibited in multiple dimensions in the original feature space. Taking advantage of this fact, we proceed to split our full data into two separate datasets corresponding to the aforementioned regimes (cf. §4) by making a cut along the PC 0 axis, which is determined by the point of minimum between the two peaks. We conduct PCA separately on these two subsets of the dataset so as to allow us to better understand which physical correlations govern the dispersion within each of the two regimes.
Hereafter, the 'outer-mode' stands for the low-value peak of PC 0 corresponding to the high-radius low-density starforming region population, whereas the 'inner-mode' represents the low-radius (central) high-density population.
In Figure 11, we show the percentage of the total variance explained by each PC obtained from the decomposition of the entire fiducial dataset, and the two modes separately. For the combined data, the first component (PC 0 ) alone captures ∼80% of the total variance in the dataset, the second one (PC 1 ) ∼9%, the third (PC 2 ) ∼6%, and the values drop sharply thereafter with the last few PCs presumably representing noise. The widths of the distributions in Figure 10 also portray this trend. That the first two PCs collectively account for ∼90% of the total variability means that a 2D projection of the feature space could indeed provide a reasonable characterisation of the complete 8D dataset. Furthermore, the bimodal shape of the PC 0 distribution alongside its high value of associated variance suggests that most variability in the data is manifested as peak-to-peak variance arising from the mix of two different sample populations (i.e., bimodality) in the dataset. In fact, this variance far exceeds the sampleto-sample variability within the peaks themselves. On the other hand, in case of the separated modes, the explained variance curve declines rather gradually and the number of PCs needed to capture 90% of the variance goes from 2 up to 5, with the respective first components explaining far less variance (60 and 45%) than their counterpart for the combined data.
Next, we turn towards quantitatively examining the relationship between the feature space and the resulting principal component space. In other words, since PCs are a linear combination of the ISM properties, we can determine the contribution of each of those properties to the PCs, also known as "loading factors" (see Table 2). Figure 12 depicts the eight ISM properties as 2D vectors plotted in the PC 0 (abscissa) vs. PC 1 (ordinate) plane for the full data (upper panel) as well as the two modes separately (lower panels). The (x, y) coordinates of each property are their loading factors along the corresponding PC direction. Several key attributes emerge: From the top panel, we see that all properties contribute with roughly equal weights to the first principal component and, with the exception of galactocentric radius, have a similarly positive sign. This conveys a correlated equal variation of all quantities with this component to first order. Additionally, it explains why PC 0 of the full dataset captures the bimodality -at low values of PC 0 , samples would have high values of R and low values of all the other quantities thereby belonging to the high-R/low density regime i.e., the outer mode. By the same token, samples that have a high value of PC 0 would belong to the inner mode. A similar trend is seen in the PC 0 loadings associated with the outer mode, al- beit with a reduced contribution of Σ g and Σ sfr . Thus, PC 0 in this case highlights processes that modulate the environment within stellar discs, which is primarily governed by the dynamical influence of stars and dark-matter. In the inner mode, the pattern is significantly different and PC 0 loses most of its correlation with σ and ρ dm , hence tracing a more complex dynamical environment driven mainly by gas-and stellar-gravity.
In addition to the composition of PCs, we can draw insights from Figure 12 pertaining to the correlations existing between the features themselves. In the top panel, we observe a clear clustering amongst ISM properties hinting at a high degree of multi-collinearity in the system. In particular, the quantities {Σ , κ, Z g , σ , ρ dm } lie in a tight cluster indicating that they have a strong positive correlation 5 amongst themselves, and a strong negative correlation with the galactocentric radius R. Σ g and Σ sfr also have a positive correlation between them, which is an expected result based on the canonical Schmidt law. These observations are well in line with our interpretation of Figure 9. In the case of Σ g , the ap- parent overall lack of linear correlation with other parameters noted in the previous subsection also emerges in the results of our PCA analysis. This observation is perhaps suggestive of the fact that the variance associated with the Σ g scaling laws either comes from predominantly non-linear dependencies, or is identified as noise and hence not captured by the highranking PCs shown in Figure 11. The figure also affords us the additional clarity that this behaviour is almost entirely on account of the more dominant outer mode. In the case of the outer mode -which we previously saw to be representative of a stellar disc environment -the same correlations stand as in the case of the full dataset. However, in the inner mode, most correlations barring the R − κ − Z g and Σ g − Σ sfr relationships are appreciably weakened. Strikingly, in these central star-forming regions, σ is no longer linked with the properties of stars but is more tightly coupled to the gas instead. Furthermore, σ does not show a strong correlation with the radius, signalling the near-flatness or lack of a well-defined vertical velocity-dispersion profile in those regions.
It is worth keeping in mind here that PCA, being a linear method, is not expected to fully capture the non-linear aspects of relationships between features, if present. Given that, in order to describe a non-linear relationship fully, one cannot rely on a single principal component. Rather, in such a case, a group of PCs is needed, of which one would provide the best linear-approximation to the underlying relationship while the others would encompass variances in the directions of deviations to non-linearity. By using a logarithmic transform, however, we are not strictly confined to the linear regime and are able to additionally capture power-law relationships. Although non-linear generalisations of PCA (such as autoencoder neural networks and kernel-PCA) as well as more advanced manifold-learning methods exist, they are not conducive to the kind of analysis we have conducted in this study i.e., one that requires taking individual sample weights into account.
While so far we stepped through the PCA analysis results for our fiducial sample of galaxies, we now briefly consider how these results change when we conduct PCA on samples representing different galaxy stellar mass ranges and redshifts (akin to our approach in §4.2, 4.3). In Figure 13, we show the distribution of PC 0 for star-forming region datasets derived from galaxies in different stellar mass bins at z = 0, and for datasets corresponding to galaxies with M = 10 10−11 M at different redshifts. Darker curves represent higher stellar masses and lower redshifts. The variation in the relative amplitude of the two peaks as a function of M is reproduced well (compared to those observed in §4.2 and 4.3 for the full space), as is the trend with redshift, where the preference for star formation gradually shifts towards low density, disc-like extended environments for lower redshifts and higher stellar masses. This finding reinforces our understanding from previous results that the leading component fully captures the bimodality signature in the original features as well as the inherent physical relationships that correlate them.
Lastly, in Figure 14, we compare the actual distributions of the initial physical space parameters for our fiducial sample of galaxies with the reconstructed versions generated only by including the first three principal components. We find that the agreement between the original and the low-dimensional version is excellent. For all the physical parameters where there are clearly two distinct peaks, we recover their respective locations and the position of the minimum between them. On the other hand, in terms of the relative peak heights between the two modes, the reconstructed representations are somewhat discrepant from the original distributions. As such, it is possible to reproduce the presence of bimodality in quantity distributions by using PC 0 alone, however, capturing their exact attributes and meaningful variations within individual peaks requires additional components. In our case, the first three PCs account for ∼94% of the information present in the dataset (see Table 2). Hence, by retaining a substantial fraction of the information, a three-dimensional embedding of our dataset can serve as a practically useful avenue for sampling a family of star-forming regions for further specific investigations, such as tall-box simulations to study feedback and gas dynamics. As a supplementary data product, we provide the PCA loading factors, variances explained by all of the PCs, and joint distributions of the first three PC scores for star-forming regions from galaxies with M = 10 7−8 , 10 8−9 , 10 9−10 , 10 10−11 M at z = {0, 0.5, 1, 2, 3} at https://github.com/bhawnamotwani/smaug.

PRELIMINARY COMPARISON WITH OBSERVATIONS
To assess some of the results obtained from our simulations in the context of observations, we now proceed to examine the distributions of properties from resolved observations of nearby star-forming galaxies from the MaNGA IFU survey data (Bundy et al. 2015). Given the resolution of TNG50 and the scale chosen for our analysis, MaNGA offers a suitable dataset for a qualitative comparison against our results. However, due to the unavailability of an observational counterpart to several of the 'local' properties we have worked with, we limit the comparison in this section only to a handful of quantities, namely, Σ , Σ sfr , and R.

Survey Description
The MaNGA survey is one of the three programs undertaken as part of the fourth installment of the Sloan Digital Sky Survey (SDSS-IV) aimed at observing resolved kinematic structure of ∼10,000 nearby galaxies through integral field spectroscopy. The imaging and spectroscopy for the galaxies is conducted using the 2.5m telescope at the Apache Point Observatory (APO) alongside specialised IFUs and the BOSS spectrograph with coverage in the 3600-10300A range and resolving power R ∼ 2000. In this work, we use the latest public release DR15 (Aguado et al. 2019) consisting of 3D data-cubes for a sample of 4824 unique galaxies uniformly sampled over the stellar mass range ∼ 10 9−11 M . Raw data is reduced using the MaNGA's internal data reduction pipeline (Law et al. 2016) and analyzed using the data analysis pipeline Belfiore et al. 2019). The local mass density of spaxels used in this work is computed as part of the Pipe3D pipeline through stellar population modelling by performing a linear decomposition of the spectrum into simple stellar populations of different ages and metallicities for each spaxel and correcting for dust attenuation (using the Balmer decrement) prior to fitting (for full details, see Sánchez et al. 2016). The local star formation rates are derived from Hα luminosity using the formula from Kennicutt (1998) anda Chabrier (2003) initial mass function. Due to an imposed threshold of S/N = 3 on Hβ (used for extinction-correction) alongside distance and intrinsic luminosity constraints, the effective median sensitivity limit of Σ sfr in MaNGA lies at ∼ 10 −3 M yr −1 kpc −2 . The spatial coverage of galaxies is expected to be at minimum upto 1.5 effective radii (R e ). Lastly, MaNGA observations have a median spatial resolution of 2.5" FWHM ( 1.8 kpc at the median redshift of ≈ 0.03), and are sampled at a scale of 0.5" per spaxel in the final data cubes which translates into a physical scale of ∼1-2 kpc per spaxel given the redshift range of 0.01 < z < 0.15.

Galaxy and Spaxel Selection
For our analysis, we choose all galaxies from MaNGA DR15 with a given BPT classification (Kewley et al. 2001;Kauffmann et al. 2003) of either 'cLIER' (galaxies with kpcscale low-ionisation emission regions in their centres accompanied by star-formation in the outskirts) or 'star-forming' (as determined by Belfiore et al. 2018). To select a comparison sample for both MaNGA and TNG, we implement a selection criterion that matches the simuated and observed galaxies in their size-mass plane by stochastically sampling them in a binned fashion as shown in Figure 15. Specifically, we use a metric for the two-dimensional effective radius of the galaxy and the total stellar mass enclosed within the effective radius calculated from the corresponding pixels or spaxels for each galaxy. For MaNGA, we adopt the effective radius to be the inclination-corrected Petrosian halflight radius (R e ), and the face-on projected 2D r-band halflight radius (R 1/2,r−band ) for TNG50 galaxies. The particular choice of mass inside R 1/2,r−band for TNG50 is made in order to avoid uncertainties arising from the differences between the definitions of total mass in the simulation (all mass within the 3D virial radius) and in observation (mass calculated from light within twice the 2D Petrosian radius). Following this procedure gives us a total of only 147 galaxies in both MaNGA and TNG due to the highly non-overlapping nature of their initial distributions. Thereafter, for all the selected galaxies, we convolve our images with a MaNGAlike Gaussian PSF with FWHM 1.8 kpc, and then draw the corresponding spaxel contributions from within 1.5 times R e (R 1/2,r−band ) in MaNGA(TNG) so as to record their Σ , Σ sfr , and R values. Like the case for TNG50 galaxies, the R values in MaNGA are de-projected to be in the face-on orientation. Any spaxels containing bad data such as ill-defined stellar masses and/or undetected star formation rates are excluded from the comparison study. Additionally, to mimic the MaNGA detection thresholds in the simulation data, we limit our analysis to only the subset of all spaxels that obey Σ ≥ 2 × 10 7 M kpc −2 and Σ sfr ≥ 10 −3 M yr −1 kpc −2 .

Inference and Discussion
In Figure 16, we present the results of our comparitive analysis between the observational and theoretical datasets.
The top panels illustrate the individually normalised unweighted and Σ sfr -weighted distributions of log Σ , log Σ sfr , and log R as dotted and solid curves respectively. As the figure indicates, property distributions in MaNGA cover approximately the same range as TNG50 both in their original unweighted as well as the star formation-weighted forms. As a consequence, we also achieve an overall good agreement between the two in terms of peak height. Interestingly, due to the observational mocking steps involved, the TNG50 distributions no longer show a discernible bimodal shape, keeping in line with their observational counterparts. Specifically, applying the MaNGA point spread function on the simulation results reduces the Σ sfr values of the inner pixels (at R 2 kpc) by spreading star formation across multiple surrounding pixels. This in turn leads to an increase in the scale length of the otherwise steep inner exponential part of the TNG50 radial Σ sfr -profile and the smoothening of the elbow following it. This change thereby not only causes the suppression of the 'inner' mode (c.f. §4.5), but also fades the prominent separation between the two modes by decreasing the difference between the slopes of the inner and outer exponential components of the profile.
Notwithstanding the general conformity, property distributions in TNG50 exhibit a few minor departures from that of MaNGA. The weighted MaNGA distributions of Σ and Σ sfr seemingly favour peak values that are slightly greater in comparison with those exhibited by the TNG50 curves, a behaviour that is not perceptible in the case of their unweighted distributions. Another marginal disparity is apparent between the two R-distributions in that the TNG50 curve features a minute overall shift (≈ 0.1 dex) towards higher values both in the unweighted and weighted forms relative to MaNGA. Lastly, the imposed hard Σ -cut in the case of TNG50 expectedly manifests as a sharp lower limit of the unweighted distribution against a much softer, tapered edge in MaNGA. Curiously, while the peak of the weighted R distribution in MaNGA approximately lines up with the outer (large-radius) mode in TNG, the peak in Σ is more compatible with the inner higher-density mode of the corresponding simulation-derived distribution.
The bottom panel of Figure 16 depicts the average Σ sfr profiles for the two datasets, constructed in a similar fashion to the curve shown in Figure 6. Barring a slight offset in the normalisations at large radii (≈ 0.1 dex at ≥ 7 kpc), we find that the two profiles exhibit an excellent agreement with one another. In contrast with the 2-component profile composition discussed in §4.5, both the observed MaNGA and 'mock' TNG50 profiles have a smooth shape that can be best described by a single continuous component. Indeed, if we fit the two curves, we find that the MaNGA(TNG) profile can be well-represented by a single exponential of scale length 1.3(0.7) kpc plus a constant. We therefore deem the lack of two disparate scale lengths in this case to be of direct consequence to the shapes of the weighted distributions, explaining the absence of a bimodality. Finally, we notice that in spite of being largely akin to observations, the TNG-profile near the centre (at R < 1 kpc) is somewhat Figure 16. Top: Probability density distributions for Σ (left), Σ sfr (middle) and R (right) for all well-defined star-forming spaxels in MaNGA (orange) at z = 0 − 0.02 compared with those from TNG50 (black) at z = 0, derived in both cases from galaxies with total stellar mass tentatively in the [10 9 , 10 10 ] M range. Dotted lines represent the intial unweighted distributions while the solid curves are weighted by Σ sfr . Bottom: Binned average radial star formation rate surface density profiles for MaNGA and TNG50, with the errorbars indicating relative error of mean in each bin. Vertical bars in grey and orange denote the analysis scale used for TNG50 (1 kpc) and the spatial resolution of MaNGA at median redshift ( 1.8 kpc) respectively. shallower and has a value that is a factor of ≈ 2 lower compared to the inferred value in MaNGA. This difference could be partially responsible for the suppressed weighted probability i.e., star formation, at small-radii in TNG50 relative to MaNGA. Although, in light of our chosen simulation sampling scale and the spatial resolution of MaNGA ( 1 and 1.8 kpc respectively), we note that the nature of both profiles and their mutual comparison at these small radii may be inadequate to draw any robust conclusions. At large radii, the TNG50 curve has a steeper decline as indicated by the smaller best-fit scale length, while MaNGA retains a roughly constant value of star-formation albeit with a relatively noisier profile due to a significant dearth of star-forming spaxels in the outskirts.
Despite a careful galaxy sample selection and the application of MaNGA-like detection limits on simulation-derived data, the presence of minor dissimilarities between TNG50 and MaNGA suggests that the true extent of the selection effects inherent to the survey is perhaps not fully captured in the hard thresholds that we have used in our analysis. The resulting discrepancies are conspicuously reflected in the (difference between) unweighted distributions of Σ and R, as well as the Σ sfr (R) profiles. As an example, the difference between the average Σ sfr -profile slopes at large radii and in central regions of galaxies is symptomatic of that fact that low-Σ sfr regions are preferentially underrepresented in the observational data set that we have utilised. Such regions of low star formation are typically found in galaxy outskirts as well as in the centres of galaxies undergoing inside-out quenching (the latter would be expected due to our inclusion of cLIER galaxies). Considering that the properties of TNG galaxies such as sizes, star formation rates and specific star formation rates have previously been shown to be consistent with observed galaxy properties (e.g., Stevens et al. 2019;Hwang et al. 2019;Genel et al. 2018), we hypothesize that the dearth of these high-radius/central low-SFR pixels is borne out of vulnerability to the nuances of survey detection limits, particularly in Σ sfr . An exhaustive future study conducted using carefully generated mock-MaNGA observations from TNG will allow us to test this hypothesis. Nonetheless, we hope that with future resolved spectroscopic surveys pushing the detection limits, the trends emerging from our study could be directly tested against ISM property distributions in resolved observations, and provide the much needed clarity in this picture. 7. SUMMARY AND OUTLOOK In this work, we use the TNG50 cosmological simulation volume to generate and statistically survey the multidimensional parameter space of resolved ISM properties across a wide range of galaxy masses and redshifts. Specifically, we select star-forming galaxies in the mass range 10 7 − 10 11 M at z = {0, 0.5, 1, 2, 3}. This sample accounts for more than 80% of the total star formation in the simulation at each of the utilised snapshot. By dividing the galaxy into kpc-sized regions, we conduct a coarse-grained measurement of gas/stellar surface densities, gas metallicity, stellar vertical velocity dispersion, disc epicyclic frequency, star formation rate density and dark-matter volumetric density representative of each region (see §3.2 for further details). We present a synopsis of the main findings of our analyses below.
• The distributions of all ISM properties, with the exception of stellar vertical velocity dispersion and gas surface density, exhibit bimodally-shaped distribution functions when weighted by star formation (Figure 2), indicating that star formation in galaxies takes place in two separate environmental regimes. Star formation is most favoured to occur in the outer low-density low-metallicity regions for high-mass galaxies (above 10 9 M ), while being localised to the central high-density regions for lower mass galaxies (Figure 3). For galaxies in a fixed mass bin, the preference for star formation in the outer diffuse regions is greater at lower redshifts ( Figure 4). Additionally, our results show that most of the star formation in the universe takes place in galaxies with M = 10 10−11 M at z ≤ 2 ( Figure 5).
• The presence of a bimodality in property distributions results from an underlying bi-component average radial star formation rate profile for the galaxy sample. By fitting this profile with a combination of two exponentials, we demonstrate that the two peaks in the weighted distributions can be individually reproduced from the "inner" and "outer" exponential components separately ( Figure 6). We also find that almost all galaxies sustain a finite amount of star formation in both modes, albeit with varying degrees of relative contributions to them (Figure 7, 8).
• We investigate the 2D joint density distributions between parameters (Figure 9), and find a very high degree of multicollinearity (aka redundancy) in the 8D space. Through linear dimensionality-reduction via principal component analysis, we find that almost all of the intrinsic variance of the parameter space can be well captured via a transformed 3-dimensional representation (Figure 11, 14). Moreover, the leading principal component alone also captures the multi-parameter bimodality signature present in the original space (Figure 10, 13). This signature is manifested in the form of a "radius-relation" (Figure 12), i.e., the anticorrelation of galactocentric radius with the rest of the bimodally distributed ISM parameters.
• We conduct a preliminary comparison of our 1D property distributions and star formation rate profiles with those obtained from the MaNGA IFU survey for z = 0, with both galaxy samples selected to represent similar size-mass distributions (Figure 15, 16). Upon the application of ob-servational detection limits, the Σ sfr -weighted distributions in TNG50 lose their bimodal shape showing concordance with the shape of the observed resolved property distributions. The comparison reveals an overall good match between TNG50 and MaNGA for the spread and peak locations of the parameter distributions, as well the the underlying average radial Σ sfr profiles below 7 kpc. We argue that some of the minor deviations between MaNGA and TNG50 results possibly arise from our inability to fully capture the nuances of the observational detection limits necessary to make a maximally unbiased comparison.
We envision that the results from our study will provide impetus for the construction of new heuristic star formation/ISM prescriptions in the near-future that are driven by fewer free parameters compared to currently used subgrid models. Given that our analysis is based on a dataset acquired from a fully cosmological, high-resolution, large volume simulation, we provide strong constraints for any model that endeavours to physically describe star formation and ISM physics in galaxies. By facilitating an optimal sampling of realistic initial conditions for future high-resolution 'tallbox' ISM simulations (e.g. TIGRESS), our characterisation will allow us to bridge the gap between stellar and galactic scales by establishing a direct link between small-scale ISM conditions and large-scale outflow properties. Finally, our work will provide avenues for meaningful comparison with similar measurements conducted with other large-scale cosmological simulations, as well as detailed quantitative patterns emerging from future high-precision spatially-resolved observations, especially at high-redshifts. APPENDIX A. CONVERGENCE WITH SIMULATION RESOLUTION AND PIXEL SIZE We test the convergence of our results by way of comparing the average star formation rate profiles (shown in Fig. 16) in Figure  17. We use the two lower resolution, same volume counterparts of the standard TNG50-1 run, namely TNG50-2 and TNG50-3. TNG50-2(TNG50-3) has a mass-resolution of 6.8(54.2) M for gas, and 36.3(290.4) M for dark matter, making it roughly a factor of 8(64) coarser compared to TNG50-1. From the left panel in the figure, we observe that using a coarser-resolution box has marginal influence on our results both in terms of slope and normalisation, except at R ≥ 20 kpc. Given the mass range we are working with, we expect these regions to correspond to the far-outskirts of the galaxies with very few star-forming gas particles, thus making the profile noisier due to poorer sampling.
In the right panel of the figure, we show a comparison of the same profile in TNG50-1, but this time for different choices of pixel size (image resolution). The profiles do not appear to have a dependence on pixel size for pixels in the central parts where the simulation resolution elements are smaller due to higher densities. In these regions, star formation is adequately resolved with a few 10s of particles contributing to each pixel on average. At large radii, we start to a see a weak variation of the slope with pixel size, such that bigger pixels give rise to steeper profiles. This is because in galaxy outskirts, where star formation does not have a uniform coverage, larger pixels tend to smooth over small-scale spatial patterns hence acquiring increasingly lower values of Σ sfr as we go farther out. Lastly, as these profiles are binned and only composed of star-forming spaxels, they become noisier in the outskirts due to arbitrarily low area coverage as well as worsening Poissonian statistics.  Figure 17. Σ sfr profile variation with the simulation resolution (left) and choice of pixel size (right) for star-forming regions in our fiducial galaxy sample (M = 10 9−10 at z = 0). In keeping with Figure 16, only star-forming pixels are used to construct these profiles.
B. RECONSTRUCTING THE ORIGINAL SPACE FROM PCA RESULTS Let X be the data matrix corresponding to the initial space with n rows for the samples and m columns denoting features (m = 8 in our study). As described in §5.2.1, we standardise our dataset before conducting PCA by first subtracting the mean vector µ from all rows and dividing them element wise by the standard deviation vector σ. In our case, we made σ to be the Σ sfr -weighted standard deviations. Doing this gives us the corresponding matrix of standardised data, also known as z-scores Z. After the PCA analysis, we obtain our results in the form of a coefficient matrix C, which is an m × m matrix whose columns are the m eigenvectors representing the directions of the principal components (PCs). Then, the principal component scores are nothing but a projection of our original space along each of the PC directions. These are given in matrix form as P = ZC, with the same dimensionality as our original parameter space, i.e., n × m. Now, in order to reconstruct the original data from these scores, we apply the inverse operation such that Z = PC −1 . Due to the orthonormality of C, this is equivalent to Z = PC T . Finally, to obtain the reconstructed X, we multiply each column of Z by the corresponding σ value and add the corresponding mean µ. To obtain an approximate reconstruction using only a few PCs, say k in number, one would only use the first k PC scores, keeping just the first k columns of C in the calculation above.