Probing the Earliest Phases in the Formation of Massive Galaxies with Simulated HST+JWST Imaging Data from Illustris

We use the Illustris-1 simulation to explore the capabilities of the Hubble Space Telescope (HST) and James Webb Space Telescope (JWST) data to analyze the stellar populations in high-redshift galaxies, taking advantage of the combined depth, spatial resolution, and wavelength coverage. For that purpose, we use simulated broadband ACS, WFC3, and NIRCam data and two-dimensional stellar population synthesis (2D-SPS) to derive the integrated star formation history (SFH) of massive (M * > 1010 M ⊙) simulated galaxies at 1 < z < 4 that evolve into a local M * > 1011 M ⊙ galaxy. In particular, we explore the potential of HST and JWST data sets reaching a depth similar to those of the CANDELS and ongoing CEERS observations, respectively, and concentrate on determining the capabilities of this data set for characterizing the first episodes in the SFH of local M * > 1011 M ⊙ galaxies by studying their progenitors at z > 1. The 2D-SPS method presented in this paper has been calibrated to robustly recover the cosmic times when the first star formation episodes occurred in massive galaxies, i.e., the first stages in their integrated SFHs. In particular, we discuss the times when the first 1%–50% of their total stellar mass formed in the simulation. We demonstrate that we can recover these ages with typical median systematic offset of less than 5% and scatter around 20%–30%. According to our measurements on Illustris data, we are able to recover that local M * > 1011 M ⊙ galaxies would have started their formation by z = 16, forming the first 5% of their stellar mass present at z ∼ 1 by z = 4.5, 10% by z = 3.7, and 25% by z = 2.7.


INTRODUCTION
The most massive galaxies in the nearby Universe are generally quiescent and present relatively old stellar populations (Renzini 2006 and references therein).This is consistent with the 'downsizing' scenario, first introduced by Cowie et al. (1996), and which states that the most massive galaxies started and completed their stellar mass assembly earlier than less massive ones.Nevertheless, the mechanisms involved and, in particular, the role of feedback in the downsizing scenario of galaxy formation remain unclear.We can summarize this downsizing scenario and its implications in our understanding of galaxy formation by the following key questions: (i) When did the first massive galaxies appear in the lifetime of the Universe?(ii) When did the first star formation stages happen in massive galaxy formation, and how fast and bursty were they?(iii) How did stellar mass assemble in massive galaxies?One way to address these questions is to look further and fainter in order to search for the most likely progenitors of local massive galaxies at different redshifts, and subsequently analyze their stellar populations, especially their Star Formation Histories (SFHs).
There are many theoretical studies that have addressed these issues related to the formation and assembly of massive galaxies by reconstructing their full galaxy assembly histories (e.g., De Lucia & Blaizot 2007).This kind of studies have provided strong evidence for a two-phase evolutionary scenario (e.g.Oser et al. 2010) in which massive galaxies would have formed most of their stellar mass at high redshift (z 2) via a first main dissipative phase of in-situ star formation, followed by a secondary phase of multiple (dry) minor mergers that gradually increase the galaxy size.More recent studies based on both semi-analytical and hydrodynamical simulations have confirmed and polished this two-phase scenario by predicting the accreted stellar fractions of early-type galaxies and confirmed that the fraction of accreted material increases with lower redshift and higher stellar masses (e.g., Lee & Yi 2013, 2017;Dubois et al. 2016;Rodriguez-Gomez et al. 2016;Davison et al. 2020;Pulsoni et al. 2021;Remus & Forbes 2022).
From the observational point of view, the assembly history of a galaxy is imprinted on the properties of its stellar populations.Significant observational work has been conducted in the last few years to study the stel-lar populations of the most massive galaxies at z > 3 from the integrated emission of photometrically-selected (and sometimes spectroscopically-confirmed) samples (e.g.Glazebrook et al. 2017, Alcalde Pampliega et al. 2019, Forrest et al. 2020a,b, Marsan et al. 2022 and references therein).However, there are also spectroscopic and photometric studies that show that the physical properties of galaxies present systematic trends both radially, mainly, but also azimuthally in the local Universe (see, e.g., Di Matteo et al. 2013, Tacchella et al. 2015, Nelson et al. 2016, Wang et al. 2017, Ho et al. 2018, Sánchez-Menguiano et al. 2020, Abdurro'uf et al. 2022, Chamorro-Cazorla et al. 2022, Bellardini et al. 2022).For these kinds of studies, the stellar population synthesis (SPS) modeling in two dimensions (2D) has become an essential tool to derive these subgalactic-scale resolved properties by analyzing their spatially-resolved emission.If applied to massive progenitors at high redshift, this spatially-resolved analysis can provide further clues on the role of different mechanisms (e.g., internal vs. external, secular vs. fast) on the evolution of massive galaxies, since the different mechanisms proposed act at different scales and have a different impact as a function of galactocentric distance (Dekel & Burkert 2014, Zolotov et al. 2015, Tacchella et al. 2015, 2016, 2018, 2019, Akhshik et al. 2022) and for different morphological components (e.g., Méndez-Abreu et al. 2021, Johnston et al. 2022 for local galaxies, and Costantin et al. 2021Costantin et al. , 2022 for higher-redshift galaxies).
Still, we cannot ignore that the main drawback of the stellar population synthesis is the intrinsic degeneracies associated with the study of the emission from stars, even within individual pixels.Indeed, limitations in the spectral resolution, wavelength coverage and/or signalto-noise of the data, result in strong degeneracies in the parameter space of physical properties such as age, star formation timescale, metallicity, or attenuation by dust.The correlations among these parameters are more or less difficult to disentangle depending on the value of each specific parameter (e.g., young ages, such as those expected in high-redshift galaxies, are less prone to some of these degeneracies), as well as the mentioned observational errors and wavelength coverage (see e.g.Gil de Paz & Madore 2002).
The latter problems are especially difficult to tackle as we go to higher redshifts and/or study less massive galaxies, since galaxies are generally too faint to reach their continuum level with high spectral resolution.This makes the SPS analysis of progenitors of massive nearby galaxies at high redshift extremely challenging, as we cannot use a large enough sample of them reaching faint magnitudes to account for a significant fraction of their whole mass and, thus, obtain statistically robust results.These large samples are still harder to get when using spectroscopic (instead of photometric) data, which are resource-demanding and therefore relatively scarce.In addition, due the technological limitations of our observatories, it is hard to cover wide spectral ranges to account for different populations and effects such as dust extinction or other degeneracies.
In this regard, the James Webb Space Telescope (JWST; Gardner et al. 2006) will play an important part in overcoming these issues, thanks to its unprecedented sensitivity, high spatial resolution, wide wavelength coverage, sensitive to stars of different ages, and variety of spectral resolutions (from spectroscopic R ∼ 3000 to photometric R ∼ 7 and spectro-photometric R = 100).Our approach is to use spatially-resolved multiwavelength broad-band data from JWST, combined with already existing broad-band data from the Hubble Space Telescope (HST), to extract Spectral Energy Distributions (SEDs) of progenitors of nearby massive galaxies at high redshift and apply 2D-SPS modeling on them.The determination of stellar population parameters in two dimensions, and not only for the galaxy as a whole, will help us to analyze stellar mass distributions inside galaxies and to recover realistic integrated galactic SFHs.The power of analyzing the stellar populations in 2D resides in the fact that smaller regions of a given galaxy should have its observational properties driven by a more simple SFH, which can therefore be characterized with fewer parameters compared to that required to characterize an entire galaxy (which easily could count with several, quite different stellar populations).Analyzing the stellar content of high-redshift systems in 2D will be fundamental to identify the evolutionary stages a galaxy can undergo regarding its stellar content evolution and to determine when its assembly began.
This work aims at determining the robustness of spatially-resolved SFHs for massive galaxies at 1 < z < 4, derived from the 2D-SPS analysis of simulated HST+JWST broad-band photometry, to infer when these massive progenitors began their stellar mass assembly.The reason for focusing on this 1 < z < 4 redshift interval, for which JWST is expected to provide high-quality data, is that it includes the so-called 'cosmic noon', i.e., the epoch of the Universe where the cosmic star formation rate density history was maximum at z ∼ 2 and where a considerable fraction of the local stellar mass was formed: half of the present-day stellar mass was formed before z = 1.3 (see Madau & Dickinson 2014 and references therein).The results of this work will help us identify the first progenitors of massive galaxies around cosmic noon, an epoch of the Universe of great interest for many JWST studies in the upcoming years.
In order to establish a methodology and test its performance, we use galaxy images in different bands simulated by the Illustris Project.One of the most appealing features of recent cosmological simulations such as Illustris-1 (Vogelsberger et al. 2014a, Nelson et al. 2015) is that some of them provide synthetic stellar images generated for simulated galaxies at different redshifts in common broad-band filters from HST or JWST (Torrey et al. 2015) and with "realistic" morphologies (at least, relative to the morphologies provided by typical semianalytical models).These images, when compared to the available information of the stellar particles for each galaxy in the simulation, make them the perfect benchmark to test how successful our 2D stellar population synthesis method is at recovering the early formation of massive galaxies at high redshift when upcoming JWST broad-band photometry data become available.
The present paper is structured as follows.In Section 2, we give a brief overview of the Illustris simulation and explain how the sample of simulated galaxies has been selected from the synthetic images.In Section 3, we present the processing of the images and the photometric method conducted to build SEDs in 2D.Section 4 describes the SPS modeling of the SEDs measured in 2D and the derivation of the integrated SFHs from these 2D-SPS analysis.In Section 5, we evaluate the success of our method in recovering the earliest phases in the formation of each galaxy in the sample.Finally, in Sections 6 and 7, we present our results and outline our conclusions, respectively.

DATASET
In this paper, we present a method to study the star formation history of massive, spatially resolved highredshift galaxies using HST and JWST photometric data.We apply this method to a sample of galaxies simulated by the Illustris Project.Our aim is to assess the utility of HST plus JWST combined datasets for the analysis of the earliest evolutionary phases of nearby massive galaxies.For that purpose, our approach consists in selecting the progenitor galaxies at high redshift of nearby massive galaxies, and then analyzing their pixel-by-pixel and resulting integrated-galactic SFHs.In this Section, we describe the simulations, the sample, and the data.

Illustris Simulation and synthetic deep-survey images
The Illustris Project is a set of hydrodynamical simulations of a (106.5 Mpc) 3 periodic cosmological volume that trace the evolution of dark matter, gas, stars, and supermassive black holes from z = 127 to z = 0 (Vogelsberger et al. 2014a, Nelson et al. 2015).Illustris comprises six different runs: Illustris-(1,2,3) and Illustris-(1,2,3)-Dark, where the former include a baryonic physical component in addition to the dark-matter content, and the latter the dark-matter component alone.We make use of the Illustris-1 run, the one with the highestresolution box of these six runs in terms of the number of resolution elements and their masses: it follows the evolution of 1820 3 dark-matter particles with a mass of 6.26 × 10 6 M each, and 1820 3 baryonic particles with an initial mass of 1.26 × 10 6 M .The Illustris-1 volume contains ∼ 40, 000 galaxies at z = 0 with more than 500 stellar particles − roughly equivalent to galaxy stellar masses of M * > 5 × 10 8 M (Torrey et al. 2015).Out of all these galaxies at z = 0 in Illustris-1, 856 galaxies have stellar masses of M * > 10 11 M .
This work is based on the simulated galaxies and broad-band images from the Illustris-1 supplementary data catalogs published by Snyder et al. (2017, hereafter S+17) and available via MAST 1 .These catalogs correspond to three synthetic deep survey square images in three different fields, which are labeled as field A, B, and C, each 2.8 × 2.8 in size.As described in S+17, each of these three deep survey images has been created by applying the lightcone technique proposed in Kitzbichler & White (2007) to the periodic Illustris-1simulation volume.This technique is based on replacing distant volume in the simulation with the output from an earlier cosmic time.It consists in first replicating the periodic cubic volume simulation until a desired comoving distance is reached and tracing a lightcone across all the simulation replications.Then, the output time of the simulation that fills the lightcone volume is varied as a function of the comoving distance, i.e., the distant volume in the lightcone is replaced with the output from an earlier cosmic time in the simulation.The three differ-1 MAST: 10.17909/T98385 ent survey images used in this work have been generated using the same lightcone geometry with three different orientations.Each of these lightcones contains unique galaxies in the simulation up to z ∼ 18 with no repetition, although some of these galaxies can be repeated between the three different fields.
Finally, the physical information given by the output of the simulation at each redshift is processed with the spectral synthesis code SUNRISE (Jonsson 2006;Jonsson et al. 2010) to create synthetic images in arbitrary filters.This is done by assigning SEDs to the stellar particles according to their mass, age, and metallicity, and by projecting these quantities from the simulation space to pre-defined hypothetical cameras.For the creation of these images, Starburst99 stellar population models (Leitherer et al. 1999) were used, with a Kroupa (2001) initial mass function (IMF).The effect of dust absorption has been included in these mock images using a simple birth cloud plus diffuse dust model (Charlot & Fall 2000).This models the total effects of dust at our wavelengths of interest, without requiring expensive dust radiative transfer simulations for the entire fields.
These three synthetic deep-survey images, also called "mock ultra deep fields", are provided down to the spatial resolution of HST and JWST, among other observatories, in a wide range of broad-band filters, imitating the conditions of real galaxy surveys.Publicly available catalogs associated to these images include galaxies in the survey images whose rest-frame g-band apparent magnitude is g < 30.0 mag (19,347 galaxies).

Sample Selection
Starting from all the galaxies included in the catalogs of the S+17 mock images, we select those at 1 < z < 4 which will evolve into a very massive (M * > 10 11 M ) descendant at z = 0, as tracked forward in time via the Illustris merger trees (Rodriguez-Gomez et al. 2015).This means that we select 1 < z < 4 progenitors in the mock images of z = 0 very massive galaxies in the whole Illustris-1 simulation.Among all progenitor galaxies at 1 < z < 4 in the images, we restrict our analysis to massive galaxy progenitors with M * > 10 10 M , for which very high quality JWST data will be available in the near future.This implies that, for the same 10 11 M galaxy at z = 0, all its massive progenitors at 1 < z < 4 in the images are considered in our sample (e.g. two massive galaxies at 1 < z < 4 could merge to form a M * > 10 11 M galaxy at z = 0).
The left panel of Fig. 1 shows the location in the star formation rate (SFR) vs. stellar mass plane of all the progenitor galaxies at 1 < z < 4 in the S+17 simulated images that have a M * > 10 11 M at z = 0.These 2,994 The Main Sequence found for all the Illustris simulated galaxies at different redshifts (Sparre et al. 2015) has also been plotted.Middle panel: postage stamp images for some representative examples of our galaxies (size 2.5 × 2.5 ).These RGB images are created using ACS/F1814W, NIRCam/F200W, and NIRCam/F277W asinh-scaled images as B, G and R filters, respectively.The position of these galaxies in the left panel has been highlighted with a white dot inside.Right panel: g − r color vs. stellar mass diagram for descendants at z = 0 of all galaxies at 1 < z < 4 (independently of their stellar mass) in the S+17 mock survey images.Descendants of the final sample of 221 galaxies analyzed in this paper are plotted as stars.
progenitor galaxies shown in the left panel of Fig. 1 represent 19% of all galaxies located at 1 < z < 4 in the images.This is equivalent to saying that 81% of galaxies at 1 < z < 4 in the images do not end up as a very massive galaxy at z = 0. Out of that fraction of 1 < z < 4 galaxies with a very massive descendant at z = 0, our selected sample is composed of the 221 progenitor galaxies with M * > 10 10 M (represented by stars in Fig. 1, left panel).Among them, two of them appear simultaneously in two of the three mock ultra-deep fields with different orientations, and will be considered as independent galaxies in this work regarding the photometric analysis and the derivation of the SFHs.In fact, the initial selected sample from the images is composed of 248 galaxies, but only 221 (+ 2 repetitions) are kept until the final analysis.These 248 galaxies in the initial sample of massive progenitors represent the 64% of all M * > 10 10 M galaxies at 1 < z < 4 in the images (388 galaxies), i.e., if we were to select all massive galaxies at 1 < z < 4 in the images, only 64% of them would actually become a M * > 10 11 M galaxy at z = 0. Interestingly, this percentage rises to 68% and 85% when the mass cutoff is set to 10 10.1 and 10 10.5 M , respectively.In these cases, the number of galaxies in the sample would decrease from 388 galaxies to 305 for the 10 10.1 M cut and to 123 for 10 10.5 M .In Fig. 1, we also show the galaxy Main Sequence for Illustris galaxies, as determined by Sparre et al. (2015).
The right panel of Fig. 1 shows the g − r color-stellar mass diagram for descendants at z = 0 of all galaxies at 1 < z < 4 in the mock images of S+17.Only descendants with log(M * /M ) 8.68 are shown, for which integrated photometry data is available (5,498 galaxies).Since descendants at z = 0 of galaxies in S+17 mock images have been traced via the Illustris merger tree and do not appear in these survey images, the integrated photometry magnitudes used to build this diagram have been extracted from the Torrey et al. (2015) synthetic individual galaxy images.But these mock galaxy images are only available for z = 0 descendants with a minimum stellar mass of log(M * /M ) 8.68 (∼ 63% of all the 8,768 z = 0 descendants of 1 < z < 4 galaxies in the images).
Our final 221 galaxies selected from these images evolve to 132 (unique) very massive galaxies at z = 0, which means that some galaxies in our final sample at 1 < z < 4 have the same descendant at z = 0.These descendants are called z = 0 descendants of our main sample, hereafter, and are shown as violet stars in the right panel of Fig. 1.We note that 51% of all z = 0 M * > 10 11 M with a progenitor galaxy at 1 < z < 4 in the images do not come from any of the galaxies in our initial sample of 248 1 < z < 4, M * > 10 10 M systems.The main progenitor at z > 1 in the images of those nearby massive galaxies presents a typical stellar mass log(M * /M ) = 9.12 9.65 8.48 and a typical redshift of  z = 2.713.42 2.01 (median values and first and third quartiles).
The 221 galaxies in the sample account for 28% of the total stellar mass present in their 132 z = 0 descendants, where the 22 galaxies at 3 < z < 4 account for 8.1% of the stellar mass of their descendants, the 94 galaxies at 2 < z < 3 for 23%, and the 105 galaxies at 1 < z < 2 for 33%.These descendants are predominantly Red Sequence galaxies (RS; approximately 73%), but there are also a few galaxies in the Green Valley (GV; 14%) and in the Blue Cloud (BC; 13%).In the case of the M * > 10 11 M descendants of progenitor galaxies at 1 < z < 4 which are not massive, their location in the diagram is similar, but with a higher fraction of them in the BC: 63% in the RS, 16% in the GV, and 21% in the BC.
The galaxies in our sample are projected into three 2.8 × 2.8 arcmin 2 area (S+17).The typical stellar mass, star-formation rate (SFR), redshift and stellar half-mass radius (i.e., the radius enclosing half of the total stellar mass of the galaxy; r hm ) of galaxies in the sample are log(M * ) = 10.4 10.7 10.1 M , SFR = 25 43 12 M /yr, z = 2.0 2.5 1.6 , and r hm = 4.7 5.8 3.6 kpc (median and quartiles), respectively.Fig. 2 shows the histograms of these properties for our sample of 221 galaxies.
In this section, we describe how the simulated deep survey images have been processed to obtain SEDs for all galaxies in our sample.We measure integrated-light photometry for each galaxy, and we also consider spatially resolved SEDs, all constructed with data in several HST and JWST filters.Apart from simulated images, we also use redshifts and physical properties extracted from the Illustris database.Throughout this paper, we refer to 'ground-truth' or 'reference' properties as those galaxy properties derived either from the catalogs associated to the S+17 mock images or from the Illustris database, in contrast to the properties obtained from SED fits to stellar population synthesis models for different regions in each galaxy, what we call 2D-SPS-derived properties.

Photometric broad-band filters
As described in Section 2.1, this work uses the Illustris-1 "mock ultra-deep fields" from S+17, each 2.8 arcmin in size.These synthetic deep survey images are available to be used in 34 broad-band filters onboard the HST/ACS and WFC3, JWST/NIRCam and MIRI, and Roman/WFI.For this work, we consider 15 HST and JWST broad-band filters in the optical and nearinfrared (see Table 1), which will be available for several cosmological fields covered by HST legacy projects such as the Cosmic Assembly Near-infrared Deep Extragalactic Legacy Survey fields (CANDELS; Grogin et al. 2011;Koekemoer et al. 2011), and JWST Guaranteed Time Observations (GTO), Cycle 1 Guest Observers (GO1) or Early Release Science (ERS) programs such as the MIRI Deep Survey (Norgaard-Nielsen & Perez-Gonzalez 2017), the JWST Advanced Deep Extragalactic Survey (JADES; Williams et al. 2018, Rieke et al. 2019), the Public Release IMaging for Extragalactic Research (PRIMER; Dunlop et al. 2021) or the Cosmic Evolution Early Release Science (CEERS; Finkelstein et al. 2017).The filterset covers the optical through mid-infrared observed spectral region, corresponding to rest-frame UV to near-infrared wavelengths at 1 < z < 4, an adequate range to study the emission from young and old stellar populations (i.e., providing information about the SFH).The survey images released by S+17 are noisefree synthetic images with different spatial resolution depending on the instrument and telescope used (see Table 1) and available with or without considering the point spread function (PSF) degradation for each filter.For this work, we use as starting images those without the PSF model applied, with pixel scales provided in Table 1. a Pixel size in original images.Final (matched) pixel size is 0.06 .b HST depths from the CANDELS/3D-HST catalogs (Skelton et al. 2014): median 5σ depth calculated from the errors of objects in the final catalogs (apertures of 0.7 ) for all 5 CANDELS fields.JWST depths corresponding to the CEERS proposal: values represent the planned 5σ point source depths per filter, assuming a total integration time of 2867 s for all NIRCam filters except for F115W (5734 s).c This limiting magnitude is not included in the CANDELS/3D-HST catalogs from Skelton et al. 2014.We assume the same value as F125W, since the magnitudes for this band are slightly fainter/brighter than those of F105W in the CANDELS survey.

Photometric Measurements
We process the original idealized (without PSF and noise) images for HST and JWST filters in order to imitate CANDELS data (Grogin et al. 2011;Koekemoer et al. 2011) and the recently-begun CEERS observations (Finkelstein et al. 2017), respectively.The reason for choosing the CEERS survey as a test case is that, although depths for subsequent JWST surveys are expected to be better, CEERS is expected to provide one of the first publicly available datasets for deep-field JWST observations.Cycle 1 GO / Treasury programs such as PRIMER are expected to achieve similar depths.
First, we match the pixel size of the images to that of HST/WFC3 (0.06 arcsec).Second, we convolve the image for each filter with the PSF model for the WFC3/F160W filter (FWHM ≈ 0.19 arcsec).Finally, we add Gaussian sky noise to the HST and JWST images to achieve the same depth as in the CANDELS and CEERS observations, respectively.We degrade HST images adding noise in order to match the median background rms noise measured around CANDELS massive galaxies (M * > 10 10 M ) at 1 < z < 4. For this calcula-tion, we use actual regions covered by CANDELS/Wide, CANDELS/Deep (Grogin et al. 2011, Koekemoer et al. 2011), and the Hubble Ultra Deep field (HUDF; Beckwith et al. 2006, Oesch et al. 2010a, Ellis et al. 2013, Koekemoer et al. 2013, Illingworth et al. 2013).For JWST images, the sky noise is estimated using the official JWST exposure time calculator Pandeia2 (Pontoppidan et al. 2016), by measuring the predicted signal-to-noise ratio (SNR) as a function of the pixels surface-brightness in CEERS observations for the selected JWST broad-band filters.The initial spatial resolution of the images for each filter (before registering) and final depths (after the noise addition) are given in Table 1.The limiting magnitude has been estimated (without aperture correction) from the sky rms value adopting a circular aperture with a fixed radius of 0.2 at a 5σ level.
We use these processed (registered, PSF-matched, and sky noise-added) deep survey images to measure photometry.To do that, we first generate a segmentation map by running SExtractor (Bertin & Arnouts 1996) on the WFC3/F160W image using the following parameters: we filter images with a gaussian kernel FWHM 3 pixels, set the minimum area for detections to 20 (a bit smaller than the PSF FWHM), the number of deblending sub-thresholds to the maximum (64), the contrast to 5 × 10 −4 , and local background to a size of at least 4 times the largest galaxy in our sample.The results did not vary much when changing the relevant parameters given that we are dealing with relatively bright galaxies.In general, there is good agreement between sources detected in the segmentation map and galaxy positions given by the Illustris catalogs based on the original images.Then, this segmentation map is combined with the information of the sample in the Illustris catalogs to create circular galaxy apertures that enclose the integrated emission from all galaxies in the sample (see 3.2.1 below).We choose to use galaxy positions given by the Illustris catalogs instead of the ones provided by SExtractor in order to make the future comparison between our results derived from 2D-photometry and from the simulated particles in the galaxy more fair (otherwise, the centers for the galaxy photometric apertures would be shifted from the galaxy centers given by the simulated particles).
We build galaxy SEDs by measuring multi-wavelength photometry in two different ways: inside these circular galaxy apertures, which enclose the entire flux of the galaxy (hereinafter "integrated photometry"), and for small parts of the galaxy defined after creating a grid inside this circular aperture with cell size equal to 3×3 pixels, roughly the area of the FWHM of the PSFhomogenized dataset ("2D photometry", hereafter).In both cases, the flux uncertainty is estimated as √ N × σ, where N is the number of pixels within the aperture and σ the rms of the sky for that filter.

Integrated Photometry
In order to define the integrated-photometry apertures, we use both the segmentation map and information extracted from the Illustris database, in particular, the galaxy centers and stellar half-mass radii.The apertures are first defined in the segmentation map around galaxy centers with an initial radius equal to twice the stellar half-mass radius (r hm ).We decided to use the r hm values from the Illustris database because many of the galaxy properties provided by these simulations refer to this radius (and to 2 × r hm ) and, typically, papers compare the results based on observations to what Illustris provides for 2 × r hm (see, e.g., Vogelsberger et al. 2014b, Sales et al. 2015, Cook et al. 2016, Elias et al. 2018, Valentino et al. 2020) .With the following procedure, we test how those apertures compare to what is directly measured in the simulated images.The initial radius is first reduced to minimize the contamination from other sources until 80% of the pixels within the aperture belong to the considered galaxy, i.e., we allow up to 20% of pixels from neighbors.If the number of sky pixels (defined as those not belonging to any source) within this new aperture is greater than 10%, we further reduce the radius to decrease that number below 10%.We refer to the final photometric aperture radius as r phot hereafter.We discard galaxies (from the initial selected sample of 248 galaxies) whose final aperture radius is r < 0.75×r hm .This showed to be effective in removing sources overdeblended by SExtractor and galaxies whose initial aperture (with r = 2×r hm ) is partially beyond the edges of the simulated images.We also discard galaxies which present surface brightness values fainter than 25 mag/arcsec 2 in WFC3/F160W in their brightest pixel within the final aperture.
Typically, the fraction of pixels within the final integrated photometric apertures belonging to other nearby galaxies is very low, less than 1% for 75% of the sample of 221 galaxies.The maximum percentage of pixels from neighbor galaxies within the integrated apertures is ∼20%, but only 27 out of 221 galaxies have more than 10% of pixels belonging to other sources.
In Fig. 3 (top panel) we show the histogram for the final radii of the apertures as a fraction of 2×r hm for the 221 galaxies in the final sample.This means we do not to compare with observations.The cumulative fraction of galaxies in shown as a red line.Bottom panel: histogram for the fraction of stellar mass enclosed by the photometric apertures (blue filled histogram) and for a radius of 2 × r hm (orange hatched) with respect to the total stellar mass in the galaxy.In both cases, we neglect neighboring galaxies.These total masses have been extracted from the simulated particles belonging to each galaxy in the Illustris database.
At the top of both panels, we show the median and quartiles for each histogram.
have apertures in this histogram at r/(2 × r hm ) < 0.375, due to the threshold imposed on the final apertures to be kept: r > 0.75 × r hm .We find the median and quartiles for r phot /(2 × r hm ) are 0.83 1.00 0.66 .It can be noticed that a considerable number of galaxies presents radii very similar or equal to the (starting) aperture radius of 2 × r hm : ∼ 40% and ∼ 37% of galaxies have r phot /(2 × r hm ) ≥ 0.95 and 0.999, respectively.On the contrary, only ∼ 8% of galaxies have aperture radii smaller than r hm .But assuming 2 × r hm as the best aperture to compare our results with simulations has demonstrated not to work for a significant number of galaxies.
The stellar masses enclosed by our final apertures, a fraction of the total stellar mass of each galaxy, can be seen in the bottom panel of Fig. 3.As a comparison, we also show the histogram for the fraction of stellar mass enclosed by a radius of 2 × r hm .For both radii, the masses have been extracted from the Illustris database, i.e., they are ground-truth masses calculated by adding up the simulated stellar particles belonging to each galaxy which are closer to the galaxy center than the radii considered.The typical percentages of the stellar masses enclosed by these two apertures are (median and quartiles) 65 72 57 % for the final photometric apertures and 75 80 71 % for a radius of 2 × r hm .These percentages refer to total stellar masses provided by the Illustris database or using all particles (in both cases, referring to the whole dark matter halo, i.e., they include regions with very low mass surface densities, whose emission is well below our observational limits).This procedure aims at reproducing that to be applied to actual HST+JWST observations, so we should be able to evaluate its impact on reproducing ground-truth properties.
To measure the integrated flux for each galaxy, we only consider pixels whose center lies within the final circular aperture and which do not belong to other sources (according to the segmentation map).The fluxes of these pixels within the aperture (considered galaxy and sky pixels) are added together to build the integrated SED.
In Table 2 we show the typical integrated magnitudes (median and quartiles) of our 221 galaxies in several filters and in different redshift bins.
Table 2. Integrated magnitudes of our 1 < z < 4 sample Note: Median values, first and third quartiles of the integrated magnitude measured in different redshift bins and for several bands.

2D Photometry
Separately, we also define a grid inside the circular aperture with cell size equal to 3×3 pixels (0.18 ×0.18 ) and we measure SEDs for each of these spatial resolution elements.We only consider cells inside the aperture which have at least one pixel that belongs to the galaxy we are measuring.If there are any pixels in the cells belonging to other galaxies (according to the segmentation map), the value of these pixels is replaced by sky pixels by adopting the same Gaussian sky noise distribution previously added to the images.Additionally, we only keep SEDs for our analysis from cells with a SNR > 3 in at least 5 bands and with surface brightness brighter than 25 mag/arcsec 2 in WFC3/F160W.Cells that do not satisfy these conditions are discarded.The reason for imposing this surface brightness limit of 25 mag/arcsec 2 is to deal with the problem of the larger-than-observed galaxy sizes in Illustris galaxies with M * 10 10.7 M , which present larger half-light radii and more extended discs than real observed galaxies (Snyder et al. 2015).
Fig. 4 shows an example of the integrated aperture and the grid for one of the galaxies in the sample.We also include in this figure the segmentation map around this galaxy and the initial integrated aperture of r = 2 × r hm (dotted line).We show the WFC3/F160W image for this galaxy before and after replacing the values of the pixels of nearby galaxies by random values drawn from the same Gaussian sky noise distribution previously used to add the noise in this image.The initial grid covers all the region within the final integrated aperture, but we only keep cells in cyan for the stellar population analysis, as described in the previous paragraph.
The typical number of cells per galaxy decreases as we move to higher redshifts, as expected given the smaller size of high-redshift galaxies (e.g., Bouwens et al. 2004, Oesch et al. 2010b, Ono et al. 2013).Our galaxies present (median and quartiles) 98 136 69 cells at 1 < z < 2, 52 86 37 at 2 < z < 3, and 24 29 21 at 3 < z < 4. We also calculate the median SNR of the galaxy cells included in the analysis in different redshift bins, by calculating the SNR as the ratio between the measured flux and its uncertainty in these cells.Median

Redshifts
Galaxy redshifts are taken from the Illustris database.Specifically, we use the 'inferred redshift' values in the catalogs that were used to generate the simulated images in S+17.This redshift value corresponds to the inferred cosmological redshift obtained considering the contribution of both the true cosmological redshift and the galaxy peculiar velocity.A discussion about the possible differences in the derived stellar population properties due to uncertainties in the redshifts of the corresponding observed galaxies is beyond the scope of this paper.Although photo-z uncertainties can be substantial, it will be likely that future spectroscopic redshift samples from both JWST and the Atacama Large Millimeter/submillimeter Array (ALMA) will increase the fraction of galaxies with spectroscopic redshifts and, at the same time, improve the photo-z accuracy at such high redshifts.

WFC3/F160W
PSF Figure 4. Postage stamp images (size 2.5 × 2.5 ) for one of our galaxies: Illustris-1 066 0000006 at z = 2.192 (Field A of S+17 images).This galaxy code stands for galaxy id 6 and snapshot 66 (z = 2.21) in the Illustris-1 simulation.Top panel: Segmentation map, with colored regions delimiting the galaxy (in orange), three nearby galaxies (green, yellow, and red), and sky pixels (black).We also show the integrated photometric aperture (dark cyan solid line) and the starting aperture of r = 2 × r hm used to calculate the former (dotted line).Inside the integrated aperture, we show the grid where the 2D photometry is measured: the cells that are finally kept for the 2D-SPS analysis are shown in cyan, cells discarded for not fulfilling the SNR and surface brightness criterion as pink hatched squares, and cells discarded for not including any pixel from the considered galaxy as white hatched squares.Bottom left panel: WFC3/F160W image (already registered, PSF-matched, and sky noise-added) with the integrated aperture.Bottom right: WFC3/F160W image where the values of the pixels from nearby galaxies have been masked by replacing them with the same Gaussian sky noise distribution previously added to the images.We also show the integrated aperture and the grid.
Fig. 5 shows the stellar mass vs. redshift plot for our galaxies.Two different values of stellar masses are shown for each galaxy: considering all particles in the galaxy or only particles inside twice the stellar halfmass radius.The median values and quartiles for these mass measurements, in log(M * /M ), are 10.43 10.73  10.17 and 10.30 10.63  10.05 , respectively, and 2.03 2.47 1.57 for the redshifts.

Ground-truth Physical Properties of each Galaxy
We use the Illustris database to extract the information for all the simulated particles belonging to a galaxy.In particular, we extract total stellar masses and SFRs of the whole galaxy.These total stellar masses and SFRs values are the ones shown in Fig. 1.
We build ground-truth galaxy SFHs from the information of the individual particles that belong to each galaxy.The SFH can be computed by first loading from the database the stellar particles in the galaxy and then making a histogram of their formation ages in lookback time.We assume time bins for the formation ages of 25 Myr up to a formation age of 1 Gyr, and 250 Myr afterwards.Subsequently, the stellar mass formed per time interval is calculated by summing the masses of the stellar particles formed in each time bin.We will also use in the following sections an additional SFR estimation for each galaxy obtained from the SFH and calculated by loading the gas particles in the galaxy for the snapshot corresponding to the observed redshift and adding together their instantaneous SFR values.Since these ground-truth SFHs will be compared with those derived from the SED-fitting in Section 5, in order to make the comparison more fair, we only consider stellar and gas particles which are closer to the galaxy center than the radius of the photometric aperture.Still, our SFHs will be naturally affected by the limitation inherent to detection and photometry procedures.

ESTIMATION OF THE SFH FROM 2D SED FITTING
Our aim is to study the earliest formation phases of nearby massive galaxies by analyzing in detail the formation history and location of the star formation in massive galaxy progenitors at high redshift.For that purpose, our approach consists in using broad-band data covering the (observed-frame) optical-to-mid-infrared spectral range provided by HST and JWST.In this paper, we assess the robustness of the results derived from this type of realistic but naturally simplified analysis using simulated imaging data from Illustris (see Section 3).As described in the previous section, we build SEDs for each source for both the integrated emission as well as in a 2D grid.In this section, we describe the derivation of integrated SFHs for each galaxy based on SPS modeling for these SEDs.We then analyze how successfully we can recover the ground-truth SFHs when only broad-band HST and JWST photometric data is used.

Stellar Populations Synthesis Modeling
The integrated and grid SEDs are compared with stellar populations models of Starburst99 (Leitherer et al. 1999), assuming a Kroupa (2001) IMF and a Calzetti et al. (2000) attenuation law.To perform these fits, we use the synthesizer code described in Pérez-González et al. (2003, 2008).This code combines the emission of both stars and gas, compares the models with the observed data, and returns the model that best fits the data by performing a χ 2 minimization.
We assume a double-burst delayed exponential SFH, with each burst described by SF R(t) ∝ t • e −t/τ for t > 0 up to t = t burst , being τ the star formation timescale and t burst the age of the burst.The age of the burst must be understood as the time passed between the age of the Universe when the galaxy started to form stars in that burst and the time corresponding to the observed redshift.The SFH form was chosen after some testing which indicated that 2 bursts following a delayed-exponential SFH instead of other simpler parametrizations (e.g., one single exponential burst) as a more adequate parametrization to successfully reproduce the ground-truth SFHs given by Illustris.Several tests were performed assuming only one time-delayed exponential with different values for τ and the minimum age of the population, but all of them resulted in the 2D-SED fits selecting too young (recent) ages for the models which were unable to recover the ground-truth galaxy SFH.We will discuss this further in Section 5.
The free parameters in our SED fits are the age (t), star-formation timescale (τ ), extinction (A V ), and metallicity (Z) for the two stellar populations, in ad-dition to the burst strength (b).This burst strength describes the fraction of the total stellar mass that has been created by the most recent burst.We consider an old burst as the one occurring first in the galaxy formation history and a young burst as the one occurring closer to the age of the Universe corresponding to the observed redshift.The stellar mass (M * ) is derived from the SED fits by normalizing the best-fitting model (which provides mass-to-light ratios at all wavelengths) to the observed photometry.
In order to improve the resemblance of the estimated SFHs to the ground-truth SFHs built from simulated particles in the simulation, we found that the allowed age ranges for the old and young stellar populations must depend on the galaxy redshift.In other words, the age frontier between the two stellar populations is an important parameter to set a priori based on the galaxy redshift.We thus tested the dependence of our results on this age separation limit (age lim in Table 3), considering values as a function of the age of the Universe for a given redshift.The most accurate results in our analysis (considering the mass-fraction formation ages discussed in Section 5) are obtained when we impose an age limit between the 2 populations equal to 40% of the age of the Universe for galaxy redshifts at 1 < z < 2 and to 50% of the age of the Universe for 2 < z < 4 (i.e., 40% of the age of the Universe for the young population and 60% for the old population when 1 < z < 2, and 50% of the age of the Universe for both populations when 2 < z < 4).
Similarly, we tested different values for the star formation timescales of both bursts and found the best results are obtained when setting 200 < τ old < 1000 Myr, and τ young = 10 Myr.Table 3 shows the ranges within which each free parameter is allowed to vary in the SEDfittings of this work.
It is impossible to present the results of the full set of tests performed to explore the parameter space here, so instead we focus on those that yield the best match and will be the most useful for upcoming JWST observations.
The metallicity is allowed to adopt three different values: Z/Z = 0.2, 0.4, and 1.These values are expected for our sample according to the mass-metallicity relationship at the considered redshifts, at which this relationship shows lower metallicities than locally for a given mass (Erb et al. 2006;Maiolino et al. 2008;Mannucci et al. 2009;Zahid et al. 2011).Regarding the dust attenuation, we limit the attenuation range to values within 0 < A V,old < 1 mag for the old population, and within 0 < A V,young < 2 for the young population.a The age separation limit between the young and old populations, is measured (backwards) from the redshift of observation of the galaxy.This value depends on the redshift of the galaxy: it is set to 40% of the age of the Universe for 1 < z < 2 and to 50% of the age of the Universe for 2 < z < 4. b We use all the discrete values for the ages given by the SB99 models within the allowed range.c The maximum allowed value for the age of the old population is the age of the Universe at the galaxy redshift.
Fig. 6 shows an example of an integrated SED fit and the values of the model that best fits our data.For this particular galaxy, both the old and young stellar populations contribute at the 40%-60% level throughout the full spectral range.This analysis is performed for the 221 galaxies: both for their integrated measurements and their 2D emission (given by the cells in the grid).
To estimate uncertainties in the derived stellar population properties, Monte Carlo (MC) simulations are performed in synthesizer by allowing the photometric data to normally vary within their photometric uncertainty (without correlation), and then repeating the fit again for 300 resampled SEDs (more details in, e.g., Domínguez Sánchez et al. 2016).This results in 300 sets of solutions for each SED which also provide us with information about the typical degeneracies present in these kinds of studies (e.g., age-metallicity or τ -age degeneracy).Fig. 7 shows the stellar masses for our 1 < z < 4 sample derived from the integrated SED-fits vs. their ground-truth values and how the one-to-one relation is recovered, with a median offset of 0.04 dex and a scatter of 0.2 dex.Each of the SPS-derived galaxy masses has been calculated as the median stellar mass provided by the integrated SED fits.The ground-truth masses have been calculated from the database by considering only the simulated particles of each galaxy closer to the galaxy center than the radius of the integrated photometric aperture.In both cases, stellar masses correspond to the current mass of stars at the redshift of observation (i.e., calculated after tak- ing into account the time-dependent mass-loss and fraction of remnants as a function of time).The histograms for both distributions are also shown.We notice our SPS-derived stellar masses slightly overestimate the ground-truth masses, although the differences are small: log(M * /M ) = 10.33 10.80 9.96 for the integrated SED-fits vs. 10.21 10.72  9.92 for the stellar particles (median and 68% confidence interval).
Several effects could be responsible for the dispersion observed around the one-to-one relation between our integrated vs. ground-truth masses.One of these effects is due to the difference between the volume of the sphere used to calculate the ground-truth masses and that of the cylinder along the line of sight which would enclose all the stars contributing to photometry.This makes that, depending on the shape of the galaxy, when measuring photometry we may include the light from stellar particles in the line of sight located at large galactocentric distances (not included in spheri- cal aperture used to calculate the ground-truth galaxy mass).In fact, this is what happens for the galaxy with an integrated mass that differs in ∼1.5 dex in Fig. 7, which is being outshined by an ultra-massive galaxy of log(M ground-truth * /M ) = 11.47 located at only 0.1 from its center.Another effect which can cause dispersion in the recovered integrated mass is the presence of nearby neighbors and their identification in the segmentation map.Even though the light of these neighbors inside the photometric aperture is usually (but not perfectly) blocked by the segmentation map, so is the light of the considered galaxy inside this masked region.However, for the calculation of the ground-truth galaxy mass, we do take into account these stellar particles (from the considered galaxy) located in this region of the sphere.

Estimating SFHs from HST+JWST Photometry
Integrated SFHs are built for each galaxy from the 2D SED-fits, and compared with the ground-truth SFHs extracted from the Illustris database (see Section 3.4).We build 300 integrated SFHs for each galaxy using the 2D stellar population fits, which includes a MC method to estimate uncertainties and degeneracies.We first create one SFH for each resolution element in the galaxy grid.Subsequently, one global SFH is calculated by adding together all these SFHs in the grid.This procedure is then repeated 300 times, one for each of the MC particles presented in the previous section.By combining and adding the 300 SED-fits for each cell in the grid, we obtain 300 independent estimations of the integrated SFH for a given galaxy.
The resulting SFHs are smoothed using a 100 Myr square kernel.Then, we normalize these 300 galaxy SFHs with the median stellar mass of the galaxy derived from the 300 integrated SED-fits.Unless otherwise stated, these galaxy stellar masses refer to the galaxy masses at the time of observation without including remnants or yields.Finally, we calculate the median SFH of the galaxy from these 300 normalized SFHs.
In Fig. 8, we summarize the methodology followed for a galaxy to obtain this median 2D-SPS galaxy SFH from its 2D photometry measurements.We start from the HST+JWST data (RGB image and some individual bands shown on the figure), in our example, a galaxy presenting a red center which resembles a protobulge and what seems like a blue disk with some spiral structure.We measure integrated and 2D photometry in a grid.We show SEDs and their stellar population modeling results for some representative grid regions in the center, blue arm on the bottom left and a diffuse emission zone on the top right.We note the difference in color shown in the SEDs between the center and the disk, and how the integrated SED resembles more that of the galaxy center rather than the disk regions.In this regard, considering 2D photometry in our analysis facilitates the estimation of SFHs since small parts of a galaxy are expected to have more simple forms than the entire galaxy, whose integrated photometry is dominated by the regions with highest intensities, i.e., they overshine fainter regions located in the outskirts for this example galaxy.The individual SFHs for each grid region are added to obtain an integrated SFH.In this example, we demonstrate how the derived SFHs with our method nicely reproduce the ground-truth SFH, significantly improving what can be obtained by analyzing the integrated SED.In the following section, we compare our derived SFHs from our 2D analysis of broad-band imaging data with the reference SFHs built using the Illustris-1 database in a statistical way.First, we show an RGB image of the galaxy and some of the HST+JWST broad-band images (in Table 1) processed to imitate CANDELS and CEERS observations for HST and JWST filters, respectively.We measure the 2D photometry inside a grid (in cyan) with cell size equal to the spatial resolution element, in addition to the integrated photometry (blue aperture).As an example, we show the SEDs measured for three regions of the grid: the center of the galaxy (teal cell), an arm region (light green), and a diffuse emission zone (pink).To estimate the uncertainties and degeneracies in the derived stellar populations, each SED is fitted 300 times by performing MC simulations.We add the individual SFHs inferred from all the grid regions to obtain the SFH for the whole galaxy.The black thin SFHs in the bottom subfigure show the SFHs for the whole galaxy created by accounting for the uncertainties in the SFHs of each grid region, where the median galaxy SFH from this 2D-SPS method is shown in yellow.The blue dotted SFH is the galaxy SFH inferred from the integrated photometry.The ground-truth SFH of the galaxy, given by the stellar particles belonging to the galaxy, is the green solid SFH.

VALIDATION OF THE METHOD
In this section, we evaluate the levels of success of our method in recovering the SFHs of massive galaxies at high redshift.In particular, in this paper we are mainly interested in the earliest phases of massive galaxy formation.Therefore, we compare our results based on the analysis of broad-band HST and JWST images with the ground-truth provided by the Illustris database for full galaxies.We will analyze spatially resolved stellar population properties in a future paper, here we concentrate on the results about the SFHs.

Characterization of the Earliest Phases in the Formation of Massive Galaxies
In order to characterize when massive galaxies start their formation, we calculate the formation times and redshifts when galaxies have formed a given (small) fraction of their total stellar mass.In particular, we discuss here the formation times when the first 5%, 10% and 25% of the stars in each galaxy and the entire sample were formed.We define t k as the formation time (measured from the galaxy redshift) at which a galaxy formed the k% of its total stellar mass present at the galaxy redshift.These mass-fraction formation times, t k , can be directly computed by integrating the 2D-SED-derived galaxy SFHs over cosmic time and are similar to other parameters used in recent literature (see, e.g., Ji & Giavalisco 2022).We remark that t k depend on both t 0 and τ , so they might somehow alleviate possible degeneracies between ages and timescales.We also calculate mass-weighted ages, t mw , from the SFHs.To test their robustness, these t 5 , t 10 , t 25 , and t mw extracted from the SFHs derived with our 2D-SPS method are compared with their ground-truth values, which are obtained from the SFHs of the simulation stellar particles following the same procedure and enclosed by the photometric aperture.
Fig. 9 shows t 5 , t 10 , t 25 , and t mw calculated from the SFHs derived with 2D stellar population modeling of broad-band data versus their reference values.For each galaxy, we show 300 values of these quantities (vertically spread out), which correspond to the 300 MC particles or 2D-SPS galaxy SFH as described in Section 4.2.The black dotted line shows the one-to-one relation between values derived from our 2D-SPS method (output) and those from the simulated particles (ground-truth).As explained in Section 4.1, some a priori parameters of our method (e.g.Table 3) have been optimized to reproduce this one-to-one relation for these four mass-fraction formation times.In general, we find our ages are consistent with this relation at all redshifts.As an inset in each panel, we show the histogram of relative differences be-tween median and ground-truth values for all galaxies.We find our t 5 , t 10 , t 25 , and t mw values are consistent with the ground-truth values with a median (relative) offset of +71 (+4.4%), +16 (+1.8%), -2 (-0.1%), and -50 Myr (+5.2%), and a scatter (68% interval) of 0.33, 0.31, 0.27, and 0.17 Gyr, respectively.As commented in Section 4.1, when we consider only a single-population SFH roughly spanning the whole range of parameters (t, τ , A V , Z) of the two-population fitting, too young ages are assigned in the SED-fits, which leads to an underestimation of all the ground-truth formation times.Thus, when we assume a SFH given by only one population instead of two, these median offsets are never better than -0.9, -0.8, -0.5, and -0.4 Gyr for t 5 , t 10 , t 25 , and t mw , respectively.
We have included on the top of each panel the ratio between the t k of our 2D-SPS method and their groundtruth values.In general, we do not see any systematic effect as a function of redshift, but we do find our t 2D-SPS 25 and t 2D-SPS mw tend to underestimate ground-truth in some galaxies as the ground-truth t k increases: t 25 (t mw ) presents a median systematic offset of only ∼ 5% (−2%) with respect to ground-truth for lookback times younger than 2 (1.5) Gyr, but ∼ −13% (−16%) for older lookback times.We do not see this behaviour in the case of t 5 and t 10 , which present a median systematic offset of less than ∼ 6% for ages younger and older than 2 Gyr.A possible interpretation for this bias in t 25 and t mw observed for some low-redshift galaxies could have its origin in the double-burst SFH model assumed for the 2D-SPS analysis.This model causes the galaxy SFH to usually rise quickly in later epochs due to the presence of the young population burst (see the median 2D-SPS galaxy SFH in Fig. 8).As a consequence of this late and rapid increase in the SFR, the middle of the SFH necessarily presents lower SFR in order to reproduce the given final light or mass, which would produce a bias when deriving those formation times of the galaxy that are more sensitive to this middle part of the SFH, i.e., t 25 and t mw .Since low-redshift galaxies have a more extended and possibly more complex SFH than galaxies at high-redshift, this effect would be more noticeable in them.
Regarding the precision of our method, we find the scatter values are relatively small and similar for all t k , but we observe the scatter tends to increase from t 5 to t 25 .Taking all this into account, we conclude that our 2D-SPS method successfully recovers t 5 , t 10 , t 25 and t mw for 1 < z < 4 massive galaxies with a ∼25% uncertainty and small <3% systematic effects.Comparison between the mass-fraction formation times (measured backwards from the redshift of observation of the galaxy) of our 2D-SPS method and their ground-truth values: t5 (upper-left), t10 (upper-right), t25 (lower-left) and tmw (lower-right) calculated from our 2D-SPS-derived SFHs versus their reference values calculated from the SFHs built from the simulated stellar particles in galaxies.Each galaxy is represented by 300 vertically spread points, which correspond to the mass-fraction formation times of the 300 2D-SPS SFHs, built from the 300 MC particles in each resolution element in the grid.The median of these values for each galaxy is shown as bigger circles and error bars represent the standard deviation of these values (68% interval).All points are color-coded by galaxy redshift.On the top of each panel, we show the ratio between the t k of our 2D-SPS method and their ground-truth values as a function of the latter.As an inset, we include the offsets of the galaxies in the sample.

EXPECTATIONS FOR THE DERIVATION OF THE SFH OF Z > 1 MASSIVE GALAXIES WITH HST+JWST DATA
The aim of this section is to discuss when the early stages of stellar mass assembly took place in very massive galaxies.To do this, we apply stellar population synthesis in 2D on our galaxy sample at 1 < z < 4 and we compare the statistical results on the SFHs for the sample with the ground-truth values inferred from the simulation stellar particles.This comparison gives us information about the limitations and observational biases we will encounter when using this method on galaxy samples constructed with real data.

When Did Massive Galaxies Begin to Form?
In Section 5, we showed that our 2D-SPS method successfully recovers t 5 , t 10 , and t 25 for our galaxy sample.These quantities represent the lookback times at which galaxies reach 5%, 10%, 25% of the stellar mass formed, respectively, and can be used to estimate the beginning of star formation in galaxies.To address the question of when the population of very massive galaxies began to form, our approach consists in calculating t 5 , t 10 , and t 25 from a median SFH of the sample built from the 2D-SED fits.At this point, we should remind the reader that our sample consists of galaxies at 1 < z < 4 with M * > 10 10 M , all having very massive descendants (M * > 10 11 M ) at z = 0. Hence, we expect the median SFH from our 2D-SPS modeling to resemble the median SFH of very massive galaxies at z = 0 over the redshift interval of our sample (where both SFHs overlap).We say resemble because our sample limitations might have implications on the derived SFHs, i.e., we have to consider the systematic effects introduced by not taking into account lower mass galaxies and by the relative number of galaxies selected at different redshifts in our full interval.
Since the final goal of our 2D-SPS method is to be applied on real galaxy samples at high redshift as soon as JWST data are available, we need to check first whether our results from the 2D-SED fits regarding the first episodes of stellar assembly in very massive galaxies are compatible with the ground-truth of the simulation.To address this issue, we build two additional samples of galaxies at z = 0 in the simulation to be compared with our main 1 < z < 4 galaxy sample.For these initial three samples of galaxies, we build the typical (median) ground-truth SFH from the simulated particles in their galaxies, which will be compared with the typical SFH from the 2D-SPS analysis on the 1 < z < 4 studied sample of massive progenitors.The samples of galaxies in the Illustris-1 simulation considered in this section are: (1) Our main galaxy sample at 1 < z < 4 used for the 2D-SPS analysis.It consists of the 221 (out of the 248) M * > 10 10 M progenitors at 1 < z < 4 in the S+17 images with a very massive descendant at z = 0, and which have not been discarded during the analysis procedure (see Section 2.2).This is the "high-redshift sample of (massive) progenitors" or "main sample", hereafter.
(2) The descendants at z = 0 of our main high-redshift galaxy sample, all of them with M * > 10 11 M (132 descendants).This sample is a subset of the whole population of very massive galaxies at z = 0.
(3) The whole population of very massive galaxies (M * > 10 11 M ) at z = 0.This sample is composed by 856 galaxies.
Although the main sample of this study is composed of massive progenitors of local M * > 10 11 M galaxies, our selection criteria could not be applied as such to real observations where the estimation of the final mass at z = 0 of observed high-redshift galaxies is not easy without additional assumptions.To study the impact of this, we apply the same 2D-SPS method described for our main sample of 1 < z < 4 progenitors to the following fourth, additional sample: (4) All M * > 10 10 M galaxies at 1 < z < 4 in the S+17 images, regardless of the stellar mass of their descendants at z = 0.It consists of the 350 (out of the 388) massive galaxies at 1 < z < 4 in the images that are not discarded when the 2D-SPS method is applied to them.This sample is a combination of our main galaxy sample of 221 massive progenitors plus other 129 massive galaxies at 1 < z < 4 in the images that do not reach M * > 10 11 M at z = 0 for any reason.
The median SFH of this sample of 1 < z < 4 massive galaxies built from the 2D-SPS analysis, and also the ground-truth median SFH from the simulated particles in each galaxy, will also be compared to the results of our main sample of 1 < z < 4 progenitors.
We remind the reader that the S+17 images only contain a limited number of galaxies from the full Illustris-1 simulation.Thus, the descendants at z = 0 of the galaxies we have studied through their HST+JWST simulated imaging data are also a limited subset of all the galaxies at z = 0 in the simulation.Our aim is to see if, by analyzing the SFHs for high-redshift, M * > 10 10 M progenitor galaxies in the images of z = 0 very massive galaxies, we can learn when the whole population of very massive galaxies began to form.
If we consider the masses of all the simulated stellar particles in galaxies, our main 1 < z < 4 sample of 221 galaxies accounts for only 28% of the total stellar mass present in their 132 z = 0 descendants, of which 1 < z < 2, 2 < z < 3, and 3 < z < 4 galaxies in the sample would account for 33%, 23%, and 8.1% of the stellar mass in their descendants, respectively.If we also take into account the total stellar masses of less massive progenitors at 1 < z < 4 in the images, none of them considered by the mass cutoff, this number raises to 32% (36% for galaxies at 1 < z < 2, 27% for 2 < z < 3, and 11% for 3 < z < 4).This means that the remaining 68% of the stellar mass in nearby massive galaxies must be explained by either more recent in-situ star formation events or by subsequent mergers at z < 1.We refer the reader to Section 2.2 for more information about the main 1 < z < 4 galaxy sample and their z = 0 descendants.
In Fig. 10 we compare the histograms for the groundtruth stellar masses of galaxies in the four samples, calculated from the simulated particles in the database.On different panels we show the histograms for the stellar masses corresponding to all particles in galaxies and to particles inside 2×r hm .In both cases, the distribution of the stellar masses for our 1 < z < 4 sample of massive progenitors resembles that of all massive galaxies at 1 < z < 4, although the former is biased towards higher masses: the median and 68% interval are log(M * /M ) = 10.4 11.0 10.1 vs. 10.3 10.8 10.1 when all particles are considered, and 10.3 10.8  10.0 vs. 10.2 10.6 10.0 for particles inside 2×r hm , respectively.Regarding the two samples of z = 0 galaxies, we find that the distribution of masses for the population of very massive galaxies at z = 0 and for the specific descendants are very similar: log(M * /M ) = 11.3 11.7  11.1 vs. 11.2 11.6 11.1 (all particles), and 11.1 11.5  11.0 vs. 11.1 11.4 11.0 (particles inside 2×r hm ).As discussed below in this section, this difference in the median masses is a consequence of the mass-cutoff imposed for the selection of the 1 < z < 4 sample and will have an impact on the mass-fraction formation times of both samples.In summary, we conclude that our limited sample of progenitors would indeed provide representative results for the full population of massive galaxies at z = 0.
In Section 6.1.1,we calculate the median ground-truth SFH for these subsets of galaxies using only the simulated particles in them, while in subsection 6.1.2we compare these results with the median SFH of our main high-redshift sample of M * > 10 10 M precursors built using our 2D-SPS method.Finally, in subsection 6.1.3we show the variation of the onset of star formation of massive galaxies.when considering all particles in galaxies (top panel) or only particles inside twice the stellar half-mass radius (bottom panel).We show our main 2D-SPS sample of progenitors at 1 < z < 4 in red, massive galaxies at 1 < z < 4 in cyan, the descendants at z = 0 of our 1 < z < 4 progenitors in orange, and the whole population of very massive galaxies at z = 0 in dark gray.The median and the 68% intervals are shown at the top of each panel.
6.1.1.Ground-truth SFHs: very Massive z = 0 Galaxies in Illustris and the 1 < z < 4 Sample Fig. 11 shows the sample-averaged (median) SFHs built from the Illustris simulated particles for the four different subsets of galaxies considered: the 1 < z < 4 main sample of 1 < z < 4 progenitors (blue line), massive galaxies at 1 < z < 4 (cyan), the descendants at z = 0 of our main sample of 1 < z < 4 progenitors (orange lines), and all massive galaxies at z = 0 in Illustris-1 (green lines).To study whether there is any aperture effect on the SFHs, the SFHs of the two subsamples of z = 0 galaxies have been built by selecting simulated particles at z = 0 in two different ways: either by considering all the stellar particles belonging to each galaxy (dotted lines), or only the stellar particles inside a sphere with radius 2×r hm (measured at z = 0; solid lines).In the case of the two high-redshift samples, the radius of each sphere has been set to match the radius of the photometric aperture used for the 2D-SED fits, r phot , and only particles within this radius have been considered.These typical SFHs have been smoothed using a 250 Myr square kernel.Shaded areas correspond to the uncertainty in the median SFH calculated as the 95% 1 < z < 4, M * >10 10 M sample, with M * >10 11 M desc.(part. in r phot ) 1 < z < 4, M * >10 10 M galaxies, with any descendant (part. in r phot ) Descendants at z = 0 of main 1 < z < 4 sample (all particles) Descendants at z = 0 of main 1 < z < 4 sample (particles in 2•r hm ) All galaxies at z = 0 with M * >10 11 M (all particles) All galaxies at z = 0 with M * >10 11 M (particles in 2•r hm ) 1 1.5 2 3 4 5 9 16 z Figure 11.Ground-truth median SFHs calculated from the simulated particles for the subsamples of galaxies: all M * > 10 11 M galaxies at z = 0 (green lines), all massive galaxies at 1 < z < 4 (cyan), our main 1 < z < 4 progenitors sample (blue), and the descendants at z = 0 of this high-redshift sample of progenitors (orange).The results from the 2D-SPS analysis are not included in this figure.The median SFHs for the z = 0 subsets have been calculated in two ways: considering all the particles in galaxies (dotted lines) or only the particles inside a sphere with radius 2×r hm to the galaxy center (solid).The median SFH of the two highredshift samples has been built using only particles whose distance to the galaxy center is lower than the radius of the photometric aperture used for each galaxy.Shaded areas represent the uncertainty of the median.As vertical lines, we show the t5, t10, t25, and tmw calculated for each SFH shown.As a comparison, we show in purple the median SFH calculated in Iyer et al. (2020) for all galaxies at z = 0 in the Illustris simulation with M * ∼ 10 11 M .confidence interval: t N −1 ×σ/ √ N , where t is the t value from the t-distribution for 95% confidence, σ has been estimated from the dispersion of the different SFHs, and N is the number of galaxies at each age.We only show median SFHs of the galaxies at z = 0 down to z = 1, the lower limit of our main 1 < z < 4 sample.Additionally, we point out that the typical SFH of both samples at 1 < z < 4 has been calculated from different numbers of galaxies at different ages (what we would be able to perform when working with real data), in contrast to the median SFHs of the other two z = 0 samples, which count with the same number of galaxies in the entire redshift range.
We remark that no normalization has been applied to the median SFHs shown in Fig. 11.Regarding the different absolute levels of these SFHs, these are a consequence of several factors.First, the difference in SFR values between the dotted lines with respect to the solid lines of the same color, corresponding to the z=0 samples (green and orange) can be explained by the construction method of the SFH followed in each case: SFHs which have been built using all particles present generally higher SFRs than those built using only particles in 2×r hm .This is expected, since both for the population of z = 0 10 11 M galaxies and for the z = 0 descendants of the main 1 < z < 4 sample, the stellar mass inferred from the typical SFHs of particles within 2×r hm (i.e., the area enclosed by the SFH) is ∼ 66% of the stellar mass inferred from the typical SFH built from all the particles in galaxies.We will address the difference between the median SFH of both z = 0 samples below in this section.Secondly, the average SFR values from the median SFH calculated for the z = 0 samples (green and orange) is systematically higher than for the 1 < z < 4 samples (blue and cyan).This difference is mainly due to the fact that we calculate the SFH for each z = 0 galaxy from their particles at z = 0 instead of computing it after tracing and independently considering the actual precursors of the z = 0 galaxy at high-redshift (unfeasible in real observations).As a consequence, the average SFR per z = 0 galaxy is bigger than the average SFR per precursor, due to the lower number of galaxies considered at z = 0.In the case of the difference between the median SFH of our main 1 < z < 4 sample (blue) and their descendants at z = 0 (solid orange for particles in 2×r hm ), there is an additional contribution based on the fact that some of the stellar mass of the z = 0 descendants comes from other less massive progenitors which are not included in our main 1 < z < 4 sample of massive progenitors due to our mass cut-off.In addition to this, the different apertures considered for both 1 < z < 4 samples with respect to the other z = 0 samples also plays a role in lowering the average SFR of these two high-redshift samples: the radii of the photometric apertures used to build the sphere that contains the simulated particles in these high-redshift samples are usually smaller than 2×r hm measured at the observed redshift, as explained in Section 3.2, and should be smaller than 2×r hm measured at z = 0.In fact, we remind the reader that the typical photometric aperture radius for our main 1 < z < 4 sample of progenitors is nearly 20% smaller than 2×r hm and encloses around 65% of the total stellar mass in the galaxies (see Fig. 3).When all massive 1 < z < 4 galaxies are considered, these numbers are similar: the photometric aper-ture has a median radius 19.8% smaller than 2×r hm and includes ∼64.8% of the stellar mass.
If we integrate the median SFH of both high-redshift samples over cosmic time, the difference in their SFR levels leads to our main 1 < z < 4 sample (in blue) recovering a stellar mass of 0.17 dex higher than for all 1 < z < 4 massive galaxies (cyan).This is a likely consequence of our main 1 < z < 4 sample containing only massive progenitors of local 10 11 M galaxies, and these progenitors, in order to reach such elevated stellar masses at z = 0, are expected to be more massive (in average) and have higher SFRs than ordinary massive galaxies at the sample redshift range.In fact, our main 1 < z < 4 progenitors sample has a median stellar mass 0.08 dex higher than all 1 < z < 4 massive galaxies (for particles inside r phot ) and a SFR (in r phot , median and quartiles) of 16 31 7 vs. 11 23 5 M /yr for all massive galaxies at 1 < z < 4.
In Fig. 11, we also show with vertical lines the massfraction formation times t 5 , t 10 , t 25 , and t mw calculated by integrating each sample-averaged SFH over cosmic time.As mentioned before, we only take into account for these calculations the stellar mass formed in each SFH at ages of the Universe below that of the minimum redshift of our main 1 < z < 4 sample (z ∼ 1), as at lower redshifts we lack information on the photometric properties of any of the galaxies in this sampleaveraged SFH.This means we only consider the z > 1 part of the median SFHs for the z = 0 galaxy subsets and the whole SFH for the high-redshift samples.We find the mass-fraction formation times for the SFH of the high-redshift samples (blue and cyan) are systematically shifted towards earlier times (expressed in terms of the age of the Universe; or correspond to older lookback times), than the ones for the descendants at z = 0 of our main sample (orange), and these, in turn, are younger than those from the whole population of massive galaxies at z = 0 (green).These ∼200-400 Myr shifts (∼ 10% in relative terms) could be explained by a progenitor bias, which appears as a consequence of the mass cutoff of M * > 10 10 M imposed on our main sample selection.We first focus on the younger (or earlier) formation epochs of the descendants (orange) with respect to those of all the very massive galaxies at z = 0 (green).The mass cutoff makes the sample of z = 0 descendants to be biased towards larger masses and more massive galaxies tend to present older stellar population ages.In contrast, when taking into account all M * > 10 11 M galaxies at z = 0, i.e., a complete sample of local massive galaxies, we obtain slightly later formation ages.In fact, as it can be seen in Fig. 10, the median and quartile stellar masses are 0.1 − 0.2 dex larger for the descendants of our sample of M * > 10 10 M galaxies at 1 < z < 4 compared to the complete sample of M * > 10 11 M z = 0 galaxies.In addition, our mass cut at 1 < z < 4 implies losing 17% of massive M * > 10 11 M galaxies at z = 0.
Indeed, this progenitor bias also explains why the time-averaged SFR for z = 0 descendants (orange) is higher than that for the population of very massive galaxies (green).This applies to the galaxy apertures considered in Fig. 11 (for which no mass-normalization has been applied).The differences increase for ages corresponding to smaller fractions of the total stellar mass.Furthermore, both for all massive galaxies at z = 0 and for z = 0 descendants of the main high-redshift sample, mass-fraction formation times are older (smaller values in age of the Universe) when all stellar particles are included in the SFH computation than when considering only those inside 2×r hm .This may be explained by an outward migration of stars, which would result in larger apertures adding a larger fraction of older stars in the SFH computation and, consequently, earlier formation times.
Regarding the shift to earlier formation times of the main progenitors sample (blue) with respect to the z = 0 descendants (orange), possibly also due to the progenitor bias, we propose a similar explanation as the one given above.Our main sample of massive progenitors at 1 < z < 4 is only a biased subset (cut in mass and with a given redshift distribution) of all the progenitors that evolve to a galaxy from the whole sample of z = 0 descendants.The descendants, as mentioned above, also have other progenitors that do not fulfill the mass cutoff at z > 1 and which may have undergone a merger with a galaxy from our main 1 < z < 4 sample at a lower redshift.These minor progenitors at 1 < z < 4 would not be included in the SFH of the (massive) progenitors sample, and they would probably host younger stellar populations, which would explain the shift towards more recent formation times for their z = 0 descendants.
Finally, we notice that the ground-truth formation times for our main 1 < z < 4 sample of precursors (in blue) are almost identical to the ones derived for all the 1 < z < 4 massive galaxies (cyan), with (relative) differences of less than 34 Myr (1.4%) in all cases.This suggests that, although our main progenitors sample accounts for approximately two thirds of all the M * > 10 10 M galaxies at 1 < z < 4, we can estimate the formation times of local M * > 10 11 M descendants by considering all the massive galaxies at the same redshift range.

Recovering the SFH of Massive Galaxies with 2D-SPS Modeling of HST+JWST Data
We now calculate the median SFH of the main 1 < z < 4 galaxy sample obtained from the 2D-SED fits (not from the Illustris database, as done in Section 6.1.1).For this calculation, we take the median SFH for each galaxy in this sample constructed from the 2D-SPS analysis (see Section 4.2), normalize each galaxy SFH to recover the median stellar mass (without remnants or yields) given by its integrated SED-fits, and combine all these normalized galaxy SFHs to build the median (or typical) SFH for the whole sample.We remark that this implies that the SFR at a given age of the Universe involves the combination of a different number of galaxies, namely, all that lie at lower redshifts compared to the redshift corresponding to that age of the Universe.
Fig. 12 shows this median SFH of the main 1 < z < 4 progenitors sample derived from the 2D-SED fits (red solid line).We compare this SFH, which could be obtained following the same procedure on real galaxy samples when JWST (plus HST) data are available, with the ground-truth.First, we compare with the averaged SFH for the same main sample of 1 < z < 4 precursors but built from the simulated particles inside the radius of the photometric aperture (blue).This comparison allows to evaluate the accuracy of our method.We also compare with the SFH of z = 0 descendants using the stellar particles inside 2×r hm (orange).This comparison allows us to understand the bias of a sample based on imaging data linked to selection effects.The two latter comparison SFHs are the same shown in Fig. 11, but normalized to recover the same stellar mass as the median 2D-SPS-derived SFH of our main 1 < z < 4 sample (10 10.57M at z = 1).Note that, without normalization, the corresponding stellar masses at z = 1 would be log(M * /M ) = 10.67 for the median ground-truth SFH of this high-redshift sample and 10.91 for the median SFH of their descendants at z = 0 (built from particles inside 2×r hm ).As commented in Section 6.1.1,the difference in the mass recovered by the SFH of our main 1 < z < 4 of massive progenitors and the z = 0 descendants is caused by the reduction in the multiplicity of galaxies when calculating galaxy averages at z = 0, the presence of less massive galaxy branches for the z = 0 descendants which are not included in our main 1 < z < 4 sample due to the mass cut-off, and the smaller aperture radius (in average) used to calculate the galaxy SFHs for the 1 < z < 4 sample.Shaded areas correspond to the uncertainty in the median SFH calculated as the 95% confidence interval.Finally, we have calculated the median SFH derived from applying Comparison between the median SFH of our main 1 < z < 4 progenitors sample derived from the 2D-SED fits (in red) and from the simulated particles (in blue).We also include the median SFH of the descendants of this high-redshift sample at z = 0 built with particles inside 2•r hm (in orange) and the median SFH of massive galaxies at 1 < z < 4 derived from applying the 2D-SPS method to this sample (pink).Shaded areas represent the uncertainty of the median.The SFHs have been normalized to recover the same stellar mass as the median 2D-SPS-derived SFH of our main 1 < z < 4 sample over the same redshift intervals (z 1).The vertical lines depict the t5, t10, t25, and tmw mass-fraction formation times for each SFH.Gray diamonds show the Illustris SFMS level at z = 1, 2, and 4 (Sparre et al. 2015).On the top panel, we show the evolution of the integrated stellar mass at each redshift for our main highredshift sample (2D-SPS and particles, same color code).
the 2D-SPS analysis to all 1 < z < 4 massive galaxies in S+17, following the same procedure than that described for our main 1 < z < 4 sample of progenitors.This new median SFH, also normalized to the same mass as the others, is shown in pink.For clarity, we have not included the uncertainties of this last median SFH.
We find that the general shapes of the four SFHs shown are not very different from each other, except for the SFR increase in the last 0.5-1.0Gyr (near z = 1) in the SFHs built from the 2D-SPS analysis (in red and pink).The SFH of our main sample of precursors built from the 2D-SED fits (red) follows the general trend of the SFH built from the simulated particles in the sample (blue), but for this raise, and this also happens for the 2D-SPS-derived SFH of massive galaxies at 1 < z < 4 (pink).The number of galaxies considered in the calculation of the SFHs for the high-redshift samples drops as the redshift decreases (as clearly seen in the increase of uncertainties).Thus, there are fewer galaxies near z = 1 from which to calculate the typical SFH and, when averaging, the weight of any individual galaxy SFH is bigger.The individual galaxy SFHs built from the 2D-SPS, as explained in Section 4.1, are the combination of the SFH of one young and one old stellar population.This causes the galaxy SFHs built from the 2D-SED fits to normally have a peak in SFR near the redshift of the galaxy, which is not necessary present in the galaxy SFH built from the simulated particles (see, for example, the median 2D-SPS galaxy SFH shown in Fig. 8).At higher redshifts, these individual peaks, if present, are not reflected in the median SFHs not only because the SFH is averaged over more galaxies but also because these individual peaks in SFR are located at varying redshifts and distributed across a wide time span.On the contrary, near z ∼ 1, where the number of galaxies drops, we pile up some of these peaks and this is reflected in the median SFHs from the 2D-SPS analysis.
In Fig. 12, we include the SFR values expected from the Illustris SFMS at z = 1, 2, and 4 (Sparre et al. 2015) for the stellar masses obtained by integrating our median 2D-SPS SFH of progenitors down to each of those redshifts.We remind the reader that this median 2D-SPS SFH, like other SFHs in Fig. 12, has been normalized to recover the median galaxy stellar mass (without including remnants or yields) of the main 1 < z < 4 sample of progenitors given by the integrated SED-fits.We find that our median SFR values (in red in Fig. 12) are consistent within the errors with those expected from the SFMS in Sparre et al., although the agreement observed for the z ∼ 1 SFMS value with respect to our median SFR at that redshift is probably caused by the spurious rise of the SFR in this median SFH at z ∼ 1 (due to the second peak of star formation in the individual galaxy SFHs).On the top panel, we show the evolution of stellar mass assembly for our main high-redshift sample, calculated both from the median SFH of the 2D-SPS analysis of these progenitors (in red) and that of their stellar particles in r phot (blue).
The mass-fraction formation times for each SFH in the figure are shown as vertical lines.We remark the good agreement between the t 5 and t 10 formation times derived for our main high-redshift sample of precursors using the 2D-SED fits (red) and the simulated particles (blue), with (relative) differences of -29 (-2.1%) and -9 Myr (-0.5%) in age of the Universe, respectively, with the ground-truth values being larger.Additionally, the differences in the other two formation times, although higher, are also low: +107 (+4.5%) and +149 Myr (+4.4%) for t 25 and t mw , respectively, with larger values from the 2D-SED fits.Regarding the comparison of these formation times for the main high-redshift sample (2D-SPS and particles) with their z = 0 descendants (orange), the agreement is also good, with maximum (relative) differences in age of the Universe of -188 (-12%), -184 (-9.7%), -227 (-8.7%), and -199 Myr (-5.5%) for t 5 , t 10 , t 25 and t mw , respectively, and 74 (5.5%), 59 (3.5%), 112 (4.7%), and 98 Myr (2.8%) when the main progenitors sample is compared to all massive galaxies at 1 < z < 4 (pink).As mentioned before, the older values of the t 5 , t 10 and t 25 formation times for the highredshift sample with respect to the z = 0 descendants are due to the progenitor bias.
According to the 2D-SPS analysis of our main 1 < z < 4 sample, local M * > 10 11 M galaxies would have formed 5% of their stellar mass at z ∼ 1 by z = 4.5, 10% of their mass by z = 3.7, and 25% by z = 2.7.These redshifts correspond to ages of the Universe of 1.4, 1.7, and 2.5 Gyr, respectively.This is equivalent to saying that 5% of the stellar mass present in very massive local galaxies at z ∼ 1 was already assembled when the Universe was only ∼ 10% of its current age, 10% of the mass by 13% of its age, and 25% of the mass by 18% of the cosmic time.
The first star formation episodes for these galaxies are located at z = 16 ± 1 (calculated as the median redshift where the SFR starts to be larger than 0).Then it would rise up to z ∼ 3.After that, it would remain approximately constant down to z ∼ 2, and a slightly decreasing trend would follow.This can be clearly seen in the ground-truth SFH derived from the simulated particles in the database and agrees with the median SFHs calculated in Iyer et al. (2020) for 10 10.75 < M * < 10 11.25 M galaxies at z = 0 in the Illustris simulation (considering all the particles in galaxies).Nevertheless, we hardly reproduce the most recent part of the SFH from the 2D-SPS analysis, where our method begins to fail.We note that our 2D-SED fitting method was calibrated in order to successfully reproduce the first instants of the stellar mass assembly in massive galaxies, i.e., t 5 , t 10 , t 25 , along with t mw .Thus, even though we successfully determine these mass-fraction formation times for the typical SFH of massive galaxies, our method fails to reproduce this SFH at higher ages of the Universe.However, the recovery of the whole SFH of massive galaxies and, consequently, their typical SFH throughout the whole redshift range, is beyond the scope of this work.We do not find any starburst epoch in the typical SFH from the 2D-SPS analysis on the main 1 < z < 4 sample of progenitors, and the other SFHs from the simulated particles, albeit 34% (13%) of galaxies in our 1 < z < 4 progenitors sample reach 50 (100) M /yr at some time of their 2D-SPS-derived galaxy SFHs because of the young population assumptions in the SEDfits.These starburst episodes are also present in the ground-truth SFHs of this main 1 < z < 4 sample of progenitors with similar numbers: 31% (14%) of galaxies reach 50 (100) M /yr at some time of their groundtruth galaxy SFH.Starburst events, which are usually short-lived, ∼100 Myr (Tacconi et al. 2008, Wuyts et al. 2011, Espino-Briones et al. 2022), have been proven to occur during the evolution of some galaxies, as it can be inferred from the high SFR values of Main Sequence outliers or those of the population of very luminous highredshift submillimeter galaxies, discovered by Smail et al. (1997).The reason why these brief intense star formation episodes are not present in our median SFHs is that they averaged out when we consider a whole population of galaxies, in the same way that they do not appear when calculating the cosmic star formation history (Madau & Dickinson 2014).In addition to this, the Illustris simulation is known to have has a paucity of strong starburst galaxies, i.e., a fewer fraction of galaxies that lie significantly above the SFMS when compared to observations (Sparre et al. 2015), which appears to be a consequence of the insufficient resolution of Illustris to resolve the sub-kiloparsec starbursting regions (Sparre & Springel 2016).

The Variety of the Start of SFHs in Massive Galaxies
Finally, we briefly concentrate on when the star formation started for the individual galaxies in the subsamples.We assume the start of the star formation for each galaxy can be given by its t 5 (calculated from their individual galaxy SFH) or, even, by its t 1 .Fig. 13 shows the diversity in the values of t 5 (left) and t 1 (right) for the main sample of progenitors at 1 < z < 4 (2D-SPS method and ground-truth from particles in r phot ), their descendants at z = 0 (ground-truth from particles inside 2×r hm ), the whole population of very massive galaxies at z = 0 (particles inside 2×r hm ), and massive galaxies at 1 < z < 4 (ground-truth from particles inside r phot ).For clarity, in the case of the population of very massive galaxies at z = 0 we do not include the histogram but only its statistical values (median and quartiles).The median values of both t 5 and t 1 for the main 1 < z < 4 progenitors sample given by the 2D-SPS (in red) are consistent with the ground-truth (in blue) for this sample (with differences of 0.26 and 0.19 Gyr, respectively), even though the histogram for our 2D-SPS method predicts that some galaxies start to form a bit earlier (∼0.2-0.4Gyr earlier) than expected according to the database.These differences are already visible in Fig. 12, where the SFH corresponding to the 2D-SPS analysis (red curve) rises faster in the beginning than that of the simulated particles (in blue).Regarding the distributions of the two subsamples at z = 0 (descendants of the main 1 < z < 4 sample in orange and the whole populations of M * > 10 11 M galaxies in black), their median t 5 and t 1 are similar and also consistent with each other.This also happens with the ground-truth values of massive galaxies at 1 < z < 4 (in cyan) and our main 1 < z < 4 sample (in blue), and, consequently, suggests again that the whole population of massive 1 < z < 4 galaxies tends to reproduce the earliest formation times of actual massive progenitors (at the same high-redshift range) of M * > 10 11 M galaxies at z = 0.
Fig. 13 can be useful if we want to know where to look for the first star formation episodes in massive galaxies.According to the ground-truth distribution of t 5 and t 1 for the main 1 < z < 4 sample of massive progenitors (in blue), 25% of these galaxies would have formed 1% .Histograms of t5 and t1 calculated from the SFHs of galaxies in the different samples considered.Our main 1 < z < 4 sample of progenitors is represented by the outlined histograms: values calculated from the 2D-SPS-derived galaxy SFHs are outlined in red, while those calculated from the ground-truth SFHs built from the database (DB) are represented with a dashed blue line.For the 2D-SPS-derived values, we only show the median t5 or t1 for each galaxy (calculated out of the 300 SFHs per galaxy).The cyan, filled histogram represents the distribution of the ground-truth t5 and t1 for massive galaxies at 1 < z < 4, calculated from the ground-truth SFHs built by considering only particles inside r phot in the database.The ground-truth distribution for the z = 0 descendants of our main sample is shown as the orange, hatched histogram.For the whole population of very massive galaxies at z = 0, only its median and quartile values are shown (black segments).For these two z = 0 samples, only particles inside 2×r hm have been considered.Median and quartiles are shown as segments on the top.
In Table 4, we summarize the mass-fraction formation times (and corresponding redshifts) for all the SFHs shown in Figs.11 and 12.We have calculated an estimation of the uncertainties for the 2D-SPS-derived values of the main sample of 1 < z < 4 progenitors by taking into account the (300) individual SFHs for each galaxy and by calculating the mass-fraction formation times for all of them.We estimate the uncertainty for the 2D-SPSderived formation times as the quartile values of these distributions.

SUMMARY AND CONCLUSIONS
We assess the power of broad-band HST+JWST imaging data for the analysis of the earliest evolutionary phases of galaxies that evolve into local massive objects (M * > 10 11 M ) by analysing the SFHs of progenitors at high redshift in deep cosmological surveys.For that purpose, we apply stellar population synthesis in 2D to a sample of 221 Illustris-1 simulated M * > 10 10 M galaxies at 1 < z < 4 that evolve into M * > 10 11 M galaxies at z = 0. We use ACS, WFC3, and NIRCam data in the optical and near-infrared from the synthetic Illustris-1 deep survey images (S+17).We measure SEDs for galaxies in the sample from both integrated and 2D multi-wavelength photometry on these images, previously processed to mimic the depths of CANDELS imaging data from HST and ongoing CEERS observations with JWST along with the source extraction and analysis procedures to be applied to data from these surveys.From the SED modeling, we derive the SFH of each source by combining the 2D information from the fits, and compare it with the ground-truth SFH given by the simulated particles belonging to each galaxy in the Illustris-1 database.In this work, we focus on determining the capabilities of broad-band HST+JWST for determining the first episodes in the stellar mass assembly, and our main findings are the following: (i) We evaluate the success of our 2D-SPS method in recovering the earliest phases of the stellar mass assembly by comparing the mass-fraction formation times t 5 , t 10 , and t 25 calculated for each galaxy from its 2D-SPSderived galaxy SFH with those given by its ground-truth SFH.We find that our 2D-SPS method successfully recovers these quantities with a median relative offset between our values and ground-truth of +4.4%, +1.8%, and −0.1% for t 5 , t 10 , and t 25 , respectively, and scatters of 0.21, 0.24, and 0.28 dex.Additionally, no systematic effects are observed as a function of galaxy redshift.
(ii) We build the median SFH of our main 1 < z < 4 sample of precursors from the 2D-SPS-derived individual galaxy SFHs to infer the mass-fraction forma-tion times of the sample as a whole.Thus, local M * > 10 11 M galaxies would have assembled 5% of their stellar mass present at z ∼ 1 by z = 4.5, 10% of their mass by z = 3.7, and 25% by z = 2.67, or equivalently, when the age of the Universe was 1.38, 1.72, and 2.50 Gyr, respectively.These ages agree with their ground-truth values derived from the typical SFH built from the simulated particles of galaxies in the sample, with relative differences of -2.1%, -0.5%, and +4.5% for t 5 , t 10 , and t 25 , respectively.Nevertheless, our method fails to reproduce the shape of this SFH for at z 1, mainly because of the second star formation peaks in the 2D-SPS-derived galaxy SFHs (caused by the second burst assumed in the functional form of the SFHs) which accumulate at this redshift range.
(iii) We compare the formation times derived from the 2D-SPS analysis for this main sample of precursors with the values obtained from the ground-truth median SFH at z < 1 of local M * > 10 11 M galaxies and of the descendants of this high-redshift sample.The mass-fraction formation epochs from the 2D-SPS are systematically earlier than those inferred from the descendants of this sample.Besides, the latter are higher than the ones inferred from the whole population of M * > 10 11 M galaxies.In both cases, these shifts in the formation redshifts, of ∼200-400 Myr, are due to the progenitor bias.This bias arises as a consequence of the mass cut-off imposed on the high-redshift progenitor sample, aimed at reproducing the selection suffered in actual HST+JWST survey observations, and which makes this sample to contain only the most massive (and, thus, older) progenitors of local M * > 10 11 M galaxies.Given the way our progenitor sample has been selected, their descendants at z = 0 also present a median stellar mass that is slightly higher than the whole population of local M * > 10 11 M galaxies, which leads to mass-fraction formation times closer to the Big Bang.
(iv) With the aim of comparing our results with those that will be inferred from CEERS observations, we perform the same analysis on all the 1 < z < 4 10 10 M galaxies in the processed S+17 images.We find that the formation times derived from the median SFH of this sample are very similar to the ones inferred from our original sample of massive precursors with a local M * > 10 11 M descendant (differences of <1.4% in all the formation times when considering the ground-truth particles and <4.7% from the 2D-SPS analysis).This suggests that we can consider all massive 1 < z < 4 galaxies observed with CEERS (+CANDELS), regardless of their actual z = 0 descendant, to study the formation times of the most massive descendants at z = 0.
(v) Regarding the variety of the t 1 , and t 5 values, the distribution of these formation times shows that 25% of our main 1 < z < 4 sample of progenitors formed 1% (5%) of their stellar mass present at the redshift of observation by z ∼ 8 (6), and 50% of the galaxies by z ∼ 7 (5).
The results from this work show that our 2D-SPS method, when applied to real CANDELS+CEERS spatially-resolved broad-band observations of massive 1 < z < 4 galaxies, will be able to infer when the early stages of the stellar mass assembly took place in these galaxies and, from them, have an estimation of when local 10 11 M galaxies began to form, within the limitations and biases already discussed throughout this paper.
We caution the reader that the numerical values of the mass-fraction formation times and other quantities shown in this work are unique to Illustris-1, since they are dependent on the specifications of the simulation, such as the assumed cosmology, the volume and resolution of the simulation, the physical models for galaxy formation (e.g., the star formation and feedback implementations) and on any free parameter.Thus, the values presented in this paper for Illustris-1 are not expected to be necessary similar to those obtained from other simulations like, for example, the IllustrisTNG project (Marinacci et al. 2018;Naiman et al. 2018;Nelson et al. 2018;Pillepich et al. 2018b;Springel et al. 2018), follow-up of the Illustris simulation which includes as main changes in the physics the incorporation of magnetohydrodynamics and updates in the feedback physics model, among others updates.This new Illus-trisTNG series alleviates some of the tensions present between the outcome of the original Illustris simulations and observations (see Pillepich et al. 2018a), such as, for example, the (too) high cosmic star-formation rate density predicted by Illustris at z ≤ 1, the excess in the stellar mass function at z 1 at the low ( 10 10 M ) and the high ( 10 11.5 M ) mass end, the excessively large physical extent for M * 10 10.7 M galaxies (a factor of few larger than observed), or the overpopulation of the blue cloud and green-valley with respect to the red sequence in the galaxy color distribution (see full list in Nelson et al. 2015).An interesting matter to discuss when HST+JWST measurements are available will be to tell whether the prescriptions assumed for Illustris-1, IllustrisTNG or any other simulation are good enough to reproduce the mass-fraction formation times observed for massive high-redshift galaxies or if, on the contrary, they must be used as constraints to refine new galaxy formation models.

Figure 1 .
Figure1.Left panel: Main Sequence plot for those galaxies at 1 < z < 4 included in the mock survey presented in S+17 which are progenitors of very massive galaxies (M * > 10 11 M ) at z = 0.All progenitors are color-coded by redshift.In this paper, we concentrate on the analysis of the most massive progenitors (M * > 10 10 M ), which are plotted with star symbols.The Main Sequence found for all the Illustris simulated galaxies at different redshifts(Sparre et al. 2015) has also been plotted.Middle panel: postage stamp images for some representative examples of our galaxies (size 2.5 × 2.5 ).These RGB images are created using ACS/F1814W, NIRCam/F200W, and NIRCam/F277W asinh-scaled images as B, G and R filters, respectively.The position of these galaxies in the left panel has been highlighted with a white dot inside.Right panel: g − r color vs. stellar mass diagram for descendants at z = 0 of all galaxies at 1 < z < 4 (independently of their stellar mass) in the S+17 mock survey images.Descendants of the final sample of 221 galaxies analyzed in this paper are plotted as stars.

Figure 2 .
Figure 2. Histograms for our 1 < z < 4 sample of 221 simulated galaxies: redshift (top left panel), total stellar mass (top right), SFR (bottom left), and stellar half-mass radius (bottom right).These properties have been extracted from the Illustris-1 database.Median and quartiles are shown as segments on the top.

Figure 3 .
Figure3.Top panel: histogram for the radii of the photometric apertures of the 221 galaxies in the final sample as a fraction of 2 × r hm radius, the typical radius used by Illustris to compare with observations.The cumulative fraction of galaxies in shown as a red line.Bottom panel: histogram for the fraction of stellar mass enclosed by the photometric apertures (blue filled histogram) and for a radius of 2 × r hm (orange hatched) with respect to the total stellar mass in the galaxy.In both cases, we neglect neighboring galaxies.These total masses have been extracted from the simulated particles belonging to each galaxy in the Illustris database.At the top of both panels, we show the median and quartiles for each histogram.

Figure 5 .
Figure5.Stellar mass vs. redshift plot for galaxies in our sample, with values extracted from the Illustris database.We show the stellar mass when considering all the particles in the galaxy (total stellar masses) as light blue circles and the stellar mass obtained when only particles inside twice the stellar half-mass radius as small dark blue dots.Histograms of stellar masses and redshifts are shown at the top and on the right, with the median and quartiles marked.

Figure 6 .
Figure6.Example of one integrated SED fit for one of our galaxies: Illustris-1 062 0004079 at z = 2.843.This galaxy code stands for galaxy id 4079 and snapshot 62 (z = 2.73) in the Illustris-1 simulation.The best fit is shown as a red line.We also show the models corresponding to the old (green) and young (orange) stellar populations.The transmission curves of the filters have been included at the bottom of the figure.The best-fit stellar parameters for the old and young populations are also given.An RGB image (size 1.5 × 1.5 ) with the integrated aperture is shown as an inset.

Figure 7 .
Figure7.Stellar masses for the 1 < z < 4 sample derived from the integrated SED-fits vs. their ground-truth values calculated from the simulated stellar particles belonging to each galaxy (inside a sphere with the same radius as that of the integrated photometric aperture).The one-to-one relation is shown with the dotted black line.The histograms for both distributions are shown in different colors, with the median and the 68% intervals as horizontal segments.

Figure 8 .
Figure8.Schematic diagram of the methodology followed to obtain the median SFH for a galaxy from its 2D-SPS analysis.First, we show an RGB image of the galaxy and some of the HST+JWST broad-band images (in Table1) processed to imitate CANDELS and CEERS observations for HST and JWST filters, respectively.We measure the 2D photometry inside a grid (in cyan) with cell size equal to the spatial resolution element, in addition to the integrated photometry (blue aperture).As an example, we show the SEDs measured for three regions of the grid: the center of the galaxy (teal cell), an arm region (light green), and a diffuse emission zone (pink).To estimate the uncertainties and degeneracies in the derived stellar populations, each SED is fitted 300 times by performing MC simulations.We add the individual SFHs inferred from all the grid regions to obtain the SFH for the whole galaxy.The black thin SFHs in the bottom subfigure show the SFHs for the whole galaxy created by accounting for the uncertainties in the SFHs of each grid region, where the median galaxy SFH from this 2D-SPS method is shown in yellow.The blue dotted SFH is the galaxy SFH inferred from the integrated photometry.The ground-truth SFH of the galaxy, given by the stellar particles belonging to the galaxy, is the green solid SFH.

Figure 9 .
Figure9.Comparison between the mass-fraction formation times (measured backwards from the redshift of observation of the galaxy) of our 2D-SPS method and their ground-truth values: t5 (upper-left), t10 (upper-right), t25 (lower-left) and tmw (lower-right) calculated from our 2D-SPS-derived SFHs versus their reference values calculated from the SFHs built from the simulated stellar particles in galaxies.Each galaxy is represented by 300 vertically spread points, which correspond to the mass-fraction formation times of the 300 2D-SPS SFHs, built from the 300 MC particles in each resolution element in the grid.The median of these values for each galaxy is shown as bigger circles and error bars represent the standard deviation of these values (68% interval).All points are color-coded by galaxy redshift.On the top of each panel, we show the ratio between the t k of our 2D-SPS method and their ground-truth values as a function of the latter.As an inset, we include the offsets of the galaxies in the sample.

Figure 10 .
Figure10.Ground-truth stellar mass histograms for the different samples extracted from the Illustris-1 database: when considering all particles in galaxies (top panel) or only particles inside twice the stellar half-mass radius (bottom panel).We show our main 2D-SPS sample of progenitors at 1 < z < 4 in red, massive galaxies at 1 < z < 4 in cyan, the descendants at z = 0 of our 1 < z < 4 progenitors in orange, and the whole population of very massive galaxies at z = 0 in dark gray.The median and the 68% intervals are shown at the top of each panel.
Figure12.Comparison between the median SFH of our main 1 < z < 4 progenitors sample derived from the 2D-SED fits (in red) and from the simulated particles (in blue).We also include the median SFH of the descendants of this high-redshift sample at z = 0 built with particles inside 2•r hm (in orange) and the median SFH of massive galaxies at 1 < z < 4 derived from applying the 2D-SPS method to this sample (pink).Shaded areas represent the uncertainty of the median.The SFHs have been normalized to recover the same stellar mass as the median 2D-SPS-derived SFH of our main 1 < z < 4 sample over the same redshift intervals (z 1).The vertical lines depict the t5, t10, t25, and tmw mass-fraction formation times for each SFH.Gray diamonds show the Illustris SFMS level at z = 1, 2, and 4(Sparre et al. 2015).On the top panel, we show the evolution of the integrated stellar mass at each redshift for our main highredshift sample (2D-SPS and particles, same color code).
Figure13.Histograms of t5 and t1 calculated from the SFHs of galaxies in the different samples considered.Our main 1 < z < 4 sample of progenitors is represented by the outlined histograms: values calculated from the 2D-SPS-derived galaxy SFHs are outlined in red, while those calculated from the ground-truth SFHs built from the database (DB) are represented with a dashed blue line.For the 2D-SPS-derived values, we only show the median t5 or t1 for each galaxy (calculated out of the 300 SFHs per galaxy).The cyan, filled histogram represents the distribution of the ground-truth t5 and t1 for massive galaxies at 1 < z < 4, calculated from the ground-truth SFHs built by considering only particles inside r phot in the database.The ground-truth distribution for the z = 0 descendants of our main sample is shown as the orange, hatched histogram.For the whole population of very massive galaxies at z = 0, only its median and quartile values are shown (black segments).For these two z = 0 samples, only particles inside 2×r hm have been considered.Median and quartiles are shown as segments on the top.

Table 1 .
Photometric broad-band images used in this work λ central Width Pixel scale a 5σ depth b

Table 3 .
Free parameters and their ranges in the SED fitting for a double delayed-exponential SFH Notes:

Table 4 .
Mass-fraction formation times and redshifts for different samples Values have been calculated from the SFHs shown in Figs.11 and 12.The mass-fraction formation times are measured in Gyr from Big Bang. Note: