Galaxies Going Bananas: Inferring the 3D Geometry of High-redshift Galaxies with JWST-CEERS

The 3D geometries of high-redshift galaxies remain poorly understood. We build a differentiable Bayesian model and use Hamiltonian Monte Carlo to efficiently and robustly infer the 3D shapes of star-forming galaxies in James Webb Space Telescope Cosmic Evolution Early Release Science observations with logM*/M⊙=9.0–10.5 at z = 0.5–8.0. We reproduce previous results from the Hubble Space Telescope Cosmic Assembly Near-infrared Deep Extragalactic Legacy Survey in a fraction of the computing time and constrain the mean ellipticity, triaxiality, size, and covariances with samples as small as ∼50 galaxies. We find high 3D ellipticities for all mass–redshift bins, suggesting oblate (disky) or prolate (elongated) geometries. We break that degeneracy by constraining the mean triaxiality to be ∼1 for logM*/M⊙=9.0–9.5 dwarfs at z > 1 (favoring the prolate scenario), with significantly lower triaxialities for higher masses and lower redshifts indicating the emergence of disks. The prolate population traces out a “banana” in the projected b/a–loga diagram with an excess of low-b/a, large- loga galaxies. The dwarf prolate fraction rises from ∼25% at z = 0.5–1.0 to ∼50%–80% at z = 3–8. Our results imply a second kind of disk settling from oval (triaxial) to more circular (axisymmetric) shapes with time. We simultaneously constrain the 3D size–mass relation and its dependence on 3D geometry. High-probability prolate and oblate candidates show remarkably similar Sérsic indices (n ∼ 1), nonparametric morphological properties, and specific star formation rates. Both tend to be visually classified as disks or irregular, but edge-on oblate candidates show more dust attenuation. We discuss selection effects, follow-up prospects, and theoretical implications.


INTRODUCTION
The 3D geometry of galaxies provide important clues about their formation history.There is a rich tradition of population studies tracing back almost a century that attempt to infer the 3D geometries of galaxies on the basis of their projected shapes.Hubble (1926) recognized that the distribution of projected ellipticities of local galaxies shows many more round objects than can be explained by randomly oriented disks alone.This served to (equally long in all three dimensions) or prolate (flattened in two directions and thus elongated in one direction).More general "triaxial" ellipsoid models were introduced soon after to explain the puzzling lack of rotation in local giant ellipticals and bulges (Contopoulos 1956;Stark 1977;Binney 1978).
Connecting these constraints on the intrinsic shapes of nearby galaxies with those of their progenitors at highredshift became possible with the advent of the Hubble Space Telescope (HST).Numerous studies have established that the bright, massive population at high redshift already seems to have taken on oblate and spheroidal 3D shapes, albeit with smaller sizes and thicker minor-to-major axis ratios (Reshetnikov et al. 2003;Elmegreen et al. 2004bElmegreen et al. ,a, 2005;;Holden et al. 2012;Chang et al. 2013;van der Wel et al. 2014a;Satoh et al. 2019;Zhang et al. 2019, Zhang et al. 2022;Hamilton-Campos et al. 2023).For these bright objects, constraints on gas kinematics through deep emission line spectroscopy definitively showed the existence of large rotating disks with high random motions at early times (Förster Schreiber et al. 2006;Genzel et al. 2006;Law et al. 2009;Förster Schreiber et al. 2009;Kassin et al. 2012;Glazebrook 2013;Wisnioski et al. 2015;Simons et al. 2016Simons et al. , 2017)).There are also indications of incredibly compact, massive, spheroidal "nuggets" that may be the precursors of present-day ellipticals (van Dokkum et al. 2008;Barro et al. 2013) or bulges in massive spirals (de la Rosa et al. 2016;Costantin et al. 2021Costantin et al. , 2022)).
The situation for dwarfs at high redshift is less clear.1It was, and still is, common practice to classify any distant, faint galaxies that are not obviously disks or ellipticals as "irregular" or "peculiar" (Glazebrook et al. 1995;Driver et al. 1995) and then to draw connections to mergers and interactions (e.g., Dressler et al. 1994).However, in one of the first deep fields with HST, Cowie et al. (1995) identified a new class of consistently elongated, linear, clumpy objects that they termed "chain" galaxies (see also Dickinson et al. 1995;van den Bergh et al. 1996).Dalcanton & Shectman (1996) argued that these chain galaxies are the edge-on projections of intrinsically oblate disk galaxies.Elmegreen et al. (2004b), Elmegreen et al. (2004a) and Elmegreen et al. (2005) found more patterns among the peculiar/irregular population and grouped them into additional subclasses: chains, clump clusters, tadpoles and double clumps (see also van den Bergh 2002;Conselice et al. 2004;Straughn et al. 2006).They too argued that chains were the edge-on versions of the rounder clump clusters with the latter being harder to detect due to surface brightness dimming.Hydrodynamical simulations of early clumpy star-forming galaxies by Immeli et al. (2004a), Immeli et al. (2004b) and Bournaud et al. (2007) bolstered these claims.
But statistical analyses of projected axis ratios by Ferguson et al. (2004), Ravindranath et al. (2006), Yuma et al. (2011), Law et al. (2012) and Yuma et al. (2012) also suggested another possibility: that high-redshift dwarfs may be intrinsically elongated (prolate) or triaxial rather than normal oblate disks.Soon after, the Cosmic Assembly Near-infrared Deep Extragalactic Legacy Survey (CANDELS; Grogin et al. 2011;Koekemoer et al. 2011) and 3D-HST Survey (Brammer et al. 2012;Skelton et al. 2014) provided much larger sample sizes which allowed more robust 3D shape modeling.van der Wel et al. (2014a) demonstrated that the asymmetric projected axis ratio distributions of high-redshift dwarfs peaking at small values can indeed be explained if they are a new class of preferentially elongated (prolate) systems.Zhang et al. (2019) additionally incorporated size information to further constrain 3D shapes and found that up to ∼ 70% of log M * /M ⊙ = 9.0 − 9.5 galaxies at z = 1.5 − 2.0 and z = 2.0 − 2.5 may be intrinsically prolate.
We do not consider the 3D nature of high-redshift dwarfs a resolved problem.With the ever improving resolution of modern cosmological simulations, it has become possible to forward model the 3D shapes of galaxies at different epochs.Ceverino et al. (2015) and Tomassetti et al. (2016) showed that in their set of "zoom-in" simulations, low-mass galaxies at high redshift are indeed prolate and live in dark matter halos that are themselves prolate and aligned in the same direction as the stellar distribution.Pandya et al. (2019) used this to propose that intrinsic alignments of elongated high-redshift dwarfs may serve as tracers of cosmic web filaments but they did not detect the expected signal, though they attributed this to severe spectroscopic incompleteness.On the other hand, Figures 8  and 9 of Pillepich et al. (2019) show far fewer galaxies with intrinsically prolate 3D stellar distributions in the TNG50 simulation compared to the observational constraints from van der Wel et al. (2014a) and Zhang et al. (2019).This potential discrepancy between different simulations with respect to each other and versus observations demands more detailed theoretical studies of 3D shapes.At the same time, the limited sensitivity, spatial resolution and wavelength coverage of HST itself raises questions about the impact of completeness, blending and light-weighting effects on the projected shape distributions of faint, distant galaxies.In particular, did HST miss round, face-on, oblate dwarfs with low surface brightness?Were groups of unresolved objects systematically blended together leading to larger numbers of elongated sources?What do we infer about the 3D shapes of high-redshift galaxies by observing the bulk of their stellar population at even longer wavelengths?
The arrival of the James Webb Space Telescope (JWST) has the potential to transform our understanding of galaxy morphological evolution thanks to its high sensitivity and resolution at infrared wavelengths.Indeed, many exciting studies have already revealed the remarkable diversity of galaxy shapes at high-redshift through a variety of methods.A combination of visual classifications and parametric and non-parametric morphological measurements suggested early on that the fraction of disk galaxies is higher in JWST imaging compared to HST (Ferreira et al. 2022;Kartaltepe et al. 2023).Robertson et al. (2023) used a deep learning framework trained on previous HST imaging and CANDELS visual classifications to identify faint, distant disks in JWST imaging that were missed by HST (see also Huertas-Company et al. 2023;Tohill et al. 2023).But the visual classifications and metrics used in many of these studies so far do not necessarily distinguish between 3D shapes, namely prolate versus oblate geometries and the possibility of oval (triaxial) disks.Indeed, Vega-Ferrero et al. (2023) used a machine learning method trained instead on the TNG50 simulations to classify galaxies observed with JWST and found that a substantial fraction of visually classified disks may instead be intrinsically elongated.Nelson et al. (2023) identify a sample of 12 massive, elongated and surprisingly red galaxies at z = 2 − 6 which they claim may be either oblate or prolate.
In this paper, we place new constraints on the 3D shapes of high-redshift galaxies using JWST observations from the Cosmic Evolution Early Release Science survey (CEERS; Finkelstein et al. 2023).Our approach is distinct from and complementary to previous JWST studies of galaxy structure and morphology.We will use the distributions of projected axis ratios and sizes to fit a family of triaxial ellipsoid models to observed galaxies in different mass-redshift bins.We focus on star-forming galaxies since the fraction of quiescent galaxies drops dramatically towards high redshift (e.g., Pandya et al. 2017) which would limit our sample sizes for inference, and because quiescent galaxies likely occupy a different "mode" for 3D shapes (Chang et al. 2013).We will allow the data to speak for itself using a Bayesian approach to constrain the relative fractions of oblate, spheroidal and prolate galaxies as well as triaxial systems more generally.In addition to constraining 3D shapes, we will also simultaneously derive the 3D size-mass relation and its redshift evolution which is of great interest for constraining galaxy formation models.Although our sample sizes are small, we will show that the data has sufficient constraining power in many cases and that when this is not true, our posteriors reflect our uniform priors as they should.
This paper is organized as follows.In section 2, we describe the new JWST data as well as previous CAN-DELS observations for consistency checks.In section 3 we detail our methods.We present our results on 3D shape evolution in section 4 and compare some properties of high-probability prolate, oblate and spheroidal candidates in section 5. We discuss the implications of our findings in section 6 and summarize our conclusions in section 7. We assume a standard Planck Collaboration et al. ( 2016) cosmology with h = 0.6774, Ω m,0 = 0.3075, Ω Λ,0 = 0.691 and Ω b,0 = 0.0486.

CEERS JWST Observations
We use JWST observations from the CEERS survey (Program ID 1345;Finkelstein et al. 2023) which surveyed portions of the Extended Groth Strip (EGS; Davis et al. 2007) previously covered by CANDELS (Stefanon et al. 2017).Specifically, we focus on the 10 NIR-Cam pointings covering roughly 50% of the area of the CANDELS-EGS in six filters: F115W, F150W, F200W, F277W, F356W and F444W.Four of the pointings were observed in June 2022 with the other six in December 2022.Data reduction details are given in Bagley et al. (2023). 2e use two independent photometric and galaxy structural catalogs for the same CEERS imaging to ensure that our results are not driven by source detection and structural measurement methods.First, we use the internal CEERS team photometric catalog from Finkelstein et al. (2023) which uses the original Source Extractor code (Bertin & Arnouts 1996).Our stellar masses, star formation rates (SFRs), photometric redshifts and dust attenuations are based on broadband SED fitting with EAZY (Brammer et al. 2008, more details specific to CEERS can be found in Barro et al. 2023). 3We use galaxy structural measurements from Galfit (Peng et al. 2002) based on single-component Sérsic (1963) model fits done independently in each filter (details will be given in McGrath+ in prep.but this closely follows van der Wel et al. 2012).Only sources with F356W < 28.5 AB mag were fit with EAZY and Galfit but our sample selection is generally brighter than ∼ 27 AB mag (Appendix B).We only include sources with a Galfit flag of 0 which indicates no problems during the fitting process.
As an alternative, we use the next-generation Source Extractor++ code (SE++; Bertin et al. 2020;Kümmel et al. 2022) to independently do source detection, characterization and single-component Sérsic (1963) model fits jointly across all wavelengths including previous HST imaging in ACS-F606W, ACS-F814W, WFC3-F125W, WFC3-F140W and WFC3-F160W.SE++ starts with the same global background-subtracted image and fixes the local background around each source to zero just like Galfit, but it uses its own deblending, masking and fitting algorithms.All Sérsic model parameters were allowed to vary across filters except the position angle.Simulated parameter recovery tests have shown that SE++ performs very well and, unlike Galfit, provides meaningful uncertainties on Sérsic parameters (Euclid Collaboration et al. 2023).We only include sources whose sizes and axis ratios have fractional uncertainties < 10%, which removes at most ∼ 10% of the sample in each redshift bin.For every SE++ source, we cross-match to the nearest neighbor within 0.25" from the internal CEERS catalog above which gives us stellar mass, SFR and redshift estimates from EAZY.We have a negligible number of SE++ sources without a cross-match, and we do not expect our conclusions to change if we had done independent SED fits using the SE++ photometry itself.We visually inspected the Sérsic model fits and residuals for many of our objects and found that they generally look reasonable with very few catastrophic failures and with good agreement between SE++ and Galfit.Appendix C shows cutouts of the data, Sérsic model and residuals for some example galaxies fit by both SE++ and Galfit.
Our definition of projected size is the Sérsic half-light radius, i.e., the semi-major axis length of the ellipse that encloses half of the model light distribution.Importantly, all of our sizes and axis ratios from both Galfit and SE++ are intrinsic, i.e., from the best-fitting Sérsic model before being convolved with the PSF.We use the same empirical PSFs described in Finkelstein et al. (2023) for both Galfit and SE++.More goodness of fit details for Galfit will be presented in McGrath et al. (in prep.).
In this paper, we only consider star-forming galaxies with stellar masses between 10 9 − 10 10.5 M ⊙ and z = 0.5−8.0.These cuts reflect the completeness and sample sizes afforded by CEERS.The motivation for focusing on the star-forming population alone is three-fold: (1) this is distinct from the quiescent population which may have different 3D shapes, (2) it is easier to detect and hence be complete to star-forming galaxies in CEERS, and (3) the number of quenched galaxies drops rapidly at high redshift.We select star-forming galaxies following the procedure of Pandya et al. (2017): in each redshift bin, we find the median specific SFR (sSFR = SFR/M * ) of dwarfs with M * = 10 9 − 10 9.5 M ⊙ which should be overwhelmingly star-forming. 4This gives us the median sSFR of galaxies on the star-forming main sequence (for simplicity we assume zero slope for sSFR as a function of M * but this does not affect our conclusions).We only consider galaxies in each redshift bin whose sSFR is larger than −0.45 dex of this median main sequence sSFR.This cut crudely excludes any galaxies lying more than 1.5σ below the main sequence ridge line where σ ∼ 0.3 dex is the typical main sequence scatter.
Table 1 lists our mass-redshift bins and the number of galaxies with reliable structural measurements from both Galfit and SE++.To ensure that we probe Table 1.Our CEERS mass-redshift bins, adopted NIR-Cam filters that probe roughly the same rest-frame optical wavelength with redshift, and the number of CEERS galaxies from Galfit and SE++ that satisfy our selection criteria (star-forming, reliable structural measurements, larger log a than PSF FWHM).We also include the number of galaxies from all 5 CANDELS fields at z < 2.5 with reliable Galfit measurements.
galaxy structure at roughly the same rest-frame optical wavelength (∼ 4000 − 8000 Å) across redshift, we use a different NIRCam filter for each redshift interval as given in the table.We tried different filters and found that our conclusions are not sensitive to this.For both sets of catalogs, we only use galaxies whose semi-major axis length from the Sérsic model is larger than the PSF FWHM in the filter corresponding to their redshift.In practice, this size cut mainly affects our z = 3 − 8 bin since galaxies at lower redshifts are generally much larger than the PSF FWHM.

CANDELS HST Observations
Separately from CEERS, we also use the full 5-field CANDELS dataset to test our new 3D shape modeling code. 5Since we will try to reproduce Zhang et al. (2019) as a consistency check, we apply the same cuts to the CANDELS dataset as them which include the standard F160W < 25.5 AB mag, PhotFlag=0, CLASS STAR< 0.8, and reliable Galfit measurements (flag=0 in the catalogs from van der Wel et al. 2012).The redshifts are a mixture of photometric, grism and (when available) spectroscopic redshifts as described in Kodra et al.   5 These data can be found on MAST: 10.17909/T94S3X.
(2023).The stellar masses and SFRs are the medians from many different SED fitting codes (Santini et al. 2015;Mobasher et al. 2015).This is the same dataset that was used for Pandya et al. (2019).To be consistent with Zhang et al. (2019), we select star-forming galaxies in each redshift bin using the sSFR-M * relations provided in Fang et al. (2018).This is different from the Pandya et al. (2017) strategy we use for CEERS but the conclusions are unaffected regardless of approach.We have verified through visual inspection that the resulting sample of star-forming galaxies roughly matches Zhang et al. (2019).Table 1 lists the number of CAN-DELS galaxies in each mass-redshift bin at z < 2.5.CANDELS was designed to be complete to log M * ∼ 9 galaxies out to z = 2.5 so we do not attempt to extend beyond that for this part of the analysis.

CEERS Size -Axis Ratio Histograms
Figure 1 shows the distributions of projected axis ratio for CEERS galaxies in each of our mass-redshift bins as measured with Galfit and SE++.The two sets of measurements are generally consistent in showing that low-mass galaxies preferentially have low b/a (appearing edge-on) with a deficit of high b/a (round projected) dwarfs.Thus, with more sensitive, higher resolution, redder wavelength imaging from JWST, we still see the same asymmetric b/a distributions skewed towards low values as were seen in HST-CANDELS by van der Wel et al. (2014a) and Zhang et al. (2019).The log M * /M ⊙ = 10 − 10.5 bins generally have smaller sample sizes but the distributions are becoming more uniform which we expect for a mixture of disks and spheroids seen from random viewing angles.
Figure 2 similarly compares the distributions of the semi-major axis lengths for CEERS galaxies in our different mass-redshift bins from Galfit and SE++.Here again we see that the two sets of measurements are generally consistent.However, SE++ finds more largesize galaxies than Galfit.In Appendix C, we argue that this does not affect our overall conclusions and is likely due to SE++ genuinely detecting additional large, bright galaxies rather than finding systematically larger sizes than Galfit.
Figures 3 and 4 show the 2D histograms of projected b/a vs. log a for CEERS based on Galfit and SE++, respectively.This is the joint parameter space that we will use to constrain our 3D shape model.In the two lower mass bins, we generally have ∼ 100 − 450 objects which, as we will show later, is sufficient for constraining 3D shapes.However, our higher-mass bin tends to be very noisy with < 50 galaxies for some redshift intervals.In this case, our Bayesian 3D shape modeling code may be unconstrained by the data so the posteriors may reflect our priors.
We remind the reader that our sizes and axis ratios are all intrinsic, i.e., from the best-fitting Sérsic model before being convolved with the PSF.While we can recover intrinsic sizes and axis ratios below the resolution limit, we do not want our results to be driven by completely unresolved galaxies so we require the intrinsic semi-major axis length a > PSF FWHM but do not impose any such cut for the intrinsic semi-minor axis length b.In many of the lower mass, higher redshift bins, there is an excess of low b/a galaxies and a deficit of round (high b/a) sources rather than a uniform distribution in b/a for a given log a, particularly for larger (well resolved) log a.In section 3.1 below, we will illustrate how this curved "banana-like" joint distribution of b/a − log a arises from ellipsoids with intrinsically elongated 3D shapes.
In Appendix B, we describe completeness simulations to understand the reasonably faintest extended sources that we would be sensitive to with the CEERS imaging.We injected 10 4 fake Sérsic profiles into the CEERS imaging and processed those mock images using our entire pipeline.In short, we find that for extended sources spanning a reasonable range of sizes and axis ratios, we are complete to objects as faint as ∼ 26.5 AB mag (F277W).This is ∼ 2 magnitudes deeper than HST-CANDELS for which studies of galaxy morphology are typically restricted to sources brighter than 24.5 AB mag in the F160W filter.It is not clear that sources even fainter than 27 AB mag would still satisfy our log M * /M ⊙ > 9 cut.

Banana Diagram Decomposition
Following van der Wel et al. (2014a) and Zhang et al. (2019), we start by assuming that all galaxies can be approximated as 3D ellipsoids.6All 3D ellipsoids are described by three numbers that relate their three axis lengths A ≥ B ≥ C: the ellipticity E, triaxiality T and length of the largest axis in 3D which we denote log A. These are distinct from the projected quantities which we denote using lowercase variables log a and b/a.The ellipticity and triaxiality are respectively defined as: and In practice, the division between different kinds of ellipsoids is arbitrary except for the most extreme cases.We follow Figure 1 of Zhang et al. (2019, see also Figure 2 from van der Wel et al. 2014a) to classify 3D ellipsoids into spheroidal, oblate or prolate based on where they land in the 3D C/A vs. B/A diagram.We will show our exact boundaries later in this paper alongside the results (specifically in Figure 13).
For any 3D ellipsoid, it is straightforward to calculate its projected b/a and log a given a viewing angle, i.e., a combination of polar angle θ and azimuthal angle ϕ (Binney 1985; van de Ven & van der Wel 2021). 8We first construct a library of 1 million 3D ellipsoids spanning a range of combinations of E, T and log A on a    uniform grid.9For each of these 1 million 3D ellipsoids, we sample 100, 000 random viewing angles, i.e., pairs of (cos θ, ϕ) drawn uniformly over the range −1 < cos θ < 1 and 0 < ϕ < 2π.By calculating the projected b/a and log a for each of these viewing angles, we can construct a 2D histogram of b/a vs. log a for a single 3D ellipsoid which can be thought of as a probability map for how that ellipsoid would appear in projection.The 2D histogram of projected b/a vs log a for a mixture of different kinds of 3D ellipsoids can be obtained by summing their individual corresponding 2D probability maps.We follow section 5 of Chang et al. (2013) to incorporate typical observed uncertainties in b/a and log a when creating this library of 2D probability maps for all 1 million ellipsoids.Specifically, we assume a typical observational uncertainty on projected axis ratios of ∆(b/a) = 0.04 and then use a Rice distribution (see Appendix C of Rix & Zaritsky 1995) to convert the true projected b/a for every ellipsoid seen from any viewing angle into a random measured b/a.We then use the ratio of the true and randomly drawn b/a to re-scale the true projected log a into a randomly measured uncertain size.We further smear the predicted log a by a Gaussian of width 0.03 dex.
Figure 5 illustrates the differences between the 2D histograms of b/a vs. log a for four different types of ellipsoids.Spheroidal ellipsoids would appear round from any viewing angle and thus lie at the upper part of this diagram.Oblate ellipsoids would appear round when viewed "face-on" and thin when viewed "edge-on" but they are equally likely to be observed from any viewing angle and thus show a uniform distribution in b/a.The finite thickness (intrinsic C/A) of oblate ellipsoids means that they will show an abrupt truncation at low projected b/a as seen in the figure.
Prolate ellipsoids are different.Since they have two short axes, they are more likely to be observed "edgeon" with the longest axis seen in projection.Thus most of the projections of a prolate ellipsoid will have low b/a and large log a.A similar trend is expected for flattened oval (triaxial) disks which, even when seen faceon, would not appear circular and therefore also tend to have b/a < 1.The b/a − log a diagrams for both prolate ellipsoids and oval (triaxial) disks mimic the ap-pearance of a banana and so we refer to these as "banana diagrams" throughout this paper.

Multivariate Normal Model
In practice, a population of observed galaxies will be made up of a mixture of 3D ellipsoids with a variety of intrinsic shapes.Thus we need to decompose the observed 2D histogram of b/a vs. log a into a probabilityweighted sum of the 2D projected histograms for all 1 million ellipsoids in our toy library.We use a Bayesian model to accomplish this.
Suppose we have N observed galaxies in a single massredshift bin.We say that these N observed galaxies are drawn from N 3D ellipsoids, each of which is characterized by its intrinsic shape vector ⃗ θ = (E, T, log A).We further assume that this ⃗ θ vector is distributed as a 3D Gaussian with unknown mean vector and unknown covariance matrix with standard deviations σ E , σ T and σ log A .Following previous work, we allow for a covariance ρ between the ellipticity and the size.In practice we can also introduce covariances between the ellipticity and triaxiality, and between the triaxiality and size.This does not appear to change our results though it does require more sophisticated sampling methods and can slow down convergence as discussed in the next subsection.We believe the current model is flexible enough to encompass a wide spread in 3D shapes despite treating all galaxies in a given mass-redshift bin as a single population, but we will discuss alternative approaches in subsection 6.3.Intuitively, this model says that our N observed galaxies are drawn randomly from a population of 3D ellipsoids that are characterized by their mean shape parameters along with intrinsic scatter around those means (as well as covariance between the mean ellipticity and mean size).As shown in the flowchart on the right side of Figure 6, for a given choice of ⃗ µ and Σ, we can assign a probability to each of the 1 million ellipsoids in our toy library.Then we can do a probability-weighted sum of their corresponding 2D histograms of b/a vs log a.If the data have sufficient constraining power (and if the model is flexible and robust enough), then we can use the observed 2D histogram of b/a vs. log a to infer the relative contributions of 3D ellipsoids of different types to the observed population.For each type of ellipsoid, we show a 3D visualization along with a cross-section where the latter clearly differentiates between prolate systems and circular versus oval disks.Face-on and edge-on projections are also shown corresponding to locations in the histograms.Top: spheroids would be concentrated at large projected b/a because they appear round from angle viewing angle.Second from top: oblate/disky objects trace out a uniform distribution in b/a because they are equally likely to be seen from any viewing angle.They have a cut-off at low axis ratios due to their finite thickness.Third from top: prolate ellipsoids would trace out a banana in this diagram since they have two short axes leading to most projections appearing "edge-on" with small b/a and large size.Bottom: oval (triaxial) disks would also preferentially have lower b/a in projection because even when seen face-on, they would not be perfectly circular. where . A visualization of our Bayesian model.For observed galaxies in any given mass-redshift bin, the corresponding distribution of 3D ellipsoid shape properties is assumed to follow a 3D multivariate normal with 7 unknown parameters describing the mean ellipticity, triaxiality and size as well as the covariance matrix.The directed graph on the bottom-left shows our priors for these 7 parameters and our assumption of a Poisson likelihood for fitting the model to the observed 2D histogram of b/a vs. log a.The flowchart on the right illustrates our procedure for Hamiltonian Monte Carlo.

Hamiltonian Monte Carlo
We implement the above model in the probabilistic programming framework PyMC (Salvatier et al. 2015) which provides many different samplers for Bayesian parameter inference.In particular, we take advantage of its "automatic differentiation" capability to rapidly and exactly compute the gradient of our model likelihood with respect to all seven of its free parameters.Unlike symbolic differentiation or finite difference methods, automatic differentiation involves translating code into a computational graph and keeping track of not only the results of mathematical operations but also their partial derivatives for use with the chain rule (see Baydin et al. 2018, for a recent review).This allows us to leverage a powerful sampling technique known as Hamiltonian Monte Carlo (HMC) which uses the gradient of the likelihood to more efficiently explore high-dimensional parameter spaces compared to traditional Markov Chain Monte Carlo methods.We use a specific implementation of HMC called the No U-Turn Sampler (NUTS; Hoffman & Gelman 2014).
We assume uniform priors for all seven free parameters.Specifically, µ E and µ T both have a uniform prior between 0 and 1 (for numerical robustness, we use a Beta distribution with α = β = 1 which is just the uniform distribution).For µ log A we assume a uniform distribution between −1 and 1 dex.For the three standard deviations, we assume a uniform prior between 0 and 1.Finally, for the correlation coefficient ρ(E, log A), we assume a uniform prior between −1 and 1.As mentioned in the previous subsection, we experimented with fitting for the full covariance matrix directly using the sophisticated "Lewandowski-Kurowicka-Joe" (LKJ; Lewandowski et al. 2009) prior which is optimized for Bayesian sampling methods.The LKJ prior samples the Cholesky decomposition (lower triangular matrix) of the covariance matrix and specifies how much Σ deviates from the identity matrix.The advantage of the LKJ prior is that we can also fit for ρ(E, T ) and ρ(T, log A).In practice, for the few cases we tried, the LKJ prior gave similar results as our fiducial model but was slower to converge (especially with our relatively small sample sizes).
We assume a Poisson likelihood for comparing the observed and model 2D histograms of b/a vs log a.For simplicity, we assume that all of the cells of the observed 2D histogram are independent so that we can add the log-likelihood of all of them together to estimate the goodness of fit for any individual model realization.In principle, we should allow for the possibility that uncertainties in b/a and log a can mean that galaxies may contribute to other nearby cells of the 2D histogram compared to the one they are assigned to.However, we already accounted for the typical observed errors in b/a and log a when constructing our library of banana diagrams so this potential smearing is already included in the model.Also, the typical errors in b/a and log a should be smaller than our bin widths of ∆b/a = 0.05 and ∆ log a = 0.1 dex, especially for SE++ where we imposed a fractional uncertainty threshold of 10%.
We fit each mass-redshift bin independently.We use 5000 tuning (burn-in) draws and 2000 sampling draws with 4 chains in parallel.This is typically more than adequate for HMC/NUTS.We use the recommended target accept=0.95value for NUTS which means smaller adaptive step sizes and makes it easier for the sampler to explore the potentially complicated posterior (particularly important in cases where we have only a few tens of objects).With these options, we do not get any catastrophic divergences during NUTS sampling and all 4 independent chains converge to the same posteriors, even for the mass-redshift bins with small sample sizes (though of course in these cases the posteriors for some parameters can be relatively unconstrained).In Appendix A, we show mock tests where NUTS succeeds in constraining all parameters for sample sizes > 50.For smaller sample sizes, all parameters except µ T and σ T can still be constrained, with the latter showing very broad posteriors.Figure 7 shows an example corner plot for the z = 2.0 − 2.5 and log M * /M ⊙ = 9.0 − 9.5 population with constraints from the full 5-field CANDELS dataset as well as CEERS using either Galfit or SE++.We see that all 3 models are very well constrained.For this mass-redshift bin, the mean ellipticity is constrained to be high with µ E ∼ 0.75 and there is a relatively small scatter σ E ∼ 0.1 which means the galaxies are consistent with either disks or prolate systems.The mean triaxiality is also constrained to be ≫ 0.75 (with CAN-DELS hitting up against ∼ 1) which strongly favors the prolate interpretation.The corresponding scatter is σ T ∼ 0.25 − 0.75 which means that there must also be some contribution from disks.Although the three models seem to show discrepancies with each other, we stress that these are relatively minor: the differences between their constrained posteriors are much smaller than the ranges of the uniform priors for all parameters.In other words, the different models reach similar regions of the enormous 7D parameter space despite discrepancies in the data which suggests that our conclusions are robust.We also show that the observed histogram and mean model histogram agree well with each other and that the residual histograms are relatively featureless regardless of dataset used.This also holds for our other mass-redshift bins although some of the massive bins can show greater residuals due to their small sample sizes and perhaps limited flexibility of the model (we discuss this more in subsection 6.3).

Individual Ellipsoid Classification Probabilities
Figure 8 demonstrates that we can use our constrained model parameters to assign 3D ellipsoid classification probabilities for individual observed galaxies (see also section 5.1 of Zhang et al. 2019).For illustrative purposes, we use the means of the posteriors of each parameter to reconstruct the mean model 2D histogram of b/a vs log a.We can decompose the total mean model histogram into the relative contributions from prolate, oblate and spheroidal ellipsoids using the boundaries on the C/A vs. B/A diagram in Figure 13 (these are adapted from van der Wel et al. 2014a; Zhang et al. 2019).
For the particular low-mass, high-redshift bin shown in Figure 8 (see also analogous figures for the other massredshift bins in the corresponding figure set), prolate ellipsoids dominate at low b/a, oblate ellipsoids dominate for large b/a and large log a, and spheroids are negligible.As a result, most of the individual observed galaxies with low b/a have > 75% probability of being prolate in 3D.In contrast, observed galaxies in the upper right corner are given very high probability of being (face-on) disks.Of course, we cannot say for sure whether any individual galaxy is indeed prolate from this kind of statistical imaging-based analysis alone, but it is a first step towards more detailed comparative analyses and facilitating follow-up observational campaigns.Lastly, here we are using the means of the posteriors, but instead we could also do random draws of model parameter combinations from the posterior and then construct the mean model histogram that way (this would also provide a way to assign uncertainties on classifications).
Given the small sample sizes for many of our massive log M * /M ⊙ = 10 − 10.5 bins, the posteriors for µ T and σ T tend not to be constrained.As a result, in these bins, the classification probabilities for most galaxies are not very high and may be split roughly equally between all three ellipsoid classes.In any comparative analyses in section 5, we will only take galaxies with a > 75% probability of being assigned to one of the three ellipsoid classes.This is of course arbitrary but it allows us to focus on objects in mass-redshift bins with the greatest available constraints.After making this cut on our model based on SE++, we end up with 1806 prolate candidates, 220 oblate candidates and 73 spheroidal candidates irrespective of mass and redshift.We remind the reader that we are only considering star-forming galax-ies.As sample sizes increase, we can expect a larger fraction of observed galaxies to be assigned higher class probabilities with this technique.

Non-parametric Morphological Measurements
For individual observed galaxies with > 75% probability of being assigned to one of the three ellipsoid classes, we will want to look for other signatures that may discriminate between prolate, oblate and spheroidal objects.To this end, we use the publicly available statmorph Python package (Rodriguez-Gomez et al. 2019) to measure several non-parametric morphological properties.For each of our high-probability candidates, we create 3 ′′ × 3 ′′ cutouts of the science and error images in the relevant filter that probes rest-frame optical wavelengths (see Table 1).We also read in the empirical PSF created from stacking stars in the NIRCam fields for the relevant filter (Finkelstein et al. 2023).We create our own regularized segmentation map with photutils (Bradley et al. 2022) in just the cutout region using a pixel detection threshold of 1.5σ above the error map while ensuring that only the main object of interest in the center of the cutout will be fit.Removing objects with statmorph flag > 1, we are left with 1766/1806 prolate candidates, 201/220 oblate candidates and 73/73 spheroidal candidates using the SE++ based model.Visual inspection of the failed fits reveals bright neighbors or artifacts while the successful fits all look reasonable.
We focus on the concentration C, asymmetry A, clumpiness (smoothness S), Gini coefficient G, and second-order moment of the 20% brightest pixels M 20 .These are defined in Lotz et al. (2004, see also Abraham et al. 2003and Conselice 2003) but we briefly summarize here.The concentration C reflects the ratio of the circular radii containing 80% and 20% of the light, respectively (it is another way to measure how concentrated the light profile is akin to the Sérsic index n).The asymmetry A is computed by summing over the residuals after subtracting a 180 • rotated image from the original image.The clumpiness (smoothness S) similarly sums over the residuals after subtracting a boxcarsmoothed image from the original image with a smoothing scale of 0.25r p where r p is the Petrosian radius estimated by statmorph.Thus, lower values of A and S correspond to more symmetric, smooth light distributions.The Gini coefficient G measures how different the distribution of pixel fluxes is from being uniform with G = 1 corresponding to a single pixel containing all of the flux and G = 0 meaning every pixel has the same flux.Finally, the second-order moment of the 20% brightest pixels M 20 tracks the spatial distribution the brightest regions relative to the total underlying flux.It is computed by multiplying the flux in the 20% brightest pixels by the square of their distances from the galaxy center, and then dividing by the same calculation for all of the galaxy's pixels.The combination of G − M 20 has been used to separate low redshift galaxies into mergers, ellipticals and bulge-dominated systems (Lotz et al. 2008) and has also recently been explored in the context of CEERS visual classification morphologies (Kartaltepe et al. 2023).

RESULTS ON 3D SHAPE EVOLUTION
Here we present our constraints on the 3D shapes of JWST-CEERS galaxies as a function of stellar mass and redshift.Table 5.5 at the end of this section tabulates our results.

Ellipticity Evolution
In Figure 9, we start by showing the mass-redshift evolution of the mean and standard deviation model ellipticity parameters µ E and σ E .For all of our mass-redshift bins, the mean ellipticity is high with µ E ≳ 0.75 and the scatter is generally small with σ E ≲ 0.1.These high ellipticities translate to C/A ∼ 0.25 which is thicker than local disks by a factor of ∼ 2 − 3 (e.g., Elmegreen et al. 2005).We do not see evidence of strong evolution in C/A over the wide range of redshift and mass considered in Figure 9.These results indicate that the majority of the star-forming population that we see in CEERS may either be oblate or prolate in 3D since both configurations would have high ellipticity.The way to break this degeneracy is to constrain the triaxiality parameter which we will explore below.
But first we show the correlation coefficient between the mean ellipticity and mean 3D size, ρ(E, log A) in Figure 10.The correlation coefficient is strongly positive in all mass-redshift bins except perhaps the low-mass bin at z = 3 − 8 based on Galfit.This means that larger size galaxies tend to have higher ellipticity, i.e., larger size galaxies have a greater likelihood of being either prolate or oblate.It makes sense that the largest galaxies we see would be disks or prolate since we are studying star-forming galaxies and are not going to such high masses that we would be in the quiescent elliptical regime.In addition, while we believe this result is robust against PSF resolution effects, we will discuss this caveat in subsection 6.3.

Triaxiality Evolution
Figure 11 answers the key question about the massredshift evolution of the mean triaxiality µ T and its standard deviation σ T .Recall from equation 2 that disks that are nearly oblate/axisymmetric have low triaxiality (T ≈ 0) whereas nearly prolate systems have high triaxiality (T ≈ 1).All of the low-mass (log M * /M ⊙ = 9.0 − 9.5) bins are consistent with µ T ≫ 0.8 at z > 1 strongly favoring the prolate interpretation.Several of the intermediate-mass (log M * /M ⊙ = 9.5 − 10.0) bins are also consistent with high triaxiality and this becomes more pronounced at higher redshift (z > 2).The massive (log M * /M ⊙ = 10.0 − 10.5) bins tend to have lower triaxiality and especially at lower redshifts are consistent with µ T ≲ 0.2 indicative of oblate/disky 3D geometries.The scatter in the triaxiality is relatively high with σ T ≳ 0.5 in many mass-redshift bins.This means that even in cases where µ T is at one of the extremes (zero or one), there can still be a substantial contribution from other types of ellipsoids.Thus we need to combine the joint constraints on ellipticity and triaxiality to infer the relative fractions of ellipsoids of different types as we will show later.Nevertheless, the fact that in some bins we are seeing very high µ T strongly suggests that there is a pattern in the data driving us towards high-redshift dwarfs being prolate in 3D.

3D Size-Mass Relations
Figure 12 shows that our approach automatically also gives us the 3D size-mass relations which is otherwise difficult to retrieve observationally.By 3D size, we mean the longest axis of the ellipsoid that corresponds to the de-projected 2D ellipse that encloses half of the Sérsic model light distribution.We see that the mean 3D size µ log A is larger for higher-mass galaxies at fixed redshift.We also see that as one goes to higher redshifts, the mean sizes decrease systematically: naturally, galaxies are getting smaller in size at fixed mass at earlier times.In other words, we are recovering the growth of galaxy size but now with 3D size-mass relations based on JWST data.There is a hint that the 3D size-mass relation flattens out for z = 3 − 8.The scatter in the 3D size-mass relation is remarkably small and constant with mass and redshift at σ log A ∼ 0.2 dex which consistent with previous work (van der Wel et al. 2014b).
As has also been shown by Suess et al. (2022), our Galfit-based model recovers the striking trend that galaxies on average appear smaller in JWST NIRCam imaging than they did in HST WFC3.The discrepancy becomes worse for lower mass, higher redshift bins which would correspond to fainter galaxies for whom deblending and related issues would be preferentially important.We discuss this more in Appendix C. We stress that despite the ∼ 0.1 dex offset in the 3D size-mass relation derived from SE++ and Galfit measurements, overall our Bayesian model is still reaching roughly similar regions of its enormous 7D parameter space and so our conclusions about 3D shapes should be robust.2019) and our new HMC code applied to all 5 CANDELS fields combined.The magenta and cyan points show results from our code applied to JWST-CEERS shape catalogs from SE++ and Galfit, respectively.The errorbars denote the 1σ width of the marginalized posterior from our code.Note how the mean ellipticity is well constrained to be high with σE ≲ 0.3 for all mass-redshift bins that we consider indicating that either disks or prolate galaxies dominate.9 but now for the evolution of the correlation coefficient between ellipticity and 3D size.In general the correlation coefficient is positive and consistent with ≳ 0.5 indicating that larger size galaxies tend to have higher ellipticity, which in turn means that larger size galaxies are either disky or prolate rather than spheroidal.The value of the triaxiality can break the degeneracy between interpreting high ellipticity objects as either disky (low triaxiality) or prolate (high triaxiality).Our CEERS modeling extends the CANDELS trend of low-mass, high-redshift galaxies having high triaxialities and thus prolate shapes albeit with larger errorbars.Likewise, higher mass and/or lower redshift galaxies are consistent with lower triaxialities and thus disky shapes.The standard deviations are generally consistent with σT ∼ 0.5.

3D Axis Ratios
Figure 13 shows the distribution of 3D axis ratios C/A vs B/A for model ellipsoids in each mass-redshift bin.These 2D histograms are the means of 500 random draws from the posterior for each mass-redshift bin.The yellow curves mark the (arbitrary) boundaries between prolate, oblate and spheroidal systems following van der Wel et al. (2014a) and Zhang et al. (2019).It is immediately obvious that galaxies are clustered near the bottom-right oblate region at low redshift, and that many of the massive bins are unconstrained due to the small sample sizes in CEERS.However, the key trend is that galaxies are clearly in the lower left prolate region at low masses and high redshifts.This is another way to visualize that the model strongly prefers predominantly 3D prolate geometries for low-mass dwarfs at high-redshift.We will discuss possible evolutionary connections across mass and redshift later.
However, looking more carefully at the low-mass, high-redshift, prolate-favoring bins, we see that the peaks of the distributions are not in the extreme far lower left corner.The typical C/A ∼ 0.25 but the B/A peaks at a ∼ 2× larger value of ∼ 0.5.We also see this in CANDELS (including in Figure 12 of Zhang et al. 2019).Thus, we cannot rule out another interpretation that is just as tantalizing: that we are indeed seeing flattened disks at these low masses and high redshifts, but they are unusually oval (triaxial, i.e., non-axisymmetric) with 3D B/A ∼ 0.5 instead of 3D B/A ∼ 1 like the nearly round disks we see today.We will discuss this more later along with the puzzle that some modern largevolume cosmological simulations with sufficient resolution do not reproduce these observational constraints on the 3D C/A − B/A diagram (e.g., Figure 8 of Pillepich et al. 2019).

Dependence of 3D Size on 3D Geometry
Figure 14 shows that the 3D sizes of galaxies depend on their 3D geometry on average.We use the dividing lines in the previous Figure 13 to separately compute the mean 3D size-mass relation for prolate, oblate and spheroidal systems based on our SE++ model.Spheroids are systematically smaller than prolate and oblate ellipsoids.The latter are similar to each other with oblate systems being slightly larger.This is consistent with our finding in Figure 10 that larger size galaxies tend to have a higher ellipticity.Note that the small sizes of the spheroids are still in the well resolved regime except for perhaps our highest redshift, lowest mass bin.We will discuss the possibility of unresolved small disks in subsection 6.3.

Class Fraction Evolution
Figure 15 shows the mass-redshift evolution of the three ellipsoid class fractions.Given the joint constraints on ellipticity (always high) and triaxiality (high for low-mass high-redshift dwarfs, lower for more massive galaxies at later times), we can estimate the relative contributions of prolate, oblate and spheroidal ellipsoids to the observed population in each mass-redshift bin.We see the striking trend that the prolate fraction goes from ∼ 25% in our z = 0.5 − 1.0 bin up to 80% in the z = 3 − 8 bin.Thus these galaxies are not an insignificant population and the majority of low-mass high-redshift dwarfs may start out as prolate.
The oblate (disky) fraction remains relatively constant at ∼ 20 − 60% across cosmic time with the lower fractions based on the SE++ model.There is a hint that dwarf disk fractions increase towards low redshift.The errorbars from the CEERS modeling are larger in our massive bins due to the small sample sizes but the results are still generally consistent with CANDELS at z < 2.5.Since we are focusing only on star-forming galaxies, it is perhaps not surprising that we find very low 3D spheroidal fractions for dwarfs.However, there are puzzling exceptions for the high-redshift, high-mass population where the 3D (star-forming) spheroidal fraction can rise to ∼ 50%.Given the small sample sizes of this high-mass bin at different redshifts, these high massive spheroid fractions may be influenced by the implicit prior for how we generate our library of toy ellipsoids.
Inspired by Figures 4 and 13, respectively, in van der Wel et al. (2014a) and Zhang et al. (2019), our Figure 16 shows a stacked bar chart with the mass-redshift dependence of our average class fractions using the SE++ model.This clearly illustrates the dominance of prolate ellipsoids at low mass and the emergence of disks at lower redshifts.Star-forming spheroids are negligible for these low-mass bins.The log M * /M ⊙ = 10 − 10.5 class fractions may be influenced by the implicit prior for how we generate our library of toy ellipsoids due to the small sample sizes.

RESULTS FROM COMPARATIVE ANALYSES
In this section, we explore whether high-probability prolate, oblate and spheroidal candidates in JWST-CEERS show differences in any other properties besides their 3D shape.These comparative analyses are meant to motivate future work with larger sample sizes and more detailed observational modeling.

Images of High Probability Candidates
We begin with Figure 17 which shows postage stamps of representative example galaxies with a high probabil-    ity (> 75%) of being prolate, oblate or spheroidal in 3D using the model based on SE++.All postage stamps are false-color RGB (F356W+F200W+F115W) images and probe rest-frame optical wavelengths.These galaxies fall in mass-redshift bins where the model was able to strongly constrain how different kinds of 3D ellipsoids populate the observed projected b/a vs. log a diagram.Note the striking diversity of the example galaxies both within and between class definitions in terms of their colors and substructures.Some of the example prolate candidates have multiple bright clumps whereas others are smooth, and a few even show hints of warps or bends, all of which is reminiscent of previous works that subclassify chain galaxies into different morphological types (Elmegreen et al. 2005).

Sérsic index and non-parametric measures
Motivated by our Bayesian classification scheme and by the diversity of galactic substructure seen in the previous subsection, here we present a statistical comparison of the Sérsic index and non-parametric morphological properties for objects with a high probability (> 75%) of falling in one of the three 3D classes of ellipsoids.Given our small sample size, we do not attempt to probe mass/redshift evolution in this paper and defer that to future work.
Figure 18 shows the distributions of the Sérsic index from statmorph for objects with > 75% probability of being assigned to one of the 3D ellipsoid classes.The distributions overlap substantially which means that the shape of the projected light profile cannot be used by itself to infer the 3D shape of individual galaxies.In particular, note that both prolate and oblate candidates have n ∼ 1 which implies that not all high-redshift galaxies with exponential light profiles are automatically disks but may instead be prolate or triaxial.It may seem surprising that what we call spheroidal galaxies have only modestly higher Sérsic indices of n ∼ 2. This likely reflects our mass and SFR cuts to select only star-forming galaxies with M star < 10 10.5 M ⊙ thus preferentially removing quenched, massive ellipticals which are expected to have de Vaucouleurs (n ∼ 4) light profiles.
Figure 19 shows the concentration-asymmetry diagram, Gini-M 20 diagram, and distributions of clumpiness (smoothness parameter S) as violins.Following Kartaltepe et al. (2023), we show the concentrationasymmetry divisors for nearby galaxies from Bershady et al. (2000) and Conselice (2003).Our high-probability prolate, oblate and spheroidal candidates do not appear isolated in the concentration-asymmetry plane.Our spheroidal candidates do seem to be near the z ∼ 1 elliptical region on the G − M 20 diagram from Lotz et al. (2008) but the prolate and oblate candidates are scattered.Surprisingly, the three sets of candidates show overlapping distributions of clumpiness.This indicates that it may be difficult to distinguish 3D shapes using these traditional non-parametric features.Note that we have only included star-forming galaxies here, and only the highest probability candidates were pulled from an  already small sample size so we cannot comment on mass and redshift dependence.

Visual Classifications
Figure 20 shows the fraction of our high-probability prolate, oblate and spheroidal candidates that have visual classifications as spheroidal, bulge-dominated, disk and irregular systems based on the deep learning approach from Huertas-Company et al. (2023).Our 3D spheroid candidates also tend to be visually classified as spheroidal or bulge-dominated systems.Our 3D oblate candidates are predominantly visually classified as irregular likely due to their clumpy structure.Finally, our 3D prolate candidates tend to be visually classified as irregular or disk objects.As before, this suggests that visual classifications alone may not be able to differentiate between 3D prolate and oblate classifications.

Specific Star Formation Rates
Figure 21 shows the deviation of the specific SFR of high-probability prolate, oblate and spheroidal candidates with respect to the star-forming main sequence (SFMS) in their respective redshift interval.We do not see any trends such that different types of ellipsoids live on different parts of the SFMS.This may have been expected if, e.g., prolate candidates were preferentially undergoing gas-rich mergers that cause starbursts, hence leading to elevated sSFRs.Our sample sizes are currently too small to look for dependence on stellar mass and/or redshift.

Dust Attenuation
Figure 22 shows the dust attenuation A V inferred from SED fitting for observed galaxies with > 50% probability of being prolate, oblate or spheroidal in 3D.We use a less stringent cut of 50% instead of 75% since our sample sizes are too small to assign high probabilities to edge-on oblate systems which we otherwise expect to dominate at low redshift and high mass based on CAN-DELS.We see that our prolate candidates tend to have very low dust attenuation even though they primarily have low projected b/a.On the other hand, we see a hint that edge-on oblate candidates with low b/a tend to have higher dust attenuation compared to more faceon oblate candidates.
These findings are consistent with Figure 2 of Zhang et al. ( 2019) although we cannot yet explore mass and redshift dependence due to our small sample sizes (but this is motivation for future work).In the presence of volume-filling diffuse dust, the higher A V we observe for edge-on oblate candidates is expected since edgeon disks would have a larger path length for dust attenuation.In contrast, prolate systems with small 3D B/A = C/A would have small path lengths for dust attenuation even when seen with small projected b/a.Our spheroidal candidates are also surprisingly dusty which makes us wonder if some of them are prolate objects seen down the barrel (thus appearing round) with a large path length for dust attenuation.In detail, these arguments depends on the clumpiness of the dust as shown by Zhang et al. (2023) as well as the degeneracies between dust attenuation, stellar population age and metallicity, and photometric redshift.Conselice (2003) above which nearby galaxies are mergers, and with the two sets of divisors from Bershady et al. (2000) used to separate nearby early-and late-type galaxies.The prolate candidates do not occupy a special place in this diagram.Middle: Gini-M20 diagram with divisors from Lotz et al. (2008) for z ∼ 1 galaxies.While our spheroidal candidates are near the early-type region, the oblate and prolate candidates are not isolated.Right: violin plot comparing the distribution of clumpiness (smoothness parameter S).Surprisingly, all three sets of candidates have similar and overlapping distributions.All of these panels indicate that prolate galaxies seem to be difficult to characterize using these traditional non-parametric features.2023) for our high-probability prolate, oblate and spheroidal star-forming galaxies.The colors indicate different visual classifications: spheroidal (green), bulge-dominated (pink), disk (orange) and irregular (purple).Our 3D spheroid candidates also tend to be visually classified as spheroids and bulge-dominated systems.Many of our 3D oblate candidates are classified as irregular probably due to their clumpy structure.Our 3D prolate candidates tend also to be visually classified as irregular or disk systems.
Figure 21.Distributions of the deviation of the specific SFR relative to the SFMS for high-probability prolate (blue), oblate (orange) and spheroidal (green) candidates.These deviations are calculated with respect to the SFMS in the redshift interval that each galaxy belongs to.With our small sample size, we do not see evidence that the three types of ellipsoids occupy distinct locations on the SFMS.
Figure 22.Dust attenuation AV across the b/a − log a diagram for observed galaxies with > 50% of being prolate (left), oblate (middle) or spheroidal (right) according to our 3D shape modeling.The prolate candidates are generally dust-free.In contrast, oblate candidates with lower b/a tend to have higher dust attenuation which is expected for edge-on disks and consistent with Figure 2 of Zhang et al. (2019).Our 3D spheroidal star-forming candidates are also surprisingly dusty.

Connection to Previous Work
Prior to JWST, there was already evidence from HST that massive high-redshift galaxies were largely consistent with the intrinsically oblate and spheroidal 3D shapes of massive systems we see today (Holden et al. 2012;Chang et al. 2013;van der Wel et al. 2014a;Zhang et al. 2019, Zhang et al. 2022).There was also evidence that fainter galaxies, despite their overwhelmingly peculiar/irregular appearance, may still be sorted into distinct and comprehensible classes such as chains, clump clusters and tadpoles (Cowie et al. 1995;van den Bergh et al. 1996;Elmegreen et al. 2005).However, the interpretation of these faint objects has been puzzling.One side of the argument is that these apparently exotic faint star-forming galaxies are consistent with underlying oblate geometries viewed edge-on, and that we do not observe as many round, face-on objects because of surface brightness detection biases (Dalcanton & Shectman 1996).On the other hand, statistical 3D shape modeling of the deepest, largest surveys from HST such as CANDELS suggest a real paucity of round dwarfs at high redshift and propose predominantly prolate 3D shapes as the solution (van der Wel et al. 2014a;Zhang et al. 2019).Our own completeness simulations in Appendix B demonstrate that we are complete to large face-on disks down to ∼ 26.5 AB mag (F277W), which is ∼ 2 mag deeper than HST-CANDELS for which studies of galaxy morphology are typically restricted to < 24.5 AB mag (F160W; van der Wel et al. 2012).
Many other studies have also used JWST to analyze the evolution of galaxy structure and morphology.As emphasized by Huertas-Company et al. (2023), JWST-CEERS goes significantly deeper than HST-CANDELS but only a relatively small fraction of objects that were classified as irregular in HST-CANDELS now have diffuse, extended emission from faint disks newly detected by JWST.Ferreira et al. (2022) and Kartaltepe et al. (2023) both used visual classifications along with parametric and non-parametric modeling to show that there are more disks in JWST imaging than seen by HST.Robertson et al. (2023) also showed that deep learning methods trained on HST-CANDELS visual classifications recover fainter disks in new JWST imaging.They too used the distribution of projected axis ratios to place an upper limit of 57% on the pure disk fraction though they did not comment on mass and redshift dependence.It is important to realize that our results are not necessarily inconsistent with these previous studies since visual classifications and exponential light profiles (Sérsic index n ∼ 1) alone cannot distinguish be-tween prolate and oblate 3D geometries.Indeed, Vega-Ferrero et al. (2023) find that many visually classified disks in JWST seem to have peculiar features and more detailed follow-up is required.The machine learning study by Tohill et al. (2023) also identified several distinct morphological classes for high-redshift galaxies observed with JWST, one of which is a set of consistently elongated systems.While our own analysis does identify disks at the ∼ 20 − 50% level, in the dwarf regime we find that prolate or triaxial ellipsoids outnumber oblate disks by a factor of several.
The 3D shapes of galaxies are intimately related to their sizes and the latter are of great interest for galaxy formation theory (Ferguson et al. 2004;Somerville et al. 2018).Studies with JWST are now beginning to measure projected size-mass or size-luminosity relations at high redshift (Yang et al. 2022;Ito et al. 2023;Ward et al. 2023).Our modeling approach simultaneously constrains the 3D size-mass relation and its evolution with redshift to z = 8.This appears broadly consistent with previous work with HST but it will be interesting to compare to future compilations.We find that the scatter in this relation is remarkably constant and small with σ log A ∼ 0.2 dex which is in agreement with van der Wel et al. (2014b).We also saw in Figure 14 the dependence of 3D size on 3D geometry, namely that highredshift star-forming spheroids tend to be much smaller than the oblate and prolate populations, particularly at dwarf scales.Combined with the class fraction evolution in Figure 15, this invokes a basic picture of the high-redshift star-forming dwarf population as comprising relatively rare, small-size spheroids and mostly large prolate systems while oblate geometries emerge later.Finally, we see systematically smaller sizes in JWST-CEERS relative to HST-CANDELS when using Galfit as was also shown by Suess et al. (2022), but this discrepancy goes away when we use SE++.We discuss this more in Appendix C.
Finally, our analysis directly builds on the HST-CANDELS work by Zhang et al. (2019) which itself generalized the study by van der Wel et al. (2014a).We showed that our new code reproduces the results of Zhang et al. (2019) using all 5 CANDELS fields in a fraction of the computing time.Our JWST results are also roughly consistent at z < 2.5 where CEERS and CANDELS overlap.One subtlety that is worth commenting on is that both van der Wel et al. (2014a) and Zhang et al. (2019), and by extension we, assume rather arbitrary boundaries for dividing 3D ellipsoids into one of the three extreme classes (oblate, spheroidal or prolate).In reality, our models are based on general triaxial ellipsoids such that if E ≫ 0 and T ̸ = 0 or T ̸ = 1, then the three axes have different lengths so we cannot say that we are clearly in one of the three extreme shape scenarios.Our Figure 13 and Figure 12 from Zhang et al. (2019) show that the 3D C/A−B/A model distributions in what we call the prolate-dominated mass-redshift bins seem to be in the more ambiguous triaxial category.This is because C/A ∼ 0.25 and B/A ∼ 2 × C/A ≈ 0.5 which evokes an unusually oval, flattened disk compared to the rounder disks seen in the local Universe.This triaxial rather than extreme prolate conclusion was also considered by Elmegreen et al. (2005), Ravindranath et al. (2006), Yuma et al. (2011), Law et al. (2012) and Yuma et al. (2012).It calls to mind the notion of disk settling in terms of thickness (Kassin et al. 2012) but here we argue that there may also be another kind of disk settling from oval to circular shapes.

Astrophysical and Cosmological Implications
We will now discuss the implications of our results for each class of 3D ellipsoids starting with star-forming spheroids.Previous work with HST has shown that starforming spheroids are rare (Brennan et al. 2015) and indeed our Figure 14 shows their fractions at ≲ 20% for log M * /M ⊙ < 10.However they rise to the ∼ 40% level for log M * /M ⊙ = 10.0 − 10.5 at z = 0.5 − 3. Given the small sample sizes of this massive bin at different redshifts, these high massive spheroid fractions may be influenced by the implicit prior in how we generate our library of toy ellipsoids.If real, this abrupt rise in the spheroidal fraction at these masses and redshifts may be related to the formation of compact star-forming galaxies at high redshift and their eventual transformation into compact quiescent galaxies, which are thought to be the progenitors of local giant ellipticals (van Dokkum et al. 2008;Barro et al. 2013) or bulges in massive spirals (de la Rosa et al. 2016;Costantin et al. 2021Costantin et al. , 2022)).These compact star-forming spheroids may arise due to what is called a "compaction" event where gas-rich mergers, disk instabilities or cold gas inflows cause a galaxy to rapidly increase in mass surface density (Dekel & Burkert 2014;Zolotov et al. 2015;Tacchella et al. 2016;Lapiner et al. 2023).
The origin of galactic disks remains a complicated problem and it is therefore important to identify high redshift disks and understand their properties in the context of local spirals (see the review by van der Kruit & Freeman 2011).Many authors have shown that selecting objects on the basis of exponential light profiles alone may not be sufficient for identifying genuine rotating disks at high redshift (Law et al. 2007(Law et al. , 2012;;van der Wel et al. 2014a) and our own Figure 16 reveals that both oblate and prolate candidates can have remarkably similar Sérsic indices of n ∼ 1.Thus 3D shape modeling like ours is crucial to identify oblate disk candidates in addition to visual classifications and parametric and non-parametric morphological measurements.We find oblate fractions of ∼ 20−60% over our full mass and redshift ranges of log M * /M ⊙ = 9.0 − 10.5 and z = 0.5 − 8.0 from JWST-CEERS, suggesting that disks were indeed already in place at very early times.Our Figure 13 shows that, on average, for mass-redshift bins where the population is predominantly in the lower right oblate corner, we find a typical intrinsic C/A ∼ 0.2 − 0.3 which according to Elmegreen et al. (2005) is ∼ 2 − 3× thicker than local spirals.The mean 3D sizes and remarkably small ∼ 0.2 dex scatter of our highest redshift disk candidates may provide new constraints for modeling the relative roles of angular momentum conservation, gas accretion and outflows, mergers, and halo properties like concentration, virial radius and assembly history in governing disk formation and evolution.
Perhaps the most startling aspect of our results is the high prolate dwarf fractions rising to ∼ 50 − 80% at z = 3 − 8 for log M * /M ⊙ = 9.0 − 9.5 galaxies.Ceverino et al. (2015) and Tomassetti et al. (2016) have argued based on cosmological zoom-in simulations that prolate dwarfs at high-redshift can be explained if they form within host halos that are themselves elongated along their host dark matter filament.These objects would continually undergo mergers along the direction of the filament hence causing their elongation and possibly intrinsic alignments on large scales (as originally proposed by Pandya et al. 2019).At lower redshifts, as filaments become diffuse and the continuous merger process slows down, the observed prolate fraction may decrease in accordance with our Figure 14.In other words, our high prolate fractions at high-redshift may be telling us something about the hierarchical mergerdriven process of galaxy formation.Now, in this case, we may expect merger-driven starbursts but Figure 21 shows that high-probability prolate candidates do not occupy a special place on the SFMS, at least with our small sample size.Ceverino et al. (2015) and Tomassetti et al. (2016) have also suggested that the centers of halos hosting prolate galaxies are dark matter dominated, and as they undergo accretion and "compaction", they become baryon dominated and capable of supporting disks (see also Lapiner et al. 2023).This prolate to oblate transition is thought to involve the formation of compact "blue nuggets" in a characteristic mass range of log M * /M ⊙ ∼ 9.2 − 10.3 as observed in HST-CANDELS (Huertas-Company et al. 2018).We do not yet have large enough sample sizes with JWST to explore such an evolutionary connection across mass and redshift.
In the prolate phase, the stellar motions should be velocity dispersion supported and rotating gaseous disks may not be a stable configuration.Existing spectroscopic constraints do already show a decline in the fraction of rotationally-supported galaxies with increasing redshift and decreasing mass (e.g., Figure 4 of Simons et al. 2017), but prohibitively deep stellar spectroscopy will be needed to definitively test this picture.We have also discussed the possibility that what we are calling prolate dwarfs are actually unusually oval (triaxial) ellipsoids.If true, then just as galaxy disks "settle" from thick to thin over cosmic time (Kassin et al. 2012), they must also settle from oval to circular shapes towards low redshift.This oval to circular transition may be driven by a changing mode of gas accretion or mergers wherein at earlier times it is more clumpy leading to episodic star formation and at later times it is smoother.Alternatively, we may interpret these as stellar bars (Gullberg et al. 2019), though they may form differently from normal bars since there is no obvious, bright, extended stellar disk and since the dark matter likely dominates over the self-gravity of the stars in these dwarfs.Nevertheless, we point out that progenitors of Milky Way-mass galaxies at z ≳ 2 are thought to have log M * /M ⊙ ∼ 9 − 10 (e.g., Papovich et al. 2015) which is where we find consistently high fractions of prolate and/or triaxial ellipsoids.This implies that our own Galaxy may have gone through a prolate or triaxial morphological phase in its past.
Lastly it is worth commenting on hydrodynamical simulations.Early on during the debates about the nature of chain galaxies, Immeli et al. (2004a), Immeli et al. (2004b) and Bournaud et al. (2007) used simulations to argue that chain galaxies are edge-on manifestations of intrinsically oblate clumpy star-forming galaxies.As far as we are aware, only Ceverino et al. (2015), Tomassetti et al. (2016) and Lapiner et al. (2023) claim to have found unambiguously prolate or triaxial highredshift dwarfs in their zoom-in simulations.Pillepich et al. (2019) show in their Figures 8 and 9 that there are far fewer prolate galaxies in the TNG50 simulations compared to the observational estimates from van der Wel et al. (2014a) and Zhang et al. (2019), and now our paper as well.Their high-redshift dwarfs seem to be predominantly spheroidal or oblate with relatively high C/A and/or B/A compared to our Figure 13 (see also Zhang et al. 2022).This discrepancy between at least two sets of simulations with respect to each other and vs. observational constraints demands that 3D shapes be analyzed in more detail in modern simulations.We note in passing that the uncertain nature of dark matter has motivated simulations of alternatives to cold dark matter such as fuzzy dark matter in which filaments and galaxies may naturally be more elongated (Mocz et al. 2020;Dome et al. 2023).

Limitations of our study
Our analysis, like any, is subject to uncertainties.First and foremost, our sample sizes from JWST-CEERS are rather small.We argue that our key results regarding low-mass galaxies being predominantly prolate or triaxial are robust given the sufficiently large sample sizes from CEERS alone.However, our massive log M * /M ⊙ = 10 − 10.5 bins have small sample sizes and so their posteriors on 3D shape parameters may be influenced by our priors (in particular, our choice to uniformly sample toy ellipsoids in the E − T − log A space rather than C/A − B/A − log A space).We also see that these massive bins tend to have greater residuals between the observed and mean model histograms which may arise from both the small sample sizes and perhaps limited flexibility of the model.This can be remedied in the future by combining datasets from different JWST surveys and trying more sophisticated models.We plan to pursue this in the future since our differentiable Bayesian approach is uniquely fast and robust.By combining datasets from different areas of the sky, we can also address the issue of cosmic variance.Relatedly, our analysis may suffer from detection or measurement biases, though in Appendix B we showed that CEERS is complete to disks with a range of sizes and axis ratios as faint as ∼ 26.5 AB mag (F277W), which is ∼ 2 mag deeper than HST-CANDELS (F160W).We cannot rule out fainter disks but it is unclear if they would satisfy our log M * /M ⊙ > 9 sample selection limit.Future parameter recovery tests for mock Sérsic profiles and fake 3D ellipsoids inserted into the real imaging will help address these questions (see also McGrath et al. in preparation).
We found systematic size differences between SE++ and Galfit with the latter producing systematically smaller size measurements for the same galaxies seen in HST-CANDELS (see also Suess et al. 2022).Figure 12 shows that this discrepancy is worse for lower mass, higher redshift objects which would also be the faintest and thus preferentially susceptible to issues with deblending and masking.Both codes use the same global background-subtracted images, fix the local background to zero, and the same empirical PSFs so these cannot be the causes of the size discrepancies.In Appendix C, we discuss this in more detail.However, the key point is that regardless of the discrepancies between the SE++ and Galfit measurements, our two sets of 3D shape modeling results using the different datasets still lead to similar conclusions.In other words, despite any differences between the two codes, our model still converges to roughly the same region of its enormous 7D parameter space when using either dataset.
We showed in Figures 3 and 4 that, for many of the low-mass high-redshift bins, there is a deficit of large b/a objects and an excess of low b/a objects especially at large log a.This is the crux of our argument in favor of high-redshift dwarfs being prolate or triaxial.However, the impact of the PSF means that we cannot rule out the possible existence of a population of small-size disks with radii close to the PSF FWHM limit since it would be difficult to resolve their low or even intermediate b/a projections.This could be another explanation for why smaller size galaxies tend to have larger b/a: they may be intrinsically spheroidal as Zhang et al. (2019) and we suggest, or we may only be observing the face-on projections of small disks.Now, it is possible to recover the projected shapes of small galaxies if the PSF is well understood, and indeed many of our z = 3 − 8 low-mass, small-size galaxies are constrained to have b/a well below the PSF limit.This caveat also does not explain away our main finding that larger size (well resolved) dwarfs preferentially show up with small b/a indicating prolate or triaxial geometries.Ultimately, we argue that telescopes with even better resolution than JWST will be needed to definitively constrain the existence of small-size disks.In the meantime, larger datasets will allow more sophisticated modeling approaches such as fitting for multiple populations in a single mass-redshift bin (e.g., using Gaussian mixture models) which may also give us a better handle on PSF-related limitations.Somewhat related is that our analysis assumes that SE++ and Galfit are measuring the true (intrinsic) projected axis ratios of galaxies after accounting for instrumental effects but we have not considered the impact of weak lensing on distorting the shapes of distant dwarfs, namely making them appear more systematically elongated.
The redshifts and stellar masses we use to group galaxies into different bins, and the SFRs we use to select only star-forming galaxies, are all based on SED fitting.It is well known that these can all be uncertain depending on the assumptions and methods used for SED fitting (Pacifici et al. 2023).Along these lines, we found hints of a trend in Figure 22 such that high-probability oblate candidates seen edge-on have a higher dust attenuation A V whereas prolate candidates appear to be relatively dust-free.This requires follow-up since the detailed geometry of dust (i.e., whether it is clumpy or diffuse) will also affect attenuation in addition to the viewing angle dependence (e.g., as was recently shown by the joint analysis of 3D shapes and dust attenuation for HST galaxies by Zhang et al. 2023, and see also Padilla &Strauss 2008 andZhang et al. 2019).
Finally, we have not explored the possibility that the changing mass-to-light ratios (and radial gradients thereof) of disks can masquerade as geometric evolution.In other words, without any 3D shape modeling, can the asymmetric distributions of projected b/a and the banana-shaped b/a − log a 2D histograms be reproduced via disk color-related selection effects?Are we simply seeing different parts of faint underlying disks at high redshift such as bars and star-forming knots?In a related sense, could strong emission lines from the gas in these early systems affect their appearance even in broadband imaging (e.g., Amorín et al. 2015)?It is beyond the scope of this paper to address these questions but they would be fruitful avenues for future work.

CONCLUSIONS
We have developed a differentiable Bayesian model and used Hamiltonian Monte Carlo to constrain the 3D shapes of high-redshift star-forming galaxies from JWST-CEERS observations.To ensure that our results are not driven by source detection and shape characterization methods, we have used two different catalogs: internal CEERS team catalogs based on Galfit (Mc-Grath+ in prep.) and independent catalogs from the next-generation SourceExtractor++ (Bertin et al. 2020;Kümmel et al. 2022).We run our efficient and robust model on CANDELS data to reproduce previous results from (Zhang et al. 2019) in a fraction of the computing time and we also use mock tests to show that the model and data have constraining power for sample sizes as small as ∼ 50 galaxies.For the new JWST CEERS data, our model is able to constrain the mean ellipticity, mean triaxiality, mean size and the covariances of these quantities as a function of stellar mass and redshift over the ranges log M * /M ⊙ = 9.0 − 10.5 and z = 0.5 − 8.0.
Our findings can be summarized as follows: • With the better spatial resolution and sensitivity of JWST NIRCam imaging, we still find peculiarly asymmetric distributions of projected axis ratios peaking at low values of b/a ∼ 0.3 − 0.4 for galaxies with log M * /M ⊙ = 9.0 − 10.0 at z > 1.This confirms previous findings from HST but now alleviates concerns about blending leading to an overabundance of elongated objects as well as incompleteness to faint face-on disks.(Figures 1, 2, 3, 4) • We assume that galaxies can be described as 3D ellipsoids (of which the extreme types are oblate, prolate and spheroidal) and demonstrate how ran-dom projections of these types of systems would trace out different paths on the projected b/a − log a plane.Spheroids would preferentially show up with large b/a, axisymmetric disks would show a uniform vertical stripe, and prolate and oval (triaxial, i.e., non-axisymmetric) disks would both trace out a curved "banana" trajectory on this diagram.(Figure 5) • Using 2D histograms of projected b/a−log a as observational constraints, our Bayesian model combined with Hamiltonian Monte Carlo (Figure 6) finds high mean ellipticities with small scatter for all mass-redshift bins we consider.This means the galaxies we observe are either disks or prolate.In many mass-redshift bins, our model is able to break that degeneracy by constraining the triaxiality.For log M * /M ⊙ = 9.0 − 9.5 at z > 1, the mean triaxiality tends to be very high strongly favoring the prolate interpretation.The mean triaxiality tends to be lower for higher mass galaxies at lower redshifts suggesting the emergence of disks.(Figures 9,11) • Our model also automatically constrains the 3D size-mass relation and its mass-redshift evolution.At a fixed redshift, higher mass galaxies have larger sizes.As one goes to higher redshift, the size-mass relation drops in normalization: more distant galaxies are naturally smaller in 3D at fixed mass.The scatter in the size-mass relation is remarkably small and constant with mass-redshift at σ log A ∼ 0.2 dex.The 3D size-mass relation depends on 3D geometry in the sense that highredshift star-forming dwarfs tend to be systematically smaller than prolate and oblate ellipsoids (Figures 12,14) • The fraction of prolate galaxies rises from ∼ 25% at z = 0.5 − 1.0 up to ∼ 50 − 80% at z = 3 − 8 for log M * /M ⊙ = 9.0 − 9.5 dwarfs.The prolate fraction decreases towards higher masses at all redshifts.The dwarf disk fraction tends to rise from ∼ 20−40% to ∼ 40−60% towards low redshift.We find surprisingly high ∼ 40% spheroid fractions for massive galaxies with log M * /M ⊙ = 10 − 10.5 but this may be influenced by small sample sizes and the implicit prior for generating our library of toy model ellipsoids.(Figures 15,16) • If low-mass high-redshift dwarfs are indeed disks, they cannot be axisymmetric but instead must be unusually oval (triaxial) compared to local circular disks implying that their stars cannot move on circular orbits.This is supported by our model which suggests they have 3D axis ratios of C/A ∼ 0.25 but B/A ∼ 2 × C/A ≈ 0.5 as opposed to B/A ∼ 1 observed for the nearly round disks that we see today.This interpretation suggests that disks may "settle" not only from thick to thin but also from oval to circular shapes with cosmic time.(Figure 13) • We can assign high probability (> 75%) classifications of 3D ellipsoid class to ∼ 2000 galaxies irrespective of mass-redshift.For these, color postage stamps reveal remarkably linear, large and thin systems classified as prolate as well as obvious cases of face-on disks and compact spheroids.Some of the prolate candidates are reminiscent of the long forgotten class of "chain galaxies" identified in deep Hubble images (Cowie et al. 1995;van den Bergh et al. 1996;Elmegreen et al. 2005).
(Figure 17) • The high-probability prolate and oblate candidates have similar Sérsic indices of n ∼ 1 meaning this alone cannot be used to infer their 3D geometry.Both tend to be visually classified as disks or irregular.Surprisingly the three classes of high-probability candidates do not reveal significant differences in their non-parametric morphological properties like concentration, asymmetry, clumpiness (smoothness), Gini coefficient the second order moment of the 20% brightest regions.The three classes also do not differ in their distribution of deviations from the SFMS.However, we find hints that edge-on oblate candidates have higher dust attenuation A V whereas prolate candidates are generally blue and dust-free.(Figures 18,19,20,21,22)  For galaxies brighter than 26.5 AB mag (corresponding to most of our selected sample), we are nearly complete even to large, face-on disks.Middle: For marginally faint sources (26.5-27AB mag), we start to see hints of incompleteness to large face-on disks, but these would be at the extremely faint end of our sample.Right: For galaxies fainter than 27 AB mag, we are severely incomplete but it is not clear if these galaxies would satisfy our log M * /M⊙ > 9 cut.These simulations demonstrate that, for extended sources, JWST-CEERS goes ∼ 2 magnitudes deeper than HST-CANDELS for which a cut of 24.5 AB mag is usually made in the F160W filter when studying galaxy morphology.
Figure 26.Sérsic model fits and residuals for some example objects with high probability of being prolate (top 3 rows), oblate (second from bottom row) and spheroidal (bottom row).These are all 3 ′′ × 3 ′′ cutouts in either F115W or F200W which corresponds to rest-frame optical wavelengths given the redshifts of these galaxies.From left to right: cutout of NIRCam image, Galfit model, Galfit residuals, SE++ model and SE++ residuals.These fits and residuals all look sensible and this is also representative of many other galaxies that we visually inspected (including higher redshift galaxies for which we use redder filters).parameter space particularly for low-mass galaxies.This is for all sources in CEERS that satisfy our selection cuts, namely log M * /M⊙ = 9.0 − 10.5 and z = 0.5 − 8.0, split into our mass bins increasing from left to right.When SE++ finds a larger size than Galfit, it also tends to find a higher Sérsic index than Galfit as evidenced by the large number of points in the top-right quadrant especially for lower mass (fainter) galaxies.The colorbar shows how these n − Re residuals correlate with b/a residuals, namely that when SE++ finds both a larger size and n than Galfit, it also tends to find a lower b/a.This is for F200W but we see similar trends for the other filters.

Figure 1 .
Figure1.The distribution of projected axis ratios in our different mass-redshift bins as measured with Galfit (cyan) and SE++ (magenta).The smooth curves are kernel density estimates.The two sets of measurements are generally consistent.We emphasize that the distributions for low-mass, high-redshift bins are asymmetric and skewed towards low b/a suggestive of prolate populations.

Figure 2 .
Figure 2. Similar to Figure 1 but now for the size distributions.The dashed vertical lines indicate the PSF FWHM converted to kpc at the midpoint of each redshift bin.As before, the two sets of measurements are similar but SE++ finds more large-size objects than Galfit (we discuss this more in Appendix C).

Figure 3 .
Figure3.Two-dimensional histograms of projected b/a vs. log a from Galfit for our different mass-redshift bins.The colorbar denotes the number of galaxies in each histogram cell.The dashed cyan lines denote the PSF FWHM translated to proper kpc at the midpoint of each redshift bin where the curved lines come from assuming b = PSF FWHM.The sizes and axis ratios are intrinsic, i.e., from the best-fitting Sérsic model before being convolved with the PSF.The lower mass bins reveal a striking trend in this diagram, namely an excess of low b/a "edge-on" objects and a deficit of rounder objects.The histograms for some of the higher mass bins are noisy due to small sample sizes, so for these we expect our Bayesian model posteriors to reflect the priors.

Figure 4 .
Figure 4. Similar to Figure 3 but now using the SE++ catalog.We see the same trends particularly, namely that the lower mass, higher redshift bins have a deficit of rounder objects and an excess of low b/a sources.

Figure 5 .
Figure 5.An illustration of how a population of extreme ellipsoids would appear in projection on the b/a − log a diagram.The histograms on the left are for a single ellipsoid seen from many different viewing angles accounting for measurement errors.All four ellipsoids have the same intrinsic log A/kpc = 0.5.The histogram color corresponds to the fraction of projections that end up in a given b/a − log a cell with purple being very low and bright yellow being very high.For each type of ellipsoid, we show a 3D visualization along with a cross-section where the latter clearly differentiates between prolate systems and circular versus oval disks.Face-on and edge-on projections are also shown corresponding to locations in the histograms.Top: spheroids would be concentrated at large projected b/a because they appear round from angle viewing angle.Second from top: oblate/disky objects trace out a uniform distribution in b/a because they are equally likely to be seen from any viewing angle.They have a cut-off at low axis ratios due to their finite thickness.Third from top: prolate ellipsoids would trace out a banana in this diagram since they have two short axes leading to most projections appearing "edge-on" with small b/a and large size.Bottom: oval (triaxial) disks would also preferentially have lower b/a in projection because even when seen face-on, they would not be perfectly circular.
of the 2D projected b/a-log(a) banana diagram for all 3D ellipsoids in library P( ⃗ θ) Hamiltonian Monte Carlo Compute Poisson likelihood of model vs. observed 2D histogram taking into account sample size of data Observed uncertainties in b/a and log(a) are taken into account when creating the library of banana diagrams Poisson likelihood of each bin of the 2D histogram is assumed to be independent for simplicity

Figure 7 .
Figure7.Example corner plot from our HMC for the z = 2.0 − 2.5 and log M * /M⊙ = 9.0 − 9.5 bin.The different colors correspond to the different datasets used for the fitting: SE++ (magenta), Galfit (cyan) and CANDELS (yellow).Results from the 4 individual HMC chains for each run have been combined since the chains were all converged.The CEERS posteriors agree relatively well with each other and with CANDELS (i.e., the models are constrained to be in similar regions of the large 7D parameter space).The grid of histograms in the top-right shows that the mean model matches each observed dataset well and that the residual map is relatively featureless (the inset colorbars show the number of galaxies per histogram bin).CANDELS gives the tightest constraints because of the much larger sample size at z < 2.5.For this mass-redshift bin, both the mean ellipticity and mean triaxiality are high which suggests a predominantly prolate population.Analogous figures for the other mass-redshift bins can be found at the Harvard Dataverse: https://doi.org/10.7910/DVN/SWTKVA

Figure 9 .
Figure 9.The evolution of the mean ellipticity (top row) and its standard deviation (bottom row).Each column corresponds to a different redshift bin increasing from left to right as indicated by the subplot titles.The black and yellow points show results at z ≤ 2.5 respectively from Zhang et al. (2019) and our new HMC code applied to all 5 CANDELS fields combined.The magenta and cyan points show results from our code applied to JWST-CEERS shape catalogs from SE++ and Galfit, respectively.The errorbars denote the 1σ width of the marginalized posterior from our code.Note how the mean ellipticity is well constrained to be high with σE ≲ 0.3 for all mass-redshift bins that we consider indicating that either disks or prolate galaxies dominate.

Figure 10 .
Figure 10.Similar to Figure9but now for the evolution of the correlation coefficient between ellipticity and 3D size.In general the correlation coefficient is positive and consistent with ≳ 0.5 indicating that larger size galaxies tend to have higher ellipticity, which in turn means that larger size galaxies are either disky or prolate rather than spheroidal.

Figure 11 .
Figure 11.Similar to Figure 9 but now for the evolution of the mean (top) and standard deviation (bottom) of the triaxiality.The value of the triaxiality can break the degeneracy between interpreting high ellipticity objects as either disky (low triaxiality) or prolate (high triaxiality).Our CEERS modeling extends the CANDELS trend of low-mass, high-redshift galaxies having high triaxialities and thus prolate shapes albeit with larger errorbars.Likewise, higher mass and/or lower redshift galaxies are consistent with lower triaxialities and thus disky shapes.The standard deviations are generally consistent with σT ∼ 0.5.

Figure 12 .
Figure 12.Similar to Figure 9 but now for the evolution of the mean (top) and standard deviation (bottom) of the 3D size-mass relation.The dotted horizontal black lines in the upper panels show the PSF FWHM in the relevant filter at the midpoint of each redshift bin.There is a clear evolution in the 3D size-mass relation such that more massive galaxies in a given redshift bin are larger and that the size-mass relation overall decreases towards high redshift.The scatter in the 3D size-mass relation is remarkably constant at σ log A ∼ 0.2 with both mass and redshift.

Figure 13 .
Figure 13.The distribution of 3D axis ratios in each mass-redshift bin from the model fit to the SE++ catalogs.Each panel shows the average of 500 histograms constructed randomly from the posterior for each mass-redshift bin.The yellow boundaries classify the 3D ellipsoids into the prolate, oblate and spheroidal shapes following van der Wel et al. (2014a) and Zhang et al. (2019).Note how the distribution shifts towards the prolate bin at low masses and high redshifts, but peaks at B/A ∼ 2 × C/A which implies unusually oval (triaxial) disks.For the massive bins with small sample sizes, the distributions are less well constrained and thus broader.An analogous version of this figure based on Galfit can be found at the Harvard Dataverse: https://doi.org/10.7910/DVN/SWTKVA.

Figure 14 .
Figure14.Mean 3D size-mass relations separately for prolate (blue), oblate (orange) and spheroidal (green) ellipsoids.The split by 3D geometry is based on the dividing lines in the previous Figure13.Spheroids tend to have systematically smaller 3D sizes than prolate and oblate ellipsoids, which themselves are similar with hints that oblate systems are slightly larger.These constraints are based on our SE++ model.

Figure 15 .
Figure 15.Similar to Figure 9 but now the evolution of the prolate (top row), oblate (middle row) and spheroidal (bottom row) class fractions.Our new HMC modeling is generally consistent with the Zhang et al. (2019) fractions at z < 2.5 from all 5 CANDELS fields.With CEERS, we find that the prolate fractions of low-mass dwarfs continue to remain ≳ 50% out to z = 8.

Figure 16 .
Figure 16.Alternative visualization of Figure 15 as a stacked bar chart using our average model class fractions with SE++.Prolate fractions are shown in blue, oblate in orange and spheroidal in green.Prolate galaxies dominate at low masses at all z > 1. Oblate disks are found at the ∼ 20 − 50% level and increase towards low redshift.Spheroids are negligible at these masses.The log M * /M⊙ = 10 − 10.5 class fractions may be influenced by the implicit prior for how we generate our library of toy ellipsoids due to the small sample sizes.

Figure 17 .
Figure 17.Example 3" × 3" false-color RGB (F356W+F200W+F115W) postage stamps of galaxies with a high (> 75%) probability of being prolate (top two rows), oblate (middle two rows) or spheroidal (bottom two rows).The inset text shows the photometric redshift, stellar mass, F200W b/a, Re and n, and the highest class probability.These galaxies fall in mass-redshift bins where our model was able to assign confident classification probabilities to different regions of the b/a vs. log a diagram.

Figure 18 .
Figure18.Violin plot showing the distribution of Sérsic index from statmorph for star-forming galaxies with > 75% probability of being assigned to one of the 3D ellipsoid classes.The distributions overlap considerably which means that the shape of the projected light profile alone cannot be used to assign galaxies to a given 3D ellipsoid class.Many oblate and prolate candidates both have exponential light profiles with n ≈ 1 (cyan line).The Sérsic indices of spheroids are only marginally higher but recall that our starforming cut removes the typical quenched n ∼ 4 objects.

Figure 19 .
Figure 19.Non-parametric morphological properties for high-probability prolate (blue), oblate (orange) and spheroidal (green) candidates.Left: Concentration-Asymmetry diagram with the A = 0.35 divisor fromConselice (2003) above which nearby galaxies are mergers, and with the two sets of divisors fromBershady et al. (2000) used to separate nearby early-and late-type galaxies.The prolate candidates do not occupy a special place in this diagram.Middle: Gini-M20 diagram with divisors fromLotz et al. (2008) for z ∼ 1 galaxies.While our spheroidal candidates are near the early-type region, the oblate and prolate candidates are not isolated.Right: violin plot comparing the distribution of clumpiness (smoothness parameter S).Surprisingly, all three sets of candidates have similar and overlapping distributions.All of these panels indicate that prolate galaxies seem to be difficult to characterize using these traditional non-parametric features.

Figure 20 .
Figure 20.Visual classifications from the deep learning approach of Huertas-Company et al. (2023) for our high-probability prolate, oblate and spheroidal star-forming galaxies.The colors indicate different visual classifications: spheroidal (green), bulge-dominated (pink), disk (orange) and irregular (purple).Our 3D spheroid candidates also tend to be visually classified as spheroids and bulge-dominated systems.Many of our 3D oblate candidates are classified as irregular probably due to their clumpy structure.Our 3D prolate candidates tend also to be visually classified as irregular or disk systems.

Figure 23 .
Figure 23.Corner plot showing parameter recovery tests for a prolate-dominated mock ellipsoid population.The different colors show results from our HMC for mocks with different sample sizes based on random Poisson sampling of the underlying true 2D projected b/a vs. log a histogram: N=500 (gray), 100 (magenta), 50 (cyan) and 25 (yellow).The black vertical and horizontal lines mark the true parameter values.The HMC is successful at recovering the true parameter values for N = 500 and N = 100, and even for N = 50 albeit with broader posteriors for µT and σT.However for N = 25 the posteriors for the triaxiality parameters are broad and the model cannot distinguish between prolate and oblate populations, though it does get the ellipticity, size and correlation coefficient parameters right.This is generally true for other combinations of input mock parameters that we tried for which analogous figures can be found at the Harvard Dataverse: https://doi.org/10.7910/DVN/SWTKVA

Figure 25 .
Figure 25.Completeness across the b/a-size diagram based on mock simulations where we injected 10 4 fake Sérsic profiles into the CEERS imaging with uniformly randomly assigned F277W magnitude, b/a and half-light radius (fixing n = 1).Left:For galaxies brighter than 26.5 AB mag (corresponding to most of our selected sample), we are nearly complete even to large, face-on disks.Middle: For marginally faint sources (26.5-27AB mag), we start to see hints of incompleteness to large face-on disks, but these would be at the extremely faint end of our sample.Right: For galaxies fainter than 27 AB mag, we are severely incomplete but it is not clear if these galaxies would satisfy our log M * /M⊙ > 9 cut.These simulations demonstrate that, for extended sources, JWST-CEERS goes ∼ 2 magnitudes deeper than HST-CANDELS for which a cut of 24.5 AB mag is usually made in the F160W filter when studying galaxy morphology.

Figure 27 .
Figure 27.Illustration showing that SE++ and Galfit end up in different parts of the degenerate Sérsic n − Re − b/aparameter space particularly for low-mass galaxies.This is for all sources in CEERS that satisfy our selection cuts, namely log M * /M⊙ = 9.0 − 10.5 and z = 0.5 − 8.0, split into our mass bins increasing from left to right.When SE++ finds a larger size than Galfit, it also tends to find a higher Sérsic index than Galfit as evidenced by the large number of points in the top-right quadrant especially for lower mass (fainter) galaxies.The colorbar shows how these n − Re residuals correlate with b/a residuals, namely that when SE++ finds both a larger size and n than Galfit, it also tends to find a lower b/a.This is for F200W but we see similar trends for the other filters.