The Complete CEERS Early Universe Galaxy Sample: A Surprisingly Slow Evolution of the Space Density of Bright Galaxies at z ∼ 8.5–14.5

We present a sample of 88 candidate z ∼ 8.5–14.5 galaxies selected from the completed NIRCam imaging from the Cosmic Evolution Early Release Science survey. These data cover ∼90 arcmin2 (10 NIRCam pointings) in six broadband imaging filters and one medium-band imaging filter. With this sample we confirm at higher confidence early JWST conclusions that bright galaxies in this epoch are more abundant than predicted by most theoretical models. We construct the rest-frame ultraviolet luminosity functions at z ∼ 9, 11, and 14 and show that the space density of bright (M UV = −20) galaxies changes only modestly from z ∼ 14 to z ∼ 9, compared to a steeper increase from z ∼ 8 to z ∼ 4. While our candidates are photometrically selected, spectroscopic follow-up has now confirmed 13 of them, with only one significant interloper, implying that the fidelity of this sample is high. Successfully explaining the evidence for a flatter evolution in the number densities of UV-bright z > 10 galaxies may thus require changes to the dominant physical processes regulating star formation. While our results indicate that significant variations of dust attenuation with redshift are unlikely to be the dominant factor at these high redshifts, they are consistent with predictions from models that naturally have enhanced star formation efficiency and/or stochasticity. An evolving stellar initial mass function could also bring model predictions into better agreement with our results. Deep spectroscopic follow-up of a large sample of early galaxies can distinguish between these competing scenarios.


INTRODUCTION
The first 500 million years of cosmic time (z ≳ 10), when the first stars and galaxies formed, began to grow, and kick-started the process of reionization, was largely hidden from view until recently.The depth achievable with JWST near-infrared imaging, along with the capabilities to deeply probe beyond ∼1.6 µm for the first stevenf@astro.as.utexas.edu* NASA Postdoctoral Fellow † NSF Graduate Fellow time, were expected to revolutionize our understanding of this early epoch.As soon as the first data from JWST were released in July 2022, this renaissance in our understanding unfolded immediately.
Prior to JWST, the high-redshift community had debated about the evolution of the rest-frame ultraviolet (UV) luminosity function (and by extension, the cosmic star-formation rate density) at z > 8.While there was good agreement that these quantities evolved smoothly downward from z = 4 to z = 8 (e.g.Bouwens et al. 2015;Finkelstein et al. 2015), results differed at z > 8, with some advocating for continued evolution with the same declining slope from lower redshifts (e.g.McLeod et al. 2016;Finkelstein 2016), while others claimed evidence of an accelerated decline towards higher redshifts (e.g.Oesch et al. 2018;Bouwens et al. 2019).Early JWST surveys, including the Cosmic Evolution Early Release Science Survey (CEERS; PID 1345, PI Finkelstein) and GLASS (PID 1324, PI Treu) surveys, were designed in part to determine which of these evolutionary possibilities was correct.
In defiance of expectations, several studies immediately reported the presence of bright (m ≲ 27.5) galaxies at z > 10 from the CEERS and GLASS surveys (e.g., Castellano et al. 2022;Naidu et al. 2022;Finkelstein et al. 2022b;Donnan et al. 2023b).These galaxies were both brighter and at higher redshifts than expected from these early surveys, which were neither extremely wide nor deep.While some early results changed due to the uncertain characterization of the NIRCam photometric zeropoint (Boyer et al. 2022), within a few months more robust samples of galaxies were in place (e.g.Finkelstein et al. 2023;Harikane et al. 2023a;McLeod et al. 2023).
These first studies with larger (∼ 20 object) samples agreed that the abundance of z ≳ 10 galaxies was in excess of both theoretical and empirical predictions, with explanations ranging from an evolving initial mass function (IMF), changes in star-formation efficiency, changes in dust attenuation, contribution from active galactic nuclei, rampant sample contamination (e.g.Finkelstein et al. 2023;Harikane et al. 2023a;Ferrara et al. 2023;Dekel et al. 2023;Mason et al. 2022), to even changes to the underlying cosmology (e.g.Boylan-Kolchin 2022; Liu & Bromm 2022).Such explanations are compelling, yet these early datasets spanned small dynamic ranges in UV luminosity, with few galaxies yet included.
Here we report the results from a search for z ≥ 8.5 galaxies over the completed CEERS dataset.This follows on the work of Finkelstein et al. (2023, hereafter F23) who did a similar search, finding 26 galaxies over the first four CEERS pointings.Importantly, here we combine our results with those from Leung et al. (2023) who used a near-identical analysis procedure to identify a sample of 38 galaxies at similarly high redshift from the deep NGDEEP (the Next Generation Deep Extragalactic Exploratory Public survey; Bagley et al. 2023) NIRCam imaging, extending our dynamic range 1.5 magnitudes fainter.
In §2 we describe our imaging dataset, and give a detailed explanation of our photometry procedure, which has evolved from F23 to increase our color and flux accuracy.In §3 we outline our photometric redshift sample selection procedure, and discuss the available spectroscopy for our candidate galaxies.Our results are presented in §4, where in §4.2 we compare the observed sur-face density of galaxies to pre-launch predictions, and in §4.3 we calculate the rest-UV luminosity functions at z ∼ 9, 11 and 14.We discuss these results in the context of a variety of more recent simulation predictions in §5, and present our conclusions in §6.We assume the latest Planck flat ΛCDM cosmology with H 0 = 67.36km s −1 Mpc −1 , Ω m = 0.3153 and Ω Λ = 0.6847 (Planck Collaboration et al. 2020).All magnitudes are in the absolute bolometric system (AB; Oke & Gunn 1983).

DATA
The CEERS NIRCam imaging survey consists of 10 NIRCam pointings in the CANDELS Extended Groth Strip (EGS) field, done in parallel with prime MIRI and NIRSpec observations.These data were taken in two epochs.On June 21-22, 2022 four NIRCam pointings were obtained (known as pointings NIRCam1, 2, 3 and 6; results presented in F23).The remaining six pointings (NIRCam4,5,7,8,9 and 10) were completed on Dec 20-24, 2022.All 10 pointings include the same filter coverage of F115W, F150W, and F200W in the shortwavelength channel, and F277W, F356W, F410M, and F444W in the long-wavelength channel.In this analysis we make use of the official CEERS publicly released mosaics (DR0.5 for the June data, and DR0.6 for the December data).These images are available on the CEERS website (https://ceers.github.io/releases.html)and on MAST as High Level Science Products via 10.17909/z7p0-8481.

Data Reduction
We reduce the NIRCam imaging following the procedures outlined in Bagley et al. (2022b), which we briefly summarize here.We use the JWST Calibration Pipeline (Bushouse et al. 2022) with custom modifications and additional steps needed to remove features such as snowballs, wisps, and 1/f noise.Our reduction process is the same for all images, though we use different Pipeline and Calibration Reference Data System (CRDS) versions for the two epochs of CEERS NIR-Cam imaging: Pipeline version 1.7.2 and CRDS context 0989 for the Epoch 1 images (CEERS DR0.5, obtained in June, 2022, and including pointings NIRCam1, 2, 3 and 6), and Pipeline version 1.8.5 and CRDS context 1023 for the Epoch 2 images (CEERS DR0.6, obtained in December, 2022, and including pointings NIRCam4, 5, 7, 8, 9 and 10).There were no major changes between these pipeline+CRDS versions which we would expect to affect the photometry.
For our astrometric alignment and analysis we make use of the archival HST imaging from the All-wavelength Extended Groth Strip International Survey (AEGIS, Davis et al. 2007), the Cosmic Assembly Deep Extragalactic Legacy Survey (CANDELS, Grogin et al. 2011;Koekemoer et al. 2011), and the 3D-HST (Momcheva et al. 2016) surveys.The entire CEERS field is covered by F606W, F814W, F125W, F140W, and F160W; portions are covered by F105W (we note that the F140W imaging is very shallow [800 sec], thus while we include it in our catalog, we do not include these data in any figures below).We use the CEERS v1.9 1 HST EGS mosaics, which are created from these datasets, aligned to Gaia DR3, and on a 30mas pixel scale.We use a modified version of the TweakReg routine to align the images, using the HST F160W mosaic as the astrometric reference in all pointings except NIRCam3 and 9.In these two pointings, a low-level guide star tracking issue in the HST imaging caused a sub-PSF shift across a portion of the F160W mosaic, and so we use the NIRCam F277W (pointing 3) and F356W (pointing 9) mosaics as the absolute references.In all pointings, the RMS of the relative, NIRCam-to-NIRCam astrometry is ∼5-10 mas.
We create mosaics on a 30mas pixel scale in all filters, and extract smaller cutouts of the HST mosaics to match the footprints of the drizzled NIRCam images, providing pixel-aligned imaging in up to 13 filters per field from ∼ 0.5 − 5µm.Finally, we perform a global, two-dimensional background subtraction on the mosaics to remove any residual background variations.This method first performs a tiered source detection to identify progressively smaller sources in each filter.Then the source masks in each filter, including all available HST images, are combined into a single merged mask such that pixels with source flux identified in any filter are excluded when measuring the background.See Bagley et al. (2022b) and F23 for more information on the background method and details on the method's performance in CEERS images.

Photometry
We perform photometry on all 10 CEERS fields using Source Extractor (hereafter SE, Bertin & Arnouts 1996).Photometry is performed on each of the 10 pointings independently (the high-redshift galaxy sample is screened for duplicates in the small overlapping areas, as discussed below).The photometry process here is similar to F23, with some key differences designed to improve the photometric validity, as described below.Our fiducial photometry is measured in elliptical Kron apertures, using a Kron factor =1.1 and a minimum radius 1 ceers.github.io/releases.html#hdr1= 1.6, following F23.These small Kron apertures result in optimal signal-to-noise.We derive accurate colors in these apertures by matching the image PSFs between different filters, and calculate accurate total fluxes via a two-step simulation-based aperture correction process.

PSF Matching
We create empirical point-spread functions (PSFs) in each filter by stacking stars.We select stars by identifying the stellar locus in a plot of half-light radius versus magnitude in a preliminary SE run in each filter.Each star is then inspected to ensure it appears to be a nonsaturated point source in a non-crowded region.We make a single PSF per filter by stacking stars across all 10 pointings (as all observations used the same dither pattern).For each star, we extract a 101x101 pixel box, upsample by a factor of 10, measure the centroid, and shift the star to be centered in this upsampled image.We then downsample back to the native resolution, rotate the star by a random position angle (to account for situations when the position angle of the observations was not identical), and normalize the star's peak flux to unity.The final PSF is made by median-combining the individual stars.The final PSFs have a centroiding accuracy of ∼0.05-0.1 pixels.
We then use these PSFs to derive kernels to match the PSFs in images with PSF FWHMs smaller than that of F277W to the F277W PSF.This includes the NIRCam F115W, F150W and F200W images, and the HST/ACS F606W and F814W images.Kernels were created with the pypher Python routine2 (Boucaud et al. 2016).The NIRCam F356W, F410M and F444W filters, along with all four HST/WFC3 filters, F105W, F125W, F140W, and F160W, have PSF FWHMs larger than that of the NIRCam F277W filter.To correct for missing flux when performing aperture photometry on these larger PSF images, we first convolve the F277W image to match the PSF of a given larger-PSF image, and then derive source-specific correction factors as the ratio of the F277W flux prior to convolution to that after convolution (where the larger PSF images will have less flux in the aperture).These correction factors are then applied to photometry measured on the larger PSF images, to account for the missing flux.We tested our PSF-matching process by measuring curves-of-growth of the PSF stars in the images, finding that the median enclosed flux at an aperture diameter of 0.3 ′′ was within 5% (and often less) of the F277W value for all filters.
In F23 we did not employ these correction factors as we matched all NIRCam bands to the F444W PSF.
However, by matching here to the smaller-PSF F277W image we better optimize signal-to-noise by not smoothing to the largest PSF.Additionally, as the HST/WFC3 images have larger PSFs than even F444W, in F23 we devised correction factors by comparing the HST fluxes to previous HST catalogs.Our updated method here is independent of other catalogs, and is thus more self consistent.We confirm that our fluxes here are consistent to within 5% on average (and often lower) with the fluxes from F23.

Catalog Creation
We use the inverse-variance-weighted sum of the non-PSF-matched F277W and F356W images as our detection image, to better detect faint sources.Using this detection image, we run SE cycling through the seven NIRCam images and six HST images as the measurement image.The key SE parameters were: DE-TECT THRESH=1.4,DETECT MINAREA=5 pixels, and a top-hat convolution kernel with a width of 4 pixels, similar to F23.We force SE to skip the background subtraction step as this was previously removed ( §2.1).We use MAP RMS for the source weighting.As the pipeline-produced ERR images include Poisson noise, they are not appropriate for source detection.We thus convert the weight map associated with the detection image into an effective RMS map by taking 1/sqrt(WHT), and assign this to the detection image.For the measurement image, we use the pipeline ERR image.
We estimate an aperture correction to the total flux for these small apertures by performing a second run of SE on the F277W image with the Kron parameters set to the default "MAG AUTO" parameters of (2.5, 3.5), deriving an aperture correction as the ratio between the flux in this larger aperture to that in the smaller aperture for each object in the F277W catalog, which we then applied multiplicatively to the fluxes and uncertainties for all filters (thereby correcting fluxes to an estimated total, but not changing colors or signal-to-noise values).
As several previous studies have noted that the default Kron parameters we use for this aperture correction can miss light in the wings of the PSF (e.g., Bouwens et al. 2015;Finkelstein et al. 2022a), we estimate residual aperture corrections using source-injection simulations, adding 3000 mock sources to our real images in each field.We add sources from m = 22-28.5mag (to ensure a robust photometric measurement), with a log-normal half-light radius distribution peak-ing at ∼1.5 pixels (∼0.2 kpc at z = 10; compact but modestly resolved, comparable to high-redshift sources), with a log-normal Sérsic parameter distribution, peaking at 1.2.These mock sources were generated with galfit (Peng et al. 2002) and added at random positions to the F277W and F356W images.In galfit, the total flux is calculated such that half the flux is within r e .We combined the two images to create a detection image, ran SE in the same way as on our real data to generate a F277W catalog, and estimated residual aperture corrections as the ratio of input-to-recovered fluxes for recovered sources.While in F23 we derived a single factor, here we note that the residual correction needed is magnitude dependent.We thus fit a linear function to the flux ratios as a function of magnitude over 24 < m < 28, finding a correction ranging from ∼2% at m < 22, to ∼20% at m = 28.We performed this linear fitting in each field, applying these residual aperture corrections to every source (placing a bound on the corrections applied to be from 1.0 -1.2).We note that in the magnitude range of 25 < m < 26 used by F23, we see a similar correction factor (1.08) as they derived.

Flux Uncertainties
We derive flux uncertainties empirically based on the number of pixels in an aperture.We fit for the noise as a function of aperture size by measuring the fluxes in circular apertures with 30 different diameters, ranging from 0.1 ′′ (3.33 pixels) to 3 ′′ (100 pixels).We place nonoverlapping apertures randomly, avoiding pixels with zero values in the error image, and positive values in the segmentation map.We do this in two iterations, placing 3000 apertures with diameters <1.5 ′′ , and 500 apertures with larger diameters.We measure fluxes at these positions in all aperture sizes, calculating the 1σ noise in each aperture size by measuring the median absolute deviation of the measured flux values (multiplying by 1.48 to convert to a Gaussian-like standard deviation).Finally, we fit a curve to the noise in a given aperture as a function of pixels in that aperture, using this equation (Gawiser et al. 2006): where σ N is the noise in an aperture containing N pixels, and σ 1 is the pixel-to-pixel noise measured in each image as the sigma-clipped standard deviation of all non-object pixels (see Figure 3 in Finkelstein et al. 2022a for an example of this process).Compared to F23, here we include the second term in the parentheses as we find a better fit to the data.We fit the four free parameters with an IDL implementation of emcee (see Finkelstein et al. 2019 for details), taking the median of the posterior as our fiducial values.We use these functional form fits for each filter to calculate the photometric uncertainties for each object, using both the number of pixels in its Kron aperture (Area = πab, where the semi-major [minor] axis a [b] is equal to the A[B] IMAGE keywords multiplied by the KRON RADIUS value), as well as the area value for a given circular aperture.We scale these values by the ratio of the error image value at the central position of a given source to the median error value of the whole map, thereby allowing the noise to be representative of the noise level around a given galaxy.We refer to these measurements as our "global" noise measurements, which we use as our fiducial value.Finally, to account for variable image noise not captured by the error image value at the central pixel, for each object in our catalog we also calculate a "local" noise measurement.This local noise was calculated in apertures with 0.2 ′′ , 0.3 ′′ , 0.4 ′′ and 0.5 ′′ -diameters, measured as 1.48 times the median absolute deviation of the flux distribution in the 200 closest apertures from the above process.

Multi-band Catalog
For each object in the catalog we use astropy.wcswcs pix2world to derive celestial coordinates from the SE x, y positions (SE cannot presently parse the worldcoordinate system in the JWST data model image headers).We calculate physical fluxes by applying a photometric zeropoint to convert the image from MJy sr −1 to erg s −1 cm −2 Hz −1 , and apply both aperture corrections derived above to all flux and flux error estimates.We correct for Galactic extinction using an E(B-V) of 0.006 for the EGS field and a Cardelli et al. (1989) Milky Way attenuation curve.We create a multi-band catalog from the individualfilter catalogs created by SE, including our fiducial Kron apertures and fluxes measured in circular apertures with diameters ranging from 0.05 ′′ to 2.0 ′′ .For the latter we include fluxes measured from both the PSF-matched and native-resolution images; the latter are used below as a measure of detection significance.These circular apertures are corrected for Galactic attenuation, but not corrected to total, as we will use them solely for detection significance.For all flux measurements we calculate the noise per source following the above methods.We flag any sources that had either a zero or NaN in any error column, replacing their flux error with 10 12 nJy (several orders of magnitude larger than any real source error) such that these flux measurements do not impact any analysis.The final catalog contains only objects with valid measurements in the six broad-band filters, excluding the short-wavelength chips gaps, covering a total area of 88.1 arcmin 2 .
In Table 1, we include an estimate of the limiting 5σ magnitude for our catalog.To calculate this, we use the noise functions described above to derive the flux density uncertainty in an aperture of diameter 0.2 ′′ .We then measure the enclosed flux at this radius from the stacked PSF.We then divide the flux uncertainty by the enclosed fraction of flux to estimate the total noise for a point source.Finally, we multiply this value by five, and convert to an AB magnitude.This final multi-band catalog was known as "v0.51" internally to the CEERS team, and has been used in a variety of analyses (e.g.Arrabal Haro et al. 2023a,b;Vega-Ferrero et al. 2023a;Larson et al. 2023;Vega-Ferrero et al. 2023b;Ronayne et al. 2023).

Photometric Redshifts
The final photometric catalog contains 101,808 sources across the entire CEERS field (86.8% of which have signal-to-noise greater than three in F277W).We create a new numerical identifier as an ascending integer starting at 1 in CEERS1, adding in each field sequentially.We measure photometric redshifts for all sources in our 13-band photometric catalog using EAZY (Brammer et al. 2008).We perform three iterations of EAZY: (1) Fiducial : using our fiducial Kron, aperture corrected photometry with a maximum redshift of 20, (2) Low-z : The same as the fiducial run, but with the maximum redshift set to seven (allowing visualization of the best-fitting low-redshift model), and (3) circular : Replacing the Kron fluxes with the flux measured in d =0.2 ′′ diameter apertures, with a maximum redshift of 20 (see §3.2.2).
EAZY fits non-negative linear combinations of usersupplied templates to derive probability distribution functions (PDFs) for the redshift, based on the quality of fit of the various template combinations to the observed photometry for a given source.We use the same customized template list as F23, including the 12 FSPS (Conroy & Gunn 2010) templates in the recommended "tweak fsps QSF 12 v3" set, supplemented with six templates created by Larson et al. (2022b) to span the blue colors expected for early galaxies,.We assume a flat prior in luminosity, and include a systematic error of 5% of the observed flux values, and fit to our measured total flux and flux error values.

Selection Criteria
Here we describe the selection criteria we use to identify candidate z > 8.5 galaxies.Following our previous work (Finkelstein et al. 2010(Finkelstein et al. , 2015(Finkelstein et al. , 2022a(Finkelstein et al. ,b, 2023)), we use a combination of flux detection significance values and quantities derived from the full photometric redshift PDF, denoted P(z), to select our galaxy sample.We also make use of the peak P(z) redshift, denoted z best .All signal-to-noise ratios (SNRs), unless stated otherwise, are measured in 0.2 ′′ diameter apertures in the native resolution (non-PSF-matched) images.We note that while we primarily use the global empirical noise measurement, we also make use of the local noise measurements with slightly relaxed criteria as described below.
We also make use of SNR criteria in filters blue-ward of the Lyα break ("dropout" filters, which should contain no significant flux).To identify these filters, for each object we first zero out the P(z) at z < 7.5 such that any low-redshift solution does not impact the dropout filter choice (followed by renormalizing the P[z]).We then consider a filter to be a dropout filter if the wavelength corresponding to the red side of the filter transmission's FWHM is less than the observed Lyα wavelength at the 16th percentile of the renormalized P(z).In this way we make use of the full P(z) when calculating which bands should be considered a dropout filter.Here we list our selection criteria, separated into two categories: Detection Significance Criteria: • Detection Signal-to-Noise: SNR > 5.5 (>5.0) in at least two of the F150W, F200W, F277W, F356W, or F444W filters, using the global (local) empirical noise measurements.
• Dropout Signal-to-Noise: SNR < 2.0 (3.0) in all bands fully blue-ward of the Lyα break, and no more than one filter with SNR > 1.5 (2.0), using the global (local) empirical noise measurements.For this we consider the ACS F606W and F814W filters, and the NIRCam F115W, F150W, and F200W filters (we note that some candidates have ∼1-3σ flux measurements in WFC3 bands nominally below the Lyα break; however we consider these spurious as such objects show significant non-detections in the deeper shortwavelength NIRCam bands).
• Error-map values at the central pixel of the object <1000 in the F115W, F150W, F200W, and F277W filters.This ensures valid flux measurements in a minimum set of filters to robustly select z > 9 galaxies (and explicitly excludes the NIRCam short-wavelength chip gaps).
• Total F277W magnitude ≤ 29.2 as a conservative estimate to limit low-significance sources.
• S z ≥ 9, where S z is calculated as the unit redshift where the integral in a z ± 0.5 bin is the maximum compared to all other unit redshift bins.
These criteria are very similar to those in F23, with the key differences being the selection of dropout bands, the more conservative SNR < 2 dropout criteria (compared to SNR < 3 in F23), and the addition of the local noise measurements.
The combination of this sample selection criteria yielded an initial sample of 185 z > 8.5 galaxy candidates across the full CEERS field.We visually inspected all objects (examining cutout images in all filters, as well as the photometric SED and the P(z) curves) to identify any spurious sources, or potential cases where the SE photometry could be questionable (e.g., the presence of a bright neighbor).We removed 91 objects through this screening process.Cutout images for all removed objects as well as a table of their positions are shown in Appendix B.
Objects were removed for a variety of reasons, but the vast majority (78/91) were sources of an obvious spurious nature, including 10 sources identified as diffraction spikes, 45 sources associated with image edges, and 23 sources identified as bad pixels.The latter had a characteristic observational signature of a compact (few pixel), boxy morphology in the long-wavelength channel images only.We note that a few of our removed badpixel sources appear slightly less boxy than the bulk.It is possible these are real astrophysical objects at z ≳ 16 (where the Lyα break would be redward of F200W).However, as they are visible in the three long-wavelength channels only it is much more plausible (if indeed they are not bad pixels) that they are faint objects associated with the known z ∼ 4.9 overdensity in this field (Arrabal Haro et al. 2023a), or low-redshift dusty galaxies (e.g.Bisigello et al. 2023), rather than true z ∼ 16 galaxies.Improved outlier pixel rejection in future iterations of our data reduction could reduce the frequency of such objects.Likewise, use of the context map ("CON") extension of the JWST data model could remove the need to manually identify image edges.We do note that in some cases, objects removed due to image edges have legitimate long-wavelength photometry, but are adjacent to a short-wavelength chip gap.
The remaining 13 sources were a combination of objects identified as being associated with nearby bright galaxies, inadvertently split into separate objects by SE (10 objects), and objects where the photometric SED did not appear accurate (three objects); e.g., obvious flux visible in a dropout filter that was not accurately recorded (usually due to crowding).Of the full sample of 91 sources removed, a small number (five) are plausible to still be true high-redshift galaxies.While we conservatively keep them out of our sample, we note them specifically in Tables 8 and 9 in Appendix B.
Finally, we check this sample for duplicates which could arise as the 10 NIRCam CEERS pointings have slightly overlapping edges.We find one object which appears in our catalog twice, encouragingly satisfying our sample selection with two independent photometric measures.This object appears in CEERS4 (as ID 42447) and CEERS10 (as ID 93725), with similar P(z) distributions.As this object has slightly higher signalto-noise in the CEERS4 photometric catalog, we keep ID=42447, and remove ID=93725 from our catalog.We note that future work making use of a complete mosaic of all 10 NIRCam pointings will make better use of these near-edge regions by combining the images from both pointings.After removal of the 91 spurious sources and the one duplicate, the sample consisted of 93 candidate galaxies.

Spectroscopic Redshifts
Among the more transformative advances of JWST is NIRSpec's ability to efficiently spectroscopically confirm galaxies via strong [O III] line emission to z ∼ 9.5 and via Lyα spectral breaks to higher redshift.We make use of followup spectroscopy from both the CEERS spectroscopic program (which in its second epoch placed NIR-Spec slits on some sources observed in the first epoch   4.9 (Arrabal Haro et al. 2023a), the agreement is generally good.After removal of three sources (in red) with spectroscopic redshifts below our cut of z > 8.5, we find a median (mean) z phot − zspec =0.1 (0.3), with 8/13 sources having |z phot −zspec| < 0.2.As noted by Arrabal Haro et al. (2023b) and Fujimoto et al. (2023a) there does appear to be a mild systematic offset towards higher photometric redshifts.We also note that there are four sources with z phot > 10 observed with NIRSpec that showed no detectable signatures, which is consistent with z > 10, where all strong lines are shifted out of the NIRSpec range.Cross-matching our sample of 93 galaxy candidates to these spectroscopic lists, we find 17 sources which have published spectroscopic redshifts.A comparison between the photometric redshifts and spectroscopic redshifts is shown in Figure 1.Not shown in this figure is the single catastrophic redshift failure (defined as 3), the galaxy (ID=13256) originally presented in Donnan et al. (2023b) as having z ∼ 16.5, with similar redshifts proposed by Harikane et al. 2023a andFinkelstein et al. (2023); a lower redshift of z phot = 4.6 was proposed in Pérez-González et al. 2023a.As discussed in Arrabal Haro et al. (2023a), this object has a confirmed redshift of z ∼ 4.9, and is the result of a very pathological situation where at this specific redshift a red galaxy with extreme line emission can mimic a z ∼ 16 galaxy as the Hα line falls in all three of the F356W, F410M, and F444W filters, while [O III] enhances F277W (see also discussion in Zavala et al. 2022).
In addition we remove one object that also has z phot ∼ 16 (ID=43382, with a 68% confidence range on the photometric redshift of z = 15.9 -19.2).While fainter than ID=13256 (F277W = 28.8versus 26.5), its spectral signature is almost identical, thus we consider it a likely fainter companion to the z ∼ 4.9 overdensity confirmed in Arrabal Haro et al. (2023a).
We find three additional sources with spectroscopic redshifts below our nominal redshift cut of z > 8.5, which we thus remove.These are: ID=4774, ID=4777 and ID=23084.ID=4774 has a photometric redshift 68% confidence limit (CL) of 8.26-10.27,with z spec = 8.01.ID=4777 has a photometric redshift 68% CL of 9.43-11.05,with z spec = 7.99.ID=23084 has a photometric redshift 68% CL of 8.08-9.22,with z spec = 7.77.While the spectroscopic redshift is outside the 68% confidence range on the photometric redshift, the redshifts are not catastrophically low.We tabulate the five removed sources, including their celestial coordinates, in Table 10 in the Appendix.
Beyond these five removed sources, we find generally good agreement between the photometric and spectroscopic redshifts, with a median (mean) z phot −z spec =0.1 (0.3), with 8/13 sources having |z phot − z spec | < 0.2 (the median offset was 0.2 prior to removal of the five sources in the preceding paragraph).As noted by Arrabal Haro et al. (2023b) there does appear to be a systematic offset towards higher photometric redshifts.One likely explanation for this is that the shape of the Lyα break in these galaxy spectra is more extended than the sharp break assumed in the IGM attenuation models employed by EAZY.This could be due to a variety of factors, including stellar population properties not accounted by the typically used templates (e.g., Arrabal Haro et al. 2023b), the Lyα damping wing from an increasingly neutral IGM (e.g., Curtis-Lake et al. 2023;Arrabal Haro et al. 2023b;Umeda et al. 2023), and/or extremely dense line-of-sight damped Lyα systems in close proximity to a given galaxy (Heintz et al. 2023;Hsiao et al. 2023).Additionally some of the spectroscopic redshifts we compare to come from the Lyα break alone, which has additional uncertainties (see discussion in Fujimoto et al. 2023b).It is important to note that several additional sources were spectroscopically observed but not detected (indicated as "Nz" in Tables 3 and 6); this is modest evi-dence in favor of z > 9.6, as at lower redshifts [O III]+Hβ should have been detectable.For the remainder of our analysis, we use the spectroscopic redshift values when they are available, which is the case for 13 of our final sample of 88 candidate galaxies.

Kron Aperture Corrections
During the visual inspection step we found that some legitimate high-redshift galaxies had Kron apertures which appeared much larger than the galaxy in question, stretched by nearby galaxies.Similar to F23, we devise a correction to ensure that flux from neighboring galaxies does not bias the colors nor the total fluxes.To identify sources where this is needed, we explore the ratio between the area of the Kron aperture and the area of a d =0.2 ′′ circular aperture.Our galaxy sample shows a log-normal distribution, with a peak aperture ratio of ∼2, with a tail to higher values.There is a notable gap at a ratio of ∼10, thus we flag sources with aperture size ratios larger than this as potentially needing a correction.
We find just one source meets this criterion, ID=11384 (with an aperture ratio of 14.7; the next highest was 9.3).Upon inspection of this source, it is very compact, but is in a region of high background with a very bright galaxy ∼1.5 ′′ to the NW, and a modestly bright galaxy ∼0.5 ′′ to the S, resulting in an elongated Kron aperture in the N-S direction.For this object we thus make use of colors measured in d =0.2 ′′ circular apertures.To calculate total fluxes we derive an aperture correction as the median of the ratio of the total F277W flux to the flux measured in a d =0.2 ′′ circular aperture for all sources in our full photometry catalog with d =0.2 ′′ fluxes within 20% of the F277W =0.2 ′′ flux for this object.We find this correction factor is 2.9 for ID=11384, consistent with the values for other sources in our galaxy sample (median of 2.3 ± 0.8).
For this object we then replace its default fluxes with these new values, and adopt the photometric redshift results from colors measured in the small circular apertures.This increases the photometric redshift from 10.8-11.4 to 11.2-11.8.Of note is that this source has a spectroscopic redshift of 11.043 (Harikane et al. 2023a;Arrabal Haro et al. 2023a).While the uncorrected photometric redshift is more consistent, the higher value from our improved photometry matches the observed very shallow Lyα break observed in the prism spectrum of this source (the redshift inferred from this break in the prism spectrum is z ≈ 11.4; Arrabal Haro et al. 2023a), which can lead to minor photometric redshift overestimates as discussed in the previous subsection.

Lyα Break
As a cross-check on our Lyα-break criterion, we examine our sample for objects where there is a discrepancy between the primary photometric redshift peak and the SNR in bands nominally below the break; such sources can still satisfy our criterion for inclusion in a highredshift bin if their P(z) is bimodal or somewhat broad.
We identify two sources in our nominal z ∼ 11 sample (see §3.4 for sample definitions) which have P(z)'s that exhibit two high-redshift peaks; a larger peak at z > 9.5, and a significant secondary peak at z ∼ 9.These objects are ID=17898 and 42447 (neither have spectroscopic redshifts).Both objects exhibit SNR > 2 in F115W.As the lower 16th percentile of their P(z) was at z < 9.54 (the redshift of Lyα at the red edge of the F115W filter FWHM), this significant F115W flux did not violate our selection criteria.However, as thus flux is measured as significant (SNR=3.1 and 5.7 for these two objects, respectively), the z ∼ 9 peak is more likely to be correct.For these two objects, we thus applied a prior to their P(z), setting them to zero at z > 9.54, renormalizing to unity and recomputing the bestfit photometric redshift as the new peak.We find that the peak photometric redshift for these sources changes from z = 10.45 to z = 9.07 for ID=17898 and z = 10.30 to z = 9.13 for ID=42447.We confirm that both sources continue to satisfy our selection criterion of P(z > 7) > 0.7.
We also do a similar analysis for sources in our z > 13 sample.We find two sources with SNR in F150W > 2. These objects (ID = 2067 with SNR = 2.57, and ID = 77647 with SNR = 2.45) both exhibit a broad P(z), extending down to z ∼ 11, thus this F150W flux did not violate our selection criteria.We apply a similar prior, setting P(z > 12.72; corresponding to the red edge of F150W) to zero.We find that the peak photometric redshift for these three sources changes from z = 13.69 to 12.70 and z = 13.57to 12.70 (e.g., the new peak is at the edge of the prior).
We acknowledge that while EAZY had knowledge of these observed SNR ratios and still found a preferred peak at higher redshift, the obvious real flux in the images left us confident that applying this prior to the P(z) will result in more accurate redshift estimates.These P(z) priors were applied in the completeness simulations discussed in § 4.1.In Figures 4 and 5 we show the original P(z) as a faded black curve for these sources.

Stellar Contamination
The colors of high-redshift galaxies, especially between 1-2 µm, can be degenerate with low-mass stars and substellar objects (e.g.Wilkins et al. 2014;Finkelstein 2016).While the photometric coverage from 1- Only one object is formally compact with a stellar χ 2 comparable to the best-fitting EAZY model, though even for this object we conclude it is likely extragalactic due to its non-compact appearance in the imaging, and the very large (∼4 kpc) implied distance were it stellar.We conclude that stellar contamination is not significant in our sample.
5µm should mitigate this confusion, here we explore whether the colors of our candidate galaxies could plausibly be consistent with low-mass stars or brown dwarfs.We fit each candidate to a grid of lowtemperature, cloudy, chemical equilibrium substellar atmosphere models from Sonora-Diamondback (Morley et al. 2023, in prep).We explore a range of temperatures T ∼ 900-2400 K, surface gravities g = 100 and 3160, and metallicities [M/H] = 0 and −0.5.We convolve the model SEDs with the HST +JWST filter curves and perform a simple grid-fitting routine, scaling the fluxes of each model to minimize the χ 2 .We adopt the model with the lowest χ 2 as the best-fitting stellar model.We estimate the implied distance by scaling the model fluxes and assuming an intrinsic radius of 1 Jupiter radius.We note that we also ran fits with cloud-free models grids extending to particularly cold (Sonora-Bobcat, T ∼ 200-1300 K; Marley et al. 2021) and low-metallicity (LOWZ, [M/H] = −1; Meisner et al. 2021) parameter spaces; however, none provided a better fit over the Sonora-Diamondback models.
In Figure 2 we show the results of this analysis.Here we plot the SE measured F200W half-light radius for our 88 candidate galaxies versus their apparent F200W magnitude, compared to the half-light radius of the F200W PSF as measured from stars in the image.One can see that the majority of our galaxy sample is clearly resolved, thus non-stellar in origin.To diagnose the potential stellar nature of the more compact objects, the data points are color-coded by the difference in the goodness-of-fit (χ 2 ) between the best-fitting (sub)stellar model and the best-fitting EAZY model.We find that no objects are better fit by the stellar model (e.g., the EAZY χ 2 is always lower).We do find two sources where the difference between the stellar and EAZY χ 2 is < 4. One is significantly resolved, but the other object (ID=34925) is measured by SE as being very compact (in fact, unphysically smaller than the PSF).However, we conclude this object is much more likely a galaxy.First, examining the imaging of this object, it does not appear to be obviously point-like; rather, it is faint, barely above our significance thresholds, thus the SE half-light radius is quite uncertain.Second, due to its faint brightness, its implied distance (were it stellar in origin) would be ∼4 kpc, which would be extremely far into the halo, and thus highly unlikely.We conclude that we find no evidence for stellar contamination in the sources in our sample.

Sample Summary
After removing visually-identified spurious sources, the four sources with z spec < 8.5, and the faint z ∼ 16 candidate which is likely at z ∼ 4.9, our sample contained 88 candidate z > 8.5 galaxies.For our analysis we divide these candidate galaxies into three sub-samples: z ∼ 9, which contains the 58 galaxies with 8.5 ≤ z best ≤ 9.7; z ∼ 11, which contains the 27 galaxies with 9.7 < z best ≤ 13; and z ∼ 14, which contains the three galaxies with z best > 13.
For all objects in our sample we calculate an observed rest-UV absolute magnitude following Finkelstein et al. (2015).Briefly, we perform a simple round of SED fitting with BC03 (Bruzual & Charlot 2003) models to derive a best-fitting model spectrum.We then calculate the bandpass-averaged flux from this spectrum in a tophat filter curve spanning 1450-1550 Å in the rest-frame, converting to an apparent magnitude and then applying the cosmological distance modulus for a given redshift.As a part of this process, we run Monte Carlo simulations sampling the photometric redshift P(z), such that the resulting uncertainty on M 1500 is inclusive of both the photometric scatter and redshift uncertainty.We note that in this process we set P(z < 6) = 0, such that any low-redshift solutions (which are small by design) do not bias the magnitude calculation.During this pro- -Note-A summary of the key properties for the 30 galaxies in our sample at z > 9.7 (the remaining 55 galaxies with 8.5 ≤ z < 9.7 are presented in the appendix).CF U V is defined in §3.4 as the rest-frame far-UV color.The photometric redshift is "za" from EAZY, which is the redshift where the χ 2 is minimized.The 20 7 P(z) quantity is the integrated redshift probability density between z = 7 and 20, which was used in the sample selection.The ∆χ 2 compares the best-fitting low-redshift (0 < z < 7) model to the best-fitting high-redshift model; a value of ≥ 4 was required for selection.Spectroscopic redshifts come from Arrabal Haro et al. (2023a) and Arrabal Haro et al. (2023b); we list "Nz" when an object was spectroscopically observed but no robust redshift was determined.As discussed in the text, this is modest evidence in favor of z > 9.6, as at lower redshifts [O III]+Hβ should have been detectable.
cess we also calculated a rest-far-UV color (discussed in §5), dubbed C F U V , calculated as where f 1470 and f 1850 are the bandpass averaged fluxes in top-hat filters spanning 1430-1510 and 1800-1900, respectively (where these windows were designed to probe the color in the far-UV avoiding strong spectral features).
We show the distribution of our sample in F277W magnitude and photometric redshift in Figure 3. Fig We compare our sample to that of McLeod et al. (2023) and Adams et al. (2023), who have also selected z ≳ 8.5 galaxies from the full CEERS dataset.McLeod et al. (2023) restrict their sample to galaxies with detection signal-to-noise greater than eight, thus their sample is smaller than ours, at 23 galaxies.We find that 14 of these 23 galaxies are in our final sample.We explored the properties of the remaining nine galaxies in our catalog.We find that 2/9 objects have dominant z > 9 solutions, but fail our ∆χ 2 criterion.The remaining seven objects all have plausible high-redshift solutions (based on little-to-no detectable flux in F115W), but have dominant low-redshift solutions.
While Adams et al. (2023) did not provide a catalog in their paper, we obtained the revised version of their sample via private communication.Their final sample of z > 8.5 galaxies consists of 55 objects, of which 25 are in our final z > 8.5 galaxy sample.All 30 not contained in our sample are present in our parent photometric catalog, all with a plausible redshift solution at z > 7 (16/30 have >50% of their integrated P(z) at z > 7 in our catalog).Two of these sources nominally pass all our selection cuts except their best-fitting photometric redshifts are z ∼ 7.5-8.2.Of the remaining 28, 26 do not pass our ∆χ 2 cut, including 18 which do not pass our integrated P(z) cut.This is likely driven in some cases by weakly positive (≥1σ significance) flux present in F606W and/or F814W in 14/28 objects, down-weighting high-redshift solutions.Finkelstein et al. (2023) In F23 we presented 26 z > 8.5 candidates from the first epoch of CEERS, using 4 of the 10 CEERS NIR-Cam fields used here.As our photometry and sample selection procedures here are slightly updated, we crosschecked our sample with the F23 sample.We find that 20/26 galaxies presented in F23 are included in our sample here.Of the six galaxies not included here, four of them originally satisfied our sample selection but were removed because they are now known to have spectroscopic redshifts z spec < 8.5 (see §3.2.1).These are ID=4774 (F23 ID 3908), 4777 (F23 ID 3910), 13256 (F23 ID 2159), and 23084 (F23 ID 1748).The remaining two sources satisfied all sample selection criteria except the SNR below the Lyα break.ID 2241 (F23 ID 1875) has a 1.9σ detection in F606W and 1.98σ detection in F814W, failing that criterion.ID 8497 (F23 ID 7227) has a 2.6σ detection in F115W, which fails this criterion for this object's P(z) distribution, which peaks at z = 11.2.

Differences from
In our present work over these four fields we identify 35 z > 8.5 candidates, which includes 20 galaxies from F23 and 15 new sources.We explored these new sources to see why they were not included in the F23 sample.All 15 sources are present in the F23 photometric catalog.Of these 15, 11 sources have best-fit photometric redshifts in the Finkelstein et al. (2023) catalog of z > 8.5.Comparing the photometric redshifts of these 11 sources in this previous catalog to our own, we find a median photometric redshift difference of zero (with a mean difference of 0.11, with the new values being slightly smaller).The majority of these sources (8/11) were previously excluded as they failed the ∆χ 2 criterion (with values of ∼ 1-3).The other three fell just on the other side of the F23 best-fit photo-z, detection threshold, or Lyα break non-detection thresholds.Of the four objects fit at lower redshift, all three have significant high-redshift peaks, with P(z) > 7 of 0.73 (ID 10545), 0.43 (ID 17898) and 0.66 (ID 61620; with the remaining source, ID 20174, having a lower significance peak at z ∼ 9).
While this comparison makes it apparent that modest changes to photometry procedures can have notinsignificant effects on the composition of a high-redshift galaxy sample, the changes we implemented to our procedure here over that from F23 were done to increase photometric accuracy, primarily for the colors of faint galaxies.The fact that all new sources which we select had significant high-redshift solutions in the F23 catalog adds confidence to our sample, though we fully acknowledge that spectroscopic confirmation of a majority of this sample is needed for full confidence.Fortunately, with the power of JWST, this is possible.

RESULTS
Here we explore constraints on the abundance of galaxies at z > 8.5 that we can place with our sample selected from the full CEERS survey.In §4.2 we describe measurements of the cumulative surface density, while in §4.3 we describe the rest-frame UV luminosity function.Both measurements require a correction for incompleteness, which we describe in §4.1.

Completeness Simulations
We quantify the completeness of our sample selection, broadly following F23, with an update here to implement an important size dependence.Our completeness estimates come from complete end-to-end source injection simulations, injecting mock galaxies with a range of properties into our images, then performing photometry, photometric redshift measurements, and sample selection procedures identical to our that done on our real data.In this way, we account for incompleteness due to both photometric effects, as well as sample selection effects.
For each of the 10 CEERS NIRCam pointings we run 50 simulation iterations.Within each iteration we simulate 1000 sources over a uniform range of redshift from 8 < z < 17.The majority of the iterations had a log-normal distribution in F277W apparent magnitude peaking at m ∼ 29; we supplement these with additional simulations with a flat magnitude distribution to boost the number of brighter galaxies.The result is a roughly flat distribution from m F 277W = 22-26, with a larger, log-normal-shaped distribution from m F 277W = 26-30.
To simulate the fluxes in all observed HST and JWST/NIRCam filters, we use BC03 (Bruzual & Charlot 2003) stellar population models with a distribution of stellar population age, dust attenuation and metallicity tuned to reproduce the expected (and now observed, e.g., Cullen et al. 2023) blue colors of very high-redshift galaxies (see Finkelstein et al. 2015 for details on these models).The result is a log-normal distribution of rest-UV colors, which peaks at F200W−F277W = −0.05,with a 68% spread from F200W−F277W = −0.25 to +0.3, comparable to the measured colors of our observed objects (median of −0.1, 68% spread from −0.3 -0.3).The resulting model spectrum was then normalized to the F277W magnitude for a given object, with magnitudes in the remaining filters derived by integrating this spectrum through a given bandpass.Source morphologies were created using galfit (Peng et al. 2002) assuming a Sérsic profile with a log-normal distribution in the Sérsic index n (peaking at 1.2, with a minimum of n =1 and a tail to n =5), a log-normal distribution of the axis ratio with a peak at 0.8, and a uniform distribution of the position angle.While in F23 we tuned the half-light radius (r h ) distribution such that the recovered r h distribution matched that observed in our sample, here we update our procedure to implicitly include the size of galaxies in our sample in the completeness calculation.We thus use a uniform input distribution of r h from 1-8 pixels (0.03 -0.24 ′′ ).Finally, the empirical PSF described in §2 is provided to galfit for a given band.
These simulated galaxy images are then created with galfit as 101×101-pixel images, which we add at random positions to the real images (avoiding only image edges and regions of extremely high ERR-map values [>1000]).Notably we do not avoid the positions of real objects, such that our simulations account for incompleteness due to sources along similar lines-of-sight to high-redshift galaxies.We also create a full-frame image just of the simulated sources in F277W, on which we run SE to measure a "noiseless" version of the recovered SE r h in this filter for use with the completeness calculations (as we discuss below, the measured SE r h is biased low from the galfit input value, even in the noiseless image).The result of this process is a version of our data in all filters with these 1000 simulated sources included, as well as a catalog of their input properties (inclusive of the SE r h measurement of the input image).At this point, we run an identical process on these images as we did on our real data in §2 and §3, including creating the array of PSF-matched images, calculating photometry, aperture corrections, and PSF corrections.Photometric catalogs are created, EAZY is run, and M 1500 is calculated.
The recovered catalogs are then matched against the input catalogs, resulting in a combined catalog of all recovered sources with their input and recovered properties.As the imaging depths in the 10 CEERS fields are broadly similar (Table 1), we combine these catalogs from all 10 fields into one.This final input catalog consists of 500,000 simulated sources, of which 343,196 sources are recovered by SE (the ∼70% recovery fraction is expected given the larger number of very faint sources that were simulated).
While in F23 we calculated completeness only as a function of magnitude (with an assumed size distribution), here we explicitly add a size dependence, as many of the observed high-redshift galaxies are clearly resolved.For a given bin in absolute magnitude, we calculate the completeness in bins of F277W half-light radius (with a bin width of 0.5 pixels, starting with a bin centered at 1.2 pixels).For this process, we assume the intrinsic size as the half-light radius measured by SE on the noiseless images, as described above.We show how the completeness depends on both UV absolute magnitude and half-light radius in Figure 7.While there is minimal dependence on size at the brightest magnitudes, as one goes fainter the completeness decreases more steeply for larger sources, as expected.
As we might expect the recovered sizes from both the simulated and real data to be biased, we compare these intrinsic sizes to those measured from the recovered sources.We find that the intrinsic sizes are ∼50% larger.To account for this bias when we calculate the completeness we multiply the recovered SE sizes of our real sources by this scale factor of 1.53 (this quantity has a small scatter of σ =0.06, and is independent of magnitude for m < 29).We then set a minimum size for real sources of r h = 2.7 pixels, comparable to that expected for an unresolved source.We also set a maximum size of r h = 6.0 pixels; one object has a larger (bias-corrected) SE-measured size, yet it is clear from the images these sizes are incorrectly measured due to the presence of a neighboring source.For this object, we assume the median size from the sample.
For our surface density analysis described in §4.2, we calculate a completeness per source, based on the fraction of recovered sources at the observed F277W mag-nitude, photometric redshift, and galaxy size.For our rest-frame UV luminosity function analysis described in §4.3, we calculate the completeness in bins of absolute UV magnitude.In each magnitude bin we calculate a completeness by weighting the recovery fractions in each size bin by the observed number of sources at each size.
We then integrate the comoving volume element across our redshift range with these completeness values, resulting in an effective volume for our sample in a given magnitude bin (a process similar to that done in Finkelstein et al. 2015).These "weighted mean" effective volumes are shown as the circles in Figure 7.They broadly trace the completeness curves for sources with r h ∼ 3.5 pixels, comparable to the median size of objects in our sample.One can see that assuming point sources in these simulations would lead to effective volume estimates modestly larger across M U V ∼ −20 to −19.Our final effective volumes are listed in Table 4.
4.2.The Cumulative Surface Density of Galaxies at z > 8.5 Here we explore the evolution of galaxies at z > 8.5 via the cumulative surface density, updating the results from the first epoch of CEERS from F23. Figure 8 shows the surface density of objects with redshifts greater than z for sources with m F 277W < 28.5.Following F23, we shows the cumulative surface density of galaxies per unit surface area at redshift greater than z for m < 28.5.The dotted line shows the raw counts from our CEERS z > 8.5 galaxy sample, while the shaded region shows the 68% confidence interval on the completeness corrected surface density (including only sources with completeness estimates >20%), inclusive of photometric, photometric redshift, and Poisson uncertainties.The vertical error bars show the estimated cosmic variance uncertainty.The colored lines show predictions from a range of pre-launch simulations (including hydrodynamic, analytic, semi-analytic, and semi-empirical models), all also for m < 28.5.The observed surface density of galaxies lies above most predictions at z > 10, and above all except the Behroozi & Silk (2015) and FLARES models at z > 11.This confirms early results based on smaller samples that the observed abundance of z ≳ 10 galaxies significantly exceeds most pre-launch physically-motivated expectations.
correct for incompleteness by counting each galaxy as one divided by the estimated completeness at the redshift and magnitude of a given galaxy, including here the size of the source (as well as the size bias correction discussed in §4.1).To avoid significant uncertainty introduced by objects with very low completeness values, we further restrict this analysis to objects with completeness measurements greater than 20% (this includes 58 objects of the 66 with F277W < 28.5; this exclusion was not done in F23).We note that our magnitude limit is chosen as a compromise between maximizing the sample size while minimizing the completeness correction.While 25% of our sample is fainter than this limit, the majority of these fainter sources have completeness <20%.On the other hand, increasing the limit to F277W < 28.0 would cut the sample in ∼half, reducing the constraining power of our observations.
The shaded region shows the 68% confidence range on the cumulative surface density, derived via Monte Carlo simulations inclusive of photometric and photometric redshift uncertainties, as well as sampling the Poisson uncertainty.We separately show estimated cosmic variance uncertainties at four different redshifts as the vertical error bars calculated based on the bluetides simulation (Bhowmick et al. 2020, with caveats as discussed in F23).The dotted line shows the raw measurements with no correction.While not shown for clarity, the values from Finkelstein et al. (2023) lie at the low end of our posterior distribution here (which is consistent with our finding of ∼3.4× more galaxies in ∼2.6× more area).
Similar to F23, we find that at z < 10, our observations are higher than most predictions, yet are consistent with the Behroozi & Silk (2015) empirical and delphi SAM models, as well as the FLARES hydro predictions with no dust attenuation.At z > 10, we find that our results lie significantly above all predictions with the exception of Behroozi & Silk (2015).While this agreement is intriguing, this is not a physics-based model, but rather is based on the ansatz that the specific star formation rate in galaxies is proportional to the specific total mass accretion rate into halos.We thus confirm the initial result from F23 that the observed abundance of z > 10 galaxies discovered in CEERS lies above nearly all pre-launch predictions, where here the result is at higher confidence based on an updated sample ∼3× larger than the F23 sample.

The Evolution of the Rest-UV Luminosity
Function at z = 8.5-14.5 Here we present our measurements of the rest-frame UV luminosity function, measured separately for our z ∼ 9, z ∼ 11 and z ∼ 14 samples, where galaxies are again placed into a sample based on their bestfitting photometric redshift (or spectroscopic redshift when available).Stacking the P(z)'s of each sample (as shown in Figures 9 and 10), the median redshift in each bin is z = 8.9, 10.9 and 14.0, respectively.We note that our redshift bins become progressively wider for two reasons.First, at these redshifts, unity redshift bins become small in terms of cosmic time spanned.Second, our boundaries of z = 9.7 and z = 13 correspond to redshifts where the Lyα break falls directly between two NIRCam filters (F115W and F150W, and F150W and F200W, respectively), thus creating natural redshift demarcations given our available data.
We measure our luminosity function following the methodology of F23, here using a bin size of 0.5 magnitudes, with three key differences.The first is that here we use three redshift bins, and in the bin in common (z ∼ 11) here our sample size is nearly ∼3× larger (27, compared to 10 in F23).The second difference is our use of the size of the sources in the completeness correction, which as shown in §4.1 can affect the inferred effective volumes (Figure 7).When calculating the effective volume, we first calculate the effective volume as a function of absolute magnitude and size by integrating over the co-moving volume element as where C(z, M U V , r h ) is the completeness as calculated in §4.1.In each magnitude bin, we then calculate a final effective volume as the weighted mean of the effective volumes in all size bins weighted by the number of sources in that magnitude bin at a given size (corrected for the ∼1.5× radius bias described in §4.1) as ) where i is a radius index, and N (M U V , r h,i ) is the number of galaxies in a given UV magnitude and size bin.
The third difference comes when we estimate the number density in each bin via MCMC (see full details on this methodology in Finkelstein et al. 2015).We do this to sample galaxy absolute magnitude posterior distributions such that galaxies can fractionally span multiple magnitude bins.In F23 we treated each magnitude bin separately.Here, following Leung et al. (2023) each step in the MCMC chain samples the entire ensemble of galaxies for a given redshift bin, such that when a galaxy scatters out of one bin, it is accounted for in another bin (and thus is more realistic than what was done in F23).
Our measured number densities and final weightedmean effective volumes are listed in Table 4 for all three redshift bins.For bins with no galaxies, we list the 84% upper limit from the MCMC calculation.We also demarcate fainter bins with completeness values <20%; while we list these values, we caution that they are dominated by the completeness correction.
In Figure 9 we plot these luminosity function measurements compared to a wide range of literature results and extrapolations.In this figure we also prominently show the results from NGDEEP at fainter luminosities from Leung et al. (2023) at z ∼ 9 and 11, as their photometry and sample selection process are nearly identical to our own (while we use the Leung et al. 2023 measured number densities here, we verified that the results would be very similar had we calculated the number densities with their sample in our slightly modified redshift bins).At M U V = −19, where our results overlap, we find excellent agreement between CEERS and NGDEEP, with the NGDEEP results then continuing on faint-ward at a consistent slope at both z ∼ 9 and 11.
The small faded symbols in Figure 9 show results from the literature.At z ∼ 9 we find broadly good agreement with these literature results (which come from HST, JWST and ground-based studies).We note that our brightest bin is higher than previous results -this bin contains a single object, the z ∼ 8.7 potential AGN from Larson et al. (2023).Based on the confirmation of this object and another at a similar spectroscopic redshift, Larson et al. (2022a) concluded that the EGS region is ∼4-10× denser than average at z ∼ 8.7 (see also Whitler Note- † This column represents the nominal number of galaxies in a magnitude bin, though in the calculation of the luminosity function we account for galaxies moving between bins due to photometric and photometric redshift uncertainties.‡ The completeness in these bins is <20%, thus these data points are not used to constrain the luminosity function evolution. et al. 2023).While our unit-redshift bin used here mitigates this overdensity somewhat, at the bright end where numbers are small, our results are mildly higher than those from the literature, becoming broadly consistent at M U V ≳ −20.We do note that when considering the full CANDELS area (∼10× wider than we consider here), Finkelstein et al. (2022a) found a volume density ∼5× lower in the magnitude bin including this object.At z ∼ 11 we compare to a compilation of z ∼ 10-12 results from the literature (Donnan et al. 2023b,a;McLeod et al. 2023;Bouwens et al. 2023a;Harikane et al. 2023a;Pérez-González et al. 2023b;Castellano et al. 2023;Franco et al. 2023;Casey et al. 2023;Adams et al. 2023).While there is significant scatter, we find generally good agreement between our results and previously published values in the literature, where here our larger sample size (than most studies) results in smaller uncertainties.We note in particular good agreement be-tween our values and those from McLeod et al. (2023) and Adams et al. (2023), which are the only other previous studies to make use of the full CEERS area.
Each panel of Figure 9 also shows an empirical extrapolation of the UV luminosity function from Finkelstein & Bagley (2022).In this study, they fit an evolving doublepower-law model to all available UV luminosity function data at z = 3-9, assuming that the double powerlaw (DPL) parameters ϕ * , M * , β and α vary smoothly with 1+z (they also simultaneously fit the evolution of the AGN UV luminosity function with a separate DPL, though at the magnitudes we consider here star-forming galaxies were found to dominate).In each panel we show the measured DPL fits at z = 5-8 as the gray lines, then showing the extrapolation to a given redshift range as the light shaded region (where for the extrapolations, we used the MCMC chains from Finkelstein & Bagley 2022 to generate samples of the DPL parameters for each redshift).The upper and lower bounds show the median DPL at the upper/lower bound of the FWHM from the stacked P(z), as shown in the inset panels.
We show this more clearly in Figure 10, where we overplot both our measured number densities and these extrapolated luminosity functions for all three of our redshift samples.Based on pre-launch expectations of either a smoothly or rapidly declining luminosity function at z > 8, we would expect to see our results fall either on or below these extrapolations.While the results at z ∼ 9 are consistent with this extrapolation (with the exception of the brightest points, which are affected by the known overdensity), we see clearly and significantly that our z ∼ 11 observed number densities lie modestly above this extrapolation, while the z ∼ 14 observed number densities lie even higher above their extrapolated region.

Double Power Law Fits
By combining with NGDEEP, we are able to sample 3-4 magnitudes of dynamic range in M U V at z ∼ 9 and z ∼ 11.We therefore fit a double-power law function to our observations, following evidence that this functional form better represents the UV luminosity function at high redshifts than a Schechter function (e.g.Bowler et al. 2015Bowler et al. , 2020;;Finkelstein & Bagley 2022).We fit a DPL to each of our three redshift bins independently via a MCMC algorithm (following the methodology of Leung et al. 2023).We assign fairly uninformative priors on all parameters at z ∼ 9 and 11; at z = 14 due to our poor observational constraints we fix M * and β to the z ∼ 11 values (within a small tolerance).We list Figure 9.The evolution of the rest-frame UV luminosity function, at z ∼ 11 (9.7 < z best ≤ 13; top), z ∼ 9 (8.5 < z best ≤ 9.7; bottom-left) and z ∼ 14 (13 < z best ≤ 15; bottom-right).The large circles show the calculated number densities from our sample (the small red dots denote the magnitudes of individual galaxies, offset vertically for clarity), while the squares show the results from NGDEEP (Leung et al. 2023).The inset shows the stacked P(z) for each sample, with the dotted line denoting the median value of the P(z).Arrows show 1σ upper limits in the first bin with no galaxies, while white-filled symbols denote bins which are <20% complete.The black dot in the brightest bin at z ∼ 9 indicates that this bin has only one object, the z = 8.7 galaxy (which has AGN signatures) from (Larson et al. 2022a).Small symbols show literature results.At z ∼ 9 the HST results are from Bouwens et al. (2019Bouwens et al. ( , 2021)) 2022) from z = 5-8, while the light-shaded colored region shows this model empirically extrapolated to the median redshift for a given sample (where the width is the 68% uncertainty on the luminosity function at this redshift from Finkelstein & Bagley 2022).Our CEERS results are generally consistent with previous luminosity function estimates, with smaller uncertainties reflecting our larger sample size.We also note excellent agreement with the NGDEEP results where our samples overlap.Our brighter CEERS results sit above the expected number densities for the empirically expected extrapolation from Finkelstein & Bagley (2022), with this offset increasing to higher redshift.
the priors and posterior results in Table 5.The median DPL fit is shown as the thin line in Figure 10.While this does a reasonable job of representing the data, the uncertainties, particularly on β and M * at all redshifts, and on the faint-end slope α at z ∼ 14 are presently quite large.We do note that our measured faint-end slope of −2.2 at z = 11 is consistent with the value from Leung et al. (2023), though they imposed more   9, here plotted on the same scale.The thin curves show the median DPL fit to the data at each redshift.This figure highlights that brighter galaxies (MUV ≲ −20) have higher number densities than the extrapolated luminosity functions would predict.While there is a known overdensity at z ∼ 8.7 (Larson et al. 2022a;Whitler et al. 2023) which could affect our lowest-redshift bin, there is no evidence for such overdensities at higher redshifts.
restrictive priors on β and M * , thus our uncertainty is higher.

Evolution with Redshift
To explore the evolution in the UV luminosity function in more detail, in the left-hand panel of Figure 11 we show our observed number densities for the CEERS bin closest to M U V = −20 at each of our three redshift bins, on top of the expectations for this number density from the Finkelstein & Bagley (2022) extrapolation.This figure clearly shows that the observed number densities diverge from the observed evolution at z = 3 to 9, flattening at higher redshifts.We quantify this by measuring the slope (dlogϕ/dz) both for previous observations at z = 3-9, and our results here at z = 9-14.We find that this slope changes from dlogϕ/dz = −0.29 ± 0.03 at z = 3-9 to −0.11 ± 0.08 at z = 9-14.Thus, while the abundance of bright (M U V = −20) galaxies evolves somewhat steeply at a constant slope at z = 3-9, this evolution is flatter towards higher redshifts at the 2.1σ significance level.
For this calculation we have used as our fiducial values the actual measured number densities at M U V = −20.
In the left panel of Figure 11 we show as small stars the values of ϕ(M = −20) from the median DPL model at each redshift.The differences from the observed values are negligible at z = 9 and 14, while at z = 11 this parameterized value is slightly below the observed value at z = 11.Using these DPL values, the observed evolutionary slope steepens to dlogϕ/dz = −0.18± 0.07 (reducing the significance of the flattening to 1.5σ).
To consider whether the evolution in the abundance of faint galaxies changes at z > 9, in the right-hand panel of Figure 11 we show the specific UV luminosity density (ρ U V ), obtained by integrating the UV luminosity function from our DPL fits to M U V < −17 (integrating each step of the chain to calculate the median and 68% confidence range); this quantity is dominated by the abundance of faint galaxies given the steep faint-end slopes (we note that we plot this quantity rather than the number densities at fainter magnitudes due to the lack of constraints at z ∼ 14, visible with the very large error at this highest redshift in the plotted integrated quantity).Interestingly, our measured specific UV luminosity densities at z = 9 and z = 11 are fully consistent with the extrapolated values (at z ∼ 14 the poor constraints on the faint-end slope leave the uncertainty is too large to reach definitive conclusions, thus we show this as a faded data point).This can also be observed in Figure 10, as the faintest data points at z ∼ 9 and 11 are consistent with the extrapolated luminosity function.A similar specific UV luminosity density is found  2022) results to higher redshift.While pre-launch expectations were that the number densities at z >9 would either continue the observed trend at z = 3-9, or evolve more rapidly downward with increasing redshift, we find that the number density of bright galaxies surprisingly flattens at z > 9, where we measure a change in slope dlogϕ/dz between z < 9 and z > 9 at 2.1σ significance.Right) The evolution of the integrated specific UV luminosity density, obtained by integrating double-power law fits to our observed luminosity functions to MUV = −17.The evolution here is less clear, with increased uncertainties (particularly at z ∼ 14, which is shown faded to represent its large uncertainty) making it less clear whether the this quantity also has a flatter evolution at higher redshift.Taking both panels of Figure 11 together, we find clear evidence that the evolution of the number density of bright galaxies is observed to flatten at z > 9, while the evolution of the integrated UV luminosity density, which is dominated by the abundance of fainter galaxies, is less clear, and may possibly follow the lower-redshift evolutionary trend extrapolated to higher redshift.We discuss potential physical explanations for this intriguing result in the following section.

DISCUSSION
In §4 we presented strong evidence that (i) the abundance of galaxies at z > 9 is in excess of nearly all pre-JWST launch simulation predictions as well as above extrapolations from lower-redshift observations, and (ii) the evolution of the abundance of bright (M U V = −20) galaxies is flatter at z = 9-14 than at z = 3-9.Here we explore several potential explanations for these observations.Potential explanations could be due to galaxies being brighter than predicted or more numerous (e.g., horizontal evolution in the luminosity function rather than vertical).However, while the former is relatively easily achievable via a variety of reasonable physical modifications (as we discuss below) the latter would re-quire major revisions to modern cosmology (e.g., more dark matter halos than expected), which we consider less likely.

Redshift Accuracy
One valid concern with early JWST studies is that selection techniques which worked well at lower redshift would begin to fail.In particular, while the physics behind Lyα-break-based selection should persist at these high redshifts, it is possible that heretofore unknown populations of contaminants could have adverse affects.While one could model this contamination based on simulations, it relies on said simulations correctly modeling the colors of all potentially contaminating populations (e.g., Larson et al. 2023;Harikane et al. 2023a), which is unlikely, particularly prior to JWST observations.Spectroscopic validation of photometric redshifts is thus required.Unlike the past decade, when only the brightest HST z > 6 galaxies could have redshifts validated via either weak Lyα emission (e.g., Finkelstein et al. 2013;Zitrin et al. 2015;Oesch et al. 2015;Hoag et al. 2019;Jung et al. 2019Jung et al. , 2020;;Larson et al. 2022a) or Lyα breaks for the brightest sources (e.g.Oesch et al. 2016), JWST's spectroscopic capabilities allow easy restoptical-based spectroscopic redshifts out to z ≈ 9.5 (beyond which [O III] redshifts out of the NIRSpec window) and Lyα continuum-based redshifts (with the NIR-Spec prism mode) to arbitrarily higher redshifts (e.g.Curtis-Lake et al. 2023;Fujimoto et al. 2023a;Arrabal Haro et al. 2023a,b;Hainline et al. 2023;Fujimoto et al. 2023b).
As the CEERS spectroscopic component was observed in the second epoch and a DDT NIRSpec followup program was performed, a significant number of CEERS high-redshift candidates were spectroscopically observed in Cycle 1.Of our original sample of 93 candidate galaxies, 17 had NIRSpec spectroscopic observations, with redshifts originally presented in Fujimoto et al. (2023a); Arrabal Haro et al. (2023a,b); Harikane et al. (2023a); Larson et al. (2022b).As discussed in §3.2.1,only one of these 17 had a "catastrophically" (defined as |z spec −z phot |/(1+z spec ) > 0.3) incorrect redshift (similar success was seen in the UNCOVER survey by Fujimoto et al. 2023b).Of the remaining 16, all had z spec > 7.8.We also note that four of the galaxies in our sample at z phot > 9.7 were spectroscopically observed by CEERS with no spectroscopic detection.For these sources, the absence of strong emission lines is plausibly consistent with z spec > 9.6.These results imply that it is unlikely that significant contamination from low-redshift galaxies is affecting our results (with the caveat that the sample of galaxies confirmed is as-yet small and fairly biased towards brighter sources).
We next consider whether any smaller, yet nonnegligible, systematic offsets in redshift could play a role.A trend for the photometric redshifts to be overestimated at z ≳ 8 has been reported, with results from CEERS (Arrabal Haro et al. 2023b;Fujimoto et al. 2023a), JADES (Hainline et al. 2023) and UNCOVER (Fujimoto et al. 2023b) showing the photometric redshifts to be over-estimated by < ∆z >= 0.45 (± 0.11), 0.26 (± 0.04) and 0.28 (± 0.33), respectively.As discussed in these studies, suchaed offsets indicate a mismatch between the galaxy spectra and the adopted photometric redshift templates due to a variety of physical effects, including an increasing neutral fraction and/or enhanced DLA absorption (e.g., Umeda et al. 2023;Heintz et al. 2023).
For our specific sample of galaxies in this work (13 galaxies with z spec > 8.5), we find a median (mean) offset of z phot − z spec = 0.1 (0.3) with a standard deviation of 0.2.As discussed in §3.2.1 our sample excludes three galaxies selected with z phot > 8.5 with z spec ∼ 8; including these objects does not change the median offset, but it does increase the mean offset (to 0.5) and the standard deviation (to 0.3).These three objects in particular highlight the difficulty of working at z ≲ 9 within the CEERS dataset due to the bluest JWST filter being F115W.Upcoming F090W imaging from PID 2234 (PI Bañados) will improve this situation, probing below the Lyα break at z ∼ 8.At present these larger ∆z ∼ 1-2 offsets affect only a small fraction (3/16) of the spectroscopic sample.Should these larger offsets exist in the rest of the non-confirmed sample, the instances of these objects we do see imply it would primarily affect the z ∼ 9 results (though we note that ID=4777 [z spec =7.993] is in the z ∼ 11 galaxy sample, though it has a very broad P[z]).
We simulate the potential impact of these systematic redshifts offsets by re-measuring our observed luminosity function values in the same MCMC manner as above, where here in each step of the MCMC chain we assign simulated spectroscopic redshifts for each object.For objects which already have spectroscopic redshifts, we keep those values.For the remaining objects, we draw randomly from the observed ∆z (= z phot − z spec ) distribution, adding ∆z to the photometric redshift to simulate a (potentially biased) spectroscopic redshift.For each new "z spec ", we re-measure a new value of M U V .We perform this redshift assignment and subsequent SED-fitting-based M U V measurement prior to running the MCMC chain, pre-computing 100 random draws of ∆z and the corresponding M U V , then drawing from these pre-computed values randomly at each step of the chain.Through this process we simulate the effect of the potential redshift bias both on the specific galaxy samples (as objects can move between redshift bins, potentially lowering the median redshift), as well as the impact on the absolute magnitudes (due to changes in the distance modulus).Both effects can combine to affect the number density, though the former is the larger affect as the difference in the distance modulus for ∆z =0.3 (1) at z = 11 is only ∼0.04 (0.13) mag.
First we examine the impact on the median redshift in each bin, which we find to be fairly minimal: z med = 8.9, 10.9 and 14.2 (unchanged for the z ∼ 9 and z ∼ 11 bins, and 0.2 higher in the z ∼ 14 bin).The corresponding impact on the number densities at M U V = −20 are also modest, with these values being 12%, 14% and 36% lower at z ∼ 9, 11 and 14.Measuring dlogϕ/dz using both these new median redshift values and the corresponding simulated number densities, we find dlogϕ/dz = −0.14 ± 0.09 over z = 9-14, not significantly different from our fiducial value of −0.11 ± 0.08.We note that while for this exercise we restricted the ∆z sample to z spec > 8.5, we found that including the three z spec ∼ 8 galaxies did not change the results.
We thus conclude that the measurable redshift bias from the available spectroscopic confirmations is unlikely to be the primary cause of the observed change in slope at z > 9 in the evolution of the abundance of bright galaxies.We acknowledge again that the existing number of spectroscopic redshifts is small, and biased towards primarily bright sources.In addition, some of these redshifts come from the Lyα break only, and these values have been measured to be up to ∆z ∼0.2 different from more secure emission-line-based redshifts (Fujimoto et al. 2023b).Larger samples of spectroscopic confirmations of galaxies in this epoch are needed to increase confidence that any redshift bias does not affect the measured number densities.

AGN Contribution
Accreting supermassive black holes, particularly when they are not obscured from view, can emit quite strongly in the rest-UV (e.g., Stevans et al. 2014).While the contribution of AGN light to the rest-UV emission from high-redshift galaxies has been fairly unconstrained, some notable examples do exist at z ∼ 7 (e.g.Fujimoto et al. 2022;Endsley et al. 2023a).However, the evolution in the AGN UV luminosity function suggests that AGNs do not dominate the rest-UV emission in galaxies (e.g, non-quasars) at high redshift (e.g., Finkelstein & Bagley 2022).However, with JWST's spectroscopic abilities, it is worth revisiting whether AGN could be contributing significantly to the UV emission from galaxies.
Early observations hint that growing super-massive black holes are indeed somewhat common at z ∼ 5-9, with signs of potential AGN activity found in dozens of galaxies (e.g., Kocevski et al. 2023;Larson et al. 2023;Harikane et al. 2023b;Leung et al. 2023;Labbe et al. 2023;Bogdan et al. 2023), with several sources containing spectroscopically confirmed broad-line AGN (e.g., Kocevski et al. 2023;Harikane et al. 2023b;Matthee et al. 2023;Larson et al. 2023;Maiolino et al. 2023b;Kokorev et al. 2023;Furtak et al. 2023;Greene et al. 2023).While this may suggest AGN could contribute to the UV luminosity, the many of these sources have unique two-component SEDs with fairly flat UV spectral slopes, with a change to a steeply rising red slope in the rest-UV optical (e.g., Kocevski et al. 2023;Barro et al. 2023;Matthee et al. 2023;Labbe et al. 2023).While an AGN jet could trigger enhanced star-formation (e.g.Duncan et al. 2023), here we aim to assess whether the UV emission we observe is dominated by emission from an AGN accretion disk.For these red AGN in particular, the point-source morphology in the longest wavelength bands strongly suggests that obscured AGN light is dominating the rest-optical emission, while advanced SED modeling is needed to robustly constrain the amount of AGN contribution to the rest-frame UV.Such a contribution remains possible as scattered UV light from a partially obscured AGN could in theory contribute to the observed UV emission (e.g., Kocevski et al. 2023;Barro et al. 2023;Labbe et al. 2023;Greene et al. 2023), though the resolved nature of the rest-UV emission in these galaxies indicates stellar emission may dominate.
AGN have also recently been identified in objects with more typical galaxy-like morphologies and SED slopes.Larson et al. (2023) inferred the presence of an AGN in a z = 8.7 galaxy (via a 2.5σ significant broad-Hβ line), and noted that the SED has a flat slope through the restnear-infrared (aided by MIRI observations, Papovich et al. 2023) suggesting stellar light is dominating at all observed wavelengths.Maiolino et al. (2023a) inferred via extremely high gas densities that the nucleus of the well-known galaxy GN-z11 (at z = 10.6)likely hosts an AGN; analysis of this object by Tacchella et al. (2023a) shows that 2/3 of the rest-UV continuum emission emanates from the nucleus, hinting that this object could be AGN dominated in the UV.Harikane et al. (2023b) discuss 11 confirmed broad-lined AGN in the CEERS survey, and found that the majority of them showed extended morphologies in the rest-UV (with most of the rest being extremely UV-faint reddened AGNs), suggesting that much of the rest-UV emission is stellar in origin.Although some of these observations indicate that the AGN contribution to the total UV luminosity is negligible, this might not always the case, depending on the phase of the AGN duty cycle (which affects the contrast between the AGN and the host galaxy).A possible high AGN fraction has been argued in the recent NIRSpec follow-up for lensed galaxies at z = 8.5-13.2(Fujimoto et al. 2023b).
These early observations do not yet collectively paint a clear picture of the contribution of AGN to the restframe UV emission from early galaxies.It is clear that AGNs exist in these epochs, though many discovered so far appear to be primarily obscured.Deciphering the relative contribution from stars and AGNs to the emergent UV emission, including constraining the extent to which scattered UV light from obscured AGNs plays a role, will require a combination of more advanced SED modeling techniques along with deep ∼1-2µm spectroscopy.Until such analyses can be conclusively done, AGNs remain a possible scenario to explain the high abundance of bright galaxies at early times.

Change in Physical Processes
The remaining explanations for the observed UV luminosity enhancement involve changes in the physical processes regulating the ratio of observed UV light to the halo masses of these galaxies.We further explore this here, aided by a variety of recent theoretical results motivated by early JWST observations of the z > 9 universe, summarized in Figure 12.This figure compares our observed UV luminosity function (combined with that of NGDEEP) to recent predictions from Ferrara et al. (2022), Yung et al. (2023a), Dekel et al. (2023), Shen et al. (2023), and Jones et al. (in prep).We also compare to pre-launch predictions from FLARES (Lovell et al. 2021;Vijayan et al. 2020;Wilkins et al. 2022b), DELPHI (Dayal et al. 2017), UniverseMachine (Behroozi et al. 2020), THESAN (Kannan et al. 2022) and BlueTides (Feng et al. 2016;Wilkins et al. 2017).In this figure we compare our observations to these model predictions made at z = 11 when possible (when not we interpolate the number densities in log space), as this is the median of the stacked redshift probability distribution for the galaxies which make up our luminosity function sample (see inset panel of Figure 9).We note that differences between models can be due both to the underlying methodology and/or sub-grid physics, as well as the procedures used to generate observed luminosities.Ferrara et al. (2023) have developed a physical model which successfully reproduces the observed z = 7 UV luminosity function via a dust implementation which is designed to match both the shape of the UV luminosity function and the z ∼ 7 obscured SFR results from the ALMA REBELS survey (Bouwens et al. 2022).Evolving their model to higher redshifts, they naturally predict a slowing in the evolution of the UV luminosity function from z ∼ 9 to 11 due to significantly reduced attenuation in galaxies.This arises in their model due to outflows driven by very high ("super-Eddington") specific SFRs, which can efficiently drive gas (and dust) out from these galaxies (Ferrara 2023).They discuss that this is supported by the data as early results on the colors of z > 8 galaxies showed that they were fairly blue (e.g., Finkelstein et al. 2022b;Cullen et al. 2023;Topping et al. 2022;Papovich et al. 2023) and thus likely contained little dust.

Significant Evolution in Attenuation
As shown in Figure 12, this model does significantly better than pre-launch simulations, in fact slightly overpredicting the observed UV luminosity function.We thus explore whether significant color evolution is ob- served, particularly in modestly bright galaxies, which could further support this model.To study potential evolution with redshift, we select a sample of galaxies from the CEERS catalog at z ∼ 6-8.The sample selection was identical to that done here for z > 8.5, simply evolving the redshift cuts to lower redshift (primarily requiring P(z > 4) ≥ 0.7, z best > 5.0, and S z = 6, 7 or 8).The full sample of > 2000 sources was visually inspected, resulting in a final sample of 1018, 574, and 224 galaxies at z ∼ 6, 7 and 8, respectively.
As discussed in §3.3, we defined a rest-far-UV color C F U V , and calculated it for galaxies in both our z > 8.5 and z = 6-8 galaxy samples.We note that this specific FUV color is very sensitive to dust attenuation due to a narrow color baseline, in that ∆C F U V = 0.05 is equivalent to ∆A V = 0.1.In Figure 13 we plot C F U V versus redshift, color-coded by M U V .We plot the median in bins of redshift (z = 6.5-7.5, 7-5-8.5, 8.5-9.7, and 9.7-13) and while there is significant scatter in color at all redshifts, the median color is fairly blue.Focusing on the middle magnitude bin (M U V = −20) where our results have the strongest constraining power (too few brighter galaxies exist at z > 9 to make conclusions), we see that the median FUV color does not significantly evolve from z ∼ 8 to z ∼ 11.
This initial exploration into the evolution of UV colors does not support rapid changes in the evolution of the attenuation between z ∼ 7 to 11, as would be implied by the Ferrara et al. (2023) model.Rather, the majority of these z > 6 galaxies appear roughly dust free, implying that the ALMA REBELS/detected galaxies used to calibrate the Ferrara et al. (2023) model are not indicative of the bulk of the z ∼ 7 galaxy population.In particular, Papovich et al. (2023) showed that with the inclusion of MIRI photometry, typical galaxies at z > 7 are fairly blue.We do note that finding so little dust is in itself a puzzle, as these galaxies should have made significant amounts of dust in the process of building their observed stellar masses.Though UV-faint dusty sources may indeed exist in this epoch (Rodighiero et al. 2023), in our UV-selected sources the dust must get destroyed (via, e.g., reverse shocks in supernovae), or be removed via winds even down to z ∼ 7 in these sources.We acknowledge however that the scatter in C F U V is large and that more advanced UV spectral slope modeling with larger samples covering a wider dynamic range in stellar mass and M U V can provide further insight.

Change in Conversion from Mass to UV Luminosity
If reduced attenuation is unlikely to be the dominant effect explaining the discrepancy between models andobservations, then it is prudent to consider changes in the processes of star formation which could result in enhanced UV luminosities.A very straightforward potential explanation could be a change in the initial mass function (IMF).As discussed by Finkelstein et al. (2023) and Harikane et al. (2023a), a change in the characteristic stellar mass from ∼1 M ⊙ to ∼10 M ⊙ is expected when the cosmic microwave background temperature is higher and the gas metallicities are lower (e.g., Larson 1998;Bromm et al. 2002;Tumlinson 2006;Steinhardt et al. 2023).Such a change would decrease the mass-to-UV light ratio by factors of up to several (e.g., Raiter et al. 2010;Zackrisson et al. 2011).
As one example of this, we show the new semianalytic model predictions based on merger trees from the gureft simulation suite Yung et al. (2023a,b).This is an update to the "Santa Cruz SAM" predictions (Somerville et al. 2015;Yung et al. 2019), now using Nbody based merger trees constructed with finely spaced snapshots at very high-redshift to better capture halo growth at early times.Their fiducial model still underpredicts the observations, implying that the Extended Press-Schechter based merger trees used in the previously published models were not the sole reason for the discrepancy.However, as they discuss in their paper, the shape of their UV luminosity function appears consistent with the data.They show that if they decrease the mass-to-light ratio by a factor of ∼3 (dashed line), as would be plausible for a top-heavy IMF, their model becomes consistent with the observations.A smaller UV luminosity enhancement of only ∼1.5× would be needed to bring the empirically extrapolated DPL UV luminosity function from Finkelstein & Bagley (2022) at z = 11 (thick gray line) into agreement with our observations.However the discrepancy between this empirical extrapolation is greater for brighter galaxies than for fainter galaxies, such that a flat UV luminosity boost at all magnitudes would lead to an over-prediction of the faint-end due to the empirically predicted faintend slope being steeper (α = −2.5)than that observed from NGDEEP (α = −2.2).While our results cannot constrain the IMF presently, deep rest-UV spectroscopic observations could be capable of detecting highionization lines such as He II or [Ne V], which could indicate the presence of very massive stars in these galaxies (e.g., Tumlinson & Shull 2000;Bromm et al. 2001;Schaerer 2003;Olivier et al. 2022;Cleri et al. 2023).
One caveat to an increased light-to-mass ratio due to a top-heavy IMF would be that it would be accompanied by an increased feedback-to-mass ratio, because the massive stars that produce UV light are also those that produce energetic radiation, strong stellar winds and Type II supernovae.Changing the IMF would also change the integrated chemical yields, which could impact cooing.The models above have yet to selfconsistently model the increased feedback and modified chemical yields of a top-heavy IMF for high-redshift galaxies.However, other studies (e.g., Fontanot 2014) have found that evolving top-heavy IMFs (e.g., topheavy IMFs for galaxies with high SFRs) have tended to decrease the mass of stars formed relative to models with universal IMFs.

High Star Formation Efficiency
Even without a change in the IMF, the UV luminosities of very high-redshift galaxies could be enhanced if the rates of gas conversion into stars was increased.As discussed in Finkelstein et al. (2023), most models adopt fairly long gas-depletion timescales, based on observations of nearby galaxies.While perhaps a coincidence, it is interesting that among the pre-JWST predictions only the Behroozi & Silk (2015) model, which assumes a negligible gas depletion timescale, can match the observations in Figure 8.The physical processes which may lead to a decrease in this timescale are not currently obvious, though as the dependence of star-formation with gas density (Schmidt 1959;Kennicutt 1998) is superlinear (e.g.Vallini et al. 2023), the very high gas densities present in early-universe halos likely play a role.Very high-efficiency star formation has been observed in super-star clusters in local galaxies (e.g.Turner et al. 2015;Smith et al. 2020;Costa et al. 2021).The high cloud surface densities in the environments of these superstar clusters, which are rare in the nearby Universe, could be common in galaxies at z > 9. Theoretical work on molecular cloud scales has also shown that both stellar winds and radiation feedback may become ineffective at very high cloud surface densities, leading to higher cloud-scale star formation efficiencies (Grudić et al. 2020;Lancaster et al. 2021;Menon et al. 2023).
In this context, we explore two additional predictions.The first is an updated version of the Simba model (Davé et al. 2019) known as Simba-EoR (Jones et al., in prep.).This simulation includes a new subgrid ISM model that co-evolves dust and H 2 by explicitly tracking all formation and destruction mechanisms, which turns out to yield more H 2 at low metallicities relative to Simba.This leads to earlier star-formation in halos, increasing the luminosities of early galaxies compared to Simba (which, as shown in Figure 8 significantly underpredicts z > 9 observations).The Simba-EoR predictions are well matched to the CEERS observations, despite no specific tuning to EoR data, though the faintend slope appears higher than that observed.It is worth noting that the galaxies are not predicted to be dustfree; without accounting for extinction, the Simba-EoR predictions would be clearly above the observations.
We also compare to the Feedback-Free Starburst (FFB) physical model introduced by Dekel et al. (2023), using the luminosity function, including a dust attenuation prescription, presented in Li et al. (in preparation).This model predicts that in massive galaxies at z ∼ 10, where the gas density is above a threshold of ∼ 3 × 103 cm −3 and the gas-phase metallicity is below ∼ 0.2 Z ⊙ , star formation in thousands of globularcluster-like clouds is expected to proceed on a free-fall timescale shorter than the ∼ 2 Myr interval between a starburst and the onset of effective stellar and supernova feedback, thus allowing high star-formation efficiency free of suppression by feedback 3 .In Figure 12 we show as the blue shaded region the predictions of this model with a maximum SFE ranging from 20% (lower bound) to 100% (upper bound).These predictions match our observations reasonably well over the range where we detect galaxies.This model predicts an enhanced bright end due to the predicted high SFE preferentially at high halo masses, aided by the fairly low levels of dust attenuation and the high level of star-formation stochasticity predicted in the FFB phase.

Stochastic Star Formation
In §4 we presented Figure 11, which showed that the evolution of the abundance of bright galaxies appears to flatten at z > 9, while the specific UV luminosity density (integrated to M U V = −17) is not inconsistent with evolution with a consistent slope from lower redshift.It is important to note that presently these integrals are uncertain -the observations are also consistent with an elevated value at the faint end within the uncertainties.Future work combining the available deep fields (NGDEEP, MIDIS, JADES, etc.) will soon improve these constraints.Should these future studies conclude that any flattening evolution is observed primarily at brighter luminosities, we comment here on possible physical drivers of such a differential evolution.First, should any of the processes discussed above be at play primarily in more massive halos, it could lead to a preferentially enhanced abundance of bright galaxies.However, it seems unlikely that there would be dramatic differences in the star-formation efficiency or IMF across a fairly limited dynamic range of UV absolute magnitudes (and thus presumably halo mass).
One plausible explanation for the observed behavior could be a significant increase in the stochasticity of star formation (e.g., Shen et al. 2023;Sun et al. 2023;Mirocha & Furlanetto 2023).As shown by Shen et al. (2023), introducing a variability in the conversion from halo mass to UV luminosity (which could encompass both star-formation stochasticity, as well as variations of dust attenuation, metallicity, etc.), leads to an "upscattering" in the UV luminosity function.Due to the steep faint-end slope, more galaxies will scatter from faint-tobright luminosities than from bright-to-faint luminosities, which can lead to a shallower bright end (in an effect similar to Eddington bias).They explore a range of scatter values (encompassed by the variable σ U V which describes the Gaussian width of the kernel scattering the UV luminosities).As we show in Figure 12, σ U V = 1.75 yields a predicted UV luminosity function in good agreement with our observations, though it does overpredict abundances at M U V ≤ −21.Pallottini & Ferrara (2023) explored the level of stochasticity present within their high-resolution hydrodynamic simulations, and found that while stochasticity was present, it was at a lower level, equivalent to σ U V ∼ 0.6.This is just one simulation, so it remains to be seen if higher levels of stochasticity are plausible.We also note that when stochasticity is not included, the fiducial UV luminosity function of the empirical Shen et al. (2023) model is quite low (solid gold line in Figure 12).Taking, for example, the UV luminosity function from the physicsbased Santa Cruz GUREFT SAMs (Yung et al. 2023a), lower levels of UV scatter would be needed to match the observations, plausible more consistent with simulation results.
Observational evidence is emerging that high-redshift galaxies have significant variability in their starformation rates.Looser et al. (2023a) discovered a surprisingly quiescent galaxy at z = 7.3, which shows evidence for a ∼10-20 year lull in star-formation activity after a recent burst.They extended this study in Looser et al. (2023b), finding with a spectroscopic analysis that lower-mass galaxies at high redshift have particularly bursty star-formation histories.Endsley et al. (2023b) came to a similar conclusion after analyzing the photometry of a much larger sample of galaxies, finding that fainter galaxies at higher redshifts show lower [O III] equivalent widths than their brighter counterparts, which they interpret as evidence that the brightest galaxies are frequently experiencing a recent upturn in star-formation activity (see also Tacchella et al. 2023b).
While evidence thus far indicates that galaxies at higher redshifts and lower masses may host more variable star-formation histories, we cannot yet conclude whether this is the major physical driver of the slowerthan-expected evolution of the UV luminosity function at brighter luminosities.Deep JWST/PRISM spectroscopy could yield a dataset capable of constraining star-formation histories via SED modeling.Another interesting test to constrain this possibility was proposed by Muñoz et al. (2023), who noted that if variability in the ratio of UV luminosity to halo mass was significant, many UV-bright galaxies would exist in lower-mass halos, which could be distinguishable via a weaker clustering strength than if there was a more direct correlation between UV luminosity and halo mass.Such a measurement would require a wide and deep photometric survey capable of identifying a sufficient source density of z > 10 galaxies.This may be possible with COSMOS-Web (PID 1727, PIs Kartaltepe & Casey;Casey et al. 2022), and will certainly be possible with deep field observations with the Nancy Grace Roman Space Telescope (in particular with the added K s filter).

CONCLUSIONS
We have presented the results of a comprehensive search for z > 8.5 galaxies in the full NIRCam dataset from the Cosmic Evolution Early Release Science survey.We created a new photometric catalog aimed at measuring accurate colors for faint galaxies, as well as robust estimates of the total flux, implementing multiple key improvements over our previous work (Finkelstein et al. 2023).We identify a sample of 88 candidate z > 8.5 galaxies, with 55 galaxies in our z ∼ 9 sample (selected over 8.5 < z < 9.7), 27 galaxies in our z ∼ 11 sample (selected over 9.7 < z < 13), and three galaxies in our z ∼ 14 sample (selected over 13 < z < 15).Notably, 13 of our galaxies are spectroscopically confirmed, eight in the z ∼ 9 sample (z spec = 8.63-9.00),and five in the z ∼ 11 sample (z spec = 9. 77 -11.42).
We perform advanced source-injection simulations to assess our source completeness, accounting for both photometric and photometric redshift recovery, explicitly accounting for the sizes of our sources.While the impact of this update is minimal for bright (M U V ∼ − 21) galaxies, it does result in modestly (10-30%) lower effective volumes than the assumption of a point source at fainter (M U V ∼ − 19.5) luminosities.
We use these completeness estimates to first compare the cumulative surface density of galaxies in our sample to a variety of pre-JWST-launch simulation predictions, finding that the observed abundance of galaxies is higher than any physical model prediction at z > 10, with this tension increasing with increasing redshift.Our results are in the least tension with the empirical model of Behroozi & Silk (2015), which posits that the specific star-formation rate tracks the specific halo accretion rate; at these high redshifts this would imply very short timescales for star-formation.While it remains to be seen if this is physical, it would be consistent with the observed abundance of UV-bright galaxies.
We calculate the rest-UV luminosity function in our three redshift bins.Comparing to previous results, we find general agreement, though our uncertainties are typically smaller due to our larger sample sizes.Notably, we see evidence of the known z ∼ 8.7 overdensity in the EGS field (Finkelstein et al. 2022a;Larson et al. 2022a;Whitler et al. 2023) in the brighter bins of our z ∼ 9 UV luminosity function.
We analyze the evolution of the UV luminosity function from z ∼ 9 to z ∼ 14 in two ways.First, we examine the evolution of galaxies at M U V = − 20.While the abundance of galaxies at this fixed UV luminosity has been conclusively measured to rise smoothly from z = 9 to 3, we find evidence for a significant flattening.Extrapolating the observed evolution from z = 9 to 3 to higher redshifts, one would expect the abundance of galaxies to rise by a factor of ≳ 20 from z ∼ 14 to z ∼ 9. Conversely, we measure a rise of 4.3 (± 3.7) over this epoch.Phrasing this another way, dlogϕ/dz = −0.29 ± 0.03 at z = 3-9, while we find −0.11± 0.08 at z = 9-14.We also explore the total integrated specific UV luminosity density, fitting double-power law models to our observed luminosity functions, and integrating to M U V = −17.Interestingly, we find that this quantity follows the observed extrapolation at z > 9.This hints that whatever new physical processes in play at these epochs may primarily affect bright galaxies, though the uncertainty in the integrated specific UV luminosity density at z ≳ 11 is presently high.
We discuss a variety of potential physical causes for the observed results.The high yield of spectroscopic confirmations implies that significant sample contamination is unlikely, though confirmation of a larger fraction of our galaxy sample would increase confidence in this claim.We find, based on blue colors for not only our galaxies, but galaxies at similar UV luminosities at z = 6-9, that a significant drop in dust attenuation at earlier times is unlikely to be the dominant explanation.Rather, we find that models which implement a combination of increased star-formation efficiency and/or an increased degree of bursty, stochastic star formation at these redshifts are more consistent with our observations.A change in the underlying IMF may also play a role.
We find slight evidence that the physics at play may be more important in bright galaxies than faint galaxies.Should this represent a dependence of star formation efficiency on halo mass, it would be expected to imprint signatures into a broad range of other cosmological probes.A key example is the topology of neutral hydrogen 21cm intensity maps, where such a differential star-formation efficiency effect would introduce an additional non-linear bias in the power-spectrum analysis.This signature may be detectable with ongoing and upcoming experiments, such as the Hydrogen Epoch of Reionization Array (HERA; Liu & Parsons 2016) and the Square Kilometer Array (SKA; Thélie et al. 2023), thus providing independent cross-checks on our results.
Our results show that the abundance of bright galaxies at z > 9 robustly exceeds expectations based on prelaunch observations.While many possibilities exist to explain these observations, each of them are directly empirically testable with a modest investment in further JWST spectroscopy.Deep NIRSpec followup can not only confirm the redshifts of all galaxies in this sample, but it can also probe diagnostic emission lines for either AGN activity or the presence of very massive stars.Such observations could empirically measure the starformation histories, testing models of stochasticity.As we are still very early in the JWST mission, it is highly likely such observations will become available in the near future, answering these key questions about star and galaxy formation at early times.

Figure 1 .
Figure 1.A comparison of the photometric redshifts to the measured spectroscopic redshifts for the 17 galaxies in our initial sample with spectroscopic confirmation.With the exception of the (not shown) z ∼ 16 candidate at zspec = 4.9 (Arrabal Haro et al. 2023a), the agreement is generally good.After removal of three sources (in red) with spectroscopic redshifts below our cut of z > 8.5, we find a median (mean) z phot − zspec =0.1 (0.3), with 8/13 sources having |z phot −zspec| < 0.2.As noted by Arrabal Haro et al. (2023b) and Fujimoto et al. (2023a) there does appear to be a mild systematic offset towards higher photometric redshifts.We also note that there are four sources with z phot > 10 observed with NIRSpec that showed no detectable signatures, which is consistent with z > 10, where all strong lines are shifted out of the NIRSpec range.
prism observations, while Arrabal Haro et al. (2023b) presented both Lyα break and [O III]-based redshifts for sources in the ∼3100 sec CEERS third-epoch prism observations (additional redshifts were also presented in Larson et al. 2023 and Tang et al. 2023).Arrabal Haro et al. (2023a) presented both Lyα break and line-based ([O II] and [O III]) redshifts for sources in the ∼5 hr prism DDT observations (one of these sources had its redshift of z = 11.04 first presented in Harikane et al. 2023a).

Figure 2 .
Figure 2. The F200W half-light radius (measured from SE) versus F200W apparent magnitude.The gray bar shows the half-light radius from stars in the image (with the width showing the 68% spread in the values).The data points are color-coded by the difference in χ 2 between the best-fitting (sub)stellar model, and the best-fitting EAZY galaxy model.Most objects are clear resolved, and thus extragalactic in origin.Only one object is formally compact with a stellar χ 2 comparable to the best-fitting EAZY model, though even for this object we conclude it is likely extragalactic due to its non-compact appearance in the imaging, and the very large (∼4 kpc) implied distance were it stellar.We conclude that stellar contamination is not significant in our sample.

Figure 3 .
Figure 3.The symbols show our sample of 88 z > 8.5 galaxy candidates in a plane of F277W magnitude versus redshift, with the different colors representing the different redshift samples.Squares denote objects with spectroscopic confirmation, while the circles are plotted at the photometric redshifts.Triangles denote objects with spectroscopic observations but no confirmation.The small star denotes Maisie's Galaxy(Finkelstein et al. 2022b), one of the first JWST very high-redshift galaxy discoveries, while the small dot denotes CEERS-1019, a confirmed z = 8.7 galaxy which appears to have broad Hβ emission, indicative of an active super-massive black hole(Larson et al. 2023).The background shading shows the completeness (inclusive of both photometric and sample selection completeness) of our sample (for sources with half-light radii of 3.3 pixels), as described in §4.1; the black line shows the 20% completeness contour.

Figure 7 .
Figure 7.The left panel shows the effective volume for our z ∼ 11 (9.7 < z ≤ 13) galaxy sample.The purple dashed line shows the maximum volume one would obtain with a 100% completeness over an ideal (though unrealistic) top-hat selection function.The colored curves show our calculated effective volumes as a function of source half-light radii.The circles show the volumes we use, calculated by weighting the volumes by the sizes of galaxies in a given magnitude bin.The size distribution of our z ∼ 11 galaxy sample is given in the lower-right plot, while the completeness as a function of size at fixed input UV absolute magnitude is shown at the upper-right.The completeness is very sensitive to the size, particularly at fainter magnitudes.

Figure 8 .
Figure8.The top panel shows the redshift distribution of our sample (scattered vertically for clarity), while the main panel shows the cumulative surface density of galaxies per unit surface area at redshift greater than z for m < 28.5.The dotted line shows the raw counts from our CEERS z > 8.5 galaxy sample, while the shaded region shows the 68% confidence interval on the completeness corrected surface density (including only sources with completeness estimates >20%), inclusive of photometric, photometric redshift, and Poisson uncertainties.The vertical error bars show the estimated cosmic variance uncertainty.The colored lines show predictions from a range of pre-launch simulations (including hydrodynamic, analytic, semi-analytic, and semi-empirical models), all also for m < 28.5.The observed surface density of galaxies lies above most predictions at z > 10, and above all except theBehroozi & Silk (2015) and FLARES models at z > 11.This confirms early results based on smaller samples that the observed abundance of z ≳ 10 galaxies significantly exceeds most pre-launch physically-motivated expectations.
Figure9.The evolution of the rest-frame UV luminosity function, at z ∼ 11 (9.7 < z best ≤ 13; top), z ∼ 9 (8.5 < z best ≤ 9.7; bottom-left) and z ∼ 14 (13 < z best ≤ 15; bottom-right).The large circles show the calculated number densities from our sample (the small red dots denote the magnitudes of individual galaxies, offset vertically for clarity), while the squares show the results from NGDEEP(Leung et al. 2023).The inset shows the stacked P(z) for each sample, with the dotted line denoting the median value of the P(z).Arrows show 1σ upper limits in the first bin with no galaxies, while white-filled symbols denote bins which are <20% complete.The black dot in the brightest bin at z ∼ 9 indicates that this bin has only one object, the z = 8.7 galaxy (which has AGN signatures) from(Larson et al. 2022a).Small symbols show literature results.At z ∼ 9 the HST results are fromBouwens et al. (2019Bouwens et al. ( , 2021)); Bowler et al. (2020); McLeod et al. (2016); Morishita et al. (2018); Stefanon et al. (2019); Rojas-Ruiz et al. (2020); Bagley et al. (2022a); Finkelstein et al. (2022a), while the JWST results are from Donnan et al. (2023b); Adams et al. (2023); Harikane et al. (2023a); Pérez-González et al. (2023b); Bouwens et al. (2023b).The z ∼ 11 results shown are from Donnan et al. (2023b,a); McLeod et al. (2023); Adams et al. (2023); Pérez-González et al. (2023b); Bouwens et al. (2023b); Harikane et al. (2023a); Castellano et al. (2023); Franco et al. (2023); Casey et al. (2023), while at z ∼ 14 we compare to Donnan et al. (2023b) and Casey et al. (2023).The gray curves show the best-fitting double-power law (DPL) model from Finkelstein & Bagley (2022) from z = 5-8, while the light-shaded colored region shows this model empirically extrapolated to the median redshift for a given sample (where the width is the 68% uncertainty on the luminosity function at this redshift from Finkelstein & Bagley 2022).Our CEERS results are generally consistent with previous luminosity function estimates, with smaller uncertainties reflecting our larger sample size.We also note excellent agreement with the NGDEEP results where our samples overlap.Our brighter CEERS results sit above the expected number densities for the empirically expected extrapolation fromFinkelstein & Bagley (2022), with this offset increasing to higher redshift.

Figure 10 .
Figure 10.Our measured UV luminosity functions at z ∼ 9, 11 and 14 are shown by the large symbols (circles for our CEERS results, and squares for NGDEEP from Leung et al. 2023).The shaded regions are the same as in Figure 9, showing the extrapolated UV luminosity functions from Finkelstein & Bagley (2022).The inset panel likewise shows the same P(z) curves from Figure9, here plotted on the same scale.The thin curves show the median DPL fit to the data at each redshift.This figure highlights that brighter galaxies (MUV ≲ −20) have higher number densities than the extrapolated luminosity functions would predict.While there is a known overdensity at z ∼ 8.7(Larson et al. 2022a; Whitler  et al. 2023) which could affect our lowest-redshift bin, there is no evidence for such overdensities at higher redshifts.

Figure 11 .
Figure 11.Left) The evolution of the observed number density at MUV = −20.The red circles show the observed number density at this absolute magnitude from CEERS (connected by the light red shaded region; the small stars show the DPL fit values at this magnitude).The dark blue region shows the measured value from Finkelstein & Bagley (2022), and the lighter shaded region shows the extrapolation of the Finkelstein & Bagley (2022) results to higher redshift.While pre-launch expectations were that the number densities at z >9 would either continue the observed trend at z = 3-9, or evolve more rapidly downward with increasing redshift, we find that the number density of bright galaxies surprisingly flattens at z > 9, where we measure a change in slope dlogϕ/dz between z < 9 and z > 9 at 2.1σ significance.Right) The evolution of the integrated specific UV luminosity density, obtained by integrating double-power law fits to our observed luminosity functions to MUV = −17.The evolution here is less clear, with increased uncertainties (particularly at z ∼ 14, which is shown faded to represent its large uncertainty) making it less clear whether the this quantity also has a flatter evolution at higher redshift.
byPérez-González et al. (2023b), while McLeod et al.  (2023)  finds a slightly elevated value, though consistent with the empirical extrapolation within the uncertainties.

Figure 12 .
Figure 12.A comparison of the observed z ∼ 11 UV luminosity function from CEERS and NGDEEP (symbols are the same as in the top panel of Figure 9) to model predictions.Pre-launch predictions from FLARES, DELPHI, UniverseMachine, THESAN and BlueTides are shown as the thin gray lines, while the colored lines show more recent predictions from Ferrara et al. (2023, green), Dekel et al. (2023, the blue shaded region shows a range of maximum star-formation efficiency from 0.2-1), Shen et al. (2023, yellow; the dashed line includes a strong stochastic star-formation component), Yung et al. (2023a, red; the dashed line indicates a top-heavy IMF UV luminosity enhancement of 3×) and Jones et al. (in prep, purple).The thick gray line shows the empirical DPL luminosity function from Finkelstein & Bagley (2022) extrapolated to z = 11.These predictions show that a variety of potential physical solutions can predict a z ∼ 11 luminosity function in agreement with observations.

Figure 13 .
Figure13.The evolution of the FUV color CF U V with redshift, color-coded by MUV .The large squares show median values in bins of redshift and magnitude (error bars show the 1σ spread), for bins with ≥ five sources.While reddened galaxies exist at z ∼ 7-8, the median colors are still fairly blue.Notably, the median FUV color for MUV = −20 galaxies is similar at z ∼ 8 to z ∼ 11, suggesting a significant drop in dust attenuation is unlikely to explain the high abundance of bright z ∼ 11 galaxies.

Figure 14 .
Figure 14.Cutout images, 5 ′′ on a side, of objects originally selected, but identified via visual inspection as being diffraction spikes.

Figure 15 .
Figure 15.Same as Figure 14, for objects identified as bad pixels.

Figure 16 .
Figure16.Same as Figure14, for objects identified as oversplit portions of nearby brighter galaxies.

Figure 17 .
Figure 17.Same as Figure 14, for objects identified as being associated with chip edges.

Figure 18 .
Figure 18.Same as Figure 14, for objects identified as being affected by bad photometry.

Table 1 .
NIRCam Imaging Summary Note-The depths given represent 5σ limiting magnitudes, measured in d=0.2 ′′ diameter circular apertures and corrected to total fluxes assuming a point-source.The PSF FWHM and fraction of the flux enclosed in a 0.2 ′′ diameter circular aperture are given below the horizontal line.

Table 2 .
HST Imaging Summary Note-Similar to Table1, for the HST imaging used.All depths given represent 5σ limiting magnitudes, measured in d=0.2 ′′ diameter circular apertures and corrected to total fluxes assuming a point-source.

Table 3 .
Summary of z > 9.7 Candidate Galaxies ures 4, 5, 6 contain summary figures of sources in the z ∼ 14, 11 and 9 samples, respectively.The latter two are figure sets, with two example objects shown, and with all objects viewable in the electronic version of the manuscript.We tabulate key properties of our sample in Table3for z > 9.7, and Table 6 in the Appendix for z < 9.7.3.4.1.Comparison to McLeod et al. 2023 and Adams et al.

Table 5 .
UV Luminosity Function Double Power-Law Parameters