CEERS Key Paper. I. An Early Look into the First 500 Myr of Galaxy Formation with JWST

We present an investigation into the first 500 Myr of galaxy evolution from the Cosmic Evolution Early Release Science (CEERS) survey. CEERS, one of 13 JWST ERS programs, targets galaxy formation from z ∼ 0.5 to >10 using several imaging and spectroscopic modes. We make use of the first epoch of CEERS NIRCam imaging, spanning 35.5 arcmin2, to search for candidate galaxies at z > 9. Following a detailed data reduction process implementing several custom steps to produce high-quality reduced images, we perform multiband photometry across seven NIRCam broad- and medium-band (and six Hubble broadband) filters focusing on robust colors and accurate total fluxes. We measure photometric redshifts and devise a robust set of selection criteria to identify a sample of 26 galaxy candidates at z ∼ 9–16. These objects are compact with a median half-light radius of ∼0.5 kpc. We present an early estimate of the z ∼ 11 rest-frame ultraviolet (UV) luminosity function, finding that the number density of galaxies at M UV ∼ −20 appears to evolve very little from z ∼ 9 to 11. We also find that the abundance (surface density [arcmin−2]) of our candidates exceeds nearly all theoretical predictions. We explore potential implications, including that at z > 10, star formation may be dominated by top-heavy initial mass functions, which would result in an increased ratio of UV light per unit halo mass, though a complete lack of dust attenuation and/or changing star formation physics may also play a role. While spectroscopic confirmation of these sources is urgently required, our results suggest that the deeper views to come with JWST should yield prolific samples of ultrahigh-redshift galaxies with which to further explore these conclusions.


INTRODUCTION
The epoch of reionization marks the period when energetic photons (presumably from massive stars in early galaxies; e.g., Stark 2016;Finkelstein et al. 2019; stevenf@astro.as.utexas.edu* NASA Postdoctoral Fellow † NASA Hubble Fellow Robertson 2021) ionized the gas in the intergalactic medium (IGM).Understanding when and how this process occurs is crucial to constraining both the earliest phases of galaxy formation (which kick-started this process), and how the evolution of the IGM temperature affects the star-formation efficiency in low-mass halos throughout (and after) this transition.Advances in deep near-infrared (near-IR) imaging with the Hubble Space Telescope (HST) have pushed con-straints on galaxy evolution into the first billion years after the Big Bang.Studies of public blank-field surveys including the Hubble Ultra Deep Field [HUDF; Beckwith et al. 2006;Oesch et al. 2010;Bouwens et al. 2010], the Cosmic Assembly Near-infrared Deep Extragalactic Legacy Survey [CANDELS; Grogin et al. 2011;Koekemoer et al. 2011], the Hubble Frontier Fields [HFF; Lotz et al. 2017], and the Brightest of Reionizing Galaxy survey [BoRG;Trenti et al. 2011] have uncovered thousands of galaxies at z > 6 (e.g.Bouwens et al. 2015;Finkelstein et al. 2015;Ishigaki et al. 2015;McLeod et al. 2016;Oesch et al. 2018;Morishita et al. 2018;Bridge et al. 2019;Rojas-Ruiz et al. 2020;Finkelstein et al. 2022a;Bouwens et al. 2022a;Bagley et al. 2022a).The evolution of the rest-frame ultraviolet (UV) luminosity function has been well studied to z ∼ 8 (e.g.Finkelstein et al. 2015;Bouwens et al. 2015), with some constraints placed at z ∼ 9 and 10 (e.g., Oesch et al. 2018;Bouwens et al. 2019;Finkelstein et al. 2022a;Bagley et al. 2022a).However, little was known about the z > 10 universe prior to JWST, beyond the unexpected discovery of an exceptionally bright z = 10.957galaxy (Oesch et al. 2016;Jiang et al. 2021).This knowledge gap is due to the modest light-gathering power of the 2.4m HST and the fact that at z > 10 galaxies become one-band (F160W; HST/WFC3's reddest filter) detections.Rest-UV emission from galaxies completely redshifts out of HST observability at z 12.5.This has left a major gap in our knowledge of galaxy formation at early times.Do galaxies form stars fairly inefficiently, like our own Milky Way, and build up slowly?Or is star formation in the early Universe more rapid due to high gas densities and frequent interactions?Equally exciting and unknown, does the initial mass function (IMF) begin to show signs of evolution?Models predict top-heavy IMFs should dominate at very low metallicities (e.g.Bromm & Larson 2004), so observations should begin to see such signatures.Answering these questions about the physical processes dominating the earliest star formation requires detailed observations of the earliest galaxies to form in our universe, and JWST was designed to push our cosmic horizons to the highest redshifts.The 7× larger light-gathering power combined with the large field-of-view and near-infrared sensitivity of NIRCam (Rieke et al. 2005) sets the stage for major advances in our ability to study early galaxy formation.Cycle 1 of JWST includes several programs encompassing 100's of hours which all have the early Universe as their primary science goal.
Indeed in just the days-to-weeks after the first science data were released, several papers were submitted discussing the detection of objects not only at the expected redshifts of z ∼ 10-11 (e.g., Castellano et al. 2022;Naidu et al. 2022b;Adams et al. 2022;Whitler et al. 2022;Labbe et al. 2022), but with some candidates at z ∼ 12-17 (e.g.Finkelstein et al. 2022b;Donnan et al. 2022;Harikane et al. 2022).The existence of galaxies at such early times, and especially at such bright magnitudes for some sources, could potentially challenge early models of galaxy formation (e.g.Mason et al. 2022;Ferrara et al. 2022;Boylan-Kolchin 2022).However, these studies originally relied on very early photometric calibration; subsequent calibration data shifted the photometric zeropoints significantly (Boyer et al. 2022).Now that the flux calibration, and overall data reduction pipeline has stabilized, it is prudent to take a detailed look at what constraints we can place on this early epoch.
Here we use the first epoch of data from the Cosmic Evolution Early Release Science Survey (CEERS; survey description to come in Finkelstein et al., in prep).CEERS was designed in part to provide our earliest detailed glimpse into the z > 10 universe, and these CEERS data were among the first Cycle 1 science exposures taken, included in the first publicly released data on 14 July 2022.We search these data for z 9 galaxy candidates, heretofore difficult (if not impossible) to find with HST.We place an emphasis on building a robust sample via a detailed photometric cataloging process, coupled with stringent selection criteria, both backed by simulations.Section 2 describes the observations and data reduction, while Section 3 describes our photometry and photometric redshift measurements, and Section 4 discusses our sample selection procedure.We describe our sample in Section 5, and present a comparison to other early samples in Section 6.In Section 7 we present the z ∼ 11 UV luminosity function and the cumulative surface density of early galaxies, and discuss implications on the physics dominating galaxy formation at the earliest times in Section 8. We summarize our results and present our conclusions in Section 9.In this paper we assume the latest Planck flat ΛCDM cosmology with H 0 = 67.36km s −1 Mpc −1 , Ω m = 0.3153, and Ω Λ = 0.6847 (Planck Collaboration et al. 2020).All magnitudes are in the absolute bolometric system (AB; Oke & Gunn 1983).

DATA
CEERS is one of 13 early release science surveys designed to obtain data covering several scientific themes of astronomy early in Cycle 1, along with testing out multiple instrument modes and providing early public data to the community.CEERS consists of a mosaic of 10 NIRCam pointings in the CANDELS Extended Groth Strip (EGS) field, with six obtained in parallel with prime NIRSpec observations, and four in parallel with prime MIRI observations (four of these pointings also include NIRCam wide-field slitless grism spectroscopy).Here we use imaging data from NIR-Cam obtained during the first epoch of CEERS, during June 21-22, 2022.This consists of short and longwavelength images in both NIRCam A and B modules, taken over four pointings, labeled NIRCam1, NIRCam2, NIRCam3, and NIRCam6.Each pointing was observed with seven filters: F115W, F150W, and F200W on the short-wavelength side, and F277W, F356W, F410M, and F444W on the long-wavelength side.

Data Reduction
The NIRCam images used here are those released with the first CEERS public data release (Data Release 0.5), which are fully described in Bagley et al. (2022c).Here we briefly highlight the key features of the data reduction, directing the reader to Bagley et al. (2022c) for more details.
We reduce the raw NIRCam imaging through version 1.7.2 of the JWST Calibration Pipeline, with custom modifications designed to correct for additional features in the data.We use the calibration reference file context1 pipeline mapping (pmap) 0989.We begin by running Stage 1 of the calibration pipeline, which performs detector-level corrections and outputs a countrate image in units of counts/s, adopting all default parameters for this stage.We then perform custom corrections to flag and remove snowballs from all exposures, subtract off the large-scale wisps in F150W and F200W using wisp templates created by the NIRCam team, and measure and remove 1/f noise via a median measured (amplifierby-amplifier) along rows and columns.The images are then flat fielded and flux calibrated using Stage 2 of the calibration pipeline, again adopting the default values, to produce images in units of MJy/sr.The pmap 0989 reference files include ground flats that have been corrected for illumination gradients measured with in-flight data, and improved but still preliminary photometric calibration reference files.We find that the flux calibration does a good job of synchronizing the zeropoints across the NIRCam detectors (to within the 2-5% level, Bagley et al. 2022c), but that an additional absolute flux calibration may be required at the few percent level (see §3.6).These flat and photometric calibration reference files will continue to be improved and updated throughout Cycle 1.
We align the images using a custom version of the TweakReg routine, which is designed to register image to an absolute WCS frame by matching sources detecting in each image with those in a reference catalog.Our modified version of the routine uses Source Extractor (hereafter SE; Bertin & Arnouts 1996) to measure source centroids in each individual image.We then align each image to a reference catalog constructed from an HST F160W mosaic with astrometry tied to Gaia EDR3 (see Section 2.2 and Koekemoer et al. 2011, for details on the methods used for the mosaic construction).We first align images from the same detector but different dithers to each other, allowing for shifts in x and y and achieving an RMS of ∼ 3 − 6 mas per source for this relative alignment.We next align all images to the F160W reference image, allowing xy shifts and rotations in the SW images and an additional scaling factor to account for large-scale distortions in the LW images.The RMS of this absolute alignment is ∼ 12 − 15 mas and ∼ 5 − 10 mas when comparing WFC3 to NIRCam and NIRCam to NIRCam, respectively.We note that we followed a slightly different procedure for NIRCam3, aligning F277W to HST/WFC3 F160W and then using F277W as the new reference for all other NIRCam filters.This altered procedure was required to address additional offsets registered in one portion of the F160W image in this region (see Bagley et al. 2022c, for details).
Finally, we create mosaics for each pointing in all filters in the following way.We subtract a pedestal value off of each individual image, and scale the readnoise variance maps such that they include an estimate of the robustly-measured sky variance.The mosaics are then created using the Resample routine in Stage 3 of the calibration pipeline, which uses the drizzle algorithm with an inverse variance weighting (Casertano et al. 2000;Fruchter & Hook 2002).We drizzle the images to an output pixel scale of 0. 03/pixel and use the same tangent point as that of the HST mosaics, such that the images in all filters, NIRCam and HST, are pixel-aligned.

HST Imaging Data
The EGS field has archival HST imaging from the All-wavelength Extended Groth Strip International Survey (AEGIS, (Davis et al. 2007)), the Cosmic Assembly Deep Extragalactic Legacy Survey (CANDELS, Grogin et al. 2011;Koekemoer et al. 2011), 3D-HST (Momcheva et al. 2016), and various followup programs.The entire CEERS field is covered by F606W, F814W, F125W, (shallow 800 s) F140W, and F160W; portions are covered by F105W.For CEERS, we produced an updated Figure 1.An example of the results of our background subtraction procedure.The left panel shows a zoom in on the science image of the ALONG module of our F444W mosaic in the CEERS1 pointing.The middle panel is the derived background, and the right panel is the background-subtracted science image.By progressively masking out objects in smaller tiers, this method is able to capture both small and large-scale fluctuations.Full details on this process are available in Bagley et al. (2022c).
version (v1.9) of the CANDELS EGS mosaics 2 specifically aligning their astrometry onto Gaia DR3, and on the same 30 mas pixel scale as our NIRCam images.In each of the HST mosaics, we create smaller cutouts to match the footprints of the drizzled NIRCam mosaics in each pointing.In this way we have pixel-aligned imaging in 12-13 filters per field (NIRCam1 does not include F105W coverage) from ∼ 0.5 − 5µm.

PHOTOMETRY
In this section we describe the creation of our HST + JW ST catalog.As the focus is on high-redshift galaxies, this catalog is optimized for faint, compact sources.To ensure accurate photometric redshifts, significant attention is paid to calculation of accurate colors and uncertainties.

Background Subtraction
The procedure adopted to remove background nonuniformities is described in detail by Bagley et al. (2022c).Briefly, it involves several different tiers of source masking, aimed at removing large-scale structures -including the wings of bright, extended galaxies -while preserving small-scale structures (including the wings of faint galaxies).We have found this procedure to be more effective than the built-in background subtraction procedure in SE , and we therefore use these images as inputs to SE ( §3.3) and disable its backgroundsubtraction step.

Point-Spread Function Matching
2 https://ceers.github.io/releases.html#hdr1 The full-width at half-maximum (FWHM) of the NIR-Cam point-spread function (PSF) varies significantly across the wavelengths of the filters used.As the selection of high-redshift galaxies depends on our ability to measure accurate colors, ensuring a similar fraction of light is measured in all bands is crucial.In this work, we accomplish this by matching the point-spread functions in our images to the F444W filter, as it is the reddest filter and thus has the largest PSF (FWHM=0.161 ).
Our procedure for PSF matching follows Finkelstein et al. (2022a), which we summarize here.In each filter, we create a preliminary photometric catalog made using Source Extractor v2.25.0 (Bertin & Arnouts 1996), and identify potential PSF stars by searching for the stellar locus in a plane of half-light radius versus source magnitude (making custom cuts in both quantities for each filter).We excluded objects with neighbors within 50 pixels with magnitudes brighter than the magnitude of the star in question minus one.As stars were more difficult to identify in some bands (e.g., F115W, and the shallower F410M), we combined stars identified in the short-wavelength and long-wavelength channel bands to one list for each channel, which resulted in typically 10-20 stars per NIRCam pointing.We then visually inspected each star in each image, removing stars near detector edges, other defects, or with close neighbors not excluded by the previous cut, to generate one star list per filter per pointing.
PSFs were then generated by stacking stars that passed this inspection.As our observations in all four pointings utilized the same dither pattern and were taken at a similar time, we create one PSF per filter by stacking all stars over all four pointings, increasing the signal-to-noise ratio of our PSF.For each star, we extract a 101x101 pixel box, upsample by a factor of 10, measure the centroid, and shift the star to be centered in this upsampled image.We then downsample back to the native resolution, rotate the star by a random position angle (to account for situations when the position angle of the observations was not identical), and normalize the star's peak flux to unity.The final PSF was made by median-combining the individual stars.The final PSFs have a centroiding accuracy of ∼0.05-0.1 pixels.
Kernels to match bluer PSFs to F444W were created with the pypher Python routine3 (Boucaud et al. 2016), and these bluer images were then convolved with their respective kernels.We included the HST/ACS F606W and F814W imaging in this PSF-matching process, as their PSF FWHMs are smaller than that of F444W.However, the HST/WFC3 band PSF FWHMs are larger than F444W, so we do not convolve these images.This will necessitate a correction accounting for the lower fraction-of-flux encompassed in a given aperture in these filters, which we discuss in §3.3.1.
We tested our PSF-matching process by measuring curves-of-growth of the PSF stars in the images.We find that the median enclosed flux at an aperture diameter of 0.3 was within 5% (and often less) of the F444W value for all filters (while prior to this PSF matching process, the bluer filter measurements encompassed ∼20% more light at this radius).We provide the median FWHM values in each filter in Table 1.

Source Extraction
We use SE in two-image mode to measure accurate photometry for each of our four pointings.SE requires a detection image to identify sources.We elect to use the inverse-variance-weighted sum of the PSF-matched F277W and F356W images as our detection image, to better detect faint sources.We do not include F200W in this stack as the Lyα break enters this filter at z = 13.4,and we do not wish to bias our catalog against extreme redshift galaxies (the blue edge of F277W corresponds to a Lyα break redshift of z = 18.9), while the inclusion of F444W could have potentially begun to bias against very blue sources.
Using this detection image, we run SE cycling through the seven NIRCam images and six HST images as the measurement image.The key source detection parameters were initially optimized using the CEERS simulated imaging4 (Bagley et al. 2022c), and further tweaked by inspecting their performance on the final mosaics.The parameters we used best recovered faint sources while minimizing contamination by spurious objects.These key parameters were: DETECT THRESH=1.3,DE-TECT MINAREA=5 pixels, and a top-hat convolution kernel with a width of 4 pixels.
We forced SE to skip the background subtraction step as this was previously removed ( §3.1).We use MAP RMS for the source weighting.As the pipelineproduced ERR images include Poisson noise, they are not appropriate for source detection.We thus convert the weight map associated with the detection image into an effective rms map by taking 1/sqrt(WHT), and assign this to the detection image.For the measurement image, we use the pipeline ERR image.
Following previous work (e.g., Finkelstein et al. 2022a) we measure colors in small elliptical apertures, which has been shown to accurately recover colors of distant galaxies.In SE these apertures are defined by two parameters -a Kron factor, and a minimum radius.We set these two quantities to (1.1, 1.6).These are the same values found by Finkelstein et al. (2022a) via optimization simulations, and we verified via our own simulations ( §3.3.1) that the signal-to-noise ratio in these apertures was significantly better than larger parameters, and that gains in signal-to-noise ratio were negligible for smaller values.We estimate an aperture correction to the total flux for these small apertures by performing a second run of SE on the F444W image with the Kron parameters set to the default "MAG AUTO" parameters of (2.5, 3.5), deriving an aperture correction as the ratio between the flux in this larger aperture to that in the smaller aperture for each object.The median aperture correction across all four fields was 1.5.This aperture correction was then applied multiplicatively to the fluxes and uncertainties for all filters.

Residual Aperture Correction
While we use small Kron apertures to derive accurate colors, the aperture correction applied above should yield total fluxes close to the true value.However, several previous studies have noted that the default Kron parameters we use for this aperture correction can miss light in the wings of the PSF (e.g., Bouwens et al. 2015;Finkelstein et al. 2022a), yielding underestimates of the total fluxes at the 5-20% level.
We estimate these corrections using source-injection simulations, adding 3000 mock sources to our real images in each field.We add sources from m = 23-27 mag (to ensure a robust photometric measurement), with a log-normal half-light radius distribution peaking at ∼1.5 pixels (∼0.2 kpc at z = 10; compact but modestly re- Note-PSFs were created by stacking stars across all four pointings.For our photometry, we PSF match all filters with FWHM smaller than the F444W PSF FWHM to the F444W PSF.We note the HST imaging used is on the same 30mas pixel scale, which affects the FWHM of the PSF.The limiting magnitude is that measured in a 0.2 diameter aperture on the unmatched images, corrected to total based on the PSF flux enclosed in that aperture size, averaged over the four fields.The derived corrections to the photometric zeropoints for each filter were derived using best-fitting EAZY models to ∼900 galaxies with secure spectroscopic redshifts.These corrections are due to a combination of residual differences between our estimated total fluxes and true total fluxes, differences between the model templates and true galaxies, and true photometric zeropoint inaccuracies (using the photometric reference files from pmap 0989 for NIRCam), Because these corrections depend specifically on our photometry procedure, they may not be appropriate for other photometric catalogs.
solved, comparable to high-redshift sources, see §5.4), with a log-normal Sérsic parameter distribution, peaking at 1.2.These mock sources were generated with galfit (Peng et al. 2002), and added at random positions to the F277W, F356W and F444W images.We combined the former two to create a detection image, and ran SE in the same way as on our real data to generate a F444W catalog (focusing on this one band as all images were PSF-matched to F444W).Finally, we match sources in the SE catalog to their input values, and compare the ratio of input-to-recovered fluxes.We find a median ratio of 1.08, measured between 25 < m F 444W < 26.There is a slight trend with magnitude of lower corrections for brighter sources, and higher for fainter sources, but only at the 1-2% level.We thus elect to use a single correction factor of 1.08 to all NIRCam fluxes and uncertainties.
For the HST fluxes, we elect to derive any residual aperture correction from comparison to the Finkelstein et al. (2022a) photometric catalog, which performed similar simulations to derive total fluxes.Matching sources in each of the six HST bands, we find a typical needed correction factor of ∼1.35 (± 0.02).These values are roughly consistent with the combination of the correction derived in Finkelstein et al. (2022a) of 1.20 with the NIRCam correction derived here of 1.08.We apply this same 1.35 correction to all HST bands, such that colors amongst these bands are not changed.We note that in §3.6 below we test for the presence of any remaining photometric offsets in our catalog, and find these to be small ( 5%), indicating our procedure for deriving total fluxes in all 13 HST + JWST bands is robust, especially for this nascent observatory.

Noise Estimation
While SE does provide an estimate of the noise, it is reliant on the accuracy of the provided error maps.Given the nascent nature of the JWST reduction pipeline, we obtain estimates of our image noise directly from the images themselves.We follow the methods of Finkelstein et al. (2022a), based on previous methodology outlined in Papovich et al. (2016).Our goal is to estimate the noise based on the number of pixels in an aperture.We fit for the noise as a function of aperture size by measuring the fluxes in circular apertures with 30 different diameters, ranging from 0.1 (3.33 pixels) to 3 (100 pixels).
When defining these positions, we restrict aperture placement to pixels with real non-zero values in the ERR image, and a zero value in the SE segmentation map within the aperture, avoiding real objects.We also require these apertures to be non-overlapping to avoid correlating our noise estimation.To improve statistics for smaller apertures, we placed apertures in two separate iterations -"small" (d ≤ 1.5 ) and "large" (d > 1.5 ).We were able to place 5000 and 500 non-overlapping apertures in these two iterations, respectively.
We create a detection image setting the pixels at these positions to unity, with the rest of the image set to zero, and ran SE in two-image mode.We measured fluxes at these positions in all 30 circular apertures with diameters ranging from 1 -200 pixels.We calculate the 1σ noise in each aperture size by measuring the median absolute deviation of the measured flux values (multiplying by 1.48 to convert to a Gaussian-like standard deviation).Finally, we fit a curve to the noise in a given aperture as a function of pixels in that aperture, using this equation (Gawiser et al. 2006): where σ N is the noise in an aperture containing N pixels, and σ 1 is the pixel-to-pixel noise measured in each image as the sigma-clipped standard deviation of all non-object pixels (see Figure 3 in Finkelstein et al. 2022a for an example of this process).We fit the free parameters α and β with an IDL implementation of emcee (see Finkelstein et al. 2019 for details).We used these functional form fits for each filter to calculate the photometric uncertainties for each object, using both the number of pixels in its Kron aperture (Area = π× A IMAGE × B IMAGE × KRON RADIUS 2 ), as well as a value for a given circular aperture.These values were scaled by the ratio of the error image value at the central position of a given source to the median error value of the whole map, thereby allowing the noise to be representative of the local noise level.
Finally, to account for variable image noise not captured by the error image value at the central pixel, for each object in our catalog we also calculate a local noise measurement.This local noise was calculated at 0.2 , 0.3 , 0.4 and 0.5 -diameter apertures, by fitting a Gaussian to the negative side of the flux distribution in the 200 closest apertures from the above process.

Multi-band Catalog
We compose a multi-band catalog from the individualfilter catalogs created by SE.As SE cannot parse the world-coordinate system in the JWST data model image headers, we use astropy.wcswcs pix2world to derive celestial coordinates from the SE x, y positions.We apply a photometric zeropoint to convert the image from MJy sr −1 to erg s −1 cm −2 Hz −1 , and apply both aperture corrections derived above to all flux and flux error estimates.We correct for Galactic extinction using an E(B-V) of 0.006 for the EGS field, and a Cardelli et al. (1989) Milky Way attenuation curve.
Both the MAG AUTO-derived aperture corrections and simulation-based (and HST catalog-based) residual aperture corrections were applied to all Kron-derived fluxes and uncertainties, maintaining our accurate colors while representing the total fluxes for each source.In our final catalog, we include both these Kron-based fluxes, as well as fluxes measured in circular apertures with diameters ranging from 0.05 to 2.0 .While the Kron fluxes will be used for most of the analysis, we do use the circular apertures (with a fiducial diameter of 0.2 , or 6.67 pixels) as a measure of detection significance, as the sizes of Kron apertures can be affected by the proximity to bright sources.These circular apertures are corrected for Galactic attenuation, but not corrected to total, as we will use them solely for detection significance.
For both the Kron and circular apertures, we calculate the noise per source following the method in §3.4,which is dependent on both the aperture area and the effective rms map value at the position of the source.Finally, we flag any sources which had either a zero or NaN in any error column, replacing their flux error with 10 12 nJy (several orders of magnitude larger than any real source error).
In Table 1, we include an estimate of the limiting 5σ magnitude for our catalog.To calculate this, we use the noise functions described above to derive the flux density uncertainty in an aperture of diameter 0.2 .We then measure the enclosed flux at this radius from the stacked PSF.We then divide the flux uncertainty by the enclosed fraction of flux to estimate the total noise for a point source.Finally, we multiply this value by five, and convert to an AB magnitude.Both the enclosed flux values and the limiting magnitudes are listed in Table 1.While the depths were broadly as expected based on the pre-launch exposure-time calculator, the F115W image (with double the exposure time), was expected to be 0.3 mag deeper.The very low background at that wavelength has led to those images being more readnoise dominated than expected, thus this image has a depth comparable to the bulk of the NIRCam filters.The primary impact is that photometric redshifts will Figure 2. The total corrections applied to our catalog to account for residual systematic offsets in our total flux estimations, mismatches between the spectral templates used for photometric redshift fitting, and any remaining photometric calibration corrections (shown as squares; small dots are individual galaxies).These multiplicative offsets were derived by comparing the observed fluxes in each filter to the best-fitting EAZY model templates for ∼900 sources with robustly known spectroscopic redshifts in our field.The values are tabulated in Table 1.
be slightly more uncertain at m > 29 than originally planned.

Photometric Calibration Validation
While the photometric calibration of NIRCam has substantially improved over the first few months since science acquisition began, there is still some uncertainty, and many of the reference files still used in the pipeline are preliminary (to be refined during all of Cycle 1; see Boyer et al. 2022 for more details on the NIRCam photometric calibration used here).It is thus prudent to check the accuracy of the photometric calibration.The JWST pipeline applies a photometric calibration taking into account the area of each pixel (using a photom reference file), resulting in image units of MJy sr −1 .For the 30mas scale of our images, the conversion factor from MJy sr −1 to erg s −1 cm −2 Hz −1 is 2.1154 × 10 −31 .
To check the accuracy of this calibration, we use a sample of objects with published spectroscopic redshifts in this field (compiled by N. Hathi, private communication; including DEEP2; Newman et al. 2013, MOSDEF;Kriek et al. 2015, and 3DHST;Momcheva et al. 2016, among others).Applying a matching radius of 0.5 , removing duplicates, keeping the two highest quality redshift flags, and restricting to F160W < 24 mag, we find 988 matches which fall in our CEERS/NIRCam-covered area.These objects span z = 0-4, with a median z spec = 1.1.
We generate the expected fluxes of these sources in all 13 photometric bands using the EAZY (Brammer et al. 2008) photometric-redshift fitting software.While in §4 below we will use EAZY to measure photometric redshifts, here we run EAZY with the redshift fixed to the spectroscopic redshift value.This thus obtains the best-fitting galaxy template to the observed photometry.Assuming that this template set spans the color range of real galaxies (see §4.1), a comparison of the observed fluxes in a given filter to those predicted by EAZY can inform us on any systematic discrepancies in the fluxes in our catalog.We measured the median offset for the subset of these sources with a measured signal-to-noise ratio of > 5 and an EAZY goodness-of-fit (χ 2 ) < 20.This ensures that a well-fit model is found to a set of robust photometric measurements.This resulted in typically ∼850-900 sources per filter (84 in F105W, which has significantly less coverage), from which we measured a sigma-clipped median and standard deviation.
We tabulate these derived corrections in Table 1, and plot the median values and full dispersion in Figure 2.For NIRCam, the LW bands are all consistent with unity, which is a significant improvement over the state of the calibration in late summer 2022.For the short-wavelength channels, we do find a needed correction on the 3-7% level.For HST, we find that with the exception of F606W, the remaining bands are ∼3-5% too bright.As HST is extremely well calibrated, this implies that the 35% correction we applied based on the comparison to the Finkelstein et al. (2022a) catalog may have been slightly too large (we note that that study did not complete such a zeropoint-offset analysis).Nonetheless, we applied these corrections.To test their accuracy, we performed another iteration of this analysis after applying these offsets, and found that no significant residual correction was present.We thus apply these corrections, listed in Table 1, to all fluxes and flux errors in our final photometric catalog.We reiterate that while photometric calibration was our motivation for this test, the resulting corrections are a combination of residual differences between our estimated total fluxes and true total fluxes, differences between the model templates and true galaxies, and true photometric zeropoint inaccuracies.

SELECTION OF REDSHIFT > 9 GALAXIES
Our analysis method produced a photometric catalog which contains our best estimates of the total fluxes in each of 13 filters spanning 0.6-5.0µm,with robust flux uncertainty values.We also include fluxes measured in a range of circular apertures.In this section we will use this catalog to select galaxies at z > 8.5.Below we will create identifiers for each object inclusive of the CEERS field the object was covered in, and the SE number within that field.For example, "Maisie's Galaxy", the previously identified z ∼ 12 galaxy candidate from Finkelstein et al. (2022b), is referred to here as CEERS2 5429.

Photometric Redshift Estimation
We measure photometric redshifts with EAZY for our entire photometric catalog, which contains ∼40,000 sources across all four fields.EAZY fits non-negative linear combinations of user-supplied templates to derive probability distribution functions (PDFs) for the redshift, based on the quality of fit of the various template combinations to the observed photometry for a given source.The template set we use includes the "tweak fsps QSF 12 v3" set of 12 FSPS (Conroy & Gunn 2010) templates recommended by the EAZY documentation.
As the population of z > 9 galaxies is expected to exhibit fairly blue rest-frame UV colors, we follow Larson et al. (in prep) in adding six additional templates.They derived these templates by combining stellar population spectra from BPASS (Eldridge & Stanway 2009) with (optional) nebular emission derived with Cloudy (Fer-land et al. 2017).They used models with low metallicities (5% solar), young stellar populations (stellar ages of 10 6 , 10 6.5 , and 10 7 Myr), inclusive of binary stars, and with a high ionization parameter (log U = −2).Larson et al. (in prep) showed that the inclusion of these templates significantly improved the recovery of photometric redshifts from a mock catalog derived by a semianalytic model (Yung et al. 2022), due to the better match between galaxy and template colors.These templates were also used by Finkelstein et al. (2022b) in their analysis of the z ∼ 12 "Maisie's Galaxy", where they found the inclusion of these bluer templates significantly improved the goodness-of-fit between the data and the best-fitting model.
We assume a flat prior in luminosity; while bright galaxies will have a redshift distribution significantly tilted towards lower redshift, the bright end of the highredshift luminosity function is poorly known, thus we do not wish to bias against the selection of true, bright distant galaxies.We include a systematic error of 5% of the observed flux values, and fit to our measured total flux and flux error values.In our fiducial EAZY run the redshift can span 0-20.We perform an additional "Low-z" run with the maximum redshift set to z max = 7 such that we can visualize the best-fitting low-redshift solution.

Sample Selection
Here we describe our selection criteria we use to identify candidate z > 8.5 galaxies.Following our previous work (Finkelstein et al. 2010(Finkelstein et al. , 2015(Finkelstein et al. , 2022a,b),b), we utilize a combination of flux detection significance values and quantities derived from the full photometric redshift PDF (denoted P[z]) to select our galaxy sample.As a part of this selection, we make use of the integral of the P(z) in ∆z = 1 bins centered on integer redshift values.Specifically, we denote the unit redshift where the integral in a z ± 0.5 bin is the maximum compared to all other redshifts as S z .For example, a sources with S z = 10 would have 10.5 9.5 P(z) dz greater than the integrated P(z) in all other ∆z = 1 bins.
Our primary selection criteria are: • A signal-to-noise ratio, as measured in 0.2 diameter apertures in the non-PSF-matched images, of > 5.5 in at least two of the F150W, F200W, F277W, F356W or F444W bands.We required this to be true with both the global as well as local noise values.This allows the selection of galaxies across a broad range of redshifts and rest-UV colors.We note that we experimented with requiring a higher significance detection in just one band, but this significantly increased the spurious source fraction.
• Error map values < 1000 in all of the F115W, F150W, F200W, and F277W images.This includes only objects with a measurable (though not necessarily significant) flux in the bluest four filters, necessary for selection of galaxies at z ∼ 9-13.
• A signal-to-noise ratio of < 3 in bands blue-ward of the Lyα break.While studies occasionally use more stringent signal-to-noise cuts, we choose this value to both account for the fact that any positive flux in all dropout bands is already accounted for by EAZY, and that >1σ random fluctuations can coincide at the positions of real galaxies with non-Gaussian noise as is present in these images.For this criterion, we include F606W and F814W for all redshifts considered here.We add F115W for S z = 11-12, and F150W for S z = 13-17.These redshift values were chosen to ensure that the Lyα break is fully red-ward of the ∆z = 1 range for a given filter.
• Best-fitting photometric redshift z best > 8 (defined as "za" with EAZY; the redshift corresponding to the highest likelihood) with a goodness-of-fit χ 2 < 60.
• S z ≥ 9 (selecting a sample of galaxies at z 8.5).
• ∆χ 2 > 4, calculated as the difference between the best-fitting χ 2 value for the low-redshift restricted model to the fiducial model.By requiring a value greater than four, we require the lowredshift model to be ruled out at ≥ 2σ significance.We note that Harikane et al. ( 2022) required a more conservative ∆χ 2 > 9, based on results from the CEERS simulated imaging.We explore the impacts of such a cut in §7.2.

Sample Vetting
This automated selection resulted in a sample of 64 candidate galaxies with z 8.5.To explore the impact of potential contamination by spurious (e.g., nonastrophysical) sources, we visually inspected each object in all images to search for obvious diffraction spikes, sources on image edges (as the dithering is different in different filters, the edges do not necessarily line up between different images), and un-flagged cosmic rays.
To perform this inspection, three authors (SF, JK, PAH) viewed "bio" plots which showed small 1.5 cutouts of the candidate in all images, with two different stretches, a 5 cutout in the F200W and detection images, the fiducial and z < 7 restricted P(z), and the spectral energy distribution plotted against both fiducial and Low-z EAZY models.From this process we identified 36 spurious sources: six diffraction spikes, 11 unflagged cosmic rays, and 19 sources on image edges.We remove these 36 sources from our galaxy sample, and show images of all in Figure 20 in the Appendix, and list their positions in Table 6.The cosmic rays were predominantly an issue in the long-wavelength image in pointings 3 and 6; these have longer exposure times than pointings 1 and 2, amplifying the impact of cosmic rays.We note that future versions of our images and catalogs should be able to minimize these types of spurious sources by improving the cosmic-ray removal and using the JWST data model "Context map" to identify pixels close to the image edge.
Upon visual inspection, we noticed that a small fraction of the sample (four objects) had Kron apertures which were drawn too large due to the presence of bright, nearby galaxies.To explore the impact of potential light from neighboring galaxies affecting the colors, we performed an additional run of EAZY using the colors measured in 0.3 diameter apertures to more accurately measure the colors of the candidate galaxies.For this run we used the PSF-matched photometry, but did not apply any aperture correction as the photometric redshift depends only on colors and not luminosity.We found that two sources, CEERS3 1537 (initial z phot = 10.5) and CEERS6 7478 (initial z phot = 8.6) had their photometric redshifts shift to lower redshift with these smaller apertures; these were the same two sources where the Kron aperture was most significantly stretched.We thus remove these two objects from our sample, though we show their SEDs and image stamps in Figure 19 in the Appendix.
The remaining two sources (CEERS1 3908 and CEERS1 3910) had no significant change in their photometric redshift when measured with these small apertures.For these two objects we correct their fluxes by a single factor to account for potential excess brightness from the neighbor.We do this by deriving a flux correction as the ratio between the median total flux for objects with similar 0.3 diameter aperture fluxes as a given source to the actual Kron-based (aperturecorrected) total flux for those comparison sources, and divided this quantity by the same quantity for the sources in question.We apply this correction to all filters such that the colors do not change.For CEERS1 3908 .The distribution of the best-fitting (minimized χ 2 ) photometric redshift versus apparent magnitude.Each object is color-coded by its redshift sample (with the background shading denoting the redshift range where the P(z) was integrated to determine the sample placement; the green circle far into in the blue region [CEERS6 7641] has a double peaked P(z)).The magnitude plotted is F150W for z ∼ 9, F200W for z ∼ 11, and F277W for z > 12 galaxies.The small black star denotes "Maisie's Galaxy" (Finkelstein et al. 2022b), known here as CEERS2 5429.The top axis shows "Cosmic Time" (the time since the Big Bang) for our adopted cosmology.
this correction factor was 0.6; for CEERS1 3910 it was close to unity, so we applied no correction.

AN EARLY REDSHIFT > 9 JWST GALAXY SAMPLE
Our final sample consists of 26 galaxies.We divide them into redshift bins for analysis.The high redshifts now accessible with JWST coupled with the broad filter transmission functions make it harder to place galaxies in often-used ∆z = 1 redshift bins, and such bins span progressively smaller epochs of cosmic time (for example, a ∆z = 1 bin centered at z = 12 would cover the Lyα break over only ∼40 Myr of cosmic time, compared to 200 Myr for one centered at z = 6).We thus split our sample into three redshift bins: a "z ∼ 9" sample from z = 8.5-10, a "z ∼ 11" sample from z = 10-12, and a "z > 12" sample from z = 12-17.We place candidate galaxies in the sample bin based on where their integrated P(z) across the bin is the largest.These three bins cover roughly similar ranges of cosmic time (116,105,122 Myr,respectively).We show the distribution of photometric redshifts and apparent magnitudes of our sample in Figure 3, and we tabulate the sample in Table 2.We show cutout images of all candidates in Figures 5,  6, 16 and 17, and SEDs in Figure 5, 7 and 18.We also present a rest-UV color montage of our sample in descending redshift order in Figure 4.
As a check on our fiducial EAZY photometric redshifts, we also measured photometric redshifts with Cigale (Burgarella et al. 2005).We find excellent agreement, with a median difference of 0.2 (with EAZY preferring the slightly higher redshifts).Only the source CEERS1 4143 has a difference in best-fitting redshift of >2, as Cigale prefers z = 5.4 for this source, whereas EAZY finds z = 9.0, though this source lies right at the boundary of our ∆χ 2 selection criterion.Finally, we inspected the spatial distribution of our full sample of galaxies across the full CEERS field, and do not visually see any evidence of strong clustering (which would not be expected with such a small sample covering a broad redshift range).

The z > 12 Sample
This highest-redshift sample contains two galaxies: CEERS2 2159 and CEERS1 1730, with 68% confidence ranges on their photometric redshifts of 16.0-16.6and For each galaxy the image shows as blue-green-red the three NIRCam bandpasses closest to the Lyα break, this includes: F150W, F200W, and F277W for galaxies at z = 9; F200W, F277W and F356W for galaxies in the z = 10-12 samples; F277W, F356W, and F444W for galaxies in the z ≥ 12 samples.In each case the images are 1.5 × 1.5 .The inset scale bar shows 1 (physical) kpc.We list the numerical IDs and the best-fitting photometric redshift values (the redshift uncertainties are listed in Table 2).
12.3-14.2,and F277W magnitudes of 26.5 and 27.7, respectively.We will compare in detail our sample to those previously presented in the literature in §6 below, but we note here that CEERS2 2159 was first presented as a robust ultra-high-redshift candidate in CEERS1 1730 is presented here for the first time.
The cutout images for these two objects are shown in the top panel of Figure 5.It is apparent that there is no significant flux at the positions of both candidates in the stacked ACS F606W+F814W images, nor in the F115W image.For CEERS2 2159 no flux is also seen in F150W, and the F200W flux is noticeably fainter than in F277W, consistent with its photometric redshift of z ∼ 16-16.6 (where the Lyα break would be in the red half of the F200W bandpass).For CEERS1 1730, faint flux is seen in F150W.However, this is measured at just 1σ significance in the Kron aperture.This leads to a fairly broad P(z), spanning z ∼ 12-14.If this flux is real, then z ∼ 12 is more likely.If it is not significant, then z ∼ 14 is possible.The SEDs, shown in the bottom panel of Figure 5, show the strong observed Lyα break in both objects, with fairly blue UV spectral slopes redward of the break.For both objects, the highredshift model is a significantly better fit to the photometry than the best-fitting low-redshift model, though for CEERS1 1730, ∆χ 2 is only 4.4; this low-redshift solution is visible as a small peak in the fiducial P(z).CEERS1 1730 has a best-fit photo-z of 13.4, albeit with a wide 68% confidence range of 12.3-14.2.CEERS2 2159 is best-fit by z = 16.5 (16.0-16.6), and was first identified by Donnan et al. (2022).The bottom panels show the best-fitting EAZY models (both overall, and constrained to z < 7), the model bandpass fluxes (open squares), alongside the observed photometry (circles; upper limits are 2σ).The inset panels show the P(z) distributions.Both sources exhibit well-constrained Lyα breaks, implying the redshifts are z > 11.CEERS1 1730 does show a small low-redshift solution, and its primary P(z) peak extends to z ∼ 10.5.
CEERS2 2159 shows no significant lower-redshift peak in its P(z) due to its brighter magnitude leading to more robust constraints on the full shape of the SED.

The z ∼ 11 Sample
The z ∼ 11 sample contains nine galaxies.As shown by their cutout images in the bottom panel of Figure 6, all show no significant flux in the ACS stack and F115W images, consistent with z > 9.5.Many exhibit a red F150W−F200W color, suggesting the Lyα break is in F150W.The SEDs and P(z)s for all nine are shown in Figure 7. First looking at the P(z)s in the upper-left of each panel, the amplitude of the detected Lyα break is strong enough to either eliminate, or leave a very small low-redshift solution.As shown in Table 2, the integrated P(z > 7) is ≥0.98 for 8/9 of these sources, with the remaining source (CEERS1 7227) having a value of 0.85, significantly greater than our sample limit of 0.7 (this object also just barely satisfies our ∆χ 2 cri-terion, as only the F200W flux is discrepant with the best-fitting lower-redshift model, and it has a 2.2σ significance detection in F115W, which could indicate a redshift closer to z ∼ 9).
CEERS6 7641 shows a double-peaked high-redshift P(z), with peaks at z ∼ 9 and z ∼ 11.As its integrated P(z) at z = 10-12 is larger than at z = 8.5-10, this object was placed in the z ∼ 11 sample (this is the green z ∼ 11 symbol in Figure 3 that is in the z ∼ 9 sample region), though clearly it has a near-equal probability of being at slightly lower redshift.Finally, we note that similar to the z > 12 galaxies, the UV spectral slopes for these nine sources all appear fairly blue.Though a detailed analysis of this quantity is beyond the scope of this paper, it is clear that these objects all appear fairly low in dust attenuation.We acknowledge that Lyα-break selection can be biased against red sources (e.g.Dunlop et al. 2012), thus a full quantita- Note-The horizontal lines divide our three redshift samples (given by the sixth column).The photometric redshift is "za" from EAZY, which is the redshift where the χ 2 is minimized.The ∆χ 2 in the final column compares the best-fitting low-redshift (0.5 < z < 7) model to the best-fitting high-redshift model; a value of ≥ 4 was required for selection.a Previously published as Maisie's Galaxy by Finkelstein et al. (2022b).The half-light radii are listed in pixels; our pixel scale is mas.* Galfit did not converge for these two sources, so we list their SE half-light radii.
tive analysis on the colors of these galaxies is reserved to future work.

The z ∼ 9 Sample
The remaining 15 candidate high-redshift galaxies fall into our z ∼ sample.We show the cutout images in Figures 16 and 17 in the Appendix.At z < 9.5, the Lyα break is in the F115W band, thus we expect to see signal in that image for all but the highest-redshift sources in this subsample, though the F115W flux should be fainter than the F150W flux, which is apparent in the images.Likewise, we see no significant flux in the ACS filters as expected.Examining their SEDs and P(z)s in Figure 18 in the Appendix, we see that nearly all show a tight highredshift peak with very little probability of being at z < 7. Similar to the z ∼ 11 sample, the lowest integrated P(z > 7) is 0.84, with 13/15 galaxies having integrated P(z > 7) > 0.9, and 8/15 having integrated P(z > 7) ≥ 0.99.
The object with the most complex P(z) constraints is CEERS2 2274, which shows small peaks at z ∼ 2 and 6, a dominant peak at z ∼ 9, and a modest peak at z ∼ 10.5, with this uncertainty due to the lower signalto-noise ratio on this object's faint fluxes.While the majority of these z ∼ 9 galaxies show blue rest-UV spectral slopes, the lower redshift means that the red-  dest filters probe the 4000 Å breaks, and it is clear that some objects have redder colors in these reddest bands.This may be due to significant stellar mass in (somewhat) older stellar populations (e.g.Labbe et al. 2022), though could also be due to the presence of very strong rest-frame optical nebular emission lines, which early JWST results indicate are ubiquitous in high-redshift star-forming galaxies (Endsley et al. 2022;Papovich et al. 2022).
Finally, we discuss the interesting pair of objects CEERS1 3908 and CEERS1 3910.As discussed in §4.3, the presence of a bright neighbor skewed their Kron apertures, but even after correcting for this they appear as robust z ∼ 9 candidates.In inspecting their cutout images, it is apparent that these two candidate galaxies are very close together with each consisting of a small knot of emission with centroids 0.6 (∼2.5 kpc) apart.It is possible that these are two star-forming regions of the same host galaxy.This hypothesis could be sup- ported by the apparent very faint emission between the two clumps.Deeper imaging could resolve this, though as they are resolved in our catalog we do not merge them here.
As one additional check, we explored the impact of our photometric correction factors ( §3.6) which we had applied to our catalog.We re-ran EAZY on our final sample of 26 high-redshift galaxy candidates removing this factor.We find that this makes effectively no change in the high-redshift solution, with best-fitting redshifts unchanged to more than the 1-2% level, and all candidates continuing to satisfy our P(z) selection criteria.We did notice a slight change in the best-fitting low-redshift model, leading to the median value of ∆χ 2 being reduced by 5%.This impacts sources which were close to our ∆χ 2 threshold, with CEERS1 1730 (∆χ 2 = 4.4 → 3.5), CEERS1 4143 (∆χ 2 = 4.0 → 3.7) and CEERS1 7227 (∆χ 2 = 4.0 → 3.7) falling below our threshold of ∆χ 2 > 4. While these sources remain in the sample following our fiducial selection, their presence near this threshold leaves their inclusion more sensitive to these small correction factors.

Galaxy Sizes
We derive the sizes of all 26 galaxies in the sample by performing parametric fits on the F200W NIRCam images using Galfit 5 (Peng et al. 2002(Peng et al. , 2010)).Galfit finds the optimum Sérsic fit to a galaxy's light profile using a least-squares fitting algorithm.As input, we use a 100x100 pixel cutout of the F200W image for each source, the corresponding error array (the 'ERR' extension) as the input sigma image, and the empirically derived PSFs.We use the source location, magnitude, size, position angle, and axis ratios from the SE catalog as initial guesses.We allow the Sérsic index to vary between 0.01 and 8, the magnitude of the galaxy between 0 and 45, the half-light radius (r h ) between 0.3 and 100 pixels, and the axis ratio between 0.0001 and 1.We also allow Galfit to oversample the PSF by a factor 9. We then visually inspected the best-fit model and image residual for each source to ensure that the fits were reasonable and that minimal flux remained in the residual.We also performed a fit on the PSF image itself in order Figure 8.The F200W half-light radii of our high-redshift galaxy candidates, measured with Galfit.The shaded region denotes the half-light radius for the F200W PSF, of 1.18 ± 0.01 pixels (our pixel scale is 30 mas).While the galaxies are compact, all but the faintest appear to be resolved.Our candidate galaxies have a median half-light radius of 0.46 kpc.The small circles denote the two objects where Galfit did not converge, thus we show the SE half-light radius.
to determine the smallest resolvable size.This value is 1.18±0.01pixels.Two sources (CEERS1 3908, a member of the pair discussed above, and CEERS2 588) failed to converge on a fit.
In Figure 8 we show these measured sizes, highlighting that the majority of the sample is significantly spatially resolved.The measured r h values range from 0.41 to 8.59 pixels, with a median value of 3.6 pixels (0.11 or 0.46 kpc).These sizes are consistent with the rest-UV sizes found in the GLASS survey by Yang et al. (2022), who found a median half-light radius of 0.45 ± 0.13 kpc for galaxies at 7 < z < 15.

Stellar Contamination Screening
It is important to analyze whether any objects in our sample could potentially be low-mass stars or browndwarfs, as they can have similar colors as high-redshift galaxies when observed in broadband filters (e.g., Yan et al. 2003;Ryan et al. 2005;Caballero et al. 2008;Wilkins et al. 2014).We first analyze the galfitproduced half-light radii of our candidates, as any resolved objects cannot be stellar in origin.Figure 8 shows the Galfit-measured radii for the 24/26 of our candidate high-redshift galaxies where Galfit converged (showing the SE values for the remaining two).We find that the majority of the sample is obviously resolved, with only five of the faintest sources having sizes within the 1σ error bars that allow for a point-source.
For these five unresolved sources, we then follow the methodology of Finkelstein et al. (2022a,b) to explore whether the colors of any of the few unresolved galaxy candidates could be consistent with stars.We derive a grid of models for the colors of low-mass stars and brown dwarfs (spectral types of M4-T8) in the NIRCam filters, by integrating the IRTF SpEX brown dwarf templates (Burgasser 2014).As these spectra end at 2.5µm, we use the tabulated 2MASS photometry to link each SpeX model with Spitzer/IRAC photometry from Patten et al. (2006).Following Finkelstein et al. (2022b) we assume we can map IRAC 3.6µm onto F356W and 4.5µm onto F444W, though future spectroscopic observations of brown dwarfs with JWST at λ 2.5 µm will improve this methodology.To explore which model is preferred, we use the Bayesian Information Criterion, which includes the goodness-of-fit (χ 2 ), the number of photometric constraints (five for stars, 12 for galaxies [due to the use of HST for the latter]), and the number of free parameters (one for stars, 19 [18 templates fit simultaneously, plus the redshift] for galaxies).We find that two of these five sources have BIC values which indicate a preference for the best-fitting stellar model.
Although these two objects (CEERS2 2274 and CEERS2 1075) are formally unresolved and have optical/near-infrared colors that consistent with brown dwarfs (Patten et al. 2006), we nevertheless doubt this interpretation as they are very faint (m F 277W = 29.1 mag).The colors imply a very late spectral type of T5, which corresponds to an absolute magnitude of K=16.1 AB mag (Patten et al. 2006).Such a brown dwarf would therefore be at a heliocentric distance of 3.6 kpc, or 3.1 kpc above the Galactic plane, which would place this object approximately nine scale heights out of the thin disk (e.g., Ryan et al. 2011;Holwerda et al. 2014).These objects are therefore highly unlikely to be part of the Population I thin disk stars, but could be a member of the thick disk or Galactic halo (Ryan & Reid 2016).At present, there are very few constraints on the density of brown dwarfs in these more distant Galactic components at these extremely low effective temperatures, but based on more massive main sequence stars, these are expected to have many orders-of-magnitude fewer stars than the thin disk (e.g., Jurić et al. 2008).A rigorous Bayesian model of the halo brown dwarfs would provide the strongest statistical evidence and prediction for or against this source, but this is beyond the scope of the present work.We also note that CEERS2 2274 appears to have a very nearby neighbor with a similar SED, which could indicate it is a merging system at high-redshift.We do note that both sources are below the brightness limit (m < 28.5) we apply when analyzing our sample in §7, so their inclusion (or exclusion) does not affect any of our conclusions.

Examining Ancillary Multi-wavelength Observations
We have examined the positions of our 26 candidate z 9 galaxies at both shorter and longer wavelengths, and find no significant detections, increasing confidence in the very high-redshift interpretation.While the depth of these data are not necessarily sufficient to completely rule out low-redshift solutions, these non-detections do increase our confidence in the fidelity of our sample.First we searched for X-ray emission coincident with our candidate positions using Chandra imaging from the AEGIS-XD survey (Nandra et al. 2015), which has a flux limit of 1.5 × 10 −16 erg cm −2 s −1 in the 0.5-10 keV band.There was no emission detected with a Poisson false probability less than 4 × 10 −6 in the soft (0.5-2 keV), hard (2-7 keV), full (0.5-7 keV) and ultrahard (4-7 keV) energy bands.
We then investigated possible far-infrared (FIR) emission at the position of our high redshift galaxy candidates, using the super-deblending catalog technique from Liu et al. (2018) and Jin et al. (2018), which was adapted for the EGS field.To summarize, this multiwavelength fitting technique is meant to optimize the number of priors fitted at each band to extract the deepest possible information.We use data from Spitzer (24µm from FIDEL; Dickinson & FIDEL Team 2007), Herschel 100µm and 160µm from PEP (Lutz et al. 2011) and 250µm, 350µm, 500µm from HerMES (Oliver et al. 2012), JCMT/SCUBA2, including 850µm from S2CLS (Geach et al. 2017) and 450µm and 850µm from Zavala et al. (2017) and AzTEC 1.1mm from Aretxaga (2015).
The key is to obtain an adaptive balance as a function of wavelength/dataset between the number of priors fitted, the quality of the fit, and the achievable deblending given the PSF sizes.We start with the deepest images and fit each band from deeper to shallower images.
In general, the high-redshift galaxy candidates are not already contained in our prior-lists, with the exception of two sources, for which all measurements are already contained in the forthcoming catalog (A. Le Bail et al., in preparation).For the other candidates we consider them one at a time and add a specific prior at its position, that we fit together with the rest of priors that are relevant for each band.Extensive Monte-Carlo simulations ensure that the uncertainties associated to the flux measurements are "quasi-Gaussians" (see Liu et al. 2018;Jin et al. 2018, A. Le Bail et al. in preparation).None of the candidates are significantly detected at any band between 24µm and 1.1mm.We do note that for one galaxy, CEERS1 1730, this method implies a ∼ 3σ detection at 500 µm.However, inspecting the images, it is clear at 160, 250 and 350 µm there is significant emis-sion from a bright neighbor to the south, which overlaps our source's position at 500µm due to the worsening PSF.We conclude it is highly unlikely that this emission is associated with our candidate.
We do also note that our object CEERS2 2159, first published as ID=93316 by Donnan et al. (2022), does have a formal 2.6σ detection in the SCUBA-2 850µm data, as noted by Zavala et al. (2022).However, that marginal 850µm detection could plausibly be associated with other nearby z ∼ 5 galaxies within the beam as discussed in Zavala et al. (2022); higher resolution mm interferometry is being carried out on this system to gain further insight (Fujimoto et al., in prep).In Figure 9, we show a comparison of the first object in NIRCam versus HST and Spitzer imaging.While the object was only detected at modest significance in the ∼few-orbit HST imaging, and barely at all in the 50 hr depth Spitzer/IRAC imaging, it is extremely well detected (signal-to-noise ratio >30) in < 3000 seconds in all seven NIRCam filters.
In this figure we also compare constraints on the photometric redshift from the previous HST + Spitzer photometry of EGS z910 40898 to the present JWST + HST photometry.While this object was previously selected as a z ∼ 9 candidate as the majority of the P(z) was at z 9, the primary peak extended down to z < 7, and there was a non-negligible secondary peak at z ∼ 2. With the significant improvement in photometric constraints, there is now a single narrow peak at z ∼ 8.1.The best-fitting redshift is modestly lower than the previous value due to the only marginally red F115W−F150W color, implying the Lyα break is towards the blue end of the F115W filter.Observations of this object highlight the capabilities of JWST to significantly improve constraints on photometric redshifts compared to the pre-JWST era.
The second object, EGS z910 65860, however shows no significant flux at all at the expected position (Figure 10).This implies that the source identified in the HST imaging was either spurious, or a transient phenomenon.Finkelstein et al. (2022a) performed a detailed vetting process to remove all forms of spurious sources, including persistence, thus this seems like an This source was detectable by HST, albeit in only three filters and at only modest significance.Even in 50 hr of Spitzer/IRAC imaging, it was marginally detected.In just <3000 sec with JWST/CEERS, this source is extremely well-detected in all seven NIRCam filters, highlighting the power of JWST to probe the very early universe.In the top-middle panel we compare the previous P(z) to that now possible with JWST, finding a much sharper peak, centered at z ∼ 8.1 (the F115W−F150W color is only marginally red, implying a Lyα break at the blue edge of F115W).

Hubble Space Telescope
unlikely solution.We therefore consider whether the observations are consistent with a transient source.This object showed significant detections in HST F125W and F160W imaging, with no detections in F606W, F814W, F105W nor F140W.While 3D-HST F140W pre-imaging was shallower than the CANDELS imaging, it was curious that this object showed no detection as its SED (anchored by F125W and F160W) should have been detectable.
We thus investigate the date of each of these observations.The F125W and F160W images were taken in 2013, with two images taken ∼50 days apart (2 April, and 24 May).We made updated HST mosaics (following the procedure in Koekemoer et al. 2011) around the position of this source in each epoch separately, and we do see a clear detection in both bands in both epochs.This further refutes the hypothesis that this disappearing source was spurious.Adding to the likelihood of a transient explanation is that the source is fading between the two epochs, with the ratio of fluxes from the first to second epoch being 2.5 and 2.0 for F125W and F160W, respectively (fluxes were measured with SE on each epoch separately, using MAG AUTO to approximate total fluxes).The non-detection of this source in F140W is easily explained, in the context of a transient interpretation, by the acquisition date of those images, which was 1.5 years earlier than F125W and F160W (2 Dec 2011).
While the apparent Lyα break between F105W and F125W could imply that this object is a z ∼ 9 transient, the F105W imaging was obtained much later, on 1 April, 2015.Likewise, the F606W images were taken in 2004 and 2011, while the F814W were taken in 2004, 2011, and 2013.The SCANDELS Spitzer/IRAC imaging was obtained over many years from 2003-2012, but the bulk was obtained between 2010-2012 (Ashby et al. 2015).We therefore do not have the contemporaneous photometry needed for a reliable redshift of this source.What we do know is that the host galaxy is fainter than the limit of our NIRCam imaging.Taking the limit of our deepest image in Table 2, we find that the observed flux in the first F160W epoch (23.6 nJy) is ∼150 times brighter than the 1σ upper limit of our NIRCam imaging.The compact morphology of this source, along with the lack of apparent proper motion between the two CANDELS epochs leaves a supernova as the most likely explanation (a proper motion of less than the F160W PSF in 50 days indicates a distance of more than one parsec, ruling out a Solar System object).We show the observed fluxes/limits versus time in the left panel of Figure 10, while in the right panel we show constraints on host galaxy absolute magnitude.
While nearly any redshift is plausible from a luminosity standpoint (from a low-redshift dwarf galaxy to a very high-redshift ∼L * galaxy), the significant fading over a 50 day time period argues against a very high redshift due to time dilation of the SNe decay curve.From the observed (sparse) light curve, we can only place loose constraints on the nature of the transient, following the methods of Rodney et al. (2014).The data favor classification as a core-collapse supernova with a redshift ranging from z 0.2-1.2, with maximum likelihood at z 1.1 and a slight preference for SN Ib/c over SN II.Nevertheless, the data are also compatible with a SN Ia at z 1.45.2022a) as a z ∼ 9 galaxy candidate.The colors denote the filter.The small circles denote the integrated CANDELS F125W and F160W fluxes, while the larger circles denote the fluxes measured in each epoch, 50 days apart.The object is detected in both epochs in both bands, fading by a factor of ∼2.5 and 2.0 in F125W and F160W, respectively, across this time interval.The object is not detected in any of the CEERS imaging; the inset image shows this position in the CEERS F200W image.This implies the object is >150× fainter now than it was when it was detected.Right) The limits on the host galaxy absolute magnitude, taken by applying the cosmological distance modulus to the CEERS upper limit.With the exception of the extremely low-redshift Universe, a wide range of host galaxy absolute magnitudes are plausible, making it difficult to constrain the redshift of this supernova, though analysis of the light curve implies a likely redshift of z ∼ 0.2-1.2.The inset panels show the images of this transient source in both CANDELS epochs in F125W and F160W.

Candidates Identified with CEERS
At the time of this writing, there have been five previously submitted articles which have identified z 9 candidates from these CEERS imaging data (Donnan et al. 2022;Finkelstein et al. 2022b;Harikane et al. 2022;Labbe et al. 2022;Whitler et al. 2022).In this section we compare our current analysis to these previous results, noting that differences in selection techniques may bias the redshift ranges that a particular study is sensitive to.We also emphasize that while the selection of a candidate by more than one study does increase the confidence in the object's fidelity, the presence or lack of a galaxy in a given sample is often easily attributable to differences in data reduction, photometric methodology, and selection criteria.
First, Finkelstein et al. (2022b) published a paper on a single galaxy, "Maisie's Galaxy", which they identified as a z ∼ 12 galaxy candidate.While they used the same full CEERS imaging dataset as we use here (albeit with an earlier version of the data reduction), they employed highly conservative selection criteria to identify a single extremely robust galaxy candidate due to the nascent state of the data reduction pipeline and photometric calibration c.July 2022.This object is in our galaxy sample, here known as CEERS2 5429.Our photometric redshift constraints for this source are z phot = 11.5 +0.2 −0.6 .This is consistent at the ∼1σ level with the value of z phot = 11.8 +0.3 −0.2 from Finkelstein et al. (2022b), with the small differences in the redshift attributable to small changes in photometry due to the updated photometric calibration now available.
The largest previous sample of high-redshift galaxies in the CEERS field comes from Donnan et al. (2022, hereafter D22), who selected 19 high-redshift galaxies over the CEERS field.They required dropouts in the F115W (or redder) bands to be selected, biasing the sample to z > 9.5 (where Lyα redshifts out of F115W).
Here we compare to their revised sample (c.Oct 2022).We find nine galaxies in common between our two samples.Of the 17 candidates in our sample but not in D22, 14 are at z 9.5, thus their absence from the D22 sample is expected via their requirement of no significant flux in F115W.The remaining three candidates in our sample and not in D22, CEERS1 1730, CEERS1 7227 and CEER6 4407 have z best > 10, thus in principle could have been selected by D22.Of the nine sources in common, the median difference in photometric redshifts is 0.3 ± 0.7, with all redshifts agreeing within ∆z < 1 (Figure 11), with the exception of our source CEERS6 7641 (D22 ID 30585), which we find has z best = 9.0 compared to their value of z best = 10.6.However, as shown in Figure 7, the P(z) for this source is quite broad, and contains a second peak at z ∼ 10.4, thus our results are fully consistent with those of D22.
Of note is that our sample contains D22 object 93316, as object CEERS2 2159, and our photometric redshift is nearly precisely equal to the D22 value of z = 16.4.This object is remarkably bright (m F 277W = 26.5),and some evidence is present both from sub-mm imaging and environmental studies that indicate z ∼ 5 (Zavala et al. 2022;Naidu et al. 2022a).The JWST photometry alone very strongly prefer this ultra-high redshift.Followup spectroscopy and millimeter interferometry will soon reveal the true nature of this potentially record-breaking system.
We next explore the properties of the 10 galaxies selected by D22 that are not in our final high-redshift galaxy sample.Of these 10, we find that five are faint enough that they did not meet our source detection significance criteria.Some of these are faint enough in our catalog that their photometric redshifts are not well constrained, while others do show peaks at z > 10.Four more sources do show primary peaks at z > 10, but have secondary peaks at z < 4 that are large enough for our selection criteria to remove these sources (these D22 IDs 1434 2, 26409 4, 5628 2 and 6647 all have ∆χ 2 < 4 in our catalog).For only one source, D22 ID 61486, do we find a strong low-redshift solution.This object exhibits a fairly flat SED in our catalog, though it does exhibit a red F115W−F150W color, which could be indicative of a true z > 9 Lyα break.Overall, we find strong consistency between objects in common in both our sample and that of D22, and for the D22 sources not in our sample, a high-redshift nature is possible given our photometry.
We next compare to Harikane et al. (2022).At the time of this writing only the original sample, prephotometric calibration update, was available for comparison.In the CEERS field, they selected galaxies as F150W or F200W dropouts, restricting their sample to z 12, selecting six high-redshift candidates.Of these six, two are in common with our sample: CEERS2 5429 ("Maisie's Galaxy"; Harikane ID CR2-z12-1) and CEERS2 2159 (D22's 93316; Harikane ID CR2-z17-1), with Harikane et al. having z best = 11.88 and 16.45 for these two objects, respectively.Of the four sources in their sample we do not recover, for CR2-z12-3, CR3-z12-1 and CR6-z12-1, our P(z) does show a primary peak at z > 10, but the low-redshift peak is significant, with ∆χ 2 < 4 for all three.All three sources are also faint, and do not meet our detection significance criteria.For their source CR2-z12-1, we find z < 6, driven by a ∼6σ detection in F115W, and a blue F115W−F150W color.Labbe et al. (2022) selected candidate massive galaxies at 7 < z < 11 as those with detectable Lyα and Balmer A comparison of our photometric redshifts to those for objects in our sample which were previously published as z > 8.5 galaxy candidates from CEERS data in the literature, showing good agreement (not shown is the z ∼ 16 candidate, where the agreement is good between the three studies who have published it).As discussed in §6.2, for literature sources not present in our sample, we find in our catalog that they typically miss our detection significance and/or ∆χ 2 criteria, though often the P(z > 10) is non-zero.
Finally, Whitler et al. (2022) studied the stellar populations of bright galaxies at high-redshift in CEERS.We compare to an updated sample, again following improved photometric calibration (L.Whitler, private communication).In this sample, they have six galaxies with z best > 8.5.Three of these sources are in our sample (CEERS1 3858,CEERS1 6059 and CEERS6 7641,with Whilter et al. IDs of 37135,37400 and 9711,respectively).The photometric redshifts for these three agree extremely well (∆z < 0.05) with our estimates.Of the three sources we do not recover, we do find a strong z ∼ 10 peak for their ID 7860; this source narrowly misses our sample with ∆χ 2 = 3.7.ID 14506 does show a red F115W−F150W color, but our analysis prefers a 4000 Å break rather than a Lyα break, though a very small peak is present at z ∼ 11.For ID 34362, a z ∼ 9 peak is dominant in our measurements, but ∼50% of the integrated P(z) is contained in a z ∼ 2 peak.
In Figure 11 we compare the photometric redshifts from our study to these previous works for sources in common.As discussed above, especially considering the mix of photometric procedures and photometric redshift techniques, the agreement is generally good.
As one final comparison, we compare our estimated total magnitudes with the published magnitudes for those candidates in common (where, using the information available, we compare F200W for D22 and Whitler et al. 2022, F356W for Harikane et al. 2022, and F444W for Labbe et al. 2022).We find the largest discrepancy with D22, where for the nine sources in common, the median magnitude differences between our cataloged total magnitudes and those of D22 is −0.3 mag (meaning our fluxes are brighter); however there is significant scatter, with one source being as much as 1 mag brighter in our catalog, and another being 0.4 mag fainter in our catalog.Further exploring this discrepancy requires a deeper comparison between the procedures adopted to estimate total fluxes.Comparing to the other studies, we find results more in agreement.For the four sources in common with Whitler et al. (2022), the median magnitude offset is −0.1, with individual objects ranging from −0.2 to 0. For the two objects in common with Harikane et al. (2022), our magnitudes are fainter by 0.1 and 0.3 mag.Finally for the five sources in common with Labbe et al. (2022) our magnitudes are all fainter by 0.05-0.2mag, with a median of 0.15 mag.We conclude that while there is significant scatter in the photometry due to the various processes used to reduce and analyze the data, we find no evidence that our photometry is significantly systematically shifted compared to other studies, especially by the large amount needed to explain the observed luminosity function excess ( §7.1).Clearly improving photometric agreement between different photometric catalogs is a key goal for the near future.

THE EVOLUTION OF GALAXIES IN THE FIRST 500 MYR
Here we investigate what constraints we can place on galaxy evolution with our sample of 26 candidate z ∼ 8.5-16.5 galaxies.While it is early in the JWST mission and the sample size is modest, the fact that our sample contains a large number of galaxies at z > 9 allows us to investigate what constraints are possible.We first measure the rest-frame UV luminosity function at z ∼ 11 in comparison to both previous results as well as expectations from empirical extrapolations.We then compare the cumulative number of galaxies to predictions from a suite of theoretical models, to explore the accuracy with which these models predicted galaxy abundances in this early epoch.

The z ∼ 11 Rest-UV Luminosity Function
The UV luminosity distribution function is one of the key observational diagnostics of galaxy evolution in the early universe.With each technological leap leading to a new redshift era being observable, this quantity is always of immense interest (e.g.Bouwens et al. 2004;McLure et al. 2013;Finkelstein et al. 2015) as it is directly comparable to simulation predictions, helping to constrain the physical mechanisms regulating galaxy evolution.While the extensive amount of deep-field imaging data available from the full Cycle 1 dataset will lead to excellent constraints on this quantity, here we gain a first look by exploring constraints placed by the CEERS data.
For this first-look luminosity function we focus on the specific redshift range of z ∼ 9.5-12.We exclude z = 8.5-9.5 for two reasons.First, there is a likely overdensity at z = 8.7 (e.g.Finkelstein et al. 2022a;Larson et al. 2022) which would bias this quantity high.Second, at z > 9.5, the Lyα break is fully redward of F115W, providing a true JWST dropout sample that is not dependent on the shallower HST imaging.We choose the upper bound of z ∼ 12 as only two galaxies in our sample have higher photometric redshifts.
To measure the UV luminosity function we require an estimate of the effective volume over which we are sensitive to galaxies.This is a function of both redshift and source brightness, and accounts for incompleteness due to both photometric measurements and sample selection.Following Finkelstein et al. (2022a) we estimate the completeness by injecting mock sources of known brightness into our images.We do this separately for each of the four fields, injecting 10 3 sources per iteration, with ∼30 iterations per field, to avoid crowding.
We generate source morphologies with galfit (Peng et al. 2002).While the completeness can depend sensitively on source size due to surface brightness dimming, our sources are quite compact (Figure 8), thus we choose a log-normal half-light radius distribution such that the size distribution of the recovered sources matches well that of our galaxy sample (noting that if there were highly extended sources not present in our sample, we would be underestimating the incompleteness).While we choose Sérsic profiles (with a log-normal distribution peaked at n = 1), our size assumption results in a peak of unresolved sources, with a tail towards modestly resolved sources.We generate galaxy SEDs assuming a log-normal distribution in magnitude and a flat distribution in redshift, using Bruzual & Charlot (2003) templates with stellar population properties tuned to Figure 12.The total completeness as a function of input redshift and F277W apparent magnitude.These values were derived by simulations where we inject compact sources with realistic (modestly blue) SEDs into our imaging, and attempt to recover them with our analysis pipeline.The shading denotes the fraction of sources in each bin of input redshift and magnitude which were both detected by SE and satisfied our sample selection.The incompleteness at z < 8.5 is expected via our sample selection.We see that the CEERS imaging allow recovery of galaxies across the entire redshift range of interest to very high completeness at mF 277W < 28.0, with the completeness dropping steadily from m = 28 to m = 29.
generate fairly blue galaxies (this results in median UV spectral slope (β) of −2.3, with a tail to −3 and −1).
After sources are added to the images, the images are processed through our entire analysis pipeline in the same way as our real data, measuring photometry with SE, applying all aperture corrections (though we did not apply the small zeropoint offsets as those may be due to real instrumental offsets not captured in these simulations), measuring photometric redshifts with EAZY, and applying our sample selection.In Figure 12 we show our completeness as a function of input redshift and F277W magnitude.This highlights that our sample selection accurately begins to select galaxies at z 8.5.Our completeness remains high to m ∼ 28.0, beginning to fall off at m > 28.5, consistent with pre-launch expectations.
To calculate rest-UV absolute magnitudes for both our real and simulated sources, we follow Finkelstein et al. (2015) performing a basic round of SED fitting, measuring M 1500 from the bandpass-averaged flux over a tophat bandpass spanning 1450-1550 Å in the rest-frame.We use 10 3 Monte Carlo simulations to obtain uncertainties on these magnitudes.For the completeness for our luminosity function estimate, we require the simulated sources to have best-fit photometric redshifts spanning 9.5-12, to match those of our galaxy sub-sample.We then calculate our completeness as a function of absolute magnitude, and measure effective volumes in bins of absolute magnitude by integrating over the co-moving volume element multiplied by the completeness for a given magnitude and redshift, over our current survey area of 35.5 arcmin 2 .We list our effective volumes in Table 3.
We measure our luminosity function following the methodology of Finkelstein et al. (2015), adopting a bin size of 0.7 magnitudes.The number density in each bin is estimated via MCMC (see details in Finkelstein et al. 2015), sampling galaxies absolute magnitude posterior distributions, such that galaxies can fractionally span multiple magnitude bins.We note that at M U V > −19.5 our completeness falls below 30% of its maximum value, thus our faintest bin is shaded in white to indicate that the value is dominated by the completeness correction.
In Figure 13 we show our luminosity function results.In the top-left panel, we show the stacked P(z) of the 10 sources in our sample at 9.5 < z best < 12.0.The FWHM of this normalized distribution spans z = 9.5-11.7,with a peak at z ∼ 11.The dashed line shows the completeness from our simulations as a function of redshift at the median absolute magnitude of our sample, which probes a broadly similar redshift range, albeit with a flatter distribution.In this figure we also plot recent z ∼ 11 results from Donnan et al. (2022), Naidu et al. (2022b) and Bouwens et al. (2022a).While numbers in all studies are fairly small, the agreement between all studies is encouraging.We also compare to a wealth of studies in the literature from HST at z = 8.5-11.
Broadly speaking, our results at z ∼ 11 do not show significant evolution from these modestly lower redshifts.This is consistent with the conclusion from Bowler et al. (2020) who noted that the bright-end of the UV luminosity function (M U V = −22 to −23) shows little evolution from z = 8-10.However, here we are beginning to probe fainter, yet we still see little evolution.(Donnan et al. 2022, , who also used UltraVISTA), and the HUDF (Bouwens et al. 2022b).The light blue points show a compilation of data from the literature at z ∼ 9-10.Circles denote results from studies which used (modestly) deep imaging from surveys such as CANDELS and the Hubble Frontier fields, including McLeod et al. (2015), Oesch et al. (2018), and Bouwens et al. (2019Bouwens et al. ( , 2021)).The squares denote studies making use of HST pure parallel surveys, including Bernard et al. (2016), Morishita et al. (2018) and Rojas-Ruiz et al. (2020).The triangles denote results from wide-area ground-based studies of Stefanon et al. (2019) and Bowler et al. (2020).The blue lines show the evolving double power-law (DPL) luminosity functions from Finkelstein & Bagley (2022) at z = 4-8 (this model was fit to data at z = 3-9).The darker shaded gray region show the predictions from these DPL fits extrapolated to z = 9 (upper bound) -12 (lower bound); the lighter gray (outlined with dashed lines) shows a similar extrapolation from the evolving Schechter function fits from Finkelstein (2016, ; this model was fit to data at z = 4-8).The inset shows the stacked P(z) of the galaxies used in this luminosity function, as well as the redshift distribution estimated from the completeness simluations at MUV = −20.The observed z ∼ 11 luminosity function is consistent with the top end of both smooth extrapolations, implying that the observed smoothly UV luminosity function evolution from z = 4 to z = 9 may be slowing at z ∼ 11.
Finally, we comment on both the shape of the UV luminosity function, and the overall evolution.In Figure 13 we plot two empirical extrapolations.The first is a Schechter function from Finkelstein (2016), who measured the evolution of Schechter function parameters as a linear function of (1+z) from z = 4-8; we plot this function evolved to z = 9 and 12, shaded by the light gray color.The darker gray shading is a similar empirical evolution, this time using data from z = 3-9, and assuming a double power-law (DPL) form, from Finkelstein & Bagley (2022).Over the magnitude range of our observed sources, these two functions agree, and our observations are consistent with, albeit at the high end of, these empirical extrapolations, implying that pur observed z ∼ 11 UV luminosity function is similar to the z = 9 DPL luminosity functional fit from Finkelstein & Bagley (2022).
These results suggest that the evolution of the UV luminosity function, which had been smoothly declining from z ∼ 4 to 8, begins to slow by z ∼ 11.The luminosity function decline has been debated in the literature prior to the JWST era, notably by Oesch et al. (2018) and Bouwens et al. (2019), who found evidence for an accelerated decline in the UV luminosity function at z > 8.While our small sample cannot conclusively distinguish between these two scenarios, should future luminosity Note-The number densities are derived via a MCMC method which includes photometric uncertainties, thus galaxies can contribute to the number density in more than one bin.The effective number column lists the mean and standard deviation of the number of galaxies per bin from these MCMC simulations, while the number column gives the actual value based on the measured magnitude.a Our data is <30% complete in this faintest bin, so we do not consider this value reliable; it is indicated as an open symbol in Figure 13.
function efforts validate our observed number densities, it provides further evidence that there is significant star formation in our universe at z > 10.Previous efforts to recover the star formation histories of galaxies detected at z ∼ 9 to 11 with HST and Spitzer /IRAC (Tacchella et al. 2022) and analysis of early JWST luminosity functions from Donnan et al. ( 2022) also arrived at a similar conclusion.
7.2.The Cumulative Surface Density of Galaxies at z > 8.5 As another view into the evolution of galaxies in the first 500 Myr, in Figure 14 we plot the cumulative surface density of sources in our sample.This plot shows the integrated surface density for sources at redshift greater than z, limiting to m F 277W < 28.5 to avoid fainter luminosities where we are highly incomplete.We correct for incompleteness by counting each galaxy as one divided by the estimated completeness at the redshift and magnitude of a given galaxy.The solid black line shows the completeness corrected value from our sample, while the dotted line shows the results with no correction.We note that across the redshift range considered here, the completeness correction is typically a factor of ∼2, which is typical of most analyses of modestly faint galaxies.This, of course, means that the accuracy of completeness corrections remains an important systematic, and one we will explore in more detail in future works.However, we note that with the exception of the lowest redshifts studied here, even the uncorrected counts exceed most of the model predictions, and therefore this is a reasonable lower limit on the true surface density.
To encompass the uncertainty in both flux and photometric redshift, we run Monte Carlo simulations, sampling both the F277W flux and photometric redshift posterior distributions, and plotting the 68% confidence range on the surface density as the 68% spread in values from these 10 3 simulations as the light gray shaded region.We also run a set of Monte Carlo simulations additionally sampling the Poisson uncertainty, shown by the wider dark gray region.We find that at z > 8.5 and m F 277W < 28.5, our results suggest >1 galaxy per arcmin 2 .This trends downward when integrating from higher redshifts, albeit slowly, with this quantity not dropping to 0.1 arcmin −2 until z > 11.5-12.We note that these surface densities are only slightly reduced when removing sources with 4 < ∆χ 2 < 9 (dashed line).
We also show the cosmic variance calculated based on the bluetides simulation (Bhowmick et al. 2020).While these uncertainties appear comparable to the combined redshift, photometric and Poisson uncertainties, these cosmic variance uncertainties are likely an upper limit.As they are simulation based, they rely on the predicted abundance of galaxies, and as we show here, most simulations under-predict the abundance of z > 9 galaxies.As the relation between UV luminosity and halo mass in this epoch is very uncertain ( §8.2), this leads to uncertainty in these calculations.We show this upper limit on the true cosmic variance uncertainty to highlight that it implies our results are still significantly above the predictions.
In the bottom panel of this plot, we also compare to nine recent model-based predictions, including from the Santa Cruz (Yung et al. 2019a(Yung et al. , 2020) ) and delphi (Dayal et al. 2017) semi-analytic models (SAMs), empirical models by Behroozi & Silk (2015), Mason et al. (2015), and UniverseMachine (Behroozi et al. 2019(Behroozi et al. , 2020)), and cosmological hydrodynamic simulations FLARES (Wilkins et al. 2022a,b), THESAN (Kannan et al. 2022), MillenniumTNG (Kannan et al. 2022), and Simba (Davé et al. 2019).Interestingly our results are noticeably higher than most predictions from physics based models.At z < 9.5, our observations are consistent with both the Behroozi & Silk (2015) and delphi models (and the FLARES predictions with no dust attenuation), while at z > 10 our results lie significantly above all predictions with the exception of Behroozi & Silk (2015).We include results with and without dust attenuation for the Santa Cruz SAM, delphi, FLARES, and Simba.We note that while there is likely an overdensity at z ∼ 8.7 in the observed field, this has no Figure 14.The cumulative surface density of sources with mF 277W < 28.5 at redshift greater than a given x-axis value, starting at z ≥ 8.5.The top panel shows the redshifts of individual objects (with blue, red and purple denoting the z ∼ 9, 11 and >12 samples).In the middle panel the solid line shows the observed surface density, after applying a correction for incompleteness; the dotted line shows the un-corrected (incomplete) values.The light shaded region shows the posterior on the distribution of the completeness-corrected surface density derived from Monte Carlo simulations marginalizing over the uncertainties in magnitude and photometric redshift; the dark shading includes Poisson uncertainty in this marginalization.The dashed line shows the completeness-corrected surface density if we had applied a more conservative sample selection criterion of ∆χ 2 low−z−high−z > 9.In the bottom panel, we show the same shaded regions, now comparing several recent model predictions, shown by the various colored lines (with solid, dot-dashed and dashed denoting predictions from hydrodynamical, semi-empirical, and semi-analytic models, respectively).For four models we plot thicker lines for predictions with no dust attenuation, and thinner lines for attenuated predictions.In this panel we also show an estimate of the cosmic variance uncertainty using the method from Bhowmick et al. (2020), though this is likely an upper limit on this quantity.Even including all sources of uncertainty, our observed surface densities are higher than nearly all predictions, with the exception of the Behroozi & Silk (2015) semi-empirical model.This apparent excess of high-redshift galaxies holds true at all z = 8.5-14, regardless of the ∆χ 2 cut, and is even true of the raw, non-corrected counts at z > 10.5.These results strongly imply that these predictions lack the full complement of physics describing star formation in the early universe, which we discuss in §8.2.impact on the surface density measured at z > 9, and the excess of our observed galaxy numbers over nearly all theoretical predictions persists at all redshifts.We discuss possible explanations for this discrepancy in §8.2.
It is important to note that the compilation of theory results included in this comparison is made utiliz-ing several different modeling approaches and with some different modeling assumptions (see Somerville & Davé (2015) for a thorough review).For instance, cosmological hydrodynamic simulations (e.g.MilleniumTNG, THESAN, Simba) are carried out by solving the equations of hydrodynamics, thermodynamics, and gravity for large numbers (typically billions) of dark matter, gas, and star particles.These simulations are capable of selfconsistently tracking various properties (e.g.stellar and gas mass) and morphology, as well as their correlation with the large-scale environment.Cosmological zoomin simulations (e.g.FLARES, based on the EAGLE model) track galaxy evolution at higher resolution by zooming into regions of interest in a cosmological simulation and re-simulate at higher mass resolution and in some cases with more fundamental treatments of baryonic processes.
All numerical cosmological hydrodynamic simulations require the use of "subgrid" prescriptions to represent physical processes that operate at physical scales below the resolution limit of these simulations (e.g.star formation, stellar feedback, black hole growth and feedback, etc.).Hydrodynamic simulations are subject to tension between the simulated volume and resolution.Large volumes are required to capture the rare over-dense regions that host massive galaxies (the ones of interest in this comparison) and high mass resolution is required to properly resolve the evolution and assembly history of these galaxies.Thus it is extremely challenging to simultaneously capture the rare peaks where high redshift galaxies form, while simultaneously resolving these low-mass halos with sufficient numbers of particles.
On the other hand, the semi-analytic modeling technique uses phenomenological recipes to track the formation and evolution of galaxies in dark matter halo merger trees and is able to predict a wide variety of physical and observable properties of galaxies with a relatively low computational cost.These models are typically capable of simulating galaxies over a wider mass range and are less susceptible to the volume-resolution tension faced by hydrodynamic simulations (though even obtaining adequate dynamic range for dark matter only simulations and halo merger trees is challenging; see Section 8.2 for a more detailed discussion).For both the semi-analytic models and the hydrodynamic simulations described above, the predictive power of these models relies on the assumption that the physical recipes within, often derived from or calibrated to nearby observations, accurately represent the processes that drive galaxy formation in the high-redshift universe.
A different class of models are (semi-)empirical methods, also known as subhalo abundance matching (SHAM) or halo occupation distribution (HOD), which efficiently map the properties of a large ensemble of galaxies onto the properties of dark matter halos guided by a set of observed quantities and scaling relations (see Wechsler & Tinker 2018 for detailed review).This approach is independent of specific galaxy formation mod-els and is guaranteed to match the observational constrains to which these models are calibrated.However, extrapolating these models to redshifts different from those where they were calibrated has less physical grounding.
This is only a broad overview of the simulation methods that produced the results presented in Fig. 14, with the aim of emphasizing that these results are produced with different methods, each having their own advantages and disadvantages.We refer the reader to these works for a full description of the design of these simulations and their performance against observational constraints.

Observational Effects
In the previous subsections we showed that the evolution of the UV luminosity function appears to be "slowing" at z > 10, and that the abundance of z > 9 galaxies significantly exceeds predictions by most physicallybased theoretical models.Here we explore several possible reasons for these unexpected results.At this early time with JWST, we must first acknowledge that the purity of the sample is uncertain.Should the majority of our galaxy sample turn out to be low-redshift interlopers, it would put our results more in line with the theoretical predictions.While this is unlikely given our detailed photometry and selection process, these strong claims require strong evidence, thus spectroscopic confirmation is a must.Without this, it remains possible that heretofore unexpected contaminants could be dominating our sample.The CEERS spectroscopic program will observe a number of these sources, and thus we may soon be able to gain further confidence in our galaxy sample.We do note that non-detections in ALMA dustcontinuum followup of three z 11 sources in other early JWST fields strengthens the high-redshift solutions in all four cases (e.g.Fujimoto et al. 2022;Bakx et al. 2022;Yoon et al. 2022).
One other observational systematic effect which could affect these results is that of aperture corrections.In the above sections we described our multi-step approach to deriving total fluxes, accounting both for what can be detected in the images, and then using simulations to correct for any missing flux from the wings of the PSF.As one additional check, in §6.2 we explored whether our fluxes are significantly systematically brighter (or fainter) than other objects published in the literature, finding that while there was significant scatter, there was no evidence that our fluxes were systematically brighter (especially by the large amount needed to explain the luminosity function excess).Nonetheless, future improve-ments in photometry can increase confidence in these results.

Theoretical Implications
Here we speculate on potential physical causes for the observed abundances should future observations validate our results.One potential explanation would be a complete absence of dust attenuation (e.g.Ferrara et al. 2022;Mason et al. 2022).As shown in Figure 14 for those models where we plot the (un) attenuated predictions as thicker (thinner) lines (when available), higher number densities are predicted.While a full analysis of the colors of these galaxies is beyond the scope of this paper, their SEDs do appear quite blue, and thus it is possible that they have little dust attenuation.Of note is that bright z ∼ 9-10 galaxies exhibit only marginally redder spectral slopes.For example, Tacchella et al. (2022) found a median UV spectral slope of ∼ −2 for M U V = −21, implying that a small amount of dust attenuation was present.Simulations (FLARES, Simba) predict a large ∼ 3× reduction in surface densities due to dust, while the Delphi SAM predicts much less.Should a lack of dust attenuation at z ∼ 10-12 be responsible for our observed evolution, one may expect to see the luminosity function at higher redshifts continue to decline.
Most models we have compared to employ some kind of "Kennicutt-Schmidt" (KS) star-formation law, relating the star-formation rate surface density to the dense gas surface density, assuming a constant efficiency per free fall time.While models such as the SC SAM (Yung et al. 2019a,b) adopt a KS law that becomes steeper at higher gas surface densities, leading to higher star formation efficiencies at early times, when the typical gas densities are higher, these models still imply a fairly long gas depletion time compared to the age of the Universe at these extreme high redshifts.For example, in the local universe, the depletion time for dense (molecular) gas is ∼1.5 Gyr.This timescale does appear to be shorter at higher redshift, as short as ∼0.7 Gyr at z = 2, and perhaps even shorter at higher redshifts (e.g.Sommovigo et al. 2022).The Behroozi & Silk (2015) model would require stars to form at the same rate that gas is funneled into halos, effectively assuming a negligibly short gas depletion time.It is also possible that the star formation efficiency could be higher in low metallicity gas, though the opposite trend has been proposed (e.g.Krumholz & Dekel 2012), or that stellar driven winds are weaker.Additionally, given the extremely high gas densities at these high redshifts, the cold neutral medium itself may also participate in star formation in addition to the molecular phase typically tracked by some simulations (Bialy & Sternberg 2019).
Another explanation for the poor match between the predictions and observations in SAMs is the mass and temporal resolution of halo merger trees.The halo merger and assembly histories, as an important component of semi-analytic models and some empirical models, can either be constructed with the extended Press-Schechter (EPS) formalism (e.g.Somerville & Kolatt 1999) or extracted from N -body cosmological simulations (e.g.Behroozi et al. 2013).However, even the current generation state-of-the-art cosmological simulations only saved a few dozen snapshots at z > 6 and among which only a handful at z 13, which is insufficient to properly capture the merger histories of these early-forming halos.Furthermore, the bulk of these high-redshift halos have masses near the resolution limit of these cosmological simulations, which further limit their ability to resolve halo merger trees at these extreme redshifts.On the other hand, while EPS-based merger trees have been compared to and shown to be in good statistical agreement with n-body merger trees, they are untested in the mass and redshift ranges that are explored here (see Yung et al. (2020) for a detailed discussion).
Conversely, the poor resolution of the gas content in cosmological hydrodynamic simulations that does not resolve the multi-phase nature of the ISM could contribute to the discrepancy between predictions and observations.It has been repeatedly shown that increasing the resolution typically leads to more dense gas and thus, burstier star formation (without adjusting any free parameters; although this depends on the nature of the sub-grid recipes used for star formation and stellar feedback).Thus, solely increasing the resolution in cosmological hydro simulations, such as in the Lyra cosmological zoom simulation set (Gutcke et al. 2022), and/or improving the sub-grid recipes for the ISM, star formation and stellar feedback, may partly alleviate the poor match between simulations and these observations.Finally, and perhaps most interestingly, would be an evolution in the initial mass function (IMF).It has long been predicted that the first stars would have a topheavy IMF (e.g.Bromm et al. 2001;Clarke & Bromm 2003), not only due to extremely low metallicities (e.g., Chon et al. 2021;Sharda & Krumholz 2022), but also the rising cosmic microwave background temperture floor (e.g., Larson 1998).While the relatively massive systems we see here are unlikely to be dominated by metalfree stars, it is possible their metallicities are low enough to affect the characteristic mass of their IMF.In fact, these observations now probe so close to the Big Bang,  2022) for z = 4-8, and extrapolated to z ∼ 11 (the thick blue curve is equivalent to the average of the darker gray shaded region in Figure 13).For our observed z ∼ 11 UV luminosity function we use the z = 9 DPL fit from Finkelstein & Bagley (2022) as it is very consistent with our observations (Figure 13).This relation implies that our observed galaxies at MUV = −20 (where our observations place the tightest constraints) have a host halo mass of log(M h /M ) = 10.55.At this halo mass, the expected UV luminosity based on the expected UV luminosity function is 0.6-0.7 mag fainter.This implies that based on the observed abundances, our observed galaxies are 1.8× more UV luminous than expected from extrapolation of HST results.While larger sample sizes and spectroscopic confirmations are needed to have greater confidence in the z ∼ 11 luminosity function, should these results be confirmed it implies that z > 10 galaxies are more luminous in the rest-UV than expected.One possible explanation could be an increasing prevalence at higher redshifts of a top-heavy initial mass function, which is predicted to dominate at very low metallicities.
that it would be surprising if the IMF did not begin to evolve at some point.
We therefore explore what excess UV luminosity is needed to match our observation.We first do a simple test by exploring how much we would have to shift our observed luminosity function data in Figure 13 in UV magnitude such that they would match the DPL luminosity function fits from lower redshift extrapolated to z ∼ 11.We find our data would need to shift ∼1 magnitude fainter to match these empirical predictions.Therefore if an evolving IMF was the sole explanation for the higher-than expected luminosities, the UV luminosity would need to be boosted by about a factor of ∼ 2.5.
As a more complex version of this analysis we estimate the total masses for the host halos of galaxies across the luminosity function via abundance matching.We follow the procedures of Reddick et al. (2013), assuming a 0.2 dex scatter in UV luminosity at fixed halo mass, showing the observed relations between halo mass and UV absolute magnitude in Figure 15; qualitatively similar findings applied for scatter ranging from 0 − 0.4 dex.
For z ≤ 9, we use the DPL UV luminosity functions from Finkelstein & Bagley (2022).For the expected evolution to z ∼ 11, we extrapolate their fits to z ∼ 12, and show the volume-averaged extrapolated LF for z = 9.5 − 12 with the bright blue line.For our actual observed values, as we have not fit a functional form here due to our limited dynamic range in luminosity, we assume the z = 9 DPL fit from Finkelstein & Bagley (2022) as this is most consistent with our observed number densities in Figure 13.We show in red the M halo -M U V relation when abundance matching the average of the z ∼ 9.5-12 halo mass functions to this assumed observed luminosity function.In this figure, we highlight M U V = −20, which is where our z ∼ 11 observations are most constraining.Our abundance matching analysis predicts that based on our observations, these galaxies are hosted in halos with log(M h /M ) = 10.55 (upper light-red bar).If we had instead used the extrapolated z ∼ 11 results, these halos should host galaxies with M U V = −19.3 to −19.4 (lower light-red bar).This implies that our observed galaxies are ∼ −0.6-0.7 mag brighter (1.8× in UV luminosity) than expected for a galaxy in their host halo.
A UV-luminosity boost of ∼1.8-2.5 is not unexpected in the context of a top-heavy IMF.Raiter et al. (2010) explore the impact of differing IMFs on the UV luminosity (specifically investigating the conversion from UV luminosity to SFR).They find that for the top-heavy IMFs proposed by Tumlinson (2006), which peak at ∼10-40 M , yield a UV luminosity ∼0.4 dex higher than Salpeter at a stellar metallicity of log (Z/Z ) = −2.Such a brightness increase is exactly what we find would be needed to bring our observed luminosity function in line with expectations.This interpretation was also considered by Harikane et al. (2022), who found that some top-heavy IMF models can boost the UV luminosity by ∼3-4× (e.g.Zackrisson et al. 2011).The top end of the expected UV luminosity boost was discussed by Pawlik et al. (2011), who predicted up to a factor of 10 increase in UV luminosity for a top-heavy IMF in a zero metallicity system.
If the interpretation of invoking a top-heavy IMF were correct, there would be a number of additional effects and consequences that may be detectable, thus providing an independent cross-check.One such diagnostic is the predicted boost to nebular line emission, including the occurrence of strong He II 1640 Å features (e.g.Bromm et al. 2001;Raiter et al. 2010), accessible via our upcoming CEERS spectroscopic follow-up.A qualitatively different signature of a top-heavy IMF would be a supernova rate that may be enhanced by up to an order of magnitude, per unit stellar mass, resulting in strong SN-driven galactic winds (e.g., Dayal & Ferrara 2018;Jaacks et al. 2019).
We note that the high observed galaxy number densities do not represent a fundamental problem for hierarchical structure formation models.For instance, using a Simba simulation with all explicit feedback processes turned off and no extinction results in ∼ 2× higher number densities than observed (even with a standard IMF), and the empirical Behroozi & Silk (2015) model can match the data.Hence the results do not necessarily challenge the ΛCDM paradigm, but rather our understanding of the physics of early galaxy evolution within that paradigm.

SUMMARY AND CONCLUSIONS
We have presented the results from a study of galaxies at z > 9 in the first epoch of NIRCam imaging from the Cosmic Evolution Early Release Science program.These imaging cover ∼35 arcmin 2 with seven photometric filters (six broadband and one medium band) covering ∼1-5 µm, reaching m 29.In addition to being some of the widest moderate-depth imaging available early in Cycle 1, these data probe a parameter space in wavelength and depth optimal for studying these early redshifts.
Following a detailed reduction of the data (with full details available in Bagley et al. 2022c) we measure photometry from all sources in the field.Using a combination of the F277W and F356W image as the detection images, we create a 13-band photometric catalog, inclusive of imaging in six HST filters in this field.We emphasize the calculation of robust colors, total fluxes, and flux uncertainties, using simulations to validate our choices in cataloging.We explore any systematic offsets in our photometry by comparing to the best-fitting model templates for ∼800 spectroscopically-confirmed sources in our field, finding that the large zeropoint offsets that affected early JWST/NIRCam studies have been resolved.We do measure (and apply) offsets from ∼1-5%, which are reflective of potential residual zeropoint corrections in the NIRCam imaging, adjustments to our estimations of the total fluxes, and potential mis-matches between the used template set and the SEDs of the real galaxies.
We estimate photometric redshifts for all sources in our catalog using EAZY.In addition to the standard set of 12 FSPS templates, we add six new templates designed to span the blue colors expected in very early galaxies, as well as very strong emission lines expected at low metallicities.We designed a set of selection criteria to robustly select galaxies at z > 9, balancing our desire to maximize completeness with our need to minimize contamination.These selection criteria demand both robust detection significance (signal-to-noise ratio > 5.5 in at least two filters, measured in 0.2 diameter apertures), as well as photometric redshift probability distribution functions that are strongly constrained to z > 8.5.
Following visual inspection to remove any remaining spurious sources (shown in the Appendix), our selection resulted in a sample of 26 robust candidate galaxies at z > 8.5.The majority of the sample (15 galaxies) lies at 8.5 < z < 10, while nine galaxies lie at 10 < z < 12. Two candidates lie at higher redshifts -the previously published z ∼ 16.5 source (Donnan et al. 2022), and a new candidate at z ∼ 13.While a full analysis of the properties of these galaxies is reserved for future work, their rest-frame UV SEDs are fairly uniformly blue, and the galaxies are compact, with a median half-light radius of 3.6 pixels (0.11 , or 0.46 kpc).We explore information from multi-wavelength constraints on these galaxies, and find no significant detections in either X-ray or submm/mm wavelengths, strengthening the conclusion that these sources reside at z > 9. We compare to the few previous searches for z > 9 sources in this field, and find that for sources in common between studies, the photometric redshifts agree quite well.Of the two known HST sources previously published in this field, we very robustly constrain the SED of one, placing it at the slightly lower redshift of z ∼ 8.1.The other is completely absent in our NIRCam data, and we conclude it was a likely supernova serendipitously captured by the HST data taken in 2013.
We estimate our sample completeness as a function of redshift and magnitude using source-injection simulations, and present an early look at the z ∼ 11 rest-UV luminosity function.We find that the abundance of modestly bright (M U V ∼ −20) galaxies at z ∼ 11 does not appear to be evolving from z ∼9 -11, which is unexpected given the steady decline in the abundance of such galaxies from lower redshifts to z = 8, though such non evolution at the very bright end from z = 8-10 had previously been discussed by Bowler et al. (2020).We then compare the surface density of our sources to a variety of model predictions, finding that even after accounting for several sources of systematic and random uncertainty, the observed abundance of galaxies is in significant excess of these predictions.
We explore several potential explanations for these unexpected results.While not the most exciting, significant sample contamination cannot be conclusively ruled out.These data represent our first foray into a new cosmic epoch, and spectroscopic confirmation of the redshifts to at least a subset of these ultra-high-redshift sources are necessary to gain confidence in our sample selection processes.However, such data will begin to flow soon, with CEERS scheduled to spectroscopically observe ∼10 of these sources in late 2022 (though these high redshifts may necessitate longer exposure times for future cycle programs).
Should these high abundances of z = 9-13 galaxies be confirmed, we explore what possible changes in the models could bring their predictions into agreement with observations.One very exciting possibility is that we are beginning to probe an era where star-formation in galaxies is dominated by a top-heavy IMF due to the presence of very low metallicities, which could increase the ratio of UV luminosity per unit halo mass.We explore the "excess" UV luminosity from our observations, both by comparing to the expected UV luminosity function based on extrapolations from lower redshift and through an abundance matching exercise, and find that our UV luminosities may be enhanced by 1.8-2.5×.This is very similar to the predicted excess UV emission from a lowmetallicity stellar population where the IMF peaks at 10-40 M (Tumlinson 2006) of a factor of ∼2.5×, implying a top-heavy IMF is a plausible physical explanation.We also discuss how potential changes to the dust attenuation, star formation law, galactic feedback, and resolution of numerical simulations could collectively contribute to reconciling our observations with model predictions.
These possibilities are exciting, and while one might expect that at z > 10 we should expect to see changes in star-formation physics such as a top-heavy IMF, our results are just a first glimpse, and the data available in the near future will provide much stronger constraints.Specifically, the remainder of Cycle 1 will see the creation of high-redshift galaxy samples orders of magnitude larger than we present here from the combination of the full CEERS survey, COSMOS-Web (PIs Casey & Kartaltepe), PRIMER (PI Dunlop) and NGDEEP (PIs Finkelstein, Papovich & Pirzkal), along with JADES (PI Rieke).Additionally, Cycle 1 will also see spectroscopic followup with NIRSpec of NIRCam identified sources from CEERS and JADES, and several Cycle 2 programs will certainly target these sources with deep observations.While it is early days with JWST, our first-look CEERS results provide an enthralling glimpse of the potential secrets the early universe has which our observations can unlock.

Facility: HST (ACS, WFC3)
Facility: JWST (NIRCam) We acknowledge that the location where this work took place, the University of Texas at Austin, that sits on indigenous land.The Tonkawa lived in central Texas and the Comanche and Apache moved through this area.We pay our respects to all the American Indian and Indigenous Peoples and communities who have been or have become a part of these lands and territories in Texas, on this piece of Turtle Island.We acknowledge support from NASA through STScI ERS award JWST-ERS-1345.We thank Marcia Rieke, Daniel Schaerer, Volker Bromm and Mike Boylan-Kolchin for helpful conversations.Note-All fluxes are in nJy.The horizontal lines distinguish our z > 12, z = 10-12 and z = 8.5-10 samples.

APPENDIX
A. FULL PHOTOMETRY In Tables 4 and 5 we list the measured photometry for our final sample.

B. Z ∼ SAMPLE PLOTS
Here we show the cutout images (Figure 16 and 17), and SED plots (Figure 18) for the z ∼ 9 sample as described in §5.3, presented here in the Appendix for clarity in the main text.

C. SOURCES REMOVED FROM SAMPLE
Figure 19 shows two objects removed from our sample after re-measuring colors in smaller apertures, due to the stretching of their Kron apertures by nearby bright sources.Figure 20 shows cutout images for the 36 sources removed as spurious sources following visual inspection described in §4.3.We list the coordinates for all removed sources in Table 6.Note-All fluxes are in nJy.The horizontal lines distinguish our z > 12, z = 10-12 and z = 8.5-10 samples.We do not include a column for F105W as none of these candidate galaxies were covered by the sparse amount of F105W imaging in this field.The small images show a 1.5 region around each source in the seven NIRCam bands (and the stacked ACS dropout bands), with the red circle denoting a 0.3 diameter region around the source.The P(z) from the colors measured in this small circular aperture prefers lower redshifts in both cases, thus these sources were removed from the sample.We conclude these are highly likely to be unflagged cosmic rays, which will be better flagged in future reductions (Bagley et al. 2022b).We note that 7/8 are in the CEERS3 or 6 pointings, which had fewer images with longer exposure times than CEERS1 and 2, leading to more numerous cosmic ray hits.
Figure3.The distribution of the best-fitting (minimized χ 2 ) photometric redshift versus apparent magnitude.Each object is color-coded by its redshift sample (with the background shading denoting the redshift range where the P(z) was integrated to determine the sample placement; the green circle far into in the blue region [CEERS6 7641] has a double peaked P(z)).The magnitude plotted is F150W for z ∼ 9, F200W for z ∼ 11, and F277W for z > 12 galaxies.The small black star denotes "Maisie's Galaxy"(Finkelstein et al. 2022b), known here as CEERS2 5429.The top axis shows "Cosmic Time" (the time since the Big Bang) for our adopted cosmology.

Figure 4 .
Figure4.A montage of images for the galaxy candidates identified here.For each galaxy the image shows as blue-green-red the three NIRCam bandpasses closest to the Lyα break, this includes: F150W, F200W, and F277W for galaxies at z = 9; F200W, F277W and F356W for galaxies in the z = 10-12 samples; F277W, F356W, and F444W for galaxies in the z ≥ 12 samples.In each case the images are 1.5 × 1.5 .The inset scale bar shows 1 (physical) kpc.We list the numerical IDs and the best-fitting photometric redshift values (the redshift uncertainties are listed in Table2).

Figure 5 .
Photo−z PDF

Figure 6 .
Figure 6.Similar to Figure 5, for the nine galaxies in the z ∼ 11 sample.

Figure 7 .
Photo−z PDF

Figure 9 .
Figure 9.The candidate galaxy EGS z910 40898, first published by Finkelstein et al. (2022a) as a z ∼ 9 galaxy candidate.This source was detectable by HST, albeit in only three filters and at only modest significance.Even in 50 hr of Spitzer/IRAC imaging, it was marginally detected.In just <3000 sec with JWST/CEERS, this source is extremely well-detected in all seven NIRCam filters, highlighting the power of JWST to probe the very early universe.In the top-middle panel we compare the previous P(z) to that now possible with JWST, finding a much sharper peak, centered at z ∼ 8.1 (the F115W−F150W color is only marginally red, implying a Lyα break at the blue edge of F115W).

F200WFigure 10 .
Figure 10.Left) Flux versus observational epoch for the object EGS z910 68560, first published by Finkelstein et al. (2022a) as a z ∼ 9 galaxy candidate.The colors denote the filter.The small circles denote the integrated CANDELS F125W and F160W fluxes, while the larger circles denote the fluxes measured in each epoch, 50 days apart.The object is detected in both epochs in both bands, fading by a factor of ∼2.5 and 2.0 in F125W and F160W, respectively, across this time interval.The object is not detected in any of the CEERS imaging; the inset image shows this position in the CEERS F200W image.This implies the object is >150× fainter now than it was when it was detected.Right) The limits on the host galaxy absolute magnitude, taken by applying the cosmological distance modulus to the CEERS upper limit.With the exception of the extremely low-redshift Universe, a wide range of host galaxy absolute magnitudes are plausible, making it difficult to constrain the redshift of this supernova, though analysis of the light curve implies a likely redshift of z ∼ 0.2-1.2.The inset panels show the images of this transient source in both CANDELS epochs in F125W and F160W.
Figure11.A comparison of our photometric redshifts to those for objects in our sample which were previously published as z > 8.5 galaxy candidates from CEERS data in the literature, showing good agreement (not shown is the z ∼ 16 candidate, where the agreement is good between the three studies who have published it).As discussed in §6.2, for literature sources not present in our sample, we find in our catalog that they typically miss our detection significance and/or ∆χ 2 criteria, though often the P(z > 10) is non-zero.

Figure 13 .
Figure13.The rest-frame UV luminosity function at z ∼ 11, shown as the red circles (the open circle denotes our faintest bin, where we are <30% complete).Each galaxy's magnitude and magnitude uncertainty is denoted by a small circle and line at the top of the figure.The light red symbols show literature constraints from JWST data, from GLASS(Naidu et al. 2022b), CEERS+GLASS(Donnan et al. 2022, , who also used UltraVISTA), and the HUDF(Bouwens et al. 2022b).The light blue points show a compilation of data from the literature at z ∼ 9-10.Circles denote results from studies which used (modestly) deep imaging from surveys such as CANDELS and the Hubble Frontier fields, includingMcLeod et al. (2015),Oesch et al. (2018), andBouwens et al. (2019Bouwens et al. ( , 2021)).The squares denote studies making use of HST pure parallel surveys, includingBernard et al. (2016),Morishita et al. (2018) andRojas-Ruiz et al. (2020).The triangles denote results from wide-area ground-based studies ofStefanon et al. (2019) andBowler et al. (2020).The blue lines show the evolving double power-law (DPL) luminosity functions fromFinkelstein & Bagley (2022)  at z = 4-8 (this model was fit to data at z = 3-9).The darker shaded gray region show the predictions from these DPL fits extrapolated to z = 9 (upper bound) -12 (lower bound); the lighter gray (outlined with dashed lines) shows a similar extrapolation from the evolving Schechter function fits fromFinkelstein (2016, ;  this model was fit to data at z = 4-8).The inset shows the stacked P(z) of the galaxies used in this luminosity function, as well as the redshift distribution estimated from the completeness simluations at MUV = −20.The observed z ∼ 11 luminosity function is consistent with the top end of both smooth extrapolations, implying that the observed smoothly UV luminosity function evolution from z = 4 to z = 9 may be slowing at z ∼ 11.

Figure 15 .
Figure15.The relation between rest-UV absolute magnitude and halo mass, obtained via abundance matching the observed UV luminosity functions assuming the DPL fits fromFinkelstein & Bagley (2022)  for z = 4-8, and extrapolated to z ∼ 11 (the thick blue curve is equivalent to the average of the darker gray shaded region in Figure13).For our observed z ∼ 11 UV luminosity function we use the z = 9 DPL fit fromFinkelstein & Bagley (2022)  as it is very consistent with our observations (Figure13).This relation implies that our observed galaxies at MUV = −20 (where our observations place the tightest constraints) have a host halo mass of log(M h /M ) = 10.55.At this halo mass, the expected UV luminosity based on the expected UV luminosity function is 0.6-0.7 mag fainter.This implies that based on the observed abundances, our observed galaxies are 1.8× more UV luminous than expected from extrapolation of HST results.While larger sample sizes and spectroscopic confirmations are needed to have greater confidence in the z ∼ 11 luminosity function, should these results be confirmed it implies that z > 10 galaxies are more luminous in the rest-UV than expected.One possible explanation could be an increasing prevalence at higher redshifts of a top-heavy initial mass function, which is predicted to dominate at very low metallicities.

Figure 16 .
Figure 16.Similar to Figure 5, showing the eight candidates with best-fit photometric redshifts of z ∼ 9 and a F277W magnitude brighter than 27.6.

Figure 17 .Figure 18 .
Figure 17.Similar to Figures 5, showing the nine candidates with best-fit photometric redshifts of z ∼ 9 and a F277W magnitude fainter than 27.6.

Figure 19 .
Photo−z PDF

Figure 20 .
Figure20.This compilation shows 3 × 3 cutout images of the 36 sources identified as spurious from our visual inspection of the initial list of candidates.The majority of these spurious detections come from very near to image edges, which are easily identifiable (and can in the future be automated).Six objects are obvious diffraction spikes.The remaining eight sources are very compact and boxy, and visible only in the LW channels.We conclude these are highly likely to be unflagged cosmic rays, which will be better flagged in future reductions(Bagley et al. 2022b).We note that 7/8 are in the CEERS3 or 6 pointings, which had fewer images with longer exposure times than CEERS1 and 2, leading to more numerous cosmic ray hits.

Table 1 .
Imaging Data Summary

Table 2 .
Summary of z > 9 Candidate Galaxies

Table 5 .
HST ACS and WFC3 Photometry for z > 8.5 Galaxy Sample