Bright z ∼ 9 Galaxies in Parallel: The Bright End of the Rest-frame UV Luminosity Function from HST Parallel Programs

The abundance of bright galaxies at z > 8 can provide key constraints on models of galaxy formation and evolution, as the predicted abundance varies greatly when different physical prescriptions for gas cooling and star formation are implemented. We present the results of a search for bright z ∼ 9–10 galaxies selected from pure parallel Hubble Space Telescope (HST) imaging programs. We include 132 fields observed as part of the Brightest of Reionizing Galaxies survey, the Hubble Infrared Pure Parallel Imaging Extragalactic Survey, and the WFC3 Infrared Spectroscopic Parallel survey. These observations cover a total of 620 arcmin2, about 70% of which is also covered with Spitzer Space Telescope infrared imaging. We identify 13 candidate galaxies in the range 8.3 < z < 11 with 24.5 < m H < 26.5 (−22.9 < M UV < −21.2), 11 of which constitute new discoveries. This sample capitalizes on the uncorrelated nature of pure parallel observations to overcome cosmic variance and leverages a full multiwavelength selection process to minimize contamination without sacrificing completeness. We perform detailed completeness and contamination analyses, and present measurements of the bright end of the UV luminosity function using a pseudobinning technique. We find a number density consistent with results from Finkelstein et al. and other searches in HST parallel fields. These bright candidates likely reside in overdensities, potentially representing some of the earliest sites of cosmic reionization. These new candidates are excellent targets for follow up with JWST, and four of them will be observed with the NIRSpec prism in Cycle 1.


Introduction
The study of galaxies in the very early Universe, and particularly of UV-bright galaxies, provides key input to models of galaxy formation.The number density of UV-bright galaxies is affected by factors including feedback processes (e.g., Somerville et al. 2008;Bower et al. 2012), dust attenuation (e.g., Finkelstein et al. 2012a;Vogelsberger et al. 2020), the buildup of dark matter halos, and star formation efficiency.Constraining the abundance of UV-luminous galaxies thus directly constrains the fundamental physics of star formation in the early Universe.By probing ever larger volumes, recent ground and space-based surveys are providing a progressively more complete analysis of the abundance of bright galaxies.These studies are showing that the characteristic luminosity (L * ) of the rest-frame UV luminosity function only shallowly evolves (if at all) in the range z = 4-8 (Bowler et al. 2014(Bowler et al. , 2015(Bowler et al. , 2020;;Bouwens et al. 2015;Finkelstein et al. 2015b).As the UV luminosity function probes recent star formation, these results imply that star formation rates evolve more slowly than the halo mass function, which would indicate that galaxies are efficient at converting gas into stars even out to these redshifts (e.g., Behroozi & Silk 2015;Finkelstein et al. 2015c;Yung et al. 2019aYung et al. , 2019b; though see Stefanon et al. 2021).If this observational trend continues to higher redshifts, it will provide a significant challenge to current galaxy formation models.
Considerable effort has been devoted to modeling galaxy evolution at early cosmic times (z  6), via methods such as hydrodynamical simulations (e.g., Gnedin 2016;Wilkins et al. 2017), (semi)analytic models (e.g., Somerville et al. 2015;Yung et al. 2019aYung et al. , 2019b)), and empirical models (e.g., Mason et al. 2015a;Behroozi et al. 2020).Yet predictions for the density of galaxies at z > 9 can differ dramatically.As the predicted number of UV-bright galaxies is sensitive to the assumed relationship between molecular gas density and the star formation rate density in cosmological simulations, robust observations in this epoch will provide powerful constraints on these physical processes.Analyses at z > 8 Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 licence.Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
continue to push the limits of what is possible with current data.The majority of candidates at these redshifts (e.g., Coe et al. 2013;Oesch et al. 2013Oesch et al. , 2014;;Bouwens et al. 2015Bouwens et al. , 2016;;Finkelstein 2016;Bouwens et al. 2019) were selected from a small set of well-studied fields such as the Cosmic Assembly Near-infrared Deep Extragalactic Legacy Survey (CANDELS; Grogin et al. 2011;Koekemoer et al. 2011), the Hubble Ultra Deep Field (HUDF; Beckwith et al. 2006), and the Hubble Frontier Fields (Lotz et al. 2017).Results measured from a few independent pointings are susceptible to cosmic variance, which can dominate the uncertainties at the bright end of the UV luminosity function.Thus, the density of UV-bright galaxies is presently ill constrained at z > 8, despite its importance for understanding high-redshift star formation and galaxy evolution.
Pure parallel programs-where random field pointings result in completely independent, uncorrelated observations-provide a key opportunity to address these outstanding questions.Often shallower than targeted surveys, pure parallel programs are well suited to identifying bright high-redshift galaxies-a population of rare sources that may not be fully sampled by surveys covering the same area in contiguous fields.Parallel observing can therefore provide better constraints on the bright end of the UV luminosity function and is complementary to deeper surveys that probe the fainter population.
Pure parallel programs such as the Brightest of Reionizing Galaxies (BoRG; PI: Trenti; General Observer [GO] programs 11700 and 12572) and the Hubble Infrared Pure Parallel Imaging Extragalactic Survey (HIPPIES, PI: Yan; GO 11702 and 12286) were indeed designed for just this purpose.Using WFC3 imaging in at least four filters to probe the Lyman break at z ∼ 8, these surveys drastically increased the known sample of bright galaxies and provided crucial constraints on the luminosity function (Trenti et al. 2011;Bradley et al. 2012;Trenti et al. 2012;Schmidt et al. 2014b).The independent pointings significantly reduce the effect of cosmic variance from  20% to <1% (Trenti & Stiavelli 2008;Bradley et al. 2012), providing comparable constraining power as a similardepth survey with twice the area but in a single pointing (Bradley et al. 2012).Studies using the BoRG[z910] fields, which included JH 140 -band imaging, find a relatively high number density of bright z ∼ 9-10 galaxies (Calvi et al. 2016;Morishita et al. 2018;Rojas-Ruiz et al. 2020), potentially indicating more efficient star formation at these early times.Similar results have been seen in the CANDELS fields (Finkelstein et al. 2022), though Oesch et al. (2018) and Bouwens et al. (2021) find a relatively low number density of z > 8 galaxies in the same fields.Significant scatter in highredshift samples remains, demonstrating the systematic uncertainty resulting from different sample selection techniques (i.e., strict color cuts versus photometric redshift selection).While JWST will quickly shed light on this tension, it is not optimized to conduct wide-field surveys for new bright, high-redshift galaxies.The power of JWST will lie in targeted follow up for redshift confirmation and source characterization.
To that end, we have performed a new independent selection for z = 9-10 galaxies in a subset of available Hubble Space Telescope (HST) pure parallel fields, improving on previous efforts by including all available HST and Spitzer/IRAC photometry in our photometric redshift selection.Here we consider fields from the BoRG/HIPPIES surveys that do not include the JH 140 filter, and perform an analysis similar to that presented by Rojas-Ruiz et al. (2020).We intend Rojas-Ruiz et al. (2020) and this paper taken together to cover the full BoRG survey with a consistent approach to z > 8 galaxy selection.We also included in our search 45 fields (∼160 arcmin 2 ) from the WFC3 Infrared Spectroscopic Parallel (WISP) survey (PI: Malkan; GO 12283, 12902, 13352, 13517, and 14178), the first time these fields have been searched for z ∼ 9 candidates.
While many have performed searches for z > 9 galaxies in the BoRG/HIPPIES fields (Bernard et al. 2016;Calvi et al. 2016;Morishita et al. 2018;Bridge et al. 2019;Morishita et al. 2020;Rojas-Ruiz et al. 2020;Morishita 2021;Roberts-Borsani et al. 2022), the results at the bright end of the UV luminosity function remain uncertain due in part to varied selection techniques probing close to the image detection limits.In fact, these teams find different samples of galaxy candidates in the same fields, highlighting how sensitive HST results at z  9 are to photometric measurements, noise characterizations, and sample selections.In our search, we employ a full multiwavelength selection (independent of a simple J 125 − H 160 color cut) including Spitzer/IRAC photometry where available, with detailed noise calculations and source vetting to remove contaminants-resulting in a larger, more complete sample.This sample includes some of the brightest candidates at these redshifts, potentially representing some of the most massive galaxies to form 500 Myr after the big bang.
This paper is organized as follows.In Section 2, we present the pure parallel imaging data sets and describe our photometric measurements.We describe our sample selection criteria in Section 3, including a detailed discussion of how we vetted each candidate, performing a thorough check for cases of persistence (Section 3.3) and a careful visual inspection (Section 3.4).We incorporate available Spitzer/IRAC imaging of each candidate in Section 3.5, and check for stellar contamination in Section 3.7.In Section 4, we present our 13 high-redshift candidate galaxies and compare them to previous studies of the same HST fields.We address the possibility of low-redshift contamination in Section 5, and use our sample to infer the density of galaxies at the bright end of the UV luminosity function in Section 6.Finally, we discuss the implications of our sample of bright, high-redshift galaxies in Section 7 and summarize in Section 8. Throughout this paper we assume a Λ cold dark matter (ΛCDM) cosmology with Ω M = 0.3, Ω Λ = 0.7, and H 0 = 70 km s −1 Mpc −1 and express all magnitudes in the AB system (Oke & Gunn 1983) unless otherwise noted.

Observations
We include observations from multiple programs, all obtained as part of parallel imaging programs with WFC311 (Kimble et al. 2008) while another HST instrument was in use.Typically these parallel observations are taken while either the Cosmic Origins Spectrograph (COS; Froning & Green 2009) or the Space Telescope Imager and Spectrograph (STIS; Kimble et al. 1998) are engaged in long integrations of a primary target.Pure parallel programs (unaffiliated with the primary programs) can accrue observations of independent and uncorrelated fields during these long primary integrations.In this paper, we combine data from a subset of the BoRG survey (Trenti et al. 2011), HIPPIES (Yan et al. 2011), the WISP survey (Atek et al. 2010), and additional coordinated parallels from the COS Guaranteed Time Observer (GTO) program.
The programs include imaging with both the UVIS and IR cameras on WFC3, as well as some optical imaging with the Advanced Camera for Surveys (ACS; 12 Clampin et al. 2000)  for HIPPIES observations from Cycle 18.All fields are observed with the F160W filter, which we use for detection, and three additional filters covering wavelengths from ∼0.6-1.4 μm.True z  9 galaxies will be detected in F160W and at most one additional filter (either F125W or F110W depending on the survey in question), and the optical imaging is crucial in detecting the Lyα (Lyα) break at 1216 Å.By z ∼ 10, F160W is the only filter available in these programs that is redward of the Lyα break.
In total, we consider 132 WFC3 parallel fields covering ∼620 arcmin 2 : 64 BoRG[z8] fields (comprised of 34 fields from the BoRG survey, eight HIPPIES fields from GO program 11702, and 22 parallel pointings from the COS GTO program), 23 additional HIPPIES fields (GO 12286), and 45 fields from the WISP survey.The locations of all 132 fields are shown in Figure 1, where the independent nature of the pointings is clear.The observations in each field depend on the survey strategies of each program and the length of each parallel opportunity.The fields we assemble here therefore have inconsistent depths and filter coverages, which we handle with careful field-specific simulations to assess the completeness and effective volume probed in each field.We provide a summary of the data sets in Table 1 and field-specific information about all 132 fields, including filter coverage and imaging depths, in Table 2.
The field areas and filter depths are calculated as follows.We measure the area covered in each pointing by summing the pixels in the F160W weight maps with values greater than 100 counts s −1 (=rms < 0.1), as this allows us to account for the area lost to cosmic rays, detector artifacts, or other bad pixels.The imaging depths presented in Table 2 are the 5σ limiting magnitudes as measured in circular apertures with radius r = 0 2 via the following method.We place ∼5000 −10,000 uncorrelated apertures randomly across the background of each image, avoiding source flux and bad pixels.The limiting magnitude quoted in each filter is found by measuring the standard deviation of the distribution of fluxes from all apertures, and thus represents the average depth across the full image.The depth measured in this way can vary across a field by up to ±0.3 mag from the median, and so the values quoted here are not used in our analysis.We rely on the noise measured locally around each source (see Section 2.4.1) to determine detection significances that we use when selecting high-redshift candidate galaxies (Section 3.2).We note that we have not performed an aperture correction to account for the fractional flux enclosed by the r = 0 2 apertures when calculating these limiting magnitudes.
We briefly explain the characteristics and data reduction of each data set in the following sections.We hereafter refer to the HST filters using a letter that specifies the bandpass followed by a three-number identification: i.e., I 600 for the long pass filter F600LP, H 160 for F160W.

BoRG[z8]
We include in our analysis data from the first set of fields observed as part of the BoRG survey (PI : Trenti;Trenti et al. 2011).These fields were observed in HST Cycles 17 and 19 (GO 11700 and 12572) with the Y 098 , J 125 , and H 160 IR filters and the V 606 UVIS filter (see top row of Figure 2).The BoRG data releases also include data from programs with similar observing strategies and filter coverages: HIPPIES (PI: Yan, GO 11702; Yan et al. 2011) and coordinated parallels from the COS GTO team (PI: Green, GO 11528, 11530, 11534, 11541, 12024, and 12025), though the HIPPIES fields as well as some of the COS parallels were observed with the I 600 filter instead of V 606 .Collectively, these three programs cover 71 fields, with data and results described in detail in Trenti et al. (2011), Bradley et al. (2012), and Schmidt et al. (2014b).These surveys were optimized to detect z ∼ 8 galaxies as Y 098 dropouts with detections in both J 125 and H 160 and are therefore referred to collectively as BoRG[z8].This designation distinguishes the first set of BoRG observations with those from later HST cycles that included the JH 140 filter.Imaging with JH 140 allows for the detection of z  9 galaxies as J 125 dropouts with two-band detections redward of the break.The BoRG[z910] fields have been used to identify z  9 candidates in multiple studies (Calvi et al. 2016;Morishita et al. 2018;Bridge et al. 2019;Morishita et al. 2020;Morishita 2021;Rojas-Ruiz et al. 2020;Roberts-Borsani et al. 2022).In this paper we focus on the BoRG[z8] fields, and perform an analysis very similar to that presented in Rojas-Ruiz et al. (2020).This paper is complementary to that of Rojas-Ruiz et al. (2020), and taken together our two papers present a consistent selection of high-redshift galaxies across the full BoRG survey The independent, uncorrelated nature of the fields-ideal for studies of rare, bright galaxies-is evident from their locations shown here.We use the reduced images and weight maps available from the BoRG data Delivery 3 from MAST. 13 These images are all pixel aligned and on a scale of 0 08 pixel −1 .We convert the weight maps (actually inverse variance maps output by the WFC3 pipeline, CALWFC3) to rms maps for use with source detection via = / rms 1 WHT .Pixels with zero weight are assigned a value of 10 4 in the rms maps.
After visually inspecting all 71 BoRG[z8] fields, we chose to remove seven fields from consideration in our analysis.Three of the fields-Par0540-6409, Par0553-6405, and Par1815-3244 -appear to cover star clusters, resulting in a very high density of point sources across the full image.Two others-Par1014-0423 and Par1230+0750-are heavily contaminated by strong detector persistence, likely resulting from observations of bright star clusters immediately preceding the BoRG[z8] exposures.We remove the remaining two-Par0751+2917 and Par1209+4543-because they were also observed in Cycle 22 with additional filters (F350LP and JH 140 ) and so are part of the BoRG[z910] survey.One field, Par0110-0224, was covered by both the BoRG and HIPPIES GO 11702 programs and is therefore partially covered by five filters (including both V 606 and I 600 ).We separated this field into two parts-one with and one without I 600 -which are labeled a and b, respectively, in Table 2.We count this as a single field in the total (of the 132), but treat them as separate fields in all other aspects of our analysis.Specifically, we perform separate completeness simulations to account for the varying filter coverage and depth across the field.Finally, a few fields are partially affected by significant detector issues or other contamination, including scattered light from Earth's limb, highly variable background across part of the images, and extremely strong Dragon's Breath (Fowler et al. 2017) in the UVIS filters.In these cases, we mask out the affected regions in all filters and weight maps, therefore accounting for the corresponding loss in total survey area.The field areas listed in Table 2 reflect the unaffected, unmasked portions of these fields.In total, we include 64 BoRG[z8] fields in our analysis, covering ∼308 arcmin 2 .fields are observed with one or the other of these filters.In place of J 125 , the WISP survey observed with the broader J 110 , making WISP fields less sensitive to galaxies at z ∼ 9 than at z ∼ 10.For reference, we show a model galaxy spectrum at z = 9 and z = 10 with a weak Lyα emission line.At z = 9, the Lyman break is near the center of the J 125 , while by z > 10 it has redshifted almost completely out of the filter, resulting in single-band H 160 detections.

HIPPIES
The HIPPIES observations from Cycle 17 were reduced and released as part of BoRG[z8] (see the previous section), due to the similar filter sets and observing strategies of the two programs.However, HIPPIES observations from Cycle 18 (PI: Yan, GO 12286) were not included, and so we count them here as a separate data set.This program used the near-infrared (NIR) filter Y 105 rather than the medium-band Y 098 , and also used I 600 as the bluest filter.
We downloaded the FLT files and supporting files for this data set from MAST and created reduced mosaics following the same procedure outlined in Rojas-Ruiz et al. (2020), which we briefly summarize here.The final, drizzled images are produced using a pipeline custom built for use with HIPPIES data, with the modifications necessary to account for the added challenges of undithered, pure parallel observations.The pipeline uses an Markov Chain Monte Carlo (MCMC) sampler to align individual images based on detected source positions.It then creates mosaic images with a pixel scale of 0 1 pixel −1 using the MultiDrizzle software package (Koekemoer et al. 2003).The output weight maps are then scaled by the average amplitude of the correlation measured in blank regions of the science images in order to produce rms maps.
We rejected a handful of pointings that are severely impacted by features such as scattered Earth light, and we masked portions of two others.We also separated field Par2346-0021 into two parts due to inconsistent filter coverage and imaging depth across the field, similar to that described for Par0110-0224 for the BoRG[z8] data set.We include 23 fields (∼108 arcmin 2 ) from the HIPPIES GO program 12286.

WISP
Similar to the programs described above, WISP (PI: Malkan; Atek et al. 2010) obtained imaging of uncorrelated fields through pure parallel observations.However, WISP also acquired slitless spectroscopy using WFC3ʼs two NIR grisms G102 and G141.Primarily a spectroscopic program, WISP devoted the majority of available integration time to grism observations and obtained imaging mainly to aid in extracting and calibrating spectra from the slitless grism images.Yet for long parallel opportunities consisting of 5 orbits, WISP pointings were observed with four imaging filters along with both grisms.We include 45 of these "deep" WISP pointings (∼204 arcmin 2 ) in our analysis, representing the first search for z > 8 galaxies in WISP observations.The WISP pointings are observed with two IR filters (J 110 and H 160 ) and the UVIS filter I 814 .Most of the fields are also observed with the UVIS filter V 606 .
The main difference between the WISP observations we include here and those of the other data sets is the use of the J 110 filter rather than J 125 .The considerable width of the J 110 filter (FWHM ∼ 5000 Å, compared to ∼3000 Å for J 125 ) makes it challenging to identify sources at z ∼ 8-9 via the dropout technique.Galaxies at redshifts of z ∼ 7-10 will all be detected in J 110 , and the resulting photometric redshift probability distributions are too broad to distinguish galaxies at the high-redshift end of this interval reliably from those at the lower end.We account for this selection effect with our field-specific completeness simulations.Additionally, all 45 WISP fields were observed at 3.6 μm with Spitzer/IRAC, which helps constrain the photometric redshifts (see Section 3.5).The detection power for WISP fields returns at z ∼ 10, when sources drop out of the J 110 filter and become single-band H 160 detections.
The WISP data reduction pipeline is presented by Atek et al. (2010) and Battisti et al. (2024), and uses the WFC3 pipeline CALWF3 with modifications required to address the specific challenges of parallel observing.We use the reduced data available through the MAST archive14 that were produced with version 6.2 of the WISP pipeline.Pixel-aligned images in all filters are available at 0 04 and 0 13 pixel scales, and we adopt the 0 13 scale for this analysis because it better samples the point-spread function (PSF)-with the undithered parallel data there are not enough exposures to cover the IR pixels at the finer resolution.

Source Extraction and Photometry
We detect sources and perform photometry in all three data sets using Source Extractor (v2.5;Bertin & Arnouts 1996) in dual image mode with H 160 as the detection filter and the rms maps as the weights.For source detection, we use the parameter values DETECT_THRESH = 1.1σ and DETECT_MINAREA = 4 pixels, which were chosen via source injection simulations as described in Appendix C. We found that the same set of detection parameters optimizes source detection (maximizing the fraction of recovered simulated sources while minimizing the spurious fraction) for all three data sets down to a magnitude of m H160 < 27.Therefore, although the images are on different pixel scales, we use one consistent set of Source Extractor detection parameters for all fields.
Photometry is performed in both circular and elliptical apertures, with each measurement serving a different purpose in our sample selection (see Section 3.2).We use small circular apertures with radius r = 0 2 to measure the detection significance of sources in all filters, because they capture the light in the core of sources while minimizing the contribution of off-source noise.We use small elliptical Kron (1980)-like apertures to measure source colors, setting the Source Extractor parameter PHOT_AUTOPARAMS to (1.2,1.7),where the first number is the scaling parameter for the ellipse and the second is the minimum radius below which elliptical apertures are converted to circular.This choice is motivated using our source injection simulations described in Appendix C. As shown by Finkelstein et al. (2010), these choices are motivated by the need to maximize both the recovered flux and the signal-to-noise ratio (S/N) for faint, compact galaxies.Additionally, the smaller aperture scaling parameter (here 1.2 compared to the Source Extractor default 2.5) reduces the number of small sources with elliptical apertures that are artificially stretched to larger sizes by bright neighbors in crowded regions.We derive aperture corrections for these small elliptical apertures in Section 2.4.2.

Photometric Errors
As the noise in the pixels of drizzled images is partially correlated, the total noise in an aperture is larger than the sum in quadrature of the noise in individual (uncorrelated) pixels (Casertano et al. 2000).The flux uncertainties calculated by Source Extractor are therefore underestimated, and the S/N measurements based on these flux uncertainties will be artificially high.We therefore perform our own calculation of the noise in the images following the process described by (Papovich et al. 2016; see also Labbé et al. 2003;Blanc et al. 2008;Whitaker et al. 2011), which we briefly summarize here.
In an image with completely uncorrelated pixels, the noise σ N measured in an aperture with N pixels is expected to scale as , where σ b is the standard deviation of the pixels containing only the sky background.At the other extreme, the noise will scale as σ N = σ b × N for completely correlated pixels (Quadri et al. 2007).We thus expect the noise to scale as σ N ∝ N β , with 0.5 < β < 1 to reflect the two limiting cases.For our analysis of the noise as a function of aperture size, we place ∼5000 nonoverlapping circular apertures randomly across each image, using the segmentation and rms maps to avoid both detected source flux and bad pixels.The apertures have radii ranging from 0.5 pixels to 12 pixels.We use Photutils (v0.6;Bradley et al. 2019) to measure the flux in these apertures and estimate σ N via the normalized median absolute deviation (Beers et al. 1990) as a robust estimator of the standard deviation in the distribution of aperture fluxes.We then fit the following parameterized function to the relation between σ N and N: where α and β are free parameters.For σ b , we again use the normalized median absolute deviation to estimate the pixel-topixel variation in background pixels in the image.We show the measured σ N and fits for an example field in Figure 3.For each source, filter, and aperture type, we use Equation (1) to calculate a flux error determined by the number of pixels in the apertures.Finally, we scale the fit by the value of the rms map at the position of the source divided by the median of the rms map.This scaling is small (the median factor is ∼1.05), but it preserves any location dependence in the noise that may be present in the images.We include Poisson photon errors in this scaling parameter, though we note that including this shot noise increases the flux uncertainties by only ∼3%-6%.

Photometric Corrections
We apply three correction factors to the catalog in order to account for (1) the missing wings of the PSF in the small Kron apertures (i.e., aperture corrections), (2) any unrecovered source flux due to the source extraction parameters, and (3) Galactic extinction.These corrections are applied in all filters to both the measured fluxes and flux uncertainties, thus preserving the optimized S/N in the smaller apertures while correcting for any missing or extinguished flux.First, we derive aperture corrections in H 160 as the ratio of the fluxes calculated in the small Kron apertures with those using larger Kron apertures corresponding to PHOT_APERTURES = (2.5, 3.5).The median flux ratio in the full catalog is ∼0.69 (i.e., source fluxes are ∼31% higher when measured in the larger apertures), though the aperture corrections are calculated and applied on a source-by-source basis.Next, we use the simulated sources from our completeness simulations (see Section 6.2) to quantify any unrecovered source flux.We find that the median recovered H 160 brightness of sources detected with S/N 5 in all data sets is ∼0.14 mag fainter than the input brightness after application of the aperture correction.We apply this correction factor to the total (aperture-corrected) fluxes and uncertainties, obtaining a final estimate of the total source flux.Lastly, we correct for Galactic extinction using the color excess E(B − V ) for each parallel field taken from the IRSA Galactic Dust Reddening and Extinction calculator. 15his service provides position-based reddening estimates from Schlafly & Finkbeiner (2011).Following Schlafly & Finkbeiner (2011), we use the Fitzpatrick (1999) extinction curve for the Milky Way, with R V = A V /E(B − V ) = 3.1, where A V is the extinction in magnitudes in the V band, to calculate the extinction at the central wavelength of each HST filter.
Finally, we perform a photometric correction on a small subsample of sources in the catalog to account for elliptical apertures that were stretched or overly elongated.This aperture stretching can occur for faint objects with very close, bright, neighboring sources.In these cases, while the fluxes measured in the small Kron apertures may be affected, the aperture corrections, which rely on the larger Kron aperture, are particularly unreliable.Following Finkelstein et al. (2015b), we identified sources in the catalog that needed this correction as those with a f AUTO /f APER flux ratio larger than the 95th percentile of all sources in their H 160 magnitude bin.Here, f AUTO refers to the flux measured in the elliptical Kron apertures, and f APER is that measured in the circular r = 0 2 apertures.We also required that the sources had a neighbor within 1″.We then determined the median aperture corrections required to go from the r = 0 2 f APER fluxes to the aperturecorrected f AUTO fluxes in bins of H 160 magnitude.We applied this correction to the f APER fluxes and uncertainties of the affected sources.For these sources only, we use the corrected circular aperture fluxes (rather than the Kron fluxes) when measuring source colors.This correction affected only one of the sources in our sample of high-redshift candidates (see Section 4).+7654 as a function of aperture size.The noise, σ N , is calculated as the normalized median absolute deviation of background flux measured in uncorrelated circular apertures placed on blank regions of the field.We show σ N for the filters in this field as blue triangles (I 600 ), green diamonds (Y 098 ), orange squares (J 125 ), and red circles (H 160 ).The functional fits to these measurements (Equation ( 1)) are plotted as curves of the corresponding colors.The black dashed and dotted-dashed lines indicate the cases of completely correlated pixels (β = 1) and completely uncorrelated pixels (β = 0.5), respectively.The noise in these images is partially correlated in all filters.Neglecting to correct for this correlation would result in underestimated flux uncertainties and overestimated S/Ns in our catalog.

Sample Selection
We select candidate z ∼ 9-10 galaxies through a combination of criteria related to detection significance, photometric redshift fitting, and visual inspection.Whereas many have selected high-redshift galaxies through strict color cuts (e.g., Trenti et al. 2011;Bradley et al. 2012;Oesch et al. 2013;Schmidt et al. 2014b;Bouwens et al. 2015;Bernard et al. 2016;Bouwens et al. 2016), we use a more holistic approach by including all available HST and Spitzer imaging in our photometric redshift selection (as in, e.g., McLure et al. 2010;Finkelstein et al. 2015b;Bouwens et al. 2019;Bowler et al. 2020;Rojas-Ruiz et al. 2020;Bouwens et al. 2021;Finkelstein et al. 2022).As the Lyman break is the main spectral feature within the HST filter coverage for galaxies at z  8, the photometric redshift fitting is akin to a Lyman-break selection.However, this approach allows for the selection of sources that might lie right on the edge or even outside of a selection window in color-color space.Yet in constructing a potentially more complete sample, we must take extra care with contaminants.We therefore aim to be conservative and as objective as possible in our process for source vetting, removing subjective steps entirely where possible and quantifying our subjectivity where not.We describe the aspects of our high-redshift galaxy selection and source vetting in the following sections.

Photometric Redshifts
We use the Easy and Accurate z phot from Yale (EAZY; version 2015 May 8; Brammer et al. 2008) software to calculate photometric redshifts for each source, real and simulated.The EAZY code fits all photometric measurements to a synthetic spectral energy distribution (SED) template via a chi-squared minimization process.In deriving a best-fitting template, it can consider linear combinations of templates from a user-supplied set, and the resulting fits are therefore less dependent on the choice of input templates.We use the 12 tweak_fsps_QSF_12_v3 EAZY templates that are based on the Flexible Stellar Population Synthesis (FSPS) models (Conroy et al. 2009;Conroy & Gunn 2010).We also add the spectrum of Q2343-BX418, a young (<100 Myr) galaxy at z = 2.3 with high equivalent width nebular emission lines (Erb et al. 2010).We include this low-mass (M * ∼ 10 9 M e ), lowmetallicity (Z ∼ 1/6 Z e ), blue (UV continuum slope β = −2.1)template as galaxies with blue colors are expected at high redshift (e.g., Bouwens et al. 2009;Finkelstein et al. 2012b).We also add a version of this spectrum with all Lyα emission removed to approximate attenuation by a neutral intergalactic medium (IGM) while still providing a template with strong optical emission lines that can affect Spitzer/IRAC colors (e.g., [O III] and Hβ for galaxies at z ∼ 8-9).We allow EAZY to construct best-fitting templates through linear combination pulling from all 14 of these input templates.
In fitting templates over a grid of redshifts, EAZY compares the input source fluxes and flux errors with the synthetic fluxes of the templates convolved through each filter.We use the total fluxes computed in the small elliptical Kron apertures as the input source fluxes (see Sections 2.4 and 2.4.2), with flux uncertainties computed as described in Section 2.4.1.We also add a minimum fractional error of 0.05 to the flux uncertainties in all filters (EAZY parameter SYS_ERR) to allow for a systematic uncertainty in our flux measurements.Template fitting is performed in the redshift range 0.01 z 12 in steps of Δz = 0.01.Since our source selection is performed with HST, the completeness falls off rapidly for z > 12 (where the Lyα break is almost out of the H 160 filter), and so we note that our results are unchanged if we fit to z = 15.At each redshift step, EAZY applies IGM attenuation following Madau (1995) and includes absorption by the Lyα forest and damped Lyα systems as prescribed by Inoue et al. (2014).As the population of galaxies at redshifts 8−10 is currently not well understood, we assume a flat luminosity prior.This approach has the benefit of not biasing us against potential true high-redshift sources, yet also means we are treating sources at all redshifts the same, i.e., a galaxy at z ∼ 2 is equally likely as one at z ∼ 10.We must therefore take extra care in considering the contamination fraction of lower-redshift galaxies in our sample, which we discuss in Section 5.

Selection Criteria
We begin by imposing an initial set of detection criteria on the full Source Extractor catalog using the S/N as measured in circular apertures of radius r = 0 2.These small apertures capture the light at the central core of the source positions as detected in the H 160 image and therefore maximize the measured S/N.First, we require that sources be detected in the H 160 band with an S/N H > 5.As we are aiming to select z  9 galaxies, we expect the sources to have "dropped out" of the V, I, and Y filters.We therefore only consider sources with an S/N < 2 in all available filters blueward of the J band (see Figure 2), permitting an S/N > 1.5 in at most one of these filters.This criterion allows for a detection up to S/N = 2 in one, but not both, of the bluer filters, such that a coincidental <2σ noise fluctuation in an optical filter or a partial detection in, e.g., Y 105 for 8.5  z  9 will not disqualify a source.
We next select sources based on their H 160 magnitude, requiring 22 < H 160 26.5.The cut at the bright end helps remove contaminants such as stars and lower-redshift galaxies, as H 160 ∼ 22 corresponds to an absolute UV magnitude of M UV ∼ −25 at z ∼ 9. We choose the magnitude limit at the faint end in an attempt to remove spurious sources.As our analysis involves relatively shallow HST imaging (as compared to the HUDF or the Hubble Frontier Fields, for example), we focus on bright high-redshift candidates.The majority of the fields in these data sets16 have 5σ depths  26.5, and so the likelihood of detecting noise as spurious sources increases for magnitudes fainter than 26.5.We use the total fluxes measured in the elliptical apertures for these magnitude cuts.
As an additional check against including spurious sources in our selection, we impose a minimum size criterion.Given the similarities in the depth and noise properties of the parallel imaging we consider in this paper and that of Rojas-Ruiz et al. (2020), we similarly expect a half-light radius criterion to remove the majority of hot pixels and cosmic rays from our sample.Following the example of Rojas-Ruiz et al. (2020), we used a random forest algorithm to obtain a quantitative size cut for each data set.In order to construct a training set, we performed an initial visual inspection of 2500 randomly selected sources from each data set that satisfied the detection significance criteria in H 160 and all optical filters.Based on our visual inspection of the imaging in all available filters, we classified each source as "real" or "spurious."These classifications became the target values or class labels input to the random forest.To this sample, we added an equivalent number of randomly selected simulated sources that satisfied the S/N criteria, each of which were classified as "real."For each data set, we thus had a sample of 5000 sources that we fed to a random forest classifier using the Python Scikit-learn package (Pedregosa et al. 2011).Though we included information such as the H 160 magnitude, source isophotal area, elongation, and surface brightness, we found that the half-light radius was the most discerning parameter.
Through this analysis, we found that the majority of sources classified as "spurious" could be removed by imposing a halflight radius cut at r 1/2 > 0 1 for the BoRG[z8] and HIPPIES GO 12286 fields and >0 14 for the WISP fields.The r 1/2 values are measured by Source Extractor in the H 160 .We do not include a maximum effective radius criterion in our selection, as it has been noted that such a criterion can remove real sources along with lower-redshift interlopers (e.g., Holwerda et al. 2020).
Finally, we use the redshift probability distribution functions (PDFs; which we denote p(z)), that EAZY calculates for each source to select a sample of candidates with preferred highredshift solutions.Specifically, we require that >70% of the integrated redshift probability be at z > 8.We do not enforce any additional criteria related to the photometric redshift fits at this stage, as the PDFs derived from measurements in only four filters provide minimal constraining power.For example, the redshift PDFs are flat for the majority of our candidates at z  10, where the Lyα break has redshifted almost entirely out of the J band and the source is detected in only a single filter.While we do not select sources based on the χ 2 of the bestfitting template, we explore the Δχ 2 of the best-fitting templates at high and low redshift in Section 5.
We begin with a catalog of 127,346 sources across all three data sets.In the following summary, we identify how many sources remain after applying each selection criterion.In order for sources to be considered z ∼ 9-10 candidates, we require that they: 1. are detected in H 160 with (S/N) H > 5 (92,941 remaining sources); 2. have H 160 magnitudes in the range 22 H 160 26.5 (80,657); 3. are undetected at the 2σ level in V, I, and Y, allowing an S/N > 1.5 in at most one of these filters (5707); 4. have a half-light radius of r 1/2 > 0 104 for BoRG[z8], r 1/2 > 0 11 for HIPPIES GO 12286, and r 1/2 > 0 14 for WISP (2059); 5. have at least 70% of their redshift PDF at z > 8 (193).
We find 193 sources that pass these selection criteria: 116 from the BoRG[z8] fields, 48 from HIPPIES GO 12286, and 29 from the WISP fields.In the following section we screen this sample for cases of detector persistence from previously observed bright targets, which could masquerade as highredshift galaxies.

Persistence
Image persistence occurs when a bright source saturates the detector and leaves a residual charge that appears as a ghost image in subsequent exposures.This situation is particularly problematic in fields for which the H 160 band is observed before the J band, as persistence fades with time and so can mimic a J dropout.The BoRG observing strategy was designed to mitigate image persistence by observing with the Y band followed by J 125 and H 160 in every orbit (Trenti et al. 2011).Persistence is therefore expected to impact both the dropout and detection filters such that the residual ghost images will not be selected as dropout candidates.However, the other data sets we include in our analysis did not always perform their observations in this order.Even some of the HIPPIES observations from GO 11702 that are included in the BoRG [z8] data set were not scheduled such that the Y-band images could shield the detection filters.The WISP observing strategy is particularly prone to self-persistence, which occurs when the offending bright targets are observed in the same visit as the affected image.As a slitless spectroscopic survey, WISP observed fields in direct imaging-grism pairs, with the J 125 and H 160 images used for source extraction and wavelength calibration of the G102 and G141 grism observations.The zeroth orders of bright stars are then common causes of persistence in the direct images that follow grism exposures.
We therefore performed a detailed persistence check for all candidates in the following way.For each candidate, we searched the MAST archive for observations taken in the 24 hr prior to each of the individual exposures (FLTs) that went into the H 160 mosaics.If any of those observations had a count >100,000 e − within a 10 × 10 pixel box centered on the position of a given candidate, the candidate was removed from our sample.We note that the use of a 10 pixel box is likely overly conservative, and that the pixel with the highest counts need not be the central pixel.However, considering a box rather than a few central pixels allows for offsets in the drizzle solution, as the world coordinate solution in the undrizzled image (that has not been distortion corrected) could be off by a few pixels at the position of the candidates.We identify 10 candidates as heavily affected by persistence and remove them from our sample.
However, given the WISP observing strategy, we find that even this aggressive approach is likely to have missed some cases of self-persistence in WISP images-where zeroth orders appear as J 110 -dropout candidates.For each source, the grism dispersion determines the location of all spectral orders in the grism exposures with respect to the position of the source in the direct image.We can therefore expect persistence from G141 zeroth orders to exist in the H 160 image ∼180-195 pixels (23 4-25 4) to the left of a bright star, where the exact offset is location dependent.We illustrate this situation in Figure 4. First, in Figure 4(a), we show an example of a WISP candidate that was identified as persistence based on the counts in previously observed FLTs as described above.This persistence is caused by a zeroth order in an earlier G141 observation that is shown in Figures 4(c) and 4(d).We show the H 160 image of this candidate in Figure 4(e), where it can be seen that this candidate is at the expected distance to the left of a bright star.In Figure 4(f), we show another WISP candidate that was not identified by our persistence check described above (the maximum counts in all FLTs was <8000 e − ), yet lies at the expected position.We have removed two such WISP sources from our sample based on their positions relative to a very bright source.We note that we are only able to identify these additional cases of persistence because of the specifics of the WISP observing strategy, and that similar unidentified cases of persistence may still contaminate the rest of our sample.Detecting candidates in a second imaging band is the only way to confirm whether such single-band detections are real sources.We show all 12 sources that are rejected as persistence in Figure 16(d) in the Appendix, and indicate those that were identified based on their distance from a bright star.
After removing the 12 cases of persistence, there are 181 remaining candidates.We further explore these sources in Section 3.4 with a visual inspection process to remove spurious or misclassified sources.

Visual Inspection
Thus far, we have selected candidate z ∼ 9-10 galaxies using only quantitative criteria.However, we find that the majority of sources in our sample at this stage are spurious detections, where artifacts such as diffraction spikes and partially removed satellite trails are selected as high-redshift candidates.Given the wavelength dependence of the PSF, diffraction spikes will be largest in the H 160 images and can therefore appear as J-band dropouts.Similarly, a satellite trail present in the H 160 image can also cause false dropout candidates, and many of the fields were observed with only two or three H 160 exposures, too few to remove a satellite trail fully through outlier rejection during mosaic creation.Spurious sources are therefore detected in the increased noise of these satellite remnants.Hot pixels and cosmic rays that impacted the detector in a compact area can also masquerade as high-redshift sources.The drizzling process and kernel smoothing that is applied by Source Extractor can both act to spread out the light into nearby pixels, giving cosmic rays a more extended appearance like that of real sources.As these pure parallel data sets are undithered, such features will not all be removed by image stacking.While bad pixels on the IR detector should appear in the same location in all IR filters, cosmic rays in the H 160 images can appear as J-band dropouts in our selection.A careful visual inspection is necessary to identify and remove such spurious candidates and other sample contamination.
However, visual inspection is an inherently subjective process, and the effect that visually rejected candidates have on the sample incompleteness is typically not accounted for with completeness calculations.We therefore introduce a new approach to visual inspection of high-redshift candidate samples.We use simulated sources (see Section 6.2) to quantify any biases that may enter our sample selection through the visual inspection process.We compile a set of 362 sources, consisting of all selected real candidates and an equal number of simulated candidates selected in the same fields and via the same criteria.We randomize this set of real and simulated candidates and carefully inspect each one, accepting or rejecting them without knowing which is simulated.In this way, we are able to quantify the effect of the visual inspection on sample incompleteness and incorporate the subjective nature of visual inspection into our completeness simulations.We defer the discussion of our analysis of the visually inspected simulated sources to Section 6.2.1, and for the rest of this section refer to the inspection and rejection of the real candidates.However, we recommend this "anonymous" strategy for any sample selection that relies on visual inspection.
We create a set of categories for rejection, ensuring that we use a standardized classification scheme in all visual inspections.The classification categories we consider are as follows.
First, we consider whether the source in question is a spurious detection.As mentioned above, diffraction spikes and satellite remnants are common examples of spurious detections.Additionally, we reject sources located immediately along image edges where the noise levels are significantly higher, because these sources at best have unreliable photometry and at worst are only partially extracted.We also expect spurious sources may be identified on the edges of bright sources, where the Source Extractor deblending can be too aggressive.In these oversplit regions, the light from a bright source gets separated and attributed to multiple sources.The PSF is smaller in bluer filters, and so any oversplit regions are closer to the bright object core.Oversplit regions in J and H can therefore appear as high-redshift galaxies that have dropped out of the bluer filters.We consider this possibility during our inspection, but do not identify any sources in this category.As part of our visual inspection, we also inspect the rms maps at the position of each candidate and reject sources with clusters of bad pixels identified within the r = 0 2 circular apertures used to determine detection significance.Our final category of spurious detections includes hot pixels, cosmic rays, and other artifacts that are not identified as bad pixels in the rms maps.We identify these features visually as sources appearing as single pixels, disjointed clusters of pixels that may have been smoothed together during source detection, or otherwise strange morphologies such as very sharp edges or plus signs (indicative of a single pixel that was spread out during drizzling).
We then consider two additional categories for candidate rejection, cases in which a source is identified as real yet was incorrectly selected due to problems related to photometric measurement or redshift fitting.We inspect all available V-, I-, and Y-band images at each source position to ensure that there is no significant flux that was missed in the catalog measurements.Second, we check that any elliptical apertures stretched by the light from a large, bright neighbor have been properly corrected (see discussion in Section 2.4.2).This last As can be seen in panel (e), the position of self-persistence resulting from this specific observing strategy will be offset by ∼180-195 pixels from the offending bright source.We have identified and rejected two additional WISP sources in our sample based on their position relative to a bright star, one of which is shown in panel (f).These candidates were not identified as persistence based on counts from previous observations at the candidate position on the detector, indicating that even this thorough check can miss cases of persistence.
check is to confirm that the apertures used to measure colors closely match the source morphology for all candidates.
In summary, we consider the following categories of spurious sources or unreliable measurements in our visual inspection.We indicate in parentheses the number of real sources rejected in each category, but emphasize that real and simulated sources are inspected together in a random, unknown order.stretched by a neighboring source and not subsequently corrected?(0) 3. Is there significant optical flux that was not measured correctly in the catalog?(0) As can be seen, all 166 candidates we rejected through visual inspection were identified as some type of spurious detection or false source.While 85 of these rejections are objectively motivated (diffraction spikes, satellite trails, image edges, and bad pixel clusters), 81 are based on subjective opinion during the inspection (category 1e).In Section 6.2.1, we use the classifications of simulated sources performed as part of the "anonymous" visual inspection to calibrate biases introduced by these 81 rejections.
Following this visual inspection, there are 15 candidates that have satisfied all of our selection criteria.We further explore these sources in Sections 3.5, 3.7, and 5.

IRAC
With 15 candidates remaining after rejecting sources from our visual inspection and persistence checks, we now turn to incorporating Spitzer/IRAC (Fazio et al. 2004) imaging to our analysis.Imaging at these redder wavelengths is a crucial tool for disentangling the SEDs of true high-redshift galaxies and sources that are common contaminants in high-redshift samples (e.g., Finkelstein et al. 2022).The J − H colors of both passive and dusty star-forming galaxies at z ∼ 2-3 can be indistinguishable from those at z > 8, and these contaminants will likely be undetected in bluer filters given the shallow imaging of these parallel data sets.Additionally, as the majority of galaxies at z > 8 are compact or unresolved in the HST NIR imaging, M, L, and T dwarf stars can similarly contaminate HST-selected high-redshift samples.However, the SEDs of all three types of contaminants are expected to diverge from those of high-redshift galaxies at λ  2 μm, and even shallow IRAC imaging can help distinguish between them.
We searched the Spitzer Heritage Archive hosted by IRSA17 for all available IRAC imaging at the positions of each candidate and downloaded the corresponding Level 2 ("Post Basic Calibrated Data;" PBCD) mosaic images.These mosaics are processed by the IRAC pipeline version S19.218 and are on a 0 6 pixel scale.Spitzer/IRAC 3.6 μm imaging exists for 11 of the HST parallel fields in which we identify candidates (nine of which also have 4.5 μm imaging), amounting to IRAC coverage of all but two of the 15 candidates.The imaging was obtained as part of seven unique programs, many of which were designed with the goal of following up high-redshift candidates previously identified in these fields.Table 3 lists the program information for each IRAC data set that we include in our analysis.We measure IRAC photometry for each candidate following the method presented in Rojas-Ruiz et al. (2020), which we describe in the following sections.

Galfit Modeling
We model all sources in each stamp with the Galfit19 imagefitting software (v3.0 Peng et al. 2010) as demonstrated by Finkelstein et al. (2015a).Given the lower resolution of IRAC images, this modeling approach is necessary to deblend the light from the candidate and any neighboring sources.For each candidate, we create background-subtracted IRAC stamps that are 30 6 × 30 6 (51 × 51 pixels).We consider each IRAC observation separately, such that a source observed at multiple position angles or through multiple programs may have multiple stamps per channel.This approach results in 36 stamps (24 of which are at 3.6 μm) for the 13 sources with IRAC coverage.In our photometric analysis, we treat each stamp as an independent measurement.
We use source positions and magnitudes from the H 160 catalog as inputs to Galfit, including everything in the catalog down to H 160 = 25 and making the 3.6 and 4.5 μm magnitude initial guesses one magnitude brighter than the H 160 value.We constrain source positions in the Galfit model to be within ±0 9 (±1.5 pixels) of the input values and model magnitudes to be brighter than 40.We can safely assume that sources that hit this lower magnitude constraint are undetected in the IRAC imaging, and so we iteratively remove them from the model.
The modeled sources may be either extended or point sources.Extended sources, defined as those with a semimajor axis in the H 160 Source Extractor catalog that is >2× larger then the FWHM of the IRAC point response function, are modeled as Sérsic profiles.However, the majority of sources in the IRAC stamps-all high-redshift candidates and almost all of their neighbors-are modeled as point sources.For the point-source models in each IRAC channel, we create two sets of median PSFs following the procedure described in Appendix D. One set combines point sources identified in all programs except 90045 (PI: Richards) and is used to model sources in the IRAC imaging for these six programs.The second set was created from point sources identified in program 90045 and is used exclusively with the Galfit modeling for Par2346-0021.We use a separate PSF for Par2346-0021 because program 90045 employed a significantly different dither strategy that could affect how the PSF is sampled.We discuss this further in Section 3.5.4.Both sets of PSFs are shown in Figure 18 in the Appendix.In Figure 5, we show the results of running Galfit on the 13 candidates with IRAC coverage.

IRAC Photometry
We successfully model the target source with Galfit in 23 of the 36 IRAC image stamps.For these 23 observations, we adopt the Galfit model magnitudes.For the remaining 13 observations, Galfit did not measure a significant flux at the source position, i.e., the Galfit model magnitude hit the constraint at 40 mag and the source was removed from the model.In these cases, we measure the photometry of the target in the Galfit residual maps with all neighboring source flux removed.For this measurement, we use circular apertures of radius r = 1 5, corresponding to a diameter that is ∼1.5× the FWHM of the 3.6 μm warm mission point response function. 20he black circles in the zoomed-in residual maps in Figure 5 indicate the aperture sizes used to measure the candidate fluxes.We apply aperture corrections of 1.69 (3.6 μm) and 1.70 (4.5 μm), obtained by measuring the flux of our custom PSFs in circular apertures of increasing radii.
We estimate the flux uncertainty in these measurements in the follow way.First, we determine the average 1σ depth in r = 1 5 circular apertures placed randomly across the full, background-subtracted IRAC images, using Source Extractor segmentation maps to avoid source flux.We then treat this 1σ uncertainty as a noise floor, adopting it as the minimum uncertainty for both the Galfit model fluxes and those measured through aperture photometry.Indeed, the majority of the Galfit uncertainties are larger than the background-measured standard deviation, indicating that the Galfit photometry is also accounting for uncertainties related to the source flux deblending.

Photometric Redshifts with IRAC
Finally, we rerun EAZY for each high-redshift candidate, incorporating all available HST and Spitzer/IRAC photometry.As mentioned previously, we fit each IRAC observation with Galfit separately, leading to multiple photometric measurements for some candidates.We adopt a weighted mean of the available measurements for each IRAC channel in our EAZY runs.We note that the EAZY results-including the best-fitting SED templates and the redshift probability distributions-are nearly identical if we instead treat each IRAC flux as an independent measurement, i.e., with multiple instances of the IRAC 3.6μm and 4.5 μm filters in the EAZY setup files.The best-fitting redshifts from each method (weighted mean versus multiple photometric measurements) differ by at most Δz < 0.05.Note.
a The average 5σ limiting magnitude for each image, measured in r = 1 5 apertures.
With the exception of one candidate (Par2346-0021_164, discussed in Section 3.5.4), the HST and Spitzer/IRAC photometry continues to prefer a high-redshift solution.For many of these candidates, the addition of the IRAC photometry reduces the fraction of the redshift PDF that lies at z < 6, decreasing this fraction from ∼0.1-0.2 in most cases to <0.01.The most extreme improvements are for candidates Par0926+4000_369 (p(z < 6) HST = 0.295 to p(z < 6) HST+IRAC < 0.001) and Par0843 +4114_120 (p(z < 6) HST = 0.236 to p(z < 6) HST+IRAC = 0.009).This rejection of the lower-redshift solutions typically occurs when the upper limits measured in one or both of the IRAC channels fall below the flux density expected for lower-redshift, red, dusty galaxies.For some candidates, the EAZY run with IRAC photometry even helped narrow the redshift probability peak at z > 8, thereby tightening the photometric redshift constraints.The 95% interval of the redshift PDF at z > 8 decreases by z ∼ 0.36 on average and as much as 0.9 for candidate Par0926+4000_369.In these cases the IRAC measurements, as either detections or upper limits, help constrain the expected location of the 4000 Å break.We show the best-fitting SED templates and photometric redshift PDFs incorporating the HST and IRAC photometry for our high-redshift candidates in Figures 8 and 9, and discuss the IRAC results of individual candidates in Section 4.1.

Field Par2346-0021
While almost all of the IRAC observations listed in Table 3 covered a single pointing with relatively small dithers, those covering HIPPIES field Par2346-0021 (PI: Richards; PID 90045) are a little different.This program aimed to cover a large enough volume to study the clustering and luminosity function of quasars at z > 3, and so it created wide-area maps along the Sloan Digital Sky Survey "Stripe 82."The PBCD maps at the position of our high-redshift candidate cover ∼2.2°× 0.65°and are significantly shallower than the IRAC imaging available for the rest of our sample (∼1.8 [3.6 μm] and 0.7 [4.5 μm] magnitudes shallower, see Table 3).The imaging in PID 90045 also used two small dithers per pointing, compared with the 30-100+ medium dithers of the other IRAC programs.The fewer, smaller dithers will result in a PSF that is more undersampled.We therefore create a separate set of PSFs for this program as described in Appendix D. We therefore measure IRAC photometry for the candidate in Par2346-0021 separately, following the same steps described in Section 3.5 but with the field-specific custom PSFs.We apply aperture corrections of 1.67 (3.6 μm) and 1.98 (4.5 μm), measured on the PID 90045 PSFs.
The Galfit model brightnesses calculated for Par2346-0021_164, when combined with the HST photometry, reduce our confidence in this candidate.As shown in Figure 6, this source has a very red H − [3.6] color that is consistent with a dusty, red SED.The photometric redshift PDF obtained with the HST + IRAC photometry (blue, solid line) has an increased peak around z ∼ 3 when compared with that from the HST photometry alone (purple, dashed).Due to this added probability at lower redshifts, only 62% of the integrated PDF is at z > 8, rather than the 70% threshold we require as part of our selection criteria.
While we may opt to keep an HST-selected source in our sample if we find that we cannot trust the IRAC photometry, that is not the case for this candidate.It lies outside of the existing I 600 imaging in this field, and so is only covered by three HST filters, and the HST-only photometric redshift solution is not well constrained (the 68% redshift interval ranges from z = 3.07 to 10.06).Additionally, the candidate does not appear to have any close neighbors in the H 160 image, and so we consider the IRAC photometry reliable.Although the IRAC imaging is shallow, the Galfit model magnitudes for the candidate are significant.We therefore remove Par2346-0021_164 from our sample as a possible z < 7 contaminant, but note that it could be a viable z ∼ 9 candidate.

Photometric Redshift Goodness of Fit
While we did not include a minimum χ 2 threshold on the EAZY templates as part of our selection criteria, we now explore the goodness of fit of these photometric redshifts, including IRAC where available.Specifically, we consider the difference between the χ 2 of the best-fitting template and the minimum χ 2 at z < 6.This Δχ 2 provides an estimate of the quality of the photometric redshift fit at high redshift compared to the best possible fit at lower redshift.In all cases, the χ 2 of the best-fitting template is lower than that at z < 6, with a median . However, the Δχ 2 for candidate Par2132+1004_509 is 2.53, lower than the value corresponding to the 95% confidence interval (χ 2 > 4).This candidate also has a broad PDF (the redshift interval containing 68% of the PDF ranges from z = 4.01 to 11.51), is not covered by IRAC imaging, and falls on the UVIS chip gap in the I 600 imaging (see Figure 6).We therefore consider this source as an unreliable high-redshift candidate and remove it from our sample.

Comparison with Stellar Colors
Finally, we explore whether any of the z  9 candidates are likely to in fact be stars.Low-mass stars and brown dwarfs can have NIR colors that are very similar to those of high-redshift galaxies, and so are common sources of contamination in Lyman-break samples.The case for potential contamination is made worse by the fact that all but one of the candidates in our sample are unresolved in the HST imaging.Eight of the candidates have half-light radii (r 1/2 ) as measured in H 160 that are smaller than the median r 1/2 of the stars in their respective data sets (see Section D).The r 1/2 of another four candidates are within 1σ of the median values.The one exception is Par0456-2203_473, which has a larger r 1/2 and is discussed further in Section 6.1.The very compact sizes measured for the majority of our sample are not necessarily concerning.These sources have fairly low S/Ns in undithered data, and so are not expected to show significant extended morphologies.However, it does make a comparison with stellar colors crucial.
We therefore compare the colors of the 13 remaining candidates with those of M, L, and T dwarfs.The stellar colors are calculated from spectra in the range ∼0.63−2.5 μm taken with the medium-resolution (R ∼ 2000) spectrograph SpeX (Rayner et al. 2003) at the NASA Infrared Telescope Facility (IRTF).We downloaded from the IRTF Spectral Library21 (Burgasser 2014) 582 observations of 449 unique stars (132 M dwarfs, 210 L dwarfs, and 107 T dwarfs), covering the full range of temperature subclasses and including J, H, and Ks photometry from the Two Micron All Sky Survey (2MASS).
We calculate broadband HST fluxes by integrating each SpeX spectrum through the HST filter profiles shown in Figure 2. The filters are interpolated onto the observed wavelength array of each spectrum such that only the portion of the filter profile that overlaps the spectrum is used in calculating the broadband fluxes.For each SpeX spectrum, we remove any fluxes that result from a <25% wavelength overlap between the filter and the spectrum.Following Finkelstein et al. (2022), we then assign IRAC magnitudes to each stellar spectrum using the median Ks − [3.6] and [3.6] − [4.5] presented by Patten et al. (2006).We match the SpeX stars to the colors in Tables 1 and 3 from Patten et al. (2006) by spectral type, using Δtype < 1 where possible and Δtype < 3 for a few sources with no closer spectral matches.We convert all photometry-2MASS J, H and Ks from the SpeX Library, and 3.6 and 4.5 μm obtained by matching to Patten et al. (2006) -from Vega to AB magnitudes using m Vega + 0.91, 1.39, 1.85, 2.79, and 3.26 for 2MASS J, H, Ks, and Spitzer/IRAC 3.6 μm and 4.5 μm, respectively.
Finally, with a full suite of photometry for each SpeX spectrum, we calculate a χ 2 goodness of fit for the photometry of each high-redshift candidate.We then compare this SpeXderived χ 2 with that from the best-fitting galactic template for all candidates.As an example, in Figure 7, we show the χ 2 measured for each SpeX spectrum given the photometry of Par0713+7405_95.Of all candidates in our sample, this source has both the lowest minimum SpeX-derived χ 2 (=13.07) and the smallest Δχ 2 (SpeX − EAZY) = 11.1.
Therefore, although almost all of the 13 candidates are unresolved, none of their HST/WFC3 and Spitzer/IRAC (when available) colors are better fit by these 500+ M, L, and T dwarf spectra.We conclude that all 13 candidates are better fit by galactic spectral templates and do not remove any from the sample due to stellar contamination.Future observations with JWST/NIRCam imaging, for example, will vastly improve on the stellar contamination analysis currently possible in highredshift Lyman-break samples by providing both higherresolution imaging in the NIR and imaging in five or more filters redward of the expected position of the Lyman break.

High-redshift Candidates
After applying all selection criteria, visual inspection, persistence screening, and stellar color comparison, we have a sample of 13 high-redshift candidate galaxies spanning the range 8.3  z  11.In Table 4, we list the candidates, their positions, H 160 magnitudes, half-light radii measured in H 160 , and their photometric redshifts.The photometric redshift fits all include IRAC where available.We also provide the percentage of the integrated photometric redshift PDF that lies at z > 8 (recall that we require p(z > 8) > 70% for sample selection) and the redshift interval containing 68% of the redshift PDF in columns 9 and 10, respectively.The lower bound of this interval is above z = 8 for all but two of the candidates, and these both have p(z > 8) > 80%.Column 9 also demonstrates that while the redshift PDFs are relatively well constrained to be at high redshifts, they are still quite broad with 68% p(z) intervals ranging from ∼0.8 to ∼1.4.Column 11 lists the difference between the χ 2 of the best-fitting template and the minimum χ 2 at z < 7 for the same EAZY run (see the discussion in Section 3.5).
In Figures 8 and 9 we show the best-fitting SEDs and photometric redshift PDFs for the HST photometry (purple dashed) and the HST+IRAC photometry when available (blue solid).We also show the best-fitting template obtained by limiting EAZY to z < 7 as a red dashed-dotted curve.The lower-redshift (z < 7) templates are ruled out by the upper limits measured blueward of the expected Lyman break, especially those in the Y bands for candidates at z < 10.5 and in the J bands for candidates at z > 10.5.The addition of the Spitzer/IRAC photometry also significantly helps, especially the upper limits measured in the 3.6 μm images for candidates at z > 10.The IRAC nondetections rule out dusty galaxies with redder H -[3.6 μm] colors in favor of higher-redshift sources with a flat spectrum or bluer color.Postage stamps of all available filters are shown along the top, where we display the Galfit residual maps for each individual IRAC observation.All modeled fluxes are removed from the residual maps except that attributed to the candidate.
In the following subsections, we briefly discuss each candidate.

Par1033+5051_116
Candidate Par1033+5051_116 (m 160 = 25.99) is detected in both J 125 and H 160 .As a two-band detection, we consider this candidate to be higher confidence than the single-band detections in the sample, both because it is less likely to be a spurious source and because its photometric redshift is better constrained.With a photometric redshift of z phot = 8.28, this candidate is at the low end of our sample, yet the 68% interval extends out to z ∼ 8.7.This candidate was originally identified by Bradley et al. (2012).

Par0440-5244_497 & 593
Candidate Par0440-5244_497 (m 160 = 25.47) is also a twoband detection, with z phot = 8.53.A second candidate in our sample, Par0440-5244_593 (m 160 = 26.46) is detected in the same HST pointing at a distance of ∼1 25 from source 497, corresponding to ∼0.35 pMpc at z = 8.5.However, as Par0440-5244_593 is a single-band detection with a photometric redshift of z phot = 11.07, the proper distance between the two sources is ∼58 pMpc.Here we have adopted each of their photometric redshifts and used the average z phot to convert their comoving distance to a proper distance.Given the redshift separation, we do not expect these sources to be associated with the same overdensity (see discussion in Section 7.3).Candidate Par0440-5244_497 was originally identified by Bradley et al. (2012).

Par0713+7405_95
The third candidate in Table 4, Par0713+7405_95 (m 160 = 25.74), is also a two-band detection.While the photometry in Figure 8 shows a detection in I 600 , we note that it is not significant with S/N < 1.5.The inclusion of the IRAC photometry helps to narrow the width of the primary redshift PDF peak, placing z phot = 9.23.However, the rejection of the lower-redshift solution is driven by the nondetection in Y 105 .(12) difference in the χ 2 of the best-fitting EAZY template and the best-fitting stellar SpeX spectrum; (13) median rest-frame UV absolute magnitude and 68% interval of magnitude distributions (see Section 6.1); and (14) median magnification and standard deviation of the magnification distribution (see Section 6.1). a This field does not have Spitzer/IRAC coverage.b These candidates were previously reported by Bradley et al. (2012) and Roberts-Borsani et al. (2022).c These candidates were found to experience intermediate-to-strong lensing, and so the M UV values have been corrected for magnification as described in Section 6.1.d This redshift is based on an upper limit estimated for a contaminated J 125 flux.See text for details.

Par0456-2203_473
The IRAC deblending for Par0456-2203_473 is uncertain due to its close, bright neighbor.The flux remaining in the IRAC residual maps has been attributed to the candidate, but it is not fully centered in the blue box that indicates the size of the HST stamp.This flux may therefore be partially or fully coming from the neighboring source.With such a close neighbor, any measured fluxes at IRAC wavelengths will be uncertain until the much higher-resolution JWST/NIRCam can be used to separate the light from these sources.However, the HST-only photometry yields a high-redshift solution that is still preferred even when these relatively bright IRAC fluxes are included.We note that Par0456-2203_473 has a bright H 160 magnitude (m H160 = 24.52),which could make it one of the brightest candidates detected at z > 8.We calculate an absolute UV magnitude for this source of M 1500 = −22.86,∼0.2 brighter than the z ∼ 10 candidate presented by Morishita et al. (2018) in the BoRG[z910] fields.However, as discussed in Section 6.1, this source is strongly lensed by its very close neighbor, and is likely much fainter intrinsically.This source is also the only one in our sample that received the aperture correction for stretched apertures described in Section 2.4.2.The H 160 magnitude in Table 4 is an aperture-corrected flux calculated in a circular aperture, rather then the elliptical Kron apertures used for the other candidates.We note that we measure a large half-light radius for this candidate, r 1/2 = 0 33.This r 1/2 is consistent with the radius measured by Holwerda et al. (2015) for low-redshift interlopers, yet is not inconsistent with a high-redshift nature (Holwerda et al. 2020).Also, as a strongly lensed source, this 0 33 radius may not represent the intrinsic size of this candidate.

Par0259+0032_194
A fifth candidate, Par0259+0032_194 (m 160 = 24.79), is potentially another two-band detection, though there is a cluster of bad pixels in the J 125 image at the position of the source The inclusion of the IRAC photometry almost always tightens the width of the PDF at high redshifts while reducing the portion of the PDF present at lower redshifts.The red dotted-dashed curve shows the best-fitting template/PDF at low redshift, when EAZY is restricted to z < 6.These candidates are all better fit with a high-redshift galactic spectral template than that of a lower-redshift galaxy.We use a different color scheme with Par0259 +0032_194, and show the photometric redshift fits when the J 125 photometry is excluded (solid green curve) and when a lower limit is assumed as described in the text (dashed black curve).
(there is no such cluster affecting H 160 ).The J 125 flux of this source was masked out in our photometric catalog, and the source is treated as having no J 125 coverage in our analysis (sample selection, photometric redshift fitting, and comparison with SpeX spectra).However, the removal of J 125 leads to a poorly constrained redshift solution with the source placed at a higher redshift than it likely should, considering it may very well be detected in J 125 .We remeasure the J 125 flux of Par0259 +0032_194 after masking out the bad pixels (using Source Extractor in dual image mode with H 160 as before).We use this J 125 flux to measure a new photometric redshift, z phot = 9.36, which is quoted in Table 4.We adopt this redshift PDF (gray "H 160 and J 125 " curve in Figure 8) in estimating the candidate's magnification and absolute magnitude posterior (Section 6.1) and constructing the luminosity function (Section 6.4).We note that this J 125 flux is likely a lower limit, and a brighter J 125 flux would indicate a stronger spectral break and therefore more strongly reject the smoother slope of a lower-redshift, red galaxy.In Figure 8, we show the original SED and redshift PDF for Par0259+0032_194 (i.e., excluding the J 125 flux) in green and those including the estimated J 125 lower limit in black, using a different color scheme as the rest of the figure to signify this special case.

Par335_251
Included in our sample is the first z > 8 candidate from the WISP survey, Par335_251 (m 160 = 24.74).It has a photometric redshift of z phot = 10.53 and is detected only in H 160 , as expected for WISP fields (see the discussion in Section 6.3).As can be seen in Figure 8, it lies on the UVIS chip gap in V 606 , and so we treat the source as having no V 606 coverage.As Par335 has IRAC 3.6 μm coverage, we have this additional constraint for the WISP source, and so we include it in our sample.Additionally, as a slitless spectroscopic survey, WISP observed Par335 with both of WFC3ʼs IR grisms, G102 (8.8-1.1 μm, R ∼ 201) and G141 (1.07-1.7 μm, R ∼ 130).The integration times were tuned to achieve approximately uniform sensitivity at all wavelengths for a line of a given flux, and for this field are ∼8800 s in G102 and ∼3800 s in G141.
While these spectra will be far too shallow to detect an emission line or continuum for a source at z ∼ 10.5, it is instructive to explore them for any spectral features that may indicate the source is at a lower redshift.We downloaded the extracted, calibrated spectra from the WISP page on MAST (see footnote 4) and plot them in Figure 10.Assuming the bestfitting redshifts for the high-and low-redshift solutions, we show the expected location of Lyα at z = 10.53 and [O II] λ3727 at z = 2.54.There are two possible emission lines at λ obs ∼ 12200 Å and λ obs ∼ 14000 Å that lie on either edge of the expected range for [O II] given the low-redshift solution.
We measure these lines using a Monte Carlo analysis, assigning fluxes at each wavelength step that are pulled randomly from a normal distribution centered at the flux value and with σ equal to the flux uncertainty.We then fit the fluxes with a Gaussian profile using least-squares minimization to determine the best fit, and repeat this process 1000 times.For each line, the median of all 1000 fits is displayed in the top panels of Figure 10.We define the S/N of the line as the flux of the median fit divided by the standard deviation of the fits.The potential emission lines at 12200 and 14000 Å have S/N = 2.3 and 1.5, respectively.We note that the sky background has been oversubtracted in the spectrum, and so we fit a new continuum (black horizontal dashed line) from which we measure the strength of each line.If fit instead with a continuum at flux = 0, the S/N of each line would decrease.
Both lines have S/N < 3 and lie far from the expected observed wavelength for [O II] λ3727 given the low-redshift solution's narrow PDF.Additionally, the Δχ 2 between the best-fit solution at z ∼ 10.5 and the minimum χ 2 at lower redshift is 20.41, indicating that the low-redshift solution is disfavored.We note that photometric redshifts and the associated PDFs depend heavily on the choice of templates used for SED fitting.While neither line coincides with the peak of the low-redshift PDF, and >99% of the integrated PDF for this source lies at z > 8, either line could be [O II] and an indication of a lower-redshift galaxy.
It is interesting to consider briefly the alternative option, as the potential line at λ obs ∼ 14000 Å would be very consistent with Lyα at a redshift of z = 10.53.Lyα has been observed into the epoch of reionization (e.g., Castellano et al. 2018;Jung et al. 2020;Tilvi et al. 2020;Wold et al. 2022), yet such a detection would require hours more integration time, as in Zitrin et al. (2015) and Larson et al. (2022).We conclude that the WISP spectrum does not clearly help constrain the nature of Par335_251, and while either of the two possible emission lines may correspond to [O II] at z ∼ 2.5, they are very low significance.We therefore keep this candidate in our highredshift sample.are all single-band detections in the HST imaging.All five have IRAC coverage at both 3.6 and 4.6 μm, and are nondetections in both bands.The H 160 -only detections imply the candidates are at z > 10 because the Lyα break has redshifted out of the J 125 filter.However, if the break is instead partially within the J 125 , there may be J 125 flux below our detection threshold in the individual images.We therefore stacked the J 125 images of these five candidates, regridding each stamp to a common pixel of 0 01 pixel −1 as described in Section 5.1.We do not detect any flux at the position of the candidates (S/N < 1) and so are not able to constrain the redshift of these sources further via image stacking.
While the nondetections in the bluer HST filters as well as both IRAC channels help to rule out lower-redshift contaminants, there is an increased concern that these single-band detections are spurious sources (e.g., hot pixels or cases of persistence).Three of these candidates are from the HIPPIES GO 12286 data set, which has been noted to suffer from increased levels of persistence and contamination (e.g., Morishita 2021).We have done our best to identify and remove spurious sources (see Sections 3.4 and 3.3), yet the possibility remains that some or all of these candidates are contaminants in our sample.Spectroscopic confirmation or imaging in more filters with NIRCam is needed to evaluate the high-redshift nature of these sources confidently.The best-fitting low-redshift solution would place [O II] at λ obs ∼ 1.32 μm (red dashed-dotted line).We show a horizontal rectangle for each emission line weighted by the redshift PDFs and extending to their 95% intervals.There are potential emission lines at ∼12200 Å (S/N = 2.3) and 14000 Å (S/N = 1.5) that are broadly consistent with the z ∼ 2 redshift PDF, though this low-redshift solution is disfavored with a Δχ 2 = 20.41.The fits to each line are described in the text and displayed in the top panels.The 14000 Å feature would be consistent with Lyα at z = 10.53,though a Lyα detection at this redshift is unlikely due to the relatively short integration times.While there is no conclusive indication that this candidate is at a lower redshift, its true nature is not clear from this spectrum.

Comparison with Previous Studies in the Same Fields
The data sets we include in our analysis are from surveys that were designed to detect high-redshift galaxies.The BoRG survey in particular was optimized for searches for z  8 sources, and several teams have previously reported results for galaxies in these fields.In this section, we discuss the results from the previous studies that were performed in the same HST fields that we consider in this paper.As this is the first search for z > 8 candidates in the HIPPIES GO 12286 or WISP fields, we focus here on results in the BoRG[z8] fields.
We begin by discussing the BoRG[z8] results aimed at searching for galaxies at z ∼ 8 as Y 098 dropouts (Trenti et al. 2011;Bradley et al. 2012;Schmidt et al. 2014b).These studies require a significant detection in J 125 and use H 160 as a second detection band to rule out spurious sources.We are focusing our search at higher redshifts and allow for single-band detections, and so do not expect to recover many of the candidates presented in these papers.However, we do find two candidates in common, which is unsurprising because the redshift PDFs of some sources in our sample extend to z ∼ 8. Our two lowest-redshift candidates, Par1033+5051_116 and Par0440-5244_497, were originally published by Bradley et al. (2012).For both candidates, we measure slightly brighter H 160 magnitudes: m H160 = 25.99 (Par1033+5051_116) and 25.47 (Par0440-5244_497) compared with 26.2 and 25.9, respectively, based on the J 125 -H 160 colors reported in Bradley et al. (2012) While Bradley et al. (2012) do not calculate photometric redshifts for their candidates, the completeness they present for a representative field peaks at z ∼ 7.8-8, which is consistent with the 68% intervals we measure for our redshift PDFs.We do not recover the remainder of the candidates presented in these three papers because our selection function is more sensitive to galaxies at higher redshifts than those identified as Y 098 dropouts.We also focus on a slightly brighter magnitude range than these BoRG[z8] studies, requiring m H160 26.5 while many of the candidates presented in these papers are in the range 26.7  m H160  27.2.
Next, Morishita et al. (2020) perform a search for quasars at z ∼ 8 in the BoRG[z8] fields (among others), focusing on the identification of Lyman-break candidates that have pointsource morphologies rather than resolved shapes.Morishita et al. (2020) also employ a combination of color cuts and detection significance criteria in their sample selection, but additionally require that the 70% of the source redshift PDFs be at z > 6.5.Morishita et al. (2020) are again focusing on a slightly lower redshift range than we are, and so it is not surprising that our samples do not overlap.We do not select their one candidate in a BoRG[z8] field because it is detected in Y 105 with an S/N > 2. We do find one candidate in the same field as they do (Par0926+4536_155), but our candidate is ∼2′ away from the one they identify.
While the BoRG[z8] survey was designed to detect galaxies at z ∼ 8 as Y 098 dropouts, (Bernard et al. 2016; hereafter B16) performed a search comparable to ours for z ∼ 9-10 galaxies in these fields.They focused on the 62 fields with an exposure time in H 160 of at least 900 s, a survey area covering 293 arcmin 2 .(We include 66 pointings and make no exposure time cut.)B16 identified three z ∼ 10 candidates.They perform their search using the Lyman-break technique to identify J 125 dropouts, implementing a J 125 − H 160 > 1.5 color selection that will select a strong J 125 − H 160 break while still allowing for J 125 detections.This color selection is coupled with a detection requirement of S/N H160 8 and nondetections required in the V and Y 098 filters (S/N V and S/N Y098 < 1.5).They also implemented a threshold of CLASS_STAR < 0.95 and performed a visual inspection of their candidates.This source selection method is in contrast to our process, which we base on photometric redshifts rather than color cuts (Section 3).As discussed in Appendix D, we also find that the accuracy of the CLASS_STAR flag decreases with magnitude, and so we did not incorporate it into our selection but instead compare the colors of our sources to identify stellar contaminants.We also note that B16 did not have access to Spitzer/IRAC imaging of their candidates, but applied for IRAC follow up which we are able to include in our analysis here (Spitzer PID 12058, PI: Bouwens).
B16 identified a further three candidates as contaminants based on source size.Following Holwerda et al. (2015), the authors adopt 0 3 as an upper limit for the half-light radius of their candidates and reject larger sources as low-redshift interlopers.On the other hand, Holwerda et al. (2020) find that while such a size criterion could remove up to 75% of lowredshift interlopers, it may also reject ∼20% of real, bright galaxies in the targeted redshift range.Source size and morphology is another area where the higher resolution of the JWST/NIRCam imaging will significantly improve our understanding of the size-luminosity relation out to z  10.We do not apply an upper limit size cut to our sample selection, though our candidates are fairly compact with (non-PSFcorrected) half-light radii in the range 0 11−0 19 (with the exception of Par0456-2203_473, see Section 4.1.4).
We do not recover any of the three candidates presented in B16.Two of their candidates (borg_0240-1857_25 and borg_0456-2203_1091) satisfied our selection criteria but were rejected during visual inspection in category 1e (a hot pixel, cosmic ray, or source with a strange morphology).We identify these two sources in Figure 17.Our rejection of these two candidates illustrates the necessarily subjective nature of highredshift candidate searches with HST, where potentially real candidates may be removed in an effort to clean samples of spurious sources.Interestingly, we identify a different candidate in one of the same fields, Par0456-2203_473.Our candidate is located ∼56″ away from borg_0456-2203_1091, yet was not selected by B16.We did not detect the third candidate presented in B16 (borg_1153+0056_514) with enough significance in H 160 to meet our S/N H160 > 5 criterion (S/N = 4.78) and so this source was not selected in our sample.
Roberts-Borsani et al. (2022) perform a search for z ∼ 8, 9, and 10 candidates in the SuperBoRG fields (Morishita 2021).They begin with a selection using color-color cuts to identify dropout candidates in each redshift range, then further vet their candidates via photometric redshift fitting.For their EAZY runs, they construct an informative prior using simulated photometry of galaxies in the range 0 < z < 15, where the redshift distribution of their recovered sources as a function of magnitude is used to inform the EAZY-derived redshift PDFs.This approach is in contrast to our photometric redshift fitting with a flat prior, where our sample may be both more complete and more contaminated by lower-redshift galaxies as a result.Roberts-Borsani et al. (2022) also recover the two lowestredshift candidates in our sample first reported by Bradley et al. (2012), for which they report photometric redshifts of 7.93 (Par1033+5051_116) and 7.84 (Par0440-5244_497).These redshifts are lower than our estimates of 8.28 and 8.53, respectively, but consistent within each of our photometric redshift PDFs.Roberts-Borsani et al. (2022) also identify three more candidates that we do not recover.One of these sources (their candidate 1437+5043_1241 at z phot = 7.84) has an S/N = 2.2 in the Y 098 , and so does not meet our selection criteria due to its detection in a filter that is blueward of the Lyα break for z > 8.For their second source (2203 +1851_1071 at z phot = 7.84), we measure z phot = 7.85, but only 65% of the photometric redshift PDF lies above z = 8, and we require 70%.It is unsurprising that we do not recover these two sources, as our selection is optimized for higher redshifts.We measure a half-light radius of r 1/2 = 0 099 for their third candidate (1459+7146_344 at z phot = 10.0), a value just below our threshold of 0 104.
The discrepancies between the sample we present in Section 4.1 and those of B16 and Roberts-Borsani et al. (2022) highlight how dependent z > 8 candidate samples are on the sample selection methods used.The Source Extractor parameters used to detect objects in an image, the choice of aperture size and shape, and the way in which the noise is calculated can all affect the detection significance.The selection criteria related to color cuts or photometric redshift fitting, the choice of SED templates used for the fitting, and any thresholds imposed on the resulting redshift PDFs can all contribute to the selection of different samples in the same fields.

Low-redshift Contamination
The data sets we consider in this paper are relatively shallow compared to those of the well-studied fields such as CANDELS or the Hubble Frontier Fields that are the best current options for detecting L * high-redshift galaxies.While these parallel data sets are particularly well suited to detecting rare bright galaxies needed to explore the bright end of the luminosity function, they are likely not deep enough to detect the far more common faint galaxies at lower redshifts.The possibility remains that our selection criteria are in fact identifying faint, low-redshift galaxies.Faint, red galaxies at z ∼ 2-3 are of particular concern.Whether passive or star forming and dusty, their steepening red SEDs may only rise above our detection thresholds in the reddest filters (J and H).Extreme emission line galaxies at lower redshift are another possible source of contamination, where high equivalent width lines such as or Hα could boost the H 160 flux at z ∼ 3, z ∼ 2, or z ∼ 1.3, respectively (e.g., Atek et al. 2011;Faisst et al. 2016).In both cases, photometry in IRAC bands can help identify these sources as their SEDs likely increase past 2 μm.However, a faint contaminant with an emission-line-boosted H 160 flux may fall below the IRAC detection limit.Additionally, one of our candidates (Par0259+0032_194) does not have IRAC coverage, and we consider the IRAC deblending for another candidate (Par0456-2203_473) to be unreliable.While Spitzer/IRAC imaging should help break these redshift degeneracies, the shallow depths of the IRAC imaging as well as the uncertainties in the IRAC deblending complicate the picture.Our conservative selection criteria and the imaging currently available at ∼3-5 μm are not sufficient to rule out such contaminants conclusively.In the following sections, we therefore explore the low-redshift contamination fraction in our sample from faint galaxies falling below our detection thresholds.

Estimate from Stacking
The optical imaging in these HST data sets may be too shallow to detect flux from individual low-redshift sources, and so we stack postage stamps of the candidates in order to check for any optical detections.We create several stacks, separating the candidates by data set to avoid introducing any artificial features by combining images on different pixel scales.We also separate the imaging by filter, creating one stack for each data set that includes all available images obtained with the WFC3/UVIS and ACS/WFC cameras (V 606 and I 600 for BoRG[z8], and I 600 for HIPPIES GO 12286), a stack of all WFC3/IR Y images as an additional dropout band, and stacks of all J and H images as the candidate detection bands.We do not include the candidate from the WISP survey in these stacks because it is the only one on the 0 13 pixel scale.In each case, the images are combined using a weighted mean in order to down-weight any bad pixels.The image stacks are displayed in the top two rows of Figure 11, where we indicate the number of images in each filter that contributed to the stack.As can be seen in the left two panels of each row, there is no appreciable source flux at the expected position in the dropout bands.
We next estimate the detection significance at the center of each stack using r = 0 2 circular apertures.We place a grid of unique apertures across each stamp, using a segmentation map created by combining the individual maps for each input field to avoid source flux.The 1σ noise in the stamp is then taken as the standard deviation of a Gaussian distribution fit to the distribution of background aperture fluxes.With the aperture fluxes measured at the center of each stamp, we find S/N < 1 in both sets of V + I and Y stamps, confirming that there is no significant optical detection in the stacked imaging.These imaging stacks provide at best an upper limit estimate of the low-redshift contamination in the sample, suggesting that the majority of our sample is not comprised of contaminants.For reference, the S/N in H (J) is 24.1 (3.7) and 65.5 (9.3) for the BoRG[z8] and HIPPIES GO 12286 stacks, respectively.
As an extra check, we resample all available V, I, and Y imaging, including the I 814 imaging from the WISP candidate, to a much finer pixel scale of 0 01 pixel −1 .This resampling then allows us to combine all 25 images into a single, summed stack, which is shown in the bottom right corner of Figure 11.If a majority of the candidates in our sample were in fact lowerredshift interlopers, we would expect to detect flux in this combined stack of 25 images.However, there is no visible signal in the black circle in the bottom right panel of Figure 11, indicating that lower-redshift contamination does not dominate our sample.
In the bottom row of Figure 11, we also show the stacked redshift PDFs for all 13 sources in our sample.There is a nonzero portion of the PDF at z ∼ 2-4, with a peak around z ∼ 2.5.This low-redshift peak is consistent with the detection of the 4000 Å break in lieu of the Lyα break between J 125 and H 160 .Alternatively, if high equivalent width emission line galaxies are contributing to that low-redshift peak, it could correspond to strong [O III] in H 160 up to z ∼ 2.4, with [O II] transitioning into the H 160 filter at z ∼ 2.75.
The portion of the stacked PDF that lies at z < 7 amounts to only 1.6% of the total integrated PDF.However, as we have not used luminosity priors in our EAZY runs (see Section 3.1), the redshift PDFs do not take into account the relative abundances of high-and low-redshift galaxies in the Universe.Therefore, we cannot use the redshift PDF to estimate the contamination from lower-redshift galaxies in our sample.Instead, in the next section we aim to quantify the contamination fraction by artificially dimming low-redshift sources and attempting to reselect them as high-redshift candidates.

Catalog-level Estimate of Faint Lower-redshift Interlopers
We begin by identifying a high-fidelity sample of galaxies at low redshift.From the full Source Extractor catalog, we select all sources with S/N H > 5 and H 160 < 23.5 mag.After running EAZY on this sample of bright sources, we select all those with 70% of their redshift probability distribution contained in the range 1 z 4. We consider only resolved sources in an attempt to avoid including in this analysis any stars that have been incorrectly assigned z > 0 photometric redshifts.Using the size measurements presented in Section 3.7, we select sources with a half-light radius larger than the median r 1/2 + 1σ measured for stars in the corresponding data set.There are a total of 622, 142, and 335 lowredshift sources-those that are bright, resolved, and reliably in the range 1 z 4-in the BoRG[z8], HIPPIES GO 12286, and WISP data sets, respectively.
For each HST field containing a candidate high-redshift galaxy in our sample, we create a catalog of dimmed, lowredshift galaxies by assigning a new H 160 magnitude to each real source.The dimmed H 160 magnitudes are pulled randomly from a truncated Gaussian distribution with μ = 26.5 and a maximum at m = 27, 0.5 mag fainter than the 5σ requirement in our selection criteria.We dim the fluxes in all filters by the same value, Δm = H obs − H dimmed .This process maintains the colors of a z ∼ 2 galaxy while moving all photometry to fainter magnitudes.We create multiple realizations of each input lowredshift source, resulting in a catalog of 5000 dimmed sources for each field.We measure the median flux uncertainties in magnitude bins for all sources in the Source Extractor catalog that are detected in the given HST field, and perturb the dimmed fluxes by the error for the appropriate magnitude bin.In this way, we incorporate field-to-field variations in depths while accounting for photometric scatter as a function of magnitude.After they are photometrically scattered, these lowredshift sources span colors in the ranges −2.2 < Y − J < 3.1 and −1.3 < J − H < 3.5, ranges broad enough to encompass the red, dusty sources that could contaminate our high-redshift samples.Finally, we run EAZY on the new photometry and reselect sources as those with H 160 26.5 and 70% of their redshift PDF at z > 8.
For each field, we then look at the total number of the 5000 sources that were "recovered" as high-redshift candidates.There were no dimmed galaxies selected as high-redshift sources in five of the 12 fields that contain our candidates (Par0440-5244, Par0713+7405, Par0756+3043, Par1033 +5051, and Par1301+0000), implying a near-zero contamination rate.For the remaining fields, we select between one and four dimmed sources.Figure 12 shows an example of one such low-redshift source, with the original (undimmed) HST image stamps of the source as well as the original and dimmed photometry plotted as orange squares and red circles, respectively.We also show both sets of EAZY template fits, redshift PDFs, and χ 2 distributions.As can be seen here, the dimming process maintains the J − H color of the source within the photometric scatter, but the bluer fluxes have dropped out of the optical bands.
We use Equation (4) from Finkelstein et al. (2015b) to estimate the contamination fractions in each field that correspond to the number of recovered sources: Here, N dim,sel is the number of dimmed low-redshift sources that were selected as high-redshift galaxies and N z dim,low is the number of input dimmed sources (5000 in all fields).The ratio N tot,lowz /N highz accounts for the relative number densities of low-and high-redshift sources in our survey volume, where Dimming the fluxes in all bands preserves the source's colors, but results in the source dropping out of the bluer bands, and the new photometry is better fit at z ∼ 9. Through this analysis, we expect low contamination rates (<1%-12%) from faint, lower-redshift galaxies in our sample.
N highz is the number of high-redshift candidates detected in the data set (i.e., seven candidates in the BoRG[z8] fields, five in HIPPIES GO 12286, and one in WISP), and N tot,lowz is the total number of faint, low-redshift galaxies identified in the catalog of each data set.For the latter number, we select low-redshift sources with 70% of their integrated redshift PDF contained in the range 1 z 4 and magnitudes in the range 24 < m H160 27.We include sources below the threshold of m H160 26.5 in order to account for the prevalence of fainter, low-redshift galaxies that may be photometrically scattered into our sample magnitude range.In an effort to avoid stars and spurious detections and to ensure quality photometric redshift fits, we also require a half-light radius larger than the median measured for stars in the catalog as well as a 5σ detection in H 160 .We note that these selections, and especially the last two, are likely to miss real faint, low-redshift sources in our catalog.However, we find that without these requirements, we incorrectly include many stars and marginally detected sources with poorly constrained redshift PDFs in our total for N tot,lowz .
As an example, we find that the BoRG with N tot,lowz = 3342 in the BoRG[z8] catalog and N highz = 7, resulting in a contamination fraction of F = 0.095.We find contamination rates of 1.4% (Par0926+4536), 2.2% (Par0956-0450 and Par0259+0032), 2.9% (Par0843+4114), 9.5% (Par0926+4000 and Par0456-2203), and 12.7% (Par335).The fields with = N 0 dim,sel have contamination rates of <1% (Par0713+7405) and <9.5% (Par1033+5051, Par0440-5244, Par0756+3043, and Par1301 +0000).We again note that these contamination rates may be underestimated.We have limited our analysis to galaxies in the range 1 < z < 4 because the majority of the low-redshift portions of our candidates' photometric redshift PDFs is in this range.Yet we are therefore not accounting for low-redshift contamination from sources outside of this range.Also, the SEDs of sources in the catalog are constrained by only a handful of HST photometric measurements, resulting in uncertainties in the photometric redshifts measured for the bright, low-redshift sources that are chosen to be dimmed, the dimmed sources that are then "recovered" as high-redshift galaxies (N dim,sel ), and the faint, low-redshift sources (N tot,lowz ).Additionally, the results of this analysis depend on the template set used to identify both the high-and low-redshift sources.However, with at most four bands of photometry, this analysis represents our best attempt at identifying contamination from faint, low-redshift galaxies.
As the estimated contamination rates are small, and as we expect the contamination rates measured in the bins of magnitude we use when calculating the UV luminosity function to be even lower, we do not reduce our observed number densities by the fractions reported here.

The Bright End of the Luminosity Function at z ∼ 9
In this section we calculate the rest-frame UV luminosity function at 8.5  z  11.This measurement first requires correcting the candidate luminosities for any magnification imparted by nearby neighbors (Section 6.1), and estimating the completeness of our sample as a function of magnitude (Sections 6.2 and 6.3).

Source Magnification and M UV
Gravitational lensing can have the effect of boosting the brightness of sources such that they are detected in a fluxlimited sample when intrinsically they would be too faint.Lensing can benefit high-redshift surveys by bringing the fainter portions of the galaxy population within reach of HST surveys (e.g., Ellis et al. 2001;Coe et al. 2013;Schmidt et al. 2014a;Atek et al. 2015;Coe et al. 2015;Bouwens et al. 2017;Livermore et al. 2017), providing more constraining power for the faint-end slope of the luminosity function to higher redshifts than would otherwise be possible.However, the magnification of sources can also significantly change the shape of the measured luminosity function, especially at the bright end where the low number counts are easily boosted by an increase in observed luminosity.Many of the candidates in our sample have close neighbors, and five in particular have one or more bright neighbors within ∼3″.We now estimate the magnification that each candidate experiences, following the steps laid out by Mason et al. (2015b) for strong lensing.
The observed magnitude of a candidate will be increased by a factor m -( ) 2.5 log 10 by a bright, close neighbor.The magnification, μ, depends on the redshifts of the candidate and the lensing source, the velocity dispersion of the lens, and the projected distance between the lens and the candidate in the image plane.Following Mason et al. (2015b), we assume each of the potential lensing sources can be modeled as a singular isothermal sphere, where the Einstein radius of the lens is proportional to the square of the velocity dispersion of the lensing galaxy (σ 2 ; see Equation (4) of Mason et al. 2015b).
We estimate σ using the empirical, redshift-dependent Faber-Jackson relationship that Mason et al. (2015b) derive using the BoRG[z8] fields (see their Table 1).For this estimate, we use the EAZY-calculated photometric redshifts for the neighboring sources, including the IRAC photometry where available to improve on the photometric redshift constraints.We obtain IRAC photometry for each neighbor as described in Section 3.5, adopting the Galfit model magnitudes for detections and using r = 1 5 aperture fluxes for nondetections.When calculating θ ER , ideally we would use the apparent magnitudes from the same filters as Mason et al. (2015b) for each redshift bin.However, we find that the majority of the neighboring sources have a significant detection (S/N 3) only in H 160 , and so we use H 160 apparent magnitudes for all neighbors, regardless of redshift.As can be seen in Figure 8 and 9, some of the candidates have multiple bright, close neighbors.We assume that each neighboring source is an independent lens, such that the total magnification μ for a candidate is the product of the magnification from each neighboring source, μ i .This method assumes that the neighbors each lens the light from the candidate along the line of sight independently, an assumption that we discuss in more detail below.
As in Rojas-Ruiz et al. (2020), we use a Monte Carlo approach to calculate μ, the corrected H 160 apparent magnitude, and the absolute rest-frame UV magnitude, M UV , of each candidate.We perform 5000 calculations of θ ER and μ i for each neighbor within 5″ of a candidate, increasing the number of realizations from 1000 in Rojas-Ruiz et al. (2020) because of the broader redshift PDFs in our catalog.For each realization, we pull the redshift for the candidate and each neighbor randomly from their respective redshift PDFs.We randomly draw the H 160 fluxes for the candidate and neighbors from Gaussian distributions centered on the observed value and with a standard deviation equal to the flux uncertainties.In this way, we are including the uncertainties in the photometric redshifts and the flux uncertainties in our calculations of μ and M UV .We calculate M UV by converting the randomly drawn H 160 apparent magnitudes to absolute, using the fact that a rest-frame 1500 Å wavelength falls near the middle of the H 160 filter in the z ∼ 9.5 observer frame.We present the median and 68% interval of the M UV magnitude distributions in Table 4. Finkelstein et al. (2022) show that including all nearby neighbors in these calculation biases the resulting magnification toward higher values of μ.Specifically, they show that on average this approach results in μ ∼ 1.4 along randomly selected lines of sight where the expected magnification is μ = 1.Following their example, we therefore only consider the magnification contribution from neighbors with a median μ i > 1.4.The μ i of any neighbor with μ i,median < 1.4 is set to unity.As a result of our Monte Carlo process, we have a distribution of 5000 μs, corrected H 160 fluxes, and M UV values.We take the median of each distribution as the parameter estimate, and adopt the 68% interval as the uncertainty.
We find that three candidates experience a significant magnification: Par0713+7405_95 with μ = 1.45 ± 0.13, Par0259+0032_194 with μ = 3.81 ± 11.8, and Par0456-2203_473 with μ = 15.68 ± 12.50 (note that this candidate also has the largest observed r 1/2 ).This intermediate-to-strong lensing is unsurprising as each of these three candidates has a bright, close neighbor.However, we note that the magnifications of the two candidates with the strongest lensing are highly uncertain, with σ μ > μ for Par0259+0032_194.We report the median M UV for each candidate in Table 4, adopting the magnification-corrected magnitude for the three lensed candidates and providing the 68% intervals that account for the uncertainties in photometric redshift, H 160 fluxes, and magnifications where applicable.We use the distribution of M UV values in calculating the luminosity function in Section 6.3.

Simulated Sources
In order to quantify the incompleteness in our sample of high-redshift galaxies caused by our selection criteria, we create a catalog of simulated sources and add them to the real images from each WFC3 pointing.We then perform the same steps of source detection, photometry, and sample selection described in Sections 2.4 and 3.2, treating the images containing simulated sources identically to how we treat the original images.
We create a catalog of 10 5 sources, with each source assigned a redshift, spectral template, observed flux, size, and shape from the parameter distributions as follows.The source redshifts are pulled randomly from a uniform distribution in the range 7 z 12.Each source is then assigned a spectral template from the Bruzual & Charlot (2003) library with either a Salpeter (1955) or Chabrier (2003) initial mass function, and an exponentially declining star formation history with a characteristic timescale of either τ = 0.01, 0.5, or 5.0 Gyr.These timescales are adopted to approximate a single burst of star formation at one extreme (τ = 0.01 Gyr) and a constant star formation history at the other (τ = 5 Gyr).The template metallicities are one of 0.02, 0.2, 0.4, or 1.0 Z e , and are pulled randomly from a lognormal distribution that peaks at 0.2 Z e .The ages of the stellar populations are similarly taken from a tweaked lognormal distribution that peaks at 10 Myr and decreases with an enhanced tail toward older ages to ensure that some older (>400 Myr) populations exist.The ages available for each simulated source are restricted to be between 10 Myr and the age of the Universe at the given redshift.To each spectral template, we then apply dust extinction due to the interstellar medium using the Calzetti et al. (2000) extinction law and a color excess E(B − V ) pulled randomly from a truncated Gaussian distribution with μ = 0.1, σ = 0.15, We finally apply attenuation through the IGM following the prescriptions in Inoue et al. (2014).The observed H 160 magnitudes are pulled from a distribution constructed as the combination of a uniform distribution in the range 22 H 28 and a truncated Gaussian for fainter magnitudes with μ = 26.5, σ = 1, = H 22 min , and = H 28 max .We then normalize each spectral template to the H 160 magnitude assigned to the simulated source.
We generate images of each source using Galfit (Peng et al. 2002) as Sérsic profiles with index n and axis ratio b/a pulled randomly from truncated Gaussian distributions with μ n = 1, σ n = 2.5, and 0.5 n 5, and μ b/a = 0.8, σ b/a = 0.2, and 0.1 b/a 1, respectively.The half-light radii are determined by the absolute magnitude of each source following the empirical size-luminosity relation presented by Kawamata et al. (2018): where r 0 is the modal radius at M UV = −21, M * is the characteristic magnitude of the Schechter function fit to the luminosity function, and β is the slope of the size-luminosity relation.We adopt the following parameter values from Table 2 of Kawamata et al. (2018): and additionally impose β = 0.25 for sources with M UV −21, resulting in a size-luminosity relation described by a broken power law such that the slope flattens for brighter galaxies.In order to account for intrinsic scatter in the relationship at fixed magnitude, we apply a 0.2 dex scatter to the sizes derived from Equation (3).We note that the r 1/2 distribution measured in the Source Extractor catalogs of recovered simulated sources traces that of the real sources, with the simulated source sizes bracketing those of the real sources, confirming our choice of parameter values for this relation.The Galfit model images for each simulated source are convolved with the measured PSFs (Appendix D) corresponding to each data set and filter.
Finally, we create 200 realizations of each of the 132 HST pointings considered in this paper, each containing ∼100 sources randomly placed across the image, for a total of ∼20,000 input sources per pointing. 22We then run our full sample selection process on these fields, including source detection and photometry (Section 2.4), photometric redshift fitting (Section 3.1), and the selection criteria used to select real high-redshift candidates (Section 3.2).The one exception is that we do not add Galactic extinction to the fluxes of the simulated sources, and so we skip the Galactic extinction correction step described in Section 2.4.Finally, we match the catalogs of recovered sources to the simulated input source positions using a matching radius of 0 5.

Quantifying the Effects of Visual Inspection
As described in Section 3.4, the visual inspection process can affect sample compeleteness by removing sources in a subjective way that is not replicable with simulated sources.In an attempt to quantify this completeness effect, we included the recovered simulated sources in our visual inspections, inspecting an equal number of real and simulated sources.For each field, we randomly selected the same number of simulated sources as there were real initial candidates.In this way, the field-to-field filter coverages and noise properties of the simulated sources matched those of the real candidates.Importantly, this inspection was "anonymized," or performed without knowing which sources were simulated.This is the first time such a method-crucial to calculating sample completeness correctly-has been used in high-redshift galaxy selection.
Of the 166 real candidates that were rejected through visual inspection, we focus here on the 81 rejections that were based on a subjective opinion (category 1e, i.e., "a hot pixel, cosmic ray, or source with a strange morphology", Section 3.4).In Figure 13, we show the rejected fraction of the real and simulated sources that were rejected as category 1e.The fraction of rejected simulated sources increases toward fainter H 160 magnitude (left panel), lower detection S/N in H 160 (middle), and smaller half-light radius (right).Additionally, the fraction of rejected real sources largely follows the same distribution as that for simulated sources, indicating that this trend represents a bias in our visual inspection.The one exception is the H 160 S/N (middle panel), where the four real sources with S/N > 12 were rejected (100% rejection fraction in the S/N bin), while simulated sources with a similar S/N were not.However, these four sources comprise <5% of all rejections as category 1e, and so we do not consider them to be a significant source of bias introduced through our inspection.For reference, we show the distribution of real rejected candidates excluding these four high-S/N sources as dashed lines in Figure 13 and indicate their image stamps in Figure 17.
We do not visually inspect the many thousands of simulated sources that meet our selection criteria.However, we use the recovery of simulated sources to derive the effective volume probed in each field.Any bias imparted through our visual inspection can therefore directly affect our measurement of the luminosity function.In order to account for the rejected fraction of simulated sources, we use the left panel of Figure 13 to apply a correction factor to the measured completeness in bins of H 160 magnitude.For example, we decrease the number of recovered sources in each field with 25.5 < m H160 < 26.0 by 38%, assuming that this percentage of sources would have been rejected during a complete visual inspection.We discuss this correction further in Section 6.3.

Effective Volume
We calculate the number density of UV-bright galaxies at z ∼ 9 using the effective volume method (Felten 1977).This method involves determining the maximum volume within which a galaxy of a given luminosity would be detected by these HST observations and selected with our specific set of Figure 13.The fraction of real (gray histograms) and simulated (green histograms) sources rejected as category 1e during visual inspection (Section 3.4).This category represents the only subjective rejections in our visual classification scheme.We show the rejection fractions as a function of H 160 magnitude (left panel), H 160 detection S/N (middle panel), and half-light radius (right panel).The dashed black histograms in the left and right panels show the distribution of real sources excluding those with H 160 S/N > 12, the bar in the middle panel that is not caught by the rejected simulated sources.See Section 6.2.1 for details.The distributions of rejected real sources approximately follow those for the rejected simulated sources, indicating a bias likely introduced by our visual inspection that can directly affect our measurement of the luminosity function.We use the left panel of this figure to derive a correction factor that accounts for these rejection fractions as a function of H 160 magnitude, and apply it to the calculated completeness and effective volumes in Section 6.3.criteria.The effective volume, V eff , is given by: ò where dV/dz is the differential comoving volume element and P(M UV , z) is the probability that a source of absolute magnitude M UV and redshift z will make it into our sample given our selection function.We estimate P(M, z) using the set of fieldspecific simulations described in Section 6.2.We calculate the completeness as the ratio of the number of recovered sources to input sources in bins of input magnitude and redshift.In order to account for the simulated sources that we rejected during our visual inspection, we decrease the number of recovered sources in each magnitude bin by the fraction of rejected simulated sources shown in Figure 13.For this correction, we scale the magnitude bins to the 5σ H 160 magnitude limit of each field.In this way we approximate rejecting an increasing number of simulated sources as we approach the field depth.We do not include the nonzero fraction of bright (m H160 ∼ 22.3) rejected simulated sources, as there are no rejected real sources in that magnitude bin.
We show the resulting completeness as a function of redshift and H 160 magnitude in the right panel of Figure 14.There are almost no recovered sources at z < 7.5, where the Lyα break falls close to the centers of Y 098 and Y 105 .In these cases, the sources are detected in these filters with S/N > 2 and therefore do not satisfy our selection criteria.As the Y 098 is narrower than the Y 105 , simulated sources at z ∼ 7.5 are recovered in the BoRG[z8] fields, while the recovery fraction in the HIPPIES GO 12286 fields starts to increase for z > 8.The completeness dips slightly at z ∼ 8.6, the redshift at which the Lyα break transitions almost entirely out of Y 105 and sources in the BoRG [z8] and HIPPIES fields become two-band detections.There is another dip at z ∼ 10, as the Lyα break shifts out of the J 125 and J 110 filters and sources are detected in H 160 only.After each initial dip, the completeness increases with redshift as more filters lie blueward of the break, ruling out lower-redshift solutions and increasing the portion of the redshift PDF that lies at z > 8.The recovery in WISP fields is very low at z  10 because the broad J 110 filter results in poorly constrained photometric redshift PDFs for sources from z ∼ 7 to z ∼ 10.The completeness is therefore highest at z > 10.5, when the WISP recovery fractions increase again for H 160 -only detections.
We calculate the effective volume probed in each field using Equation (5).In the left panel of Figure 14, we show the effective volume calculated as a function of H 160 apparent magnitude.For each data set, the darker shaded region shows the V eff that has been corrected for the bias introduced by our visual inspections, and the lighter shaded region shows the original, uncorrected V eff .It can be seen here how this correction increasingly reduces the effective volume for fainter H 160 magnitudes.The thick solid black line indicates the total, corrected volume of all three combined data sets.We use the effective volume calculated as a function of absolute UV magnitude for our calculation of number densities in Section 6.4, adopting a rest wavelength of 1500 Å that corresponds to approximately the center of the H 160 filter at z ∼ 9.5.

Volume Number Densities
We calculate the observed number density of galaxies in the redshift range 8.5  z  11 following the methods described in Finkelstein et al. (2022).We describe our steps here but refer the reader to Finkelstein et al. (2022) for more details.We use an MCMC analysis with the Cash statistic (Cash 1979;Ryan et al. 2011) as the goodness-of-fit measure.For a given assumed number density, the Cash statistic describes the showing each data set separately with the total volume in black.For all data sets, the lighter shaded region shows the effective volume calculated directly from the completeness simulations, while the darker region includes the correction for reduced completeness due to sources removed through visual inspection.As expected given the distribution of rejected fractions in the left panel of Figure 13, the effect of these corrections increases toward fainter magnitudes.The thick black line indicates the total, corrected volume for all three combined data sets.We show the results of our completeness simulations as a function of redshift and H 160 magnitude in the right panel.The white circles indicate the redshifts and magnitudes of our 13 candidates.The completeness is highest for bright sources and falls off toward fainter magnitudes.It also peaks at z > 10, when the Lyα break redshifts out of the J band and sources are H 160 -only single detections.At these redshifts, nondetections in all three blue filters result in larger portions of the redshift PDF at z > 8, and the recovery fraction in the WISP fields increases significantly and boosts the overall completeness.likelihood that the observed number of galaxies matches the expected number, but models the probability distribution as a Poisson rather than a Gaussian distribution.This choice of statistic is therefore appropriate for our small sample size.
Additionally, we use the "pseudobinning" technique developed by Finkelstein et al. (2022), which reduces the biases imparted by choice of absolute magnitude bin center and width on the recovered shape of the luminosity function.At each step of the MCMC chain, the number density is calculated for a given M UV in a bin with a width randomly pulled from the range 0.2-1.5 mag.The volume associated with this bin is determined using the completeness simulations described in Section 6.3 and Equation (5).We pull an absolute magnitude for each source randomly from the distributions calculated in Section 6.1, using the magnification-corrected distributions for the three strongly lensed sources.We then determine the observed number of sources in the bin given these randomly drawn quantities.We perform this calculation in magnitude steps of ΔM UV = 0.1 in the range −24 < M UV < −20.In this way, the number density posterior distribution incorporates the uncertainties on the absolute magnitude of each candidatewhich in turn fold in the uncertainties in redshift, H 160 observed magnitude, and magnification correction-as well as reducing the dependence on the choice of magnitude bins.We use 20 walkers with a burn in of 10 5 steps, after which we discard the chains and use an additional 10 4 steps to calculate the posterior distribution.
We present the calculated number densities in Table 6 in the Appendix and display our results in Figure 15, plotting the 68% and 95% ranges of the density posterior as blue shaded regions.The shading extends from M UV = −23.05, the brightest bin that contains a candidate, to M UV = −20.5, the absolute magnitude at which our average completeness drops below 20%.We plot vertical blue error bars throughout the shaded region to indicate the uncertainty in our measurements due to cosmic variance.We estimate this uncertainty using the calculator from Bhowmick et al. (2020).At all magnitudes, the uncertainty on the number density due to cosmic variance is very low.The overall uncertainties are dominated by Poisson noise.For comparison, we also show a number of results from the literature, including the pseudobinning results from Finkelstein et al. (2022) for their 11 candidates in the same redshift range (purple shaded region).We discuss the implications of our results in the context of other similar high-redshift searches in Section 7.

Comparison with Previous Luminosity Function Calculations
We begin by comparing our results to other studies in similar redshift ranges.In Figure 15, we include number density measurements from high-redshift galaxy searches performed in HST parallel fields (B16; Calvi et al. 2016;Livermore et al. 2018;Morishita et al. 2018;Rojas-Ruiz et al. 2020;Leethochawalit et al. 2023), ground-based imaging in the COSMOS/UltraVISTA field (Stefanon et al. 2019;Bowler et al. 2020), the CANDELS fields (Finkelstein et al. 2022), as well as those that included the Hubble Frontier Fields (Oesch et al. 2018), and the HUDF (Bouwens et al. 2021).Collectively, these surveys cover a large dynamic range in both magnitude and volume, yet individually they have complementary strengths.The Hubble extragalactic legacy fields have the depth needed to probe to fainter magnitudes, yet include only a handful of lines of sight and are therefore more susceptible to cosmic variance.On the other hand, the HST pure parallel programs include hundreds of independent pointings, and the ground-based COSMOS/UltraVISTA field covers a very large area at a shallow depth.These surveys are more likely to detect the rare, brightest galaxies, but at the expense of lower sample completeness for M UV  −21.5.
Although we cover a broad redshift range, our results are consistent with other searches performed in HST parallel surveys.Calvi et al. (2016), Morishita et al. (2018), Rojas-Ruiz et al. (2020), and Leethochawalit et al. (2023) measured the rest-frame UV luminosity function in fields that are part of the BoRG[z910] and SuperBoRG (Morishita 2021) surveys, which include imaging in JH 140 .The second detection filter helps protect against the selection of spurious sources and improves the photometric redshift by more fully constraining the location of the Lyα break.The 132 HST fields we include in our results are less optimized for the selection of z ∼ 9-10 galaxies.Additionally, our broad redshift range makes direct comparison with results from the literature challenging.Nevertheless, the number densities from these HST pure parallel studies at both z ∼ 9 (gray symbols) and z ∼ 10 (orange symbols) lie approximately at the median density we calculate across the full absolute magnitude range that we consider.This consistency implies that the excess we measure of bright galaxies is not an artifact of our large redshift range.A more direct comparison is possible with the densities from Rojas-Ruiz et al. (2020; orange stars), as their measurements span the full redshift range 8.4 < z < 10.6 and are in good agreement with our own.B16 explore some of the same fields that we do, yet our two samples of high-redshift galaxy candidates do not overlap.However, the number densities reported by B16 at z ∼ 10 are also consistent with our measurements.Finally, our luminosity function is consistent with those at z ∼ 9 and z ∼ 10 from Leethochawalit et al. (2023) for all magnitudes except the very brightest at M UV  −22.5.In fact, our two measurements are consistent within their 1σ uncertainties out to M UV  −22.8.In Figure 15, we have plotted their double power-law fits, though we note that their Schechter function fits are comparable.
Overall, the results using the HST pure parallel surveys find number densities above those measured in the HST legacy or ground-based imaging.The two brightest magnitude bins at z ∼ 9 from Bouwens et al. (2021) are consistent with our measurements within the uncertainties, yet the Oesch et al. (2018) measurements at z ∼ 10 are much lower.While our measurements over a broader redshift range cannot be directly compared to those of Oesch et al. (2018) and Bouwens et al. (2021), the densities from HST parallel surveys-including ours-are all generally higher than those reported by these two papers.Both Oesch et al. (2018) and Bouwens et al. (2021) include fields deep enough to probe the z ∼ 9-10 galaxy population down to M UV > −18, yet do not find galaxies at M UV < −22.The brightest z ∼ 10 candidate contributing to the luminosity function of Oesch et al. (2018) is GN-z11 (Oesch et al. 2016) with a magnitude of M UV = −21.6.The bright end of the luminosity function measured by Stefanon et al. (2019) is also lower but consistent with our measurements within the uncertainties.
We next focus on the comparison between our results and those of Finkelstein et al. (2022), as we have performed very similar sample selections, completeness analyses, have used the same method to measure the UV luminosity function, and cover a similar redshift range.Our luminosity functions are fully consistent down to M UV ∼ −21.8, below which the purple region decreases and is more consistent with the Schechter functions of Finkelstein (2016; predicted for a smooth decline in the luminosity function at z > 8), Bowler et al. (2020; showing a slower evolution in the bright end of the luminosity function at z > 8), and Bouwens et al. (2021; proposing an accelerated decline in the luminosity function).The highredshift sample presented by Finkelstein et al. (2022) includes an overdensity in the Extended Groth Strip (EGS) field, which may serve to increase their number densities over similar studies in the HST legacy fields.At the bright end, however, the uncertainties of Finkelstein et al. (2022) are dominated by cosmic variance, finding a fractional uncertainty of 0.95 at M UV = −22.5.With over 130 independent pointings included in our analysis, we find a fractional uncertainty of 0.14 at M UV = −22.8(blue vertical error bars).This indicates that the high density of bright galaxies as reported here and in Finkelstein et al. (2022) is robust against cosmic variance, though we note that the Poisson uncertainties in our measurements are large.As described in Section 5.2, we expect our contamination from lower-redshift interlopers to increase toward fainter magnitudes.This increasing contamination could explain why we calculate a higher luminosity function than that presented by Finkelstein et al. (2022) for M UV  − 21.8.
These results indicate that wide-area or pure parallel programs provide valuable insight about the bright end of the rest-frame UV luminosity function.Their large areas or many uncorrelated pointings significantly reduce the effects of cosmic variance and allow for a more complete sampling of the population of bright, z ∼ 9-10 galaxies.While there is significant scatter and Poisson uncertainty for M UV  −22, there is a broad consensus among these programs that there is an excess of bright galaxies at z ∼ 9-10 compared to a Schechter function parameterization of the luminosity function.By M UV ∼ −21.5, the results from these shallow observations may be more prone to contamination, and we should turn to deeper fields for measurements of the luminosity function.

Implications of the Number Density of Bright Galaxies
While many factors affect the evolving abundance of galaxies with redshift, the two dominant processes influencing  Finkelstein et al. (2022).The inner (outer) blue shaded region is the Poisson uncertainty determined by the 68% (95%) range of the number density posterior, extending from the brightest bin that contains a candidate to the magnitude at which our average completeness drops below 20%.The solid blue line shows the upper 68% range at brighter and fainter magnitudes, and the blue vertical error bars indicate the uncertainty due to cosmic variance based on the calculator from Bhowmick et al. (2020).We show the absolute magnitudes of each of our 13 candidates at the top as blue circles, with error bars representing their M UV posterior distribution 68% ranges.We compare our results with those from Finkelstein et al. (2022; purple shaded region), as well as other studies using HST parallel fields (B16; Calvi et al. 2016;Livermore et al. 2018;Morishita et al. 2018;Rojas-Ruiz et al. 2020), ground-based imaging in the COSMOS/UltraVISTA field (Stefanon et al. 2019;Bowler et al. 2020), and HST imaging in the extragalactic legacy fields (Oesch et al. 2018;Bouwens et al. 2021).For reference, we also show the Schechter function fits from Oesch et al. (2018) and Bouwens et al. (2021), the Schechter function predictions from Finkelstein (2016), and the double power-law luminosity functions from Bowler et al. (2020) and Leethochawalit et al. (2023).We find a high density of bright galaxies, in good agreement with other HST parallel studies.Though our measurements cover a broader redshift range, they are consistent with the parallel studies at both z ∼ 9 and z ∼ 10.Our luminosity function is consistent for M UV  −21.5 with that of Finkelstein et al. (2022), which was calculated using the same method but in a handful of fields.The uncorrelated nature of the pure parallel observations we include in our analysis significantly reduces the effects of cosmic variance, allowing for a more complete sampling of the population of bright galaxies at z ∼ 9-10.the bright end of the observed UV luminosity function at z ∼ 9 are likely star formation efficiencies and dust attenuation.Somerville et al. (2015) and Yung et al. (2019a) showed that the predicted number density of high-redshift galaxies (z  4) is very sensitive to the efficiency with which molecular hydrogen is converted into stars.This is because at these redshifts, the molecular gas consumption time (a few gigayears in nearby galaxies) becomes comparable to the age of the Universe.Specifically, in their fiducial model, Yung et al. (2019a) adopt a double power law for the relationship between molecular gas surface density and star formation rate surface density, where the slope of the power law becomes steeper above a critical molecular gas surface density.Galaxies at early times are more compact than those at lower redshifts, and so most star formation is likely occurring in gas at densities above this critical density.The net effect is that the gas depletion times are shorter, and star formation efficiencies higher, at high redshift.The authors find that with a slope of ∼2 at high gas densities (a bit steeper than the slope of unity measured for nearby spiral galaxies; Bigiel et al. 2008), their semianalytic model can reproduce the observed number density of bright galaxies out to z ∼ 8.A model with a single power-law star formation relation with a slope of either unity or 1.5 (as in the canonical Kennicutt-Schmidt relation;Schmidt 1959Schmidt , 1963;;Kennicutt 1989Kennicutt , 1998) ) significantly underproduces bright galaxies even by z ∼ 6.
Dust attenuation can also play a major role in determining the shape of the bright end of the UV luminosity function (e.g., Vogelsberger et al. 2020).The high density of UV-bright galaxies in Figure 15 may indicate low or negligible dust content in massive galaxies at these redshifts.The shallow evolution in the bright end at high redshift may be caused by a redshift and/or mass dependence in dust production and dustto-metal ratios.Finkelstein et al. (2022) show that the semianalytic model from Yung et al. (2019a) can best reproduce their observed density of bright galaxies at z ∼ 9 when there is no dust attenuation included (yet SED fitting for the galaxies in their sample indicate nonnegligible dust contents; Tacchella et al. 2022).It is also important to note that the effects on the bright end of lower dust attenuation are degenerate with those of higher star formation efficiencies, and detailed observations with JWST NIRCam and MIRI will be required to separate these effects in bright, high-redshift galaxies.
This observed excess could also be due to a high fraction of faint active galactic nuclei (AGNs) in our sample.While there have been only a handful of AGNs detected at z > 7, massive accreting black holes must exist at earlier times to explain the masses of the quasars observed at z ∼ 6-7 (Bañados et al. 2018).Additionally, there is a growing body of evidence that some of the brightest known z > 7 star-forming galaxies may in fact host detectable AGN activity, typically detected via significant N V emission (e.g., Tilvi et al. 2016;Hu et al. 2017;Laporte et al. 2017) including one of the highest-redshift spectroscopically confirmed galaxies at z = 8.68 (Zitrin et al. 2015;Mainali et al. 2018).We cannot rule out the possibility that the luminosity of these bright galaxies is partially due to an AGN contribution.Spectroscopic follow up with NIRSpec's medium-and high-resolution gratings could help shed light on this question.

Bright Galaxies during the Epoch of Reionization with JWST
This sample of 13 galaxy candidates at 8.5  z  11 represents the very bright end of the rest-frame UV luminosity function, with M UV  −21.2.Such massive galaxies likely formed in regions of relative overdensity, where the buildup of mass began at earlier times.These regions are of particular interest in understanding the early stages of the reionization of the IGM.Whether the progression of the epoch of reionization at early times was dominated by ionizing emission from massive galaxies with high escape fractions (e.g., Naidu et al. 2020;Matthee et al. 2022;Naidu et al. 2021), or from the more numerous lower-mass population (e.g., Finkelstein et al. 2019), these overdense regions would have been among the first to ionize.Bright galaxies may therefore reside in ionized bubbles even as early as z  8. Indeed, Lyα has been detected in the spectra of two galaxies at z ∼ 8.7 in the EGS field (Zitrin et al. 2015;Larson et al. 2022), potentially indicating ionized regions large enough for the Lyα photons to redshift out of resonance before encountering the neutral IGM.
The z ∼ 9-10 candidates we present in this paper may trace overdense regions at early times, making them excellent targets for follow up with JWST.Deep NIRCam imaging can be used to detect the population of fainter galaxies as demonstrated by Castellano et al. (2016Castellano et al. ( , 2018; at z ∼ 7), and deep NIRSpec spectroscopy can be used to probe reionization in these regions via Lyα and Mg II (e.g., Henry et al. 2018).The NIRSpec prism also provides the opportunity to characterize the ionizing efficiencies of bright galaxies.The building narrative surrounding reionization hints at a chaotic early Universe, with galaxies undergoing intense bursts of star formation, producing massive, bright stars that power strong nebular emission lines (e.g., Finkelstein 2016;Stark 2016;Endsley et al. 2021).The relative strength and ratios of these lines, such as that of [O III] λλ4959, 5007/[O II] λλ3727, 3729 (=O32; e.g., Steidel et al. 2016), can therefore reveal significant detail about the strength of the ionizing continuum in galaxies at z > 9 (e.g., Shapley et al. 2003;Erb et al. 2010;Steidel et al. 2016;Ravindranath et al. 2020).Observational evidence shows that high O32 correlates with increasing ionizing efficiency (e.g., Tang et al. 2019).For galaxies during the epoch of reionization, large measured ionizing efficiencies may indicate sources that are capable of ionizing a big bubble in the IGM.The combination of NIRSpec spectroscopy and deep NIRCam imaging will test whether reionization is driven by the most massive galaxies, or by the smaller, fainter sources.
With JWST GO-2426 (Co-PIs: Bagley & Rojas-Ruiz),we will follow up five sources from this paper (Par0456-2203_473, Par0756+3043_92, Par0956-0450_684, Par335_251, and Par2346-0021_16423 ) as well as six additional candidates presented in Rojas-Ruiz et al. (2020).With approximately 1 hr of exposure time per target with the NIRSpec prism in fixed slit mode, the observations will be sensitive enough to detect the Lyα break for redshift confirmation as well as nebular emission lines such as  4. a The J 125 photometry quoted here for Par0259+0032_194 is that measured after masking out the bad pixels as described in Section 4.1.5.

Appendix B Rejected Candidates
In this section, we show H 160 postage stamps for all 166 rejected sources.These candidates were selected via the criteria discussed in Section 3.2 and then later rejected through persistence checks (Section 3.3), or visual inspection (Section 3.4).All stamps are displayed on a linear stretch with a zscale interval that highlights pixel values near the image median.These stamp normalizations are calculated using the pixels in a 5″ × 5″ box centered on the source position.We note that on the reverse color map adopted here, very negative values are displayed as white pixels.We do not apply any pixel interpolation or smoothing to these stamps, and use r = 0 75 circles to indicate the source positions.
Of the 193 candidates that pass our selection criteria, the majority of them are spurious detections that we classify visually in four categories.First, we reject 48 sources as satellite remnants or diffraction spikes (Figure 16(a)).As can be seen in Figure 16(a), many of the sources in this category are detected as part of the same set of satellite trails.Next, we reject 12 sources that are detected on or very close to an image edge (Figure 16(b)).The sources in this category suffer from incomplete source extraction and unreliable photometry.Through inspection of the rms maps at the position of each candidate, we identify 25 sources that are bad pixels in the rms map (rms > 0.1, equivalent to a weight less than 100) or are contaminated by bad pixels (i.e., bad pixels are included in the isophotal area of the source, therefore impacting the photometry).We show the central 2″ × 2″ around each of these candidates in Figure 16(c) to zoom in on the morphologies of these spurious or contaminated sources, and indicate the locations of the bad pixels with red contours.The final category of visually rejected sources are those that have been identified as hot pixels, detector artifacts, or sources with strange morphologies.This category is the only subjective part of our visual inspection, and we attempt to account for these rejections in our calculate of the effective volume (see Section 6.3).We show the 81 sources rejected in this category in Figure 17, zooming in on the central 2″ × 2″ to highlight the morphologies.
Finally, we reject 12 candidates as contamination by image persistence.In these cases, a previous observation of a bright source left a residual charge on the detector that fades with time.If this leftover flux is detected in the H 160 and is absent from the other filters, the persistence will be selected as a highredshift source.We have identified 10 cases of persistence based on the counts registered on the detector at the position of the candidates prior to the candidate's observation.An additional two WISP sources are identified as persistence due to their location relative to a bright source and the survey strategy of the WISP observations (see Section 3.3).We show all 12 cases of persistence in Figure 16(d).IRAC PSFs are used to model the source fluxes in the IRAC bands with Galfit (see Section 3.5).

D.1. HST PSFs
We begin by creating a sample of stars in each of the three data sets.We select sources that are 5″ from any neighboring objects in the catalog and that have measured half-light radii in H 160 in the range 0 1 <r 1/2 < 0 25 pixels.We also require H 160 magnitudes in the range 16 < m H160 < 21, where the cut at the bright end is determined to avoid saturated sources and the cut at the faint end is to allow for a clean separation between point sources and extended sources.Next, we visually inspect the H 160 imaging of all star candidates, rejecting any that lie along an image edge, are heavily affected by bad pixels or other detector artifacts, or appear extended or as two very close sources that have been extracted as a single source.Through this selection, we identify a median of four moderately bright, isolated stars per field, though the number of stars meeting these criteria in a given field ranges from 0 to 15.In total, we identify 539 stars, 253 in the BoRG[z8] fields, 112 in HIPPIES GO 12286 fields, and 174 in the WISP fields.
Ideally, we would create a composite PSF for each filter in each HST pointing, constructed from stars that are located across the full field.In this way, we would account for any variations in the telescope focus during each set of observations that could result in field-to-field variations in the PSF.By including stars across the full field, we would also account for the detector location dependence of the PSF.However, as these HST surveys were designed as extragalactic programs, they attempted to avoid stars as much as possible.Many of the HST pointings contain only a handful of isolated stars in the desired magnitude range, and some fields contain none.As there are not enough stars to construct a PSF for each individual HST pointing, we instead use a single PSF for all fields of a given data set.To validate this approach, we explored the field dependence of the PSF by creating individual stacks of stars for each field containing at least three point sources.We constructed a radial profile of each combined PSF by computing the azimuthally averaged flux in circular apertures of increasing radii and used it to measure the FWHM.We found that while the FWHM of the PSF varies from field to field, it does so by an acceptably small amount.For example, the standard deviation of H 160 FWHMs measured for individual fields is 0 04 for all three data sets.We therefore create a combined PSF for each data set/filter combination that we use for all fields in that data set.
For each data set/filter, we create a combined PSF in the following way.The individual image postage stamps of each identified star are resampled by a factor of 10, registered to a common centroid using subpixel shifts, and then resampled back to their original pixel scale.We normalize each image by the flux measured in a circular aperture of radius r = 0 2. Because the fields are observed at a range of roll angles, we rotate each stamp by a randomly assigned angle in order to randomize the location of the diffraction spikes.Finally, we median combine all image stamps.Combining multiple stars more fully samples the undersampled HST/WFC3 PSF, as together they provide images of point sources with a range of subpixel centroids.We perform this stacking on each of the three data sets separately, to preserve the pixel scales and drizzling patterns.The combined PSFs for each filter and data set are displayed in the top three rows of Figure 18, where we show the full stamps for all filters and the central cores of the PSFs in the H 160 images as insets.We calculate the radial profiles and FWHMs as described above.Each PSF panel in Figure 18 is labeled with the measured FWHM and the number of stars that are included in the stack.We also list the measured FWHMs in Table 7.
Finally, we use this sample of visually vetted stars to measure a typical r 1/2 for stars in each data set.While Source Extractor provides a measurement of stellarity that can be useful in separating point and extended sources (CLASS_ STAR), we find that it is most reliable for bright stars, m H160  17.The stars in this sample with H 160 magnitudes fainter than ∼17 have CLASS_STAR values that range from 1.0 (as expected for point sources) all the way down to ∼0.2 (closer to that expected for extended sources).We therefore choose to use the r 1/2 as measured in H 160 with Source Extractor to classify unresolved sources.In Table 7, we provide the median and standard deviation of r 1/2 for the stars in each data set, values we use in Section 3.7 to identify potential stars in our sample of high-redshift candidates.

D.2. Spitzer/IRAC PSFs
We create median PSFs for each IRAC channel in the same manner as for the HST images.Specifically, we run Source Extractor on the IRAC images and identify point sources as bright (14 < M IRAC < 18), isolated (no neighbors within 15″) sources with a measured half-light radius between 0 9 < r 1/2,IRAC < 1 35.We visually inspect all sources to remove any with bad pixels or undetected neighbors in the Source Extractor catalog, resample all image stamps by a factor of 10 to allow for subpixel centroiding, and median combine the resulting stack.We do not randomly rotate each stamp, as all the IRAC mosaics are aligned by detector coordinates.
As described in Section 3.5.1,we create two sets of IRAC PSFs.One set is created from point sources identified in all programs except PID 90045 (PI: Richards) and is used with Galfit models for all observations from these six programs.We combine 243 (3.6 μm) and 88 (4.5 μm) sources to create these PSFs, which are shown in the bottom two rows of Figure 18.The second set is created from point sources identified in the PID 90045 imaging, and is used for modeling Par2346-0021.This IRAC program follows a different observing strategy, mapping a large area at a much shallower depth (∼1.8 and 0.7 magnitudes shallower at 3.6 μm and 4.6 μm, respectively).Most significantly, the IRAC imaging covering Par2346-0021 uses two small dithers per pointing, compared with the 30-100 + medium dithers employed by the other six programs.With only two small dithers, the PSF in this imaging is likely undersampled to a larger extent than that in the other programs, and so we create a program-specific PSF for use with the Galfit modeling.The PSFs we create for Par2346-0021 are comprised of 1561 (3.6 μm) and 1259 (4.5 μm) sources and are also displayed in Figure 18 in columns titled "Par2346." For all IRAC PSFs, we calculate aperture corrections by measuring the flux of our custom PSFs in circular apertures of increasing radii.The correction is the factor that is required to scale from the encircled flux in r = 1 5 apertures to the total flux measured in the largest apertures where the growth curve has flattened.The aperture corrections are included in Table 7, and are applied to the aperture fluxes and uncertainties in Section 3.5.2.

Figure 1 .
Figure 1.The positions of all 132 fields we consider in our analysis.Purple circles indicate BoRG[z8] fields, which include fields from the BoRG survey (PI: Trenti), HIPPIES (PI: Yan, GO 11702), and parallels from the COS GTO program (PI: Green).The 23 HIPPIES fields from GO 12286 are shown as orange squares, and the 45 WISP fields are shown as blue triangles.(Symbol sizes are not to scale.)Theindependent, uncorrelated nature of the fields-ideal for studies of rare, bright galaxies-is evident from their locations shown here.

Figure 3 .
Figure 3.An example of the noise calculated for the BoRG[z8] field Par0846+7654 as a function of aperture size.The noise, σ N , is calculated as the normalized median absolute deviation of background flux measured in uncorrelated circular apertures placed on blank regions of the field.We show σ N for the filters in this field as blue triangles (I 600 ), green diamonds (Y 098 ), orange squares (J 125 ), and red circles (H 160 ).The functional fits to these measurements (Equation (1)) are plotted as curves of the corresponding colors.The black dashed and dotted-dashed lines indicate the cases of completely correlated pixels (β = 1) and completely uncorrelated pixels (β = 0.5), respectively.The noise in these images is partially correlated in all filters.Neglecting to correct for this correlation would result in underestimated flux uncertainties and overestimated S/Ns in our catalog.

Figure 4 .
Figure4.Two examples of WISP sources identified as self-persistence and removed from our sample.First, in panel (a) we show a 50 × 50 pixel stamp of a candidate identified as persistence due to the high detector counts in the FLTs observed before the H 160 observations.In panel (b), we show 10 × 10 pixel stamps of the four affected FLTs that contributed to the H 160 mosaic.The cause of the persistence was the zeroth order of a bright star observed with the G141 grism just prior to the H 160 exposures.We show a zoom-in view of this zeroth order in panel (c) and a larger portion of the G141 image in panel (d) with the zeroth and first orders indicated.As can be seen in panel (e), the position of self-persistence resulting from this specific observing strategy will be offset by ∼180-195 pixels from the offending bright source.We have identified and rejected two additional WISP sources in our sample based on their position relative to a bright star, one of which is shown in panel (f).These candidates were not identified as persistence based on counts from previous observations at the candidate position on the detector, indicating that even this thorough check can miss cases of persistence.

1.
Is the candidate a real source or a spurious detection (166 rejected, see Appendix B)? I.e., is the source: 1a. a diffraction spike or satellite remnant (48 rejected, see Figure 16(a)); 1b.along an image edge or detected in an area of increased noise near an image edge (12, Figure 16(b)); 1 c. a bad pixel in the weight map (25, Figure 16(c)); 1d. the oversplit region of a bright neighboring source (0); or 1e. a hot pixel, cosmic ray, or source with a strange morphology (81, Figure 17)? 2. Is the aperture drawn correctly, or was the Kron ellipse

Figure 5 .
Figure5.Spitzer/IRAC imaging of the 13 candidates.For each candidate we show a 30 6 IRAC image centered at the candidate's position (left), the 30 6 residual map with all source flux removed except that of the candidate (middle), and a 10 2 zoom-in view of the residual map to highlight the structure close to the candidate position (right).In each set of stamps, the top and bottom rows display the 3.6 and 4.5 μm (where available) images, respectively.The black squares indicate the size of the zoom-in stamp, and the circles have radius 1 5, the size used for aperture photometry as described in the text.For fields observed multiple times, we show a single observation as an example.Additional images and residual maps are similar to those shown here.The image stamps are displayed with no image smoothing and on zscale intervals calculated individually for each stamp and zoom.The deblending at the target positions is mostly clean with the exception of Par0456-2203_473, which is discussed in Section 4.1.4.

Figure 6 .
Figure 6.The image stamps, photometry, best-fitting SEDs, and redshift PDFs for two candidates we have removed from our sample.For each candidate we show 3″ HST stamps and, where available, 10″ Galfit residual maps for the IRAC imaging from each individual Astronomical Observation Request (AOR, i.e., a Spitzer observation sequence) with all modeled flux removed except that attributed to the candidate.The purple r = 0 5 circles identify the position of the candidate, and the blue 3″ squares indicate the size of the HST stamps on the larger IRAC images.Beneath the image stamps we show the SEDs in the left panel (measured fluxes as black circles, 1σ upper limits as arrows, and the fluxes of the best-fitting template as open squares) and the photometric redshift PDFs in the right panel, with the fit to the HST + IRAC photometry (solid blue curve) and HST-only photometry (dashed purple curve).Top: we remove Par2346-0021_164 due to an increase in its redshift PDF at z ∼ 2-4 after including IRAC photometry.Bottom: we remove Par2132+1004_509 from our sample primarily due to its low Δχ 2 = 2.53, but also because it is only observed in three HST bands and is not covered by Spitzer/IRAC.

Figure 7 .
Figure 7.The SpeX-derived goodness-of-fit χ 2 values as a function of spectral type (orange, top axis) compared with that determined by EAZY as a function of redshift (blue, bottom axis) for the high-redshift candidate Par0713+7405 ID 95.The χ 2 calculated for each SpeX spectrum is identified by a separate point, indicating the variation measured for a given spectral type.The minimum galactic and stellar χ 2 values are identified by horizontal dashed and dashed-dotted lines, respectively.The difference in χ 2 values for this source is c c c D = -= 11.11 2 stellar 2 galactic 2, which is the smallest Δχ 2 of all candidates in our sample.All candidates are better fit by galactic templates than by MLT dwarf spectral templates.

Figure 8 .
Figure 8.The image stamps, photometry, best-fitting SEDs, and redshift PDFs for the first six candidates in our sample, sorted by photometric redshift.For each source, we show all available imaging including 3″ HST stamps and 10″ Galfit residual maps for the IRAC imaging from individual AORs with all modeled flux removed except that attributed to the candidate.The purple r = 0 5 circles identify the position of the candidate in the HST images, and the blue 3″ squares indicate the size of the HST stamp in the larger IRAC image.Beneath the image stamps we show the SEDs in the left panel and the photometric redshift PDFs in the right.We show the measured photometry as filled circles, 1σ upper limits as arrows, and the predicted fluxes for each template as open squares.The blue solid curve and blue shaded regions indicate the best-fitting template and PDF including all HST + IRAC photometry.The dashed purple curves indicate the best-fitting template/PDF considering only HST photometry.The inclusion of the IRAC photometry almost always tightens the width of the PDF at high redshifts while reducing the portion of the PDF present at lower redshifts.The red dotted-dashed curve shows the best-fitting template/PDF at low redshift, when EAZY is restricted to z < 6.These candidates are all better fit with a high-redshift galactic spectral template than that of a lower-redshift galaxy.We use a different color scheme with Par0259 +0032_194, and show the photometric redshift fits when the J 125 photometry is excluded (solid green curve) and when a lower limit is assumed as described in the text (dashed black curve).

Figure 9 .
Figure 9.The candidate plots for the remaining seven candidates.Plots are displayed as in Figure 8.

Figure 10 .
Figure 10.The one-dimensional WISP spectra in the WFC3 grisms G102 (blue) and G141 (orange) of candidate Par335_251.The light shaded regions show the 1σ uncertainties.The best-fitting high-redshift solution (z phot = 10.53) would place Lyα at λ obs ∼ 1.4 μm (blue dashed vertical line).The best-fitting low-redshift solution would place [O II] at λ obs ∼ 1.32 μm (red dashed-dotted line).We show a horizontal rectangle for each emission line weighted by the redshift PDFs and extending to their 95% intervals.There are potential emission lines at ∼12200 Å (S/N = 2.3) and 14000 Å (S/N = 1.5) that are broadly consistent with the z ∼ 2 redshift PDF, though this low-redshift solution is disfavored with a Δχ 2 = 20.41.The fits to each line are described in the text and displayed in the top panels.The 14000 Å feature would be consistent with Lyα at z = 10.53,though a Lyα detection at this redshift is unlikely due to the relatively short integration times.While there is no conclusive indication that this candidate is at a lower redshift, its true nature is not clear from this spectrum.

Figure 11 .
Figure 11.Stacks of image postage stamps and the photometric redshift probability distributions of all candidates in our sample.We separate the stamps by data set in order to combine images with the same pixel scales, showing BoRG[z8] in the top row and HIPPIES GO 12286 in the second row.These stamps are 5″ on a side, displayed on a zscale interval with no image smoothing.The black circle of r = 0 5 indicates the expected position of source flux.In the bottom row we show the stacked redshift PDF of all sources in our sample (left) and a stack of all V, I, and Y imaging for all candidates (including WISP) resampled onto a common pixel scale of 0 01 pixel −1 (right).In all cases, including the resampled stack, no source flux is visible in the optical and Y filters at the expected position, indicating that our sample is not dominated by lower-redshift contaminants.

Figure 12 .
Figure 12.An example of a low-redshift source from Par0259+0032 that is selected as a high-redshift candidate after the fluxes are dimmed as described in the text.In the large panel on the left, we plot the original and dimmed photometry as orange circles and red squares, respectively.The best-fitting EAZY templates are also displayed, with the corresponding χ 2 distributions in the top right panel and the redshift PDFs in the bottom right panel.We show the original (undimmed) HST imaging in 5″ stamps along the top left.Dimming the fluxes in all bands preserves the source's colors, but results in the source dropping out of the bluer bands, and the new photometry is better fit at z ∼ 9. Through this analysis, we expect low contamination rates (<1%-12%) from faint, lower-redshift galaxies in our sample.

Figure 14 .
Figure14.The completeness and effective volume calculated for our selection criteria.We plot the effective volume as a function of H 160 magnitude in the left panel, showing each data set separately with the total volume in black.For all data sets, the lighter shaded region shows the effective volume calculated directly from the completeness simulations, while the darker region includes the correction for reduced completeness due to sources removed through visual inspection.As expected given the distribution of rejected fractions in the left panel of Figure13, the effect of these corrections increases toward fainter magnitudes.The thick black line indicates the total, corrected volume for all three combined data sets.We show the results of our completeness simulations as a function of redshift and H 160 magnitude in the right panel.The white circles indicate the redshifts and magnitudes of our 13 candidates.The completeness is highest for bright sources and falls off toward fainter magnitudes.It also peaks at z > 10, when the Lyα break redshifts out of the J band and sources are H 160 -only single detections.At these redshifts, nondetections in all three blue filters result in larger portions of the redshift PDF at z > 8, and the recovery fraction in the WISP fields increases significantly and boosts the overall completeness.

Figure 15 .
Figure15.The UV luminosity function (blue shaded region) in the range 8.5  z  11 as measured in the BoRG[z8], HIPPIES GO 12286, and WISP data sets using the pseudobinning technique developed byFinkelstein et al. (2022).The inner (outer) blue shaded region is the Poisson uncertainty determined by the 68% (95%) range of the number density posterior, extending from the brightest bin that contains a candidate to the magnitude at which our average completeness drops below 20%.The solid blue line shows the upper 68% range at brighter and fainter magnitudes, and the blue vertical error bars indicate the uncertainty due to cosmic variance based on the calculator fromBhowmick et al. (2020).We show the absolute magnitudes of each of our 13 candidates at the top as blue circles, with error bars representing their M UV posterior distribution 68% ranges.We compare our results with those fromFinkelstein et al. (2022;  purple shaded region), as well as other studies using HST parallel fields (B16;Calvi et al. 2016;Livermore et al. 2018;Morishita et al. 2018;Rojas-Ruiz et al. 2020), ground-based imaging in the COSMOS/UltraVISTA field(Stefanon et al. 2019;Bowler et al. 2020), and HST imaging in the extragalactic legacy fields(Oesch et al. 2018;Bouwens et al. 2021).For reference, we also show the Schechter function fits fromOesch et al. (2018) andBouwens et al. (2021), the Schechter function predictions fromFinkelstein (2016), and the double power-law luminosity functions fromBowler et al. (2020) andLeethochawalit et al. (2023).We find a high density of bright galaxies, in good agreement with other HST parallel studies.Though our measurements cover a broader redshift range, they are consistent with the parallel studies at both z ∼ 9 and z ∼ 10.Our luminosity function is consistent for M UV  −21.5 with that ofFinkelstein et al. (2022), which was calculated using the same method but in a handful of fields.The uncorrelated nature of the pure parallel observations we include in our analysis significantly reduces the effects of cosmic variance, allowing for a more complete sampling of the population of bright galaxies at z ∼ 9-10.

Figure 16 .
Figure16.The H 160 postage stamps of the 97 spurious sources rejected in the nonsubjective categories during our source vetting (Sections 3.4 and 3.3).Group (a) shows 5″ × 5″ stamps of the 48 sources that were rejected as satellite trail remnants or diffraction spikes through visual inspection.Group (b) shows 5″ × 5″ stamps of the 12 sources that were rejected along image edges or near image edges in regions of increased noise.Group (c) shows stamps of the 25 sources rejected as either bad pixels or sources with photometry contaminated by bad pixels through visual inspection of the rms maps at the source positions.We adopt a threshold of rms > 0.1, equivalent to weights of <100.The bad pixels are identified by red contours.Each stamp is 2″ on a side, zooming in on the source position to highlight the source shapes.Group (d) shows 5″ × 5″ stamps of the 12 sources that were rejected as image persistence.10 were identified based on analysis of counts in FLTs observed in the 24 hr preceding the observation of the FLTs that contributed to the mosaic.The remaining two (surrounded by red boxes) were identified in the WISP survey based on the source position relative to a bright source.Each stamp is displayed on a zscale interval (calculated using the pixels in the 5″ stamp) with no image smoothing or pixel interpolation.The r = 0 75 circles show the position of the rejected candidate as a reference.

Figure 17 .
Figure17.The H 160 postage stamps (2″ × 2″) of the 81 sources that were rejected as having unreliable source shapes or morphologies.We note that this rejection is the only subjective part of the visual inspection and rejection process, and we quantify the biases introduced through these rejections using Figure13.The stamps are displayed in the same manner as those in Figure16.The two candidates from B16 that we rejected during visual inspection are identified by red boxes: borg_0240-1857_25 in the top row (third from the right) and borg_0456-2203_1091 in the second row (fifth from the left).Finally, the four rejected candidates with H 160 S/N > 12 discussed in Section 6.2.1 are identified by red dashed boxes.

Figure 18 .
Figure18.Median-combined HST and Spitzer/IRAC PSFs.Top three rows: the HST PSF stamps for each data set are 10″ × 10″, displayed on a zscale interval and linear stretch with no image smoothing or pixel interpolation.We also show the central 3″ × 3″ of the H 160 PSFs in the inset panels (with a min-max interval and a log stretch) to zoom in on the core of the PSF.In each panel, we list the measured FWHM of the PSF profile and the number of stars that were combined in the stack.We display the V 606 PSF rather than the I 600 PSF for the BoRG[z8] data set, as ∼2/3 of the BoRG[z8] fields are covered by V 606 .The I 600 FHWM is included in Table7.Bottom two rows: the IRAC 3.6 μm PSFs (left two columns) and 4.5 μm PSFs (right two columns).The separate PSFs for Par2346-0021 are labeled "Par2346."The top row displays the full stamps (30″ × 30″) on a zscale interval and linear stretch with the number of individual point sources that were combined in the stack.The bottom row zooms in on the central 15″ and uses a min-max interval with a square root stretch to highlight the PSF core.

Table 1
Data Set Summary Note.The BoRG[z8] data set includes 34 fields from the BoRG survey (PI: Trenti), eight HIPPIES fields (PI: Yan, GO 11702), and 22 parallel fields observed as part of the COS GTO program (PI: Green).
aThe average 5σ H 160 limiting magnitude and standard deviation for the fields included in each data set.

Table 2
Parallel Field Coverages and Depths The V 606 and I 600 profiles in the top row have dashed outlines to indicate that the BoRG[z8] Notes.Five representative fields from each data set are presented here.aThe 5σ limiting magnitudes are measured for each filter in circular apertures with r = 0 2. b The area in units of arcmin 2 is measured as H 160 pixels with weights of >100 (rms < 0.1).cFor a few of the BoRG[z8] and HIPPIES fields, two overlapping HST pointings have been combined to form a single field, resulting in varied filter coverage and depths.In these cases, we have separated the field into two components, a and b, and consider them separately in our analysis.See the text for details.(This table is available in its entirety in machine-readable form.) Figure 2. The filter coverage of each data set, showing that of BoRG[z8] (top panel), HIPPIES GO 12286 (middle), and WISP (bottom).

Table 4
Sample of z ∼ 9-10 Candidates [O II], Hβ, and [O III].There are a number of other JWST programs that will be exploring bright galaxies during the epoch of reionization, including GO 1740 (PI: Harikane), Note.The candidates are ordered by filter set, rather than in redshift order as in Table

Table 7
HST and Spitzer/IRAC Empirical PSFs Note.Our measured FWHMs of the empirical, pixelated HST PSFs are larger than the prepixelated values reported in the WFC3 Instrument Handbook (hstdocs.stsci.edu/wfc3ihb).On the other hand, our empirical IRAC PSFs are smaller than the mean FWHMs in the IRAC Instrument Handbook (https:// doi.org/10.26131/irsa48),which are calculated for the 1 2 pixel scale using a star observed at 25 locations on the array.