Characterizing the Near-infrared Spectra of Flares from TRAPPIST-1 during JWST Transit Spectroscopy Observations

We present the first analysis of JWST near-infrared spectroscopy of stellar flares from TRAPPIST-1 during transits of rocky exoplanets. Four flares were observed from 0.6–2.8 μm with the Near Infrared Imager and Slitless Spectrograph and 0.6–3.5 μm with the Near Infrared Spectrograph during transits of TRAPPIST-1b, f, and g. We discover Pα and Brβ line emission and characterize flare continuum at wavelengths from 1–3.5 μm for the first time. Observed lines include Hα, Pα–Pϵ, Brβ, He i λ0.7062 μm, two Ca ii infrared triplet (IRT) lines, and the He i IRT. We observe a reversed Paschen decrement from Pα–Pγ alongside changes in the light-curve shapes of these lines. The continuum of all four flares is well described by blackbody emission with an effective temperature below 5300 K, lower than the temperatures typically observed at optical wavelengths. The 0.6–1 μm spectra were convolved with the Transiting Exoplanet Survey Satellite (TESS) response, enabling us to measure the flare rate of TRAPPIST-1 in the TESS bandpass. We find flares of 1030 erg, large enough to impact transit spectra occur at a rate of 3.6−1.3+2.1 flare day−1, ∼10× higher than previous predictions from K2. We measure the amount of flare contamination at 2 μm for the TRAPPIST-1b and f transits to be 500 ± 450 and 2100 ± 400 ppm, respectively. We find up to 80% of flare contamination can be removed, with mitigation most effective from 1.0–2.4 μm. These results suggest transits affected by flares may still be useful for atmospheric characterization efforts.


INTRODUCTION
The majority of rocky planets suitable for atmospheric characterization with JWST orbit M-dwarfs (Kempton et al. 2018).M-dwarfs make up 75% of stars in the solar neighborhood (Henry et al. 2006) and have high occurrence rates of rocky planets (Dressing & Charbonneau 2015;Hardegree-Ullman et al. 2019).Many terrestrial planets orbit within the so-called "habitable zone" where water oceans can exist in the presence of an atmosphere (Kopparapu 2013), while others orbit interior to the HZ as potential Venus-analogs (Kane et al. 2014).Many of these rocky planets belong to multi-planet systems (Sandford et al. 2019;Zhu 2020), providing an ideal opportunity to investigate the effect of decreasing stellar radiation on the atmospheric compositions of these worlds (Barclay et al. 2023).A number of rocky planets in nearby multi-planet systems are high-priority targets for JWST, including three transiting planets around L 98-59 (Kostov et al. 2019), two transiting planets around LTT 1445A (Winters et al. 2022), and seven transiting planets in the TRAPPIST-1 system (Gillon et al. 2017).
High levels of stellar activity are observed from M-dwarfs in the form of starspots, faculae, plages, and flares (e.g.Delfosse et al. 1998;Berdyugina 2005;Astudillo-Defru et al. 2017;Medina et al. 2020), complicating the characterization of exoplanet atmospheres (Rackham et al. 2023).Transmission spectroscopy assumes that an out-of-transit, diskintegrated spectrum of the host star can be subtracted to isolate the spectrum of the planet (McCullough et al. 2014;Rackham et al. 2017Rackham et al. , 2019)).Stellar activity that varies in time and both occulted and unocculted surface inhomogeneities therefore contaminate the transmission spectrum, altering the transit depth as a function of wavelength (Ballerini et al. 2012).Contamination from flares in the near-infrared (NIR) has received less attention than slowlyvarying noise sources such as spots (Rackham et al. 2023 and references therein).While flares peak at near-ultraviolet (NUV) and blue optical wavelengths (Kowalski et al. 2013), they emit across the entire electromagnetic spectrum (MacGregor et al. 2021) and must be considered as a key source of unpredictable stellar contamination.
Flares occur when magnetic reconnection events in the corona accelerate electrons along field lines toward the photosphere, heating the plasma.Particles of different energies brake at different depths in the chromosphere to the upper photosphere, producing emission at different wavelengths from the FUV to the NIR (Fuhrmeister et al. 2008;Kowalski et al. 2013;Klein & Dalla 2017).Flare spectra are well-described by the superposition of line emission onto a blackbody-like spectrum that evolves in time.Flare light curves generally occur in two phases, with a rapid period of prompt emission characterized by an effective temperature of 9000-16,000 K, followed by a more gradual phase with a lower effective temperature.A blackbody spectrum with an effective temperature of 9000 K would place the NIR in the Rayleigh-Jeans tail (Fuhrmeister et al. 2008) and produce a flux enhancement of 700 ppm at 2 µm for even a small flare of 10 30 erg.On the Sun, such flares occur on a monthly basis near solar maximum (Youngblood et al. 2017).Multiple flares of 10 30 erg are emitted from active M-dwarfs on a daily basis (Lacy et al. 1976).Although the NIR enhancement is low compared to the total energy of the flare, such signals are substantial compared to the ∼20-200 ppm signatures of atmospheric features in transmission spectra of rocky exoplanets (e.g.Moran et al. 2023;Lustig-Yaeger et al. 2023; Barclay et al. 2023) and far larger than pre-launch estimates of the JWST noise floor near 10 ppm (Schlawin et al. 2020(Schlawin et al. , 2021;;Feinstein et al. 2022b).Furthermore, NIR spectra provide some of the strongest available constraints on cooler sources of red optical emission during the long decay phase of flares (Kowalski et al. 2016).
Continuum emission is the dominant source of flare contamination at wavelengths from 1-3.5 µm, assuming a 9000 K blackbody (Fuhrmeister et al. 2008).Nevertheless, flare continuum has not yet been observed at all beyond 1.7 µm, and has not been characterized in spectra at wavelengths longer than 1 µm.Near-infrared spectroscopy from 1.1-1.7µmwas obtained for two small flares from TRAPPIST-1 with the HST Wide Field Camera 3 (WFC3) by Zhang et al. (2018).However, Zhang et al. (2018) note the spectrum of each flare cannot be meaningfully constrained, as the variation in the spectrum is mostly at the 1-2σ level.Characterization of the continuum does exist at NIR wavelengths below 1 µm for several flares.Continuum was detected in a large flare from Wolf 359 out to 1.01 µm with the red arm of the Ultraviolet-Visual Echelle Spectrograph (UVES) on the ESO Kueyen telescope (Fuhrmeister et al. 2008).Flare continuum has also been detected out to 0.76 µm in an event from the M9.5 dwarf 2MASS J0149090+295613 with the Low Resolution Imaging Spectrograph on Keck II (Liebert et al. 1999) and out to 0.94 µm in a large flare from the M7 dwarf 2MASS J1028404-143843 (Schmidt et al. 2007).The lack of spectral coverage or sensitivity at longer wavelengths during each of these flares makes it difficult to characterize contamination at the wavelengths used for JWST transit spectroscopy.
The lower order hydrogen I Paschen and Brackett lines, 1.0833 µm helium I infrared triplet (IRT), and several metal lines such as the Ca II IRT (0.85 to 0.87 µm) each contribute to flare contamination at NIR wavelengths.The cumulative effect of the forest of weak lines near the Paschen jump (0.82 µm) and the Brackett jump (1.46 µm) may also induce contamination in transit spectra.While Pα has not been detected yet during an M-dwarf flare, the higher-order Paschen lines have been detected in several flares.High-resolution flare spectra obtained in the NIR with CARMENES contain 57 and 30 detections of Pβ and Pγ, respectively (Fuhrmeister et al. 2023).However, spectra of each star are typically obtained several days apart, making it difficult to know if the flares are observed during the peak or decay phase.Pβ has been detected with spectral time series in three flares (Schmidt et al. 2012) using the 0.95-2.45µm TripleSpec near-infrared spectrograph on the 3.5 m ARC telescope at APO, and Pγ and the He I IRT have been detected in five flares with TripleSpec and the 0.810-1.280µm Habitable-zone Planet Finder high-resolution spectrograph on the 10 m Hobby-Eberly Telescope (Schmidt et al. 2012;Kanodia et al. 2022).Evidence for He I IRT emission in flares was also obtained in CARMENES spectra (Fuhrmeister et al. 2020).A wealth of detections for Pδ (1.005 µm) and higher order Paschen lines during flares exists in the literature below 1.1 µm (e.g.Liebert et al. 1999;Fuhrmeister et al. 2008;Schmidt et al. 2012;Kanodia et al. 2022).A single detection of Brackett emission (Brγ) was reported by Schmidt et al. (2012).However, the lack of observations of the lowest energy transitions limits our understanding of hydrogen line emission in the NIR more broadly since these lines are likely to dominate the energetics of their respective series.
Flares observed by Spitzer to partially overlap transits of TRAPPIST-1 planets in Gillon et al. (2017) have raised concerns about flare contamination (Davenport 2017).However, the exact flare rate of the star is uncertain (Wilson et al. 2021).K2 observed TRAPPIST-1 for 70.6 days and observed 39 flares with energies from 2.0×10 29 to 2.3×10 32 erg (Vida et al. 2017;Paudel et al. 2018).The flare frequency distribution (FFD) derived from the K2 data predicts a flare of 10 30 erg occurs at a rate of 0.3 flares d −1 when considering only flares above the 100% completeness limit as determined by injection testing (Paudel et al. 2018).The average flare rate of TRAPPIST-1 analog stars was measured in TESS data and corrected for missing flares using an incompleteness curve and uncertainties measured from injection testing (Seli et al. 2021).The resulting average flare rate of the TRAPPIST-1 analogs corrected for incompleteness predicts ∼1-5 flares d −1 of 10 30 erg.The NEAT programs will obtain a total of 75 hr of JWST observations across the entire sample, including 14.3 hr in transit.Many of these observations should occur during flares, with 1-15 flares of 10 30 erg expected based on the flare rate.These flares provide a serendipitous opportunity to characterize both the continuum and line emission of flares from 0.6-3.5 µm at unprecedented precision.In turn, the flare observations can enable empirical insights to be developed for the mitigation of flare contamination, increasing both the yield of usable transits and sensitivity to spectral features from the planet atmospheres in spectra impacted by flares.
In Section 2, we describe the JWST transit observations.In Section 3, we describe the spectral reduction methods for each instrument and the extraction of calibrated spectra and light curves.In Section 4, we describe the methods used for the identification and characterization of flare lines.In Section 5, we describe the methods used to measure the continuum properties.In Section 6, we describe how the amplitude, energy, and shape of the flare light curves are quantified.In Section 7, we present the detection of flare lines and continuum emission and discuss the properties of the flares both individually and as a sample.In Section 8, we model the luminosity emitted by the flare lines using radiative hydrodynamic flare models.In Section 9, we mitigate flare contamination in the TRAPPIST-1b and f transits.In Section 10, we discuss the implications of our findings, and conclude.

OBSERVATIONS OF TRAPPIST-1 WITH JWST
We describe the transit spectroscopy observations from the NEAT program used in this work.

The NEAT program and Target Selection
The NEAT program is designed to measure molecular abundances in planetary atmospheres as functions of the planet mass and radiation environment in order to explore the processes sculpting the formation of close-in planets.Most targets are observed with the Single Object Slitless Spectroscopy (SOSS; Albert et al. 2023) mode of the Near Infrared Imager and Slitless Spectrograph (NIRISS; Doyon et al. 2023).The remaining targets are observed with the Near Infrared Spectrograph (NIRSpec) using the Bright Object Time Series (BOTS) mode.The sample of 14 systems is primarily defined by the JWST Cycle 1 and 2 Guaranteed Time Observation Programs GTO 1201 and GTO 2759 (PI: Lafrenière).These two programs include a number of rocky or likely-rocky planets orbiting M-dwarf stars, including TRAPPIST-1 (Gillon et al. 2017) 2 and 3.The x-axis time scales of the top and bottom panels are fixed within each panel to better observe changes in activity between observations.The final transit of TRAPPIST-1g is 44.4 d after the previous observation, so a break in the x-axis (but still at the same time scale) is used to include this data in the figure.NIRSpec observations are binned to the same cadence as NIRISS and have a lower photometric uncertainty.

NIRISS and NIRSpec Observations of TRAPPIST-1
In this work, we focus on a subset of the TRAPPIST-1 observations obtained with NIRISS SOSS and NIRSpec BOTS as part of GTO 1201 and GO 2589.Our sample consists of NIRISS observations of two transits of TRAPPIST-1b, one transit of TRAPPIST-1c, and one transit of TRAPPIST-1f, and NIRSpec observations of two transits of TRAPPIST-1g.The TRAPPIST-1 observations are ongoing, with visits of TRAPPIST-1d and h not yet available at the time of this work.A total of 16.87 hr of monitoring time on target was obtained with NIRISS across all observations, consisting of 586 integrations taken at 105 s cadence (18 groups per integration).Similarly, a total of 9.91 hr of monitoring time on target was obtained with NIRSpec, consisting of 22,234 integrations taken at 1.6 s cadence with NIRSpec.Analysis of the NIRISS transit spectrum of TRAPPIST-1b is presented in Lim et al. (2023).Analysis of the NIRSpec transit spectrum of TRAPPIST-1g is presented in Benneke et al. (2023, submitted).
During the NIRISS observations, two large enhancements in the Hα flux from the star indicative of flaring are observed during the second transit of TRAPPIST-1b and the transit of TRAPPIST-1f.A period of Hα enhancement composed of two peaks is likewise detected just prior to the second transit of TRAPPIST-1g.We call the enhancements during the TRAPPIST-1g observations Flare 1 (F1) and Flare 2 (F2) throughout this work.The enhancement during the TRAPPIST-1f visit is called Flare 3 (F3) and the enhancement during the second visit of TRAPPIST-1b is called Flare 4 (F4).The event numbering is by decreasing flare energy in the TESS bandpass as described in §6.2 and given in Table 3.A summary of these observations is presented in Table 1 and an overview of the Hα time series are shown in Figure 1.
NIRISS SOSS is designed to perform transit spectroscopy of exoplanets orbiting stars of 7<J<15 (Doyon et al. 2023).The instrument acquires spectral time series observations at NIR wavelengths from 0.6-2.8µm at medium resolution (R≈700).NIRISS is a cross-dispersed spectrograph with three orders, although sufficient signal to noise (S/N) for science observations of most targets is only obtained for orders 1 and 2. The spectra in each order are defocused in the cross-dispersion direction to minimize pointing and flat-field systematics, leading to blending of the orders at several points along the trace1 .Our NIRSpec BOTS observations are taken without a filter in the PRISM/CLEAR mode2 (Böker et al. 2023).In our setup, NIRSpec BOTS obtains spectral time series of bright targets at NIR wavelengths from 0.6-5.3µm at low resolution (R≈100) and high throughput (Birkmann et al. 2022).The BOTS mode of NIRSpec passes the spectra through the 1.6 in 2 S1600A1 aperture for maximum spectroscopic precision for planets around bright stars.The target is kept centered on the same pixels on the detector for increased stability.

NIRISS data reduction with supreme-SPOON
We use the supreme-Steps to Process sOss ObservatioNs (supreme-SPOON) pipeline for our primary spectral reduction.supreme-SPOON begins with the raw, uncalibrated NIRISS SOSS FITS files and extracts one-dimensional spectra with differential calibration, correcting systematics on a group-by-group basis within each integration (Feinstein et al. 2023;Radica et al. 2023;Coulombe et al. 2023).Correction of detector effects is largely consistent between supreme-SPOON and stage 1 of the jwst pipeline.Superbias correction is applied to each raw image within each group and integration, and saturation flags are applied to pixels with values below the noise floor or above the saturation limit.Following Feinstein et al. (2023) and Radica et al. (2023), reference pixel correction is performed to further mitigate bias variations between groups and adjacent rows within individual frames.Jump detection is carried out after the other steps to avoid inducing false positives.Finally, the ramps are fit to produce rateints files.We do not correct for dark current due to the negligible dark current rate of the NIRISS observations Feinstein et al. (2023).
The Space Telescope Science Institute (STScI) JDox SOSS SUBSTRIP256 background model is used to correct for the zodiacal background (Rigby et al. 2023).A region in the images without significant contamination from any of the orders is identified, and a scaling factor is defined from the median flux of the image region and the median flux of the comparison region of the background model and applied to the background model.The scaled background is then subtracted from each image.supreme-SPOON corrects for 1/f noise separately within each group prior to combining the data from all the groups in a given integration.The 1/f noise is a form of readout noise that changes with time due to the voltage amplification process.The presence of 1/f noise is mitigated using a difference image composed of the target frame and a median stack of all out-of-transit images in the same group and integration scaled to the flux in the target frame using the transit light curve.The median of each column in the difference image is then subtracted from the original column in the corresponding integration within the group.The y-positions of the trace are identified using the edgetrigger algorithm, which takes the derivative of the y-axis values and identifies the midpoint between the maximum and minimum values (Radica et al. 2022).Lastly, the spectra are extracted as a differential flux-calibrated product for science analysis using a simple box aperture of 25 pixels since the transmission self-contamination level for TRAPPIST-1f is expected to be <20 ppm (with a similar value for TRAPPIST-1b), much smaller than the final uncertainties in our analysis (Darveau-Bernier et al. 2022).
Photons of the same wavelength do not follow entirely vertical contours on the NIRISS detector.Rather, the wavelength maps follow contours that smoothly vary across multiple columns in patterns approximating cubic functions3 .The curve of the trace intersects different parts of the wavelength maps at different y-pixel positions, inducing systematic offsets in the wavelength solution.We therefore cross-correlate the spectrum of TRAPPIST-1 with a PHOENIX model of the star from Wilson et al. (2021) to determine these systematic offsets and correct the wavelengths.

NIRISS data reduction with transitspectroscopy
We use the transitspectroscopy pipeline (Espinoza 2022) as a secondary spectral reduction to validate our primary reduction.transitspectroscopy (ts) is a spectral extraction and transit fitting package for NIRISS SOSS that starts from the rateints files produced by stage 1 of the jwst pipeline instead of the raw data (Coulombe et al. 2023).First, ts.trace spectrum locates the trace for each order through cross-correlation of the image along the dispersion axis against a two-component Gaussian function with parameters µ G,1 =-4.5, σ G,1 =3, µ G,2 =4.5, σ G,2 =3.While all pixels in the columns are used when locating the order one trace, a y-tolerance of 20 pixels is used for order two.The order one trace is computed for columns x ∈[4, 2043], and the order two trace is computed for x ∈[700, 1625] to avoid order blending.Flux from the first order within the pixel radius is masked out before computing the second order trace.Each trace is then smoothed with cubic splines.We follow Feinstein et al. (2023) in reporting the number of spline knots in order one n 1 and two n 2 and the partitions of the x-axis x 1 and x 2 of the spline fits: [6,, [1200,, [1500,, [1700,2041]], and x 2 = [[701, 850-5], [850,, [1100,1624]].
A model provided by STScI in the JDox User Documentation is used to subtract the zodiacal background.A region in the images without significant contamination from any of the orders is identified and used to construct a scaling between the background model and the data.For the TRAPPIST-1b observations, this region is x ∈[500, 800], y ∈ [190,246].For the TRAPPIST-1f observations a larger background region is available, x ∈[10, 750], y ∈ [190,246].The ratio between the actual background and the model is measured for each pixel in the comparison region to generate a distribution of ratios.The scale factor for the TRAPPIST-1b observations is f S =0.68 and is f S =0.0064 for the TRAPPIST-1f observations with the larger background region.The bottom and top quartiles are clipped and the median of the remaining values is adopted as the scale parameter.For the TRAPPIST-1b observations, quartile three (0.5-0.75) is also clipped since most flux is below this range.
The pipeline computes a median stack of all background-subtracted images that were not obtained during the transit.The median reference frame is scaled by the normalized transit light curve for images to correct for the reduced flux.transitspectroscopy corrects 1/f noise after the groups in each integration have been combined since it does not work with the raw uncalibrated files.The dominant noise source left after background removal is 1/f banding.We select two regions close to the trace consisting of 15 pixels in the y-axis direction starting 20 pixels away from the top and bottom of the trace.The median of this region is computed on a per-column basis separately for the first and second orders and subtracted from the image.Cross-order contamination from this process has been demonstrated to not be significant (Feinstein et al. 2023).
Spectra are extracted using ts.spectroscopy.getSimpleSpectrum for an aperture of 15 pixels for the first order and 30 pixels for the second order.Each spectrum is median-normalized, enabling a direct comparison of the variability observed for the flux in each column.The median and standard deviation of the resulting distribution of the normalized spectra are used to define a reference spectrum.The reference spectrum is needed to identify bad columns, such as those resulting from cosmic ray hits.The reference spectrum is scaled to the individual integrations and subtracted to compute residuals.Any 5σ outlier is corrected based on the values adjacent to it.
We find the wavelength solution provided with transitspectroscopy is good to within a few nm.However, subpixel accuracy is desired for identifying emission lines, so we construct a correction factor for the wavelength solution.We perform a cross-correlation between the spectrum and the Wilson et al. (2021) PHOENIX model of TRAPPIST-1.We divide the spectrum into sections of ∼0.2 µm and compute the wavelength offset in each section, observing they follow the trace.We fit a function that varies with wavelength in the same way the trace does, but with the y-axis scaled to the wavelength offsets instead of the detector position.We then verify the new solution reproduces the positions of well-known emission lines in the spectrum.However, we find the supreme-SPOON reduction is more optimized for flare photometry as the point-to-point variability during the peak phase is reduced by >20%.This effect is particularly noticeable in the second peak in the Hα light curves and the Pβ peak.

NIRSpec BOTS data reduction
NIRSpec prism data are reduced with a customized version of Stage 3 of the open-source Eureka!pipeline for the extraction and analysis of time series JWST observations (Bell et al. (2022), Benneke et al. 2023, submitted).The reduction starts from the "uncal" outputs from Stage 0 of the JWST pipeline (version 1.6.0).We perform data calibration in Stages 1 and 2 following the steps described for the analysis of the WASP-39 b NIRSpec PRISM observations with Eureka! presented in Rustamkulov et al. (2023), except for how pixel saturation flags are set.We perform group scaling, and linearity correction using the standard JWST pipeline steps.Our customized routine first updates the saturation flags: rather than flagging a user-provided range of columns as saturated, we use the "percentile" method with a 50th percentile threshold: for each group, we flag as saturated entire columns which contain at least one pixel marked as saturated in the median frame of all integrations at this group.We then perform row odd-even by amplifier (ROEBA) correction of 1/f noise.Within each group, the median of the background pixels in the odd rows is subtracted from all odd rows, and the median of the even rows is subtracted from all even rows.Group-level background subtraction follows: for each column in each group, the median of the background pixels is subtracted from the entire column.Finally, ramps are fitted while ignoring 3σ outliers.The flux calibration step is skipped since we flux calibrate directly from the PHOENIX model of TRAPPIST-1 as described below in §3.4.Stage 2 of the pipeline outputs calibrated the fitted slopes for each pixel and each integration ramp in the time series.
We perform custom spectral extraction with the Horne (1986) optimal extraction routine in Stage 3 of Eureka!.Only columns 14 to 495 along the dispersion axis are included in the extraction due to the low throughput of NIRSpec in the excluded columns at the detector edges.The trace is located from the peak of a Gaussian fit to each column in a summed image across the integration.Similarly to Stage 1, we perform column-by-column median subtraction after masking all pixels with data quality flags and using only background pixels that are 8 or more pixels away from the central trace.Optimal median-weighted 1D extraction is then computed for the half-width aperture of 4 pixels in the cross-dispersion direction on either side of the trace to obtain the time series of spectra.The large (40-200 Å) wavelength bins do not require wavelength corrections.We only use the one NIRSpec reduction pipeline since we do not study NIR lines with the NIRSpec data and the shape of the continuum is robust across all flares.We flux calibrate the extracted spectra of TRAPPIST-1 from each instrument and pipeline against a PHOENIX spectrum of TRAPPIST-1 whose reliability is verified with NIR photometry bands (Wilson et al. 2021).The PHOENIX model is computed for the non-flaring stellar spectrum by linearly interpolating four PHOENIX model spectra computed at 2600 and 2700 K and logg values of 5.0 and 5.5 to the values of 2628 K and log g=5.21 measured by Gonzales et al. (2019).Different values of the temperature and log g result in different values in the absolute flux calibration, although uncertainties in both parameters are reported at the ∼1% level in Gonzales et al. (2019).Other spectral energy distributions in the literature such as that of Pineda et al. (2021) differ at the ∼10% level based on the assumed stellar parameters.

Flux calibration
For each set of observations, an averaged spectrum of all integrations that are both out of transit and out of flare is constructed for the color correction.The averaged spectrum and the PHOENIX model spectrum are both heavily binned to remove spectral features.For NIRISS, 200 wavelength points per bin are used; for NIRSpec, 90 points per bin are used.A smoothly-varying response curve results from dividing the resulting binned model by the data.The choice of bin size on systematic uncertainty of the flux calibrated spectrum is measured by varying the bin size and creating an alternate flux calibration.For NIRISS order one, the bin size is increased by 50% in the alternate calibration, while the bin size is increased by 25% for order two.For NIRSpec, the bin size is varied by 10%.These values were selected because varying the bins within the specified ranges kept the bin scale large enough to blur out spectral features without altering the overall shape of the spectrum.Using these values, the mean systematic uncertainty of the order one NIRISS calibration is 7.8%, with a 1σ upper limit of 15%, while the mean systematic uncertainty of the order two NIRISS calibration is 8.1%, with a 1σ upper limit of 22%.The mean systematic uncertainty of the NIRSpec calibration in the wavelength range 0.6-3.5 µm is 3.6%, with a 1σ upper limit of 12%.We use both the primary and secondary flux calibration to create two versions of the same flare spectra and compare the final values of the flare properties to ensure the impacts of this systematic uncertainty are not significant.

Light curve extraction
We create light curves for the flux within specific wavelength regions by iterating through the integrations, summing the counts for all wavelength points within the line.These light curves are useful for exploring the time evolution of line emission (e.g. the Balmer and Paschen lines) as described in §4.2 and 4.3.We create light curves in each continuum bandpass by convolving the flux-calibrated spectra with the band response function and then summing the counts.We choose the TESS, J, H, and Ks bandpasses since they cover the range of wavelengths where most flare flux is observed (0.6-2.2 µm) and are widely recognized.We also note these bands allow us to connect our flares to the extensive flare literature from the TESS mission (e.g.Günther et al. 2020;Feinstein et al. 2022a;Howard & MacGregor 2022) and flare observations from 2MASS (Davenport et al. 2012).Photometric uncertainties are propagated from the individual flux uncertainties at each wavelength.Systematic uncertainties due to flux calibration are obtained by varying the calibration function as described above and re-computing the light curves.The light curves of the lines are not impacted by systematic color offsets, and the systematic color offsets of the continuum bands are small.Systematic uncertainties for the F3 and F4 flare light curves are at the 5% level for the TESS band and 0.1% level for the J, H, and Ks bands.
Transits are removed from the light curves prior to flare analysis using the batman package (Kreidberg 2015) as shown in Figure 3.The transit model is specified by fixed values of the orbital period, semi-major axis, inclination, and planetary radius taken from Gillon et al. (2017).The time of inferior conjunction is measured from the transit light curve since transit timing variations can change this value slightly between orbits.For TRAPPIST-1f where the flare covers the entire transit, this value is initially estimated as the center time of the transit and then verified after an exponential function is subtracted from the broadband transit light curve as described below.An eccentricity of zero and longitude of periastron of 90 degrees are assumed for each planet.
Quadratic limb darkening of the transit light curve is computed with batman as described in Equation 9of Kreidberg (2015).We reproduce the limb-darkened intensity in our Eq.1: Here, I 0 is the 1D intensity profile prior to limb-darkening, µ = √ 1 − x 2 with the normalized radial coordinate x given in units of stellar radius, and c 1 and c 2 are the quadratic limb darkening coefficients.Quadratic limb darkening coefficients are c 1 =0.34 and c 2 =0.26 for TRAPPIST-1b and c 1 =0.17 and c 2 =0.26 for TRAPPIST-1f.The TRAPPIST-1b coefficients are computed by fitting the broadband light curves of the transits, excluding in-flare integrations and setting the limb darkening coefficients as free parameters.The flare covers the entire transit of TRAPPIST-1f, so we first fit and remove an exponential representing the flare decay from the λ >1.1 µm broadband transit light curve using the Davenport et al. (2014) flare template.The Davenport et al. (2014) template is scaled to the data using a flux amplitude of 0.00148, width of 1.225 hr, and peak time during the 69 th integration.The coefficients were initially identified by manually varying the TRAPPIST-1b values for TRAPPIST-1f.The transit fit was then verified with a Monte Carlo analysis that minimized the residuals to the transit light curve.The errors on the fit are at the 10% level and induce systematic uncertainties of 0.01% in the resulting fluxes, an order of magnitude below the flux amplitudes of the flares as listed in Tables 2 and 3. We note Lim et al. (2023) find the best-fit batman transit parameters to the TRAPPIST-1b NIRISS data differ from the literature values.However, the 10 3 -10 4 ppm signals of the flares are much less affected by uncertainties in the exact shape of the transit than are planetary signals of 100 ppm.This can be seen in the O-C diagrams of Figure 3, where the shape of the flare shows a smooth decay after the transit is subtracted.The peak of each flare occurs during integrations outside of the transit or in the flat bottom, further minimizing the effect of uncertainties in the exact shape of the transit.
We compare the flare light curves between the transitspectroscopy and supreme-SPOON pipelines.Although neither reduction was initially designed for creating light curves of single lines, both pipelines produce qualitatively similar light curves for most lines.Example light curves are shown in Figure 2 for the Hα, Pα, and Pβ lines, since these span a range of wavelengths from 0.6565-1.8756µm.The photometric variability in the lines is slightly lower for the supreme-SPOON reduction, which aids comparison of the flare light curve shapes.We note the point-to-point variability during the rapid peak phase of the flares is sometimes significantly lower for the supreme-SPOON reduction.The point-to-point variability during the flare peak is defined as the standard deviation of the difference between the flux of successive integrations.Lower amounts of scatter for supreme-SPOON are especially noticeable during the peak of the Pβ flare light curve in the middle panel of Figure 2 and the secondary flux increase in the Hα flare light curve at ∼2.4 hr in the first panel of Figure 2. The point-to-point variability is typically 20-50% greater for our implementation of transitspectroscopy, but is 270% greater for the Pβ peak and 100% greater for the second Hα peak.We speculate the difference may result from the group versus integration level reduction choices of the pipelines.Based on this comparison, we choose supreme-SPOON as our primary reduction for the forthcoming analysis.The light curves are visually inspected to identify times during quiescence and times during flares.All flux calibrated spectra during non-flaring integrations are averaged together to create a quiescent reference spectrum.Likewise, spectra during integrations near the flare peak are averaged together to maximize the flare signal.A flare-only spectrum is created by subtracting the out-of-flare spectrum from the peak flare spectrum and propagating the errors of the component spectra.The peak spectrum of the TRAPPIST-1b observations occurs during the transit.In this case, the fractional depth of the transit light curve is used to correct the flux before subtracting the quiescent spectrum.

Identifying flare lines
Candidate lines in the NIRISS data are first identified in an initial visual inspection of the flare-only spectra.To aid identification of lines, the central wavelengths for Hα, the Paschen series, and Brackett series in vacuum are overlaid on the spectrum.Other NIR flare lines in the literature (e.g.Liebert et al. 1999;Fuhrmeister et al. 2008;Schmidt et al. 2012;Kanodia et al. 2022) are also overlaid.While we overlay the Paschen series from Pα at 1.8756 µm through the Paschen jump at 0.82 µm, we do not see any candidate lines exceeding the local noise beyond the Pϵ line at 0.9549 µm characterizing the 8-3 transition of hydrogen.We searched for Brackett emission up to Brγ, but only found a candidate line for Brβ at 2.6259 µm characterizing the 6-4 transition of hydrogen.Brα lies beyond the wavelength range of NIRISS at 4.0523 µm.We also searched for helium emission, including a helium I line at 0.7062 µm and the well-known helium infrared triplet (IRT) activity line at 1.0833 µm.We also searched for the Mg I and Fe I lines reported in Table 2 of Kanodia et al. (2022).

Validating flare lines
Candidate lines are vetted in a two-step process designed to remove false positives.First, the significance of the peak σ line is estimated compared to the surrounding continuum.The FWHM of each line is measured and a continuum region of 20 FWHM to either side of the line is defined.The mean µ G and standard deviation σ G of the continuum region are measured and the probability that the maximum flux of the candidate line would be produced from a Gaussian distribution of fluxes with the same µ G and σ G is computed.Twenty FWHM were selected to minimize the number of other potential peaks in the continuum region, to ensure consistency in the local noise properties, and to avoid arbitrary selection of continuum regions.
Second, the time variation in the line is compared with the surrounding continuum.Light curves for continuum regions adjacent to and on either side of the candidate line are created.These light curves are visually examined to ensure the light curve fluxes at the target wavelength are higher than in the surrounding continuum.If the light curves of the line and continuum are similar, the candidate is classified as a false positive.This step is necessary because small continuum enhancements are present during the flare at wavelengths where line emission might be expected.Next, the significance of the flare light curve in the line σ lcv is estimated compared to the out-of-flare regions in the light curve.The mean µ G and standard deviation σ G of the out-of-flare portions of the light curve are measured and the probability that the maximum flux of the light curve would be produced from a Gaussian distribution of fluxes with the same µ G and σ G is computed.We bin the light curves of flares that evolve more slowly than the instrument cadence prior to comparing the peak flux to the noise.The overall rise and decay of the F4 event occurs on ∼10-20 minute timescales.We therefore bin the light curves in each line into groups of ten, or 17.5 min per bin.Ca II IRTc is an exception, as the rise and decay of its light curve is captured with five points per bin, or 8.75 min per bin.We likewise bin the NIRSpec flares, F1 and F2, to 30 s cadence using 18 integrations per bin prior to computing the light curve significance.
For both the σ line and σ lcv estimates, Gaussian noise is assumed due to the small number of points in the continuum region and the out-of-flare portions of the time series during each 3-4 hour JWST observation, respectively.Random shuffling of points or bootstrapping are designed for scenarios in which higher-resolution spectra or longer time series are available.We consider a secure line detection to either exceed the 3σ significance level for both the σ line and σ lcv measurements or to exceed 5σ for one of these measurements and be clearly visible by eye in the light curve and spectrum.We allow for the second case because some lines such as the λ0.7062 µm He I line in the F3 event have lower values of σ line of 2-3σ but have a clearly visible flare light curve that is much stronger than the light curves for the continuum regions on either side of the line.We consider a formal but weak detection to reach 3σ for either of the σ line or σ lcv measurements, and a candidate line to be above 2σ but below 3σ for both the σ line and σ lcv measurements.Each candidate or confirmed line in the spectra of the F3 and F4 events is shown in Figure 4. Likewise, the light curves of each line are shown in Figure 5.
While the line widths are comparable to the wavelength bins of NIRISS, this is not true for NIRSpec.The NIRSpec wavelength bins vary from 42 Å at 0.6 µm to 201 Å at 1.55 µm, substantially diluting line emission.We isolate line emission from the continuum in the NIRSpec wavelength bins using reference light curves.The continuum remains essentially the same in the light curve of the flare at wavelengths immediately adjacent to the line, while the line emission is different.We therefore average the light curves in the continuum regions on either side of the line to estimate the contribution of the continuum in the target wavelength region.We normalize the light curve of the target wavelength region by the reference light curves and propagate the errors on the fluxes to produce a corrected light curve of the line.This process is illustrated for the Hα line in Fig. 6.Our method indicates 61% of the flux in both NIRSpec flares F1 and F2 is due to Hα emission and 39% is actually due to the underlying continuum.Light curves of the higher-order lines are much weaker than Hα, with 0.2% enhancements above the continuum at peak for Pα, Pδ and for the combined flux of the Pγ and the He I IRT lines.We note each of these excesses except for the Pγ and He I IRT emission are at the scale of the error bars and do not constitute detections.No evidence exists for other Paschen or Brackett lines in the NIRSpec data.

Identifying flare continuum
The lines identified in §4.2 are masked out of the flare-only spectra in order to observe any underlying continuum.The majority of the emission of a blackbody of T eff =5000-10,000 K is at wavelengths below 1 µm where the response curves of the instruments are lower.As a result, the flux calibrated flare-only spectrum is used to identify and characterize the continuum.We bin the NIRISS spectra by N =20 points per wavelength bin and visually inspect both the peak flare spectrum and the time evolution of the spectrum.We select 20 points per bin based on inspection of the bin size necessary to reduce scatter sufficiently to see the large scale variation in the spectrum.Binning is not  necessary for NIRSpec due to the lower wavelength resolution.
An initial visual inspection reveals the spectrum of each flare appears consistent with that of a blackbody once the lines are masked out, as shown in Figure 7.We fit the Planck function to the flares to determine if the NIR flare continuum is indeed consistent with the Rayleigh-Jeans tail of a blackbody.In addition to the apparent blackbody curve present from 0.6-3 µm, spectral features are also present on smaller scales.Increasingly deep spectral features and large error bars near 0.6 µm may represent systematics near the edge of the detector.Spectral features at longer wavelengths are unlikely to be systematics, however.Even though some features are present in both NIRISS and NIRSpec data, features are masked out separately to account for the different resolutions of the instruments.For NIRISS, we exclude regions from 0.723 to 0.761, 0.782 to 0.790, 1.370 to 1.395 µm as well as the region of the Paschen jump near the Paschen series limit from 0.820 to 0.867 µm.For NIRSpec, we exclude regions from 0.676 to 0.745, 1.351 to 1.410, and 1.453 to 1.527 µm.We note that the ∼1.5 µm feature is near the Brackett jump.The other features are in regions of high stellar flux and are likely astrophysical in origin.
We define new spectral flux errorbars to capture the empirical point-to-point variability as a function of wavelength, ϵ λ .The formal errorbars ϵ formal of the extracted spectra are often smaller than the scatter seen in the spectra.Our ϵ λ errorbars are agnostic to the nature of this small-scale variability and improve the fit to the overall shape of the spectrum.We define ϵ λ as the dispersion of the absolute value of the difference in flux between adjacent wavelength points in the spectrum.As shown in Fig. 7, this approach does a good job representing the variation in the spectrum at smaller scales than the overall shape of the spectrum to be fit by the Planck function.Finally, each spectrum was reduced separately for both calibrations and compared against the other to account for systematic errors from flux calibration.The alternate calibration spectra are nearly identical to the primary reduction spectra.

Fitting the Planck function to the flare continuum
The flare spectrum holds information on both the effective temperature T eff and filling factor of the flare X eff , defined as Here, R fl is the effective radius of the flare and R * is the stellar radius.The shape of the spectrum is determined by T eff , while the absolute luminosity of the flare is set by X eff .Following Hawley et al. (2003), we fit the spectrum for both variables simultaneously by scaling the flux received at the telescope f λ,f l,⊕ for the stellar distance d * and setting it equal to the monochromatic luminosity of the Planck function F λ,f l, * = 4π 2 R 2 fl B λ (T eff ).We use values of the stellar radius and distance of R * /R ⊙ =0.1192±0.0013(Agol et al. 2021) and d * =12.43±0.02pc (Chambers et al. 2016).The flux of the flare at each wavelength is given from the Planck function as B λ (T eff ).We fit the resulting expression for the flare-only flux, shown in Eq. 2: (2) We fit the continuum with the expression given in Eq. 2 using a bootstrap approach.Fits are performed at the native wavelength resolution of NIRSpec and at the binned resolution for the NIRISS data using 200 bootstrap trials.In each trial, the fluxes of each wavelength point are varied within the empirical ϵ λ errorbars and a 10-20% region of the spectrum is randomly dropped to avoid over-fitting to any specific features.The best-fit and 1σ range of the flux values for each wavelength bin are computed from the bootstrapped fits.Likewise, we record the best-fit and 1σ range of the T eff and X eff values from the bootstrap.We also fit Eq. 2 to a flare-only spectrum produced with the alternate flux calibration to assess the effects of systematic errors arising from flux calibration.During the peak of the flares, the mean systematic offsets in temperature are 860, 410, 100, and 20 K for the F1, F2, F3, and F4 events, respectively.The mean systematic offsets of the filling factors between flux calibrations are 0.05, 0.06, 0.01, and <0.01%for the F1, F2, F3, and F4 events, respectively.The systematic offsets are lower outside the peak times since most of the color correction occurs at the bluest wavelengths.
The continuum of each individual integration is fit to obtain time series for T eff and X eff .For the smallest NIRISS flare (F4), light curves are binned in groups of 5 integrations to increase the signal to noise prior to fitting.Likewise, the native 1.6 s cadence of the NIRSpec data are binned to ∼30 s cadence with groups of 18 integrations.The flare spectrum is strongest during integrations with high count rates during the impulsive phase.We maximize the signal of the continuum by binning the points during the peak of each flare.For the NIRSpec flares (F1 and F2), 18 integrations per bin are sufficient to maximize the signal of the flare peak in a single point, while 4 integrations are used for the peak spectrum of the larger NIRISS flare (F3) and 5 integrations are used for the smaller NIRISS flare (F4).The continuum spectrum during the peak of each flare is displayed in Figure 7 at the native wavelength resolution of the instrument as well as the binned resolution for NIRISS.The best-fit blackbody models and uncertainties are overlaid on the spectrum of each flare.

Validating continuum enhancements in the light curves
The continuum is sufficiently elevated during the flares to induce a flux increase in the broadband light curves.We therefore verify the significance of the light curve peaks seen in the TESS, J, H, and Ks bands using the σ lcv method described in §4.3.The large scale structure of the flares evolves more slowly than the native cadence of the integrations.The TESS light curve is binned to 10 s cadence and the NIR bands are binned to 30 s cadence to increase the S/N by 2.5-4.3×prior to computing σ lcv for NIRSpec.The TESS, J, H, and Ks light curves of both NIRISS flares (F3 and F4) are binned to 18 minute cadence using 10 integrations per bin to resolve only the large-scale structure in the flare light curve.The light curves are shown in Figure 8.

MEASUREMENT OF THE FLARE PARAMETERS
We measure the quiescent luminosity, start and stop times, flux amplitude, peak luminosity, and total energy of each flare using the light curves produced for each line and for the continuum bands as follows.

Measuring quiescent luminosity for the lines and continuum bands
We use the out of flare times to measure the quiescent luminosity L 0 in erg s −1 for each line and for the continuum.For the lines, we sum the calibrated spectrum across all wavelength points identified within a given line and then scale the flux received by the detector to the flux emitted at the surface of the star using the stellar distance d * =12.43±0.02pc (Chambers et al. 2016).We also multiply out the wavelength dependence using the effective wavelength coverage for each point in the line.We propagate the formal errors of the spectrum and stellar distance.Because most integrations during the TRAPPIST-1f visit are affected by flares, we use the quiescent integrations in the TRAPPIST-1b spectrum to measure L 0 for both sets of lines.The number of wavelength points within the lines of the stronger F3 flare are sometimes larger than for the smaller F4 flare, so we do not use the L 0 values from the F3 event directly.Rather, we visually compare the wavelength solution and number of points in each line between the spectra of the TRAPPIST-1b and f visits.We verify the same wavelength points are used for the quiescent spectrum that were used to measure the line emission in F3.For each flare, the best-fit effective temperature and filling factor are shown for the integrations, as well as the broadband light curves for the TESS and 2MASS filters.The temperature is hottest near the beginning of the flares for the F1-F3 events, and cools throughout each event.It is difficult to compare the times of maximum temperature and peak flux for the F4 event due to the binning of integrations to increase the signal to noise.The flare area expands as the plasma cools, leading to an increase in the filling factor.The TESS band light curve evolves fastest, with the H and Ks band light curves evolving the slowest.
Quiescent luminosities are also measured for the TESS, J, H, and Ks continuum bands of each flare.The calibrated spectrum during each integration is scaled to the stellar distance and convolved with the response function of each band.Since the wavelength resolution of both NIRISS and NIRSpec changes significantly across each bandpass, we multiply the spectrum by the effective wavelength coverage of each point prior to summation.The fluxes of the convolved spectrum are summed and the uncertainties propagated to the final L 0 value.For both the lines and continuum bands, we re-compute the quiescence luminosity using the alternate flux calibration L 0,alt and take the difference in luminosity computed with the two flux calibrations, ∆L 0 =L 0 -L 0,alt , to assess systematic uncertainty.If ∆L 0 is larger than the formal uncertainty, | ∆L 0 | is adopted as the final uncertainty value for L 0 .

Measuring flare amplitudes and energies from the fractional flux light curves
The start and stop times of each flare are identified visually as excursions above the photometric noise.Fractional flux is computed as ∆F/F= F −F0 F0 where F 0 is the median of the out-of-flare flux values (Hawley et al. 2014).Small precursor events and small sympathetic events following the main flare are excluded from the median calculation.In the F3 NIRISS flare and the F1 and F2 NIRSpec flares, small flux enhancements during the gradual decay tails of flares that occurred prior to the beginning of the JWST observations are also excluded.The remaining out-of-flare flux values in the light curve are varied within their photometric uncertainties and the median flux is recomputed across 10,000 Monte Carlo (MC) trials.In each trial, 10% of the light curve is dropped to reduce biases from periods Notes.Properties of the line emission of each flare.For each flare and line, the central wavelength, FWHM of the line, significance of the spectral line relative to the spectral continuum to either side, significance of the flare peak in the light curve, time of the peak relative to the Hα peak time in min, amplitude in fractional flux, quiescent luminosity in erg s −1 , total flare energy in erg, and maximum luminosity in erg s −1 are given.A dagger denotes the flare was observed with NIRSpec, while no dagger denotes it was observed with NIRISS.For multi-peaked flares, the first peak is used to compute t rel .A machine readable version of the table is available.
of variability in the quiescent flux.Uncertainties in the median flux of the light curve σ F0 are computed as the 1σ confidence interval of the resulting distribution.
The equivalent duration (ED) is defined as the area under the fractional flux light curve between the start and stop times in units of seconds (Hawley et al. 2014).We measure the ED by summing the fractional flux values of each integration multiplied by the cadence.Uncertainty in the median out-of-flare flux F 0 propagates to the ED since lower values of F 0 produces higher normalized flux values.In order to fully account for this systematic source of uncertainty, the ED is recomputed for different F 0 values across 1000 MC trials.In each trial, the fractional flux is computed for a given median F 0,i drawn from a Gaussian distribution of mean F 0 and standard deviation σ F0 .The flux values in the resulting light curve are then varied within their photometric uncertainties and the ED is measured.Finally, the uncertainty in ED is computed as the 1σ confidence interval of the distribution of ED values.
The peak flux amplitude of each flare A fl is defined as the maximum value of the flare peak in the fractional flux light curve, and the peak luminosity of the flare is defined as the quiescent luminosity of the line or band in erg s −1 times the peak amplitude, L fl =L 0 × A fl in units of erg s −1 .The total energy emitted by the flare in a given line or band in erg is defined as E f l =L 0 × ED.Uncertainties in L 0 and ED are propagated to the luminosity and energy.The values of each flare parameter are given for the lines in Table 2 and for the continuum bands in Table 3.

RESULTS ON THE NIR CHARACTERISTICS OF THE FLARES
Small-scale variability in the Hα line exists throughout the spectroscopic time series.We identify a sample of four flares of sufficient size for multi-wavelength characterization, F1-F4.These flares produce detectable levels of continuum and line emission in the NIR and emit approximately E Hα =10 29 erg and E TESS =10 30 erg.Two overlapping flares F1 and F2 are detected with NIRSpec, which we consider as separate events since the peak phases of the flares are distinct and therefore should represent two heating episodes in the stellar atmosphere with different continuum formation temperatures.We likewise identify two flares in the NIRISS data during the TRAPPIST-1f visit and the second TRAPPIST-1b visit, F3 and F4, respectively.Among the flares, F3 is a truly exceptional event in which the star increased in brightness by a factor of 2.74 in the Hα line and overlapped the ingress and flat bottom of the Notes.Properties of the continuum emission of each flare.For each flare and band, the central wavelength of the filter, significance of the flare peak in the light curve, time of the peak relative to the Hα peak time in min, amplitude in fractional flux, quiescent luminosity in erg s −1 , total flare energy in erg, maximum luminosity in erg s −1 , and effective blackbody temperature at peak in K are given.A dagger denotes the flare was observed with NIRSpec, while no dagger denotes it was observed with NIRISS.For NIRSpec flares, t rel is computed in 30 s bins for NIRSpec to reduce scatter around the peak time.
For multi-peaked flares, the first peak is used to compute t rel .A machine readable version of the table is available.
TRAPPIST-1f transit.The majority of the NIR lines investigated in this work occurred during the F3 event.

Flare Line Discoveries and Properties
Seven lines are securely detected at NIR wavelengths of 0.7-3 µm from the F3 event and two lines are securely detected from F4 (Fig. 5).A number of hydrogen lines are securely detected for the F3 event, which are Pα, Pβ, Pγ, Pδ, and Brβ.Two helium lines are securely detected from F3, He I λ0.7062 µm and He I λ1.0833 µm IRT.Securely detected lines for the F4 event include the He I IRT and Pα lines.Lines that exceed 3σ in either the σ line or σ lcv measurements and are considered weakly detected are the Ca II IRTb and Pϵ lines for the F3 event and Ca II IRTc and Pβ lines for the F4 event.The Hα line is strongly detected in all four flares.The NIR lines are connected to the optical by the Hα detections, as Hα is the primary line used to assess flares from low-mass stars (Kanodia et al. 2022).

The Balmer, Paschen, and Brackett lines
In general, a decrement in line energy is expected for the higher order transitions within each series.The Paschen decrement has been confirmed in several flares at and above the 6→3 Pγ transition (Kanodia et al. 2022).However, Schmidt et al. (2012) reported a higher peak luminosity for Pγ than P β for two large flares from EV Lac, which raises the intriguing possibility that the lower energy states may not be energetically favored.As shown in the left panel of Figure 9, we observe a reversed Paschen decrement for Pα-Pγ before the luminosities decrease again from Pγ to Pδ-Pϵ.Pγ appears to sit at the peak of the energy partition curve in Figure 9, but it is worth emphasizing that the Pβ and Pγ luminosities are within 1σ.Likewise, the Pδ and Pϵ line luminosities are the same within errors, consistent with the Paschen decrement observed for Pγ-Pϵ in the literature (Fig. 9).The wings of He I IRT line 107 Å away are a potential source of line contamination for Pγ.However, the majority of the flux at 1.0941 µm in Figure 4 is clearly associated with the Pγ line since the two lines are 4 FWHM apart.Flares with multiple Paschen lines in the literature are plotted for comparison to F3 and F4.These include two flares from the M4 dwarf EV Lac and one flare from the M4 dwarf YZ CMi (Schmidt et al. 2012), two flares from the M8 dwarf vB 10 ( Kanodia et al. 2022), and four flares from the M9.5 dwarf TIC 26891612 (Liebert et al. 1999).The largest flare from EV Lac shows a similar increase in the Pγ luminosity relative to Pβ and Pδ seen in our flares.This is significant for two reasons.First, the similar energy partition of the Paschen series in flares from both an M4 and an ultra-cool dwarf strengthens the case this behavior could be typical.Second, the resolution of the spectra in Schmidt et al. (2012) is much higher than NIRISS.The separation between the He I IRT and Pγ lines is fully resolved in Figure 1 of Schmidt et al. (2012).
The shape of the flare light curve also changes from Pα to Pϵ.The change in shape can be seen most clearly when the flare light curves are normalized to one, as in the right panel of Figure 10.Here, the peaks and FWHM times of the Hα and Pα light curves evolve slower than do those of Pβ and Pγ.We quantify this change using the impulsiveness index of Kowalski et al. (2013), defined as the peak amplitude of a flare in fractional-flux units over its FWHM time in minutes, I fl =A fl /∆t FWHM (Kowalski et al. 2013).Spiky flares have a high impulsiveness index while more gradual flares have a low impulsiveness index.We modify this definition and define the line-normalized impulsiveness index I line,norm to account for changes in the flare luminosity at different wavelengths.For example, the Hα and Pα light curves are nearly identical when their peaks are normalized to one (Fig. 10), but the 100× greater flux in Hα gives it a 100× higher impulsiveness index than Pα.We therefore normalize the flare light curves to one before computing I line,norm .In a similar fashion, different flares last for different amounts of time.The different evolution timescales of different flares are captured by the FWHM time ∆t FWHM and set the y-scale of the impulsiveness index.To facilitate comparison between different flares, we normalize the FWHM times of the light curves for each line in our F3 event and for the larger EV Lac flare from Schmidt et al. (2012) by the FWHM time of the Pβ line.Pβ is chosen as the lowest-order line available in both data sets.Finally, we compute I line,norm for the flares once the amplitudes are set to one and the widths are in FWHM units rather than minutes.The results are plotted for our F3 event and the large EV Lac flare from Schmidt et al. (2012) in the left panel of Figure 10.We see that the impulsiveness increases from Pα through Pγ before decreasing again from Pγ through Pϵ.
Next, we look at the rise time for each line, or how long it takes after the flare start time for the light curve to reach its maximum value (Howard & MacGregor 2022).In the left panel of Figure 11, the Pα-P δ lines peak during one of two integrations.The lower order lines peak one integration later than the higher order lines up through Pδ.The Pϵ and Brγ light curves have two peaks, although the signal to noise is lower and hence the difference in peak time may not be significant.The first peak of the Pϵ and Brγ light curves appears to occur an integration earlier than Pγ-Pδ and two integrations prior to Hα and Pα, while the second peak is coincident with Hα and Pα.In the right panel, the relative evolution of the luminosities is shown.We normalize the luminosity light curves by the Hα light curve to search for departures from the dominant transition of hydrogen.The L line /L Ha ratios of the Paschen lines are ∼0.1 for most integrations, while the Brγ ratio is ∼0.02.The Pβ and Pγ lines diverge most strongly from this value during the flare peak compared to the other lines.Pϵ and Brβ show similar luminosity curves.
Insight into the reversed Paschen decrement, rise times, and impulsiveness of the lines may be gained from a similarly reversed decrement for the Balmer series.RADYN models of the relative intensities of the Balmer series find Hα to be weaker than Hβ-Hγ, with a peak in intensity at Hγ (Kowalski et al. 2022).This is because the effective temperature of the source contribution function of the Hγ line is ∼1000 K higher than for Hα, producing more overall flux.On the Sun, the region of the chromosphere where Hα is formed is optically thicker than for Hγ, producing greater collisional pressure broadening of the Hα line in flares.However, the height from the photosphere at which Hα escapes (i.e. at an optical depth of τ ≈ 1) is greater than for the optically thin Hγ region (Kowalski et al. 2022).If Pα is similarly formed at a lower effective temperature in an optically thick region and reaches τ ≈1 at a greater height from the photosphere than Pγ, a reversed decrement would also be expected.Furthermore, this scenario naturally results in the increased impulsiveness of the Pβ-Pγ light curves relative to Pα and Hα.As the precipitating electron beam decays during the flare, fewer electrons would interact with the plasma in the optically thin layer where Pβ-Pγ are formed than the optically thick layer where Pα is formed (Kowalski et al. 2017a).The faster drop in excitation of the Pβ-Pγ lines then produces more impulsive peaks compared to the Pα and Hα lines where excitation continues for a longer time.

Helium and calcium lines
The strongest helium line is the He I IRT, which appears as a single line at the resolution of NIRISS.The He I IRT has been observed during a number of solar (e.g.You & Oertel 1992;Penn & Kuhn 1995;Zeng et al. 2014;Kerr et al. 2021) and stellar (Schmidt et al. 2012;Fuhrmeister et al. 2020;Kanodia et al. 2022) flares.In solar flares, He I IRT emission results from recombination after extreme UV (EUV) photoionization, collisional excitation, and collisional ionization in the chromospheric footprints of flare loops (Ding et al. 2005).The IRT occurs alongside soft X-ray and EUV emission in moderate to large solar flares (Ding et al. 2005), making it valuable as a diagnostic of the X-ray and EUV radiation environment of exoplanets during transmission spectroscopy (Fuhrmeister et al. 2020).We also detect the He I λ0.7062 µm line seen previously in the Liebert et al. (1999) and Fuhrmeister et al. (2008) stellar flares.The 0.7062 µm He I line is the strongest helium line in the NIR after the IRT, explaining why this is the only other He I  (Liebert et al. 1999;Schmidt et al. 2012;Kanodia et al. 2022).Both our flares and the strongest EV Lac flare indicate that higher Pγ luminosities might be a typical feature of NIR flares.Right: The same as the left panel, but for the helium and calcium lines.line we detect (Liebert et al. 1999).
In the right panel of Figure 9, we compare the helium and calcium line luminosity values of the F3 and F4 events against literature values.The Hα luminosity is also included as a reference point.Our flares have the lowest luminosity in the He IRT observed to date, 5× lower than the Kanodia et al. (2022) flares from vB 10 and 60× lower than the Schmidt et al. (2012) flares from EV Lac.The low luminosity of our F3 and F4 flares enables us to confirm that line ratios hold across a range of flare sizes.The four flares from the M9.5 dwarf TIC 26891612 have Hα to λ0.7062 µm He I ratios L Hα /L He I of 38.1±2.5, 36.7±2.5, 43.6±4.1 and 93±11, ordered from the largest flare to the smallest.Our F3 flare has a ratio of L Hα /L He I =35.5±2.6,consistent with the larger three flares from TIC 26891612.For another example, Kanodia et al. (2022) estimate the Hα luminosity of the vB 10 flares from the ratio L Hα /L He I IRT =6.66 since the spectral range of the HPF does not observe both lines.We measure L Hα /L He I IRT values of 11.2±2.6 and 7.2±2.4for F3 and F4, respectively.If these ratios hold for the larger flares observed from vB 10, the Hα emission associated with its flares may be slightly higher than expected.

Results for the NIR continuum of the flares
Continuum is clearly observed in the flare peak from 0.6 to 3.5 µm in the F1 and F2 NIRSpec flares and from 0.6 to 2.8 µm in the F3 NIRISS flare as shown in Figure 7.The F4 flare is weaker, with continuum present out to at least ∼2.5 µm and possibly further.The NIRSpec data reduction does not produce fluxes beyond 3.5 µm.The mean and 1σ errors of the flare-only flux density do not drop to zero even at the longest wavelengths of each spectrum in Fig. 7.The mean continuum flux density of the F1 and F2 peaks for 3-3.5 µm at the distance of Earth is 3.9±0.14×10−18 and 1.6±0.14×10−18 erg s −1 cm −2 Å−1 , respectively.The mean continuum flux density of the F3 and F4 peaks for 2.5-2.8µm at the distance of Earth is 2.5±0.23 ×10 −18 and 7.3±1.4×10−19 erg s −1 cm −2 Å−1 , respectively.

Properties of the spectral continuum
The Planck function fits the spectra of the flares reasonably well as shown in Figure 7.We confirm that the continuum is indeed the dominant source of the flare spectrum at most wavelengths outside the hydrogen and helium lines.As a further check, we divide the blackbody model spectrum by the flare spectrum to see if they are consistent with one.The ratios of the F1, F2, F3, and F4 flares are 1.04±0.27,1.05±0.53,1.05±0.42and 0.82±0.81,respectively.The best-fit blackbody temperature for each flare peak is considerably lower than the canonical 9000 K blackbody temperature.Cooler flare temperatures were previously observed in broadband photometry of two flares from TRAPPIST-1 by Maas et al. (2022).Although these temperatures are lower than 9000 K, they are broadly consistent with the temperatures of small flares in the flare energy to temperature scaling relationship from Howard et al. (2020).Furthermore, the   2017) is shown in gray and scaled to the TESS bandpass by Seli et al. (2021).The flare rate of TRAPPIST-1 analog stars observed by the TESS mission in its full-frame images is shown in orange and is higher than that predicted from K2.The flare rate measured by JWST agrees with the flare rate measured from the TRAPPIST-1 analogs, suggesting flares are likely to contaminate more transits than suggested by the K2 data.NIRSpec flares are noted with a dagger.Right: Cooler flares lead to an under-estimated TESS-band energy when scaling from the Kepler band using a 9000 K blackbody.Both 9000 K and 5000 K flare spectra emit the same energy in the Kepler band, but the TESS band energies are higher for the cooler flare by 46%.
spectral range of our flares is redder than the wavelengths at which flare temperatures are usually measured.Two blackbody components are regularly observed in optical flares, with a hot >10,000 K blackbody component that decays quickly and a cooler ∼5000 K blackbody that decays slowly.The hot component is thought to trace prompt emission at the flare footprints due to the initial episode of particle acceleration, while the cool component is thought to trace emission in larger flare structures heated by the earlier prompt emission (Kowalski et al. 2016).The cooler component dominates the flux at longer wavelengths and may impact our derived temperatures.

Properties of the broadband emission
Peak luminosity decreases with flare size, effective temperature, and longer wavelengths as listed in Table 3. Within each flare, the peak luminosity drops across the broadband filter measurements from the TESS band to the J, H, and Ks bands as shown in the right panel of Figure 12.A drop in luminosity was also observed in a sample of flares observed in Stripe 82 of the Sloan Digital Sky Survey (SDSS) and the Two Micron All Sky Survey (2MASS).We reproduce scaling relations for the J, H, and Ks bands from Davenport et al. (2012) and plot them alongside our observations.An earlier search for flare emission in the 2MASS bands by Tofflemire et al. (2012) did not produce any detections and is not considered here.We note the 2MASS flares in Davenport et al. (2012) were observed for only a single point in time and that their i band is similar to the TESS band.We linearly extrapolate an M8 relation from the M4-M6 dwarf relations in Davenport et al. (2012) after verifying the flare amplitudes vary nearly linearly with stellar mass from 0.1-0.2M⊙ .Our flares show a shallower drop in the NIR bands relative to TESS than do the Davenport et al. (2012) flares.The discrepancy could be due to changes in the effective temperature of the flares with stellar mass and the difference between the peak luminosity measured for our flares and the decay phase luminosity observed for many of the 2MASS flares.
The shape of the broadband light curves changes from the TESS band through the Ks band.As shown in Figure 8, the peaks of the flares are impulsive in the TESS band and less pronounced in the 2MASS bands.The H and Ks light curves have a particularly noticeable period of elevated emission in the decay phase.The gradual decay observed at the longest wavelengths is consistent with the longer decay phase of the secondary emission of the cooler blackbody component in the two-temperature flares observed in optical spectra (Kowalski et al. 2016).The rise time of the TESS and 2MASS band flares varies between events.The bands peak at the same time in the F1 flare, but longer wavelength bands peak at later times for the F2-F4 flares.

The flare rate of TRAPPIST-1
Four flares observed in the TESS bandpass observed over a total of 26.8 hr place strong constraints on the true flare rate of the star.We measure the cumulative flare frequency distribution (FFD) for the TESS band energies to determine whether the flare rate derived from K2 observations of the star or TESS flares observed from M8 TRAPPIST-1 analog stars is more accurate.We also desire to place the NIR properties of our flares into the context of how often a flare of that size occurs from the star.The cumulative FFD is measured from the number of flares of energy E TESS or greater observed across the total monitoring time.Uncertainties on the cumulative occurrence rate for flares of each energy are measured with a binomial 1σ confidence interval.We find flares equivalent in size to our F4 event occur at a rate of 3.6 +2.1 −1.3 flare d −1 , and flares equivalent in size to our F3 event occur at a rate of 2.7 +2.0 −1.1 flare d −1 .Flares the size of our largest event, F1, occur at a rate of 0.9 +1.4 −0.5 flare d −1 .The F4, F3, and F1 flares therefore occur in our data at 5.7, 11.7, and 21× the rates predicted by scaling from the K2 flares, respectively.As shown in the left panel of Fig. 13, our FFD excludes the flare rate estimated by scaling the K2 flares into the TESS bandpass, but agrees with the FFD computed from the TRAPPIST-1 analog stars of Seli et al. (2021).These results suggest that the underlying spectral energy distribution of the flares in the Kepler and TESS wavelength regimes may differ and complicate scaling relations between wavelengths.Assuming a 9000 K temperature for a 5000 K flare when scaling from the Kepler to the TESS bandpass under-estimates the TESS band energy by 46% (right panel in Figure 13).We also find the true rate of flares that may contaminate transit spectroscopy observations is at the higher end of the literature range.

RADIATIVE HYDRODYNAMIC MODELING OF THE FLARES
Non-local thermodynamic equilibrium (non-LTE) radiative hydrodynamic models are able to match NUV/optical spectra of stellar flares (e.g.Kowalski et al. 2016Kowalski et al. , 2017b;;Kowalski 2022;Brasseur et al. 2023), but have not been compared with NIR observations.The RADYN code (Carlsson & Stein 1995, 1997) solves the coupled non-LTE radiative transport and hydrodynamics equations for the stellar atmosphere during nonthermal electron beam heating in flares (Allred et al. 2006(Allred et al. , 2015)).A grid of RADYN models exploring a broad range of injected electron energies has recently become available (Kowalski 2022), enabling us to test if the line and continuum emission of our flares is consistent with non-LTE models.The RADYN models each inject a non-thermal beam of electrons into the plasma according to a power-law distribution of energies given by the index of accelerated electrons δ, varying the maximum flux and low-energy cutoff.The models assume apex coronal conditions of a starting atmosphere with an effective temperature of ∼3600 K, a magnetic loop half-length of 10 9 cm, log g of 4.75, ambient electron density of 3×10 10 cm −3 , a gas temperature of 5 MK, and a uniform cross-sectional area.The response of the atmosphere is computed across a 1 s increase and 10 s decrease of the beam intensity following the pulsed injection profile described in Aschwanden (2004).The spectra are then averaged across the 10 s period to produce a fiducial spectrum of the flare.The hydrogen lines were calculated using the pressure broadening profiles of Tremblay & Bergeron (2009) as described in Kowalski et al. (2022).Further details are given in Kowalski et al. (2017b) and Kowalski et al. (2023, in prep).Model outputs in the spectral range of our observations are given as emergent radiative fluxes at the stellar surface in erg s −1 cm −2 Å−1 and include line profiles for the Hα and He I IRT lines and continuum emission through 2.83 µm.In addition to stellar flares, solar flares have also been observed at NIR wavelengths (Simões et al. 2017;Penn et al. 2016).Both a high-energy and low-energy electron beam are needed to match observations of solar and stellar flare spectra (Kowalski 2022;Brasseur et al. 2023).The high-energy beam penetrates deep into the chromosphere where it thermally ionizes the plasma and induces pressure broadening of the hydrogen lines.Observations of solar flares indicate high-energy beams are coincident with the compact flare kernel responsible for the majority of the continuum emission (Kawate et al. 2016).The low-energy beam heats plasma in the upper layers of the chromosphere and produces strong, narrow hydrogen lines and a faint continuum.On the Sun, this emission is thought to occur in the elongated flare ribbons.We adopt the mF13-85-3 (henceforth F13) and m5F11-25-4 (henceforth 5F11) RADYN models for our high-energy and low-energy beams, respectively.The mF13-85-3 model injects an electron beam with a maximum flux of 10 13 erg s −1 cm −2 , a low-energy cutoff of 85 keV, and δ=3.The m5F11-25-4 model injects an electron beam with a maximum flux of 5×10 11 erg s −1 cm −2 , a low-energy cutoff of 25 keV, and δ=4.These two models are representative of the allowed range of electron beam properties in RADYN (Brasseur et al. 2023).
We subtract the continuum from the Hα and He I IRT model spectra to obtain the flux due to the lines (Kowalski 2022).A linear superposition of the F13 and 5F11 line fluxes is defined with the filling factors as free parameters as in Kowalski (2022), shown in Eq. 3: Here, f λ,model,⊕ is the flux density of the combined model at the distance of Earth, X F13 and X 5F11 are the filling factors, and S λ,F13 and S λ,5F11 are the emergent flux densities of the models.The relative filling factor X rel =X 5F11 /X F13 describes how much larger the effective area of the flare ribbon is compared to the kernel.We bin the combined model f λ,model,⊕ to the wavelength resolution of NIRISS and fit the spectra of the Hα and He I IRT lines shown in the middle and right panels of Figure 14 to those of our F3 flare.The result is shown in the left panel of Figure 14.The fit produces reasonable filling factors of X 5F11 =0.0038 and X F13 =0.0018 for an X rel of 2.1.These filling factors are typical of large flares and consistent with the filling factors derived from the simple blackbody fits in §5.2.Our X rel of 2.1 is comparable to the X rel =2.28 measured from the Hγ line for the Great Flare of AD Leo (Kowalski 2022).The models do not reproduce the continuum well, suggesting X-ray/extreme-UV backwarming (Hawley & Fisher 1992) or other processes may be more important than in optical spectra.The continuum of the 5F11 ribbon component is within a factor of ∼3 of the observed spectrum, but the F13 continuum from the kernel is 5-30× higher.The 5F11 model also dominates the fit to the line centers, indicating ribbons are a key source of flare emission in the NIR.Fitting the F4 flare spectrum produces X rel ≈3, but the S/N of the fit is low.Since our empirical blackbody fits trace the continuum more closely than the model, we employ our blackbody fits to decontaminate the transit spectra.

CORRECTING FLARE CONTAMINATION IN TRANSIT SPECTRA USING OUR MODELS
We correct for flare contamination in the transit spectra of TRAPPIST-1f and b during the F3 and F4 events, respectively.The best-fit blackbody model with parameters T eff and X eff from the flare-only spectrum is subtracted from the original spectral time series.Wavelength points with candidate lines are also masked out.We do not perform blackbody fits for integrations outside of the flare start and stop times as the lack of continuum outside these times can lead to an unconstrained fit.We also do not correct for contamination in the TRAPPIST-1g transit from the F1 and F2 flares as the transit occurs far into the decay phase of the flares where the continuum signals are weakest.The large F3 flare during the transit of TRAPPIST-1f shows the clearest improvement across all wavelengths in the spectrum.As shown in Figure 15, the flare peaks just prior to ingress in the uncorrected data, but is almost completely absent in the corrected spectrum.The smaller F4 flare during the flat bottom and egress of the TRAPPIST-1b transit shows some improvement in Figure 16, but the continuum fits do not remove the flare as cleanly.
The contamination correction does not have a minimum wavelength bin size, but is illustrated for representative wavelengths across the full spectral range of NIRISS.Corrections to the first-order spectra are clearer than the second order due to the higher precision in the first order.For both transits, the blackbody fits perform best at wavelengths from approximately 1.0-2.4µm and worse at wavelengths close to 0.6 and 2.83 µm.In the F3 flare, the blackbody that best fits the spectrum of the flare from 0.6-2 µm slightly overshoots the flux for λ >2.0 µm.The extreme Rayleigh-Jeans tail does not constrain the flare temperature (Fuhrmeister et al. 2008), so we use the global fit for the determination of T eff and X eff .However, we fit a second blackbody to the spectrum from 1.7 µm-2.83µm in order to decontaminate the spectral time series in the Rayleigh-Jeans tail beyond 1.7 µm for F3 and 2.0 µm for F4 where the global fit slightly over-estimates the flux.
We explore the efficacy of the contamination correction by computing the fractional change to the transit depth between the original and corrected transit spectrum.The comparison is performed for the integration during which the flare is brightest in white light.The results are shown for the TRAPPIST-1f and b transits in Figure 17.The wavelength dependence of the correction varies inversely with the contrast of the flare against the stellar spectrum.The peak contamination in the TRAPPIST-1f transit light curves due to the larger F3 event ranges from 20000±1700 ppm at 0.7 µm, 2100±400 ppm at 2 µm and 2700±900 ppm at 2.8 µm, assuming the best-fit blackbody.The contamination in the TRAPPIST-1b transit light curve from the smaller F4 flare ranges from 4100±1800 ppm at 0.7 µm, 500±450, and 650±960 ppm at 2.8 µm (i.e.consistent with zero).
We also quantify the improvement in the transit signal as a function of wavelength.Residuals are computed for the difference between the transit model and the original and flare-corrected light curves.The root mean square (RMS) of the transit light curves is computed for the corrected and uncorrected transit light curves and the ratio RMS corr /RMS uncorr is plotted for the TRAPPIST-1f and b transits in the bottom panels of Figure 17.The transit improvement RMS corr /RMS uncorr of the F3 flare is better than 0.5 for wavelengths between 1.0-2.4µm, with no measurable improvement near 0.6 µm.The best improvement in F3 is observed at 1.12-2.1 µm, where ∼80% of the flare contamination is removed.The improvement of F4 is not well constrained.The wavelength bins with the smallest errorbars (e.g.0.9-1.2µm) are each consistent with values from 10-60%.

DISCUSSION AND CONCLUSIONS
We have characterized the combination of line and continuum emission of stellar flares at NIR wavelengths of 1-3.5 µm for the first time.Several spectral features of major carbon and oxygen bearing species are produced in this range, such as the ∼100 ppm CO 2 features at 2.0 and 2.8 µm.However, the most significant CO 2 feature expected in the atmospheres of the TRAPPIST-1 planets exists at 4.3 µm (Lustig-Yaeger et al. 2019).We estimate our larger F3 flare would have produced a 300-1600 ppm signal at 4.3µm through extrapolation of the best-fit blackbody spectrum.Such a signal would be larger than the 160 ppm CO 2 feature at 4.3 µm.
The strongest lines in the largely unexplored wavelength range of 1-3.5 µm are the Paschen α to δ lines, Brackett β and the He I IRT.We report the first detection of Pα and Brβ in a stellar flare from the NIRISS spectra of our F3 event.All lines from 0.7-2.8µm are at least an order of magnitude weaker than Hα, with the exception of Pβ and Pγ which are 0.7 dex weaker.We discover that the intensity of the Paschen lines does not decrease sequentially for higher transitions.Instead, intensity increases from Pα to Pγ and then decreases out to Pϵ. Future work is needed to model the conditions that induce increased flux for the less energetically favored states.
The NIRISS and NIRSpec observations of four flares from TRAPPIST-1 reveal that flare continuum exists out to at least 3.5 µm, and is well-described at these wavelengths by a blackbody with an effective temperature of ∼5000 K.Note the ∼3000 K peak temperatures of the NIRISS flares are due to the longer integration times.These temperatures are much lower than the 9000 K blackbody typically assumed when estimating the amount of flare emission at any given wavelength (Osten & Wolk 2015), but comparable to those seen in several red optical flare spectra (Kowalski et al. 2016).We confirm the continuum is the dominant source of flare contamination.Indeed, the continuum gives rise to signals of 10 3 ppm at wavelengths of 1-3 µm for even small flares of E TESS =10 30 erg.Multiple flares of this size occur each day, requiring care in the reduction and interpretation of the transit spectra.Smaller flares of E TESS =10 29 erg occur an order of magnitude more frequently, but are expected to lead to lower levels of contamination of 30-150 ppm at 4.3 µm.For context, an E TESS =10 30 erg flare produces a mean contamination level of ∼1900 ppm from 1-3 µm, while an E TESS =10 29 erg flare produces a mean contamination level of ∼200 ppm from 1-3 µm.In this work, we focus on contamination of the transit spectra by the flares.However, photochemistry due to the high-energy radiation of the flares may dominate the molecular abundances detectable in the planetary atmospheres with JWST (Tsai et al. 2023;Batalha et al. 2023).Several flares have been observed simultaneously in the TESS bandpass and at X-ray/UV wavelengths (e.g.MacGregor et al. 2021;Paudel et al. 2021;Jackman 2022).Jackman et al. (2023) use TESS and GALEX flare rates of field age late M-dwarfs to show that far UV (FUV) and near UV flare energies exceed those predicted from the TESS energies using pseudo-continuum templates of ∼9000 K by factors of 30.6±10.0 and 6.5±0.7,respectively.Underestimation of the FUV energies may be even higher for a 5000 K blackbody than the assumed 9000 K blackbody.Similarly, the He I IRT emission in flares observed by JWST can be used to estimate the soft X-ray and EUV emission by scaling from the multi-wavelength properties of solar flares of comparable size to those in this work (Ding et al. 2005).A C3.9 solar flare observed simultaneously in the GOES 1-8 Å soft X-ray band and in the λ1.083 µm He I IRT reached a luminosity ratio of L SXR /L He I IRT ≈440 (Zeng et al. 2014).We obtain this ratio by multiplying the emergent flux in the He I IRT line by the area of the flare extraction regions and propagating the flux to 1 au to compare with the GOES X-ray flux.If a similar ratio holds for our flares, the F3 and F4 flares would have emitted 1.3×10 28 and 5.7×10 27 erg s −1 in the 1-8 Å GOES band, respectively.The F3 and F4 events would therefore be large X45-X20 flares, which are associated with extreme coronal mass ejections from the Sun (Youngblood et al. 2017).The flat spectrum of TRAPPIST-1b and c in recent MIRI observations by Greene et al. (2023) and Zieba et al. (2023) are consistent with no-atmosphere scenarios that could result from high fluxes of stellar energetic particles (Tilley et al. 2019).These results are also consistent with the spectrum of TRAPPIST-1b observed with NIRISS SOSS in Lim et al. (2023), which showed no evidence of an atmosphere on the planet.ow effective temperatures may not necessarily imply favorable conditions for pre-biotic chemistry or survivable levels of UV flux for surface life.Low effective temperatures have been previously reported for the decay phase of TRAPPIST-1 flares by Maas et al. (2022), who consider the impacts of these low temperatures for the habitability of planets around ultracool dwarfs (UCDs).While hot flares emit a greater portion of their energy at NUV wavelengths, cool flares do not emit nearly as much high-energy radiation.Since photochemistry in both exoplanet atmospheres (Ranjan et al. 2020;Chen et al. 2021) and biochemistry (Ranjan et al. 2017;Abrevaya et al. 2020) is sensitive to light at wavelengths of 200-300 nm, lower flare temperatures could mean UV surface conditions are more favorable for life than often assumed (Kowalski 2022).
However, pre-biotic chemistry on planets around M-dwarfs requires UV radiation from flares due to the lack of quiescent UV emission, assuming the RNA world hypothesis (Ranjan et al. 2017;Rimmer et al. 2018).Cooler flares from UCDs would produce less energy for origin of life scenarios for systems like TRAPPIST-1.Furthermore, neither the Maas et al. (2022) study nor our own sufficiently probes the wavelengths for the hot blackbody component in the two-temperature flare scenario (Kowalski et al. 2016).The hotter blackbody of 10,000-12,000 K is dominant below 0.5 µm, while the shortest broadband filter used in the Maas et al. (2022) study is g (400-550 nm).Likewise, our spectra only extend down to 0.6 µm.Maas et al. (2022) observe the flare temperatures decrease with color across g-r, g-i, and g-z s , which could be due to the increasing prominence of the cooler component at longer wavelengths.
Our work illustrates the utility of JWST transit spectroscopy for stellar astrophysics in addition to exoplanets.The spectral time series of M-dwarfs obtained as a by-product of exoplanet programs provide a rich dataset for the characterization of stellar activity.Multiple programs targeting TRAPPIST-1 and other high priority exoplanet systems have and will continue to obtain data on the host stars, substantially increasing the temporal baseline of the stellar variability of M-dwarfs at NIR wavelengths.At the same time, we find that understanding the line and continuum emission of the flares enables the amount of contamination to be assessed and even mitigated in the transit spectra.Along these lines, Flagg et al. (2023, in prep.) are applying flare mitigation based on this work for specific features in the transmission spectra such as for major carbon and oxygen bearing species.We recommend masking the lower-order Paschen and Brackett series lines and fitting blackbody models to the continuum to decontaminate transit spectra during flares.As in Lim et al. (2023), we also recommend caution when interpreting features in the transit spectrum if excess Hα emission is observed.
As it becomes available, archival JWST transit spectroscopy of early and mid M-dwarf flare stars such as LTT1445A and L 98-59 will enable us to explore whether stellar mass and age impact the NIR properties of the flares.The flare rates of the host stars of ∼3 flare d −1 are sufficiently high to contaminate transits (Howard 2022;Brown et al. 2022).Our results suggest mitigation of the flare contamination may be possible and increase the scientific impact of the transit observations.A survey of the flaring TESS Objects of Interest found 25.7 +12 −7 % of potentially-rocky TESS planets suitable for JWST transit spectroscopy orbit flare stars.Beyond JWST, flares have been observed during transits of planets from the far-UV (Feinstein et al. 2022b) to the mid-IR (Gillon et al. 2017).Based on flares observed in the FUV during two transits of AU Mic b, Feinstein et al. (2023) estimate flare contamination from 1-2 µm will occur at the 200 and 70 ppm level for AU Mic b and c, respectively.Likewise, K2 observations of a flare during the transit of TRAPPIST-1h (Luger et al. 2017) and Spitzer observations of five flares at 4.5µm overlapping several transits of TRAPPIST-1 planets (Gillon et al. 2017;Ducrot et al. 2020) induce contamination as described in Davenport (2017).Retrievals of atmospheric species at these and other wavelengths may also benefit from improved flare modeling.specific observations analyzed can be accessed via doi:10.17909/b0ee-8j61,https://doi.org/10.17909/b0ee-8j61.These observations are associated with programs GTO 1201 and GO 2589.The authors acknowledge the Lim team for developing their observing program with a zero-exclusive-access period.

Figure 1 .
Figure 1.Overview of stellar activity in the Hα line during transit spectroscopy observations of TRAPPIST-1 used in this work.During the JWST observations, the Hα line shows considerable variability, including flares as described in §3.5 and §4.2.Flares with high enough signal for multi-wavelength characterization (approximately EHα=10 29 erg in the Hα line) are numbered by TESS band energy.Energies in these bands are given in Tables2 and 3.The x-axis time scales of the top and bottom panels are fixed within each panel to better observe changes in activity between observations.The final transit of TRAPPIST-1g is 44.4 d after the previous observation, so a break in the x-axis (but still at the same time scale) is used to include this data in the figure.NIRSpec observations are binned to the same cadence as NIRISS and have a lower photometric uncertainty.

Figure 2 .
Figure 2. Comparison of flare photometry from our implementation of the supreme-SPOON and transitspectroscopy pipelines for lines from the F3 flare at representative wavelengths.Both reductions produce qualitatively similar flare light curves.However, we find the supreme-SPOON reduction is more optimized for flare photometry as the point-to-point variability during the peak phase is reduced by >20%.This effect is particularly noticeable in the second peak in the Hα light curves and the Pβ peak.

Figure 3 .
Figure 3. Broadband light curves of the flare-affected transits of TRAPPIST-1b and f.The transit models are shown in orange, and the exponential fit to the integrations in both the transit and flare is shown in blue.The exponential fit is necessary to determine the shape of the transit light curve.As shown in the O-C diagrams at the bottom, we use an initial guess of c1 and c2 to fit the exponential, then subtract the exponential to obtain final values of c1 and c2.We note c1 and c2 represent the coefficients of the quadratic limb darkening equation.The first and third O-C diagrams are directly comparable and show the light curve with the final transit model removed.The middle O-C diagram shows an intermediate step in the determination of the TRAPPIST-1f transit model fit in which the initial guess on the transit model is subtracted prior to fitting and subtracting the exponential.The residuals of the middle panel therefore show the quality of the exponential fit.
4. CHARACTERIZING THE NIR LINE EMISSION OF THE FLARES 4.1.Creation of flare-only spectra

Figure 4 .
Figure 4. Line emission is observed in the flare-only spectrum of the two NIRISS flares.Top: The F3 event is larger, with more candidate or confirmed lines during the flare peak.Bottom: The F4 event is the weakest flare for which lines beyond Hα could be detected.

Figure 5 .
Figure 5. Left: the light curves of the lines and candidate lines identified in the larger F3 flare.Pα and Brβ are the first detections of flare emission from a main sequence star in these lines.Right: The light curves of the lines and candidate lines identified in the smaller F4 flare.∆F/F is the fractional flux as defined in §6.2

Figure 6 .
Figure6.Separation of Hα from the continuum contribution in the same NIRSpec wavelength point.On the left, the wavelength point containing Hα is highlighted in dark red, while two continuum reference wavelengths are highlighted in grey.In the second panel, the light curve at 0.6565 µm is displayed, presumably that of the Hα line.In the third panel, light curves are extracted for each wavelength point and compared (with an offset applied in the plot).The average of the two reference light curves is divided from the target light curve to remove behavior common to all three wavelengths.On the right, the resulting corrected Hα light curve is shown.

Figure 7 .
Figure 7. Continuum is observed in each event.The continuum of the NIRSpec flares (top) is readily discernible without binning due to the low wavelength resolution.The wavelength points of the NIRISS flares (bottom) are binned to more clearly illustrate the flare spectrum, with the native resolution data shown in light gray and the binned spectrum in dark gray.The best-fit Planck function and uncertainties are displayed in red.The higher temperatures of the NIRSpec flares are a result of the higher cadence.Binning the NIRSpec flares to 105 s cadence produces nearly identical spectra to the NIRISS flares.For comparison, the effective temperature of the star is 2628 K.All spectra are plotted to the wavelength range of NIRISS to facilitate comparison.

Figure 8 .
Figure8.Overview of the continuum properties of the flares.The F1 and F2 NIRSpec events are shown together in the left column of panels, while the F3 and F4 NIRISS flares are displayed in the middle and right columns.For each flare, the best-fit effective temperature and filling factor are shown for the integrations, as well as the broadband light curves for the TESS and 2MASS filters.The temperature is hottest near the beginning of the flares for the F1-F3 events, and cools throughout each event.It is difficult to compare the times of maximum temperature and peak flux for the F4 event due to the binning of integrations to increase the signal to noise.The flare area expands as the plasma cools, leading to an increase in the filling factor.The TESS band light curve evolves fastest, with the H and Ks band light curves evolving the slowest.

Figure 9 .
Figure 9. Left: Luminosity emitted in each hydrogen line during the flare peak.Other flares with multiple NIR lines detected simultaneously are displayed for reference(Liebert et al. 1999;Schmidt et al. 2012;Kanodia et al. 2022).Both our flares and the strongest EV Lac flare indicate that higher Pγ luminosities might be a typical feature of NIR flares.Right: The same as the left panel, but for the helium and calcium lines.

Figure 10 .
Figure 10.Left: The relative impulsiveness of the flare light curve for each hydrogen line in our F3 event (red) and the largest EV Lac flare from Schmidt et al. (2012).More impulsive flares have light curves that are spiky, while less impulsive flares have light curves that are gradual.To facilitate comparisons with different flare light curves of different amplitudes and FWHM times, we normalize amplitudes to one and FWHM times by the FWHM of the Pβ line of each flare prior to computing the impulsiveness index I line,norm = A fl,line /F W HM line .The impulsiveness of both flares reaches a maximum at Pγ, suggesting that its excitation period is brief and intense relative to the lower and higher order transitions.Right: Light curves of the F3 lines are peak-normalized to show changes in the light curve shape across the Paschen series, described by the line-normalized impulse I line,norm .Pα and Hα are shown with solid lines to emphasize their similar shapes, while the other lines are shown with dashed lines.

Figure 11 .
Figure 11.Left: Relative timing of hydrogen line emission during the F3 flare.The faster rise of the higher order Pβ-Pδ lines likely reflects a shorter period of excitation compared to the Hα and Pα lines.Note Hα has been scaled down by 100× to facilitate comparison with the weaker lines.Right: Relative luminosity of the hydrogen lines during the F3 flare, normalized by the luminosity of Hα.Pβ and Pγ exhibit both the highest peak luminosity and fastest decay.The Hα light curve is shown for reference.

Figure 12 .
Figure 12.Left: Effective temperature of the flares during the peak phase T eff as a function of TESS band energy ETESS.The T eff values are lower than the peak temperatures often observed in earlier M-dwarf flares (Kowalski et al. 2013), as well as those reported in a sample of TRAPPIST-1 flares by Maas et al. (2022).For reference, we display the distribution of peak flare temperatures as a function of TESS energy for early and mid M-dwarfs observed by Evryscope and TESS (Howard et al. 2020).The ETESS-T eff scaling relationship from Howard et al. (2020) is also shown in black and qualitatively agrees with the lower temperatures of the TRAPPIST-1 flares.Right: Scaling relations from the TESS band to the 2MASS broadband luminosities during the flare peaks.Model predictions from Davenport et al. (2012) are shown for reference.

Figure 13 .
Figure 13.Left: We compare the JWST measured flare rate of TRAPPIST-1 in the TESS band (0.6-1 µm; blue points) against two previous flare frequency distributions of the star reproduced from Seli et al. (2021).The K2 flare rate of TRAPPIST-1 from Vida et al. (2017) is shown in gray and scaled to the TESS bandpass bySeli et al. (2021).The flare rate of TRAPPIST-1 analog stars observed by the TESS mission in its full-frame images is shown in orange and is higher than that predicted from K2.The flare rate measured by JWST agrees with the flare rate measured from the TRAPPIST-1 analogs, suggesting flares are likely to contaminate more transits than suggested by the K2 data.NIRSpec flares are noted with a dagger.Right: Cooler flares lead to an under-estimated TESS-band energy when scaling from the Kepler band using a 9000 K blackbody.Both 9000 K and 5000 K flare spectra emit the same energy in the Kepler band, but the TESS band energies are higher for the cooler flare by 46%.

Figure 14 .
Figure 14.Left: The best-fit model for the observed strengths of the Hα and He I IRT lines in our F3 flare, with and without continuum from the F13 kernel.Continuum from the 5F11 ribbon is included in both cases.Middle and Right: The F13 kernel flux is pressure broadened to a greater degree than the 5F11 ribbon flux, especially for Hα.As a result, the 5F11 flux dominates the fit to the line centers and the F13 flux dominates the wings.

Figure 15 .
Figure 15.Correction of flare contamination across a range of wavelengths for the transit spectra of TRAPPIST-1f.The F3 flare peaks just prior to ingress.The original NIRISS order 1 and 2 spectra of the uncorrected transit are shown for reference in gray, while the corrected spectra are color-coded by wavelength.While the F3 flare is clearly seen in the uncorrected spectra, it is much weaker in the corrected spectra after the best-fit blackbody model from Fig. 7 is subtracted and the activity lines are masked.

Figure 16 .
Figure 16.Correction of flare contamination across a range of wavelengths for the transit spectra during the second visit on TRAPPIST-1b.The F4 flare peaks during the flat bottom and continues through egress.The original NIRISS order 1 and 2 spectra of the uncorrected transit are shown for reference in gray, while the corrected spectra are color-coded by wavelength.The F4 flare is much smaller than the F3 event, leading to a less pronounced difference between the uncorrected and corrected spectra during the flare.

Figure 17 .
Figure 17.Fractional change in the transit depth as a function of wavelength after subtracting the flare contamination model for TRAPPIST-1f (top left) and b (top right).The spectrum of the star is shown in gray for reference.The larger F3 flare induces contamination at the 2100 ppm level at 2 µm, while the smaller F4 flare causes contamination of ∼500 ppm at 2 µm.TRAPPIST-1's flare rate predicts that flares of these sizes occur multiple times per day.The improvement in the RMS of the transit spectrum before and after applying the transit correction is shown for TRAPPIST-1f (bottom left) and b (bottom right).

Table 1 .
Overview of TRAPPIST-1 ObservationsNotes.Description of the observations used in this work.Columns are the JWST Cycle 1 program ID, the archived observation number given in the JWST Visit Status Report, the TRAPPIST-1 planet observed, the visit sequence number, date, start and stop times listed for archived observations in the Visit Status Report, total time spent on target and usable for science observations, number of integrations in the time series, and whether any flaring was observed during the visit.A dagger denotes the instrument and mode as NIRSpec BOTS prism observations, while no dagger denotes the instrument and mode as NIRISS SOSS.The time on target is approximately 70% of the total time listed in the Visit Status Report.

Table 2 .
Line emission properties of NIR flares from TRAPPIST-1

Table 3 .
Continuum emission properties of NIR flares from TRAPPIST-1