Optimal Fitting and Debiasing for Detectors Read Out Up-the-Ramp

This paper derives the optimal fit to a pixel's count rate in the case of an ideal detector read out nondestructively in the presence of both read and photon noise. The approach is general for any readout scheme, provides closed-form expressions for all quantities, and has a computational cost that is linear in the number of resultants (groups of reads). I also derive the bias of the fit from estimating the covariance matrix and show how to remove it to first order. The ramp-fitting algorithm I describe provides the $\chi^2$ value of the fit of a line to the accumulated counts, which can be interpreted as a goodness-of-fit metric. I provide and describe a pure Python implementation of these algorithms that can process a 10-resultant ramp on a 4096 × 4096 detector in ≈8 s with bias removal on a single core of a 2020 MacBook Air. This Python implementation, together with tests and a tutorial notebook, is available at https://github.com/t-brandt/fitramp. A companion paper describes a jump detection algorithm based on hypothesis testing of ramp fits and demonstrates all algorithms on data from JWST.


INTRODUCTION AND STATEMENT OF THE PROBLEM
Many detectors may be read out nondestructively to reduce the impact of read noise, with the reads being saved either individually or in groups for later analysis. This approach is standard on NICMOS (Skinner et al. 1998) and on the infrared channel of WFC3 (Baggett et al. 2008), both of which are installed on the Hubble Space Telescope. Ground-based instruments using infrared detectors can also be read out nondestructively. Some save only a combination of the reads as an estimate of the count rate, while others save all individual reads. The CHARIS instrument on the Subaru telescope is an example of the latter (Groff et al. 2016; Brandt et al. 2017).
The initial phase of processing data from a detector read out nondestructively is to derive the count rate from a sequence of reads. Each read measures the number of electrons in a pixel; it is subject to both read noise and photon noise. For an ideal detector in the absence of read noise and photon noise, the number of counts in a pixel would be the reset value plus the count rate times the time since reset. The reset value itself is subject to kTC noise and must be fitted from the data.
The problem of fitting a ramp has been studied extensively in the past. Fixsen et al. (2000) and Offenberg et al. (2001) derived and validated nearly optimal weights for combining individual, equally spaced reads as a function of signal-to-noise ratio. They also used the individual, saved reads to identify cosmic rays as instantaneous jumps in a pixel's counts. Kubik et al. (2016) extended the ramp fitting approach for the Euclid spacecraft, while Casertano (2022) updated the weight calculation of Fixsen et al. (2000) for nonuniform sampling. Robberto (2014) proposed an optimal approach for ramp fitting at the cost of additional matrix operations to diagonalize each pixel's covariance matrix.
In this work I revisit the problem of fitting a ramp to a sequence of nondestructive reads. In the companion paper Brandt (2024), hereafter Paper II, I address the problem of identifying jumps in a pixel's counts. I consider the general case of a detector reading out at many arbitrary times and possibly averaging some of these reads together into groups; the average of a group of reads is also called a resultant. The reads are typically averaged with equal weights. Appendix A shows that equal weights are not optimal, and demonstrates the gains that are possible with alternative weights of the reads that combine to form a resultant.
Fitting a ramp to a sequence of resultants can be decomposed into two tasks. The first task is to derive the covariance matrix for the resultants. In practice, the read noise for each pixel may be measured, but the photon noise will have to be approximated from the data themselves. The second task is to use the covariance matrix to derive the maximum likelihood count rate.
The treatment I present here assumes an ideal detector and a constant astrophysical+dark count rate. I further assume that shot noise, digitization noise, and other noise sources are sufficiently modeled as Gaussian rather than, e.g., Poisson. This is necessary in order to identify the $\chi^2$ statistic with the log likelihood for hypothesis testing. The assumption of an ideal detector includes perfect linearity and read noise that is uncorrelated between detector reads (though not necessarily between detector pixels). The 1/f noise ubiquitous in H2RG infrared detectors (Moseley et al. 2010; Rauscher 2015) is not a problem so long as the very low frequency component (∼seconds long, between reads at a fixed pixel) may be removed. Deviations from linearity may be corrected to create a ramp appropriate for the treatment presented here. Real detectors will have a number of additional complications, from pre-amplifier effects to random telegraph noise (e.g., Schlawin et al. 2020), that may or may not have a significant impact on the efficacy of the method presented here.
Consider a ramp consisting of many resultants $\mathbf{r}$. If the covariance matrix $\mathbf{C}$ for this set of resultants is known, the problem of deriving the count rate involves minimizing
$$\chi^2 = \left(\mathbf{r}_{\rm meas} - \mathbf{r}_{\rm model}\right)^T \mathbf{C}^{-1} \left(\mathbf{r}_{\rm meas} - \mathbf{r}_{\rm model}\right), \tag{1}$$
where $\mathbf{r}_{\rm meas}$ are the measured counts in each resultant, $\mathbf{r}_{\rm model}$ are the model counts, and $\chi^2$ is the chi-squared statistic.
If a resultant consists of a single read, then
$$r_{{\rm model},i} = a t_i + b, \tag{2}$$
where $a$ is the count rate, $t_i$ is the time of that read (with $t = 0$ corresponding to the time of the last reset), and $b$ is the reset value. Equation (1) requires computing and then inverting a covariance matrix. If the covariance matrix is dense, then its inversion lacks a convenient closed form and has a computational cost that scales as $n^3$, where $n$ is the number of resultants. If this can be overcome, Section 2 shows the potential improvement in signal-to-noise ratio over current, approximate approaches.
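As a point of reference for the dense approach, the following is a minimal sketch of a generalized least squares fit of a line to single-read resultants. It assumes the covariance is the sum of a photon-noise term $a \cdot \min(t_i, t_j)$ and a diagonal read-noise term $\sigma^2$, and it takes a guess of the count rate to build that matrix. The function name `dense_ramp_fit` and its arguments are hypothetical and are not part of fitramp.

```python
import numpy as np


def dense_ramp_fit(t, r_meas, a_guess, sigma):
    """Fit r = a*t + b by generalized least squares with a dense covariance.

    Hypothetical helper for illustration; assumes single-read resultants.
    """
    t = np.asarray(t, dtype=float)
    r_meas = np.asarray(r_meas, dtype=float)
    # Assumed noise model: shared photons give a*min(t_i, t_j);
    # read noise adds sigma^2 on the diagonal only.
    cov = a_guess * np.minimum.outer(t, t) + sigma**2 * np.eye(t.size)
    cinv = np.linalg.inv(cov)                       # the O(n^3) step
    design = np.column_stack([t, np.ones_like(t)])  # columns multiply (a, b)
    normal = design.T @ cinv @ design
    a, b = np.linalg.solve(normal, design.T @ cinv @ r_meas)
    resid = r_meas - (a * t + b)
    return a, b, resid @ cinv @ resid               # rate, reset value, chi^2
```

The explicit inverse is the step whose cost grows as $n^3$; it is written out here only to make that cost visible.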
In the rest of this paper I will recast the problem using only the differences between resultants. In Section 3 I will derive the resulting covariance matrix and show that it is tridiagonal. In Section 4 I will derive closed-form expressions for the maximum likelihood count rate and for the goodness-of-fit that may be used for hypothesis testing, e.g., for a possible jump in counts due to a cosmic ray hit. In Section 5 I will derive an analytic expression for the first-order bias of the count rate estimator. Section 7 describes a pure Python implementation of optimal ramp fitting at a cost linear in the number of resultants; it is computationally straightforward on a laptop computer even for long ramps on large-format detectors. I conclude with Section 8.

GENERALIZED LEAST SQUARES VS. APPROXIMATE APPROACHES
The current data processing pipelines for HST and JWST use adaptations of the approach suggested by Fixsen et al. (2000) and Offenberg et al. (2001). This approach uses a weighted average of the resultants, where the weights are constant in bins of estimated signal-to-noise ratio. The resulting weighted sum provides an estimate of the count rate that approaches, but does not reach, the precision of the treatment with the full covariance matrix. Because the weights change discretely with the properties of a ramp, I refer to the approach of Fixsen et al. (2000) and Offenberg et al. (2001) as the discrete weighting case. The full covariance matrix provides for continuously variable weights.
The most straightforward metric of the benefit of the full covariance matrix treatment presented in this paper is the signal-to-noise ratio of the inferred count rates. Figure 1 shows the noise in the count rate for the approach of Fixsen et al. (2000) and Offenberg et al. (2001), as adapted by Casertano (2022), as a fraction of the noise from a $\chi^2$ minimization using the correct covariance matrix. The latter approach provides the smallest uncertainty of all unbiased estimates; the ratios are strictly larger than one. All currently available readout patterns for NIRCam on JWST are shown with 10 groups; the proposed Roman readout patterns are detailed in Casertano (2022). In all cases the covariance matrix itself is assumed to be known. In reality the covariance matrix must be estimated; this introduces biases that I derive in Section 5. I assume a fiducial read noise of 10 electrons in a single read ($10\sqrt{2}$ electrons in a read difference). The noise in the discrete weighting case shows discontinuities where one set of weights transitions to another; this appears as a saw-tooth pattern in Figure 1. These discontinuities could be avoided by choosing slightly different signal-to-noise thresholds between the weighting schemes, such that both sets of weights produce the same signal-to-noise ratio at a transition. Achieving this in practice would require the transitions between weights to change with the readout pattern.

Figure 1. Ratio of the noise in the count rate using the Offenberg et al. (2001) approach used for JWST (left) and the suggested modification by Casertano (2022) for Roman (right) to the noise from the $\chi^2$ approach using the full covariance matrix. All current readout patterns of JWST are shown assuming 10 groups. In both cases I assume a read noise of 10 electrons/read. A fit using the full covariance matrix offers an improvement of up to ≈2% in signal-to-noise for JWST and between 0.5% and 3% for Roman, corresponding to increases in collecting area of up to 4% and between 1% and 6%, respectively. The "saw-tooth" pattern comes from transitions between discrete weighting schemes that are close to, but not exactly at, the level where both weighting schemes produce the same signal-to-noise ratio on the ramp fit.
Figure 1 shows that a fit with the full covariance matrix improves on the noise of the discrete weighting case by amounts ranging from ≪1% for long JWST ramps with many reads per ramp (Deep8, with ten resultants each of eight reads) to 3% for long Roman exposures. Typical improvements range from 0.5% to 2%, corresponding to an increase in equivalent collecting area of 1% to 4%. These represent meaningful improvements to the missions if the full $\chi^2$ fit can be implemented robustly and efficiently. If the construction of the resultants themselves can be modified to accommodate weighted averages of reads, Appendix A shows that further improvements are possible, especially with low numbers of resultants, as would be the case if downlink bandwidth were severely restricted.

DERIVING THE COVARIANCE MATRIX
The first task in fitting a ramp is to derive a covariance matrix for the groups of reads or, in this case, for the differences between sequential groups of reads. I will denote individual reads by $y$; $N$ reads may be averaged together into a group or resultant that I will denote by $r$:
$$r_i = \frac{1}{N_i} \sum_{k=1}^{N_i} y_{i,k}, \tag{3}$$
where $y_{i,k}$ refers to the $N_i$ reads that were averaged together to produce resultant $r_i$. The individual reads $k$ were taken at a series of times $t_k$ measured from the last reset at $t = 0$; each read $k$ represents $t_k$ worth of accumulated signal. The mean time of resultant $i$ is then
$$\langle t_i \rangle = \frac{1}{N_i} \sum_{k=1}^{N_i} t_{i,k}, \tag{4}$$
where $t_{i,k}$ are the times since reset of the $N_i$ reads that comprise resultant $i$. Figure 2 shows this notation on an example ramp that begins with an unmeasured reset at $t = 0$ and consists of 19 reads averaged together into six resultants.
Throughout the rest of this paper I operate almost exclusively in the space of resultant differences. This serves to make the covariance matrix as close to diagonal as possible. A photon present in the first read will be present in all subsequent reads, so the covariance matrix of the accumulated counts will have all nonzero elements. In contrast, nonoverlapping differences between resultants will not share any photons. The covariance matrix due to read noise is already diagonal in the accumulated counts. It is not diagonal, but rather tridiagonal, in the space of resultant differences: pairs of resultant differences (e.g., resultant 5 minus resultant 4 and resultant 2 minus resultant 1) will not share any reads unless the resultant differences are sequential. Most elements of the covariance matrix will then be zero, and this fact enables the algorithms described in this paper.

Figure 2. Example ramp, shown as accumulated counts against the time of read, beginning with the reset; the labeled quantities follow Equations (3) and (4). The red labels indicate the five resultant differences to be used in the algorithms derived in this paper.

Throughout this paper I will refer to the average of a group of reads as a resultant. I will assume that I have many resultants with $\{N_1, \ldots, N_{n+1}\}$ reads: the first resultant is the unweighted average of $N_1$ reads, etc. Appendix A shows how to treat the case of a weighted average of reads and demonstrates the potential improvement in performance. I assume $n+1$ resultants so that there are $n$ differences between adjacent resultants; this will make the notation more convenient later. In Figure 2, with six resultants, $n = 5$. The normalized difference between two successive resultants $i + 1$ and $i$ is then
$$d_i = \frac{r_{i+1} - r_i}{\langle t_{i+1} \rangle - \langle t_i \rangle}, \tag{5}$$
where $r_i$ is given by Equation (3) and $\langle t_i \rangle$ is given by Equation (4); these differences are indicated in red in Figure 2. The quantity $d_i$ has units of counts per unit time. I will assume henceforth that the counts are in units of electrons. I will also assume an ideal detector that steadily accumulates counts after the last reset and that is subject only to read noise and photon noise. In this section I will first derive the variance and covariance of resultants, and then transform these into the variance and covariance of resultant differences. The derivation of the covariance matrix that I present is similar to those in Kubik et al. (2015) and in Casertano (2022).
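The variance of a read due to read noise is $\sigma^2$, where $\sigma^2$ is the single read variance (one-half the correlated double sampling variance). The variance of a resultant due to read noise is then
$$\mathrm{Var}_{\rm read}(r_i) = \frac{\sigma^2}{N_i}, \tag{6}$$
where $N_i$ is the number of reads in resultant $i$. The covariance between different resultants due to read noise is zero since they do not share any reads. The covariance between two reads due to photon noise is the expected number of photons that are shared between the two reads, i.e.,
$$\mathrm{Cov}(y_i, y_j) = a \cdot \min(t_i, t_j) \tag{7}$$
for a count rate $a$. For two different resultants, assuming $i < j$ and that all reads in resultant $i$ precede the first read in resultant $j$, the covariance is given by
$$\mathrm{Cov}(r_i, r_j) = \frac{a}{N_i N_j} \sum_{k=1}^{N_i} \sum_{l=1}^{N_j} \min(t_{i,k}, t_{j,l}) = a \langle t_i \rangle, \tag{8}$$
where $t_{i,k}$ is the time of the $k$th read in resultant $i$. The variance of a single resultant due to photon noise is given by
$$\mathrm{Var}_{\gamma}(r_i) = \frac{a}{N_i^2} \sum_{k=1}^{N_i} \sum_{l=1}^{N_i} \min(t_{i,k}, t_{i,l}). \tag{9}$$
The time of the first read will appear $2N_i - 1$ times in this double sum, $N_i$ times for each sum minus one from double counting. The time of the second read will appear $2N_i - 3$ times, and so on. The variance can then be written
$$\mathrm{Var}_{\gamma}(r_i) = \frac{a}{N_i^2} \sum_{k=1}^{N_i} \left(2N_i - 2k + 1\right) t_{i,k}. \tag{10}$$
Following Casertano (2022) I define a variance-weighted time $\tau_i$ for each resultant $i$,
$$\tau_i \equiv \frac{1}{N_i^2} \sum_{k=1}^{N_i} \left(2N_i - 2k + 1\right) t_{i,k}, \tag{11}$$
so that $\mathrm{Var}_{\gamma}(r_i) = a \tau_i$. If there are a large number $N$ of evenly spaced reads in each resultant, with the total duration of the resultant being $\Delta t$, we have
$$\tau_i \approx \langle t_i \rangle - \frac{\Delta t}{6}. \tag{12}$$
Using Equations (8) and (10), we can now write the variance of the resultant difference $r_{i+1} - r_i$ and the covariance of resultant differences $r_{i+1} - r_i$ and $r_{j+1} - r_j$, including both read noise and photon noise. For the variance, we have
$$\mathrm{Var}(r_{i+1} - r_i) = \mathrm{Var}(r_{i+1}) + \mathrm{Var}(r_i) - 2\,\mathrm{Cov}(r_i, r_{i+1}) = \sigma^2 \left(\frac{1}{N_i} + \frac{1}{N_{i+1}}\right) + a \left(\tau_i + \tau_{i+1} - 2\langle t_i \rangle\right).$$
If the resultants each last for a time $\Delta t$, consist of many reads, and occur immediately after one another, the variance becomes
$$\mathrm{Var}(r_{i+1} - r_i) \approx \frac{2\sigma^2}{N} + \frac{2}{3} a \Delta t.$$
This is slightly less than the variance from two reads evenly spaced by $\Delta t$, which would have a factor of unity in place of $\frac{2}{3}$.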
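The step from the double sum over $\min(t_k, t_l)$ to the single sum that defines $\tau_i$ can be checked numerically. The sketch below is only an illustration of that identity and of the large-$N$ limit $\tau_i \approx \langle t_i \rangle - \Delta t/6$; the function names `tau_double_sum` and `tau_single_sum` are hypothetical.

```python
import numpy as np


def tau_double_sum(t):
    """Variance-weighted time from the double sum over min(t_k, t_l)."""
    t = np.asarray(t, dtype=float)
    return np.minimum.outer(t, t).sum() / t.size**2


def tau_single_sum(t):
    """Same quantity from the single sum with weights (2N - 2k + 1)."""
    t = np.sort(np.asarray(t, dtype=float))   # reads must be in time order
    k = np.arange(1, t.size + 1)
    return np.sum((2 * t.size - 2 * k + 1) * t) / t.size**2


# Many evenly spaced reads spanning Delta t = 2 within one resultant
reads = np.linspace(10.0, 12.0, 2001)
print(tau_single_sum(reads), reads.mean() - 2.0 / 6.0)
```

The two printed numbers agree to within a term of order $\Delta t/(6N)$, which vanishes as the number of reads grows.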
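For the covariance, with $j = i + 1$ (i.e., consecutive resultant differences), we have
$$\mathrm{Cov}(r_{i+1} - r_i,\, r_{i+2} - r_{i+1}) = \mathrm{Cov}(r_{i+1}, r_{i+2}) - \mathrm{Var}(r_{i+1}) - \mathrm{Cov}(r_i, r_{i+2}) + \mathrm{Cov}(r_i, r_{i+1}) = -\frac{\sigma^2}{N_{i+1}} + a \left(\langle t_{i+1} \rangle - \tau_{i+1}\right).$$
If the resultants consist of uninterrupted sequences of many reads with no gaps between resultants, and each lasts for $\Delta t$, this covariance becomes
$$\mathrm{Cov}(r_{i+1} - r_i,\, r_{i+2} - r_{i+1}) \approx -\frac{\sigma^2}{N} + \frac{1}{6} a \Delta t.$$
The second term would be absent from Equation (20) for single-read resultants because the time intervals of the two resultant differences would be fully disjoint and no photons would be shared. If $j > i + 1$, we have
$$\mathrm{Cov}(r_{i+1} - r_i,\, r_{j+1} - r_j) = 0.$$
In other words, only adjacent resultant differences (those that share a resultant) have nonzero covariance. For notational convenience I will define
$$\delta_i t \equiv \langle t_{i+1} \rangle - \langle t_i \rangle, \tag{22}$$
so that $\delta_i t$ is the characteristic difference of the integration times in the resultant difference $d_i$. The scaled resultant differences (Equation (5)) are then
$$d_i = \frac{r_{i+1} - r_i}{\delta_i t}. \tag{23}$$
The covariance matrix of all of the scaled resultant differences $d_i$ may be written as the sum of a matrix $\mathbf{C}_r$ multiplied by the read noise variance $\sigma^2$ and a second matrix $\mathbf{C}_\gamma$ multiplied by the photon count rate $a$:
$$\mathbf{C} = \sigma^2 \mathbf{C}_r + a\, \mathbf{C}_\gamma. \tag{24}$$
The read noise matrix is symmetric, with components
$$(\mathbf{C}_r)_{ij} = \begin{cases} \dfrac{1}{(\delta_i t)^2} \left(\dfrac{1}{N_i} + \dfrac{1}{N_{i+1}}\right) & j = i \\ -\dfrac{1}{N_{i+1}\, \delta_i t\, \delta_{i+1} t} & j = i + 1 \\ 0 & j > i + 1 \end{cases} \tag{25}$$
and the photon noise matrix is likewise symmetric, with components
$$(\mathbf{C}_\gamma)_{ij} = \begin{cases} \dfrac{\tau_i + \tau_{i+1} - 2\langle t_i \rangle}{(\delta_i t)^2} & j = i \\ \dfrac{\langle t_{i+1} \rangle - \tau_{i+1}}{\delta_i t\, \delta_{i+1} t} & j = i + 1 \\ 0 & j > i + 1. \end{cases} \tag{26}$$
In Equation (24), $\mathbf{C}_\gamma$ and $\mathbf{C}_r$ depend only on the properties of the readout pattern, i.e., the number and times of the individual reads within each resultant: they do not need to be computed separately for every pixel. Both are tridiagonal, so the total covariance matrix $\mathbf{C}$ will also be tridiagonal. This fact was also pointed out by Kubik et al. (2015).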
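The arguments and derivations above provide the elements of the tridiagonal covariance matrix of the resultant differences $\mathbf{d} = \{d_i\}$ as
$$\mathbf{C} = \begin{pmatrix} \alpha_1 & \beta_1 & 0 & \cdots & 0 \\ \beta_1 & \alpha_2 & \beta_2 & \ddots & \vdots \\ 0 & \beta_2 & \alpha_3 & \ddots & 0 \\ \vdots & \ddots & \ddots & \ddots & \beta_{n-1} \\ 0 & \cdots & 0 & \beta_{n-1} & \alpha_n \end{pmatrix}. \tag{27}$$
Each element $\alpha$ and $\beta$ is the sum of a term scaled by a given pixel's photon rate $a$ and another term scaled by the read noise variance $\sigma^2$:
$$\alpha_i = \frac{a}{(\delta_i t)^2} \left(\tau_i + \tau_{i+1} - 2\langle t_i \rangle\right) + \frac{\sigma^2}{(\delta_i t)^2} \left(\frac{1}{N_i} + \frac{1}{N_{i+1}}\right), \tag{28}$$
$$\beta_i = \frac{a}{\delta_i t\, \delta_{i+1} t} \left(\langle t_{i+1} \rangle - \tau_{i+1}\right) - \frac{\sigma^2}{N_{i+1}\, \delta_i t\, \delta_{i+1} t}. \tag{29}$$
The read noise $\sigma$ may typically be measured for each pixel, but the true count rate $a$ will be unknown. For the following section I will assume that the count rate is given and will derive the slope of the best-fit ramp, its uncertainty, and its goodness-of-fit $\chi^2$. I will then turn to the problem of estimating the covariance matrix itself.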
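The readout-pattern terms of the covariance matrix can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the fitramp implementation: the function name `covariance_terms` and its input layout (one sequence of read times per resultant) are assumptions. It returns the read-noise and photon-noise parts of the diagonal and off-diagonal elements separately, so that they may be scaled by each pixel's $\sigma^2$ and $a$.

```python
import numpy as np


def covariance_terms(read_times):
    """Readout-pattern terms of the tridiagonal covariance matrix.

    read_times holds one sequence of read times (since reset) per resultant,
    in time order. Returns (alpha_read, alpha_phot, beta_read, beta_phot) so
    that alpha = sigma^2 * alpha_read + a * alpha_phot, likewise for beta.
    Hypothetical helper; assumes unweighted averages of reads.
    """
    n_reads = np.array([len(t) for t in read_times], dtype=float)
    mean_t = np.array([np.mean(t) for t in read_times])

    # Variance-weighted time of each resultant (single-sum form)
    tau = np.empty(len(read_times))
    for i, t in enumerate(read_times):
        t = np.sort(np.asarray(t, dtype=float))
        k = np.arange(1, t.size + 1)
        tau[i] = np.sum((2 * t.size - 2 * k + 1) * t) / t.size**2

    delta_t = np.diff(mean_t)   # difference of mean times, one per difference
    alpha_read = (1 / n_reads[:-1] + 1 / n_reads[1:]) / delta_t**2
    alpha_phot = (tau[:-1] + tau[1:] - 2 * mean_t[:-1]) / delta_t**2
    beta_read = -1 / (n_reads[1:-1] * delta_t[:-1] * delta_t[1:])
    beta_phot = (mean_t[1:-1] - tau[1:-1]) / (delta_t[:-1] * delta_t[1:])
    return alpha_read, alpha_phot, beta_read, beta_phot
```

For a pixel with read noise variance `sigma**2` and count rate `a`, the diagonal would then be `sigma**2 * alpha_read + a * alpha_phot`, and the off-diagonal `sigma**2 * beta_read + a * beta_phot`. Because the four arrays depend only on the readout pattern, they can be computed once and reused for every pixel.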

Including the Reset Value
The preceding discussion derived the covariance matrix for the differences of adjacent resultants. For some applications the reset value is also useful. This could be for applying a nonlinearity correction, for monitoring the detector stability, or even for using the first read to measure the count rate. The precision of measuring the count rate using the first resultant alone is limited by kTC noise in the reset value.
If we wish to include the reset value, then we will also make use of the first resultant $r_1$. We define $d_0$ as
$$d_0 \equiv \frac{r_1}{\langle t_1 \rangle}, \tag{30}$$
so that, in the absence of noise,
$$d_0 = a + \frac{b}{\langle t_1 \rangle}, \tag{31}$$
where $b$ is the reset value (the counts in a pixel at $t = 0$). If we wish to measure $b$, we can prepend $d_0$ to the vector $\{d_1, \ldots, d_n\}$. We also need to prepend values to both $\alpha$ and $\beta$ for the covariance matrix. The value of $\alpha_0$ will be
$$\alpha_0 = \frac{a \tau_1}{\langle t_1 \rangle^2} + \frac{\sigma^2}{N_1 \langle t_1 \rangle^2}, \tag{32}$$
while the value of $\beta_0$ will be
$$\beta_0 = \frac{a \left(\langle t_1 \rangle - \tau_1\right)}{\langle t_1 \rangle\, \delta_1 t} - \frac{\sigma^2}{N_1 \langle t_1 \rangle\, \delta_1 t}. \tag{33}$$
Equations (28) and (29) may also be used directly if we take $\delta_0 t = \langle t_1 \rangle$ and $1/N_0 = 0$. The covariance matrix remains tridiagonal.

FITTING A RAMP
With the covariance matrix defined by Equation (27) via Equations (28) and (29), we want to fit the scaled resultant differences. I will defer the calculation including the reset value, which uses the additional elements of the covariance matrix given in Section 3.1, to Section 4.1.
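All scaled resultant differences $d_i$ for $i = 1, \ldots, n$ should have the same value in the absence of noise, assuming the astrophysical count rate to be constant and the detector to be linear and well-behaved. The likelihood $\mathcal{L}$ of a model consisting of a single count rate $a$ is then
$$-2 \ln \mathcal{L} = \chi^2 + {\rm const} = \left(\mathbf{d} - a\mathbf{1}\right)^T \mathbf{C}^{-1} \left(\mathbf{d} - a\mathbf{1}\right) + {\rm const},$$
where $\mathbf{1}$ refers to a vector of all ones. I can find the maximum likelihood count rate by differentiating this and setting it equal to zero:
$$\frac{\partial \chi^2}{\partial a} = -2\, \mathbf{1}^T \mathbf{C}^{-1} \left(\mathbf{d} - a\mathbf{1}\right) = 0.$$
The formula for $\chi^2$ itself may be expanded out as
$$\chi^2 = \mathbf{d}^T \mathbf{C}^{-1} \mathbf{d} - 2a\, \mathbf{1}^T \mathbf{C}^{-1} \mathbf{d} + a^2\, \mathbf{1}^T \mathbf{C}^{-1} \mathbf{1}.$$
These equations all include a matrix inverse and matrix multiplications. A general matrix inverse has a computational cost of $n^3$ where $n$ is the dimensionality of the matrix, while matrix multiplication with a vector has a cost of $n^2$. These costs could be unacceptable if there are many reads or many resultants for millions of pixels. In the following I will show that the best-fit $a$ and $\chi^2$ may be computed using closed formulas for a cost that is linear in the number of resultant differences $n$.
I will begin by computing the inverse of the covariance matrix, using the formula for a tridiagonal matrix. I will first define some helper variables using recursion relations (Equations (1.1), (1.3) and (1.4) of Usmani 1994). I use the same notation as Usmani for the helper variables but I adopt Greek letters for the elements of the covariance matrix following Equations (28) and (29). I have
$$\theta_0 = 1, \qquad \theta_1 = \alpha_1, \qquad \theta_i = \alpha_i \theta_{i-1} - \beta_{i-1}^2 \theta_{i-2} \quad (i = 2, \ldots, n)$$
and
$$\phi_{n+1} = 1, \qquad \phi_n = \alpha_n, \qquad \phi_i = \alpha_i \phi_{i+1} - \beta_i^2 \phi_{i+2} \quad (i = n-1, \ldots, 1).$$
The inverse of the covariance matrix is then given by
$$\left(\mathbf{C}^{-1}\right)_{ij} = \begin{cases} (-1)^{i+j}\, \dfrac{\theta_{i-1} \phi_{j+1}}{\theta_n} \displaystyle\prod_{k=i}^{j-1} \beta_k & i < j \\ \dfrac{\theta_{i-1} \phi_{i+1}}{\theta_n} & i = j \\ \left(\mathbf{C}^{-1}\right)_{ji} & i > j. \end{cases} \tag{44}$$
I will further define
$$\Phi_i \equiv \sum_{j=i+1}^{n} (-1)^j \phi_{j+1} \prod_{k=i}^{j-1} \beta_k, \qquad \Theta_i \equiv \sum_{j=1}^{i} (-1)^j \theta_{j-1} \prod_{k=j}^{i-1} \beta_k, \qquad \Theta_{D,i} \equiv \sum_{j=1}^{i} (-1)^j d_j \theta_{j-1} \prod_{k=j}^{i-1} \beta_k,$$
where an empty product is equal to one. Each of these is computable with a cost linear in the number of resultant differences, for example from ratios of cumulative products of the $\beta$ terms. However, such ratios are problematic if any of the $\beta$ terms are zero. We can avoid this possibility by using the following equivalent recursion relations:
$$\Phi_i = \beta_i \left[\Phi_{i+1} + (-1)^{i+1} \phi_{i+2}\right], \qquad \Theta_i = \beta_{i-1} \Theta_{i-1} + (-1)^i \theta_{i-1}, \qquad \Theta_{D,i} = \beta_{i-1} \Theta_{D,i-1} + (-1)^i d_i \theta_{i-1},$$
with the initial conditions
$$\Phi_n = 0, \qquad \Theta_1 = -\theta_0, \qquad \Theta_{D,0} = 0, \qquad \Theta_{D,1} = -d_1 \theta_0.$$
With these definitions, I will compute the terms I need to solve. First, the best-fit slope is given by
$$a = \left(\mathbf{1}^T \mathbf{C}^{-1} \mathbf{d}\right) \left(\mathbf{1}^T \mathbf{C}^{-1} \mathbf{1}\right)^{-1}. \tag{56}$$
The first term in Equation (56) may be written as
$$\mathbf{1}^T \mathbf{C}^{-1} \mathbf{d} = \frac{1}{\theta_n} \sum_{i=1}^{n} (-1)^i d_i \left(\phi_{i+1} \Theta_i + \theta_{i-1} \Phi_i\right). \tag{57}$$
The second term will look just like the first term but without the $d$ factor, i.e.,
$$\mathbf{1}^T \mathbf{C}^{-1} \mathbf{1} = \frac{1}{\theta_n} \sum_{i=1}^{n} (-1)^i \left(\phi_{i+1} \Theta_i + \theta_{i-1} \Phi_i\right). \tag{58}$$
The only term that remains to compute for $\chi^2$ is
$$\mathbf{d}^T \mathbf{C}^{-1} \mathbf{d}. \tag{59}$$
For this term, I will use the symmetry of the covariance matrix to write
$$\mathbf{d}^T \mathbf{C}^{-1} \mathbf{d} = \frac{2}{\theta_n} \sum_{i=1}^{n} (-1)^i d_i\, \beta_{i-1}\, \phi_{i+1}\, \Theta_{D,i-1} + \frac{1}{\theta_n} \sum_{i=1}^{n} d_i^2\, \theta_{i-1} \phi_{i+1}, \tag{60}$$
taking $\beta_0 = 1$ for the $i = 1$ term of the first sum. Again, this is computable at a cost linear in the number of resultant differences.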
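So, to sum up, I will define
$$A \equiv \mathbf{d}^T \mathbf{C}^{-1} \mathbf{d}, \tag{61}$$
$$B \equiv \mathbf{1}^T \mathbf{C}^{-1} \mathbf{d}, \tag{62}$$
$$C \equiv \mathbf{1}^T \mathbf{C}^{-1} \mathbf{1}, \tag{63}$$
each written as a sum over the $n$ resultant differences. The best-fit count rate is then
$$a = \frac{B}{C}, \tag{64}$$
its standard error is
$$\sigma_a = \frac{1}{\sqrt{C}}, \tag{65}$$
and the best-fit $\chi^2$ value is
$$\chi^2 = A - \frac{B^2}{C}. \tag{66}$$
This section showed that I can compute the general up-the-ramp count rate with the full covariance matrix at a cost that is linear in the number of resultant differences. For a very small additional cost (evaluating the $A$ term), I can also compute $\chi^2$ and see whether a constant count rate is a good fit to the data. There is no need to precompute coefficients or interpolate within different signal-to-noise regimes. The full covariance matrix will be calculated once per frame as a term that is proportional to the photon rate at a given pixel and a second term that is proportional to the read noise variance at each pixel.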
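The linear-cost fit can be sketched for a single pixel as follows. This is an illustration under stated assumptions, not the fitramp code: the function name `fit_ramp_tridiagonal` is hypothetical, the helper sums are written in one self-consistent form built from recursions of the Usmani (1994) type, and no rescaling is applied, so the helper variables could overflow for long ramps.

```python
import numpy as np


def fit_ramp_tridiagonal(d, alpha, beta):
    """Fit one count rate to scaled differences d at O(n) cost.

    alpha (length n) is the diagonal and beta (length n - 1) the off-diagonal
    of the tridiagonal covariance matrix. Returns (rate, sigma_rate, chisq).
    Sketch only: no rescaling, so theta and phi can overflow for long ramps.
    """
    d = np.asarray(d, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    beta = np.asarray(beta, dtype=float)
    n = d.size
    idx = np.arange(1, n + 1)            # 1-based index i of each difference
    sgn = (-1.0) ** idx                  # sgn[i - 1] = (-1)^i

    # theta[i] = theta_i for i = 0..n (forward recursion)
    theta = np.ones(n + 1)
    theta[1] = alpha[0]
    for i in range(2, n + 1):
        theta[i] = alpha[i - 1] * theta[i - 1] - beta[i - 2] ** 2 * theta[i - 2]

    # phi[i] = phi_i for i = 1..n+1 (backward recursion); phi[0] is unused
    phi = np.ones(n + 2)
    phi[n] = alpha[n - 1]
    for i in range(n - 1, 0, -1):
        phi[i] = alpha[i - 1] * phi[i + 1] - beta[i - 1] ** 2 * phi[i + 2]

    # Theta_i and Theta_{D,i}; index 0 holds the zero starting value
    Theta = np.zeros(n + 1)
    ThetaD = np.zeros(n + 1)
    for i in range(1, n + 1):
        b_prev = beta[i - 2] if i > 1 else 0.0
        Theta[i] = b_prev * Theta[i - 1] + sgn[i - 1] * theta[i - 1]
        ThetaD[i] = b_prev * ThetaD[i - 1] + sgn[i - 1] * d[i - 1] * theta[i - 1]

    # Phi_i, with Phi_n = 0; note (-1)^(i+1) = -sgn[i - 1]
    Phi = np.zeros(n + 2)
    for i in range(n - 1, 0, -1):
        Phi[i] = beta[i - 1] * (Phi[i + 1] - sgn[i - 1] * phi[i + 2])

    # Column sums of the inverse covariance matrix, then A, B, C
    colsum = sgn * (phi[idx + 1] * Theta[idx] + theta[idx - 1] * Phi[idx]) / theta[n]
    B = np.sum(d * colsum)
    C = np.sum(colsum)
    beta_ext = np.concatenate(([1.0], beta))   # beta_0 = 1 by convention
    A = (2 * np.sum(sgn * d * beta_ext * phi[idx + 1] * ThetaD[idx - 1])
         + np.sum(d**2 * theta[idx - 1] * phi[idx + 1])) / theta[n]
    return B / C, 1 / np.sqrt(C), A - B**2 / C
```

Every loop runs once over the differences, so the cost is linear in $n$. A production version would presumably vectorize over pixels and rescale the helper variables; both are omitted here for clarity.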

Fitting the Reset Value
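If we want to fit for the reset value, we use the tridiagonal covariance matrix with the additional $\alpha_0$ and $\beta_0$ defined by Equations (32) and (33), and the additional scaled resultant defined by Equation (30). The equation for $\chi^2$ becomes
$$\chi^2 = \left(\mathbf{d} - a\mathbf{1} - \frac{b}{\langle t_1 \rangle}\mathbf{i}\right)^T \mathbf{C}^{-1} \left(\mathbf{d} - a\mathbf{1} - \frac{b}{\langle t_1 \rangle}\mathbf{i}\right), \tag{67}$$
where $\mathbf{i}$ is a vector that is one in the first entry and zero elsewhere. This may be expanded to obtain
$$\chi^2 = \mathbf{d}^T \mathbf{C}^{-1} \mathbf{d} - 2a\, \mathbf{1}^T \mathbf{C}^{-1} \mathbf{d} + a^2\, \mathbf{1}^T \mathbf{C}^{-1} \mathbf{1} - \frac{2b}{\langle t_1 \rangle}\, \mathbf{i}^T \mathbf{C}^{-1} \mathbf{d} + \frac{2ab}{\langle t_1 \rangle}\, \mathbf{i}^T \mathbf{C}^{-1} \mathbf{1} + \frac{b^2}{\langle t_1 \rangle^2}\, \mathbf{i}^T \mathbf{C}^{-1} \mathbf{i}. \tag{68}$$
Some of these terms were already computed in the first part of Section 4, allowing me to write
$$\chi^2 = A - 2aB + a^2 C - \frac{2b}{\langle t_1 \rangle}\, \mathbf{i}^T \mathbf{C}^{-1} \mathbf{d} + \frac{2ab}{\langle t_1 \rangle}\, \mathbf{i}^T \mathbf{C}^{-1} \mathbf{1} + \frac{b^2}{\langle t_1 \rangle^2}\, \mathbf{i}^T \mathbf{C}^{-1} \mathbf{i}. \tag{69}$$
The term $\mathbf{i}^T \mathbf{C}^{-1} \mathbf{i}$ is given in Equation (44) as
$$\mathbf{i}^T \mathbf{C}^{-1} \mathbf{i} = \frac{\theta_0 \phi_2}{\theta_n} = \frac{\phi_2}{\theta_n},$$
while $\mathbf{i}^T \mathbf{C}^{-1} \mathbf{1}$ may be written using just the first term in the sum of Equation (63):
$$\mathbf{i}^T \mathbf{C}^{-1} \mathbf{1} = -\frac{\phi_2 \Theta_1 + \theta_0 \Phi_1}{\theta_n} = \frac{\phi_2 - \Phi_1}{\theta_n}.$$
In all of these formulas the $\beta$ and $\alpha$ values prepended to the arrays in Equations (28) and (29) are indexed starting at 1. In other words, where $\beta_1$ appears in these equations, it now refers to the value in Equation (33), and where $d_1$ appears, it refers to the value in Equation (30).
To write the remaining term in Equation (69) more conveniently, I will define one additional quantity
$$\Phi_{D,i} \equiv \sum_{j=i+1}^{n} (-1)^j d_j\, \phi_{j+1} \prod_{k=i}^{j-1} \beta_k.$$
As for the terms in Equations (46)-(48), this is equivalently defined by the recursion relation
$$\Phi_{D,i} = \beta_i \left[\Phi_{D,i+1} + (-1)^{i+1} d_{i+1}\, \phi_{i+2}\right], \qquad \Phi_{D,n} = 0,$$
which avoids the possibility of division by zero. With this definition, I can write
$$\mathbf{i}^T \mathbf{C}^{-1} \mathbf{d} = \frac{d_1 \phi_2 - \Phi_{D,1}}{\theta_n}$$
and finally
$$\chi^2 = A - 2aB + a^2 C - \frac{2b}{\langle t_1 \rangle} \frac{d_1 \phi_2 - \Phi_{D,1}}{\theta_n} + \frac{2ab}{\langle t_1 \rangle} \frac{\phi_2 - \Phi_1}{\theta_n} + \frac{b^2}{\langle t_1 \rangle^2} \frac{\phi_2}{\theta_n}.$$
In some cases, there may be a prior placed on the reset value $b$. If the reset value is stable up to kTC noise and only the first resultant is usable, then the use of a prior on $b$ is the only way to obtain a constraint on the count rate $a$.
Assuming a Gaussian prior with a mean of $z$ and an uncertainty $\sigma_z$, the expression for $\chi^2$ becomes
$$\chi^2 = A - 2aB + a^2 C - \frac{2b}{\langle t_1 \rangle}\, \mathbf{i}^T \mathbf{C}^{-1} \mathbf{d} + \frac{2ab}{\langle t_1 \rangle}\, \mathbf{i}^T \mathbf{C}^{-1} \mathbf{1} + \frac{b^2}{\langle t_1 \rangle^2}\, \mathbf{i}^T \mathbf{C}^{-1} \mathbf{i} + \frac{(b - z)^2}{\sigma_z^2}.$$
Setting $z = 0$ and $\sigma_z = \infty$ recovers the case of a uniform prior on the reset value. The first step to computing the best $\chi^2$ is to differentiate $\chi^2$ and set the result equal to zero:
$$\frac{\partial \chi^2}{\partial a} = 2aC - 2B + \frac{2b}{\langle t_1 \rangle}\, \mathbf{i}^T \mathbf{C}^{-1} \mathbf{1} = 0, \qquad \frac{\partial \chi^2}{\partial b} = \frac{2b}{\langle t_1 \rangle^2}\, \mathbf{i}^T \mathbf{C}^{-1} \mathbf{i} - \frac{2}{\langle t_1 \rangle}\, \mathbf{i}^T \mathbf{C}^{-1} \mathbf{d} + \frac{2a}{\langle t_1 \rangle}\, \mathbf{i}^T \mathbf{C}^{-1} \mathbf{1} + \frac{2(b - z)}{\sigma_z^2} = 0.$$
This yields
$$\begin{pmatrix} C & \dfrac{\mathbf{i}^T \mathbf{C}^{-1} \mathbf{1}}{\langle t_1 \rangle} \\ \dfrac{\mathbf{i}^T \mathbf{C}^{-1} \mathbf{1}}{\langle t_1 \rangle} & \dfrac{\mathbf{i}^T \mathbf{C}^{-1} \mathbf{i}}{\langle t_1 \rangle^2} + \dfrac{1}{\sigma_z^2} \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} B \\ \dfrac{\mathbf{i}^T \mathbf{C}^{-1} \mathbf{d}}{\langle t_1 \rangle} + \dfrac{z}{\sigma_z^2} \end{pmatrix}.$$
The covariance matrix for $a$ and $b$ is then the inverse of the $2 \times 2$ matrix on the left-hand side of this equation. This may be inverted by hand to get the standard errors on $a$ and $b$ and their covariance.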
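The joint fit for the count rate and the reset value reduces to a two-parameter linear system. The sketch below solves that system with dense matrices purely for illustration; it assumes the first element of the data vector is the first resultant divided by its mean time, so that the model is $a\mathbf{1} + (b/\langle t_1 \rangle)\mathbf{i}$. The function name `fit_rate_and_reset` and its arguments are hypothetical.

```python
import numpy as np


def fit_rate_and_reset(d, cov, t1_mean, z=0.0, sigma_z=np.inf):
    """Jointly fit the count rate a and reset value b (dense illustration).

    d[0] is assumed to be r_1 / <t_1>; the remaining entries are scaled
    resultant differences. A Gaussian prior N(z, sigma_z^2) on b is optional.
    Returns (a, b, 2x2 covariance matrix of a and b).
    """
    d = np.asarray(d, dtype=float)
    cinv = np.linalg.inv(cov)
    one = np.ones(d.size)
    e1 = np.zeros(d.size)
    e1[0] = 1.0 / t1_mean                 # the vector i divided by <t_1>
    prior = 1.0 / sigma_z**2              # zero for a uniform prior

    normal = np.array([[one @ cinv @ one, e1 @ cinv @ one],
                       [e1 @ cinv @ one, e1 @ cinv @ e1 + prior]])
    rhs = np.array([one @ cinv @ d, e1 @ cinv @ d + prior * z])
    a, b = np.linalg.solve(normal, rhs)
    return a, b, np.linalg.inv(normal)
```

In a linear-cost implementation the four matrix products would be replaced by the closed-form sums of this section; only the final $2 \times 2$ solve would remain.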

Omitting One or More Resultant Differences
Sometimes a resultant difference is corrupted, e.g., by a cosmic ray: there can be a jump in counts between two resultants. There can also be a jump within a resultant, in which case two resultant differences must be discarded (both of the differences that contain the resultant with a jump). Saturation of a pixel or of its neighbor can also corrupt all resultant differences after the onset of saturation.
One way of discarding a resultant difference is to write down a new, smaller covariance matrix for the pixel in question that omits the corrupted difference(s). In order to facilitate the use of the equations developed in this section, I adopt a different approach. Assuming that we wish to discard resultant difference $j$, i.e., $d_j$, I first decouple $d_j$ from the other differences by setting $\beta_{j-1} = \beta_j = 0$. This renders the covariance matrix block diagonal. The inverse of the covariance matrix now has elements in row/column $j$
$$\left(\mathbf{C}^{-1}\right)_{jk} = \left(\mathbf{C}^{-1}\right)_{kj} = \frac{\delta_{jk}}{\alpha_j},$$
where $\delta_{jk}$ is the Kronecker delta. We need to set these to zero or to ensure that terms containing them are zero. The elements within the sum for $C$ in Equation (63) are the column-summed elements of $\mathbf{C}^{-1}$, so we can set the $j$ term to zero (equivalently, we can set $\Theta_j = \Phi_j = 0$). For $B$ and $A$, the only resultant difference that multiplies the $j$ row or column of $\mathbf{C}^{-1}$ is $d_j$, so we can set $d_j = 0$.
In sum, if we wish to ignore resultant difference $j$, we set
$$\beta_{j-1} = \beta_j = 0, \qquad d_j = 0, \qquad \Theta_j = \Phi_j = 0.$$
We can do this for any number of resultant differences in any subset of pixels, and continue to apply the equations derived above. This statement holds true whether or not we are fitting for the reset value.
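The equivalence between decoupling a difference and deleting it from the covariance matrix can be illustrated with dense matrices. The sketch below is only a check of the idea on one pixel; the function names `gls_rate` and `masked_rate` are hypothetical, and the index `j` is zero-based as is usual in Python.

```python
import numpy as np


def gls_rate(d, cov):
    """Count rate from a dense generalized least squares fit."""
    cinv = np.linalg.inv(cov)
    one = np.ones(len(d))
    return (one @ cinv @ d) / (one @ cinv @ one)


def masked_rate(d, alpha, beta, j):
    """Count rate ignoring difference j (0-based) without resizing arrays."""
    d = np.array(d, dtype=float)
    beta = np.array(beta, dtype=float)
    if j > 0:
        beta[j - 1] = 0.0          # decouple from the preceding difference
    if j < beta.size:
        beta[j] = 0.0              # decouple from the following difference
    d[j] = 0.0                     # remove d_j from the A and B sums
    cov = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
    colsum = np.linalg.inv(cov).sum(axis=0)
    colsum[j] = 0.0                # drop the j term from the sum for C
    return (colsum @ d) / colsum.sum()
```

Deleting a difference leaves its two neighbors uncorrelated, because only adjacent differences share a resultant. The reduced matrix and the decoupled matrix therefore describe the same problem.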

BIASES AND ESTIMATING THE COVARIANCE MATRIX
Sections 3 and 4 assumed that the covariance matrix is known. In general the read noise of each pixel may be accurately known, but that pixel's true count rate will not be known. The covariance matrix must first be approximated using the resultants themselves. This could introduce biases. In this section I will compute those biases to first order and show that fitting for the count rate using two iterations effectively avoids them. Prior to this, I will treat the case of discretely varying weighting schemes to estimate the count rate, as presented by Fixsen et al. (2000) and refined by Casertano (2022), showing that it is also biased and deriving an analytic formula for the bias.

Biases with Approximate Weights: The Discrete Case
The ramp-fitting approach of Fixsen et al. (2000), Offenberg et al. (2001), and Casertano (2022) uses fixed weights for the different resultants, with the weights determined by the signal-to-noise ratio as estimated by the difference between the first and last resultants. The use of discrete weights does introduce biases in the recovered count rate near the signal-to-noise ratios at which the weights are discontinuous. This section provides intuition for the source of the bias and then presents a calculation of its magnitude as a function of the read noise, the true count rate, and the readout pattern.
A bias exists because the estimated count rate is a weighted sum of the resultants, but this is covariant with the difference between the first and the last resultants which is used to determine the weights. I will use $s$ to denote the difference between the last and first resultants. The signal-to-noise ratio estimate used by Casertano (2022) is
$$\mathrm{S/N} = \frac{s}{\sqrt{\sigma^2 + s}}, \tag{85}$$
where $\sigma^2$ is the read noise variance. The estimate of the count rate is a weighted sum of the resultants,
$$a = \sum_{i=1}^{n+1} w_i r_i. \tag{86}$$
Different sets of weights are used depending on the signal-to-noise ratio inferred from $s$. The inferred count rate $a$ will be covariant with $s$. As the signal-to-noise ratio increases, the first and last resultants are weighted more heavily in the sum of Equation (86) and this covariance becomes stronger. Near a break in the weighting scheme the joint distribution of $a$ and $s$ then has a discontinuity along a line of constant $s$: the covariance above this line is larger than the covariance below this line. A joint distribution that is symmetric far from any discontinuity in the weighting scheme becomes asymmetric near a discontinuity.

Figure 3. Illustration of the phenomenon that leads to bias when using weights that vary discontinuously to estimate the count rate. The weights used depend on the difference between the first and last resultant (y-axis). The error in this quantity is covariant with the error in the inferred slope. When the weights used are independent of the count rate, this covariance leads to an error ellipse and an expectation value of zero for the error in the inferred count rate. Near a threshold between two weighting schemes, however, two different Gaussians are combined, and the expectation of the error in the inferred slope can be nonzero. The top panels show the probability densities marginalized over the error in the difference between the first and last resultant, decomposed by the sign of this error.

Figure 3 illustrates the idea expressed above. The figure shows two different Gaussians, each with the same uncertainty in the inferred slope and in the difference between the first and the last read, but with different covariances between these two quantities. The left and right panels show each two-dimensional Gaussian individually; these would correspond to two different sets of weights. Each Gaussian shows the joint probability density of realizing a value of the fitted slope and of the difference between the first and last resultants. The middle panel shows what would happen at a discontinuity in the weights: the two-dimensional Gaussians differ at a threshold in the error in the last minus the first resultant. If the error in the last minus the first resultant is positive, the estimated S/N is slightly higher than the true S/N, and the weights corresponding to the right Gaussian are used. If the error is negative, the S/N is slightly underestimated, and the weights corresponding to the left Gaussian are used.
Far from a discontinuity in the weights, in the left and right panels, the joint distribution of $a$ and $s$ is symmetric and its center of mass is at zero error in both directions. The probability densities marginalized over the error in $s$ are symmetric when the marginalization is restricted to positive or negative errors in $s$; this is shown by the blue and orange lines in the top panels. Near a discontinuity, however, these symmetries no longer hold. The center of mass of the two-dimensional distribution is at zero error in $s$ (because this uncertainty remains symmetric), but it is no longer at zero error in $a$. The larger covariance at larger measured values of $s$ leads to an expectation value of the error in $a$ that is greater than zero, i.e., a positive bias. The probability density marginalized over the error in $s$ is no longer symmetric when restricted to either positive or negative errors in $s$, and the total marginalized distribution has a positive mean (black line, top middle panel).
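To calculate the bias of the discrete weighting scheme, we first need the variance of $s$, the variance of $a$, and the covariance of $s$ and $a$. The variance of $a$ is given in Casertano (2022) while the variance of $s$ is given in Section 3. Their covariance is given by
$$\mathrm{Cov}(s, a) = \mathrm{Cov}\left(r_{n+1} - r_1,\, \sum_i w_i r_i\right) = \sum_{i=1}^{n+1} w_i \left[\mathrm{Cov}(r_{n+1}, r_i) - \mathrm{Cov}(r_1, r_i)\right], \tag{87}$$
where $r_1$ and $r_{n+1}$ denote the first and last resultant, respectively. The joint distribution between $s$ and $a$ is then given by the covariance matrix
$$\mathbf{\Sigma} = \begin{pmatrix} \sigma_s^2 & \mathrm{Cov}(s, a) \\ \mathrm{Cov}(s, a) & \sigma_a^2 \end{pmatrix}. \tag{88}$$
I will denote the elements of the inverse of this matrix as
$$\mathbf{\Sigma}^{-1} = \begin{pmatrix} m_{11} & m_{12} \\ m_{21} & m_{22} \end{pmatrix}, \tag{89}$$
with $m_{12} = m_{21}$. The covariance matrix will differ above and below any threshold in $s$; I will assume that there is a discontinuity at $s = \lambda$, and that the elements of $\mathbf{\Sigma}^{-1}$ above and below $\lambda$ are denoted by $m$ and $m'$, respectively. In this case the expectation value of the error on $a$, $a - \tilde{a}$, is
$$\langle a - \tilde{a} \rangle = \frac{|\mathbf{\Sigma}'|^{-1/2}}{2\pi} \int_{-\infty}^{\lambda} ds \int_{-\infty}^{\infty} da\, (a - \tilde{a}) \exp\left\{-\frac{1}{2}\left[m'_{11} s^2 + 2 m'_{12} s (a - \tilde{a}) + m'_{22} (a - \tilde{a})^2\right]\right\} + \frac{|\mathbf{\Sigma}|^{-1/2}}{2\pi} \int_{\lambda}^{\infty} ds \int_{-\infty}^{\infty} da\, (a - \tilde{a}) \exp\left\{-\frac{1}{2}\left[m_{11} s^2 + 2 m_{12} s (a - \tilde{a}) + m_{22} (a - \tilde{a})^2\right]\right\},$$
where $s$ inside the integrals is measured relative to its noiseless value. I will focus only on the second term, which I will denote $\langle a - \tilde{a} \rangle_+$. The first term may be equivalently written with limits on $s$ from $-\lambda$ to $\infty$ by replacing $s$ with $-s$. I first complete the square and integrate over $a$:
$$\int_{-\infty}^{\infty} da\, (a - \tilde{a}) \exp\left[-\frac{m_{22}}{2} \left(a - \tilde{a} + \frac{m_{12}}{m_{22}} s\right)^2\right] = -\frac{m_{12}}{m_{22}}\, s\, \sqrt{\frac{2\pi}{m_{22}}}.$$
Writing $a - \tilde{a}$ as $\left(a - \tilde{a} + m_{12} s/m_{22}\right) - m_{12} s/m_{22}$, the odd portion of this function integrates to zero, while the even portion is an ordinary Gaussian integral. I will also use the facts that
$$m_{11} - \frac{m_{12}^2}{m_{22}} = \frac{1}{\sigma_s^2}, \qquad |\mathbf{\Sigma}|^{-1/2} = \frac{\sqrt{m_{22}}}{\sigma_s}, \qquad \frac{m_{12}}{m_{22}} = -\frac{\mathrm{Cov}(s, a)}{\sigma_s^2}.$$
This allows me to write
$$\langle a - \tilde{a} \rangle_+ = -\frac{|\mathbf{\Sigma}|^{-1/2}}{\sqrt{2\pi\, m_{22}}}\, \frac{m_{12}}{m_{22}} \int_{\lambda}^{\infty} ds\, s \exp\left[-\frac{1}{2}\left(m_{11} - \frac{m_{12}^2}{m_{22}}\right) s^2\right] = \frac{\mathrm{Cov}(s, a)}{\sqrt{2\pi}\, \sigma_s^3} \int_{\lambda}^{\infty} ds\, s \exp\left(-\frac{s^2}{2\sigma_s^2}\right) = \frac{\mathrm{Cov}(s, a)}{\sqrt{2\pi}\, \sigma_s} \exp\left(-\frac{\lambda^2}{2\sigma_s^2}\right). \tag{95}$$
Finally, I use this result to compute the bias near a transition between two different weighting regimes. Denoting the covariance above the threshold as $\mathrm{Cov}_+(s, a)$ and the covariance below the threshold as $\mathrm{Cov}_-(s, a)$, and assuming the difference between $s$ and its threshold value to be $\mu$, I have
$$\langle a - \tilde{a} \rangle = \frac{\mathrm{Cov}_+(s, a) - \mathrm{Cov}_-(s, a)}{\sqrt{2\pi}\, \sigma_s} \exp\left(-\frac{\mu^2}{2\sigma_s^2}\right). \tag{96}$$
This is maximized when $s$ is at a transition between weighting schemes, in which case the exponent is zero. Two or more transitions between weighting regimes can contribute to the bias. If two transitions contribute, the bias is given by a sum of integrals of the form
$$\langle a - \tilde{a} \rangle = \int_{-\infty}^{\mu} f_1(s)\, ds + \int_{\mu}^{\nu} f_2(s)\, ds + \int_{\nu}^{\infty} f_3(s)\, ds, \tag{97}$$
where, e.g., $f_1$ is the integrand in the first regime (cf. the first line of Equation (95)), and $\mu$ and $\nu$ are the number of standard deviations the noiseless value of $s$ is away from a transition. The functions $f_1$, $f_2$, etc. are odd functions of $s$ (cf. the second line of Equation (95)). Using this fact, Equation (97) can be written
$$\langle a - \tilde{a} \rangle = \left[\int_{-\infty}^{\mu} f_1(s)\, ds + \int_{\mu}^{\infty} f_2(s)\, ds\right] + \left[\int_{-\infty}^{\nu} f_2(s)\, ds + \int_{\nu}^{\infty} f_3(s)\, ds\right].$$
The first and second integrals each give the same result as Equation (96) across the two respective transitions. The total bias is then the sum of Equation (96) over both transitions. This argument can be extended to show that the total bias is the sum of Equation (96) calculated over all transitions.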
Figure 4 shows the bias for ramps of 10 and of 30 reads.For this calculation I have used the weighting schemes and signal-to-noise ratio thresholds given in Casertano (2022), and have adopted a read noise of 20 electrons.The bias is mostly due to the read noise component of the covariance.For ten reads, the bias can be ≈0.5% at count rates near the boundaries between different weighting schemes.The orange points in Figure 4 show empirical calculations of the bias using Monte Carlo; they verify the accuracy of the theoretical curve derived in this section.
As for the continuous case discussed below, the bias can be mostly removed from the discrete weighting schemes. The difference in covariance between a and s is a linear combination of the weights (cf. Equation (87)). The quantity s, the difference between the first and last reads, results in a biased estimate of a. A general estimator for a would have a covariance with a that is likewise given by a linear combination of the weights {w_i}. The dimensionality of this set of estimators is one less than the number of resultants. Given a number of transitions N less than the number of resultants, it is possible to choose an estimator for a for which the covariance difference with a is zero across all N transitions. Such an estimator would produce a nearly unbiased way of choosing which weights to apply.

The Continuous Case
I now derive the bias for a general least-squares fit as described in Section 4 and show how to remove it. I will start with Equation (56), the formula for a ramp, and define where the covariance matrix only applies to the resultant differences. I then have and The w_i are themselves functions of the count rate a assumed in the construction of the covariance matrix (for the photon noise portion). For the rest of this discussion I will assume that the (unknown) actual count rate is ã while the covariance matrix is derived using a slightly different a′. I will assume that the read noise associated with each pixel is accurately known.
I will first treat the case where the a′ used in the construction of the covariance matrix is not directly derived from any of the resultant differences d_i. In this case, I can use the fact that ⟨d_i⟩ = ã for all reads i because the observed count rate is an unbiased estimator of the true count rate and because read noise has zero mean. Equations (100), (101), and (102) then imply that the χ²-minimizing fit gives an unbiased estimator of the flux: ⟨a⟩ = ã.
This does not hold if the w_i depend on the values of d_i, i.e., if the d_i values are used in determining the count rate for the purposes of deriving the covariance matrix. In that case, I will assume that the covariance matrix is calculated assuming a photon rate of a′ = Σ_i c_i d_i. This is fairly general: if all c_i are equal then this is the case of using the average count rate (scaled differences between adjacent groups of reads) to compute the covariance matrix; it is equivalent to using the difference between the first and last groups of reads. Iteratively updating w_i and estimating a would correspond to another set of c_i (different at each iteration). Using weights for each resultant derived from the read-noise limited fit would correspond to a different set of c_i.
The dependence of the weights w_i on the adopted value of a is complicated, so I will use a Taylor expansion of w_i to first order about the true count rate ã. I have where I used Σ_j c_j = 1, ⟨d_i⟩ = ã, and Σ_i w_i = 1. I will further expand this by subtracting and adding ã to d_i: The last term is zero because the individual scaled resultant differences are unbiased estimators of the true count rate.
The first term has the covariance matrix of the resultant differences, Cov(d_i, d_j), given by C in Equation (24): So, if we adopt a covariance matrix built using a weighted sum of the resultant differences to estimate the photon rate, then Equation (108) gives a first-order estimate of the bias introduced to the recovered count rate. It is possible to either correct for this bias or to choose a set of weights c_j for the initial estimate of the count rate in order to have zero bias to first order. If we want to avoid the bias, then we wish to choose a vector of initial guess coefficients c so that c is orthogonal to In fact, the set of optimal coefficients w to combine the resultant differences is a bias-free choice for c. To prove this I will use the fact that the weights w given in Equation (99) provide the minimum-variance unbiased estimate of the true count rate if the true covariance matrix is C (Aitken 1935). The variance of the sum of resultant differences weighted by w is the variance of the recovered count rate a, and is given by A weight vector w′ derived with a different assumed count rate (i.e., a different approximation to the true covariance matrix) will still produce an unbiased estimate of the count rate due to Equation (103). In other words, w(a) gives an unbiased estimate of the true count rate for any assumed count rate a used to approximate the covariance matrix and, from this, compute w using Equation (99). The Gauss-Markov theorem then states that σ_a² is minimized if the weight vector w is derived using the true count rate a_true. So, the derivative of σ_a² with respect to the count rate used to derive w will equal zero at a_true: where the last line used the symmetry of the covariance matrix C = Cᵀ. So, if the optimal weight vector w can be approximately calculated, then the covariance matrix computed from a = w · d allows for a nearly unbiased estimate of the true count rate. If the covariance matrix is approximated using a = c · d for some other c, then the resulting best-fit count rate will be biased by an amount bias
The bias of Equation (114) results from a series expansion of w about the true count rate a. Negative values of a are incompatible with the Poisson distribution; the covariance matrix should not have a negative coefficient times the photon noise covariance matrix. In practice this means that Equation (112) overestimates the bias when the count rate is close to zero, assuming that the covariance matrix is approximated using the maximum of a = c · d and zero. We can estimate this effect using the probability that the initial weight vector will produce a negative estimated count rate, and reduce the bias by this factor. Assuming Gaussian errors and an initial uncertainty on the count rate of we finally have bias where the initial factor is the probability of obtaining a negative measured value assuming a true value of a and an uncertainty of σ₀.
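The correction near zero count rate rests on a Gaussian tail probability. As a minimal sketch (the function name `prob_negative_estimate` is illustrative and not part of any published package), the probability that an initial estimate with true value a and Gaussian uncertainty σ₀ comes out negative can be computed with the complementary error function:

```python
import math

def prob_negative_estimate(a_true, sigma0):
    """Probability that a Gaussian estimate with mean a_true and standard
    deviation sigma0 falls below zero."""
    if sigma0 <= 0:
        raise ValueError("sigma0 must be positive")
    return 0.5 * math.erfc(a_true / (sigma0 * math.sqrt(2.0)))

# The probability is 1/2 at zero count rate and falls off quickly once
# the true rate exceeds a few times the initial uncertainty.
for a_true in (0.0, 1.0, 3.0, 10.0):
    print(a_true, prob_negative_estimate(a_true, sigma0=1.0))
```

This probability equals 1/2 when the true count rate is zero and becomes negligible once the count rate is several times σ₀, which is the regime where clipping the initial estimate at zero no longer matters.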

Empirical Demonstrations of the Bias
I have tested the first-order approximation for the bias on synthetic data with 30 reads each treated individually. The off-diagonal elements of the covariance matrix in this case consist only of read noise. I further adopt a read noise of σ = 20 e⁻/read. The bias will depend upon the actual count rate (which partially determines the covariance matrix C) and on the weight vector c used to estimate the count rate for use in approximating the covariance matrix.
Figure 5 plots the bias computed using Equation (114) as a function of count rate for several different initial weight vectors c.The left panel shows the bias resulting from the optimal weight vector for zero count rate, a moderate count rate of 50 electrons/read, and an arbitrarily high count rate for which all elements of c are the same.The right panel of Figure 5 shows the bias resulting from 50 random realizations of the initial weight vector c.In all cases, I use uniform random numbers between zero and one for all elements and then normalize the vector so that the elements sum to one.The biases in both cases can be non-negligible at low count rates.
Next, I test the bias calculated using Equation (114) with Monte Carlo. For this I continue to assume 30 individual reads with a read noise of 20 electrons. Figure 6 compares the bias from an initial estimate using only the first and last reads (i.e., averaging the read differences) to fits of Monte Carlo realizations of ramps. Using only the first and last reads provides the optimal estimate at high photon rates, and we therefore expect it to produce unbiased count rates in this regime. At low photon rates, however, this weighting is not optimal and Equation (114) predicts a bias. I generate at least 5 × 10⁶ synthetic ramps at each sample count rate to provide a robust empirical bias; this is indicated by the orange points in the left panel. The Monte Carlo results agree well with the prediction for count rates ≳1 electron/read. The points also agree well at low count rates where the correction factor in Equation (114) approaches 1/2.
I can remove the bias to first order by computing Equation (109), projecting this vector off of my initial weight vector, renormalizing the weight vector, and repeating the Monte Carlo test. This does not give the optimal weight vector but it does give one that will produce an estimate of the covariance matrix that results in unbiased fitted count rates. Using this approach with 10⁷ synthetic ramps and a true count rate of 2, the mean best-fit slope becomes 2.00015 with an uncertainty on the mean of 0.00016, i.e., the bias is more than 10 times lower and is no longer detectable without running a much larger set of synthetic ramps.
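The projection and renormalization step can be sketched as follows. This is only an illustration: the vector `v` stands in for the vector of Equation (109) and is treated here as a given input with made-up values, and the function name `debias_weights` is hypothetical.

```python
def debias_weights(c, v):
    """Remove the component of the weight vector c along v, then rescale
    the result so that its elements sum to one."""
    if len(c) != len(v):
        raise ValueError("c and v must have the same length")
    vv = sum(x * x for x in v)
    if vv > 0:
        # Subtract the projection of c onto v
        scale = sum(ci * vi for ci, vi in zip(c, v)) / vv
        projected = [ci - scale * vi for ci, vi in zip(c, v)]
    else:
        # v is the zero vector: nothing to project away
        projected = list(c)
    total = sum(projected)
    if abs(total) < 1e-300:
        raise ValueError("projected weights sum to zero; cannot normalize")
    return [p / total for p in projected]

# Illustrative numbers only: uniform initial weights and an arbitrary v
c0 = [0.25, 0.25, 0.25, 0.25]
v = [1.0, -1.0, 2.0, 0.0]
c1 = debias_weights(c0, v)
print(c1, sum(ci * vi for ci, vi in zip(c1, v)))
```

Rescaling to a unit sum multiplies every element by the same constant, so the new weight vector remains orthogonal to `v` after normalization.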
I can also avoid almost all of the bias by performing the fit to the ramp twice.I use the first fit to infer the weights w, and after using these weights to estimate the photon rate, I recompute the covariance matrix.I then use this new covariance matrix to perform a second fit to the ramp in order to compute the final count rate.
The green points in the left panel of Figure 6 are fits to the same ramps as the orange points, but fit the ramp twice to remove bias.The red points in the right panel of Figure 6 show the difference between the orange and green points in the left panel.They show the bias from using only the first and last reads to estimate the covariance matrix, assuming that fitting the ramp twice produces unbiased results.These red points agree almost perfectly with the prediction at high count rates, and agree almost as well at low count rates where the correction factor for nonnegative inferred count rates becomes significant.
To obtain the best estimate of the true photon rate and avoid biases in the process, I therefore suggest the following procedure:
1. Use uniform weights or a median on all scaled resultant differences d_i to estimate a count rate;
2. Use this count rate to estimate the covariance matrix and fit for the count rate;
3. Use this updated count rate to re-estimate the covariance matrix; and
4. Perform the optimal fit with this re-estimated covariance matrix.
The total computational cost of this approach is approximately double the cost of fitting the ramp once.
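The four-step procedure above can be sketched for the simplest possible case. The sketch assumes individual reads at unit time spacing, so that the scaled differences have variance a + 2σ² and adjacent differences have covariance −σ²; it solves the resulting tridiagonal system with the Thomas algorithm rather than the closed-form expressions of Section 4, and the function names are illustrative.

```python
def solve_tridiagonal(diag, off, rhs):
    """Solve T x = rhs for a symmetric tridiagonal T with constant diagonal
    `diag` and constant off-diagonal `off` (Thomas algorithm)."""
    n = len(rhs)
    cp = [0.0] * n  # modified super-diagonal
    dp = [0.0] * n  # modified right-hand side
    cp[0] = off / diag
    dp[0] = rhs[0] / diag
    for i in range(1, n):
        denom = diag - off * cp[i - 1]
        cp[i] = off / denom
        dp[i] = (rhs[i] - off * dp[i - 1]) / denom
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

def fit_once(d, a_assumed, read_noise):
    """Generalized least-squares rate, its uncertainty, and chi squared for
    differences d, with the covariance built from an assumed count rate."""
    if read_noise <= 0:
        raise ValueError("read_noise must be positive")
    var_read = read_noise ** 2
    diag = max(a_assumed, 0.0) + 2.0 * var_read  # clip negative rates at zero
    off = -var_read
    x = solve_tridiagonal(diag, off, [1.0] * len(d))  # C^-1 1
    y = solve_tridiagonal(diag, off, d)               # C^-1 d
    norm = sum(x)
    rate = sum(y) / norm
    resid = [di - rate for di in d]
    z = solve_tridiagonal(diag, off, resid)
    chi2 = sum(r * zi for r, zi in zip(resid, z))
    return rate, (1.0 / norm) ** 0.5, chi2

def fit_twice(d, read_noise):
    a0 = sum(d) / len(d)                    # step 1: uniform weights
    a1, _, _ = fit_once(d, a0, read_noise)  # step 2: first fit
    return fit_once(d, a1, read_noise)      # steps 3 and 4: refit

print(fit_twice([3.2, 2.7, 3.5, 2.9, 3.1, 2.6, 3.0], read_noise=0.5))
```

The second call to `fit_once` repeats the same work as the first, which is why the two-pass fit costs about twice as much as a single fit.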

EXAMPLE: A NIRCAM RAPID EXPOSURE
I demonstrate the new ramp fit on NIRCam RAPID data with eight reads, and a single read per group.These data are from Early Release Science (ERS) imaging of NGC 3324 in the F200W filter with the nrca1 detector.The total exposure time was 161 seconds.A visual comparison of the ramps themselves (i.e.images of count rates at each pixel) requires comparable jump detection and masking; I defer this comparison to Paper II and do not show any images here.
In this paper I address the bias and χ² statistics of the ramp fit as visible in the count rates available in the rate.fits file available on MAST. I perform no bias subtraction or nonlinearity correction to the groups. I do, however, correct for the reference pixels. I subtract the average value of the reference pixels at either end of each of the four readout channels and then use the reference pixels along the sides of the detector in Channels 1 and 4 to remove some of the 1/f noise. I smooth these reference pixels with a Gaussian and subtract the pattern from each channel, choosing the smoothing length and the factor by which I subtract to minimize the scatter about the resulting ramp fit. I then adopt the read noise files available on the JWST calibration center, dividing by √2 to convert from correlated double sampling (CDS) noise to single read noise, and I use the calibration gain of 2.05 e⁻/DN.
Section 5.1 suggests that the rate.fits file derived from the individual groups may have a detectable bias from the discrete change in the weights. Figure 7 shows that this is indeed the case. The figure plots a histogram of the count rates near a signal-to-noise ratio (S/N) of 20 for a pixel with a typical noise level; the weighting scheme changes at this S/N. The rate.fits file has a small hump just past this value, which my new debiased ramp file lacks. A new ramp fit with the original weights given in Fixsen et al. (2000) confirms this as the source of the bias. An alternative weighting scheme that does not change near this S/N also does not show a bias (green histogram). The lower panel shows the ratio of the histograms of the biased ramps and of the constant weight ramp to that of the unbiased ramp. The biased ramps show a deficit of points just below the threshold and an excess just above due to the shifting of individual ramp fits to slightly higher values.
Finally, the algorithms described in this paper provide a direct measurement of the χ² value for the fit. Figure 8 shows a histogram of the χ² values for the ramp fit compared to a theoretical distribution with six degrees of freedom: seven group differences minus one fitted slope. The left panel adopts the read noise values from the JWST Calibration Reference Data System. In the right panel I have scaled the noise down slightly, by a factor of 0.97, for better agreement. The noise scaling depends on how well the reference pixel correction removes correlated noise, leading to slight differences between my favored noise values and those in the JWST calibration package. This result suggests that some improvement in the reference pixel correction used by JWST may be possible. With this slight scaling of the noise, the empirical distribution of χ² values is indistinguishable from the theoretically expected distribution. This suggests that the adopted covariance matrix provides a very good statistical description of the data, and validates the formal uncertainties of the fitted slopes.
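The theoretical curve used for this comparison is the standard χ² density. A minimal sketch of how the six degrees of freedom quoted above arise, and of the density a histogram would be compared against, is given below; the function name `chi2_pdf` and the integration grid are illustrative choices.

```python
import math

def chi2_pdf(x, dof):
    """Probability density of a chi-squared distribution with dof degrees
    of freedom, evaluated at x > 0 (returns 0 for x <= 0)."""
    if x <= 0:
        return 0.0
    half = 0.5 * dof
    log_pdf = ((half - 1.0) * math.log(x) - 0.5 * x
               - half * math.log(2.0) - math.lgamma(half))
    return math.exp(log_pdf)

n_reads = 8                  # RAPID ramp with eight single-read groups
n_differences = n_reads - 1  # seven group differences
dof = n_differences - 1      # minus one fitted slope

# Numerical sanity check: the density integrates to one and has mean dof
step = 0.01
grid = [i * step for i in range(1, 8001)]
total = sum(chi2_pdf(x, dof) for x in grid) * step
mean = sum(x * chi2_pdf(x, dof) for x in grid) * step
print(dof, total, mean)
```

A histogram of per-pixel χ² values normalized to unit area can be overplotted directly on `chi2_pdf(x, 6)`; a noise model that is too pessimistic shifts the empirical distribution toward smaller values.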

A PURE PYTHON IMPLEMENTATION
I have implemented the algorithms described in this paper in pure Python.In this section I briefly summarize the implementation and its computational cost.All tests running the code were performed on a 2020 Macbook Air.I have further included a series of tests to verify that all calculations are correct: that the calculated covariance matrix agrees with a Monte Carlo approximation and that the best-fit slopes agree with the results of explicit matrix inversion.
The first step in my implementation is to compute the α and β components of the covariance matrix and the δt values for a set of read times. This set of read times is a list of the time(s) since reset, or the integration time(s), for the read(s) corresponding to each resultant. A single-read resultant may be specified by a floating point number for the integration time, and a multiple-read resultant by a list of numbers. The resulting α and β components for photon noise and read noise are then stored in a specifically designed Python class. A calling sequence for a six-resultant ramp with a mixture of single-read resultants and multiple-read resultants could look like the following:

    import fitramp

    # Six resultants: three single reads and three groups of three reads
    readtimes = [1, 2, [3, 4, 5], [6, 7, 8], [10, 11, 13], 15]
    C = fitramp.Covar(readtimes)

If the user would like to fit for and/or apply an informative prior on the pedestal value, they would call fitramp.Covar with the boolean pedestal set to True. The covariance structure would then have an extra element in both α and β as described in Section 4.1.
The next step is to fit a ramp. The corresponding function takes a 2D array of resultant differences (number of resultants minus one by number of pixels), the Python class holding the covariance information from the integration time of each read, and the read noise of each pixel. It is vectorized to operate on many pixels simultaneously. Optionally, this step may include a mask of the same shape as the resultant differences (differences with a mask value of zero are ignored), and it can compute count rates and χ² values leaving out resultant differences and pairs of differences. These quantities may be used in a jump detection approach, described in Paper II. If the pedestal value is to be used, the first element of the resultant difference array should be the first resultant divided by its integration time.
Finally, I have included a Python method to compute the bias in the count rate from using a weighted average of the resultant differences to estimate the count rate for the covariance matrix.
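The read-time convention described above (a number for a single-read resultant, a list for a multiple-read resultant) can be illustrated with a small helper. This is a sketch of the input format only, not the package's internal code: the helper name `summarize_readtimes` is hypothetical, the read times are illustrative, and taking the mean read time of each resultant and differencing those means to obtain δt assumes uniform weights within each resultant.

```python
def summarize_readtimes(readtimes):
    """Return the mean read time and the number of reads of each resultant
    for a list that mixes single numbers and lists of numbers."""
    mean_times, n_reads = [], []
    for entry in readtimes:
        reads = list(entry) if isinstance(entry, (list, tuple)) else [entry]
        if not reads:
            raise ValueError("a resultant needs at least one read")
        mean_times.append(sum(reads) / len(reads))
        n_reads.append(len(reads))
    return mean_times, n_reads

# Six resultants: three single reads and three groups of three reads
readtimes = [1, 2, [3, 4, 5], [6, 7, 8], [10, 11, 13], 15]
mean_times, n_reads = summarize_readtimes(readtimes)

# Time steps between consecutive resultants (one fewer than the resultants)
delta_t = [t2 - t1 for t1, t2 in zip(mean_times, mean_times[1:])]
print(mean_times, n_reads, delta_t)
```

The array of resultant differences passed to the fit would then have one row per entry of `delta_t`, i.e., five rows for this six-resultant ramp.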
Implementing the ramp fitting algorithm described in this paper requires computing a number of auxiliary quantities. These quantities, defined throughout this paper, all have a linear cost in the number of resultants. They also require memory. To limit the memory footprint, I recommend using the ramp fitting algorithm on one row of detector pixels at a time. A row-by-row loop also enables more efficient memory access compared to trying to access large parts of many different arrays that exist in distant regions of RAM. In practice I have found maximal efficiency from operating on 10³ to 10⁴ pixels at a time. This approach leaves a negligible memory footprint.
Figure 9 shows the computational time for ten-resultant ramps, both with the row-by-row implementation and when operating on the full array at once. The full array approach becomes costly as the memory required for auxiliary quantities approaches the system's RAM; this point will be machine dependent but happens for an array size of ∼2000² pixels on my laptop (with 8 GB of RAM). A row-by-row implementation remains efficient for 4000²-pixel detectors and has a negligible memory footprint beyond that required to store the resultants themselves.
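The row-by-row strategy amounts to a loop over blocks of pixels. The sketch below shows the pattern with a stand-in fitting function (a plain average of each pixel's differences) in place of the real ramp fit; the names `fit_in_chunks` and `mean_rate` and the toy data are illustrative.

```python
def mean_rate(diffs_chunk):
    """Stand-in for the real ramp fit: average the scaled differences of
    each pixel in the chunk."""
    return [sum(d) / len(d) for d in diffs_chunk]

def fit_in_chunks(diffs, chunk_size, fit_func):
    """Apply fit_func to consecutive blocks of pixels so that any auxiliary
    arrays only ever have chunk_size rows."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    rates = []
    for start in range(0, len(diffs), chunk_size):
        rates.extend(fit_func(diffs[start:start + chunk_size]))
    return rates

# Toy data: 25 pixels, each with nine resultant differences
diffs = [[float(p + k) for k in range(9)] for p in range(25)]
rates_chunked = fit_in_chunks(diffs, 4, mean_rate)
rates_full = mean_rate(diffs)
print(rates_chunked == rates_full)
```

Because each pixel is fitted independently, the chunked and full-array results are identical; only the peak memory use and the memory access pattern differ.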
The computational cost of this approach, while larger than that of Fixsen et al. (2000) and especially its memory-efficient implementation by Offenberg et al. (2005), is modest for a modern computer and is linear in the number of resultants. For that reason the performance numbers I quote, and those shown in Figure 9, are in units of seconds per 10⁸ pixel-resultants. A ramp with twice as many resultants will take twice as long to process, as will a ramp with twice as many pixels. A ramp with 10⁸ pixel-resultants roughly corresponds, for example, to an H2RG (≈4 × 10⁶ pixels) with 24 resultants.

Figure 9. Computational time per pixel and per resultant of an up-the-ramp fit with ten resultants, fitting twice to remove bias, with a row-by-row implementation (blue points) compared to a single pass on the full array (orange points). The single pass is faster when the detector is small, but as the detector grows larger, arrays for all of the auxiliary quantities begin to demand all of the system's RAM and performance suffers. This crossover point will be machine dependent. The row-by-row implementation is better for large-format detectors processed using a laptop or desktop computer (the tests were run on a computer with 8 GB RAM).
Running my pure Python implementation on a single core of a 2020 Macbook Air takes ≈2.6 seconds per 10⁸ pixel-resultants to fit a ramp once. Fitting a ramp twice to remove bias doubles this cost to a little over five seconds per 10⁸ pixel-resultants. For an H4RG ramp with 10 resultants this cost corresponds to ∼8 seconds to fit a ramp and remove bias. These times correspond to a 3-year-old laptop and could be considerably lower on a better computer. They would be correspondingly lower for ramps taken from a smaller H2RG detector.
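Because the cost is linear in both pixels and resultants, these numbers can be rescaled to other detectors with simple arithmetic. The sketch below assumes 4096 × 4096 pixels for an H4RG and 2048 × 2048 pixels for an H2RG and uses the ≈2.6 s per 10⁸ pixel-resultants quoted above; the function name is illustrative.

```python
def fit_time_seconds(n_pixels, n_resultants, seconds_per_1e8=2.6, n_fits=2):
    """Estimated run time for a cost that is linear in pixel-resultants.
    n_fits=2 corresponds to fitting twice to remove the bias."""
    return seconds_per_1e8 * n_fits * n_pixels * n_resultants / 1e8

h4rg = fit_time_seconds(4096 ** 2, 10)            # two fits, ten resultants
h2rg = fit_time_seconds(2048 ** 2, 24, n_fits=1)  # one fit, 24 resultants
print(round(h4rg, 2), round(h2rg, 2))
```

The first number falls between 8 and 9 seconds, in line with the figure quoted for an H4RG; the second is close to 2.6 seconds because an H2RG with 24 resultants is almost exactly 10⁸ pixel-resultants.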

Numerical Considerations
The computations needed to calculate the best-fit slope, its uncertainty, and χ 2 are given by the equations of Section 4. For a diagonally dominant matrix (as all covariance matrices that will arise for realistic ramps are) these typically do not present numerical difficulties.
If the photon rate and/or read noise are large, then overflow is a risk.For example, assuming one read per second, If n = 100 and σ 2 = 10000 (for a very long ramp with a very noisy pixel) then overflow could result.Overflow could occur for similar reasons in the recursive computation of auxiliary quantities.By default, my implementation factors the geometric mean of α out of the covariance matrix to guard against overflow or (less likely) underflow.This does not affect the best-fit slope.After computation, the uncertainty on the best-fit slope is multiplied by the square root of this scaling factor and the value of χ 2 is divided by the scaling factor.
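The effect of factoring a constant out of the covariance matrix can be checked on a two-difference toy problem. The sketch below assumes the scaled matrix is C divided by the factor; the covariance values, the data, and the function name `gls_rate_2x2` are illustrative, and the explicit 2 × 2 inverse stands in for the closed-form expressions of Section 4.

```python
import math

def gls_rate_2x2(d, cov):
    """Generalized least-squares constant rate for two differences d with
    a 2x2 covariance matrix cov; returns (rate, uncertainty, chi squared)."""
    (c11, c12), (c21, c22) = cov
    det = c11 * c22 - c12 * c21
    inv = [[c22 / det, -c12 / det], [-c21 / det, c11 / det]]
    norm = inv[0][0] + inv[0][1] + inv[1][0] + inv[1][1]  # 1^T C^-1 1
    rate = ((inv[0][0] + inv[1][0]) * d[0]
            + (inv[0][1] + inv[1][1]) * d[1]) / norm
    resid = [d[0] - rate, d[1] - rate]
    chi2 = sum(resid[i] * inv[i][j] * resid[j]
               for i in range(2) for j in range(2))
    return rate, math.sqrt(1.0 / norm), chi2

cov = [[900.0, -400.0], [-400.0, 1300.0]]
d = [12.0, 7.0]
factor = 1000.0
cov_scaled = [[c / factor for c in row] for row in cov]

rate, sigma, chi2 = gls_rate_2x2(d, cov)
rate_s, sigma_s, chi2_s = gls_rate_2x2(d, cov_scaled)

# Undo the scaling after the fit
sigma_restored = sigma_s * math.sqrt(factor)
chi2_restored = chi2_s / factor
print(rate, rate_s, sigma, sigma_restored, chi2, chi2_restored)
```

The best-fit rate is unchanged by the scaling, while multiplying the uncertainty by the square root of the factor and dividing χ² by the factor recovers the unscaled values.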

CONCLUSIONS
Past work in the literature has either approximated the optimal solution to the problem of fitting a ramp (Fixsen et al. 2000;Kubik et al. 2016;Casertano 2022), or has required expensive matrix operations (e.g.Robberto 2014).
Here I have shown that the optimal approach can be implemented with a computationally efficient algorithm.Closedform solutions for the weights of the resultants are available, and the computational costs are linear in the number of resultants.The optimal approach does require the covariance matrix of the resultants to be estimated first; this can introduce a bias in the best-fit count rate.I have derived a formula for the bias and shown how it can be removed to first order.
As a byproduct of deriving the optimal count rates, I have also shown that the χ2 values of the fits may be computed for little additional cost.This enables straightforward flags for the goodness of fit, which may be used to identify bad pixels.The distribution of χ 2 values may also be used to verify the quality of the noise model.
The algorithms presented here can be implemented efficiently in pure Python.They are computationally straightforward on a laptop computer even for long ramps on a large-format detector.This could enable more straightforward and sensitive ramp fitting for existing and future instruments using detectors that are read out nondestructively.
Software: scipy (Virtanen et al. 2020), numpy (Oliphant 2006; van der Walt et al. 2011)

Figure 10. Left: Ratio of optimal slope fit to the optimal slope fit with all of the reads assuming uniform intra-resultant weights. The different colors show the same proposed Roman readout patterns used in Figure 1. Right: the same ratio but with intra-resultant weights optimized in the low count rate regime. The signal-to-noise ratio now reaches the value when using all of the reads for low count rates, and it is superior to the signal-to-noise for uniform intra-resultant weights at all count rates. The curves from the left panel are shown semitransparent in the right panel to facilitate a visual comparison.
where j runs over the reads within resultant i, taking the place of Equation (6). For photon noise, the covariance between resultant i and resultant i + 1 is which simplifies to Equation (8) if the κ values are all equal (in which case the new and old definitions of ⟨t_i⟩ are equivalent). The variance of resultant i due to photon noise is

$$\mathrm{Var}(r_i) = a \sum_j \sum_k \kappa_{i,j}\, \kappa_{i,k}\, \min(t_{i,j}, t_{i,k}). \tag{A10}$$

This expression takes the place of Equation (9), which can no longer be simplified. These equations may be propagated through the remainder of the derivations in Section 3 to derive the appropriate values for α and β that define the covariance matrix.
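Equation (A10) is a double sum over the reads of a single resultant and can be evaluated directly. The sketch below assumes the weights κ are already normalized to sum to one within the resultant; the function name `resultant_photon_variance` and the example numbers are illustrative.

```python
def resultant_photon_variance(rate, weights, times):
    """Photon-noise variance of one resultant: the count rate times the
    double sum over reads of kappa_j * kappa_k * min(t_j, t_k)."""
    if len(weights) != len(times):
        raise ValueError("weights and times must have the same length")
    return rate * sum(
        wj * wk * min(tj, tk)
        for wj, tj in zip(weights, times)
        for wk, tk in zip(weights, times)
    )

# A single read at t = 5 with rate 2: variance = rate * t
single = resultant_photon_variance(2.0, [1.0], [5.0])

# Three reads at t = 3, 4, 5 averaged with uniform weights, rate 1
uniform = resultant_photon_variance(1.0, [1 / 3] * 3, [3.0, 4.0, 5.0])
print(single, uniform)
```

For a single read the expression reduces to the familiar Poisson variance a·t; for the uniformly weighted example the double sum of min(t_j, t_k) is 32, giving a variance of 32/9.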
The covariance matrix will remain tridiagonal unless a resultant has weights that sum to zero.In that case, Equation (A7) is undefined.Adopting Equation (A7) without the denominator would not solve the problem.Two of the four terms in Equation ( 21) (either the first two or the last two) would be zero and the sum would no longer vanish.This numerical problem may be avoided by slightly changing the weight of one of the reads while keeping κ i = 1; a small change to the weights entails a negligible penalty in performance.
I can now measure the performance of uniform weights against the performance of nonuniform weights within each resultant.Figure 10 shows the results for the same proposed Roman readout patterns shown in Figure 1.Using the intra-resultant weights given in Equation (A7) matches the signal-to-noise ratio from using all of the reads in the low count rate limit.At all count rates, it gives superior results to uniform intra-resultant weighting.This is because in all cases the best weights increase toward the first and last reads; the optimal low count rate weights give the most gradual increase in weights toward either end.This approach is still closer to the optimal photon noise limit of only using the first and last read than is the case of uniform weights within each resultant.
The proposed intra-resultant weights given in Equation (A7) are not the only possible choices.Weights could also be optimized for intermediate count rates.In this case, slightly improved signal-to-noise ratios at higher count rates would come at the expense of slightly degraded signal-to-noise ratios at low count rates.
Nonuniform weighting within resultants can improve the final signal-to-noise ratio at all count rates, significantly so for shorter exposures at low count rates. The use of nonuniform weights will not affect the properties or removal of the correlated noise endemic to HxRG detectors, because these weights would still be the same for all pixels. Cosmic ray flagging, discussed in Paper II, will similarly be unaffected in principle. Sensitivities to cosmic rays will change slightly, but a detailed analysis of that is beyond the scope of the current discussion. A reweighting could complicate nonlinearity corrections, though if the nonlinear behavior is accurately known, this could be propagated through the known readout and weighting pattern.
As shown in this section, nonuniform weighting within a resultant offers promise for preserving more useful information in a limited number of resultants.In the read noise limit it can preserve all useful information.If nonuniform weighting can be implemented in practice, it could improve the performance of a mission limited by downlink bandwidth.

Figure 4. The bias in the discrete weighting scheme of Fixsen et al. (2000) as adapted by Casertano (2022), for 10 reads (left) and 30 reads (right) with a read noise of 20 electrons. The orange points are the averages of 5 × 10⁶ Monte Carlo realizations each with read and photon noise; they confirm the accuracy of the bias derived in this section. The bias is positive because a larger s, the difference between the first and last resultants, results in an increased weighting of these resultants in the computation of a, and a larger covariance between s and a. The peaks are centered at the transitions between weighting schemes while the widths of the peaks are given by σ_s ≈ √(s + σ²) scaled to the total number of reads. The width is a weakly increasing function of the count rate and appears to narrow because of the logarithmic axis.

Figure 5. Biases computed using Equation (114) for a sequence of 30 individual reads with a read noise of 20 electrons using various initial weight vectors c. Left panel: the c vectors are set to the optimal weights for three different true count rates. Right panel: biases for 50 random vectors c with all elements drawn from U(0, 1) and the vector finally normalized to a unit sum. The bias can be significant at low count rates depending on the weights on the resultant differences used to estimate the covariance matrix. The bias is zero when the vector c is the optimal weight vector for the actual count rate.

Figure 6. Verification of the bias computed using Equation (114) (blue line) using Monte Carlo (orange points) assuming 30 reads with 20 electron read noise (as in Figure 5). The green points use the same ramps as the orange points, but fit the ramp twice as suggested in Section 5.3. The bias after fitting the ramp twice is negligible. The open red points show the difference between the orange and green points, i.e., the empirical bias assuming that fitting the ramp twice produces an exactly unbiased fit. Uncertainties on the open red points are negligible.

Figure 7. Demonstration of nonzero bias in actual JWST data from NIRCam. The top panel shows the probability density of count rates a using different ramp fitting approaches. The bottom panel normalizes these probability densities to the ones I obtain with the approach described in this paper (red line in the top panel). The MAST rate.fits file uses the Fixsen et al. (2000) discrete weights. The bias is visible in both a custom ramp fit using the Fixsen et al. (2000) weights (blue dashed line) and in the rate.fits file (orange dashed line) at the expected location, where the S/N crosses the threshold between weighting schemes. Each pixel has a different noise level, a fact that blurs the bias over a broader range of count rates. The bias is not visible when using weights that do not change at this threshold (green line) or when using a debiased fit (red line).

Figure 8. Distribution of χ² values for a NIRCam image taken in RAPID mode. The left panel uses read noise levels from the Calibration Reference Data System; in the right panel the noise values have been scaled by a factor of 0.97 (i.e., slightly reduced) to obtain the best agreement with a theoretical distribution. The theoretical distribution is a χ² distribution with six degrees of freedom (seven group differences minus one fitted parameter). The theoretical and empirical distributions are indistinguishable when scaling the read noise. The ≈4% of pixels with a detected jump, identified as described in Paper II, are excluded.