The Sloan Digital Sky Survey Reverberation Mapping Project: Key Results

We present the final data from the Sloan Digital Sky Survey (SDSS) Reverberation Mapping (RM) project, a precursor to the SDSS-V Black Hole Mapper RM program. This data set includes 11 yr photometric and 7 yr spectroscopic light curves for 849 broad-line quasars over a redshift range of 0.1 < z < 4.5 and a luminosity range of L bol = 1044−47.5 erg s−1, along with spectral and variability measurements. We report 23, 81, 125, and 110 RM lags (relative to optical continuum variability) for broad Hα, Hβ, Mg ii, and C iv using the SDSS-RM sample, spanning much of the luminosity and redshift ranges of the sample. Using 30 low-redshift RM active galactic nuclei with dynamical-modeling black hole masses, we derive a new estimate of the average virial factor of logf=0.62±0.07 for the line dispersion measured from the rms spectrum. The intrinsic scatter of individual virial factors is 0.31 ± 0.07 dex, indicating a factor of 2 systematic uncertainty in RM black hole masses. Our lag measurements reveal significant R–L relations for Hβ and Mg ii at high redshift, consistent with the latest measurements based on heterogeneous samples. While we are unable to robustly constrain the slope of the R–L relation for C iv given the limited dynamic range in luminosity, we found substantially larger scatter in C iv lags at fixed L 1350. Using the SDSS-RM lag sample, we derive improved single-epoch (SE) mass recipes for Hβ, Mg ii, and C iv, which are consistent with their respective RM masses as well as between the SE recipes from two different lines, over the luminosity range probed by our sample. The new Hβ and Mg ii recipes are approximately unbiased estimators at given RM masses, but there are systematic biases in the C iv recipe. The intrinsic scatter of SE masses around RM masses is ∼0.45 dex for Hβ and Mg ii, increasing to ∼0.58 dex for C iv.


INTRODUCTION
Accurate measurements of supermassive black hole (hereafter SMBH or BH for short) masses are cornerstones for understanding the cosmic assembly of SMBHs, their potential co-evolution with host galaxies, and fundamental accretion physics.Far beyond the nearby universe, it is difficult to use spatially resolved gas or stellar kinematics to measure a dynamical mass of the accreting SMBH.Currently, the primary method to measure black hole masses in distant AGNs or quasars is reverberation mapping (e.g., Blandford & McKee 1982;Peterson 1993;Cackett et al. 2021).RM measures a characteristic size of the broad-line region (BLR) from the time lag between continuum variability and the response in the broad-line flux.By combining the size of the BLR (assumed to be viralized) with the broad-line width (as the surrogate for the virial velocity), one can derive a virial black hole mass estimate for the AGN.RM results for local AGN provide the foundation of the socalled "single-epoch" method (e.g., Greene & Ho 2005;Vestergaard & Peterson 2006;Shen 2013) that allows efficient estimation of quasar black-hole masses using single-epoch spectroscopy (e.g., Shen et al. 2008Shen et al. , 2011;;Wu & Shen 2022).This single-epoch technique is built on the observed tight relation between the Hβ BLR size and the optical luminosity of the AGN for the local RM sample (e.g., Kaspi et al. 2000;Bentz et al. 2013), although some recent RM studies targeting a broad range of AGN properties have shown deviations and increased scatter in the R − L relation (e.g., Du et al. 2016;Fonseca Alvarez et al. 2020).The systematic uncertainties of extrapolating the local RM results to distant, luminous quasars, however, are not yet well quantified (e.g., see the review in Shen 2013).
In the past few years, there have been two major advancements in the RM field.On the one hand, high-quality monitoring data have become available for bright, low-redshift (typically z ≲ 0.3) AGNs to measure broad-line RM lags with unprecedented precision, and to cover a broader range of accretion parameters (e.g., Barth et al. 2013;Fausnaugh et al. 2016;Du et al. 2016Du et al. , 2018;;Brotherton et al. 2020;Kara et al. 2021).These high-quality RM data further enabled more powerful constraints on the dynamics of the BLR with velocityresolved RM (e.g., Grier et al. 2013;De Rosa et al. 2018), or dynamical modeling of the BLR response to continuum variations (e.g., Pancoast et al. 2014;Li et al. 2018;Williams et al. 2020).These new measurements of high-quality RM data are offering new insights on the structure and kinematics of the BLR gas, as well as high-fidelity BH masses in AGNs.In particular, the SEAMBH project (Du et al. 2016(Du et al. , 2018) ) targeting highaccretion-rate low-redshift AGNs is extending RM studies to systems with extreme accretion parameters.However, most of these high-quality RM measurements are for Hβ, with only two cases for C iv using intensive UV spectroscopy from HST (De Rosa et al. 2015;Kara et al. 2021).
On the other hand, there have been considerable efforts to push RM to the high-redshift regime, and for additional broad lines in AGN spectra.For example, there have been monitoring programs to measure RM lags in high-redshift and high-luminosity quasars (e.g., Kaspi et al. 2007;Czerny et al. 2019a;Lira et al. 2018;Kaspi et al. 2021), with decade-long baselines to capture the anticipated multi-year lags.These most luminous quasars are rare and sparsely distributed on the sky, thus requiring individual monitoring campaigns.In most of these cases, the data quality is only sufficient to derive a mean lag.One notable exception is the gravitationally lensed quasar SDSS J2222+2745 at z = 2.8, for which Gemini optical monitoring spectroscopy combined with ground-based photometric light curves enabled velocityresolved RM measurements (Williams et al. 2021).The success of RM measurements in SDSS J2222+2745 rests on anticipated variability features from prior photometric monitoring for this lensed quasar.For other monitoring programs of high-redshift quasars, however, the variability patterns are unknown ahead of time, resulting in a reduced success rate of lag measurements.Therefore individual object monitoring becomes expensive and inefficient in building up the statistics of high-redshift RM measurements.One solution to circumvent these observational difficulties is to perform multi-object spectroscopic monitoring that piggybacks on a survey program to greatly improve the efficiency of RM monitoring.
The Sloan Digital Sky Survey Reverberation Mapping (SDSS-RM) project (Shen et al. 2015a) and the OzDES-RM project (King et al. 2015) recently emerged as the first two multi-object spectroscopic RM programs (MOS-RM) to perform RM measurements for large statistical samples of quasars far beyond the low-redshift universe.Both MOS-RM programs target all quasars within small areas of the sky that can be simultane-ously monitored with wide-field optical MOS facilities.Because the target quasars are typically fainter than m ∼ 18 in the optical, the spectral S/N is on average much lower than those achieved for low-redshift RM campaigns, a situation to be improved with future larger-aperture MOS facilities for similar target brightness (e.g., McConnachie et al. 2016;Swann et al. 2019).Nevertheless, these MOS-RM programs are starting to produce large numbers of lag measurements at z > 0.3 that cover major rest-frame UV to optical broad emission lines (e.g., Shen et al. 2016;Grier et al. 2017bGrier et al. , 2019;;Shen et al. 2019a;Homayouni et al. 2020;Hoormann et al. 2019;Yu et al. 2023;Malik et al. 2023).
In this work we present the final data set from the SDSS-RM project, which includes 11 years of photometric monitoring (2010-2020) and 7 years of spectroscopic monitoring (2014)(2015)(2016)(2017)(2018)(2019)(2020).Using this data set, we measure RM lags for four major broad lines in quasar spectra: Hα, Hβ, Mg ii, and C iv.We describe the data in §2, and the preparation of light curves in §3.We detail our methodology of lag measurements in §4, and present our RM results along with discussions in §5.
We conclude in §6 with future prospects.In the appendices, we provide additional information about data access and lag detection efficiencies.Throughout this paper we adopt a flat ΛCDM cosmology with Ω Λ = 0.7 and H 0 = 70 km s −1 Mpc −1 .By default time lags refer to those in the observed frame unless otherwise specified.We adopt the convention that a positive lag means line variability lags behind continuum variability.

The SDSS-RM sample
The SDSS-RM sample contains 849 broad-line quasars at 0.1 < z < 4.5 spectroscopically confirmed in a single 7 deg 2 field (R.A. J2000= 213.704 deg, decl.J2000= +53.083 deg).These quasars were discovered in previous SDSS surveys and were selected by different algorithms (see details in Shen et al. 2019b).The sample is flux limited to i psf = 21.7,although the completeness near this flux limit is not well quantified and is particularly low at z < 0.5 (see fig. 1 in Shen et al. 2019b).No additional selection cuts, such as variability or emissionline strength, were imposed on the SDSS-RM sample.The SDSS-RM sample covers a broad and contiguous range in redshift-luminosity space, and is representative of luminous quasars with L bol ≳ 10 45 erg s −1 .The detailed sample characterization is provided in Shen et al. (2019b), where spectral variability properties are based on the 2014 SDSS-RM spectra alone.

Spectroscopy
Optical spectroscopic monitoring was obtained with the BOSS multi-object spectrographs (Smee et al. 2013) on the SDSS telescope (Gunn et al. 2006) during 2014-2020, as an ancillary program within SDSS-III (Eisenstein et al. 2011) and SDSS-IV (Blanton et al. 2017).The spectroscopy had an average cadence of ∼ 4 days in 2014, with a total of 32 epochs at a nominal exposure time of 2 hr each.In subsequent years, SDSS-RM obtained ∼ 12 epochs per year (2 per month) with a nominal exposure time of 1 hr each during 2015-2017 and ∼ 6 epochs per year (monthly cadence) during 2018-2020.The final spectroscopic baseline covers 7 years, with a total of 90 spectroscopic epochs.The wavelength coverage of BOSS spectroscopy is ∼ 3650−10400 Å, with a spectral resolution of R ∼ 2000.The typical signal-tonoise ratio (S/N) per 69 km s −1 pixel averaged over the g band in a 2 hr exposure is ∼ 4.5 at g psf = 21.2, but could be lower for epochs observed with poor observing conditions.The nominal broad-band spectrophotometry accuracy is ∼ 5% in gri bands, but can be worse than ∼ 10% below 4000 Å (Shen et al. 2015a).

Photometry
Optical monitoring photometry provides continuum light curves to facilitate RM lag measurements.SDSS-RM obtained photometry in the g and i bands with the Steward Observatory Bok 2.3 m telescope on Kitt Peak and the 3.6 m Canada-France-Hawaii Telescope (CFHT) on Maunakea.The Bok and CFHT imaging roughly covers the same monitoring period as the SDSS-RM spectroscopy, with the highest cadence in 2014 and reduced cadences in subsequent years.The details of these photometric observations and data processing are described in Kinemuchi et al. (2020).In addition to dedicated Bok and CFHT photometry, the SDSS-RM field coincides with the MD07 Medium-Deep Field in the PanSTARRS-1 survey (Kaiser et al. 2010), with few-night-cadence, multi-band (grizY ) photometric light curves during 2010-2013.These PS1 data were published in Shen et al. (2019b), and provide early continuum light curves to extend the effective baseline for our RM measurements.Finally, we compile available photometric light curves for the SDSS-RM sample from the Zwicky Transient Facility public survey (Bellm et al. 2019) during 2018-2020.The combined photometric light curves span a baseline of 11 years (2010)(2011)(2012)(2013)(2014)(2015)(2016)(2017)(2018)(2019)(2020), and were merged to provide a uniform continuum light curve for each quasar ( §3.2).

PrepSpec
PrepSpec is a spectral refining procedure performed on the flux-calibrated multi-epoch SDSS spectra in order to reduce further the scatter in the flux calibration (Shen et al. 2016).PrepSpec rescales the flux levels of each individual epoch by optimizing model fits (in parameterized functional forms) to describe the continuum and broad-line variability patterns as functions of time and wavelength, using the fluxes of the narrow emission lines (in particular [Oiii] λλ4959,5007) as an internal calibrator (e.g., van Groningen & Wanders 1992), which are assumed to remain constant over the relatively short monitoring period compared with the lightcrossing time of the narrow-line region.This procedure improves the calibration of the relative spectrophotometry to ∼ 2% for low-redshift quasars with strong narrow emission lines.This refining procedure is particularly useful in calibrating a small number of spectroscopic fiber+epochs that suffered significant flux losses in the fiber due to unknown systematics (Shen et al. 2015a).
PrepSpec also generates model light curves (and uncertainties) for a given set of broad emission lines and continuum fluxes, as well as model mean and RMS profiles of the broad lines.We use these broad-line light curves in our lag measurements, and the line widths in the calculation of RM BH masses.The full technical details of PrepSpec are described in Shen et al. (2016).All PrepSpec outputs are released as part of the final data products of SDSS-RM.
Compared with earlier PrepSpec results used for intermediate SDSS-RM lag measurements (Shen et al. 2016(Shen et al. , 2019b;;Grier et al. 2017bGrier et al. , 2019;;Homayouni et al. 2020), there are several improvements.First of all, we have run PrepSpec on the full 7-year spectroscopic data, which provide better constraints on variability models than earlier shorter light curves.Second, PrepSpec measured more reliable broad-line widths in the mean spectrum by subtracting the emission of the Fe ii complex.Third, PrepSpec adopted more accurate systemic redshifts provided in Shen et al. (2019b) and refined line windows for better line width and flux measurements.
The broad-line RMS spectrum constructed by Prep-Spec is the true RMS profile for the variable line component.This is also the standard approach used by most of the recent RM campaigns on low-z AGNs (e.g., the LAMP project, Barth et al. 2015), where the individualepoch spectra are decomposed into separate emissionline and continuum components.Some earlier RM studies (e.g., Peterson et al. 2004;Grier et al. 2012) construct the RMS spectrum from the total (emission line+continuum) epoch spectra, and subtract a local continuum to derive a line-only RMS spectrum.There are systematic biases in the measured RMS line widths from the latter approach when the monitoring period is not much longer than the lag (e.g., Barth et al. 2015;Wang et al. 2019).We will revisit this issue in §4.6.

Merged continuum light curves
We combine the photometric light curves from different facilities/surveys to form a merged continuum light curve for lag measurements, as commonly done for RM studies.For the PS1 data, we use observations in gri filters to improve the sampling.The zY data are not used since the continuum variability there is small and more contaminated by host-galaxy emission.We use the PS1 data published in Shen et al. (2019b), and coadd the nightly observations weighted by the inverse variance of the individual flux measurements.
For the purpose of measuring broad-line lags, we ignore the inter-band continuum lags.Such continuum lags trace the structure of AGN accretion disks, which are typically much more compact than the BLR, hence these continuum lags are on much shorter timescales (e.g., less than a few rest-frame days, but can be longer for more luminous quasar accretion disks; Fausnaugh et al. 2016).A complication is that a fraction (e.g., ∼ 10 − 20%) of the optical continuum emission may come from the diffuse BLR emission (e.g., Guo et al. 2022), and cause inter-band continuum lags that are non-negligible compared with broad-line lags.This is still an ongoing investigation, and without a robust method to correct for inter-band continuum lags for SDSS-RM quasars, we simply follow previous RM work and ignore this detail.We scale the merged light curves (in f ν ) to physical flux scales in r band.This band was chosen because it offers the best spectrophotometry in SDSS spectra (Shen et al. 2015a).We use the synthetic broad-band flux computed from the SDSS spectroscopy as the baseline to calibrate and merge other photometric light curve sets.
The continuum light curve merging is performed with the public code PyCali (Li et al. 2014).PyCali is a Bayesian method to calibrate the light curves obtained with different facilities at different times to a reference set of light curves, during which the flux uncertainties of individual light curve sets can also be adjusted to account for overestimated or underestimated flux uncertainties in the original data set.Before running PyCali we coadd (weighted by inverse variance) intra-night photometry points from the same facility and same band.However, for CFHT and Bok observations, some objects are covered on multiple detector chips, and we only coadd the nightly photometry from the same chip, using the light curves compiled in Kinemuchi et al. (2020).
We show an example of the merged 11-year continuum light curves in Fig. 1.The full set of PyCali-merged continuum light curves is released as part of the data products from this work.

Light curve properties
We quantify the variability characteristics of the light curves and compile the results in the summary Table 2.We use a maximum-likelihood estimator (Shen et al. 2019b) to measure the intrinsic variability (rms) of the light curve, corrected for flux uncertainties.These rms variability measurements are performed for both the 11yr photometric light curves and the 7-yr emission-line light curves from spectroscopy+PrepSpec analysis.In addition to the rms variability metric, we use another empirical metric to assess evidence for intrinsic variability of the light curve (Shen et al. 2019b), defined as 1 SNR2 = χ 2 − dof.Here χ 2 ≡ Σ(y i − ⟨y i ⟩) 2 /σ 2 i is relative to the optimal average of the light curve ⟨y i ⟩, using flux measurement errors σ i only; dof is the number of degrees of freedom (N − 1), where N is the number of light curve points.Larger intrinsic variability (with respect to flux uncertainties) tends to have a larger SNR2.
Fig. 2 displays the distribution of the SDSS-RM sample in the redshift versus expected (Hβ) lag space, colorcoded by the intrinsic fractional rms variability of the continuum.The continuum rms variability has a sample median of ∼ 15%, and broadly increases with decreasing luminosity at a given redshift (e.g., Vanden Berk et al. 2006;MacLeod et al. 2012).Fig. 3 displays the comparisons between the continuum and broad-line (fractional) rms variabilities.Given these typical fractional variability amplitudes in the continuum and in the responding lines, measuring a lag requires good flux calibration and spectral S/N.Many targets in the SDSS-RM sample do not have the adequate S/N in the emission-line light curves to allow a robust lag detection.In addition, to measure a lag, the light-curve variability must contain distinctive inflection "features", as a monotonically changing light curve will produce broad correlations over a range of lags.Nevertheless, given the large sample size, there are many SDSS-RM quasars with well-detected variability patterns to measure the lags.A handful of the best-quality light curves are even good enough to measure velocity-resolved lags, which will be presented elsewhere (e.g., Fries et al. 2023).
1 In rare cases within the SDSS-RM sample, if the light curve is consistent with noise or the flux uncertainties are overestimated, χ 2 − dof may become negative.These cases will not yield a meaningful lag detection.In such cases PrepSpec calculates SNR2 = − |χ 2 − dof|, which is negative.Compared with the optical continuum variability, different broad lines exhibit different levels of fractional variability.Fig. 3 shows that while there are broad correlations between the continuum and line fractional RMS variability, Mg ii and Hα have slightly lower average variability compared with C iv and Hβ.The lower variability for Mg ii, as seen in earlier studies (e.g., Sun et al. 2015;Yang et al. 2020), can be understood if the Mg ii-emitting gas is further out than the BLRs of other species, and/or the response of Mg ii is in general weaker than Hβ, as shown in photoionization calculations (e.g., Goad et al. 2012;Guo et al. 2020).The high-ionization C iv line usually shows larger RMS variability than Hβ (Peterson et al. 2004) in local low-luminosity AGNs.However, for high-luminosity quasars, it is argued that a portion of the C iv line does not reverberate to continuum variability, thus diluting the line fractional RMS variability (e.g., Denney 2012;Wang et al. 2020b).The situation may be further complicated by the fact that the ionizing continuum (powering the lines) and the optical continuum have different RMS variability, as the variable accretion flow onto the SMBH results in varying shapes of the spectral energy distribution (SED).
Although not covered in our lag measurements, we show the fractional RMS variability for additional broad lines in Fig. 4. The two strong UV broad lines, Si iv and C iii], show similar RMS variability as C iv, while the two high-ionization lines He ii 1640 and He ii 4686 show notably stronger RMS variability than other lines.The behaviors of the He ii broad lines are consistent with local RM observations (e.g., Peterson et al. 2004).
Finally, we show the comparisons between the FWHM measured from the mean spectrum and the line dispersion σ line,rms measured from the RMS spectrum in Fig. 5, using PrepSpec outputs.A strong correlation between these two line widths for the same line is a necessary (but not sufficient) condition for the validity of single-epoch virial mass estimators.The average ratios of FWHM/σ line for all four broad lines are smaller than 2.35, implying the line profile is more Lorentzian (e.g., Collin et al. 2006;Villafaña et al. 2023).

Methodology
Most RM studies measure a reverberation-weighted "average" lag across the BLR between the continuum and broad-line light curves.This is mainly limited by the quality of the data (i.e., cadence, baseline and S/N), relative to the intrinsic variability of the quasar.For high-quality RM light curves, it is possible to measure the 1D transfer function Ψ(τ ) between the continuum and integrated line fluxes (e.g., Horne et al. 2004).If the spectroscopic data are of sufficient quality to reveal the velocity-resolved BLR responses, it becomes feasible to measure the 2D transfer function (Ψ(τ, v); or velocitydelay map) and constrain the dynamical structure of the BLR.The ultimate goal of RM is to measure highfidelity velocity-delay maps and to constrain the geometry and kinematics of the BLR, along with a dynamical measurement of the black hole mass.
SDSS-RM is a pioneering program to measure BLR lags in a large sample of distant and luminous quasars.As such, the quality of the monitoring data is generally     We do not report lag measurements for these lines, but their variability properties are compiled in Table 2.
insufficient to robustly measure the 1D or 2D transfer functions in individual objects, except for a small number of objects with very large variability amplitudes.Rather, the main science goal of SDSS-RM is to measure an average lag for quasars in a luminosity-redshift regime uncharted by earlier RM programs.While an average lag does not inform us about the detailed geometry and kinematics of the BLR, it allows us to measure an approximate mass of the black hole, assuming the BLR is virialized.The uncertainty of the inferred RM mass in individual objects is dominated by the assumed geometric factor (the virial factor, see §4.5) and the uncertainties in the measured average lag and line width.
To measure this average BLR lag, RM studies traditionally deploy non-parametric (or with minimal parameterization) cross-correlation analysis methods, such as the interpolated cross-correlation function (ICCF, Gaskell & Peterson 1987), discrete correlation function (Edelson & Krolik 1988), or the z-transform discrete correlation function (ZDCF, Alexander 2013).Other nonparametric methods are also available to measure the average time delay between two light curves (see, e.g., Czerny et al. 2019a, for some examples).Alternatively, recent approaches such as Javelin (Zu et al. 2011) and CREAM (Starkey et al. 2016) use a stochastic variability model to interpolate the light curves and measure the lag using MCMC to sample the posterior lag distribution.PyROA (Donnan et al. 2021) takes a slightly dif- ferent approach: it performs a Bayesian analysis on the light curves with a running optimal average, from which the mean lag and the shape and width of the transfer function can be simultaneously constrained.The later lag measuring techniques are more reliable in recovering the lag and its uncertainties when the light-curve quality is insufficient for the other non-parametric methods to measure a significant lag (e.g., Li et al. 2019;Yu et al. 2020).Nevertheless, earlier correlation analysis methods are less model-dependent, and provide important confirmation of the mean lag measurements.In particular, we found that the empirical ICCF correlation coefficient provides an efficient and intuitive metric on the successful detection of a lag.For this reason, information in the original ICCF will be used in determining the success of a lag measurement, even if the best lag is taken from more sophisticated approaches such as Javelin or PyROA.
Each pair of continuum and broad-line light curves is passed through a lag detection pipeline (wrapper) that collects publicly available packages (Sun et al. 2018;Zu et al. 2011;Donnan et al. 2021) to perform ICCF, Javelin (v0.33) and PyROA (v3.1.0)analyses.The pipeline first flags and removes outliers 5σ away from a Damped Random Walk (DRW) model fit to the entire continuum light curve, using the package celerite (Foreman-Mackey et al. 2017).This outlier rejection is only performed for the continuum, given the numerous epochs combined from different surveys.The bestfit (taken as the median of the posterior distribution) DRW parameters are adopted (and fixed) in the subsequent Javelin analysis for self-consistency.We report these best-fit continuum DRW parameters in Table 2.The best-fit damping timescales τ DRW for the SDSS-RM sample have a median value of ∼ 270 days in the quasar rest frame, which is consistent with the findings from past studies using ∼ 10 yr long light curves (e.g., MacLeod et al. 2012;Suberlak et al. 2021;Stone et al. 2022).However, there is evidence that the damping timescale may be even longer if measured from light curves with much longer baselines (e.g., Stone et al. 2022).
For the Javelin run we use MCMC parameters n chain = n walkers = n burn = 200 to provide enough sampling of the posterior over the lag search range.For PyROA we use a similar amount of MCMC samples, and turn on the extra variance and delay distribution (adopting a Gaussian kernel) options in PyROA.The pipeline computes the relevant lag measurement properties in each method and determines the best-fit lag and its uncertainty following the rules specified in §4.2.The pipeline (wrapper) is publicly available at https://pypetal.readthedocs.io/en/latest/Our long-term baseline allows the exploration of lags up to as much as ∼ 11 years in the observed frame (but the success rate will drop dramatically when approaching the maximum baseline), given the leading photometric light curves from PS1.However, quasars in the SDSS-RM sample have a wide range of observed-frame lags, from a few days to a few years.Searching the entire possible lag range enabled by the baseline is expensive (given the required MCMC samples) and unnecessary for objects with much shorter lags, and greatly increases the chance of aliases that become difficult to mitigate with our methodology described in §4.2.We therefore limit the lag search range on an object-by-object basis.
We first estimate the expected observed-frame lag, τ obs,exp , based on the median continuum luminosity over the spectroscopic baseline, using the Hβ R − L relation from Bentz et al. (2013).The C iv lag is typically shorter than the Hβ lag, and the Hα and Mg ii lags are slightly longer than that of Hβ based on past RM results (e.g., Peterson et al. 2004;Homayouni et al. 2020).We then Figure 6.An illustration of lag measurements using ICCF, Javelin, and PyROA, as well as our alias mitigation scheme using lag posterior weights.The left two panels show the light curve data in points, and the PyROA model light curves in solid lines and shaded area (best model and 1σ uncertainty).The right panels show the lag measurements.The "ICCF" panel shows the original ICCF, with the red vertical line (and number) indicating the expected (observed-frame) lag for Hβ.The "Weights" panel shows the ACF (orange), periodic overlapping light curve data points (N (τ )/Nmax) 2 (gray), and the final weights (blue).The next three panels show the lag results for ICCF, Javelin and PyROA, with the lag posterior displayed in the normalized histogram.The gray shaded band shows the boundaries of the identified primary peak, from which we measure the lag (median) and 1σ uncertainties (16th and 84th percentiles).The final lag and uncertainties are reported in each panel.For this particular object, the ICCF is relatively broad, resulting in a different lag using the cross-correlation centroid distribution (CCCD).However, the lag inferred from the cross-correlation peak distribution (CCPD; also see the top-right panel of this figure) is in excellent agreement with the Javelin and PyROA lags.
limit the lag search range within ±5 × τ obs,exp , where negative lags refer to the line variability leading the continuum variability.However, we also require a minimum search range of ±250 days and a maximum search range of ±2500 days for each object.The latter maximum lag search range is defined to be roughly the same as the spectroscopic baseline.Importantly, as we are imposing a symmetric lag search range, we do not expect a systematic bias toward aliased (positive) lag values.In a small number of cases, the lag approaches 90% of the search range, and we rerun the lag detection with expanded search ranges to avoid introducing an artificial boundary in the measured lags.
There is a small fraction (∼ 4%) of objects for which the expected τ obs,exp exceeds 1000 days (Fig. 2).For these objects, we relax the symmetry requirement on the lag search range, and utilize the full range of detectable lags given the leading years of photometry.The lag search range for this subset of quasars is [−2500, 4000] days in the observed frame.

Alias mitigation and detection of lags
The standard ICCF, Javelin and PyROA analyses do not take into account the number of LC pairs contributing to the correlation calculation at each lag.For typical SDSS-RM quality light curves, there are often a num- ber of aliasing peaks in the lag posterior distribution that impact the measurement of the true lag.We mitigate the impact from aliasing peaks in the lag posterior following the scheme introduced in earlier work (Grier et al. 2017b(Grier et al. , 2019)).In short, we calculate a weight , where N (τ ) is the number of shifted emission-line data points that overlap in the date ranges with the continuum and N (0) is the number of overlapping points at τ = 0. Shifted emission-line data points falling into seasonal gaps are not counted.Since our continuum light curve encompasses the emission-line light curve by design, N (0) is the maximum number of spectroscopic epochs.We then derive a final weight P f (τ ) = P (τ ) * ACF(τ ), where ACF(τ ) is the auto correlation function of the continuum LC and " * " is the convolution operation.The weight P (τ ) accounts for the reduction in the statistical constraint on the lag due to the fewer data points.While the exponent of 2 in the P (τ ) weight is somewhat arbitrary, we find it results in optimal rejection of aliasing peaks (e.g., Grier et al. 2019).The convolution with the continuum ACF accounts for the effect of seasonal gaps on the detection of certain lags.If the ACF declines rapidly, the annual seasonal gaps will have a significant effect on our detection because we are less likely to account correctly for the light-curve behavior during the gaps.On the other hand, if the ACF declines slowly away from zero lag, it implies it is straightforward to interpolate across the seasonal gaps as the variability is slow; in this case the seasonal gaps are less likely to have an effect on our lag measurements.In the latter case, it is possible to measure a lag even if most of the shifted emission-line data points fall into seasonal gaps in the continuum light curves (see examples in, e.g., Shen et al. 2019a).
Once we have the posterior lag distribution (for Javelin or PyROA, this is the posterior lag distribution; for ICCF, this is the cross-correlation centroid distribution; CCCD), we weight the distribution with P f (τ ).The weighted lag distribution is then smoothed by a Gaussian kernel with a width of 15 days, and we identify the tallest peak within this smoothed distribution as the "primary" peak.We identify local minima in the distribution to either side of the peak and adopt these minima as the minimum and maximum lags to be included in our final lag calculation.We then return to the original unweighted posterior, reject all lag samples that lie outside the determined range, and use the remaining samples to calculate the final lag (median of the truncated distribution) and its 1σ uncertainties (16th and 84th percentiles of the distribution).Fig. 6 demonstrates this procedure with one example of our targets (RM078).We did not detect an Hβ lag for this object using the 2014 data (Grier et al. 2017b), but the final SDSS-RM data, with multi-year coverage, are able to firmly detect a lag.In Appendix D, we show several additional examples that sample different lag detection qualities, and all lag diagnostic figures are available from our public server (see Appendix A).Fig. 7 shows the distribution of ICCF lag measurements in the lag versus r max plane (e.g., Shen et al. 2016;Grier et al. 2017bGrier et al. , 2019;;Homayouni et al. 2020), where r max is the maximum Pearson correlation coefficient within ±1σ from the reported ICCF lag, which can be positive or negative.For all four lines, there is a clear asymmetry toward positive lags (i.e., line lagging contin-  In most cases, our fiducial PyROA lags are consistent with the ICCF lags.However, for our multi-year light curve data, the Javelin method often traps the lag in the seasonal gaps, leading to artificial clustering of lags there.uum), especially when r max is high.This figure demonstrates the statistical detection of RM lags for all four lines, given the symmetrical lag search ranges.When the light-curve quality is poor (or variability insignificant), the significance of correlation r max is low, and there are more random incidents of positive and negative lags.At r max > 0.4, the overall negative-to-positive lag ratio within the plotted lag bounds is ∼ 0.04 − 0.17, indicating a false positive rate of ∼ 4 − 17% in the reported (positive) lags.At r max < 0.4, although there are still more positive lags than negative ones, the false positive rate is substantially higher, and the measured positive lags there are of lower significance.Similar lag asymmetries are observed when we replace r max with the line variability SNR2, which is expected as successful lag recoveries rely on well-detected line variability.However, there is no rigid boundary on SNR2 that would guarantee the detection or non-detection of a lag.Fig. 7 also dictates that using a positive lag search range in ICCF will not introduce a large confirmation bias if r max > 0.4.This may justify the usage of a positive lag search range for MOS-RM samples to reduce computational time.However, it is important to use the r max value derived from the original ICCF to pre-filter low-significance lag detections.
During our lag measurements with ICCF, Javelin and PyROA, we discovered that many Javelin fits have lag posteriors that peak in the seasonal gaps.Because the ACF for SDSS-RM quasars is generally broad enough to allow the DRW model to interpolate into seasonal gaps, our weighting scheme is often unable to suppress these gap peaks.The situation is different from our previous analyses using shorter SDSS-RM light curves (e.g., Grier et al. 2017bGrier et al. , 2019;;Homayouni et al. 2020).Now with our full multi-year baseline, it becomes more common for the Javelin posterior to be trapped in these seasonal gaps, resulting in spurious lag measurements.
In these failed Javelin fits, the favored model is where the line light curve is shifted into an annual gap, with minimal overlap with the continuum light curve in order to produce a good match between the two light curves.One particular reason for this gap alignment may be underestimated systematic uncertainties in the flux measurements that cause tension in matching up the continuum and line light curves.To remedy the situation, we deploy PyROA, which includes an excess variance term to account for unknown systematic uncertainties in flux measurements.Like ICCF, PyROA is an empirical method that does not rely on a physical DRW model to interpolate the light curves.However, PyROA implements modules to account for the smearing of the line light curve due to the transfer function and for excess variance in flux measurements, with MCMC sampling to produce the lag posterior and rigorous lag uncertainties.Therefore, PyROA preserves the advantages of being model-independent, while being less prone to underestimated flux uncertainties than Javelin.
We visually inspect the ICCF, Javelin and PyROA results for each object, and conclude that indeed PyROA returns the most reasonable lag posterior and model light curves among all three methods.We therefore adopt the PyROA lag results as our fiducial results in the following analyses.
We now quantify the significance of individual lag measurements, and define a set of criteria to declare successful detection of a positive lag.These cases have $line$ LAG DET> 0 in Table 2. To do so, we rely on a hybrid set of parameters from the ICCF and PyROA results: • The PyROA lag must be positive at ≥ 2σ significance.Although lower-significance detection might still be valid, the large uncertainties will render their measurements less useful, and could complicate our comparative analyses.
• The original ICCF must reach Pearson correlation coefficient r max > 0.4 within ±1σ of the reported PyROA lag.
• Less than half of the lag posterior samples can be removed by our alias-removal procedure.If this procedure eliminates more than half of the samples, it indicates we lack a solid measurement of the lag as most of the samples are rejected.For PyROA lag posterior, this criterion is usually always satisfied.
As a final check, we visually inspect all detected lags (see §4.3) and confirm there are no peculiarities in the light curves (e.g., predominantly corrupted data) or in the measured lag posterior.The choice of a threshold of r max > 0.4 is somewhat arbitrary, but provides an efficient means to remove a large number of false positives.If we lower this threshold, we would recover more lags, but the contamination rate will also rise quickly.On the other hand, we caution that imposing more stringent selection criteria may introduce additional selection biases in the resulting lag sample, and artificially reduce the observed (intrinsic) scatter or bias the slope in the R − L relation.
In Appendix C, we show the results with more stringent quality cuts on detected lags, which are generally consistent with our fiducial results using the full sample.However, there are noticeable differences when we limit to the manually inspected subset with the highest grades.These highest-grade lags, while being the most reliable, are limited to well-detected lags much shorter than the baseline.Thus using this subset of lags imposes severe selection biases in the R − L plane.
In Fig. 8, we compare the detected lags for the four lines using the three methods.As expected, overall the PyROA lags agree with ICCF lags, with the latter producing larger lag uncertainties.On the other hand, Javelin produces too many artificial lags within the seasonal gaps, although there is agreement between Javelin and PyROA for many objects.The underperformance of Javelin on large MOS-RM samples with multi-year light curves is unexpected. 2 Considering the flexibility of adding extra variance in the light curves with PyROA, and the excellent agreement between PyROA results and the empirical, model-independent ICCF results, we recommend PyROA as an efficient method to measure lags for MOS-RM data.

Manual inspection of lags
For the manual inspection procedure, we follow the convention of our earlier work (Grier et al. 2017b(Grier et al. , 2019;;Homayouni et al. 2020) to assign an integer grade (1−5; higher grades correspond to better quality) to each detected lag (see below).This manual inspection process is of course subjective, but nevertheless provides a useful means to select the most reliable lags.Subsequently we consider the subset with lag grade= 4 or 5 the "Gold" sample.These lag grades are included in Table 2.
For detailed manual inspection, the grading criteria are as follows: 1. Spectral Variability: We use the RMS spectrum to ensure that the emission line of interest is variable over 7 years of observation.Therefore, we inspect the 7-year RMS spectrum to ensure that the line variability signal is present, and is not overlapping with the edges of the spectrum.
2. Light Curve Variability: The continuum and line light curves demonstrate sufficient variability, and features that deviate from a smooth trend, suggesting underlying dynamic processes.
3. Lag Posterior Peak: As an additional check on the quality of the measured lags, we require the PyROA lag pdf has a well-defined primary peak away from the lag search range boundaries.

Lag Validation:
We inspect the superposition of the shifted line light curve onto the continuum light curve to validate the measured lag against the continuum and line light curves.This is particularly important due to the seasonal gaps in SDSS-RM observations.This could coincide with lags of approximately 180, 540, etc. days, which could introduce false positive lag detections in our 7-year measurements.This validation process consists of two steps: • Good alignment between the continuum and line light curves suggest that the measured lag represents a reliable estimate of the underlying light travel time delay, characteristic of physical RM lag.
• If the measured lag coincides with the seasonal gap, indicating fewer overlapping data points between the continuum and the shifted line light curve, then we examine to see if the overall trends between the two light curves and correlated variations match a reverberation picture.
In either case, if the superimposed light curves show excessive overlap due to high data variance, we consider the light curves are an inadequate match for reliable lag determination.
The visual inspection procedure introduces complex selection functions on the resulting best-quality lags.As a consequence, we caution on the danger of removing certain lags using visual inspection from the sample in measuring the R − L relation.We do not recommend on using the "Gold" lag sample alone or in combination with other heterogeneous lag samples to measure the R − L relation.On the other hand, studies that are not affected by lag selection effects, such as the investigation of host galaxy properties, can utilize this "Gold" sample for best-measured RM masses.

False positive rate for individual lags
Using Javelin or PyROA to estimate the false positive rate (FPR) for individual lags would be prohibitive, given the large number of lag detections in the SDSS-RM sample and the computational demands of running Javelin or PyROA for the full light curve sets.However, Fig. 7 demonstrates that using the much faster ICCF method to estimate the FPR is a reasonable approach.Here we pair the observed line light curve with randomly generated continuum light curves for mock lag measurements.To capture the variability characteristics of individual objects, we use the best-fit DRW parameters for each object to generate random continuum light curves.
For each detected positive lag, we generate 100 mock continuum light curves, and perform the ICCF analysis on each realization.If a mock lag detection from the CCCD has r max > 0.4 (within 1σ range of the CCCD median) and recovers a positive lag at ≥ 2σ, we count it as a false positive.The fraction of such incidents among the 100 mock realizations is recorded as the FPR for the observed lag.
For the detected lags (i.e., $line$ LAG DET> 0 in Table 2), the median FPR for each line ranges from 6% to 14%, which is in line with the expected rates using our r max cut ( §4.2).There is a small fraction of detections (0%, 5%, 5% and 15% for Hα, Hβ, Mg ii and C iv) for which the estimated FPR > 0.3.However, visually inspecting these cases, the light curves are often significantly variable and the lags robustly detected.Some examples are RM078 (Hβ lag), RM101 (Hα and Hβ lags), RM578 (Mg ii), RM256 (C iv), etc.We argue that the FPR has been significantly overestimated in these cases for the following reasons: (i).The dominant variability feature in the light curve is a broad bump or dip.Given the typical DRW damping timescale, the mock continuum light curve often contains multiple peaks/dips over the 11-year baseline, leading to aliasing lags in the cross-correlation, some of which will be identified as positive detections.In that sense, even for some of the best measured lags in the local RM AGN sample, this FPR estimation approach would produce a non-negligible fraction of false positives, if we use simulated continuum light curves that are much longer than the measured lag.
(ii).Our lag search range is generous (e.g., ±5 times the expected lag).When combined with the long light curve baseline, we will often recover an alias lag with the mock continuum light curve, boosting the FPR.In other words, the prior information on the anticipated lag is not used in this FPR estimation.
(iii).Using ICCF in the FPR estimation is efficient, but not as robust as using PyROA in measuring the lag, which may also tend to overestimate the FPR.
Given these reasons, and our visual inspection of detected lags with high FPR estimates, we conclude that the individual FPR estimates should not be used as the sole metric to assess the robustness of individual lag measurements.Some high-fidelity lag detections (e.g., Fig. 6) may have a high FPR.We do not recommend to use the reported FPR to select a refined lag sample, unless the lag measurement quality (e.g., r max and lag errors) and diagnostic plots are checked simultaneously.
Nevertheless, the most important utility of our FPR test is to ensure consistency.The statistical behaviors of our FPR estimates confirm that our reported lags have a low overall false positive rate of ∼ 10%, in concordance with our estimates in §4.2 based on the positive/negative lag asymmetry (e.g., Fig. 7).

The virial coefficient f
To scale the measured virial product, VP ≡ cτ rest ∆V 2 /G, to the RM BH mass we need to multiply by the virial coefficient f .The exact value of f depends on the choice of velocity indicator ∆V , which can be the FWHM or line dispersion (σ line ) measured either from the mean spectrum or from the rms spectrum (e.g., Collin et al. 2006;Wang et al. 2020b).Throughout this paper we focus on the line dispersion measured from the rms spectrum, σ line,rms , as commonly adopted in RM studies.Given a specified line width, this scale factor depends on the geometry and kinematics of the BLR gas, and can differ significantly from object to object.For example, if the BLR gas is primarily distributed in a flattened structure as suggested in earlier observations of broad Balmer lines (e.g., Wills & Browne 1986;Shen & Ho 2014;Mejía-Restrepo et al. 2018), then more face-on BLRs require larger f -factors than more edge-on BLRs.The situations may be more complicated for realistic kinematics of the BLR gas that could involve non-virial motions, e.g., for the high-ionization C iv line (e.g., Proga et al. 2000;Waters et al. 2016).
For traditional RM work that utilizes the average lag, the virial coefficient f is usually determined by comparing the virial products to the expected BH masses from the local M BH − σ * relation (e.g., Onken et al. 2004;Graham et al. 2011;Park et al. 2012;Woo et al. 2013).These calibrations can only provide an average viral coefficient ⟨f ⟩ for the sample (because the M BH − σ * relation itself has significant intrinsic scatter), varying from ⟨f ⟩ = 2.8 (Graham et al. 2011) to ⟨f ⟩ = 5.9 (Woo et al. 2013)  Recent progress on dynamical modeling of the BLR with RM data has made it possible to derive the individual virial coefficient for AGNs (e.g., Pancoast et al. 2014;Grier et al. 2017a;Williams et al. 2018;Li et al. 2018;Williams et al. 2020;Bentz et al. 2021;Li et al. 2022;Villafaña et al. 2022;Bentz et al. 2022).By dynamically modeling the BLR and constraining the model parameters with high-quality RM data, this approach can derive the underlying BH mass along with the geometrical and dynamical properties of the BLR.Comparing the dynamical BH mass with the average-lag-based virial product, one can derive the required virial coefficient for individual systems, rather than an average virial coefficient from the M BH − σ * relation.This approach, while still dependent on the details of the dynamical modeling, eliminates the systematics and limitations in the estimated average f -factor with BH-host relations.
Based on 16 local RM AGNs with available dynamicalmodeling BH masses, Williams et al. (2018) found an average ⟨log f ⟩ = 0.57 dex with a dispersion of 0.19 dex, which is consistent (within 1σ) with the average f -factor derived from the M BH − σ * relation.Several studies (e.g., Pancoast et al. 2014;Williams et al. 2018) also studied the correlations between the individual f -factor and AGN properties and found no significant trend (but see Linzer et al. 2022;Villafaña et al. 2023, for the latest results on potential correlations).However, the individual f -factor strongly depends on the inclination of the BLR derived from dynamical modeling, where the BLR gas is often distributed in a thick-disk geometry.
Here we compile the latest sample of RM AGNs with dynamical modeling BH masses from the literature.This sample contains 30 objects (NGC5548 has two independent measurements) and the details are summarized in Table 3.We only compile the virial product computed using the line dispersion measured from the RMS spectrum, since this is the most commonly adopted line width in RM work.Fig. 9 (top) shows the distribution of the individual f -factors (log f (σ line,rms )) in the sample, which is peaked around log f ≈ 0.6.
To quantify the mean and intrinsic dispersion in log f (σ line,rms ), we follow the approach in, e.g., Pancoast et al. (2014).We assume the intrinsic log f distribution is a Gaussian distribution with mean value µ logf and dispersion σ logf .The posterior probability distributions of parameters µ logf and σ logf , given the sample of measured log f , are calculated following appendix B in Pancoast et al. (2014) that takes into account the posterior probability distribution function (PDF) of individual log f measurements.Because these dynamical modeling papers typically do not publish the full PDF of log f , we simply adopt symmetric Gaussian errors for these log f measurements.The constrained marginalized PDFs for µ logf and σ logf are shown as black and red lines in Fig. 9 (bottom), where the peak and 16th/84th percentile of Note-For each of the lag measuring methods (ICCF, Javelin and PyROA ), we provide the best lag (* LAG), lag uncertainties (* LAG ERR), bounds of the identified primary peak in the lag posterior (* PEAK BOUNDS), the fraction of rejected samples (* FRAC REJ), and the maximum Pearson correlation coefficient within ±1σ of the best lag (* LAG RMAX).In a handful of cases, the reported PyROA lag may lie outside the LAG SEARCH range.These are cases where the original PyROA lag measurement reached 90% of the search boundary, and we have expanded the search range (for that specific line) to the maximum allowed range ([−2500, 4000] days).Finally, we caution that the false positive rate estimate for individual lag detections ($line$ LAG FPR) is often an overestimation, and should be combined with the lag measurement diagnostic plot to assess the fidelity of the reported lag. the distribution are marked.The constrained mean log f is µ logf = 0.62 ± 0.08, and the constrained intrinsic dispersion σ logf = 0.32 +0.08 −0.06 , where uncertainties are 1σ.Alternatively, given the set of log f measurements and their uncertainties, we can calculate the mean and intrinsic dispersion in log f using a maximum-likelihood (ML) estimator described in Shen et al. (2019b).This ML approach also assumes that the intrinsic log f values are Gaussian distributed around mean value µ ML with a dispersion of σ ML .The results are µ ML = 0.62 ± 0.07 and σ ML = 0.31 ± 0.07, which are nearly identical to the results using the PDF approach above.
Thus we conclude, based on a sample of 30 independent measurements of log f (σ line,rms ) with both VPs and dynamical-modelling BH masses, that the mean loga-rithmic virial factor is ⟨log f ⟩ = 0.62 ± 0.07.This value is consistent with most of the recent determinations of the virial factors either using BH-host scaling relations or using dynamical masses from RM (Onken et al. 2004;Graham et al. 2011;Park et al. 2012;Woo et al. 2013;Ho & Kim 2014;Pancoast et al. 2014;Grier et al. 2017a;Williams et al. 2018).Importantly, this literature sample compiled in Table 3 allows us to robustly constrain the intrinsic dispersion of the individual virial factor to be ∼ 0.3 dex (or a factor of two in f ).This dispersion in individual f factors sets a typical systematic uncertainty of RM-based BH masses derived using a single, average f factor.This intrinsic scatter in log f is notably larger than (but consistent within ∼ 2σ) the value of σ logf = 0.14 ± 0.10 reported in Williams et al. (2018) based on 16 objects with RM data and dynamical masses.The additional objects in our sample require a larger σ logf to account for their dispersion.
In our analysis we have only focused on the f -factor based on σ line,rms , and ignored potential dependences of the f -factor on AGN properties.Villafaña et al. (2023) used a similar AGN sample with dynamical BH masses to investigate correlations between the virial factors based on different line width definitions and AGN properties, for Hβ only.Such correlations, if confirmed with future larger samples with dynamical BH masses, will further shed light on the structure of AGN BLRs and improve BH mass estimation using average RM lags.
In the following discussion, we adopt the same average f -factor for all four broad lines.As shown in §5.1, the overall agreement between the RM BH masses from two different lines indicates adopting the same f -factor for all lines is a reasonable approach.However, we acknowledge that the SDSS-RM sample is different from the local RM AGN sample in terms of luminosity and BH mass.Whether or not the average f -factors differ among different samples is a topic for future studies.

Lags and RM masses
Our fiducial RM masses are computed from the measured broad-line lags combined with the line dispersion measured from the RMS spectrum of the monitoring data.In mathematical form: where ⟨f ⟩ is the geometric mean f factor when using σ line,rms for the line width.The PrepSpec σ line,rms measurements are based on the continuum-subtracted RMS spectrum, which corresponds to the true line-only variability.Some earlier approaches (e.g., Peterson et al. 2004) generate the RMS 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 Redshift spectrum from the (total) emission line+continuum spectra and measure the line dispersion by removing a local continuum from the RMS spectrum.The latter approach can produce line dispersions that are on average biased low by ∼ 20 − 30% if the monitoring period is not much longer than the lag (e.g., by a factor of ≲ 3, Barth et al. 2015;Wang et al. 2019).Fortunately, this monitoring duration criterion is satisfied in most of the recent low-z RM campaigns listed in Table 3, and many of these campaigns (such as the LAMP project) measured σ rms from the properly constructed line-only RMS spectrum.Therefore we can safely apply the mean f factor derived in §4.5 to the SDSS-RM sample.We compile our lags and RM masses in Table 2.In Fig. 10 we show the distribution of RM masses and redshift for the SDSS-RM sample.With ∼ 300 unique objects, this sample represents the largest sample of quasars with direct RM-based BH masses up to z ∼ 3.5.

RM results
For a MOS-RM program like ours that targets a fluxlimited quasar sample across broad redshift and luminosity ranges, the common baseline of the program may impose significant selection biases in the measured lags, missing long lags that cannot be measured, given the length of the light curves.Fortunately for the SDSS-RM program, the final light curves span an effective baseline of 11 years, sufficient to recover the vast majority of expected lags for the SDSS-RM sample (see Fig. 2).
Fig. 11 shows our measured lags for the four lines in their respective R − L planes, where luminosity is taken as the usual local continuum luminosity corresponding to each line and is the median luminosity over the spectroscopic baseline.For L5100, we have corrected for host contamination, using the estimated host fraction (within the SDSS fiber) from the 2014 spectroscopy (Shen et al. 2015b).When fitting an R − L relation to these data, we only use the SDSS-RM sample, rather than combining the SDSS-RM sample with other literature samples.This is because by number the SDSS-RM sample overrepresents the population over the probed luminosity range, and would likely dominate the global fit.A joint fit with other lag samples probing different luminosity ranges will be the objective of future work, after selection effects are properly taken into account.
For Hβ and Mg ii, the luminosity range in the SDSS-RM lag sample spans ∼ 2 dex and a R − L relation is clearly present.Overall the SDSS-RM lag sample follows the reported global R−L relation in the latest studies that derived such relations using heterogeneous samples (Bentz et al. 2013;Yu et al. 2023).However, the Hβ lags for SDSS-RM quasars on average fall slightly below the local relation by ∼ 0.1 dex, if we fix the slope to that in the local relation in the regression.A similar, albeit slightly larger, average offset in Hβ lags was reported in earlier SDSS-RM results based on shorter baselines (Grier et al. 2017b;Fonseca Alvarez et al. 2020) 2013) and other samples.In particular, the SEAMBH collaboration has targeted local AGNs with high accretion rates, and found that the lag deviation from the canonical Bentz et al.R − L relation can reach as much as a factor of few (e.g., Du et al. 2016;Du & Wang 2019).Different accretion rates of the AGN lead to different ratios between the ionizing continuum (directly responsible for the lag) and the local continuum near the line, which in turn leads to an offset from the Table 3. Compiled VPs and dynamical-modeling BH masses from previous RM studies.For simplicity, we have symmetrized all uncertainties in order to use these uncertainties in our PDF calculation ( §4.5).Reference keys are: Barth11b (Barth et al. 2011), Barth13 (Barth et al. 2013), Barth15 (Barth et al. 2015), Bentz09b (Bentz et al. 2009) Note-a log f values directly taken from Pancoast14, which did not include errors in log VP. b VP in Busch14 was computed using σ line from the mean spectrum and an estimated lag from the mean R − L relation (Bentz et al. 2013).Note that NGC5548 has two independent log f measurements, both of which are included in our analysis.
mean R − L relation.From a theoretical point of view, it is expected that there is a dispersion in lags at fixed optical luminosity caused by the diversity in BH mass and accretion rate and subsequently the underlying SED (Wu et al., in prep).
For Mg ii, we found that there is a strong R − L relation over the luminosity range probed by our lag sample (Fig. 11).The slope is somewhat shallower than 0.5.Combining the OzDES-RM results on Mg ii lags and literature Mg ii lag measurements, Yu et al. (2023) measure a global R − L 3000 relation for Mg ii, that is consistent with our measurements.The intrinsic scatter in the Mg ii R − L relation is ∼ 0.32 dex.
The luminosity range for the SDSS-RM C iv lag sample is limited to ∼ 1 dex.Thus, similar to Grier et al. (2019) (Bentz et al. 2013;Yu et al. 2023;Kaspi et al. 2021).The light blue lines are random draws from the linear regression of predicting log τ with log L using the Bayesian approach in Kelly (2007).The blue dashed line is a forced regression fit with slope fixed to that measured in earlier work.Overall, measured SDSS-RM lags follow these earlier relations, but there are notable deviations (in particular for C iv).Our lag measurements suggest that the Hβ and Mg ii R − L relations are reasonably tight (with an intrinsic scatter of ∼ 0.3 dex in lag), while the C iv R − L relation has substantially larger scatter (∼ 0.5 dex) than the other two lines.The Spearman rank-order test results on the SDSS-RM sample (correlation coefficient ρ and p-value p) are marked in each panel.These results are discussed in detail in §5.1 and Table 4.  .Comparisons between MRM from two lines for the same object.For Mg ii, we have corrected the rms line dispersion for the velocity split V ≈ 750 km s −1 of the doublet (σtrue = σ 2 measure − (V /2) 2 ).There is general agreement between the RM masses measured from two different lines, validating the RM technique of measuring BH masses.The two most significant outliers in the line comparisons are marked with their RMIDs.
using the SDSS-RM sample alone.Kaspi et al. (2021) recently compiled literature C iv lag results that span ∼ 8 orders of magnitude in luminosity and derived a global R − L relation for C iv.The SDSS-RM lags generally fall on this relation, with a slight offset to longer lags.It is possible that we were able to recover these longer lags given the baseline of our light curves.But more importantly, within the luminosity range probed by the SDSS-RM C iv lag sample, the scatter around a mean R − L relation is ∼ 0.5 dex, substantially larger than those seen for Hβ and Mg ii.This large scatter argues against using the single-epoch C iv BH mass estimator, which assumes there is a tight R−L relation for C iv.Since C iv is a high-ionization line, the diversity in the SED due to accretion parameters could be the main driver for the large scatter of lags around the mean R−L relation, where we are using the local continuum L 1350 near C iv as the luminosity indicator.
Given the large sample size of SDSS-RM, there are subsets of quasars for which we successfully measured lags for two lines.Fig. 12 compares the lags measured from two different lines in the same quasar.Compared with Hβ lags, Hα and Mg ii lags are on average longer, while C iv lags are somewhat shorter than Mg ii lags overall (but the sample size is small).These results are consistent with earlier RM results for local AGNs (Peterson et al. 2004), where BLR stratification and radiative transfer effects are invoked to explain these lag differences (e.g., Goad et al. 2012;Guo et al. 2020).
We also compare the RM masses from two different lines in the same quasar in Fig. 13.The RM masses between two lines are generally consistent with each other, suggesting that the line dispersion measurements are adequate.In addition, this consistency implies using the same virial factor (derived from Hβ) for all lines does not introduce significant mass offsets in the average sense.Figure 14.Comparisons between MRM and MSE (using earlier SE recipes) for the same line.Hβ has the best correlation while C iv has the worst correlation.However, for all three lines, there are good correlations between σ line,rms and FWHMmean, so the poor mass correlation for C iv is mainly driven by the poor R − L relation.Because the single-epoch mass is solely determined by continuum luminosity and line width, its dynamic range can be artificially suppressed compared with RM-based masses (Shen 2013).In the bottom panels, the blue dotted line is a simple median of the residuals, rather than the average offset weighted by measurement uncertainties.
There are, however, individual objects for which the RM masses from two lines differ significantly.We examined the lag and line-width measurements for these objects, and found overall these measurements are reasonable. 3 It is possible that these objects have different BLR kinematics for different lines and their respective virial factors differ (e.g., Runnoe et al. 2013;Fries et al. 2023).Either way, our results highlight the importance of deriving velocity-resolved RM results for Mg ii and C iv to fully understand the BLR structure and kinematics for these lines.

Comparison with earlier SDSS-RM results
3 We found two exceptions where the lag measurements may be problematic.For RM078 (the most significant outlier in the Hβ-Mg ii comparison), the Mg ii lag is much longer than Hβ, which is potentially due to different BLR structures for the two lines.For RM815 (the most significant outlier in the Mg ii-C iv comparison), the variability in Mg ii and C iv is weak and both lag measurements may be spurious.
Our final lag results are generally consistent with earlier SDSS-RM lag measurements based on shorter baselines (Shen et al. 2016;Grier et al. 2017bGrier et al. , 2019;;Homayouni et al. 2020).However, over the multi-year baseline, some SDSS-RM quasars reported in earlier work have undergone significant luminosity changes, resulting in changes in lags as well.One notable improvement of our final light curves is that we are able to recover many additional lags that were missed in earlier studies (e.g., Fig. 6).Those lags require either longer baselines or the capture of significant variability features for their successful measurements.In terms of RM masses, some quasars have significantly revised values compared with their earlier SDSS-RM results, due to changes in the RMS line width measurements (i.e., PrepSpec has been run on different light-curve baselines), as well as changes in the lag.While there is no overall concern on these final RM masses upon our random inspections, we will perform a more thorough investigation on individual cases and update their RM masses if necessary in the near future.

Shen et al.
On the other hand, the much longer final baselines also led to more aliases in the lag measurements for some objects.One such example is the C iv lag in RM231.In previous work with 4-year SDSS-RM data, Grier et al. (2019) was able to measure an observed-frame C iv lag around 200 days for this object.Examining the ICCF for the C iv lag measurement in RM231 with the final SDSS-RM data, the peak around 200 days is still there and strong.But there are additional ICCF peaks, and the highest one happened to be in the negative lag regime, leading to a non-detection for this object.Nevertheless, the number of such missed lags is negligible compared with the additional lags recovered with the final SDSS-RM light curves.

Comparisons with single-epoch estimators
As mentioned earlier, SE mass estimators are extensively used to estimate quasar black-hole masses near and far (see the review in, e.g., Shen 2013).There are a large number of SE calibrations for each line from various groups (e.g., Greene & Ho 2005;Vestergaard & Peterson 2006;Wang et al. 2009;Shen & Liu 2012;Ho & Kim 2014;Park et al. 2017;Coatman et al. 2017;Dalla Bontà et al. 2020;Dix et al. 2023).The differences in the luminosity and line-width slopes, the adopted virial f factor, and the methodologies of measuring line widths, combine to produce systematic differences among these SE mass recipes even for the same data.There have not been detailed comparisons between SE and RM masses for the same line and for a large statistical sample (e.g., N > 10), except for Hβ using the local RM AGN sample (e.g., Vestergaard & Peterson 2006).
The validity of single-epoch mass estimators relies on two conditions: (1) a tight R − L relation; (2) a tight correlation between line widths measured from the mean (or single-epoch) spectrum and from the rms spectrum.Condition (2) is generally satisfied for the four major broad lines considered here (cf.Fig. 5), although anomalous line "breathing" behaviors would lead to luminosity-dependent biases in the derived SE masses (e.g., Wang et al. 2020b).
However, Condition (1) has only been tested for Hβ, but not for Mg ii and C iv, despite recent efforts in deriving a global R − L relation for Mg ii and C iv using heterogeneous samples, for which selection functions are not well quantified.As discussed in §5.1, there is evidence for a global R − L relation for Hβ and possibly for Mg ii as well, although it seems additional physical parameters are involved to introduce scatter in these two R − L relations.For C iv, an R − L relation is only apparent when the dynamic range in luminosity is sufficiently large (Kaspi et al. 2021).Compared with Hβ and Mg ii, there is substantially larger scatter in C iv lags around a mean R − L relation, as probed by the SDSS-RM sample (Fig. 11).As a result, we expect the correlation between RM mass and SE mass would be the worst for C iv.
Fig. 14 compares RM and SE masses for Hβ, Mg ii and C iv.For this comparison, we use fiducial SE estimators from Vestergaard & Peterson (2006) (Hβ and C iv) and from Shen et al. (2011) (Mg ii). 4 As for the luminosity and line width in the SE mass calculation, we adopt the median luminosity over the 90 spectroscopic epochs, and the FWHM measured from the mean spectrum generated from the 90 epochs.By doing so we are ignoring additional scatter in SE masses from anomalous line "breathing", where for the same quasar, varying luminosity does not lead to anticipated changes in the single-epoch FWHM value (Wang et al. 2020b).
Fig. 14 demonstrates that these SE mass recipes can produce average values that are within ∼ 0.2 dex of the RM masses, with typical scatter of ∼ 0.4 − 0.5 dex.This systematic offset can be explained by the differences in the adopted R − L relation, the average f factor, or systematic differences in the line-width measurements.As expected, the general correlation between SE and RM masses is the best for Hβ, and the worst for C iv.In fact, using Mg ii and C iv SE masses could artificially narrow the mass dynamic range.Namely, for a fluxlimited quasar sample like the SDSS-RM sample, the dynamic range in mean FWHM is smaller than that in σ line,rms , and the dynamic range in L 3000 or L 1350 is smaller than that in BLR lags.Thus the dynamic range in SE masses would be smaller than the dynamic range in RM (true) masses.These caveats of SE masses for flux-limited samples were discussed in detail in Shen (2013), and procedures to account for the resulting statistical biases in the BH mass distribution are outlined in e.g., Shen & Kelly (2012) and Kelly & Shen (2013).

Improved SE mass recipes
Next, we consider potential refinements of SE mass recipes using our new measurements of RM masses for Hβ, Mg ii and C iv.The SE estimator is defined as: 4 The reason that we adopt the Mg ii SE recipe in Shen et al. (2011) instead of that in, e.g., Vestergaard & Osmer (2009), is that the former yields consistent results compared with both Hβ and C iv SE recipes in Vestergaard & Peterson (2006) for luminous SDSS quasars.This apparent agreement is likely a coincidence from the methodologies of measuring the mean FWHMs from the spectra.There is no consensus on what specific SE mass recipes are the best, since the calibration sample and spectral fitting methodologies vary from study to study.

88
Note-Default units for L 5100,host−corr , L3000, L1350, line widths, time delays, and BH masses are 10 44 erg s −1 , 10 45 erg s −1 , 10 45 erg s −1 , km s −1 , days, and M⊙.Linear regressions for the R − L relations and width correlations were performed using the Bayesian approach in Kelly (2007).Regressions with fixed slopes were performed using a MCMC model with emcee.Line dispersion for Mg ii has been corrected for the velocity split of the doublet.
where L is the corresponding local continuum luminosity for each line.By default, luminosity is in units of 10 44 erg s −1 for L 5100 and 10 45 erg s −1 for L 3000 and L 1350 , roughly the midpoint of the luminosity distribution for each line sample.Here, both continuum luminosity and FWHM are measured with negligible measurement uncertainties, since they are the average values during the monitoring period.We only consider FWHM mean for the SE mass recipes, given its greater reproducibility compared with the more challenging σ line,mean measurements (see discussions in, e.g., Wang et al. 2019).As shown in Fig. 5, there are good correlations between FWHM mean and σ line,rms used for RM mass calculations, justifying the use of FWHM in SE recipes.Nevertheless, there are physical motivations to use σ line instead of FWHM in SE recipes, as reasoned in, e.g., Dalla Bontà et al. (2020).
Because there are correlations between FWHM mean and σ line,rms , and correlations between continuum luminosity and BLR size for each line, we expect M SE as defined in Eqn.(2) will be correlated with M RM in general.With the right choices of slopes α and β, the calibration of SE mass recipes becomes a 1-parameter fitting problem to constrain the normalization C in Eqn.(2) using a sample of quasars with M RM measurements (e.g., Vestergaard & Peterson 2006).One option is to adopt slopes that are similar to those in the measured R − L relation and FWHM mean − σ line,rms correlations.In the work of Vestergaard & Peterson (2006), they adopt luminosity slopes of α ≈ 0.5, and β = 2.The adopted luminosity slopes are consistent with the expected canonical value, and the adopted line-width slopes assume there is a linear relation between FWHM mean and σ line,rms .
One subtlety in the above calibration procedure is that the regression is performed as matching the observed log M RM (Y ) against the model log M SE (X), such that the distribution of (Y |X) is unbiased (i.e., the expectation value ⟨log M RM ⟩ = log M SE at fixed log M SE ).However, to design an unbiased SE mass estimator, we want ⟨log M SE ⟩ = log M RM at fixed true mass log M RM instead.If the scatter in the R − L relation and in the FWHM mean − σ line,rms relation is small, it would be adequate to directly use the slope in the R − L relation (predicting R with L) and the slope  4) for the same line.The new Mg ii SE recipe is now an approximately unbiased estimator (i.e., at a given RM mass, they return an unbiased expectation value), while the Hβ and C iv SE recipes are still biased estimators (as we adopted similar slopes on L and mean FWHM as the earlier recipe in Vestergaard & Peterson (2006).However, the normalizations have been recalculated using our RM samples.
in the FWHM mean − σ line,rms (predicting σ line,rms with FWHM mean ).In reality, there is substantial scatter in these correlations, and the slope of predicting Y at X could differ significantly from that of predicting X at Y .These differences in correlation slopes for scattered samples are apparent in Fig. 5 and Fig. 11, as well as the compiled regression results in Table 4.So how do we choose slopes of α and β for each line?The SDSS-RM lag sample alone does not constrain the R − L relations particularly well to allow fitting α and β as free parameters.Instead, we follow Vestergaard & Peterson (2006) to fix these slopes, and use the lag sample to determine the normalization C in Eqn.(2).
For Hβ and C iv, we adopt α = 0.5, the canonical slope in the R−L relation, and β = 2.These choices are similar to the choices in Vestergaard & Peterson (2006).Adopting β = 2 is reasonably well justified as the bisector relation between FWHM mean and σ line,rms does appear to be approximately linear for Hβ and C iv (Fig. 5) -recall for an unbiased SE estimator, we require the re-gression relation of FWHM mean |σ line,rms .Our measured R − L relation for Hβ is slightly shallower than but consistent with α = 0.5 (e.g., Bentz et al. 2013).We are unable to robustly constrain the slope of the R − L relation for C iv, but α = 0.5 is roughly consistent with the findings in Kaspi et al. (2021).The comparison between the uncalibrated M SE and the measured M RM for Hβ and C iv are shown in the top row of Fig. 15.For Hβ, the correlation is nearly diagonal, indicating our choices of α and β are reasonable.For C iv, however, the correlation is poor, mainly resulting from the large scatter in the C iv R − L relation.
For Mg ii, we adopt somewhat different slopes of α and β.The measurement of the Mg ii R−L relation is still in an early stage, and a slope different from 0.5 is possible.In addition, the FWHM mean −σ line,rms is more non-linear than those for Hβ and C iv (Fig. 5).This non-linearity implies β should not be 2, since the fiducial RM masses are computed as ∝ σ 2 line,rms .We found that in order to produce the most consistent results, we require β > 2 and α > 0.5 for Mg ii.After experimenting, we adopt α = 0.6 and β = 3.0 for Mg ii.This luminosity slope is not too different from the canonical value, and this FWHM mean slope is consistent with the measured slope in predicting FWHM mean with σ line,rms (see Table 4).Compared with the canonical case of α = 0.5 and β = 2, our fiducial Mg ii recipe produces: (1) a less biased estimator at fixed M RM ; (2) more consistent results with Hβ and C iv SE masses for the common objects, over the luminosity range probed by SDSS quasars.
Given the chosen slopes of α and β, we show the uncalibrated log M SE against log M RM in the top row of Fig. 15.There are good correlations between the SE mass and the RM mass for Hβ and Mg ii, and a weak correlation for C iv.We then constrain the normalization C using a MCMC model that incorporates a term for intrinsic scatter, in addition to the measurement uncertainties in M RM .The constrained ranges of C are listed in Table 4.We adopt fiducial C values that are slightly different (but consistent within 2σ) from the MCMC results.These tweaked normalizations produce the best agreement between the SE masses from two lines for the same quasars.
The middle row in Fig. 15 displays the residuals in log mass as a function of continuum luminosity, which show no significant luminosity trend.The bottom row displays the residuals as a function of log M RM .By design, the Mg ii SE estimator is approximately unbiased, with no significant correlation between the mass residual and M RM .For Hβ and C iv, however, the adopted slopes are appropriate for producing unbiased log M RM at given log M SE .As a consequence, there are system-atic biases in log M SE as a function of log M RM for Hβ and C iv, as also seen in Vestergaard & Peterson (2006).Nevertheless, this SE mass bias is much less severe for Hβ than for C iv.It is possible to design less biased SE estimators for C iv and for Hβ, which we plan to investigate with more RM data from upcoming MOS-RM programs (e.g., Kollmeier et al. 2017).
Our results thus signify the severe limitations and caveats of SE BH mass recipes, especially for C iv.It is our opinion that individual quasar studies that require robust BH mass estimates should not rely too heavily on C iv SE masses.On the other hand, Hβ and Mg ii SE masses seem to reproduce the RM masses well, with an additional ∼ 0.45 dex scatter.

Caveats
The dynamical time of the BLR is typically a few years to a few decades for luminous SDSS quasars.While rms quasar variability is on average ∼ 10% over multi-year timescales, a small fraction of quasars display significantly larger variability amplitude, and sometimes even monotonically increasing or decreasing light curves over time.It is possible that the structure of the BLR in some hypervariable quasars undergoes significant changes over multi-year timescales.
In the SDSS-RM sample, several quasars have notably increased their luminosity over the past few years by more than a factor of a few (e.g., RM017, RM160, etc. Dexter et al. 2019;Fries et al. 2023).While we have attempted to measure an average lag over the full SDSS-RM baseline, we found that the measured lags are substantially longer than those measured from their earlier light curves when they were at a lower luminosity state (Grier et al. 2017b).It would be interesting to study how the BLR sizes have evolved when luminosity changes significantly over a timescale comparable to the dynamical time of the BLR, and see if there is evidence for structural changes in the BLR, in addition to the anticipated breathing effect of the BLR.
A related issue is whether or not one should remove a long-term (typically assumed linear) trend in the light curves before measuring the lag.Linear trends in the light curve do not contribute to the short-term crosscorrelated variations of interest but may tilt the resulting CCF, and therefore there is an argument for detrending before measuring the lag.On the other hand, the removal of a linear trend from the data is somewhat arbitrary: the long-term trend may not be linear but using higher-order detrending cannot be easily justified; the RMS spectra are constructed without detrending; and the removed trend may be part of a real long-term signal in the BLR response.For these reasons, we opt to mea-sure lags without detrending, as the multi-year baseline is typically much longer than the damping timescale of the light curve for our targets.Nevertheless, we perform a test of measuring ICCF lags with (linearly) detrended light curves for our fiducial lag sample.We did not find significant differences for the bulk of lags w/ and w/o detrending.
Another important caveat is that the overall detection rate for SDSS-RM quasars is low (see Table 1), as necessarily limited by the light curve quality of the project.Selection biases from incompleteness in lag detection may have consequences on the inferred R − L relations.In Appendix B we investigate the detection incompleteness for Hβ, Mg ii and C iv.We found that the success of a lag detection primarily depends on how well we can measure the line variability, e.g., via the SNR2 metric discussed in §3.3.Most of the lags are undetected simply because their line variability is not well measured (e.g., SNR2 < 20).
For Hβ, we found that the lag detection rate is very high (≳ 70%) once SNR2 reaches ∼ 35, and there is no obvious difference in quasar properties (e.g., expected lag, quasar luminosity, line equivalent width, line width, etc.) between detected and undetected cases matched in SNR2.These lag non-detections are caused by random fluctuations of the light curves that led to weak cross-correlations in the light curves.Indeed, our visual inspection of non-detection cases with SNR2 > 40 suggests that most of the cases still have some hint of a lag, but the lag is usually small and below the 2σ threshold for our lag detection.There are a handful of cases where the Hβ line is not responding to continuum variations in the expected way, which will be investigated in future work.Thus we conclude there is limited impact on the Hβ R − L relation from detection incompleteness.
For Mg ii and C iv, we found similar trends as for Hβ, and no obvious differences in quasar properties between the detections and non-detections.However, their lag detection fractions are notably lower than that for Hβ at fixed SNR2.In Appendix B we elaborate on the potential overestimation of SNR2 for Mg ii and C iv due to the insignificant improvement on the spectrophotometry with PrepSpec for these high-redshift quasars.In addition, it is possible that Mg ii and C iv in high-redshift and high-luminosity quasars are more often to show abnormal responses to continuum variations, making it more difficult to detect a lag.
Finally, there is sill room to improve the lag measurement methodology, spectrophotometry refinement with PrepSpec, and scrutiny on individual light curves.While these additional efforts are beyond the scope of the current work, we plan to investigate these issues as we continue monitoring the SDSS-RM field in SDSS-V with the Black Hole Mapper program (Kollmeier et al. 2017).In particular, we plan to improve the spectrophotometry at the blue end of SDSS spectra, which would improve the lag detection for Mg ii and C iv, as well as other UV broad lines, in high-redshift quasars.

Applications of SDSS-RM data
The rich data from SDSS-RM enable a broad array of applications.The large number of RM lags and reliable lag-based RM masses have proven valuable to study the evolution of the BH mass-host galaxy correlations toward high redshift (e.g., Shen et al. 2015b;Matsuoka et al. 2015;Li et al. 2021;Li et al. 2023).By extending RM measurements to high redshift and measuring large samples of Mg ii and C iv lags, we are starting to explore the diversity in the BLR structure for these UV broad lines, and to critically evaluate their potential for designing efficient single-epoch BH mass recipes.
In the meantime, the multi-year photometric and spectroscopic light curves for the SDSS-RM sample allow investigations of the general variability of quasars (Sun et al. 2015;Dyer et al. 2019), and of the variability of particular subsets of quasars, such as broad absorption line quasars (Grier et al. 2015;Hemler et al. 2019) and extreme variability quasars (Dexter et al. 2019;Fries et al. 2023).In addition to the unprecedented spectroscopic monitoring data that allow detailed spectral variability analyses, the SDSS-RM sample provides highquality multi-wavelength information (Shen et al. 2019b;Liu et al. 2020) for a well defined quasar sample.These data have enabled additional RM science, such as constraining the accretion disk structure using continuum lags (Homayouni et al. 2019) and modeling the propagation of variability in the accretion disk (e.g., Stone & Shen 2023).
In this paper we focused on the main science goal of SDSS-RM, i.e., the measurements of lags for major broad emission lines in quasar spectra.We will explore other topics with the final SDSS-RM data in future work.We also invite the community to exploit the rich SDSS-RM data for RM science or general AGN science.

Outlook
The results from the SDSS-RM project unambiguously confirm the feasibility and potential of performing MOS-RM in the high-redshift regime.Hβ continues to be the most reliable line for RM purposes.We have shown that it is possible to measure Mg ii lags with optical spectroscopic RM, but the increased Mg ii lag and diluted RM response make it more difficult to robustly measure the lag.Indeed, the Mg ii lag detection rate is the lowest among the four lines studied in this work (see Table 1).We also demonstrated that measuring C iv lags in high-redshift, high-luminosity quasars is possible, provided that the monitoring baseline is sufficient to constrain the long time delays.

Shen et al.
The final lag yields from SDSS-RM are broadly consistent with our predictions at the beginning of the project (Shen et al. 2015a).The forecast was made using simulated light curves of quasars with physical properties and observing cadence/baseline/depth similar to those in SDSS-RM, but the lag detection methodology and criteria in the forecast were somewhat different.Nevertheless, this overall consistency demonstrates the important value of planning a MOS-RM program with tailored simulations.In Appendix B, we provide additional lessons learned from SDSS-RM.
Given the success of SDSS-RM and other MOS-RM programs, there are several ways to improve RM studies in the high-z regime.First, the typical quality of lag measurements from current MOS-RM programs is still low-to-moderate.This is not a limitation of the principles of the MOS-RM approach, but a limitation from insufficient observing resources.Future MOS-RM programs with larger aperture telescopes (e.g., Swann et al. 2019;McConnachie et al. 2016) will provide higher S/N spectroscopy to better measure the reverberation responses in the broad line flux.These future MOS-RM programs will also be able to target fainter quasars (thus extending the sample to lower luminosities), and measure velocity-resolved lags to better constrain dynamical models of the BLR.In anticipation of these future MOS-RM programs, there are also continued efforts to expand the sample size of high-redshift lag measurements with existing MOS facilities, such as the SDSS-V Black Hole Mapper Reverberation Mapping program during 2020-2026 (Kollmeier et al. 2017).In particular, the SDSS-RM field continues to be monitored in SDSS-V, providing extended light curve baselines for a significant fraction of SDSS-RM quasars.
Besides the optical MOS-RM approach, there are also important complementary approaches to advance RM in the high-z regime.To extend the R − L relations to the most luminous quasars (e.g., L bol ≳ 10 47 erg s −1 ), it is necessary to perform decade-long monitoring campaigns for individual objects (e.g., Lira et al. 2018;Czerny et al. 2019a;Zajaček et al. 2020;Kaspi et al. 2021), since these most luminous quasars are rare and sparsely distributed on the sky.The high success rate of measuring Hβ lags at high-z and the overall best reliability of Hβ for RM mass measurements suggests the possibility of IR reverberation mapping for Hβ at z > 1.Such IR RM programs would require large-aperture telescopes to measure the Hβ response with IR spectroscopy.
Finally, the advent of high-spatial-resolution spectroastrometric measurements with interferometry, e.g., with the GRAVITY instrument on the VLTs, has enabled a new avenue toward constraining the inner struc-tures of broad-line AGNs and quasars (e.g., Gravity Collaboration et al. 2018;GRAVITY Collaboration et al. 2020, 2021).With the upgraded GRAVITY+ instrument (GRAVITY+ Collaboration et al. 2022), it will become possible to target relatively faint (e.g., K ∼ 13 − 15) quasars at z > 1. Combining the spectroastrometric measurements (e.g., Bailey 1998;Bosco et al. 2021) and RM measurements for the same quasar will provide powerful joint constraints on the underlying BLR structure and kinematics (e.g., Li et al. 2022), an approach that can also be used to measure geometric distances toward distant AGNs (e.g., Elvis & Karovska 2002;Hönig et al. 2014;Wang et al. 2020a;GRAVITY Collaboration et al. 2021;Songsheng et al. 2021).Far down the road, it may eventually become feasible to measure astrometric jitter signals due to reverberation (e.g., Shen 2012;Li & Wang 2023).These various techniques motivate continued monitoring programs to constrain AGN inner structures, measure BH masses, and improve our understanding of the growth of SMBHs and the expansion history of the universe.

Shen et al.
directory (e.g., ./rm000/),we provide the original spectra, continuum and emission-line light curves, PrepSpec outputs, and lag measurements.Additional information about the SDSS-RM sample is provided in the sample characterization paper Shen et al. (2019b).Table 5 provides a brief description of the data products available.Future updates of these products will also be distributed via this site.Below we describe the main data products released with this paper.
SDSS spectra Customarily reprocessed (e.g., Shen et al. 2015a) optical spectroscopy from SDSS is provided in the ./spec/directory in ascii format.A total of 90 epochs are available for each object.These are the original spectra before the PrepSpec run.However, these spectra were customarily reprocessed as described in Shen et al. (2015a), thus are slightly different from the spectra included in the SDSS-III/IV public data releases.
Continuum and emission-line light curves The merged continuum light curve and emission-line light curves (after processed by PrepSpec) are provided in plain ascii files with the lc.txt affix in each object directory (e.g., ./rm000/).These light curves are used in the lag measurements.The emission-line light curve files only include Hα, Hβ, Mg ii, and C iv, with prefixes ha, hb, mg2 and c4, respectively.Additional emission-line light curves, if covered by spectroscopy, will be included in the ./prepspec/directory in the future.We are aware that in rare occasions, the merging of the continuum light curves failed, with discrepant fluxes from different facilities.These corrupted light curves usually do not produce successful lags, and we did not attempt to resolve these flux discrepancies in this work.
PrepSpec outputs Model light curves, flux calibration scaling factor ( p0 t.dat), mean and rms spectra, and line width measurements from PrepSpec run are provided in the ./prepspec/directory.These outputs include all available lines covered in SDSS spectra.The flux calibration scaling factor p0 t is used to scale the SDSS spectra to improve spectrophotometry.A subset of the PrepSpec outputs are also compiled in the summary Table 2.
Lag measurements Outputs from our lag analyses are provided in the ./pypetal/directory in each object directory.These outputs include the DRW fits to the continuum light curves and outlier masks, ICCFs, Javelin/PyROA results, and weights applied to the lag calculations.The essential subset of these measurements are compiled in the summary Table 2.For convenience, we provide a lag summary plot in each object directory (e.g., ./rm000/lagsummary.pdf) to display the lag measurements for all available lines.
Software Scripts of running lag detection and post-processing are provided in the ./scripts/directory.

B. INFORMING FUTURE MOS-RM PROGRAMS
The lag yields from the final SDSS-RM data are roughly in line with the simulation forecast in Shen et al. (2015a), despite the slightly different total baseline and lag detection criteria assumed in the forecast, and the fact that the predictions were based on simulated light curves.In Fig. 16, we show the final lag detection fraction as a function of the line variability SNR2 parameter described in §3.3 (also see Shen et al. 2019b).For a future MOS-RM program, it is necessary to have a sufficient number of epochs to sample variability patterns over a sufficiently long (e.g., multiyears) spectroscopic baseline.However, in order to detect the line variability properly, each spectroscopic epoch must also achieve a significant S/N for the line flux measurement.A value of SNR2 = 20 with the benchmark SDSS-RM cadence/baseline (with N epoch ≈ 100) approximately corresponds to line RMS/ ⟨σ i ⟩ ≈ 2, where line RMS is the rms line flux variability (absolute not fractional), and ⟨σ i ⟩ is the average per-epoch flux uncertainty.Thus for a 20% fractional line variability, i.e., RMS/AVG = 0.2, we require a fractional line flux measurement uncertainty of ⟨σ i ⟩ /AVG = 10% in order to reach SNR2 = 20.Deeper epoch spectra, e.g., with 5% flux measurements, will increase SNR2 to ∼ 40 and subsequently boost the probability of lag detection (Fig. 16).
For MOS spectroscopy, there is also a systematic floor of flux uncertainties limited by the spectrophotometry accuracy of the MOS program.For example, in SDSS-RM, we achieved a typical systematic flux uncertainty floor of ∼ 5%.Using PrepSpec and for quasars with strong narrow lines (such as [O iii] 5007) at low redshift, we were able to further improve the spectrophotometry to ∼ 2 − 3%.This additional improvement is likely partially responsible for the higher lag detection rate for Hβ at fixed SNR2 (computed using statistical flux uncertainties only), compared with Mg ii and C iv for which the PrepSpec improvement on spectrophotometry is marginal.Moreover, the systematic uncertainty of spectrophotometry increases toward the blue end of SDSS spectra (e.g., Shen et al. 2015a), resulting in a larger impact on lag detection for Mg ii and C iv in high-redshift quasars than for Hβ in the low-redshift subset.
The outcome from the SDSS-RM final data set thus provides useful guidance for future MOS-RM programs with similar baseline and cadence.Given the typical ∼ 10% RMS variability of broad lines, a low systematic floor of spectrophotometry will greatly improve the lag detection rate.At the same time, we recommend to reach a statistical (per epoch) line flux uncertainty of better than ∼ 10% at the flux limit of the sample.For SDSS-RM, the latter goal is achieved with 2 hr integration per epoch at a limiting magnitude of i psf < 21.7, for the general quasar population.Note-There are multiple files inside each directory.The content and format of these files are either self-explanatory or explicitly specified in the header.Occasionally, the 'lag summary.pdf'plot needs to be regenerated for better display ranges using the scripts provided in the ./scripts/directory.

C. ADDITIONAL QUALITY CUTS ON LAGS
As discussed in Sec 4.2, we impose a set of quantitative cuts for lag detection.Our choice of the r max > 0.4 cut is more lenient than those used in other high-quality RM studies.However, while imposing this less stringent cut inevitably introduces more false positives in the lag sample, it also ensures that we are not removing lags near the detection boundary that could potentially bias the R − L relation measurements.
We test the effects of imposing more stringent quality cuts on lag detection.Fig. 17 shows the comparison in the R − L plane with our fiducial lag results for: (1) a higher r max > 0.6 cut; (2) a SNR2 > 35 cut; (3) high-quality lags individual cases, but on average higher-grade lags are more robustly detected.Lower-grade lags not necessarily mean the lag is spurious, and may reflect the complexity of the BLR and the limitations of measuring an average lag.

Figure 1 .
Figure 1.An example of merged 11-yr photometric light curves for the SDSS-RM sample (see §3.2 for details).Light curves from different surveys are calibrated to the SDSS r band.For each SDSS-RM quasar, there are typically hundreds of nightlyaveraged photometric epochs, to go with 90 spectroscopic epochs (black open circles) from SDSS.

Figure 2 .
Figure 2. Expected Hβ lag in the observed-frame for the SDSS-RM sample, estimated using the mean Hβ R − L relation from Bentz et al. (2013).The points are color-coded by the fractional RMS variability σcont from the 11-yr photometric light curve, corrected for flux measurement uncertainties.The black and red dashed lines are the 7-yr spectroscopic baseline and the 11-yr photometric baseline, respectively.There is a general tendency of increasing variability toward lower luminosities (shorter lags) at fixed redshift.

Figure 3 .
Figure 3. Fractional (intrinsic) RMS variability for the four broad lines considered in this work versus that of the continuum light curve.The dashed diagonal lines show the unity relation, and the horizontal red lines (with numbers) indicate the median fractional line variability.There are reasonably good correlations between the line and continuum RMS variability, as expected from reverberation.

Figure 4 .
Figure 4. Fractional (intrinsic) RMS variability for additional broad lines versus that of the continuum light curve.We do not report lag measurements for these lines, but their variability properties are compiled in Table2.

Figure 5 .
Figure 5. Comparisons between the FWHM from the mean spectrum (for SE estimators) and the σ line,rms from the rms spectrum (for RM masses) for Hα, Hβ, Mg ii and C iv, colorcoded by the deviation from the sample median luminosity for the corresponding line (e.g., L5100, L3000, etc).The Mg ii σ line,rms here has been corrected for the velocity split of the doublet, but not for the SDSS spectral resolution (negligible).In each panel, the black dashed line is the unity relation, and the red dashed line represents the FWHM-σ line relation for a Gaussian profile.The best-fit linear regressions of (Y |X) and (X|Y ) are also shown.There is no clear luminosity trend (measured with the deviations from the sample median luminosity, ∆ log L) in the line width correlation for all lines.

Figure 7 .
Figure 7. Statistical detection of lags for the SDSS-RM sample using ICCF.As the rmax value decreases, there are increased incidents of negative lags.While overall the asymmetry toward positive lags is clear, below rmax = 0.4, the negative to positive lag ratio is substantially higher than that at rmax > 0.4, indicating a high false-positive rate for the positive lags measured with low rmax values.The points are color-coded by the line variability SNR2 (see §3.3).Objects with higher SNR2 values tend to cluster in the upper-right quadrant, where a positive lag detection is more likely to happen, along with a high rmax value.

Figure 8 .
Figure8.Comparisons of the detected lags between two different methods (ICCF vs PyROA at left and Javelin vs PyROA at right).In most cases, our fiducial PyROA lags are consistent with the ICCF lags.However, for our multi-year light curve data, the Javelin method often traps the lag in the seasonal gaps, leading to artificial clustering of lags there.
if using σ line,rms as the velocity indicator.Because the best-fit local M BH − σ * relation has been updated over the years (see discussions in, e.g.. Kormendy & Ho 2013), the calibration of the virial coefficient has also evolved.Moreover, because the M BH − σ * relation depends on galaxy bulge properties, in principle one should use different virial coefficients based on the bulge classification of the host galaxy (e.g., Ho & Kim 2014)such bulge classifications, however, are usually difficult for broad-line AGNs and their accuracy largely depends on the expertise of the classifier.

Figure 9 .
Figure9.Constraints on the mean and dispersion of the distribution of individual f factors based on the line dispersion measured from the RMS spectrum, using a sample of 30 objects with dynamical-modeling and RM BH masses as compiled in Table3.Top: raw distribution of individual f factors.The ML estimates of the sample mean and intrinsic 1σ dispersion are marked.Bottom: Marginalized probability distributions of the mean and intrinsic 1σ dispersion of f factors.The 16%, peak and 84% locations of the distributions are marked.

Figure 10 .
Figure 10.Redshift distribution of RM masses derived from Hα, Hβ, Mg ii and C iv lags for the SDSS-RM sample.Error bars are measurement uncertainties only and do not include the ∼ 0.3 dex systematic uncertainty in MRM.
. Because the baseline is limited in earlier SDSS-RM lag measurements, this average offset from the local R − L relation might have been overestimated in our earlier papers.Note that the host correction inShen et al. (2015b) is on average smaller by ∼ 30 − 40% than that estimated from image decomposition(Li et al. 2021; Li  et al. 2023).If we further reduce the 5100 Å luminosities by ∼ 0.1 dex, the lag offset in the R − L relation would become negligible.The intrinsic scatter in our Hβ lags around the mean R − L relation is ∼ 0.32 dex.The dispersion around the canonical Bentz et al.R−L relation for Hβ have gained substantial interest in the past few years (e.g.,Du et al. 2016;Dalla Bontà et al. 2020;Du & Wang 2019;Czerny et al. 2019b;Martínez- Aldama et al. 2019).It is suggested (e.g.,Du & Wang 2019;Czerny et al. 2019b;Fonseca Alvarez et al. 2020) that this dispersion is due to different spectral energy distributions between the local RM AGN sample inBentz et al. (

Figure 11 .
Figure 11.R − L relations for the four broad lines in this work.In each panel, the red dashed line shows the most recent literature relation(Bentz et al. 2013;Yu et al. 2023; Kaspi et al. 2021).The light blue lines are random draws from the linear regression of predicting log τ with log L using the Bayesian approach inKelly (2007).The blue dashed line is a forced regression fit with slope fixed to that measured in earlier work.Overall, measured SDSS-RM lags follow these earlier relations, but there are notable deviations (in particular for C iv).Our lag measurements suggest that the Hβ and Mg ii R − L relations are reasonably tight (with an intrinsic scatter of ∼ 0.3 dex in lag), while the C iv R − L relation has substantially larger scatter (∼ 0.5 dex) than the other two lines.The Spearman rank-order test results on the SDSS-RM sample (correlation coefficient ρ and p-value p) are marked in each panel.These results are discussed in detail in §5.1 and Table4.

Figure 12 .
Figure12.Comparisons between lags from two different lines for the same object.Hα and Mg ii lags are typically longer than Hβ lags, while C iv lags are somewhat shorter (but the sample statistics are limited and the scatter is large) than Mg ii lags.

Figure 15 .
Figure15.Same as Fig.14, but for the comparisons between MRM and updated MSE estimates (Table4) for the same line.The new Mg ii SE recipe is now an approximately unbiased estimator (i.e., at a given RM mass, they return an unbiased expectation value), while the Hβ and C iv SE recipes are still biased estimators (as we adopted similar slopes on L and mean FWHM as the earlier recipe inVestergaard & Peterson (2006).However, the normalizations have been recalculated using our RM samples.

Figure 17 .
Figure 17.Tests of lag detection with more stringent quality cuts.In each panel, the open symbols represent the fiducial lag sample, and the filled symbols are for the subset.The left column shows results with a rmax > 0.6 cut; the middle column shows results with a SNR2 > 35 cut; the right column shows the results with the manual inspection cut.Details regarding these quality cuts and their impact on the measured R − L relations are presented in Appendix C.

Table 1 .
Lag Detection Summary # of Lags Hα Hβ Mg ii C iv Note-We report all lag measurements in Table 2.But we recommend to only use lags with > 2σ detection in statistical analyses.
, we cannot reliably constrain a R − L relation

Table 4 .
Regression Results

•
The measured lags from different lines in the same quasars reveal BLR stratification, consistent with earlier findings based on low-z RM work.Specifically, Hα and Mg ii have on average longer lags than Hβ, while C iv lags are somewhat shorter than Mg ii lags overall (but the sample size is small).This result suggests increasing distances of the (observable) emitting clouds from the BH in the order of C iv<Hβ<Hα≲Mg ii.[See §5.1.]iv ionizing flux and 1350 Å continuum flux in quasars.[See §5.1.]• We present re-calibrated single-epoch BH mass recipes for Hβ, Mg ii and C iv, using the SDSS-RM lag sample.The new Hβ and Mg ii SE recipes are approximately unbiased estimators of RM mass (i.e., expectation value ⟨log M SE ⟩ = log M RM ), with an intrinsic scatter of ∼ 0.45 dex around RM masses.The C iv SE recipe, on the other hand, shows substantially larger scatter (∼ 0.58 dex) and the weakest correlation with RM masses.Nevertheless, these new SE recipes show overall consistency across two lines over the luminosity range probed by SDSS-RM quasars.Considering the systematic uncertainty of ∼ 0.3 dex in RM masses ( §4.5), one can argue that the absolute uncertainties of SE masses are ∼ 0.35 dex for Hβ and Mg ii, and ∼ 0.5 dex for C iv.Our findings continue to support the usage of Hβ and Mg ii SE masses, but caution on the usage of C iv SE masses for highredshift quasars.The three SE mass recipes are summarized below.[See §5.4.] • There is a significant global R − L relation for Hβ and Mg ii in SDSS-RM quasars, using R = cτ rest .ple that spans a much larger luminosity range than the SDSS-RM sample.More importantly, the intrinsic scatter in the said R − L relation for C iv is substantially larger than that for Hβ or Mg ii, likely caused by the larger dispersion in the ratio between C

Table 5 .
Content of Data Products Batch scripts and additional software tools ./summary.fitsSummary table (