A 4 Gyr M-dwarf Gyrochrone from CFHT/MegaPrime Monitoring of the Open Cluster M67

We present stellar rotation periods for late K- and early M-dwarf members of the 4 Gyr old open cluster M67 as calibrators for gyrochronology and tests of stellar spin-down models. Using Gaia EDR3 astrometry for cluster membership and Pan-STARRS (PS1) photometry for binary identification, we build this set of rotation periods from a campaign of monitoring M67 with the Canada-France-Hawaii Telescope's MegaPrime wide field imager. We identify 1807 members of M67, of which 294 are candidate single members with significant rotation period detections. Moreover, we fit a polynomial to the period versus color-derived effective temperature sequence observed in our data. We find that the rotation of very cool dwarfs can be explained by a simple solid-body spin-down between 2.7 and 4 Gyr. We compare this rotational sequence to the predictions of gyrochronological models and find that the best match is Skumanich-like spin-down, P_rot \propto t^0.62, applied to the sequence of Ruprecht 147. This suggests that, for spectral types K7-M0 with near-solar metallicity, once a star resumes spinning down, a simple Skumanich-like is sufficient to describe their rotation evolution, at least through the age of M67. Additionally, for stars in the range M1-M3, our data show that spin-down must have resumed prior to the age of M67, in conflict with predictions of the latest spin-down models.


INTRODUCTION
A critical piece of understanding the evolution of any system-be it stars, planets, or the Milky Way galaxy itself, is understanding both the order in which events occur as well as their timescales. To do this properly one requires precise, reliable ages for the stars involved. M dwarfs are the most numerous stars in the galaxy (Gould et al. 1996;Bochanski et al. 2010), and have higher occurrence rates of small planets compared to higher mass stars (Dressing & Charbonneau 2015;Hardegree-Ullman et al. 2019). They also do not fuse heavy elements, and many tens of Gyr must pass before they show perceptible signs of evolution on a Hertzsprung-Russell diagram (Laughlin et al. 1997). As a result, M dwarfs can serve as particularly excellent tracers of galactic chemical evolution.
However, M dwarfs are also resistant to most methods commonly used for measuring a star's age. Their evolution on the main sequence is undetectable (Laughlin et al. 1997), there are no observable asteroseismic oscillations (Chaplin et al. 2011;Berdinas et al. 2017;Mathur et al. 2019), and their deep convective envelopes burn Li within the first ∼ 50 Myrs (Bildsten et al. 1997 Figure 1. A Sloan Digital Sky Survey DR9 i-band image of the M67 field, with all of our candidate M67 cluster members identified by circles. Blue circles identify the the stars with reported rotation periods in Table 2. The black boxes are the MegaPrime footprint for one of the pointings in our dither pattern. degree of stalling as observed in open clusters, as it predicts that stars later than K5 should be rotating ∼ 5 days slower than they are in Ruprecht 147 (Curtis et al. 2020). We know that K and M dwarfs must resume spinning down, as field samples show K and M dwarfs with rotation periods that are many tens of days (McQuillan et al. 2013;Newton et al. 2016Newton et al. , 2017Santos et al. 2019). Such rotation periods are roughly consistent with a Skumanich-type spin-down over the age of the galactic disk (van Saders et al. 2019). Knowing when these stars resume spinning down and the timescales over which internal angular momentum exchange occur both directly affect the mapping of a rotation period to an age. Calibrating gyrochronology and testing spin-down models for M dwarfs requires a larger sample of older, well-dated M dwarfs. Only a handful of such stars are currently available, the majority of which are in young clusters (700 Myr at the oldest; Douglas et al. 2017;Rebull et al. 2017) or are limited by the use of kinematic ages (Newton et al. 2016;Popinchalk et al. 2021). Previously, Barnes et al. (2016) used K2 to obtain calibrators for gyrochronology of solar-type stars in M67, but observing faint M dwarfs in crowded fields has proved impossible for missions such as K2 or TESS. The Canada France Hawaii Telescope's MegaPrime (Boulade et al. 2003) instrument allows us to overcome the limitations of K2 and TESS in sensitivity, without significant loss in field-of-view. In this paper we present the rotation periods of late K and early M dwarf members of the 4 Gyr-old cluster M67 (3.5-5.0 Gyr; Nissen et al. 1987;Demarque et al. 1992;Montgomery et al. 1993;Carraro & Chiosi 1994;Fan et al. 1996;VandenBerg & Stetson 2004;Balaguer-Núñez et al. 2007;Stello et al. 2016). We present the oldest K and M dwarf gyrochrone to date and compare it to literature gyrochronology relations.

OBSERVATIONS AND DATA REDUCTION
Our campaign used Canada France Hawaii Telescope's (CFHT) MegaPrime to monitor M67 (center coordinates: α = 8 h 51 m 18 s , δ = +11 • 48 00 ) from 2018 October 15 to 2021 March 5 (UT). MegaPrime is the MegaCam imager placed at the prime focus of CFHT; it has a 1 square degree field-of-view sampled by 40 CCDs arranged in 4 rows of 9, 11, 11, and 9 detectors each (Fig. 1). In total we obtained 694 exposures of the cluster in discrete 1-2 week runs, producing 131 epochs of data for our light curves. All data were collected with 121 second integration times using the Sloan i filter, red enough that M dwarfs are not too faint but blue enough to observe spot variability. Observations were taken five at a time in a cross-like dither pattern with 10.4 arcsec offsets. Bias subtracting, flat-fielding, fringe correction, and bad-pixel masking were all performed by version 3.0 of the CFHT Elixir pipeline (Magnier & Cuillandre 2004). For our data reduction we treated the five exposures in a dither pattern independently, only combining them when we averaged the five photometric measurements together in the later steps of the pipeline. An image of the field taken from i-band images of the Sloan Digital Sky Survey DR9 is shown in Fig. 1. Included are the candidate cluster members of M67 (see Sec. 3 for further details).

Sky Background
We started with determining and removing the sky background from each detector's image. For this we used MMMBackground, a python implementation of the DAOPHOT MMM algorithm contained in the photutils package (Stetson 1987;Bradley et al. 2020). We divided each image into a 8x10 grid of equally sized sub-regions and for each sub-region we estimated the background level through an estimation of the mode by the equation M ode ≈ 3 × M edian − 2 × M ean. This 8x10 grid was then interpolated to the size of the original image using a bi-cubic spline and the resulting sky background was subtracted from the image. To estimate the uncertainty on this sky background we also computed the sigma-clipped standard deviation of each sub-region in the grid which was similarly interpolated to produce an estimated uncertainty for each pixel in the sky background.

Source Finding
With the sky background subtracted, we then used DAOStarFinder, a python implementation of the DAOFIND algorithm in the package photutils, to find the location of every source in the field for each individual image (Stetson 1987;Bradley et al. 2020). The threshold was set relatively low, at three times the sigma-clipped standard deviation of all pixel values in the image, and the full width at half maximum (FWHM) was set at the seeing value reported in the image header. Using the World Coordinate System values in the image headers, we converted the pixel coordinates reported by DAOStarFinder to RA and Dec (J2000.0). This enabled us to cross-match our detected sources with external catalogs.
We downloaded a catalog of every Gaia EDR3 (Gaia Collaboration et al. 2021) source in the field of view and converted the RA and Dec coordinates to the epoch J2000.0. For each Gaia source we then found the nearest neighbor match reported by DAOStarFinder. A handful of Gaia sources did not have a detection within the cut-off of 0.75 arcsec and were considered non-detections. Any remaining sources found by DAOStarFinder that were not paired up with a Gaia source were considered false positives and discarded from our catalog. A nearest neighbor search, with the same distance cut-off, was also used to match every Gaia source to a Pan-STARRS1 (PS1) DR2 source. After this cross-matching our catalog contained 8287 sources, all of which were matched to a source in both Gaia EDR3 and PS1 DR2. The limiting magnitudes of our observations (i ∼ 21) compared to that of Gaia EDR3 (G ∼ 21) and PS1 DR2 (i ∼ 22) limit the number of real sources that were discarded by this method.

Photometry
Next we performed aperture photometry on the background-subtracted images for every source in our catalog. The aperture diameter was set at four times the seeing value for the image, a value computed from the average empirical full width at half maximum of bright sources scattered throughout the field. This diameter was chosen after analyzing the effect of aperture size on the noise properties of the light curves (see related discussion in Sec. 2.4). Any sources with overlapping apertures were flagged and excluded from the calculation of the zero-point corrections described in this section due to the source confusion introduced by their overlap. Instrumental magnitudes were computed by a sum over the aperture divided by exposure time, and an uncertainty was estimated from the quadrature sum of: the photon noise on the flux in the aperture, the read noise of the MegaPrime detectors, and the previously estimated sky uncertainty for each pixel in the aperture. With this process repeated for each of our observations, we then began constructing the light curves for each target.
In order to construct the light curves we first averaged together the measurements within an epoch (i.e., the set of five exposures that make up one dither pattern). To do this we corrected for small changes in the photometric zero-point that may have occurred between exposures. The correction, which we call ∆ i , was taken to be the median difference between the stars with low scatter in their instrumental magnitudes. In equation form, the correction applied to the ith observation relative to the first (i = 0) is given by:  A scatter plot of the predicted versus observed scatter of the zero-point corrected magnitudes that are averaged together to form one epoch in our light curves. The expectation is that these points lie on the one-to-one line. Right Panel: A scatter plot of the predicted versus observed scatter of the zero-point corrected magnitudes in our light curves, the one-to-one line is the expected lower-limit. In both panels the predicted values are derived from our estimated uncertainties.
Where M ls,i represents the set of stars with less than median scatter in their n measurements. Since any measurement where the aperture included bad pixels is discarded, n ≤ 5. The epoch magnitude was then computed from the average of the zero point-corrected measurements. If m i is the instrumental magnitude from the ith observation within the jth epoch thenm j , which will become a point in a light curve, is given by: We repeated this calculation for every star and every epoch. Finally, we corrected for the changes in the photometric zero-point between epochs. We took the same approach as before, computing this correction from the low scatter stars. The zero-point correction, zp j , of the jth epoch relative to the first (j = 0) is given by: WhereM ls,j represents the set of stars with less than median scatter in their computedm values. Eachm was then corrected by this zero-point correction: In total, we obtained 4396 light curves that met our completeness criterion of at least 99 epochs of available data (see Sec. 4 for further details).

Validation
In order to validate our model of the photometric noise we performed a comparison of the observed scatter in the zero-point corrected magnitudes to the scatter expected from the estimated uncertainties alone. We made two important assumptions for these tests: 1) the uncertainties were the standard deviation of independent Gaussian distributions, and 2) all of these distributions had the same mean (i.e., the uncertain measurement was the only source of variability). Thus, any sources with additional variability in their magnitudes would fall above the one-to-one line on a plot of the theoretical scatter versus the observed scatter.
First, we performed this test on the n ≤ 5 measurements that form an epoch, comparing the standard deviation of these points to the scatter expected from the uncertainty on their average. Exposures in an epoch were collected over a period of roughly 20 minutes, short enough that we expected each star not to vary. As a result, a scatter plot of how the noise was modeled versus the observed scatter should follow a one-to-one line, as we see in our data (left panel of Fig. 2). For sources with magnitudes of i 18, the data show a departure from the one-to-one line which we attribute to either non-linearity in the detector as it approaches saturation or a fractional measurement error, such as flat fielding errors.
Second, we performed this test on full light curves, comparing the standard deviation of the epoch magnitudes to the scatter expected from their uncertainties. The one-to-one line is expected to be the lower limit of photometric scatter, therefore many sources will show variability beyond the case of random variations due to uncertain measurements of the magnitude. We stress that a source falling above the one-to-one line in the right panel of Fig. 2, meaning it has more variability than expected from uncertainty in the photometry alone, is not proof of the source having an astrophysical process driving that variability. That the one-to-one line is indeed the lower limit in the right panel of Fig. 2 demonstrates that our model of the noise is correct.

CLUSTER MEMBERSHIP AND STELLAR PROPERTIES
An important aspect of the results presented in this paper is that the rotation periods reported can be used as benchmarks for stellar spin down models. Critical to this is knowing the age and T eff . The age determination comes from their membership in the open cluster M67, whose age has been previously determined to be 4 Gyr (3.5-5 Gyr; Nissen et al. 1987;Demarque et al. 1992;Montgomery et al. 1993;Carraro & Chiosi 1994;Fan et al. 1996;VandenBerg & Stetson 2004;Balaguer-Núñez et al. 2007;Stello et al. 2016). The effective temperatures are derived using a color-T eff relation. In this section, we provide the details on both of these critical aspects.

Cluster Membership
Cluster membership was determined by using a clustering algorithm, HDBSCAN (McInnes et al. 2017), on Gaia EDR3 proper motions and parallaxes for every star in the MegaPrime field of view, regardless of whether or not it appeared in our catalog of light curves. HDBSCAN works by using the density of points to estimate a probability distribution function (PDF) that describes the full distribution of values in the data. Clusters are then defined by the peaks in this PDF. The primary advantage of HDBSCAN is that it relies on fewer assumptions about the data than more traditional clustering algorithms such as K-means, which assumes Gaussian distributions. It also does not require that every point in the data set be assigned to a cluster, reducing the risk that outlier field stars might incorrectly be assigned M67 membership. The python package of the same name provides many different parameters to tune the performance of the clustering 1 .. We found that the defaults for the version we used, v0.8.18, were acceptable with one exception: min samples. This parameter can be thought of as determining the level of detail in HDBSCAN's estimation of the underlying PDF. Too small a value and each data point produces its own peak in the estimated PDF, too large and the finer details of the estimated PDF are washed out. Given that the distribution of parallaxes and proper motions is effectively a two-peaked distribution (Fig. 3) we found that a value of 200 gave suitable clustering results compared to the default of 5.
A disadvantage of the HDBSCAN algorithm is that it does not make use of the uncertainties on any input data. To incorporate these into our cluster membership determination we performed Monte Carlo sampling. We ran the clustering algorithm on our list of Gaia sources, recorded the results, and resampled every star assuming Gaussian uncertainties, repeating this process 1000 times. HDBSCAN cannot be instructed to find a cluster with specific properties, so with each realization we computed the median parallax and proper motions for each grouping it found in the data and used the one with the closest match to literature values for M67 (π = 1.1327±0.0018 mas, µ α cos δ = −10.9738±0.0078, µ δ = −2.9465 ± 0.0074 mas yr −1 ; Gao 2018). The difference was never more than a few percent. We then calculated a "kinematic membership probability" from the fraction of realizations in which a star was assigned membership to M67. A final membership criterion of > 50% was selected based on an inspection of the Gaia color-magnitude diagrams (CMDs, Fig. 4) that were produced for various thresholds.
Another aspect we considered in our use of HDBSCAN was the Bayseian nature of this approach. Too small a field of view and the algorithm may not have had the leverage needed to separate M67 members from the field, too large and   the diversity of field stars may have encouraged labelling true members as field stars. To address this we repeated our membership determination on a Gaia EDR3 catalog including stars out to twice the radius of the MegaPrime field of view and compared the two membership lists. Of the 1807 members within the MegaPrime field of view 76 were considered field stars when using the larger catalog, and none of the field stars gained membership in M67. None of these 76 stars are outliers on our CMDs (Fig. 4), so we have kept them in our final list of members. However, we flagged them as potentially suspect, so that the interested reader may remove them from the sample if they wish. We compared our list of M67 members to that of Gao (2018), who applied a Gaussian mixture model based approach to Gaia DR2 astrometry. They found a list of 1502 likely members, whereas we have found 1807. In common between the two catalogs are 1241 members, leaving 261 stars that are unique to the Gao (2018) catalog, and 566 that are unique to our catalog. There are several factors that contributed to these differences. First, our search for members was limited to the field of view of CFHT MegaPrime, and this truncated our search at a radius of ∼30 arcminutes; all 261 stars that are only in the Gao (2018) catalog were outside our field of view and thus were not included in our clustering. Second, 431 of the 566 stars that appear only in our catalog have parallaxes and/or proper motions that are closer to the literature values for M67 in EDR3 than in DR2. Third, 99 of the 566 stars that appear in our catalog are new in EDR3 and thus could not have been included in the Gao (2018) catalog. Finally, there are 36 stars which are unique to our catalog for otherwise unknown reasons, we attribute these to the differences between the two methods used. The members that our two catalogs have in common are denoted by the "Gao member" column in Table 2.

Single vs Binary Members
Unresolved binaries bias our inferred stellar parameters and close binaries have spin-down influenced by tidal forces (Simonian et al. 2020); as a result we also need to identify whether or not the M67 members have a companion. With a parallax of π = 1.1327 ± 0.0018 mas, Gaia is able to resolve binaries that are separated by 600 AU. However, due to the size of our apertures stars with physical separations 7000 AU have overlapping apertures. As such,   their photometry was potentially limited by confusion with their nearest neighbor. For completeness, we included these stars in our catalog of reported rotation periods (Sec. 5), but we excluded them from our subsequent analysis. Work done by Deacon & Kraus (2020) indicates there are no wide binaries separated by 3000 AU in clusters, thus our analysis was focused only on single members of M67. Binary systems which are not resolved require a different method of detection. Common approaches include: 1) spectroscopy that resolves double-lined absorption features, 2) excess astrometric noise (quantified by the renormalized unit weight error, or RUWE, for Gaia astrometric solutions; Belokurov et al. 2020), and 3) photometric excess, stars that appear brighter than the main sequence on a CMD. For our data we used the photometric excess approach, calculated from PS1 photometry Flewelling et al. 2020). The effectiveness of this approach was confirmed by the finding that all the sources which exhibited excess astrometric noise (i.e., RUWE 1.4) were also found to show photometric excess. In PS1 DR2 the saturation limit is 12-14 magnitudes, depending on seeing and filter; for M67 we found that a cut at i = 13.75 removed these problematic sources. From the photometry in Gaia EDR3 and PS1 DR2 we found that PS1 r and i were the two filters with the highest signal-to-noise ratios for the faintest members in our catalog. Therefore, we used 3σ iterative outlier rejection to fit an eighth order polynomial to the main sequence of the cluster on a PS1 r − i vs i CMD. We then categorized each source by its vertical distance from the main sequence on the CMD based on the distribution of residuals after subtracting out our fit to the main sequence (Fig. 5). Stars within ±0.3 magnitudes of the main sequence fit were classified as single members, whereas sources outside these bounds were categorized as photometric binaries. The small secondary peak of binary members is broad, which suggests there may be a small number of high-contrast binaries contaminating our sample of candidate single M67 members. Sources fainter than the main sequence are thought to be binaries with a white dwarf component. Further observations are required to confirm this, which will be included in future work. The fraction of sources labeled as binaries by this method is 26%, in line with the expectation for M dwarf multiplicity rates (Duchêne & Kraus 2013;Winters et al. 2019). Both the PS1 r − i vs i CMD that was used for this binary classification and an additional Gaia G − RP vs G CMD can be seen in Fig. 4. The stars identified as binaries by this method were set aside for future analysis. They do not have rotation periods reported in this paper.

Effective Temperatures
In order to calculate the effective temperature (T eff ) for the stars in our catalog we used a (r − i) vs T eff relation derived from the sample of late K and M dwarfs analyzed by Mann et al. (2015). We converted the synthetic Sloan r and i photometry provided into the PS1 r and i passbands using the Tonry et al. (2012) relations and applied corrections for reddening (E(B − V ) = 0.041 ± 0.004 mag; Taylor 2007) as well as a conversion to apparent magnitudes for the distance to M67 (π = 1.1327 ± 0.0018 mas; Gao 2018). We trimmed the sample to stars with metallicities of −0.07 ≤ [Fe/H] ≤ 0.07, a range chosen to cover various values reported for the metallicity of M67 in the literature (Pace et al. 2008;Santos et al. 2009;Önehag et al. 2011;Liu et al. 2016;Sandquist et al. 2018). The parameters of the trimmed sample are included in Table 1. Finally, we fit a second order polynomial to the (r − i)-T eff pairs to obtain our relation: The residual dispersion of 46 K is small compared to the errors on the temperatures (Fig. 6). Adding this in quadrature with the spectroscopic errors provided by Mann et al. (2015) yields a T eff uncertainty of ∼ 75 K.

MEASURING ROTATION PERIODS
In our data set there are 7222 sources which contain at least one epoch of data. To reduce complications with recovering periodic signals we applied a conservative cut to our data, requiring that a light curve have a minimum completion of 99 out of the possible 131 epochs of data. After applying this cut we were left with a sample of 4674 stars, with a mean completeness of 129 epochs. Of these 4674 stars, 3607 have light curves with 131 epochs. For the 636 candidate cluster members that made these cuts the mean completeness is 128 epochs, with 444 having a light curve that has 131 epochs. Due to the irregular sampling of our light curves we used Lomb-Scargle (LS) periodograms (Lomb 1976;Scargle 1982;Press & Rybicki 1989;Zechmeister & Kürster 2009) for the detection of periodic signals in our light curves. In each case the rotation periods we report was the period of maximum power in the periodogram.
A common method for quantifying the uncertainty of LS periodograms is the false alarm probability (FAP). The FAP is a measure of probability that data with no signal would produce a peak in the periodogram of equivalent height (for further details see Sec. 7.4.2 of VanderPlas 2018). We required a FAP value of less than one percent for a periodic signal to be considered significant. To maintain the computational feasibility of our injection and recovery tests (see Sec. 4.1) we report FAP values estimated using the Baluev (2008) method. As a test of the validity of using the Baluev estimates, we performed a comparison of the FAP values estimated by the Baluev method to those computed using a bootstrapping (N = 10 4 ) of all of our light curves. Using the bootstrapping method it is roughly expected to find FAP × N ± √ FAP × N false positives (VanderPlas 2018). Accounting for this uncertainty on the FAP value, every one of our Baluev estimated FAP values is consistent with its bootstrapped equivalent. We also required that a rotation period have at least five complete periods within the light curve duration in order to be considered a detection. This placed an upper limit of 175 days on any rotation periods used in our analysis.

Injection and Recovery Tests
We performed injection and recovery tests to determine the detection efficiency and false positive rates for our recovered periods. The Kepler long-cadence data provided a database of real astrophysical signals of rotation for us to test our rotation recovery. For the injected signals, we used the KEPSEISMIC light curves of K and M dwarf main sequence stars obtained with the Kepler Asteroseismic Data Analysis and Calibration Software (KADACS; García et al. 2011García et al. , 2014Pires et al. 2015). The rotation periods for these stars have been derived by Santos et al. (2019). We applied a few additional cuts of our own: first, we checked for the completeness of the Kepler light curve, rejecting any star with fewer than 11 continuous quarters of data. Second, we applied a cut on the height of the autocorrelation function peak (H ACF in Santos et al. 2019, the average difference between the peak height and the two adjacent local minima) requiring that H ACF ≥ 1.0, a value typical of stable signals. Finally, we applied a cut on the effective temperature of T eff < 5270 K, so that the observed spot pattern evolution in the Kepler sample would more closely     match the expectation for our targets in M67. Given the precision of the Kepler photometry relative to our data, we made the assumption that these light curves contained noiseless rotation signals. This gave us a sample of 4599 signals with known rotation periods for injection. Each signal was characterized by two values: the rotation period (P rot ) and the photometric activity index (S ph ), a measure of the amplitude of variability. The value of S ph was calculated by dividing a light curve into sub-series, each five times the length of the star's rotation period, and then taking the mean of the standard deviations of each of the sub-series . One the advantages of S ph over other measures of photometric variability is its correlation with proxies of magnetic activity (Salabert et al. 2016(Salabert et al. , 2017. We created logarithmically spaced bins for the injections: 5 ≤ P rot ≤ 100 [days] and 0.05 ≤ S ph ≤ 3 [%Flux] using 11 bins along each axis. However, this left some of the outlier bins (see Fig. 7) with very few, if any, injections. To compensate for this deficiency we also generated a set of synthetic light curves. These synthetic light curves were simple sinusoids: Where φ is a uniformly distributed phase, and the factor of √ 2 comes from the fact that S ph is calculated from a standard deviation. The synthetic light curves were sampled with the same cadence as the Kepler data. For every bin with less than 50 Kepler light curves we generated a sample of up to 50 synthetic ones with P rot and S ph values uniformly distributed (in linear space) within the bounds of that bin. In total we used 3934 synthetic light curves.
Each injection and recovery test involved taking a Kepler (or synthetic) light curve and sampling it to match the cadence and length of our CFHT observations. We then added the signal into one of our CFHT light curves and computed an LS periodogram for the combined data. If the period of maximum power in the resulting LS periodogram was within 10% of the injected period (see Fig. 8) and had an estimated FAP of less than 1%, then we considered this a successful recovery. If the period of maximum power was more than 10% different from the injected period and the estimated FAP was less than 1% we considered this a false positive. All other cases were considered non-detections. We did not want to assume that the period of maximum power in our periodograms was due to rotation, thus we did not remove any pre-existing signal from the light curves before injection. To prevent confusion with the signal already present in the CFHT lightcurves we removed any case where the injected period was within 10% of the signal already detected in the lightcurve. This filtered out no more than 11% of the tests in any given bin, with every bin having at least 45000 tests. This approach enabled us to incorporate the actual systematics present in our CFHT photometry that may have limited the recovery of periodic signals.
Since we are only interested in the rotation periods of the members of M67, we limited the sample of CFHT light curves to a subset of the candidate cluster members and a matching number of randomly selected field stars. We selected the light curves for the injection and recovery testing by applying three criteria. First, cluster members were required to have very high (i.e. = 1.0) membership probability, while field stars must have had very low (i.e. = 0.0) membership probability. Second, they must have had at least 99 epochs of data available. Finally, they must not have had an overlapping aperture. We also required that the selected field stars have similar r − i colors and apparent i magnitudes to our selected cluster members, to mitigate the impact of any systematics that depended on color. This yielded 740 total light curves, 370 cluster members and 370 field stars, into which we injected each of our 4599 Kepler and 3934 synthetic light curves. Each injection and CFHT light curve pairing was repeated with three or four different phases, depending on how many CFHT light curves fit within the injection light curve. This was done to capture the shift in phase due to spot pattern evolution over the years of observations.
We compiled the results of these tests into our completeness diagram ( Fig. 9) as well as our false positives diagram (Fig. 10). We have plotted the results from the cluster members and field stars separately. Injections into the light curves of cluster members served as a direct test of our ability to recover rotation signals in the cluster member data, while injections into the field stars served as a control sample. The underlying distribution of P rot is different for each of these populations, and thus each is expected to impact the completeness diagram in different ways. Trends that are common to both figures are thus reflective of the pipeline's recovery capabilities in general. Any differences between the two panels that cannot be attributed to different P rot distributions would reflect issues in the pipeline, but we do not see any such differences.
The completeness diagram (Fig. 9) shows the major trends we would expect: 1) as the amplitude of the rotation signal decreases our ability to recover the correct period also decreases, and 2) the evolving spot patterns in the Kepler light curves reduced our ability to recover the correct period. Our false positives diagram (Fig. 10) also shows the major trends that we expected. In particular we highlight the 10−20% difference in false positive rates between cluster members and field stars for low amplitude injections (S ph 0.25%). Many of the light curves we injected signals into already had an existing periodic signal and, when injecting low amplitude signals we expected to instead recover the already present signal. This explains both the high percentage of false positives for low amplitude injections, as well as the difference in false positive rates. The cluster members were generally expected to show periodic variability due to their spot patterns. On the other hand, a smaller fraction of field stars were expected to show rotational variability and those that do span a much wider range of timescales (e.g., background evolved stars).

RESULTS AND ANALYSIS
We present the full rotation catalog in Table 2. This information includes: Gaia EDR3 and PS1 DR2 source identifiers, the Gaia EDR3 astrometry and photometry, the PS1 DR2 griz photometry, the recovered rotation periods (if available), their estimated FAP values, the derived T eff values, the percentage probability we calculated for M67 membership, and whether or not the star was flagged as a candidate binary. Appendix A contains plotted light curves and periodograms for each star in Table 2 with a reported rotation period.
We have plotted the measured rotation periods versus effective temperature for the 294 candidate single members of M67 with significant rotation detections in Fig. 11. For the analysis, we applied two extra cuts on the rotation periods, requiring that the stars have not been flagged as having an overlapping aperture (Sec. 2.3) or as a fieldof-view dependent member (Sec. 3.1). Despite some scatter in the periods, they are concentrated about a locus in T eff -P rot space. In an effort to describe this sequence, we performed a polynomial fit to the data using iterative outlier rejection, where at each step outliers were defined as the data greater than three median absolute deviations away from the median of the residuals. We did this for both T eff vs P rot and PS1 (r − i) vs P rot , finding that both approaches converged to the same solution: a subset of 64 stars, for which the least-squares best fits are: P rot (T eff,4K ) = 9.66 × 10 −10 · T 4 eff,4K + 8.25 × 10 −7 · T 3 eff,4K + 2.69 × 10 −4 · T 2 eff,4K + 0.016 · T eff,4K + 25.9, where T eff,4K = T eff − 4000 K. We used bootstrapping (N=10000) to calculate confidence intervals about our fit, fitting a polynomial to 64 stars sampled with replacement from the 253 stars that passed all quality cuts. As a point of comparison, we have also taken the approach of binning the 253 stars in T eff , computing a median P rot for each bin, and fitting a polynomial to these medians. The medians are plotted as red squares in Fig. 11  as a red line, which we have found is in agreement with the iterative outlier approach. We favor the results of the iterative outlier rejection due to its exclusion of points we believe are aliases from the fit to the T eff vs P rot sequence (see related discussion in Sec. 5.2).

Lomb-Scargle Failure Modes
In addition to the 10% uncertainties determined from our injection and recovery tests (Sec. 4.1) there are systematic uncertainties contributing to the scatter in our results (Fig. 11). These are the failure modes of the LS periodogram, originating from the irregular sampling in time. For the purposes of this discussion we will be using the term "window function" in a slightly different manner than in more classical time series analysis discussions. Instead of describing a traditional window function, such as the Hann window, we take an approach similar to that of VanderPlas (2018) where the window function describes how the light curve was sampled in time. This window function has predictable effects on the LS periodograms computed from the data, which can all be combined into one equation (Eq. 47 in VanderPlas 2018): where P obs is the observed peak in the periodogram, P true is the true period of the underlying signal, and m and n are integers. m = 1 and n = 0 yields the true period, and m = 2 and n = 0 represents the classic case of half-period aliasing, however they can both take any integer value, positive or negative. The final term, δP , is the period of a peak in the window function's periodogram; there are typically more than one, and for our CFHT observations there were two dominant ones. They were the "month window peak" (δP ≈ 29.5 Days) arising from only observing during the bright lunar phases, and the "year window peak" (δP ≈ 380.8 Days) arising from only observing when the cluster is up. The exact values of each depends on the precise sampling in time (i.e., on the completeness of the light curve). The effects of these window peaks can be easily seen in a scatter plot of P true vs P obs , which we have plotted using the results of a complete set of injections into one of our CFHT light curves (Fig. 12). There is no way to determine if     the period of maximum power in a periodogram corresponds to P true or one of its failure modes, P obs , with absolute certainty. Moreover, because m and n are integers there is no continuum of window effects, meaning a standard deviation computed across the peaks in a periodogram is a poor description of the uncertainty. VanderPlas (2018) provide a prescription for how one can use detected failure modes to improve the accuracy of interpreting periodgrams. We did not use their prescription, instead we found that many of the stars identified as likely to be failure modes as opposed to true rotation periods by their method are rejected in our iterative-outlier rejection and thus already excluded from our analysis. We have included a table of all the detected potential failure modes associated with each of our reported P rot values in Table 3. Readers who are interested in further trimming to create their own subset of the M67 rotation periods reported here may use these values in a prescription like that of Sec. 7.2 of VanderPlas (2018).

Deviations from the Sequence
There are a number of mechanisms that can result in an incorrectly measured value of P rot , both observational and astrophysical. Astrophysically, spot pattern evolution can spread the power from a rotation signal into multiple peaks in the periodogram, as well as shift the central location of these peaks. By using Kepler light curves as a part of our injection and recovery tests (Sec. 4.1) we are able to quantify the effect this has on our recovery. Comparing the bins with majority Kepler light curves to their neighbors with majority synthetic light curves in Fig. 9 indicates that spot pattern evolution among the Kepler light curves lead to a ∼15-20% drop in recovery. The same comparison using Fig.  10 shows an equivalent uptick in false positives, highlighting the impact of spreading the power across multiple peaks in the periodogram. Additionally, Basri & Nguyen (2018) have shown that stars with lower T eff and longer P rot tend to favor a "double dip" spot pattern that lends itself to half-period aliasing. However, the results of our injection and recovery tests suggest this is a relatively minor effect for our data set (see the half period alias line in Fig. 12). Finally, close binary systems will have rotation periods that appear as outliers in the data. Such systems are affected both by the confusion of brightness modulations in both stars as well as tidal forces changing their rotational evolution relative to single stars. We have mitigated the contamination from binary stars through our CMD cuts (Sec. 3.1.1). Observationally, we were also limited by our irregular sampling in time, such effects are described by the Lomb-Scargle failure modes (see Sec. 5.1). This means that even a perfectly stable sinusoid can be recovered incorrectly as the signal-to-noise ratio on the data decreases. With these effects in mind we believe that the use of iterative outlier rejection for our reported fit to the T eff vs P rot sequence was justified.
To demonstrate this, we substituted Eq. 7 for P true in Eq. 9, and plotted the resulting sequences alongside our original results in Fig. 13. Given the difference in the effect of each window peak we plotted the month and year effects separately. For the collection of stars rotating faster than our fit to the T eff vs P rot sequence, they align well with two possible cases: 1) they fall along the half period alias of our fit, or 2) they fall along a sequence that is associated with the month window peak (left panel of Fig. 13). The sequences associated with the year window peak show that these failure modes contribute to the scatter about our fit to the sequence. This, combined with the precision on our P rot values derived from the injection and recovery tests (Fig. 8), are what prevent us from measuring a T eff vs P rot sequence that is as sharply defined as the slow rotator sequences observed in younger clusters (e.g., Praespe and NGC 6811; Douglas et al. 2017Douglas et al. , 2019Curtis et al. 2019).
Finally, there remain a number of stars with relatively long rotation periods (P rot 40 d) and high temperatures (T eff 3600 K) that are inconsistent with our fitted sequence and its expected failure modes. To gain some insight into the origin of these inconsistent stars we binned our P rot values and computed the fraction of stars in a period bin that were inconsistent with the fitted sequence. Then we estimated the S ph values for these stars, allowing us to compare the computed fractions to the false positive rates of Fig. 10. The stars inconsistent with the sequence have a mean S ph of 0.3% with a standard deviation of 0.2%, compared to a mean S ph of 0.4% with a standard deviation of 0.1% for the stars of equivalent temperature consistent with the sequence. Given these S ph values, the inconsistency fractions of 20 − 50% align well with the false positive rates. Moreover, the inconsistent stars all have detected failure modes that are consistent with the fitted sequence itself, whereas those on the sequence primarily have detected failure modes consistent with a half-period alias. A multi-term model would aid in clarifying the true rotation period of these stars, however the sparse sampling in our data made those fits poorly conditioned. Therefore we are satisfied with their exclusion from our fit to the sequence. We postulate that these stars were affected by either spot pattern evolution or the signal-to-noise ratio of the data, both of which contribute to how likely a failure mode is to be recovered instead of the true rotation period. Confirmation would require follow-up observations at much higher cadence and with more regular sampling in time.

DISCUSSION
We first compared our observations (the 64 stars our iterative outlier rejection converged to, Sec. 5) against the predictions of two classes of gyrochronological models: empirical and theoretical (Fig. 14). Empirical models are agnostic to the physics of magnetic breaking and anguluar momentum transport, fitting a relation to a set of periods and ages often as a function of color. Generally they follow a Skumanich-like relation of P rot ∝ t n , using cluster data and the Sun as anchor points to tune the value of the exponent. On the other hand, theoretical models make assumptions regarding the underlying physics and spin-down that manifests from their description. Whether they are empirical or theoretical in nature, all models are calibrated against objects of known age and rotation period, relatively few of which are young M dwarfs, meaning that their predictions for the age of M67 are an extrapolation.
There are four empirical relations included in our comparison. First is the Barnes (2010) model, where dP rot /dt is parameterized in terms of the Rossby number (Ro) and two dimensionless constants calibrated on the Sun and young open cluster observations. Second, the Angus et al. (2019) empirical relation, which is a broken power law with mass fit to the sequence of Praesepe and a spin-down law tuned to replicate the Sun. Finally, we evolved the sequences of Praesepe (670 Myr; Douglas et al. 2017Douglas et al. , 2019 and Ruprecht 147 (2.7 Gyr; Curtis et al. 2020) forward in time through the use of a simple Skumanich-like spin down: P rot (4 Gyr) = P rot,0 (4 Gyr/t 0 ) 0.62 . For the hotter stars in our sample all of these empirical relations, with one exception, predict that the stars of M67 should be rotating ∼ 10-20 days slower than observed. The exception is the Skumanich-like spin down relation launched from the stars of Ruprecht 147, which provides an excellent match to our data for the earlier M dwarfs (T eff 3700 K), for the later M dwarfs there were no data available in Ruprecht 147.
The first of the two theoretical models we considered is from van Saders & Pinsonneault (2013) and described therein. Similar to our Skumanich-like empirical relation, we launched this model from two starting places: Praesepe and Ruprecht 147. This model is calibrated to match the spin down of solar mass stars and the Sun, and then applied to our low-mass M67 members. Braking laws of this form perform well on the Sun but fail to capture the rotational evolution during the first few hundreds of Myrs in young clusters (see Douglas et al. 2017;Breimann et al. 2021;Roquette et al. 2021  decoupling, etc.) or with the assumed distribution of initial rotation periods (Roquette et al. 2021). We manage these early-time uncertainties by starting our models as solid body rotators at Praesepe age and rotation rate, and evolving them forward to M67. The model evolved forward from Praesepe is a better match than the empirical relations, as it is within the 3σ confidence interval, but still predicts that the stars of M67 should be rotating slower than we observed. The model evolved forward from Ruprecht 147 provides an excellent match, being very closely aligned with the empirical relation launched from the same starting point, although this is again limited by a lack of later M dwarfs in the Ruprecht 147 data. The second theoretical model we included is the model of Spada & Lanzafame (2020). This iteration on the model includes some minor adjustments from that of Lanzafame & Spada (2015), a core-envelope decoupling model. Their model incorporates a mass-dependent wind-braking law that follows the classical rotation rate dependence of Kawaler (1988), dJ dt ∝ ω 3 . It also uses a two-zone approach to the interior, treating the core and envelope as two separate rotationally solid bodies that are allowed to exchange angular momentum. As a result, this model explains the apparent stalling of spin-down as a epoch during which significant angular momentum transport occurs from the core to the envelope, balancing out the angular momentum the envelope loses to wind-braking. The prediction of this model agrees with the observations quite well down to a T eff of around 3600 K. At this point, the model's prediction is too fast compared to our observations, though still consistent at the 3σ level.
In total, we have found three models that provide an excellent (i.e., within 1σ) match to our observations of M67. Of these models, two are solid body spin-down (one theoretical, one empirical) applied to the stars of Ruprecht 147, which we interpret as a sign that the late stages of low-mass stellar spin-down are dominated by solid body rotation. The other model is that of Spada & Lanzafame (2020), which is both the only core-envelope decoupling model tested, as well as the only model launched from the birth of the star. The excellent agreement between it and our observations makes a compelling case for the core-envelope decoupling theory.

The Case for Core-Envelope Decoupling
The evidence for core-envelope decoupling goes deeper than the agreement between the model of Spada & Lanzafame (2020) and our observations. In the core-envelope decoupling framework, after the epoch of significant angular mo-  (Howard et al. 2020), and the K2SDSS sample (Popinchalk et al. 2021).
mentum transport occurs, the expectation is that the core and envelope of the star have equalized in angular velocity. At this point, the star spins down as a solid body. If the stars of Ruprecht 147 have resumed their spin-down (Curtis et al. 2020), core-envelope decoupling would predict that they are now spinning down as solid bodies. The precise agreement between our solid body and empirical models launched from Ruprecht 147 and our observations implies that this is the case, at least through the age of M67. Another important test of core-envelope decoupling lies in the behavior of spin-down for stars that are nearly or fully convective. The diminishing size of the core limits the amount of angular momentum it can store relative to the envelope, reducing the length of time the star's spin-down would stall. Furthermore, stars with no radiative core should not stall their spin-down at all. Curtis et al. (2020) provide an empirical relation for the age at which stars resume spinning down. They find: based on a simplified model where spin-down comes to a full stop and then suddenly resumes after a mass-dependent length of time. If we extrapolate this relation to later spectral types (i.e., beyond M0), we find that by M1.5 the age at which spin down resumes is approaching that of M67 (3.6 Gyr vs 4 Gyr), and by M2 this age has potentially exceeded that of M67 (4.6 Gyr). However, the stars in this range of temperatures (M1-M3) are rotating ∼10-30 days slower than their younger counterparts (top two panels of Fig. 15), implying they have been spinning-down for at least part of the intervening ∼ 3 Gyrs. This is suggestive of a need for a t R relation that has a turnover as it approaches the fully convective boundary, as expected in the core-envelope decoupling framework. Observations of younger M dwarfs of these spectral types (e.g., those in Ruprecht 147 or NGC 752) will be a critical test for determining when these stars resumed their spin-down.

M67 and the Field
We have also compared our observations to an ensemble of field star rotation periods collected from a variety of sources. The largest contributor to this collection is the Kepler sample, with temperatures and rotation periods from Santos et al. (2019). The rest are predominantly M dwarfs with rotation periods from: the PS1 Medium Deep Survey (Kado-Fong et al. 2016), MEarth (Newton et al. 2016(Newton et al. , 2018, CARMENES (Díez Alonso et al. 2019), Evryscope (Howard et al. 2020), and the K2SDSS sample (Popinchalk et al. 2021). Popinchalk et al. (2021) provided the Gaia DR2 identifiers for the targets from all of these surveys, which we used to obtain temperatures from v8 of the TESS Input Catalog (TIC) (Stassun et al. 2019). We then plotted these field stars with the open cluster data (bottom two panels of Fig. 15), ignoring any stars which did not have a temperature in TIC.
Field M dwarfs follow a bimodality in their rotation periods (Kado-Fong et al. 2016;Newton et al. 2016;Howard et al. 2020). Using kinematic ages, Newton et al. (2016) speculated that the transition between the fast and slow populations must be quick, and must occur between the ages of 2 and 5 Gyr. The stars of M67 fall along the lower envelope of the slow rotator population, suggesting that they represent convergence onto a slow rotator sequence for M dwarfs. The age of M67 (4 Gyr) is consistent with the bounds for the transition. Higher cadence observations are needed to confirm whether or not there are still rapid rotators in M67, a lack of which would make M67 a fully converged slow rotator sequence. Since accurate gyrochronology depends on stars converging to a slow rotator sequence, the age of M67 serves as a lower bound for accurate gyrochronological ages of M dwarfs.
Another interesting feature seen in the distribution of field star rotation periods is the intermediate period gap. This is a bimodal distribution of stars with T eff values less than 5000 K and intermediate rotation periods (15-25 Days, McQuillan et al. 2013). A number of explanations have been put forth to explain this gap, including a lull in star-formation (Davenport 2017), a transition to faculae-dominated photospheres (Reinhold et al. 2019), or an epoch of accelerated spin-down during the recoupling of the core and envelope (McQuillan et al. 2013;Gordon et al. 2021). Open cluster data shows that any explanation relying on the gap stars having a common age is incorrect (Curtis et al. 2020). Instead the mechanism that causes this gap must occur at different times for stars of different masses. The stars of M67 appear along the upper envelope of the intermediate period gap, suggesting an upper bound of 4 Gyr for the age by which this mechanism has occurred. Furthermore, if the gap is indeed caused by accelerated spin-down during core-envelope recoupling, then the stars along the upper envelope of the intermediate period gap should be composed of stars that are spinning down as solid bodies, in line with our observations.

Evidence of a Unique Spin-Down History
While this description is compelling, some caution is important. Somers & Pinsonneault (2016) identified M67 as an outlier among open clusters in terms of its lithium abundance. Having demonstrated that Li depletion is a strong test of core-envelope recoupling they concluded that the most likely scenario explaining M67's Li abundances is an "intrinsically different mixing history" driven by a surplus of rapid rotators in the cluster's early years. Observations of young clusters and associations show that massive stars in large clusters can drive photoevaporation of the disks of nearby lower mass stars, shortening disk lifetimes and resulting in a larger population of rapid rotators (Roquette et al. 2021). Such a surplus of rapid rotators would shift the mean sequence of M67 to faster rotation periods compared to stars of equivalent ages until the initial conditions are forgotten. However, this will not affect the braking laws describing their spin-down. We can control for M67's unique initial rotation periods by modeling a variety of cases for the initial conditions, as well as observing other clusters of similar ages.

CONCLUSIONS
In this paper we have: • Generated a new catalog of 1807 M67 members based on Gaia EDR3 parallaxes and proper motions, identified potential unresolved binaries by their location on the cluster's CMD, and calculated the color-based effective temperatures for the late K and early M dwarf single members of M67.