The Complete Redshift Distribution of Dusty Star-forming Galaxies from the SPT-SZ Survey

The South Pole Telescope (SPT) has systematically identified 81 high-redshift, strongly gravitationally lensed, dusty star-forming galaxies (DSFGs) in a 2500 square degree cosmological mm-wave survey. We present the final spectroscopic redshift survey of this flux-limited ($S_{870 \, \mathrm{\mu m}}>25 \, \mathrm{mJy}$) sample. The redshift survey was conducted with the Atacama Large Millimeter/submillimeter Array across the 3 mm spectral window, targeting carbon monoxide line emission. The SPT sample is now spectroscopically complete, with redshifts spanning the range $1.9$$<$$z$$<$$6.9$, with a median of $z=3.9 \pm 0.2$. We present the mm through far-infrared photometry and spectral energy density fits for all sources, along with their inferred intrinsic properties. Comparing the properties of the SPT sources to the unlensed DSFG population, we demonstrate that the SPT-selected DSFGs represent the most extreme infrared-luminous galaxies, even after accounting for strong gravitational lensing. The SPT sources have a median star formation rate of $2.3(2)\times 10^3 \, \mathrm{M_\odot yr^{-1}}$ and a median dust mass of $1.4(1)\times10^9\, \mathrm{M_\odot}$. However, the inferred gas depletion timescales of the SPT sources are comparable to those of unlensed DSFGs, once redshift is taken into account. This SPT sample contains roughly half of the known spectroscopically confirmed DSFGs at $z$$>$$5$, making this the largest sample of high-redshift DSFGs to-date, and enables us to measure the"high-redshift tail"of the distribution of luminous DSFGs. Though galaxy formation models struggle to account for the SPT redshift distribution, the larger sample statistics from this complete and well-defined survey will help inform future theoretical efforts.


INTRODUCTION
Millimeter (mm) and sub-millimeter (submm) continuum observations have transformed our understanding of galaxy formation and evolution by demonstrating that luminous, dusty galaxies were a thousand times more abundant in the early Universe than they are today (see reviews by Blain et al. 2002 andCasey et al. 2014). The most intense star formation in the universe takes place in these high-redshift (z>1) dusty star-forming galaxies (DSFGs), which form new stars at rates of >100−1000 M yr −1 behind dense shrouds of dust. DSFGs are thought to be the progenitors of the massive elliptical galaxies seen in the present-day universe (Blain et al. 2004). The exact details of * NHFP Hubble Fellow how these galaxies form stars at such prodigious rates is still an open question (Narayanan et al. 2015), though galaxy mergers likely play a role (e.g. Tacconi et al. 2008;Engel et al. 2010;Narayanan et al. 2010;Hayward et al. 2011;Bothwell et al. 2013;Hayward et al. 2013;Ma et al. 2016). While the first surveys of the redshift distributions of these DSFGs suggested that the population peaks at z∼2 (Chapman et al. , 2005, modern redshift surveys suggest that the DSFG population peaks at much higher redshifts (2.5-2.9; Simpson et al. 2014;da Cunha et al. 2015;Danielson et al. 2017;Dudzevičiūtė et al. 2020). These updated surveys are also finding objects at increasingly higher redshift, extending past z>6 (e.g. Riechers et al. 2013;Fudamoto et al. 2017;Strandet et al. 2017;Zavala et al. 2018;Marrone et al. 2018). The presence of these extremely high-redshift DSFGs challenges our understanding of the underlying distribution of these objects and their role in the cosmic star formation history.
Dust emission at high redshift (z>1) exhibits dimming from increased cosmological distance, which is counteracted by a steep rise on the Rayleigh-Jeans side of the spectral energy distribution (SED) at a fixed observing wavelength, which leads to the so-called "negative K-correction" in the sub-mm (Blain & Longair 1993). This effect enables a nearly redshiftindependent selection of DSFGs at mm and submm wavelengths. However, the dust-obscured nature of DSFGs suppresses emission at optical/UV wavelengths, making robust redshifts difficult to obtain, especially at high redshift. Technological advances in correlator bandwidth have enabled "blind" spectroscopic surveys to be conducted at facilities such as the Atacama Large Millimeter/submillimeter Array (ALMA) and the Plateau de Bure Interferometer (PdBI). These spectroscopic surveys search for molecular emission at millimeter wavelengths and can be conducted without prior optical/near-IR spectroscopy. Because the molecular emission can be unambiguously related to the continuum emission, these surveys provide a direct and unbiased way to derive the redshifts of DSFGs (e.g. Scott et al. 2011;Weiß et al. 2009). Carbon monoxide (CO) is the second most abundant molecule in the Universe after H 2 and its rotational transitions are frequently targeted in blind searches. These rotational transitions are spaced evenly every 115 GHz and are among the brightest lines in the millimeter spectrum. The CO line brightness is due to both its abundance and the molecule's large dipole moment. The low energy required to excite its rotational states, and the fact that the resulting emission lines are accessible at frequencies of high atmospheric transmission make it ideal for line surveys with ALMA.
One of the first millimeter wave redshift searches with ALMA identified the redshifts of 26 gravitationally lensed DSFGs selected with the South Pole Telescope (SPT; Vieira et al. 2013;Weiß et al. 2013). Because gravitationally lensed sources (with magnifications µ∼10) are apparently brighter than unlensed sources, they require significantly less on-source time to survey (t obs ∝ µ −2 ). Blind CO surveys can be conducted with comparative ease and enable larger spectroscopic surveys (Weiß et al. 2013;Strandet et al. 2016;Neri et al. 2020). While some unlensed blind CO surveys have been conducted (Chapman et al. 2015), the majority of unlensed redshifts were obtained through optical spectroscopic searches (Chapman et al. , 2005Casey 2012;Koprowski et al. 2014;Danielson et al. 2017;Brisbin et al. 2017) or photometry Simpson et al. 2017;Micha lowski et al. 2017) and peak at a lower redshift (z∼2.3−2.9). The discrepancy between the peaks of unlensed and lensed distributions can be explained by the combination of the selection wavelength, survey depth, and the redshift-dependent probability of strong lensing (e.g. Béthermin et al. 2015a;Strandet et al. 2016).
Wide-area surveys conducted with the Atacama Cosmology Telescope (ATCA; Marsden et al. 2014), Herschel (Eales et al. 2010), Planck (Planck Collaboration et al. 2015, and the SPT (Carlstrom et al. 2011;Vieira et al. 2010Vieira et al. , 2013 span hundreds to thousands of square degrees and have enabled the discovery of hundreds of gravitationally lensed DSFGs at millimeter and sub-millimeter wavelengths. For DSFGs at high-redshift, the rest-wavelength peak in the spectral energy distribution (SED) at ∼100 µm is shifted into the observing bands of these mm and sub-mm instruments. Most of the brighter sources are gravitationally lensed as well (e.g. Blain 1996;Negrello et al. 2010;Wardlow et al. 2013;Spilker et al. 2016). The few intrinsically bright unlensed sources in these samples are typically major mergers of DSFGs (e.g. Fu et al. 2013;Ivison et al. 2013) or protoclusters (e.g. Overzier 2016Casey 2016;Miller et al. 2018;Oteo et al. 2018;Lewis et al. 2018;Hill et al. 2020). The magnification due to gravitational lensing enables us to obtain spectroscopic redshifts for a complete sample of DSFGs, measure the redshift distribution of DSFGs, and ascertain the prevalence of the highest-redshift DSFGs. Samples of lensed DS-FGs also afford the opportunity to study fainter observational diagnostics, in greater detail, than would otherwise be possible (e.g. Bothwell et al. 2017;Béthermin et al. 2018;Spilker et al. 2018;Zhang et al. 2018;Litke et al. 2019;Dong et al. 2019;Jarugula et al. 2019;Cunningham et al. 2020).
In this paper, we finalize the SPT-selected ALMA redshift survey, which started in Weiß et al. (2013) and continued in Strandet et al. (2016). The final catalog contains spectroscopic redshifts for all sources, making it the largest and most complete catalog of its kind to date. In Sec. 2, we present the 3 mm line scans obtained from ALMA (Sec. 2.2). We also present the photometry from SPT, ALMA, APEX, and Herschel (Sec. 2.3.1). In Sec. 2.3.2, we present a methodology for fitting spectral energy distributions (SEDs) and deriving intrinsic source properties. Section 3 is divided into two parts: spectroscopic results (Sec. 3.1) and photometric results (Sec. 3.2). In Sec. 3.1, we present the 3 mm spectra and the resulting spectroscopic redshifts. In Sec. 3.2, we describe the fitted SEDs and the resulting intrinsic properties. In Sec. 4, we discuss the redshift distribution of the complete SPT sample (Sec. 4.1), the possibility of temperature evolution (Sec. 4.2), the extreme nature of the SPT sources (Sec. 4.3) and the resulting high redshift tail of this distribution (Sec. 4.4).

Sample selection
The SPT-selected DSFG catalog is a fluxlimited sample comprised of 81 bright sources, selected at 1.4 mm from the 2500 deg 2 of the SPT-SZ survey (Vieira et al. 2010;Mocanu et al. 2013;Everett et al. 2020). The sources were selected with S 1.4 mm >20 mJy, corresponding to a signal to noise ratio of >4.5. The relatively coarse SPT positions (beam size of 1 .05 at 1.4 mm) were refined with observations with the Large Apex BOlometer CAmera (LABOCA) at 870 µm (Siringo et al. 2009). Given the smaller beam size (20 ) and higher signal-to-noise ratio (typically ∼2.5 higher) with the LABOCA observations, a final flux density cut was performed to select sources with S 870 µm >25 mJy. The complete source catalog and their positions are detailed in Ap. A.
The spectroscopic survey of the SPT sample presented here is complete for S 870 µm >25 mJy. The 1.4 mm SPT and S 870 µm LABOCA fluxes of the final sample are shown in Fig. 1. The S 350 µm /S 870 µm color can be used as a rough indicator of redshift, shown in the right panel of Fig. 1, assuming a constant dust temperature of 50 K (see Sec. 3.2.1 for details). Due to their extreme brightness, most of the sources were suspected to be gravitationally lensed by foreground galaxies, groups, or clusters (Negrello et al. 2007). High resolution 870 µm observations (Hezaveh et al. 2013;Spilker et al. 2016) confirmed this hypothesis and yielded a median magnification of µ 870 µm =6.3.

Spectroscopic Observations
2.2.1. ALMA 3 mm blind scans In order to obtain redshifts for the SPT sample, a blind spectroscopic redshift search was started in ALMA cycle 0 (project ID: 2011.0.00957.S). This program resulted in a 90% line detection rate for the 26 sources surveyed (Weiß et al. 2013). An updated distribution was presented in Strandet et al. (2016) with an additional 15 sources (project ID: 2012.1.00844.S). This work represents the conclusion of the SPT blind redshift survey, and presents spectroscopic scans for the remaining sources from ALMA cycles 3, 4 and 7. The individual 3 mm scans can be found in Ap. B. The blind spectroscopic search was conducted using ALMA's band 3 receiver, which operates between 84 − 116 GHz. The correlator has a total bandwidth of 7.5 GHz, which is split across two side bands. The entire 3 mm atmospheric transmission window can be covered in five tunings, as shown in the left panel of Fig. 2. This configuration results in overlapping coverage in the 96.2−102.8 GHz region. Fig. 2 also demonstrates this search's sensitivity to CO lines be-tween the CO(1 -0) and CO(7 -6) transitions. Scanning this region results in redshift coverage of 0.0<z<0.4 and 1.0<z< 8.6 with a narrow redshift desert at 1.74<z<2.00. Given the scanned frequency range, ALMA's primary beam ranges from 45 − 61 .
The cycle 3 ALMA observations were conducted from December 2015 -August 2016 (project ID: 2015.1.00504.S). During cycle 3, between 34 and 41 antennas were employed, and resulted in typical synthesized beams of 3.9 ×4.7 to 3.3 ×4.1 (FWHM) from the low to high frequency ends of the band. Each target was observed for 61 − 91 seconds in each tuning, or roughly 6 minutes on-source for each source. Typical system temperatures were measured to be T sys = 55 − 84 K. Flux calibration was performed on Uranus, Neptune, Ganymede, J0519-4546, and J0538-4405. Bandpass and phase calibration were determined from nearby quasars.
The cycle 4 observations were conducted from November 2016 -May 2017 (project ID: 2016.1.00672.S). In cycle 4, each scan utilized between 38 and 46 antennas, resulting in typical synthesized beams of 4.1 × 5.0 to 3.5 × 4.3 (FWHM) from the low to high frequency ends of the band. Each target was observed for ∼12 − 15 minutes on-source for each source, not including overheads. Typical system temperatures were measured to be T sys = 60 − 89 K. Flux calibration was performed on Mars, Uranus, Neptune, J0334-4008, J0538-4405, J2056-4714, and J0519-4546. Bandpass and phase calibration were determined from nearby quasars.
Because a total of three sources did not exhibit lines in their initial 3 mm line scans (SPT0112-55, SPT0457-49 and SPT2340-59; more detail in Sec. 3.1), additional deeper 3 mm scans for these sources were conducted in cycle 7. These observations were conducted from November 2019 -January 2020 (project ID: 2019.1.00486.S) with the aim of observing possible line features with fluxes of 1−2 mJy identified in the earlier scans. Each scan utilized between 42 and 49 antennas, resulting in minimum and maximum angular resolutions of 2.0 −4.0 from the high to low frequency ends of the observed band. Each target was observed for 45 − 91 minutes on-source, not including overheads. Typical system temperatures were T sys = 62 − 72 K. Flux calibration was performed on J0006-0623, J0519-4546 and J0238+1636. Bandpass and phase calibration were determined from nearby quasars.
The data were processed using the Common Astronomy Software Application (CASA; Mc-Mullin et al. 2007;Petry et al. 2012). Calibrated data cubes were constructed using CASA's TCLEAN package. The cubes have a channel width of 62.5 MHz (∼220 kms −1 ). Observations from cycle 3 had a typical noise per channel of 0.5 − 0.8 mJy/beam. The TCLEANed continuum images have typical noise levels of 50 µJy/beam. The sources observed in cycle 4 had a typical noise per channel decreased to 0.4 − 0.6 mJy/beam. The TCLEANed continuum images from cycle 4 had typical noise levels of 40 µJy/beam. Finally, the TCLEANed continuum images for the three sources reobserved in cycle 7 had typical noise levels of 10 − 70 µJy/beam and a typical noise per channel of 0.2 − 0.3 mJy/beam. , and H 2 O emission lines as a function of redshift. The darker teal shaded region designates the redshift range in which two or more strong lines are expected to be detected, which would provide an unambiguous redshift for a given observation. The lighter teal region marks redshift range where only a single line is detectable, and an ancillary spectroscopic observation or photometric redshift would be required to identify the correct redshift. The five frequency tunings used in the 3 mm line scans are shown in the left panel.

Additional Spectroscopic Observations
Because many observations from the blind 3 mm line scans described in Sec. 2.2 contain single CO lines, additional observations were required in order to break degeneracies between redshift solutions and obtain unambiguous spectroscopic redshifts. The First Light APEX Submillimetre Heterodyne receiver (FLASH, Heyminck et al. 2006) was used to conduct a survey of the [CII] emission line. A subset of these observations were published in Gullberg et al. (2015) and new [CII] observations were used to confirm four spectroscopic redshifts in this work. An additional 13 redshifts were confirmed through targeted CO line searches in the ALMA 2 mm band. Finally, the Australia Telescope Compact Array (ATCA) was used to conduct a survey of CO(1-0) and CO(2-1) and confirmed a subset of observations ). One of these observations was obtained after the publication of Aravena et al. (2016) is used to confirm one additional redshift in this work. The details for all of these observations can be found in Ap. C. Of the spectroscopic redshifts presented, only two are based on a single line and are still awaiting additional observations to confirm the redshift.

Photometry and SED Fitting
2.3.1. Photometry The SPT DSFGs have superb FIR-mm photometric coverage, with flux densities measured at 3 mm (ALMA), 2 mm, 1.4 mm (SPT), 870 µm (APEX/LABOCA), 500 µm, 350 µm, and 250 µm (Herschel /SPIRE) for all sources. Additional Herschel /PACS 160 µm and 100 µm observations were obtained for a subset of 65 sources. Despite the large range in redshifts (1.9<z<6.9), the photometry is complete between 71 µm<λ rest <380 µm, and the peak of the FIR SED at ∼100µm is always well constrained. The flux densities for all photometric points can be found in Ap. D. The absolute calibration uncertainties of 10% for Herschel /PACS, 7% for Herschel /SPIRE data, 12% for APEX/LABOCA, 7% for SPT, and 10% for ALMA data added in quadrature to the errors quoted in Ap. D.
ALMA -The ALMA 3 mm continuum maps were obtained as a result of the observations described in Sec. 2.2.1. The continuum images were created using CASA's TCLEAN procedure with the full observed bandwidth (84.2−114.9 GHz) with natural weighting in order to optimize sensitivity. In cases where the source is unresolved, meaning >90% of the total flux detected was contained within one beam, we extract flux from the brightest pixel detected from the continuum to obtain a spectrum. The error on flux density was calculated using the RMS of the residual map produced by the TCLEAN procedure. However, half of the SPT-selected sources are marginally resolved (> 80% of the source's flux is contained within one beam). In order to obtain spectra for these sources, CASA's imfit routine is used to fit a 2D Gaussian to the source in each datacube slice and extract the flux and associated error.
SPT -The SPT 1.4 mm and 2.0 mm flux densities were extracted from CMB maps acquired from the first survey, SPT-SZ. This survey was completed in November 2011 and covered 2500 deg 2 of the southern sky in three frequency bands, 95, 150, and 220 GHz (corresponding to 3.2, 2.0 and 1.4 mm, respectively) with arcminute angular resolution. Absolute calibration for both the 1.4 and 2.0 mm bands is derived from the CMB and the calibration uncertainty is 10%. The data were extracted and deboosted according to the procedure described in Everett et al. (2020).
APEX -The flux densities observed at 870 µm were performed with LABOCA at APEX. LABOCA is a 295-element bolometer array with an 11.4 field-of-view and a measured angular resolution of 19.7 (FWHM). The center frequency of LABOCA is 345 GHz (870 µm) with a passband FWHM of ∼ 60 GHz. The measured noise performance for these observations was 60 mJy s 1/2 .  -2013, M-092.F-0021-2013). For more details on the observations, see Strandet et al. (2016).
APEX/LABOCA maps were created for each source using the Bolometer Array analysis software (BoA; Schuller et al. 2010). The resulting time-ordered data undergo various calibration, noise removal, and flagging procedures detailed fully in Greve et al. (2012). The data is then gridded and individual maps are co-added with inverse variance weighting. The flux densities were either extracted from the peak flux density, in case of point-like sources or by integrating over the emission region, in cases where LABOCA resolves the emission.
Herschel/SPIRE -Flux densities at 250 µm, 350 µm and 500 µm for all sources were measured by the Spectral and Photometric Imaging Receiver (SPIRE) (Griffin et al. 2010) onboard the Herschel Space Observatory. The data were observed in two programs (project IDs: OT1 jvieira 4 and OT2 jvieira 5) conducted between August 2012 to March 2013. The Herschel /SPIRE data consists of triple repetition maps, with coverage complete to a radius of 5 from the nominal SPT position. The maps were produced using the standard reduction pipeline HIPE v9.0. Flux densities were extracted by fitting a Gaussian profile to the SPIRE counterpart of the SPT detection and the noise was estimated by taking the RMS in the central 5 of the map.
Herschel/PACS -Additional data were obtained for a subsample of 65 sources at 100 and 160 µm using the Photodetector Array Camera & Spectrometer (PACS) onboard Herschel (project IDs: OT1 jvieira 4, OT1 dmarrone 1, OT2 jvieira 5 and DDT mstrande 1). The additional data ensures that the thermal peak is well-sampled and is complete for z<2.5. The data were acquired using approximately orthogonal scans centered on the target at medium speed (i.e., with the telescope tracking at 20 s −1 ), spending a total of 180 s on source per program. Each scan was composed of ten separate 3 strips, each offset orthogonally by 4 and both wavelengths were observed simultaneously. The scans were co-added and weighted by coverage. The data were then handled by a variant of the reduction pipeline presented in Ibar et al. (2010). The resulting noise levels were calculated using random aperture photometry and were found to be σ≈4 and 7 mJy at 100 and 160 µm respectively. Flux densities were extracted by fitting a Gaussian profile to the SPIRE counterpart of the SPT detection and the noise was estimated by taking the RMS in the central 5 of the map.

SED Fitting
We fit each source in each sample with a modified blackbody law (e.g. Blain et al. 2003;Casey 2012) given by (1) where B ν is the Planck function for a value of T dust or the CMB temperature, T CMB . In order to reduce the number of free parameters and mitigate the degeneracies between redshift and T dust , we fix the Rayleigh-Jeans spectral slope, β. Empirically, the value β=2 was well-matched to the data, so we fix this parameter for our modified blackbody fits in a similar fashion to what was done in Greve et al. (2012). However, rather than fix ν 0 ≈3000 GHz (λ 0 ≈100 µm), we use the empirical relationship between λ 0 and T dust given in Eq. 2 of Spilker et al. (2016) to constrain λ 0 . Using this relation provides a better alternative to assuming a single value for λ 0 when an independent estimate of the size of the emission region is not available. We find that the introduction of this dependency between T dust and λ 0 improves both the reduced χ 2 value and photometric redshift. It should be noted, however, that this procedure tends to increase the value of the dust temperature by ∼20%. The only free parameters in this SED fit are the overall SED normalization, dust temperature, and redshift.
Because a modified blackbody fit alone does not typically describe the mid-IR excess found in the Wien side (< λ rest =50 µm) of the thermal emission peak, we perform another fit including an additional power law component . The power law component introduces another free parameter, α, which is the power law slope. The combined modified blackbody and power law fit empirically describes all of the available photometry, including the data on the Wien side of the thermal emission peak. In this work, we use this SED fit to define the frequency at which thermal emission peaks, λ peak , obtain a best fit to β for the M d calculation, and to determine total L IR .
In order to fit the data, we employ a Markov Chain Monte Carlo (MCMC) algorithm using the emcee package (Foreman-Mackey et al. 2013) to sample the posterior probability function. To ensure uniform photometric coverage in the fit region, we mask data shortward of λ rest =50 µm (Greve et al. 2012) for the modified blackbody fits. Fitting done with an additional power law included all available photometry points. The results of this fitting procedure are described in Sec. 3.2.

Calculating Intrinsic Source Properties
In this section, we describe the intrinsic source properties, which are calculated using the SED fits. Because we constrain λ 0 as a function of T dust (as described in Sec. 2.3.2), we are able to better understand the T dust distribution of the sample. The apparent FIR luminosity (L FIR ) is calculated by integrating the fitted SED over the wavelength range 42.5 − 122.5 µm (Helou et al. 1988). In order to obtain the IR luminosity (L IR ), we integrate the modified blackbody function with an additional power law over the 8 − 1000 µm range.
With FIR luminosity and T dust values, we derive star formation rates and dust masses for each source. The dust masses are calculated according to: (2) where S ν is the flux density at 345 GHz in the rest frame, determined from our SED fit. D L is defined as the luminosity distance, T CMB (z) is the cosmic microwave background temperature at redshift z, and µ is the magnification factor. We adopt κ(ν)/ m 2 kg −1 = 0.015 × (ν r /250 GHz) β (Weingartner & Draine 2001;Dunne et al. 2003), where β=2.0 is the dust emissivity index. The calculated dust masses are corrected using the magnifications presented in Spilker et al. (2016), when available. For the 34 sources without lens models, the median magnification ( µ 870 µm =6.3) of the modeled sources is adopted.
To derive total star formation rates (SFR), we use the following conversion from Murphy et al. (2011): (3) where the infrared luminosity, L IR , is calculated from the 8 − 1000 µm range using the modified blackbody fit with an additional power law to describe the mid-IR excess. This conversion was calculated using Starburst99 (Leitherer et al. 1999) for a Kroupa initial mass function (Kroupa 2001).

Spectroscopy Results
Building on the work of Weiß et al. (2013) and Strandet et al. (2016), the final SPTselected DSFG sample is composed of 81 sources. We have obtained spectroscopic redshifts for the complete SPT-selected sample, making our catalog the largest and most complete redshift survey of high redshift DSFGs to date. We begin by presenting the final blind 3 mm CO line scans obtained by ALMA and the resulting spectroscopic redshifts. For the cases where only a single CO line was detected, we discuss any ancillary spectroscopic data used to confirm the redshift in Ap. C.4. We also discuss the three 3 mm spectra where no spectroscopic lines were present in the initial 3 mm scans. These sources were re-observed and deeper scans enabled secure redshifts to be obtained. Finally, we discuss the two single line spectra where no ancillary data has yet been obtained, and discuss the most probable redshift.
All of the SPT-selected DSFG spectra, including those originally published in Weiß et al. (2013) and Strandet et al. (2016), are summarized in Fig. 3. A complete summary of the spectroscopic lines detected for each source can be found in Ap. E. In this work, we detect 62 strong line features from 12 CO and [CI] with integrated SNR >5σ. We detect an additional 29 weaker features (>3σ), which include HCN, HCO + , H 2 O, 13 CO, and CN.
We detect 3 mm continuum emission for all the 40 previously unpublished SPT-selected DS-FGs presented in this work. The positions for these sources were obtained by fitting Gaussian profiles to the ALMA 3 mm data and listed in Ap. A. The continuum flux densities were also obtained in these fits and are given with the other photometric observations in Ap. D.

Unambiguous Cases
We detect two or more line features in the 3 mm spectra for ∼ 46% of the SPT-selected catalog. Because of the unique distances between the CO rotational states, these redshifts can be related to rest frame spectra unambiguously. The redshifts are derived by averag-ing the redshifts for individual line detections. Because these line profiles were fitted using a Markov-Chain Monte Carlo (MCMC) to sample to posterior probability, the values differ slightly from the values published in Weiß et al. (2013), Strandet et al. (2016) and Strandet et al. (2017). However, the redshifts with previously published values differ at the < .1% level and agree within their stated error bars. Tab. E.1 summarizes all of detected line features for the full SPT DSFG catalog and their derived redshifts.

Single Line Detections
Spectroscopic redshifts can also be calculated from spectra with a single line feature. However, such redshifts have multiple degenerate solutions, as the observed transition cannot be unambiguously identified. Ancillary spectroscopic observations are required in order to break the degeneracy. Line scans specifically targeting CO transitions and [CII] at the expected redshift solutions have been obtained with ALMA, APEX and ATCA. These observations are described in detail in Ap. C and are used in order to confirm an additional 15 redshifts. Any ancillary data obtained is noted in the comments section of Tab. E.1.
However, there are still two sources (SPT0150-59 and SPT0314-44) with a single 3 mm feature that do not yet have any associated ancillary spectroscopy. These sources are identified in Tab. E.1 as the bolded sources. In both cases, these spectra cannot result from CO transitions of J=4-3 or higher because these lines would be accompanied by another line within the observing band (see Fig. 2). For more detail, see Sec. 3.2.1. Additionally, we can use the available photometry to determine the most probable redshift solution. Both methods indicate that CO(3-2) is the most probable identification, but follow up spectroscopy is needed for redshift confirmation.

CO(3-2)
CO(4-3) CO(5-4) CO(6-5) CO(7-6) [CI](1-0) Figure 3. Top panel: All obtained ALMA 3 mm spectra in the rest frame, with the associated redshift shown on the right of each spectrum. Not shown: three DSFGs which do not have ALMA 3 mm data, but were confirmed through other programs (SPT0538-50, SPT0551-48, SPT2332-53) and published in Greve et al. (2012) and Strandet et al. (2016). By performing a 1.4 mm flux-weighted average of the observed continuum-subtracted rest frame spectra, we obtain the composite spectrum presented in the bottom panel. This was first done in Spilker et al. (2014), but has been updated to include the final SPT-selected sample.

No Line Detections
Combining the results from our two previous redshift papers, there are three sources where no line could be identified in the 3 mm window: SPT0128-51, SPT0457-49 and SPT2344-51. In this work, we find one additional source, SPT0112-55, where we do not detect a line in our initially shallow band 3 observations. SPT0128-51 and SPT2344-51 failed to meet our S 870 µm > 25 mJy cut and are not re-tained in the final flux-limited sample. We obtained deeper 3 mm scans in ALMA cycle 7 for SPT0112-55 and SPT0457-49. We also re-observed SPT2340-59, which was originally published with a tentative single line detection in Strandet et al. (2016) to secure its redshift. All three showed > 5σ detections of CO(4-3) and [CI](1-0) and the spectra are shown in Fig. B.1. While SPT0112-55 does not have obvious multiplicity, both SPT0457-49 and SPT2340-59 had flux split between two sources. Optical imaging reveals that SPT2340-59 is almost certainly lensed, while SPT0457-49 is a protocluster candidate.

Summary of Spectroscopic Results
In summary, all of the combined observational efforts have yielded secure redshifts for the complete flux-limited sample of 81 sources from the 2500 deg 2 SPT survey. Of these, 79 sources had multiple spectroscopic lines, detected either solely from the 3 mm window or with ancillary spectroscopic observations. Only two of the redshifts are based on a single line, but they have no other possible redshift solution and agree with our photometric redshift from the distribution of dust temperatures. Altogether, this is the largest and most complete collection of spectroscopic redshifts for high-redshift DSFGs obtained so far at mm wavelengths.

Photometric Results
In this section, we first discuss the results of fitting the available photometry with the procedure outlined in Sec. 2.3.2. We present the fits, along with the derived dust temperatures and photometric redshifts in Sec. 3.2.1. With the SED fits in hand, we calculate the intrinsic source properties for the sample, and present them in Sec. 3.2.2.

SED Fits, Dust Temperature and Photometric Redshifts
More information can be obtained by fitting the photometry using the process outlined in Sec. 2.3.2. We first fit a modified blackbody to the FIR thermal emission peak for rest wavelengths >50 µm. We also fix the redshift parameter to the spectroscopic values obtained in Sec. 3.1, shown in Fig. 4 in red. This model describes the FIR thermal emission peak well, with a median reduced χ 2 value of 1.8. We also perform another fit with an additional power law component in order to fit the complete L IR wavelength range (shown in Fig. 4 in blue), which gives a median reduced χ 2 of 0.7.
Sources with large χ 2 values generally exhibit a discrepancy between the 3 mm flux and the value predicted by the modified blackbody. Because of the discrepancy in beam sizes between ALMA and SPT, objects which are broken into multiple components are sometimes below the ALMA 3 mm detection threshold, leading to an underestimation of the total 3 mm flux. Using ALMA 870 µm imaging , we verify that the source is split into multiple components. In cases of multiple components, the 3 mm point is then masked when fitting the thermal emission peak. One of these sources, SPT2349-56, has already been identified as a protocluster (Miller et al. 2018;Hill et al. 2020). This discrepancy in 3 mm flux from over-resolving the emission may be a good way to separate unlensed protoclusters from lensed DSFGs. The remaining sources are being investigated as potential protocluster candidates. As previously stated, any photometry <50 µm in the rest frame was masked for the modified blackbody fit. Any masked photometry points are shown in Fig. 4 in grey.
By extracting a dust temperature from the SED fit, a dust temperature probability distribution was created by sampling each source's dust temperatures 10 3 times using a Monte Carlo procedure, shown in Fig. 5. Though the distribution peaks near T dust =40 K, the median of the distribution is significantly higher at T dust =52.4 ± 2.3K, with a tail extending past ∼100 K. Given the relationship between λ 0 and T dust discussed in Sec. 2.3.2, the implied median of the λ 0 distribution is 155±7 µm. These warm dust temperatures suggest that there could be a correlation between redshift and dust temperature, which we discuss in Sec. 4.2.
Although spectroscopic redshifts have been obtained for all sources, the photometric redshift can still be used in order to break degener-  . Spectral energy distribution fits for all 81 of the SPT-selected DSFGs. The modified blackbody model (red ) was fit by masking data shortward of λ rest < 50 µm, such that all sources have roughly uniform photometric coverage and excess emission on the Wein side of the blackbody does not artificially drive the dust temperature towards higher values. Sources which exhibit multiple components in 3 mm are fitted by masking the 3 mm point. Any photometry points which were masked for the modified blackbody fit are shown in grey. In order to account for the mid-IR excess, a power law can be added on to the modified blackbody function  to better describe all available data (blue). ate redshift solutions in the case where a single CO transition was detected, as well as inform future spectroscopic and photometric surveys. In order to find the photometric redshift, we use the dust temperature distribution given in Fig. 5 as a prior and simultaneously fit the T dust and redshift parameters. This method exhibits good agreement with the spectroscopic redshift, as seen in Fig. 6. This photometric redshift fitting can also be used to break degeneracies for the two single 3 mm line sources (SPT0150-59 and SPT0314-44). For a given degenerate redshift solution, we fix the redshift parameter to that solution and use the SED fitting procedure to find the associated dust temperature. Using the dust temperature distribution given in Fig. 5 as a prior, we assign a likelihood of the source being at each possible redshift solution (see Ap. C.4 and Fig. C.5 for details). For both single-line sources, CO(3-2) is the most probable identification. If the detected line was in fact the higher-J transition (e.g. CO(4-3)), an additional CO line should have been detected. Taken together, these two independent pieces of evidence indicate that there is no ambiguity in the redshift of these sources. . Photometric redshift compared with spectrosocpic redshift. We find good agreement between photometric and spectroscopic redshifts, with unity shown as the dashed line.
Though we are complete in spectroscopic redshifts, photometry provides insights that inform future surveys. Because the SPT DSFGs have excellent FIR-mm coverage, it is possible to use single flux density ratios to obtain information about the redshift. For instance, the S 350 µm /S 870 µm ratio can be used as a rough indicator of redshift. We find good agreement between our catalog, Herschel /SPIRE sources (Negrello et al. 2010;Conley et al. 2011;Omont et al. 2011;Harris et al. 2012;Wardlow et al. 2012;Bussmann et al. 2013;Gladders et al. 2013;Riechers et al. 2013;Messias et al. 2014) and the redshifted SED of Arp220 (Silva et al. 1998), shown in Fig. 7. We perform a maximum likelihood estimation assuming the 350 − 870 µm flux density ratios for the Herschel /SPIRE and SPT sources can be described by an exponential model with an extra Gaussian variance term to account for intrinsic scatter. We find the following exponential fit best describes the available data: S 350 µm S 870 µm (4) which is shown with a 1σ limit in Fig. 7. In performing this fit, we find an intrinsic logarithmic scatter of 4.2 ± 1.1. . The S 350 µm /S 870 µm color serves as a rough indicator for redshift. The SPT catalog is compared to Herschel /SPIRE sources and is welldescribed by an exponential fit. The Arp220 colors were obtained by artificially observing the Arp220 SED (Silva et al. 1998) for a range of redshifts.

Individual Source Properties
To demonstrate the extreme star-forming nature of the SPT DSFGs, we compare our sources to a flux-limited blank-field sample throughout this work. The original LABOCA ECDFS submillimetre survey (LESS) mapped the full Extended Chandra Deep Field South (ECDFS; Weiß et al. 2009). The survey area encompassed 0.5×0.5 deg 2 and detected a total of 126 DS-FGs above a significance level of 3.7σ. The individual source detections were followed up with ALMA in cycle 0 and the sample was thus called ALESS (Hodge et al. 2013;Swinbank et al. 2014). Photometry was obtained at 1.4 GHz, 870, 500, 350, 250 160, 100 and 70 µm in order to obtain the photometric redshifts ). The MAGPHYS SED modeling code was used to obtain photometric properties (da Cunha et al. 2015). Spectroscopic redshifts were also obtained for a subset of these sources (Danielson et al. 2017). Because the ALESS sample is complete for sufficiently bright sources, we use it as an example of the classic unlensed 850 µm-selected DSFGs with a fluxlimited selection. Unless otherwise stated, the same formalism used for the SPT analysis was also applied to the ALESS sources throughout this work.
The distributions of T dust , apparent FIR luminosity, and redshift are shown for both the SPT sample and the ALESS sample (da Cunha et al. 2015) in Fig. 8. The FIR luminosity and T dust values for both samples are derived from the SED fitting procedure described in Sec. 2.3.2. Because many of the ALESS sources do not have robust FIR detections due to their relative faintness and the Herschel /SPIRE confusion limit (Nguyen et al. 2010), their fitted SED values have large associated uncertainties when using the same SED fitting routine described in Sec. 2.3.2. In the left panel of Fig. 8, the lensed SPT sources are offset from the largely unlensed ALESS sample, as expected from gravitational lensing. Gravitational lensing randomly samples sources from the background and increases the solid angle they subtend. This effect makes the lensed sources appear more luminous at a given T dust than an unlensed source. It would take a median magnification of ∼18 to make the median apparent L FIR for the SPT sources without lens models (3.88×10 13 L ) on average intrinsically identical to the median L FIR for the ALESS sources (2.3(2)×10 12 L ).
On the right panel of Fig. 8, both samples exhibit a roughly constant FIR luminosity as a function of redshift, demonstrating the negative K-correction inherent in sub/mm surveys of DSFGs. The approximate detection limits for both surveys and the Herschel /SPIRE confusion limit are also shown, assuming a standard template from Chary & Elbaz (2001). These limits illustrate the importance of selection wavelength and survey depth on the resultant redshift distribution of the sample. Though SPT is only sensitive to the most luminous sources, the detection threshold at the selection wavelength of 1.4 mm corresponds to a decreasing (apparent) luminosity at higher redshift. This is in contrast to the selection curve for the Herschel 500 µm selection, which corresponds to an increasing luminosity at higher redshift because 500 µm corresponds to rest wavelengths near or beyond the peak of the dust SED at higher redshifts.
In order to determine the intrinsic properties for each source, as in Fig. 9, lens models are needed to calculate the magnification factor. Using the 47 SPT-DSFGs with lens models , the magnification factors for individual sources are used to calculate the intrinsic luminosity. For the 34 sources without lens models, the median magnification ( µ 870 µm =6.3) is adopted. These magnifications were used to calculate the intrinsic FIR luminosity, dust mass and SFR values, as described in Sec. 2.3.3. All of the derived properties can be found in Ap. F.
The intrinsic dust masses and FIR luminosities are shown in Fig. 9 compared with T dust and redshift. We find that the SPT sources have a median intrinsic dust mass of 1.4(1)×10 9 M , while ALESS sources have a median of 7.4(9)×10 8 M . The median SPT intrinsic luminosity is 7.1(5)×10 12 L while ALESS sources exhibit a lower median L FIR = 2.37(2)×10 12 L . Thus, the SPT sources exhibit both higher luminosities as well as dust masses compared to the ALESS sources. These results are discussed in detail in Sec. 4.3.

DISCUSSION
In this section, we begin by discussing the complete spectroscopic redshift distribution in Sec. 4.1, along with the various selection effects. We explore the dust temperature evolution as a function of redshift in Sec. 4.2. We then discuss the extreme nature indicated by the intrinsic source properties of the sample in Sec. 4.3. Finally, because our sample is at higher redshift than most DSFG samples, we discuss the implications for the space density of high redshift DSFGs in Sec. 4.4.

Redshift Distribution
The redshift distribution of the full SPT-DSFG sample is shown in Fig. 10. The median of this distribution is z median = 3.9 ± 0.2, which is unchanged from the previously published values of z median = 3.9 ± 0.4 from Strandet et al. (2016). Given that the new sources presented in this work represent an effectively random sampling of the original sample published in Weiß et al. (2013) and Strandet et al. (2016) (shown in Fig. 1), it was expected that the median of the distribution would not shift significantly. The final distribution peaks between 3<z<5, with a large fraction (76%) of the sample at z > 3, shown in the top panel of Fig. 10. The SPT sources peak at significantly higher redshift than their unlensed counterparts (z∼2.3−2.9; Simpson et al. 2014;Koprowski et al. 2014;Miettinen et al. 2015;Danielson et al. 2017;Simpson et al. 2017;Micha lowski et al. 2017;Brisbin et al. 2017 Danielson et al. (2017) and the lensed SPT sources presented in this work (red ). The offset in L FIR between the lensed (SPT) and unlensed (ALESS) sources is due to gravitational magnification. It should be noted that for many of the ALESS sources, the photometry is not well constrained, which is reflected in the size of the associated error bars. Right: The FIR luminosity as a function of redshift for the same sources. The teal, purple, and red curves represent the limiting FIR luminosity as a function of redshift corresponding to: 3× the LESS limit (r.m.s. ∼1 mJy), 3× the SPIRE 500 µm confusion limit (30 mJy), and the 3σ SPT 2.0 mm survey limit (3.9 mJy), given the SED model of this paper and assuming a 35 K dust temperature. Many of the sources discovered in the LESS survey split into multiple components, and were treated as individual sources in the ALESS survey and consequently fall below the plotted LESS survey limit. The SPT survey, with its longer wavelength selection, is more sensitive to sources at the highest redshifts (z>5) than Herschel. multiple sources which break up into multiple individual galaxies at the same redshift when observed at higher resolution ( 1 ). As previously discussed in Sec. 3.2, the differences between the ALMA and SPT beam sizes can result in over-resolving the emission, causing the 3 mm flux to be underestimated in SED fitting. Though we treat these systems as individ-ual DSFGs in this work, at least two of these sources are known to break into multiple components. SPT0311-58 (Strandet et al. 2016) was shown in Marrone et al. (2018) to be composed of two individual galaxies at z=6.900, a western source with an intrinsic L IR = 33 × 10 12 L and an eastern source with an intrinsic L IR = 4.6 × 10 12 L . SPT2349-56 is among the most actively star-forming high-redshift protoclusters known (Miller et al. 2018), and is composed of at least 29 individual galaxies, six of which are at L IR > 3 × 10 12 L (Hill et al. 2020  higher redshifts. For the purpose of simplicity and homogeneity, we treat each of these systems as a single source in this analysis.

Selection Effects
The two major selection effects that impact our distribution are due to the long wavelength selection (1.4 mm) and the high S 870 µm flux cut, which selects almost exclusively gravitationally lensed sources. Both of these effects are known to bias a redshift distribution towards higher redshifts. In this section, we compare the model of Béthermin et al. (2015a) with our observed distribution. This model was selected because it best described the distribution in Weiß et al. (2013) and takes into account both selection wavelength and lensing. These effects were previously shown to account for the redshift distribution for the subset of the SPT DSFG sample published in Strandet et al. (2016) and Béthermin et al. (2015a). Some works (Blain et al. 2002;Greve et al. 2008;da Cunha et al. 2013;Staguhn et al. 2014) suggest that the CMB could make cold DSFGs at high redshifts difficult to detect. However, since the SPTselected DSFGs are warm, with a median dust temperature of T dust = 52.4 K and minimum of T dust = 20.9 K, this effect only becomes relevant at very high redshifts (z>6 − 18).
In Fig. 10, we examine the effect of a long wavelength selection on the observed redshift distribution using the Béthermin et al. (2015a) model. By applying a different selection wavelength on a realistic population of galaxies, we see the median shift from z∼2.6 at 850 µm to z∼2.8 at 1.4 mm. A Kolmogorov-Smirnov (K-S) test reveals that the observed SPT-DSFG distribution is distinct from the modeled data populations at levels of 7.7σ and 5.4σ for the 850 µm and 1.4 mm wavelength selections, respectively. Long wavelength selection alone does not explain the difference between the Béthermin et al. (2015a) model and the observed SPT data.
Based on models of the high redshift DSFG population (e.g. Baugh et al. 2005;Lacey et al. 2010;Béthermin et al. 2012b;Hayward et al. 2013;Béthermin et al. 2017;Lagos et al. 2019;Lovell et al. 2020), very few sources would be intrinsically bright enough to exceed our adopted flux density threshold at 870 µm (>25 mJy). Because of the high apparent luminosities, the SPT DSFG sample was expected to consist almost solely of gravitationally lensed sources (Blain 1996;Negrello et al. 2007). Indeed, Spilker et al. (2016) concludes that ∼70% of the sample is strongly lensed, while the remainder of the sample is either weakly lensed ( 20%) or unlensed ( 5%). Strong lensing preferentially selects sources at high redshift, biasing an intrinsic redshift distribution to higher redshifts. The probability of sources at z<1.5 undergoing strong lensing is heavily suppressed relative to sources at higher redshifts (z>4) (e.g. Hezaveh & Holder 2011). The 1.4 mm lensed model in Fig. 10 demonstrates the combined effect of the lensing potential on the observed redshift distribution and long wavelength selection. Though this model brings the median of the Béthermin et al. (2015a) mock catalog to z∼3.2, the observed SPT DSFG redshift distribution is still higher than this model and a KS test rules it out at a level of 3.2σ. A possible reason for discrepancy between the Béthermin et al. (2015a) model and the SPT redshift distribution is that the model was fit to data from previous surveys, which typically peaked at lower redshift. The original Béthermin et al. (2012b) model is based on the stellar mass function and evolution of the main sequence and relies on SED template fitting. As these parameters become better constrained with more observational data from the SPT catalog and others, more realistic models will be possible in the future.
The lensing analysis in Béthermin et al. (2015a) assumes that the sources do not undergo significant size evolution over cosmic time. Compact sources have a higher probability of being highly magnified than more diffuse sources (Hezaveh et al. 2012). Thus, size evolution verses redshift coupled with gravitational lensing could potentially bias a redshift distribution towards higher redshift, as discussed in Weiß et al. (2013). The fitted values of L FIR and T dust can then be used to determine the effective blackbody radius of the source via the modified blackbody function (e.g. Eq. 1 of Spilker et al. 2016). The median effective blackbody radius for SPT-selected DSFGs is 1.0±0.1 kpc, which is consistent with the median radius found using lens modeling (1.04±0.07 kpc; Spilker et al. 2016). The sizes of the SPT DSFGs are also compatible with the sizes (0.3−3 kpc) observed in unlensed DSFGs (Ikarashi et al. 2015;Simpson et al. 2015;Hodge et al. 2016;Ikarashi et al. 2017). None of these observations show evidence of size evolution with redshift above z>1.

Dust Temperature Evolution
With the first spectroscopically complete sample of DSFGs in hand and excellent FIR photometric coverage for all sources, we are now in a position to investigate whether the typical FIR properties of DSFGs evolve with redshift. For example, observations of 'normal' star-forming galaxies (Magdis et al. 2012;Magnelli et al. 2014) and backwards evolution modeling (Béthermin et al. 2012b;Schreiber et al. 2018) imply that higher redshift sources should exhibit progressively warmer dust temperatures. One possible explanation for this effect would be that T dust is proportional to the L IR /M dust ratio, which is proportional to the specific star formation rate (Narayanan et al. 2018;Liang et al. 2019;Ma et al. 2019). While these studies focused on dust temperature evolution in lower-SFR objects than our sample, determining whether DSFGs show temperature evolution has been difficult due to the highly uncertain photometric redshifts, confusion limits, and possible selection effects towards more luminous (and therefore, potentially warmer) galaxies in flux-limited surveys.
We examine this relationship by considering the dust temperatures from Sec Binning the dust temperatures with bin widths z=1 also yields a linear dependence (yellow ). Bottom: Peak wavelength (λ peak ) versus redshift. The wide sampling of the SED makes the λ peak constraint insensitive to the SED fitting procedure. The same analysis was repeated and the evolution with redshift is weaker, but still produces a non-zero slope.
shown in Fig. 11. In order to fit the data, we perform a maximum likelihood estimation assuming a linear model with an extra Gaussian variance term to account for intrinsic scatter. Using this method, the best fit line is given in Fig. 11 as the dashed line, with a slope of 4.7±1.5 K. However, the fitted intrinsic scatter is substantial (8.4±1.3 K). While there is evidence of temperature evolution, the large amount of intrinsic scatter implies the fit is also consistent with very shallow to no evolution with redshift. Performing a least squares fit without the intrinsic scatter term will also give a nonzero linear dependence. However, such a fit is largely driven by the presence of significant outliers (e.g. SPT0452-50 with T dust =21 K and SPT0346-52 with T dust =79 K). One way to mitigate the presence of outliers in this approach is to consider the binned temperature distribution. Sources were binned with a bin width of ∆z=1 and the median dust temperature was adopted, with bootstrapped errors representing the error on T dust . Though fitting these points again presents a positive slope of 1.9 ± 0.5 K/z, a one-sided reduced χ 2 test yields a p-value of 0.275, which is not statistically significant (<1σ). Moreover, hypothesis testing reveals that a line with no slope (p-value =0.184) is favored.
Since the dust temperature is highly sensitive to the exact SED fit function, we also examine the rest frame λ peak . The wide sampling of the SED makes the λ peak constraint insensitive to the SED fitting function. The λ peak values used were extracted by fitting a modified blackbody fit with an added power law (individual fits shown in Fig. 4 in blue). Repeating the linear fit with maximum likelihood estimation, we find that λ peak also exhibits a non-zero slope of −5.9±1.3 µm/z. However, like dust temperature, any evidence of evolution with redshift is obscured by the 9.5±1.1 µm intrinsic scatter. Like dust temperature, the presence of outliers (e.g. SPT0245-63, SPT0529-54 and SPT0452-50) could affect the significance of a slope, so we performed the same binned analysis. While the binned medians yielded a negative slope of −3±2 µm/z, it again had a low significance (<1σ).
Taken together, SPT sample's dust temperature and peak wavelengths show evidence of the thermal emission peak evolving with red-shift, proportional to (1+z), though it is not strong evidence. The intensely star forming nature of our sample implies that our sources are starburst galaxies. Unlike the 'normal' main sequence galaxies, starbursts have been shown to have almost constant radiation field intensities from z=0 to z∼2 (Béthermin et al. 2015b;Schreiber et al. 2018). However, recent observations of GN20 (Cortzen et al. 2020) suggest that optically thick dust could obscure temperature evolution in [CI] at high-redshift. While it is possible that luminous starburst galaxies could evolve in dust temperature versus redshift, strong evidence for temperature evolution is obscured by the large scatter on T dust .

SPT Sources are Extreme ULIRGs/HyLIRGs
A consequence of modifying our SED fitting function to have the optically thick transition wavelength (λ 0 ) vary as a function of dust temperature is that the fitted dust temperatures obtained are systematically higher than if λ 0 was fixed to 100 µm, as in Greve et al. (2012). Fitting the ALESS photometry with this same SED fit gives a median dust temperature of T dust =54±4 K. While this dust temperature is significantly higher than what is commonly reported for DSFGs, which typically range between 30−40 K (Chapman et al. 2005;Greve et al. 2012;Swinbank et al. 2014;da Cunha et al. 2015), this is consistent with the values we obtain for the SPT DSFGs (T dust =52±2 K).
We use the procedure outlined in Sec. 2.3.3 to calculate dust mass for the SPT sources and show the demagnified, intrinsic values throughout this section. Molecular gas mass was obtained by assuming a constant gas-to-dust ratio (taken to be 100; e.g. Casey et al. 2014). Because the FIR peak for the ALESS sample is not always well-constrained, fitting the full IR range as described in Sec. 2.3.2 is not always possible. We instead adopt the values derived from MAGPHYS for comparison purposes (da Cunha et al. 2015), which takes into account the available photometry, as detailed in . When a direct SED fit is possible, the two methods produce results that agree within the stated errors, with no bias. We derive a median SPT dust mass of 1.4×10 9 M , compared with a median ALESS dust mass of 7.3×10 8 M . This implies that the SPT-selected sources have an order-of-magnitude higher dust reservoirs than the ALESS sample, despite being earlier in cosmic history (Fig. 9).
The SPT sources exhibit higher FIR luminosities than their unlensed analogs. SPT probes relatively shallow depths (shown in Fig. 8) and is only sensitive to the brightest, lensed sources. These sources have apparent luminosities that are an order of magnitude larger than unlensed sources. However, even after correcting for the gravitational magnification, the SPT luminosities range from 3.4−140×10 12 L with a median of 7.1(5)×10 12 L , compared to the median ALESS value of L FIR =2.3×10 12 L . If the median magnification of the SPT sources lacking lens models was raised from µ 870 µm =6.3 to ∼18, these discrepancies would also be resolved. We find this possibility unlikely, however, as the 47 SPT sources with lens models from high resolution ALMA imaging presented in Spilker et al. (2016) were effectively drawn at random from the larger SPT sample of 81 sources presented here. Thus, SPT sources, in addition to being strongly lensed, are also intrinsically more luminous than standard DSFGs found in blank-field surveys.
Taking into account the full IR range of the SED, we find a median L IR =1.5(1)×10 13 L , which corresponds to a median SFR of 2.3(2) × 10 3 M yr −1 for the SPT sample. Fig. 12 shows the gas mass (derived from M dust ) verses SFR (derived from L IR ). The SPT sample exhibits a correlation between the two quantities. By taking the ratio of gas mass to SFR, we calculate the gas depletion times (τ depl ) shown in Fig. 13. Though the SPT-selected sources have comparable amounts of gas mass to the ALESS sample, the SPT sources also experience higher star formation rates.
Because the magnification correction enters into both the SFR and gas mass calculations in the same way, it cancels out, assuming no differential magnification (Hezaveh et al. 2012). While (τ depl ) in 'normal' main sequence galaxies are known to evolve with redshift (Saintonge et al. 2013;Genzel et al. 2015;Tacconi et al. 2018), both the ALESS and SPT samples show comparable depletion times below the main sequence. We also find an evolution in τ depl with redshift, which was already observed for main sequence galaxies. Alternatively, this effect could also be the result of a strong redshift evolution of metallicity, with the highest redshift galaxies being the most metal-poor. This would cause an underestimation of gas mass, which would also explain the lower depletion times. In order to address this properly, however, we would need stellar masses for our systems. Because of the high-redshift dust obscured nature of these sources, this requires the use of future facilities such as the James Webb Space Telescope.
Though the dust temperatures of the SPT and ALESS sources are similarly distributed, with medians of T dust =52(2) K and T dust =54(4) K respectively, SPT sources are on average more luminous than their unlensed analogs, even after accounting for the gravitational lensing. The simple blackbody relationship would imply that these sources should exhibit similar luminosities, if they were the same sizes. The SPT sources have a median radius of 1.0(1) kpc, while ALESS sources have a median of 0.46(4) kpc. The higher intrinsic luminosities could therefore stem from larger sizes, as discussed in Sec. 4.1.2. This fact, coupled with the larger dust masses, implies that the SPT sources could have larger gas reservoirs, which is supported by their systematically higher dust masses. SPT with lens models SPT without lens models ALESS, z spec Saintonge+13 Figure 13. Depletion time (τ depl ) versus redshift for the SPT-selected sample and ALESS sample. The extreme properties of the SPT sample result in short depletion times, which imply that the SPT sources are experiencing a rapid burst of star formation. The typical main sequence values presented in Saintonge et al. (2013) are shown in grey.

High-z Tail
As previously discussed, 850 µm-selected samples of DSFG from blank fields (e.g. Chapman et al. 2005;Swinbank et al. 2014) peak at a lower redshift than the SPT sample, typically between z∼2.3−2.9. This discrepancy is attributed to selection effects, as discussed in the previous section (Sec. 4.1.2). While the SPT sample includes some sources equivalent to those selected in the blank-field surveys, it also includes the previously undiscovered high redshift and extreme tail of DSFGs. In addition to discovering the highest redshift DSFG to-date (SPT0311-58; Strandet et al. 2017;Marrone et al. 2017), the SPT DSFG catalog also contains roughly half of the z>5 DSFGs discovered Asboth et al. 2016;Walter et al. 2012;Rawle et al. 2014;Riechers et al. 2013;Dowell et al. 2014;Strandet et al. 2017;Riechers et al. 2014;Riechers et al. 2017;Pavesi et al. 2018;Jin et al. 2019;Fudamoto et al. 2017;Zavala et al. 2018;Riechers et al. 2020 A way to assess the relative rarity of SPTselected sources is to consider the space density. However, because the majority SPT-selected sources are lensed, the volume probed changes as a function of lensing and the effective volume is difficult to calculate. In order to estimate the survey volume, we consider the ALESS survey, which has a well-defined survey area and can be considered complete above a luminosity threshold of >3×10 12 L , as shown in Fig. 14. We assume the two surveys have equal source density in the region 2.5<z<3.5 for >3×10 12 L . We use this region to estimate the volume for the SPT survey and compare the two redshift distributions directly in Fig. 15. The error bars for SPT are taken to be the Poissonian errors from SPT added in quadrature with the ALESS errors in order to account for the scaling.
Because the SPT distribution is both selected at long wavelength and spectroscopically complete, we have measured the high redshift tail of the overall DSFG distribution. By summing the contributions in Fig. 15, we infer that the expected number of DS-FGs above z>4 is 1.7(7)×10 −6 /Mpc 3 above 3×10 12 L . We can also predict the number of z>6 sources above 3×10 12 L , which we take to be 6(7)×10 −8 /Mpc 3 , but this number should be used with caution, as it is based upon just one source (Marrone et al. 2018). To place this number in context, we compare with other observations Ivison et al. 2016;Cooke et al. 2018) and theoretical predictions (Hayward et al. 2013;Béthermin et al. 2017;Lagos et al. 2019;Lovell et al. 2020) in Tab. 2 and Fig. 16. The stated number densities are taken to have an upper limit of z=8 and the errors are again calculated using both the SPT and ALESS Poissonian errors in quadrature.
The Hayward et al. (2013) model combines a semi-empirical model with 3D hydrodynamical simulations and a 3D dust radiative transfer. Strong lensing is not included in the mod-eling and the model predicted dn/dz is determined using sources with S 1 mm >1 mJy, consistent with the expected intrinsic flux densities of our sample. While the Hayward et al. (2013) model over-predicts the number of z>4 and z>5 galaxies, it does not predict any galaxies at z>6, which is contrary to our observations. As previously discussed, Béthermin et al. (2015a) model is an empirical model, which starts from the observed FIR number counts. This model also includes the effects of magnification by strong lensing, so it can directly predict the dn/dz for the SPT sample. We find that the updated Béthermin et al. (2017) model is, in general, in good agreement with the SPT measurement.
The Shark semi-analytic model Lagos et al. (2018) is also compared in Tab. 2 and Fig. 16. The number densities of galaxies with L IR >3 × 10 12 L were computed using the simulated lightcone with an area of 107.9 deg 2 and covers a redshift range 0≤z≤8, and includes all galaxies in the model with stellar masses >10 8 M and with a (dummy) AB magnitude (of massto-light-ratio = 1) ≤ 32 mags. Note that here, L IR corresponds to the total luminosity that is re-emitted by dust in Shark at wavelengths ≤1000 µm. While this model under-predicts the number of sources at z>4 and z>5, it exactly matches at z∼6. Lovell et al. (2020) modelled the S 850 µm emission using a full radiative transfer code coupled with a full cosmological lightcone. To compare with our model we have selected sources at S 850 µm > 3mJy, which corresponds to roughly L FIR ∼ 3 × 10 12 L . The model is in good agreement with the SPT measurement at z∼6, but over-predicts the density of sources at z∼4.
With SPT, we have measured the "highredshift tail" of the distribution of luminous DSFGs. The SPT measurement for z>4 luminous sources lies within the range of model predictions and previous measurements and can  Figure 15. The spatial density of ALESS and SPT sources. Because the ALESS survey has a welldefined survey volume, we use it to correct for the unknown SPT survey volume. We assume both surveys are complete above a luminosity threshold of > 3×10 12 L and use the region where both surveys have equal source density (2.5 < z < 3.5) to scale the SPT source density. In this plot, union simply represents the overlapping source density regions of both surveys. already constrain models of galaxy evolution. The next-generation mm-wave surveys, combined with spectroscopic followup, will further improve this measurement and extend the redshift constraints out to z∼8.

CONCLUSIONS
In this paper we present the final spectroscopic redshift distribution for the SPT-selected DSFG sample, as well as the complete FIR, submm, and mm photometry. Our main results and conclusions are summarized as follows: • Spectroscopic redshifts have been obtained for all 81 SPT-selected sources. This survey has the highest spectroscopic completeness of any sample of high redshift galaxies to-date. With the high red-  shift sources presented in this work, the SPT DSFGs represent roughly half of the z>5 DSFGs in literature to-date, which will be vital in constraining models of massive galaxy formation.
• Fitting the photometry by using the empirical relationship between λ 0 (T dust ) derived in Spilker et al. (2016) improved the SED fitting and provided better photometric redshifts than the original method presented in Greve et al. (2012). The dust temperatures obtained in Sec. 3.2.1 sug-gest a mild correlation between redshift and dust temperature (e.g. Cooke et al. 2018), but this result is not yet statistically significant.
• Due to gravitational lensing, the apparent luminosity of SPT sources is an orderof-magnitude greater than the unlensed DSFG population. However, the intrinsic luminosities are still significantly higher than the unlensed 850 µm-selected population, with higher dust masses and lower depletion times for the SPT sources. Thus, even after accounting for magnification due to strong gravitational lensing, SPT-selected DSFGs are more extreme than those typically found in blank-field surveys.
• Even though the SPT survey encompassed a contiguous 2500 deg 2 area and the sample is flux-limited, the gravitational lensing intrinsic to the SPT sample makes it difficult to estimate the total volume probed by the survey. Scaling the SPT sample to that of the ALESS sample allows us to estimate the source density of luminous DSFGs at high redshift. From our data and analysis, we estimate 3.0 ± 3.7 deg −2 and 6±7×10 −8 Mpc −3 DSFGs with L FIR >3×10 12 L at z> 6.
• Though this SPT-selected redshift catalog is now complete, the next generation of CMB experiments are already in operation. SPT-3G (Benson et al. 2014) has already begun the process of surveying 1500 deg 2 field with a sensitivity of ∼5 times the original SPT-SZ survey and will detect two orders-of-magnitude more sources, extending all the way to z>8. CMB-S4 (Abazajian et al. 2019), still in the design phase, is projected to find an order-of-magnitude more DS-FGs than SPT-3G. The next-generation of mm-wave surveys with spectroscopic followup will be key to uncovering the the earliest stages of the dust-obscured Universe.
We APEX is a collaboration between the Max-Planck-Institut für Radioastronomie, the European Southern Observatory, and the Onsala Space Observatory. The Australia Telescope Compact Array is part of the Australia Telescope National Facility which is funded by the Australian Government for operation as a National Facility managed by CSIRO.

A. SPT SOURCES AND POSITIONS
We present a total of 81 sources in this work. Their positions are calculated using 3 mm continuum images, and are given in Tab. A.1.

C. ANCILLARY SPECTROSCOPIC OBSERVATIONS
In this appendix, we show the supplementary observations that resolve redshift ambiguities in our ALMA observations. Ancillary spectroscopic observations were performed with targeted line scans using APEX/FLASH, ALMA and ATCA.

APEX/FLASH CONFIRMATIONS
Because of its brightness, the [CII] line was used to confirm eight sources at z 3. Be-cause four of the spectra have already been published in Weiß et al. (2013) and Strandet et al. (2016), we present only the four new [CII] observations. The observations were carried out using APEX/FLASH (Klein et al. 2014) in the 345 GHz and 460 GHz transmission window. The data were obtained as part of project IDs M-095.F-9528-2015 and M-097.F-9519-2016 using Max Planck Society observing time in the period March -August 2015 and April -September 2016. All observations were performed in "good" weather conditions with an average precipitable water vapor <1.5 mm, yielding typical system temperatures of T sys =240 K. A full detail of the observation scheme and data processing procedure is detailed in Gullberg et al. (2015). Fig. C.1 shows the four new [CII] spectra used in this work.

ALMA TARGETED LINE SCANS
A total of 17 sources in the ALMA 3 mm spectral scans that contained a single CO line were confirmed through targeted line scans. Though a photometric redshift could be used to further identify which CO transition was likely  detected, the large error bars associated with photometric redshifts allow for some ambiguity. Using the existing CO line detections and the photometric redshift as a guide, we are able to specifically target CO lines that would be visible for degenerate redshift solution. The sidebands were configured such that an observation would yield at least one CO line detection for each source at a likely redshift option. For both sets of observations, the flux density and bandpass calibrations were based on observations of J0519-4546, J0538-4405, J1924-2914, J2056-4714, J2258-2758, and J2357-5311. The phase calibration was determined using nearby quasars. Between 42-48 antennas were used to complete the observations, resulting in synthesized beam sizes between 3.1−0.8 . The observing time for each science block ranged from 5−13 minutes on-source, excluding overheads, with average single-sideband system temperatures of T sys =56−86 K. The data were processed using the CASA package. The images were created using natural weighting and the subsequent spectra were created with a channel width of 16 MHz. The average noise per channel in the resulting spectra is 0.4 − 0.6 mJy beam −1 .

ATCA TARGETED LINE SCANS
A subset of the SPT DSFGs were also observed with the Australia Telescope Compact Array (ATCA) as part of a survey to observe the CO(1-0) and CO(2-1) line emission (ν rest = 115.2712 and 230.5380 GHz, respectively). This survey was conducted as part of project IDs C2744 and C2818 and most of the observations were published in Aravena et al. (2016). How-ever, observations for one source, SPT0604-64, were obtained after publication and are shown in Fig. C.4. We summarize the observations briefly here, but more detail can be found in Aravena et al. (2016).
In order to obtain these observations, we used the Compact Array Broadband Backend (CABB) configured in the wide bandwidth mode (Wilson et al. 2011). This leads to a total bandwidth of 2 GHz per correlator window and a spectral resolution of 1 MHz per channel (∼ 610 km s −1 per channel for the relevant frequency range). The 7 mm receivers were tuned to the frequency range 30−50 GHz, which covers the redshift ranges 1.38−2.84 for CO(1-0). The H214 array configuration at these observing frequencies leads to typical beam sizes of 5−6 . We expect the flux calibration to be accurate to within 15%, based on the comparison of the Uranus and 1934-638 fluxes. The software packages Miriad (Sault et al. 1995) and CASA were used for editing, calibration and imaging.

E. ALL ALMA 3 MM SPECTROSCOPIC LINE OBSERVATIONS
A summary of all spectroscopic data obtained for the final SPT-selected DSFG sample is given below, in Tab. E.1. Blind 3 mm scans were conducted for all 81 sources across ALMA cycles 0, 1, 3 and 4. While the spectra are given in Fig. B.1, the lines detected can be found in the "Lines from 3 mm scans" column. Any ancillary spectroscopic data obtained is presented in the "New lines & comments" column. All of the combined observational efforts have yielded secure redshifts for the complete flux-limited sample of 81 sources from the 2500 deg 2 SPT survey. Of these, 79 sources had multiple spectroscopic lines, detected either solely from the 3 mm window or with ancillary spectroscopic observations. Only two of the redshifts are based on a single line, but with a robust line identification based on our analysis of the dust temperature distribution. Altogether, this is the largest and most complete collection of spectroscopic redshifts for high-redshift DSFGs obtained so far at mm wavelengths.       h from APEX and ALMA; CO(3-2) h from ATCA Note-The parenthesis at the end of the redshift represents the uncertainty on the last digit presented. The column named 'Cycle' refers to the ALMA cycle in which the 3 mm scan was obtained. The values for spectroscopic redshift differ slightly from previous works because the data was binned to a common binning and was fitted through MCMC. The spectroscopic redshifts obtained, however, are consistent within their error bars to previously published data. Comments in the right column indicates any follow-up spectroscopy or photometric redshift.

INDIVIDUAL SOURCE PROPERTIES
The properties for the individual SPT sources are derived using a modified blackbody model, which is described in detail in Sec. 2.3.2. The resulting fits are given in Sec. 3.2.1. The values for dust temperature (T dust ), FIR luminosity (L FIR ), dust mass (M dust ) and star forma-tion rate (SFR) are calculated as described in Sec. 3.2.2 and given in Tab. F.1. If available, the magnification from the lens models described in Spilker et al. (2016) are used to give the intrinsic values. However, for the 34 sources without lens models, the median magnification ( µ 870 µm = 6.3) is adopted.

2.4(0.2)
Note-Estimates of dust temperatures and FIR luminosities are derived from the modified blackbody fits described in Sec. 2.3.2. Using these parameters, the dust masses and star formation rates are then derived using Eq. 2 and Eq. 3, respectively. The FIR luminosities, star formation rates, and dust masses were corrected for gravitational amplification (µ) according to the lens models described in Spilker et al. (2016) where such models exist. The sources that have not been modeled have parameters which have been corrected by the average gravitational amplification presented in Spilker et al. (2016). Where multiple source components exist, we use a flux weighted average to estimate µ.
When a lens model is not available, the median value µ 870 µm = 6.3 is adopted. a Published by Marrone et al. (2018)