The Principal Component Analysis Filtering Method for an Unbiased Spectral Survey of Complex Organic Molecules

A variety of interstellar complex organic molecules (COMs) have been detected under various physical conditions. However, in protostellar and protoplanetary environments, complex kinematics cause line profiles to blend together and further weaken the peak intensities of weak lines. In this paper, we use principal component analysis to develop a filtering method that extracts COM spectra from the main kinematic component associated with COM emission and increases the signal-to-noise ratio (S/N) of the spectra. This filtering method corrects non-Gaussian line profiles caused by the kinematics. For this development, we adopt the ALMA Band 6 spectral survey data of V883 Ori, an eruptive young star with a Keplerian disk. A filter was first created using 34 strong and well-isolated COM lines and then applied to the entire spectral range of the data set. The first principal component (PC1) describes the most common emission structure of the selected lines, which is confined within the water sublimation radius (∼0.″3) in the Keplerian disk of V883 Ori. Using this PC1 filter, we extracted high-S/N, kinematics-corrected spectra of V883 Ori over the entire spectral coverage of ∼50 GHz. The PC1-filtering method reduces the noise by a factor of ∼2 compared to the average spectra over the COM emission region. One important advantage of this PC1-filtering method over the previously developed matched-filtering method is its ability to preserve the original integrated intensities of COM lines.


Introduction
Molecules that contain carbon and six or more atoms are called complex organic molecules (COMs; Herbst & van Dishoeck 2009). These molecules are also regarded as building blocks of life. Recently, interstellar COMs have been discovered under various physical and dynamical conditions (Ceccarelli et al. 2022). In particular, COMs are found in the inner regions of protostellar envelopes and protoplanetary disks, where the icy mantles on dust grains sublimate above a temperature of ∼100 K (Lee et al. 2019b; Bianchi et al. 2022; Tobin et al. 2023). The detection of various COMs near protostars raises intriguing questions about how and when these COMs are delivered to forming planets. To answer these questions, we need unbiased spectral surveys that trace the evolution of COMs along the star formation process from envelope to disk.
Thanks to the unprecedented sensitivity and resolution of the Atacama Large Millimeter/submillimeter Array (ALMA), a variety of COMs have recently been detected in various interstellar environments, such as infrared dark clouds (Sakai et al. 2018), low- and high-mass protostars (Bøgelund et al. 2019; Csengeri et al. 2019; Manigand et al. 2020; Yang et al. 2021), and protoplanetary disks (Favre et al. 2018; Lee et al. 2019a, 2019b). The ALMA spectra of inner hot protostellar envelopes, so-called hot cores or hot corinos, show a forest of COM lines (Jørgensen et al. 2016, 2018; Sakai et al. 2018; Manigand et al. 2020; Belloche et al. 2020; Hsu et al. 2022), which are often blended together because of kinematically broadened line profiles. In addition, weak lines are further diluted by the kinematically broadened line shape and consequently become embedded within the noise (Yen et al. 2016; Loomis et al. 2018).
Lee et al. (2019b) extracted COM spectra from V883 Ori by aligning the centroid velocities at different positions and integrating the emission (Yen et al. 2016). The authors measured the centroid velocities using C¹⁷O J = 3−2, which is strong enough to derive velocities accurately. However, the spatial distribution of the COM emission in V883 Ori differs from that of the C¹⁷O emission (Lee et al. 2019b). Loomis et al. (2018) introduced a matched-filtering method to obtain high signal-to-noise ratio (S/N) spectral lines from interferometric data sets. The matched-filtering method derives a filter response spectrum using an expected emission distribution in Fourier space as a kernel. This technique is an efficient way to perform rapid line identification without imaging (Loomis et al. 2018, 2020; Booth et al. 2019). For further quantitative analysis, however, the specific lines must still be imaged.
Both the method of Lee et al. (2019b) and that of Loomis et al. (2018) require prior information on the spatial and spectral distribution of the line emission. However, since different molecules often trace different gas components (Tychoniec et al. 2021), the prior information obtained from strong lines of simple molecules may not directly correspond to the spatial and spectral distribution of the COM lines. Unbiased spectral survey observations of COMs have been carried out with ALMA (e.g., Jørgensen et al. 2016). To efficiently analyze the 1000–10,000 COM lines covered by such a spectral survey toward a given target, we need a method that extracts COM spectra with the correct associated kinematics while preserving the original integrated intensities, without any prior information.
Principal component analysis (PCA) is a multivariate statistical technique that reveals the common features hidden in the variation of a data set. PCA has been applied to spectral cube data of line emission at radio wavelengths, uncovering not only the properties of turbulent environments (Heyer & Schloerb 1997; Brunt & Heyer 2013; Yun et al. 2021) but also the chemical variation (Ungerechts et al. 1997; Lo et al. 2009; Jones et al. 2013; Gratier et al. 2017) within molecular clouds. For example, Okoda et al. (2021) applied PCA to ALMA observations toward L483 to assess the distribution of molecular lines in position-position-velocity space (PCA-3D). The authors found that the first principal component (PC1), which describes a typical distribution of molecular lines (Ungerechts et al. 1997; Lo et al. 2009; Jones et al. 2013; Gratier et al. 2017), showed a representative velocity structure of the rotating disk/envelope of L483.
V883 Ori is an outbursting FU Orionis-type object in transition from the Class I to the Class II stage. Previous ALMA observations revealed a well-developed rotating protoplanetary disk around the central source (Ruíz-Rodríguez et al. 2017; van 't Hoff et al. 2018). Many COM lines, including CH₃OH, CH₃CHO, CH₃COCH₃, CH₃OCHO, and CH₃CN, were detected in this disk for the first time (Lee et al. 2019b). The COM lines are detected within a small area of r ∼ 120 au (0.″3). Tobin et al. (2023) detected water (HDO and H₂¹⁸O) lines and estimated the water snowline, i.e., the water sublimation radius in the disk midplane, at ∼80 au in V883 Ori. Because of the disk temperature structure, the water sublimation front forms a two-dimensional surface in the disk and extends to ∼0.″3 in the disk surface of V883 Ori (Lee et al. 2019b; Tobin et al. 2023). In this paper, the water sublimation radius refers to the water sublimation radius in the disk surface.
We observed V883 Ori in the ALMA spectral scan mode in Band 6, covering ∼55 GHz, as part of the project The ALMA Spectral Survey of An eruptive Young star, V883 Ori (ASSAY). Various simple molecules and COMs have been detected, and different molecules have been shown to trace different kinematic components within V883 Ori. As presented in Tobin et al. (2023), the HDO line is detected toward the inner region of the protoplanetary disk within a projected radius of ∼120 au (0.″3), and the COM emission is confined within the water sublimation region. To identify all detected COM lines and derive the physical conditions of the gas component associated with the COM emission, high-S/N, kinematics-corrected line spectra are essential. Similar to the PCA used by Okoda et al. (2021), a typical emission distribution of COM lines can be derived by applying PCA to a sample of well-isolated and relatively strong COM lines. Using this common emission distribution, we can collect emission signals spread through position-position-velocity space and produce a single Gaussian-like emission line for each transition.
This paper introduces a new filtering method that extracts high-S/N, kinematics-corrected COM spectra by applying PCA to the unbiased ALMA spectral survey data of V883 Ori (ASSAY). Our filtering method aims to obtain high-S/N spectra with kinematics-corrected, Gaussian-like line profiles while preserving the original line intensities. Section 2 describes the ALMA Band 6 data of V883 Ori. The methodology of the filtering method is explained in Section 3. Section 4 assesses how the filtering process improves the spectra. Section 5 discusses the intensity preservation of the filtered spectra and their applications. Finally, Section 6 summarizes our results.

Data
We adopt the Cycle 7 ALMA Band 6 observations of the project ASSAY (ALMA Spectral Survey of An eruptive Young star, V883 Ori; 2019.1.00377.S, PI: Jeong-Eun Lee). These data were obtained for an unbiased spectral survey of V883 Ori covering 220.7–274.4 GHz. The full data set comprises three science goals (SGs), each containing 20 spectral windows (SPWs), resulting in 60 SPWs in total. The velocity resolution of the image cubes varies from 0.662 km s⁻¹ (at 221 GHz) to 0.533 km s⁻¹ (at 274 GHz). All images are convolved so that their final beam properties match the poorest beam, whose size is 0.″25 × 0.″15 with a position angle of −77°. The rms noise temperature (T_rms) varies from 1.143 to 1.915 K, with a mean value of 1.481 K. The details of the observations and data reduction will appear in a separate paper.
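Matching all cubes to the poorest beam amounts to convolving each image with an elliptical Gaussian kernel whose covariance is the difference between the target and native beam covariances, because convolving Gaussians adds their covariances. The sketch below computes that kernel under stated assumptions: the function names are hypothetical, position angles follow the north-through-east convention, and the native beam used in the usage line (0.″20 × 0.″10) is an invented example. It is not the imaging pipeline used for ASSAY.

```python
import numpy as np


def beam_cov(fwhm_maj, fwhm_min, pa_deg):
    """2x2 'FWHM^2' covariance of an elliptical Gaussian beam on (east, north) axes.

    pa_deg is measured from north through east (radio-astronomy convention).
    """
    pa = np.deg2rad(pa_deg)
    e_maj = np.array([np.sin(pa), np.cos(pa)])   # unit vector along the major axis
    e_min = np.array([np.cos(pa), -np.sin(pa)])  # unit vector along the minor axis
    return fwhm_maj**2 * np.outer(e_maj, e_maj) + fwhm_min**2 * np.outer(e_min, e_min)


def matching_kernel(target, source, tol=1e-12):
    """(fwhm_maj, fwhm_min, pa_deg) of the Gaussian that convolves `source` into `target`.

    The kernel covariance is target - source; it must be positive semidefinite to exist.
    """
    ck = beam_cov(*target) - beam_cov(*source)
    evals, evecs = np.linalg.eigh(ck)            # eigenvalues in ascending order
    if evals[0] < -tol:
        raise ValueError("source beam is larger than the target beam along some axis")
    fwhm_min, fwhm_maj = np.sqrt(np.clip(evals, 0.0, None))
    v_east, v_north = evecs[:, 1]                # direction of the kernel major axis
    pa_deg = np.rad2deg(np.arctan2(v_east, v_north)) % 180.0
    return fwhm_maj, fwhm_min, pa_deg
```

For example, `matching_kernel((0.25, 0.15, -77.0), (0.20, 0.10, -77.0))` returns a kernel of about 0.″15 × 0.″11 at a position angle of 103° (equivalent to −77°).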
To evaluate the spectra produced by our filtering method, we compare them with three sets of spectra: (1) aperture-averaged spectra, (2) aligned spectra, and (3) spectra extracted using the matched-filtering method. A typical way to obtain a line spectrum is to derive a mean spectrum within an aperture that covers the line-emitting region. Since the COM emission is detected within the water sublimation region, we obtain the aperture-averaged spectra from each spectral cube using a circular aperture with a radius of 0.″3.
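The aperture-averaged spectrum is the mean over all spatial pixels inside the circular aperture, computed channel by channel. A minimal NumPy sketch, assuming a cube ordered as (channel, y, x), a known pixel scale in arcseconds, and a source at the image center by default; the function name is hypothetical:

```python
import numpy as np


def aperture_average(cube, pixel_scale, radius, center=None):
    """Mean spectrum inside a circular aperture and the number of pixels used.

    cube: (n_chan, ny, nx); pixel_scale and radius share units (e.g., arcsec).
    center: (y, x) pixel position of the source; defaults to the image center.
    """
    _, ny, nx = cube.shape
    if center is None:
        center = ((ny - 1) / 2.0, (nx - 1) / 2.0)
    y, x = np.indices((ny, nx))
    # Compare squared distances to avoid square-root rounding at the aperture edge.
    r2 = ((y - center[0]) ** 2 + (x - center[1]) ** 2) * pixel_scale**2
    mask = r2 <= radius**2
    return cube[:, mask].mean(axis=1), int(mask.sum())
```

Returning the pixel count is convenient when comparing the noise of different extraction methods.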
We also generate line spectra using the same method as Lee et al. (2019b). This method shifts the line central velocity at each pixel, which is offset by the disk rotation of V883 Ori, to the source velocity of 4.3 km s⁻¹, and then extracts an average spectrum within an aperture (Yen et al. 2016). In this procedure, we determine the line central velocity at each pixel from the intensity-weighted velocity (moment 1) map of C¹⁷O J = 2−1 at 224.7144 GHz. This line is one of the strongest lines and is well known to trace a rotating disk (van 't Hoff et al. 2020). We extract average spectra for each SPW using an aperture covering the water-sublimated region (r ≤ 0.″3) on a de-projected image of V883 Ori. These spectra are referred to as the aligned spectra. We also obtained additional aligned spectra from a narrow annulus between radial distances of 0.″145 and 0.″155, specifically to isolate strong COM lines. These aligned spectra from the narrow annulus are used for the initial line inspection (see Section 3.1).
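The aligning step can be sketched as resampling each pixel spectrum so that its local line center, read from a moment 1 map, moves to the systemic velocity, and then averaging over the aperture. This is a simplified illustration, not the implementation of Lee et al. (2019b) or Yen et al. (2016): it assumes an increasing velocity axis and linear interpolation, omits the disk de-projection, and uses a hypothetical function name.

```python
import numpy as np


def aligned_spectrum(cube, velocity, v_cen, mask, v_sys=4.3):
    """Average spectrum after shifting each pixel's line center to v_sys.

    cube: (n_chan, ny, nx) on the increasing velocity axis `velocity` (km/s).
    v_cen: (ny, nx) line-center velocities, e.g., from a moment 1 map.
    mask: (ny, nx) boolean aperture.
    """
    shifted = []
    for iy, ix in zip(*np.nonzero(mask)):
        dv = v_cen[iy, ix] - v_sys
        # Sample the spectrum at v + dv so emission at v_cen lands at v_sys;
        # channels shifted past the band edge are filled with zeros.
        shifted.append(np.interp(velocity + dv, velocity, cube[:, iy, ix],
                                 left=0.0, right=0.0))
    return np.mean(shifted, axis=0)
```

For two pixels whose lines are centered at 3.0 and 5.6 km s⁻¹, the plain average is double-peaked, whereas the aligned average is single-peaked at 4.3 km s⁻¹ with roughly twice the peak intensity.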
The matched-filtering method is applied to the data of the second SPW of the first science goal, which covers 221.63–222.54 GHz. To obtain the matched-filtered spectrum, we created a matched-filter kernel from the C¹⁷O line, spanning ±10 km s⁻¹ relative to the systemic velocity and within 0.″3 of the center of V883 Ori.
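The core of matched filtering is a cross-correlation of the data with the expected emission template, with the response expressed in units of its own noise. Loomis et al. (2018) perform this in visibility space using an image-plane kernel; the one-dimensional spectral analog below only illustrates the principle. It assumes white, uncorrelated noise of known rms, and the function name is hypothetical.

```python
import numpy as np


def matched_filter_response(spectrum, template, sigma):
    """Cross-correlate a spectrum with a line template; return the response in sigma units.

    For white noise of rms `sigma`, the correlation output has rms
    sigma * ||template||_2, so dividing by it yields a detection significance.
    """
    response = np.correlate(spectrum, template, mode="same")
    return response / (sigma * np.sqrt(np.sum(template**2)))
```

Because the template here has a peak of one and spans several channels, its L2 norm exceeds one, so a line matching the template reaches a higher significance in the response than its peak S/N in the raw spectrum.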
Measuring the 1σ noise level is important for determining the sensitivity of the filtered spectra. However, error propagation through the filtering method is challenging, since many spatial and spectral pixels are combined to accumulate the line emission. We thus derive noise spectra for each SPW by applying the filtering to emission-free image cubes extracted from the images without primary-beam correction. These emission-free cubes are referred to as the reference images. To obtain the reference images, we selected a 1″ × 1″ area located 5″ east of V883 Ori.
The rms noise levels of the aperture-averaged and aligned spectra are also measured using these reference images. Treating each reference image as if it were centered on V883 Ori, we obtained emission-free aligned and aperture-averaged spectra using the same approach as described above, and measured the rms noise levels of the aligned and aperture-averaged spectra from the corresponding emission-free spectra. Note that we cannot compute an emission-free matched-filtered spectrum, since the matched-filtering method is applied to visibility data. Thus, the rms noise level of the matched-filtered spectrum is derived from the emission-free channels within the second SPW of the first science goal.
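Because every extraction combines many pixels whose noise is correlated on beam scales, the noise is measured empirically by passing an emission-free cube through exactly the same extraction as the science cube. In the sketch below, `extract` is a placeholder for any function that maps a (channel, y, x) cube to a spectrum (for example, an aperture average), and the function name is hypothetical. For ideal white noise averaged over N pixels the result approaches σ/√N; real interferometric noise yields a smaller gain, which is why the reference-image approach is used instead of that formula.

```python
import numpy as np


def noise_from_reference(reference_cube, extract):
    """rms of the spectrum obtained by applying `extract` to an emission-free cube.

    The returned value is used as the 1-sigma noise of spectra extracted the same way.
    """
    spectrum = np.asarray(extract(reference_cube), dtype=float)
    return float(np.sqrt(np.mean(spectrum**2)))
```

With uncorrelated unit-variance noise and a plain 10 × 10-pixel average as `extract`, the returned rms is close to 0.1.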

Line Selection
Selecting appropriate lines is important for the PCA. We first identified the detected molecules using the eXtended CASA Line Analysis Software Suite (XCLASS; Möller et al. 2017). In total, 71 molecules, including isotopologues and isomers, have been identified by matching ∼3000 detected lines in the spectral survey data. The detailed procedures of this analysis will be described in a separate paper. Next, we retrieved the line transitions of the detected molecules from the Cologne Database for Molecular Spectroscopy (CDMS; Müller et al. 2001, 2005) and the Jet Propulsion Laboratory (JPL) catalog (Pickett et al. 1998); in total, 61,541 transitions lie within the covered frequency range.
Among these transitions, we selected lines based on several criteria: the lines must (1) be detected above the 5σ noise level in the aligned spectra extracted from the narrow annulus, (2) have upper-state energies (E_up) lower than 2000 K, (3) have Einstein A coefficients (A_ij) greater than 10⁻⁸ s⁻¹, and (4) be isolated from the other candidate lines. We define a line as isolated if no other line lies within ±5 km s⁻¹ of its line center. The 5 km s⁻¹ isolation criterion was determined through iterative line selection, taking into account that the full width at zero intensity of the selected lines is slightly smaller than 10 km s⁻¹. As a result, we identified a total of 34 isolated strong COM lines (see Appendix A).
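The selection criteria translate directly into a filter over a line catalog. In this sketch, the isolation test converts frequency separations into velocity offsets, Δv = c |Δν| / ν, and compares them with every catalog frequency passed in. Whether isolation was checked against all catalog transitions or only against candidate lines is an assumption here, as are the function and argument names.

```python
import numpy as np

C_KMS = 299792.458  # speed of light (km/s)


def select_isolated_lines(freq_ghz, e_up_k, a_ij, snr, all_freq_ghz,
                          snr_min=5.0, e_up_max=2000.0, a_min=1e-8, dv_iso=5.0):
    """Indices of lines passing the S/N, E_up, A_ij, and isolation criteria.

    A line is isolated if no other frequency in `all_freq_ghz` lies within
    +/- dv_iso km/s of it. The line itself is skipped via dv == 0, so an exact
    frequency duplicate in the catalog would also be ignored.
    """
    freq = np.asarray(freq_ghz, dtype=float)
    others = np.asarray(all_freq_ghz, dtype=float)
    keep = []
    for i, nu in enumerate(freq):
        if snr[i] <= snr_min or e_up_k[i] >= e_up_max or a_ij[i] <= a_min:
            continue
        dv = C_KMS * np.abs(others - nu) / nu   # velocity offsets to all catalog lines
        if np.any((dv > 0.0) & (dv <= dv_iso)):
            continue                            # a neighbor lies within +/- dv_iso
        keep.append(i)
    return keep
```

Two lines 2 MHz apart near 220 GHz are separated by only ∼2.7 km s⁻¹, so both fail the isolation test.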

Computing the PCA
PCA-3D is applied to all 34 isolated strong lines to obtain the representative velocity and emission structure of the COMs within V883 Ori. We construct a data set for the selected lines that share the same spatial and velocity grids. For each selected line, its cube is extracted from a 1″ × 1″ area centered on V883 Ori, within which most of the COM emission arises (the white dotted box in the lower-right panel of Figure A1). This limited image size prevents noise-dominated principal components (PCs) from appearing in the PCA by excluding noise spikes from emission-free areas. All extracted cubes cover ±5 km s⁻¹ with respect to the systemic velocity of 4.3 km s⁻¹, while their velocity resolutions (Δv) differ slightly depending on the line frequencies. To better identify the velocity structure traced by the COM emission, we interpolate the extracted cubes onto a common velocity grid with the finest Δv of 0.533 km s⁻¹.
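Interpolating every extracted cube onto one velocity grid gives the 34 lines identical (X_k, v_m) samples. A minimal sketch, assuming linear interpolation along the spectral axis, an increasing input velocity axis, and a grid anchored on v_sys = 4.3 km s⁻¹ (the anchoring and the function name are assumptions):

```python
import numpy as np


def regrid_velocity(cube, velocity, v_sys=4.3, half_width=5.0, dv=0.533):
    """Interpolate a (n_chan, ny, nx) cube onto v_sys +/- half_width with spacing dv (km/s)."""
    n_half = int(np.floor(half_width / dv))
    v_new = v_sys + dv * np.arange(-n_half, n_half + 1)
    flat = cube.reshape(cube.shape[0], -1)       # one column per spatial pixel
    out = np.empty((v_new.size, flat.shape[1]))
    for j in range(flat.shape[1]):
        out[:, j] = np.interp(v_new, velocity, flat[:, j])
    return v_new, out.reshape((v_new.size,) + cube.shape[1:])
```

With Δv = 0.533 km s⁻¹, this grid has 19 channels spanning v_sys ± 4.797 km s⁻¹.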
The PCA adopted in this study is the same as that used by Okoda et al. (2021). A correlation matrix, rather than a covariance matrix, is used to derive the PCs of the 34 lines; this prevents the PCs from being dominated by a single strong line. We obtain the correlation matrix (c_ij) of the lines as follows:
$$c_{ij} = \mathrm{corr}(T_i, T_j) = \frac{\sum_{k,m}\left[T_i(X_k, v_m) - \bar{T}_i\right]\left[T_j(X_k, v_m) - \bar{T}_j\right]}{\sqrt{\sum_{k,m}\left[T_i(X_k, v_m) - \bar{T}_i\right]^2}\,\sqrt{\sum_{k,m}\left[T_j(X_k, v_m) - \bar{T}_j\right]^2}}, \tag{1}$$
where corr(T_i, T_j) is the correlation coefficient between the ith and jth lines, T_i(X_k, v_m) is the intensity of the ith line at position X_k and velocity v_m, and T̄_i is its mean over all positions and velocities. The 34 PCs (34 eigenvectors and the corresponding eigenvalues) are obtained by diagonalizing c_ij. For each PC, the eigenvector (u_n) contains a component score for each of the 34 lines; the component scores describe how the intensities of the 34 lines correlate. The eigenvalue (λ_n) represents the portion of the total variation explained by the corresponding PC. The PCs are ordered by decreasing λ_n; the first PC (PC1; n = 1) has the largest eigenvalue (λ_1) and explains the largest portion of the total variation.
An eigencube (T_PCn), the dot product of the line data cubes (T_i) and the component scores for a given order n, reveals the intensity distribution explained by the corresponding PCn:
$$T_{\mathrm{PC}n}(X_k, v_m) = \sum_{i=1}^{n_{\mathrm{line}}} u_n(i)\, T_i(X_k, v_m), \tag{2}$$
where n_line is the number of analyzed lines and u_n(i) is the component score of the ith line. We produce the eigencube of PC1 to obtain the representative velocity structure of the 34 isolated strong lines (Okoda et al. 2021).
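The correlation-matrix PCA and the eigencubes can be sketched in a few lines of NumPy. The sketch flattens each line cube over all positions and velocities, computes c_ij with `np.corrcoef`, diagonalizes it with `np.linalg.eigh`, sorts the PCs by decreasing eigenvalue, and forms T_PCn as the score-weighted sum of the line cubes. Eigenvector signs are arbitrary, so flipping them to give a positive sum of scores is a convention chosen here, not one taken from Okoda et al. (2021); the function name is hypothetical.

```python
import numpy as np


def pca_3d(cubes):
    """Correlation-matrix PCA of line cubes sharing the same (X_k, v_m) grid.

    cubes: (n_line, n_chan, ny, nx). Returns eigenvalues (descending), component
    scores (column n is u_n), and eigencubes T_PCn with shape (n_pc, n_chan, ny, nx).
    """
    n_line = cubes.shape[0]
    data = cubes.reshape(n_line, -1)           # each row: one line over all positions/velocities
    corr = np.corrcoef(data)                   # correlation matrix c_ij
    evals, evecs = np.linalg.eigh(corr)        # real eigenpairs of a symmetric matrix
    order = np.argsort(evals)[::-1]            # decreasing eigenvalue
    evals, evecs = evals[order], evecs[:, order]
    evecs *= np.where(evecs.sum(axis=0) < 0, -1.0, 1.0)   # sign convention (assumption)
    # Eigencubes: T_PCn = sum_i u_n(i) * T_i
    eigencubes = np.tensordot(evecs.T, cubes, axes=(1, 0))
    return evals, evecs, eigencubes
```

Because c_ij has ones on its diagonal, the eigenvalues always sum to the number of lines, which is why λ_n can be read as a fraction of the total variation.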

PC1-filtering Method
One of the essential benefits of the PC1-filtering process is that it accumulates line emission at different velocities and positions into a single Gaussian line profile centered at the systemic velocity. This is possible because T_PC1(X_k, v_m) describes how the COM lines are distributed over the spatial (X_k) and velocity (v_m) axes. For example, in a Keplerian disk, the line profile at a given position typically has a Gaussian shape due to thermal and nonthermal broadening, but the central velocity of the Gaussian profile is shifted by the disk rotation. Therefore, the line profile extracted over the entire disk has a double-peaked shape (Smak 1981). If we know how much the central velocity shifts at individual positions, however, we can correct the velocity shifts to the source velocity before accumulating the line emission over the disk and thereby produce a single Gaussian profile. This is, in principle, the same process used to stack spectra with prior information on the velocity structure of V883 Ori (Lee et al. 2019b).
We derive a window function (f) for the filtering method from T_PC1, where v_min and v_max, the minimum and maximum velocities of the window function, are (v_sys − 5) km s⁻¹ and (v_sys + 5) km s⁻¹, respectively. Because the window function is normalized, the PC1-filtering method preserves the filtered line intensities, which is another essential feature of this filtering method.
Subsequently, the filtered spectrum (T_filt) is derived as follows:
Here f * T_obs denotes a convolution of the window function and the observed ALMA Band 6 cube data (T_obs).
The other PCs (PC2–PC34) have relatively small eigenvalues (λ_n = 0.09–0.57). Therefore, the majority of the total variation is described by PC1, while the remaining 30.2% is described almost evenly by the other 33 PCs. Figure 1 shows a correlation wheel plot for PC1 and PC2. A correlation wheel plot shows arrows defined by pairs of component scores from two PCs and is a popular way to present the variation of emission lines described by PCs (Ungerechts et al. 1997; Lo et al. 2009; Pety et al. 2017). All arrows in the correlation wheel point to the right: all component scores of PC1 are positive, ranging from 0.14 to 0.19. This implies that all 34 lines are positively correlated with one another. As a result, PC1 describes most of the total variation of the 34 isolated strong lines, which have similar emission distributions in the spatial and velocity axes. We also calculate the Pearson correlation coefficients between T_PC1 and T_obs for the 34 COM lines to assess the line distributions (Okoda et al. 2021). Figure 2 shows the correlation coefficients, ordered by value (Lines 1–34; Table A1). The correlation coefficients for all lines are larger than 0.7 (the black dashed line). This result is consistent with the correlation wheel plot: all lines are tightly correlated with T_PC1, confirming that T_PC1 provides a common distribution of the 34 isolated strong lines.
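The window-function and filtering steps can be sketched under explicit assumptions that go beyond the description above, so the code is one plausible reading rather than the method as published. Assumed here: the window is T_PC1 restricted to v_sys ± 5 km s⁻¹, clipped at zero, and normalized to unit sum along velocity at each pixel; each pixel spectrum is cross-correlated with its own window so that emission at the local velocity offset maps back to the line's rest channel (the text calls f * T_obs a convolution, and the sign convention is an assumption); and the results are averaged over the aperture. The function names are hypothetical. With a unit-sum window, the channel sum of each pixel spectrum is unchanged apart from band edges, which illustrates why a normalized window can preserve integrated intensities.

```python
import numpy as np


def pc1_window(t_pc1):
    """Window function from the PC1 eigencube, shape (n_win, ny, nx), n_win odd.

    Assumptions: negative values clipped to zero; unit sum over velocity per pixel.
    """
    w = np.clip(t_pc1, 0.0, None)
    norm = w.sum(axis=0, keepdims=True)
    return np.divide(w, norm, out=np.zeros_like(w), where=norm > 0)


def pc1_filter(cube, window, mask):
    """Per-pixel cross-correlation with the window, averaged over the boolean `mask`.

    cube: (n_chan, ny, nx) on the window's channel grid, ordered by increasing velocity.
    window: (n_win, ny, nx), centered on v_sys.
    """
    half = window.shape[0] // 2
    spectra = []
    for iy, ix in zip(*np.nonzero(mask)):
        # out[j] = sum_m w[m] * cube[j + m - half]: emission at the pixel's own
        # velocity offset is mapped back to the line's rest channel j.
        padded = np.pad(cube[:, iy, ix], half)
        spectra.append(np.correlate(padded, window[:, iy, ix], mode="valid"))
    return np.mean(spectra, axis=0)
```

In a toy four-pixel "disk" whose lines are shifted by −4, −2, +2, and +4 channels, the plain average is double-peaked, while the filtered spectrum peaks at the rest channel with the same channel-summed intensity.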
Figure 3 exhibits the moment 0 map (upper-left panel) and channel maps (upper-right panels) of T_PC1. The moment 0 map of PC1 resembles the mean moment 0 map of the isolated strong lines (lower-left panel). Most of the line emission arises from a small circular area with a radius of 0.″3 (the solid white line). Two bright blobs are located in the northeastern and southwestern parts of the target, and the narrow region between them shows a dip in line intensity. Loomis et al. (2018) and van 't Hoff et al. (2020) also showed similar emission distributions for protoplanetary disks. Higher-resolution images show a crescent-shaped emission distribution with a central emission hole within r ∼ 0.″1 (Lee et al. 2019b; Tobin et al. 2023), but the elongated beam of the third science goal (SG3) distorts the emission distribution in the images used in this study. The central emission hole, which is produced by optically thick dust emission, is also distorted by the elongated beam, resulting in the dip between the two elongated blobs.

A Test with the PC1 Itself
Before applying the PC1-filtering method to the entire spectral survey data set, we check how the filtering method corrects the complex line profiles of the observed data. For this, we first adopt T_PC1 itself as test data. Applying the PC1-filtering method to T_PC1 can validate the correction of the observed line profiles, since T_PC1 is tightly correlated with the 34 isolated strong lines. For comparison, we extract an average spectrum of T_PC1 in the same way as the aperture-averaged spectra (see Section 2). Because the intensity scale of T_PC1 is set by Equation (2) and does not correspond to any particular line strength, we normalize the average spectrum to a peak intensity of one. The aperture-averaged spectrum of PC1 has the double-peaked line profile expected from a rotating disk.
The left panel of Figure 4 shows the PC1-filtered (red) and average (black) spectra. The filtering method changes the double-peaked line profile into a single-peaked profile, and the peak intensity increases after filtering. The filtered line profile is fitted by a Gaussian profile (blue) with a central velocity (v_c) of 4.39 ± 0.03 km s⁻¹ and an FWHM of 2.99 ± 0.07 km s⁻¹ (the right panel of Figure 4). This result demonstrates that the PC1-filtering method can correct complex line profiles using the typical velocity and emission structure extracted by PCA.
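For a noise-free or high-S/N profile, v_c and the FWHM can be estimated quickly by fitting a parabola to ln T (Caruana's method), since the logarithm of a Gaussian is quadratic in velocity. The sketch below is such an estimate, not the fitting procedure used for Figure 4; for noisy spectra, a weighted nonlinear least-squares fit (e.g., `scipy.optimize.curve_fit`) is more robust. The 20% threshold and the function name are arbitrary, hypothetical choices.

```python
import numpy as np


def fit_gaussian_logparabola(v, t, frac=0.2):
    """Estimate (peak, v_c, FWHM) of a Gaussian line from channels above frac * max(t).

    Uses ln T = a v^2 + b v + c, so sigma^2 = -1/(2a), v_c = -b/(2a),
    and ln(peak) = c - b^2/(4a).
    """
    sel = t > frac * t.max()
    a, b, c = np.polyfit(v[sel], np.log(t[sel]), 2)
    if a >= 0:
        raise ValueError("profile is not peaked")
    sigma = np.sqrt(-1.0 / (2.0 * a))
    v_c = -b / (2.0 * a)
    peak = np.exp(c - b**2 / (4.0 * a))
    return peak, v_c, 2.0 * np.sqrt(2.0 * np.log(2.0)) * sigma
```

Applied to a noise-free Gaussian sampled at 0.533 km s⁻¹ with v_c = 4.39 km s⁻¹ and FWHM = 2.99 km s⁻¹, the estimate recovers both values to numerical precision.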

PC1 Filtering of the Observed Spectra
Now, we apply the PC1-filtering method to the real ALMA Band 6 data of V883 Ori. First, we investigate the changes in the line profiles of the 34 isolated strong lines. Figure 5 shows examples of line profiles corrected by the PC1-filtering method; the double-peaked profile is corrected into a single-peaked profile with a higher peak intensity, as presented in Figure 4. Additionally, the filtered line profiles can be described by a single Gaussian function; the mean filtered spectrum (the red spectrum in the right panel of Figure 5) can be fitted with a Gaussian profile with v_c of 4.30 ± 0.02 km s⁻¹ and an FWHM of 3.28 ± 0.05 km s⁻¹. This FWHM is slightly broader than that of the PC1 line profile corrected with the PC1-filtering method (see Figure 4). This discrepancy arises because T_PC1 is filtered using itself (see Equation 3), resulting in an exact match between the window function and the test data, i.e., T_PC1. The convolution therefore maximizes the peak intensity and yields a slightly narrower line than the PC1-filtered observed lines.
Figure 3. The moment 0 and channel maps of the eigencube of PC1 (T_PC1) (upper) and of the observed lines (lower). In the upper panels, the white circles mark a radius of 0.″3, within which most of the emission arises. On the moment 0 map of PC1, the white-filled ellipse at the lower left represents the beam shape, and the magenta dashed line indicates the cut for the PV diagram (Figure B1) along the semimajor axis of the protoplanetary disk. The blue ellipse represents the radial distance of 0.″15 on the de-projected image of V883 Ori, along which the aligned spectra for the line selection were extracted. The lower-left panel shows the mean moment 0 map of the isolated strong lines, while the lower-right small panels present the channel maps of the CH₃OH line at 261.8057 GHz (Line 1; see Table A1).
Figure 4. Left: correction of the line profile of PC1 using the PC1-filtering method. The black solid line represents a spectrum extracted from the PC1 eigencube with an aperture of r = 0.″3. Note that the y-axis is normalized by the maximum value of the average spectrum. The PC1-filtered spectrum is shown as the red solid line. The gray dashed vertical line denotes the systemic velocity of 4.3 km s⁻¹. Right: the PC1-filtered spectrum and its Gaussian fit (blue). The blue dashed and dotted lines represent the central velocity (v_c) and FWHM of the Gaussian profile, respectively.
Finally, we apply the PC1-filtering method to the spectra obtained by the unbiased spectral survey of V883 Ori. Figure 6 presents the result for the second SPW of the first science goal as an example. As anticipated, the filtered spectra (red) exhibit single-peaked, Gaussian-like line profiles at the frequencies where the aperture-averaged spectra (black) present double-peaked line profiles. Also, the peak intensities of the observed lines are higher in the PC1-filtered spectra than in the aperture-averaged spectra. This behavior is the same as in Figure 5, except that PC1 filtering is applied here to continuous spectral data over a wide frequency range rather than to individual line profiles.
Figure 6. The spectra are divided into three panels for better visibility. The aperture-averaged and PC1-filtered spectra are presented in black and red, respectively. The black and red horizontal dashed lines represent the 3σ noise levels of the aperture-averaged and PC1-filtered spectra, respectively. The blue shading indicates members of the 34 isolated strong lines (see Table A1), and the orange shading indicates the spectral ranges selected for checking the conservation of total intensity (see Section 5.1). The red triangles indicate some of the additional emission lines identified above the 3σ level in the PC1-filtered spectrum.
One notable feature of the filtered spectra is the improved S/N. The red and black dashed lines in Figure 6 indicate the 3σ noise levels of the filtered and aperture-averaged spectra, respectively. The noise level of the PC1-filtered spectrum is measured using the reference images; the rms of the emission-free PC1-filtered spectrum is adopted as the 1σ noise level. The 1σ noise levels of the aperture-averaged spectra vary from 0.467 to 0.973 K, with a mean value of 0.694 K, whereas those of the filtered spectra vary from 0.215 to 0.526 K, with a mean value of 0.329 K. The mean noise level thus decreases by a factor of ∼2 via the PC1-filtering method.
The noise level decreases through this filtering process because (1) the noise is proportional to 1/√n_pix, where n_pix is the number of spatial and spectral pixels included in the filtering process, (2) correlated noise signals in interferometric data are decorrelated by collecting emission at different velocities (Yen et al. 2016), and (3) the contribution of noise-dominated pixels is reduced by their low weights in the window function (Loomis et al. 2018). The S/N should increase by more than a factor of 2, since the peak intensity also increases through the correction of the velocity shifts. Indeed, in this SPW, the S/N is enhanced by a factor of ∼2.5 on average, with the maximum improvement reaching a factor of ∼3.0 for the transition at 222.2474 GHz.
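Points (1) and (3) can be made concrete with the S/N of a weighted mean of independent pixels: with weights w_i, signal s_i, and per-pixel noise σ, S/N = Σ w_i s_i / (σ √Σ w_i²). If only 25 of 100 pixels contain a line of unit amplitude and σ = 1, a uniform average gives S/N = 25/√100 = 2.5, whereas weights proportional to the signal give 25/√25 = 5. The toy helper below (hypothetical, and ignoring the spatial noise correlation of point 2) computes this. For comparison, the mean noise levels quoted above drop from 0.694 to 0.329 K, a factor of 0.694/0.329 ≈ 2.1.

```python
import numpy as np


def weighted_snr(signal, weights, sigma=1.0):
    """S/N of sum(w * (s + n)) / sum(w) for independent Gaussian noise n of rms sigma."""
    s = np.asarray(signal, dtype=float)
    w = np.asarray(weights, dtype=float)
    return np.sum(w * s) / (sigma * np.sqrt(np.sum(w**2)))
```

Down-weighting pixels that carry little signal is what lifts the S/N beyond what plain averaging achieves.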
The improved S/N in the PC1-filtered spectra enables the detection of additional COM lines. The red triangles in Figure 6 indicate some of these isolated transitions, which are not detected in the aperture-averaged spectra but are identified above the 3σ noise level in the PC1-filtered spectra. In total, we find 31 additional COM lines in the PC1-filtered spectra, accounting for ∼36% of the detected lines in this SPW. This result demonstrates the effectiveness of our filtering method in revealing more COM lines in the observed data.

Conservation of the Total Intensity
In this section, we compare the integrated intensities of individual lines to check whether the PC1-filtering method preserves line intensities. We measure the integrated intensities of the 34 isolated strong lines from both the PC1-filtered and aperture-averaged spectra (see the left panel of Figure 7). The integrated intensity of each line is derived within ±5 km s⁻¹ of the line center.
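The integrated intensity is the channel sum times the channel width over ±5 km s⁻¹ of the line center; for independent channels, its 1σ error is σ_chan Δv √N_chan. The sketch assumes a uniform velocity grid and independent channels (spectral response or smoothing correlates neighboring channels and makes this error an underestimate); the function name is hypothetical.

```python
import numpy as np


def integrated_intensity(v, t, v_line, half_width=5.0, sigma_chan=None):
    """Integrated intensity (K km/s) within v_line +/- half_width on a uniform grid.

    If sigma_chan (per-channel 1-sigma, in K) is given, also return its error
    sigma_chan * dv * sqrt(n_chan), assuming independent channels.
    """
    dv = abs(v[1] - v[0])
    sel = np.abs(v - v_line) <= half_width
    w = float(np.sum(t[sel]) * dv)
    if sigma_chan is None:
        return w
    return w, float(sigma_chan * dv * np.sqrt(sel.sum()))
```

For a Gaussian line, the result approaches T_peak × FWHM × √(π / (4 ln 2)) ≈ 1.064 T_peak FWHM when the window spans several FWHM.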
The integrated intensities from the PC1-filtered spectra tend to be slightly lower than those from the aperture-averaged spectra when the integrated intensity exceeds 40 K km s⁻¹. However, the differences are still within the 3σ error range. The rms of the differences in integrated intensity for the isolated strong lines is about 1.07 K km s⁻¹. Since the mean error of the integrated intensities from the PC1-filtered spectra is about 0.7 K km s⁻¹, the integrated intensities from the PC1-filtered spectra are still consistent with those from the averaged spectra.
We also check whether the PC1-filtering method preserves intensity in general. To account for various types of lines, we selected 109 frequency ranges that contain (1) Gaussian-like emission lines with high peak intensities, (2) Gaussian-like lines with low intensities, (3) blended broad emission lines, (4) deep absorption features, and (5) emission-free regions. Figure 8 shows examples of the selected frequency ranges. Among the 109 frequency ranges selected for this test, two are located within the second SPW, as marked by the orange shading in Figure 6.
The right panel of Figure 7 shows that the integrated intensities from the PC1-filtered spectra are consistent with those from the aperture-averaged spectra for all selected ranges. The rms of the intensity differences is 1.08 K km s⁻¹. Since the mean error of the integrated intensities from the PC1-filtered spectra is about 0.97 K km s⁻¹, the measured integrated intensities agree within about the 1σ error range. As can be seen in the bottom panel, the residuals are randomly distributed around zero. Thus, the slightly lower intensities of the PC1-filtered spectra above 40 K km s⁻¹ for the isolated strong lines may be a coincidence resulting from the small number of samples at high intensities. Therefore, the PC1-filtering method preserves the total intensity of a line.

Comparison with the Other Methods
Figure 9 presents a comparison of the PC1-filtered spectrum (red), aligned spectrum (blue), and matched-filtered spectrum (green) obtained from the water-sublimated region of V883 Ori. There are slight differences among the three spectra in relative line intensities, line profiles, and noise patterns near zero intensity. These discrepancies are likely attributable to the different emission structures used (T_PC1 for the PC1-filtering method, the moment 1 map for the aligning method, and the kernel for the matched-filtering method), which may not perfectly coincide with each other.
The aligned spectrum shows line intensities consistent with those in the PC1-filtered spectrum. However, the lines in the PC1-filtered spectrum tend to have slightly broader widths (FWHM ∼ 3.28 km s⁻¹) and smoother profiles than those in the aligned spectrum (FWHM ∼ 2.40 km s⁻¹). These differences likely result from the convolution of the cube data with the window function, which has an effect similar to Gaussian smoothing of the spectra. On the other hand, the aligned spectrum has a T_rms of 0.88 K, comparable to that of the aperture-averaged spectrum (0.81 K). Because both the aperture-averaged and aligned spectra are extracted from similar apertures encompassing the water-sublimated region of V883 Ori, they include a similar number of pixels in their mean spectra, resulting in similar noise levels. Thus, the S/N of the PC1-filtered spectrum is higher than that of the aligned spectrum by a factor of ∼2.
Figure 8. An example of the selected frequency ranges and their PC1-filtered spectra. Each panel shows the aperture-averaged (black) and PC1-filtered (red) spectra within the selected frequency ranges (the orange-filled regions). Each range falls into one of five categories: (1) Gaussian-like emission lines with high peak intensities, (2) Gaussian-like lines with low intensities, (3) blended broad emission lines, (4) deep absorption features, and (5) emission-free regions. As the upper-right panel shows, two adjacent lines may be intrinsically blended because of thermal and nonthermal effects or a low spectral resolution. Such lines cannot be completely separated even with the PC1-filtering method.

The matched-filtered spectrum exhibits overall features consistent with the other spectra. Like the PC1-filtered spectrum, it shows broader line widths (FWHM ∼ 3.34 km s⁻¹) than the aligned spectrum. Moreover, the PC1-filtered and matched-filtered spectra show similar improvements in S/N. The S/N of the matched-filtered spectrum is calculated using the 1σ noise level estimated from its emission-free channels. The maximum S/N values of the PC1-filtered and matched-filtered spectra are 37.13 and 40.90, respectively. Since the maximum S/N of the aperture-averaged spectrum is 18.47, both the PC1-filtering and matched-filtering methods achieve approximately a twofold improvement in S/N. However, the matched-filtered spectrum has a relatively unstable baseline, which makes weak emission lines harder to identify. For instance, the red bars in Figure 9 indicate weak emission lines that are visible in the PC1-filtered and aligned spectra but not in the matched-filtered spectrum. Through model fitting, these lines are tentatively identified as weak COM lines above the 1σ level: a CH₂CCH line at 222.099 GHz and an H₂C¹³CO line at 222.235 GHz.
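The twofold improvement follows directly from the quoted maximum S/N values. The subscripts "aper" (aperture-averaged) and "MF" (matched-filtered) are introduced here for brevity:

```latex
\frac{(S/N)_{\rm PC1}}{(S/N)_{\rm aper}} = \frac{37.13}{18.47} \approx 2.01,
\qquad
\frac{(S/N)_{\rm MF}}{(S/N)_{\rm aper}} = \frac{40.90}{18.47} \approx 2.21
```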
Each of the methods presented here has advantages and limitations. The aligning and matched-filtering methods make it easy to obtain line spectra. In particular, the matched-filtering method yields a high-S/N spectrum directly from visibility data without the CLEAN process. This allows a quick search for specific transition lines at low computational cost. However, both methods require prior information on the kinematic structure of the target. In addition, the aligning method yields a lower S/N than the other methods despite its narrower line widths, because the aligned spectrum includes all the noise in the vicinity of the line emission. The matched-filtered spectrum significantly improves the S/N because the kernel efficiently suppresses the noise contribution. However, its unstable baseline can make line identification via model fitting challenging. The matched-filtered spectrum also does not provide line intensities, so a subsequent CLEAN process is needed.
The PC1-filtering method, on the other hand, requires an initial line identification to find the isolated strong lines. The statistical accuracy of the emission distribution in T_PC1 drops if there are few such lines. However, because the method derives a window function from multiple emission lines, it obtains a representative emission distribution. This makes it very useful for unbiased spectral surveys, especially of COM lines, since high-S/N spectra can be acquired without prior information on the kinematic structure of the target. The PC1-filtered lines are broader than in the aligned spectra. Even so, their high S/N and relatively stable baseline make them well suited to detecting weak emission lines, which are crucial to studying COMs. Furthermore, unlike the matched-filtering method, the PC1-filtering method provides accurate intensities of the observed lines.

Application of the PC1-filtering Method
The PC1-filtering method produces a representative spectrum of the line-emitting gas traced by PC1, which represents the common emission structure of the selected lines. The method has only two requirements: an image cube with many emission lines and a line catalog covering the observed frequencies. Even without prior information on the kinematics of the target, PCA can assess the velocity structures traced by the selected lines. These advantages make the PC1-filtering method robust enough to extract representative spectra of the target. In addition, the method extracts high-S/N spectra with single Gaussian profiles from the observed cube data. Weak lines that cannot be identified in the original cube data rise above the reduced noise level.
ALMA has recently been used for unbiased line surveys of COMs (Jørgensen et al. 2016). For such survey data, our PC1-filtering method can be used to identify COM lines, including weak ones. With these identified lines, the physical and chemical environments of the COM emission region can be investigated more accurately. High-S/N spectra with single Gaussian profiles provide good-quality data sets for fitting with line simulation tools such as XCLASS (Möller et al. 2017) and the MAdrid Data CUBe Analysis (MADCUBA; Martín et al. 2019). The fitting process can also decompose blended spectra.
In addition, the PCA products can be used to identify lines whose emission distribution differs from that of the COM emission. Figure 10 compares the integrated intensities of the PC1-filtered spectra with those of the spectra averaged over the 1″ × 1″ boxy aperture from which PC1 was derived. The comparison uses the same selected frequency ranges as presented in Section 5.1. The data points follow a linear relation set by the area ratio of the circle with r ∼ 0.″3 to the 1″ × 1″ box (the gray dashed line). This result is expected because the integrated intensities of the PC1-filtered spectra agree with those of the spectra extracted from the circular aperture of r ∼ 0.″3 (Figure 7). However, two outliers appear, corresponding to C¹⁷O J = 2−1 and HNC J = 3−2. These two lines trace much more extended structures than the COM lines (Lee et al. 2019b; Tobin et al. 2023). The COM lines are confined within the water sublimation radius traced by the HDO emission, whereas the C¹⁷O line traces the whole dust disk and the HNC line traces a ring-like structure beyond the C¹⁷O emission.
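The linear relation in Figure 10 can be made explicit. For emission confined within r ∼ 0.″3, averaging over the 1″ × 1″ box dilutes the mean intensity by a = π(0.3)²/1² ≈ 0.283. The box-averaged value is therefore expected to be about a times the PC1-filtered value. Lines that depart strongly from that relation are candidates for a different spatial distribution. The Python sketch below flags such lines. The function name `flag_outliers`, the tolerance `rel_tol`, and the toy intensities are hypothetical choices, not values from the paper.

```python
import numpy as np

# Area ratio a between a circular aperture of radius 0.3" and a 1" x 1" box.
R_CIRCLE = 0.3  # arcsec
BOX_SIDE = 1.0  # arcsec
AREA_RATIO = np.pi * R_CIRCLE ** 2 / BOX_SIDE ** 2  # about 0.283


def flag_outliers(i_pc1, i_box, rel_tol=0.5):
    """Flag lines that deviate from the dilution relation i_box = a * i_pc1.

    If a line's emission is confined within r ~ 0.3", averaging it over the
    larger box dilutes it by the area ratio a. Lines with a different spatial
    distribution (e.g., more extended emission) depart from that relation.
    rel_tol is a hypothetical fractional tolerance, not a value from the paper.
    """
    i_pc1 = np.asarray(i_pc1, dtype=float)
    i_box = np.asarray(i_box, dtype=float)
    expected = AREA_RATIO * i_pc1
    return np.abs(i_box - expected) > rel_tol * np.abs(expected)


# Toy example: the third "line" has extra emission outside r ~ 0.3".
pc1_int = [10.0, 20.0, 5.0]
box_int = [2.83, 5.65, 4.0]
print(round(AREA_RATIO, 3), flag_outliers(pc1_int, box_int))
```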
T_PC1 also provides high-S/N information on the kinematics of the target. As presented above, the COM lines trace the inner part of the Keplerian disk of V883 Ori. The channel maps in Figure 3 show that the noise in T_PC1 is much lower than in the original cube data. Figure B1 also shows that the maximum S/N of the PV diagram of T_PC1 (∼197.6) is much higher than that of Line 1 (∼26.4). This improved S/N greatly enhances kinematic analysis. Figure 11 shows the maps of the intensity-weighted velocity (moment 1) and intensity-weighted velocity dispersion (moment 2) for the eigencube of PC1. These maps are commonly used in radio astronomy to assess the kinematics of a target. Because their S/N depends on how much noise is included, the moment maps have been produced using a mask that excludes noise (Dame 2011). In the PC1-filtering method, randomly varying noise is suppressed in T_PC1 because it describes the common feature of the selected lines. Therefore, the moment maps derived from T_PC1 have very high S/N.
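The moment maps mentioned above follow the standard definitions M0 = Σ I Δv, M1 = Σ I v / Σ I, and M2 = [Σ I (v − M1)² / Σ I]^½. The minimal NumPy sketch below makes two assumptions: the cube is ordered (channel, y, x), and the channels have a uniform width. Its simple threshold clip is only a stand-in for masking and is not an implementation of the Dame (2011) technique. The function name `moment_maps` is hypothetical.

```python
import numpy as np


def moment_maps(cube, velocity, threshold):
    """Moment 0, 1, and 2 maps of a (n_chan, ny, nx) cube.

    Voxels at or below `threshold` get zero weight (a simple clip mask).
    Assumes uniformly spaced channels; pixels with no unmasked voxels
    come out as NaN in the moment 1 and 2 maps.
    """
    cube = np.asarray(cube, dtype=float)
    vel = np.asarray(velocity, dtype=float)
    dv = abs(vel[1] - vel[0])                 # channel width
    v = vel[:, None, None]                    # broadcast over (y, x)

    w = np.where(cube > threshold, cube, 0.0)
    wsum = w.sum(axis=0)
    mom0 = wsum * dv                          # integrated intensity
    with np.errstate(invalid="ignore", divide="ignore"):
        mom1 = (w * v).sum(axis=0) / wsum     # intensity-weighted velocity
        mom2 = np.sqrt((w * (v - mom1) ** 2).sum(axis=0) / wsum)  # dispersion
    return mom0, mom1, mom2


# Synthetic check: one Gaussian line (center 2 km/s, sigma 1.5 km/s) per pixel.
vel = np.linspace(-10.0, 10.0, 201)
spec = np.exp(-0.5 * ((vel - 2.0) / 1.5) ** 2)
cube = np.broadcast_to(spec[:, None, None], (vel.size, 2, 2))
m0, m1, m2 = moment_maps(cube, vel, threshold=1e-3)
print(m1[0, 0], m2[0, 0])
```

With a cube that is already low in noise, such as T_PC1, the choice of threshold matters less. That is the practical benefit the text describes.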
T_PC1 traces the inner disk of V883 Ori inside the water sublimation radius because most COM lines trace that component. The observed molecular lines fall into several groups that trace different spatial and kinematic structures of V883 Ori. Therefore, additional versions of T_PC1 tracing these kinematics can be constructed from other molecular emission lines. Some molecular lines may trace a mixture of kinematic components, including the inner rotating disk, and these components could then be decomposed. First, the emission component tracing the inner disk can be inferred by fitting the emission structure with T_PC1. After removing the inner disk component from the cube data, other kinematic features can be explored step by step using filters derived from PCA.
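The text does not specify the fitting procedure for the first step. One simple possibility is sketched below. It assumes a single scalar amplitude fitted by linear least squares over the whole cube, after which the scaled template is subtracted. The helper `subtract_template` is hypothetical, and a real analysis might fit spatially or spectrally varying amplitudes instead.

```python
import numpy as np


def subtract_template(cube, template):
    """Least-squares fit cube ~ amp * template and return (amp, residual).

    A single scalar amplitude is assumed, so only emission sharing the
    template's shape is removed; other components remain in the residual.
    """
    d = np.asarray(cube, dtype=float)
    t = np.asarray(template, dtype=float)
    amp = np.sum(d * t) / np.sum(t * t)   # minimizes sum((d - amp * t) ** 2)
    return amp, d - amp * t


# Toy cube: a template-shaped component plus a non-overlapping extra component.
template = np.zeros((4, 3, 3))
template[1, 1, 1] = 1.0
extra = np.zeros((4, 3, 3))
extra[3, 0, 0] = 2.0
amp, resid = subtract_template(3.0 * template + extra, template)
print(amp, resid[3, 0, 0], resid[1, 1, 1])
```

The residual cube then carries only the emission not described by the template. A PCA on such residuals would be one way to look for the next kinematic component.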

Summary
We introduce the PC1-filtering method, which is robust enough to extract representative spectra of a line-rich target. This method uses PCA to derive the common emission structure of selected emission lines in position-position-velocity space. It then uses that structure to obtain a representative spectrum of the gas component traced by the selected lines. Thus, it provides a high-S/N, kinematics-corrected line spectrum without requiring any prior information about the target. We apply the PC1-filtering method to the ALMA Spectral Survey of An eruptive Young star, V883 Ori (ASSAY) in Band 6 (2019.1.00377.S, PI: Jeong-Eun Lee), focusing on the COM lines. The main results are summarized as follows:
1. We find 34 COM lines that are strong and well isolated from adjacent lines. These lines are confined within the water sublimation radius (within 0.″3 of the center), positively correlated, and emitted from the same gas component. The PC1 of PCA-3D exhibits their common emission structure, which follows the kinematics of the Keplerian rotating disk of V883 Ori.
2. The PC1-filtered spectra are derived using T_PC1. The PC1-filtering method corrects the non-Gaussian line profiles generated by the rotating disk and produces spectra with single Gaussian line profiles and higher peak intensities.
3. The PC1-filtering method decreases T_rms by a factor of 2 compared to the aperture-averaged spectra extracted directly from the image cube. This noise reduction, together with the enhanced line peak intensities, improves the overall S/N by a factor of 2.5. This improvement is comparable to that achieved with the matched-filtering method.
4. The PC1-filtered spectra are generally consistent with the filter response spectra obtained using the matched-filtering method. However, in certain cases the PC1-filtered spectra reveal weak emission lines that are not easily identified in the matched-filtered spectra. This highlights the usefulness of the PC1-filtered spectra for detecting and identifying weak emission lines that the matched-filtering method might miss.
5. The PC1-filtering method preserves the integrated intensities of the observed lines. The integrated intensities of emission lines measured from the PC1-filtered spectra are consistent with those from the aperture-averaged spectra within 3σ.
6. By comparing the integrated intensities of the PC1-filtered spectra with those of the spectra averaged over the 1″ × 1″ boxy aperture, we can identify the specific lines tracing different gas components.
7. The PC1-filtering method can be applied to any unbiased spectral survey data set to extract kinematics-corrected spectra. It is therefore especially useful for COM lines, which are easily blended with each other. High-S/N kinematics-corrected line profiles are crucial when fitting with line simulation tools such as XCLASS and MADCUBA for further quantitative analysis.
8. T_PC can be used to explore the system kinematics with high S/N for the gas component of interest. Different kinematic features associated with the target can also be assessed by fitting out the emission structure with the primary T_PC.

Appendix B PV Diagram of V883 Ori
The PV diagram is a commonly used tool for assessing the kinematics of a rotating disk. We generate a PV diagram of V883 Ori along the semimajor axis of its disk. Figure B1 shows the PV diagram of T_PC1. The distribution of the COM emission can be explained by a Keplerian rotation profile (the red solid line) around a 1.2 M_⊙ central protostar (Cieza et al. 2016; Lee et al. 2019b). The inner boundary of the emission (∼0.″1) is set by the optically thick continuum emission, while the outer boundary (∼0.″3) is determined by the water sublimation radius.

Note. (a) The selected lines are arranged in decreasing order of their correlation coefficients with PC1 of PCA-3D (see Section 4.1).
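The Keplerian curve in Figure B1 follows the standard relation v = (GM/r)^½ sin i, projected onto the line of sight along the major axis. The minimal Python sketch below evaluates that relation. The distance to V883 Ori and the disk inclination are not given in this section, so they are left as inputs rather than filled in. The function names are hypothetical. The arcsecond-to-au conversion uses the definition of the parsec (1″ at 1 pc corresponds to 1 au).

```python
import numpy as np

GM_SUN = 1.32712440018e20  # solar gravitational parameter, m^3 s^-2
AU = 1.495978707e11        # astronomical unit, m


def arcsec_to_au(offset_arcsec, distance_pc):
    """Projected offset in au; 1 arcsec at 1 pc corresponds to 1 au."""
    return np.asarray(offset_arcsec, dtype=float) * distance_pc


def keplerian_los_velocity(r_au, mass_msun, inclination_deg=90.0):
    """Line-of-sight Keplerian speed (km/s) on the disk major axis.

    v = sqrt(G M / r) * sin(i), with i = 90 deg for an edge-on disk.
    """
    r_m = np.asarray(r_au, dtype=float) * AU
    v = np.sqrt(mass_msun * GM_SUN / r_m) * np.sin(np.radians(inclination_deg))
    return v / 1e3  # m/s -> km/s


# Sanity check: about 29.78 km/s at 1 au around 1 solar mass, edge-on.
print(keplerian_los_velocity(1.0, 1.0))
```

To overlay such a curve on a PV diagram, one would evaluate `keplerian_los_velocity(arcsec_to_au(offsets, d_pc), 1.2, i)` on both sides of the center. Here the distance `d_pc` and inclination `i` would come from the literature.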
Results of the PCA
The derived λ_1 is 23.73, and the fraction of the total variation of the data described by PC1 is P_1 = 69.8%.
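The quoted values can be reproduced under one reading: that the PCA operates on the correlation matrix of the 34 standardized line cubes. A correlation matrix of 34 variables has eigenvalues that sum to 34, so the variance fraction is P_k = λ_k / Σλ. This gives 23.73/34 ≈ 0.698, which matches P_1 = 69.8%. The sketch below implements PCA in that form with NumPy. The data layout is an assumption: one column per line, one row per voxel of the line's velocity cube. The function name `pca_correlation` is also hypothetical. In this layout, reshaping the first column of `scores` back to the cube shape would give a PC1 eigencube.

```python
import numpy as np


def pca_correlation(data):
    """PCA of standardized variables via the correlation-matrix eigendecomposition.

    data has shape (n_samples, n_vars): here, one column per line and one row
    per voxel of that line's velocity cube (an assumed layout). Returns the
    eigenvalues in decreasing order, the matching eigenvectors (columns),
    the variance fractions P_k = lambda_k / sum(lambda), and the PC scores.
    """
    x = np.asarray(data, dtype=float)
    z = (x - x.mean(axis=0)) / x.std(axis=0)   # standardize each column
    corr = np.corrcoef(z, rowvar=False)         # (n_vars, n_vars)
    eigval, eigvec = np.linalg.eigh(corr)       # eigh returns ascending order
    order = np.argsort(eigval)[::-1]
    eigval, eigvec = eigval[order], eigvec[:, order]
    frac = eigval / eigval.sum()                # sum(lambda) = n_vars here
    scores = z @ eigvec                         # projections onto each PC
    return eigval, eigvec, frac, scores


# Synthetic check: 34 noisy copies of one common signal.
rng = np.random.default_rng(0)
common = rng.normal(size=5000)
data = common[:, None] + 0.1 * rng.normal(size=(5000, 34))
lam, vec, frac, scores = pca_correlation(data)
print(f"sum(lambda) = {lam.sum():.2f}, P_1 = {frac[0]:.3f}")
```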

Figure 1. A correlation wheel plot for PC1 and PC2. The right panel shows a zoomed-in view for detail, and the numbers in blue refer to the COM numbers identified in Table A1.

Figure 2. Pearson correlation coefficients (ρ) between the eigencube of PC1 and the spectral cube data for each isolated strong line. The black dashed line indicates ρ = 0.7.

Figure 5. Line profile corrections for the isolated strong lines. The four left panels show the aperture-averaged (black) and PC1-filtered (red) spectra for Lines 1, 2, 33, and 34. Lines 1 and 2 have the strongest correlations with PC1, while Lines 33 and 34 have the weakest. The right panel shows the mean spectra of the 34 isolated strong lines.

Figure 6. The spectra for the second SPW of the first science goal, divided into three panels for better visibility. The aperture-averaged and PC1-filtered spectra are shown in black and red, respectively. The black and red horizontal dashed lines represent the 3σ uncertainties of the aperture-averaged and PC1-filtered spectra, respectively. Blue shading marks members of the 34 isolated strong lines (see Table A1), and orange shading marks the spectral ranges selected for checking the conservation of total intensity (see Section 5.1). The red triangles indicate some of the additional emission lines identified above the 3σ level in the PC1-filtered spectrum.

Figure 7. The integrated intensities for the 34 isolated strong lines (left) and 109 selected frequency ranges (right). The top panels compare the integrated intensities measured from the aperture-averaged and PC1-filtered spectra. The bottom panels show the differences in integrated intensity (residuals). The error bars represent the 3σ uncertainties of the measured integrated intensities.

Figure 9. Comparison of spectra produced by the PC1-filtering (red), aligning (blue), and matched-filtering (green) methods. Note that the matched-filtering method provides filter response values instead of intensities. The red bars indicate two tentatively identified COM lines detected above the 1σ level in the PC1-filtered spectrum that are not visible in the matched-filtered spectrum.

Figure 10. The same as the right panel of Figure 7 but for a comparison between the PC1-filtered spectra and the average spectra over the 1″ × 1″ area centered on V883 Ori. The gray dashed line shows y = ax, where a is the area ratio between the circular aperture of r = 0.″3 and the 1″ × 1″ boxy aperture.

Figure 11. Moment 1 (top) and moment 2 (bottom) maps for the eigencube of PC1. The black contours in both panels show the moment 0 map of PC1 (presented in the upper-left panel of Figure 3).

Figure A1. Moment 0 maps for the 34 isolated strong lines. The lines are referred to by the numbers listed in Table A1. The bottom-right panel presents the mean moment 0 map of the 34 lines. The white dashed box in the mean moment 0 map represents the 1″ × 1″ area centered on V883 Ori.

Figure B1. The PV diagrams extracted along the semimajor axis of the V883 Ori disk (the magenta dashed line in Figure 3). The left panel shows the PV diagram for T_PC1, and the right panel shows that for Line 1. In each panel, the contour levels start at 5 times the noise level of the data and are spaced by a factor of 2. The red solid lines in the left panel depict the Keplerian rotation profile of the V883 Ori disk.