The Distribution of Semidetached Binaries. I. An Efficient Pipeline

Semidetached binaries are in the stage of mass transfer and play a crucial role in studying the physics of mass transfer between interacting binaries. Large-scale time-domain surveys provide many light curves of binary systems, while Gaia offers high-precision astrometric data. In this paper, we develop, validate, and apply a pipeline that combines the Markov Chain Monte Carlo method with a forward model and DBSCAN clustering to search for semidetached binaries and estimate the inclination, relative radius, mass ratio, and temperature ratio of each using light curves. We train our model on the mock light curves from Physics of Eclipsing Binaries (PHOEBE), which provides broad coverage of light-curve simulations for semidetached binaries. Applying our pipeline to Transiting Exoplanet Survey Satellite sectors 1–26, we have identified 77 semidetached binary candidates. Utilizing the distance from Gaia, we determine their masses and radii with median fractional uncertainties of ∼26% and ∼7%, respectively. With the added 77 candidates, the catalog of semidetached binaries with orbital parameters has been expanded by approximately 20%. The comparison and statistical results show that our semidetached binary candidates align well with the compiled samples and the PARSEC model in Teff–L and M–R relations. Combined with the literature samples, comparative analysis with stability criteria for conserved mass transfer indicates that ∼97.4% of samples are undergoing nuclear-timescale mass transfer, and two samples (GO Cyg and TIC 454222105) are located within the limits of stability criteria for dynamical- and thermal-timescale mass transfer, and are currently undergoing thermal-timescale mass transfer. Additionally, one system (IR Lyn) is very close to the upper limit of delayed dynamical-timescale mass transfer.


INTRODUCTION
Eclipsing binaries (EBs) play a crucial role in modern astronomy. More than 50% of stars with masses of 1 M⊙ or higher are found in binary or multiple systems (Sana et al. 2012; Moe & Di Stefano 2017).
Studying binaries provides insights into star formation and evolution (Stacy et al. 2010), stellar model calibration (Tkachenko et al. 2020; Xiong et al. 2023), accretion physics (Bisikalo 2010), and mass-transfer physics (Ge et al. 2015, 2020a). Interactions within binary systems give rise to various intriguing objects, including compact binaries, supernovae, gamma-ray bursts, X-ray binaries, pulsars, and cataclysmic variables (Sana et al. 2012; Han et al. 2020). Semi-detached binaries (SDs) are in a state of mass transfer and therefore serve as excellent laboratories for studying mass transfer in binaries. Analyzing a large sample of SDs with accurate parameters will enhance our understanding of their statistical properties and mass-transfer processes.
Corresponding author: Jianping Xiong, Xuefei Chen (xiongjianping@ynao.ac.cn, cxf@ynao.ac.cn)
Numerous SDs have been discovered through photometric observations; for instance, Budding et al. (2004) compiled a catalog of 411 SDs, Paczyński et al. (2006) listed 2949 SDs from the All Sky Automated Survey (ASAS), and Papageorgiou et al. (2018) identified 449 SDs in the Catalina Sky Surveys (CSS) data. Moreover, recent large-scale time-domain surveys such as Kepler/K2, TESS (Transiting Exoplanet Survey Satellite), ZTF (Zwicky Transient Facility), and ASAS-SN (All-Sky Automated Survey for Supernovae) have greatly contributed to the discovery of millions of eclipsing binaries (Kirk et al. 2016; Chen et al. 2020; IJspeert et al. 2021; Prša et al. 2022; Christy et al. 2023). However, only a small fraction of SDs have complete parameter measurements: Surkova & Svechnikov (2004) compiled approximately 232 SDs with stellar parameters, of which 96 had spectroscopic and photometric observations; Ibanoǧlu et al. (2006) listed the absolute parameters of 61 SDs; Malkov (2020) added 31 new systems with available light and radial-velocity curve solutions to the catalog of Surkova & Svechnikov (2004); and Meng et al. (2022) compiled the physical parameters of 48 individually studied near-contact binaries.
TESS is a pioneering space mission that surveys nearly the entire sky (Ricker et al. 2015), and it monitored ∼200,000 bright stars (V = 5–12 mag) during its 2-year prime mission. Using the TESS light curves from sectors 1–26, IJspeert et al. (2021) and Prša et al. (2022) reported the confirmation of 3155 early-type eclipsing binaries and 4584 EBs, respectively. These light curves provide valuable information on the properties of binaries, such as the mass ratio (q), period (P), temperature ratio (T2/T1), luminosity ratio (L2/L1), inclination (i), and relative radii (R(1,2)/a), where a is the semi-major axis. This information will greatly contribute to our understanding of semi-detached binaries and of mass transfer in binary systems.
PHOEBE (PHysics Of Eclipsing BinariEs; Prša & Zwitter 2005; Prša et al. 2016; Jones et al. 2020), which is built upon the WD code (Wilson & Devinney 1971), is now a widely used tool for analyzing and fitting light curves of binaries. However, when dealing with a large number of data points, light-curve fitting in PHOEBE is time-consuming, primarily because of the integration of its internal physical models. The need to measure stellar properties for a large number of semi-detached binaries therefore motivates a pipeline that can process the data efficiently and extract the relevant information for studying these systems.
Recently, machine learning techniques have been widely applied to the processing and analysis of massive astronomical observations, including the measurement of stellar parameters (Fabbro et al. 2018; Wang et al. 2020), target detection (Ichinohe & Yamada 2019; Chan et al. 2022), and classification (Chen et al. 2020; Barbara et al. 2022). Machine learning methods have also been used in the analysis of binary systems. For example, Zhang et al. (2022) employed a convolutional neural network (CNN) to identify binaries in the LAMOST survey. Additionally, Ding et al. (2022) proposed a fast approach for deriving the parameters of contact binaries by combining a neural network (NN) with the MCMC (Markov chain Monte Carlo) algorithm. This method significantly speeds up light-curve fitting for contact binaries.
Furthermore, accurate measurements of astronomical and atmospheric parameters for a large number of stars have recently been released in Gaia DR3 (Gaia Collaboration et al. 2023a,b; Zhang et al. 2023). By combining these parameters with light curves, a catalog containing the complete parameters of semi-detached binaries can be constructed. Consequently, in this paper, we develop a fast pipeline that uses a machine learning method to accurately fit the light curves of semi-detached binaries. We then analyze the semi-detached binaries in the TESS survey and present a catalog containing the complete parameters of these systems. The result will serve as a valuable resource for further research on the physical and evolutionary properties of semi-detached binaries.
The paper is organized as follows. We describe the light-curve fitting model for semi-detached binaries in Section 2. The pipeline for deriving parameters from the light curve is presented in Section 3. We then analyze the semi-detached binaries from the TESS survey in Section 4. The results and statistical analysis are presented in Section 5. Finally, we summarize in Section 6.

Training dataset
Semi-detached binaries are binaries in which one component completely fills its Roche lobe while the other remains inside its Roche lobe. They fall into two types according to whether the Roche-lobe-filling star is the more massive or the less massive component. In some regions of parameter space, the light curves of the two types can be similar. We therefore construct two separate models for fitting the light curves of these two types of semi-detached binaries.
For the two models, the following free parameters are considered, and the more massive star is defined as the primary star (star 1):
• mass ratio (q = M2/M1) within the range of [0.1, 1];
• effective temperature of the primary star (T1) within the range of [3500 K, 55000 K];
• effective temperature ratio (T2/T1) within the range of [0.2, 2.25];
where a is the semi-major axis, and R_L1 and R_L2 are the Roche lobe radii of the two components. The Roche lobe radii are constrained by the mass ratio following Eggleton (1983). In Eq. 1, the mass ratio q is defined as M2/M1.
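For reference, the widely used Eggleton (1983) approximation is R_L/a = 0.49 q^(2/3) / [0.6 q^(2/3) + ln(1 + q^(1/3))], where q is the mass of the star in question divided by the mass of its companion. The sketch below evaluates it for both components, assuming this is the form intended by Eq. 1; the function names are illustrative only.

```python
import math


def roche_lobe_radius(mass_ratio: float) -> float:
    """Eggleton (1983) approximation to R_L / a.

    `mass_ratio` is M_star / M_companion for the star whose lobe is wanted.
    """
    q23 = mass_ratio ** (2.0 / 3.0)
    q13 = mass_ratio ** (1.0 / 3.0)
    return 0.49 * q23 / (0.6 * q23 + math.log(1.0 + q13))


def roche_lobes(q: float) -> tuple:
    """Return (R_L1/a, R_L2/a) for q = M2/M1, with star 1 the more massive."""
    return roche_lobe_radius(1.0 / q), roche_lobe_radius(q)


if __name__ == "__main__":
    rl1, rl2 = roche_lobes(0.5)
    print(f"R_L1/a = {rl1:.4f}, R_L2/a = {rl2:.4f}")
```

For q < 1 the primary's lobe is the larger of the two, so a lobe-filling secondary has the smaller relative radius.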
Based on the specified ranges, we sample the parameters uniformly at random to generate training samples with the PHOEBE package. Each light curve is sampled at 100 points within the phase range of 0 to 1. The passband used in PHOEBE is set to "TESS:T", which represents the passband of the TESS mission. We then use Eq. 2 to convert the flux values obtained from PHOEBE to magnitudes.
In Eq. 2, f_i is the mock light curve from PHOEBE, n is the number of sampling points, and m′_i is the normalized light curve in magnitudes. For each model, we generate approximately 1 million light curves. The dataset is then divided into training and validation sets in a ratio of 8:2 for model training and evaluation. Fig. 1 shows the distribution histograms of the training dataset; the distributions of R(1,2)/a and of the effective temperatures (T1 and T2/T1) are non-uniform. This non-uniformity is primarily due to the constraints imposed by the physical models and the configuration requirements of semi-detached binaries. Therefore, during training we randomly sample the training set at each iteration to mitigate the bias that the dataset's distribution could introduce. Random sampling makes each batch representative of the dataset as a whole, minimizes bias towards specific subsets, and prevents the model from memorizing the order or specific characteristics of the training examples. Consequently, the model can learn more general patterns and improve its ability to handle diverse data.
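The flux-to-magnitude conversion and the 8:2 split can be sketched as follows. The exact form of Eq. 2 is not reproduced in this text, so the normalization used here (subtracting the mean magnitude of the n points) is an assumption, and the function names are illustrative.

```python
import numpy as np


def flux_to_normalized_mag(flux):
    """Convert fluxes to magnitudes and subtract their mean.

    The mean subtraction is an assumed normalization, not necessarily Eq. 2.
    """
    flux = np.asarray(flux, dtype=float)
    mag = -2.5 * np.log10(flux)
    return mag - mag.mean()


def split_train_val(n_samples, train_fraction=0.8, seed=0):
    """Shuffle sample indices and split them into training and validation sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_train = int(round(train_fraction * n_samples))
    return idx[:n_train], idx[n_train:]


if __name__ == "__main__":
    print(flux_to_normalized_mag([1.0, 0.5, 1.0, 0.8]))
    train_idx, val_idx = split_train_val(10)
    print(len(train_idx), len(val_idx))
```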

Model establishment
In this study, we utilize a Multi-Layer Perceptron (MLP) network to establish the mapping between parameters and light curves. We apply the MLP network to the mock light curves constructed in Section 2.1. The complete network architecture is illustrated in Fig. 2. This MLP network comprises 1 input layer, 8 hidden layers, and 1 output layer. The input layer is composed of 5 neurons, representing the effective temperature of the primary (more massive) star (T1), the inclination (i), the relative radius of the unfilled component (R(1,2)/a), the mass ratio (q = M2/M1), and the temperature ratio (T2/T1). The output layer predicts the corresponding light curve from these input parameters. Each hidden layer is a fully connected layer, and the number of neurons in each layer is indicated in Fig. 2. To reduce the complexity of the model, we train it using only the 50 sample points in the phase range 0–0.5 of the light curve. This choice is motivated by the circular orbits and symmetric shape of our simulated light curves; the model therefore outputs a light curve with 50 data points. During light-curve fitting, we concatenate the 50-point light curve to create a 100-point curve. To enhance the model's expressive capacity, we incorporate residual blocks and use the hyperbolic tangent (tanh) activation function. The residual blocks help address the vanishing-gradient problem, allowing the model to learn more effectively. Finally, we train the model with the backpropagation (BP) algorithm (Rumelhart et al. 1986), using the Adam optimizer and an L2 loss function. These choices help the model predict accurately the light curve corresponding to the given input parameters.
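A minimal NumPy sketch of the forward pass of such a network is given below. The hidden width (64), the weight initialization, the placement of the skip connections, and the mirroring used to build the 100-point curve are all assumptions made for illustration; the actual layer sizes are those shown in Fig. 2, and the weights here are untrained.

```python
import numpy as np

N_INPUT, N_HIDDEN_LAYERS, N_OUTPUT = 5, 8, 50
HIDDEN_WIDTH = 64  # hypothetical; the real widths are given in Fig. 2


def init_params(seed=0):
    """Random (untrained) weights and zero biases for every layer."""
    rng = np.random.default_rng(seed)
    sizes = [N_INPUT] + [HIDDEN_WIDTH] * N_HIDDEN_LAYERS + [N_OUTPUT]
    return [
        (rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n)), np.zeros(n))
        for m, n in zip(sizes[:-1], sizes[1:])
    ]


def forward_half_curve(params, x):
    """Map 5 (pre-scaled) parameters to 50 magnitudes on phases 0 to 0.5."""
    h = np.asarray(x, dtype=float)
    for w, b in params[:-1]:
        z = np.tanh(h @ w + b)
        # Residual (skip) connection whenever input and output widths match.
        h = z + h if z.shape == h.shape else z
    w, b = params[-1]
    return h @ w + b  # linear output layer


def full_light_curve(half):
    """Mirror the half curve to obtain a symmetric 100-point curve."""
    return np.concatenate([half, half[::-1]])


if __name__ == "__main__":
    params = init_params()
    # Inputs are assumed to be rescaled to order unity before entering the net.
    half = forward_half_curve(params, [0.3, 0.8, 0.4, 0.5, 0.6])
    print(full_light_curve(half).shape)
```

Training such a network with Adam and an L2 loss would normally be done in a deep-learning framework; only the inference step that the MCMC calls repeatedly is shown here.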

Model verification
For model verification, we generate a test dataset of 10,000 mock light curves with PHOEBE, covering the two scenarios. In Fig. 3 (a) and (b), we present the evaluation of the model for a more massive component filling its Roche lobe. Panel (a) shows the distribution of the standard deviations of the residuals (σ_Res) between the predicted and simulated light curves; the mean σ_Res is approximately 0.00053 mag for this model. Fig. 3 (b) provides a direct comparison between the true light curve (black dots) and the predicted light curve (red solid line). Similarly, panels (c) and (d) show the evaluation of the model with a less massive component filling its Roche lobe. Fig. 3 (c) shows that the mean σ_Res for this model is approximately 0.00039 mag. Panels (b) and (d) show that the predicted light curves match the mock data well for both models, with the R-squared value (R²) exceeding 0.99. The R-squared value is a statistical measure of the goodness of fit between the predicted and true values. It is defined as

R^2 = 1 - \frac{\sum_{i=1}^{n} \left[ y_i - f(x_i) \right]^2}{\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2} \qquad (3)

in which f(x_i) is the predicted light curve, y_i is the true light curve, \bar{y} is the mean of the true light curve, and n denotes the total number of data points in the light curve.
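The two verification metrics can be computed as in the short sketch below. It assumes that Eq. 3 is the standard coefficient of determination and that σ_Res is the plain standard deviation of the residuals; the function names are illustrative.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Coefficient of determination between true and predicted curves."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def residual_std(y_true, y_pred):
    """Standard deviation of the residuals (sigma_Res), in magnitudes."""
    resid = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.std(resid))


if __name__ == "__main__":
    y = [0.0, 1.0, 2.0, 3.0]
    f = [0.0, 1.0, 2.0, 4.0]
    print(r_squared(y, f), residual_std(y, f))
```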
Fig. 3 indicates that both models accurately capture the features of the light curve from the given input parameters. The predicted light curves closely resemble those generated by PHOEBE. Furthermore, the neural-network model significantly accelerates light-curve fitting for semi-detached binaries, because both models generate light curves far more efficiently than the PHOEBE package. For example, for a light curve with 100 data points, the neural-network model runs in just 5 ms on a 12th Gen Intel Core i9-12900 CPU (5.10 GHz) with 32 GB of memory. In contrast, performing the same computation with the PHOEBE package under identical conditions typically takes around 4 seconds. As the number of data points increases, the slow speed of PHOEBE makes it impractical for MCMC analysis. The substantial reduction in processing time makes the neural network particularly well suited to analyzing extensive datasets.

Figure 2. The MLP network, which comprises 1 input layer, 8 hidden layers, and 1 output layer. The input layer contains 5 neurons, representing the effective temperature of the primary (more massive) star (T1), the inclination (i), the relative radius of the unfilled component (R(1,2)/a), the mass ratio (q = M2/M1), and the temperature ratio (T2/T1). Each hidden layer is a fully connected layer (FC), and the number of neurons in each layer is indicated by the numbers. The output is the corresponding light curve.
... models (built in Section 2). Previously, to speed up the photometric solution, observed light curves were often resampled along the phase axis. In this study, the models generate light curves rapidly, so we refrain from resampling and instead interpolate the output light curve of the neural network onto the phases of the observed data. Our photometric analysis procedure is as follows. First, the light curves are folded into phases within the range of 0 to 1 using their known orbital periods, and normalized by Eq. 2. Then, assuming the effective temperature of the primary star (T1) is obtained from spectroscopic or multi-band observations, we set i, R2/a (or R1/a), q, T2/T1, and t0_bias as free parameters with uniform priors, based on the initial values from model establishment. Here t0_bias represents the small offset of the primary minimum potentially caused by measurement errors when we fold the light curves.
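The folding and interpolation steps can be sketched as below. The uniform model phase grid on [0, 1), the treatment of t0_bias as a phase shift, and the function names are assumptions made for illustration; `numpy.interp` with `period=1.0` handles the wrap-around at phase 1.

```python
import numpy as np


def fold(time, period, t0=0.0):
    """Fold observation times into orbital phases in [0, 1)."""
    return ((np.asarray(time, dtype=float) - t0) / period) % 1.0


def model_at_phases(obs_phase, model_mag, t0_bias=0.0):
    """Interpolate a model curve onto the observed phases.

    `model_mag` is assumed to be sampled on a uniform phase grid over [0, 1);
    `t0_bias` is a small phase offset of the primary minimum.
    """
    n = len(model_mag)
    grid = np.arange(n) / n
    shifted = (np.asarray(obs_phase, dtype=float) - t0_bias) % 1.0
    return np.interp(shifted, grid, model_mag, period=1.0)


if __name__ == "__main__":
    grid = np.arange(100) / 100
    toy_model = np.cos(2 * np.pi * grid)  # stand-in for a network output
    phases = fold([0.0, 2.5, 5.0], period=2.0)
    print(phases, model_at_phases(phases, toy_model))
```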
We illustrate this photometric solution method with a light curve in which the secondary star fills its Roche lobe; its parameters are T1 = 36552 K, i = 77.72°, R1/a = 0.347, q = 0.516, and T2/T1 = 1.268. We run 800 MCMC iterations, with each free parameter evaluated using 100 walkers. A single light curve may be reproduced by multiple parameter sets. Therefore, we employ DBSCAN (Density-based spatial clustering of applications with noise; Ester et al. 1996) to find the global maximum of R² among the local solutions obtained from the MCMC results. The parameters with the highest R² value are selected as the final result. We then use 1500 light curves to test the systematic performance and potential biases of the pipeline. Fig. 5 shows the result for the model with a more massive component filling its Roche lobe; panels (a) to (d) present the differences between the parameters (i, R2/a, q, and T2/T1) obtained from our pipeline and the true values. As depicted in Fig. 5, the pipeline measures the parameters of the majority of samples in the mock dataset accurately, with small deviations; the standard deviations of the differences in i, R2/a, q, and T2/T1 for this model are 0.403°, 0.003, 0.020, and 0.017, respectively. Similarly, Fig. 6 illustrates the evaluation of our pipeline for the model with a less massive component filling its Roche lobe. The corresponding standard deviations of the differences in i, R1/a, q, and T2/T1 are 0.328°, 0.003, 0.018, and 0.012, respectively.
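One possible reading of the DBSCAN step is sketched below: the MCMC samples are clustered, each cluster is summarized by its median parameter vector, and the cluster whose model curve gives the highest R² is kept. The compact DBSCAN written here stands in for a library implementation and is suitable only for small sample sets; the cluster summary, the `eps` and `min_samples` values, and the toy model are assumptions.

```python
import numpy as np


def dbscan(points, eps, min_samples):
    """Minimal DBSCAN; returns integer labels, with -1 marking noise."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    neighbors = [np.flatnonzero(dist[i] <= eps) for i in range(n)]
    is_core = np.array([len(nb) >= min_samples for nb in neighbors])
    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or not is_core[i]:
            continue
        labels[i] = cluster
        stack = [i]
        while stack:  # grow the cluster outward through core points only
            j = stack.pop()
            if not is_core[j]:
                continue
            for k in neighbors[j]:
                if labels[k] == -1:
                    labels[k] = cluster
                    stack.append(k)
        cluster += 1
    return labels


def r_squared(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot


def best_cluster_solution(samples, labels, model, phase, observed):
    """Return the cluster median with the highest R^2, and that R^2."""
    best_theta, best_r2 = None, -np.inf
    for lab in sorted(set(labels.tolist()) - {-1}):
        theta = np.median(samples[labels == lab], axis=0)
        score = r_squared(observed, model(theta, phase))
        if score > best_r2:
            best_theta, best_r2 = theta, score
    return best_theta, best_r2


def toy_model(theta, phase):
    """Hypothetical two-parameter stand-in for the neural-network model."""
    return theta[0] * np.cos(2 * np.pi * phase) + theta[1]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    samples = np.vstack([
        rng.normal([1.0, 0.0], 0.01, size=(40, 2)),  # solution near the truth
        rng.normal([0.3, 0.5], 0.01, size=(40, 2)),  # competing local solution
    ])
    phase = np.linspace(0.0, 1.0, 50, endpoint=False)
    observed = toy_model([1.0, 0.0], phase)
    labels = dbscan(samples, eps=0.1, min_samples=5)
    print(best_cluster_solution(samples, labels, toy_model, phase, observed))
```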

Method verification
Here, we apply the pipeline to observed light-curve data for further validation. RT Per (TIC 385105755) was initially identified as an eclipsing binary by Ceraski (1904) and was later validated as a semi-detached binary system with the lower-mass component filling its Roche lobe (Edalati & Zeinali 1996). Table 1 summarizes the characteristics of RT Per. The effective temperature of the primary star (T1) of RT Per must be set to a fixed value. Gaia MSC (Multiple Star Classifier; Gaia Collaboration et al. 2023c) provides stellar parameters (T(1,2), log g(1,2), [M/H], distance, A0, and AG) for all sources with G ≤ 18.25 mag from BP/RP spectra and parallaxes, assuming they are unresolved coeval binaries. Additionally, assuming they are single stars, Gaia DR3 offers six homogeneous star samples (types OBA, FGKM, ultracool dwarfs (UCDs), solar analogues, carbon stars, and the Gaia spectrophotometric standard stars (SPSS)) with high-quality astrophysical parameters (the golden sample) across the Hertzsprung-Russell (HR) diagram (Gaia Collaboration et al. 2023d). For RT Per, the T1 values provided by Gaia MSC and the Gaia golden sample are 5921^{+186}_{-284} K and 6053^{+22}_{-22} K, respectively. These T1 values are used as fixed constants when fitting the light curve. During the fitting, we run the pipeline with the two NN models in sequence and compare the light curves reproduced by the two models with the observational data using the R² score. Finally, we select the parameters that yield the highest R² value as the final result for RT Per.
The fitting results for RT Per are presented in Table 1. As shown in Table 1, the parameters (i, R1/a, q, and T2) obtained with the Gaia MSC and golden-sample temperatures differ only slightly from those in the existing literature. However, we notice relatively larger errors in the mass and radius when using the MSC parameters, and the disparity between the MSC-based parameters and the literature values is also larger. Fig. 7 shows the light-curve fitting results for RT Per with the priors from the Gaia golden sample. The upper right panel of Fig. 7 compares the reproduced light curves with the observations. The black dots represent the observed light curve, and the blue dashed and red solid lines are the light curves reproduced by the two models. The model with a less massive component filling its Roche lobe yields the higher R² value of 0.9738 and is selected as our final result; the corner plot illustrates the distribution of its parameters.

... (2022) with the effective temperatures from Gaia DR3. Initially, the samples consist of all EBs identified in the aforementioned studies. We then apply a rigorous selection process to refine the sample as follows. First, we extract the complete information in the TESS Input Catalog (TIC; Stassun et al. 2019) for these EBs, using the latest version of the TIC (TIC v8.1). Second, we cross-match the TESS EBs with Gaia DR3 within a 5 arcsec cone; where one star has multiple Gaia DR3 sources, we choose the source with the smallest magnitude deviation in the G/BP/RP bands between the TESS Input Catalog and Gaia DR3. Third, to ensure the accuracy of the parameters obtained from Gaia data, we also apply two quality-filter conditions to the Gaia DR3 dataset: (1) parallax_over_error > 5;

Light curve from TESS
The TESS survey provides 2-minute and 30-minute cadence data through the public data releases of TESS-SPOC (Jenkins et al. 2016) and MIT QLP (Huang et al. 2020a,b) during its nominal mission. The eclipsing binaries from IJspeert et al. (2021) and Prša et al. (2022) were detected using 2-minute and 30-minute cadence data from the first and second years of TESS observations (sectors 1–26). Therefore, we process both the 2-minute and 30-minute cadence data for these eclipsing binaries whenever such data are available in the Mikulski Archive for Space Telescopes (MAST). The short-cadence (2-minute) light-curve data from TESS include the SAP (Simple Aperture Photometry) flux (Twicken et al. 2010; Morris et al. 2020) and the PDCSAP (Presearch Data Conditioning SAP) flux (Smith et al. 2012; Stumpe et al. 2014). The SAP flux is essentially a background-corrected value, while the PDCSAP flux not only corrects for the background but also removes systematic instrumental effects while preserving transits, eclipses, and the intrinsic variability of the stars. The PDCSAP flux is typically preferred for studies of stellar variability and planetary transits. For the long-cadence (30-minute) light curves, the SAP flux and KSPSAP flux are provided, where the KSPSAP flux represents the light curve from the optimal aperture (Huang et al. 2020a,b). Therefore, when the PDCSAP flux and KSPSAP flux are available, we use them for the analysis. We then filter out the stars with few observations and pre-process the data, which includes removing invalid observations and normalizing the light curve by Eq. 2.

Photometric solution
After the above quality-control screening, we obtain 2914 eclipsing binaries from the TESS survey. We next apply the pipeline proposed in this study to search for semi-detached binary candidates and extract their parameters. For each target, we apply the two models in sequence. We then select the candidates with an R² score larger than 0.95 and exclude those with a dispersion greater than 0.02 mag between the fitted and observed light curves. From the light-curve solution, we obtain the mass ratio, relative radii, effective temperatures, and luminosity ratio. The absolute parameters (e.g., luminosity, mass, and radius) are then calculated. To measure the luminosity of each star, the total bolometric magnitude is required, which is given by:

Absolute parameters solution
where M′_total is the absolute magnitude (Eq. 5). In Eq. 5, m_total is the apparent magnitude obtained from Gaia DR3. In this paper, the magnitudes provided by Gaia in three bands (G, BP, RP) are all used to calculate the absolute parameters. d represents the distance in parsecs (pc) obtained from Bailer-Jones et al. (2021) or Gaia MSC, where the distance estimate is based on color and magnitude priors. In Eq. 4, A represents the extinction in each band, which is calculated from A_V using the extinction coefficients provided by Wang & Chen (2019); A_V is taken from the 3-D dust map (Green et al. 2019). BC is the bolometric correction obtained from Chen et al. (2019).
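The conversion from apparent magnitude to total bolometric magnitude can be sketched as follows. Eqs. 4 and 5 are not reproduced in this text, so the standard distance-modulus relation and the sign convention M_bol = M_band + BC are assumptions, as is the function name.

```python
import math


def total_bolometric_magnitude(m_app, distance_pc, extinction, bc):
    """Absolute bolometric magnitude of the unresolved binary.

    Assumes M' = m - 5 log10(d) + 5 and M_bol = M' - A + BC, with `extinction`
    and `bc` given for the same passband as `m_app`.
    """
    m_abs = m_app - 5.0 * math.log10(distance_pc) + 5.0
    return m_abs - extinction + bc


if __name__ == "__main__":
    # Made-up input values, for illustration only.
    print(total_bolometric_magnitude(10.0, 100.0, extinction=0.3, bc=-0.1))
```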
Then, the luminosity of each star can be calculated by Eq. 6, where M_b⊙ is the absolute bolometric magnitude of the Sun, taken as 4.73 mag (Torres 2010), and M_b,(1,2) is the bolometric magnitude of each component, calculated by Eq. 7, where l(1,2) is the relative luminosity derived by Eq. 8. In Eq. 8, l1/l2 is derived from the Stefan-Boltzmann law, l1/l2 = (R1/R2)² × (T1/T2)⁴. As L(1,2) is measured from the bolometric magnitude (see Eq. 6), the radii can be derived from the Stefan-Boltzmann relation for each component. The semi-major axis (a) is then calculated from the relative radii (R(1,2)/a). Finally, the masses are computed using Kepler's third law (Eq. 9) and the mass ratio (q).

5. RESULTS

Absolute parameters
From the TESS EB catalogs (IJspeert et al. 2021; Prša et al. 2022), we finally identify 77 semi-detached binary candidates, including 76 binaries with the less massive component filling its Roche lobe and 1 system with the more massive star filling its Roche lobe. In Appendix A, we present their parameters computed using the priors from Gaia MSC and from the Gaia golden sample, respectively. Because the upper limit for the effective temperature in Gaia MSC is set at 8000 K and our sample includes systems with early-type stars (IJspeert et al. 2021), in this section we present the parameters calculated with the Gaia golden sample and the distances from Bailer-Jones et al. (2021). It is essential to note that our parameter solutions are predicated on the assumption that these binaries are semi-detached systems.
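As a reminder of how these absolute parameters follow from the light-curve solution, the sketch below chains the steps described for Eqs. 6 to 9: luminosity split, radii, semi-major axis, and masses. The solar effective temperature, the unit conversions, the exact algebraic forms, and the function name are assumptions; only the solar bolometric magnitude of 4.73 mag is taken from the text.

```python
import math

M_BOL_SUN = 4.73        # mag, value adopted in the text (Torres 2010)
T_SUN = 5772.0          # K, nominal solar effective temperature (assumed)
RSUN_PER_AU = 215.032   # solar radii per astronomical unit
DAYS_PER_YEAR = 365.25


def component_parameters(m_bol_total, t1, t2_over_t1, r1_over_a, r2_over_a,
                         q, period_days):
    """Luminosities, radii, semi-major axis and masses in solar units."""
    t2 = t1 * t2_over_t1
    # Stefan-Boltzmann law: l1/l2 = (R1/R2)^2 (T1/T2)^4
    ratio = (r1_over_a / r2_over_a) ** 2 * (t1 / t2) ** 4
    l1, l2 = ratio / (1.0 + ratio), 1.0 / (1.0 + ratio)  # l1 + l2 = 1
    stars = []
    for frac, teff in ((l1, t1), (l2, t2)):
        m_bol = m_bol_total - 2.5 * math.log10(frac)      # component magnitude
        lum = 10.0 ** (-0.4 * (m_bol - M_BOL_SUN))         # L / L_sun
        radius = math.sqrt(lum) * (T_SUN / teff) ** 2      # R / R_sun
        stars.append((lum, radius))
    (lum1, rad1), (lum2, rad2) = stars
    a_rsun = rad1 / r1_over_a
    # Kepler's third law in solar units: M1 + M2 = a^3 [au] / P^2 [yr]
    m_total = (a_rsun / RSUN_PER_AU) ** 3 / (period_days / DAYS_PER_YEAR) ** 2
    m1 = m_total / (1.0 + q)
    return {"L1": lum1, "L2": lum2, "R1": rad1, "R2": rad2,
            "a": a_rsun, "M1": m1, "M2": q * m1}


if __name__ == "__main__":
    # Two Sun-like stars in a 0.6553-day orbit (illustrative input only).
    two_suns = M_BOL_SUN - 2.5 * math.log10(2.0)
    print(component_parameters(two_suns, 5772.0, 1.0, 0.25, 0.25, 1.0, 0.6553))
```

Because the mass scales as a³ and a is derived from the radius, a fractional error in the radius enters the mass roughly three times over.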
We compare the differences and relative uncertainties of masses and radii (σ_M/M or σ_R/R) for our candidates calculated from the Gaia G, BP, and RP bands in Fig. 8 and Fig. 9. Fig. 8 shows relatively larger discrepancies in mass and radius between the RP band and the BP band (or G band).
In the Gaia mission, the G-band photometry has relatively higher precision, as it measures flux through the Image Parameter Determination (IPD) process, which employs a complex model that incorporates extensive calibrations and a shape-based Point Spread Function (PSF). Conversely, BP- and RP-band photometry relies on the integration of low-resolution spectra (Riello et al. 2021). The stars showing larger discrepancies in mass and radius have effective temperatures in the range of 6500 to 7500 K, which corresponds to early F-type or late A-type stars. These stars may also display variations in their spectral response in different bands.
Additionally, Fig. 9 shows the distributions of the relative uncertainties of masses (panels (a) and (b)) and radii (panels (c) and (d)). Because the prior distributions for effective temperature, distance, etc. are asymmetric, we generate 500 Monte Carlo samples from these asymmetric distributions to estimate the upper and lower 1σ confidence intervals (i.e., the 84th and 16th percentiles). In Fig. 9, the histograms drawn with black dotted, blue dashed, and red solid lines show the relative uncertainties of masses and radii derived from Gaia's G, BP, and RP bands with priors from the Gaia golden sample. Panels (a) and (c) display the distributions of the upper 1σ confidence intervals for mass and radius, while panels (b) and (d) show those of the lower 1σ confidence intervals. As Fig. 9 shows, the median relative uncertainties of the masses in the G, BP, and RP bands are 36.4% (upper) and 26.3% (lower), 36.3% (upper) and 25.8% (lower), and 36.4% (upper) and 25.5% (lower), respectively. The larger uncertainty of the mass arises because the mass is derived from the radius, so error propagation amplifies the uncertainty. Subsequently, we compare these parameters with stellar models and samples from the literature.

Figure 9. ... (panels (c) and (d)) for our candidates, derived from the three Gaia bands using the parameters from the Gaia golden sample. The black dotted, blue dashed, and red solid lines represent the relative uncertainties for Gaia's G, BP, and RP bands, respectively.

... primary and secondary components compiled from the literature. The cyan solid and hollow rectangles are the primary and secondary components of our semi-detached binary candidates with a less massive component filling its Roche lobe, identified from the TESS survey. The red solid and hollow diamonds represent our candidates with a more massive component filling its Roche lobe. As shown in Fig. 10 (a), the majority of our candidates are very close to Algol-type semi-detached systems, in which slow mass transfer is occurring (Mkrtichian et al. 2004). The mass-accreting stars in these systems are typically main-sequence stars of spectral types B to F, while the donor stars usually lie in the region of the Hertzsprung-Russell diagram between the TAMS and the giants. Furthermore, we also include the samples with Teff > 10000 K. Similarly, we place our samples in the mass-radius plane in Fig. 10 (b): the main-sequence stars agree with the M–R relation, while the secondary components deviate from the main sequence and are larger than main-sequence stars of the same mass.

Figure 10. ... (Ibanoǧlu et al. 2006; Meng et al. 2022). The black dash-dotted line and solid line represent the ZAMS (zero-age main sequence) lines with metallicities of Z = 0.004 and Z = 0.014 from the PARSEC model (Bressan et al. 2012), and the red line is the TAMS (terminal-age main sequence) line. The gray solid and hollow circles correspond to the primary and secondary components of the literature samples. The cyan solid and hollow rectangles represent the primary and secondary components of our semi-detached binary candidates with the less massive component filling its Roche lobe. The red solid and hollow diamonds represent our candidates with a more massive component filling its Roche lobe.
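The Monte Carlo uncertainty estimate used for the masses and radii in this subsection (500 draws from asymmetric priors, then the 16th and 84th percentiles) can be sketched as follows. The two-piece Gaussian with equal probability on either side of the central value is an assumed stand-in for the actual asymmetric priors, and the function names are illustrative; the temperature is the Gaia MSC value quoted for RT Per.

```python
import numpy as np


def sample_two_piece_normal(center, err_lo, err_hi, size, rng):
    """Draw from a two-piece Gaussian with different lower and upper widths."""
    z = rng.standard_normal(size)
    return center + np.where(z < 0.0, z * err_lo, z * err_hi)


def percentile_interval(values):
    """Return (median, lower 1-sigma error, upper 1-sigma error)."""
    lo, med, hi = np.percentile(values, [16, 50, 84])
    return med, med - lo, hi - med


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # T1 of RT Per from Gaia MSC: 5921 (+186, -284) K
    t1 = sample_two_piece_normal(5921.0, 284.0, 186.0, 500, rng)
    # Propagate to a derived quantity, e.g. R proportional to T^-2 at fixed L.
    radius_scale = (5921.0 / t1) ** 2
    print(percentile_interval(t1))
    print(percentile_interval(radius_scale))
```

Any derived quantity (radius, semi-major axis, mass) is evaluated sample by sample, so asymmetric input errors carry through to asymmetric output intervals.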

Specific angular momentum
J_c is defined as J_c = q(1 + q)^{-2} P^{1/3} (Zhai et al. 1989). Fig. 11 (a) shows the relation for the SDs containing early-type accretors, and panel (b) shows the relation for the SDs containing late-type accretors. In Fig. 11, the star markers are our SD candidates identified from the TESS survey, the dots are the literature samples, and the marker color represents the filling fraction of the accretors (R/R_L). Zhai et al. (1989) presented a lower limit on the period of SDs, P_min = 0.752 days for O-B stars and 0.248 days for A-F stars. They also pointed out that evolved SDs with extremely short periods are precluded by the dynamical influence of the Roche critical surface. As shown in Fig. 11, the shortest observed periods of SDs are 0.915 days for O-B stars and 0.2508 days for A-F stars. Compared to the theoretical simulations, the SDs containing an O-B accretor have not yet reached the theoretical lower limit of the period. Moreover, SDs with larger mass ratios and shorter periods tend to have a larger filling fraction for the accretors. Our sample is consistent with the semi-detached binaries from previous studies, which further validates the effectiveness of our pipeline.
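A short sketch of this quantity is given below. The period is assumed to be in days, matching the units used for the period limits in this section, and the function name is illustrative.

```python
def specific_angular_momentum(q, period_days):
    """J_c = q (1 + q)^-2 P^(1/3), with P assumed to be in days."""
    return q * (1.0 + q) ** -2 * period_days ** (1.0 / 3.0)


if __name__ == "__main__":
    # Made-up systems, for illustration only.
    for q, p in [(0.2, 2.5), (0.5, 1.0), (1.0, 8.0)]:
        print(q, p, round(specific_angular_momentum(q, p), 4))
```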

Mass transfer stage
In Fig. 12, we plot the relation between the mass ratios ($q = M_{\rm donor}/M_{\rm accretor}$) and orbital periods of these SD samples. The red star markers and black dots represent our SD candidates and the literature samples, respectively. Ge et al. (2015, 2020a,b, 2023) have proposed relationships between the critical mass ratio limits ($q_{\rm ad}$) and orbital periods for stars of different masses. In Fig. 12, we have selected the critical mass ratio limits for dynamical-timescale (gray lines) and thermal-timescale (blue lines) mass transfer at some significant evolutionary stages, including the ZAMS, the late Hertzsprung-Russell gap (LHG), and the base of the red giant branch (BRGB), represented by dashed, dotted, and solid lines, respectively. These limits form the foundation for discussing mass transfer phases in semi-detached binary systems. When a Roche-lobe-filling star is positioned above the thermal-timescale limit but below the dynamical-timescale limit, it undergoes thermal-timescale mass transfer. Conversely, when a star lies under the thermal-timescale limit, it experiences nuclear-timescale mass transfer. Furthermore, if a star is located above the dynamical-timescale limit, it will likely enter the phase of dynamical-timescale mass transfer. Depending on the structure of the donor, a binary system could enter the phase of prompt (convective-dominated) or delayed (radiative-dominated) dynamical-timescale mass transfer (Ge et al. 2015, 2020a). It is important to note that the criteria for mass transfer employed in this study are derived under the assumption of conserved mass transfer.
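The decision rule described above can be summarized in a short sketch. The function name and the numerical limits below are hypothetical placeholders: the actual critical mass ratios depend on donor mass, evolutionary stage, and orbital period and must be taken from the Ge et al. models. The treatment of a system lying exactly on a limit is an arbitrary choice of this sketch.

```python
def mass_transfer_regime(q: float, q_thermal: float, q_dynamical: float) -> str:
    """Classify a Roche-lobe-filling donor by its mass ratio q = M_donor / M_accretor.

    q_thermal and q_dynamical are the critical mass ratios for thermal- and
    dynamical-timescale mass transfer at the relevant period and stage.
    The ordering q_thermal <= q_dynamical is assumed, as implied by the text.
    """
    if q <= 0:
        raise ValueError("q must be positive")
    if q_thermal > q_dynamical:
        raise ValueError("expected q_thermal <= q_dynamical")
    if q < q_thermal:
        return "nuclear-timescale"      # below the thermal-timescale limit
    if q <= q_dynamical:
        return "thermal-timescale"      # between the two limits
    return "dynamical-timescale"        # above the dynamical-timescale limit


# Placeholder limits for illustration only; not values from Ge et al.
Q_THERMAL, Q_DYNAMICAL = 1.2, 3.0
for q in (0.3, 1.5, 4.0):
    print(q, "->", mass_transfer_regime(q, Q_THERMAL, Q_DYNAMICAL))
```

In practice the two limits would be interpolated along the relevant curves in Fig. 12 at the observed orbital period before the comparison is made.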
As Fig. 12 shows, the majority of these SD samples are located under the BRGB mass ratio limit for thermal-timescale mass transfer (solid blue lines), which indicates that these samples are undergoing nuclear-timescale mass transfer. We also know from Fig. 10 that these SD samples are Algol-type semi-detached systems, in which the evolved, lower-mass star fills its Roche lobe. In addition, systems with the more massive component filling its Roche lobe ($q > 1$) are observed in only a few cases. These samples are undergoing rapid mass transfer, which is very short-lived and challenging to detect in observations. Among our candidates and the compiled literature samples, we have only 5 such systems. All 5 consist of two main-sequence stars and have relatively short orbital periods ($P < 1$ day). As shown in Fig. 12 (a) and (b), two of these samples (GO Cyg and TIC 454222105) are positioned below the dynamical-timescale limit and above the thermal-timescale limit, which indicates that they are currently undergoing thermal-timescale mass transfer. As the mass transfer progresses, if the initial mass ratios of such samples are large enough, the accretor may overfill its Roche lobe, forming a contact binary. During the contact phase, if the more massive main-sequence star continues transferring material to the less massive companion, the mass ratio can reverse. In other words, the currently less massive component becomes larger and more evolved, while the currently more massive main-sequence star is smaller and less evolved. In Fig. 12 (b), the two samples with a main-sequence donor star (V36 Lyr and IR Cas) lie slightly under the ZAMS thermal-timescale limit, which suggests that they may be undergoing nuclear-timescale mass transfer. Moreover, observations of an intriguing system (IR Lyn; Meng et al. 2022) indicate that it is located above the ZAMS line and below the TAMS line in the $T_{\rm eff}$-$L$ and $M$-$R$ diagrams. However, it deviates discernibly from the ZAMS criteria in Fig. 12 (b) and is located above the limit of dynamical-timescale mass transfer for the LHG. As depicted in Fig. 12 (b), under the assumption of conserved mass transfer, this system is nearing the threshold for entering dynamical-timescale mass transfer. As the mass transfer continues, the radiative-dominated envelope of the primary may be transferred completely, and a delayed dynamical-timescale mass transfer may then occur in this system.

SUMMARY
In this paper, based on the MLP network with the MCMC method and DBSCAN clustering, we develop an efficient pipeline for identifying semi-detached binaries and deriving their parameters. In our pipeline, two light curve fitting models are established: one with the more massive component filling its Roche lobe and one with the less massive component filling its Roche lobe. We train the models using light curves generated by PHOEBE. The results demonstrate that the model with the more massive component filling its Roche lobe yields standard deviations of the residuals for inclination, relative radius, mass ratio, and temperature ratio of approximately 0.403°, 0.003, 0.020, and 0.037, respectively. Similarly, the model with the less massive component filling its Roche lobe yields standard deviations of the residuals for these four parameters of approximately 0.328°, 0.003, 0.018, and 0.012, respectively. We then apply this pipeline to the TESS survey and identify 77 semi-detached binary candidates. We also derive their Gaia-distance-dependent masses and radii, with median relative uncertainties of ∼25% (lower) and ∼36% (upper) for the masses and ∼6% (lower) and ∼7% (upper) for the radii. We further compare the distributions of our semi-detached binary candidates with the 111 compiled samples from previous studies. The $T_{\rm eff}$-$L$ and $M$-$R$ distributions demonstrate that our candidates show good agreement with the compiled samples and the PARSEC model. The samples consist of configurations involving two main-sequence stars, a main-sequence star with a giant, or two giants. The majority of our candidates exhibit characteristics similar to Algol-type semi-detached systems. Based on a comparison of their mass ratios and orbital periods with the mass ratio limits, these Algol-type semi-detached systems are found to be undergoing nuclear-timescale mass transfer. Additionally, we highlight 5 samples in which the more massive component fills its Roche lobe. These samples lie close to the predicted mass ratio limits, and the discovery of more such systems is valuable because they can serve as constraints for models. Moreover, as an application, our pipeline allows for a significant reduction in the processing time for photometric analysis of semi-detached binaries. It is well suited to the analysis of semi-detached binaries in large data sets. This pipeline also has the potential for transfer learning across other photometric data, enabling its use in a broader range of semi-detached binary studies.

ACKNOWLEDGEMENT
We wish to thank the referee for his/her valuable comments and suggestions, which have helped us further improve this work. This work is supported by NSFC (grant Nos. 12125303, 12288102, 12173081, 12303106).

Figure 1. The input parameters' distributions of the training sample for the model with a more massive component filling its Roche lobe (red solid histograms) and the model with a less massive component filling its Roche lobe (black dash-dotted histograms). From left to right, the distributions of the effective temperature of the primary (more massive) star ($T_1$), cosine of the inclination ($\cos i$), relative radius of the unfilled component ($R_{(1,2)}/a$), mass ratio ($q = M_2/M_1$), and temperature ratio ($T_2/T_1$) are shown. The non-uniform distributions of $T_1$, $R_{(1,2)}/a$, and $T_2/T_1$ are primarily caused by the constraints of the physical models for semi-detached binaries.

Method description
In our photometric analysis pipeline, we utilize emcee (Foreman-Mackey et al. 2013) as the framework for performing an MCMC fitting on light curves in two

Figure 2. The architecture of the network.

Fig. 4 displays the distributions of parameters with the highest $R^2$. The corresponding light curves are shown in the upper right panel of Fig. 4. In the plot, the blue dots represent the true points of the illustrative example, the red line depicts the reconstructed light curve using the parameters measured by this pipeline, and the black dashed line represents the light curve generated through PHOEBE based on the parameters. Fig. 4 demonstrates that the parameters measured by this method are identical to the true values, and the light curve reconstructed by the NN model closely matches the mocked light curve, as well as the one obtained through PHOEBE.
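The goodness-of-fit score $R^2$ used to rank solutions can be illustrated with a short sketch. It assumes the standard coefficient of determination, $R^2 = 1 - SS_{\rm res}/SS_{\rm tot}$, which may differ in detail from the definition adopted in the pipeline. The function names and the toy fluxes are hypothetical. The standard deviation of the residuals is computed alongside, since it is the other fit-quality measure quoted in this work.

```python
import math


def r_squared(observed, model):
    """Coefficient of determination between observed and model fluxes."""
    if len(observed) != len(model) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - m) ** 2 for o, m in zip(observed, model))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed fluxes are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


def residual_std(observed, model):
    """Population standard deviation of the residuals (observed - model)."""
    if len(observed) != len(model) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    residuals = [o - m for o, m in zip(observed, model)]
    mean_res = sum(residuals) / len(residuals)
    return math.sqrt(sum((r - mean_res) ** 2 for r in residuals) / len(residuals))


# Toy normalized fluxes around an eclipse; invented for illustration.
observed = [1.00, 0.98, 0.80, 0.62, 0.81, 0.97, 1.00]
model = [1.00, 0.97, 0.81, 0.60, 0.81, 0.97, 1.00]
print(f"R^2 = {r_squared(observed, model):.4f}")
print(f"sigma_res = {residual_std(observed, model):.4f}")
```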

Figure 3. The upper plots display the results of the model with a more massive component filling its Roche lobe, while the bottom panels show the results of the model with a less massive component filling its Roche lobe. Panels (a) and (c) are cumulative histograms of the standard deviations of the residuals ($\sigma_{\rm Res}$) between the predicted light curves and the simulated light curves. Panel (b) shows a direct comparison between the predicted light curve (red solid line) and the true light curve (black dots) for a target with the more massive component filling its Roche lobe; the parameters are $T_1$ = 37768 K, $i$ = 68.97°, $R_2/a$ = 0.265, $q$ = 0.924, and $T_2/T_1$ = 0.892. Similarly, panel (d) presents a direct comparison for a target with the less massive component filling its Roche lobe, with the parameters being $T_1$ = 29179 K, $i$ = 89.90°, $R_1/a$ = 0.235, $q$ = 0.756, and $T_2/T_1$ = 1.118. In panels (b) and (d), the corresponding residuals between predictions and true values are shown at the bottom.

Figure 4. The light curve fitting results obtained through our pipeline for a simulated target with the less massive component filling its Roche lobe. The true parameters are $T_1$ = 36552 K, $i$ = 77.72°, $R_1/a$ = 0.347, $q$ = 0.516, and $T_2/T_1$ = 1.268. The corner plot displays the distributions of measured parameters with the highest $R^2$ of 0.9980. The corresponding light curves are presented in the upper right plot. In this plot, the blue dots represent the true points of the simulated target, the red line represents the reconstructed light curve using the parameters measured by our pipeline, and the black dashed line represents the light curve generated by PHOEBE based on the parameters measured from our pipeline.
4. PHOTOMETRIC ANALYSIS FOR TESS SURVEY
4.1. Target Selection
To determine the properties of semi-detached binaries for the TESS survey, we combined the identified eclipsing binaries from IJspeert et al. (2021) and Prša et al.

Figure 5. The precision of parameter measurements for our pipeline (the model with a more massive component filling its Roche lobe). Panels (a) to (d) show the distributions of the discrepancies of inclination ($i$), relative radius of the unfilled component ($R_2/a$), mass ratio ($q$), and temperature ratio ($T_2/T_1$). The ordinate (y axis) represents the measured values, while the abscissa (x axis) represents the true values. The mean value ($\mu$) and standard deviation ($\sigma$) of the residuals between measured values and true values are displayed at the bottom of the panels.

Figure 6. The precision of parameter measurements for our pipeline (the model with a less massive component filling its Roche lobe). Panels (a) to (d) show the distributions of the discrepancies of inclination ($i$), relative radius of the unfilled component ($R_1/a$), mass ratio ($q$), and temperature ratio ($T_2/T_1$). The ordinate (y axis) represents the measured values, while the abscissa (x axis) represents the true values. The mean value ($\mu$) and standard deviation ($\sigma$) of the residuals between measured values and true values are displayed at the bottom of the panels.
(2) RUWE < 1.4. Such selection criteria are widely accepted and have been used in previous studies. For example, Pelisoli et al. (2019) and Pelisoli & Vos (2019) used a similar condition to search for extremely low-mass white dwarfs, and Sanders (2023) used Gaia DR3 parallaxes to calibrate preliminary period-luminosity relations of O-rich Mira variables. In the next step, we review the light curves for each star in the TESS survey.
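The RUWE cut in criterion (2) amounts to a simple filter on the Gaia astrometric quality flag. The sketch below is illustrative only: the record layout and source identifiers are invented, `ruwe` is the column name used in the Gaia source table, and the other selection criterion is not reproduced here.

```python
RUWE_MAX = 1.4  # threshold quoted in criterion (2)


def passes_ruwe_cut(source: dict, ruwe_max: float = RUWE_MAX) -> bool:
    """Return True if the source has a RUWE value below the threshold.

    Sources with a missing RUWE are rejected, which is a conservative
    choice made for this sketch.
    """
    ruwe = source.get("ruwe")
    return ruwe is not None and ruwe < ruwe_max


# Invented records for illustration; not real Gaia sources.
sources = [
    {"source_id": "A", "ruwe": 1.02},
    {"source_id": "B", "ruwe": 2.31},
    {"source_id": "C", "ruwe": 1.39},
    {"source_id": "D", "ruwe": None},
]
selected = [s for s in sources if passes_ruwe_cut(s)]
print([s["source_id"] for s in selected])
```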

Figure 7. Light curve fitting results for RT Per (Ceraski 1904) obtained by our pipeline with the priors from the Gaia golden sample (Gaia Collaboration et al. 2023d). The corner plot illustrates the distributions of these measured parameters with the highest $R^2$ = 0.9748 for the model with the less massive component filling its Roche lobe. In the upper right plot, observations from the TESS survey are represented by black dots, while the blue dashed and red solid lines depict reproduced light curves using models with a more massive and a less massive component filling their Roche lobes, respectively.

Figure 8. Comparison of mass (panels (a) to (c)) and radius (panels (d) to (f)) determined from the Gaia G, BP, and RP bands using the parameters from the Gaia golden sample. The black and red rectangles represent the primary and secondary components.

Figure 9.
Fig. 10 (a) compares our semi-detached candidates with the 50 near-contact binaries (Meng et al. 2022) and 61 Algol-type semi-detached binaries (Ibanoǧlu et al. 2006) on the $T_{\rm eff}$-$L$ relation. The black dash-dotted line and solid line represent the zero-age main sequence (ZAMS) lines with metallicities of Z=0.004 and Z=0.014 from the PARSEC model (Bressan et al. 2012), and the red solid line is the terminal-age main sequence (TAMS) line from the PARSEC model. The gray solid circles and hollow circles correspond to the pri-

Figure 10.
Fig. 11 shows the relation between the mass ratio ($q = M_{\rm donor}/M_{\rm accretor}$) and the specific angular momentum ($J_c$) for SDs containing a giant (subgiant) donor, while

Figure 11. The relation between the mass ratio ($q = M_{\rm donor}/M_{\rm accretor}$) and the specific angular momentum ($J_c$) for the semi-detached binaries containing a giant (subgiant) donor. Panel (a) shows the relation for the semi-detached binaries with early-type accretors, and panel (b) shows the relation for the semi-detached binaries with late-type accretors. The color of the marker represents the filling fraction of the unfilled component ($R/R_L$). The star markers are our semi-detached binary candidates identified from the TESS survey, and the dots represent the semi-detached binaries compiled from the literature (Ibanoǧlu et al. 2006; Meng et al. 2022).

Figure 12. The distribution of mass ratio ($q = M_{\rm donor}/M_{\rm accretor}$) and orbital period for semi-detached binaries with early-type accretors (panel (a)) and late-type accretors (panel (b)). The colored lines represent the theoretical critical mass ratio limits for dynamical-timescale (gray lines) and thermal-timescale (blue lines) mass transfer at different evolutionary stages from Ge et al. (2015, 2020a,b, 2023), including the ZAMS in dashed lines, the LHG (late Hertzsprung-Russell gap) in dotted lines, and the BRGB (base of the red giant branch) in solid lines. The red star markers represent our semi-detached binary candidates, while the black dots correspond to the literature samples.

Table 1.
Prša et al. (2022) RT Per
The period we used is derived from Prša et al. (2022). The final two columns are our results obtained using different priors, specifically Gaia MSC (Multiple Star Classifier; Gaia Collaboration et al. 2023c) and the Gaia golden sample of astrophysical parameters (Gaia Collaboration et al. 2023d). For RT Per, the goodness-of-fit score logposterior msc=201.

This work is also supported by the National Key R&D Program of China (grant Nos. 2021YFA1600401/2021YFA1600403), the key research program of frontier sciences, CAS (No. ZDBS-LY-7005), Yunnan Fundamental Research Projects (grant No. 202101AV070001), the International Centre of Supernovae, Yunnan Key Laboratory (No. 202302AN360001), and the "Yunnan Revitalization Talent Support Program" Science & Technology Champion Project (No. 202305AB350003), as well as by the China Manned Space Project (No. CMS-CSST-2021-A10). We acknowledge the TESS mission provided by NASA's Science Mission Directorate. We also acknowledge the use of public TESS data from pipelines at the TESS Science Office and at the TESS Science Processing Operations Center.

Table 2. Absolute parameters of 77 semi-detached binaries from the TESS survey (calculated based on $T_1$ from the Gaia golden sample (Gaia Collaboration et al. 2023d) and distances from Bailer-Jones et al. (2021)).
Period is extracted from IJspeert et al. (2021), while other periods are obtained from Prša et al. (2022). (*): The target with the more massive component filling its Roche lobe.