Identification of 4FGL Uncertain Sources at Higher Resolutions with Inverse Discrete Wavelet Transform

In the forthcoming era of big astronomical data, finding target sources among the outputs of ground-based and space-based telescopes is a growing burden. Although Machine Learning (ML) methods have been extensively applied to this problem, incorporating in-depth data analysis can significantly improve the efficiency of identifying target sources in massive volumes of astronomical data. In this work, we focus on finding AGN candidates and identifying BL Lac/FSRQ candidates among the 4FGL DR3 uncertain sources. We study the correlations among the attributes of the 4FGL DR3 catalogue and propose a novel method, named FDIDWT, to transform the original data. The transformed dataset is low-dimensional and feature-highlighted: correlation features are estimated with Fractal Dimension (FD) theory, and multi-resolution analysis is performed with the Inverse Discrete Wavelet Transform (IDWT). Combining the FDIDWT method with an improved lightweight MatchboxConv1D model, we accomplish two missions: (1) Mission A, distinguishing the Active Galactic Nuclei (AGNs) from other sources (Non-AGNs) among the 4FGL DR3 uncertain sources, with an accuracy of 96.65%; (2) Mission B, classifying blazar candidates of uncertain type (BCUs) into BL Lacertae objects (BL Lacs) or Flat Spectrum Radio Quasars (FSRQs), with an accuracy of 92.03%. We find 1354 AGN candidates in Mission A, and 482 BL Lac candidates and 128 FSRQ candidates in Mission B. The results show a consistency of greater than 98% with those of previous works. In addition, our method has the advantage over ordinary methods of finding less variable and relatively faint sources.


INTRODUCTION
The Active Galactic Nucleus (AGN) has been a hot topic in astronomy for more than sixty years since its discovery in 1963 (Schmidt 1963). It is believed that an AGN is centred on a supermassive black hole (SMBH) surrounded by an accretion disk, and that this system powers the AGN radiation (Lynden-Bell 1969; Blandford & Znajek 1977; Blandford & Payne 1982). The emissions from AGNs are observed to span the entire electromagnetic spectrum and are found to be strong and variable. Based on the ratio of radio emission strength to the optical one, AGNs are divided into radio-loud and radio-quiet ones (Strittmatter et al. 1980; Kellermann et al. 1989), and this method has recently been refined by a 'double-criterion' method (Xiao et al. 2022a). Blazars, a subclass of radio-loud AGNs with jets pointing toward the observer, show high and fast multi-band variability, high and variable polarization, strong and variable γ-ray emission, and apparent superluminal motion (Wills et al. 1992; Urry & Padovani 1995; Villata et al. 2006; Fan 2002; Fan et al. 2014, 2021; Gupta et al. 2016; Xiao et al. 2019, 2020a, 2022b; Abdollahi et al. 2020). Blazars consist of BL Lacertae objects (BL Lacs) and flat spectrum radio quasars (FSRQs): the latter show strong emission lines (rest-frame equivalent width, EW > 5 Å), while the former demonstrate no or only weak emission features (EW < 5 Å) (Urry & Padovani 1995; Scarpa & Falomo 1997).
The study of blazars was severely limited by small sample sizes until the launch of the Large Area Telescope on board the Fermi Gamma-Ray Space Observatory (Fermi-LAT) in 2008. It has unprecedented performance, with better energy resolution, angular resolution, and a wider effective area in both the low-energy and high-energy bands than its predecessor EGRET (Thompson et al. 1993), over the energy range from 20 MeV to 300 GeV. The Fermi-LAT collaboration has released 5 main γ-ray source catalogues, namely 0FGL, 1FGL, 2FGL, 3FGL and 4FGL (Abdo et al. 2009, 2010; Nolan et al. 2012; Acero et al. 2015; Abdollahi et al. 2020). However, many Fermi sources remain unrelated to any known class: 1010 unassociated sources in 3FGL and 2291 uncertain sources (2157 unassociated sources + 134 unknown sources) in the latest 4FGL DR3, as well as 573 blazar candidates of uncertain type (BCUs) in 3FGL and 1493 BCUs in the latest 4FGL DR3.
It is time-consuming to verify these uncertain sources one by one through optical observation, so more efficient methods must be explored. Many machine-learning-based algorithms have been employed for this task. For instance, Saz Parkinson et al. (2016) applied two different Machine Learning (ML) methods, Random Forest (RF) and Logistic Regression (LR), to identify 1008 unassociated sources in 3FGL; among them, 334 sources were predicted as pulsars (PSRs) and 559 as AGNs. Chiaro et al. (2016) utilized Blazar Flaring Patterns (B-FlaP) as an identification approach for BCUs: since variability is one of the characterizing properties of blazars (Paggi et al. 2011), blazar light curves were fed to Artificial Neural Networks (ANNs) to identify the 573 BCUs in 3FGL, of which 342 were associated with BL Lacs and 154 with FSRQs, while 77 sources remained uncertain. Xiao et al. (2020b) carried out an ensemble ML method, picked out 748 AGN candidates from the 1010 3FGL unassociated sources, and classified the 573 BCUs into 326 BL Lac candidates and 247 FSRQ candidates. Moreover, Kang et al. (2019) studied the classification of 1312 BCUs from 4FGL DR1 via three supervised ML methods and obtained 724 BL Lac and 332 FSRQ candidates.
The task of Fermi source classification can be seen as feature extraction with ML methods, owing to their ability to learn patterns from data and provide valuable insights, decisions, and predictions (Jordan & Mitchell 2015; Zhou et al. 2017). However, traditional ML methods are limited when dealing with the growing volume of big astronomical data brought by the successful launch of ever more telescopes and detectors. In recent years, the popularity of Graphics Processing Units (GPUs) has spurred research on Deep Learning (DL), which learns features from massive-scale data using Deep Neural Networks (DNNs) and has become a major research focus in the ML field (Yu & Deng 2010; Liu et al. 2017). DNNs have proven successful in various real-world applications (Jifara et al. 2019; Zyner et al. 2019; Alemany et al. 2019; Lam et al. 2019; Chen et al. 2018). Furthermore, it has been shown that more complex problems require deeper networks (Bengio et al. 2009; He et al. 2016), which has led to the development of sophisticated networks such as VGG (Simonyan & Zisserman 2014), ResNet (He et al. 2016), and GPT-3 (Brown et al. 2020).
However, rather than solely focusing on the deep structure of networks, which helps to learn intrinsic features, attribute analysis should be emphasized as well. We propose that correlation features exist among the attributes of raw data and that exploiting them can further improve learning performance. In addition, numerous studies have shown that real-world data often contain highly redundant and unimportant attributes (Bakshi & Stephanopoulos 1993; Bengio et al. 2009; Glorot et al. 2011). This redundancy can lead to sparsity in high-dimensional space, where most samples in the dataset are far away from each other. In classification tasks, this sparsity can make predictions less reliable than in low dimensions, as they are based on larger extrapolations (Géron 2017). Therefore, we believe that attribute analysis offers an opportunity for better results through additional correlation features and dimension reduction.
In this paper, we focus on two missions for 4FGL DR3: classifying the 2291 uncertain sources into AGN or Non-AGN, and associating the 1493 BCUs with BL Lacs or FSRQs, named Mission A and Mission B, respectively. Firstly, we study some popular attribute analysis methods and survey the attributes of the 4FGL DR3 sources. Then we find the core attributes based on Fractal Dimension (FD) theory and proceed to multi-attribute analysis from the perspective of the whole dataset. Based on the results, we propose a novel method called FDIDWT, a combination of FD theory and the Inverse Discrete Wavelet Transform (IDWT), to extract correlation features at a higher resolution. With FDIDWT, the original dataset is transformed into a low-dimensional and feature-highlighted set, which benefits the subsequent learning process. Finally, we combine the FDIDWT method with a lightweight Convolutional Neural Network (CNN) model to accomplish the classification missions.
The paper is organized as follows. Sec. 2 describes the datasets of the two missions and presents some commonly used attribute analysis methods. Based on that, Sec. 3 interprets our proposed method in detail. The experiments and results are reported in Sec. 4. Further discussions and conclusions are presented in Sec. 5 and 6, respectively.

Samples of 4FGL DR3
The Fermi-LAT collaboration has recently released the incremental version of the 12-year Fermi-LAT Gamma-ray Source Catalog (4FGL DR3, Abdollahi et al. 2022). It contains 6659 sources, among which 3809 are AGNs, 559 are associated with Non-AGNs (including pulsars, high-mass binaries, supernova remnants, etc.), and 2291 are uncertain sources (134 associated with counterparts of unknown nature and 2157 unassociated). Within the AGNs, 3743 sources are confirmed blazars, among which 1456 are associated with BL Lacs, 794 with FSRQs, and the remaining 1493 are BCUs that have not been tagged as BL Lacs or FSRQs.
To accomplish Missions A and B mentioned above, we need to select features that can distinguish one class from another. Variability is one of the widely known characteristics of AGNs relative to the other sources detected by Fermi-LAT (Abdollahi et al. 2020, 2022), so the features containing variability information ('Flux1000', 'Flux Band', 'Variability Index' and 'Frac Variability') should be included for Mission A. Meanwhile, FSRQs and BL Lacs demonstrate significantly different γ-ray spectra, as the GeV γ-ray regime samples different parts of the higher hump of the blazar spectral energy distribution (SED) (Fan et al. 2016; Yang et al. 2022, 2023), so the features containing spectral information ('Flux Band', 'Pivot Energy', 'PL Index') should be included for Mission B.
In this case, we compile the data of 13 attributes from 4FGL DR3, as listed in Tab. 1.

Table 1. The 13 attributes compiled from 4FGL DR3.
1. Pivot Energy — Energy at which the error on the differential flux is minimal
2. Flux1000 — Integral photon flux from 1 to 100 GeV
3. PL Index — Best-fit power-law index
4. Variability Index — Sum of the 2×log(likelihood) differences between the flux fitted in each time interval and the average flux over the full catalogue interval
5. Frac Variability — Fractional variability computed from the fluxes in each year
6. Flux Band1 — Integral photon flux in the spectral band 0.05-0.1 GeV
7. Flux Band2 — Integral photon flux in the spectral band 0.1-0.3 GeV
8. Flux Band3 — Integral photon flux in the spectral band 0.3-1 GeV
9. Flux Band4 — Integral photon flux in the spectral band 1-3 GeV
10. Flux Band5 — Integral photon flux in the spectral band 3-10 GeV
11. Flux Band6 — Integral photon flux in the spectral band 10-30 GeV
12. Flux Band7 — Integral photon flux in the spectral band 30-100 GeV
13. Flux Band8 — Integral photon flux in the spectral band 100-1000 GeV

For both Mission A and Mission B, we randomly split the data into three subsets with a ratio of 8:1:1, as shown in Fig. 1. The training set is used to fit the ML methods or DNN models, and the validation set aids in fine-tuning hyper-parameters for better performance. Finally, the generalization capability of the model is evaluated independently on the test set. Moreover, to explore the robustness of the model, this split policy was repeated 10 times, so that 10 datasets were prepared for each mission. In this section, we analyze the attributes of the training set with the methods shown in Fig. 2.
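The repeated 8:1:1 split policy can be sketched as below. This is a minimal illustration with placeholder sample indices, not the actual 4FGL attribute data; the seeds and sample count are assumptions.

```python
import random

def split_811(samples, seed):
    """Shuffle a list of samples and split it into train/val/test
    subsets with the ratio 8:1:1 described in the text."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(0.8 * n)
    n_val = int(0.1 * n)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

# Repeat the split 10 times with different seeds to build 10 datasets
# per mission, as done for the robustness study.
datasets = [split_811(list(range(1000)), seed) for seed in range(10)]
```

Each of the 10 datasets is then used for one independent train/validate/test run, and the reported accuracies are averaged over the 10 test sets.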

Attribute Importance
As mentioned before, real-world data often contain highly redundant and unimportant attributes. The Decision Tree (DT) is a popular technique to estimate the importance of attributes: the attributes that appear in the tree are considered important, with the frequency of their appearance serving as their importance, so the less frequently an attribute appears, the less important it is assumed to be. RF is composed of multiple DTs and reduces the bias in estimating attribute importance (Breiman 2001). It is common practice to remove the unimportant attributes, i.e., to perform the so-called Attribute Selection (AS) for dimension reduction. We employ 50000 DTs to build the RF and take entropy as the split criterion. The other hyper-parameters follow the default values of scikit-learn (Pedregosa et al. 2011). The attribute importances on the training sets, averaged over the 10 datasets, are shown in Fig. 3. Interestingly, we find that the four most important attributes are 'Pivot Energy', 'PL Index', 'Variability Index', and 'Frac Variability' in both Dataset A and Dataset B.
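A minimal scikit-learn sketch of this step follows. The synthetic data and the reduced tree count (100 instead of the paper's 50000) are illustrative assumptions; only the entropy criterion and the importance ranking mirror the procedure in the text.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy stand-in data: the real inputs are the 13 catalogue attributes.
X, y = make_classification(n_samples=500, n_features=13, n_informative=4,
                           random_state=0)

# The paper uses 50000 trees; 100 suffice to illustrate the idea.
rf = RandomForestClassifier(n_estimators=100, criterion="entropy",
                            random_state=0)
rf.fit(X, y)

# Mean decrease in impurity, normalized to sum to 1; higher = more important.
importances = rf.feature_importances_
ranking = sorted(range(13), key=lambda i: importances[i], reverse=True)
```

Averaging `feature_importances_` over the 10 split datasets yields the bar charts of Fig. 3.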

Principal Component Analysis
Another way to study attributes is to project the data from the attribute space into a new space. In this new space, the direction with maximum variance is considered to contribute most to the data, while the directions with small variances can be removed without sacrificing crucial information. This method is called Principal Component Analysis (PCA) (Jolliffe & Cadima 2016). For each split dataset, we normalize the samples and perform PCA on the training sets. The averaged variance ratios are depicted in Fig. 4. We find that most of the information of the training data is concentrated in the first 3 components (variance ratio larger than 0.1) in both Dataset A and Dataset B.
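The normalization and variance-ratio computation can be sketched as follows; the random toy matrix and the injected correlated column stand in for the real 13-attribute training set.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Toy stand-in for a 13-attribute training set, with one correlated pair.
X = rng.normal(size=(500, 13))
X[:, 1] = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=500)

# Normalize, then project onto the principal components.
X_std = StandardScaler().fit_transform(X)
pca = PCA().fit(X_std)

# Explained variance ratios come out sorted in descending order.
ratios = pca.explained_variance_ratio_
n_keep = int(np.sum(ratios > 0.1))  # components with variance ratio > 0.1
```

Applying the same `ratios > 0.1` cut on the averaged ratios gives the 3 leading components reported for Datasets A and B.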

Attribute Significance Estimator based on the Fractal Dimension (FDASE)
Even though the RF model aggregates the attribute importance values from all the DTs (see Sec. 2.2), it may leave redundancy because it does not consider the correlations among the important attributes. To address this issue, FD theory is applied to estimate the potential contribution of each attribute to the dataset and to measure the correlations among attributes (Belussi & Faloutsos 1995, 1998). This approach is based on the observation that independent attributes contribute more to the dataset, while correlated attributes contribute less. Mathematically, for a dataset A with E attributes, A = {a_1, a_2, ..., a_E}, fractal dimension theory supposes that the potential existence of correlated attributes leads the set of points in the original E-dimensional space to describe a spatial object of a dimension lower than or equal to E. The dimension of the object represented by the dataset is called the Intrinsic Dimension (ID), denoted by D, D ∈ R+. The ceiling of the ID, ⌈D⌉, is the minimum number of attributes that must be retained to keep the essential characteristics of the dataset (de Sousa et al. 2007).
However, the ID of a dataset is difficult to obtain. Alternatively, we consider the ID obtained by projecting the dataset onto an attribute subspace C ⊂ A, which is named the Partial Intrinsic Dimension (PID) on C: pD(C). Based on these definitions, an attribute a_i ∈ (A − C) increases pD(C) by at most its Individual Contribution (IC), depending on the degree of correlation between a_i and the attributes in C. The IC of a_i, i.e., iC(a_i), is the maximum potential contribution of the attribute a_i to pD(C): the greater the correlation between a_i and the attributes in C, the lower its contribution to pD(C).
Also, iC(a_i) can be measured by pD({a_i}) and ranges in [0, 1]. A more independent distribution of the values of a_i leads to iC(a_i) closer to one, while a more structured distribution brings iC(a_i) closer to zero (de Sousa et al. 2007). Thus, the E-dimensional dataset can be seen as formed by adding attributes with different contributions to the D-dimensional sub-dataset.
Moreover, the degree of correlation among attributes can be measured by a threshold ξ: a sub-dataset B ⊂ A is said to be ξ-correlated to another sub-dataset C ⊂ A (with attribute spaces B ∩ C = ∅) if every attribute a_i ∈ B contributes no more than ξ · iC(a_i) to pD(C). The threshold ξ ∈ [0, 1) tunes how strong the correlation between the attributes in B and those in C must be to be detected.
A greedy algorithm, FDASE, was developed by de Sousa et al. (2007) to find a subset of attributes whose PID approaches the ID of the whole dataset. The resulting subset is called the Attribute Set Core (ASC), denoted ξC, for a given correlation threshold ξ and scale range n. The ratio pD(ξC)/ID therefore normalizes the contribution of the ASC to the whole dataset. For each split dataset, we apply the FDASE algorithm to the training sets while scanning the correlation threshold ξ, and compare the value of pD(ξC)/ID with the size of the ASC in Fig. 5. We fix the scale range n = 50 and plot the IC of each attribute in Fig. 6 under a suitable ξ. From the figures, we find that at ξ = 0.5 two attributes ('PL Index' and 'Pivot Energy') contribute around 98% of the information of Dataset A, while for Dataset B four attributes ('PL Index', 'Pivot Energy', 'Flux Band1' and 'Flux Band7') contribute around 92% at ξ = 0.5.
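The box-counting estimate of the correlation fractal dimension (D2) that underlies FDASE-style analysis can be sketched as follows. This is only the dimension-estimation kernel under assumed grid levels and a toy line dataset, not the full greedy subset search of de Sousa et al. (2007).

```python
import math
import random

def correlation_dimension(points, levels=range(2, 7)):
    """Estimate the correlation fractal dimension D2 by box counting.

    Points are assumed normalized to the unit hypercube. For each grid of
    cell size r = 2**-level, S(r) = sum of squared cell occupancies; the
    least-squares slope of log S(r) vs log r estimates D2.
    """
    logs = []
    for level in levels:
        r = 2.0 ** (-level)
        counts = {}
        for p in points:
            cell = tuple(min(int(x / r), 2 ** level - 1) for x in p)
            counts[cell] = counts.get(cell, 0) + 1
        s = sum(c * c for c in counts.values())
        logs.append((math.log(r), math.log(s)))
    n = len(logs)
    mx = sum(x for x, _ in logs) / n
    my = sum(y for _, y in logs) / n
    return (sum((x - mx) * (y - my) for x, y in logs)
            / sum((x - mx) ** 2 for x, _ in logs))

random.seed(0)
# A line embedded in 2-D has intrinsic dimension ~1 even though E = 2,
# illustrating how correlated attributes lower the ID below E.
line = [(t, t) for t in (random.random() for _ in range(20000))]
d2 = correlation_dimension(line)
```

The same machinery, evaluated on attribute subsets, yields the pD(·) values that FDASE compares against the thresholded individual contributions.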

Inverse Discrete Wavelet Transform
The attribute analysis methods introduced in Sec. 2 aim to find the important attributes or components to be retained for the learning process, while the other attributes or components are removed for dimension reduction. However, this crude removal may cause the loss of correlation features and degrade performance.
We propose instead to retain all of the attributes and perform IDWT on the original samples to obtain representations at higher resolutions. After IDWT, the correlation features are expected to be highlighted and, meanwhile, the dimension is possibly reduced, which is important when dealing with big data.
The best-known wavelet transform may be the Discrete Wavelet Transform (DWT), which is usually implemented with filtering operations for high efficiency. The filters are designed from the standpoint of multiresolution analysis: the difference of information between the approximations of a signal at resolutions 2^(m+1) and 2^m (where m is an integer) can be extracted by decomposing the signal on an orthonormal basis of wavelets (Mallat 1989). The pyramidal structure of the wavelet filter bank makes it possible to infer the information at a low resolution from the information at a high resolution. IDWT is the converse process of DWT and provides representations at higher resolutions for DL.
However, in practical applications, a finite signal must be considered. The length of the signal varies across resolutions because of the downsampling/upsampling and filtering operations. In the IDWT process, if p and s denote the lengths of the signal at the low and high resolution, respectively, then Rajmic & Prusa (2014) give

p = ⌊(s + u − 1)/2⌋, i.e., 2p − u + 1 ≤ s ≤ 2p − u + 2, (1)

where u is the length of the filters.
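The length relation, as we reconstruct it from the text, can be checked with a few lines of code; the function names are ours, and the 'db2' example values below are taken from the bookkeeping vectors discussed later in the paper.

```python
def coeff_length(s, u):
    """Low-resolution coefficient length p for a signal of length s,
    i.e. p = floor((s + u - 1) / 2), with filter length u."""
    return (s + u - 1) // 2

def idwt_lengths(p, u):
    """Possible high-resolution lengths s after one IDWT step.

    Both candidates 2p - u + 1 and 2p - u + 2 satisfy
    coeff_length(s, u) == p, which is why several transformed
    dimensions can arise from the same group lengths.
    """
    return [s for s in (2 * p - u + 1, 2 * p - u + 2)
            if coeff_length(s, u) == p]

# "db2" filter (u = 4), coefficient length p = 4: s may be 5 or 6,
# matching the two bookkeeping vectors (3,3,3,4,5) and (3,3,3,4,6).
candidates = idwt_lengths(4, 4)
```

This ambiguity of one sample per level is what produces the "many cases" of group lengths enumerated in Tab. 3.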
To perform IDWT on the original dataset, the attribute data of one sample are treated as the signal at a lower resolution. As a result, the attributes should be rearranged in some order based on wavelet theory. This can be achieved by aligning the information in the wavelet domain with that in the attribute space, as discussed next.

Information in Wavelet Domain: I c
The process of DWT analyzes the original signal from a fine scale to a coarse scale. The representation of a finite signal f(t) in the wavelet domain after DWT is a collection of vectors:

f(t) → {c_J, d_J, d_(J−1), ..., d_1}, (2)

where J ∈ N is called the decomposition level; c_J contains the approximation coefficients at level J, i.e., the lowest resolution, and d_j contains the detail coefficients at level j, i.e., the higher resolutions. An example of the three-level pyramid transform is illustrated in Fig. 7. The original signal f(t) can be seen as the approximation coefficients at the highest resolution, i.e., c_0. Based on prior knowledge, it is often observed that, at a given level, the majority of the information in a natural signal is carried by the approximation coefficients. Additionally, during the decomposition of the signal from level j to j + 1 in DWT, the information is typically reduced by half due to the downsampling operation. Hence, denoting the information of a coefficient vector by I_c(•), the information at the different levels roughly respects

I_c(c_J) > I_c(d_1) > I_c(d_2) > ... > I_c(d_J). (3)

As an example, the three-level DWT is performed on three images to illustrate this relationship intuitively, as shown in Fig. 8. The figures suggest that the approximation coefficients at each level retain the image contour and carry a significant amount of information, while the objects in the image cannot be easily identified from the detail coefficients. To some extent, however, the detail coefficients at a low level appear to provide more information than those at a high level, i.e., I_c(d_(j+1)) < I_c(d_j). To transform the coefficients from lower resolutions to higher resolutions in the IDWT process, a Wavelet Decomposition Vector (WDV) c and a Bookkeeping Vector (BV) l are required by multiresolution analysis. The WDV concatenates the coefficients shown in Eqn. 2:

c = (c_J, d_J, d_(J−1), ..., d_1), (4)

while the BV is made up of the numbers of coefficients in c:

l = (len(c_J), len(d_J), len(d_(J−1)), ..., len(d_1), len(f)), (5)

where len(•) indicates the number of elements in a vector.
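The claim that the approximation coefficients carry most of a natural signal's information can be illustrated with a one-level Haar transform. The Haar filter is chosen here purely for brevity (the paper uses Daubechies filters), and the smooth test signal is an assumption.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_dwt(signal):
    """One-level Haar DWT: split an even-length signal into approximation
    (cA) and detail (cD) coefficients."""
    half = len(signal) // 2
    cA = [(signal[2 * i] + signal[2 * i + 1]) / SQRT2 for i in range(half)]
    cD = [(signal[2 * i] - signal[2 * i + 1]) / SQRT2 for i in range(half)]
    return cA, cD

def haar_idwt(cA, cD):
    """One-level Haar IDWT: rebuild the higher-resolution signal."""
    out = []
    for a, d in zip(cA, cD):
        out.append((a + d) / SQRT2)
        out.append((a - d) / SQRT2)
    return out

# For a smooth "natural" signal, nearly all energy sits in cA, not cD.
signal = [math.sin(0.2 * t) + 2.0 for t in range(16)]
cA, cD = haar_dwt(signal)
energy = lambda v: sum(x * x for x in v)
rebuilt = haar_idwt(cA, cD)
```

Running IDWT on coefficient vectors, as FDIDWT does, is exactly `haar_idwt` applied without a preceding `haar_dwt`: the (cA, cD) slots are filled with attribute data instead.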

Information in Attribute Space: I a
In the field of big data, an attribute's information reflects its contribution to the dataset. As explained in Sec. 2.4, the IC estimates the potential contribution of an individual attribute to the dataset. However, because of the correlations among attributes, the actual contribution of an attribute a_i, i.e., its information I_a({a_i}), cannot be precisely determined if a_i is correlated with other attributes.
Moreover, as discussed, the ASC is the smallest subset of attributes that can fully characterize the entire dataset. Given a correlation threshold ξ, the ASC ξC can be found with the FDASE algorithm (de Sousa et al. 2007). The attributes in the ASC are not ξ-correlated with each other, and they contain most of the information in the dataset.
Based on this analysis, if A = {a_1, a_2, ..., a_E} denotes the universal attribute set of a dataset with E attributes, then the real contribution of an attribute a_i ∈ (A − ξC) to the dataset can be seen as the degree of correlation between a_i and the attributes in ξC: the weaker the correlation between a_i and ξC, the higher the contribution of a_i to the dataset, i.e., the more information a_i contains. Therefore, the information of attribute a_i can be estimated by

I_a({a_i}) = pD(ξC ∪ {a_i}) − pD(ξC), (6)

where pD(•) is the PID of a sub-dataset.
By estimating the information of each attribute a_i ∈ (A − ξC), we obtain a rough ordering of the attributes by their information. The data under the attributes can then be treated as wavelet coefficients and arranged to form the WDV c and BV l for IDWT. The dataset is thereby transformed into representations at a higher resolution in which the correlation features are presented. This is the main idea of our method, detailed next.

FDIDWT
The J-level IDWT transforms the original data into representations at the J-levels-higher resolution. To achieve this, the WDV and BV must be constructed from the attribute data of each sample, with the positions of the data in the WDV determined by the estimated attribute information of Eqn. 6. This approach is therefore referred to as the Fractal Dimension - Inverse Discrete Wavelet Transform, or FDIDWT for short.
As discussed before, the attributes in the ASC ξC ⊂ A contain most of the information of the dataset defined on A = {a_1, a_2, ..., a_E}. It is assumed that the information of the attributes in A − ξC respects an ascending order:

I_a({a_1}) ≤ I_a({a_2}) ≤ ... ≤ I_a({a_Q}), (7)

where a_i ∈ (A − ξC) and 1 ≤ i ≤ Q. If the number of attributes in ξC is P, then P + Q = E. Mirroring the order of the information in the wavelet domain shown in Eqn. 3, we group the attributes in A − ξC and obtain a similar order of the information in the attribute space:

I_a(ξC) > I_a(D_1) > I_a(D_2) > ... > I_a(D_J), (8)

where D_j is the attribute group containing the attributes whose information respects Eqn. 7. Thus the data under the attributes of D_j and ξC can be taken as the detail coefficients d_j and the approximation coefficients c_J, respectively. Specifically, if x_j and x_ξC denote the data under the attributes in D_j and ξC, respectively, for one sample of the original dataset, the WDV c and BV l for the IDWT process can be constructed as

c = (x_ξC, x_J, x_(J−1), ..., x_1), l = (P, len(x_J), len(x_(J−1)), ..., len(x_1), O), (9)

where O is the number of attributes in the transformed dataset, i.e., the transformed dimension in the new attribute space, which can be derived with Eqn. 1. Tab. 2 shows the procedure of the proposed FDIDWT method. To better describe how the algorithm works, we fix the decomposition level J = 3. Moreover, as in the common case, the length u of the filters for the pyramid transform is set to be even. Then:
1. Step 1-3: Given the correlation threshold ξ and scale range n, the FDASE algorithm (de Sousa et al. 2007) is used to find the attribute set core ξC. Then the information I_a({a_i}) of the remaining attributes a_i ∈ (A − ξC) is calculated, and the attributes of A are arranged into a set A′ according to the ascending order of I_a({a_i}). In our experiment, we scanned the correlation threshold ξ over the range [0.05, 1.0] with step 0.05 and set the scale range n = 50. We obtained the ASCs {a_3, a_1} and {a_3, a_1, a_5} for Dataset A and B, respectively, as shown in Fig. 5 and 6. The arranged attribute sets A′ for the two datasets are then {a_3, a_1, a_5, a_13, a_10, a_2, a_9, a_12, a_4, a_8, a_11, a_7, a_6} and {a_3, a_1, a_5, a_13, a_4, a_12, a_11, a_7, a_8, a_6, a_9, a_2, a_10}.

Table 2. The procedure of the proposed FDIDWT method.
Input: original dataset defined on attribute space A; decomposition level J; correlation threshold ξ; scale range n.
Output: new dataset defined on attribute space B.
1: run the FDASE algorithm to find the ASC ξC, and denote its length by P;
2: calculate I_a(a_i), a_i ∈ (A − ξC), with Eqn. 6;
3: arrange the attributes of A into A′ according to the order of I_a(a_i) as shown in Eqn. 7;
4: based on P, calculate the lengths of the other groups, inserting placeholder attributes if needed;
5: construct the WDV c and BV l;
6: perform IDWT on each normalized sample with c and l;
7: output the new dataset defined on B.

2. Step 4-5: P is the number of attributes in ξC. In the proposed method, the length of the approximation coefficients at level J is assumed to be no shorter than P. If this length equals P, Fig. 9 shows the other groups divided from A′, where C_3 denotes the approximation-coefficient group of the attributes in ξC. Note that the original dimension E, i.e., the number of attributes of the original dataset, may be smaller than the length of the WDV c for a given decomposition level J and filter length u. In this case, some placeholder attributes "*" are inserted following the ASC group C_3. The role of the placeholder attributes is to ensure that IDWT works without changing the information of the data; the values under the placeholder attributes are therefore set to zeros so that they contribute nothing to the dataset.
Figure 9. The groups divided from the sorted attributes, taking decomposition level J = 3 as an example.
In the signal processing field, signals are always transformed into the wavelet domain with DWT for processing, and the lengths of the wavelet coefficients at each level are recorded for reconstruction with IDWT. In the proposed method, however, there is no DWT, so several cases exist for the lengths of the divided groups based on Eqn. 1. The length L of the placeholder attributes follows from the equality of the lengths in the WDV, i.e., P + M + N + K = E + L, where M, N, and K denote the lengths of the detail groups at levels 3, 2, and 1, respectively. The WDV and BV are then constituted as in Eqn. 9, where O is the transformed dimension computed by applying Eqn. 1 level by level. We used Daubechies filters, which can be indicated by their length u; for example, the filter length u = 8 indicates that the "db4" wavelet filter was used. Under the assumption of a maximum decomposition level J = 3, we built the WDVs and BVs for the two missions in Tab. 3. For instance, with the "db2" filter the grouped WDVs are ({x_3, x_1, x_5}, {x_13, x_10, x_2}, {x_9, x_12, x_4}, {x_8, x_11, x_7, x_6}) for Dataset A and ({x_3, x_1, x_5}, {x_13, x_4, x_12}, {x_11, x_7, x_8}, {x_6, x_9, x_2, x_10}) for Dataset B, with the BVs (3, 3, 3, 4, 5) and (3, 3, 3, 4, 6) (cases #14 and #15 in Tab. 3). A value of 0 in the WDV means that one placeholder attribute is inserted.

3. Step 6-7: Finally, IDWT is performed on each sample of the normalized dataset, and a new dataset defined on the attribute space B is output.
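Steps 4-5 above can be sketched as follows. The function name, the toy attribute values, and the group lengths are illustrative assumptions; only the placeholder rule (zeros inserted right after the ASC group) and the bookkeeping layout mirror the procedure.

```python
def build_wdv_bv(asc, rest_sorted, group_lengths, out_len):
    """Assemble the Wavelet Decomposition Vector (WDV) and Bookkeeping
    Vector (BV) for one sample.

    `asc` holds the values under the Attribute Set Core, `rest_sorted`
    the remaining values in ascending order of information, and
    `group_lengths` the detail-group lengths (coarsest level first)
    derived from Eqn. 1; `out_len` is the transformed dimension O.
    """
    needed = sum(group_lengths)
    # Zero placeholders go right after the ASC group (i.e. at the head of
    # the coarsest detail group) and contribute nothing to the dataset.
    padded = [0.0] * (needed - len(rest_sorted)) + rest_sorted
    wdv = list(asc)
    bv = [len(asc)]
    pos = 0
    for length in group_lengths:
        wdv.extend(padded[pos:pos + length])
        bv.append(length)
        pos += length
    bv.append(out_len)
    return wdv, bv

# A Dataset-A-like example: 3 ASC values, 10 remaining values,
# detail groups of lengths (3, 3, 4) for J = 3 and O = 5.
asc = [0.3, 0.1, 0.5]
rest = [0.13, 0.10, 0.02, 0.09, 0.12, 0.04, 0.08, 0.11, 0.07, 0.06]
wdv, bv = build_wdv_bv(asc, rest, [3, 3, 4], 5)
```

Feeding `wdv` and `bv` to a standard multilevel IDWT routine then yields the O-dimensional transformed sample.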

A Lightweight CNN
Since the new attribute space B is considered to respect some natural order after FDIDWT, we believe that the convolution operation with a kernel will learn more features and achieve better performance: the kernel of a CNN determines a visual field over adjacent attributes. We therefore utilized a lightweight CNN model modified from the Matchbox net (Majumdar & Ginsburg 2020), but with a smaller size of only 94k parameters for high efficiency. We refer to it as "MatchboxConv1D"; its structure is depicted in Fig. 10.
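The point about the kernel's visual field can be made concrete with a bare 1-D convolution. The sample values and kernel weights below are illustrative assumptions, not trained MatchboxConv1D parameters.

```python
def conv1d_valid(x, kernel):
    """'Valid' 1-D convolution (cross-correlation, as Conv1D layers
    compute it) of a sample vector with a kernel.

    Each output is a weighted sum over a window of adjacent attributes,
    so useful features only emerge if neighbouring positions are
    meaningfully ordered, as they are after FDIDWT.
    """
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k))
            for i in range(len(x) - k + 1)]

# An 8-dimensional FDIDWT-transformed sample and a width-3 kernel
# responding to local trends across the ordered attributes:
sample = [0.9, 0.7, 0.4, 0.2, 0.1, 0.3, 0.6, 0.8]
kernel = [-1.0, 0.0, 1.0]
feature_map = conv1d_valid(sample, kernel)
```

MatchboxConv1D stacks such convolutions (with learned kernels, multiple channels, and normalization) into the ~94k-parameter architecture of Fig. 10.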

EXPERIMENTS AND RESULTS
With the 3 attribute analysis methods introduced in Sec. 2, we independently used 4 ML classifiers, i.e., RF, Support Vector Machine (SVM), AdaBoost, and Multilayer Perceptron (MLP), to carry out Mission A and Mission B; the hyper-parameters are listed in Tab. 4. The FDIDWT method, by design, works with the MatchboxConv1D model proposed in Sec. 3.5, whose hyper-parameters are shown in Tab. 5. After training the models on the training set, fine-tuning on the validation set, and evaluating on the test set, the uncertain samples, i.e., the uncertain sources in Mission A and the BCUs in Mission B, are finally predicted with the model achieving the highest test accuracy. This workflow is illustrated in Fig. 11.
We performed the experiments on each split dataset, and the accuracy results were averaged over the 10 test sets. In the experiments, we extensively evaluated the performance with an increasing number of the most important attributes or principal components, sorted in descending order of importance and variance ratio, respectively, as shown in Fig. 3, Fig. 4, and Fig. 6. The ML methods were implemented with scikit-learn (Pedregosa et al. 2011), and the training of the DNNs was accelerated with TensorFlow (Abadi et al. 2016) on an NVIDIA GeForce RTX 3060. The code used in this experiment has been uploaded to GitHub. The average accuracy results of the commonly used attribute analysis methods and classifiers on the test sets are compared in Tab. 6 and 7 for the two missions, and also illustrated in Fig. 12; the highest test accuracy for each combination of attribute analysis method and classifier is highlighted in gray. From the results, we find that the highest test accuracy in Mission A is 95.49% ± 1.05%, achieved by the AdaBoost classifier working with the FDASE attribute selection method when reducing two dimensions. In Mission B, the highest test accuracy is 91.19% ± 0.00%, achieved by the MLP classifier working with different attribute analysis methods while reducing at most one dimension.
Moreover, Tab. 8 shows the average test accuracy of the proposed method with different transformed dimensions (see Tab. 3) for the two missions. We find that case #6 achieves the highest test accuracy and outperforms the results in Tab. 6 and Tab. 7, while reducing five dimensions. Finally, the prediction results for the uncertain sources and BCUs in the two missions are listed in Tab. 9.

The Proposed Method
The extraction of correlation features among the attributes of Fermi sources is a promising research direction. Commonly used attribute analysis methods, such as the estimation of attribute importance (see Sec. 2.2) or of principal components (see Sec. 2.3), aim to find the significant attributes or components and remove the unimportant ones for dimension reduction. However, these methods also discard correlation features, which results in inferior classification performance (see the results in Tab. 6 and Tab. 7).
The FDASE algorithm gives a global perspective of the whole dataset and estimates the correlation features among all of the attributes based on fractal dimension theory. The ASC is the resulting attribute subset of the FDASE algorithm, and its data contain most of the information of the dataset under a given correlation threshold and scale range. Nevertheless, retaining only the ASC still loses correlation features (see the results of the FDASE attribute selection in Tab. 6 and Tab. 7). Thus, we propose to rearrange the original attributes to highlight correlation features at higher resolutions based on FDASE and IDWT, which is referred to as the FDIDWT method. Additionally, the resulting attribute space is considered to respect some natural order after FDIDWT; we believe that the convolution operation can extract more correlation features from such ordered attributes, while the structure of the CNN offers further potential for better classification performance. Therefore, by combining FDIDWT with the MatchboxConv1D model, we obtain the best test accuracy for both Mission A and Mission B in case #6. Besides, the dimensionality is transformed to 8, which is significant for easing the computing burden in big astronomical data processing. It may also be concluded that most features of Dataset A and Dataset B exist at one higher resolution, since case #6 corresponds to decomposition level 1 (see Tab. 3).
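The core IDWT step can be sketched in a few lines. This is a minimal illustration under simplifying assumptions, not the authors' exact implementation: it uses a level-1 Haar wavelet, treats eight hypothetical FDASE-ordered attribute values as approximation and detail coefficients, and reconstructs an 8-dimensional signal in which those values are mixed at a higher resolution (the actual decomposition and bookkeeping vectors are given in Tab. 3).

```python
import numpy as np

def haar_idwt(cA, cD):
    """Level-1 inverse Haar DWT: reconstruct 2N samples from N
    approximation (cA) and N detail (cD) coefficients."""
    out = np.empty(2 * len(cA))
    out[0::2] = (cA + cD) / np.sqrt(2)  # even samples
    out[1::2] = (cA - cD) / np.sqrt(2)  # odd samples
    return out

# Hypothetical FDASE-ordered attribute values, split into coefficient halves
attrs = np.array([1.0, 2.0, 3.0, 4.0, 0.5, -0.5, 0.2, -0.2])
cA, cD = attrs[:4], attrs[4:]
x = haar_idwt(cA, cD)  # 8-dimensional transformed feature vector
```

Because the Haar transform is orthogonal, a forward level-1 DWT of `x` recovers the original attribute values exactly, so no information is lost by the rearrangement.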
The predictions have been carried out with accuracies of 96.65% ± 1.32% and 92.03% ± 2.2% for Mission A and Mission B, respectively, using the proposed method, and the results are listed in Tab. 9. A comparison between the predicted AGNs of Mission A and the originally confirmed AGNs is shown in Fig. 13, while the corresponding comparisons for the predicted BL Lacs and the predicted FSRQs of Mission B are shown in Fig. 14 and Fig. 15, respectively. From these comparisons, we find that the distributions of the 13 attributes of the predicted sources generally resemble those of the original 4FGL DR3 sources of the corresponding class, indicating that our predicted sources are correctly classified. Moreover, we notice that the histograms of the attribute 'Variability Index' show a longer high-variation-index tail for the original 4FGL DR3 sources than for the predicted sources (predicted AGNs, BL Lacs, and FSRQs in Figs. 13-15, respectively), whereas the predicted sources contribute more to the head of the distribution. This suggests that our method has the advantage of finding less variable sources among the uncertain ones. Similarly, for the multi-band intensity attributes ('Flux1000', 'Flux Band1', 'Flux Band2', 'Flux Band3', 'Flux Band4', 'Flux Band5', 'Flux Band6', 'Flux Band7', and 'Flux Band8'), the original 4FGL DR3 sources contribute more to the histogram tails and the predicted sources more to the histogram heads, suggesting that our method is also good at finding relatively faint γ-ray sources among the uncertain ones. These advantages indicate that our method should be able to identify less variable and fainter sources in the era of survey telescopes, e.g., the Large Synoptic Survey Telescope (LSST, LSST Science Collaboration et al. 2009), the China Space Station Telescope (CSST, Zhan 2011), etc.

The Classification Results of Mission A and Mission B
By employing the proposed method, we classified the 2291 uncertain sources into 1731 AGNs and 560 Non-AGNs, and the 1493 BCUs into 948 BL Lacs and 545 FSRQs. The likelihood probabilities of these predictions are listed in Tab. 9, and their distribution is displayed in Fig. 16.
We set a likelihood probability boundary of 95%, shown as dashed red lines in Fig. 16: a source is claimed as a candidate of the corresponding class only when its probability exceeds this boundary. This further constrains the results to 1354 AGN candidates in Mission A, and 482 BL Lac candidates and 128 FSRQ candidates in Mission B, as shown in the last column of Tab. 9.
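The candidate selection above amounts to a simple threshold on the classifier's likelihood probabilities. A minimal sketch, with hypothetical probability values standing in for the classifier output:

```python
import numpy as np

# Hypothetical likelihood probabilities output by the classifier
probs = np.array([0.99, 0.80, 0.97, 0.60, 0.951])

# The 95% boundary (dashed red lines in Fig. 16): only sources above it
# are claimed as candidates of the corresponding class.
is_candidate = probs > 0.95
n_candidates = int(is_candidate.sum())  # -> 3 for this toy input
```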
Several previous works have classified 4FGL BCUs into FSRQ or BL Lac candidates, the same task as Mission B of the present work. Kang et al. (2019) utilized three supervised ML methods (RF, SVM, and ANN) to classify 1312 4FGL DR1 BCUs, obtaining a combined result of 724 BL Lac candidates and 332 FSRQ candidates. Cross-matching their results with ours, we found 419 overlapping BCUs between the two works. Among these, 324 BCUs are predicted as BL Lac candidates and 90 as FSRQ candidates in both works, which gives our results a consistency of 98.8% with Kang's work.
Similarly, five supervised ML algorithms (RF, LR, XGBoost, CatBoost, and a neural network) were applied to the 4LAC DR3 BCUs by Agarwal (2023), who classified 610 BCUs as BL Lac candidates and 333 as FSRQ candidates. A comparison with our work yields 481 overlapping BCUs, of which 392 are classified as BL Lac candidates and 87 as FSRQ candidates in both works, giving our results a consistency of 99.6% with their work. Besides, Fan et al. (2022) employed three physical parameters and built diagrams among them (photon spectrum index against photon flux; photon spectrum index against variability index; variability index against photon flux) to separate the known BL Lacs from the known FSRQs, and then used the resulting boundaries to divide the BCUs into BL Lacs and FSRQs. In their work, 751 BCUs were classified as BL Lac candidates and 210 as FSRQ candidates. There are 492 overlapping BCUs between the two works, with 409 classified as BL Lac candidates and 83 as FSRQ candidates in both, which gives our results a consistency of 100% with their work.
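The consistency figures quoted above follow directly from the cross-match counts: among the BCUs common to both works, the consistency is the fraction assigned the same class. Reproducing the arithmetic:

```python
def consistency(n_agree_bl_lac, n_agree_fsrq, n_overlap):
    """Fraction of overlapping BCUs given the same class in both works."""
    return (n_agree_bl_lac + n_agree_fsrq) / n_overlap

kang    = consistency(324, 90, 419)  # vs. Kang et al. (2019)  -> 98.8%
agarwal = consistency(392, 87, 481)  # vs. Agarwal (2023)      -> 99.6%
fan     = consistency(409, 83, 492)  # vs. Fan et al. (2022)   -> 100%
```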

CONCLUSION
In this paper, the correlation features of the attribute space of the 4FGL DR3 dataset are highlighted by the proposed FDIDWT method, and the intrinsic features hidden in the data are further extracted by a lightweight MatchboxConv1D model. With the combination of the FDIDWT method and the MatchboxConv1D model, we obtained an accuracy of 96.65% ± 1.32% for Mission A and an accuracy of 92.03% ± 2.2% for Mission B. With a likelihood probability boundary of 95%, we classified 1354 AGN candidates in Mission A, and 482 BL Lac candidates and 128 FSRQ candidates in Mission B. A high consistency of greater than 98% emerges when comparing our predicted candidates with those from previous works. More importantly, our method has the advantage of finding less variable and fainter sources.

Figure 4. The average variance ratio of the training sets in Dataset A (left) and Dataset B (right).

Figure 5. The average ratio of the PID of the ASC to the ID (magenta) and the average number of attributes in the ASC (blue) with different correlation thresholds ξ for the training sets of Dataset A (left) and Dataset B (right).

Figure 6. The IC for each attribute when ξ = 0.5 for the training sets of Dataset A (left) and Dataset B (right).

Figure 7. Wavelet coefficients for a three-level pyramid transform.

Figure 8. The representations of the images Lena, Goldhill, and Peppers in (a-c) the spatial domain and (d-f) the wavelet domain, respectively. For display purposes, the coefficient images have been mapped into pink color. The detail coefficients at a particular level drawn in the figures are the sum of the corresponding horizontal, vertical, and diagonal detail coefficients.

Figure 12. The comparison of average test accuracy results in the two missions. The results of the proposed method in each mission are depicted in every panel.

Figure 13. The comparison of 13 attributes between 4FGL DR3 AGNs and predicted AGNs. The histogram of 4FGL DR3 AGNs is shown in red and that of predicted AGNs in blue; the area under each histogram integrates to one.

Figure 14. The comparison of 13 attributes between 4FGL DR3 BL Lacs and predicted BL Lacs. The histogram of 4FGL DR3 BL Lacs is shown in red and that of predicted BL Lacs in blue; the area under each histogram integrates to one.

Figure 15. The comparison of 13 attributes between 4FGL DR3 FSRQs and predicted FSRQs. The histogram of 4FGL DR3 FSRQs is shown in red and that of predicted FSRQs in blue; the area under each histogram integrates to one.

Figure 16. The likelihood probability distribution of Mission A (upper panel) and Mission B (lower panel). The AGN and Non-AGN candidates are shown in the upper histogram with the solid line and the dashed line, respectively. The BL Lac and FSRQ candidates are shown in the lower histogram with the solid line and the dashed line, respectively. The dashed red lines mark the likelihood probability boundary of 95%.

Table 3. The wavelet decomposition vector and bookkeeping vector for the IDWT in the two missions.

Table 4. The hyper-parameters of the attribute analysis methods and classifiers.

Table 5. The hyper-parameters of the proposed method and model.

Table 6. The average test accuracy results of Mission A with the ML methods.

Table 7. The average test accuracy results of Mission B with the ML methods.

Table 8. The average test accuracy results of the two missions with the FDIDWT method and MatchboxConv1D model.

Table 9. The prediction results of the uncertain sources in Mission A and the BCUs in Mission B.