Finding $r$-II sibling stars in the Milky Way with the Greedy Optimistic Clustering algorithm

$R$-process enhanced stars with [Eu/Fe]$\geq+0.7$ (so-called $r$-II stars) are believed to have formed in an extremely neutron-rich environment in which a rare astrophysical event (e.g., a neutron star merger) occurred. This scenario is supported by the existence of an ultra-faint dwarf galaxy, Reticulum~II, where most of the stars are highly enhanced in $r$-process elements. In this scenario, some small fraction of dwarf galaxies around the Milky Way were $r$ enhanced. When each $r$-enhanced dwarf galaxy accreted to the Milky Way, it deposited many $r$-II stars in the Galactic halo with similar orbital actions. To search for the remnants of the $r$-enhanced systems, we analyzed the distribution of the orbital actions of $N=161$ $r$-II stars in the Solar neighborhood by using the Gaia EDR3 data. Since the observational uncertainty is not negligible, we applied a newly-developed {\it greedy optimistic clustering method} to the orbital actions of our sample stars. We found six clusters of $r$-II stars that have similar orbits and chemistry, one of which is a new discovery. Given the apparent phase-mixed orbits of the member stars, we interpret that these clusters are good candidates for remnants of completely disrupted $r$-enhanced dwarf galaxies that merged with the ancient Milky Way.

In the standard paradigm of galaxy formation, it is believed that the stellar halo of the Milky Way was formed through numerous mergers with smaller stellar systems, such as dwarf galaxies (White & Rees 1978;Blumenthal et al. 1984). Identifying and recovering the past merger events in the Milky Way from the stellar position x, velocity v, chemistry, and age is one of the ultimate goals in Galactic astronomy.
Except for the surviving dwarf galaxies that are currently moving around the Milky Way at large Galactocentric radii, most of the dwarf galaxies that merged with the Milky Way are tidally disrupted. On the one hand, recently accreted dwarf galaxies are being disrupted to form stellar streams. These stellar streams have a spatially coherent structure, and they are extensively searched for from large photometric and astrometric surveys (Belokurov et al. 2006;Malhan et al. 2018;Shipp et al. 2018;Ibata et al. 2021;Martin et al. Email: khattori@ism.ac.jp 2022). To date, nearly a hundred stellar streams have been found (Mateu 2022), and many of these streams are thought to be associated with large merger events (Bonaca et al. 2021;Malhan et al. 2022). On the other hand, dwarf galaxies that merged with the ancient Milky Way have been completely disrupted (Bullock & Johnston 2005;Wu et al. 2022;Brauer et al. 2022). The stars that originated in these disrupted dwarf galaxies show a spatially smooth distribution due to the phase-mixing of the orbits. The lack of spatial coherence makes it hard to identify the remnants of the disrupted dwarf galaxies. To unravel the ancient merger history of the Milky Way, it is crucial to find the remnants of the completely disrupted dwarf galaxies from the 'field' halo stars.
Because the stars stripped from the progenitor dwarf galaxy move on orbits that are similar to the progenitor's orbit, the remnants of the disrupted dwarf galaxies show a clumped distribution in the phase-space spanned by the conserved orbital properties, such as the orbital action J (x, v) = (J r , J φ , J z ), angular momentum L = x × v, or orbital energy, E (Helmi et al. 1999;Gómez et al. 2010). Thus, the most promising way to find these remnants is to find substructure in the dynamical phase-space by using clustering methods (Roederer et al. 2018a;Myeong et al. 2018Myeong et al. , 2019Matsuno et al. 2019;Li et al. 2019;Koppelman et al. 2019;Yuan et al. 2020;Buder et al. 2022;Shank et al. 2022;Sofie Lövdal et al. 2022;Brauer et al. 2022).

r-II stars as the tracers of ancient merger history
Spectroscopic observations of old stars in the Milky Way have revealed that a few percent of nearby field halo stars are highly enhanced in r-process elements, such as europium (Eu, Z = 63) . Currently, more than 160 stars are known to have [Eu/Fe]> +0.7 (and [Eu/Ba]> 0) and are classified as r-II stars (Holmbeck et al. 2020). Their extremely enhanced Eu abundances and their old ages imply that each r-II star was born in a system that was polluted by Eu-rich ejecta from one r-process event, such as a neutron star merger (Hotokezaka et al. 2015).
The birthplace of the field r-II stars has been a mystery for decades. Recently, Ji et al. (2016) and Roederer et al. (2016) discovered that the ultra-faint dwarf galaxy (UFD) named Reticulum II contains seven r-II stars among nine spectroscopically observed stars in this galaxy. The discovery of this r-enhanced UFD favors a scenario wherein a majority of r-II stars in the halo were originally born in UFDs similar to Reticulum II, in which very rare r-process events, such as a neutron-star merger, occurred (Tsujimoto & Shigeyama 2014a,b). In this scenario, r-II stars were later deposited to the Galactic halo when the progenitor system was disrupted (Brauer et al. 2022).
This view is supported by the distribution of the orbital action J of field r-II stars. Roederer et al. (2018a) performed a clustering analysis for N = 35 r-II stars in the Solar neighborhood and found several clusters of stars with similar J . Although metallicity information was not used in the clustering analysis, the discovered clusters turned out to have a tight distribution in [Fe/H]. These clusters of stars have similar orbits and chemistry, and at least some of the clusters may be the remnants of dwarf galaxies that were completely disrupted long ago.

The scope of this paper
One of the limitations in previous studies of the clustering analysis of halo stars is that the clustering of stars in the orbital action J (x, v) is blurred when the observational uncertainty in (x, v) is not negligible. For example, a modest distance uncertainty of ∆d/d = 0.2 of Solar-neighbor halo stars would result in an uncertainty in action of ∆J ∼ 400 kpc km s −1 . This blurring effect is a serious problem when we aim to find dynamically cold substructure of halo stars. Indeed, the remnants of disrupted low-mass dwarf galaxies (with stellar mass M * < 10 6 M ) -including UFDs -are expected to have a small internal dispersion of σ J ∼ 100 kpc km s −1 in galaxy formation simulations (e.g., Bullock & Johnston 2005). Therefore, if we are to use conventional clustering methods to find dynamically cold substructure, we need to use a sample of halo stars with very good distance estimates. In fact, Roederer et al. (2018a) discarded 58% of the stars from their original catalog of r-II stars because their stellar distances were associated with more than 12.5% uncertainty. In general, the requirement of using very accurate data implies that we need to discard a large fraction of data available, which reduces the scientific impact of surveys. Therefore, it is crucial to invent a new clustering algorithm that allows data sets with large uncertainty.
The originality of this paper is to introduce a greedy optimistic clustering method, which is applicable for finding clusters from noisy data sets. This method simultaneously estimates both the centroid of each cluster and the denoised (true) value for each data point. A general mathematical formulation of this method is described in the accompanying paper (Okuno & Hattori 2022). As demonstrated in Okuno & Hattori (2022), this new technique can successfully find the remnants of the completely disrupted dwarf galaxies from the realistic kinematical data of field halo stars with a Gaia EDR3-like astrometric error. In this paper, we apply one flavor of the greedy optimistic clustering method, a greedy optimistic clustering method using Gaussian mixture model (GMM) (or greedy optimistic GMM), to N = 161 r-II halo stars to find the remnants of completely disrupted dwarf galaxies. This sample is an extended version of the r-II star catalog presented in Roederer et al. (2018a), including most of the r-II stars published before the end of 2020. We emphasize that, with our new technique, we do not need to discard the sample stars even if the distance uncertainty is large.
In this paper, we first describe our catalog of r-II stars in Section 2. In Section 3, we explain that some of our sample stars have a large uncertainty in J because of the observational uncertainty. This is why we use the greedy optimistic clustering method instead of the conventional clustering methods. In Section 4, we explain the concept of the uncertainty set and describe how we generate it. This procedure is a preprocessing of the data that is required for our clustering method. The detailed implementation of our method is presented in Section 5. (An intuitive illustration of our method is presented in Appendix A.) Section 6 describes how we perform the clustering analysis and presents the fiducial results. We find that the results in Roederer et al. (2018a) are recovered in our analysis. In Section 7, we discuss the implications from our results. In particular, we validate our clustering results by using the chemistry data in Section 7.1. We find six clusters with a tight distribution in chemical abundances that played no role in the clustering process or the initial selection for inclusion in our sample, which we interpret as revealing the remnants of completely disrupted dwarf galaxies (Section 7.2.1). We further analyze the connection of our results to the Galactic merger history (Sections 7.3 and 7.4). In Section 8, we summarize our paper.
2. OBSERVATIONAL DATA We extend the sample in Roederer et al. (2018a) and construct a catalog of r-II stars for which reliable measurements of chemical abundances and the line-of-sight velocity v los are available from high-resolution spectra. In constructing the catalog, we select N = 161 stars from literature (published before the end of 2020) that satisfy the following criteria: 1 • (i) [Eu/Fe]≥ +0.7 and [Eu/Ba]> 0; • (ii) small v los variability (no hint of binarity); and • (iii) not being a member of known dwarf galaxies or globular clusters.
When observational uncertainties overlap with the boundaries defined by criterion (i), we apply our best judgment to assess inclusion in the catalog, based on spectral quality, other heavy-element abundance ratios, and-when available-confirmation by independent studies.  (Gaia Collaboration et al. 2021) to construct the catalog to be analyzed in this paper. For those stars that are reported in Roederer et al. (2018a), we adopt the uncertainty σ vlos in the line-ofsight velocity as in Roederer et al. (2018a). For the rest of the stars, we use either the literature value or assign some reasonable value. In any case, the value of σ vlos is small for all of our sample stars and thus the detailed value of σ vlos does not affect our results.
For all of these stars, Gaia EDR3 provides reliable measurements of the five-dimensional astrometric quantities, (α, δ, , µ α * , µ δ ), as well as their associated uncertainties. In this paper, X and σ X denote the pointestimate and the one-dimensional uncertainty for the quantity X, respectively. We assume that the uncertainties associated with the right ascension and declination (α, δ) are negligible; and thus we assume (α, δ) = ( α, δ). For parallax, we have the (zero-point corrected 2 ) point estimate and its uncertainty σ . For proper motion, we have the point-estimate ( µ α * , µ δ ), their onedimensional uncertainties (σ µα * , σ µδ ), and the Pearson's correlation coefficient in these uncertainties ρ corr .
The name of r-II stars and their chemical abundances are listed in Appendix D. We note that our catalog based on Gaia EDR3 supersedes the catalog in Roederer et al.  Observed uncertainty in parallax measured by /σ and the resultant uncertainty in action ∆J for our sample stars. We can see that stars with /σ 10 result in ∆J > √ 3 σJ (horizontal dashed line), where we set σJ = 100 kpc km s −1 . Those stars with ∆J > √ 3 σJ cannot be adequately handled with a conventional Gaussian Mixture Model comprising of K Gaussian distributions with internal dispersion σJ . To handle this noisy data set, we invent the greedy optimistic clustering method.

WHY DO WE NEED THE GREEDY
OPTIMISTIC CLUSTERING?
Before we perform a clustering analysis, we explain the difficulty of using the conventional clustering methods for our data set. We note that this Section is independent from the main analysis of this paper.
Let us compute the uncertainty in the orbital action given the error distribution of the observables in Section 2. For each star in our r-II star catalog, we randomly draw (α, δ, , µ α * , µ δ , v los ) from the associ-ated error distribution. After drawing a large enough number of instances (realizations), we discard instances with negative parallax and select 200 instances with > 0. By converting these observed quantities, 3 we obtain 200 instances of the orbital action, J = (J r , J z , J φ ). For each star, we denote the standard deviation in each dimension as (∆J r , ∆J z , ∆J φ ). The uncertainty in the orbital action for each star is given by ∆J = The uncertainty ∆J is dominated by the uncertainty in parallax (or equivalently, distance). This is because the parallax uncertainty propagates to both the threedimensional position and the two-dimensional tangential velocity. Fig. 2 shows ∆J as a function of the signalto-noise ratio of the parallax, /σ . On average, ∆J increases if the parallax measurement becomes poorer. This trend partially justifies previous studies in which sample stars are selected based on a threshold on /σ (Roederer et al. 2018a). However, using such a threshold means that we end up discarding some fraction of the original data, which is not desired.
As we will describe in Section 5, in this paper, we want to find clusters that typically have an intrinsic dispersion of σ J ∼ 100 kpc km s −1 in each dimension in the Jspace. Thus, if a star's orbital action is associated with an uncertainty ∆J √ 3×100 kpc km s −1 , conventional clustering methods may fail to assign the star to the correct cluster. In this regard, about a quarter of our sample have ∆J √ 3 × 100 kpc km s −1 (see Fig. 2) and therefore they are not suited for conventional clustering analyses.

PREPROCESSING OF THE DATA: GENERATION OF THE UNCERTAINTY SET
An important difference between this paper and Roederer et al. (2018a) is that we use the greedy optimistic clustering method, which allows a clustering analysis of noisy data (Okuno & Hattori 2022). This method requires a special preprocessing of the data. Namely, for each star, we generate M synthetic data points that represent the uncertainty in the data. We call these synthetic data points as an uncertainty set. 4 5 In this Section, we describe how we generate the uncertainty set. In Section 4.1, we describe the uncertainty set of the observed quantities, D obs i , for ith star in the catalog. In Section 4.2, we convert D obs i to obtain the 3 The detailed procedure to compute the orbital action is the same as in Section 4. 4 In Okuno & Hattori (2022) we used a term empirical uncertainty set, but we omit empirical for brevity in this paper. 5 The uncertainty set is not restricted to be a Monte-Carlo realization of the error distribution. Uncertainty set typically aims at approximating a confidence region, but not approximating the distribution itself. For example, in the main analysis of this paper, the parallax in the uncertainty set is drawn uniformly from a userspecified region. uncertainty set of the orbital action of ith star, D action i . We note that D action i will be used in our analysis.

Uncertain set of the observed quantities
As mentioned in Section 2, the six-dimensional observables for ith star (α, δ, , µ α * , µ δ , v los ) i are associated with observational uncertainties. To represent these uncertainties, we generate M = 101 six-dimensional observable vectors, which we call the uncertainty set. To be specific, the uncertainty set of the observed quantities of ith star is expressed as In the following, we describe the procedure to generate jth instance of the uncertainty set of ith star, (α, δ, , µ α * , µ δ , v los ) i,j .
First, we set j = 0th instance of the uncertainty set to be the same as the point-estimate of the observables: Second, for j = 0, we neglect the tiny observational errors in the sky position (α, δ), and we set Third, for j = 0, we draw the line-of-sight velocity and proper motion from the corresponding error distribution: v los,i,j ∼ N ( v los,i ; σ 2 vlos,i ), Here, A ∼ B indicates that A is drawn from a distribution B; and N (m; S) corresponds to a Gaussian distribution with mean m and covariance matrix S. Lastly, for j = 0, we draw the parallax from a uniform distribution within two standard deviation range, With this setting, the parallaxes of j = −50th, 0th, and 50th instances of D obs i is i − 2σ ,i , i , and i + 2σ ,i , respectively. 6 7 6 In generating the instances of the parallax, we do not use a Gaussian distribution. We note that using a Gaussian distribution (truncated within two standard deviation range) would not change the result significantly, as long as the instances of the parallax are densely distributed (e.g., M 100). 7 We neglect the covariance between the proper motion and parallax. We have confirmed that the inclusion of the covariance hardly affects our results.
Up to this step, we have generated the uncertainty set D obs i without taking care of their physical meanings. For example, if the parallax uncertainty σ ,i is large for star i, the uncertainty set for this star may include an instance with negative parallax. To properly handle negative parallax (which is unphysical) and to compensate for the effect of observational uncertainty, we introduce a penalty function 8 9 The hyper parameter λ ≥ 0 is designed to tune the strength of the penalty (see also equation (14)). In the fiducial analysis of this paper, we set λ = 0, and therefore the detailed implementation of the penalty function is not important in our fiducial analysis.
We repeat the same procedure for all j and generate the uncertainty set D obs i consisting of M = 101 instances of observable vectors. Then we repeat the same procedure for all the stars, i = 1, · · · , N .

Uncertainty set of the orbital action
To perform a clustering analysis in the orbital action space, we need the uncertainty set of the orbital action.
Here we describe the procedure to map D obs i (equation (1)) to the uncertainty set of the orbital action of ith star For each (i, j), we first convert the observables (α, δ, , µ α * , µ δ , v los ) i,j into the position and velocity (x, v) i,j in the Galactocentric Cartesian coordinate. In this step, we assume the position and velocity of the Sun as described in Hattori et al. (2021). Then we map (x, v) i,j to the orbital action In computing the orbital action, we assume the gravitational potential of the Milky Way in McMillan (2017) and we use the AGAMA package (Vasiliev 2019).
In the top row in Fig. 3, the gray dots show the distribution of D action i for our entire sample. We see that the uncertainty set for each star typically shows a bananalike shape, reflecting the fact that the parallax uncertainty dominates the uncertainty in J .

GREEDY OPTIMISTIC GMM
8 Suppose that a star has i,j > 0 for all j. Then, the penalty is 0 if j = 0 (if i,j = i ). Also, the penalty is 2λ if j = ±50 ( i,j = i ± 2σ ,i ). 9 Among N = 161 stars in our sample, there is no star whose observed parallax is negative; and there are only seven stars with parallax over error < 2. This means that only seven stars satisfy ( − 2σ < 0). Thus the treatment of rejecting negative parallax is relevant for these seven stars only.

Gaussian Mixture Model in action space
In this paper, we assume the Gaussian Mixture Model (GMM) to describe the intrinsic distribution of J , which consists of K isotropic Gaussian distributions with identical one-dimensional dispersion σ 2 J . Under this assumption, 10 the probability distribution of a randomly chosen star is expressed as Here, I denotes identity matrix, k = 1, · · · , K is the index for the normal distributions representing clusters, with the model parameters θ = {(π k , J k ) | k = 1, · · · , K; π k ≥ 0; K k=1 π k = 1} to be estimated. In GMM clustering shown below, J k ≡ ( J r k , J φ k , J z k ) corresponds to the centroid of kth cluster.

Conventional GMM
Let us first consider a case where we use the pointestimate of action J i for each star i to perform a clustering analysis. This case corresponds to an ideal case where we have no observational uncertainties.
In this case, the logarithmic likelihood of the data D = { J i | i = 1, · · · , N } given the model parameters θ = {(π k , J k ) | k = 1, · · · , K} is given by The expectation-maximization (EM) algorithm tries to find the best parameters by iteratively improving them with the following steps (Dempster et al. 1977;Neal & Hinton 1998;Bishop 2006).
We note that γ ik is called responsibility, which quantifies the contribution of ith star to kth cluster. The quantity N k represents the effective size of kth cluster. The EM algorithm iteratively updates γ ik and the parameters such that the likelihood function is monotonically improved until convergence.

Greedy optimistic GMM
The above-mentioned conventional GMM does not take into account the uncertainty in the data. Therefore, the conventional GMM does not work when J is associated with large uncertainties. This is why we introduce a new flavor of clustering method, the greedy optimistic clustering method (Okuno & Hattori 2022). An intuitive illustration of this method is given in Appendix A.
The key ideas of our method are as follows: 1. We make an optimistic assumption that the true orbital actions of N stars {J true i | i = 1, · · · , N } are highly clustered in the J -space.
2. Given the uncertainty set D action i consisting of M instances of the orbital action of ith star {J i,j }, we assume that one of the instances J i,βi is very close to the true orbital action J true i . Here, β i is the index of the 'best' instance (among M instances) in the uncertainty set D action i . 3. Given the two assumptions above, we greedily find the best configuration {J i,βi | i = 1, · · · , N } of N stars such that their distribution is most highly clustered in the J -space. We call the best instance J i,βi the greedy optimistic estimate of the orbital action for star i.
In the greedy optimistic clustering algorithm, we simultaneously estimate the cluster parameters {(π k , J k ) | k = 1, · · · , K} as well as the best indices {β i | i = 1, · · · , N } such that the clustering of N points {J i,βi | i = 1, · · · , N } is most enhanced. For this purpose, we define an object function to be maximized Here, λ ≥ 0 is a hyper parameter that controls the strength of the penalty term. Also, is the uncertainty set of the entire sample, and is the model parameters. We note that the model parameters now include {β i }.
To find the solution, we use an iteration algorithm, which we refer to as the GOEM algorithm. In the GOEM algorithm, we introduce a 'GO step' (greedy optimistic step) in addition to the E and M steps: • (GO step) For a given set of {(π k , J k ) | k = 1, · · · , K}, update {β i | i = 1, · · · , N } so that the object function f (D action |θ) is maximized.
• (E step) For a given set of {(π k , J k ) | k = 1, · · · , K} and {β i | i = 1, · · · , N }, compute γ ik and N k given by • (M step) Update {(π k , J k ) | k = 1, · · · , K} with the following expressions: We note that, in the GO step, we can optimize β i separately for each i. The GO, E, and M steps always improve the object function f (D action |θ). Just like the conventional EM algorithm, the GOEM algorithm would converge to a local maximum, and there is no guarantee that the derived solution is the global maximum. To find a better solution, we try many initial conditions of the parameters and use the split-and-merge algorithm (Ueda et al. 1998).

Difference between the conventional GMM and the greedy optimistic GMM
In the greedy optimistic GMM, for each star i, we have a freedom to choose an instance from the uncertainty set D action i . In a sense, we can move the locations of the data points according to their uncertainties and can search for the best configuration such that the N data points are most highly clustered. This characteristics is in contrast to the conventional GMM, where the data point is stuck to its point estimate. Note that the greedy optimistic GMM is distinct from extreme deconvolution (Bovy et al. 2011), that further assumes that the centroid J k of kth cluster also follows a distribution, i.e., cluster center has uncertainty.
In the greedy optimistic GMM method, we introduce a penalty term in the object function (second term in equation (14)). This penalty term is designed to favor the instances whose parallax is closer to the pointestimate of the parallax. The strength of the penalty term is governed by the hyper parameter λ. On the one hand, in the limiting case of λ → ∞, the optimal solution satisfies β i = argmin β {penalty(i, β, λ)}. In this limit, the greedy optimistic GMM reduces to the conventional GMM, because we no longer have the freedom to choose among M instances. Thus, the greedy optimistic GMM is a generalization of the conventional GMM. On the other hand, in the limiting case of λ = 0, greedy optimistic GMM treats all the instances equally. Interestingly, the mock data analysis in Okuno & Hattori (2022) suggests that the performance of the clustering is reasonably good when we set λ = 0. Indeed, we adopt λ = 0 in the fiducial analysis of this paper.

ANALYSIS AND RESULTS
After preparing for the uncertainty set {D action i }, we perform a greedy optimistic clustering analysis for our sample stars.

List of hyper parameters
In the greedy optimistic GMM, we need to fix three hyper parameters (σ J , λ, K), which will be described in the following.

Internal dispersion of the cluster: σJ
The hyper parameter σ J determines the internal dispersion of a cluster in J -space. In realistic galaxy formation simulations, completely disrupted low-mass stellar systems typically have a dispersion of ∼ (100-200) kpc km s −1 in J -space (Bullock & Johnston 2005). Motivated by this result, we fix σ J = 100 kpc km s −1 .
This value of σ J is also supported by the observed properties of UFDs hosting r-II stars, such as Reticulum II, which has an internal velocity dispersion of σ v ∼ 3 km s −1 (Koposov et al. 2015;Walker et al. 2015;Minor et al. 2019). If a stellar system with a similar velocity dispersion is disrupted in the inner part of the Milky Way (r disruption ∼ 10−30 kpc), we expect that the stars stripped from the system typically have a spread in action of σ v × r disruption ∼ (30 − 90) kpc km s −1 .
In principle, we can make σ J as a free parameter to be fitted in the analysis. However, we fix σ J to stabilize our analysis, following the mock data analysis in Okuno & Hattori (2022).

Number of clusters: K
The number of clusters K is hard to choose, because fixing K is equivalent to fixing the typical size of a cluster, N/K, which is unknown. After some experiments, we adopt six values, K = 10, 20, 30, 40, 50, and 60. 6.1.3. Strength of the penalty term: λ The hyper parameter λ governs the penalty term in our greedy optimistic GMM method. To see how our result is affected by λ, we adopt six values, λ = 0, 10 −4 , 10 −3 , 10 −2 , 1, 10 2 . On the one hand, λ = 0 corresponds to the most optimistic case where we pay the least respect to the observed parallax. On the other hand, the case with λ = 10 2 is almost equivalent to the conventional GMM.
After finding the best solution, we assign each star to the nearest centroid. Namely, ith star is assigned to c i th cluster such that We count the number of member stars N member,k for kth cluster. 11 We note that this hard-assigment sometimes results in clusters with no members assigned, especially when λ ≤ 10 −2 and K ≥ 50. This is one of the reasons why we do not explore K > 60 in our analysis. If kth cluster has N member,k ≥ 2 members, we compute the internal dispersion in (J r , J z , J φ ): and the total internal dispersion: Here, For each set of hyper parameters (K, λ), we compute the median of {σ k } for clusters with N member,k ≥ 2. 11 We note that N member,k N k (= π k N ). N member,k is an integer resulting from the hard assignment, while N k is a real number resulting from a soft assignment. When λ is fixed, the median value of {σ k } is a decreasing function of K, so we choose the optimal K such that this median is closest to σ J (= 100 kpc km s −1 ), which is the fixed hyper parameter. As a result, we found that K = 30 is optimal for λ ≤ 1, while K = 50 is optimal for λ = 10 2 . This result indicates that we need K = 30 clusters to represent the action distribution for our N = 161 stars if we adopt the greedy optimistic clustering, almost independent of the strength of the penalty term (unless the penalty term is too strong). Thus, we choose K = 30 as the fiducial value. With K = 30, we have a freedom to choose 0 ≤ λ ≤ 1. For simplicity, we choose λ = 0 as the fiducial value, motivated by the mock analysis in Okuno & Hattori (2022).

Fiducial result
Here, we focus on the result obtained for the fiducial set of hyper parameters, (K, λ, σ J ) = (30, 0, 100 kpc km s −1 ). We call the clusters of r-II stars in our fiducial result simply as r-II clusters. Also, following recent convention (Yuan et al. 2020;Gudin et al. 2021), we label the kth r-II cluster as H22:DTC-k, where DTC stands for dynamically tagged cluster. 12 Since we are most interested in the orbital properties of the r-II stars and r-II clusters, we visualize their distribution in the J -space. The top row in Fig. 3 shows the distribution of the greedy optimistic estimate J i,βi (black cross) for N = 161 r-II stars, along with their uncertainty set D action i (gray dots). 13 The second row in Fig. 3 shows the distribution of centroids of the orbital action, J k , for K = 30 clusters. Table 1 summarizes the properties of r-II clusters, including the cluster size N member,k , orbital action of the centroid J k , the total internal dispersion of the orbital action σ k , and the chemical properties. As seen in Table 1, σ k is approximately 100 kpc km s −1 , which is a natural consequence of the way we choose our hyper parameter K. We note that the chemical information is not used in our clustering analysis, but will be used to validate our clustering results (see Section 7.1).
Tables 2 and 3 present the detailed chemical and dynamical information of individual r-II stars analyzed in this paper. Due to their length, we put them in Appendix D. Table 2 lists the chemical and kinematical properties of the member stars for each cluster. For an easy comparison of the results in Roederer et al. (2018a) with ours, we add a column R18 in Table 2, which indicates those stars that are included in groups A-H in Roederer et al. (2018a). We will comment on this comparison in Section 6.4. Table 2 also lists the reference for the chemical data. The orbital action listed in Table 2 is the greedy optimistic estimate J i,βi (shown on the top row of Fig. 3). For a reference, we also list the orbital energy E, which is computed as E = v 2 i,βi /2 + Φ(x i,βi ), where Φ is the Milky Way's gravitational potential (McMillan 2017). Unlike Roederer et al. (2018a), we do not use E in our clustering, because E is a function of the orbital action and therefore it provides duplicated information. The last column in Table 2 is the quantity parallax over error (= /σ ) taken from Gaia EDR3 (and not corrected for the zero-point offset of the parallax). We list this information here to make it easier to understand which member stars benefit from the greedy optimistic clustering. For example, a star with poor parallax measurements ( /σ 10) may not be assigned to the right cluster with conventional clustering methods; while a star with good parallax measurements ( /σ 10) is likely assigned to the right cluster independent of the clustering methods. (See Appendix F where we analyze N = 119 r-II stars with /σ > 10 by using standard GMM.) Table 3 lists additional information of the member stars in each cluster. All quantities in this table are not used in the clustering analysis, but they are presented to offer an intuitive understanding of the dynamical properties of stars. The Galactocentric position and velocity listed in this table correspond to (x i,βi , v i,βi ), respectively. By using the position-velocity pair, we integrate the orbit for 100 Gyr to obtain the pericentric radius (r peri ), apocentric radius (r apo ), maximum distance from the Galactic disk plane (z max ), and orbital eccentricity ecc = (r apo − r peri )/(r apo + r peri ). Because the member stars of a given cluster have similar orbital actions, they also have similar values of (r peri , r apo , z max , ecc). In contrast, the member stars of a given cluster have very different (x i,βi , v i,βi ). This result is consistent with a scenario wherein the progenitor system of a cluster accreted to the Milky Way long ago, and the stars stripped from the progenitor system are completely phase-mixed today.

Consistency with Roederer et al. (2018a)
Because we extend the analysis in Roederer et al. (2018a), we briefly check the consistency between their results and ours. As summarized in Table 1, all the stars in groups A, D, E, G, and H in Roederer et al. (2018a) are found in clusters H22:DTC-3, 2, 15, 5, and 15, respectively. Also, the majority of the members in groups B (3 stars out of 4 stars), C (3 out of 4), and F (2 out of 3) are found in H22:DTC-9, 4, and 3, respectively. These results are reassuring in that Roederer et al. (2018a) and this work apply different clustering methods to data sets with different quality and still mostly agree with each other.
Intriguingly, groups A and F are included in a single cluster H22:DTC-3 in our analysis; and groups E and H (and another star from group F) are included in a single cluster H22:DTC-15. In our analysis, the clusters H22:DTC-3 and 15 are the biggest clusters, containing N member,k = 18 stars each. Our result may reflect either of the two possibilities. The first possibility is that our new clustering method is superior to conventional methods. (Namely, our method can find a cluster that is seen as multiple clusters with conventional methods due to the observational uncertainty.) The second possibility is that our method is too optimistic. (Namely, multiple clusters with a slightly different orbital properties are mistakenly regarded as a single one due to the optimistic nature of our method.) Although we do not have a decisive conclusion on which of these possibilities is correct, it is worth mentioning that the cluster H22:DTC-3 is one of the six clusters in our analysis that we have the highest confidence based on the tight distribution in [Fe/H] and [Eu/H] (see Section 7.2.1). According to our additional analysis using high-quality data alone (see Appendix F), 10 stars among 18 stars in H22:DTC-3 are more likely to be associated with each other than the remaining 8 stars. Also, the cluster H22:DTC-15 has a tight distribution in [Fe/H] and [Eu/H] if we remove one metal-rich outlier star (see Appendix C.3).
7. DISCUSSION 7.1. Chemical homogeneity of the clusters Up to this point, we only use the orbital action of the r-II stars, and we do not use the chemical properties of these stars. (The chemical information is used to construct the sample, but the chemical information is not used for the clustering analysis.) Because our clustering analysis is based on a few assumptions, we intentionally reserve the chemical information so that we can check the validity of our clustering result.
The sibling stars born in the same dwarf galaxy are expected to have similar chemical abundances, such as [Fe/H]  15 Simulated Eu-rich dwarf galaxies also have similar chemical properties. According to Fig. 15(d) of Hirai et al. (2022), lowmass dwarf galaxies named 12,13,14, and 16 in their simulation have two or more r-II stars. These systems respectively ing metallicity dispersion equivalent to or smaller than that of Reticulum II. This result indicates that many of the field r-II stars may have originated from disrupted dwarf galaxies.
Following the procedure in Roederer et al. (2018a), we evaluated the statistical significance of the chemical homogeneity of each cluster as follows. First, for each cluster H22:DTC-k with N member,k ≥ 2, we compute the dispersions in [ To investigate the significance of the tight distribution in [Fe/H] for our r-II clusters, we did an additional analysis by using carefully constructed sample of N = 161 non-r-II stars in the literature. We find that our r-II clusters have tighter distribution in [Fe/H] than the clusters of non-r-II stars found in this additional analysis (see Appendix H for detail).
Our result indicates that, in agreement with Roederer et al. (2018a) and Gudin et al. (2021), r-enhanced stars with similar orbits tend to have similar chemistry. This result is consistent with a scenario wherein r-II stars were born in r-enhanced dwarf galaxies, similar to the UFD Reticulum II, that were later tidally disrupted when they merged with the Milky Way.

Individual clusters
Among 30 clusters, 26 clusters have N member,k ≥ 2 and four clusters have N member,k = 1. Based on the distribution of the chemical abundances in each cluster, we categorize 18 clusters that have N member,k ≥ 2 and show a tight chemical abundance distribution into Tiers-1, 2, 3, and 4. This classification reflects our confidence, such that we have the highest confidence on Tier-1 clusters.  Note-(a) R18 denotes Roederer et al. (2018a) and Y20 denotes Yuan et al. (2020). The group C in R18 contains 4 member stars, among which 3 stars are included in our cluster H22:DTC-4. This is why we put a comment 'C 3/4 (R18)' for our cluster H22:DTC-4.
In addition, there are four clusters with one member star (N member,k = 1 To obtain additional insights into the progenitor systems of Tier-1 clusters, Fig. 5 (Mg, Ca) abundances is an supporting evidence that these clusters are remnants of dwarf galaxies that merged with the Milky Way. Furthermore, we see a hint of decreasing trend of [α/Fe] as a function of [Fe/H] for clusters H22:DTC-1 and 4, which is reminiscent of the trends seen in classical dwarf galaxies, such as Sculptor and Ursa Minor (e.g., Tolstoy et al. 2009;Kirby et al. 2011), and UFDs, such as Hercules and Reticulum II (Vargas et al. 2013;Ji et al. 2022). If this trend is real, it may provide a clue that the progenitor systems of these clusters experienced a quiescent star formation activity.
The distribution of position x and velocity v of the member stars provides additional insights into the progenitor systems of Tier-1 clusters. As seen in the top row in Fig. 6, the member stars in each Tier-1 cluster show a wide spatial distribution. This spatial distribution suggests that the member stars have different orbital phases, although they have similar orbits. The bottom row in Fig. 6 further supports this view. We see that, for clusters H22:DTC-1, 2, 3, and 4, there are almost equal numbers of stars with v R > 0 (approaching apocenter), v R < 0 (approaching pericenter), v z > 0 (moving upward), and v z < 0 (moving downward). For clusters H22:DTC-5 and 9, there are almost equal numbers of stars with v R > 0 and v R < 0, while stars with v z < 0 dominate. (The member stars in each cluster have similar values of v φ because they have similar values of J φ = Rv φ .) This result means that most Tier-1 clusters are apparently completely phase-mixed, suggesting that the disruption of the progenitor systems happened long ago. Therefore, Tier-1 clusters are likely the remnants of completely disrupted dwarf galaxies that merged with the ancient Milky Way. Of course, it is premature to conclude decisively that they really are disrupted dwarf galaxies, and we discuss the prospects to confirm these r-II clusters in Section 7.5.
In the following, we summarize other basic properties of Tier-1 clusters: Tier-1 cluster H22:DTC-1. This cluster is a newly discovered cluster characterized by retrograde, round orbits with J φ = 1209 kpc km s −1 .
This cluster is rather large (N member,k = 9), but its metallicity dispersion is only σ [Fe/H] = 0.22. In terms of q [Fe/H] , this cluster has the tightest distribution of [Fe/H] among 30 clusters. The clusters H22:DTC-1 and 9 have a similar orbit and chemistry, which will be discussed in Section 7.4. Intriguingly, two member stars of the cluster H22:DTC-1 (2MASS J09544277+5246414 and BPS CS 22896-154) were also analyzed by Roederer et al. (2018a); however they are regarded as r-II stars that are not associated to any kinematic groups in Roederer et al. (2018a).
Tier-1 cluster H22:DTC-2. This cluster is characterized by highly eccentric orbits with r peri < 1 kpc and r apo > 10 kpc. This cluster is a metal-rich cluster, with [Fe/H] = −1.65. In terms of q [Eu/H] , this cluster has the tightest distribution in Eu abundance. This cluster corresponds to the group D in Roederer et al. (2018a), containing all three stars in the group D. One of the member stars, HD 222925, is the most well-studied r-II star in terms of its chemistry .
Tier-1 cluster H22:DTC-3. This cluster is characterized by prograde, eccentric orbits with J φ = −711 kpc km s −1 . This cluster is one of the largest clusters in our analysis, with N member,k = 18. The face value of the dispersion in [Fe/H] is relatively large, σ [Fe/H] = 0.37, but its low percentile q [Fe/H] = 1.86% suggests that this cluster has a tight distribution in [Fe/H] (given its large N member,k ). This cluster includes 4 out of 4 stars from the group A and 2 out of 3 stars from the group F in Roederer et al. (2018a). This cluster also includes a r-II star (2MASS J22562536−0719562) that is associated with the DTG38 in Yuan et al. (2020). 17 Tier-1 cluster H22:DTC-4. This cluster is a dynamically cold cluster with its internal action dispersion of σ k = 63 kpc km s −1 . The member stars show prograde, mildly eccentric orbits with J φ = −889 kpc km s −1 , which corresponds to a guiding center radius of R ∼ 4 kpc. The very metal-poor nature of this group, [Fe/H] = −2.42, is in contrast to the majority of the disk stars at R ∼ 4 kpc. This cluster corresponds to the group C in Roederer et al. (2018a), containing 3 out of 4 stars in the group C. Our method finds 9 additional members of this group.
Tier-1 cluster H22:DTC-5. This cluster is characterized by retrograde, eccentric orbits with J φ = 773 kpc km s −1 . This cluster has the smallest σ [Fe/H] and σ [Eu/H] among Tier-1 clusters. This cluster corresponds to the group G in Roederer et al. (2018a), containing 2 out of 2 stars in the group G. Our method finds 3 additional member stars of this group. Two of the new members (SMSS J183647.89−274333.1 and HE 0300−0751) have poor measurement of the parallax ( /σ < 4; see Table 2), which highlights the advantage of using our clustering method.
Tier-1 cluster H22:DTC-9. Although our clustering analysis assign these extremely metal-poor r-II stars to separate single-member clusters, they have similar orbital properties. For ex-J00405260−5122491, which is included in the cluster H22:DTC-15. Although Yuan et al. (2020) claims these two r-II stars are associated with a single group (DTG38), our analysis associate them to two clusters (clusters H22:DTC-3 and 15) with similar orbital properties. 18 The DTG10  ample, as seen in Table 3, both stars have prograde orbits with high eccentricity (ecc 0.88) and have similar pericentric and apocentric radii (r peri 8 kpc and r apo > 90 kpc). These similarities indicate that their origins might be related to each other, but we do not further discuss their connection in this paper.
In our analysis, the extremely metal-poor stars SMSS J024858.41−684306.4 and SMSS J063447.15−622355.0 (which are the only member of clusters H22:DTC-29 and 30, respectively) do not have sibling stars in our catalog. This result can be understood from chemical and dynamical points of view. If each of these stars was formed in a dwarf galaxy, the total stellar mass of its progenitor dwarf galaxy may have been small, according to the mass-metallicity relationship of the dwarf galaxies Naidu et al. 2022). Therefore, it is reasonable that these extremely metal-poor stars have a rather small number of sibling stars ( 10 3 stars) in the Galactic halo. Also, the fact that these two stars have highly eccentric orbit (with r apo > 90 kpc) means that most of their sibling stars are in the outer part of the Galactic halo. Because our catalog is restricted to bright r-II stars, and most of the r-II stars in our catalog are within 10 kpc from the Sun, the apparent lack of sibling stars may be understood as the spatial selection bias of the sample stars (cf. Hattori & Yoshii 2011). Malhan et al. (2022) analyzed the kinematics of the dynamical tracers in the Milky Way (i.e., stellar streams, globular clusters, and dwarf galaxies) and found that these tracers show a clumpy distribution in the orbital action and energy space (see also similar analysis by Bonaca et al. 2021). They identified six groups of dynamical tracers, namely, (i) the Gaia-Sausage/Enceladus merger group; (ii) the Arjuna/Sequoia/I'itoi merger group; (iii) the LMS-1/Wukong merger group; (iv) the Pontus merger group;

Connection to the past merger events
(v) the Cetus merger group; and (vi) the Sagittarius merger group. They claimed that these groups correspond to the past merger events in the Milky Way.
To investigate if any of our clusters are associated with these merger groups, we check the action distributions of the clusters and the merger groups in Malhan et al. (2022). First, we adopt the literature values of the pointestimate of the orbital actions for the stellar streams from Malhan et al. (2022) and for the globular clusters from Vasiliev & Baumgardt (2021) (see bottom row in Fig. 3). We note that they computed the orbital actions with a slightly different assumptions on the position and velocity of the Sun, but this difference has a minor effect on the values of actions (with a typical difference of ∼ 100 kpc km s −1 ). We neglect this minor effect, because our conclusion is not affected. Secondly, we compute the Euclidian distance from our r-II clusters to the stellar streams and globular clusters in the action space. Thirdly, for each r-II cluster, we find the nearest neigh-   bor stellar stream and globular cluster. If the distance to the nearest neighbor object (stream or globular cluster) satisfies ||J rIIcluster − J nearest || < 600 kpc km s −1 , and if the nearest neighbor object is a member of a certain merger group (among the above-mentioned six merger groups (i)-(vi)), we interpret that the r-II cluster is also associated with the same merger group. As a result, we find eight r-II clusters that are associated with four merger groups (as summarized below); and there are no r-II clusters that are associated with the Cetus and Sagittarius merger groups.
(i) Gaia-Sausage/Enceladus merger group Two r-II clusters, H22:DTC-3 and 5, are dynamically associated with the Gaia-Sausage/Enceladus merger group (see top row in Fig. 7). The mean metallicity of these r-II clusters are [Fe/H] = −2.37 and −2.62, respectively. These r-II clusters are more metal-poor than the majority of the stellar streams and globular clusters associated with the Gaia-Sausage (ii) Arjuna/Sequoia/I'itoi merger group One r-II cluster, H22:DTC-11, is dynamically associated with the Arjuna/Sequoia/I'itoi merger group (see second row in Fig. 7). The mean metallicity of this r-II cluster is [Fe/H] = −1.39. This metallicity is slightly higher than the metallicities of the stellar streams and globular clusters associated with the   Malhan et al. (2022). In all panels, the filled symbols represent the r-II clusters in our analysis, and the open symbols represent the stellar streams and globular clusters in the merger groups. In each row, we highlight one of the merger groups with blue symbols. The rest of the merger groups are shown by gray symbols. The clusters associated with the highlighted merger group are shown by blue filled circles. (The number corresponds to k of the r-II cluster H22:DTC-k.) The rest of the clusters are shown by light-red filled circles. We do not find clusters associated with the merger groups Sagittarius and Cetus.
• The nearest neighbor to the r-II cluster H22:DTC-11 is the Phlegethon stream ([Fe/H]= −1.96±0.05; Martin et al. 2022), which is a member of this merger group (Malhan et al. 2022). Given that this r-II cluster has a mean metallicity [Fe/H] = −1.39, it may be the most metal-rich member of this merger group.
• The r-II cluster H22:DTC-17 is also associated with this merger group; but our reasoning mainly comes from external knowledge that this cluster includes an r-II star ) associated with the Indus stream ([Fe/H]= −1.96±0.33; Li et al. 2022b) and that the Indus stream is a member of the LMS-1/Wukong group.
• For the r-II cluster H22:DTC-28 which is a onemember cluster, the nearest neighbor objects are Pal 5 and NGC 5272 (M3; [Fe/H]= −1.45 ± 0.03; Sneden et al. 2004), both of which are members of this merger group.
(iv) Pontus merger group Three r-II clusters, H22:DTC-2, 13, and 15, are dynamically associated with the Pontus merger group (see bottom row in Fig. 7). The mean metallicities of these r-II clusters are [Fe/H] = −1.65, −2.51, and −2.43, respectively. It is intriguing that two of these r-II clusters are more metal-poor than the stellar streams and globular clusters associated with the Pontus merger group (−2.3 ≤[Fe/H]≤ −1.3) (Malhan et al. 2022).
• For the r-II

A very-metal-poor merger group candidate
Two Tier-1 clusters, H22:DTC-1 and 9, are very metal-poor with [Fe/H] −2.8. As seen in Section 7.2.1, these clusters have tight distributions in chemical abundances, making them reliable candidates for disrupted dwarf galaxies. Interestingly, these two clusters have very similar dynamical and chemical properties.
The distance between these r-II clusters in the Jspace is only 717 kpc km s −1 , which is much smaller than the typical distance between two points randomly drawn in the action space. For example, if we randomly draw two r-II clusters from the 30 clusters in Table 1, we have only 4.4% chance of getting a distance smaller than 717 kpc km s −1 . Therefore, it is likely that these clusters are dynamically associated with each other. Intriguingly, as seen in the third row in The similarity of the chemical properties indicates that these clusters may have experienced a similar star formation history. Based on the chemo-dynamical similarity and the fact that they are separated from other merger groups, we postulate that these r-II clusters may be the remnants of two very-metal-poor dwarf galaxies that merged with the Milky Way together, corresponding to yet another merger event.
In the following, we estimate the stellar mass of the progenitor dwarf galaxies of the r-II clusters H22:DTC-1 and 9, which we refer to as M * (H22:DTC-1) and M * (H22:DTC-9), respectively. According to the massmean metallicity relationship of the disrupted dwarf galaxies (Naidu et al. 2022; see also Kirby et al. 2013 Our estimate of M * is supported by the distribution of stars in the color-magnitude diagram for each r-II cluster. Based on the Padova stellar evolution model (Bressan et al. 2012; ver 3.6), a stellar population with an initial total mass of 10 4 M and metallicity of [Fe/H] = −2.8 is expected to have, on average, 4.4 stars with M G < 0 (brigter than the majority of the horizontal branch stars) at the age of 10 Gyr. The expected number (4.4) is, at face value, consistent with the observation: we have 7 and 3 stars with M G < 0 in r-II clusters H22:DTC-1 and 9, respectively. Of course, we admit that this argument needs to be treated with care, because we do not know the completeness of the member stars with M G < 0 and the contamination in our r-II clusters. However, the distribution of stars in the color-magnitude diagram is at least consistent with the view that the progenitor dwarf galaxies of our r-II clusters H22:DTC-1 and 9 had M * ∼ 10 4 M before they were disrupted.

Prospects for confirming our r-II clusters
The aim of this paper is to find candidates for r-II clusters which are likely remnants of disrupted dwarf galaxies. Due to the optimistic nature of our clustering method, at this moment it is premature to conclusively determine whether any of these r-II clusters are real. Here we discuss two prospects to confirm or refute the reality of these r-II clusters.
Because dwarf galaxies have their own chemical enrichment history, dwarf galaxies show different trends in ([Fe/H], [X/Fe])-space (Tolstoy et al. 2009). By obtaining spectroscopic measurements of various elemental abundances, we may be able to find real r-II clusters that are remnants of dwarf galaxies.
An issue of our greedy optimistic clustering method is that stars with poor parallax (or distance) measurements can contaminate real clusters. This kind of contamination effect can be reduced by improving the distance estimates of the r-II stars, by using parallax from future data releases of Gaia or using better photometric distances. 7.6. Caveats in our analysis 7.6.1. Assumptions on the gravitational potential In this paper, we performed a clustering analysis of r-II stars in the orbital action space. In computing the orbital action J (x, v) from the observed position-velocity data (x, v), we assumed a gravitational potential model of the Milky Way Φ(x) and the Solar position and velocity with respect to the Galactic center. Therefore, our clustering analysis is necessarily affected by any bias on these assumptions. For example, seven stars travel beyond r > 30 kpc (see r apo in Table 3), where the perturbation from the Large Magellanic Cloud may be important (Besla et al. 2010;Erkal et al. 2019Erkal et al. , 2021Garavito-Camargo et al. 2019;Koposov et al. 2019;Conroy et al. 2021;Petersen & Peñarrubia 2021;Shipp et al. 2021). Also, some groups of stars that pass near the bulge region with prograde motion might be affected by the rotating potential of the Galactic bar (Hattori et al. 2016;Price-Whelan et al. 2016). Although these complexities are not included in our study, we believe they do not seriously affect our results, especially the result on Tier-1 clusters, because we validated our results with chemical abundance information that are independent from the dynamical analysis of this paper.

Assumptions on the distribution of the orbital action of r-II stars
We assume that the distribution of the r-II stars in the J -space is described by a mixture of isotropic Gaussian distributions with identical dispersion σ 2 J , as expressed in equation (10). This is a natural assumption if all the r-II stars in our catalog originate from small dwarf galaxies such as UFDs, but in reality it is uncertain if this assumption is valid. For example, if some fraction of r-II stars originate from disrupted large dwarf galaxy (such as Gaia-Sausage/Enceladus), we may expect diffuse and smooth background of r-II stars not associated with small clumps. Indeed, there are some indications that large dwarf galaxies do include r-enhanced stars (Matsuno et al. 2021 for Gaia-Sausage/Enceladus; Reichert et al. 2021 for Fornax dwarf galaxy; see also Xing et al. 2019, who discuss the origin of an r-II star that exhibits low [Mg/Fe]). Because the fraction of r-II stars in large dwarf galaxies are expected to be low (Hirai et al. 2022), we believe it is justifiable to apply our method to r-II stars. However, since we did not analyze the effect of diffuse and smooth background of r-II stars in our mock-data analysis (Okuno & Hattori 2022), it is unclear how such a background population may affect our results (Brauer et al. 2022). This issue may be a scope for future studies. 7.6.3. Very-metal-poor r-II stars in our sample Our analysis is motivated by the discovery of an Eurich UFD, Reticulum II. Given that known members of surviving UFDs typically have [Fe/H] < −2, it may be interesting to see how our results are affected if we pre-select very-metal-poor r-II stars with [Fe/H]< −2 before performing the clustering analysis. We did an additional analysis in this direction, as described in Appendix G. We found that the clustering results are similar to our fiducial result. With this simple [Fe/H] se-lection, one may fail to discover some interesting clusters, such as H22:DTC-2 (which corresponds to group B in Roederer et al. 2018a), whose mean metallicity is [Fe/H] = −1.65. Thus, we assert that using the entire sample of r-II stars available is more informative than using only very-metal-poor r-II stars.
8. CONCLUSION In this paper, we extended the work in Roederer et al. (2018a) and performed a clustering analysis of N = 161 r-II stars ([Eu/Fe]≥ 0.7 and [Ba/Fe]< 0) in the orbital action (J ) space. Our data set is the largest catalog of r-II stars, which includes r-II stars discovered before the end of 2020 (see Tables 2 and 3). For all the sample stars, we have astrometric data from Gaia EDR3, and therefore our catalog supersedes the catalog in Roederer et al. (2018a) in which Gaia DR2 data were used. To our updated catalog, we applied a newly-developed greedy optimistic clustering method (Okuno & Hattori 2022), which allows us to analyze not only stars with good observational data but also stars with poor data. As a result, we were able to analyze all the r-II stars in our catalog, without discarding stars with large observational uncertainty.
The summary of this paper is as follows.
• By using the Gaussian Mixture Model equipped with the greedy optimistic clustering algorithm, we performed the clustering analysis in the orbital action space. The orbital action distribution of N = 161 r-II stars can be described by K = 30 clusters with intrinsic internal dispersion of σ J = 100 kpc km s −1 (see Table 1 and Fig. 3).
• The groups A-H discovered in Roederer et al. (2018a) are recovered in our analysis (see Table 1). Specifically, groups B, C, D, and G are identified as separate clusters in our analysis. Groups A and F are identified as a single (big) cluster in our analysis. Groups E and H (and a star in group F) are identified as a single (big) cluster in our analysis (see Section 6.4). For all groups in Roederer et al. (2018a), we found additional member stars (see Table 2).
• Among 26 clusters with N member,k ≥ 2 member stars, 13 clusters have metallicity dispersion of σ [Fe/H] < 0.35, which is equivalent to or smaller than the dispersion in the r-enhanced UFD, Reticulum II. This result indicates that many of the filed r-II stars may have originated from disrupted dwarf galaxies (see Section 7.1).
• We validated our clustering result by using the chemical abundance data, which we did not use in the clustering analysis. Based on the tightness of the distribution of [Fe/H] and [Eu/H], we categorized our clusters into five categories: Tier-1 clusters (for which we have the highest confidence); Tiers-2, 3, and 4 clusters (with decreasing confidence); and single-member clusters (see Sections 7.1 and 7.2).
• We found six r-II clusters (H22:DTC-1, 2, 3, 4, 5, and 9;  4 and 5). Because the chemical information is not used in the clustering analysis, the chemical homogeneity of these clusters suggests that these six clusters are likely to be genuine clusters. Given that the member stars of Tier-1 clusters are apparently completely phase-mixed (see Fig. 6), we interpret Tier-1 clusters as the remnants of completely disrupted dwarf galaxies that merged with the ancient Milky Way (see Section 7.2.1). However, more data are desired to confirm this scenario (see Section 7.5).
• The cluster H22:DTC-1 is a newly discovered cluster with N member,k = 9 member stars. This cluster has the tightest distribution in [Fe/H] in terms of the quantities q [Fe/H] introduced in Section 7.1. Two of the member stars are among 35 r-II stars analyzed in Roederer et al. (2018a) and regarded by that study as r-II stars not associated with any groups. The fact that these two stars (as well as the other 7 member stars) are successfully considered as a single cluster highlights the advantage of using the greedy optimistic clustering method (see Section 7.2.1).
• Apart from a Tier-1 cluster H22:DTC-1, we identified many new r-II clusters that have not been identified in previous studies. All the nine clusters in Tiers-2 and 3 are newly discovered (see Appendix C).
• We found four r-II clusters (H22:DTC-27, 28, 29, and 30) with a single member star. Two of these clusters, H22:DTC-29 and 30, are the most metalpoor r-II clusters characterized by a highly eccentric orbit with ecc 0.88 and r apo > 90 kpc (see Section 7.2.3).
• In the hierarchical galaxy formation paradigm, some small stellar systems such as dwarf galaxies or globular clusters merge together as a group.
In accordance with this scenario, it has been claimed that some stellar streams, globular clusters, and dwarf galaxies in the Milky Way are clustered in phase space (Bonaca et al. 2021;Malhan et al. 2022). Recently, Malhan et al. (2022) reported that there are six big merger groups. We checked the 30 r-II clusters obtained in this study and found that eight r-II clusters are associated with four of the merger groups: Gaia-Sausage/Enceladus, Arjuna/Sequoia/I'itoi, LMS-1/Wukong, and Pontus (see Section 7.3 and Fig. 7).
• Two Tier-1 clusters H22:DTC-1 and 9 are very metal poor ([Fe/H] −2.8), which indicates that their progenitor systems were low-mass UFD-like systems with stellar mass of M * ∼ 10 4 M according to the mass-metallicity relationship Naidu et al. 2022). Intriguingly, these two clusters have similar chemistry and orbits. They may be the remnants of two r-enhanced dwarf galaxies that merged with the Milky Way as a group. If these clusters are associated, they may constitute a new merger group which is separate from previously known merger groups found in Bonaca et al. (2021) and Malhan et al. (2022) (see Section 7.4).

A. A DEMONSTRATION OF THE GREEDY OPTIMISTIC CLUSTERING
To perform a clustering analysis for a nosiy data set, we introduce the greedy optimistic clustering method (Okuno & Hattori 2022). In the greedy optimistic clustering, we simultaneously estimate both (i) the centroids of the clusters and (ii) the true orbital action J true i . By contrast, in the conventional clustering methods, we are mainly interested in estimating the centroids of the clusters by using the point estimate of the orbital action J i . Here we explain how these clustering methods work differently. As a demonstration, we focus on a clustering of three r-II stars under an assumption that these stars are members of a single cluster. Fig. 8 shows the distribution of J for stars A, B, and C. Stars A, B, and C respectively have /σ = 16.63, 11.18, and 26.83; and ∆J/( kpc km s −1 ) = 126, 353, and 138. The names of these stars are described in the caption of Fig. 8. These stars are taken from the cluster H22:DTC-1 that we find in our main analysis of this paper. Apart from this fact, the content of this Appendix is independent from the main analysis of this paper.

A.1. Conventional clustering methods
On the left-hand panels of Fig. 8, the point-estimate J i (i ∈ {A, B, C}) of each star (big black symbol) is derived from the point-estimate of the observables. Under an assumption that stars A, B, and C are associated with one cluster, conventional clustering methods find the centroid in the 'middle' of these three point estimates. (The definition of the 'middle' depends on the clustering methods.) The small gray symbols on the left-hand panels of Fig. 8 represent the uncertainty set of J for these stars. We see that the distribution of the uncertainty set for each star is highly elongated, despite the relatively small parallax uncertainty ( /σ > 10). The elongated shape arises from the fact that the parallax uncertainty dominates the uncertainty in J , as explained in Section 3. For example, if we adopt a small value of for star B, both J r and J z become large; and vice versa.
As we can see from the uncertainty set, the point estimate is just one of many possibilities. Even when the parallax uncertainty is small, the point estimate may be very different from the true orbital action. Because conventional clustering methods use the point estimate J i , conventional clustering methods may fail to work when the observational uncertainty is not negligible.

A.2. Greedy optimistic clustering method
In the greedy optimistic clustering, we estimate the true orbital action J true i under some assumptions and then perform a clustering analysis. Because the estimation of J true i is essential in this clustering method, we demonstrate how we estimate J true i by using the example case in Fig.8. On the right-hand panels of Fig. 8, the uncertainty sets of J are shown with small colored symbols. In the greedy optimistic clustering, we assume that the true orbital action J true i of star i is very close to one of the M instances (realizations) {J i,j } in the uncertainty set of that star. In other words, we assume that one of the M 3 combinations of {(J A,j A , J B,j B , J C,j C )} is very close to the true orbital actions of these stars (J true A , J true B , J true C ). In addition, we optimistically assume that the true orbital actions (J true A , J true B , J true C ) are highly clustered in the J -space. Under these assumptions, we perform a greedy search for the best combination. As a demonstration, among M 3 combinations of {(J A,j A , J B,j B , J C,j C )}, we find the best combination (J A,β A , J B,β B , J C,β C ) that achieves the minimum internal dispersion in J with a brute-force approach. The orbital action estimated through this procedure is the greedy optimistic estimate of the orbital action. The greedy optimistic estimates are shown by the big red symbols on the right-hand panels of Fig. 8. By comparing J i,βi (right-hand panels) with J i (left-hand panels), we can visually confirm that the greedy optimistic estimates are more condensed. Indeed, the dispersion of {J i,βi } is only 41 kpc km s −1 , while the dispersion of { J i } is as large as 203 kpc km s −1 .
The most important assumption in the greedy optimistic clustering is the assumption that the true orbital actions are highly clustered in J -space. Because there is no guarantee that this optimistic assumption is valid, it is important to validate our clustering results with an independent set of data. This is why, in the main analysis of this paper, we check the chemical information for each cluster.

B. A COMMENT ON THE COMPUTATIONAL COST OF GOEM ALGORITHM
In this paper, we use GOEM algorithm to find the best solution of the greedy optimistic GMM. Here we comment on the computational cost of this algorithm.
If we had an infinite amount of computing resources, we could use the following steps to find the best solution.
(2) We apply the conventional GMM to each of the M N configurations of the . The small gray symbols represent the uncertainty set for each star, which are shown for a reference. (Right panels) The big red symbols are the greedy optimistic estimate J i,β i . The small colored symbols represent the uncertainty set for each star. The uncertainty set for star B is marked to illustrate its banana-shaped distribution. We see that the distribution of (J A,β A , J B,β B , J C,β C ) is more condensed than that of ( J A, J B , J C ). data points. For each configuration, we find the best parameters (i.e., the centrids and weights) that maximize the object function in equation (14). (3) Find the best configurations and parameters that maximize the object function in equation (14). Obviously, such a brute-force strategy is computationally expensive, because we would need to perform GMM fitting for M N ∼ 10 322 times for our case with (M, N ) = (101, 161). (In Appendix A, we use a brute-force strategy because it is easier to understand.) Fortunately, in the mock data analysis in Okuno & Hattori (2022), we typically need ∼ 20 iterations of GO, E, and M steps to reach a good solution. This computational cost is obviously much smaller than a brute-force approach.
C. CLUSTERS IN TIERS 2, 3, AND 4 In the main body of this paper, we showed the distribution of orbital action and chemical abundances of Tier-1 clusters, for which we have the highest confidence in our results. Here we show the distribution of stars for other clusters that are worth mentioning.
Tier-2 cluster H22:DTC-7. This cluster is one of the lowest metallicity and highly prograde cluster with Tier-2 cluster H22:DTC-10. This cluster has N member,k = 4. Apart from the most metal-poor member, BPS CS 29491-069, the other three stars have poor parallax measurement with /σ < 5 (see Table 2). This is a clear example that our optimistic clustering can find a candidate group even if the observational error is large.

C.2. Tier-3: Four promising clusters
Apart from Tiers-1 and 2 clusters, there are four clusters that satisfy either q [Fe/H] < 25% or q [Eu/H] < 25%. We label these clusters as Tier-3 clusters, which include clusters with H22: 8,14,and 21. All of the Tier-3 clusters are newly found, and have N member,k = 2 member stars. The distribution of stars in the action and chemistry space is shown in Fig. 10.
Tier-3 cluster H22:DTC-6. This group is characterized by a prograde, nearly circular orbit with J φ = −2504 kpc km s −1 , corresponding to a guiding center radius of R ∼ 10 kpc.
One of the member stars of this cluster is 2MASS J15213995−3538094, which has the most enhanced value of [Eu/Fe]= +2.23 in our catalog (Cain et al. 2020 There are three additional clusters (H22: 17,and 24) that are worth attention. We label them as Tier-4. The distribution of stars in the action and chemistry space is shown in Fig. 11.
Tier - Therefore, without one outlier star, this cluster could be classified as a Tier-1 cluster. Even after removing G210-33, this cluster contains all the stars in groups E and H and a star in group F in Roederer et al. (2018a).
Tier-4 cluster H22:DTC-17. This is a cluster of size N member,k = 4. This cluster includes Gaia DR2 6412626111276193920 (also known as Indus 13) which is a member of the Indus stream . The remaining three r-II stars in this group have very different orbital phases, suggesting that they are unlikely to be the members of the Indus stream. These r-II stars might have originated from different dwarf galaxies that accreted to the Milky Way together.
This group has N member,k = 14 member stars. As seen in Fig. 11 Table 2 lists the member stars of all the K = 30 clusters and the basic chemical and dynamical properties of the member stars. Table 3 lists additional kinematical and orbital information of the r-II stars. See Section 6.3 for the description of these tables.

E. A COMMENT ON THE 'EDGE EFFECT' OF GREEDY OPTIMISTIC SOLUTION
An alert reader may notice that, in the top row in Fig. 3, the greedy optimistic estimates of some stars are located near the edge of their uncertainty sets. To confirm the validity of our result, we conduct an additional test on our fiducial solution. In this test, we use K = 30 centroids J k (k = 1, · · · , K) obtained from the fiducial analysis and the as-observed orbital action of N = 161 stars J i ≡ J i,50 (i = 1, · · · , N ). For each star i, we find the nearest centroid k such that ||J i − J k || is minimized. For each star i, we compare k and the value of k in our fiducial analysis. As a result, we find that (i) 137 stars among 161 stars (85 percent) satisfy k = k; and that (ii) 53 stars among 59 stars in Tier-1 clusters (90 percent) satisfy k = k. These results indicate that, in most cases, the choice of the instance of the uncertainty set is not critical in assigning the cluster. As demonstrated in Appendix A, our algorithm tries to shrink the cluster in the action space; and depending on the configuration of the uncertainty set and the centroid, the edge of the uncertainty set is chosen in our method.    Figure 11. The same as Fig. 4, but for Tier-4 clusters (H22:DTC-15, 17, and 24).
F. CLUSTERING ANALYSIS USING STANDARD GMM AND HIGH-QUALITY DATA The originality of this paper is that we use a new clustering algorithm, namely the greedy optimistic GMM, in finding candidates for clusters in the J -space. Although our approach has an advantage in that stars with low-quality kinematic data can be used, it has a disadvantage in that stars with low-quality data might contaminate clusters. In this regard, the resultant clusters in our fiducial analysis might be contaminated by stars that are not supposed to be true member stars. In the main text of this paper, we carefully analyze the chemical information of the member stars to conclude that Tier-1 clusters are more plausible candidates for disrupted dwarf galaxies than other r-II clusters.
To further examine the validity of our fiducial result, here we perform an additional clustering analysis using N = 119 r-II stars with high-quality astrometric data defined by parallax over error > 10. In this analysis, we fix σ J = 100 kpc km s −1 and K = 30 as in our fiducial analysis. Also, we use the standard GMM instead of the greedy optimistic GMM.
In the fiducial result, we have K = 30 clusters consisting of N = 161 stars. Among these 30 clusters, we have (i) 11 clusters in which all the member stars have high-quality kinematic data (parallax over error > 10); (ii) 2 clusters in which all the member stars have low-quality kinematic data (parallax over error < 10); and (iii) 17 clusters which include both stars with high-quality data and stars with low-quality data. From the fiducial result, we discard 2 clusters in item (ii). Also, we discard member stars with parallax over error < 10 from 17 clusters in item (iii). As a result, we end up with K = 28 clusters consisting of N = 119 stars with parallax over error > 10.
To investigate the similarity between the two sets of clusters mentioned above, namely 30 clusters described in the second paragraph (i.e., an additional clustering result using stars with high-quality data) and 28 clusters described in the third paragraph (i.e., a subset of the fiducial analysis), we compute four similarity indices. We find that the purity is 0.883, Normalized Mutual Information is 0.888, Rand index is 0.968, and F-measure is 0.846. These indices are close to unity, which means that these two sets of clusters are similar to each other.
We also investigate how the member stars of Tier-1 clusters in the fiducial result are classified in our additional analysis. As a result, most member stars in each Tier-1 cluster are successfully identified as a single cluster. This result indicates that, as long as we use high-quality data only, Tier-1 clusters can be identifiable independent of the adopted clustering method (i.e., the standard GMM or greedy optimistic GMM), supporting the plausibility of Tier-1 clusters. The details of individual Tier-1 clusters are summarized below.
(H22:DTC-1) Among 9 member stars, all the stars satisfy parallax over error > 10. Among them, 7 stars (except for J14592981-3852558 and BPS CS 22896-154) are found in the same cluster. This cluster (which is a subset of H22:DTC-1) is chemically homogeneous, with σ G. CLUSTERING ANALYSIS USING STARS WITH [FE/H] < −2 In our r-II star catalog, we have N = 118 very-metal-poor stars with [Fe/H] < −2, similar to most stars in UFDs known to date (including Reticulum II). Motivated by the observed chemical properties of UFDs, we perform an additional clustering analysis using these N = 118 r-II stars. We fix λ = 0 and σ J = 100 kpc km s −1 as in our fiducial analysis and set K = 25.
In the fiducial result, we have K = 30 clusters consisting of N = 161 stars. Among these 30 clusters, we have (i) 4 clusters in which all the member stars are metal-rich ([Fe/H] ≥ −2); (ii) 12 clusters in which all the member stars are very metal-poor ([Fe/H] < −2); and (iii) 14 clusters which include both very-metal-poor stars and metal-rich stars. From the fiducial result, we discard 4 metal-rich clusters in item (i). Also, we discard metal-rich member stars from 14 clusters in item (iii). As a result, we end up with K = 26 clusters consisting of N = 118 very-metal-poor stars.
To investigate the similarity between the two sets of clusters mentioned above, namely 25 clusters described in the first paragraph (i.e., an additional clustering result using very-metal-poor stars only) and 26 clusters described in the second paragraph (i.e., a subset of the fiducial analysis), we compute four similarity indices. We find that the purity is 0.839, Normalized Mutual Information is 0.868, Rand index is 0.966, and F-measure is 0.815. These indices are close to unity, which means that these two sets of clusters are similar to each other.
We also investigate how the member stars of Tier-1 clusters in the fiducial result (except for H22:DTC-2 because only one out of its nine member stars is very metal-poor) are classified in our additional analysis. H22:DTC-1 is divided into two separate clusters. All the very-metal-poor member stars in H22:DTC-3 are found in a single cluster. H22:DTC-4 is divided into two separate clusters. All the very-metal-poor member stars in H22:DTC-5 are found in a single cluster. Five out of six member stars in H22:DTC-9 are found in a single cluster. These results indicate that the pre-selection of the sample by [Fe/H] does not drastically affect the clustering results of Tier-1 clusters, while some Tier-1 clusters (H22:DTC-3 and H22:DTC-9) are less sensitive to the pre-selection than others.

H. ANALYSIS OF NON-R-II STARS
In the fiducial analysis, we find that many dynamically identified r-II clusters have relatively tight distribution in [Fe/H]. Here we investigate whether or not our result is statistically significant by performing the same analysis for non-r-II stars.
For this test, we carefully construct a sample of N = 161 non-r-II stars such that its [Fe/H] histogram (with a bin size of 0.25 dex) is almost identical to that of our r-II sample. For this sample, [Fe/H] is determined from highresolution spectroscopy, either from Subaru observation (Li et al. 2022a) or Gaia-ESO survey DR5.0 (Gilmore et al. 2012; https://www.gaia-eso.eu/data-products/public-data-releases).
We analyze this sample in the same manner as our fiducial analysis with K = 30. We note that we do not have [Eu/H] or [Eu/Fe] measurements for most of our non-r-II stars partly because Eu abundance is difficult to measure for most non-r-II stars. Thus, we compare the tightness in the [Fe/H] distribution for clusters of r-II stars and those of non-r-II stars. As a result, we find • that non-r-II sample has 3 clusters with σ [Fe/H] < 0.35 (while our r-II sample has 14 clusters); • that non-r-II sample has 0 clusters with q [Fe/H] < 5 (while our r-II sample has 5 clusters); and • that non-r-II sample has 4 clusters with q [Fe/H] < 15 (while our r-II sample has 9 clusters).
We see that our r-II clusters have tighter [Fe/H] distributions than the corresponding clusters of non-r-II stars. In other words, r-II stars with similar orbits tend to have smaller [Fe/H] dispersion than normal stars with similar orbits do, supporting the main result in this paper.