Development of Laos Khao Kai Noi rice landrace (Oryza sativa L.) core collection as a model for rice genetic resources management in the Laos National Genebank

Khao Kai Noi rice is considered an elite-quality landrace in Laos, which has led to the conservation of its germplasm in the Laos National Genebank. As with other germplasm collections, a manageable yet representative subcollection has become essential for researchers and breeders, as it simplifies many activities, including those related to crop improvement, phenotype-genotype correlation and determination of diversity hotspots. In this study, 109 accessions were used as a test collection for core collection development to determine the feasibility of collection reduction in a closely related rice group. Three core collections were developed by two established methodologies and evaluated by diversity indexes, allele retention, phylogenetic distribution and geographical location. Based on SSR molecular markers and PowerCore, a reduction to 24 accessions was achieved while conserving the complete genetic diversity. A K-means-based reduction to 24 accessions performed slightly worse, while a K-means-based reduction to 12 accessions resulted in a 17% diversity loss. These core collections may be useful for genebank management, research and breeding activities in the future. They may also serve to estimate how core collection development behaves in other landraces and cultivars, which is fundamental in genetic resources management and utilization.


Introduction
Genebanks play a key role in the conservation, availability and use of a wide range of plant genetic resources for crop improvement in order to provide food and nutrition security. They help to ensure the continued availability of genetic resources that may have fallen into disuse, in order to aid future research and breeding programs [1]. As collections tend to grow over time, maintenance costs and management effort increase as well. Diversity analysis, phenotype-genotype correlation and keeping redundancy low also become challenging, which in turn affects the genebank's ability to supply users and breeders with the most adequate materials for their activities. A suitable solution to this issue is the development of a "core collection" derived from the existing germplasm collection, which consists of a limited set of accessions chosen to represent the genetic spectrum of the whole collection in order to facilitate the management, research and supply mentioned above [2].
This process is generally known as core collection development (CCD) and was first proposed by Frankel and Brown [3], and as stated, it is of high importance as it plays a significant role in the management and use of genetic resources [2]. CCD has been implemented by many genebanks in diverse crops such as rice [4-6], common beans [7], chilis [8], barley [9,10], soybeans [11] and sesame [12]. Although there exist several approaches and different particular objectives, it has become clear that many factors need to be considered for an adequate CCD, where selection criteria and genetic structure play an important role [13], as well as the evaluation procedures that best fit the particular objectives, as some core collections are being created to represent specific sections of germplasm collections [14]. Strategies for CCD of these specific sections need to be further explored and implemented.
The Laos National Genebank (LNG) is located in the Rice Research Center, National Agriculture and Forestry Research Institute (NAFRI). Among its different functions, LNG provides both active and base germplasm collections, which include more than 14,000 rice accessions. This collection includes a group of Khao Kai Noi (KKN), a high quality regional Laos landrace with high importance in local consumption and export value. Currently, about 200 accessions of KKN germplasm are conserved in the LNG, collected in different efforts since 1995 [15−17].
KKN was chosen as the model group for CCD in LNG for two principal reasons. The first is the direct application of the KKN core collection in research on genetic diversity, genetic structure and determination of diversity hotspots, which in turn would lead to adequate in situ/ex situ conservation recommendations, breeding improvement and supply. The second is to perform CCD in a closely related rice landrace, in order to evaluate the properties of a core collection of such nature and to be able to extrapolate the CCD methodologies to the entire Laos rice collection, as well as to other important landraces and cultivars in the near future.
PowerCore [18] is a reliable and open access software that has been successfully applied in the development of several subsets from rice collections [19,20], as well as from other crops like chili [8] and Turkish melon [21]. PowerCore implementation provides 100% allelic coverage, yet the number of elements in the resulting core depends on the distribution of the selection parameters in the original collection. A K-means based algorithm [22] may complement our implementation of PowerCore, as it allows the target number of core elements to be set a priori.
In this work, we propose the establishment of core collections from the KKN collection in LNG, which could be useful for breeding purposes, identifying diversity hotspots, phenotype-genotype correlation, genebank germplasm management and recommendation of on-farm conservation sites for this important rice landrace, as well as serve as a model for CCD implementation in other landraces and cultivars.

Materials and methods
One hundred and nine non-redundant accessions were selected from the available KKN in LNG as a whole collection model (Supplementary Table 1). In order to construct the genetic data set of our whole collection model, a random individual from each non-redundant accession was selected and genotyped with 24 highly informative SSR markers spread across the genome. Molecular characterization procedures have been published elsewhere [23].
Data were analyzed with PowerCore V1.0 with its default parameters, which provided the number of core collection accessions that preserves the totality of allele diversity. This number was used as the fixed target in the K-means algorithm for CCD to construct a second core collection with the same number of elements. A third core collection was built with K-means, using half of the previous fixed target, to determine the impact of such a reduction.
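The idea of selecting accessions until every allele observed in the whole collection is represented can be illustrated with a simple greedy selection. This is only a sketch: it is not PowerCore's search procedure, it does not guarantee the smallest possible core, and the function name `greedy_allele_cover` and the genotype encoding (one allele size per SSR locus per accession) are assumptions made for illustration.

```python
def greedy_allele_cover(genotypes):
    """Return indices of accessions that together carry every observed allele.

    genotypes: list of tuples, one tuple per accession, one allele per locus
    (hypothetical encoding; real SSR data may need a different layout).
    """
    # Each accession is reduced to its set of (locus, allele) pairs.
    allele_sets = [{(locus, allele) for locus, allele in enumerate(g)}
                   for g in genotypes]
    uncovered = set().union(*allele_sets)
    selected = []
    while uncovered:
        # Pick the accession adding the most still-uncovered alleles;
        # ties are resolved in favour of the lowest index.
        best = max(range(len(genotypes)),
                   key=lambda i: len(allele_sets[i] & uncovered))
        selected.append(best)
        uncovered -= allele_sets[best]
    return selected


# Toy data set: 5 accessions scored at 2 loci (made-up allele sizes).
toy = [(100, 200), (100, 202), (104, 200), (104, 202), (100, 200)]
print(greedy_allele_cover(toy))
```

The size of the resulting core is an outcome of the data rather than an input, which mirrors the situation described above, where the number of accessions returned by PowerCore was then reused as the target for K-means.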
Evaluation of the selected core collections was performed by comparing the original collection and the selected core collections in terms of Nei and Shannon-Weaver diversity indexes [18], allelic representation, phylogenetic clustering representation and, when possible, geographical distribution. Diversity indexes and allelic representation were calculated with PowerCore. Phylogenetic clustering was determined by a UPGMA dendrogram, which was first constructed with PowerMarker V3.25 software [24] and then visualized with MEGA6 [25], where the representative accessions were tracked. The accessions with available geographical reference data were plotted on a map of the Xiengkhouang and Houaphan provinces of Laos with ArcGIS (http://doc.arcgis.com/en/arcgisonline/).
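As a minimal sketch of these evaluation measures, the code below computes Nei's gene diversity (1 - sum of squared allele frequencies), the Shannon-Weaver index (-sum of p ln p), both averaged over loci, and the fraction of alleles of the whole collection retained in a core. The function names and the one-allele-per-locus encoding are assumptions for illustration, and PowerCore's own calculations may differ in detail (for example in sample-size corrections).

```python
import math
from collections import Counter


def allele_freqs(genotypes, locus):
    """Allele frequencies at one locus across all accessions."""
    counts = Counter(g[locus] for g in genotypes)
    total = sum(counts.values())
    return [c / total for c in counts.values()]


def nei_diversity(genotypes):
    """Mean over loci of 1 - sum(p_i^2)."""
    n_loci = len(genotypes[0])
    return sum(1.0 - sum(p * p for p in allele_freqs(genotypes, locus))
               for locus in range(n_loci)) / n_loci


def shannon_index(genotypes):
    """Mean over loci of -sum(p_i * ln p_i)."""
    n_loci = len(genotypes[0])
    return sum(-sum(p * math.log(p) for p in allele_freqs(genotypes, locus))
               for locus in range(n_loci)) / n_loci


def allele_retention(whole, core):
    """Fraction of (locus, allele) pairs of the whole collection found in the core."""
    whole_alleles = {(l, a) for g in whole for l, a in enumerate(g)}
    core_alleles = {(l, a) for g in core for l, a in enumerate(g)}
    return len(whole_alleles & core_alleles) / len(whole_alleles)


# Toy example with made-up allele sizes at 2 loci.
whole = [(100, 200), (100, 202), (104, 200), (104, 202)]
core = [whole[0], whole[3]]
print(nei_diversity(whole), shannon_index(whole), allele_retention(whole, core))
```

Comparing these values between the whole collection and each candidate core gives the kind of side-by-side evaluation described above.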

Results and discussion
To establish representative accessions of KKN as a tool for breeding purposes, diversity hotspots identification, genetic resources management and on-farm conservation site recommendations for this rice landrace, 109 accessions were analyzed with 24 SSR molecular markers spread across the genome. The minimum number for complete allelic representation was determined as the 24 accessions selected by PowerCore. Therefore, 24 and 12 accessions were the targets for the K-means CCD algorithm.
As established above, the PowerCore core collection was able to represent 100% of allele diversity. Cluster representation in the constructed phylogenetic dendrogram is presented in Figure 2, where the core collections with 24 accessions included an accession from each of the 12 major clusters, while the core collection with 12 accessions included an accession from 9 of these clusters. KKN can be classified by its phenotype into 6 groups: "Deng" (red), "Leuang" (yellow), "Hay" (upland), "Khao" (white), "Lai" (striped) and "Lai Dam" (striped and black). The 24-accession core collections included at least one member of each group, while the 12-accession core collection failed to represent "Khao" (Supplementary Table 1). The accessions of all core collections were distributed across all provinces where KKN is primarily produced (Figure 3).
A core collection consists of a limited set of accessions derived from an existing germplasm collection, chosen to represent the genetic spectrum of the whole collection, and should include as much of its genetic diversity as possible [2]. In this study, we aimed to determine the feasibility of creating a useful core collection from a closely related rice landrace. CCD within KKN would prove useful in more than one way: it can provide priority accessions for in situ/ex situ conservation, optimize resources for breeding and research, and provide important insight into how these CCD methodologies may respond to other landraces or cultivars. The first approach was to determine a core collection that would maintain all possible diversity, which is possible with PowerCore. However, a core collection of 22% of the original collection (24 of 109 accessions) appears high, far from the percentages recommended by Frankel and Brown [3], who indicated that core collections should be reduced to 5% of large and 10% of small collections.
This led to the implementation of another methodology that could establish a target value regardless of allele retention percentage. With the K-means 24-accession implementation, we wanted to compare methodologies under similar conditions, while the 12-accession implementation allowed us to determine the behavior under the percentages mentioned above, which have been used in similar CCD [8].
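For reference, the sampling fractions implied by the two target sizes can be checked with a few lines of arithmetic. The helper name `sampling_fraction` is hypothetical; the collection size of 109 and the core sizes of 24 and 12 are those used in this study.

```python
def sampling_fraction(core_size, collection_size):
    """Core size as a percentage of the whole collection."""
    return 100.0 * core_size / collection_size


for size in (24, 12):
    print(f"{size} of 109 accessions = {sampling_fraction(size, 109):.1f}%")
# 24 of 109 accessions = 22.0%
# 12 of 109 accessions = 11.0%
```

The 12-accession target therefore sits close to the 10% guideline for small collections, whereas the 24-accession target is roughly twice that value.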
Both PowerCore and K-means 24-accession core collections behaved very similarly in terms of diversity indexes, as both clearly reduced redundancy from the original collection. K-means was not able to represent the full allele diversity. This is expected, as this algorithm selects the sample closest to the centroid of each of the groups generated for the target number, which explains why the Nei index is higher in the K-means than in the PowerCore selected accessions. In terms of hierarchical clustering and geographic representation, these two core collections again had similar results, proving to be effective methodologies for CCD in the evaluated landrace.
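The selection rule described here, one representative per K-means group chosen as the accession closest to the group centroid, can be sketched as follows. This is an illustration rather than the algorithm of reference [22]: the numeric encoding of accessions (for example allele presence/absence vectors), the deterministic farthest-first initialization and the function names are all assumptions.

```python
import numpy as np


def farthest_first(X, k):
    """Deterministic start: first row, then repeatedly the row farthest
    from all centroids chosen so far (an assumed initialization)."""
    chosen = [0]
    d2 = ((X - X[0]) ** 2).sum(axis=1)
    for _ in range(k - 1):
        nxt = int(d2.argmax())
        chosen.append(nxt)
        d2 = np.minimum(d2, ((X - X[nxt]) ** 2).sum(axis=1))
    return X[chosen].copy()


def kmeans_core(X, k, n_iter=100):
    """Return sorted indices of one accession per cluster: the member
    closest to its cluster centroid."""
    X = np.asarray(X, dtype=float)
    centroids = farthest_first(X, k)
    for _ in range(n_iter):
        # Squared Euclidean distance of every accession to every centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Final assignment, then pick the member nearest to each centroid.
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    core = []
    for j in range(k):
        idx = np.flatnonzero(labels == j)
        if len(idx):
            core.append(int(idx[d2[idx, j].argmin()]))
    return sorted(core)
```

Because each representative is chosen for its closeness to a group centroid, rare alleles carried by accessions far from any centroid can be left out, which is consistent with the incomplete allele retention observed for the K-means core collections.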
The K-means 12-accession core collection did not perform as effectively as its evaluated counterparts, and one phenotype was not included among the selected elements. However, considering that this core collection had 50% fewer accessions than the other ones, it is interesting to note that there was only a 17% reduction in allele representation, that 2 of the 3 clusters missing from the hierarchical clustering were single-accession groups, and that diversity indexes were maximized. It is also important to consider that the main strength of K-means is the inclusion of both agromorphological traits and genetic information, although in this preliminary study only the latter was included. There may be some circumstances where this core collection may prove useful, particularly when priorities for conservation or distribution must be set because of budget limitations.

Conclusions
In summary, we present three KKN core collections that are valuable for the landrace's genetic resources management, research and breeding. We believe that the methodologies implemented in this work may be successfully extrapolated to other rice landraces and cultivars, with similar results when selecting a core collection from a highly related original collection.

Acknowledgement
This research was supported by Grants-in-Aid #24405049 and #25257416 from the Japan Society for the Promotion of Science.