Paper The following article is Open access

The implementation of k-means partitioning algorithm in HOPACH clustering method


Published under licence by IOP Publishing Ltd
Citation K R Adzima et al 2019 IOP Conf. Ser.: Earth Environ. Sci. 243 012073 DOI 10.1088/1755-1315/243/1/012073


Abstract

Hierarchical Ordered Partitioning And Collapsing Hybrid (HOPACH) is a powerful clustering method that combines the strengths of partitioning and agglomerative clustering. Partitioning algorithms such as PAM, K-Means, or SOM can be used in the partitioning step. This step is followed by an ordering step and then by an agglomerative step. The number of main clusters is determined by the Mean Split Silhouette (MSS), which measures the heterogeneity of a clustering result: the lower the MSS value, the more homogeneous the members of each cluster. We select the number of clusters from the clustering result with the minimum MSS. In this implementation, we incorporate the K-Means partitioning algorithm into HOPACH to cluster and analyze 136 DNA sequences of Ebola viruses. The process starts with collecting DNA sequences of Ebola viruses from GenBank, followed by feature extraction from these sequences using N-mer frequencies. The extracted features are compiled into a feature matrix and normalized to the interval [0, 1] using min-max normalization; the normalized matrix is the input for generating a genetic distance matrix using the Euclidean distance. The genetic distance matrix is used in the partitioning step by the K-Means algorithm in HOPACH clustering. As a result, we obtained 8 clusters with a minimum MSS of 0.50266. The clustering in this article was carried out with the open-source program R.
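The preprocessing chain described in the abstract (N-mer frequencies, min-max normalization to [0, 1], Euclidean distance matrix) can be illustrated with a minimal Python sketch. This is not the authors' R code: the function names, the choice of N = 2, the use of relative rather than raw frequencies, and the three toy sequences are all assumptions made for illustration, and the HOPACH partitioning and MSS steps are not shown.

```python
from itertools import product
from math import sqrt


def nmer_frequencies(seq, n=2, alphabet="ACGT"):
    """Relative frequency of every length-n word over the alphabet, in a fixed order."""
    words = ["".join(p) for p in product(alphabet, repeat=n)]
    counts = dict.fromkeys(words, 0)
    total = 0
    for i in range(len(seq) - n + 1):
        word = seq[i:i + n]
        if word in counts:  # skip windows with ambiguous bases such as N
            counts[word] += 1
            total += 1
    return [counts[w] / total if total else 0.0 for w in words]


def min_max_normalize(matrix):
    """Rescale each column (feature) to [0, 1]; constant columns are set to 0."""
    columns = list(zip(*matrix))
    lows = [min(c) for c in columns]
    spans = [max(c) - min(c) for c in columns]
    return [
        [(v - lo) / sp if sp else 0.0 for v, lo, sp in zip(row, lows, spans)]
        for row in matrix
    ]


def euclidean_distance_matrix(matrix):
    """Symmetric matrix of pairwise Euclidean distances between rows."""
    size = len(matrix)
    dist = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(i + 1, size):
            d = sqrt(sum((a - b) ** 2 for a, b in zip(matrix[i], matrix[j])))
            dist[i][j] = dist[j][i] = d
    return dist


# Toy sequences (invented for illustration, not GenBank data).
toy_sequences = ["ACGTACGTAC", "ACGTTCGTAC", "GGGGCCCCGG"]

features = [nmer_frequencies(s, n=2) for s in toy_sequences]
normalized = min_max_normalize(features)
distances = euclidean_distance_matrix(normalized)
```

In this sketch, `distances` plays the role of the genetic distance matrix that would be passed on to the partitioning step; the first two toy sequences differ by a single base and therefore end up closer to each other than either is to the third.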


Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
