Modified affinity propagation

Affinity Propagation (AP) is an exemplar-based clustering algorithm that does not require prior knowledge of the number of clusters. The quality of its clustering results is highly dependent on the “preference” value. The standard AP algorithm takes the “preference” value from the median or minimum value of the similarity matrix and assigns this single value to every “preference” entry of the similarity matrix. This method does not give the best solution, because a single value does not represent the overall data structure. The Modified AP (M-AP) algorithm is proposed to resolve this problem. M-AP computes the “preference” value from the data distribution in each row of the similarity matrix. Experimental results show that M-AP can outperform AP in clustering quality as measured by the Silhouette Index score.


Introduction
The process of grouping a set of physical or abstract objects into classes of similar objects is called clustering. A cluster is a collection of data objects that are similar to one another within the same cluster and dissimilar to the objects in other clusters [1]. Clustering techniques have been used in many fields, such as artificial intelligence, biology, data mining, machine learning, marketing, and pattern recognition [2].
Affinity Propagation (AP) is a new clustering algorithm proposed by Brendan J. Frey and Delbert Dueck [3]. Unlike earlier clustering algorithms such as k-means, which take randomly chosen data points as the initial exemplars, AP considers all data points as potential cluster centers.
The AP algorithm requires the value of the "preference" parameter as an initial input, and this parameter directly affects the quality of the clustering produced by AP [3]. AP takes the median (Pm) or minimum (Pmin) value of the similarity matrix and assigns that single value to every "preference" entry in the similarity matrix. Pm results in a moderate number of clusters, whereas Pmin results in a small number of clusters.
As [4] suggests, in many cases setting every "preference" value in the similarity matrix to Pm or Pmin is not the best solution, because a single value such as Pm or Pmin cannot represent the overall data structure. Determining the "preference" value is therefore very important in AP, because it greatly affects the quality of the algorithm's results [5].

Affinity Propagation
Affinity propagation (AP) is a new exemplar-based clustering algorithm proposed by Frey and Dueck [3]. AP views each data point as a node in a network and recursively transmits messages along the edges of the network until a good set of exemplars emerges. An exemplar is the data point that best represents a cluster. Fig. 1 shows how AP works. AP takes as input a collection of real-valued similarities between data points, where s(i,k) indicates how well the data point with index k is suited to be the exemplar for data point i. Because the goal is to minimize squared error, each similarity is set to the negative squared error (Euclidean distance) between data points x_i and x_k:
$$s(i,k) = -\lVert x_i - x_k \rVert^2 \qquad (1)$$
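Equation (1) can be illustrated with a minimal pure-Python sketch. It assumes each data point is a tuple of coordinates; the function name `similarity_matrix` is hypothetical. Diagonal entries are left as 0.0 placeholders because, in AP, the diagonal holds the "preference" values, which are set in a separate step.

```python
from typing import List, Sequence


def similarity_matrix(points: Sequence[Sequence[float]]) -> List[List[float]]:
    """Return s with s[i][k] = -||x_i - x_k||^2 for i != k, as in (1).

    Diagonal entries stay 0.0 as placeholders for the "preference" values.
    """
    n = len(points)
    s = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(n):
            if i != k:
                # Negated squared Euclidean distance: closer points get a
                # larger (less negative) similarity.
                s[i][k] = -sum((xi - xk) ** 2 for xi, xk in zip(points[i], points[k]))
    return s


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
    for row in similarity_matrix(pts):
        print(row)
    # [0.0, -1.0, -50.0]
    # [-1.0, 0.0, -41.0]
    # [-50.0, -41.0, 0.0]
```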

Input preference
The "preference" value is the diagonal value s(k,k) of the similarity matrix. By default, the "preference" is computed from the median (Pm) or the minimum (Pmin) value of the similarity matrix:
$$p = \operatorname{median}(s(:)) \qquad (2)$$
$$p = \min(s(:)) \qquad (3)$$
This value is then assigned as the "preference" on the diagonal of the similarity matrix, so every "preference" has the same value.
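The shared-preference rule of (2) and (3) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `set_preference` is hypothetical, and it assumes the statistic is taken over the off-diagonal similarities only, since the diagonal entries are the preferences being set.

```python
import statistics
from typing import List


def set_preference(s: List[List[float]], rule: str = "median") -> List[List[float]]:
    """Return a copy of s whose diagonal holds one shared preference p.

    rule="median" uses Pm as in (2); rule="min" uses Pmin as in (3).
    """
    n = len(s)
    # Assumption: exclude the diagonal, which holds the preferences themselves.
    off_diag = [s[i][k] for i in range(n) for k in range(n) if i != k]
    if rule == "median":
        p = statistics.median(off_diag)
    elif rule == "min":
        p = min(off_diag)
    else:
        raise ValueError("rule must be 'median' or 'min'")
    out = [row[:] for row in s]
    for k in range(n):
        out[k][k] = p  # every data point receives the same preference
    return out


if __name__ == "__main__":
    s = [[0.0, -1.0, -50.0], [-1.0, 0.0, -41.0], [-50.0, -41.0, 0.0]]
    print(set_preference(s, "median")[0][0])  # -41.0
    print(set_preference(s, "min")[0][0])     # -50.0
```

A lower shared preference makes every point less inclined to become its own exemplar, which is why Pmin tends to yield fewer clusters than Pm.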

Messages passing
The AP process can be viewed as a message-passing process in which two kinds of messages are exchanged among data points: responsibility and availability [6]. Together, these messages determine which points serve as exemplars and which exemplar each remaining point belongs to [7]. The message-passing process is shown in Fig. 2.

Figure 2. Message passing process
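Responsibility, r(i,k), is a message sent from data point i to candidate exemplar k that reflects the accumulated evidence for how well-suited point k is to serve as the exemplar for point i. Responsibility r(i,k) is computed as:
$$r(i,k) \leftarrow s(i,k) - \max_{k' \neq k}\{a(i,k') + s(i,k')\} \qquad (4)$$
Availability, a(i,k), is a message sent from candidate exemplar k to data point i that reflects the accumulated evidence for how appropriate it would be for point i to choose point k as its exemplar. For i ≠ k, availability a(i,k) is computed as (5), and the "self-availability" a(k,k) is computed as (6):
$$a(i,k) \leftarrow \min\Big\{0,\; r(k,k) + \sum_{i' \notin \{i,k\}} \max\{0, r(i',k)\}\Big\} \qquad (5)$$
$$a(k,k) \leftarrow \sum_{i' \neq k} \max\{0, r(i',k)\} \qquad (6)$$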
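One round of the updates (4) to (6) can be sketched in pure Python as below. This is a minimal sketch assuming a square similarity matrix that already carries the preferences on its diagonal; the function name `update_messages` is hypothetical. The `damping` factor, which blends each new message with its previous value, is the stabilizer recommended in [3] to avoid oscillations; it is not part of (4) to (6), and setting `damping=0.0` reproduces those equations exactly.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def update_messages(s: Matrix, r: Matrix, a: Matrix,
                    damping: float = 0.5) -> Tuple[Matrix, Matrix]:
    """Return updated (responsibility, availability) matrices; needs n >= 2."""
    n = len(s)

    # Responsibility (4): r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
    new_r = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(n):
            best = max(a[i][kk] + s[i][kk] for kk in range(n) if kk != k)
            new_r[i][k] = s[i][k] - best
    r = [[damping * r[i][k] + (1 - damping) * new_r[i][k] for k in range(n)]
         for i in range(n)]

    # Availability (5) for i != k and self-availability (6) for i == k,
    # both computed from the freshly updated responsibilities.
    new_a = [[0.0] * n for _ in range(n)]
    for k in range(n):
        for i in range(n):
            if i == k:
                new_a[k][k] = sum(max(0.0, r[ii][k]) for ii in range(n) if ii != k)
            else:
                support = sum(max(0.0, r[ii][k]) for ii in range(n) if ii not in (i, k))
                new_a[i][k] = min(0.0, r[k][k] + support)
    a = [[damping * a[i][k] + (1 - damping) * new_a[i][k] for k in range(n)]
         for i in range(n)]
    return r, a


if __name__ == "__main__":
    # Toy similarities with a shared preference of -41.0 on the diagonal.
    s = [[-41.0, -1.0, -50.0], [-1.0, -41.0, -41.0], [-50.0, -41.0, -41.0]]
    zeros = [[0.0] * 3 for _ in range(3)]
    r, a = update_messages(s, zeros, zeros, damping=0.0)
    print(r)
    print(a)
```

In practice the update is repeated until the exemplar assignments stop changing for a number of iterations.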

Exemplar decision
At any point during the AP process, availabilities and responsibilities can be combined to identify exemplars. For data point i, let k be the index that maximizes a(i,k) + r(i,k). If k = i, data point i serves as an exemplar; otherwise, i is a member of the cluster whose exemplar is k. The exemplar decision is computed as:
$$\hat{k}_i \leftarrow \arg\max_{k}\{a(i,k) + r(i,k)\} \qquad (7)$$
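Rule (7) can be sketched directly as an argmax over each row of a + r. The function names `assign_exemplars` and `exemplars` are hypothetical, and the matrices in the demo are toy values chosen for illustration, not output from a real run. Ties resolve to the lowest index because Python's `max` returns the first maximal element.

```python
from typing import List

Matrix = List[List[float]]


def assign_exemplars(r: Matrix, a: Matrix) -> List[int]:
    """labels[i] is the k maximizing a(i,k) + r(i,k), following (7)."""
    n = len(r)
    return [max(range(n), key=lambda k: a[i][k] + r[i][k]) for i in range(n)]


def exemplars(labels: List[int]) -> List[int]:
    """Points that chose themselves (k = i) are the exemplars."""
    return [i for i, k in enumerate(labels) if i == k]


if __name__ == "__main__":
    r = [[2.0, -1.0, -3.0], [1.5, -2.0, -4.0], [-3.0, -2.0, 1.0]]
    a = [[0.0, -1.0, -1.0], [0.5, 0.0, -1.0], [-1.0, -1.0, 0.0]]
    labels = assign_exemplars(r, a)
    print(labels)             # [0, 0, 2]
    print(exemplars(labels))  # [0, 2]
```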

Modified AP (M-AP)
This section proposes a new algorithm, Modified AP (M-AP), designed to overcome AP's limitation in determining the best "preference" value. In M-AP, the "preference" value is computed separately for each row of the similarity matrix as:
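The mechanical difference from (2) and (3) is that each data point receives its own diagonal entry instead of one shared value. The sketch below shows only that mechanism. The M-AP rule (8) itself is not reproduced here, so the per-row statistic is a pluggable parameter `row_stat` whose default, the median of each row's off-diagonal similarities, is a hypothetical placeholder and NOT the authors' formula. The function name `set_row_preferences` is also hypothetical.

```python
import statistics
from typing import Callable, List, Sequence

Matrix = List[List[float]]


def set_row_preferences(
    s: Matrix,
    row_stat: Callable[[Sequence[float]], float] = statistics.median,
) -> Matrix:
    """Return a copy of s where s[i][i] is derived from row i alone.

    HYPOTHETICAL: row_stat stands in for the M-AP rule (8), which is not
    given here; the median default is only a placeholder.
    """
    n = len(s)
    out = [row[:] for row in s]
    for i in range(n):
        off_diag = [s[i][k] for k in range(n) if k != i]
        out[i][i] = row_stat(off_diag)  # per-point preference
    return out


if __name__ == "__main__":
    s = [[0.0, -1.0, -50.0], [-1.0, 0.0, -41.0], [-50.0, -41.0, 0.0]]
    p = set_row_preferences(s)
    print([p[i][i] for i in range(3)])  # [-25.5, -21.0, -45.5]
```

Unlike the shared Pm or Pmin, the diagonal entries now differ from point to point, so points in dense and sparse regions are treated differently as exemplar candidates.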

Experimental Results
This section compares the clustering performance of the AP and M-AP algorithms based on the Silhouette Index (SI) [8] score. The SI score is computed as defined in [8]. The "preference" value for the AP algorithm is computed using (2), whereas in the M-AP algorithm it is computed using (8). The results for both algorithms are shown in Table 2.
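For readers who want to reproduce an SI score, the sketch below assumes the standard silhouette definition: for point i, s(i) = (b - a) / max(a, b), where a is the mean distance to the other members of its own cluster and b is the smallest mean distance to any other cluster; the SI is the mean of s(i) over all points. The Euclidean distance and the convention s(i) = 0 for singleton clusters are assumptions, and the names `intra`/`inter` are used to avoid clashing with the availability a(i,k). The function name `silhouette_index` is hypothetical.

```python
import math
from typing import Sequence


def silhouette_index(points: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean silhouette over all points (assumed standard definition, Euclidean)."""
    n = len(points)
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    def dist(i: int, j: int) -> float:
        return math.dist(points[i], points[j])

    total = 0.0
    for i in range(n):
        same = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not same:
            continue  # singleton cluster: s(i) = 0 by convention
        intra = sum(dist(i, j) for j in same) / len(same)
        inter = min(
            sum(dist(i, j) for j in range(n) if labels[j] == c) / labels.count(c)
            for c in clusters if c != labels[i]
        )
        denom = max(intra, inter)
        if denom > 0:
            total += (inter - intra) / denom
    return total / n


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
    print(silhouette_index(pts, [0, 0, 1, 1]))  # close to 1: well separated
    print(silhouette_index(pts, [0, 1, 0, 1]))  # negative: poor grouping
```

Higher SI values indicate compact, well-separated clusters, which is the sense in which one clustering "outperforms" another in Table 2.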

Conclusions
This paper proposed a new algorithm, M-AP, in which the "preference" value is computed for each row of the similarity matrix. Based on the experimental results shown in Table 2, the M-AP algorithm can outperform the AP algorithm in terms of the Silhouette Index score.