New Similarity

Every space or object consists of parts, and a part of one object can be interchanged with a part of another object when the parts are similar in nature. Before that, however, the similarity of the two objects must be quantified to determine how different or how close they are. There are many ways to measure similarity and dissimilarity, but none measures the similarity of two objects semantically by their parts. In this article, we formally introduce a new similarity based on the concept of multilevel sections and the triangular equation. The new similarity is applied to URL addresses to measure the similarity of identity of web pages.


Introduction
Each space consists of subspaces (sections), just as each object contains the components that build it [1]. In each object, the components, from the largest to the smallest, may be the same as components of other objects. On that principle, two objects may or may not be similar, and the same holds for spaces [2]. The similarity of two objects is measured based on the concept of proximity, which delivers a bounded value in the range [0,1]. The use of this range is expressed in fuzzy theory, which serves semantic information management [3]. There are many similarity measurements, but measuring similarity over a sequence of components of the objects needs a special approach [4]. This paper aims to express formally what objects have in common when their components may be ordered.

Basic Concept and Motivation
Generally, similarity measurement is based on the concept of distance [5,6]. Each distance stands in opposition to a similarity. Thus, if distances are grouped by their principles and uses, the similarities fall into the same groups [7]. Consider distances grouped by mathematical theory: although distance is most familiar from geometry, in classical mathematics it also involves concepts such as algebra, strings, numbers, polynomials, matrices, functional analysis, and probability, while by application distance is categorized according to graph theory or coding theory [4]. Distance is also relevant to computing, for example on the Internet [4,8]. Likewise, there are distances in cosmology, geography, physics and biology. Distances also relate to distance scales and to figurative senses [4,9]. In information technology, extracting a social network involves measuring the distance between two social actors [8,10], which can be used to trace the origin of information [11] or to predict social behaviour [12]. In the scientific world, distance measurement refers to the similarity between documents and can reveal improper behaviour in scientific writing, namely plagiarism. Semantic technology or knowledge technology in data mining, including the problems of big data, requires techniques that use resemblance to cluster data or documents into groups [13]. Therefore, the development of literacy in general, and of semantic technology in particular, needs similarity-based measurements.
No matter how small or big, any object or space is composed of parts called components or subspaces (sections) [14]. Formally, for example, an object A is composed of components a, or equivalently a set A contains ai, i = 1,...,n. A set A can also be broken down into subspaces based on its components, i.e. the empty set, the subsets containing one component of A, and so on, or {{ai},{ai,aj},...} [15]. Each component in A has advantages or disadvantages compared to other components when an operation is applied, or the components are related to one another, so that across all relationships we need a way to determine which components are closer to the others [16]. Therefore, for a sequence of components that is likely to exist in two or more objects, the similarity between the sequences of components of those objects needs to be measured.
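The decomposition of a set A into subspaces can be illustrated with a minimal sketch. It assumes that the subspaces are simply all subsets of the components; the function name `subspaces` and the component labels are hypothetical and used only for illustration.

```python
from itertools import combinations


def subspaces(components):
    """Return every subset of `components`, from the empty set up to the full set."""
    items = list(components)
    return [
        frozenset(combo)
        for size in range(len(items) + 1)
        for combo in combinations(items, size)
    ]


# Hypothetical object A with three components a1, a2, a3.
A = ["a1", "a2", "a3"]
for subset in subspaces(A):
    print(sorted(subset))
```

For n components this enumeration yields 2^n subsets, so it is only practical for small objects.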

An Approach: Toward Similarity
The concept of similarity relates to symmetric forms such as the line, the equilateral triangle, the square, or the rectangle. One such form is the right triangle with three sides a, b, and c, where c is the hypotenuse, and then
$c^2 = a^2 + b^2$. (1)
Therefore, each parameter of the similarity measurement is directed to the square form (parameters of degree two). First, we define the representation of any object or any space.

Definition 1.
A set A is a representation of an object with an operation → which assigns to each pair of components a, b in A exactly one relation h = (a → b).
As an object, a set A consists of ai, i = 1,…,n; as a space, a set A can be divided into subspaces of A. The size of A is |A| = n.
Lemma 2. If a set A = {ai|i=1,…,n} is the representation of an object, then the components in A have different or the same relations to one another.
Proof. In binary, h = 0 or h = 1 for each pair of components in A. Let R be the real values for the weights w of (ai → aj), where ai, aj are in A, written w(ai → aj); there are three cases: (a) (ai,ak) > (aj,ak), (b) (ai,ak) = (aj,ak), or (c) (ai,ak) < (aj,ak). Thus w(h1) > w(h2), w(h1) = w(h2), or w(h1) < w(h2), for all h1, h2, h3 in H.
Proposition 1. Let H = {hj|j=1,…,m} be a set of relations between the components in A; then the relations between the components form a sequence.
Proof. Each relation between two components in A has a weight. The relation weights range from strongest to weakest, so the relations can be placed in one order (Lemma 2). Each relation has two points (Lemma 1): the beginning and the end. The strongest relation is the first part of the sequence, and then the stronger ... where α is the angle between hi and hj.
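The ordering argument in Proposition 1 can be sketched as follows. This is only an illustration under stated assumptions: the weights are hypothetical values, and the function name `relation_sequence` is not from the paper.

```python
# Hypothetical weights w(ai -> aj) for relations between components of A.
weights = {
    ("a1", "a2"): 0.9,
    ("a2", "a3"): 0.4,
    ("a1", "a3"): 0.7,
}


def relation_sequence(weights):
    """Order relations from the strongest weight to the weakest."""
    # sorted() is stable, so relations with equal weights keep their input order.
    return sorted(weights, key=weights.get, reverse=True)


print(relation_sequence(weights))
```

Because any two weights are comparable (greater, equal, or smaller), sorting always produces a single sequence of relations.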

Corollary 1. If H is a set of relations for components in A, then the resultant of relations is
Proof. To address the symmetrical forms, we assume that the angle α between two vectors in H is 90°. Therefore, based on Equation (2) ...
Proof. To address the symmetrical forms, we use the concept of the right triangle where a = b. Thus, we use Equation (1) to adapt the position of two vectors $h_i$, $h_j$ in H such that $r_h$ is the hypotenuse c. Let $h_i = h_j$; we have $r_h^2 = h_i^2 + h_j^2 = 2h_i^2 = 2h_ih_i$, or, repositioning one of the factors $h_i$ as $h_j$, $r_h^2 = 2h_ih_j$, which is Equation (4).
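A short numerical check may make the two proofs easier to follow. The sketch below assumes that the resultant of two relations follows the standard parallelogram rule for two vectors separated by an angle α; the function name `resultant_squared` and the sample lengths are hypothetical.

```python
import math


def resultant_squared(h_i, h_j, alpha_deg):
    """Squared magnitude of the sum of two vectors of lengths h_i and h_j
    separated by the angle alpha_deg (standard parallelogram rule)."""
    alpha = math.radians(alpha_deg)
    return h_i**2 + h_j**2 + 2 * h_i * h_j * math.cos(alpha)


# At 90 degrees the cosine term vanishes (up to floating-point error),
# leaving the right-triangle relation of Equation (1).
r2 = resultant_squared(3.0, 4.0, 90)
print(math.isclose(r2, 3.0**2 + 4.0**2))

# With equal sides the squared resultant equals 2 * h_i * h_j.
r2_equal = resultant_squared(5.0, 5.0, 90)
print(math.isclose(r2_equal, 2 * 5.0 * 5.0))
```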
Recursively, the general form of Equation (4) ...
As expressed by Equation (1), we obtain a normalized compression distance between the hypotenuse and the two sides of the right triangle, i.e.
$c^2 - (a^2 + b^2) = 0$, (6)
while comparing the hypotenuse with the two sides of the right triangle gives the normalized information distance
$c^2/(a^2 + b^2) = 1$. (7)
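The difference form of Equation (6) and the ratio form that follows it can be checked numerically. This is a minimal sketch; the function names and the sample side lengths are hypothetical.

```python
def difference_form(a, b, c):
    """c^2 - (a^2 + b^2), the left-hand side of Equation (6)."""
    return c**2 - (a**2 + b**2)


def ratio_form(a, b, c):
    """c^2 / (a^2 + b^2), the ratio compared with 1 in the text."""
    return c**2 / (a**2 + b**2)


# A 3-4-5 right triangle satisfies both relations exactly.
print(difference_form(3, 4, 5), ratio_form(3, 4, 5))

# Hypothetical case where c^2 < a^2 + b^2: the ratio falls below 1.
print(difference_form(3, 4, 4), ratio_form(3, 4, 4))
```

The second case shows why the ratio form is convenient: it stays within (0, 1] whenever c^2 does not exceed a^2 + b^2, whereas the difference form is unbounded.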
Transforming Equation (6) and Equation (7) into a distance (dissimilarity) d, we can define the distance as follows [4].

Definition 4. A distance is
where a > 0 and b > 0.
Based on Equation (1), d = 0. Suppose $c^2$ and $a^2 + b^2$ are not the same, or $c^2 < a^2 + b^2$; then 0 ≤ d ≤ 1. With that, we define a proximity or similarity as follows [4].
where |ab| = |a∩b| is the cardinality of a∩b, and |a| and |b| are the cardinalities of the vectors of Ai and Aj, respectively.
Proof. By substituting Equation (8) into Equation (9) we have the formula as follows ... Based on Equation (8) ...

Discussion
Tim Berners-Lee [18] established that each web page has a URL address in canonical form as follows [17]: x = pn-1 or x = pn-1?q, so the URL has n layers and each portion is separated by a slash. Suppose that each URL is duplicated c times, so that u becomes cm/ni, where m is the number of URL parts. Thus, for each query q of a query space q = [q1,q2,…,qk] submitted to a search engine, qj = cjmj/ni, and based on Corollary 1 we obtain $|q| = \sum_{j=1,\dots,ga} (c_j m_j / n_i)^2$.
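The layered view of a URL and the sum of squares above can be sketched as follows. This is an illustration under several assumptions: the host is counted as the first layer, the shared layers are taken as a plain set intersection, and the URLs, the function names `url_layers` and `query_magnitude`, and the numeric values are all hypothetical.

```python
from urllib.parse import urlsplit


def url_layers(url):
    """Split a URL into its host followed by the slash-separated path parts."""
    parts = urlsplit(url)
    path_parts = [p for p in parts.path.split("/") if p]
    # Assumption: the host is treated as the first layer; the query string is ignored.
    return [parts.netloc] + path_parts


def query_magnitude(pairs, n):
    """Sum of (c_j * m_j / n)^2 over the (c_j, m_j) pairs."""
    return sum((c * m / n) ** 2 for c, m in pairs)


# Hypothetical URLs used only for illustration.
u1 = url_layers("http://example.org/papers/2015/similarity.html")
u2 = url_layers("http://example.org/papers/2016/index.html?lang=en")
print(u1)
print(u2)
print(len(set(u1) & set(u2)))  # number of layers the two URLs share

# Hypothetical duplication counts c_j and part counts m_j for n = 4 layers.
print(query_magnitude([(1, 4), (2, 4)], 4))
```

A set intersection ignores the position of a layer; if the order of layers matters, comparing the common leading layers would be an alternative choice.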
Suppose the paper is the object and the title of the paper is the content of the query, as follows: ...
Experiments involving many URL addresses have not yet been done to demonstrate in real terms and appraise the new similarity formula. In addition, each query not only generates the URL addresses of web pages but also presents a piece of information recognizable as a snippet. The similarity therefore concerns not only two or more web page URLs but also the content of the snippets. Hence, if the similarity is based on a two-dimensional concept, then the other information that accompanies the query can be an additional dimension and may be measured based on extra dimensions. This assumption may be expressed as follows.
Conjecture. The similarity between two two-dimensional objects is information about the objects if and only if the resultant of the similarities is the similarity of the multidimensional objects.

Conclusions
We have produced a formula for the similarity between two objects based on the components of the objects. The similarity is based on the two-dimensional concept and on symmetric forms. Applying it to URL addresses will generate a validation appropriate for this similarity. Future work will assess the conjecture.