Decentralized Face Identification with Hierarchical Navigable Small World on Blockchain

This paper presents a novel method for decentralized storage in deep-learning-based face recognition systems using the Hierarchical Navigable Small World (HNSW) algorithm. The proposed solution utilizes Ethereum smart contracts, which act as highly available data storage systems for identifiable data of authorized personnel. In addition, the solution is integrated with a centralized vector database that is in charge of vector indexing, searching, and associating face embeddings with identities on the Ethereum blockchain via anonymous hashes. The vector indexing and search processes involve machine learning algorithms that enable the computations to be carried out in a reasonable time with good matching accuracy. Specifically, we compared different approaches and selected the HNSW algorithm. Accordingly, we successfully implemented a prototype of a reliable and privacy-focused decentralized face identification system for areas under government surveillance, such as customs inspection sites. In our measurements, the system handled 20,000 face vectors easily with high matching accuracy, and the performance could be further improved using more powerful hardware. Finally, we also propose additional methods to further scale the system up to handle millions of face vectors.


Introduction
As the demand for enhanced border security and efficient customs protocols continues to increase, the considerable volume of data produced by these systems poses significant storage challenges. Performing face identification sufficiently fast for human-machine interactivity is particularly challenging on blockchains, given the special nature of blockchain data structures and data validation.
Currently, facial recognition usually involves neural networks that take pictures of faces and generate face embeddings, which are multi-dimensional vectors describing the key features of a human face. For the program to recognize a face in a timely manner, developers cannot simply iterate over each entry in the database, because doing so would take O(n) time. In addition, the Deep Neural Network generates slightly different vectors for the same person each time. Furthermore, for a feasible user experience, the face feature vector similarity search time should be small to unnoticeable, which rules out an exhaustive scan. Building an image index requires that face images from the same person correspond to the same index slot or to similar slots of the index. Therefore, the most important factors in designing an index are ensuring search performance and keeping the hit rate of the index as high as possible. In our exploration of existing research, [5] used the Product Quantization algorithm to build a hash look-up table as an index. However, Product Quantization splits the embedding uniformly, neglecting the relationship between the subvectors. Thus, [6] introduced Optimized Product Quantization, which rotates the vectors to flatten the distribution of values across the sub-vectors in Product Quantization.
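To make the O(n) baseline concrete, the following sketch (our own illustration, not code from the system) performs an exhaustive cosine-similarity scan over synthetic 128-dimensional embeddings; every stored vector must be compared against the query, which is exactly the cost an index avoids.

```python
import numpy as np

def brute_force_search(db: np.ndarray, query: np.ndarray, k: int = 5):
    """Exhaustive O(n) nearest-neighbor search by cosine similarity."""
    # Normalize rows so the dot product equals cosine similarity.
    db_n = db / np.linalg.norm(db, axis=1, keepdims=True)
    q_n = query / np.linalg.norm(query)
    sims = db_n @ q_n                    # one similarity per stored face
    top = np.argsort(-sims)[:k]          # indices of the k best matches
    return top, sims[top]

rng = np.random.default_rng(0)
db = rng.normal(size=(1000, 128))        # 1,000 synthetic 128-d embeddings
query = db[42] + rng.normal(scale=0.05, size=128)  # noisy copy of entry 42
idx, scores = brute_force_search(db, query)
```

Even at 1,000 entries the scan touches every row; the indexing methods discussed next exist to avoid exactly this linear cost.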
Product Quantization and its extension methods solve the problem of variance imbalance in each subspace. However, excessive variance remains in each subspace. With the development of neural networks, [7] combined deep neural networks and Product Quantization approaches, utilizing deep learning to divide the subspaces so as to minimize the quantization distortion of each subspace. Reference [8] optimized the deep quantization network by using orthonormal vectors to enhance the quantization performance and reduce the redundancy of the codewords, mitigating significant intra-class variations.
In addition to the Product Quantization approach to high-speed vector retrieval, vector databases have been widely used to solve these problems. According to the Pinecone website [9], a manual for the efficient implementation of vector databases, it is common for vector databases to build an index and use an Approximate Nearest Neighbor algorithm to search for similar vectors. Commonly used algorithms for generating indexes include Locality Sensitive Hashing, Hierarchical Navigable Small World graphs, the Inverted File Index and Product Quantization. Although Product Quantization compresses the original vector and saves considerable memory, it cannot accurately represent the original vector. On the other hand, the Hierarchical Navigable Small World approach utilizes a graph in which every vertex connects to its nearest neighbors, so search performance is guaranteed. However, the indexing time rises proportionally as the number of vectors in the graph increases, not to mention that the graph occupies a large amount of memory during searches. Using only a single algorithm to generate indices would be inappropriate or sub-optimal because each algorithm has drawbacks. Therefore, some vector databases have adopted composite indexing approaches to address this problem. For example, one could first use the Inverted File Index algorithm to generate cell centroids, and then leverage Hierarchical Navigable Small World graphs to conduct an approximate search among these centroids.

Proposed Methodology
First, we wanted to apply traditional down-scaling techniques; therefore, we investigated how classical machine learning methods might provide insights into categorizing human face features. In other words, we wanted to integrate an algorithm that could partition, for instance, 100 people into five different categories, so that the system could achieve O(n/k) time complexity, where n = 100 and k = 5.
Our experimental outcomes show that K-means Clustering, a classical method for performing Product Quantization, may not be suitable for this type of task owing to quantization distortion [6]. First, we used K-means Clustering to compress the original vectors and searched for the five most similar vectors based on cosine and Euclidean distances. Not only did the correct results never appear, but we also found that the compressed vectors were dissimilar to the original vectors: they had low cosine similarity and high Euclidean distances. Hence, we explored alternative approaches that could produce better search accuracy.
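Although K-means-based compression proved inaccurate for our embeddings, the partitioning idea described above is easy to illustrate. The toy sketch below (our own illustration on synthetic, well-separated data, not our face embeddings) clusters 100 points into k = 5 cells and searches only the nearest cell, giving roughly O(n/k) comparisons per query.

```python
import numpy as np

def kmeans(x, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns centroids and point assignments."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(x[:, None] - centroids[None], axis=2)
        assign = dists.argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):
                centroids[j] = x[assign == j].mean(axis=0)
    return centroids, assign

def cluster_search(x, centroids, assign, query):
    """Search only the cluster whose centroid is nearest: ~O(n/k) work."""
    cell = np.linalg.norm(centroids - query, axis=1).argmin()
    members = np.where(assign == cell)[0]
    return members[np.linalg.norm(x[members] - query, axis=1).argmin()]

rng = np.random.default_rng(1)
# 100 points in 5 well-separated blobs, mirroring the n=100, k=5 example.
centers = rng.normal(scale=50, size=(5, 8))
x = np.vstack([c + rng.normal(size=(20, 8)) for c in centers])
centroids, assign = kmeans(x, k=5)
query = x[7] + rng.normal(scale=0.01, size=8)  # near-duplicate of point 7
```

On real embeddings the cell boundaries are much less clean, which is why quantization distortion hurt our accuracy.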

Selecting Optimal Parameters for HNSW
A Navigable Small World graph [10] is a graph network in which each vertex connects to several neighboring vertices. It is important to ensure that each vertex is linked to adjacent vertices while constructing as few edges as possible. In graph theory, the Delaunay Triangulation [11] algorithm can be employed to generate a graph in which each vertex is associated with nearby vertices using minimal edges. However, the time complexity of this algorithm is very high, meaning that it takes too much time to insert a new vertex or search for the nearest neighbors.
When inserting a new vertex v into the graph, the search algorithm first finds the top M (set on demand) vertices nearest to v before connecting these vertices with v. Algorithm 1 performs excellently in the search process. As shown in Algorithm 1, the search algorithm plays an important role in Navigable Small World graphs. The entry point is randomly chosen, and the search starts from it. In each step of the search process, the algorithm selects the nearest point c in its candidate list and traverses all neighboring points of c. If a neighboring point e has not been visited before, the algorithm compares the distance between e and q (the query point) with the distance between f and q, where f is the furthest point in the results. If e is a better choice, it replaces f in the results. The algorithm stops when the distance between c and q is greater than that of the worst point in the results. In some respects, this algorithm can be regarded as a heuristic Breadth First Search.
In Fig. 1, the green points represent query points. In the early stages of graph construction, the points are sparse. The similarity between points is relatively low; however, each point must connect to M neighbors when inserted, leading to the formation of the solid blue edges. The black dashed lines are shorter because they are constructed later, when the points in the graph are densely distributed. Notably, the solid blue edges were formed in the earliest stages of the graph. As more points are added to the graph, the blue edges no longer link neighboring vertices, yet they are still preserved. These solid blue edges happen to form de facto shortcuts, like "highways". When performing a search, the entry point takes these "highways", enabling a faster approach toward the query point; later, the algorithm uses the black edges to get into closer proximity to the final query point. This mechanism significantly accelerates the search process.
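The search loop described above can be sketched in a few lines of Python. This is our own simplified rendering of Algorithm 1 (the variable names c, e, q and f follow the text; the toy graph here is a plain k-nearest-neighbor graph rather than one grown by insertion), intended only to show the candidate-list/result-list mechanics.

```python
import heapq
import numpy as np

def nsw_greedy_search(graph, vectors, entry, query, k=1, ef=24):
    """Greedy best-first search over an NSW-style graph (cf. Algorithm 1)."""
    dist = lambda i: float(np.linalg.norm(vectors[i] - query))
    visited = {entry}
    candidates = [(dist(entry), entry)]   # min-heap: closest candidate first
    results = [(-dist(entry), entry)]     # max-heap: furthest result on top
    while candidates:
        d_c, c = heapq.heappop(candidates)
        if d_c > -results[0][0]:          # c is worse than the worst result:
            break                         # the stop condition from the text
        for e in graph[c]:
            if e in visited:
                continue
            visited.add(e)
            d_e = dist(e)
            if len(results) < ef or d_e < -results[0][0]:
                heapq.heappush(candidates, (d_e, e))
                heapq.heappush(results, (-d_e, e))
                if len(results) > ef:
                    heapq.heappop(results)  # evict the furthest point f
    return sorted((-d, i) for d, i in results)[:k]

# Toy data: 50 random 16-d vectors, each vertex linked to its 10 nearest.
rng = np.random.default_rng(0)
vectors = rng.normal(size=(50, 16))
pairwise = np.linalg.norm(vectors[:, None] - vectors[None], axis=2)
graph = {i: list(np.argsort(pairwise[i])[1:11]) for i in range(50)}

query = vectors[13] + 0.001               # near-duplicate of vertex 13
nearest = nsw_greedy_search(graph, vectors, entry=0, query=query, k=1)
```

The `ef` parameter here plays the same role as the result-list size discussed later for HNSW: a larger value widens the beam and reduces the chance of stopping at a local minimum.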

Figure 1. Visualization of an NSW graph
However, storing all vectors in a single NSW graph and conducting searches on it is time-consuming. Inspired by the skip list, a multi-layer Navigable Small World structure called the Hierarchical Navigable Small World [12] was invented to take a further step in reducing search time.
To solve this problem, a multi-layer graph structure was introduced, with each layer being an NSW graph.
As the number of layers increases, the number of vectors per layer decreases exponentially. As shown in Algorithm 2, the insertion algorithm of the Hierarchical Navigable Small World [12] is more complex because the vector needs to be inserted into multiple layers. Its subroutine HNSW-KNN-Search-Layer is almost equivalent to NSW-KNN-Search; the only difference is that HNSW-KNN-Search-Layer requires a variable lc to identify the layer currently being searched.
When inserting a new vector q, the algorithm first calculates l, the top layer in which q will exist; q is then inserted into every layer from layer l down to layer 0. As shown in Fig. 2, the probability of generating higher values of l declines exponentially as l increases, because l = ⌊−ln(U(0, 1)) · mL⌋ transforms a uniform distribution into an exponential distribution, whose probability decreases exponentially as l approaches ∞. The algorithm then greedily traverses each graph from the top layer L down to layer l + 1 to find the top ef vectors closest to q, with ef set to 1. Furthermore, in each layer lc from l down to 0, the algorithm traverses the NSW graph to find the efConstruction vectors closest to q. Because q needs to be inserted into the graph, the algorithm chooses the top M nearest vectors to build connections with q. However, the neighbors of q may end up with more than Mmax edges after the insertion; thus, their edge lists must be shrunk.
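The level-assignment rule can be checked empirically. The short sketch below (our own illustration; mL = 1/ln(M) is the normalization constant recommended in the HNSW paper, with M = 16 assumed here) draws levels and confirms that each successive level is roughly 16 times rarer than the previous one, matching the exponential decay shown in Fig. 2.

```python
import math
import random

def assign_level(m_l: float, rng: random.Random) -> int:
    """Draw an insertion level: l = floor(-ln(U(0,1)) * mL).

    Pushing a uniform draw through -ln(.) yields an exponential
    distribution, so high levels become exponentially rare.
    """
    return int(-math.log(rng.random()) * m_l)

rng = random.Random(0)
m_l = 1 / math.log(16)            # common choice: mL = 1 / ln(M), M = 16
levels = [assign_level(m_l, rng) for _ in range(100_000)]
share_per_level = [levels.count(l) / len(levels) for l in range(3)]
# Analytically, P(l = 0) = 1 - 1/16 = 0.9375 and P(l >= 1) = 1/16.
```

In practice this means the vast majority of vectors live only in layer 0, which is what keeps the upper layers sparse and fast to traverse.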
After these steps are completed in layer lc, the results are assigned to ep, meaning that in the next layer's search, the entry point is the list of neighbors found in the previous layer. The reason for passing the whole result set to ep, rather than only the vector nearest to q, is that non-nearest vectors may have neighbors that are closer to q in the next layer. Therefore, an efConstruction that is too small negatively impacts the search accuracy and leads to the oversight of adjacent vectors when inserting a vector. In addition, efConstruction is greater than M, which allows a heuristic algorithm to select the M vectors for building connections rather than simply taking the top M nearest vectors. As shown in Algorithm 3, the HNSW search effectively searches the NSW graph layer by layer; thus, efSearch can be adjusted to an appropriate number to achieve good performance. If efSearch is too small, various vectors with lower distances are omitted during the search. By contrast, if it is set too large, the search time increases drastically and is no longer feasible in production.
Owing to its hierarchical structure, the search time of HNSW is significantly reduced compared with searching a single NSW graph. In the higher layers of the HNSW graph, the vectors are sparsely distributed, enabling the algorithm to spend less time obtaining an approximate nearest neighbor. As the algorithm descends to lower layers, the number of vectors per layer increases drastically, enabling the algorithm to choose an entry point that is near the query vector. Therefore, it is easier to attain the most accurate nearest neighbors using the Hierarchical Navigable Small World approach.

Approximate Nearest Neighbor on Ethereum
As mentioned earlier, vector similarity search is typically performed using Approximate Nearest Neighbor algorithms. However, because these algorithms impose a heavy computational load on index modifications, they may not be feasible for blockchains, particularly the Ethereum Mainnet, given the capacity limitations posed by the design of the blockchain.
Hence, to ensure vector searching performance and optimal operational costs, we suggest using a central vector database that performs periodic indexing and on-demand searching using techniques such as Approximate Nearest Neighbor algorithms. If these functionalities were moved to the Ethereum Mainnet, the process would become slow and costly given the design of the blockchain. Nevertheless, Ethereum recently introduced zero-knowledge rollups, which could enable developers to move heavy computations off-chain [13]. Zero-knowledge rollups could be one of the most feasible solutions for moving face recognition computations onto Ethereum; however, they are beyond the scope of this research.
In addition, to enhance privacy and security, this study utilizes a cryptographically secure random hashing method to produce secure and unpredictable unique identifiers that are solely used to associate information between the vector database and the blockchain. This is a hybrid approach, and a compromise regarding the aforementioned performance problem, in which a centralized vector database performs fast vector searching and the Ethereum blockchain provides reliable and transparent data storage. The theoretical design is illustrated in Fig. 3.
The vector database stores a completely random, unpredictable hash that is used to associate a person's facial features with the data stored on the blockchain. The software on the operator's device obtains the hash, which serves as the person's identifier on the blockchain, and uses it to query the blockchain for personal details such as the official name, gender, and government-issued ID number. A transaction engine should also be used to ensure data consistency between the centralized vector database and the storage in the smart contract.

Facial Embedding Generation
The client transfers facial images to the backend, which uses a Deep Neural Network model to process each facial image into a 128-dimensional vector, also known as an embedding. This vector represents the input face's features; similar faces correspond to highly similar vectors, i.e., a low Euclidean distance and a high cosine score. In summary, when facial images are passed to the backend, a Deep Neural Network called YuNet [14] first receives the images, detects the faces, and returns the coordinates of each face in the images. Subsequently, a Convolutional Neural Network based on SFace [15] extracts the face features to generate face embeddings.
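For L2-normalized embeddings, the two similarity measures mentioned above are two views of the same quantity, since ||a − b||² = 2 − 2·cos(a, b). The sketch below uses synthetic vectors (our own illustration, not real model outputs) to verify this relationship and the same-face/different-face separation it implies.

```python
import numpy as np

rng = np.random.default_rng(0)

def l2_normalize(v):
    return v / np.linalg.norm(v)

# Two synthetic "embeddings" of the same face: one is a slightly
# perturbed copy of the other, mimicking run-to-run model noise.
a = l2_normalize(rng.normal(size=128))
b = l2_normalize(a + rng.normal(scale=0.02, size=128))
stranger = l2_normalize(rng.normal(size=128))

cos_same = float(a @ b)           # high for the same face
cos_diff = float(a @ stranger)    # near zero for an unrelated face
# For unit vectors: ||a - b||^2 = 2 - 2 * cos(a, b),
# so a low Euclidean distance is equivalent to a high cosine score.
dist_same = float(np.linalg.norm(a - b))
```

This equivalence is why either metric can be used interchangeably in the vector database as long as the embeddings are normalized.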

Vector Databases
Our centralized vector database of choice is ChromaDB, an open-source solution. Given how simple it is to get started with, we decided to integrate it into our prototype. ChromaDB mainly uses hnswlib [12], the HNSW implementation by the algorithm's original author, to index and query vectors using the HNSW algorithm.
Introduced in 3.1, the Hierarchical Navigable Small World (HNSW) algorithm [16] improves on the original Navigable Small World (NSW) algorithm by utilizing a hierarchical layout that drastically reduces the search time, to the extent of O(log n), at the cost of memory footprint: HNSW implementations must keep the full proximity graph in memory in order to provide high-performance searching.
In our experiments, hnswlib took over 2 min to index 22,000 vectors when 400 new vectors with 128 dimensions were added to the database. Because HNSW indexing requires considerable time for large datasets, ChromaDB uses a hybrid approach in which a brute-force index acts as a buffer for vectors that have not yet been added to the HNSW graph. Thus, vector searching is always available even when newer vectors have not been added to the HNSW graph. However, scaling this type of hybrid system up to a large-scale, mission-critical application poses significant difficulties.
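The buffering strategy can be sketched abstractly. In the toy class below (our own illustration, not ChromaDB code), the "main index" is a plain numpy array standing in for the HNSW graph; the point is only that queries consult both the indexed vectors and the un-merged buffer, so freshly added faces are searchable immediately.

```python
import numpy as np

class HybridIndex:
    """Toy model of a brute-force buffer in front of a slow-to-build index."""

    def __init__(self, dim, buffer_limit=100):
        self.main = np.empty((0, dim))   # stands in for the HNSW graph
        self.buffer = []                 # vectors awaiting indexing
        self.buffer_limit = buffer_limit

    def add(self, vec):
        self.buffer.append(np.asarray(vec, dtype=float))
        if len(self.buffer) >= self.buffer_limit:
            self.merge()                 # the expensive re-indexing step

    def merge(self):
        if self.buffer:
            self.main = np.vstack([self.main, np.stack(self.buffer)])
            self.buffer = []

    def search(self, query, k=1):
        """Scan both the main index and the buffer so new vectors are visible."""
        parts = ([self.main] if len(self.main) else []) + \
                ([np.stack(self.buffer)] if self.buffer else [])
        pool = np.vstack(parts)
        dists = np.linalg.norm(pool - np.asarray(query, dtype=float), axis=1)
        return np.argsort(dists)[:k]

index = HybridIndex(dim=4, buffer_limit=3)
index.add([0.0, 0.0, 0.0, 0.0])
index.add([1.0, 1.0, 1.0, 1.0])
hit = index.search([1.0, 1.0, 1.0, 1.1])[0]   # found while still buffered
```

Scaling this pattern is the hard part: the merge step grows with the index size, which is the difficulty noted above for mission-critical deployments.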

Random Hash Generation
The purpose of a random hash is to associate face vectors anonymously with their respective blockchain data entries.As shown in Algorithm 4, when a new identity is created, a unique hash is generated using cryptographic methods.The system then inserts an entry into the vector database that associates the vector with this hash.Simultaneously, personnel information should be written to the blockchain along with their unique hash.
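Algorithm 4 maps directly onto the Python standard library: `secrets` provides a cryptographically secure random number generator and `hashlib` the hash function. A minimal sketch of the algorithm (our own illustration, not the exact backend code):

```python
import hashlib
import secrets

def generate_secure_random_hash() -> str:
    """Cryptographically secure, unpredictable identifier (cf. Algorithm 4).

    1. draw random bytes from a CSPRNG,
    2. hash them with SHA-256,
    3. return the hex string that links the vector DB entry to the chain.
    """
    random_bytes = secrets.token_bytes(32)           # CSPRNG output
    return hashlib.sha256(random_bytes).hexdigest()  # 64 hex characters

identity_hash = generate_secure_random_hash()
```

Because the input bytes are unpredictable, the resulting hash reveals nothing about the person it identifies, which is the property the association scheme relies on.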

Ethereum Smart Contract as Data Storage
Considering the complex and resource-intensive nature of Hierarchical Navigable Small World (HNSW) graphs, their implementation in the Ethereum blockchain environment presents significant challenges. Given these factors and the constraints inherent in the Ethereum blockchain, it was deemed more judicious to refrain from attempting such an implementation within the context of our current research project. Thus, the role of Smart Contracts in our research is limited to personal data storage.
To store personal information on the Ethereum blockchain, we first define a structure for the information to be stored for each person, as shown in Listing 1. Structs in Solidity [17], the programming language powering Ethereum Smart Contracts, are similar to structs in C++. Next, because mappings in Ethereum smart contracts are essentially Hash Tables [18] from the contract developer's perspective, the system can retrieve the corresponding information within a reasonable time. Example code defining such a mapping is presented in Listing 2. The mapping key is a random hash generated by the Python backend. If the Python backend recognizes a face after the vector similarity search process, the corresponding hash is given to the user, who reads this mapping with the hash as the key using blockchain APIs.

Listing 2. Mappings in Solidity
mapping(string => IdentityInfo) identities; // hash => single identity

The other implementation details of the smart contract are omitted from this paper, because Solidity provides straightforward data manipulation for structs and mappings.

Use Case Design
In this study, we designed and implemented a use case for fast identity recognition at customs inspection sites, as shown in Fig. 4. The system must store large-scale facial images and achieve efficient image retrieval at the same time. The users are customs officers who use the web interface to verify the identities of unregistered tourists or registered citizens.

Features of the Web Interface
As shown in Fig. 5, the interface has several critical components, one of which is the identification of the person in front of the camera. When the user presses the "Identify" button, the interface captures an image from the camera and sends it to the Python backend, which matches the face against a large-scale face database. If a matching record is found, the web interface further queries the blockchain for the details of the person, using the secure hash returned by the backend. For prototype demonstration purposes, new identities can also be created in this interface. During creation, the interface captures an image from the camera and asks the backend to generate a hash for the face in the image. After obtaining a unique hash, the interface sends the hash along with the name, gender, government-issued ID number and other similar personal information to the smart contract we designed on the Ethereum blockchain, so the data can be stored safely on-chain.
The interface also provides a convenience view that lists all existing identity entries on the blockchain, which the user (operator) can modify or delete. When the web interface removes an identity from the record, the software only needs to delete the corresponding on-chain data and ask the Python backend to delete the associated entries in the vector database.

Evaluation
We wanted to see how the combination of our face recognition model and the vector similarity search engine performs as a complete facial recognition system. Thus, we tested the recognition accuracy from 1,000 vector entries up to 10,000 entries.
The test iterations used at least two images of the same person from the CASIA-FaceV5 dataset [19]: one was added to the vector database for indexing, and (at least) one other was used to test the accuracy of the system. The results are presented in Table 1. The accuracy field in Table 1 gives the proportion of successful matches among all recognition attempts, where a value of 1.0 means 100% accuracy and 0.0 means 0%. The system achieved 98.72% accuracy with 1,000 vectors in the database and 84.53% accuracy with 10,000 vectors; accuracy thus appears to decline roughly linearly as the number of vectors grows. As a proof of concept, this implementation should already be sufficient for most restricted sites that require identity authentication.
After completing our prototype, we were also concerned about the time required for vector searching and for the face vector generation process. Hence, we conducted in-depth performance measurements to determine how the number of vectors influences system stability and performance. In theory, the search time of the HNSW algorithm should grow only logarithmically with the number of vertices in the graph, that is, the number of vectors in the database. In addition, growth in the number of vectors should mainly influence the time required to build the HNSW index.
To verify this theory, we designed several feasible test scenarios. This study used five iterations to test how the memory footprint of the Python backend, the time used for face recognition and the time used for vector similarity search were influenced by different numbers of vectors in the database. To grow the number of vectors, we used images from the IMDB-WIKI dataset [20]. As Table 2 shows, we doubled the number of vectors in the database for each test.

Limitations

The amount of face images used in evaluation.
During our performance measurements, we were only able to gather 60,000 images to test database performance. Although the time complexity of vector similarity search using Approximate Nearest Neighbors grows only logarithmically, the search performance at full scale might still be considerably worse than what we were able to test in this research. In addition, the search accuracy may differ vastly from our measurements in production systems, where there can be millions of face vector entries, which would require manual adjustments to the HNSW hyperparameters. Needless to say, this study only provides a proof of concept that demonstrates the feasibility of our architectural design.

Legality of the designed use case.
The use case is solely for academic research purposes; face scanning and automatic personal data retrieval may be illegal or against constitutional laws in many countries or jurisdictions.This study only shows the concept of how blockchain can be used with a centralized vector database to store sensitive data; thus, the privacy issues of this design are subject to further research.

Future Work
It is important to mention that our implementation relies on the public Ethereum blockchain, where all smart contract data and interactions are visible on the public Internet. Hence, a private blockchain with a design similar to Ethereum's is preferred for protecting sensitive government data. In addition, private blockchains provide more flexibility during development and reduce operational costs, which incentivizes their use in private organizations or government agencies.
Furthermore, we are concerned that the centralized module, i.e., the Python backend component, is the most vulnerable attack surface in this system. If the module were ever controlled by a malicious party, they could control the identity authentication flow. For example, an attacker could identify criminals as ordinary people by associating an ordinary person's hash with the criminal's face vector. To close this attack vector, the system designers should move the vector indexing and search processes to the blockchain. Given that these operations are too heavy to be performed on layer 1, they would need to be moved off-chain and verified later on layer 1. One way to do this is by utilizing zero-knowledge rollups [13], and further research on validating the results of vector indexing with the HNSW algorithm using zero-knowledge proofs should be conducted.
During our research, we became aware that ChromaDB might not be suitable for large-scale deployments, such as our designed use case for internal government services. We chose ChromaDB for building our prototype because it is much easier to deploy and test than other existing solutions. According to the ChromaDB documentation [21], administrators might not be able to scale it up easily; one reason is its use of SQLite for persistent storage, which prevents concurrent writes. To scale the HNSW search system up to millions of vectors, we suggest that system architects scale the search engine horizontally and implement a message queue to digest the vector buffer (vectors that have not yet been added to the HNSW graph), along with a distributed in-memory cache for the HNSW index.
In addition, as discussed in Section 2.2, [7] focuses on using a Deep Neural Network and Product Quantization to compress vectors and conduct an Approximate Nearest Neighbor search. Therefore, it may be possible to build an index based on this approach. According to the paper, the hit rate can reach more than 90% when there are more than 100,000 vectors in the index.
Regarding our design and implementation of the use case, the solution can be further extended to an AR environment in which an AR device actively identifies people in view of the camera, further improving the government agents' experience. Specifically, the AR device can perform face detection locally using deep learning methods to obtain cropped face images and send them to a remote backend. Furthermore, maintainers can deploy the backend on edge computers or serverless cloud platforms to reduce human-machine interaction latency and ensure high availability (HA) of this centralized module. With such an implementation and deployment, we believe the proposed use case could be made production-ready for government agencies.

Conclusion
In our journey of decentralizing face recognition systems, we became aware that vector searching is still immature in the blockchain world. Hence, we look forward to future research on this topic. The introduction of smart contracts can eliminate the attack vector that exists in the Python backend of our design: by defining the authentication flow in smart contracts, security can be easily audited through the blockchain transaction history, and the flow remains immutable given the nature of blockchains.
During this research, we demonstrated that the current technology chain is ready for implementing such a system, which could be further improved for real-life use.

Figure 3. The relationship between the components.

Algorithm 4 Pseudocode For Random Hash Generation
1: procedure GenerateSecureRandomHash
2: Initialize a cryptographically secure random number generator
3: Generate a sequence of random bytes using the generator
4: Initialize a cryptographic hash function (e.g., SHA-256)
5: Compute the hash of the random bytes using the hash function
6: Convert the hash into a string (e.g., a hexadecimal string)
7: Return the string
8: end procedure

Figure 4. The diagram of the solution implementation.

Figure 5. The features of the web interface.

Table 2. Performance Evaluation Results
b Time spent in milliseconds, excluding model loading time.
c Time spent in milliseconds, excluding HNSW index loading time.