Design of network-based biocomputation circuits for the exact cover problem

Exact cover is a non-deterministic polynomial time (NP)-complete problem that is central to optimization challenges such as airline fleet planning and allocation of cloud computing resources. Solving exact cover requires the exploration of a solution space that increases exponentially with cardinality. Hence, it is time- and energy-consuming to solve large instances of exact cover on serial computers. One approach to address these challenges is to utilize the inherent parallelism and high energy efficiency of biological systems in a network-based biocomputation (NBC) device. NBC is a parallel computing paradigm in which a given combinatorial problem is encoded into a graphical, modular network that is embedded in a nanofabricated planar device. The network is then explored in parallel using a large number of biological agents, such as molecular-motor-propelled protein filaments. The answer to the combinatorial problem can then be inferred by measuring the positions through which the agents exit the network. Here, we (i) show how exact cover can be encoded and solved in an NBC device, (ii) define a formalization that allows us to prove the correctness of our approach and provides a mathematical basis for further studying NBC, and (iii) demonstrate various optimizations that significantly improve the computing performance of NBC. This work lays the groundwork for fabricating and scaling NBC devices to solve significantly larger combinatorial problems than have been demonstrated so far.


Introduction
Many combinatorial problems of practical importance require exploring a large number of possible candidate solutions to check for the existence of a correct solution and, if it exists, to find it. One example is exact cover, which arises, for example, in resource scheduling tasks such as airline fleet planning [1] and allocation of cloud computing resources [2]. Exact cover has been shown to be NP-complete, meaning that it is a difficult problem for which all currently existing algorithms scale exponentially with the cardinality of the problem [3]. In some cases, heuristics can be used to try to solve the problem for a subset of the instance space, using some knowledge about the structure of certain types of instances [4,5]. However, these approaches cannot provide any guarantees on the runtime performance, and there will always be instances that require very long runtimes. Another approach is to relax the problem and look for approximate solutions [6], with no guarantee of finding the optimal solution. An approximate heuristic solution has been demonstrated to have a quantum advantage on a noisy quantum computer [7,8]. However, this approach still suffers from potentially long run times and yields only approximate solutions.
Here, we focus on a computing strategy that explores the entire solution space utilizing an energy-efficient, parallel-computation approach to solve exact cover. Although such approaches still require exponential computing resources (energy, memory, computing cores), they do not necessarily require exponential time, because the computing is performed in parallel. As an added benefit, individual computing steps can be performed relatively slowly, making it possible to use vastly less energy in total compared to serial computing at the necessary high frequency [9]. Specifically, we use network-based biocomputation (NBC), an approach by which a mathematical problem is encoded into a graphical, modular network embedded in a nanofabricated planar device [10]. Exploring the network in a parallel fashion using a large number of independent biological agents, such as molecular-motor-propelled protein filaments, can then solve the mathematical problem, as demonstrated for subset sum [11] (see figure 1 for an explanation of the subset sum network algorithm). This approach, using molecular motors, has the critical advantages that it uses orders of magnitude less energy per operation than conventional computers and that it has the potential to be scalable with presently available technology. In particular, the protein filaments that act as both computing core and memory are self-assembling and cheap to produce in large numbers.
Figure 1. Network encoding the subset sum instance {2, 5, 9}. Agents enter the network from the top-left corner. Cyan circles represent split-junctions where it is equally probable that agents continue straight ahead or turn. Unmarked junctions represent pass-junctions where agents continue straight ahead. Moving diagonally down at a split-junction corresponds to adding that integer (numbers 2 and 9 for the purple example path). The actual value of the integer potentially added at a split-junction is determined by the number of rows of junctions until the next split-junction. The exit numbers correspond to the target sums T (potential solutions) represented by each exit; correct results for this particular set {2, 5, 9} are labeled in green, and incorrect results (where no agents will arrive) are labeled in magenta.
Here we report on the encoding of exact cover into a network format suitable for solving the problem by NBC. Given a collection S of subsets, each containing elements of a target set X, exact cover asks whether a subcollection S* of S exists such that each element in X is contained in exactly one subset in S*. In other words, exact cover has a solution when the subsets in S* (i) are pairwise disjoint (i.e. S_i ∩ S_j = ∅ for all S_i, S_j ∈ S* with i ≠ j) and (ii) yield X when joined together. In the following, we will consider exact cover as a decision problem, meaning that we only want to know whether a solution exists. This information can be used to find the exact solution by solving a partially fixed variation of the decision problem |S| times, each time enforcing the use of one set in the solution collection. By checking whether each fixed variation has a solution, the exact solution can be identified with only a linear increase in computing effort [3]. [...] would not be a solution because '1' is missing and '4' is in two subsets. A more practical example would be an outgoing airplane that needs X = {captain, first officer, purser, cabin crew}. The corresponding collection of subsets S would be formed by several incoming airplanes that each have their own varying crew combinations that need to be matched to X.
Figure 2. [...] Here, 1 is mapped to the most significant bit and 4 is mapped to the least significant bit. Two examples of exact cover instances (X_1, S_1 and X_2, S_2) are given. X_1, S_1 has a solution (highlighted in green) and X_2, S_2 does not. (C) Example network block that encodes the subset 0011. Agents arriving at input 0100 (blue) encounter a split junction (cyan rectangle) that allows the agents to choose between the path straight down (not combining the sets 0100 and 0011) and the diagonal path that combines the sets to leave at output 0111. The same is true for input 0000, where agents leave at outputs 0000 or 0011. Because input subsets containing identical elements shall not be combined (as this would violate the rules of exact cover), all inputs that contain elements already present in the encoded subset start with a reset junction (red rectangle). This junction forces the agents to take the path straight down, i.e. not combining the input subset with the encoded subset. For example, an agent entering at input 0001 (red rectangle) can only leave at output 0001. The same is true for inputs 0010 and 0011, which force agents to leave at outputs 0010 and 0011, respectively. The input and output rows are separated by two rows of pass junctions (example denoted by blue rectangle) that force agents to remain on their chosen paths (diagonal or straight down). The number of pass junction rows is always one fewer than the binary value of the encoded subset. Cyan circles mark positions of split junctions, red rectangles mark positions of reset junctions. (D) Scanning electron micrographs of examples for each junction type. The color of the frame corresponds to the rectangles in C.
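The reduction from the decision problem to an explicit solution, as described in the introduction, can be sketched in a few lines. The sketch below is illustrative only: subsets are represented as bitmasks, `has_exact_cover` is a hypothetical brute-force stand-in for the NBC device acting as a decision oracle, and the function names and the example instance are assumptions that do not appear in the original work.

```python
from itertools import combinations


def has_exact_cover(target, subsets):
    """Decision oracle (brute-force stand-in for the NBC network, an assumption).

    Returns True if some subcollection of `subsets` (bitmasks) is pairwise
    disjoint and its union equals `target`.
    """
    for r in range(len(subsets) + 1):
        for combo in combinations(subsets, r):
            union = 0
            disjoint = True
            for s in combo:
                if union & s:          # element already covered: not disjoint
                    disjoint = False
                    break
                union |= s
            if disjoint and union == target:
                return True
    return False


def find_exact_cover(target, subsets):
    """Recover one exact cover using 1 + |S| calls to the decision oracle."""
    if not has_exact_cover(target, subsets):
        return None
    chosen = []
    uncovered = target
    for i, s in enumerate(subsets):
        rest = subsets[i + 1:]
        # Enforce the use of s: it must fit into the uncovered elements,
        # and the later subsets must exactly cover what is left.
        if s & ~uncovered == 0 and has_exact_cover(uncovered & ~s, rest):
            chosen.append(s)
            uncovered &= ~s
    return chosen


# Hypothetical instance with X = {1, 2, 3, 4}; element 1 is the most significant bit.
print(find_exact_cover(0b1111, [0b1100, 0b0011, 0b0110, 0b0001]))
```

Each subset is fixed at most once, so the number of oracle calls grows linearly with |S|, in line with the statement above.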
In this paper, we describe an algorithm that takes an instance of exact cover (i.e. a given target set X and a collection of subsets S) as input and creates an NBC circuit that allows the underlying decision problem to be solved by counting the number of agents leaving the network at predefined exits. Moreover, we provide the formal semantics of a computational model that captures the networks used to solve the exact cover problem. This model is suitable for studying the computational power of the NBC paradigm and forms the basis for applying formal verification algorithms to prove the logical correctness of NBC circuits.

Results
The aim of the network algorithm is to enable agents exploring the network (figure 2(A)) to find the solution by randomly choosing all possible subcollections of S and checking whether any subcollection exactly covers X. The result for an exact cover instance encoded by a particular network is then given by whether or not agents arrive at the exit that corresponds to the target set X. To achieve this, our algorithm for converting exact cover into network format performs the following steps: [...] A detailed description of how the network algorithm works can be found in figure 2. Two example networks, one encoding an instance that has a solution and one encoding an instance that does not, are shown in figure 3.
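The logical behavior of the network blocks described in the caption of figure 2 can be emulated in software. The following is a minimal sketch, assuming that subsets are encoded as bitmasks, that one block is traversed per subset, and that an agent's horizontal position is the bitmask of the elements it has collected so far; the function names are illustrative and are not taken from the original work.

```python
def reachable_exits(subsets, entrances=(0,)):
    """Return every exit position that an agent can reach.

    subsets:   bitmasks, one network block per subset, in network order
    entrances: starting positions (the single entrance 0 by default)
    """
    positions = set(entrances)
    for s in subsets:
        next_positions = set()
        for p in positions:
            next_positions.add(p)          # straight down: do not combine
            if p & s == 0:
                # Split junction: the diagonal path combines the two sets.
                # For disjoint sets, p | s equals the sum p + s.
                next_positions.add(p | s)
            # Otherwise a reset junction forces the straight path.
        positions = next_positions
    return positions


def network_has_solution(target, subsets):
    """The instance has a solution if the target exit is reachable."""
    return target in reachable_exits(subsets)


# Block encoding the subset 0011, as in the caption of figure 2(C)
print(sorted(reachable_exits([0b0011], entrances=[0b0100])))
```

In the physical device all paths are explored stochastically by many agents, whereas this sketch enumerates the reachable positions deterministically.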
Because the network is explored stochastically, it should be noted that the answer is always probabilistic, i.e. its certainty depends on the problem size, the number of agents and the probabilities of making choices at the split junctions [13]. For example, if the split junctions split 50/50 and there are no errors at pass junctions, the examples given in figure 3 would require approximately 64 agents to give the correct answer with 95% confidence. If the agents can make wrong turns at pass junctions, the certainty of the answer depends additionally on the associated error rate and on the number of traversed pass junctions [11].
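To illustrate how the number of agents relates to the confidence of the answer, the sketch below uses a deliberately simple model: every agent is assumed to reach the target exit independently with the same probability `p_hit`, and pass junctions are assumed to be error-free. This is not necessarily the analysis used in [13], the function name is hypothetical, and the sketch makes no attempt to reproduce the figure of 64 agents quoted above, because the per-agent probability for the instances in figure 3 is not stated here.

```python
import math


def agents_needed(p_hit, confidence=0.95):
    """Smallest N such that at least one of N independent agents reaches the
    target exit with probability >= confidence, i.e. (1 - p_hit)**N <= 1 - confidence.
    """
    if not 0.0 < p_hit <= 1.0:
        raise ValueError("p_hit must be in (0, 1]")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be in (0, 1)")
    if p_hit == 1.0:
        return 1
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_hit))


# Example: a single 50/50 split junction leading to the target exit
print(agents_needed(0.5))
```

Under this model the required agent number grows roughly in inverse proportion to `p_hit`, which is why reducing the number of paths through the network also reduces the number of agents needed.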

Network optimization
A main factor determining the speed and cost of the calculation is the physical size of the network. In this section we will introduce several optimization steps that each reduce the size of the network that needs to be built, particularly the number of rows. All of these optimizations improve the performance of the NBC devices by reducing the time, energy and number of agents needed for exploring the network and solving the respective exact cover instance.
Bit-mapping optimization. The mapping between the elements in the problem sets and the binary numbers is arbitrary from a mathematical point of view, but it affects the value of the binary number, which in turn determines the physical size of the resulting network. Therefore, we can reduce the size of the network (and thus the calculation time and energy) by optimizing the mapping between bits and elements in X (figure 4(B)): we ordered the elements in X by how frequently they appear in the subsets in S. The least frequent elements were assigned to the highest-value bits (i.e. bits resulting in large decimal numbers) in the binary number and the most frequent elements were assigned to the lowest-value bits. Figure 4 shows an example where this bit-mapping optimization reduced the size of the network from 48 rows to 26 rows (compare figure 4(A) and (B)), decreasing the time and energy needed to solve the network by approximately 50%.
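The frequency-based bit mapping can be sketched as follows. The row count used here, the sum of the binary values of all encoded subsets, is an assumption inferred from the caption of figure 2 (each block has one row of split or reset junctions plus a number of pass-junction rows equal to the binary value minus one). The instance and all function names are hypothetical.

```python
from collections import Counter


def frequency_bit_mapping(universe, subsets):
    """Assign the most frequent element to the lowest-value bit."""
    counts = Counter(e for s in subsets for e in s)
    # Stable sort: ties keep their order in `universe`.
    ordered = sorted(universe, key=lambda e: -counts[e])
    return {e: 1 << i for i, e in enumerate(ordered)}


def encode(subset, mapping):
    """Binary value of a subset under a given element-to-bit mapping."""
    return sum(mapping[e] for e in subset)


def total_rows(subsets, mapping):
    """Assumed network height: the sum of the encoded subset values."""
    return sum(encode(s, mapping) for s in subsets)


# Hypothetical instance; the naive mapping puts element 1 on the most significant bit.
universe = [1, 2, 3, 4]
subsets = [{1}, {1, 2}, {1, 3}, {2, 4}]
naive = {1: 8, 2: 4, 3: 2, 4: 1}
optimized = frequency_bit_mapping(universe, subsets)
print(total_rows(subsets, naive), total_rows(subsets, optimized))
```

Counting and sorting are cheap compared with building the network, so this step adds little to the overall design effort.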
Entrance optimization. Another factor determining the time needed to solve an exact cover instance is the number of filaments that can be fed into the network simultaneously. Therefore, we devised a method to increase the number of entrances into the network. We begin by noting that the main purpose of the network is to calculate combinations of subsets. These combinations are calculated at split junctions. Subsets that cannot be combined with each other (because they contain identical elements) are connected by reset junctions. If this happens at the start of the network, the respective subsets only share a split junction with the empty set 0 (this is the case for the sets 0001, 0101, 0011 and 1011 in figure 4(B)). Consequently, the result of the network does not change if we remove the network sections encoding these subsets and use the numbers corresponding to the respective subsets as entrances into the network (figure 4(C)). The resulting network has only six rows and five entrances. Thus, this 'entrance optimization' reduces the time needed to solve an exact cover instance by roughly a factor of twenty (a quarter of the rows and five times the entrances).
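One way to sketch the entrance optimization is to take the longest leading run of subsets that all overlap pairwise and turn them, together with the empty set 0, into entrances. The first four bitmasks below are the sets named in the text; the last two (0100 and 0010) are invented for illustration so that six rows remain, and the function names are assumptions.

```python
def entrance_optimization(masks):
    """Split `masks` into (entrances, remaining network blocks).

    The leading subsets that pairwise share at least one element can only be
    combined with the empty set, so each of them becomes an entrance.
    """
    prefix = []
    for m in masks:
        if all(m & p for p in prefix):
            prefix.append(m)
        else:
            break
    return [0] + prefix, masks[len(prefix):]


def reachable(masks, entrances):
    """Exits reachable through blocks `masks` when starting at `entrances`."""
    positions = set(entrances)
    for m in masks:
        positions |= {p | m for p in positions if p & m == 0}
    return positions


masks = [0b0001, 0b0101, 0b0011, 0b1011, 0b0100, 0b0010]
entrances, remaining = entrance_optimization(masks)
print(len(entrances), sum(remaining))  # number of entrances, assumed remaining rows
```

The check that the optimized network reaches exactly the same exits as the full network is what justifies removing the leading blocks.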
Entrance and bit-mapping optimization. We can combine entrance optimization with bit-mapping optimization: because the entrances do not contribute to the number of rows, we rearrange the bit mapping such that the most frequently appearing element in X is encoded not by the lowest-value bit but by the highest-value bit. Thus, the corresponding large decimal numbers are used as entrances, further reducing the number of rows in the network to 3 (figure 4(D)). The optimal combination of bit-mapping and entrance optimization reduces the time needed to solve a network by a factor of 80 (5 entrances times 48/3 rows).
Accessible entrance optimization. The target exit can only be reached from entrances whose value is at least the target exit minus the total sum of the numbers encoded in the network. In the example given in figure 4, the cutoff is 1100. Any entrance that is further away from the target exit than this cutoff can be removed. In the example given in figure 4, this leaves us with a tiny network that has only three rows and three columns.
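The cutoff can be computed directly from the target exit and the numbers that remain encoded in the network. In the sketch below the cutoff 1100 follows from the target 1111 and three remaining rows, as in the text; the list of entrances and the two remaining subsets are hypothetical, and entrances exactly equal to the cutoff are kept, which is the conservative choice.

```python
def accessible_entrances(entrances, remaining, target):
    """Keep only entrances from which the target exit can still be reached.

    An agent can move right by at most sum(remaining) positions, so any
    entrance below target - sum(remaining) can never reach the target.
    """
    cutoff = target - sum(remaining)
    return [e for e in entrances if e >= cutoff]


# Hypothetical entrances; remaining blocks 0001 and 0010 give three rows in total.
kept = accessible_entrances(
    [0b0000, 0b1000, 0b1100, 0b1101, 0b1110], [0b0001, 0b0010], 0b1111
)
print([format(e, "04b") for e in kept])
```

This filter is only a necessary condition: an entrance that passes it may still fail to reach the target because of reset junctions.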
To better illustrate the power of our optimizations, we picked substantially more difficult problems with |S| = 9 and |X| = 6 (figure 5), one that has a solution (figure 5 left column) and one that does not (figure 5 right column). The optimizations reduced the two network sizes from 268 and 252 rows and columns to 25 rows and 24 columns, respectively (figures 5(A) and (B)). At the same time, the number of entrances increased from one to five. Overall, the time needed to solve these instances is reduced roughly by a factor of 50 (a tenth of the rows and five times the entrances). The correct paths (figure 5 blue lines) indicate that agents arrive at the target exit of the network that encodes an instance that has a solution (figure 5(A)) but no agents arrive at the exit of the network that encodes an instance that does not have a solution (figure 5(B)).
Reverse exploration optimization. Since we know the target exit, we can split the network into a top and a bottom part, each encoding only half the total number of sets represented in the network (figures 5(C) and (D)). That way, the network is explored both from the entrances and, in reverse, from the exit. We termed this 'reverse exploration' optimization. In our current example, this means that the top network encodes the sets {1, 5} and {6, 5, 4} (corresponding to the binary numbers 001 010 and 000 111), while the bottom network encodes the sets {5, 4} and {6, 4} (corresponding to the binary numbers 000 101 and 000 011) (figures 5(C) and (D)). The top network still has the same entrances as the whole network, while the bottom network has only one entrance (corresponding to the target exit 111 111). This optimization reduces the total number of paths that (potentially) need to be explored from 80 possible paths (5 × 2^4) to 24 possible paths (6 × 2^2). However, it comes at the cost that more exits (16 instead of one) need to be monitored. The benefit of a reduced agent number (and thus reduced time needed to feed agents into the network) scales exponentially with |S|, while the cost of increased exit monitoring scales approximately linearly. Furthermore, the reduced number of rows in each network will exponentially reduce the number of agents that make an error in case of imperfect pass junctions.
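In software terms, reverse exploration resembles a meet-in-the-middle search: the top half is explored forward from the entrances, the bottom half backward from the target exit, and the instance has a solution if the two explorations share a position. The sketch below is an illustration under that reading; the function names and the example instances are assumptions.

```python
def forward(entrances, masks):
    """Positions reachable from `entrances` after traversing blocks `masks`."""
    positions = set(entrances)
    for m in masks:
        positions |= {p | m for p in positions if p & m == 0}
    return positions


def backward(target, masks):
    """Positions above blocks `masks` from which `target` can be reached."""
    positions = {target}
    for m in reversed(masks):
        # Undo a diagonal step: only possible if m is fully contained in p.
        positions |= {p & ~m for p in positions if p & m == m}
    return positions


def solvable_by_reverse_exploration(entrances, masks, target):
    """True if the forward and backward explorations meet in the middle."""
    half = len(masks) // 2
    top = forward(entrances, masks[:half])
    bottom = backward(target, masks[half:])
    return bool(top & bottom)


# Path counts quoted in the text: 5 entrances and 4 sets, split into 2 + 2
print(5 * 2**4, (5 + 1) * 2**2)
```

In the physical device the comparison of the two halves corresponds to monitoring which exits of the top and bottom networks are reached, which is why more exits need to be read out.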

Discussion
We successfully encoded exact cover into a network format suitable for solving the problem by NBC. Exact cover is the second NP-complete problem successfully encoded into network format, demonstrating that NBC can solve a range of different problems.
The size (i.e. the number of rows) of the exact cover networks presented here scales approximately linearly with the number of sets in S (see below for more detailed scaling considerations). However, it is important to note that the networks scale exponentially with the number of elements in X. This is a consequence of the unary encoding of the binary numbers representing each subset in S. For the placement of reset junctions, only split junctions (i.e. a fraction of all junctions in the network) need to be tested locally, indicating that the algorithm for placing reset junctions scales linearly with the size of the network.
We demonstrated how automated optimizations can be used to make the network design more efficient, while preserving the correctness of the circuit. Each optimization step (bit mapping, entrance, reverse exploration) reduces the network size (and the run time) for random problems on average by a factor of 2. The agent number is reduced exponentially (the problem size is effectively halved) by both entrance and reverse exploration optimizations.
Crucially, the algorithm that designs the network scales itself only polynomially, not exponentially, with problem size. It requires counting the frequency of each element of X in S, which scales with |X| · |S|, as well as sorting the resulting counts, which scales with |S| log |S| (see [14]). Overall, the algorithm that encodes exact cover into optimized network format scales with 2^|X| · |X| · |S| · log |S| for randomly generated instances.
The run time scales with 2^|X| · |S| and the number of agents scales with 2^|S|. Thus, as long as |S| > |X|, the majority of the computation is done by the agents traversing the network. In this case, the computation will benefit from the fact that many agents can perform calculations in parallel. However, if |X| > |S|, the majority of the computation will be performed in designing the network. In that case, it will be more efficient to solve the respective instance on an electronic computer.
The design of the exact cover network presented here is more complex than the design for the subset sum problem network introduced in [11], which raises a challenge of ensuring logical correctness of the exact cover network and new networks that will be designed in the future. In order to formally verify the correctness of our exact cover network encoding, we defined the formal semantics of NBC circuits for exact cover networks [15] using a transition system [16], which provides a mathematical model for the computation of the circuit. Transition systems are used as a basic computational model for defining dynamic behavior of both software and hardware systems. Constructing transition systems for an NBC circuit provides a formal definition of the computations enabled by this new paradigm, and forms the basis of a formal verification method and tool that allowed us to mathematically prove the correctness of exact cover circuit instances using model checking algorithms [15].
This work lays the groundwork for fabricating and scaling NBC networks that solve exact cover instances significantly larger than the proof-of-concept instance [11] solved by NBC so far. Compared to the instance with three numbers and eight possible solutions solved in [11] (see also figure 1), the optimizations enable us to encode an instance with 9 subsets (512 possible solutions) with a network that is only ∼50% larger (see also figure 5). This demonstrates that it is possible to implement optimized algorithms in NBC, extending the range of problems that can be solved in practice.