Dissipation and irreversibility in computing

There has been much discussion for decades over the proper concepts of minimum dissipation per logic gate and what is required for computation, as well as early arguments over logically reversible machines. Here these arguments, and the thermodynamics related to them, are discussed in terms of what is required for a computing machine. This gives a set of requirements for a valid computer that arise already from Turing, and a set of limitations on energy dissipation and entropy. First, the requirements set by Turing on what can and cannot be a computing machine are introduced, and it is discussed how these limits affect not only the machine but also individual bits, leading to a general need for irreversibility. Then, it is shown that there is a minimum dissipation for a bit operation, imposed both by Turing's requirements and by noise in the system. Finally, it is shown that information entropy differs from physical entropy, and that care must be taken in trying to connect the two quantities. How these requirements, and new ones, affect quantum computation is also discussed.


Introduction
The idea of calculating machines is probably as old as recorded history. The earliest such machine is probably the abacus, which dates at least to the ancient Sumerians. Various forms were certainly used throughout the Middle East, and had spread as far as China and the Roman Empire by the first century CE. The jump from these simple machines for addition and subtraction to ones capable of multiplication probably occurred much later with Schickard and Pascal [1]. These calculating engines were developed significantly further and became electrically powered over the intervening years. They perhaps reached their peak with the tabulating machine developed by Hollerith and mastered by IBM as its accounting machines, which were programmed with plug-boards.
The step to computers was taken by Charles Babbage in the early 19th century, first with his difference engine and then with his analytical engine [2]. These machines were both mechanical computers, but the nature of the machine tools at the time probably hindered his ability to actually build the entire systems. But, he was far ahead of his time. It would be almost a century later, in the fourth decade of the twentieth century, that progress would begin both with computing machines and with a conceptual basis. The idea of building an electronic computing machine seemed to appear in multiple places around 1935-37: in Berlin, Ames (Iowa), Philadelphia, and at IBM. But, it was the work of Alan Turing that set in place a more rapid development of computer theory [3], as his conceptual computing machine contained all the essential elements of the modern computer: memory (his tape) and a central processing unit (his automaton).
It is not the aim here to spend time on the development of the computer as it progressed from these beginnings to the massive machines of today. While there are a great many textbooks on computer design, even for quantum computers, there is little discussion of the physical and logical constraints. Here, the goal is to address some physical and logical constraints that exist in any computer. In considering these constraints, Turing's work will provide some key limitations. In the next section, the physical requirements and limits for computation are first discussed. With the renewed interest in the discussion of fundamental limits of energy dissipation in computation, we would like to present an in-depth treatment of existing misconceptions and controversies. Then, the discussion turns to the logical requirements which absolutely forbid reversible logic/computing if one is to believe Turing. Finally, the minimum energy dissipation that must occur in each step is discussed, the concept of entropy is covered, and some conclusions are drawn. While the discussion here will mainly be kept in the realm of classical physics, quantum computing will be discussed in the context of the various topics under consideration. Clearly, a classical bit with clearly defined states of 0 and 1 is very different from a qubit with analog values between 0 and 1, so that they cannot be discussed 'in the same breath,' so to speak.

Turing's views on computing
Computational systems generally are those which implement a Turing machine [3], which is considered to be a general-purpose computer. In the normal sense, a Turing machine is composed of an automaton (the head) and a usually finite-dimensional tape (the memory; as the tape may be written to an infinite number of times, there is no strict need for the tape to be infinite). There are many reasons for this machine to be irreversible, both physically and logically. First, the tape in the Turing machine must store information and keep it from dissipating, or thermalizing, the latter of which randomizes all information that may have been on the tape. If the information is to be maintained, then the tape must be maintained in a non-equilibrium state that itself is ordered due to the stored information [4]. Any symbols on the tape, which might need to be changed, must follow certain logical rules and must be secure from being disordered by random thermal noise. Secondly, one wants the computation to proceed to a logical conclusion, which yields the numerical answer for the computable number, the solution to the Entscheidungsproblem [3]. This problem is closely related to the 10th in the list of 23 unsolved problems that David Hilbert described in a lecture at the international mathematics congress in Paris in 1900 [5]. The problem asks whether a statement can be proven to be true or false given the logical axioms that are assumed; but it is interpreted as asking whether or not a number is computable from a given set of initial axioms [6,7].

The Turing machine
The above physical constraints were actually discussed by Turing in describing his machine. Turing called his central unit the automatic machine, or a-machine, and finally just a machine. In addition to this machine, he had a tape upon which symbols were read, written, and stored. These two parts would today be called the central processing unit (cpu) and the memory, irrespective of whether this was a cache level or the long-term memory. The symbols written on the tape could be of two kinds: normal binary numbers, consisting of 0 and 1 symbols, and symbols which were not of this kind. At the end of the computation, the real binary number prefaced by a decimal point was called the number computed by the machine. Importantly, the machine could remember symbols previously read from positions on the tape other than the current one, which we would call using a storage register that is part of the cpu. It was also important that there were no blanks stored on the tape until the end of the computation was reached, at which point a blank would follow the computed number. This would then cause the machine to stop.
While Turing went to great lengths to describe the full details of the operation of his computing machine, these are not of any great interest here. On the other hand, certain properties of the machine will be very important. For example, for every sequence of operations, there was at least one descriptive number, but the descriptive number was unique in that it described only one possible sequence of operations. This meant that the computable sequences were enumerable, and the machine was of finite size; hence the name finite state machine. Turing went to lengths to point out that the machine could not run in circles, and thus any machine which computed a number must be circle-free. This meant that each independent sequence in the computation came to an end after a finite number of steps. The state of the entire machine depended upon both the state of the cpu and that of the tape.
The key steps of the Turing machine, and thus for any computer which is supposed to be an implementation of a Turing machine, seem to be that the sequences are circle-free and that they have a stop/end state. The entire computation would be ended when the machine encountered that blank space on the tape, with the answer sitting just before the blank. The solution to the Entscheidungsproblem, or to any general program on the Turing machine, requires the computation to proceed to a logical conclusion and yield the numerical answer, which means there is a preferred direction of the arrow of time, which by its very nature implies that the system is a non-equilibrium system. This preferred direction of time is achieved by the application of a force, which already leads to a phase transition that breaks time-reversal symmetry, and requires consumption of energy and dissipation [8].

On bits
Let us extend Turing's ideas by stating what we mean by a computer and by computation. A real-world computer is some physical system that is used to represent information by the state of some physical quantity, and which can store and manipulate these states according to rules of logic. Specifically, we focus here on the binary representation of information, i.e. bits. Bits can be mechanical, electrical, magnetic, etc. A computer, as a physical system, is subject to the laws of physics, which describe its dynamics through its degrees of freedom. It is important here to distinguish between the information-bearing (IB) degrees of freedom, which define the bits, and the not-information-bearing (NIB) degrees of freedom, which describe the environment in which the bits are embedded. The macroscopic bit states are carefully designed at the microscopic level such that one can maintain and manipulate them in the presence of microscopic random thermal excitations.
Just to be clear, there are many, many more NIB degrees of freedom than there are IB degrees of freedom. A piece of computing machinery with a mass of about 1 kg might have on the order of $10^{25}$ degrees of freedom, but we only care about the bit states, which might be on the order of one billion, a tiny fraction of the total number of degrees of freedom. For example, in an electronic computer, the bits might be represented by charges on a capacitor, and information is represented by the presence or absence of these charges. There are many more degrees of freedom associated with the detailed microscopic state of the metal plates, the oxide, or the substrate, but we do not care about these degrees of freedom as long as the information-bearing charge on the capacitor is maintained. All we care about is that the 'environment' maintains the bit state and keeps it from being randomized due to thermal excitations and/or interactions with NIB states. We also emphasize that we do not care about the precise configuration of the charges on the capacitor; all we care about is whether charges are there (or not there), regardless of their precise microscopic state.
As another example, bits might be represented by the presence (or absence) of gas molecules in a container, an example widely discussed in the literature in the context of the link between computation and thermodynamics. A bit might be represented by two gas containers, which are connected to each other, and equipped with a mechanism to transfer the gas between them (such as a door, or a mechanical plunger). If the gas is in the left container, we might call this bit state 0, and 1 if the gas is in the right container. If we assume perfectly elastic collisions of the gas molecules with themselves and with the walls, this is an example of bits that can be manipulated in a dissipationless fashion (which, of course, follows from the assumption of elastic microscopic processes and frictionless motion of the plunger). We also emphasize here, as we did above for the charge bit, that we do not care about the precise microscopic configuration of the gas molecules in a container, just whether the gas is in one container or the other. In other words, bit information is represented by the macroscopic center of mass of the gas volume, and not by its myriad microscopic configurations that are compatible with that macroscopic state.
In order for a computer to work properly, the bit states need to be maintained, i.e. they need to be protected against randomly flipping due to thermal excitations. In physical terms, a bit can be represented by a potential landscape with two minima [9], each representing one of the two possible bit states, and separated by a potential barrier. In the presence of a thermal environment with temperature T, there will be a certain probability that the bit might be excited over the barrier, i.e. that the bit state changes due to random thermal fluctuations and not due to the intended computational algorithm. Clearly, this is undesirable, and the potential barrier needs to be made sufficiently high for a desired error tolerance. For the example of a charge bit, this potential barrier is due to the work function of the capacitor metal, which prevents the electrons from leaking away. Bit errors are not only important for computation, but also for communication. If a message with a certain number of bits is transmitted through a noisy channel, i.e. a channel where bits flip with a certain (hopefully small) probability due to random thermal noise, transmission errors result, and error-correction techniques have been developed in communication theory. Shannon considered this problem and established fundamental bounds, which formally look similar to the formalism of physical thermodynamics [10]. A nice discussion of this topic can be found in the Feynman Lectures on Computation [11].
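To put rough numbers on this barrier-height requirement, the following minimal sketch assumes a simple Arrhenius-type estimate, in which the probability that a stored bit is thermally excited over a barrier of height $E_b$ scales as $\exp(-E_b/k_BT)$ per escape attempt; the barrier heights are illustrative choices, not values taken from any particular technology.

```python
import numpy as np

kB = 1.380649e-23   # Boltzmann constant (J/K)
T = 300.0           # room temperature (K)

# Arrhenius-type estimate: a thermally activated flip over a barrier
# of height E_b occurs with probability ~ exp(-E_b / kB*T) per attempt.
for n in [1, 10, 20, 40, 60]:
    Eb = n * kB * T
    p_flip = np.exp(-Eb / (kB * T))
    print(f"barrier = {n:2d} kT  ->  flip probability ~ {p_flip:.2e} per attempt")
```

With a billion bits attempting flips at GHz rates, barriers of many tens of $k_BT$ are needed for negligible error rates, which is why practical logic swings sit far above the thermal scale.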

Why not reversibility
Some five decades ago, when the idea of quantum computing first appeared [12,13], there arose a belief that logically reversible computers could be achieved, which themselves might be physically reversible [14]. The arguments of the previous section were emphatic that real computers must dissipate energy not only to maintain the desired state in the presence of noise, but also because of a preferred direction of time that leads from a start state to a final stopping state where the answer is displayed. The suggestion for reversible machines follows a misunderstanding of Landauer's thesis [9]. Landauer pointed out that logical irreversibility requires physical irreversibility, but the inverse of this does not hold; i.e., logical reversibility does not imply physical reversibility (these two concepts will be explored further in section 3). As Landauer put it [9], any physical degree of freedom is associated with $k_BT$ of thermal energy, and any switching signal must have this much energy to override the thermal noise associated with that degree of freedom. He questioned whether this energy had to be dissipated.
Introductory textbook quantum mechanics may be reversible, but only because the focus there is on Hamiltonians with conservative potentials; the dynamics is then reversible by assumption. However, the real world has a definite arrow of time due to a much richer dynamics [15]. Even the early work of Deutsch recognized the irreversibility that would occur in the quantum computer [13].
Landauer [9] pointed out that a device is logically irreversible if its output does not uniquely define the inputs, and he contended that such devices are essential to computing. If one wants to do reversible computation, then one is limited to only those two-input devices that produce an equal number of 0's and 1's among their output possibilities. This limits such devices to the identity, a CNOT gate, or an XOR implemented with two output bits instead of one, each such gate being its own inverse. One could also think of devices with more than two inputs that could satisfy reversible computation. Contrary to this, however, is the fact that any multiple-input, multiple-output device can be reduced to a set of two-input gates. So, it would seem that Landauer's reversible logic does not allow for a proper number of types of gates to do general-purpose computation. This forced him to admit that real information processing proceeds in a real physical world that imposes certain requirements on the physical structures and the energy consumption.
The function of a particular logic gate can be represented by its truth table, which lists the output given a certain input. Clearly, for any reasonable logic gate, there is an output for each input. In other words, the input determines the output. But is the inverse also true? For a given output, can one deduce what input produced that output? In other words, is that logic gate logically reversible? It is rather straightforward to see that logic gates, in general, are not logically reversible. For example, elementary Boolean logic gates like the AND and OR gates have two input bits and one output bit. Clearly, one cannot infer from the single output bit what the two input bits were at the beginning of the logic operation. In general, logic gates that have more input bits than output bits are not logically reversible.
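This can be seen directly by enumerating the truth table; the short sketch below tabulates the preimages of each output of a two-input AND gate.

```python
from itertools import product

# Collect, for each output of the AND gate, the inputs that produce it.
preimages = {0: [], 1: []}
for a, b in product([0, 1], repeat=2):
    preimages[a & b].append((a, b))

for out, inputs in preimages.items():
    print(f"output {out}: produced by {inputs}")
# Output 0 has three distinct preimages, so the input cannot be
# recovered from the output: the gate is not logically reversible.
```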
Any system of logic gates creates a system that may be described by a state transition matrix. How would the state transition matrix look if we insisted upon using reversible logic? As Landauer stated [9], a machine is logically reversible if, and only if, all of its individual steps during any sequence are logically reversible. This will be important. While the control signals generally are kept separate from the machine's state definition, that does not have to be the case. Assume that M bits, or qubits, are used to define the state of the machine. Then, the state vector will be a $2^M \times 1$ column matrix, assuming that one needs all $2^M$ states. In reversible logic, each state of the machine must have a unique predecessor state, as no 'fan in' is allowed, since such an operation would reduce the phase space and imply irreversibility. In addition, each state of the machine must have a unique successor state, as no 'fan out' is allowed, since the inverse operation would reduce the phase space and imply irreversibility. These conditions tell us that each row of the transition matrix T, which is $2^M \times 2^M$, must have a single 1 in it, and each column of the transition matrix must have a single 1 in it. In addition, every row and column must have a 1 in order to make the matrix full rank and therefore invertible.
The interesting thing about these transition matrices for the reversible machine is that they are well known in group theory. These matrices are representations of the cyclic permutation group, so that each one represents one or more rings upon which the states sit; an explicit construction is sketched below. That is, sequences carried out with this reversible logic lie on these rings, and these sequences are not circle-free. So, in Turing's parlance, they cannot be used for a computing machine. Moreover, there is no clear state at which the machine will stop, hence violating another one of Turing's requirements; i.e., that each sequence of the computation must have a clear end state. We would have expected no less, since we pointed out above that a computing machine needs a force to drive it in a preferred direction of time, and this means dissipation and irreversibility. If we could construct a machine with reversible logic, it would be a perpetual motion machine (likely of the second kind, as it would violate the second law of thermodynamics).
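As the promised construction, the sketch below takes a CNOT acting on a two-bit machine state as the update rule, builds its $4 \times 4$ transition matrix, checks the permutation property (a single 1 in every row and column), and extracts the rings on which the states sit. The encoding of the state as an integer is just an illustrative choice.

```python
import numpy as np

M = 2          # bits of machine state
dim = 2 ** M

def step(state):
    # CNOT as the update rule: (a, b) -> (a, a XOR b)
    a, b = (state >> 1) & 1, state & 1
    return (a << 1) | (a ^ b)

# Build the transition matrix T: column s has a single 1 in row step(s).
T = np.zeros((dim, dim), dtype=int)
for s in range(dim):
    T[step(s), s] = 1

# Permutation property: exactly one 1 per row and per column.
assert (T.sum(axis=0) == 1).all() and (T.sum(axis=1) == 1).all()

# Extract the cycles (rings) on which the states sit.
seen, cycles = set(), []
for s in range(dim):
    if s not in seen:
        cycle, cur = [], s
        while cur not in seen:
            seen.add(cur)
            cycle.append(cur)
            cur = step(cur)
        cycles.append(cycle)

print("cycles:", cycles)
# Every state lies on a closed ring, so the dynamics is not circle-free
# and has no distinguished stop state.
```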
Landauer identified another problem with the reversible computer, and that lies with the fact that nothing can be erased from the tape in such a machine. If we had a reversible machine that was non-terminating (i.e., was not a circle-free machine), the capacity for preserving all the information about the intermediate steps cannot be there [9]. That is, this reversible machine requires an infinite tape and infinite time! This is another reason for the belief that reversible logic machines are not in the cards, at least if they want to be compatible with Turing's requirements. The conclusion that can be drawn is that reversible logic systems just cannot compute.

Summary
We may summarize this section with the observation that concepts such as reversibility and dissipation-free computing are not compatible with Turing's requirements for a computing machine. The need for an arrow of time, a stop state, and circle-free dynamics forbids both reversible computing and dissipation-free logic gates. This is a natural result of the fact that the information-bearing bits are embedded in a sea of non-information bits, and interactions between these two sets are difficult to avoid. The concepts of dissipation and entropy will be dealt with in more detail in the next two sections.

Minimum dissipation
Every numerical process that is contained within a numerical calculator or an electronic computer, no matter how abstract, is constrained to limits imposed by physical processes that occur in the real world [4]. This topic has been of interest for decades, especially due to considerations about the minimum energy necessary per bit (or qubit) operation. All known computational systems, whether electronic or biological or mechanical, must dissipate energy, on physical grounds, and this has been speculated to lead to a minimum energy per 'operation' of [9,14]

$k_B T \ln(2). \qquad (1)$

Landauer gave us this simple equation for the minimum energy dissipation per bit when erasing a bit (discussed in more detail in section 4). He pointed out that the degrees of freedom associated with the information in the machine, through thermal relaxation, go to any of the $2^N$ states associated with the N bits in the machine [9]. In this process, according to Landauer, the entropy increases by $N k_B \ln(2)$, which must be associated with dissipation to increase the entropy. It is this so-called loss of information which requires the energy dissipation [16]. Landauer would further state that computers must discard information to avoid being choked on the irrelevant intermediary results [17]. This discarding of information again is related to physical irreversibility and heat generation. But, he changed his tune later in life, and stated that noise could be reduced by computing slowly, due to the reduced bandwidth [18]. Then, the energy dissipation of (1) is, in fact, related to the sequence and not to each logical step in the sequence, as it relates to the necessary force on the machine. This led to his statement that there is no minimal energy requirement of (1) per bit if the computation is done slowly enough [19], although this might not be physically achievable in practice.
We have already discussed this myth above, but we want to go further here. Much of the discussion in the literature has focused on reversibility and thermodynamics, and we would like to examine these arguments in detail here. General concepts like 'reversibility' and 'entropy' can have very different meanings in different contexts, and one has to carefully distinguish between physical reversibility and logical reversibility, and similarly between physical (thermodynamic) entropy and information-theoretic entropy. The physical reversibility will be dealt with here, while the entropy is treated in the next section. The more a system can be used for computation, i.e. the more the bits are isolated from thermal noise, the less thermodynamics applies. The more bits are flipped due to random thermal noise, the more thermodynamics applies, but the less these bits are useful for computation. As already pointed out [4], there appears to be some kind of complementarity between thermodynamics and computation.
The concept of 'reversibility' is used extensively in the literature, as well as in our discussion above, but not always with a clear definition of what it actually means. Specifically, we contend that it is important to clearly distinguish between physical reversibility and logical reversibility, and that these two distinct meanings of 'reversibility' cannot be used interchangeably [7,8].
Physical reversibility refers to the fact that some dynamical laws of physics are reversible in time, such as the equations of motion of classical particles (molecules in a gas) or the Hamiltonian dynamics in quantum mechanics. Staying with classical physics (although the phase space can also be used in quantum mechanics [20,21]), the dynamics of a particle is described by an ordinary differential equation where the particle's location and velocity uniquely determine the location and velocity for that particle for all times (both future and past). In a pictorial phase-space representation with location and velocity (or momentum) as axes, the particle is represented as a point in that space at any given time, and the time-evolution of the particle is represented as a trajectory (particles, in fact, also have a rich history in quantum mechanics [22-25]). Because of the nature of the equation of motion, trajectories cannot cross. For an assembly of particles, such as molecules in a gas container, each particle traces out its own trajectory, and the whole assembly, viewed in phase space, behaves like an incompressible fluid. The dynamics of such a gas container is reversible in time in the sense that if one were able to perfectly reverse all the particles' velocities at some instant in time, the whole assembly would move backwards in time as each individual molecule retraces its microscopic trajectory.
Logical reversibility refers to the logical invertibility of a (Boolean) logic gate. As discussed in section 2, each logic gate has an input and an output, and each has a certain number of bits. It turns out that one can always make a gate with an unequal number of input and output bits logically reversible by adding extra bits such that the numbers of input and output bits become equal. A famous example of such a logically reversible construction is the Fredkin gate [26], which can be implemented with a number of standard irreversible logic gates. So, while the overall logic of the Fredkin gate may be logically reversible, the individual gates are physically irreversible! And, we have already seen that a system of logically reversible gates is unable to perform computation.
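A quick check of this logical reversibility is to enumerate the Fredkin (controlled-swap) truth table and confirm that the three-bit map is a bijection, and in fact its own inverse; a minimal sketch:

```python
from itertools import product

def fredkin(c, a, b):
    # Controlled swap: if the control c is 1, swap a and b.
    return (c, b, a) if c else (c, a, b)

inputs = list(product([0, 1], repeat=3))
outputs = [fredkin(*ins) for ins in inputs]

# Bijection: all 8 outputs are distinct, so the input is recoverable.
assert len(set(outputs)) == len(inputs)
# Self-inverse: applying the gate twice restores the input.
assert all(fredkin(*fredkin(*ins)) == ins for ins in inputs)
print("Fredkin gate: bijective on 3 bits and its own inverse")
```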
Quite often, the literature closely links physical reversibility and logical reversibility, and a common assertion is that the time-reversibility of microscopic physics is incompatible with logically irreversible operations. A much-used argument is related to the erasure of a bit [9], where the bit in its final, erased state exists in only one state, say 0, whereas initially it could have been in two possible states, either 0 or 1. This looks like a 'fan in,' or a merging of trajectories in phase space, which is not compatible with time-reversible microscopic physics. This argument would be true if the bit state were actually represented by the microscopic physical state, but this never is the case. As discussed above with two examples, electrical bits are represented by the presence or absence of charge, no matter what their instantaneous microscopic configuration might be, and gas-container bits are represented by the location of a volume of gas, no matter what their instantaneous microscopic configuration might be. Moreover, if the microscopic physical state were to represent the logic state, charges on a capacitor would correspond to a myriad of logic states (one for each microscopic configuration), and the same for the gas-container 'bit;' in fact, we would not have bits any more. Logical irreversibility at the macroscopic level is perfectly compatible with time reversibility at the microscopic level.
Reversibility also appears in the context of adiabatic computing, but we will not discuss it here in detail since the meaning is quite different. Suffice it to say that in adiabatic computing, bit operations are performed in a way that keeps the physical system close to equilibrium during a bit operation, which minimizes energy dissipation. For example, if a bit is represented by the charge on a capacitor, abruptly switching from the charged to an uncharged state entails an energy dissipation of $CV^2/2$, which is due to Joule heating caused by the current that drains off the charges. This dissipation can be reduced if the current is reduced by applying an adiabatic clock. However, there is additional circuitry required, and the associated overhead has proven to negate any potential gains, despite significant efforts in this area. We conclude that physical reversibility and logical reversibility are separate concepts that are not related to each other (except for the case where the logic state is directly represented by the microscopic physical state, but this is never the case). Simply put, logical ir/reversibility does not imply physical ir/reversibility and vice versa.
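The standard estimate behind this statement can be made concrete with a simple RC model: an abrupt switch of a capacitor through a resistor dissipates $CV^2/2$ regardless of R, while ramping the supply slowly over a time $\tau \gg RC$ dissipates only about $(RC/\tau)\,CV^2$. The component values below are illustrative assumptions, not taken from any particular technology.

```python
C = 1e-15    # node capacitance (F), illustrative
R = 1e3      # series resistance (ohm), illustrative
V = 1.0      # logic swing (V), illustrative

E_abrupt = 0.5 * C * V**2      # dissipated on an abrupt charge/discharge
print(f"abrupt switching: {E_abrupt:.2e} J")

# Slow-ramp (adiabatic) estimate: E ~ (RC/tau) * C * V^2 for tau >> RC.
for tau in [1e-11, 1e-9, 1e-7]:
    E_ramp = (R * C / tau) * C * V**2
    print(f"ramp over {tau:.0e} s: {E_ramp:.2e} J")
```

The dissipation falls linearly with the ramp time, which is the sense in which 'slow' means 'close to equilibrium'; the price, as noted above, is the extra clocking circuitry and the loss of speed.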
In fact, (1) is not unique to Landauer, as it is often attributed to Shannon's work on communication theory, where it is the necessary energy to read a transmitted bit [10]. While (1) does not appear explicitly in this work, it is certainly obtainable for a binary stream. In fact, he expressed it differently. If N is the noise power, P is the signal power, and $B_W$ is the bandwidth of the information channel, then the maximum bit rate is given as

$C = B_W \log_2\left(1 + \frac{P}{N}\right). \qquad (2)$

This would tell us that a non-zero rate of bit transmission could be achieved even if the noise power exceeded the signal power, a situation often present in experiments and in which 'lock-in' amplifiers are used to reduce the bandwidth in order to measure the signal. But, this is not likely to be achievable in computational systems. Keyes demonstrated how this result would lead to (1) in the case in which the signal power was less than the noise power [27]. The above result tells us that Shannon was concerned with information in communication systems, and allegedly he was advised to call missing information the signal entropy [28]. This connection of information to negative (physical) entropy has continued up to the present, although they are not equal in most cases [29,30]. Once Shannon had coined his information entropy, Brillouin worked out in some detail the connection of this to physical entropy [31]. But, these quantities are not the same thing, and do not yield the same numerical values in many cases. Yet, it is not surprising that (1) is obtained in most cases. Bate asked the question whether the same answer would be obtained in the many-body case of semiconductor devices, with their Fermi-Dirac statistics, used in e.g. information transmission and computation [32]. He examined the energy required to switch a bit in the presence of the quantum statistics, and found the same result as (1). This strengthened the belief that (1) is a general result, not connected to classical physics.
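The route from (2) to (1) can be checked numerically. Assuming the in-band noise is thermal, $N = k_B T B_W$, the energy per transmitted bit P/C approaches $k_B T \ln 2$ from above as the bandwidth grows and P/N falls below unity; the signal power used below is an arbitrary illustrative value.

```python
import numpy as np

kB, T = 1.380649e-23, 300.0
P = 1e-12    # signal power (W), illustrative

for BW in [1e6, 1e9, 1e12, 1e15]:
    N = kB * T * BW                  # thermal noise power in the band
    C = BW * np.log2(1.0 + P / N)    # Shannon capacity, eq. (2), bits/s
    print(f"B_W = {BW:.0e} Hz: P/N = {P/N:.1e}, energy/bit = {P/C:.2e} J")

print(f"k_B T ln2  = {kB * T * np.log(2):.2e} J")
```

This is essentially Keyes' observation [27]: positive rates survive P < N, but the energy cost per bit saturates at the thermal value of (1).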
As remarked above, interest soon shifted to the actual performance that could be obtained in digital electronic systems. Keyes showed that the power level [27]

$P = \frac{(k_B T/q)^2}{Z}, \qquad (3)$

where Z is the impedance of the interconnection lines, while not a physical limit, was a characteristic power level for digital logic. He concluded that there was not a single factor that would limit information processing, as the latter was affected by many such factors. In this line of thinking, Kish showed that one needed a barrier between the two states that was sufficiently large to avoid thermalization [33]. He argued that, if C is the capacitance of the device, the noise voltage was

$V_n = \sqrt{\frac{k_B T}{C}}. \qquad (4)$

While somewhat arbitrary, Kish suggested that the voltage barrier would have to be 12 times larger than this noise voltage in order to avoid significant bit error rates. This, in fact, is similar to an argument made by Keyes some years earlier [34], when he estimated that a quarter of a volt was a minimum logic swing in room-temperature circuits. This is only a touch less than that suggested by Kish.
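For a sense of scale, the sketch below evaluates the thermal noise voltage of (4) for a few illustrative node capacitances, together with the 12x margin Kish suggested.

```python
import numpy as np

kB, T = 1.380649e-23, 300.0

# kT/C noise voltage on a storage node, eq. (4), and Kish's ~12x margin.
for C in [1e-18, 1e-17, 1e-16, 1e-15]:
    Vn = np.sqrt(kB * T / C)
    print(f"C = {C:.0e} F: V_noise = {1e3*Vn:6.1f} mV, "
          f"12 x V_noise = {1e3*12*Vn:7.1f} mV")
```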
A more extensive look into the speed, power, and integration limits was undertaken by Zhirnov et al [35]. These authors also probed how quantum mechanics would increase the result of (1) due to the onset of confinement energies. Surprisingly, they found that the use of cryogenic temperatures for very small devices would raise the minimum energy per bit well beyond (1) and well beyond the required energy at room temperature. This is contrary to all the above suggestions, based upon (1), that low temperatures were favorable. This suggests, in their view, that low temperatures and small physical size of the logic gate are incompatible. If we accept these results, the logical conclusion is that quantum computers operating at low temperatures are probably not scalable to the integration density of modern microelectronics. A similar conclusion was reached by Gea-Banacloche and Kish [36].
In summary, many authors have given a number of rationales for a minimum energy dissipation for each bit operation. These have various points of departure in physics. But, it is clear already from section 2 that neither physically reversible gates nor logically reversible logic is compatible with a Turing machine. Even the oft-quoted result (1) from Landauer [9] must be questioned, as he clearly is considering only a closed system when assuming that there are N information-bearing bits only, thus ignoring the much larger number of non-information-bearing bits with which the former can interact. It is also clear that, in real physical computing systems, there are other important requirements on voltage, power, and noise that provide real thermodynamic limitations to a computing machine. Such results actually recognize that physical reversibility is not equivalent to logical reversibility, but in neither world do dissipation-free processes arise.

Thoughts on entropy
Just like 'reversibility' discussed above, the concept of 'entropy' is used extensively in the computer literature, but not always with a clear definition of what physical entropy versus information entropy actually means. Specifically, we contend, as was discussed above, that it is important to clearly distinguish between physical (thermodynamic) entropy and logical (information) entropy, and that these two distinct meanings of 'entropy' cannot be used interchangeably.
Boltzmann entropy relates a property of a macrostate, the entropy S, to the number of microscopic configurations, W, that are compatible with that macrostate. This is expressed as

$S = k_B \ln W, \qquad (5)$

where $k_B$ is the Boltzmann constant. Let us consider a system of N particles that can exist in a number of microstates, each labeled by an index i. Let us further assume that each of these microstates occurs with probability $p_i$, which means that the number of particles in microstate i is given by $n_i = p_i N$. The question of how many microscopic configurations there are is equivalent to the combinatorial problem of how many ways there are of placing N balls in the microstate bins i. Obviously,

$W = \frac{N!}{\prod_i n_i!}. \qquad (6)$

The product in the denominator is over all microstates i. Taking the natural logarithm, ln, on both sides, we obtain

$\ln W = \ln N! - \sum_i \ln n_i!. \qquad (7)$

If we are dealing with large numbers, which certainly is the case for gas particles, we use Stirling's formula as an approximation of the factorials,

$\ln N! \approx N \ln N - N. \qquad (8)$

After a little algebra, one obtains an expression for the entropy in terms of the probabilities $p_i$ with which the microstates contribute to the macrostate,

$S = -N k_B \sum_i p_i \ln p_i. \qquad (9)$

We went through the trouble of this derivation to point out the similarities and differences between physical (thermodynamic) entropy and a mathematically similar-looking expression used in information theory. As we will discuss below, the mathematical, formal expressions for physical entropy are similar to those for information entropy, but the similarity ends there. They have very different meanings, and they cannot be used interchangeably.
Shannon derived an expression for the average information of a symbol in a string (message) of length N. Consider a number of symbols $a_1, a_2, \ldots, a_i$, each occurring with probability $p_1, p_2, \ldots, p_i$. On average, the number of different messages then is

$M = \frac{N!}{\prod_i (p_i N)!}, \qquad (10)$

which looks just like (6) above. In fact, it results from the same combinatorial problem of how many ways there are of placing a number of different objects in a certain number of bins. Shannon then defines the average information I carried by a message of length N by taking the $\log_2$ of (10). Assuming again, as above, that we are dealing with large numbers, the factorials can be approximated by Stirling's formula, and we arrive at

$I = -N \sum_i p_i \log_2 p_i. \qquad (11)$

This information-theoretic expression formally looks just like the above expression (9) for thermodynamic entropy. A very readable account of these arguments can be found in Feynman's Lectures on Computation [11]. In fact, this book contains a very telling statement. Quoting Feynman from page 123 in [11]: 'Incidentally, Shannon called this average information the 'entropy,' which some think was a big mistake, as it led many to overemphasize the link between information theory and thermodynamics.'

The literature contains many papers that define an entropy term associated with the possible configurations of a string of bits. Using Boltzmann's definition of entropy (5) above, one can identify W with all possible bit configurations for a string of N bits, which clearly is $2^N$, and one can associate an entropy with that. Formally, each possible bit configuration looks like a microstate in the above physical picture. However, Boltzmann's thermodynamic entropy and the information-theoretic entropy, defined in analogy, have very different meanings. It is clear in the discussion of (5) and (11) that physical entropy is related to the number of microstates in the physical system, while information entropy is linked to the number of information-bearing bits. It was already emphasized that the number of information-bearing bits is much smaller than the number of total bits (information bearing and non-information bearing). Moreover, it was emphasized that a bit is nowhere near a single microstate, as the actual microstate is usually irrelevant to the 'bit.' While thermodynamic entropy is closely linked with energy, information entropy is not. If one isothermally compresses a gas volume with N gas molecules to, say, half its value, there is a decrease in entropy of $\Delta S = N k_B \ln(2)$, and an amount of energy equal to $\Delta E = N k_B T \ln(2)$ has to be expended. The physical reason for this direct link between physical entropy and energy is the pressure that the gas molecules exert on the container, and that physical work has to be done to compress the gas against this pressure. Similarly, if one allows the gas to isothermally expand to twice its volume, energy can be extracted during this process, and this energy is due to the gas pressure. We emphasize that the gas pressure is due to the microscopic dynamics of the gas molecules, and collisions lead to changes in the microscopic configurations. These microscopic configurations appear in the sum of (9), which makes the macroscopic quantity of entropy an average over actual physical microscopic configurations of the gas molecules.
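The common combinatorial root of (9) and (11) is easy to verify numerically: the exact logarithm of the multinomial count (6)/(10), evaluated with the log-gamma function, converges (per symbol) to $-\sum_i p_i \ln p_i$ as N grows. The probability set below is an arbitrary illustrative choice.

```python
from math import lgamma, log

def ln_W(counts):
    # ln of the multinomial count N!/prod(n_i!), as in (6) and (10),
    # computed via the log-gamma function: ln(n!) = lgamma(n + 1).
    N = sum(counts)
    return lgamma(N + 1) - sum(lgamma(n + 1) for n in counts)

p = [0.5, 0.3, 0.2]    # illustrative probabilities
for N in [10, 100, 10_000]:
    counts = [round(pi * N) for pi in p]
    exact = ln_W(counts) / N                    # per symbol
    stirling = -sum(pi * log(pi) for pi in p)   # the form of (9)/(11)
    print(f"N = {N:6d}: ln(W)/N = {exact:.4f}, -sum p ln p = {stirling:.4f}")
```

The agreement is purely mathematical; as argued in this section, it carries no implication that the two entropies describe the same physical situation.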
For information-theoretic entropy, there is no link to physical energy. For bits, there is no equivalent to pressure, as bits, by definition, do not undergo random thermal motion (if they do, they are not bits any more). Bits confined to a string of length N do not exert physical pressure on that string. If one allows a bit string of length N to double to length 2N, one obviously cannot extract physical energy from this 'expansion.' The sum in the expression for information-theoretic entropy is a mathematical sum over possible bit configurations, but a string of bits physically exists in only one single configuration of all these possible configurations.
We come to the conclusion that information entropy and physical (thermal) entropy, while formally looking similar, have very different physical meanings. Specifically, information entropy is not related to physical energy, while thermodynamic entropy certainly is. Information-theoretic entropy and physical (thermal) entropy are like apples and oranges, which also has been pointed out in [30].

On erasure
A central place in the arguments about the question of minimum energy requirements for computation in section 3 is the operation of 'erasure.' This argument originated in the seminal 1961 paper by Landauer [9], where he argued that logical irreversibility necessarily entails heat generation. Erasure essentially is a reset-to-a-standard-state operation, i.e. the erased bit ends up in the standard state, say the 0 state, regardless of what its initial state was. It could have been either 0 or 1. Using thermodynamics, consider the entropy expression (9). Landauer argued that erasure corresponds to a decrease in entropy, and therefore necessarily requires energy dissipation. In the initial information state, he argued, $W_i = 2$ (the bit could have been 0 or 1), while in the final information state $W_f = 1$ (the bit is only in the 0 state). Using (5), this leads to a decrease in entropy of

$\Delta S = k_B \ln W_f - k_B \ln W_i = -k_B \ln 2, \qquad (12)$

with an associated amount of energy dissipation, as given in (1). According to this thermodynamic argument, the fundamental reason for heat generation and energy dissipation is the fact that $W_f$ in the final state is less than $W_i$ in the initial state. This is true for any logically irreversible operation, such as erasure. Tying logical irreversibility to heat generation is aptly captured in the title of Landauer's paper [9]. However, we question this below.
In subsequent work, Bennett adopted Landauer's framework, but argued that computation does not necessarily require logically irreversible operations such as erasure, and can be done in a logically reversible fashion [14]. Without going into details here, his scheme has computation being done in three steps: (1) do the computation and arrive at the result; (2) copy the result; and (3) compute backwards, thus erasing the intermediate data accumulated during step 1. Logically irreversible operations can always be made logically reversible by adding extra, redundant data. In Bennett's scheme, logical reversibility comes at the cost of doing twice as much work: in addition to computing forward, one also has to do the extra work of computing backwards. A question immediately arises: if computing only forward is dissipative, why is computing forward plus backwards then not dissipative? In other words, why can doing twice as much be done without dissipation, while doing only half is dissipative? This question alone should give one pause about Bennett's scheme.
Indeed, we contend that there is a fundamental flaw with Landauer's original argument. As discussed above, Landauer's argument that irreversibility leads to heat generation is based on a thermodynamic argument about a decrease of entropy during erasure. The important point here is that the decrease in entropy for a 2-to-1 operation is NOT a statement about physical (thermodynamic) entropy, but about information entropy. Landauer argued that the bit in the initial state could have been in the 0 or 1 state, we just did not know which, i.e. this is a statement about our knowledge of the bit. However, the bit in its initial state physically was in one of the two states; otherwise it would not have been a bit in the first place. In other words, in physical reality, the bit initially was in one state, and it ends up in one final state, i.e. physically, erasure is a 1-to-1 operation with no change in entropy and no associated energy dissipation. So, when using an expression like (9), one has to carefully distinguish what kind of entropy is meant. 'Erasure' entails no change in physical entropy, and thus no heat generation. While there is a change in information entropy, this does not imply any change in physical energy. The important point here is to distinguish between physical entropy and information entropy, as they have very different meanings [29,30]. We arrive at the conclusion that Landauer's argument, identifying the logically irreversible 'erasure' operation as a fundamental source of dissipation, is faulty, as it treats physical entropy and information entropy as if they were the same, which they are not. We also conclude that Bennett's scheme of doing 'erasure' in a logically reversible fashion, while of interest in computer science, 'fixed' a nonexistent problem as far as physical energy dissipation is concerned.
There is nothing special about 'erasure.' It is a bit operation like any other bit operation. In any bit operation, the bit starts out in its initial state (which we might know or not know, but physically it was in one state or the other, or we would not have a bit in the first place) and it ends up in its final state. The preoccupation of the literature with 'erasure' is puzzling, as there is much discussion about the energy cost of 'erasure,' but not about other bit operations. There are several papers in the literature that claim to provide experimental evidence that erasure requires dissipation. What these papers have in common is that they discuss a variety of two-state systems in the presence of noise, and that an energy barrier on the order of at least $k_BT$ is required to maintain a bit state in the presence of thermal fluctuations. What these papers also have in common is that, while they discuss this energy barrier in terms of bit erasure, there is no discussion of the energetics of non-erasure bit operations and what they actually might be.

Maxwell's demon
At several places above, there was a discussion about moving a cloud of gas molecules from one part of a container to another, in the guise of a bit being switched from one state to another. The astute reader would recognize that this is often an application of Maxwell's demon. Maxwell's demon is often invoked in such discussions as it provides a link between thermodynamics and information. Conceived by James Clerk Maxwell in 1867 (and published in his book Theory of Heat in 1871 [37]): 'a being whose faculties are so sharpened that he can follow every molecule in its course' could, in principle, violate the Second Law of Thermodynamics. In Maxwell's thought experiment, such a being controls an opening between two containers of gas, and he allows molecules with certain properties to pass through the opening, while blocking others. For example, if the demon allows fast (hot) molecules to pass from right to left, and allows slow (cold) molecules to pass from left to right, then the left container will heat up and the right container will cool down, in apparent violation of the Second Law of Thermodynamics. In other versions of this thought experiment, the demon could unmix a gas that contains two types of molecules, by selectively collecting one type in one container and the other type in the other container, again in apparent violation of the Second Law of Thermodynamics.
This paradox, which was then hotly debated in the physics community, was resolved by Leo Szilard in 1929 [38], who posited that the demon needs to know when to open or close the opening. Clearly, if the demon were to operate the opening at random, he could not selectively separate molecules with certain properties. Szilard argued that in order to know when the right molecule approaches, the demon has to perform a measurement, which is some physical process. For example, the demon might use a flashlight to see when a molecule approaches the opening, and a photon will be expended in the measurement process. This measurement has to be performed in the presence of background noise, and the photon has to be above the level of background radiation, otherwise the demon would just see thermal noise. Since thermal noise involves energies on the order of $k_BT$, at least this amount of energy has to be spent on the measurement that lets the demon know when to open or close the opening. In other words, it is the measurement process that rescues the Second Law of Thermodynamics, just as it does in computing.
Szilard's argument, which pinpoints the fundamental nature of energy expenditure during a measurement, is in contrast to Landauer's and Bennett's assertion that the fundamental process that leads to energy expenditure is the process of erasure. This led to a reinterpretation of Maxwell's demon by Bennett [39], in which he claims that the demon can perform a measurement without the need to expend energy, and the dissipative step occurs when the demon erases its memory, where the result of that measurement somehow was stored. An easily accessible account of this reinterpretation can be found in [40].
Bennett frequently is credited, e.g. in [11], as having provided proof that the demon can perform the measurement in a reversible fashion, without any expenditure of energy. However, a closer look at Bennett's original publication [39] reveals that this is not so. In fact, he discusses a one-molecule Maxwell's demon apparatus with the help of a figure, which shows the operation in its various steps. Quoting from that page [39]: 'At first (a) the molecule wanders freely throughout the apparatus and the demon is in standard state S, indicating that it does not know where the molecule is. In (b) the demon has inserted a thin partition trapping the molecule on one side or the other. Next the demon performs a reversible measurement to learn (c) whether the molecule is on the left or on the right. The demon then uses this information to extract $k_BT \ln 2$ of isothermal work from the molecule and allowing the molecule to expand (d) against the piston to fill the whole apparatus again (e).' Clearly, Bennett did not provide proof that the demon can perform a reversible measurement; he simply asserted it! In fact, it is clear that Bennett did not provide a valid reinterpretation of Maxwell's demon. His assertion that the dissipative erasure of the demon's mind restores the Second Law of Thermodynamics simply does not make sense. After all, the demon is free to erase its mind whenever or wherever it chooses. Does it really make sense that the demon can violate the Second Law by making reversible measurements, and that the energy expenditure required to restore the Second Law can occur somewhere else and at some other time?
Clearly not, and we are left with Szilard's original interpretation that the energy cost of information acquisition, i.e. the measurement process, resolves the demon paradox and restores the Second Law of Thermodynamics. The fundamental reason for the dissipative nature of the measurement process is that it has to be performed in a noisy environment, and some physical process with an associated energy higher than the noise level has to be used in order to distinguish the outcome of the measurement from random thermal noise. We note that, in addition to the measurement that must be made, other sources of dissipation have been identified, ensuring the failure of Bennett's argument [41].
This point is important in the computing world. The head of a Turing machine is similar to Maxwell's demon. As discussed in an earlier section, the basic operation of the head is to read a symbol from the tape, and, based on that information, to then perform the intended operation. This is just what the demon does. A crucial step here is the read operation (measurement), which tells the head (demon) what to do. Without a reliable read operation, the head would just read thermal noise, and the Turing machine would not be able to perform a specific computation, but would just wander at random. Even if one assumes that the head of a Turing machine might perform its state-change and move operations in a dissipationless fashion (similar to the demon opening and closing the opening), its operation fundamentally is dissipative due to the read operation.

Noise, ever present
Let us review the steps of the computation, in order to understand how each is susceptible to noise. At each step of the computation, the system moves from one dynamical state to another dynamical state, and this transition depends upon the value of the symbol read from the tape. Each of these states must be maintained as a non-equilibrium state [4,42], and the transition from one to the next, as well as the read operation, must occur in the presence of random thermal noise. An external force is needed to drive the transition between states, and it is this force that by necessity breaks time-reversal symmetry and entails dissipation. An integral part of each computational step is the reading of information from the tape, and this is essentially the process of making a measurement on the tape. In this regard, the automaton that reads the tape cannot be allowed to evolve freely, but must respond to the nature of the force and the information on the tape. It may even write a new symbol into the current spot on the tape. This again forces the computer along a desired logical path, and these reading operations require energy to find the desired information and protect it from thermal noise. As the noise level rises, the amount of energy that must be expended also increases. Hence, we are able to say that computation is dissipative at least because of the reading process, in addition to the desired force that keeps the process on track. Only systems that are forced along a non-equilibrium, and non-thermodynamic, path by virtue of measurements, and thus dissipation, can be used for computation.
Hence, it is clear that noise is a fundamental limitation to computation. In (2), if the noise level is increased, then the bandwidth $B_W$ must be reduced to recover the signal from the noise. But, if the bandwidth is reduced, the bit rate must also be reduced, because a smaller bandwidth corresponds to Landauer's slow-down of the computation. This means that (2) is not a fundamental equation of physics, but seems to be a convenient measure of information flow. However, there is a problem with the general concept of noise. Noise is a many-headed beast, and one must be careful about which kind of noise is being discussed.
In most cases of electronic systems, the noise being discussed is Johnson-Nyquist noise, which arises from thermal fluctuations of the charge carriers [43]. The noise power for this type of noise generally depends upon the bandwidth of the system and is expressed as

$P_n = k_B T B_W. \qquad (13)$

When this is the primary noise source, then reducing the bandwidth lowers the noise level and helps to make measurements or to send information digitally as well as by analog techniques, just as Landauer suggested. But, this is not the only type of noise known in condensed matter systems, such as typical electronic systems. There is another type of noise whose spectral density varies with frequency as

$S(f) \sim \frac{1}{f^a}, \quad 0 < a < 2, \qquad (14)$

[44]. In condensed matter systems, typically $a \approx 1$. This type of noise can appear from many sources, including a range (in energy) of defects or impurities, or even quantum coherence in transport around impurities [45]. The presence of this additional type of noise puts Landauer's argument of '...computing sufficiently slowly...' to rest, as the noise level increases dramatically as the system moves to lower and lower frequencies of operation.
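To illustrate why 'computing slowly' stops paying off, the sketch below compares the integrated noise in a one-decade band as the operating frequency is lowered, using (13) for the white Johnson-Nyquist contribution and an assumed 1/f spectral density of illustrative magnitude for (14).

```python
import numpy as np

kB, T = 1.380649e-23, 300.0
A = 1e-16    # 1/f noise magnitude (W per unit ln f), illustrative

# Integrated noise in the decade [f_hi/10, f_hi] as operation slows down.
for f_hi in [1e9, 1e6, 1e3, 1e0]:
    f_lo = f_hi / 10.0
    P_white = kB * T * (f_hi - f_lo)        # Johnson noise: shrinks with band
    P_flicker = A * np.log(f_hi / f_lo)     # integral of A/f: fixed per decade
    print(f"{f_lo:8.1e} - {f_hi:8.1e} Hz: white {P_white:.2e} W, "
          f"1/f {P_flicker:.2e} W")
```

The white contribution falls with the band while the 1/f contribution per decade does not, so the flicker noise eventually dominates as operation moves to low frequencies.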
In quantum computation, there is another noise source intrinsic to qubit operation that must be considered. As discussed above, the primary difference between the bits of a classical digital computer and the qubits of a quantum computer is that qubits are not simply 0 or 1. Rather, the qubit allows a continuous range of projected values that lie in the phase. That is, the qubit has an absolute value of unity, with a phase that varies as

$\psi = e^{i\varphi}, \qquad (15)$

and quantum gate operations work on this phase. The qubits themselves should be thought of as analog objects; the state is a continuous (complex) variable. Yet, a quantum computer, like a classical computer, is a set of interconnected processing elements, which are the quantum gates. The efficacy of the quantum computation scheme relies on the ability to build in efficiencies through the use of quantum entanglement [46,47]. If one assumes that moving qubit information from one gate to the next corresponds to a spatial progress of the information, then this leads to a change of phase of the information as one moves through the circuit. This is a spatial change of the phase, but the phase typically incorporates the velocity or momentum of the wave function. This momentum does not commute with the spatial position assumed in the circuit, and the resulting uncertainty [48] clearly affects the results of the indicated operations. Such progress through the array of gates may be compared with a trajectory in the sense of Dirac [49] or Feynman [24].
Generally, a propagating wave moves with phase variations according to

$e^{i(kx - \omega t)} = e^{ik(x - vt)}$,  (16)

where $\omega$ has been substituted by $kv$, $k$ is the wave number (momentum) of the wave, and $v$ is the speed of the wave (or, more properly, the phase velocity in the appropriate medium in which the wave is situated). In free space, one has

$v_p = \omega/k = c$,  $v_g = d\omega/dk = c$,  (17)

which are the phase and group velocities, respectively. The fact that both of these equations lead to the speed of light just indicates how linear and dispersion-free the propagation of electromagnetic waves is. In circuits, however, this is no longer the case, and signal propagation is via some sort of waveguide. In waveguides, the two equations of (17) are no longer true. It is the group velocity with which information flows through the guide, and also between gates in a computer. Normally, if the relationship between $\omega$ and $k$ is not linear, the guide is dispersive, which means different frequencies have different velocities. This causes a wave packet to broaden as it propagates.
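As a concrete instance of such dispersion, the sketch below uses the standard hollow-waveguide relation $\omega^2 = \omega_c^2 + c^2k^2$ (our assumed example, with an illustrative 10 GHz cutoff) to show how the phase and group velocities of (17) separate once the guide is dispersive.

```python
# Sketch: phase and group velocity in a waveguide with the dispersion
# relation w^2 = w_c^2 + (c*k)^2 (assumed standard form), contrasted
# with free space, where both equal c as in (17).
import numpy as np

c = 2.998e8                  # speed of light, m/s
w_c = 2 * np.pi * 10e9       # cutoff frequency, rad/s (assumed 10 GHz)

for f in (12e9, 20e9, 50e9):                 # operating frequencies, Hz
    w = 2 * np.pi * f
    k = np.sqrt(w**2 - w_c**2) / c           # guided wave number
    v_p = w / k                              # phase velocity (> c)
    v_g = c**2 * k / w                       # group velocity (< c), dw/dk
    print(f"f = {f/1e9:4.0f} GHz: v_p/c = {v_p/c:.3f}, v_g/c = {v_g/c:.3f}")
```

Note that $v_p v_g = c^2$ here, so the farther the guide operates above cutoff, the closer both velocities come to $c$ and the weaker the dispersion.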
The wave representation of the qubit information does not propagate from one gate to the next as a wave such as (15). Rather, it must be described by a wave packet that is relatively localized in space. Since the positions of the gates are determined almost exactly, the momentum in this packet must satisfy the uncertainty principle. Thus, the velocities corresponding to this uncertainty in momentum are also uncertain, and this leads to an uncertainty in the arrival time of the packet at each new gate. This produces uncertainty in the phase and corresponds to a quantum noise in determining the phase. This noise source must set a limit upon how much information can be carried in the wave packet, or qubit, phase.
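An order-of-magnitude sketch of this arrival-time jitter follows, using the minimum-uncertainty relation $\Delta x\,\Delta p \geq \hbar/2$; the carrier mass, localization length, gate spacing, and nominal velocity are all assumed, illustrative values.

```python
# Sketch: minimum-uncertainty estimate of arrival-time jitter for a wave
# packet confined to a region of size dx, traveling a distance L between
# gates. All numerical values are illustrative assumptions.
hbar = 1.054571817e-34   # reduced Planck constant, J*s
m = 9.109e-31            # carrier mass, kg (assumed: an electron)
dx = 50e-9               # localization length, m (assumed gate scale)
L = 1e-6                 # gate-to-gate distance, m (assumed)
v = 1e5                  # nominal packet velocity, m/s (assumed)

dp = hbar / (2 * dx)     # minimum momentum uncertainty
dv = dp / m              # corresponding velocity spread of the packet
dt = L * dv / v**2       # spread of the arrival time t = L/v
print(f"dv = {dv:.3e} m/s, arrival-time jitter dt ~ {dt:.3e} s")
```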
In addition, gate operations are usually described as unitary transformations. Such a unitary operation can be expressed as a Green's function that has its own description as a Feynman path integral [50]. The path integral is itself a sum over a number of trajectories, which leads to a distribution of duration times for the gate operation. This leads to an error in both amplitude and phase for the gate, and this error may well propagate from one gate to the next.
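The sketch below illustrates this effect for the simplest case: a single-qubit phase gate applied with a Gaussian spread in its duration. The rotation rate and jitter are assumed values, and the loss of off-diagonal coherence in the averaged density matrix stands in for the accumulated phase error.

```python
# Sketch: a single-qubit phase gate U(t) = exp(-i*Z*w*t/2) applied with a
# small spread in its duration t, to show how a distribution of gate
# times produces phase error. All values are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
Z = np.array([[1, 0], [0, -1]], dtype=complex)   # Pauli-Z
w = 2 * np.pi * 1e9                # nominal precession rate, rad/s
t0, sigma = 1e-9, 1e-11            # nominal duration and jitter, s

psi0 = np.array([1, 1], dtype=complex) / np.sqrt(2)
U = lambda t: np.diag(np.exp(-1j * w * t / 2 * np.diag(Z)))

# average the final state over many noisy gate durations
states = [U(t0 + sigma * rng.standard_normal()) @ psi0 for _ in range(10000)]
rho = sum(np.outer(s, s.conj()) for s in states) / len(states)
print("off-diagonal coherence |rho_01| =", abs(rho[0, 1]))  # 0.5 if error-free
```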
Certainly, there is an array of error-correcting techniques that have been suggested for quantum computers. But Kak [51] has pointed out that many, if not most, suggestions for error correction have their basis in classical computation. Classical computers can certainly have timing errors, but they possess no equivalent of phase errors. So it remains a concern to evaluate the limitations that phase noise will present.

Summary
It is important to realize that information entropy is not physical entropy. Physical entropy is defined by the microstates of the system, which essentially may be related to the total number of bits within the system. On the other hand, information entropy is almost always defined by the information-bearing bits of the logical system. The computing system is an open system in which the IB bits are only a small part of the total microstates, as was discussed in section 2. Any attempt to relate these two in an ontological manner is to compare apples to oranges. This further makes any attempt to relate information entropy to real physical processes complicated and often confusing, and many discussions of, e.g., a limit on dissipation given only in terms of the IB bits cannot be considered correct.
This carries over to discussions of, e.g., erasure. These are often again given in terms of IB bits, but the bit transitions themselves are often erroneously connected with entropy changes. Since the latter are real physical entropic quantities, great care must be taken. The logical world is not the physical world; one is exponentially smaller than the other. Even the simple act of 'reading the symbol on the tape' in Turing's machine is complicated in the physical world by, as an example, the need to understand Maxwell's demon, itself a subject of considerable debate in the literature (although we present our reasoned view on this topic).
Finally, all physical systems are subject to noise, of which there are many varieties. Simply computing slowly does not help, as there are new sources of low-frequency noise that are not present in high-frequency operation. This affects all discussions of the effect of noise unless the power requirements of the entire physical system are incorporated.

Quantum bits: qubits
At several places in the previous sections, quantum computation and quantum bits (or qubits) have been mentioned. It is important to note that bits and qubits have some fundamental differences. A bit has only two states, 0 or 1. On the other hand, a qubit has a phase according to (15). Thus, the qubit is actually an analog quantity, with information contained within the phase of the qubit. It is this phase that is manipulated in quantum computing. In addition, qubits have a unique quantum property: entanglement [46]. Consider a pair of particles (which represent arbitrary objects, not necessarily what is normally considered a particle), each of which has its own unique wave function. When these two particles interact quantum mechanically, they are no longer distinct particles. Rather, they are now described by a single two-body wave function. Measurement of a property of one particle also gives the equivalent property of the second particle, although there are several ontological views of quantum mechanics, and the concept of measurement differs among them [52, 53]. If one defines a density matrix from the two original wave functions, there will be significant off-diagonal terms that represent the entanglement of the two particles. Dissipation, or decoherence, destroys this entanglement. The power of the quantum computer lies in this entanglement.
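A minimal numerical illustration of these off-diagonal terms, using the Bell state $(|00\rangle + |11\rangle)/\sqrt{2}$ as the entangled pair and full dephasing as a stand-in for decoherence, is the following sketch.

```python
# Sketch: density matrix of a two-qubit Bell state, showing the
# off-diagonal terms that encode entanglement, and their loss under
# full dephasing (decoherence). Basis order: |00>, |01>, |10>, |11>.
import numpy as np

bell = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)  # (|00>+|11>)/sqrt(2)
rho = np.outer(bell, bell.conj())
print("entangled state:\n", rho.real)       # corner terms are the coherences

rho_dec = np.diag(np.diag(rho))             # dephasing removes them
print("after decoherence:\n", rho_dec.real) # only classical correlations remain
```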
An example is the Fourier transform, which is critical in many areas of information security. Classically, one can make a very fast version of the Fourier transform, the fast Fourier transform (FFT), if the frequencies are all described by binary integers; that is, the frequencies differ by powers of 2. Nevertheless, the FFT must be repeated for each of the frequencies to obtain the entire transform. But if the latter are entangled into a single set of qubits, then all can be done in one pass, giving a substantial speedup in the calculation. Since quantum mechanics is a probabilistic theory, one still needs several passes to ensure the correct answer. Nevertheless, entanglement yields a significant speedup in the process.
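The 'one pass' nature of the quantum transform can be illustrated classically: the quantum Fourier transform on $n$ qubits is a single $2^n \times 2^n$ unitary applied once to the full vector of amplitudes, whereas a term-by-term evaluation touches each frequency separately. The numpy sketch below only illustrates that unitary; it is not a simulation of a quantum computer.

```python
# Sketch: the quantum Fourier transform on n qubits as one unitary matrix
# acting once on the whole superposition of N = 2**n amplitudes.
import numpy as np

n = 3
N = 2**n
omega = np.exp(2j * np.pi / N)
F = np.array([[omega**(j * k) for k in range(N)] for j in range(N)]) / np.sqrt(N)

x = np.random.default_rng(1).normal(size=N) + 0j
x /= np.linalg.norm(x)               # amplitudes of an n-qubit state
y = F @ x                            # one pass transforms all amplitudes
print(np.allclose(y, np.fft.ifft(x) * np.sqrt(N)))  # agrees with classical DFT
```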
There are many implementations of qubits that have appeared over time [54]. These range from superconducting versions, which depend upon the properties of Josephson junctions and quantum interference [55], to a variety of other physical realizations.

Summary and conclusions
In this paper, we have presented arguments and discussion on the fundamental aspects of dissipation and reversibility in the computing process. There are thousands (likely more) of papers that talk about the computing process but do not mention the topics considered here. Yet, understanding these topics is critical to really understanding the physical requirements on the manner in which 'devices' are connected to create an embodiment of the Turing machine.
Turing describes an automaton (a machine) by which one can determine whether a number is computable. This really is the question of whether or not the machine will stop and give an answer. There are basic requirements, which actually include the need for dissipation, represented by him as a force driving the machine in a preferred direction that ends with the machine stopping. Physically, this requires the machine to be irreversible, as time-reversal symmetry is broken. Since time-reversal symmetry is broken, the machine cannot be reversible physically, nor can it contain reversible logic at each gate. The latter follows from Turing himself when he forbids the machine from having 'circles,' or eternal loops in today's language. But reversible logic machines always have their states lying on one or more circles, so the mathematics also forbids reversible machines.
Then, one must realize that the logical system is composed of a set of information-bearing bits. This set is embedded within a much larger group of non-information-bearing bits. The interaction between these two sets is the classical system-environment problem of physical science. The total system is open, and this allows interactions between the two sets of states and the dissipation of energy from the IB states. While considerations of dissipation may set limits on how small it may be, dissipation is nonetheless a physical process upon which the computational machine rests. Any connection between information and thermodynamics passes through the NIB states and the real physical system, not merely the gates used in the computation. Such results actually recognize that physical reversibility is not equivalent to logical reversibility, but in neither world do dissipation-free processes arise.
This becomes even more confusing when entropy is brought into the discussion. Information entropy is not physical entropy. Physical entropy is defined by the microstates of the system, which essentially may be related to the total number of bits within the system (both IB and NIB together). On the other hand, information entropy is almost always defined by only the information-bearing bits of the logical system. Thus, there cannot be a direct connection between information entropy and physical entropy; to attempt one is to compare apples and oranges. This argument affects discussions of erasure, as that operation is entirely in the logical domain, but it is usually misconstrued as being connected to physical entropy. The latter connects to the expansion or contraction of the volume of phase space that contains the entire set of microstates as it progresses in time. This has no connection to logical entropy.
Finally, any computer operates in the presence of noise. The role of this noise in a particular gate, or state transition, will depend upon the physical device(s) employed for the purpose. Each type of device has its own noise sources, many of which are common to most devices. But, for example, computing slowly does not reduce the noise, as there are new noise sources at low frequency.
At the end of the day, all computational systems are in fact real physical systems. It is difficult to actually connect an information process with a physical process, as the IB states are a small subset of all possible states. But there are several major rules for trying to connect the two. We have tried to identify the connections as well as the rules, and we hope this tutorial will stimulate further studies.

Data availability statement
All data that support the findings of this study are included within the article (and any supplementary files).