By one plausible estimate of the power density of a maximum-output nanotech-based machine (~10 watts per cubic centimeter), all of the light energy reaching Earth from the sun could be consumed by a single city of nanotech hardware 100 meters (~33 stories) tall and 10 kilometers on each side (Freitas 1999). Yet Earth has raw materials enough to build a vastly larger volume of computer hardware.
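As a sanity check, here's a quick back-of-the-envelope version of that arithmetic in Python (the solar constant and Earth's radius below are my own round numbers, not figures taken from Freitas):

```python
# Back-of-the-envelope check of the claim above, using my own round numbers.

power_density = 10.0                                # W per cm^3, the cited nanotech figure
volume_cm3 = (10 * 1000 * 100) ** 2 * (100 * 100)   # 10 km x 10 km footprint, 100 m tall, in cm^3
city_power = power_density * volume_cm3             # total draw of the nanotech "city"

solar_constant = 1361.0                             # W per m^2 at the top of the atmosphere
earth_radius = 6.371e6                              # meters
sunlight_power = solar_constant * 3.14159 * earth_radius ** 2  # over Earth's cross-section

print(f"city draws       ~{city_power:.1e} W")      # about 1e17 W
print(f"Earth intercepts ~{sunlight_power:.1e} W")  # about 1.7e17 W
```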
Whereas computers today must dissipate energy as they do their work (erasing a bit necessarily creates heat), reversible computers are bound by no such thermodynamic limit on energy efficiency. This point is overlooked in projections of future technology, whether for computing here on Earth or in speculation about the energy needs of advanced alien civilizations (the Kardashev scale, Dyson spheres, etc.).
OK, enough with the background. The headline result of the paper is:
For any e > 0, ordinary multi-tape Turing machines using time T and space S can be simulated by reversible ones using time O(T^(1+e)) and space O(S log T), or in linear time and space O(ST^e).
Now, to someone trained like me, the vanishing epsilon in the big-O notation seems nonsensical. And the log of anything finite is as good as a constant. (I'm a hand-waving physicist, after all.) Regardless, this paper asserts that the space and time overheads of running a reversible algorithm need not be essentially any worse (big-O-wise) than those of the irreversible (conventional) version of the algorithm. That, to my mind, was very surprising. The line of proof I have in mind, however, hopefully makes it less so. Here it is.
We begin by observing that any conventional (irreversible) program can be simulated by a reversible program in linear time using space O(S + T) (Lemma 1 in the paper).
Why must this be so? (I'm not offering a different proof of this lemma from Bennett's; just an alternate exposition.) A basic tenet of reversible computing is that you must run a program in such a way that at any point along its execution path you keep enough information around to also be able to step backward to the previous instruction in the program. (Keeping this information around, by the way, does not magically get rid of wasted heat; it's just a necessary design attribute of any efficient reversible computing hardware.) One way to make this more concrete is to require that the reversible computer's memory be all zeroed out both before the program is run and on termination. The inputs to the reversible computer are the program and the program's inputs (which, strictly speaking, include a flag indicating which direction the program is to be run, forward or backward); the computer's outputs are the program together with the program's output (which, again, might include that flag flipped). But even with a brute-force approach employing a write-once memory design (wherein memory is zeroed out as a last step), it's easy to see that this scheme's space overhead is O(S + T). (If you wrote out the contents of the registers after every clock cycle, the space overhead would still be O(S + T), while the final zeroing-out step would still take O(T) time.)
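To make that bookkeeping concrete, here's a toy sketch in Python (my own illustration of the idea, not Bennett's exact construction): run an irreversible computation forward while logging just enough at each step to undo it, copy out the answer, then run the log backward so the working tape ends up back at all zeros. The tape is O(S) cells and the log grows by a constant amount per step, so the extra space is O(S + T).

```python
# Toy sketch of the Lemma 1 idea: log each destroyed value on the way forward,
# copy the answer, then pop the log to restore the tape to all zeros.

def step(tape, pc):
    """One made-up irreversible step: overwrites tape[pc % len(tape)], destroying
    information. Returns the next program counter, or None to halt."""
    i = pc % len(tape)
    tape[i] = (tape[i - 1] + 1) % 10
    return pc + 1 if pc + 1 < 20 else None     # halt after 20 steps (T = 20)

def run_reversibly(tape):
    history = []                               # O(T) extra cells: one undo record per step
    pc = 0
    while pc is not None:                      # forward pass: same work as the
        i = pc % len(tape)                     # irreversible machine...
        history.append((i, tape[i]))           # ...plus saving the value about to be erased
        pc = step(tape, pc)
    output = list(tape)                        # copy the answer (copying is reversible)
    while history:                             # backward pass: pop the log to restore the
        i, old = history.pop()                 # tape to its initial state, leaving a clean
        tape[i] = old                          # workspace plus the copied output
    return output

print(run_reversibly([0] * 8))                 # a tape of S = 8 cells
```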
So O(S + T) is no big deal.
But observe that any irreversible program (that halts) can be partitioned into a series of intermediate irreversible subprograms, each successor taking its predecessor's output as its input. (You can picture this construction simply as n breakpoints in the irreversible program generating n+1 chained subprograms.) Now the space overhead of each of these conventional subprograms can be no greater than O(S). Assume, for argument's sake, that the breakpoints are spread out evenly in execution time (the argument doesn't hinge on this); that is, the time overhead of each of these chained, conventional subprograms is O(T/n). But from Lemma 1, we know the space overhead of the reversible version of each of these subprograms is no worse than O(S + T/n). So as we increase n, the number of intermediate reversible subprograms, the space overhead of the whole reversible program tends back toward O(S), the space overhead of the conventional, irreversible program.
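Continuing the toy sketch from above: break the run into n segments, clear the per-segment undo log at each breakpoint, and track the peak extra space held at any moment. Per segment that peak is one O(S) tape plus an O(T/n) log, so it shrinks toward O(S) as n grows. (The sketch only counts space; actually erasing each segment's log reversibly, rather than just clearing it, is the bookkeeping Bennett's construction has to arrange, as in the backward pass above.)

```python
# Segmented version of the toy above: the undo log is cleared at each of the
# n breakpoints, so its length never exceeds ~T/n.

def run_segmented(tape, total_steps, n_segments):
    seg_len = total_steps // n_segments
    history, peak = [], 0                      # undo log for the *current* segment only
    for t in range(total_steps):
        i = t % len(tape)
        history.append((i, tape[i]))           # save the value about to be destroyed
        tape[i] = (tape[i - 1] + 1) % 10       # same made-up irreversible step as before
        peak = max(peak, len(tape) + len(history))
        if (t + 1) % seg_len == 0:
            # Breakpoint: the tape itself (O(S)) is the state handed to the next
            # subprogram, so this segment's O(T/n) log can go.
            history.clear()
    return peak

for n in (1, 4, 16):
    peak = run_segmented([0] * 8, total_steps=64, n_segments=n)
    print(f"n = {n:2d} segments -> peak space ~ {peak} cells")
```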
~
P.S. Breaking the execution of a long-running program into many parts is also a central theme in my post about managing large computer-aided proofs.