The papers of Alan Craig Feinstein on $P$ versus $NP$ have caused me to previously review some basic questions in computational complexity. This post continues our line of thought on what "computational complexity" means from the pragmatic perspective. This in contrast to "theoretical" computation results, which is a contradiction in terms.
In this post we simply consider the question: What is the computational complexity of computing the $n$-th Fibonacci number?
Let's recall the basic definition: the fibonacci sequence is a sequence of integers $f_0, f_1, f_2, \ldots$ defined recursively by the rules $$f_0=1, ~~ f_1=1, ~~\text{and}~~ f_{n+1}=f_n+f_{n-1}~~\text{for}~~ n\geq 1.$$
Naively to compute $f_{2022}$ would require we apply the definition and find $f_{2021}$ and $f_{2020}$, and then continuing the recursion we need to call all the previous values of $f_k$ for $k < 2022$. What is the precise computation?
Let's begin with the pseudocode. The idea is memoization. If we apply the above recursive formula to evaluate $f_{2022}$, then we'll find ourselves reusing values of $f_k$ for $k < 2022$. The memoization is simply a device for recording and reusing these intermediate values. So to compute $f_{2022}$ we will need to store the previous values, although there is a method (Bottom-Up evaluation) which is similar, and which inductively evaluates $f_1$, $f_2$, $f_3$, etc., until we reach $f_{2022}$. So if we compute $f_{2022}$ and memoize all the terms $m=\{f_1, f_2, f_3, \ldots\}$ then we see the memo $m$ grows unbounded in length. Some authors modify the memo to only contain the two "largest" elements, thus reducing the length of the list. This would require updating the memo $$\{f_1, f_2\} \leadsto \{f_2, f_3\} \leadsto \{f_3, f_4\} \leadsto \cdots \leadsto \{f_{n-1}, f_n\} \leadsto \cdots.$$ But still the number of bytes necessary to represent the fibonacci elements grows exponentially with $n$ since $f_{n+1} \approx \varphi^n$ where $\varphi$ is the golden ratio $\varphi=\frac{1+\sqrt{5}}{2} \approx 1.618033$.
The memoization gives a time complexity of $O(n)$ to compute $f_n$. But memoization requires memory to store the memo! We think this needs be accounted for in the computational complexity, since obviously it's an essential part of the computation. Indeed computation as represented by Turing machines, say, involves motion and read/write/react at different locations on the "tape". It seems that a proper accounting of complexity should involve every time-step, namely every motion and site operation. In analyzing the complexity of the usual $O(n)$ complexity of computing $f_n$, the action of calling and retrieving the data in the memo are considered to be constant. But is it truly constant in the Turing model?
fib_cache = {0 : 0, 1 : 1}
def fib(n):
if n in fib_cache:
return fib_cache[n]
for i in range(100, n, 100):
fib(i)
fib_cache[n] = result = fib(n-1) + fib(n-2)
# print(fib_cache) # <--this line is diagnostic.
return result
# There's interesting comment on why the range(100, n, 100) line is important
# to avoid a "recursive depth error". The point is that the memo needs to be filled
# as the algorithm develops, otherwise it recurses too far and returns an error!
# fib(1111111)
fib(11111)
Wolfram's (non)Definition of Computational Irreducibility.
In Wolfram's book "A New Kind of Science", there is introduced the idea of computational irreducibility. We quote from Chapter 13, pp.739-740.
[Begin quote]
"Computational irreducibility leads to a much more fundamental problem with prediction. For it implies that even if in principle one has all the information one needs to work out how some particular system will behave, it can still take an irreducible amount of computational work actually [sic] to do this.
Indeed, whenever computational irreducibility exists in a system it means that in effect there can be no way to predict how the system will behave except by going through almost as many steps of computation as the evolution of the system itself.
In traditional science it has rarely even been recognized that there is a need to consider how systems that are used to make predictions actually operate. But what leads to the phenomenon of computational irreducibility is that there is in fact always a fundamental competition between systems used to make predictions and systems whose behaviour one tries to predict.
_For if meaningful general predictions are to be possible, it must at some level be the case that the system making the predictions be able to outrun the system it is trying to predict. But for this to happen the system making the predictions must be able to perform more sophisticated computations than the system is trying to predict._
But the remarkable assertion that the Principle of Computational Equivalence makes is that this assumption is not correct, and that in fact almost any system whose behaviour is not obviously simple performs computations that are in the end exactly equivalent in their sophistication.
So what this means is that systems one uses to make predictions cannot be expected to do computations that are any more sophisticated than the computations that occur in all sorts of systems whose behaviour we might try to predict. And from this it follows that for many systems no systematic prediction can be done, so that there is no general way to shortcut their process of evolution, and as a result their behaviour must be considered computationally irreducible."
[End quote]
Here's what I like about Wolfram's idea of "computational irreducibility". Firstly it aligns with experience. Mathematicians who actually compute things know that there's no shortcuts for alot of operations. It's very difficult to add two numbers $a+b$ without actually "adding" them arithmetically. In otherwords, mathematicians don't expect shortcuts. This relates to Craig Alan Feinstein's arguments that $P \neq NP$, and specifically that the subset-sum problem is computationally irreducible. To decide whether a set of signed integers $S$ contains a zero subset-sum, there really is nothing we can do except exhaustively search through all the subsets and add the elements. But most theoretical computer scientists "dream" of somehow there existing a shortcut, a black box, a sophisticated magical algorithm that will compute the sums without actually computing them. So Wolfram's idea of "computational irreducibility" is in opposition to the theoretical computer scientists' optimism.
But what's lacking is any rigorous proof or definition of computational irreducibility. If I had to make a pseudo-definition (intuitive) I would first say:
a computation is the evaluation of a function $f: X \to Y$,
and the evaluation $f$ is computationally irreducible if the complexity of any equivalent composition $g: X \to Z$ and $h: Z \to Y$ with $f = h \circ g$ satisfies $$c(f) \leq c(h) + c(g),$$ where $c(f)$ represents the "computational complexity" of evaluating a function $f$.
However the complexity $c(f)$ is not well-defined at this point. But the idea is that a function $f$ always admits (nonunique!) compositions $f=h \circ g$, but the question would be whether these compositions $h, g$ are any simpler than the evaluation of $f$ itself!
The above is not sufficiently rigorous since we have not defined $c(f)$. Now it must be clear that $c(-)$ is not defined on the category of functions, since compositions $f=h\circ g$ are identical functions, but not necessarily identical computations.
For example, Wolfram would argue that the Collatz function arising in the Collatz conjecture $3x+1$ problem is computationally irreducible. There are no shortcuts for evaluating the Collatz function, and therefore there will never be a proof of Collatz' conjecture. This is why no mathematician should study the problem, because the results/outputs of the Collatz function cannot be logically established, they can only be empirically established by directly evaluating the Collatz function using the definition.
Topological Irreducibility
Let's examine the word irreducibility somewhat more. Wikipedia has many references, but the primary definition (for our perspective) is the topological irreducibility. Here is the definition from Wikipedia
Def. (Reducible and Irreducible Topological Spaces)
- _A topological space $X$ is reducible if it can be written as a union
$ X= X_1 \cup X_2$ of two closed proper subsets $X_1, X_2$ of $X$._ - A topological space $X$ is irreducible (or hyperconnected) if it is not reducible. Equivalently, if all non empty open subsets of X are dense, or if any two nonempty open sets have nonempty intersection.
So the computational aspect of irreducibility would involve computing intersections of nontrivial open subsets of the space. Formally it would be better to provide the relative definition of irreducibility for subsets $A \subset X$, but we just use the subspace topology on $A$ in $X$. I.e., open subsets of $A$ are defined as intersections $U \cap A$ of open subsets $U$ of $X$. It's useful to introduce the relative definition because our first idea for defining computational complexity is to consider the topology of the graph of the function in question, i.e. $graph(f) \subset X \times Y$. Now if we have a composition $f = h \circ g$, then we obtain two graphs(!) namely $$graph(h) \subset Z \times Y$$ and $$graph(g)\subset X \times Z.$$
I suppose it would be interesting to determine whether the graph of the Collatz function is an irreducible topological subset of the product $\bf{N} \times \bf{N}$. And here we need specific the topology of $\bf{N}$, which is not so obvious, although perhaps the discrete topology is the most natural.
To this point we haven't proved anything new, we've only conceptualized the problem, trying to more formally define "irreducibility". We have yet to define the complexity $c(f)$ of a function although our idea is that $c(f)$ should reflect a property of $graph(f)$. Dependancy on the graph of $f$ allows us to compare compositions $f=h\circ g$ and maybe develop comparisons between $c(f)$, $c(g)$, $c(h)$, etc..
But the point is that we have a strategy now, and we can consult the literature by looking for techniques to prove the irreducibility of topological spaces. This is a general problem, and we can even begin with the more specialized case of algebraic geometry.
In algebraic geometry, if a variety is irreducible then what does that mean in concrete terms for the equations defining the variety? For hypersurfaces it means the variety can be described by two equations instead of one. For example a polynomial $p(x)$ is reducible if $p(x)=q_1(x) . q_2(x)$ for nontrivial polynomials $q_1, q_2$. What's interesting for polynomials is that the degrees of the products are strictly smaller, namely $$deg(p) = deq(q_1)+deg(q_2).$$
But let's hypothesize that the complexity of evaluating a degree $d$ polynomial is $d$. Then the complexity of computing $p=q_1 q_2$ is not $deg(p)$ but actually $$\max\{ deg(q_1), deg(q_2) \}.$$ This is assuming that the product $p=q_1 . q_2$ has constant time complexity, which is reasonable for real or complex numbers. So this is elementary examples of how topological reducibility relates to computational complexity. And this seems to be the beginning of maybe something more interesting.
Complexity of Exponentiation
There is a recurring idea in evaluating the complexity of exponentiations, and it's based on taking successive squares based on the binary representations. This is where the $O(log_2 n)$ complexity algorithm for evaluating $f_n$ is derived. This speedup depends on the fact that the rule for defining the fibonacci sequence is linear and static, namely the infinite sequence of rules $f_{n+1}=f_n+f_{n-1}$ for $n\geq 1$ is almost only one rule $$f_{*+1} = f_* + f_{*-1}.$$
This relates to the observation that iterated matrix powers of $$Q=\begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix} $$ generate the Fibonacci sequence. Specifically we have the identity $$ \begin{pmatrix} f_{n+1} & f_n \\ f_n & f_{n-1} \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}^n. $$
This identity has alot of different explanations, but it suggests the following "speedup" for computing $f_n$ for $n\geq 2$. For example, to compute $f(16)=f(2^4)$ we can evaluate $$Q, ~~~~~Q^2, ~~~~ Q^4=(Q^2)^2, ~~~~ Q^8=(Q^4)^2.$$ And this successive squaring of the previous result allows a $log_2(n)$ speedup of the Fibonacci computation.
Likewise $f(15)$ requires we compute all the powers $Q, Q^2, Q^4, Q^8$ and then take the product $$Q.Q^2.Q^4.Q^8=Q^{15}.$$ This represents somewhat the "worst case". Notice that here we need to store the powers $Q^{2^k}$, $k>0$, in memory and then multiply these powers together. So what precisely is the complexity? We find it curious how the complexity of matrix multiplication remains an open question in computer science. There are various speedups, although the naive complexity is $O(n^3)$ although speedups to $O(n^{2+\epsilon})$ are possible for some specific small values of $\epsilon$. We'll let $\tilde{m}$ represent the complexity of multiplication of two matrices.
So the total complexity is counted like:
- Complexity of evaluating $log_2(n)$ matrix powers is $\tilde{m}. log_2(n)$ time steps.
- Complexity of $log_2(n)$ matrix multiplications is $\tilde{m}. log_2(n)$ time steps.
- Total Complexity is the sum of the complexities in 1. and 2., and therefore equal to $\tilde{m}. log_2(n)$, or $O(log_2n)$
Are the methods of computing the fibonacci sequence really different? Obviously one appears to be faster, requiring nominally less time steps or elementary operations. However there are increasing memory constraints arising from the matrix form, having to store the powers $Q^{2^k}$, and where of course the entries are becoming super exponentially large.
So What?
Fair question. But as we said, this is the beginning of maybe something deeper. So we need to compare the two algorithms, and perhaps address whether the final $O(log_2n)$ is really irreducible in Wolfram's sense.
[To be continued ... -JHM]