# This is a script for a talk I gave 2019-05 about the Tree Evaluation Problem
# and the algorithm I discovered. It's not meant to be read word-for-word, but
# it is written so that it could be, as a way to make sure the talk is smooth,
# and to make sure it will be easy to present it again later.
#
# Topic:
# The Tree Evaluation Problem. Separating L or NL from P, and a surprising
# algorithm.
#
# Duration: 25 minutes, not including questions.
#
# Legend:
# * Lines that start with @ are text that goes on slides, except that {text in
#   braces} describes the slides rather than appearing on the slides.
# * Lines that start with # are "commented out" (I plan to not say them).
#
# References:
# [1] S. Cook, P. McKenzie, D. Wehr, M. Braverman, R. Santhanam. 2010. "Pebbles and Branching Programs for Tree Evaluation". https://arxiv.org/abs/1005.2642
# [2] S. Chan, J. Cook, S. Cook, P. Nguyen, D. Wehr. 2010. "New Results for Tree Evaluation". https://www.cs.toronto.edu/~sacook/homepage/results.pdf
# Not included in this talk, but relevant:
# [3] S. Cook, J. Edmonds, V. Medabalimi, T. Pitassi. 2016. "Lower Bounds for Nondeterministic Semantic Read-Once Branching Programs".

@ The Tree Evaluation Problem
@ James Cook, Google Research
@ Presenting work by:
@ * Mark Braverman
@ * Siu Man Chan
@ * Stephen Cook
@ * Pierre McKenzie
@ * Phuong Nguyen
@ * Rahul Santhanam
@ * Dustin Wehr

I'm James Cook, Steve Cook's son. I'm going to talk about a topic my Dad's been working on lately, and that I've been peripherally involved in, called the Tree Evaluation Problem.

@ {new slide}
@
@ Overview
@
@ * Motivation
@ * The Tree Evaluation Problem
@ * Lower bounds
@ * A new algorithm
@
@ Annotate the first three bullets with:
@ Pebbles and Branching Programs for Tree Evaluation
@ 2010
@ S. Cook, P. McKenzie, D. Wehr, M. Braverman, R. Santhanam
@ https://arxiv.org/abs/1005.2642
@ and
@ New Results for Tree Evaluation
@ 2010
@ S. Chan, J. Cook, S. Cook, P. Nguyen, D. Wehr
@ https://www.cs.toronto.edu/~sacook/homepage/results.pdf

This talk has two parts. The first part is a bunch of work mostly done by people other than myself. I'll start with some motivation explaining why we care about the Tree Evaluation Problem. Then I'll describe the problem itself, and I'll talk about some lower bounds on its complexity. All of that comes from a couple of publications from twenty ten. Finally, I'll talk about a new algorithm I've found that solves the problem with surprising performance.

@ {new slide}
@
@ Motivation
@
@ {<= stands for LaTeX \subseteq}
@ AC^0(6) <= NC^1 <= L <= NL <= LogCFL <= AC^1 <= NC^2 <= P <= NP <= PH

The motivation is separating complexity classes. Here's a sequence of complexity classes. Don't worry if you don't know what all of these are. On the right side, we have NP followed by the whole polynomial hierarchy. On the left, we have A C zero of six. As far as we know, A C zero of six is a really weak complexity class. For example, consider this simple problem, called "majority": the input is a string of bits, and you have to determine whether the majority of the bits are zero or one.

@ {arrow to AC^0(6)} We don't know if MAJORITY is in AC^0(6).

We don't know whether you can compute that in A C zero of six. And in between there's a wide range of complexity classes that are, as far as we know, all distinct from each other.

@ {"We don't know"... disappears.}
@ AC^0(6) = PH ?

The embarrassing thing is that we still don't have any proof that these classes aren't all the same as each other.
Actually, I'm not sure if my knowledge is out of date here. Is this still true? Any kind of separation on this line would be a breakthrough result.

@ Tree Evaluation Problem {arrows to L, NL and LogCFL}

The Tree Evaluation Problem was introduced as an attempt to get a separation. It's a problem that is in Log C F L. Don't worry if you don't know what Log C F L is. The point is that you don't need a very powerful computer to solve this problem. On the other hand, we believe it is not in log space. The hope is to prove that it indeed isn't in log space, which would separate L from LogCFL. There are lots of tantalizing approaches to trying to prove this. Okay, so let's see what the Tree Evaluation Problem is.

@ {new slide}
@
@ The Tree Evaluation Problem
@
@ {Diagram: A height-3 complete binary tree.}

The input to the Tree Evaluation Problem is a complete binary tree with some information attached to each node. Every leaf has a number attached to it,

@ {The four leaves are labelled with the numbers 3, 1, 2, 2, respectively.}

and every internal node has a table of numbers.

@ { Each internal node is labelled with a 3x3 table where each entry is 1, 2 or
@   3.
@   In particular, the (3, 1) entry of the left child's table is 2, the (2, 2)
@   entry of the right child's table is 3, and the (2, 3) entry of the root's
@   table is 2.
@ }

Given that input, we're going to recursively define a single number at each node, called the value of the node. The values of the leaves are already part of the input: three, one, two, and two in this case. To compute the value of an internal node, we first need to know the values of its children. Those two values tell us where to look in that internal node's table. For the internal node on the left, we look at the entry in row three, column one of its table, and we find the number two.

@ {Add a 2 to the left child of the root.}

Similarly, we look up row two, column two of the node on the right, and find the number three.

@ {Add a 3 to the right child of the root.}

Finally, the numbers two and three tell us where to look in the root node's table, and we find the number two.

@ {Add a 2 to the root.}
@ Goal: compute the value at the root. {arrow to the value at the root node}

The output of the Tree Evaluation Problem is the value at the root.

@ height = 3, k = 3

There are two parameters to this problem. The first parameter is the height of the tree. Three in this case. The second parameter is k, which is the range of the numbers at the nodes. In this case it's also three, meaning every number is between one and three, and the tables are all three by three. The next question is: how should we solve this problem? On that topic, I'd like to introduce another concept called the pebbling game.

@ {new slide}
@
@ Pebbling Game [Paterson Hewitt 1970]
@
@ {complete binary tree of height 3}

This was first defined by Paterson and Hewitt in 1970. It works like this. Suppose we have a complete binary tree of height h.

@ height h = 3

Three in this case.

@ * Limited supply of pebbles (say, 3).

You have some limited number of pebbles. Let's say it's three. They all start in your hand. You're allowed two kinds of move.

@ * Two kinds of move:
@   * Move a pebble to a leaf.
@   * If a node's two children have pebbles, move a pebble to that node.

First, you can move one of your pebbles to a leaf of the tree. And second, if a node's two children both have pebbles on them, you can move one of your pebbles to that node. The goal is to place a pebble on the root node.
@ * Goal: put a pebble on the root.

Let's try.

@ {Three pebbles appear beside the tree.}

We have three pebbles. We'll start by moving two of the pebbles to the leftmost two leaves.

@ {Two pebbles move to the leftmost two leaves.}

Now, the internal node on the left has pebbles on both of its children. So we're allowed to move a pebble to it.

@ {One of the pebbles on a leaf moves to the left child of the root.}

This is good progress. Our goal is to put a pebble on the root, and we've already got a pebble on one of the root's two children. Now, let's focus on the right side of the tree. I'll move two pebbles to the two leaves on the right.

@ {The two pebbles that are not on an internal node move to the right two leaves.}

Now I'm allowed to move a pebble to the right child of the root. Which pebble should I move? Not the one that's already on the root's left child, because I don't want to lose the progress I've made! So, I move one of the other two pebbles to that node.

@ {The pebble on the leftmost of the two leaves with pebbles on them moves to the right child of the root.}

Now, both of the root's children have pebbles on them, so I can move a pebble to the root.

@ {The pebble on the left child of the root moves to the root.}

And we're done. The important question is: how many pebbles do we need? In this case we had three pebbles, and that was enough.

@ Theorem: h pebbles are enough.

In general, you can solve this game with h pebbles, where h is the height of the tree, using a simple recursive algorithm. And that's tight: if you only have h-1 pebbles, no sequence of legal moves can put one on the root.

@ Theorem: h pebbles are needed.
@ Proof: Exercise.

I'll leave that one as an exercise. The reason I described this game is that we can turn pebbling strategies into algorithms for the Tree Evaluation Problem.

@ {Image of Tree Evaluation Problem from earlier slide.}

Instead of a limited supply of pebbles, we have a limited supply of memory.

@ {Beside "Limited supply of pebbles":} Limited memory.

Having a pebble at a node means that we're storing the value of that node in memory. Moving a pebble to a leaf corresponds to reading the input at that leaf.

@ {Beside "Move a pebble to a leaf.":} Read the input at a leaf.

Moving a pebble to an internal node corresponds to looking up the correct value in that node's table, and storing it.

@ {Beside "move a pebble to that node.":} If we've computed the values of a node's two children, look up the value at that node.

And just like the goal of the pebbling game is to put a pebble on the root, the goal of the Tree Evaluation Problem is to compute the value at the root.

@ {Beside "put a pebble on the root.":} Compute the value at the root.

Using this analogy, we can turn pebbling strategies into algorithms. In particular, the h-pebble strategy this theorem gives us can be transformed into an algorithm that uses big oh of h log k memory to solve the Tree Evaluation Problem.

@ {Beside "h pebbles are enough":} O(h log k) memory is enough to solve TEP.

Let's try taking this analogy one more step. We know the pebbling game requires at least h pebbles. And, for a long time, we couldn't think of any algorithm that uses less memory.

@ {Beside "h pebbles are needed":} Conjecture: TEP requires Omega(h log k) memory.

So we had a conjecture: the Tree Evaluation Problem requires big omega of h log k memory.

@ {new slide}
@
@ Pebbling algorithm: O(h log k) memory.
@
@ Conjecture: TEP requires Omega(h log k) memory.
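As a concrete aside, here's a minimal sketch of that pebbling-style algorithm in Python. The nested-tuple tree encoding is something I've made up just for this illustration; the point is that the recursion only ever holds one value per level of the tree, which is where the O(h log k) memory bound comes from.

```python
# Hypothetical encoding: a leaf is an int in {1..k}; an internal node is a
# tuple (table, left, right), where table[y-1][z-1] is the node's value when
# its children have values y and z.

def evaluate(node):
    """Pebbling-style recursive evaluation: at most one stored value per
    level of the tree, i.e. O(h) values of log k bits each."""
    if isinstance(node, int):          # leaf: its value is given directly
        return node
    table, left, right = node
    y = evaluate(left)                 # "pebble" the left child
    z = evaluate(right)                # "pebble" the right child
    return table[y - 1][z - 1]         # look up this node's value

# The height-3, k=3 example from the slides. Only the three table entries the
# talk mentions are meaningful; the rest are arbitrary filler.
left_table  = [[1, 1, 1], [1, 1, 1], [2, 1, 1]]   # entry (3, 1) = 2
right_table = [[1, 1, 1], [1, 3, 1], [1, 1, 1]]   # entry (2, 2) = 3
root_table  = [[1, 1, 1], [1, 1, 2], [1, 1, 1]]   # entry (2, 3) = 2
tree = (root_table, (left_table, 3, 1), (right_table, 2, 2))
assert evaluate(tree) == 2
```

The call stack plays the role of the pebbles: a stack frame holding a child's value is a pebble sitting on that child.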
To summarize where we are so far: we have an algorithm that uses on the order of h log k memory. I'm going to call that the "pebbling algorithm". We also have a conjecture saying that's the best you can do. Now, remember, our overall goal is to prove the Tree Evaluation Problem is in polynomial time but not log space. In order to measure the complexity, we need to measure how big the input to the Tree Evaluation Problem is.

@ {Repeat image of Tree Evaluation Problem.}
@
@ Input size:

Remember, the input consists of a k by k table at each internal node, and a number at each leaf.

@ {beside Input size:} Theta(2^h k^2 log k)

So, the size of the input is on the order of two to the h internal nodes, times k squared numbers at each node, times log k bits to store each number.

@ TEP in L => can solve in O(h + log k) memory.

The logarithm of that is big oh of h plus log k. So, if the Tree Evaluation Problem is in log space, that means we can solve it in big oh of h plus log k memory. That's less than h times log k, so our pebbling-based algorithm isn't a log space algorithm.

@ {Arrow to "Pebbling algorithm":} Not log space.

So, if our conjecture is true, then log space is not equal to polynomial time.

@ {Arrow to Conjecture:} Would imply L not= P.

It turns out this conjecture isn't true: I'm going to describe an algorithm I recently found that uses big oh of h plus k memory. That defeats the conjecture in the case that k is smaller than h log k.

@ Counterexample: can solve with O(h + k) memory.
@ Beats Omega(h log k) when k = o(h log k).

But it's still not a log space algorithm,

@ Still not log space.

so while we found the existence of this algorithm a bit surprising, it doesn't rule out using the Tree Evaluation Problem to separate L and P.

@ {Repeat overview slide}

I've introduced the Tree Evaluation Problem, so the next thing I'm going to talk about is lower bounds.

@ {new slide}
@
@ Some lower bounds
@
@ Compare to pebbling algorithm: O(h log k) memory

I'll leave the upper bounds from the two algorithms we know on the slide for reference: big oh of h log k memory, and the algorithm I'll present later, which uses big oh of h plus k memory.

@ * Nečiporuk [1966: "On a Boolean function"]: Omega(h + log k) memory

An old method from Nečiporuk gives a lower bound of h plus log k memory.

@ * "Thrifty" algorithm: Omega(h log k) memory
@ * Read-once algorithm: Omega(h log k) memory

The other two methods I will list both put a requirement on the algorithm: one requirement is called being "thrifty", and the other is read-once. In each case, we get a lower bound saying we can't beat pebbling: thrifty and read-once algorithms need big omega of h log k memory. We spent a lot of time trying to make these lower bounds apply more generally.

@ New algorithm: O(h + k) memory

But, it turns out the lower bounds don't apply to all algorithms: I'm going to present an algorithm that uses big oh of h plus k memory. I want to give a quick description of the last two bounds, to give you a sense of why we find the existence of the new algorithm surprising. A thrifty algorithm is one that never reads an irrelevant part of the input.

@ {new slide}
@
@ Thrifty algorithm: never reads irrelevant input.
@
@ {Repeat Tree Evaluation Input from earlier, but with all the values that don't contribute to the final output crossed out.}

So, for example, in the table for the left internal node, the only value that matters is row three, column one.
So our program must make sure it never attempts to read any of the other cells of that table; otherwise it's not a thrifty algorithm. This doesn't seem like a very tough requirement, because intuitively, it's hard to imagine how we could benefit from reading these irrelevant parts of the input. But it's a strong enough requirement that we can prove a lower bound of h log k memory.

@ Theorem: Thrifty branching programs need Omega(h log k) memory.
@ Pieces of the proof:
@ * Before reading from a node's table, need to know children's values.
@ * Extract pebbling game strategy.
@ {QED box}

I won't describe the proof in detail, but I'll mention a couple of the ideas that go into it. First, we can't read a node's table until we know for sure which part of the table to read, so we need to compute the values at the two children first. The second idea is that it's possible to extract a strategy for the pebbling game from a computation of a thrifty branching program, and we know that you need at least h pebbles to solve the pebbling game. The other lower bound I want to talk about is for read-once algorithms.

@ {new slide}
@
@ Read-once algorithm: no computation reads the same input twice.

A computation of a read-once algorithm is only allowed to read each part of the input at most once.

@ Theorem: Read-once algorithms need Omega(h log k) memory.

Once again, we can prove that under this requirement, an algorithm needs big omega of h log k memory to solve the Tree Evaluation Problem. Again, I won't describe the proof in detail, but here's the basic idea. First of all, we fix all the tables in the internal nodes, so the algorithm only needs to read leaf nodes.

@ * Fix internal nodes.

Now consider the very last leaf the algorithm reads.

@ * Wait until last time algorithm reads a leaf.

At the instant before it reads that value, there are k different values that leaf could have, and the algorithm needs to know how to correctly handle all of them: it needs to have memorized a function translating that leaf's value into the final output.

@ * Output is a function of that leaf.

We can arrange things so that there are k to the power of h minus one possible functions.

@ * k^(h-1) possible functions from that leaf to output.

So memorizing that function requires big omega of h log k bits.

@ {QED box}

And that concludes the proof. Both read-once and thrifty branching programs need on the order of h log k memory to solve the Tree Evaluation Problem. There's an intuition that at some point during the computation, a program will need to remember h different values at the same time. Now I want to show you an algorithm I've found that works completely differently.

@ {new slide}
@
@ Computing with a full memory: catalytic space [BCKLS2014]

It's based on an interesting result from a 2014 paper called "Computing with a full memory: catalytic space". The idea is that you're given a small amount of ordinary memory to work with, and a much larger amount of extra memory. The catch with the extra memory is that once you're done with your computation, you need to return it the way it was. Imagine your friend has lent you a hard disk that you can use, but when you return it, all the data on the hard disk has to be the same as when you got it.

@ Given
@ * Small ordinary memory
@ * Large memory that must be returned to its original state

I think a natural first thought here is that the extra memory can't possibly help.
For example, if you overwrite any data that was stored in it, you'd better keep a copy of that data somewhere else so that you can put it back once you're finished. So it's hard to imagine how you could get any net gain from it. Surprisingly, the result in the paper is that the extra memory does help.

@ Result: With O(log n) ordinary memory, can compute uniform TC^1.
@ E.g. matrix determinant.

It turns out that with only a logarithmic amount of ordinary memory, you can compute uniform T C one circuits. For example, you can compute the determinant of a matrix in T C one, and we don't know how to do that with logarithmic memory. I'm not going to use this setting with the borrowed extra memory. And so I'm not going to use the result from this paper directly when building my algorithm. But I'll use some pieces of their result. The basic ingredient in the paper is a kind of reversible computation that operates on registers.

@ {new slide}
@
@ Computing with full memory
@
@ Inputs x_1, ..., x_n
@ Registers r_1, ..., r_m

Let's say we have n inputs x one through x n, and m registers r one through r m.

@ Reversible instructions.
@ E.g. r_5 <- r_5 + r_4 x x_1

Then a program is a sequence of reversible instructions. Each instruction modifies one register in a reversible way. For example, this instruction takes the value of register four, multiplies it by input x one, and adds the product to register five.

@ Inverse: r_5 <- r_5 - r_4 x x_1

Importantly, we can build an inverse instruction that has exactly the opposite effect, by subtracting r four times x one from register five. A program is a sequence of these instructions. Now, suppose we have some function f we're interested in computing.

@ Definition:
@ A program /transparently computes/ f into r_i if
@ (final value of r_i) = (initial value of r_i) + f(x_1, ..., x_n)

I'm going to define a notion called transparently computing. We say a sequence of reversible instructions transparently computes f into register r i if the final value of register r i is equal to its initial value plus f. For example, here's a program that transparently computes x one plus x two into register one.

@ Example: f(x_1, x_2) = x_1 + x_2
@ r_1 <- r_1 + x_1
@ r_1 <- r_1 + x_2

We simply add x one to register one, and then add x two to register one.

@ Definition:
@ If a program transparently computes f into r_i, and leaves every other register unmodified,
@ it /cleanly computes/ f into r_i.

If a program transparently computes f into register i and also leaves the other registers unmodified, we say it cleanly computes f. The example program here also cleanly computes x one plus x two into r one. Now, as a warm-up, let's prove a simple result about transparent and clean computations.

@ {new slide}
@
@ {From the previous slide, keep definitions of transparent and clean computation.}
@
@ Warm-Up: Turning transparent computation into clean computation

I'll keep the two definitions on the slide so we can refer to them. We'll prove that if we can transparently compute something, we can also cleanly compute it.

@ Suppose P transparently computes f into r_1, and doesn't use r_2.

Suppose we have a program P that transparently computes f into register one. We'll need one spare register to make this work, so assume P doesn't use register two. It's allowed to use every other register our computer has.

@ Then this program cleanly computes f into r_2:
@ P
@ r_2 <- r_2 + r_1
@ P^-1
@ r_2 <- r_2 - r_1

Then, this program cleanly computes f into register two. It has four steps. We start by running program P.
At this point, register one holds its initial value plus f.

@ {Beside the first instruction:} r_1 = (original value of r_1) + f

Then we add the value of register one to register two. At this point, register two holds the initial value of register two, plus the initial value of register one, plus f.

@ {Beside the second instruction:} r_2 = (original value of r_2) + (original value of r_1) + f

Now, we run P inverse. This P inverse notation means: take the inverse of every instruction in P, and run those inverses in reverse order. This undoes all the changes we've made, except the change to register two.

@ {Beside the third instruction:} r_1 = (original value of r_1)

Finally, register one is back to its original value, and we subtract that value from register two. Now, all that's left in register two is its original value plus f.

@ {Beside the fourth instruction:} r_2 = (original value of r_2) + f

So we've cleanly computed f into register two. We've proved that any time we can transparently compute a function f, we can also cleanly compute f, and what it costs us is that we need to run the program twice and use one extra register. I'm going to use one other result from the computing with full memory paper, which is a lemma about multiplying two functions.

@ {new slide}
@
@ {From the previous slide, keep definitions of transparent and clean computation.}
@ Lemma 4 from "Computing with a full memory"
@
@ Suppose P transparently computes f into r_1 and g into r_2,
@ and doesn't use r_0, r_3, r_4.
@
@ Then this program transparently computes f * g into r_0:
@
@ r_0 <- r_0 + r_1 r_2 + r_1 r_4 + r_3 r_2
@ P
@ r_3 <- r_3 + r_1
@ r_4 <- r_4 + r_2
@ r_0 <- r_0 + r_1 r_2
@ P^-1
@ r_0 <- r_0 - r_1 r_4 - r_3 r_2

This is Lemma Four from that paper. It says: suppose we have a program P that transparently computes two different functions in parallel: it computes f into register one and g into register two. And we have three registers that P doesn't use: registers zero, three, and four. Then, this new program transparently computes f times g into register zero. That is, the final value of register zero is its initial value plus f times g.

@ (final value of r_0) = (initial value of r_0) + f(x) * g(x)

The proof of this is just a bunch of algebra which I won't go into. The important thing to keep in mind is that we had to run program P twice. With these ingredients, we're ready to build our algorithm.

@ {new slide}
@
@ Algorithm
@
@ Disclaimer: not (yet) peer reviewed.

I should give a quick disclaimer: this is a brand new result that hasn't yet been through a full peer review process, so it's possible it's completely wrong. One or two other people have looked it over. Also, I won't go into the full detail of the algorithm. This is just a sketch. The algorithm uses on the order of k registers, and each register stores one bit.

@ O(k) registers. Each stores one bit.

The program is based around the following recursive subroutine.

@ Subroutine
@ Given: node A
@ Flips register r_v, where v is the value at node A.
@ Leaves other registers unchanged.

Given a node A in the tree, the subroutine flips the bit value of exactly one register, and leaves all the other registers unchanged. The index of the flipped register is the value at that node. For example, if A is the root node, and the value of the root node is five, this subroutine flips register five. Another way of thinking of this subroutine is that for each of the k possible values x, we ask the question: does node A have value x?
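Before unpacking how the subroutine answers that question, here's a toy sketch (in Python, with made-up registers and a made-up f) of the register machinery from the last two slides, since the subroutine is built out of exactly these reversible moves. It just runs the warm-up construction, P, then r_2 += r_1, then P^-1, then r_2 -= r_1, on concrete numbers.

```python
# A "program" is a list of (target, delta) pairs meaning r[target] += delta(r, x);
# its inverse applies the same instructions in reverse order with the sign flipped.

def run(prog, r, x, inverse=False):
    steps = reversed(prog) if inverse else prog
    sign = -1 if inverse else 1
    for target, delta in steps:
        r[target] += sign * delta(r, x)

# P: r_1 <- r_1 + x_1 ; r_1 <- r_1 + x_2
# (the slide example: P transparently computes f(x) = x_1 + x_2 into r_1)
P = [(1, lambda r, x: x[1]),
     (1, lambda r, x: x[2])]

def cleanly_compute_f_into_r2(r, x):
    """Warm-up construction: P ; r_2 += r_1 ; P^-1 ; r_2 -= r_1."""
    run(P, r, x)                  # r_1 = (original r_1) + f
    r[2] += r[1]                  # r_2 = (original r_2) + (original r_1) + f
    run(P, r, x, inverse=True)    # r_1 back to its original value
    r[2] -= r[1]                  # r_2 = (original r_2) + f

x = {1: 5, 2: 7}                  # inputs x_1 = 5, x_2 = 7
r = {1: 11, 2: 100}               # arbitrary junk already sitting in the registers
cleanly_compute_f_into_r2(r, x)
assert r == {1: 11, 2: 112}       # only r_2 changed, and by exactly f = 12
```

Lemma 4 is the same kind of bookkeeping, with the extra registers r_0, r_3 and r_4 absorbing the cross terms of the product.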
@ In other words:
@ Define E_A,x = 1 if node A has value x, else 0
@ For x = 1...k, Subroutine cleanly computes E_A,x into r_x.

Define E A x to be one if node A has value x, and otherwise zero. This subroutine is cleanly computing E A x for k different values of x. Exactly one of those values is one, so in the end, we flip one register. So, how do we implement this subroutine? Well, suppose we want to compute E A one.

@ Example: computing E_A,1

We look at the table at node A.

@ {3x3 table appears. Each cell has the number 1, 2 or 3 in it. Three of the cells are 1, at coordinates (2, 1), (2, 2) and (3, 3).}

We want to know if node A has value one, so we look at all the places the number one appears in the table.

@ {Highlight all the 1s in the table.}

E A one is equal to one if and only if the children's values point us to one of those highlighted cells. So, let's make a definition.

@ Let S_x = {(y, z) | entry (y, z) in node A's table is x}

For any number x, define S x to be the set of coordinates in the table where we find the number x. For example, the highlighted cells here form the set S one.

@ {Arrow to the table with highlighted cells.} S_1

We can build a formula for E A one based on this idea. Let's name A's children B and C.

@ Let B and C be A's children.

Continuing our example, there are three ways node A can have value one, so our formula has three terms.

@ {Near the Example:}
@ E_A,1 = E_B,2 * E_C,1 + E_B,2 * E_C,2 + E_B,3 * E_C,3.

The first term, E B two times E C one, is one if the first child's value is two and the second child's value is one. Similarly, we add E B two times E C two and E B three times E C three. In general, the formula looks like this:

@ E_A,x = sum_{(y, z) in S_x} E_B,y * E_C,z

To compute E A x, we take the sum, over all of the pairs y z in this set S x, of E B y times E C z. So, let's talk about implementing this.

@ {new slide}
@
@ {Keep example from previous slide.}
@
@ Subroutine: cleanly compute E_A,1 , ... , E_A,k
@
@ E_A,x = sum_{(y, z) in S_x} E_B,y * E_C,z

We implement it recursively. We can compute all of the E B y and E C z values by calling the subroutine twice, once on each child.

@ Can compute all E_B,y and E_C,z by calling Subroutine twice (once on B and once on C).

Now, remember the lemma about multiplication which I showed you earlier. That lemma tells us that if we have a program that computes two different functions, we can compute the product of those functions by running the program twice. It turns out that we can adapt the lemma to compute this whole formula.

@ Based on Lemma 4, can compute E_A,1, ..., E_A,k using four calls to Subroutine.

The lemma requires us to double the number of times we run the subroutine, so it takes us four subroutine calls to compute all of the different E A x values. And we're almost done. The only problem now is that the subroutine is supposed to be a clean computation. So far, it's not, because the recursive calls and all the work from Lemma 4 caused a bunch of extra registers to be modified. We can make it clean by running everything twice, using the method I showed you in the warm-up earlier.

@ This is a transparent computation. To make it clean, run everything twice: total of 8 recursive calls to Subroutine.

In the end, the subroutine is recursively calling itself eight times. So, let's analyze this algorithm.

@ {new slide}
@
@ Subroutine recursively calls itself on children 8 times.

To compute the value at a node, the subroutine recursively calls itself on its two children eight times.
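Before running the numbers, here's a plain reference version of that recursion in Python, just to pin down the formula E_A,x = sum over (y, z) in S_x of E_B,y * E_C,z. To be clear, this sketch stores whole indicator vectors on the call stack, so it is not the memory-efficient register program; the real subroutine evaluates the same formula using the reversible clean-computation tricks above. It assumes the same made-up tree encoding as the earlier pebbling sketch.

```python
def indicator(node, k):
    """Return [E_{A,1}, ..., E_{A,k}] for this node (exactly one entry is 1)."""
    if isinstance(node, int):                   # leaf with a given value
        return [1 if node == x else 0 for x in range(1, k + 1)]
    table, left, right = node
    eb = indicator(left, k)                     # E_{B,y} for y = 1..k
    ec = indicator(right, k)                    # E_{C,z} for z = 1..k
    ea = [0] * k
    for y in range(1, k + 1):
        for z in range(1, k + 1):
            x = table[y - 1][z - 1]             # the pair (y, z) belongs to S_x
            ea[x - 1] ^= eb[y - 1] & ec[z - 1]  # add (mod 2) the product term
    return ea

# With the example tree from the pebbling sketch (k = 3), the root's value is 2,
# so indicator(tree, 3) == [0, 1, 0].
```

In the register-program version, the two recursive calls above become eight: Lemma 4 doubles them to compute the products, and the warm-up cleanup pass doubles them again.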
For a tree of height h, those eight recursive calls per node mean a total of two to the order h calls to the subroutine.

@ Number of register operations: 2^O(h) k^2

I didn't show it, but each call also includes a loop with on the order of k squared instructions. So the total number of register operations is two to the order h, times k squared. That means that just remembering where we are in the execution already requires big oh of h plus log k bits.

@ => Need O(h + log k) bits to remember which operation is next.

The other thing we need to store in memory is the values of the registers.

@ O(k) one-bit registers

We're using on the order of k registers, each storing one bit. So, in total, we're using big oh of h plus k memory.

@ Total memory: O(h + k)

For comparison, the more straightforward algorithm based on pebbling uses big oh of h log k memory.

@ Compare to pebbling algorithm: O(h log k)
@ New algorithm uses less memory when k << h log k

This new algorithm uses less memory when k is smaller than h log k.

@ {new slide}
@
@ Future work:
@ * Improve the algorithm?
@ * Lower bounds that apply to this new algorithm
@   (Algorithm isn't thrifty, isn't read-once.)

That's everything I wanted to present. There are two clear directions for future work here. The first is to see if we can make the algorithm even better. The other direction is to try to develop new lower bounds that apply to this algorithm. If you remember from earlier, I talked about lower bounds that apply to two special classes of algorithms. Thrifty algorithms are not allowed to read irrelevant parts of the input: they can't read an input until they're sure it contributes to the final answer. This algorithm certainly isn't thrifty: it reads every part of the input, in a fixed order. I also talked about a lower bound for read-once algorithms. But this algorithm is full of repetition: it does the same operations over and over again, forward and backward, as a trick to save memory. So it would be interesting to find some conditions on branching programs that don't rule out this algorithm, but still allow us to prove a lower bound.

@ Thanks!

That's it. Thanks!