An Unexpected Correspondence
Enter any expression and it’ll get evaluated:
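For example, with the expression we'll use repeatedly below:

    (1 + (2 + 2)) + (3 + 4)
    (* evaluates to 12 *)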
And internally—say in the Wolfram Language—what’s going on is that the expression is progressively being transformed using all available rules until no more rules apply. Here the process can be represented like this:
We can think of the yellow boxes in this picture as corresponding to “evaluation events” that transform one “state of the expression” (represented by a blue box) to another, eventually reaching the “fixed point” 12.
And so far this may all seem very simple. But actually there are many surprisingly complicated and deep issues and questions. For example, to what extent can the evaluation events be applied in different orders, or in parallel? Does one always get the same answer? What about non-terminating sequences of events? And so on.
I was first exposed to such issues more than 40 years ago—when I was working on the design of the evaluator for the SMP system that was the forerunner of Mathematica and the Wolfram Language. And back then I came up with pragmatic, practical solutions—many of which we still use today. But I was never satisfied with the whole conceptual framework. And I always thought that there should be a much more principled way to think about such things—that would likely lead to all sorts of important generalizations and optimizations.
Well, more than 40 years later I think we can finally now see how to do this. And it’s all based on ideas from our Physics Project—and on a fundamental correspondence between what’s happening at the lowest level in all physical processes and in expression evaluation. Our Physics Project implies that ultimately the universe evolves through a series of discrete events that transform the underlying structure of the universe (say, represented as a hypergraph)—just like evaluation events transform the underlying structure of an expression.
And given this correspondence, we can start applying ideas from physics—like ones about spacetime and quantum mechanics—to questions of expression evaluation. Some of what this will lead us to is deeply abstract. But some of it has immediate practical implications, notably for parallel, distributed, nondeterministic and quantum-style computing. And from seeing how things play out in the rather accessible and concrete area of expression evaluation, we’ll be able to develop more intuition about fundamental physics and about other areas (like metamathematics) where the ideas of our Physics Project can be applied.
Causal Graphs and Spacetime
The standard evaluator in the Wolfram Language applies evaluation events to an expression in a particular order. But typically multiple orders are possible; for the example above, there are three:
So what determines what orders are possible? There is ultimately just one constraint: the causal dependencies that exist between events. The key point is that a given event cannot happen unless all the inputs to it are available, i.e. have already been computed. So in the example here, the 1 + 4 evaluation event cannot occur unless the 2 + 2 one has already occurred. And we can summarize this by "drawing a causal edge" from the 2 + 2 event to the 1 + 4 one. Putting together all these "causal relations", we can make a causal graph, which in the example here has the simple form (where we include a special "Big Bang" initial event to create the original expression that we're evaluating):
What we see from this causal graph is that the events on the left must all follow each other, while the event on the right can happen “independently”. And this is where we can start making an analogy with physics. Imagine our events are laid out in spacetime. The events on the left are “timelike separated” from each other, because they are constrained to follow one after another, and so must in effect “happen at different times”. But what about the event on the right? We can think of this as being “spacelike separated” from the others, and happening at a “different place in space” asynchronously from the others.
As a quintessential example of a timelike chain of events, consider making a definition for f and then generating the causal graph for the events associated with evaluating f[f[f[1]]] (i.e. Nest[f, 1, 3]):
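The definition itself appears as an image in the original. A minimal stand-in with the right causal structure (an assumption, not necessarily the author's exact definition) is:

    (* each f must wait for the value of its argument, so the
       evaluation events form a single timelike chain *)
    f[x_] := x + 1

    f[f[f[1]]]  (* the three events happen strictly in sequence, giving 4 *)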
A straightforward way to get spacelike events is just to “build in space” by giving an expression like f[1] + f[1] + f[1] that has parts that can effectively be thought of as being explicitly “laid out in different places”, like the cells in a cellular automaton:
But one of the major lessons of our Physics Project is that it’s possible for space to “emerge dynamically” from the evolution of a system (in that case, by successive rewriting of hypergraphs). And it turns out very much the same kind of thing can happen in expression evaluation, notably with recursively defined functions.
As a simple example, consider the standard definition of Fibonacci numbers:
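The definition is shown as an image in the original; in explicit Wolfram Language form it is presumably the usual one:

    (* standard recursive Fibonacci definition (base cases assumed) *)
    f[n_] := f[n - 1] + f[n - 2]
    f[1] = f[2] = 1;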
With this definition, the causal graph for the evaluation of f[3] is then:
For f[5], dropping the “context” of each event, and showing only what changed, the graph is
while for f[8] the structure of the graph is:
So what is the significance of there being spacelike-separated parts in this graph? At a practical level, a consequence is that those parts correspond to subevaluations that can be done independently, for example in parallel. All the events (or subevaluations) in any timelike chain must be done in sequence. But spacelike-separated events (or subevaluations) don’t immediately have a particular relative order. The whole graph can be thought of as defining a partial ordering for all events—with the events forming a partially ordered set (poset). Our “timelike chains” then correspond to what are usually called chains in the poset. The antichains of the poset represent possible collections of events that can occur “simultaneously”.
And now there’s a deep analogy to physics. Because just like in the standard relativistic approach to spacetime, we can define a sequence of “spacelike surfaces” (or hypersurfaces in 3 + 1-dimensional spacetime) that correspond to possible successive “simultaneity surfaces” where events can consistently be done simultaneously. Put another way, any “foliation” of the causal graph defines a sequence of “time steps” in which particular collections of events occur—as in for example:
And just like in relativity theory, different foliations correspond to different choices of reference frames, or what amount to different choices of “space and time coordinates”. But at least in the examples we’ve seen so far, the “final result” from the evaluation is always the same, regardless of the foliation (or reference frame) we use—just as we expect when there is relativistic invariance.
As a slightly more complex—but ultimately very similar—example, consider the nestedly recursive function:
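The definition is again an image in the original. As a hypothetical stand-in of the right general kind, one can imagine something like:

    (* nestedly recursive: the argument of the outer f depends on f itself *)
    f[n_] := f[n - f[n - 1]] + 1
    f[1] = 1;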
Now the causal graph for f[12] has the form
which again has both spacelike and timelike structure.
Foliations and the Definition of Time
Let’s go back to our first example above—the evaluation of (1 + (2 + 2)) + (3 + 4). As we saw above, the causal graph in this case is:
The standard Wolfram Language evaluator makes these events occur in the following order:
And by applying events in this order starting with the initial state, we can reconstruct the sequence of states reached at each step of this particular evaluation process (where now we've highlighted in each state the part that's about to be transformed):
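One can see this order directly in the Wolfram Language with Trace, whose nested output mirrors the subexpression structure:

    Trace[(1 + (2 + 2)) + (3 + 4)]
    (* {{{2 + 2, 4}, 1 + 4, 5}, {3 + 4, 7}, 5 + 7, 12} *)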
Here’s the standard evaluation order for the Fibonacci number f[3]:
And here’s the sequence of states generated from this sequence of events:
Any valid evaluation order has to eventually visit (i.e. apply) all the events in the causal graph. Here’s the path that’s traced out by the standard evaluation order on the causal graph for f[8]. As we’ll discuss later, this corresponds to a depth-first scan of the (directed) graph:
But let’s return now to our first example. We’ve seen the order of events used in the standard Wolfram Language evaluation process. But there are actually three different orders that are consistent with the causal relations defined by the causal graph (in the language of posets, each of these is a “total ordering”):
And for each of these orders we can reconstruct the sequence of states that would be generated:
Up to this point we've always assumed that we're just applying one event at a time. But whenever we have spacelike-separated events, we can treat such events as "simultaneous", applying them at the same step. And—just like in relativity theory—there are typically multiple possible choices of "simultaneity surfaces". Each one corresponds to a certain foliation of our causal graph. And in the simple case we're looking at here, there are only two possible (maximal) foliations:
From such foliations we can reconstruct possible total orderings of individual events just by enumerating possible permutations of events within each slice of the foliation (i.e. within each simultaneity surface). But we only really need a total ordering of events if we’re going to apply one event at a time. Yet the whole point is that we can view spacelike-separated events as being “simultaneous”. Or, in other words, we can view our system as “evolving in time”, with each “time step” corresponding to a successive slice in the foliation.
And with this setup, we can reconstruct states that exist at each time step—interspersed by updates that may involve several “simultaneous” (spacelike-separated) events. In the case of the two foliations above, the resulting sequences of (“reconstructed”) states and updates are respectively:
As a more complicated example, consider recursively evaluating the Fibonacci number f[3] as above. Now the possible (maximal) foliations are:
For each of these foliations we can then reconstruct an explicit “time series” of states, interspersed by “updates” involving varying numbers of events:
So where in all these is the standard evaluation order? Well, it's not explicitly here—because it involves doing a single event at a time, while all the foliations here are "maximal" in the sense that they aggregate as many events as they can into each spacelike slice. But if we don't impose this maximality constraint, are there foliations that in a sense "cover" the standard evaluation order? Without the maximality constraint, there turn out to be not 10 but 1249 possible foliations in the example we're using. And 4 of these "cover" the standard ("depth-first") evaluation order (indicated by a dashed red line):
(Only the last foliation here, in which every “slice” is just a single event, can strictly reproduce the standard evaluation order, but the others are all still “consistent with it”.)
In the standard evaluation process, only a single event is ever done at a time. But what if instead one tries to do as many events as possible at a time? Well, that’s what our “maximal foliations” above are about. But one particularly notable case is what corresponds to a breadth-first scan of the causal graph. And this turns out to be covered by the very last maximal foliation we showed above.
How this works may not be immediately obvious from the picture. With our standard layout for the causal graph, the path corresponding to the breadth-first scan is:
But if we lay out the causal graph differently, the path takes on the much-more-obviously-breadth-first form:
And now using this layout for the various configurations of foliations above we get:
We can think of different layouts for the causal graph as defining different “coordinatizations of spacetime”. If the vertical direction is taken to be time, and the horizontal direction space, then different layouts in effect place events at different positions in time and space. And with the layout here, the last foliation above is “flat”, in the sense that successive slices of the foliation can be thought of as directly corresponding to successive “steps in time”.
In physics terms, different foliations correspond to different “reference frames”. And the “flat” foliation can be thought of as being like the cosmological rest frame, in which the observer is “at rest with respect to the universe”. In terms of states and events, we can also interpret this another way: we can say it’s the foliation in which in some sense the “largest possible number of events are being packed in at each step”. Or, more precisely, if at each step we scan from left to right, we’re doing every successive event that doesn’t overlap with events we’ve already done at this step:
And actually this also corresponds to what happens if, instead of using the built-in standard evaluator, we explicitly tell the Wolfram Language to repeatedly do replacements in expressions. To compare with what we’ve done above, we have to be a little careful in our definitions, using ⊕ and ⊖ as versions of + and – that have to get explicitly evaluated by other rules. But having done this, we get exactly the same sequence of “intermediate expressions” as in the flat (i.e. “breadth-first”) foliation above:
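As a sketch of the idea (the original's exact ⊕, ⊖ setup is shown only in images), one can make ⊕ an inert version of + that only an explicit rule can evaluate:

    (* /. applies the rule at every outermost match in a single pass,
       reproducing the "breadth-first" sequence of expressions *)
    rule = CirclePlus[a_Integer, b_Integer] :> a + b;
    FixedPointList[# /. rule &, (1 ⊕ (2 ⊕ 2)) ⊕ (3 ⊕ 4)]
    (* {(1 ⊕ (2 ⊕ 2)) ⊕ (3 ⊕ 4), (1 ⊕ 4) ⊕ 7, 5 ⊕ 7, 12, 12} *)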
In general, different foliations can be thought of as specifying different “event-selection functions” to be applied to determine what events should occur at the next steps from any given state. At one extreme we can pick single-event-at-a-time event selection functions—and at the other extreme we can pick maximum-events-at-a-time event selection functions. In our Physics Project we have called the states obtained by applying maximal collections of events at a time “generational states”. And in effect these states represent the typical way we parse physical “spacetime”—in which we take in “all of space” at every successive moment of time. At a practical level the reason we do this is that the speed of light is somehow fast compared to the operation of our brains: if we look at our local surroundings (say the few hundred meters around us), light from these will reach us in a microsecond, while it takes our brains milliseconds to register what we’re seeing. And this makes it reasonable for us to think of there being an “instantaneous state of space” that we can perceive “all at once” at each particular “moment in time”.
But what’s the analog of this when it comes to expression evaluation? We’ll discuss this a little more later. But suffice it to say here that it depends on who or what the “observer” of the process of evaluation is supposed to be. If we’ve got different elements of our states laid out explicitly in arrays, say in a GPU, then we might again “perceive all of space at once”. But if, for example, the data associated with states is connected through chains of pointers in memory or the like, and we “observe” this data only when we explicitly follow these pointers, then our perception won’t as obviously involve something we can think of as “bulk space”. But by thinking in terms of foliations (or reference frames) as we have here, we can potentially fit what’s going on into something like space, that seems familiar to us. Or, put another way, we can imagine in effect “programming in a certain reference frame” in which we can aggregate multiple elements of what’s going on into something we can consider as an analog of space—thereby making it familiar enough for us to understand and reason about.
Multiway Evaluation and Multiway Graphs
We can view everything we’ve done so far as dissecting and reorganizing the standard evaluation process. But let’s say we’re just given certain underlying rules for transforming expressions—and then we apply them in all possible ways. It’ll give us a “multiway” generalization of evaluation—in which instead of there being just one path of history, there are many. And in our Physics Project, this is exactly how the transition from classical to quantum physics works. And as we proceed here, we’ll see a close correspondence between multiway evaluation and quantum processes.
But let’s start again with our expression (1 + (2 + 2)) + (3 + 4), and consider all possible ways that individual integer addition “events” can be applied to evaluate this expression. In this particular case, the result is pretty simple, and can be represented by a tree that branches in just two places:
But one thing to notice here is that even at the first step there's an event that we've never seen before. It's something that's possible if we apply integer addition in all possible places. But when we start from the standard evaluation process, this particular basic event just never appears in the "expression context" we're seeing it in here.
Each branch in the tree above in some sense represents a different “path of history”. But there’s a certain redundancy in having all these separate paths—because there are multiple instances of the same expression that appear in different places. And if we treat these as equivalent and merge them we now get:
(The question of "state equivalence" is a subtle one, that ultimately depends on the operation of the observer, and how the observer constructs their perception of what's going on. But for our purposes here, we'll treat expressions as equivalent if they are structurally the same, i.e. every instance of a given subexpression, or of an atom like 5, is "the same".)
If we now look only at states (i.e. expressions) we’ll get a multiway graph, of the kind that’s appeared in our Physics Project and in many applications of concepts from it:
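There's no single built-in function for constructing this, but here's a minimal sketch (successors is a hypothetical helper; ⊕ again keeps the addition events explicit):

    (* all single-event successors of expr: apply the rule at each
       matching position separately ({expr} keeps positions nonempty) *)
    rule = CirclePlus[a_Integer, b_Integer] :> a + b;
    successors[expr_] := DeleteDuplicates[
      First[ReplacePart[{expr}, # -> (Extract[{expr}, #] /. rule)]] & /@
       Position[{expr}, CirclePlus[_Integer, _Integer]]];

    (* multiway graph of states for the running example *)
    NestGraph[successors, (1 ⊕ (2 ⊕ 2)) ⊕ (3 ⊕ 4), 4, VertexLabels -> Automatic]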
This graph in a sense gives a succinct summary of possible paths of history, which here correspond to possible evaluation paths. The standard evaluation process corresponds to a particular path in this multiway graph:
What about a more complicated case? For example, what is the multiway graph for our recursive computation of Fibonacci numbers? As we’ll discuss at more length below, in order to make sure every branch of our recursive evaluation terminates, we have to give a slightly more careful definition of our function f:
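The definition is an image in the original; one way to achieve this (a guess at its spirit) is to guard the recursive rule so that no branch can rewrite the base cases:

    (* in a multiway setting an unguarded rule could also fire on f[2]
       or f[1], producing nonterminating branches like f[0], f[-1], ... *)
    f[n_ /; n > 2] := f[n - 1] + f[n - 2]
    f[1] = f[2] = 1;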
But now here’s the multiway tree for the evaluation of f[2]:
And here’s the corresponding multiway graph:
The leftmost branch in the multiway tree corresponds to the standard evaluation process; here’s the corresponding path in the multiway graph:
Here’s the structure of the multiway graph for the evaluation of f[3]:
Note that (as we’ll discuss more later) all the possible evaluation paths in this case lead to the same final expression, and in fact in this particular example all the paths are of the same length (12 steps, i.e. 12 evaluation events).
In the multiway graphs we’re drawing here, every edge in effect corresponds to an evaluation event. And we can imagine setting up foliations in the multiway graph that divide these events into slices. But what is the significance of these slices? When we did the same kind of thing above for causal graphs, we could interpret the slices as representing “instantaneous states laid out in space”. And by analogy we can interpret a slice in the multiway graph as representing “instantaneous states laid out across branches of history”. In the context of our Physics Project, we can then think of these slices as being like superpositions in quantum mechanics, or states “laid out in branchial space”. And, as we’ll discuss later, just as we can think of elements laid out in “space” as corresponding in the Wolfram Language to parts in a symbolic expression (like a list, a sum, etc.), so now we’re dealing with a new kind of way of aggregating states across branchial space, that has to be represented with new language constructs.
But let’s return to the very simple case of (1 + (2 + 2)) + (3 + 4). Here’s a more complete representation of the multiway evaluation process in this case, including both all the events involved, and the causal relations between them:
The “single-way” evaluation process we discussed above uses only part of this:
And from this part we can pull out the causal relations between events to reproduce the (“single-way”) causal graph we had before. But what if we pull out all the causal relations in our full graph?
What we then have is the multiway causal graph. And from foliations of this, we can construct possible histories—though now they’re multiway histories, with the states at particular time steps now being what amount to superposition states.
In the particular case we’re showing here, the multiway causal graph has a very simple structure, consisting essentially just of a bunch of isomorphic pieces. And as we’ll see later, this is an inevitable consequence of the nature of the evaluation we’re doing here, and its property of causal invariance (and in this case, confluence).
Branchlike Separation
Although what we’ve discussed has already been somewhat complicated, there’s actually been a crucial simplifying assumption in everything we’ve done. We’ve assumed that different transformations on a given expression can never apply to the same part of the expression. Different transformations can apply to different parts of the same expression (corresponding to spacelike-separated evaluation events). But there’s never been a “conflict” between transformations, where multiple transformations can apply to the same part of the same expression.
So what happens if we relax this assumption? In effect it means that we can generate different “incompatible” branches of history—and we can characterize the events that produce this as “branchlike separated”. And when such branchlike-separated events are applied to a given state, they’ll produce multiple states which we can characterize as “separated in branchial space”, but nevertheless correlated as a result of their “common ancestry”—or, in quantum mechanics terms, “entangled”.
As a very simple first example, consider the rather trivial function f defined by
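(The definition is shown as an image in the original; for concreteness one can imagine something as trivial as the following, which is enough to produce the conflict.)

    (* hypothetical stand-in for the original trivial definition *)
    f[x_] := x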
If we evaluate f[f[0]] (for any f) there are immediately two “conflicting” branches: one associated with evaluation of the “outer f”, and one with evaluation of the “inner f”:
We can indicate branchlike-separated pairs of events by a dashed line:
Adding in causal edges, and merging equivalent states, we get:
We see that some events are causally related. The first two events are not—but given that they involve overlapping transformations they are “branchially related” (or, in effect, entangled).
Evaluating the expression f[f[0]+1] gives a more complicated graph, with two different instances of branchlike-separated events:
Extracting the multiway states graph we get
where now we have indicated “branchially connected” states by pink “branchial edges”. Pulling out only these branchial edges then gives the (rather trivial) branchial graph for this evaluation process:
There are many subtle things going on here, particularly related to the treelike structure of expressions. We’ve talked about separations between events: timelike, spacelike and branchlike. But what about separations between elements of an expression? In something like {f[0], f[0], f[0]} it’s reasonable to extend our characterization of separations between events, and say that the f[0]’s in the expression can themselves be considered spacelike separated. But what about in something like f[f[0]]? We can say that the f[_]’s here “overlap”—and “conflict” when they are transformed—making them branchlike separated. But the structure of the expression also inevitably makes them “treelike separated”. We’ll see later how to think about the relation between treelike-separated elements in more fundamental terms, ultimately using hypergraphs. But for now an obvious question is what in general the relation between branchlike-separated elements can be.
And essentially the answer is that branchlike separation has to "come with" some other form of separation: spacelike, treelike, rulelike, etc. Rulelike separation involves having multiple rules for the same object (e.g. two different rules whose left-hand sides match the same expression), and we'll talk about this later. With spacelike separation, we basically get branchlike separation when subexpressions "overlap". This is fairly subtle for tree-structured expressions, but is much more straightforward for strings, and indeed we have discussed this case extensively in connection with our Physics Project.
Consider the (rather trivial) string rewriting rule:
Applying this rule to AAAAAA we get:
Some of the events here are purely spacelike separated, but whenever the characters they involve overlap, they are also branchlike separated (as indicated by the dashed pink lines). Extracting the multiway states graph we get:
And now we get the following branchial graph:
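The rule in this example appears only as an image, but string systems of this kind are easy to explore, since StringReplaceList returns every single-step successor. For instance, with a stand-in rule "AA" → "A":

    (* multiway states graph for a string rewrite *)
    NestGraph[DeleteDuplicates[StringReplaceList[#, "AA" -> "A"]] &,
      "AAAAAA", 5, VertexLabels -> Automatic]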
So how can we see analogs in expression evaluation? It turns out that combinators provide a good example (and, yes, it’s quite remarkable that we’re using combinators here to help explain something—given that combinators almost always seem like the most obscure and difficult-to-explain things around). Define the standard S and K combinators:
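In explicit Wolfram Language form (lowercase names just avoid shadowing built-in symbols):

    (* the standard S and K combinator rules *)
    s[x_][y_][z_] := x[z][y[z]]
    k[x_][y_] := x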
Now we have for example
where there are many spacelike-separated events, and a single pair of branchlike + treelike-separated ones. With a slightly more complicated initial expression, we get the rather messy result
now with many branchlike-separated states:
Rather than using the full standard S, K combinators, we can consider a simpler combinator definition:
Now we have for example
where the branchial graph is
and the multiway causal graph is:
The expression f[f[f][f]][f] gives a more complicated multiway graph
and branchial graph:
Interpretations, Analogies and the Concept of Multi
Before we started talking about branchlike separation, the only kinds of separation we considered were timelike and spacelike. And in this case we were able to take the causal graphs we got, and set up foliations of them where each slice could be thought of as representing a sequential step in time. In effect, what we were doing was to aggregate things so that we could talk about what happens in “all of space” at a particular time.
But when there’s branchlike separation we can no longer do this. Because now there isn’t a single, consistent “configuration of all of space” that can be thought of as evolving in a single thread through time. Rather, there are “multiple threads of history” that wind their way through the branchings (and mergings) that occur in the multiway graph. One can make foliations in the multiway graph—much like one does in the causal graph. (More strictly, one really needs to make the foliations in the multiway causal graph—but these can be “inherited” by the multiway graph.)
In physics terms, the (single-way) causal graph can be thought of as a discrete version of ordinary spacetime—with a foliation of it specifying a “reference frame” that leads to a particular identification of what one considers space, and what time. But what about the multiway causal graph? In effect, we can imagine that it defines a new, branchial “direction”, in addition to the spatial direction. Projecting in this branchial direction, we can then think of getting a kind of branchial analog of spacetime that we can call branchtime. And when we construct the multiway graph, we can basically imagine that it’s a representation of branchtime.
A particular slice of a foliation of the (single-way) causal graph can be thought of as corresponding to an “instantaneous state of (ordinary) space”. So what does a slice in a foliation of the multiway graph represent? It’s effectively a branchial or multiway combination of states—a collection of states that can somehow all exist “at the same time”. And in physics terms we can interpret it as a quantum superposition of states.
But how does all this work in the context of expressions? The parts of a single expression (say the terms of a sum, or the elements of a list) can be thought of as being laid out in something like space. But a slice of a foliation of the multiway graph aggregates multiple whole expressions, laid out across branches of history, and to represent such a collection we need a new kind of construct: what we'll call a Multi.
In ordinary evaluation, we just generate a specific sequence of individual expressions. But in multiway evaluation, we can imagine that we generate a sequence of Multi objects. In the examples we’ve seen so far, we always eventually get a Multi containing just a single expression. But we’ll soon find out that that’s not always how things work, and we can perfectly well end up with a Multi containing multiple expressions.
So what might we do with a Multi? In a typical “nondeterministic computation” we probably want to ask: “Does the Multi contain some particular expression or pattern that we’re looking for?” If we imagine that we’re doing a “probabilistic computation” we might want to ask about the frequencies of different kinds of expressions in the Multi. And if we’re doing quantum computation with the normal formalism of quantum mechanics, we might want to tag the elements of the Multi with “quantum amplitudes” (that, yes, in our model presumably have magnitudes determined by path counting in the multiway graph, and phases representing the “positions of elements in branchial space”). And in a traditional quantum measurement, the concept would typically be to determine a projection of a Multi, or in effect an inner product of Multi objects. (And, yes, if one knows only that projection, it’s not going to be enough to let one unambiguously continue the “multiway computation”; the quantum state has in effect been “collapsed”.)
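Multi is not an existing Wolfram Language construct; a minimal sketch of the idea (multi, multiStep, multiContains and succ are all hypothetical names) might look like this:

    (* a Multi as an inert wrapper around a set of expressions *)
    multiStep[multi[exprs___], succ_] :=
      multi @@ DeleteDuplicates[Join @@ (succ /@ {exprs})]

    (* "nondeterministic" question: does any branch contain a pattern? *)
    multiContains[multi[exprs___], patt_] := AnyTrue[{exprs}, ! FreeQ[#, patt] &]

Here succ is any function (like the successors helper above) that returns the list of single-event successors of an expression.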
Is There Always a Definite Result?
For an expression like (1 + (2 + 2)) + (3 + 4) it doesn’t matter in what order one evaluates things; one always gets the same result—so that the corresponding multiway graph leads to just a single final state:
But it's not always true that there's a single final state. For example, with suitable definitions (involving an auxiliary symbol g),
standard evaluation in the Wolfram Language gives the result 0 for f[f[0]] but the full multiway graph shows that (with a different evaluation order) it’s possible instead to get the result g[g[0]]:
And in general when a certain collection of rules (or definitions) always leads to just a single result, one says that the collection of rules is confluent; otherwise it’s not. Pure arithmetic turns out to be confluent. But there are plenty of examples (e.g. in string rewriting) that are not. Ultimately a failure of confluence must come from the presence of branchlike separation—or in effect a conflict between behavior on two different branches. And so in the example above we see that there are branchlike-separated “conflicting” events that never resolve—yielding two different final outcomes:
As an even simpler example, consider two definitions that assign different values to the very same expression. In the Wolfram Language, the second such definition immediately overwrites the first. But assume both could be applied (say through explicit replacement rules). Then there's a multiway graph with two "unresolved" branches—and two outcomes:
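The two-outcome behavior is easy to see with ReplaceList, which applies every rule that matches:

    ReplaceList[x, {x -> 1, x -> 2}]
    (* {1, 2} *)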
For string rewriting systems, it's easy to enumerate possible rules. The rule "BA" → "AB" (which effectively sorts the elements in the string) is confluent:

But the rule shown here is not confluent, and "evaluates" BABABA to four distinct outcomes:
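One can check confluence of such rules mechanically by enumerating all reachable states and collecting those with no successors; a sketch for the sorting rule:

    (* all states reachable from "BABABA", then the fixed points *)
    reach = FixedPoint[
      Union[Flatten[{#, StringReplaceList[#, "BA" -> "AB"] & /@ #}]] &,
      {"BABABA"}];
    Select[reach, StringReplaceList[#, "BA" -> "AB"] === {} &]
    (* {"AAABBB"} : one outcome, i.e. confluent *)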
These are all cases where “internal conflicts” lead to multiple different final results. But another way to get different results is through “side effects”. Consider first setting x = 0 then evaluating {x = 1, x + 1}:
If the order of evaluation is such that x + 1 is evaluated before x = 1 it will give 1, otherwise it will give 2, leading to the two different outcomes {1, 1} and {1, 2}. In some ways this is like the example above where we had two distinct rules for the same expression. But there's a difference. While explicit rules are essentially applied only "instantaneously", an assignment like x = 1 has a "permanent" effect, at least until it is "overwritten" by another assignment. In an evaluation graph like the one above we're showing particular expressions generated during the evaluation process. But when there are assignments, there's an additional "hidden state" that in the Wolfram Language one can think of as corresponding to the state of the global symbol table. If we included this, then we'd again see rules that apply "instantaneously", and we'd be able to explicitly trace causal dependencies between events. But if we elide it, then we effectively hide the causal dependence that's "carried" by the state of the symbol table, and the evaluation graphs we've been drawing are necessarily somewhat incomplete.
Computations That Never End
The basic operation of the Wolfram Language evaluator is to keep doing transformations until the result no longer changes (or, in other words, until a fixed point is reached). And that’s convenient for being able to “get a definite answer”. But it’s rather different from what one usually imagines happens in physics. Because in that case we’re typically dealing with things that just “keep progressing through time”, without ever getting to any fixed point. (“Spacetime singularities”, say in black holes, do for example involve reaching fixed points where “time has come to an end”.)
But what happens in the Wolfram Language if we just type x = x + 1, without having given any value to x? The Wolfram Language evaluator will keep evaluating this, trying to reach a fixed point. But it'll never get there. And in practice it'll give a message, and (at least in Version 13.3 and above) return a TerminatedEvaluation object:
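Concretely, this looks roughly like the following (a sketch; the exact message and display vary by version):

    x = x + 1
    (* ... $IterationLimit::itlim message, then something like: *)
    (* TerminatedEvaluation["IterationLimit"] *)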
What’s going on inside here? If we look at the evaluation graph, we can see that it involves an infinite chain of evaluation events, that progressively “extrude” +1’s:
A slightly simpler case (that doesn’t raise questions about the evaluation of Plus) is to consider the definition
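Presumably (the original shows it only as an image) something like:

    f[x_] := f[f[x]]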
which has the effect of generating an infinite chain of progressively more "f-nested" expressions: