Presentation
We are monitoring a system, and we are given a utility/cost function for comparing predictions made about this system to what happens really. We would like to get:
- The optimal decisions to take at any given time, in order to maximise the utility function.
- The corresponding expected utility.
- Events when one of the above quantity change in the system.
This article (preprint version) presents the theoretical background for this problem, as well as a concrete algorithm for computing the above events from data and utility function only.
Source code of the reference implementation is available as Free/Libre software, according to the GNU LGPL v2.1 or above.
Application examples are given below and detailed in the article.
Details
Predictions in a physical context
In a physical framework information transfer is limited by the speed of light. In this view the system present is a single point in state-space. Contrast this with dynamical systems where the present is the whole state vector, the line in the figure below. Here there is no instant propagation of information, and only a small portion of the state vector is accessible.

The past light-cone is the collection of all points that could possibly have an a priori causal influence on the present. The future cone is the collection of all points that might possibly be influenced by the present state. The problem is that in order to infer correctly the state of a point F in the future cone we might potentially need all points in the past light cone of F. It would theoretically be possible to have access to the points like P in the system current past, provided in practice that we indeed recorded the value of P. However there is by definition no way of getting the value of points like O that are outside the current system past light cone. Since both points belong to the past light cone of a point F in our own future, the consequence is that even for deterministic systems we get a statistical distribution of possible futures for a given observed past, depending on what information present outside the current past cone is necessary to predict the future. In other words, boundary and/or conditions in inaccessible regions may determine part of the future.
Let us now consider grouping two past cones x1- and x2- together if they lead to the same distribution of futures p(x+|x1-)=p(x+|x2-). Suppose that point P in the past have a distinct value for pasts x1- and x2-. Then there is no way to recover what the value of P was by new observations: we cannot use future knowledge to decide between x1- and x2- since p(x+|x1-)=p(x+|x2-). For all practical matter these two pasts are then equivalent. Mathematically the associated equivalence relation x1- ≡ x2- partitions the system past light cones in sets σ(x-)={ y-: p(x+|y-)=p(x+|x-)}.
The sets σ are called the causal states of the system (see Cosma Shalizi's Thesis). In a discrete setup a new observation leads to a transition from a state σ1 to a state σ2. The causal states and their transitions form a deterministic automaton: the ε-machine (see this article by James Crutchfield and Karl Young). A neat result is the abstraction of the time dependencies into the states. The transitions between states include all dependencies from the past that could have an influence on the future, hence the ε-machine actually forms a Markovian graph.
The causal states construction is not limited to light-cones as described above. We can also cluster together data (parameter) points x∊X according to the conditional distributions p(Z|x) of points z∊Z in a space of predictions. The same equivalence relation as above can be defined: The equivalence classes induced by this relation might also be called causal states by analogy, even though no causality relation needs to be explicitly introduced as is the case for light-cones.
Introduction of the utility function
The preceeding view offers no distinction between futures with dire or benign consequences. For example, administrating a drug to a patient may cause severe secondary effects. The causal states view is to monitor p(effect|drug), but does not specify if the effect is desired or not. In practice this simple example would need to take into account much more parameters, like the patient condition when administrating the drug, etc. But this is an illustration only, where we wish to predict an effect (space Z) from a drug (space X).
The idea is to introduce a utility function U(y,z) that tells the consequence of predicting y∊Z (the patient state we think will happen) instead of z (the patient state that really happens). For example, if we administrate the drug because we think this will solve the patient headache, but she gets an additionnal bellyache instead, then the utility of that prediction is not very good.
What happens when we now put together drugs X according to their expected utility E[U(effect|drug)]=E[U(y|x)]=∫z∊ZU(y,z)p(z|x) instead of the probabilities p(effect|drug) ? We actually need to distinguish two cases:
- We put x1 and x2 together when the expected effect that achieve the maximal utility is the same. This allows to classify drugs by category, like treating headaches or treating bellyaches. But tells nothing about secondary effects.
- We put together drugs x1 and x2 when the expected utility is the same, irrespectively of what the drug is for. This allows to classify drugs according to their level of secondary effects, and keep dangerous drugs out of self-medication. This tells nothing about what the drug is for in the first place.
- We put together drugs according to both above criteria at the same time.
Technically speaking:
- The first case is x1 ≡ x2 when argmaxy∊ZE[U(y|x1)]=argmaxy∊ZE[U(y|x1)]. This is the iso-prediction state.
- The second case is x1 ≡ x2 when maxy∊ZE[U(y|x1)]=maxy∊ZE[U(y|x1)]. This is the iso-utility state.
- The third state is the intersection of both. This is the decisional state.
The article gives more details. The name "decisional state" comes from the assumption the utility function encodes all the user needs to know in order to make a decision.
If as hypothesised in the aforementioned thesis "emergent structures are sub-machines of the ε-machine", then a nice consequence is that the decisional states correspond to the emergent structures of a given utility function. Another consequence is that the transitions between the decisional states correspond to events in the system where we might want to change our decision. Hence an answer to the problem mentioned in the above introduction.
Application examples
Image filtering
We can filter an image according to whether it is difficult to predict a point by looking at its neighbours. It is then expected that flat regions with uniform background (like a blue sky) are quite simple to predict. Whereas edges and other sharp transitions are more difficult.


More examples and applicability conditions are detailed in the article.
Hidden markov model reconstruction
ε-machines are known to be the minimal and optimal automaton that can statistically reproduce the original data. Introducing a utility function results in a sub-machine of the ε-machine, where states are merged according to their expected utility or expected best decision or both.

See the article for an example of automaton reconstruction
Dowload the source code
The reference implementation is available as Free/Libre software, according to the GNU LGPL v2.1 or above.
The latest version is available in my source repository. It can be downloaded either as a tar.gz archive or by using GIT: git clone git://source.numerimoire.net/decisional_states.git.

