We propose a measure based upon the fundamental theoretical concept in algorithmic information theory that provides a natural approach to the problem of evaluating the complexity of 2-dimensional objects.

The question of natural measures of complexity for objects other than strings and sequences, in particular measures suited for 2-dimensional objects, is an important open problem in complexity science, with potential applications to molecule folding, cell distribution, artificial life and robotics. Here we provide a measure based upon the fundamental theoretical concept of algorithmic probability that provides a natural approach to the problem of evaluating the complexity of such objects.

The challenge of finding and defining 2-dimensional complexity measures has been identified as an open problem of foundational character in complexity science.

For Kolmogorov complexity, on the other hand, the common approach to evaluating the algorithmic complexity of a string has been to use lossless compression algorithms, because the length of the compressed string is an upper bound on its Kolmogorov complexity. Short strings, however, are difficult to compress in practice, and the theory does not provide a satisfactory solution to the problem of the instability of the measure for short strings.

Here we use the so-called Coding theorem method to address this problem.

Compression algorithms have proven to be signally applicable in several domains.

Central to algorithmic information theory (AIT) is the definition of algorithmic (Kolmogorov–Chaitin or program-size) complexity, K(s).

That is, K(s) = min{|p| : U(p) = s} is the length of the shortest program p that outputs the string s when run on a universal Turing machine U. For example, the string (01)^n, which can be described as "01 repeated n times", admits a program whose length grows as log_2(n), and so has low complexity relative to its length.

A technical inconvenience of K as a function taking s to the length of the shortest program producing s is its uncomputability: no algorithm can return K(s) for every arbitrary s.

The invariance theorem guarantees that complexity values will only diverge by a constant c (e.g., the length of a compiler translating between the universal Turing machines U_1 and U_2) and that they will converge at the limit.

If U_1 and U_2 are two universal Turing machines and K_{U_1}(s) and K_{U_2}(s) the algorithmic complexity of s for U_1 and U_2 respectively, there exists a constant c such that for all s, |K_{U_1}(s) − K_{U_2}(s)| < c.

Hence the longer the string, the less important c is, that is, the less important the choice between K_{U_1} and K_{U_2} for a string s.

The algorithmic probability (also known as Levin's semi-measure) of a string s is the probability that s is produced by a random program p running on a universal prefix-free Turing machine U: m(s) = Σ_{p : U(p) = s} 1/2^{|p|}.

The group of valid programs forms a prefix-free set (no element is a prefix of any other), a property necessary to keep 0 < m(s) < 1: by Kraft's inequality, the sum of 2^{−|p|} over a prefix-free set cannot exceed 1.
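The prefix-free property and the bound it gives can be illustrated with a minimal sketch (ours, not from the paper), using a small hypothetical prefix-free code in place of actual machine programs:

```python
# Sketch: verify the prefix-free property and Kraft's inequality for a
# small hypothetical set of "programs" (bit strings). The codes below
# are illustrative, not programs of any actual universal machine.
codes = ["0", "10", "110", "111"]  # no code is a prefix of another

def is_prefix_free(codes):
    """True if no element of codes is a proper prefix of another."""
    return not any(a != b and b.startswith(a) for a in codes for b in codes)

# Kraft sum: the quantity that keeps 0 < m(s) < 1 in the text.
kraft_sum = sum(2 ** -len(c) for c in codes)  # 1/2 + 1/4 + 1/8 + 1/8 = 1.0
```

For a complete prefix-free code such as this one the sum equals exactly 1; for the set of halting programs of a universal machine it is strictly smaller, because some programs never halt.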

Levin's semi-measure m(s) defines the so-called Universal Distribution.

It is called a semi-measure because, unlike a true probability measure, the sum of m(s) over all strings is smaller than 1, owing to the programs that never halt.

The Coding theorem establishes the relation m(s) ≈ 2^{−K(s)}, involving K(s), the Kolmogorov complexity of s.

This means that if a string has many descriptions it also has a short one. It beautifully connects frequency to complexity, more specifically the frequency of occurrence of a string with its algorithmic (Kolmogorov) complexity. The Coding theorem implies that K(s) = −log_2 m(s) + O(1).

An important property of m is that it dominates every other effective semi-measure μ: there exists a constant c_μ such that for all s, m(s) ≥ c_μ μ(s). For this reason m is also called a Universal Distribution.

Let D(n) be the function that assigns to every string s the quotient: (number of halting machines with n states producing s) / (total number of halting machines with n states).

Here we consider an experiment with 2-dimensional deterministic Turing machines (also called Turmites) as a natural generalization of this construction to two dimensions.

In 'Comparison of K_m and compression' we validate the resulting measure against lossless compression lengths.


Turmites or 2-dimensional (2D) Turing machines run not on a 1-dimensional tape but on a 2-dimensional unbounded grid or array. At each step they can move in four different directions (up, down, left and right). A transition has the form {s_1, k_1} → {s_2, k_2, d_2}: when the machine is in state s_1 and reads symbol k_1, it writes k_2, changes to state s_2 and moves to a contiguous cell following direction d_2. If s_2 is the halting state then the machine stops.

Let (n, m)_2D be the set of such Turing machines with n states and m symbols. For each pair {s_1, k_1} there are 4nm + 2 possible transitions (4 directions × n states × m symbols, plus 2 halting transitions writing '0' or '1'), so the number of machines in (n, m)_2D is (4nm + 2)^{nm}. It is possible to enumerate all these machines in the same way as 1D Turing machines (e.g., as has been done before for the 1-dimensional case).

We take as output for a 2D Turing machine the minimal array that includes all cells visited by the machine. Note that this array may include cells that were never visited, but it is the most natural way of producing output in a regular format while at the same time reducing the set of different outputs.
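The machine model and output convention just described can be sketched as follows. This is an illustrative implementation under our own assumptions (sparse-grid representation, a `HALT` marker, and a hypothetical 2-state example machine), not the paper's actual code:

```python
# Sketch of a Turmite runner: transitions are a dict
# {(state, symbol): (new_state, write, move)}; the output is the minimal
# array covering all visited cells, as described in the text.
MOVES = {"U": (0, -1), "D": (0, 1), "L": (-1, 0), "R": (1, 0)}
HALT = -1  # halting-state marker (an assumption of this sketch)

def run_turmite(table, max_steps):
    """Run a 2D Turing machine on a '0'-filled grid; None if it does not halt."""
    grid = {}                      # sparse grid; unwritten cells read as 0
    x = y = state = 0
    for _ in range(max_steps):
        symbol = grid.get((x, y), 0)
        new_state, write, move = table[(state, symbol)]
        grid[(x, y)] = write       # every step writes the current cell
        if new_state == HALT:      # halting transitions write, then stop
            break
        dx, dy = MOVES[move]
        x, y, state = x + dx, y + dy, new_state
    else:
        return None                # did not halt within max_steps
    xs = [p[0] for p in grid]
    ys = [p[1] for p in grid]
    return [[grid.get((i, j), 0) for i in range(min(xs), max(xs) + 1)]
            for j in range(min(ys), max(ys) + 1)]

# Hypothetical 2-state, 2-symbol machine: write 1, move right, halt writing 1.
example = {(0, 0): (1, 1, "R"), (0, 1): (HALT, 1, "R"),
           (1, 0): (HALT, 1, "R"), (1, 1): (HALT, 0, "R")}
```

Running `run_turmite(example, 100)` produces the 1 × 2 output array `[[1, 1]]`, while a machine that only ever moves right without halting returns `None` under the step bound.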

Top: Example of a deterministic 2-dimensional Turing machine. Bottom: Accumulated runtime distribution for (4, 2)_{2D}.

A machine in (4, 2)_2D and its execution over a '0'-filled grid. We show the portion of the grid that is returned as the output array. Two of the six cells have not been visited by the machine.

We have run all machines in (4, 2)_2D just as we have done before for deterministic 1-dimensional Turing machines.

We also used a reduced enumeration to avoid running certain trivial machines whose behavior can be predicted from the transition table, as well as filters to detect non-halting machines before exhausting the entire runtime. In the reduced enumeration we considered only machines whose initial transition moves to the right and changes to a state other than the initial and halting states. Machines moving to the initial state at the starting transition run forever, and machines moving to the halting state produce single-character output. So we reduce the number of initial transitions in (n, m)_2D to m(n − 1), and the size of the reduced enumeration is m(n − 1)(4nm + 2)^{nm−1}. To enumerate these machines we construct a mixed-radix number, given that the digit corresponding to the initial transition now goes from 0 to m(n − 1) − 1.

The Busy Beaver runtime value for 1-dimensional (4, 2) machines is 107 steps upon halting, but no equivalent Busy Beaver values are known for 2-dimensional Turing machines (although variations of Turmite Busy Beaver functions have been proposed), so we first ran a sample of random machines in the reduced enumeration. We used a runtime of 2,000 steps for the runtime sample (covering 10.6% of the machines in the reduced enumeration for (4, 2)_2D), but 1,500 steps for running all of (4, 2)_2D. To sample, we produce a random number from 0 to the size of the reduced enumeration minus 1 and construct the corresponding machine.

We found some high runtime values: precisely 23 machines required more than 1,000 steps, the highest being a machine that halted after 1,483 steps. So we have enough evidence to believe that by setting the runtime at 2,000 steps we obtained almost all (if not all) output arrays. We ran all 6 × 34^7 Turing machines in the reduced enumeration for (4, 2)_2D, and then applied the completions explained above.

The final output represents the result of 2(4nm + 2)^{nm} = 2 × 34^8 executions (all machines in (4, 2)_2D starting on both blank symbols '0' and '1'). We found 3,079,179,980,224 non-halting machines and 492,407,829,568 halting machines. A total of 1,068,618 different binary arrays were produced after 12 days of calculation on a medium-size supercomputer (25 x86-64 CPUs running at 2,128 MHz, each with 4 GB of memory, located at the Centro Informático Científico de Andalucía (CICA), Spain).
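The machine counts quoted above can be cross-checked directly from the formulas in the text; a quick sketch:

```python
# Sanity check of the (4, 2)_2D counts given in the text:
# 4nm + 2 = 34 possible transitions per table entry, nm = 8 entries.
n, m = 4, 2                                       # states, symbols
per_entry = 4 * n * m + 2                         # 34 possible transitions
total_machines = per_entry ** (n * m)             # 34**8 machines in (4, 2)_2D
executions = 2 * total_machines                   # blank symbol '0' and '1'
reduced = m * (n - 1) * per_entry ** (n * m - 1)  # reduced enumeration, 6 x 34^7
```

The number of executions, 2 × 34^8 = 3,571,587,809,792, matches the sum of the reported non-halting (3,079,179,980,224) and halting (492,407,829,568) machine counts.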

Let D_2D be the distribution constructed by dividing the occurrences of each different array by the number of halting machines, as a natural extension of D(n) to 2 dimensions. The top-ranked objects in D_2D are those with the lowest Kolmogorov complexity values.

Only non-symmetrical cases are displayed. The grid is only for illustration purposes.

D_2D denotes the frequency distribution (a calculated Universal Distribution) from the output of deterministic 2-dimensional Turing machines, with associated complexity measure K_m,2D. D_2D distributes 1,068,618 arrays into 1,272 different complexity values, with a minimum complexity value of 2.22882 bits (an explanation of non-integer program-size complexity is given in the text). Counting the 2^{d×d} possible square arrays of side d (without considering any symmetries), D_2D can be said to produce all square binary arrays of side length up to 3 × 3, but not all 2^{4×4} = 65,536 square arrays of side length d = 4. The largest square array produced in D_2D has a K_m,2D value equal to 34.2561.

Top: Frequency of appearance of symmetric "checkerboard" patterns, sorted from more to less frequent according to D_2D (displaying only non-symmetrical cases under rotation and complementation). The checkerboard of size 4 × 4 does not occur, though all 3 × 3 checkerboards do. With K_m,2D = 6.7, the next array shown is the simplest 4 × 4 square array after the preceding all-blank 4 × 4 array (with K_m,2D = 6.4) and before the 4 × 4 square array with a black cell in one of the array corners (with complexity K_m,2D = 6.9). Bottom Right: The only and most complex square array (with 15 other symmetrical cases) in D_2D, with K_m,2D = 34.2561. Another way to see this array is as one of low complexity among those of length 13, given that it occurred in the sampled distribution, unlike all other square arrays of the same size, which are missing from D_2D.

What one would expect from a distribution where simple patterns are more frequent (and therefore have lower Kolmogorov complexity, after application of the Coding theorem) is to see patterns of the "checkerboard" type with high frequency and low random complexity.

Bottom 16 objects in the classification, those with the lowest frequency, i.e., the most random according to D_2D. It is interesting to note the strong similarities, given that similar-looking cases are not always exact symmetries. The arrays are preceded by their number of occurrences among the outputs of all the (4, 2)_2D Turing machines.

We have coined the informal notion of a "climber" for an object in the frequency classification (ordered from greatest to lowest frequency) that is better classified among objects of smaller size than among the arrays of its own size. The notion highlights candidates of low complexity, illustrating how the process makes low-complexity patterns emerge. For example, "checkerboard" patterns behave like the strings (01)^n, that is, the string 01 repeated n times.

Symmetric objects have higher frequency and therefore lower Kolmogorov complexity. Nevertheless, a fully deterministic algorithmic process starting from completely symmetric rules produces a range of patterns of high complexity and low symmetry.

An attempted definition of a climber is a pattern whose K_m value is closer to the K_m values of patterns of smaller size than to the typical K_m values of patterns of its own size.

For example,

Splitting D_2D into 5 equal parts: 72.66%, 15.07%, 6.17%, 2.52%, 3.56%.

We denote this set by K_m,2D(3×3). For example, it contains the 2 glider configurations in the Game of Life.

One way to validate our method based on the Coding theorem is to compare its complexity values with the lengths obtained by lossless compression algorithms.

It is also not uncommon to detect instabilities in the values retrieved by a compression algorithm for short strings, as explained in 'Uncomputability and instability of K'.

When researchers have chosen to use compression algorithms for reasonably long strings, they have proven to be of great value, for example for DNA false-positive repeat sequence detection in genetic sequence analysis.

In this section we study the relation between K_m and the lengths obtained using lossless compression algorithms.

For this experiment we selected the strings in our distribution with lengths from 10 to 15 (for length 15 there are 2^{15} possible strings). The distribution of complexities is shown below.

Length (l) | Strings
---|---
10 | 1,024
11 | 2,048
12 | 4,094
13 | 8,056
14 | 13,068
15 | 14,634

Top: Distribution of complexity values for different string lengths (10 ≤ l ≤ 15).

As expected, the longer the strings, the greater their average complexity. The overlap of strings of different lengths having the same complexity corresponds to climbers. The experiment consisted in creating files of strings with different K_m values and comparing these values with the compressed lengths of the files.

Then, for each combination below, we created the test files:

6 different string lengths

10 partitions (sorted by increasing complexity) of the strings with length l

100 files with 100 random strings in each partition.

This makes for a total of 6,000 different files. Each file contains 100 different binary strings, hence has a length of 100 × l bits.

A crucial step is to replace the binary encoding of the files by a larger alphabet, retaining the internal structure of each string. If we compressed the files in their '0'/'1' character encoding, the compressed length would mainly reflect the trivial redundancy of a two-symbol alphabet rather than the structure of the individual strings.
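The re-encoding step can be sketched as follows. This is our own minimal illustration (using byte packing and `zlib` as a stand-in compressor), not the paper's exact pipeline:

```python
# Sketch: pack '0'/'1' strings into bytes so the compressor sees a
# 256-symbol alphabet while the internal structure of each string is kept.
import random
import zlib

def pack_bits(bitstring):
    """Map a '0'/'1' string to bytes, 8 bits per output symbol."""
    padded = bitstring + "0" * (-len(bitstring) % 8)  # zero-pad to a byte
    return bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))

random.seed(0)
regular = "01" * 750                                   # 1,500 highly regular bits
noise = "".join(random.choice("01") for _ in range(1500))  # 1,500 random bits
```

After packing, a regular string compresses to far fewer bytes than a pseudo-random one of the same length, which is the contrast the experiment measures.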

The files were compressed and the resulting lengths compared with the K_m values of the strings they contain.

We have also used other compressors, such as GZIP (which uses the Lempel–Ziv algorithm LZ77) and BZIP2 (the Burrows–Wheeler block-sorting compression algorithm with Huffman coding), at several compression levels. The results are similar to those shown here.

We shall now look at how the 1-dimensional arrays (hence strings) produced by 2D Turing machines correlate with the strings whose complexity we calculated before from 1D Turing machines.

All Turing machines in (4, 2) are included in (4, 2)_{2D} because these are just the machines that do not move up or down. We first compared the values of the 1,832 output strings in (4, 2) to the 1-dimensional arrays found in (4, 2)_{2D}. We are also interested in the relation between the ranks of these 1,832 strings in both (4, 2) and (4, 2)_{2D}.

K_m,2D calculated with 2D Turing machines as a function of ordinary K_m,1D (that is, simply K_m) calculated with 1D Turing machines, for the strings produced by both.

Controlling for the length l of the strings, the partial correlation r_{K_m,1D K_m,2D · l} = 0.9936 still denotes a tight association.

Regressing K_m,2D on K_m,1D gives the following approximate relation: K_m,2D ≈ 2.64 + 1.11 K_m,1D. Note that this subtle departure from identity may be a consequence of a slight non-linearity, a feature visible in the data.

Length (l) | Correlation
---|---
5 | 0.9724
6 | 0.9863
7 | 0.9845
8 | 0.9944
9 | 0.9977
10 | 0.9952
11 | 1
12 | 1

A 1-dimensional CA can be represented by an array of cells x_i, where i ∈ ℤ and each cell takes a value from a finite alphabet Σ. A sequence of cells {x_i} of finite length n describes a configuration, and the set of configurations is expressed as Σ^n. An evolution comprises a sequence of configurations {c_i} produced by the global mapping Φ: Σ^n → Σ^n; thus the global relation is symbolized as Φ(c^t) = c^{t+1}, where t represents time and every cell of configuration c^t is updated simultaneously at the next configuration c^{t+1} by a local function φ acting on each cell's neighborhood. For a binary alphabet (k = |Σ| = 2) and nearest-neighbor rules there are k^n = 2^3 = 8 different neighborhoods (where n = 3 is the neighborhood size) and k^{k^n} distinct evolution rules. The evolutions of these cellular automata usually have periodic boundary conditions. Wolfram calls this type of CA Elementary Cellular Automata (denoted simply by ECA), and there are exactly k^{k^n} = 2^8 = 256 rules of this type. They are considered the simplest cellular automata (and among the simplest computing programs) capable of great behavioral richness.
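The local rule φ and global map Φ above can be sketched in a few lines. This is a standard construction using Wolfram's rule-numbering convention, not code from the paper:

```python
# Minimal ECA sketch: bit i of the rule number gives the new cell value
# for neighborhood index i = 4*left + 2*center + right (Wolfram numbering).
def eca_step(row, rule):
    """One application of the global map with periodic boundary conditions."""
    n = len(row)
    return [(rule >> (4 * row[(i - 1) % n] + 2 * row[i] + row[(i + 1) % n])) & 1
            for i in range(n)]

def eca_evolve(rule, width, steps):
    """Return the space-time diagram from a single black cell."""
    row = [0] * width
    row[width // 2] = 1            # single black cell initial configuration
    diagram = [row]
    for _ in range(steps):
        row = eca_step(row, rule)
        diagram.append(row)
    return diagram                 # list of rows: the 2D space-time diagram
```

For example, `eca_evolve(30, 7, 2)` yields the familiar first rows of Rule 30 growing from a single cell.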

1-dimensional ECA can be visualized in 2-dimensional space–time diagrams where every row is a step in the time evolution of the ECA rule. By their simplicity, and because we have a good understanding of them (e.g., at least one ECA rule is known to be capable of Turing universality), ECA are natural objects on which to test K_m,2D, which proves just as effective as other methods that approach ECA using compression algorithms.

We have seen that our Coding theorem method with associated measure K_m (denoted by K_m,2D in this paper for 2D Kolmogorov complexity) is in agreement with bit-string complexity as approached by compressibility, as reported in 'Comparison of K_m and compression'.

The Universal Distribution calculated from Turing machines (D_2D) will help us to classify Elementary Cellular Automata. Classification of ECA by compressibility has been undertaken before; here we classify ECA by D_2D, as follows.

We take the space–time diagram (or evolution) of an Elementary Cellular Automaton after t steps and break it into the square arrays M_{d×d} obtained by partitioning the diagram into blocks of size d × d.

Notice that the same procedure can be extended for use on arbitrary images.

If the classification of all ECA rules by K_m,2D yields the same classification obtained by compressibility, one would be persuaded that K_m,2D is a good alternative to compressibility as a method for approximating the Kolmogorov complexity of objects, with the signal advantage that K_m,2D can be applied to very short strings and very small arrays such as images. Because all 2^9 possible arrays of size 3 × 3 are present in D_2D, we can use this set of arrays to classify all ECA by Kolmogorov complexity using the Coding theorem method. We denote by K_m,2D(3×3) this subset of K_m,2D.
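The d × d partition feeding K_m,2D(3×3) can be sketched as follows. This is our own illustration; for simplicity it truncates boundary rows and columns that do not complete a block, whereas the paper discusses boundary conditions separately:

```python
# Sketch: split a 2D space-time diagram (list of equal-length rows) into
# non-overlapping d x d blocks, returned as hashable tuples of tuples so
# they can be counted or looked up in a K_m,2D(3x3) table.
def partition(diagram, d=3):
    rows, cols = len(diagram), len(diagram[0])
    return [tuple(tuple(diagram[i + di][j + dj] for dj in range(d))
                  for di in range(d))
            for i in range(0, rows - d + 1, d)
            for j in range(0, cols - d + 1, d)]
```

A 6 × 6 diagram, for instance, yields four 3 × 3 blocks, each of which can then be scored by its value in the calculated distribution.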

K_m,2D(3×3) calculated for every cellular automaton shows a positive link between the two measures. The Pearson correlation yields R^2 = 0.6853. These values correspond to a strong correlation, although smaller than the correlation between 1- and 2-dimensional complexities calculated in 'Comparison of K_m and compression'.

Concerning the rank orders arising from these measures of complexity, they too are strongly linked, with a high Spearman correlation coefficient r_s.

Top: Distribution of points along the axes displaying clusters of equivalent rules and a distribution corresponding to the known complexity of various cases. Bottom: The same plot with some ECA rules highlighted, some of which were used in the side-by-side comparison below.

The anomalies found in the classification of Elementary Cellular Automata (e.g., Rule 77 being placed among ECA of high complexity) are a limitation of K_m,2D(3×3) itself and not of the Coding theorem method. One can think of the classification by D_2D(3×3) as attempting to reconstruct the evolution of each ECA, for the given number of steps, with square arrays of size only 3 × 3, the complexity of the square arrays adding up to approximate the K_m,2D of the ECA rule. Hence it is a deployment of D_2D(3×3) that takes between 500 and 50K bits to reconstruct each ECA space–time evolution, depending on how random versus how simple it is.

Other ways to exploit the data from D_2D (e.g., non-square arrays) can be used to explore better classifications. We think that constructing a Universal Distribution from a larger set of Turing machines, e.g., one yielding D_2D(4×4), would deliver more accurate results, but here we also introduce a tweak to the definition of the complexity of the evolution of a cellular automaton.

All of the first 128 ECA rules (the other 128 are 0–1 reversions) starting from the simplest (single black cell) initial configuration and running for t steps.

Splitting ECA evolutions into square arrays of size 3 is like trying to look through little windows 9 pixels in size, one at a time, in order to recognize a face, or like training a microscope on a planet in the sky. One can do better with the Coding theorem method by going further than we have in the calculation of a 2-dimensional Universal Distribution (e.g., calculating D_2D(4×4) in full or from a sample), but how far this process can be taken is ultimately dictated by the computational resources at hand. Nevertheless, one should use a telescope where telescopes are needed and a microscope where microscopes are needed.

One can think of an improvement in the resolution of K_m,2D for larger images as a sort of "optical lens". This is possible because we know that the Kolmogorov complexity of an object repeated n times grows by log_2(n) plus the complexity of a single occurrence. We therefore define K'_m(X) = Σ_{(r_u, n_u)} (K_m,2D(r_u) + log_2(n_u)), where the pairs (r_u, n_u) run over the distinct square arrays r_u of size d × d in the partition of the space–time diagram c^t of the matrix X, and n_u is the multiplicity (number of occurrences) of r_u.
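The aggregation just defined can be sketched directly from the formula. The complexity table below is a hypothetical stand-in for the real K_m,2D(3×3) values computed from D_2D:

```python
# Sketch of the block-decomposition aggregation:
# K'_m(X) = sum over distinct blocks r_u of [K_m,2D(r_u) + log2(n_u)],
# where n_u is the number of occurrences of block r_u in the partition.
from collections import Counter
from math import log2

def bdm(blocks, ctm):
    """blocks: list of hashable d x d blocks from the partition of X;
    ctm: mapping block -> K_m,2D value (hypothetical table here)."""
    counts = Counter(blocks)
    return sum(ctm[block] + log2(n) for block, n in counts.items())

# Toy illustration with made-up complexity values for two block types:
toy_ctm = {"a": 2.0, "b": 3.0}
```

With the toy table, a partition `["a", "a", "b"]` scores (2.0 + log2 2) + (3.0 + log2 1) = 6.0: the repeated block contributes its complexity once plus a logarithmic multiplicity term, which is what keeps the measure from over-charging for repetition.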

Top: Block decomposing (other boundary conditions are possible and under investigation) the evolution of the Rule 30 ECA after t steps, using K_m,2D(3×3) to approximate its Kolmogorov complexity. Bottom: Side-by-side comparison of 8 evolutions of representative ECAs, starting from a random initial configuration, sorted from lowest to highest BDM values (top) and from smallest to largest compression lengths using the Deflate algorithm as a method to approximate Kolmogorov complexity (bottom).

Now complexity values of

It is also worth noting that the successful classification of ECA by K_m,2D, with an approximation of the Universal Distribution calculated from Turing machines (TM), suggests that the output frequency distributions of ECA and TM must be strongly correlated, something we had found and reported before.

Another variation of the same K_m,2D measure is to divide the original image into all possible square arrays of a given length rather than taking a partition. This would, however, be exponentially more expensive than the partition process alone and, given the results above, unnecessary.

One important question that arises when positing the soundness of the Coding theorem method, as an alternative to having to pick a universal Turing machine to evaluate the Kolmogorov complexity of an object, is how arbitrary our own choices are.

On the one hand, one has to bear in mind that no other method existed for approximating the Kolmogorov complexity of short strings. On the other hand, we have tried to minimize every arbitrary choice, from the formalism of the computing model to the informed runtime, when no Busy Beaver values are known and sampling the space with an educated runtime cut-off is therefore called for. When no Busy Beaver values are known, the chosen runtime is determined according to the number of machines that we are prepared to miss (e.g., less than 0.01%) for our sample to be significant enough, as described in 'Setting the runtime'. We have also shown that the method is robust to changes in these choices.

Among the possible arbitrary choices it is the enumeration that may perhaps be questioned, that is, calculating

We have provided here theoretical and statistical arguments to show the reliability, validity and generality of our measure. More empirical evidence has also been produced, in particular in the fields of cognition and psychology, where researchers often have to deal with strings or patterns too short for compression methods to be applicable. For instance, it has been found that the complexity of a (one-dimensional) string better predicts its recall from short-term memory than the length of the string does.

We have shown how a highly symmetric but algorithmic process is capable of generating a full range of patterns of different structural complexity. We have introduced this technique as a natural and objective measure of complexity, K_m,2D (extending K_m), for 2-dimensional objects.

We also introduced the Block Decomposition Method (BDM) to extend the measure to objects larger than the arrays for which the distribution was calculated.

We have shown that the method is stable in the face of the changes of Turing machine formalism that we have undertaken (in this case Turmites), as compared to, for example, traditional 1-dimensional Turing machines or strict integer-valued program-size complexity.

We have made available to the community this "microscope" with which to look at the space of bit strings and other objects, in the form of the Online Algorithmic Complexity Calculator, which provides objective algorithmic probability and Kolmogorov complexity estimations (K_m, K_m,2D and a wider range of methods, for short binary strings and many other objects) using the method described herein. Raw data and the computer programs to reproduce the results of this paper can also be found under the Publications section of the Algorithmic Nature Group website.

Contents: CSV files and output distribution of all 2D TMs used by BDM to calculate the complexity of all arrays of size 3 × 3 and ECAs.

The authors declare there are no competing interests.