This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ PrePrints) and either DOI or URL of the article must be cited.
A traditional random variable X is a function that maps from a stochastic process to the real line. Here, "real line" refers to the structure (R, <=, |x-y|), where R is the set of real numbers, <= is the standard linear order relation on R, and d(x,y) = |x-y| is the usual metric on R. The traditional expectation value E(X) of X is then often a poor choice of statistic when the stochastic process that X maps from is a structure other than the real line or some substructure of it. If the stochastic process is a structure that is not linearly ordered (including totally unordered structures) and/or has a metric geometry very different from that induced by the usual metric, then statistics such as E(X) often perform poorly, both with regard to qualitative intuition and with regard to quantitative variance (expected error) measurements. For example, the traditional expected value of a fair die is E(X) = (1/6)(1+2+...+6) = 3.5. But this result bears no relationship to reality or to intuition, because it implies that we expect the value [ooo] (die face value "3") or [oooo] (die face value "4") more than we expect, say, [o] or [oo]. In fact, for a fair die we would expect every face value equally. The reason is that the values on the faces of a fair die are merely symbols with no order, and with no metric geometry other than the discrete metric geometry. On a fair die, [oo] is neither greater than nor less than [o]; rather, [oo] and [o] are simply unordered symbols. Moreover, [o] is not "closer" to [oo] than it is to [ooo]; rather, [o], [oo], and [ooo] are simply symbols without any inherent order or metric geometry. This paper proposes an alternative statistical system, based somewhat on graph theory, that takes into account the order structure and metric geometry of the underlying stochastic process.
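The die example can be checked numerically. The Python sketch below is only an illustration of that argument, not the statistical system proposed in this paper. It computes the traditional E(X) and, as an assumed stand-in for how well a candidate value c "represents" the die, the expected distance E[d(X,c)] under the usual metric |x-y| and under the discrete metric. The helper names (expectation, expected_distance, usual_metric, discrete_metric) are hypothetical.

```python
from fractions import Fraction

# Faces of a fair die, encoded as the integers 1..6 (an assumed encoding).
FACES = [1, 2, 3, 4, 5, 6]
P = Fraction(1, 6)  # each face is equally likely


def expectation(faces):
    """Traditional expectation E(X) = sum of x * P(x) on the real line."""
    return sum(P * x for x in faces)


def usual_metric(x, y):
    """Usual metric on R: d(x, y) = |x - y|."""
    return abs(x - y)


def discrete_metric(x, y):
    """Discrete metric: 0 if the two symbols are identical, 1 otherwise."""
    return 0 if x == y else 1


def expected_distance(c, metric):
    """Expected distance E[d(X, c)] from candidate value c to the outcome X."""
    return sum(P * metric(x, c) for x in FACES)


if __name__ == "__main__":
    print("E(X) =", expectation(FACES))  # prints: E(X) = 7/2
    for c in FACES:
        print(c,
              expected_distance(c, usual_metric),
              expected_distance(c, discrete_metric))
```

Under the usual metric, faces 3 and 4 have the smallest expected distance (3/2, versus 11/6 for faces 2 and 5 and 5/2 for faces 1 and 6), mirroring the bias toward [ooo] and [oooo] described above. Under the discrete metric, every face gives the same value, 5/6, so no face is preferred over any other.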
This paper is largely a mathematics paper; however, it has "clear applicability" to computational biology, as demonstrated by Example 8.18 (DNA, page 40), Example 9.20 (DNA to linear structures, pages 60-61), Example 9.21 (GSP to complex plane, pages 61-62), Example 9.22 (DNA mapping with extended range, page 62), and Example 9.23 (GSP with Markov model, pages 62-63).