Order and Metric Compatible Symbolic Sequence Processing

Daniel J Greenhoe (dgreenhoe@gmail.com)
National Chiao Tung University, Hsinchu, Taiwan

PeerJ Preprints 4:e2052v1, 18 May 2016 (ISSN 2167-9843; PeerJ Inc., San Francisco, USA)
DOI: 10.7287/peerj.preprints.2052v1
Subject areas: Bioinformatics, Computational Biology, Data Mining and Machine Learning, Data Science

© 2016 Greenhoe. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.

A traditional random variable X is a function that maps from a stochastic process to the real line (R,≤,d,+,·), where R is the set of real numbers, ≤ is the standard linear order relation on R, d(x,y)=|x-y| is the usual metric on R, and (R,+,·) is the standard field on R. Greenhoe (2015b) has demonstrated that this definition of random variable is often a poor choice for computing statistics when the stochastic process that X maps from has structure that is dissimilar to that of the real line. Greenhoe (2015b) has further proposed an alternative statistical system that, rather than mapping a stochastic process to the real line, instead maps it to a weighted graph whose order and metric geometry structures are similar to those of the underlying stochastic process. In particular, ideally the structure X maps from and the structure X maps to are, with respect to each other, both isomorphic and isometric. Mapping to a weighted graph is useful for analysis of a single random variable; for example, the expectation EX of X can be defined simply as the center of its weighted graph. However, the mapping has limitations with regard to a sequence of random variables: in performing sequence analysis (using for example Fourier analysis or wavelet analysis), in performing sequence processing (using for example FIR filtering or IIR filtering), in making diagnostic measurements (using a post-transform metric space), and in making "optimal" decisions (based on "distance" measurements in a metric space or, more generally, a distance space). Rather than mapping to a weighted graph, this paper proposes instead mapping to an ordered distance linear space Y=(R^n,≤,d,+,·,R,+,×), where (R,+,×) is a field, + is the vector addition operator on R^n × R^n, and · is the scalar-vector multiplication operator on R × R^n. The linear space component of Y provides a much more convenient framework for sequence analysis and processing than the weighted graph does. The ordered set and distance space components of Y allow one to preserve the order structure and distance geometry inherent in the underlying stochastic process, which in turn likely provides a less distorted framework for analysis, diagnostics, and optimal decision making than the real line does.
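The contrast drawn in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper or from its C++ program ssp.exe: the DNA alphabet, the real-line labels 0 to 3, the unit-vector embedding into R^4 (the names `EMBED` and `REAL_LINE` are hypothetical), and the two-tap moving average standing in for an FIR filter. The order relation of Y is not modeled here; only the distance and linear space components are.

```python
import math

# Hypothetical embedding, chosen only for illustration: each DNA symbol
# becomes a unit vector in R^4, so all distinct symbols are equidistant.
EMBED = {
    "A": (1.0, 0.0, 0.0, 0.0),
    "C": (0.0, 1.0, 0.0, 0.0),
    "G": (0.0, 0.0, 1.0, 0.0),
    "T": (0.0, 0.0, 0.0, 1.0),
}

# Traditional real-line mapping (also an assumption): it imposes an order
# and a set of distances that the symbols themselves do not have.
REAL_LINE = {"A": 0.0, "C": 1.0, "G": 2.0, "T": 3.0}


def euclidean(u, v):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def moving_average(vectors, length):
    """FIR moving average over a vector sequence.

    Uses only vector addition and scalar multiplication, which is what
    the linear space component makes available.
    """
    out = []
    for i in range(length - 1, len(vectors)):
        window = vectors[i - length + 1 : i + 1]
        out.append(tuple(sum(column) / length for column in zip(*window)))
    return out


sequence = "AACGTT"
embedded = [EMBED[symbol] for symbol in sequence]
smoothed = moving_average(embedded, 2)

# On the real line, A is three times as far from T as from C.
line_ratio = abs(REAL_LINE["A"] - REAL_LINE["T"]) / abs(REAL_LINE["A"] - REAL_LINE["C"])

# In the embedding, A is equally far from T and from C.
embed_ratio = euclidean(EMBED["A"], EMBED["T"]) / euclidean(EMBED["A"], EMBED["C"])
```

In this sketch the real-line labels make A appear three times as far from T as from C, while the embedded symbols stay equidistant, and the filter runs directly on the embedded sequence without first collapsing it to scalars.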

Keywords: metric linear space, signal processing, GSP, genomic signal processing, genomic sequence processing, symbolic sequence processing

The author received no funding for this work.

Version 1

This paper is a follow-up to another paper, entitled "Order and metric geometry compatible stochastic processing", that was submitted to PeerJ on 2015 February 19 (almost 15 months ago) and is available here:

https://peerj.com/preprints/844/

The 2015 paper presents a traditional random variable

Supplemental Information

C++ source code (DOI: 10.7287/peerj.preprints.2052v1/supp-1)

C++ source code (written by the author of the paper) for the program ssp.exe, which was used to generate TeX files for the 128 or so data plots presented in the paper.

Additional Information

Competing Interests

The author declares that he has no competing interests.

Author Contributions

Daniel J Greenhoe conceived and designed the experiments, performed the experiments, analyzed the data, contributed reagents/materials/analysis tools, wrote the paper, prepared figures and/or tables, performed the computation work, reviewed drafts of the paper.

Data Deposition

The following information was supplied regarding data availability:

The C++ source code to support this paper is available here:

https://www.researchgate.net/publication/302953844

No username or password is necessary. Just click "View" and then "Download", which allows a user to download a single zip file.