PhilDB - The time series database with built-in change logging

Not affiliated, Melbourne, Victoria, Australia
DOI
10.7287/peerj.preprints.1488v1
Subject Areas
Data Science, Databases
Keywords
time series, database, logging, Python, data science
Copyright
© 2015 MacDonald
Licence
This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ PrePrints) and either DOI or URL of the article must be cited.
Cite this article
MacDonald A. 2015. PhilDB - The time series database with built-in change logging. PeerJ PrePrints 3:e1488v1

Abstract

PhilDB is an open-source time series database. It supports storage of time series datasets that are dynamic; that is, it records updates to existing values in a log as they occur. Recent open-source systems, such as InfluxDB and OpenTSDB, have been developed to indefinitely store long-period, high-resolution time series data. Unfortunately, they require a large initial installation investment before use because they are designed to operate over a cluster of servers to achieve high-performance writing of static data in real time. In essence, they take a ‘big data’ approach to storage and access. Other open-source projects for handling time series data that don’t take the ‘big data’ approach are also relatively new and are complex or incomplete. None of these systems gracefully handle revision of existing data while tracking the values that changed. Unlike ‘big data’ solutions, PhilDB has been designed for single-machine deployment on commodity hardware, reducing the barrier to deployment. PhilDB eases loading of data for the user by utilising an intelligent data write method. It preserves existing values during updates and abstracts the update complexity required to achieve logging of data value changes. PhilDB improves access to datasets in two ways. Firstly, it uses fast reads, which make it practical to select data for analysis. Secondly, it uses simple read methods to minimise the effort required to extract data. PhilDB takes a unique approach to meta-data tracking: optional attribute attachment. This facilitates scaling the complexities of storing a wide variety of data. That is, it allows time series data to be loaded as time series instances with minimal initial meta-data, yet additional attributes can be created and attached to differentiate the time series instances as a wider variety of data is needed. PhilDB was written in Python, leveraging existing libraries. This paper describes the general approach, architecture, and philosophy of the PhilDB software.
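
The idea of a write that logs changes to existing values can be illustrated with a minimal Python sketch. This is not the PhilDB API: the function name write_with_log, the dictionary-based store, and the list-based log are all hypothetical and chosen only for illustration. The sketch assumes a series is a mapping from timestamp to value and that a change is any write that replaces an existing value with a different one.

```python
from datetime import date


def write_with_log(store, log, updates):
    """Merge `updates` into `store` and log every revised value.

    `store` maps timestamp -> value (hypothetical in-memory series).
    `log` receives one (timestamp, old_value, new_value) tuple for each
    existing value that is replaced by a different one.
    Returns the number of existing values that were revised.
    """
    revised = 0
    for timestamp, new_value in sorted(updates.items()):
        if timestamp in store:
            old_value = store[timestamp]
            if old_value == new_value:
                continue  # identical value: nothing to write or log
            log.append((timestamp, old_value, new_value))
            revised += 1
        store[timestamp] = new_value  # new timestamp or revised value
    return revised


store = {}
log = []

# Initial load: no existing values, so nothing is logged.
write_with_log(store, log, {date(2015, 1, 1): 1.0, date(2015, 1, 2): 2.0})

# Later load: revises one existing value and appends one new value.
revised = write_with_log(store, log, {date(2015, 1, 2): 2.5, date(2015, 1, 3): 3.0})
```

In this sketch the caller submits overlapping data without first working out which values differ; the write step makes that comparison and records only the revisions, which is the kind of update complexity the abstract says is hidden from the user.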

Author Comment

This is a submission to PeerJ for review.

Supplemental Information

Mean write time comparison

DOI: 10.7287/peerj.preprints.1488v1/supp-2