The R language has withstood the test of time. Forty years after it was initially developed (in the form of the S language), R is being used by millions of programmers on workflows the inventors of the language could never have imagined. Although base R packages...

["Data Mining and Machine Learning","Data Science"]
doi:10.7287/peerj.preprints.3180v1
190 downloads
1,155 views

Within the statistics community, a number of guiding principles for sharing data have emerged; however, these principles are not always made clear to collaborators generating the data. To bridge this divide, we have established a set of guidelines for sharing data....

["Statistics","Data Science"]
doi:10.7287/peerj.preprints.3139v2
4 downloads
14 views

Data wrangling is a critical foundation of data science, and wrangling of categorical data is an important component of this process. However, categorical data can introduce unique issues in data wrangling, particularly in real-world settings with collaborators...

["Computer Education","Data Science","Scientific Computing and Simulation","Social Computing"]
doi:10.7287/peerj.preprints.3163v1
5 downloads
28 views

Many current and future data scientists will be "isolated", working alone or in small teams within a larger organization. This isolation brings certain challenges as well as freedoms. Drawing on my considerable experience both working in the professional sports...

["Computer Education","Data Science","Scientific Computing and Simulation","Social Computing"]
doi:10.7287/peerj.preprints.3160v1
31 downloads
79 views

Data analysis, statistical research, and teaching statistics have at least one thing in common: these activities all produce many files! There are data files, source code, figures, tables, prepared reports, and much more. Most of these files evolve over the course...

["Computer Education","Data Science","Scientific Computing and Simulation","Software Engineering"]
doi:10.7287/peerj.preprints.3159v1
14 downloads
32 views

We describe the \textsc{Coefficient-Flow} algorithm for calculating the bounding chain of an $(n-1)$--boundary on an $n$--manifold-like simplicial complex $S$. We prove its correctness and show that it has a computational time complexity of $O(|S^{(n-1)}|)$ (where...

["Algorithms and Analysis of Algorithms","Data Science","Scientific Computing and Simulation"]
doi:10.7287/peerj.preprints.3151v1
2,504 downloads
9,494 views

Despite growing interest in Open Access (OA) to scholarly literature, there is an unmet need for large-scale, up-to-date, and reproducible studies assessing the prevalence and characteristics of OA. We address this need using oaDOI, an open online service that...

["Legal Issues","Science Policy","Data Science"]
doi:10.7287/peerj.preprints.3119v1
13 downloads
94 views

Bacterial surfaces are complex, built from membranes, peptide-glycans and, importantly, proteins. The proteins play crucial roles as key regulators of how the bacterium interacts with its environment. A full catalog of the motifs in coiled-coil proteins and...

["Bioinformatics","Infectious Diseases","Data Science"]
doi:10.7287/peerj.preprints.3118v1
51 downloads
479 views

Gathering up-to-date information on food prices is critical in developing regions, as it allows policymakers and development practitioners to rely on accurate data on food security. This study explores the feasibility of utilizing social media as a new data source...

["Data Science","Network Science and Online Social Networks","Social Computing"]
doi:10.7717/peerj-cs.126
36 downloads
172 views

Sigmoidal and double-sigmoidal dynamics are commonly observed in many areas of biology. Here we present sicegar, an R package for the automated fitting and classification of sigmoidal and double-sigmoidal data. The package categorizes data into one of three categories,...

["Bioinformatics","Computational Biology","Mathematical Biology","Statistics","Data Science"]
doi:10.7287/peerj.preprints.3116v1
48 downloads
366 views

We present a CUDA-based implementation of a decision tree construction algorithm within the gradient boosting library XGBoost. The tree construction algorithm is executed entirely on the graphics processing unit (GPU) and shows high performance with a variety of...

["Artificial Intelligence","Data Mining and Machine Learning","Data Science"]
doi:10.7717/peerj-cs.127
42 downloads
454 views

Despite recent algorithmic improvements, learning the optimal structure of a Bayesian network from data is typically infeasible past a few dozen variables. Fortunately, domain knowledge can frequently be exploited to achieve dramatic computational savings, and...

["Artificial Intelligence","Data Mining and Machine Learning","Data Science","Distributed and Parallel Computing"]
doi:10.7717/peerj-cs.122
102 downloads
978 views

The ability to promptly recognise new research trends is strategic for many stakeholders, including universities, institutional funding bodies, academic publishers and companies. While the literature describes several approaches which aim to identify the emergence...

["Artificial Intelligence","Data Science","Digital Libraries"]
doi:10.7717/peerj-cs.119
34 downloads
202 views

Scaling up the analysis of sensitive or confidential documents frequently stumbles on the limited number of individuals with the necessary clearance to access the documents. The availability of cryptographic protocols compatible with text processing methods can...

["Cryptography","Data Science","Natural Language and Speech"]
doi:10.7287/peerj.preprints.2994v1
91 downloads
368 views

Sharing and reusing data in research is a welcome and encouraged practice since it maximises the scientific outcomes given limited financial, material and human resources. Interdisciplinary research is considered to benefit from this practice, uniting researchers...

["Bioinformatics","Computational Biology","Data Science","Spatial and Geographic Information Systems"]
doi:10.7287/peerj.preprints.2248v4
