NOT PEER-REVIEWED
"PeerJ Preprints" is a venue for early communication or feedback before peer review. Data may be preliminary.

Supplemental Information

S1 Text: In-Depth Tiling Methods

DOI: 10.7287/peerj.preprints.1426v1/supp-1

S2 Text: Upper-bounding Estimates for Storing 1 Million Tiled Genomes

DOI: 10.7287/peerj.preprints.1426v1/supp-2

S3 Text: Mathematical Definition of a Genome

DOI: 10.7287/peerj.preprints.1426v1/supp-3

S4 Text: Clinical Analysis of PGP Participants with BRCA frameshifts

DOI: 10.7287/peerj.preprints.1426v1/supp-4

S5 Text: ABO Blood Type Classifiers

DOI: 10.7287/peerj.preprints.1426v1/supp-5

S6 Text: Methods for the VCF-based Clinical Analysis of the BRCA region

DOI: 10.7287/peerj.preprints.1426v1/supp-6

Table S1: Fisher’s exact test for Hashimoto’s thyroiditis

Fisher’s exact test gives a p-value of 0.012545, which is significant at the 0.05 level. The total number of PGP participants included is the number of participants from our cohort of 178 who completed the “Endocrine, Metabolic, Nutritional, and Immunity Survey” (last updated in 2012, downloaded on February 18, 2015).

DOI: 10.7287/peerj.preprints.1426v1/supp-7

Table S2: Fisher’s exact test for breast cancer for all participants reporting to be female

Fisher’s exact test gives a p-value of 0.172704, which is not significant at the 0.05 level. The total number of PGP participants included is the number of participants from our cohort of 178 who completed the “Cancer Survey” (last updated in 2012, downloaded on February 18, 2015) and who reported being female with no conflicting reports of other genders. The power of this study is 48%, assuming that the probability of developing breast cancer is 57% for a participant with a deleterious BRCA mutation [69] and 12% for a participant with functional BRCA proteins [70]. To obtain a power greater than 95%, we would require a population of 181 females.
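As an illustration of how a power figure of this kind can be computed, the following is a minimal sketch, not the authors' method: the text does not say how the 48% was obtained or how many participants fell into each group. The sketch computes a two-sided Fisher's exact p-value and then the exact power by enumerating every possible outcome of two binomial groups. The group sizes in the usage line (5 carriers, 60 non-carriers) are hypothetical placeholders; only the probabilities 0.57 and 0.12 come from the text.

```python
from math import comb


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


def binom_pmf(k, n, p):
    """Probability of k successes in n independent trials with success rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def exact_power(n_carriers, n_noncarriers, p_carrier, p_noncarrier, alpha=0.05):
    """Probability that Fisher's exact test rejects at level alpha.

    Enumerates every (cases among carriers, cases among non-carriers) outcome
    and adds up the probability of those outcomes whose p-value is below alpha.
    """
    power = 0.0
    for x in range(n_carriers + 1):
        px = binom_pmf(x, n_carriers, p_carrier)
        for y in range(n_noncarriers + 1):
            p_value = fisher_two_sided(x, n_carriers - x, y, n_noncarriers - y)
            if p_value < alpha:
                power += px * binom_pmf(y, n_noncarriers, p_noncarrier)
    return power


# Hypothetical group sizes; the probabilities are the ones quoted in the text.
print(exact_power(n_carriers=5, n_noncarriers=60, p_carrier=0.57, p_noncarrier=0.12))
```

Exact enumeration is used here instead of simulation so that the result is deterministic; it remains cheap for cohorts of this size because the number of possible tables grows only with the product of the two group sizes.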

DOI: 10.7287/peerj.preprints.1426v1/supp-8

Figure S1: Projection of 502 1000 Genomes Project whole genome sequences along their first two principal components

This projection replicates the 1000 Genomes Project PCA projection, which used single nucleotide variants. Only well-sequenced positions in autosomal chromosomes were used in our PCA: 29,366 positions out of 20,160,996 (10,080,498 per phase), or 0.146% of the genome. As expected, the first principal component separates the participants of African descent from the other participants. The second principal component separates the participants of European descent from those of Asian descent. The African super population is colored in brown shades, the Admixed American super population in red shades, the East Asian super population in green shades, the South Asian super population in light blue shades, and the European super population in purple shades. The number in parentheses in the legend is the number of whole genome sequences labeled with this ethnicity.
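The position counts quoted in this caption can be checked with a few lines of arithmetic. This is only a sanity check of the figures as printed; the function name and the two-phase default are illustrative choices, not taken from the paper.

```python
def percent_of_positions_used(used, per_phase, phases=2):
    """Return (total positions across phases, percentage of them used)."""
    total = per_phase * phases
    return total, 100.0 * used / total


total, percent = percent_of_positions_used(used=29_366, per_phase=10_080_498)
print(total, round(percent, 3))
```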

DOI: 10.7287/peerj.preprints.1426v1/supp-9

Figure S2: Projection of 178 PGP whole genome sequences along their first two principal components

Only well-sequenced positions in autosomal chromosomes were used in the PCA: 29,366 positions out of 20,160,996 (10,080,498 per phase), or 0.146% of the genome. As expected, the first principal component separates the participants of African descent from the other participants. The second principal component separates the participants of European descent from those of Asian descent. The African super population is colored in brown shades, the Admixed American super population in red shades, the East Asian super population in green shades, the South Asian super population in light blue shades, and the European super population in purple shades. The number in parentheses in the legend is the number of whole genome sequences (callsets) with this ethnicity. Note the large proportion of participants with European ancestry, which indicates the ethnic homogeneity of the PGP.

DOI: 10.7287/peerj.preprints.1426v1/supp-10

Additional Information

Competing Interests

We have read the journal's policy and the authors of this manuscript have the following competing interests: The authors are or have been employed by Curoverse and are compensated by a mixture of cash, stock and stock options from the company. AB and WV are officers of the company. AWZ and AB are members of the Curoverse board of directors.

Author Contributions

Sarah Guthrie conceived and designed the experiments, performed the experiments, analyzed the data, wrote the paper, prepared figures and/or tables, reviewed drafts of the paper.

Abram Connelly conceived and designed the experiments, performed the experiments, analyzed the data, wrote the paper, prepared figures and/or tables, reviewed drafts of the paper.

Peter Amstutz contributed reagents/materials/analysis tools, reviewed drafts of the paper.

Adam F. Berrey reviewed drafts of the paper, helped to conceive and design the experiments.

Nicolas Cesar contributed reagents/materials/analysis tools, reviewed drafts of the paper.

Jiahua Chen contributed reagents/materials/analysis tools, reviewed drafts of the paper.

Radhika Chippada contributed reagents/materials/analysis tools, reviewed drafts of the paper.

Tom Clegg contributed reagents/materials/analysis tools, reviewed drafts of the paper, helped to conceive and design the experiments.

Bryan Cosca contributed reagents/materials/analysis tools, reviewed drafts of the paper.

Jiayong Li performed the experiments, analyzed the data, wrote the paper, reviewed drafts of the paper, helped to conceive and design the experiments.

Nancy Ouyang performed the experiments, contributed reagents/materials/analysis tools, wrote the paper, prepared figures and/or tables, reviewed drafts of the paper.

Jonathan Sheffi reviewed drafts of the paper, helped to conceive and design the experiments.

Brett Smith contributed reagents/materials/analysis tools, reviewed drafts of the paper.

Ward Vandewege contributed reagents/materials/analysis tools, reviewed drafts of the paper, helped to conceive and design the experiments.

Alexander Wait Zaranek conceived and designed the experiments, analyzed the data, wrote the paper, reviewed drafts of the paper.

Human Ethics

The following information was supplied relating to ethical approvals (i.e., approving body and any reference numbers):

All work was done with public data consistent with national and international standards.

Data Deposition

The following information was supplied regarding data availability:

All files and analyses are available from the Arvados platform (accession URL: https://curover.se/su92l-j7d0g-swtofxa2rct8495).

Funding

Research reported in this publication was supported by Curoverse and the National Institute of General Medical Sciences of the National Institutes of Health under Award Number R43GM109737. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
