Interview with Dr. Mark Biggin – Don’t underestimate protein abundances and the relative importance of transcription
Today, we publish “System Wide Analyses have Underestimated Protein Abundances and the Importance of Transcription in Mammals”. This new study, led by Dr. Mark Biggin, suggests that the main reason protein and mRNA abundance measurements are poorly correlated is measurement error of various kinds in both the protein and the mRNA data, rather than transcription having little impact on protein expression levels.
Dr. Mark Biggin is a Staff Scientist at Lawrence Berkeley National Laboratory. He obtained his PhD at the MRC Laboratory of Molecular Biology in Cambridge in 1984, was a Post Doctoral Fellow with Robert Tjian at UC Berkeley from 1985 to 1989, and was then an Assistant and Associate Professor in Molecular Biophysics and Biochemistry at Yale. He joined Berkeley Lab in 2000. For the last 29 years, he has primarily studied the biochemistry of transcriptional regulatory networks that control animal development, using the Drosophila blastoderm network as a model.
During the last decade, this work has ballooned into a large interdisciplinary collaboration: the Berkeley Drosophila Transcription Network Project (BDTNP). This multi-laboratory effort includes several advanced math groups, among them his collaborators on the current publication, statisticians Profs. Peter Bickel and Jingyi Jessica Li. As part of the analysis of transcription networks, it became important to know the number of molecules per cell of each transcription factor, because the relatively high levels at which these proteins are expressed help explain the unexpectedly widespread, overlapping patterns of DNA binding these regulators show across the genome. In 2011, Mark published a survey showing that most transcription factors are expressed at 10,000 to 300,000 molecules per cell, with a median of 60,000 molecules per cell. This is consistent with the evidence that each transcription factor binds to tens of thousands of different genomic regions.
We asked Dr. Biggin to comment on the work he published with us.
PJ: Can you tell us a bit about the research you publish today?
MB: Our paper presents a critical reanalysis of several proteomics papers. We estimate different forms of experimental measurement error in the high-throughput mass spectrometry and RNA-seq data in these papers and show that quite different conclusions are reached when these errors are accounted for. For example, we found that the absolute protein abundances reported are up to an order of magnitude too low. In addition, we find that transcription and mRNA turnover, not translation and protein degradation as these papers had concluded, are most important in determining the relative levels of protein expression.
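A standard way to see why measurement error makes mRNA and protein levels look less related than they really are is Spearman's classic correction for attenuation. If each dataset has a reliability (the fraction of its variance that is not measurement error), the observed correlation is approximately the true correlation multiplied by the square root of the product of the two reliabilities. The sketch below assumes the reliabilities have already been estimated, for example from replicate measurements; the function name and example numbers are hypothetical, and it illustrates the general idea rather than reproducing the exact procedure used in the paper.

```python
from math import sqrt


def disattenuated_correlation(r_observed, reliability_mrna, reliability_protein):
    """Spearman's correction for attenuation (illustrative sketch).

    r_observed: correlation between measured mRNA and protein levels.
    reliability_mrna, reliability_protein: hypothetical inputs giving the
    fraction of each dataset's variance that is NOT measurement error.
    Returns the estimated correlation between the error-free values.
    """
    for name, rel in (("mRNA", reliability_mrna), ("protein", reliability_protein)):
        if not 0.0 < rel <= 1.0:
            raise ValueError(f"{name} reliability must be in (0, 1], got {rel}")
    if not -1.0 <= r_observed <= 1.0:
        raise ValueError("observed correlation must lie in [-1, 1]")
    # Independent measurement error shrinks the observed correlation by a
    # factor of sqrt(rel_mrna * rel_protein); dividing undoes that shrinkage.
    return r_observed / sqrt(reliability_mrna * reliability_protein)
```

With a hypothetical observed correlation of 0.6 and reliabilities of 0.9 (mRNA) and 0.5 (protein), the corrected correlation is about 0.89, so the share of protein variance explained by mRNA rises from 36% to about 80%. Because reliabilities are themselves estimates, the corrected value can exceed 1 in noisy data; the sketch returns it unclipped so that this remains visible.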
PJ: Why is it important?
MB: Most proteins exert their activity by interacting with other biomolecules, be they metabolites, lipids, nucleic acids or other proteins. As Paul Ehrlich classically stated, “corpora non agunt nisi ligata” (“a substance is not active unless it is linked to another”). The position of the binding equilibrium is determined by the concentration of the protein: if there is too little protein, there will be insufficient interaction. For example, some of the proteomics papers estimated only 10% of the number of histones needed to cover the genome with nucleosomes, whereas it is known that in fact the bulk of the genome is covered. In addition, the proteomics papers greatly underestimated the number of transcription factor molecules per cell, which is inconsistent with the widespread DNA binding that we and others have observed. Because many researchers seem unaware of the typical concentrations of different classes of proteins in vivo, we thought it important to take the time to formally correct the record. In the process we devised a statistical approach that should allow other researchers to scale similar mass spectrometry data more accurately in future.
An important challenge for high-throughput proteomics and genomics is taking experimental measurement error into account when analyzing what are frequently quite noisy datasets. We present two strategies that use Analysis of Variance (ANOVA) to determine the percentage of the variation in measured protein expression levels that is due to biological processes, such as transcription and translation, and the percentage that is due to error. ANOVA is a classic statistical method developed by R. A. Fisher in the 1920s. Although it is a well-regarded, standard approach in some fields, its usefulness has not been widely appreciated in genomics and proteomics. I hope that our paper will help people understand how it can aid the correct interpretation of datasets that suffer from significant error.
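For readers unfamiliar with the approach, here is a minimal sketch of the simplest case: a one-way random-effects ANOVA that splits the variance of replicated abundance measurements (for example, log-scale values) into a between-gene component and a within-gene, replicate-error component. It assumes a balanced design in which every gene has the same number of replicates; the function name and input layout are hypothetical, and this is not a reproduction of either of the two strategies in the paper.

```python
from statistics import mean


def variance_components(replicates):
    """Split measurement variance into between-gene and error components.

    One-way random-effects ANOVA for a balanced design (illustrative sketch).
    replicates: hypothetical layout, one list per gene, each holding the same
    number r >= 2 of replicate measurements (e.g. log abundances).
    Returns (between_gene_var, error_var, error_fraction).
    """
    g = len(replicates)
    if g < 2:
        raise ValueError("need at least two genes")
    r = len(replicates[0])
    if r < 2 or any(len(row) != r for row in replicates):
        raise ValueError("each gene needs the same number (>= 2) of replicates")

    gene_means = [mean(row) for row in replicates]
    grand_mean = mean(gene_means)  # equals the overall mean when balanced

    # Sums of squares: spread of gene means vs. scatter among replicates.
    ss_between = r * sum((m - grand_mean) ** 2 for m in gene_means)
    ss_within = sum(
        (x - m) ** 2 for row, m in zip(replicates, gene_means) for x in row
    )

    # Mean squares: each sum of squares divided by its degrees of freedom.
    ms_between = ss_between / (g - 1)
    ms_within = ss_within / (g * (r - 1))

    # Method-of-moments variance components; a negative between-gene
    # estimate is truncated to zero, a common convention.
    error_var = ms_within
    between_var = max((ms_between - ms_within) / r, 0.0)

    total = between_var + error_var
    error_fraction = error_var / total if total > 0 else 0.0
    return between_var, error_var, error_fraction
```

For instance, `variance_components([[1.0, 3.0], [5.0, 7.0], [9.0, 11.0]])` gives a between-gene variance of 15.0 and an error variance of 2.0, so about 12% of the variance of a single measurement is attributed to error. One minus that fraction is the reliability of a single measurement.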
PJ: What challenges did you face while doing this research?
MB: My colleagues and I do not normally work with label-free mass spectrometry data, so we had to take the time to understand it first. Another challenge was that, unusually for me, we produced no data for this paper but instead relied on data from other groups. The early version of our paper was prepared over two years ago, before some of the important datasets we now use were available. So in the early days, while we were convinced that there were serious problems with the published proteomics papers, we could not yet demonstrate them as rigorously as we now can.
PJ: What is next in your research?
MB: For the moment, we will return to focusing on our other research. We hope that researchers in the proteomics field will take up the analysis approaches we present and extend them. We have not been able to calculate exact values, either of protein abundances or of the importance of each step in gene expression; we have only been able to obtain better approximations. Achieving a more precise answer will require additional experiments with more accurate control data. We stand ready to help groups who wish to employ the analysis approaches we presented.
PJ: What persuaded you to submit this work to PeerJ?
MB: We ran into a closed-minded response from fans of the research we were criticizing. We chose PeerJ as an independent journal that is not beholden to a particular point of view but is willing to judge work on its merits. We are very pleased with our choice.
PJ: How would you describe your overall experience with us?
PJ: What do you think about our “Pay once, Publish for life” publishing plans?
MB: It is much cheaper than any other publishing we have done, and that would be true even if two of our authors were not UC Berkeley members. Since they are, and UC Berkeley has an institutional agreement, the only cost was a membership for me.
PJ: Thank you very much!
Experience the PeerJ process for yourself, and take advantage of our free publication offer (when you also submit a preprint) through the last day of March.