The first whole-cell computer model of a living organism was just completed. It accurately models Mycoplasma genitalium, the world’s smallest free-living organism, throughout its entire life cycle, accounting for every type of endogenous molecule known to exist in the organism. The model represents a key step towards one of the greatest biological challenges of our time: bridging the divide between genotype and phenotype.
In the future, such models will be of tremendous importance in the biological sciences. They will expand our understanding of how organisms work, they will enable novel hypotheses to be quickly tested in silico, they will enable observation of biological phenomena that are too difficult to observe experimentally, and they will enable such observations to be made without perturbing the system at all. The long-term motivation driving our research is the development of computer models of complex organisms that capture their underlying chemistries in great detail.
Of course, there's a long way to go before this can be realized. Our current research focuses on solving one of the most basic problems limiting progress in the field: we still can't even identify all the proteins and metabolites in a biological sample. Humans produce approximately 21,000 unique proteins, but when post-translational modifications, protein variants, and isoforms are considered, that number is many times larger. The problem is further complicated by the fact that those proteins span a concentration range over 14 orders of magnitude wide. High Performance Liquid Chromatography-Tandem Mass Spectrometry (LC-MS/MS), combined with extensive offline pre-fractionation, has become the analytical tool of choice for untargeted identification of proteins, but the largest number of proteins reported to be identified in a single sample is only 5,111. Moreover, our current methodology for protein identification is biased: it assumes that the sample contains no amino acid sequences other than those predicted from the sequenced genome.
The problem is even more severe in metabolomics. Metabolites are far more chemically diverse than peptides, their mass spectra vary more from instrument to instrument, and those spectra are harder to interpret and predict. For example, in one of the most advanced experiments to date, only 643 elemental compositions (not chemical structures) were identified in an Arabidopsis thaliana leaf extract. Given estimates of the size of the Arabidopsis metabolome (link 2), that corresponds to only 3-13% of all metabolites.
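To make the coverage figures concrete, here is a rough back-of-the-envelope sketch in Python. It only rearranges numbers quoted above: dividing the 643 identified compositions by the 3-13% coverage range gives the metabolome sizes those percentages imply. The protein calculation rests on an assumption that is not stated in the text, namely that the 5,111-protein sample was human and can be compared against the roughly 21,000 unmodified proteins. The function names are illustrative only.

```python
def implied_total(identified: int, coverage_fraction: float) -> float:
    """Total population size implied by a count and the fraction it represents."""
    return identified / coverage_fraction


def coverage(identified: int, total: int) -> float:
    """Fraction of a known total that was identified."""
    return identified / total


# Metabolomics: 643 elemental compositions said to be 3-13% of the metabolome.
smallest_estimate = implied_total(643, 0.13)
largest_estimate = implied_total(643, 0.03)
print(f"Implied metabolome size: {smallest_estimate:.0f} to {largest_estimate:.0f}")
# → Implied metabolome size: 4946 to 21433

# Proteomics: ASSUMES the 5,111-protein record is from a human sample and
# counts only the ~21,000 unmodified proteins as the total.
protein_coverage = coverage(5111, 21000)
print(f"Best-case protein coverage: {protein_coverage:.1%}")
# → Best-case protein coverage: 24.3%
```

Under that assumption the protein figure is an upper bound: once post-translational modifications, variants, and isoforms are counted, the true total is many times larger and the coverage correspondingly smaller.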