Highlights
- •
3D spatial resolution of a fully dynamical whole-cell kinetic model
- •
Detailed single-reaction, single-cell accounting of time-dependent ATP costs
- •
Genome-wide mRNA half-lives emerge from length-dependent kinetics and diffusion
- •
Connections among metabolism, genetic information, and cell growth are revealed
Summary
We present a whole-cell fully dynamical kinetic model (WCM) of JCVI-syn3A, a minimal cell with a reduced genome of 493 genes that has retained few regulatory proteins or small RNAs. Cryo-electron tomograms provide the cell geometry and ribosome distributions. Time-dependent behaviors of concentrations and reaction fluxes from stochastic-deterministic simulations over a cell cycle reveal how the cell balances demands of its metabolism, genetic information processes, and growth, and offer insight into the principles of life for this minimal cell. The energy economy of each process including active transport of amino acids, nucleosides, and ions is analyzed. WCM reveals how emergent imbalances lead to slowdowns in the rates of transcription and translation. Integration of experimental data is critical in building a kinetic model from which emerges a genome-wide distribution of mRNA half-lives, multiple DNA replication events that can be compared to qPCR results, and the experimentally observed doubling behavior.
Graphical abstract

Keywords
Introduction
An overarching goal of molecular biology is to explain the basic processes of life in terms of the laws of physics and chemistry. In 1984, Morowitz proposed the study of the simplest living cells, the Mycoplasmas, as models for understanding the fundamental principles of life (
). Just as the study of hydrogen, the simplest atom, led to the understanding of more complex atoms, it seems plausible that the study of the simplest living cells will reveal principles that apply to all living systems. For this reason, we have been interested in studying “minimal cells” by designing and building cellular genomes that do not include genes that are non-essential in the laboratory. We have been able to produce living cells with fewer genes than any known naturally occurring cell (
- Hutchison C.A.
- Chuang R.-Y.
- Noskov V.N.
- Assad-Garcia N.
- Deerinck T.J.
- Ellisman M.H.
- Gill J.
- Kannan K.
- Karas B.J.
- Ma L.
- et al.
Design and synthesis of a minimal bacterial genome.
;
- Breuer M.
- Earnest T.M.
- Merryman C.
- Wise K.S.
- Sun L.
- Lynott M.R.
- Hutchison C.A.
- Smith H.O.
- Lapek J.D.
- Gonzalez D.J.
- et al.
Essential metabolism for a minimal cell.
). Such cells should be easier to completely describe than any known naturally occurring cells. To approach the question “what is life?” using our minimal cell model, we are testing whether the combined functions of the minimal cell genes can inform a computer model that correctly predicts the behavior of the cell. This will provide a test of our understanding of the minimal requirements for life. Such a model will also provide a design tool for predicting the effects of changes in the genome; for example, when a reaction pathway is added.
A complete description of the state of the cell requires knowledge of its size, shape, components, intracellular reactions, and interactions with its environment, all of these as a function of time and cell growth. Adding to this list is the need for theoretical models and simulations that interpret and integrate this daunting amount of experimental data. Due to large numbers of genes with unknown function and the complexity in model systems such as Escherichia coli, along with the broad range of concentrations and timescales that need to be considered, simulating a complete description of the state of a cell has been challenging. The development of whole-cell models (WCMs) and how they have progressed from genome-scale metabolic models (GSMMs) (
- Varma A.
- Palsson B.O.
Stoichiometric flux balance models quantitatively predict growth and metabolic by-product secretion in wild-type Escherichia coli W3110.
) and the calculation of their steady-state fluxes has been recently reviewed (
- Goldberg A.P.
- Szigeti B.
- Chew Y.H.
- Sekar J.A.
- Roth Y.D.
- Karr J.R.
Emerging whole-cell modeling principles and methods.
;
- Marucci L.
- Barberis M.
- Karr J.
- Ray O.
- Race P.R.
- de Souza Andrade M.
- Grierson C.
- Hoffmann S.A.
- Landon S.
- Rech E.
- et al.
Computer-aided whole-cell design: taking a holistic approach by integrating synthetic with systems biology.
). The most comprehensive models have been developed for Mycoplasma genitalium and E. coli (
- Karr J.R.
- Sanghvi J.C.
- Macklin D.N.
- Gutschow M.V.
- Jacobs J.M.
- Bolival Jr., B.
- Assad-Garcia N.
- Glass J.I.
- Covert M.W.
A whole-cell computational model predicts phenotype from genotype.
;
- Macklin D.N.
- Ahn-Horst T.A.
- Choi H.
- Ruggero N.A.
- Carrera J.
- Mason J.C.
- Sun G.
- Agmon E.
- DeFelice M.M.
- Maayan I.
- et al.
Simultaneous cross-evaluation of heterogeneous E. coli datasets via mechanistic simulation.
), where the subsystems were treated in terms of ordinary differential equations, flux balance analysis, and stochastic simulations. Common challenges are establishing the reaction networks and the availability of kinetic parameters and -omics data such as metabolomics and proteomics to use as initial conditions. No one bacterium has a complete set of parameters and -omics data, so the development of a WCM relies upon synthesizing information from other well-studied organisms. Unlike the previous WCMs, the simulations we present here are based on fully dynamical kinetic models where subsystem networks and chemical species are interconnected continuously over time on a single-cell basis.
An ideal system for such a whole-cell model would be a minimal cell consisting of as few genes and reactions as possible for the cell to grow and divide (
- Luthey-Schulten Z.
Integrating experiments, theory and simulations into whole-cell models.
). JCVI-syn3A is a genetically minimal bacterial cell, consisting of only of 493 genes on a single 543-kbp circular chromosome with 452 genes coding for proteins (
- Breuer M.
- Earnest T.M.
- Merryman C.
- Wise K.S.
- Sun L.
- Lynott M.R.
- Hutchison C.A.
- Smith H.O.
- Lapek J.D.
- Gonzalez D.J.
- et al.
Essential metabolism for a minimal cell.
), some of which are subunits of multi-domain complexes (NCBI GenBank: CP016816.2). Syn3A’s genome and physical size are approximately one-tenth those of the model bacterial organism E. coli. Syn3A has a smaller fraction of genes with unclear function (87/452, 20%) than E. coli (1780/4637, 38%) and Mycoplasma pneumoniae (311/688, 45%) (
- Breuer M.
- Earnest T.M.
- Merryman C.
- Wise K.S.
- Sun L.
- Lynott M.R.
- Hutchison C.A.
- Smith H.O.
- Lapek J.D.
- Gonzalez D.J.
- et al.
Essential metabolism for a minimal cell.
). The reduction in complexity and scale of Syn3A presents a unique opportunity to develop a near-complete whole-cell kinetic model. The Syn3A genome was synthesized based on the known genome sequence of the natural parent Gram-positive organism Mycoplasma mycoides subsp. capri str. GM12 (GenBank: CP001621.1) and has been synthetically reduced to achieve a minimal genome producing living cells that grow, divide in about 100 min, and have consistent spherical morphologies with 400–500-nm diameters (
- Breuer M.
- Earnest T.M.
- Merryman C.
- Wise K.S.
- Sun L.
- Lynott M.R.
- Hutchison C.A.
- Smith H.O.
- Lapek J.D.
- Gonzalez D.J.
- et al.
Essential metabolism for a minimal cell.
;
- Pelletier J.F.
- Sun L.
- Wise K.S.
- Assad-Garcia N.
- Karas B.J.
- Deerinck T.J.
- Ellisman M.H.
- Mershin A.
- Gershenfeld N.
- Chuang R.-Y.
- et al.
Genetic requirements for cell division in a genomically minimal cell.
;
- Hutchison C.A.
- Chuang R.-Y.
- Noskov V.N.
- Assad-Garcia N.
- Deerinck T.J.
- Ellisman M.H.
- Gill J.
- Kannan K.
- Karas B.J.
- Ma L.
- et al.
Design and synthesis of a minimal bacterial genome.
;
- Gibson D.G.
- Glass J.I.
- Lartigue C.
- Noskov V.N.
- Chuang R.-Y.
- Algire M.A.
- Benders G.A.
- Montague M.G.
- Ma L.
- Moodie M.M.
- et al.
Creation of a bacterial cell controlled by a chemically synthesized genome.
).
In
- Breuer M.
- Earnest T.M.
- Merryman C.
- Wise K.S.
- Sun L.
- Lynott M.R.
- Hutchison C.A.
- Smith H.O.
- Lapek J.D.
- Gonzalez D.J.
- et al.
Essential metabolism for a minimal cell.
, we established the essential GSMM for Syn3A along with genome-wide gene essentiality and proteomics. The protein products of 155 genes involved in 175 metabolic reactions were organized into seven subsystems: central, nucleotide, lipid, cofactors, amino acid, ions, and macromolecule metabolism, providing the starting point for the kinetic metabolic model presented here. The reactions in macromolecule metabolism are now kinetically modeled (
- Thornburg Z.R.
- Melo M.C.R.
- Bianchi D.
- Brier T.A.
- Crotty C.
- Breuer M.
- Smith H.O.
- Hutchison 3rd, C.A.
- Glass J.I.
- Luthey-Schulten Z.
Kinetic modeling of the genetic information processes in a minimal cell.
) through approximately 2,000 reactions involving the 251 genes in the genetic information processes of DNA replication, transcription of all 493 genes, translation and degradation of all 452 mRNA, tRNA charging, and cell growth. In addition to biochemical reactions, whole-cell, 3D spatial models require cellular architecture, including spatial distributions of ribosomes and configurations of the circular chromosome. The cellular architectures (Figure 1) are reconstructed at the single-cell level directly from cryo-electron tomograms (cryo-ET) that reveal a near-random distribution of ribosomes throughout the cell with a few present in polysomes (
- Gilbert B.R.
- Thornburg Z.R.
- Lam V.
- Rashid F.M.
- Glass J.I.
- Villa E.
- Dame R.T.
- Luthey-Schulten Z.
Generating chromosome geometries in a minimal cell from cryo-electron tomograms and chromosome conformation capture maps.
).

Figure 1Workflow for whole-cell simulations
(A) Ribosome coordinates and cell boundaries are obtained from cryo-electron tomograms.
(B) The self-avoiding lattice DNA (red, white, and blue spheres) is folded around the ribosomes (yellow spheres).
(C) The membrane (green cubes) surrounds the ribosomes, DNA, and 200-nm radius cytoplasmic space.
(D) A representative set of membrane complexes and proteins (degradosomes—red spheres, SecY—blue spheres, and PtsG—green spheres) are randomly distributed in the peripheral membrane and transmembrane space.
(E) All other macromolecules are randomly distributed throughout the cytoplasm as shown in all gray spheres.
(F) Some rates have been reported from single-molecule experiments such as the DnaA filament formation rate.
(G) Otherwise, we used the BRENDA and other databases for kinetic rates.
(H) The defined medium composition is used to determine nutrient uptake in our simulations.
(I) Spatial simulations require GPU acceleration.
(J–L) The spatial simulations predict numbers of (J) active degradosomes breaking down mRNA, (K) transcribing RNAP, and (L) translating ribosomes.
(M) The length-dependent kinetics of mRNA decay and requiring mRNA to diffuse to degradosomes results in a distribution of mRNA half-lives.
(N) The average number of times each gene is transcribed over the course of the 20-min spatial simulations.
(O) A distribution of the average number of times each mRNA type is translated in its lifetime shows that every mRNA is translated at least once in its lifetime on average.
3D visualization done with VMD (
).
Kinetic parameters for the majority of the cellular reactions have been measured in related organisms through decades of biochemical, single-molecule (sm) FRET, and spectroscopic studies reported in the literature and kinetic databases like
- Bremer H.
- Dennis P.P.
Modulation of chemical composition and other parameters of the cell at different exponential growth rates.
, BRENDA (
- Chang A.
- Jeske L.
- Ulbrich S.
- Hofmann J.
- Koblitz J.
- Schomburg I.
- Neumann-Schaal M.
- Jahn D.
- Schomburg D.
BRENDA, the ELIXIR core data resource in 2021: new developments and updates.
), and equilibrium constants reported in NIST’s TECRdb (
- Goldberg R.N.
- Tewari Y.B.
- Bhat T.N.
Thermodynamics of enzyme-catalyzed reactions–a database for quantitative biochemistry.
) and Equilibrator (
- Flamholz A.
- Noor E.
- Bar-Even A.
- Milo R.
eQuilibrator–the biochemical thermodynamics calculator.
). Comparative proteomics analyses to Mesoplasma florum (
- Matteau D.
- Lachance J.-C.
- Grenier F.
- Gauthier S.
- Daubenspeck J.M.
- Dybvig K.
- Garneau D.
- Knight T.F.
- Jacques P.-É.
- Rodrigue S.
Integrative characterization of the near-minimal bacterium Mesoplasma florum.
;
- Lachance J.-C.
- Matteau D.
- Brodeur J.
- Lloyd C.J.
- Mih N.
- King Z.A.
- Knight T.F.
- Feist A.M.
- Monk J.M.
- Palsson B.O.
- et al.
Genome-scale metabolic modeling reveals key features of a minimal gene set.
), B. subtilis (
- Wang M.
- Herrmann C.J.
- Simonovic M.
- Szklarczyk D.
- von Mering C.
Version 4.0 of PaxDb: Protein abundance data, integrated across model organisms, tissues, and cell-lines.
), and E. coli (
- Taniguchi Y.
- Choi P.J.
- Li G.-W.
- Chen H.
- Babu M.
- Hearn J.
- Emili A.
- Xie X.S.
Quantifying E. coli proteome and transcriptome with single-molecule sensitivity in single cells.
) were used to approximate missing or questionable information regarding a few of the Syn3A enzymes. At the moment, only relative metabolomics data on Syn3A is available, so the metabolite concentrations used to initialize the simulations of the Syn3A WCM were estimated from a scaling of the comprehensive study done on E. coli (
- Park J.O.
- Rubin S.A.
- Xu Y.-F.
- Amador-Noguez D.
- Fan J.
- Shlomi T.
- Rabinowitz J.D.
Metabolite concentrations, fluxes and free energies imply efficient enzyme usage.
) and a limited list from M. pneumoniae (
- Yus E.
- Maier T.
- Michalodimitrakis K.
- van Noort V.
- Yamada T.
- Chen W.-H.
- Wodke J.A.
- Güell M.
- Martínez S.
- Bourgeois R.
- et al.
Impact of genome reduction on bacterial metabolism and its regulation.
). For completeness, all the modifications to the metabolic map, genetic information processing, and kinetic parameters are provided in Figures 2, S1, and S2; Tables S1 and S2; and STAR Methods (Metabolic rates and parameterization).

Figure 2The JCVI-syn3A central metabolic network
Central metabolism starts with glucose uptake via the phosphorelay mechanism shown in the inset including protein names and gene numbers. Red reaction names indicate reactions that do not have a gene annotated but are assumed to be performed by one of the uncharacterized proteins. ACALDt, PYRt2r, L_Lact2r, and ACt are all assumed to be non-enzymatic passive transport reactions. The reaction map was generated using Escher (
- King Z.A.
- Dräger A.
- Ebrahim A.
- Sonnenschein N.
- Lewis N.E.
- Palsson B.O.
Escher: A web application for building, sharing, and embedding data-rich visualizations of biological pathways.
). See also Figures S1 and S2.


Figure S2Lipid metabolism reaction network and lipidomics data, related to Figure 2
(A) Updated reaction network of lipid metabolism in Syn3A. The pathway leading to capsule formation has been removed and reactions for uptake of sphingomyelin (SM) and phosphatidylcholine (PC) have been added. (B) Lipidomics study of the lipid composition of JCVI-syn3B (Syn3A with a synthetic DNA landing pad present, but genetically identical to Syn3A) from mass spectrometry. Error bars respresent the standard deviations of three biological replicates.
With the background data now available for Syn3A, we were able to develop a whole-cell kinetic model of this minimal cell. Because of the large variation in timescales and concentrations, developing a whole-cell model that treats metabolism, genetic information processes, and growth can, at the moment, only be achieved by hybrid stochastic and deterministic simulations. Kinetics of the essential metabolic network (
- Breuer M.
- Earnest T.M.
- Merryman C.
- Wise K.S.
- Sun L.
- Lynott M.R.
- Hutchison C.A.
- Smith H.O.
- Lapek J.D.
- Gonzalez D.J.
- et al.
Essential metabolism for a minimal cell.
) are handled deterministically via ordinary differential equations (ODEs), and the kinetics of the genetic information processes are handled with stochastic simulations. We consider here two models of the stochastic simulations (Figure S3): a chemical master equation (CME) description that assumes that the whole cell is well stirred and implicitly includes the effects of diffusion in the rates of transcription, mRNA degradation, and translation; and a reaction-diffusion master equation (RDME) description that requires macromolecules to diffuse to each other for reactions to take place in the spatially heterogeneous environment of the cell. In our simulations, we record time-dependent particle counts of each molecule and intermediate, fluxes of all metabolic reactions, and in the spatial model, the position of each macromolecule within the cell. We present results of the well-stirred model over a complete cell cycle for 174 healthy replicate cells out of a total population of 207 cells. Unhealthy cells within the simulations run out of phosphoenol pyruvate (pep), halting glycolysis.

Figure S3Pictoral RDME-gCME-ODE algorithm and stochastic reaction table, related to STAR Methods
Pictoral algorithm for the hybrid RDME-gCME-ODE simulations and formulations of the stochastic reactions. The first table designates the handling of each reaction type in the spatially-resolved simulations. The second table gives the reactions used in the totally well-stirred simulations.
For the spatial model, which is computationally expensive, we use our graphics processing unit (GPU)-based Lattice Microbes software (
- Roberts E.
- Stone J.E.
- Luthey-Schulten Z.
Lattice Microbes: high-performance stochastic simulation method for the reaction-diffusion master equation.
;
- Hallock M.J.
- Stone J.E.
- Roberts E.
- Fry C.
- Luthey-Schulten Z.
Simulation of reaction diffusion processes over biologically relevant size and time scales using multi-GPU workstations.
;
- Earnest T.M.
- Cole J.A.
- Luthey-Schulten Z.
Simulating biological processes: stochastic physics from whole cells to colonies.
) to simulate replicates over the first 20 min of the cell cycle, before DNA replication and substantial cell growth have occurred. These relatively short-term simulations are critical in calculating binding probabilities and estimating mRNA half-lives in the well-stirred model. Being computationally less expensive, the well-stirred model runs on CPUs and provides a whole-cell model that can be easily complexified to allow addition of more pathways and testing. More importantly, it allows us to quantitatively understand the “principles of life” for a minimal cell growing with little regulation.
Results
3D spatial whole-cell simulations incorporate experimental data and inform homogeneous well-stirred simulations
The general workflow of constructing an initial cell state is shown in Figure 1. Starting from the coordinates of the 503 ribosomes and cell boundary of a small cell with
400-nm diameter from cryo-ET (Figure 1A), the DNA is folded around ribosomes as a circular self-avoiding polymer on a lattice in such a way that the sequence order and gene positions are maintained (Figures 1B and 1C) (
- Gilbert B.R.
- Thornburg Z.R.
- Lam V.
- Rashid F.M.
- Glass J.I.
- Villa E.
- Dame R.T.
- Luthey-Schulten Z.
Generating chromosome geometries in a minimal cell from cryo-electron tomograms and chromosome conformation capture maps.
). Experimental 3C maps showed no significant features of persistent supercoiled domains or loops, so the chromosome is assumed to be in a relaxed state (
- Gilbert B.R.
- Thornburg Z.R.
- Lam V.
- Rashid F.M.
- Glass J.I.
- Villa E.
- Dame R.T.
- Luthey-Schulten Z.
Generating chromosome geometries in a minimal cell from cryo-electron tomograms and chromosome conformation capture maps.
). Each replicate cell uses the same ribosome coordinates from cryo-ET, but a different chromosome configuration unique among the ensemble. The top of the membrane is cut away in Figure 1C to reveal the ribosomes and DNA. According to the 3C-seq maps and the proteomics of nucleoid-associated proteins (NAPs) (
- Gilbert B.R.
- Thornburg Z.R.
- Lam V.
- Rashid F.M.
- Glass J.I.
- Villa E.
- Dame R.T.
- Luthey-Schulten Z.
Generating chromosome geometries in a minimal cell from cryo-electron tomograms and chromosome conformation capture maps.
;
- Breuer M.
- Earnest T.M.
- Merryman C.
- Wise K.S.
- Sun L.
- Lynott M.R.
- Hutchison C.A.
- Smith H.O.
- Lapek J.D.
- Gonzalez D.J.
- et al.
Essential metabolism for a minimal cell.
), the DNA configurations are assumed to be relaxed with no supercoiling so that the genes are easily accessible. Figure 1D shows 120 degradosome complexes in red, 66 SecY proteins in blue, and 831 PtsG proteins in green as three examples. The remainder of the proteome consisting of over 77,000 proteins, 200 mRNA, and 5,800 tRNA are then randomly distributed throughout the cytoplasm and membrane, resulting in the crowded environment (Figure 1E).
Our spatial model includes a total of 7,765 unique molecules and intermediates and over 7,200 reactions including binding reactions such as RNAP binding to a gene start site. Where possible, kinetic parameters are obtained from single-molecule experiments, such as the smFRET experiment for the formation of the DnaA filament along the AT-rich single-stranded DNA (Figure 1F) near the origin. Otherwise, as described in STAR Methods (Metabolic rates and parameterization), kinetic parameters are developed from a targeted survey of the primary literature or kinetic databases (Figure 1G) as discussed above. Simulations of a spatially resolved cell are computationally expensive and require GPU (Figure 1I) acceleration to make them possible on a human timescale (
- Hallock M.J.
- Stone J.E.
- Roberts E.
- Fry C.
- Luthey-Schulten Z.
Simulation of reaction diffusion processes over biologically relevant size and time scales using multi-GPU workstations.
). The GPUs used for spatial simulations included NVIDIA Titan V and NVIDIA Tesla Volta V100 GPUs, which took 10 h and 8 h to simulate 20 min of cell time, respectively. Because of this computational expense, we simulated the first 20 min of the cell cycle for only 8 cells, a limited time frame during which we assume no substantial growth or DNA replication has occurred. The simulations provide insight into the numbers of active degradosomes, RNAP, and ribosomes (Figures 1J–1L). The early increases are due to the initial conditions of the spatial simulations. The cell is initialized with no active complexes (no RNAP are on the chromosome and no mRNA are on ribosomes and degradosomes), and the transient behavior reflects the time required for RNAP to diffuse to genes and mRNA to be translated or degraded.
- Bremer H.
- Dennis P.P.
Modulation of chemical composition and other parameters of the cell at different exponential growth rates.
calculated that anywhere from 15.5% to 36.2% of RNAP are active at any one time in E. coli depending on the doubling time, with slower-growing cells having a smaller fraction of active RNAP. The spatial model predicts that Syn3A will have an average of 63 of its 187 RNAP active (34%) early in the cell cycle, falling within the calculated range. For fast-growing E. coli, approximately 80% of the ribosomes are active (
- Bremer H.
- Dennis P.P.
Modulation of chemical composition and other parameters of the cell at different exponential growth rates.
;
- Dai X.
- Zhu M.
- Warren M.
- Balakrishnan R.
- Patsalo V.
- Okano H.
- Williamson J.R.
- Fredrick K.
- Wang Y.-P.
- Hwa T.
Reduction of translating ribosomes enables Escherichia coli to maintain elongation rates during slow growth.
), but for slow-growing E. coli, this number drops between 20%–50% depending on the growth rate (
- Dai X.
- Zhu M.
- Warren M.
- Balakrishnan R.
- Patsalo V.
- Okano H.
- Williamson J.R.
- Fredrick K.
- Wang Y.-P.
- Hwa T.
Reduction of translating ribosomes enables Escherichia coli to maintain elongation rates during slow growth.
). The spatial model predicts that, on average, 220 of the 503 ribosomes, roughly 45% of ribosomes, are active.
We calculated the mRNA half-lives, the number of times a gene is transcribed, and the number of times an mRNA is translated in its lifetime for all the 452 protein-coding genes (Figures 1M–1O). The average and median half-lives are in reasonable agreement with the 2-min average half-life experimentally measured in Mycoplasma gallisepticum (
). The broad distribution of half-lives, including the long tail out to 15 min, has been observed in a genome-wide study of mRNA half-lives in B. subtilis,
- Hambraeus G.
- von Wachenfeldt C.
- Hederstedt L.
Genome-wide survey of mRNA half-lives in Bacillus subtilis identifies extremely stable mRNAs.
. Each gene is transcribed at least once within the ensemble of simulations, but not in every simulation. The number of times each gene is transcribed reflects both its length and, more importantly, its promoter strength, which is weighted relative to proteomics counts. Lastly, the genome-wide average translations per mRNA is four times, but several factors impact this number including gene length and how many times a mRNA can be read by a polysome using a polysome spacing of 120 nt estimated from a distribution of polysome sizes in E. coli (
- Brandt F.
- Etchells S.A.
- Ortiz J.O.
- Elcock A.H.
- Hartl F.U.
- Baumeister W.
The native 3D organization of bacterial polysomes.
).
The kinetic model is influenced by the defined medium composition and new genome annotations
The time-dependent metabolite concentrations within the cell are determined by the metabolic reactions that depend on transport of key metabolites like glucose, nucleosides, fatty acids, amino acids, and cofactors. With a defined growth medium, exact uptake kinetics can be simulated using the external metabolite concentrations and the numbers of transporters. The metabolic maps in Syn3A here have been revised to be consistent with the defined growth medium, updated gene annotations, and experimental measurements such as lipidomics. To refer to genes in JCVI-syn3A, we simplify the locus tags from the NCBI entry from JCVISYN3A_xxxx to xxxx. For example, JCVISYN3A_0527 is referred to as gene 0527.
The only sugar source in the defined medium is glucose, so the revised map for central metabolism (Figure 2) starts with the phosphorelay relay system (
- Rohwer J.M.
- Meadow N.D.
- Roseman S.
- Westerhoff H.V.
- Postma P.W.
Understanding glucose transport by the bacterial phosphoenolpyruvate:glycose phosphotransferase system on the basis of kinetic measurements in vitro.
;
- Meadow N.D.
- Mattoo R.L.
- Savtchenko R.S.
- Roseman S.
Transient state kinetics of Enzyme I of the phosphoenolpyruvate:glycose phosphotransferase system of Escherichia coli: equilibrium and second-order rate constants for the phosphotransfer reactions with phosphoenolpyruvate and HPr.
,
- Meadow N.D.
- Savtchenko R.S.
- Nezami A.
- Roseman S.
Transient state kinetics of enzyme IICBGlc, a glucose transporter of the phosphoenolpyruvate phosphotransferase system of Escherichia coli: equilibrium and second order rate constants for the glucose binding and phosphotransfer reactions.
), which is responsible for the uptake and phosphorylation of glucose to glucose-6-phosphate (g6p) and is shown in the inset. Each phosphate exchange reaction of the phosphorelay is simulated independently, and the overall kinetics for the phosphorelay predict that Syn3A takes up 15,000 glucose molecules per second for a cell with a radius of 200 nm. Syn3A does not have the proteins to perform oxidative phosphorylation, so all ATP in Syn3A is generated by the central metabolism (
- Breuer M.
- Earnest T.M.
- Merryman C.
- Wise K.S.
- Sun L.
- Lynott M.R.
- Hutchison C.A.
- Smith H.O.
- Lapek J.D.
- Gonzalez D.J.
- et al.
Essential metabolism for a minimal cell.
). Pyruvate kinase (pyk/0221) (PYK) converts ADP to ATP and pep to pyruvate, competing for pep molecules with the glucose transport reaction, so the fluxes between these two reactions need to be carefully balanced throughout the cell cycle. Because the fructose-1,6-bisphosphate aldolase (fbaA/0131) (FBA) reaction splits fdp into two molecules, the rate of lower glycolysis is twice that of upper glycolysis. Therefore, a maximum of 45,000 ATP can be generated per second assuming no other NTPs are being formed: 30,000 ATP per second can be generated by the phosphoglycerate kinase (pgk/0606) (PGK) and 15,000 by pyruvate kinase (PYK) where pep is split between the glucose uptake reactions and PYKs. A much smaller amount of ATP is generated through acetate kinase (ackA/0230) (ACKr) upon the secretion of acetate. In our model, the cell was able to survive off of the ATP generated by central metabolism. Detailed discussions of the amino acid, cofactor, nucleotide, and lipid metabolic subsystems are included in STAR Methods.
The kinetic model of the genetic information processing reactions including DNA replication initiation and elongation, mRNA degradation, transcription, and translation are based on the kinetic model by
- Thornburg Z.R.
- Melo M.C.R.
- Bianchi D.
- Brier T.A.
- Crotty C.
- Breuer M.
- Smith H.O.
- Hutchison 3rd, C.A.
- Glass J.I.
- Luthey-Schulten Z.
Kinetic modeling of the genetic information processes in a minimal cell.
with a few modifications. Each of the elongation reactions are treated using a polymerization rate dependent on the respective monomer concentrations (dNTPs, NTPs, aa:tRNAs). For full details on these rate forms, see STAR Methods (Genetic information processes).
Timing of the cell cycle and cell growth are determined by dynamics of DNA replication and surface area growth
Unlike the spatial RDME-CME-ODE model, cells are simulated for whole cell cycles in the well-stirred CME-ODE model. To highlight the key features defining the cell cycle in Syn3A, we examine in Figure 3 the time dependence for the initiation of DNA replication, chromosome duplication, and the doubling of both the cell volume and surface area in the well-stirred simulations. In our previous work (
- Thornburg Z.R.
- Melo M.C.R.
- Bianchi D.
- Brier T.A.
- Crotty C.
- Breuer M.
- Smith H.O.
- Hutchison 3rd, C.A.
- Glass J.I.
- Luthey-Schulten Z.
Kinetic modeling of the genetic information processes in a minimal cell.
), only one replication event was allowed to occur per cell cycle, but here we complexify this model and allow for multiple replication initiation events based solely on the kinetics. We use the same equation for the rate of DnaA(III) filament formation on ssDNA for each independent origin
(Equation 1)

Figure 3Processes determining the cell cycle progression
(A) DNA replication initiation in two individual cells with and without subsequent replication initiation on the daughter chromosomes.
(B) Distributions of replication initiation times for 174 original chromosomes of which 141 of the daughter chromosomes had further replication initiation events.
(C) The DNA copy number as a function of time among 174 cells. The solid line is the average and the shaded region represents the full range among the population.
(D) Relative quantities of origins, quarter positions, and termini of the chromosome from qPCR for exponential and stationary phase cells. Error bars represent the standard deviation among six biological replicates. Exponential phase standard devations are 0.05 (Terminus), 0.09 (Quarter), and 0.3 (Origin). Stationary phase standard deviations are 0.03 (Terminus), 0.2 (Quarter), and 0.5 (Origin).
(E) The cell doubles in volume between 50 to 70 min.
(F) The cell surface area doubles between 88 to 112 min. Cells maintain a 55:45 ratio of protein surface area to lipid surface area over the course of a cell cycle.
Both
and
for the binding of DnaA(III) were measured using smFRET (
- Cheng H.-M.
- Gröger P.
- Hartmann A.
- Schlierf M.
Bacterial initiators form dynamic filaments on single-stranded DNA monomer by monomer.
). We allow replication initiation events on the daughter chromosomes in our model after they separate following the first replication cycle. The initial cell volume is used for the first replication initiation event, and the doubled cell volume is used for the daughter chromosomes late in the cell cycle. Based on these kinetics, if the number of DnaA doubles faster than the volume, it is likely that another replication initiation event will occur and the cell will have more than two chromosomes at the end of its cell cycle.
Two representative cells were selected to demonstrate the stochastic nature of replication initiation, and their DnaA filament lengths are shown as functions of time (Figure 3A). The first cell had completed its first replication initiation event around 5 min into the cell cycle. Both daughter chromosomes then initiate events late in the cell cycle. The second cell did not translate enough DnaA for it to be favorable for either daughter chromosome to complete a full DnaA filament. The distribution of replication initiation times for 174 cells (Figure 3B) for the original chromosome ranges from 3 min up to 36 min with an average time of 10 min and most probable time of 6 min. Of the 174 cells, 33 had only one replication event occur in their cell cycle. The daughter chromosomes in cells with multiple replication events have replication initiation times ranging from 58 min up to 110 min with an average time of 82 min.
Each cell is initialized with a single chromosome, and from Figure 3C we find that the average duplication time for the original chromosome is around 70 min with the earliest being completed at 56 min and the latest at 90 min. The average chromosome number grows to 2.8 at the end of the cell cycle, which reflects that either one or both of the daughter chromosomes have partially completed another chromosome. Cells with multiple replication events can have chromosome copy numbers as high as 3.8, indicating that they have nearly duplicated both daughter chromosomes in their cell cycle.
Although other bacteria have been shown to undergo multiple replication events per cell cycle (
;
- Nielsen H.J.
- Youngren B.
- Hansen F.G.
- Austin S.
Dynamics of Escherichia coli chromosome segregation during multifork replication.
), it had not yet been directly observed in Syn3A. From quantitative polymerase chain reaction (qPCR) experiments, we have determined the relative quantities of origins, quarter positions, and termini in Syn3A cells in both exponential and stationary phases (Figure 3D). The results are presented with all the quantities scaled to the number of termini. In the exponential phase cells, there are more than three times the number of origins than termini on average. This indicates that in many cells, after the first replication initiation event occurs, another replication initiation event will occur on the same chromosome before the first replication cycle completes. In our current model, a maximum ratio of 2 would occur, as we do not allow multiple initiation events to occur until the first replication cycle completes.
Critical to determining the length of the cell cycle are the times required to double the volume and surface area of the cell. The kinetics for cell growth are characterized by the formation of lipids and insertion of membrane proteins to determine the cell surface area and volume in the model. The cell volume is calculated from the surface area assuming the cell maintains spherical morphology until the onset of division. The volume doubles in the simulations anywhere from 56 to 72 min with an average of 64 min (Figure 3E). Without explicit kinetics for cell division by FtsZ/FtsA filaments, division is assumed to begin when the volume has doubled, during which the volume stays constant and the surface area continues to grow until two separate cell are formed. The surface area will double anywhere from 88 to 112 min with an average doubling time of 97 min (Figure 3F). Syn3A has an experimentally measured doubling time of 105 min in rich growth medium (
- Breuer M.
- Earnest T.M.
- Merryman C.
- Wise K.S.
- Sun L.
- Lynott M.R.
- Hutchison C.A.
- Smith H.O.
- Lapek J.D.
- Gonzalez D.J.
- et al.
Essential metabolism for a minimal cell.
). We report a simulated doubling time (Figures 3E and 3F) based only on healthy cells, whereas the experimentally measured doubling time includes a whole population, which may include unhealthy cells, reducing the average doubling time of the colony.
To determine the cell surface area, each lipid and membrane protein has an assigned surface area contribution (STAR Methods, Lipid metabolism). The contributions of proteins and lipids to the surface area are separated in Figure 3F where the two contributions are seen to maintain a rough 55% to 45% surface area contribution ratio (
- Sáenz J.P.
- Sezgin E.
- Schwille P.
- Simons K.
Functional convergence of hopanoids and sterols in membrane ordering.
). Because the translation and insertion of membrane proteins are both totally stochastic in the simulations, there is more variation in the surface area contribution from membrane proteins. The variation in lipids increases with simulation time as lipid synthesis genes are stochastically expressed in each individual cell.
The liponucleotide synthase cdsA/0304 catalyzing reaction DASYN (Figure S2) that adds CDP to a phospholipid precursor (phosphatidic acid, PA) was identified as a “choke point” in the phoshpolipid biosynthesis, in agreement with a recent whole-cell model of E. coli (
- Macklin D.N.
- Ahn-Horst T.A.
- Choi H.
- Ruggero N.A.
- Carrera J.
- Mason J.C.
- Sun G.
- Agmon E.
- DeFelice M.M.
- Maayan I.
- et al.
Simultaneous cross-evaluation of heterogeneous E. coli datasets via mechanistic simulation.
). Additionally, acyltransferase plsY/0117, which catalyzes the conjugation of fatty acid and glycerol moieties at the membrane was found to limit the production of the downstream intermediate PA. We attribute both of these effects to their low counts in the reported proteomics of Syn3A (
- Breuer M.
- Earnest T.M.
- Merryman C.
- Wise K.S.
- Sun L.
- Lynott M.R.
- Hutchison C.A.
- Smith H.O.
- Lapek J.D.
- Gonzalez D.J.
- et al.
Essential metabolism for a minimal cell.
) and adjusted the counts to values similar to those observed in other bacterial species (Table S2 and STAR Methods, Lipid metabolism). Their low counts are likely due to the fact that both are multiple domain membrane proteins, which are known to be underreported by proteomics when only a trypsin digest is used (D. Gonzalez, personal communication).
Balance of genetic information processes and metabolism
Due to reduction in its minimal genome, Syn3A has few remaining transcription/translation/transport regulatory proteins and must adjust the fluxes through the cellular subsystems to maintain stable growth. The simplified map of the reaction network with fluxes from a representative cell from the well-stirred model early in its cell cycle demonstrates the balance among use of ATP and GTP, nucleotide metabolism, and glycolysis (Figure 4A). The glycolysis pathway and nucleotide metabolism are connected through the PYK reactions converting all (d)NDPs to (d)NTPs, which results in the shared usage of pep with glucose uptake. Syn3A exclusively makes pep by the action of enolase at the end of glyolysis, generating two pep per glucose taken up. Because the glucokinase was removed in genome reduction (
- Breuer M.
- Earnest T.M.
- Merryman C.
- Wise K.S.
- Sun L.
- Lynott M.R.
- Hutchison C.A.
- Smith H.O.
- Lapek J.D.
- Gonzalez D.J.
- et al.
Essential metabolism for a minimal cell.
), we assume that the only way Syn3A can phosphorylate glucose is by PtsG in the phosphorelay. According to our kinetic model, if the cell runs out of pep, there is no way to continue glycolysis or phosphorylate NDPs except ADP, which can be converted to ATP by the reversible ATP synthase. On average, if PYK reactions use more than half of the pep formed, less glucose will gradually be taken up and the cell will run out of pep and cease to take up glucose. Roughly 16% of the cells in a total ensemble of 207 cells in the well-stirred simulations experienced pep’s shortage, leaving 174 cells that could successfully complete a cell cycle.

Figure 4The dominant connections between gene expression and cellular metabolism
(A) A simplified diagram of the ATP and GTP use, nucleotide metabolism, and glycolysis pathway show connections among the networks. Arrow width corresponds to rate through the reaction given in mM/s on the arrow. Red arrows indicate the rate-limiting steps of glycolysis in the simulations.
(B) If a cell runs low on dNTPs, NTPs, or charged tRNAs, the rates of the corresponding genetic information processing reactions are reduced (dnaA/0001). The black trace shows the average rate among the population with the full range in gray. The orange and green traces represent two individual cells. See also Figure S4.
Previous studies have indicated the possibility of PYK, PFK, and GAPD being rate-limiting reactions of glycolysis (
- Iwami Y.
- Yamada T.
Rate-limiting steps of the glycolytic pathway in the oral bacteria Streptococcus mutans and Streptococcus sanguis and the influence of acidic pH on the glucose metabolism.
;
); however, fructose-1,6-bisphosphate aldolase (FBA) appears to control the overall flux of glycolysis in Syn3A according to our simulations, which agrees with the findings of
- Kitamura S.
- Shimizu H.
- Toya Y.
Identification of a rate-limiting step in a metabolic pathway using the kinetic model and in vitro experiment.
). FBA has a low experimental proteomics count of 227 relative to the other glycolytic enzymes in Syn3A, having counts of 400 or greater. Relative to other bacteria, Syn3A appears to have a lower FBA count (Table S1), so in parameterizing the simulations, the count of FBA was scaled to 775 based on the enzyme’s concentration in E. coli so that kinetic parameters would better match the known equilibrium constant for the reaction. The FBA reaction is almost always at its maximum rate and is the slowest step in upper glycolysis in our models. The rates of the reactions in lower glycolysis are dictated by how much flux goes through the FBA reaction.
The balance of metabolic reactions also affects the rates of genetic information reactions through the monomer-dependent polymerization reaction rates discussed in STAR Methods (Genetic information processing). The time-dependent rates of each elongation reaction (DNA replication, transcription, and translation) scaled to its maximum rate for dnaA/0001 expression are shown in Figure 4B. The single cell in orange has only one complete DNA replication event occur in its cell cycle, and the green cell has a second. Their replication rates are at the maximum until the cell runs low on dNTPs (
0.01 mM) (Figure S4), in this case dATP (data not shown). The slowdown gives the cell time to import more deoxynucleosides and generate more dNTPs. The rate of replication will fluctuate from minute to minute as long as the cell runs low on a particular dNTP. In general, DNA replication rate is the most frequently affected of the genetic information processes with its average (black) going as low as 75% of the maximum.

Figure S4Sensitivity of genetic information processing reactions to metabolite concentrations, related to Figure 4
Each rate is scaled to their maximum rate for an average gene/mRNA/protein. Replication elongation rate as a function of dATP concentration (A) and of all dNTPs (B). Transcription elongation rate as a function of ATP concentration (C) and of all NTPs (D). Translation elongation rate as a function of charged (aa:tRNA) tRNA counts for isoleucine (ILE) (E) and of all aa:tRNAs (F). (G-J) The polymerization reactions are sensitive to the ratio of their
to monomer concentrations. (G)
to monomer concentration ratio plot for the general polymerization rate. When the monomer concentration is equal to the
, the overall rate is halved. (H-J)
to monomer concentration ratio plots varying the dATP, ATP, and ILE aa:tRNA ratios for DNA replication elongation, transcription, and translation respectively for DnaA. Black vertical lines (Simulation) indicate the ratio of the
used in the simulations to the initial monomer concentrations.

Figure S5qPCR amplification and standard curve plots, related to Figure 3D and STAR Methods
Amplification plots for the origin, quarter, and terminus qPCRs are shown on the top row. The amounts of PCR amplicon accumulated (vertical axis listed as ..Rn) after each PCR cycle (horizontal axis) are plotted. The threshold cycle (Ct) is the cycle at which each sample crosses the threshold cycle line, which is determined by the qPCR instrument QuantStudio software. The lower row shows standard curves generated using the qPCR standard DNA to calculate the relative amounts of DNA present in each sample.
There are no significant deviations in the transcription and translation rates for the cell with a single replication event (orange) (Figure 4B), as the instantaneous pool sizes of NTPs and charged tRNA remain high enough to not slow down any rates. For cells with multiple replication events (green), more RNA are transcribed, potentially depleting concentrations of NTPs, thereby reducing its transcription rate. This slowdown in transcription leads to pauses during which nucleosides can be taken up and phosphorylated.
Translation is infrequently altered because the amount of charged tRNA (aa:tRNA) depends directly on the uptake of amino acids, which are present in millimolar concentrations in the medium. So even though cells will sometimes run low on amino acids, a brief slowdown is enough for them to import more amino acids and recover their charged tRNA levels (
of each aa:tRNA).
Time-dependent ATP costs in the minimal cell quantify the cell’s significant energy costs
A key feature of our model is the explicit tracking of every energy molecule used by activated reactions in both metabolism and genetic information processes. Already in 1973,
- Stouthamer A.H.
A theoretical study on the amount of ATP required for synthesis of microbial cell material.
comprehensively calculated the amount of ATP required for the formation of a microbial cell based on the studies available on energetic costs at the time. He broke down the ATP demands of a cell into requirements for formation of polysaccharides, proteins, lipids, RNA, and DNA, uptake of amino acids, phosphate, and ions, and turnover of RNA. He notes that exactly accounting for transport reactions is difficult and little information on their ATP costs were available at the time, so only a few transport reactions are included in his calculations. More recently, a comprehensive review of the literature and calculations for both bacteria and eukaryotes (
assigned costs for the synthesis of macromolecules and determined the total ATP costs for DNA replication, transcription, and translation.
While the previous reviews calculated ATP costs for overall cell formation, we advance ATP cost calculations to single-cell, single-reaction resolution as a function of time for the cellular networks of Syn3A including both metabolism and genetic information processing, which is made possible by fully dynamical kinetic modeling. Figure 5A shows the ATP generated and used as functions of time for a representative cell from the well-stirred simulations. Note that these plots do not represent the number of phosphate bonds made and broken, but the number of ATP molecules being used. Another notable difference is that tRNA charging is only counted as one ATP, whereas it is typically counted as two phosphate bonds (
). The charging reactions still convert ATP to AMP and pyrophosphate in the simulations (Figure S3), but because the metabolic network explicitly accounts for the ATP cost of regenerating an AMP to an ADP through the ADK1 reaction (Figure S1). tRNA charging is counted as a single ATP cost in Figure 5. Additionally, because translation elongation uses GTP instead of ATP, it is not shown in the cost plot, but it is twice what we defined as the ATP cost of charging tRNAs from two GTP per amino acid: one during the loading of an aa:tRNA into the A site of the ribosome and a second during translocation to the next step on the mRNA (
).

Figure 5A detailed accounting of cellular ATP costs
Time-dependent ATP costs over the course of a cell cycle show the distribution of ATP costs among individual processes and a balance of ATP generation and usage. ATP is generated only in the central metabolism through the PGK, PYK, and ACKr reactions in Figure 2. ATP synthase uses ATP but is reversible in the kinetic model and can switch direction as seen in its drop in the single replication cell. Translation elongation is not included because it uses GTP rather than ATP (
).
The total ATP generated at each time step is close to or slightly greater than the total ATP used. As discussed earlier, the maximum ATP production is 45,000 ATP per second assuming no other NTPs are being made by PGK or PYK. Roughly 35,000 ATP are made per second initially (Figure 5A). As the cell grows, the number of proteins and associated rates of metabolic reactions increase, giving rise to the overall increase in both the ATP production and cost over the cell cycle. The growth is not perfectly smooth or linear because the protein counts and reaction rates depend on stochastic gene expression and timing of DNA replication events. To better compare the relative ATP cost of each activated process, we plot their fraction of the total ATP cost for a cell with a single replication event (Figure 5B, left) and a cell with multiple replication events (Figure 5B). The highest cost in Syn3A is for metabolic reactions, in particular the PFK reaction in upper glycolysis using 75% of the total metabolic cost.
Quantifying the exact cost of activated transport reactions has been a difficult challenge in both the energy calculations by
- Stouthamer A.H.
A theoretical study on the amount of ATP required for synthesis of microbial cell material.
and
, as well as in other recent whole-cell models (
- Karr J.R.
- Sanghvi J.C.
- Macklin D.N.
- Gutschow M.V.
- Jacobs J.M.
- Bolival Jr., B.
- Assad-Garcia N.
- Glass J.I.
- Covert M.W.
A whole-cell computational model predicts phenotype from genotype.
;
- Macklin D.N.
- Ahn-Horst T.A.
- Choi H.
- Ruggero N.A.
- Carrera J.
- Mason J.C.
- Sun G.
- Agmon E.
- DeFelice M.M.
- Maayan I.
- et al.
Simultaneous cross-evaluation of heterogeneous E. coli datasets via mechanistic simulation.
). Because each transport reaction is simulated independently, we know their exact ATP costs by recording the fluxes through each reaction. Unlike most organisms, which have synthesis pathways for most of its building blocks, Syn3A has been reduced to the point where it relies on having to transport them in. From crystal structures of related transporters, it is clear that the majority of them contain ATPase domains that require, on average, at least one ATP for every nutrient molecule imported (
- Santos J.A.
- Rempel S.
- Mous S.T.
- Pereira C.T.
- Ter Beek J.
- de Gier J.-W.
- Guskov A.
- Slotboom D.J.
Functional and structural characterization of an ECF-type ABC transporter for vitamin B12.
). It is assumed here that one ATP is used for every molecule taken up. The cost of active transport is approximately twice the ATP cost of tRNA charging (roughly 5,000:2,500 ATP per second), making it one of the largest energetic costs in the minimal cell. This is the most exact calculation of the costs of transport presented to date and reveals how transport reactions are critical for a minimal cell, which relies almost entirely on nutrients from its environment.
In Mycoplasmas, the ATP synthase typically breaks down ATP and pumps out protons to maintain a basic intracellular environment (
- Benyoucef M.
- Rigaud J.-L.
- Leblanc G.
The electrochemical proton gradient in Mycoplasma cells.
,
- Benyoucef M.
- Rigaud J.-L.
- Leblanc G.
Gradation of the magnitude of the electrochemical proton gradient in Mycoplasma cells.
). The reversible kinetics for ATP synthase depend on the ATP, ADP, and phosphate levels inside the cell. While ATP synthase accounts for roughly 10% of the overall ATP costs for the majority of the time, in response to low ATP levels in the cell, it can momentarily change direction.
Below the cost of ATP synthase come in descending order tRNA charging, transcription elongation, mRNA degradation, RNA material cost, DNA replication elongation, and membrane protein insertion via translocation by SecY. DNA replication only takes place during part of the cell cycle. Its ATP costs typically occur early in the cell cycle (Figure 5B, left) and again late in the cell cycle (Figure 5B, right) when initiation of another replication event occurs. The fluctuations reflect changes in the rate of DNA replication.
Time-dependent concentrations show consistent average behavior and large population variability
Homeostasis is a property of a normal cell to maintain constant intracellular concentrations over a cell cycle suggesting that upon experiencing a perturbation whether from the environment or an intracellular reaction, it responds by adjusting its biochemical pathways to bring the concentrations back into an acceptable range for stable functioning of its networks (
- Agozzino L.
- Balázsi G.
- Wang J.
- Dill K.A.
How do cells adapt? stories told in landscapes.
). The prevailing wisdom is that some degree of regulation is required to control any large fluctuations. The time traces for a representative selection of metabolites and macromolecules from the well-stirred simulations are shown in Figure 6. The complete time traces of all chemical species and reaction fluxes are provided in Data S1. The dNTP concentrations decrease late in the cell cycle for cells where multiple replication events have occurred because they are being incorporated into new chromosomes. For dATP and dTTP, it appears that the same cells maintain average concentrations that are relatively constant over the whole cell cycle. The concentrations of dGTP and dCTP both continually increase over the cell cycle, which calls for investigating the possibility of any regulation on the uptake of their precursor deoxynucleosides or the thioredoxin reactions converting GDP and CDP to dGDP and dCDP in nucleotide metabolism. The concentrations of UTP and CTP are fairly constant throughout most of the cell cycle, likely because the only annotated way to make new CTP in Syn3A is by converting UTP to CTP through a CTP synthase reaction CTPS2 in nucleotide metabolism (Figure S1). ATP and GTP, on the other hand, continue to increase over the cell cycle in both cells with single and multiple replication events.

Figure 6Simulated traces of key cellular metabolite and enzyme concentrations
Concentrations for 141 cells with multiple replication events (red) and 33 cells with single replication events (black) show a wide range of intracellular concentrations over the population: solid lines (average) and the shading (10th to 90th percentiles). A proteome-wide distribution of scaled protein counts shows that most proteins have their counts accurately doubled over the course of a cell cycle on average. The scaled protein counts represent the count of a protein at 105 min (experimental end of the cell cycle) divided by its initial count from the proteomics.
Even though homeostasis can be observed for a population, there can be significant variation among individual cells due to stochastic fluctuations in gene expression. Some of the largest variations in our simulations were for phosphate (PI), pyrophosphate (PPI), and fructose bisphosphate (fdp). Cells with multiple replication events have a wide range of phosphate levels at the end of the cell cycle. In contrast, cells with only a single replication event return to consistently lower phosphate levels after the first replication event is complete. PPI is given off from DNA replication reactions, so cells with multiple replication events will see a broader range of PPI and therefore PI levels. While such high concentrations have been reported in yeast (
- Park J.O.
- Rubin S.A.
- Xu Y.-F.
- Amador-Noguez D.
- Fan J.
- Shlomi T.
- Rabinowitz J.D.
Metabolite concentrations, fluxes and free energies imply efficient enzyme usage.
), they may be inaccurate for Syn3A. Syn3A still has the gene phoU/0428 coding for a phosphate regulator, so it is a primary candidate to be included in a complexified model that includes regulation.
As discussed earlier, the FBA reaction limits the rate of glycolysis in our kinetics. As long as the enzyme performing the reaction, fructose-1,6-bisphosphate aldolase (FbaA), is at its average concentration or lower, FBA will be going slower than upper glycolysis, and its substrate, fdp, can build up by tens of millimolars. In cells that generate more FbaA enzymes or take up less glucose, there is less significant buildup or even no buildup resulting in the fdp pool being lower, even less than 10 mM.
Examples of three proteins, the nucleoside transporter ATP-binding protein (rnsA/0010), pyruvate kinase (pyk/0221), and fructose-1,6-bisphosphate aldolase (fbaA/0131), are provided in Figure 6. The concentrations of RnsA, FbaA, and Pyk all share similar behavior, being slightly diluted over the first part of the cell cycle and then increasing until the end of the cell cycle. Reactions for protein degradation are not included, so this is purely a volume effect where the number of proteins is not increasing to match the increased volume. The farther away a gene is from the origin, the more exaggerated this effect becomes because of the delay between the start of replication and when a gene gets doubled.
To gauge proteome-wide homeostasis, the scaled protein counts for all proteins reported in the proteomics data (
- Breuer M.
- Earnest T.M.
- Merryman C.
- Wise K.S.
- Sun L.
- Lynott M.R.
- Hutchison C.A.
- Smith H.O.
- Lapek J.D.
- Gonzalez D.J.
- et al.
Essential metabolism for a minimal cell.
) with counts of 10 or greater excluding ribosomal proteins, a total of 350/452 proteins, at the end of a cell cycle are shown in the histogram in Figure 6. Ribosomal proteins are excluded from this plot because their counts do not reflect the 503 ribosomes observed in cryo-ET, with many having counts fewer than 300. For a protein to maintain a near-constant concentration, its count must double over a cell cycle as the volume doubles. On average, the overwhelming majority of proteins end the cell cycle with 1.75 to 2.25 times their initial protein counts. Outliers include proteins whose genes are longer than 4,000 nt on the low end and priB/0026 on the high end, which has a transcript only 441 nt long.
Agreement between hybrid well-stirred and spatial simulations
To gauge the quality of our parameterized well-stirred hybrid CME-ODE model to reproduce the results of the 3D spatially resolved hybrid RDME-gCME-ODE model with diffusion (see Video S1), we compare counts of mRNAs for genes involved in genetic information processes, nucletoide pools, protein distributions, and mRNA half-life distributions (Figure 7). The mRNA counts in the spatial model are higher on average likely due to the difference in transcription rates between the two models (see STAR Methods, Transcription). The spatial model has lower concentrations of nucleotides (Figures 7C and 7D) than the well-stirred model, which can be tied to two factors: first, more nucleotides are being incorporated into mRNA, and second, the current spatial model does not include DNA replication, which initiates, on average, around 10 min in the well-stirred model (Figure 3B). With the genes for the nucleoside transporters being close to the origin, their genes would be duplicated early in the cell cycle. Consequently, the spatial model should have fewer nucleoside transporters, which would result in slightly lower uptake rates of nucleosides. Scaled protein counts are compared in Figures 7E and 7F. The slightly higher counts produced in the well-stirred simulations reflect that partial replication events have taken place in a few of the cells within the first 20 min. Finally, mRNA half-lives are compared between the two methods (Figures 7G and 7H), where the well-stirred half-lives depend on the active degradosome statistics in the spatial model (STAR Methods, mRNA Degradation). The well-stirred model has longer half-lives on average than the half-lives in the spatial model, with average half-lives of 3.4 min and 1.97 min, respectively.

Figure 7A comparison of gene expression between well-stirred and spatially resolved simulations
(A and B) Counts of mRNA for genes coding for genetic information processing proteins.
(C and D) Intracellular concentrations (pools) of NTPs and ADP.
(E and F) Genome-wide scaled protein counts after the first 20 min of the cell cycle.
(G and H) Genome-wide mRNA half-life distributions. The 1.97-min mRNA half-life in the spatial model is in better agreement with a measured average half-life in M. gallisepticum.
Discussion
We report here the results from fully dynamical kinetic models, both for the well-stirred homogeneous (CME-ODE) and 3D (RDME-CME-ODE) spatially resolved scenarios, for a living minimal bacterial cell. We provide the time-dependent information about the dynamic rates of genetic information processes, the 148 known metabolites, 452 proteins and mRNAs, 29 tRNAs, 503 ribosomes, and DNA undergoing over 7,000 reactions. With its reduced genome of 543 kbp and 493 genes, the minimal cell JCVI-syn3A has retained only a few genes for regulatory proteins and functional small RNAs. In our present kinetic models, regulation can only occur through gene expression and the rate forms for the various genetic information processes and metabolic reactions. We have not included explicitly known regulatory proteins like PhoU (phoU/0428) and the riboswitches TPP and SAM, as the kinetic parameters and time-scales of conformational changes in the riboswitches are still being investigated (
- Scull C.E.
- Dandpat S.S.
- Romero R.A.
- Walter N.G.
Transcriptional riboswitches integrate timescales for bacterial gene expression control.
). The simulations based on the hybrid well-stirred (CME-ODE) whole-cell kinetic model have already given us quantitative insight into how the cell balances the demands of metabolism, genetic information, and growth over a cell cycle. From the emergent behaviors arising from the well-stirred and spatially resolved stochastic-deterministic simulations presented here, we can begin to understand the principles of life for this minimal cell when little regulation is present.
By emergent, we specifically mean behaviors defining the state of the cell (time-dependent concentrations, patterns, reaction rates, and correlations) that arise from simulations of the kinetic models and are not imposed. Such a behavior is the relationship among stochastic gene expression, cell growth, and progression of the cell cycle (Figure 3). Formation of a complete DnaA filament along the single-stranded DNA near the origin determines the timing of initiation of DNA replication. Cell growth as measured by increasing surface area is controlled in our model solely from lipid metabolism and translation of the mRNAs for lipid enzymes and membrane proteins. In an earlier work (
- Peterson J.R.
- Cole J.A.
- Fei J.
- Ha T.
- Luthey-Schulten Z.A.
Effects of DNA replication on mRNA noise.
), we showed experimentally, theoretically, and computationally the effects of DNA replication or gene copy number on the variance in mRNA distributions and ultimately the protein distributions, and this prior study guided the development of the kinetic model. As most growth studies are carried out on a population of cells, the results in this figure suggest a range of doubling times to be expected at a single-cell level. Importantly, given the dependence of the DnaA binding rate to its abundance and inverse dependence to the cell volume, the expected number of replication initiation events should exhibit a distribution as it does here. Lipid metabolism based on the lipidomics study in Figure S2 leads to cell volume/surface growth that doubles the volume shortly after the first replication event in approximately 65 min and doubles the surface area in a range from 88 to 112 min.
Based on average time-dependent fluxes emerging from the simulations, the cells react to depletion of either the nucleotide or amino acid pools by slowing down replication, transcription, or translation (Figures 4 and S4A–S4F). While, on average, the cells are able to maintain these pools, the change in rates allows them to recover from the imbalance. Syn3A monitors its internal state through the metabolic network and modifies its rate of replication, transcription, and translation in response to the concentrations of dNTPs, NTPs, and aa:tRNAs, respectively. The rate of glycolysis in our kinetics is limited by the fructose-1,6-bisphosphate aldoase (fbaA/0131) and the PYK reactions (pyk/0221), which require pep to convert any (d)NDP to (d)NTP. With the removal of glucokinase in the genome reduction, pep is essential for the transport of glucose into the cell and its phosphorylation to g6p. An imbalance between upper and lower glycolysis and nucleotide metabolism does occur in a small fraction of the cells, which stop growing due to a depletion of pep.
The kinetic model allows the cell-wide uses of ATP to be monitored (Figure 5), which revealed that the costs of active transport of ions, amino acids, and nucleosides in Syn3A are comparable to other significant costs such as translation, as first suggested by
- Stouthamer A.H.
A theoretical study on the amount of ATP required for synthesis of microbial cell material.
. This increased cost reflects the simplicity of this organism and its dependence on communication with its environment. Because Syn3A lacks oxidative phosphorylation, the cell depends almost entirely on glycolysis for formation of ATP. This results in significant sensitivity to the levels of glycolytic enzymes and therefore the stochastic genetic information processing reactions that express the enzymes.
In general, average metabolite concentrations generated from the model (Figure 6) show reasonable agreement to the values reported by
- Park J.O.
- Rubin S.A.
- Xu Y.-F.
- Amador-Noguez D.
- Fan J.
- Shlomi T.
- Rabinowitz J.D.
Metabolite concentrations, fluxes and free energies imply efficient enzyme usage.
in E. coli, but the cell-to-cell variation in the kinetic models over a cell cycle is broader than the predicted range. This discrepancy certainly calls for regulation in some cases such as the uptake of (deoxy)nucleosides, uptake of inorganic phosphates, and formation of some metabolites that build up to large concentrations in some cells such as prpp and fdp.
In summary, the emergent behaviors are validated by several experimental results. The simulations accurately double the protein counts from experimental genome-wide proteomics. While our results are not in perfect agreement with qPCR origin-to-terminus ratios, our kinetics also reflect that many Syn3A cells experience multiple replication events per cell cycle. Our model predicts an average surface area doubling time of roughly 98 min, which is close to the experimentally measured doubling time (
- Breuer M.
- Earnest T.M.
- Merryman C.
- Wise K.S.
- Sun L.
- Lynott M.R.
- Hutchison C.A.
- Smith H.O.
- Lapek J.D.
- Gonzalez D.J.
- et al.
Essential metabolism for a minimal cell.
) in rich medium of 105 min. Also emergent are fractions of active complexes and the distribution of mRNA half-lives from the spatial model. The fraction of active RNAP is similar to the fraction reported by
- Bremer H.
- Dennis P.P.
Modulation of chemical composition and other parameters of the cell at different exponential growth rates.
. The average of the distribution of mRNA half-lives is similar to the average reported for M. gallisepticum (
), and the distribution having a long tail out to 15 min has also been observed in B. subtilis (
- Hambraeus G.
- von Wachenfeldt C.
- Hederstedt L.
Genome-wide survey of mRNA half-lives in Bacillus subtilis identifies extremely stable mRNAs.
). At the moment, no experiments have been done to measure the number of active degradosomes in Syn3A and is a prediction to be validated in future experiments. While the predicted distribution of half-lives is in agreement with genome-wide studies carried out on related organisms, it awaits confirmation from ongoing transcriptomics studies. The hybrid CME-ODE whole-cell kinetic model requires roughly 4 h to calculate the behavior of a cell over a maximum cell cycle of 120 min. The hybrid model is straightforward to complexify by adding additional reactions before proceeding to the full spatial hybrid RDME-CME-ODE model, which is computationally more demanding. For example, the reintroduction of the glucokinase into the kinetic model restored those unhealthy cells, which halted glycolysis due to the shortage of pep. The genetic reintroduction still needs to be carried out to validate this prediction.
Limitations of the study
Current simulations predict a lower ratio of origins to termini formed in DNA replication than observed in the qPCR experiments, which comes from restricting the formation of replication forks on the daughter chromosomes to occur only after complete replication of the mother chromosome and from starting each simulation with a single circular chromosome. In the future, we will consider multiple cell cycles starting from daughter cells each possibly containing a chromosome with multiple replication forks. Extending the simulations over several cell cycles would allow us to obtain statistics about cell divisions and multiple initiations of DNA replication events and DnaA filament formation.
The RDME-CME-ODE simulations are currently limited by having the degradosome, RNAP, and ribosome complexes all initialized in inactive states. Starting from an inactive state results in the initial transience in Figures 1M–1O, where the first few minutes of simulation are dominated by the complexes reaching steady-state processing of mRNAs, which may be overshadowing other interesting phenomena. In the absence of experimental knowledge of the average active complexes in Syn3A, the initial transience emphasizes importance of diffusion and how the ensemble-averaged results of the spatial model could be used to parameterize probabilistic factors in the well-stirred CME-ODE kinetic model, which account for diffusion. Future simulations with averaged occupation states will address this limitation. Besides the lack of explicit regulatory factors discussed above, the reactions to modify nucleobases in DNA and rRNA have been neglected and the time-dependent assembly mechanisms of protein complexes and ribosomes have been absorbed into the overall rates. Regulation is important and will be included in future models. No kinetic model for formation of FtsZ/FtsA filament and septum formation prior to formation of the daughter cells is considered. In the future, these processes will be addressed as we modify the spatially resolved kinetic model to allow a change in cell morphology, DNA replication, and ribosome biogenesis similar to the rule-based model we created for DNA replication in a slow-growing and dividing E. coli cell (
- Earnest T.M.
- Cole J.A.
- Peterson J.R.
- Hallock M.J.
- Kuhlman T.E.
- Luthey-Schulten Z.
Ribosome biogenesis in replicating cells: Integration of experiment and theory.
).
STAR★Methods
Key resources table
Resource availability
Lead contact
Further information and requests for resources and reagents should be directed to and will be fulfilled by the Lead Contact, Zaida Luthey-Schulten ( [email protected] ).
Materials availability
This study did not generate any new unique cell strains or reagents, however the JCVI-syn3A and JCVI-syn3B bacterial strains are available to researchers through the JCVI and Codex DNA, Inc under a material transfer agreement through John Glass ( [email protected] ). Note that to handle JCVI-syn3A and JCVI-syn3B, United States scientists must obtain a United States Veterinary Permit for Importation and Transportation of Controlled Materials and Organisms and Vectors from the U.S. Department of Agriculture Animal and Plant Health Inspection Service. The organisms require Biosafety Level 2 containment.
Experimental model and subject details
Bacterial strains
The principal mycoplasmal strains used in this study are JCVI-syn3A (GenBank: CP016816.2) and JCVI-syn3B, which is genetically identical to JCVI-syn3A except is has a synthetic DNA landing pad to allow for easy introduction of new genes. These strains were propogated in the SP4 and defined medium compositions listed in the Key Resources Table. JCVI-syn3A and JCVI-syn3B were handled in labs with permits for Biosafety Level 2 containment.
Defined medium
In a continuing effort to completely define the minimal external environment sufficient to support growth of Syn3A, increasingly defined media have been developed. Our current iteration is based on the defined medium developed and fully described by Rodwell (
) which supports growth of Mycoplasma mycoides subspecies including the natural precursor of JCVI-syn1.0, Mycoplasma mycoides subsp. capri str. GM12 (Rodwell medium designation C5). This medium did not support growth of Syn3A. Defined components were therefore empirically added based on i) predictions from the analysis of metabolic pathways for Syn3A (
- Breuer M.
- Earnest T.M.
- Merryman C.
- Wise K.S.
- Sun L.
- Lynott M.R.
- Hutchison C.A.
- Smith H.O.
- Lapek J.D.
- Gonzalez D.J.
- et al.
Essential metabolism for a minimal cell.
), ii) components of defined media reported for growth of M. pneumoniae (
- Yus E.
- Maier T.
- Michalodimitrakis K.
- van Noort V.
- Yamada T.
- Chen W.-H.
- Wodke J.A.
- Güell M.
- Martínez S.
- Bourgeois R.
- et al.
Impact of genome reduction on bacterial metabolism and its regulation.
) and iii) a defined component of SP4 medium (CMRL-1066). The complete description of the medium is given in the Key Resources Table and Table S1. In all cases lipid delivery was provided using KnockOut (KO) serum replacement as a source of albumin (
). Concentrations of assorted constituents in this medium served as the basis for simulated parameters in the model. A subset of the components of the defined medium were used to simulate transport reactions in metabolic reactions. The subset was selected as each component present in a transport reaction in the metabolic reactions.
Adaptation of Syn3A was accomplished by progressive dilution of SP4 cultures into the defined medium. Cultures were maintained as previously described (
- Hutchison C.A.
- Chuang R.-Y.
- Noskov V.N.
- Assad-Garcia N.
- Deerinck T.J.
- Ellisman M.H.
- Gill J.
- Kannan K.
- Karas B.J.
- Ma L.
- et al.
Design and synthesis of a minimal bacterial genome.
) in static liquid culture at 37 ° C, monitoring growth with the pH indicator phenol red for acid production. Mycoplasma growth was confirmed by plating dilutions of adapted cultures onto solid agar SP4 medium to identify and quantify colonies.
Method details
Background for stochastic cell simulations
The computational methods in this study employ both deterministic and stochastic reaction solvers and communication between them. Before discussing specific construction of the hybrid simulations of Syn3A, we must first introduce the stochastic simulation program Lattice Microbes (LM) (
- Roberts E.
- Stone J.E.
- Luthey-Schulten Z.
Lattice Microbes: high-performance stochastic simulation method for the reaction-diffusion master equation.
;
- Hallock M.J.
- Stone J.E.
- Roberts E.
- Fry C.
- Luthey-Schulten Z.
Simulation of reaction diffusion processes over biologically relevant size and time scales using multi-GPU workstations.
;
- Earnest T.M.
- Cole J.A.
- Luthey-Schulten Z.
Simulating biological processes: stochastic physics from whole cells to colonies.
). Stochastic simulations are necessary when reactions involving few particles such as those in genetic information processing and the charging of the tRNAs can lead to large variations in the state of the cell (
- Taniguchi Y.
- Choi P.J.
- Li G.-W.
- Chen H.
- Babu M.
- Hearn J.
- Emili A.
- Xie X.S.
Quantifying E. coli proteome and transcriptome with single-molecule sensitivity in single cells.
). LM stochastically simulates reactions without (in the well-stirred homogeneous scenario) and with the explicit inclusion of particle diffusion to account for the spatially-heterogeneous intracellular environment. For example, ribosome and nucleoid regions are distinct from cytoplasmic, membrane, and peripheral membrane regions, and we allow different reactions r to take place in each with a different propensity
and stoichiometry
. In our spatially-resolved reaction-diffusion master equation (RDME) simulations, a cubic lattice is imposed on the cell and the system is divided into subvolumes v centered about the lattice points. Within each subvolume the reactions are assumed to be well-stirred and simulated using the Gillespie stochastic algorithm (
- Gillespie D.T.
Exact stochastic simulation of coupled chemical reactions.
). Diffusion of particles from one subvolume to another is described by a diffusion operator, and the overall evolution of the state of the cell is given by the combined RDME.
(Equation 2)
The state of the cell x represents all the species (genes, RNAs, and proteins) present at any instant of time. The first term is the chemical master equation (CME) probabilistic description of the subvolume-localized reactions for every subvolume in the system and the second term is the diffusion operator D for each particle type α in the x, y, and z directions specified by i, j, and k. For clarity, within the context of the spatially-resolved simulations, we will use the term local CME to describe the modeling of subvolume-localized stochastic reactions and the term global CME to describe the modeling of cell-wide stochastic reactions between species assumed to be well-stirred in the full cellular volume. In the whole-cell simulations presented in this study, we combine the above simulation methods with ordinary differential equation (ODE) solvers in hybrid methods that use the results of each method to update the particle counts of the subsequent method in a backward-updating fashion (
- Bianchi D.M.
- Peterson J.R.
- Earnest T.M.
- Hallock M.J.
- Luthey-Schulten Z.
Hybrid CME-ODE method for efficient simulation of the galactose switch in yeast.
), which we present schematically in Figure S3. LM allows for periodic communication between its stochastic CME and RDME solvers and other simulation methods, such as an ordinary differential equations solver for metabolic reactions. The time step for communication, τ, is a parameter based on general timescale separation between the reactions in the metabolism, which are modeled by ODEs, and the reactions involved in the gene expression, which are modeled by a CME/RDME, that have significantly longer times between individual reactions. The communication times for linking these two methodologies were proven for similar metabolic and gene expression reactions (where regime separation was again chosen based on varying reaction propensities) in a genetic switch in yeast (
- Bianchi D.M.
- Peterson J.R.
- Earnest T.M.
- Hallock M.J.
- Luthey-Schulten Z.
Hybrid CME-ODE method for efficient simulation of the galactose switch in yeast.
). The accuracy and efficiency for a series of communication times was reported there, even to the numerical limit of a computationally “exact” implementation of the algorithm, where communication occurs after the firing of each stochastic reaction. The choice of this parameter does affect simulations (Figure 2 in
- Bianchi D.M.
- Peterson J.R.
- Earnest T.M.
- Hallock M.J.
- Luthey-Schulten Z.
Hybrid CME-ODE method for efficient simulation of the galactose switch in yeast.
), however the size of these effects diminishes substantially as the communication time step is decreased from the minute to single second scale. In these simulations, we used a communication time step of 1 s.
In the spatial RDME model, the genetic information processes are simulated as reactions and diffusion into and out of each local subvolume. Specifically, RNAP diffuses to the location of the start of the gene and binds to it. Later it is released along with the transcript at the end of the gene. mRNA can diffuse and bind to either the ribosomes or degradasomes. These reactions communicate with a global CME model of transcription elongation kinetics and tRNA charging kinetics, which are assumed to take place anywhere within the cell volume. Information about the stochastic reactions that occurred during the time interval of length τ are communicated to the ODE kinetic model for metabolism, the ODE solver is then run for an identical time interval, and the final state of the ODE model is used to update the state of the stochastic reaction system. The entirety of this procedure is shown in Figure S3.
Below we discuss more thoroughly the RDME, CME, and ODE solvers, and the combined hybrid algorithms. After introducing how the cellular architecture and DNA configurations are created from cryo-electron tomograms using the theory of a circular self-avoiding polymers (
- Gilbert B.R.
- Thornburg Z.R.
- Lam V.
- Rashid F.M.
- Glass J.I.
- Villa E.
- Dame R.T.
- Luthey-Schulten Z.
Generating chromosome geometries in a minimal cell from cryo-electron tomograms and chromosome conformation capture maps.
), we discuss the construction of the spatial model for Syn3A that includes the full kinetic model of metabolism. A detailed description of lipid metabolism requires use of information coming from the lipidomics studies by the Saenz group which we provide at the end of the methods. The minimal cell simulation programs are available at https://github.com/Luthey-Schulten-Lab/Minimal_Cell
LM interface for creating spatial simulations: jLM
The algorithms discussed in the subsections below use the new release of LM, Lattice Microbes v2.4 (https://github.com/Luthey-Schulten-Lab/Lattice_Microbes), which comes with a new user interface package designed to be used in Jupyter Python notebooks: jLM. LM v2.4 has a more user-friendly installation. (The programs and user manuals are available through Github address in Resources and our website). jLM has an improved interface for generating initial conditions for spatially resolved simulations, including constructing cell geometries such as those shown in Figures 1B–1E, defining diffusion rates and rules, and defining reactions. To check the initial setup of the physical cell and the simulations, jLM includes visualization of cell geometries using active rendering, table visualization for particle diffusion coefficients, and table visualization of reaction details such as subvolume localization and the particles involved in the reaction. jLM allows for incorporation of data such as cryo-electron tomograms where we can include cell features directly into cell architecture in simulations as long as positional data about the cellular features is annotated. For diffusion, the simulations require diffusion probabilities for each particle between each region defined in the simulation, for example between the membrane and cytoplasm, as well as diffusion probabilities within the region. While this sounds daunting at first to define so many diffusion probabilities for many species, jLM allows the user to define global diffusion rules that will be set for all particles in the simulation which can be later modified. To define any one diffusion rule, jLM takes the particle name, subvolume region (meaning the region in which the particle currently resides), destination subvolume region, and diffusion coefficient as inputs. To define reactions, jLM takes a list of substrates, list of products, and rate constant as inputs where the rate constant is earlier defined giving a value and reaction order as inputs.
Overview of deterministic and stochastic simulation methods
Here, we describe individually the three simulation methods used to model the chemical kinetics in order of increasing temporal and spatial complexity. At the lowest level of complexity, we model the time-evolution of metabolite concentrations using ordinary differential equations (ODEs). We solved the deterministic initial value problem for the ODE system modeling the metabolic reactions using the LSODA solver within the ODEPACK software suite (
- Hindmarsh A.
Odepack, a systematized collection of ode solvers.
;
- Petzold L.
Automatic selection of methods for solving stiff and nonstiff systems of ordinary differential equations.
). The LSODA solver has implementations of multistep methods via the Adams or BDF methods for both stiff and non-stiff systems of ODEs. We used the backward differentiation formula method, with order varying between 1 and 5, to solve the stiff ODE system of metabolic reactions. The initial conditions for the enzyme counts and metabolite concentrations for the metabolic reactions are in Table S1 and are based on the proteomics data from (
- Breuer M.
- Earnest T.M.
- Merryman C.
- Wise K.S.
- Sun L.
- Lynott M.R.
- Hutchison C.A.
- Smith H.O.
- Lapek J.D.
- Gonzalez D.J.
- et al.
Essential metabolism for a minimal cell.
) and metabolomics data primarily from (
- Park J.O.
- Rubin S.A.
- Xu Y.-F.
- Amador-Noguez D.
- Fan J.
- Shlomi T.
- Rabinowitz J.D.
Metabolite concentrations, fluxes and free energies imply efficient enzyme usage.
) and (
- Yus E.
- Maier T.
- Michalodimitrakis K.
- van Noort V.
- Yamada T.
- Chen W.-H.
- Wodke J.A.
- Güell M.
- Martínez S.
- Bourgeois R.
- et al.
Impact of genome reduction on bacterial metabolism and its regulation.
). Based on the cryo-electron tomograms for Syn3A (
- Gilbert B.R.
- Thornburg Z.R.
- Lam V.
- Rashid F.M.
- Glass J.I.
- Villa E.
- Dame R.T.
- Luthey-Schulten Z.
Generating chromosome geometries in a minimal cell from cryo-electron tomograms and chromosome conformation capture maps.
), we assumed the initial radius of Syn3A of 200 nm and cell volume of 0.0335 fL. Missing or questionable data was supplemented by comparative analysis to E. coli, M. florum, and B. subtilis. The kinetic parameters are listed in Table S2.
To write the metabolic reactions into a system of ODEs that can be numerically evaluated by the ODE solver, we use a custom python package named odecell. This package was designed to have a simple application programming interface (API) for defining rate forms, specifying kinetic parameters, and assigning reactants and products for individual reactions. The package odecell comes with some predefined simple rate forms such as first and second order kinetics, but most reactions in the whole-cell model have a custom rate form defined for the random binding model of enzyme kinetics. Once all reactions have been defined and parameters, reactants, and products have been assigned, odecell is used to pass the time-evolution equation for every metabolite to the ODE solver discussed above.
The chemical master equation (CME) is an equation describing the time-evolution of a well-stirred system of reacting particles. Like other master equations, it models the probability of the system being found in a set of different states and the transitions between those states. In the case of the chemical master equation, the states are different combinations of discrete counts of particles and the system transitions between states as a result of chemical reactions. Under the well-stirred condition, the chemical reactions are equally probable between any reactant particles in the system volume. The system is assumed to be Markovian and the transition probabilities are conditionally dependent on the current state and the parameterization of the transition probabilities. Given this assumption, realizations of the system’s time-evolution can be simulated using the Gillespie algorithm and its variants. Ensembles of those realizations can then be used to reconstruct the time-evolution of the probability distribution of system states. The advantage of a stochastic CME model of chemical kinetics versus a deterministic ODE model is that the CME model reports the distribution of system states (and fluctuations), rather than only the mean concentration. This is especially relevant for systems with low copy numbers, such as models of genetic information processing, where the fluctuations and mean are of a similar order of magnitude.
At the greatest level of complexity, we use the reaction-diffusion master equation (RDME) to model diffusion and reactions within Syn3A. The RDME is a spatially-resolved version of the chemical master equation, where the system is now a set of connected subvolumes and the system state is the distribution of particles across that set of subvolumes. The system state changes by particles diffusing between subvolumes or reacting within subvolumes. The reactions among particles in a subvolume are handled as well-stirred and can be simulated using the same methods as the CME.
We sampled the stochastic initial value problems for the CME and RDME systems using the newly-released version 2.4 of Lattice Microbes (LM) (
- Roberts E.
- Stone J.E.
- Luthey-Schulten Z.
Lattice Microbes: high-performance stochastic simulation method for the reaction-diffusion master equation.
;
- Hallock M.J.
- Stone J.E.
- Roberts E.
- Fry C.
- Luthey-Schulten Z.
Simulation of reaction diffusion processes over biologically relevant size and time scales using multi-GPU workstations.
;
- Earnest T.M.
- Cole J.A.
- Luthey-Schulten Z.
Simulating biological processes: stochastic physics from whole cells to colonies.
), a GPU-based stochastic simulation software for chemical kinetics. The CME system was sampled using the implementation of the Gillespie direct algorithm and the RDME system was sampled using the implementation of the multi-particle diffusion algorithm. The reaction network for the CME system was constructed using the pyLM subpackage, a problem-solving environment that provides Python bindings to access the underlying LM code written in C++. The spatial model and reaction network for the RDME system were constructed using the new jLM subpackage, which extends the functionality of pyLM to include real-time visualization and interrogation of the system using ipywidgets within Jupyter notebooks. The simulations and analyses presented below were performed using Python scripts and Jupyter notebooks that are listed within the Key Resource Table.
Construction of JCVI-syn3A cell geometry and DNA configurations
We model the spatially-resolved kinetics using a RDME formalism and simulate the system using LM, which necessitates recreating the geometry of cells in a cubic lattice representation. Centered about the lattice sites are cubic reaction subvolumes. Particles within a common subvolume are assumed to be well-stirred and the subvolume-localized reactions proceed in a manner identical to the CME. Particles may also diffuse between subvolumes that are directly adjacent and share a face. The selection of the lattice size is an essential step in the creation of the spatial model. Through a constraint introduced by the greatest diffusion rate, the lattice size and maximum timestep are interdependent, i.e., a greater lattice size permits greater maximum timesteps (
- Roberts E.
- Stone J.E.
- Luthey-Schulten Z.
Lattice Microbes: high-performance stochastic simulation method for the reaction-diffusion master equation.
). Ultimately, the balance between increased spatial-resolution and computationally-achievable timescales is a choice on the part of the modeler. Previous LM simulations of ribosome biogenesis in the model Gram-negative bacterium E. coli used a lattice with dimensions of 32
32
192 sites (196,608 total sites) and a lattice size of 32 nm, with timesteps of 25 μs (
- Earnest T.M.
- Cole J.A.
- Peterson J.R.
- Hallock M.J.
- Kuhlman T.E.
- Luthey-Schulten Z.
Ribosome biogenesis in replicating cells: Integration of experiment and theory.
). Syn3A is approximately one-tenth of the physical size of E. coli and we chose a lattice size of 8 nm to realize the effects of spatial heterogeneities in the small system, while allowing for simulations over biologically-relevant timescales. The dimensions of the lattice used for the Syn3A model is 64
64
64 sites (262,144 total sites). The location of reactions and diffusion are controlled by manipulating the types of individual lattice sites, which in turn associate the surrounding subvolume with a region of the cell. For example, translation reactions are prohibited from occurring in subvolumes associated with the chromosome. In another example, proteins may diffuse from subvolumes associated with the ribosomes into neighboring cytoplasmic subvolumes, but the inverse is prohibited. Constructing the spatial-model can be decomposed into three essential steps: 1) creating the cell architecture on a cubic lattice by manipulating the site types, 2) specifying the rates for reactive and diffusive events, along with their region-based rules, and 3) placing the particles within the model.
We created single-cell architectures of Syn3A cells, including the ribosome distribution and chromosome configurations, using the cryo-ET and self-avoiding polymer model decribed in our previous work (
- Gilbert B.R.
- Thornburg Z.R.
- Lam V.
- Rashid F.M.
- Glass J.I.
- Villa E.
- Dame R.T.
- Luthey-Schulten Z.
Generating chromosome geometries in a minimal cell from cryo-electron tomograms and chromosome conformation capture maps.
). In summary, there are four steps to the process. 1) Syn3A cells were imaged using cryo-ET by the lab of Elizabeth Villa at UCSD and ribosome coordinates were determined by applying an iterative binary classification procedure (
) to tomographic reconstructions of Syn3A cells. A cell observed to have roughly a 200 nm radius contained 503 ribosomes distributed nearly-randomly throughout the cell. In a few cases, neighboring ribosomes were so close that possible polysomes ranging in size of 2-5 ribosomes could be identified, and we include them in our treatment of translation (Figure 1A). For comparison, M. florum contains 1,600 to 2,100 ribosomes (
- Matteau D.
- Lachance J.-C.
- Grenier F.
- Gauthier S.
- Daubenspeck J.M.
- Dybvig K.
- Garneau D.
- Knight T.F.
- Jacques P.-É.
- Rodrigue S.
Integrative characterization of the near-minimal bacterium Mesoplasma florum.
) which correpsonds to 650 to 850 ribosomes when scaled to the volume of Syn3A. A larger ribosome count is not surprising because of the faster doubling time of M. florum.
2) Assuming the ribosomes to be spherical with a radius of 10 nm, the ribosomes were placed in the lattice representation using seven 8 nm lattice sites arranged as a star. 3) We modeled the 543 kbp circular chromosome of Syn3A as a lattice polymer on a 4 nm lattice cubic lattice (11.8 bp per monomer). This lattice polymer model was constrained to be self-avoiding and circular, a type of model also known as a self-avoiding polygon (SAP). The 4 nm lattice was made to be coincident with the 8 nm cubic lattice, and the chromosome model was also constrained to avoid the ribosomes and remain within the membrane.
Ensembles of chromosome configurations constrained by the transformed tomogram data (ribosomes and membrane) were then generated using a Monte Carlo algorithm that alternated between freely-growing the SAP through the insertion of monomers and then equilibrating the lattice model while subject to a simplified Hamiltonian containing a nearest-neighbor excluded volume term and a bending potential based on the assumed persistence length of dsDNA.
4) Finally, a coarse-graining procedure was applied to the configurations to localize up to eight of the 4 nm monomers within the 8 nm lattice sites, this is shown as the spheres embedded in the orange lattice sites in Figure 1B. Upon completion, 46,188 sequence-specific DNA particles, each representing an 11.8 bp portion of Syn3A’s chromosome, were distributed among the 8 nm lattice sites according to the final positions of the monomers in the 4 nm lattice polymer model of the chromosome. An example of a Syn3A cell geometry with the ribosomes and DNA particles is shown Figure 1C. There were few intrachromosomal interactions in the preliminary 3C-Seq map, and only a small number of possible DNA looping interactions were identified (
- Gilbert B.R.
- Thornburg Z.R.
- Lam V.
- Rashid F.M.
- Glass J.I.
- Villa E.
- Dame R.T.
- Luthey-Schulten Z.
Generating chromosome geometries in a minimal cell from cryo-electron tomograms and chromosome conformation capture maps.
). The lack of intrachromosomal interactions was hypothesized to be caused by a lack of persistent supercoiling, which may result from two factors present in Syn3A: 1) a low abundance of the nucleoid-associated protein HU, which can stabilize plectonemic loops resulting from supercoiling, and 2) a relatively high abundance of topoisomerases and gyrases, which can relax translation-induced supercoiling (
- Gilbert B.R.
- Thornburg Z.R.
- Lam V.
- Rashid F.M.
- Glass J.I.
- Villa E.
- Dame R.T.
- Luthey-Schulten Z.
Generating chromosome geometries in a minimal cell from cryo-electron tomograms and chromosome conformation capture maps.
). The limited number of possible DNA looping interactions were assumed to be unsupercoiled loops created by SMC protein complexes. Due to the uncertainty about nature of the interactions in the preliminary 3C-seq map, the whole-cell simulations in this study used chromosome configurations without DNA looping present. The DNA configurations for the independent RDME simulations were chosen from an ensemble of over 100 possible configurations, so that no two simulations contained the same configuration. Within the spatially-resolved model, the lifetime of a gene’s mRNA transcript is strongly-dependent on the proximity of the gene’s to ribosomes and the membrane-associated degradosomes. We evaluated the uniqueness of the DNA configurations in our study by comparing the radial distances from the center of the cell for all 493 gene end sites. There were significant variations between the 8 configurations we selected with an average difference in radial distances of approximately 50 nm.
Initialization of proteins, mRNAs, and tRNAs
Having placed the ribosomes and circular DNA, we create the rest of the cellular architecture by specifying regions of the cell. Within spatially-resolved LM simulations, this is done by manipulating the lattice site types that the individual reaction subvolumes are centered about. To do this, we use new functionality in the jLM subpackage, in which the user can specify the parametric form of select three-dimensional shapes and jLM will generate matching lattice representations. Additionally, jLM allows for logical set operators to be used to compose multiple lattice representations to create more complex geometries. If a particle restricted to a region may freely diffuse to every subvolume within that region during the course of an RDME simulation, then we refer to that region as being contiguous. All lattice sites in the simulation space are initialized as belonging to the extracellular region. We first create a lattice representation of a sphere with a radius of 200 nm and define those lattice sites as belonging to the cytoplasm region. Within the spatially-resolved reaction model, the outermost layer of the cytoplasm is defined as a separate outer-cytoplasm region, so that peripheral membrane complexes, such as the degradosome, can be treated independently from transmembrane complexes. We next create this outer-cytoplasm region by using jLM to compose a spherical shell that is exterior and adjacent to the cytoplasm region, a minimum of one subvolume in thickness, and contiguous. In the final step, we create the membrane region by using jLM to compose a spherical shell that is exterior and adjacent to the outer-cytoplasm region, a minimum of one subvolume in thickness, and contiguous.
Once the cellular architecture has been constructed, we place other macromolecular complexes such as degradosomes and transporters, mRNA, and proteins as shown in Figures 1D and 1E. The degradosome is a complex that consists of a membrane-bound endoribonuclease scaffold (RNase Y), two metabolic enzymes (enolase and phosphofructokinase), an RNA helicase, a 3′ to 5′ exoribonuclease (RNase R or a putative exoribonuclease), and two 5′ to 3′ exoribonucleases (RNases J1 and J2) (
- Cho K.H.
The structure and function of the gram-positive bacterial RNA degradosome.
). The 3′ to 5′ exoribonuclease that typically binds most favorably in Gram-positive bacteria is PNPase. However, since PNPase isn’t present in JCVI-syn3A, we assume one of the other two 3′ to 5′ exoribonucleases present in the cell can take its place. The degradosome breaks down mRNA by first cleaving messengers with RNase Y, unwinding any dsRNA using the helicase, and then degrading the fragments from end to end using the exoribonucleases. We assume the stoichiometry of all degradosome components to be 1, so we take the minimum proteomics count among the components to determine the total number of complete degradosomes: 120 shared by RNase J1 (rnjA/0600) and the putative degradosome helicase (0410).
Proteins are initialized to their counts from the genome-wide proteomics for Syn3A (
- Breuer M.
- Earnest T.M.
- Merryman C.
- Wise K.S.
- Sun L.
- Lynott M.R.
- Hutchison C.A.
- Smith H.O.
- Lapek J.D.
- Gonzalez D.J.
- et al.
Essential metabolism for a minimal cell.
). The 93 membrane proteins making up roughly 9,000 of the 77,000 proteins in the proteomics are randomly distributed in the membrane. Proteins not reported in the proteomics or with counts less than 10 are initialized with a count of 10. 10 was chosen as the default protein count because the average uncertainty among all proteins over the three replicate mass spec experiments performed by