Description
Abstract: The engineering of living cells promises to advance many applications including synthetic biology and personalized medicine. Experimental efforts, however, can be costly and time-consuming, requiring large efforts to interpret collected data and many iterative design-and-test cycles to achieve desired results. Computational efforts that harness the continuing growth of computing power and catalogued biological data can advance biological system design by interpreting measurements, efficiently exploring the design space and expediting biological discoveries. This thesis advances the state of the art in the engineering and analysis of cellular metabolism by computationally addressing two challenges. The first challenge concerns the lack of systematic ways to design selection pathways in directed evolution of enzymes, an iterative process of creating mutant libraries and choosing desired phenotypes through screening or selection until the enzymatic activity reaches a desired goal. Identifying high-throughput screens or selections to isolate the variant(s) with the desired property is the biggest challenge in directed enzyme evolution, as there are currently no known generalized strategies or computational techniques to do so. This thesis presents a computational metabolic engineering framework, termed Selection Finder (SelFi), to construct a selection pathway from a desired enzymatic product to a cellular host and to couple the pathway with cell survival. When applied to construct selection pathways for several target enzymes and their desired enzymatic products, SelFi identifies selection pathways that were previously manually designed and experimentally validated. The second challenge concerns the interpretation of data measured through untargeted metabolomics, where molecular masses of thousands of small molecules are measured simultaneously via mass spectrometry.
Annotating the masses by assigning them a chemical identity and interpreting their biological relevance is challenging, as a particular mass may be associated with multiple chemical compounds. This thesis contributes to solving the metabolite interpretation challenge in two ways. First, it presents a novel computational workflow, termed Expanded Metabolic Model based Annotation (EMMA). EMMA constructs a biological filter consisting of an Expanded Metabolic Model (EMM) that includes not only the canonical substrates and products of enzymes, but also metabolites that can form due to substrate promiscuity, where an enzyme transforms other substrates in addition to its natural substrate. This expanded model is used to reduce the number of candidate chemical identities from large chemical databases that can be assigned to the measurements. EMMA is applied to two untargeted metabolomics data sets. Compared to a basic annotation workflow that analyzes every candidate compound in large chemical databases, EMMA reduces the number of calculations by 4 orders of magnitude. Additionally, EMMA increases the number of annotated masses by an average of 1.71- and 2.39-fold, respectively, when compared to using the sample's metabolic model. Further, the results show that EMMA increases the number of annotated masses and biologically relevant candidate molecules by an average of 2.65- and 2.80-fold, respectively, when compared to using candidate sets from a biological database. The EMMA workflow was experimentally validated by confirming the presence of 4-hydroxyphenyllactate, a Chinese Hamster Ovary (CHO) cell metabolite in the EMM that has not been previously identified as part of CHO cell metabolism.
Further contributing to metabolite interpretation, this thesis presents a novel probabilistic approach, termed Probabilistic modeling for Untargeted Metabolomics Analysis (PUMA), for predicting the likelihood of activity of metabolic pathways by assigning measurements directly to metabolic pathways and then deriving probabilistic assignments of measurements to candidate chemical identities. This approach captures measurements and metabolic models within a probabilistic model and uses stochastic sampling to compute posterior probability distributions. When applied to a test case, the pathway activity results are biologically meaningful and distinctly different from those obtained using statistical pathway enrichment techniques. Further, the annotation results agree with those obtained using other tools that utilize additional information in the form of spectral signatures.
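The annotation ambiguity described in the abstract, where one measured mass matches several compounds, can be illustrated with a minimal sketch. This is not the EMMA implementation: the `Candidate` type, the `annotate` function, the 10 ppm tolerance, the measured mass, and the contents of the model set are all assumptions made for illustration. The sketch only shows how restricting candidates to a set of model metabolites narrows the matches for a mass shared by two isomers (both C9H10O4).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    monoisotopic_mass: float  # neutral monoisotopic mass in daltons


def ppm_error(measured: float, theoretical: float) -> float:
    """Absolute mass error in parts per million."""
    return abs(measured - theoretical) / theoretical * 1e6


def annotate(measured_mass, candidates, allowed_names=None, tol_ppm=10.0):
    """Return names of candidates within tol_ppm of the measured mass.

    If allowed_names is given, only candidates in that set are kept; the set
    stands in for a metabolic-model filter (hypothetical here).
    """
    return [
        c.name
        for c in candidates
        if ppm_error(measured_mass, c.monoisotopic_mass) <= tol_ppm
        and (allowed_names is None or c.name in allowed_names)
    ]


# Illustrative candidate list; the first two compounds are isomers.
CANDIDATES = [
    Candidate("4-hydroxyphenyllactate", 182.0579),
    Candidate("homovanillic acid", 182.0579),
    Candidate("tyrosine", 181.0739),
]

# Hypothetical model contents, chosen only to demonstrate the filtering step.
MODEL_METABOLITES = {"4-hydroxyphenyllactate", "tyrosine"}

unfiltered = annotate(182.0580, CANDIDATES)
filtered = annotate(182.0580, CANDIDATES, MODEL_METABOLITES)
print(unfiltered)
print(filtered)
```

Without the filter, both isomers match the measured mass; with the hypothetical model set, only one candidate remains.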
Thesis (Ph.D.)--Tufts University, 2018.
Submitted to the Dept. of Computer Science.
Advisor: Soha Hassoun.
Committee: Anselm Blumer, Nikhil Nair, Li-Ping Liu, and Tamer Kahveci.
Keyword: Computer science.