Computational Methods to Advance Directed Evolution of Enzymes and Metabolomics Data Analysis
Hassanpour, Neda.
2018
Abstract: The
engineering of living cells promises to advance many applications including synthetic
biology and personalized medicine. Experimental efforts, however, can be costly and
time-consuming, requiring large efforts to interpret collected data and many iterative
design-and-test cycles to achieve desired results. Computational efforts that harness
the continuing growth of computing power and catalogued biological data can advance
biological system design by interpreting measurements, efficiently exploring the design
space and expediting biological discoveries. This thesis advances the state of the art in
the engineering and analysis of cellular metabolism by computationally addressing two
challenges. The first challenge concerns the lack of systematic ways to design selection
pathways in directed evolution of enzymes, an iterative process of creating mutant
libraries and choosing desired phenotypes through screening or selection until the
enzymatic activity reaches a desired goal. Identifying high-throughput screens or
selections to isolate the variant(s) with the desired property is the biggest challenge
in directed enzyme evolution, as there are currently no known generalized strategies or
computational techniques to do so. This thesis presents a computational metabolic
engineering framework, termed Selection Finder (SelFi), to construct a selection pathway
from a desired enzymatic product to a cellular host and to couple the pathway with cell
survival. When applied to construct selection pathways for several target enzymes and
their desired enzymatic products, SelFi identifies selection pathways that were
previously manually designed and experimentally validated. The second challenge concerns
the interpretation of data measured through untargeted metabolomics, where molecular
masses of thousands of small molecules are measured simultaneously via mass
spectrometry. Annotating the masses by assigning them a chemical identity and
interpreting their biological relevance is challenging, as a particular mass may be
associated with multiple chemical compounds. This thesis contributes to solving the
metabolite interpretation challenge in two ways. First, it presents a novel
computational workflow, termed Expanded Metabolic Model based Annotation (EMMA). EMMA
constructs a biological filter consisting of an Expanded Metabolic Model (EMM) that
includes not only the canonical substrates and products of enzymes, but also metabolites
that can form due to substrate promiscuity, where an enzyme transforms other substrates
in addition to its natural substrate. This expanded model is used to reduce the number
of candidate chemical identities from large chemical databases that can be assigned to
the measurements. EMMA is applied to two untargeted metabolomics data sets. Compared to
a basic annotation workflow that analyzes every candidate compound in large chemical
databases, EMMA reduces the number of calculations by four orders of magnitude.
Additionally, EMMA increases the number of annotated masses by an average of 1.71- and
2.39-fold, respectively, when compared to using the sample's metabolic model. Further,
the results show that EMMA increases the number of annotated masses and biologically
relevant candidate molecules by an average of 2.65- and 2.80-fold, respectively, when
compared to using candidate sets from a biological database. The EMMA workflow was
experimentally validated by confirming the presence of 4-hydroxyphenyllactate, a Chinese
Hamster Ovary (CHO) cell metabolite in the EMM that has not been previously identified
as part of CHO cell metabolism. Further contributing to metabolite interpretation, this
thesis presents a novel probabilistic approach, termed Probabilistic modeling for
Untargeted Metabolomics Analysis (PUMA), for predicting the likelihood of activity of
metabolic pathways by assigning measurements directly to metabolic pathways and then
deriving probabilistic assignment of measurements to candidate chemical identities. This
approach captures measurements and metabolic models within a probabilistic model, and
uses stochastic sampling to compute posterior probability distributions. When applied to
a test case, pathway activity results are biologically meaningful and distinctly
different from those obtained using statistical pathway enrichment techniques. Further,
annotation results are in agreement with those obtained using other tools that utilize
additional information in the form of spectral
signatures.
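
The mass-annotation problem described in the abstract can be illustrated with a minimal sketch. It is not the EMMA or PUMA implementation. It only shows why one measured mass can match several compounds, and how restricting candidates to a model-derived set shrinks the search. The names `Compound`, `ppm_error`, and `annotate`, the 10 ppm tolerance, the approximate monoisotopic masses, and the membership of the example "model" set are all assumptions made for illustration. Measured values are assumed to be neutral masses already corrected for adducts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compound:
    name: str
    monoisotopic_mass: float  # daltons, approximate values for illustration


def ppm_error(measured: float, theoretical: float) -> float:
    """Relative mass error in parts per million."""
    return abs(measured - theoretical) / theoretical * 1e6


def annotate(masses, candidates, tol_ppm=10.0):
    """Map each measured mass to the candidate names within tol_ppm."""
    return {
        m: [c.name for c in candidates
            if ppm_error(m, c.monoisotopic_mass) <= tol_ppm]
        for m in masses
    }


# Hypothetical stand-in for a large chemical database.
database = [
    Compound("glucose", 180.0634),
    Compound("fructose", 180.0634),
    Compound("galactose", 180.0634),
    Compound("4-hydroxyphenyllactate", 182.0579),
]

# Hypothetical stand-in for a model-derived candidate set (illustrative only).
model_names = {"glucose", "4-hydroxyphenyllactate"}
model_candidates = [c for c in database if c.name in model_names]

measured = [180.0634, 182.0580]

unfiltered = annotate(measured, database)
filtered = annotate(measured, model_candidates)

for m in measured:
    print(m, "database:", unfiltered[m], "| model:", filtered[m])
```

In this toy setting the first mass matches three isomers in the full candidate list but only one in the model-derived set, which is the kind of ambiguity reduction that a biological filter aims for.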
Thesis (Ph.D.)--Tufts University, 2018.
Submitted to the Dept. of Computer Science.
Advisor: Soha Hassoun.
Committee: Anselm Blumer, Nikhil Nair, Li-Ping Liu, and Tamer Kahveci.
Keyword: Computer science.
ID: ht24ww71r
Component ID: tufts:25025