Classification and Regression Framework for Characterizing Contaminant Source Zone.
Abstract: In this thesis
we develop two machine-learning frameworks for estimating quantitative metrics
characterizing subsurface zones of chemically contaminated soil focusing on problems
involving Dense Non-Aqueous Phase Liquid (DNAPL). Source zone characterization, a
necessary first step in the development of the remediation strategy, is challenging due
to practical constraints associated with the data available for processing. We first
propose a set of geometric features which are based on morphological image processing
operations. These features are used for both the classification work in Chapter 3 and
the regression approach developed in Chapter 4. Second, we propose a classification
framework as our initial solution. Specifically, we quantize each metric into a number
of intervals and employ machine learning methods to determine the interval containing
the metric. A classification scheme based on an iterative algorithm of Linear
Discriminant Analysis (LDA) and Spectral Clustering (SC) is used to determine
feature-based clusters that are associated with metric intervals. Furthermore, we
propose a regression framework focusing on the use of manifold regression techniques. We
use manifold methods for jointly representing labeled training data comprised of metrics
as well as features. We then propose a new integrated approach to the problems of (a)
robustly embedding test data into the manifold and (b) constructing a regression
function for metrics estimation. The utility of the approach is enhanced by the explicit
incorporation of a physical constraint associated with the metrics into the problem
formulation. Results based on data simulated with the Sequential Gaussian Simulation
(SGS) method demonstrate the potential effectiveness of the manifold regression
approaches as well as significant improvement in performance relative to the case where
the algorithmic components are designed serially. Finally, we apply our manifold
regression algorithms to a new simulated data set whose hydraulic conductivity
fields were built by a Transition Probability Markov Chain (TP/MC) model. In the TP/MC
data, the full concentration data are available for training, but the test data are sparsely
sampled from 25 ports. Modifications of our manifold regression algorithms to
process the sparse data are proposed and the results show the efficacy of our
Thesis (Ph.D.)--Tufts University, 2015.
Submitted to the Dept. of Electrical Engineering.
Advisor: Eric Miller.
Committee: Linda Abriola, Shuchin Aeron, and Yue Wu.
Keywords: Computer science and Hydrologic sciences.