Machine Learning in Health Informatics: Making Better Use of Domain Experts.
Wallace, Byron.
2012
Abstract: We present
novel machine learning and data mining methods that make real-world learning systems
more efficient. We focus on the domain of clinical informatics, an archetypical example
of a field overwhelmed with information. Due to properties inherent to clinical
informatics tasks -- and indeed, to many tasks that require specialized domain knowledge
-- `off-the-shelf' machine learning technologies generally perform poorly in this
domain. If machine learning is to be successful in clinical science, novel methods must
be developed to: mitigate the effects of class imbalance during model induction; exploit
the wealth of domain knowledge highly skilled domain experts bring to the task; and
induce better models with less effort (fewer labels). We present new machine learning
methods that address each of these issues, and demonstrate their efficacy in the task of
abstract screening. In particular, we develop new theoretical perspectives on
\emph{class imbalance}, novel methods for exploiting \emph{dual supervision} (i.e.,
labels on both instances and features), and new \emph{active learning} techniques that
address issues inherent to real-world applications (e.g., exploiting multiple experts in
tandem). Each of these contributions aims to squeeze better classification performance
out of fewer labels, thereby making better use of domain experts' time and expertise.
The immediate aim in this work is to reduce the workload involved in conducting
\emph{systematic reviews}, and to this end we demonstrate that the developed methods can
reduce reviewer workload by more than half, without sacrificing the comprehensiveness of
reviews (i.e., without missing any relevant published evidence). But this is only an
exemplary task; the approaches presented here have wider application to many real-world
learning problems, i.e., those that require specialized expertise, exhibit class
imbalance (and asymmetric costs) and for which limited human resources are available. We
show that the methods we have developed bring substantial improvements over previously
existing machine learning approaches in terms of inducing better models with less
effort.
Thesis (Ph.D.)--Tufts University, 2012.
Submitted to the Dept. of Computer Science.
Advisor: Carla Brodley.
Committee: Thomas Trikalinos, Roni Khardon, Anselm Blumer, and Jaime Carbonell.
Keywords: Computer science, Artificial intelligence, and Bioinformatics.
ID: h989rf38m
Component ID: tufts:21156