Description
Abstract: In Natural
Language Processing, most popular word embeddings involve low-rank factorization of a
word co-occurrence based matrix. We aim to generalize this trend by studying word
embeddings given by low-rank factorization of word co-occurrence based higher-order
arrays, or tensors. We present four novel word embeddings based on tensor factorization
and show they outperform popular ... read morestate-of-the-art baselines on a number of recent
benchmarks, encoding useful properties in a new way. To create one of our word
embeddings, we present a novel joint symmetric tensor factorization problem related to
the idea of coupled tensor factorization. We also modify a recent embedding evaluation
technique known as Outlier Detection to measure the degree to which an embedding
captures Nth order information, showing that tensor embeddings (naturally) outperform
popular pairwise embeddings at this task. Suggested applications of tensor
factorization-based word embeddings are given, and all source code and pre-trained
vectors are publicly available online.
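As a minimal illustration of the kind of object the abstract describes, the sketch below builds a third-order word co-occurrence tensor from a toy corpus and fits a rank-k symmetric CP factorization by plain gradient descent, so that each vocabulary word gets a low-dimensional embedding vector. This is a hedged sketch only, not the thesis's actual method: the corpus, window size, symmetrization, optimizer, and all variable names are illustrative assumptions.

```python
import numpy as np

# Toy corpus and vocabulary (illustrative assumption, not the thesis's data)
corpus = "the cat sat on the mat the cat ate".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
n = len(vocab)

# Count co-occurring word triples in a sliding window of width 3
T = np.zeros((n, n, n))
for a, b, c in zip(corpus, corpus[1:], corpus[2:]):
    T[idx[a], idx[b], idx[c]] += 1.0

# Symmetrize over all axis permutations so one shared factor matrix suffices
T = (T + T.transpose(0, 2, 1) + T.transpose(1, 0, 2)
       + T.transpose(1, 2, 0) + T.transpose(2, 0, 1)
       + T.transpose(2, 1, 0)) / 6.0

# Rank-k symmetric CP model: T ~= sum_r a_r (outer) a_r (outer) a_r,
# fit by gradient descent on the squared Frobenius error
rng = np.random.default_rng(0)
k = 3
A = 0.1 * rng.standard_normal((n, k))  # row i is the embedding of word i
lr = 0.05
for _ in range(500):
    approx = np.einsum('ir,jr,kr->ijk', A, A, A)
    resid = approx - T
    grad = 3.0 * np.einsum('ijk,jr,kr->ir', resid, A, A)
    A -= lr * grad

# Dictionary mapping each word to its learned embedding vector
emb = {w: A[idx[w]] for w in vocab}
```

A pairwise (matrix) embedding corresponds to the same recipe with a second-order co-occurrence array; the move to a third-order tensor is what lets the factors encode triple-wise co-occurrence statistics.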
Thesis (M.S.)--Tufts University, 2017.
Submitted to the Dept. of Computer Science.
Advisor: Shuchin Aeron.
Committee: Benjamin Hescott and Lenore Cowen.
Keywords: Artificial intelligence, Computer science.