On the Subject of Identifying Anomalies in Heterogeneous Vector Based Data.
Cousins, Cyrus B.
- Abstract: This paper focuses on novel techniques for unsupervised anomaly detection over datasets of real and finite discrete variables, centered on the existing FRaC algorithm. Novel variants of the FRaC algorithm are presented, alongside mathematical justification and empirical evidence supporting their use. Three variants, mFRaC, eFRaC, and cFRaC, are introduced; they focus on ensemblification, treating features equally, and handling missing values in samples. A comparison of AUROC values over UCI datasets shows eFRaC and cFRaC to be minor improvements on the traditional FRaC algorithm. Additionally, several previously unknown properties of the FRaC algorithm are analyzed. The property of normalization invariance is shown under lenient assumptions. The properties of strong and weak self-selection in feature modelling techniques are introduced, and FRaC is shown to have the weak self-selection property under certain circumstances. Implementation details of the various statistical calculations that FRaC requires are also discussed. Original heuristic feature selection techniques based on filter methods are presented, alongside analysis and empirical evidence. A more conservative, FRaC-specific alternative to traditional filtering, termed partial filtering, is also introduced and compared with traditional filtering. Finally, these algorithms are discussed in the context of the broader subfield of feature modelling anomaly detection techniques. The algorithms presented fall into two categories: mathematical redefinitions of what constitutes an anomaly, and hyperparameter-reducing generalization algorithms. Further algorithms that were conceived but not implemented are also described in this framework.
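The abstract compares FRaC variants by AUROC but does not spell out the base algorithm. Broadly, FRaC (Feature Regression and Classification) models each feature as a function of the other features on normal training data, estimates how each per-feature model errs, and scores a test point by the total surprisal of its observed values. The Python sketch below illustrates that idea under simplifying assumptions that are not taken from the thesis: all features are real-valued (FRaC also uses classifiers for discrete features), the per-feature predictor is a hypothetical k-nearest-neighbour regressor, each error distribution is a Gaussian fitted to leave-one-out residuals, and plain surprisal is summed. The helper names `knn_predict`, `fit_frac`, `frac_score`, and `auroc` are hypothetical, and the thesis's actual scoring and error models may differ. The last helper computes AUROC in its Mann-Whitney form, the metric the abstract uses to compare variants.

```python
import math
from statistics import fmean, pstdev


def knn_predict(train_rows, query, target, k=3, skip=None):
    """Hypothetical per-feature predictor: estimate feature `target` of `query`
    as the mean of that feature over the k training rows closest to `query`,
    with squared Euclidean distance measured on all *other* features."""
    candidates = []
    for i, row in enumerate(train_rows):
        if i == skip:  # leave-one-out: exclude the query's own training row
            continue
        dist = sum((row[f] - query[f]) ** 2
                   for f in range(len(query)) if f != target)
        candidates.append((dist, row[target]))
    candidates.sort(key=lambda pair: pair[0])
    return fmean([value for _, value in candidates[:k]])


def fit_frac(train_rows, k=3):
    """Fit one Gaussian error model (mean, std) per feature from leave-one-out
    prediction residuals on the (assumed normal) training data."""
    models = []
    for j in range(len(train_rows[0])):
        residuals = [row[j] - knn_predict(train_rows, row, j, k, skip=i)
                     for i, row in enumerate(train_rows)]
        sigma = max(pstdev(residuals), 1e-6)  # guard against zero variance
        models.append((fmean(residuals), sigma))
    return models


def gaussian_surprisal(x, mu, sigma):
    """Surprisal -log p(x), in nats, under a normal density N(mu, sigma^2)."""
    return 0.5 * math.log(2 * math.pi * sigma ** 2) + (x - mu) ** 2 / (2 * sigma ** 2)


def frac_score(train_rows, models, query, k=3):
    """Anomaly score: sum over features of the surprisal of each prediction error."""
    total = 0.0
    for j, (mu, sigma) in enumerate(models):
        error = query[j] - knn_predict(train_rows, query, j, k)
        total += gaussian_surprisal(error, mu, sigma)
    return total


def auroc(normal_scores, anomaly_scores):
    """Probability that a random anomaly outscores a random normal point;
    ties count as one half (the Mann-Whitney form of the AUROC)."""
    wins = 0.0
    for a in anomaly_scores:
        for n in normal_scores:
            wins += 1.0 if a > n else (0.5 if a == n else 0.0)
    return wins / (len(anomaly_scores) * len(normal_scores))


# Toy data (illustrative only): normal points lie on the line x = t * (0.1, 0.2, 0.3).
train = [[0.1 * t, 0.2 * t, 0.3 * t] for t in range(30)]
models = fit_frac(train)

normal_test = [[0.1 * (t + 0.5), 0.2 * (t + 0.5), 0.3 * (t + 0.5)] for t in range(2, 28, 5)]
anomaly_test = [[1.0, -2.0, 3.0], [2.0, 4.0, -1.0]]

normal_scores = [frac_score(train, models, x) for x in normal_test]
anomaly_scores = [frac_score(train, models, x) for x in anomaly_test]
print("AUROC:", auroc(normal_scores, anomaly_scores))
```

Points that break the learned relationships between features receive large prediction errors, and therefore high surprisal, in at least one feature. Swapping `knn_predict` for any other regressor leaves the rest of the pipeline unchanged, which is the sense in which FRaC is a framework over arbitrary feature models.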