SE PhD Prospectus Defense of Jing Qian

11:00 am on Tuesday, December 17, 2013
1:00 pm on Tuesday, December 17, 2013
15 Saint Mary's Street, Rm 105
TITLE: Anomaly Detection in High-Dimensional Space ABSTRACT: Anomaly detection refers to identifying statistically significant deviations of data from an expected nominal behavior. It has found extensive use in a wide variety of applications such as fraud detection for credit cards or health care, intrusion detection for cyber-security, surveillance for enemy activities, etc. In this proposal we study two problems in anomaly detection: point-wise anomaly detection in high-dimensional space, and anomalous cluster detection on graphs. In the first part we consider the problem of point-wise anomaly detection, where given n i.i.d samples from some unknown nominal density, the goal is to design a mechanism that decides whether a new test point comes from the same distribution, under some desired false alarm control. We focus on Non-parametric methods, among which popular approaches include one-class SVM and density-based algorithms. One-class SVM is computationally efficient, but does not directly incorporate density information, and has poor control of false alarm rate. In contrast, various density-based methods show better performance but have higher computational complexity at test stage. We propose a novel ranking-based anomaly detection framework that are based on learning a ranker on nominal samples. Density information is incorporated in this learning-to-rank step. At test stage, similar to one-class SVM, we only need to evaluate an SVM-type decision function on the test point. Future work includes showing the asymptotic consistency of our approach, possible finite-sample results, and performance comparison against some new randomized algorithms. In the second part we study the problem of detecting anomalous clusters on graphs. The nodes of the graph are associated with i.i.d standard normal random variables under null hypothesis, while under the alternative, the random variables of some nodes, which form a connected sub-graph, have a lifted mean. Existing work focuses on statistical aspects of the problem and has shown that the test of maximizing the scan statistic over the collection of connected sub-graphs performs well. However, there is little work on practical algorithms for solving this problem, mainly due to the difficulty of characterizing connectivity of clusters. We are able to parameterize the family of all connected sub-graphs in terms of linear matrix inequalities. This allows for optimizing desired graph functionals over the collection of arbitrarily connected sub-graphs. We then provide a convex relaxation of the integer programming framework. We show that the solution to this convex SDP problem guarantees connectivity in some sense, and has additional good properties which naturally motivates a heuristic rounding scheme. Future work includes showing potential statistical advantages of our method, improving the algorithm to suit larger graphs, and finding interesting applications to which our method can be applied. COMMITTEE: Advisor: Venkatesh Saligrama, SE/ECE; Ioannis Paschalidis, SE/ECE; David CastaƱon, SE/ECE; Henry Lam, Math & Statistics