Limits of Data Mining in Malicious Activity Detection: Murat Kantarcioglu, University of Texas at Dallas (Data Management seminar)
- 11:00 am to 12:00 pm on Friday, February 7, 2014
- MCS 148
Murat Kantarcioglu, Director of the UTD Data Security and Privacy Lab, University of Texas at Dallas

Abstract: Many data mining applications, such as spam filtering and intrusion detection, face active adversaries. In these applications, future data sets and the training data set are no longer drawn from the same population, due to the transformations employed by the adversaries. A core assumption of existing data mining techniques therefore no longer holds, and initially successful data mining models degrade quickly. This becomes a game between the adversary and the data miner: the adversary modifies its strategy to avoid being detected by the current classifier; the data miner then updates its classifier based on the new threats. In this talk, we investigate the possibility of an equilibrium in this seemingly never-ending game, where neither party has an incentive to change: modifying the data mining algorithm causes too many false positives with too little increase in true positives, while changes by the adversary decrease the utility of the false negative items that go undetected. We discuss our game-theoretic framework, in which the equilibrium behavior of adversarial classification applications can be analyzed, and provide solutions for finding an equilibrium point. A classifier's equilibrium performance indicates its eventual success or failure; the data miner can therefore select attributes based on their equilibrium performance and construct an effective data mining model. In addition, we discuss how our framework can be applied to build support vector machines that are more resilient to adversarial attacks. In the remainder of this talk, we discuss the implications of our game-theoretic adversarial data mining framework in the context of social network mining, and how data mining techniques can be applied to predict undisclosed private information.
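The adversary-vs-miner iteration described in the abstract can be sketched as a toy best-response loop. Everything here (the one-dimensional feature, the cost numbers, and the best-response rules) is an illustrative assumption, not the model from the talk: the adversary shifts its feature value just under the detection threshold at some transformation cost, the miner lowers the threshold until false-positive cost outweighs the gain, and the game stops when neither side moves.

```python
# Toy sketch of the classifier-vs-adversary game; all values and
# cost rules are illustrative assumptions, not the talk's model.

def adversary_best_response(threshold, true_value=10.0, shift_cost=0.6):
    """Adversary picks a feature value: staying at its preferred value
    means detection (payoff 0); shifting just under the threshold costs
    shift_cost per unit of transformation."""
    evade_value = threshold - 1e-9            # just under the detector
    evade_payoff = true_value - shift_cost * (true_value - evade_value)
    return evade_value if evade_payoff > 0 else true_value

def miner_best_response(adv_value, benign_value=2.0,
                        fp_cost=1.0, fn_cost=1.0):
    """Miner picks a threshold: lowering it toward benign traffic would
    catch the adversary but raises false positives, so this toy rule
    refuses to go below a floor set by the cost ratio."""
    candidate = adv_value - 0.5               # try to catch the adversary
    return max(candidate, benign_value + fn_cost / fp_cost)

def play(rounds=20):
    """Iterate best responses until neither party wants to change."""
    threshold, adv = 8.0, None
    for _ in range(rounds):
        adv = adversary_best_response(threshold)
        new_threshold = miner_best_response(adv)
        if abs(new_threshold - threshold) < 1e-6:
            return threshold, adv             # equilibrium reached
        threshold = new_threshold
    return threshold, adv
```

In this toy instance the loop settles at the miner's floor threshold: lowering it further would cost more in false positives than it gains, and the adversary keeps evading at reduced utility, matching the equilibrium intuition in the abstract.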
More specifically, we discuss how to launch inference attacks on released social networking data to predict undisclosed private information about individuals, such as their political affiliation or sexual orientation. We then discuss various techniques that can be employed to prevent the learning of such sensitive data, and the effectiveness of these techniques in practice. We show that sanitizing the data can decrease the effectiveness of data mining algorithms.

Speaker Bio: Dr. Murat Kantarcioglu is an Associate Professor in the Computer Science Department and Director of the UTD Data Security and Privacy Lab at the University of Texas at Dallas. He holds a B.S. in Computer Engineering from Middle East Technical University, and M.S. and Ph.D. degrees in Computer Science from Purdue University. He is a recipient of the NSF CAREER Award and the Purdue CERIAS Diamond Award for academic excellence. He is currently a visiting scholar at the Harvard Data Privacy Lab. Dr. Kantarcioglu's research focuses on creating technologies that can efficiently extract useful information from any data without sacrificing privacy or security. His research has been supported by grants from NSF, AFOSR, ONR, NSA, and NIH. He has published over 100 peer-reviewed papers, has received two best paper awards, and some of his work has been covered by media outlets such as the Boston Globe and ABC News.
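As a toy illustration of the inference attacks and sanitization discussed in the abstract, the sketch below guesses a user's undisclosed attribute by majority vote over friends' disclosed attributes, then removes friendship links before release to weaken the attack. The graph, attributes, and majority-vote attacker are illustrative assumptions, not the techniques from the talk.

```python
# Toy link-based inference attack and a sanitization countermeasure;
# the data and the majority-vote attacker are illustrative assumptions.
from collections import Counter

friends = {
    "alice": ["bob", "carol", "dave"],
    "eve":   ["bob", "carol"],
}
disclosed = {"bob": "A", "carol": "A", "dave": "B"}  # released attributes

def infer(user):
    """Attacker's guess: the majority label among the user's friends
    who disclosed their attribute (None if no friend disclosed)."""
    labels = [disclosed[f] for f in friends[user] if f in disclosed]
    return Counter(labels).most_common(1)[0][0] if labels else None

def sanitize(user, k=1):
    """Remove k friendship links before releasing the data,
    reducing the signal available to the inference attack."""
    friends[user] = friends[user][k:]
```

Running `infer("alice")` before and after `sanitize("alice", 2)` shows the attacker's guess changing once links are withheld, mirroring the claim that sanitizing data degrades the mining algorithm's effectiveness.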