Yin Chen

January 2011
From Networks to Proteins: Modeling and Optimization with Markov Random Fields
Committee Members: Advisor: Ioannis Paschalidis, SE/ECE; Appointed Chair: David Starobinski, SE/ECE; Pirooz Vakili, SE/ME; Sandor Vajda, SE/BME; Venkatesh Saligrama, SE/ECE

Abstract: Markov random field (MRF) theory provides a consistent way to model context-dependent entities and correlated features. It is often used in conjunction with statistical decision and estimation theories to formulate objective functions in terms of established optimality principles. This dissertation applies MRF modeling and optimization in two areas: network security and protein docking.

The first problem considered is network anomaly detection. The objective is to detect statistically significant temporal or spatial changes in either the underlying process being monitored or the network operation itself. These changes may point to faults, threats, misbehavior, or other anomalies that require intervention. A new statistical anomaly detection framework is introduced that uses Markov models to characterize the “normal” behavior of the network. A series of Markov models are developed, including a node-level model, a tree-indexed model, and a networked Markov chain model to capture temporal and spatial features of network evolution. Large deviations techniques are employed to compare the empirical distribution (estimated from past activity traces) with its most recent empirical measure. Optimal decision rules are developed for each model to identify anomalies in recent activities. Simulation results validate the effectiveness of the proposed anomaly detection algorithms.

The second application lies in bioinformatics where the objective is to computationally predict the structure of a complex of two interacting proteins. In the refinement stage of a multistage protein docking process, side-chains are extracted from the docking interface and positioned optimally by minimizing an energy function which models interactions among atoms. This combinatorial optimization problem is formulated as a maximum weighted independent set (MWIS) problem. The MWIS problem is NP-hard, yet, this dissertation develops a distributed message passing algorithm that can produce effective feasible solutions. The approach is tested on a benchmark of 17 proteins and the results indicate that optimal side-chain positioning significantly improves the binding energy landscape, increasing the correlation between energy and root mean square deviation (RMSD) from the native structure. This new side-chain positioning procedure has the potential to accelerate refinement algorithms seeking very accurate predictions of the native structure and thus serve as an important step in computational docking protocols.