Traffic Cop on the Information Superhighway

By Cara Feinberg

There is a good chance you own a zombie. It could have been infected by a website you visited, or by a link you clicked in an email from a trusted friend. Or maybe you didn’t do anything at all to compromise your computer, and an attacker still slipped past your firewall and turned your machine into a virulent drone.

One thing, however, is nearly certain: if your computer has been breached, you may never know it.

Computer viruses, explains Computer Science Professor Mark Crovella, have changed drastically in the last five years. Where attacks were once obvious to victims (computers might slow down, or run unwanted programs), users now fall prey to unseen intruders. Small programs called “bots” slip through weak spots in a PC’s defenses; from there, they can take up secret residence on the computer and download complex instructions from a remote controller elsewhere on the Internet. Poised to do the bidding of their masters, these zombie computers become part of “botnets,” networks of thousands or even millions of computers that can cause all sorts of trouble: scanning the Internet for other vulnerable computers; sending out reams of spam email to ensnare other users; capturing keystrokes or online transactions to steal passwords and credit card numbers; or launching distributed denial-of-service (DDoS) attacks that overload and shut down other websites.

These are the threats that Mark Crovella and his team of BU computer scientists and statisticians spend their days hunting down. Their goal is not to treat your ailing computer, or even to bolster the security you already have. Instead, they aim to identify unwanted Internet traffic so that network providers can stop it before it ever reaches your PC.

Mark Crovella

“For the most part, we tend to leave security to the virus protection programs we buy and install on our PCs,” says Crovella. These programs typically try to identify malicious software by looking for a signature (a known sequence of code, or a telltale feature of the content) and then blocking anything that carries it. For instance, says Crovella, “There are programs that can tell a computer to block email with the word ‘Viagra’ in the title.” This can be effective, he says, but only for a little while. “An adversary just has to change one letter in its signature—to ‘Vi@gra,’ for example—and they’re in.”
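To see why exact signature matching is so easy to dodge, here is a minimal Python sketch of a naive subject-line filter. The function name `signature_filter` and the one-word signature list are hypothetical illustrations, not the logic of any real antivirus product.

```python
def signature_filter(subject: str, signatures: list[str]) -> bool:
    """Return True (block the message) if the subject contains a known signature.

    Hypothetical illustration: real products use far richer signatures,
    but they share the same weakness, because matching is exact.
    """
    lowered = subject.lower()
    return any(sig.lower() in lowered for sig in signatures)


SIGNATURES = ["viagra"]  # hypothetical signature list

print(signature_filter("Cheap Viagra here", SIGNATURES))  # True: blocked
print(signature_filter("Cheap Vi@gra here", SIGNATURES))  # False: slips through
```

A single swapped character defeats the match, which is why looking at how traffic behaves, rather than at what it contains, is attractive.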

Rather than trying to define the properties of unwanted traffic, Crovella’s strategy is to paint a picture of what “normal” Internet usage looks like. Using software they designed themselves, he and his team capture and analyze anonymized traffic information in five-minute intervals as the data flows through thousands of routers around the world.
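As a rough illustration of these five-minute snapshots, the Python sketch below sums the bytes each router reports into one row per interval, producing a table of time intervals by routers. The record format `(timestamp_seconds, router_id, byte_count)` and the name `traffic_matrix` are hypothetical; the team’s actual software and data format are not described here. Only traffic volumes are kept, not the content of any message.

```python
import numpy as np

BIN_SECONDS = 300  # five-minute intervals, as described in the article


def traffic_matrix(records, routers):
    """Build a (time_bins, routers) array of bytes seen per five-minute bin.

    records: iterable of (timestamp_seconds, router_id, byte_count) tuples
             (an assumed format for this sketch).
    routers: list of router ids; each becomes one column.
    """
    records = list(records)
    col = {r: i for i, r in enumerate(routers)}
    t0 = min(ts for ts, _, _ in records)
    n_bins = int((max(ts for ts, _, _ in records) - t0) // BIN_SECONDS) + 1
    X = np.zeros((n_bins, len(routers)))
    for ts, router, nbytes in records:
        X[int((ts - t0) // BIN_SECONDS), col[router]] += nbytes
    return X
```

Each column is one router’s traffic over time; placing many columns side by side is what makes a network-wide view possible.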

Unusual patterns—statistical anomalies in the amount or type of data being transferred—tip off Crovella and his team to potentially malicious activity. How these programs sneak into individual computers can change daily, even hourly. But their patterns of behavior, the ways they interact on the Internet, are nearly always outside the norm.

DDoS attacks, for instance, generate abnormally large amounts of traffic. Other features of the traffic, too, can reveal criminals at work. “If you see a large variety of Internet Protocol, or IP, addresses—numbers that identify individual computers—coming from one source in a short period of time, that kind of activity is statistically anomalous,” Crovella explains. And “anything outside of statistically normal traffic patterns is potentially malicious.”
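One simple way to turn that observation into a test is to count, for a single source, how many distinct addresses show up in its traffic during each five-minute bin, then flag bins far above that source’s usual level. This Python sketch assumes flow records of the form `(timestamp_seconds, src_ip, dst_ip)`; the helper names and the three-standard-deviation threshold are hypothetical choices, not the BU team’s parameters.

```python
from collections import defaultdict
from statistics import mean, pstdev

BIN_SECONDS = 300  # five-minute bins


def distinct_peers_per_bin(flows, source):
    """Map each five-minute bin to the number of distinct addresses `source` contacted.

    flows: iterable of (timestamp_seconds, src_ip, dst_ip) tuples (assumed format).
    """
    peers = defaultdict(set)
    for ts, src, dst in flows:
        if src == source:
            peers[int(ts // BIN_SECONDS)].add(dst)
    return {b: len(addrs) for b, addrs in peers.items()}


def anomalous_bins(counts, k=3.0):
    """Flag bins whose count exceeds the mean of the other bins by more than k std devs."""
    flagged = []
    for b, c in counts.items():
        others = [v for ob, v in counts.items() if ob != b]
        if len(others) < 2:
            continue  # not enough history to form a baseline
        if c > mean(others) + k * pstdev(others):
            flagged.append(b)
    return sorted(flagged)
```

A check like this looks at one source in isolation; the multivariate approach described below considers many measurement points together.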

Other researchers and companies have tried similar techniques, but only by looking at one router at a time. The multivariate statistical approach developed at BU instead considers traffic from many routers together. “Suddenly,” says Crovella, “activity outside the norm stands out in a way it never could if you were looking for it at each source.”

His technique, which is based on a method called Principal Component Analysis (PCA), has been licensed to Guavus, a venture-backed bi-national company led by Anukool Lakhina, one of Crovella’s former PhD students. It is now being used by GÉANT, Europe’s main multi-gigabit computer network for research and academic purposes.
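For readers curious how principal component analysis can separate normal from abnormal traffic, here is a minimal NumPy sketch in the spirit of the published “subspace” approach: stack traffic counts into a matrix (rows are time bins, columns are links or routers), treat the top k principal components as the “normal” pattern, and score each time bin by how much of its traffic falls outside that pattern. The function names, the synthetic data, the choice of k, and the simple mean-plus-three-standard-deviations cutoff are all assumptions for illustration. This is not the licensed Guavus or GÉANT implementation, and published versions set the cutoff with a formal statistical test rather than a fixed multiple.

```python
import numpy as np


def residual_scores(X, k):
    """Squared prediction error (SPE) per time bin.

    X: array of shape (time_bins, links), traffic volume per link per bin.
    k: number of principal components treated as "normal" traffic.
    """
    Xc = X - X.mean(axis=0)                      # center each link's time series
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:k].T                                 # (links, k) basis of the normal subspace
    residual = Xc - Xc @ P @ P.T                 # traffic the normal subspace cannot explain
    return np.sum(residual ** 2, axis=1)


def flag_anomalies(X, k, n_std=3.0):
    """Return indices of time bins whose SPE exceeds mean + n_std * std (simplified cutoff)."""
    spe = residual_scores(X, k)
    return np.flatnonzero(spe > spe.mean() + n_std * spe.std())


# Synthetic demo: 10 links driven by two shared periodic patterns plus small noise.
rng = np.random.default_rng(0)
t = np.arange(200)
patterns = np.column_stack([np.sin(2 * np.pi * t / 50), np.cos(2 * np.pi * t / 50)])
loadings = rng.uniform(1.0, 3.0, size=(2, 10))
X = 20.0 + patterns @ loadings + rng.normal(0.0, 0.1, size=(200, 10))
X[120, 3] += 5.0  # inject a spike on one link in one time bin

print(flag_anomalies(X, k=2))
```

With k=2 the two shared patterns count as normal, so the injected spike is what remains in the residual. Choosing k is a judgment call: set it too high and a large anomaly can be absorbed into the “normal” subspace.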

Crovella continues to refine the technique at BU. This type of analysis requires collecting an immense amount of data, and although computers amass and evaluate it, Crovella and his team must validate the results themselves before submitting their research for publication. This manual examination of multiple terabytes (a terabyte is 1,000,000,000,000 bytes, or one trillion if you lost count of the zeros) requires not only a great deal of time and patience but also expertise in both computer science and statistics. Crovella typically works with two or three students at a time and one to three other faculty investigators. It is a small army against a growing enemy: every day, according to Symantec MessageLabs, approximately 151 billion unsolicited messages are distributed by compromised computers.

“A year or so ago, I discovered my own PC was infected with a botnet,” Crovella said, smiling and shaking his head. “The IT folks discovered it. I never knew it was there.”