Mark Crovella: Mapping the Internet in a New Age of Privacy

You can’t see it, but when you enter something in a search bar, a whole network of connections springs into action. We typically don’t think of the internet as having a map, but CISE faculty affiliate Mark Crovella, a founding member and faculty of Computing & Data Sciences, likens his work to figuring out what the map of the internet is. For 25 years, Crovella has been interested in studying the physical structure of the internet quantitatively through measures like latency (the time it takes a data packet to travel from one point to another). His most recent research focuses on network connectivity in private backbone infrastructures such as those of Google, Amazon, and Microsoft.

“There are enormous amounts of wires and switches deployed all over the world, creating the internet, but nobody has a map of the internet in the same way you can look at a map while driving a car. So we’re forced to go out and measure the internet to try and figure out what its characteristics are,” Crovella said.

Gradually, the cloud and content providers that dominate most of the web have started to build their own networks straight to the user instead of going through multiple service providers (like Sprint, AT&T, and Verizon). This means most pathways are hidden from view. One reason providers obscure their pathways is to protect their equipment: revealing where their routers are located and what they’re doing could open up a host of problems. Another is competitive advantage. The relationship between Google and another site can be inferred from the amount of traffic going from Google to that site, so hiding it keeps that information from competitors.

Photo by Mike Spencer for Boston University Photography

With these private backbone infrastructures, traceroute, the standard tool for mapping the public internet, can no longer be relied on to evaluate network connectivity. Some companies manipulate traceroute so that the information it returns is not completely accurate. Other providers have disallowed traceroute in their networks altogether.

Without traceroute, it’s harder to obtain useful insight into network structures. There is nothing holding companies accountable: they are able to make claims about the performance of their networks without verification from outside sources. It’s also harder for researchers such as Crovella to see what the maps of the internet look like. To get around this problem, Crovella and his team have used light-weight measurements combined with heavy-weight mathematical analysis tools.

One such light-weight measurement is the end-to-end round-trip time (RTT) of information going through the network. RTT measurements can then be augmented with geolocation and path endpoint information. Through triangulation, which combines measurements taken from different locations, Crovella was able to estimate the distance between a user and a router on the map. This meant he could geolocate where a router was in the world.
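To make the idea concrete, here is a minimal sketch of how an RTT can bound distance. It is not the team's method; it assumes signals travel through fiber at roughly two-thirds the speed of light (a common rule of thumb), and the function names and sample coordinates are hypothetical. A candidate router location is kept only if it lies within the distance bound implied by every measurement.

```python
import math

# Assumption: signals in optical fiber travel at about 2/3 the speed of light.
SPEED_OF_LIGHT_KM_PER_MS = 299.792458
FIBER_KM_PER_MS = SPEED_OF_LIGHT_KM_PER_MS * 2 / 3


def max_distance_km(rtt_ms):
    """Upper bound on one-way distance implied by a round-trip time."""
    return (rtt_ms / 2) * FIBER_KM_PER_MS


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points on a spherical Earth."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def is_feasible(candidate, measurements):
    """True if the candidate (lat, lon) is within the RTT bound of every
    vantage point. Each measurement is (lat, lon, rtt_ms)."""
    lat, lon = candidate
    return all(
        haversine_km(lat, lon, v_lat, v_lon) <= max_distance_km(rtt_ms)
        for v_lat, v_lon, rtt_ms in measurements
    )


# Hypothetical example: could a router seen from Boston be in Chicago?
BOSTON = (42.36, -71.06)
CHICAGO = (41.88, -87.63)
print(is_feasible(CHICAGO, [(*BOSTON, 20.0)]))  # 20 ms RTT from Boston
print(is_feasible(CHICAGO, [(*BOSTON, 5.0)]))   # 5 ms RTT from Boston
```

Because real paths are rarely straight, the bound only rules locations out; it never pins one down exactly. Adding more vantage points shrinks the feasible region.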

However, Crovella said there’s not a simple relationship between points on the map and the latency between those points, meaning distance cannot be directly predicted from the time it takes to go from one point to another. He compared this to traveling from Boston to Provincetown. The most direct route would be straight across the ocean, but this isn’t possible by car, so one must drive south and along the Cape before reaching Provincetown. The drive by land takes much longer than if one were to “drive” across the ocean. Likewise, a data packet traveling from a user in Chicago to Denmark isn’t necessarily taking the most direct route.

In addition to this, routes can be curved. What might appear to be a straight line from one node to another may actually be curved, similar to how the Earth appears flat to someone standing on it but is in reality a sphere.

Crovella drew on an analogy to Einstein’s theory of relativity to conclude that the paths between nodes might be curved. To determine the distance, he used Riemannian geometry, a heavy-weight analysis tool that deals with continuous surfaces. However, most computer science deals with graphs.

The problem is that the internet has no surfaces. It’s just switches and links. But mathematicians have now started to apply Riemannian geometry to graphs.
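As a rough illustration of what curvature on a graph can mean, the sketch below computes augmented Forman-Ricci curvature, one of the simpler combinatorial notions of edge curvature. This is a stand-in chosen for brevity; the article does not say which curvature definition the team used, and the function name and toy network are hypothetical. For an unweighted edge (u, v), the value is 4 - deg(u) - deg(v) + 3 × (number of triangles containing the edge).

```python
def augmented_forman_curvature(edges):
    """Return {edge: curvature} for an undirected, unweighted graph.

    Curvature of edge (u, v) = 4 - deg(u) - deg(v) + 3 * triangles(u, v),
    where triangles(u, v) counts the common neighbors of u and v.
    """
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    curvature = {}
    for u, v in edges:
        triangles = len(neighbors[u] & neighbors[v])
        curvature[(u, v)] = (
            4 - len(neighbors[u]) - len(neighbors[v]) + 3 * triangles
        )
    return curvature


# Toy network: two well-connected clusters joined by a single link.
toy_edges = [
    ("a", "b"), ("b", "c"), ("a", "c"),   # cluster 1
    ("d", "e"), ("e", "f"), ("d", "f"),   # cluster 2
    ("c", "d"),                           # the only link between them
]

for edge, value in augmented_forman_curvature(toy_edges).items():
    print(edge, value)
```

In this toy network the links inside each cluster come out positive, while the lone link joining the clusters comes out negative, which is the kind of signal that flags a bottleneck.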

Using this math, Crovella and his team were able to see the paths that packets were taking in different cloud service providers such as AWS, Azure, and Google Cloud. Using a manifold view, they turned the graphs of packet paths into a map of the world represented with elevation. The elevation at each point reflects how many different paths a packet can take. The deeper the valley, the more limited the paths are between two locations.

Crovella said the deep valleys between Europe and Asia arise because the Red Sea is a chokepoint for data traffic, since cables there have to go underwater. Some cloud providers, such as AWS, are able to find better routes that go around this chokepoint.

His research can help companies determine where to place new infrastructure to help with connectivity issues. Recently, Crovella and his team were invited to present to the Google Networking group.

“The internet is being built without any centralized control or direction. Nobody is deciding on a global level where things should be built and located,” Crovella said. “I liken my work to being a biologist who discovers a new organism and wants to describe it and figure out how it works. We think of what we do as discovery-oriented more than engineering-oriented.”

Loqman Salamatian, Scott Anderson, Joshua Matthews, Paul Barford, Walter Willinger and Mark Crovella (2022).
Curvature-based Analysis of Network Connectivity in Private Backbone Infrastructures.
In: Proceedings of ACM SIGMETRICS / IFIP Performance. Mumbai, India.