The Math Behind Vision

Scientists have long known that the brain uses shortcuts to glean information from the senses. In vision, for instance, we don’t need to see all the nuances of an object to recognize it. A simple, shaky line drawing of an apple or a house is instantly recognizable as the object depicted. In 1980, the late David Marr, then a professor in MIT’s Artificial Intelligence Laboratory, attempted to explain this phenomenon using a theory of visual processing known as edge detection.


Marr postulated that humans determine information about images by finding edges, that is, lines of contrast that delineate one object from another adjacent one. A line drawing thus represents what the brain is already doing—outlining the world using edges separating discrete objects.

Being aware of this shortcut has made it possible for scientists to better understand vision, and for computer scientists to simulate visual perception through edge detection algorithms. One of Marr’s most trenchant observations was that different edges appear at different visual length scales. In an aerial view of a city, for instance, different structures come into view at different altitudes, from the gross outlines of neighborhoods to the fine borders of individual yards. Or consider the way that different edges emerge when you blur an image to different degrees. Marr conjectured that by locating only a picture’s edges at various levels of blurring (what are called multiscale edges), it would be possible to work backward mathematically from this edge information and reconstruct the full original image.
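To make the idea of multiscale edges concrete, here is a minimal one-dimensional sketch. It is only an illustration, not the method used in the proof discussed in this article: it assumes that an "edge" is a point where the second derivative of a Gaussian-blurred signal changes sign, in the spirit of Marr's edge detection theory. The function names, the toy signal, and the `min_slope` threshold are all hypothetical choices made for this example.

```python
import numpy as np


def gaussian_kernel(sigma):
    """Discrete Gaussian weights, truncated at 4 sigma and normalized to sum to 1."""
    radius = int(np.ceil(4 * sigma))
    x = np.arange(-radius, radius + 1)
    weights = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return weights / weights.sum()


def blur(signal, sigma):
    """Blur a 1-D signal with a Gaussian of width sigma (same output length)."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    # Repeat the end values so the borders do not create artificial edges.
    padded = np.pad(signal, radius, mode="edge")
    return np.convolve(padded, kernel, mode="valid")


def edges_at_scale(signal, sigma, min_slope=0.01):
    """Indices where the blurred signal's second derivative changes sign
    while the slope is at least min_slope (an illustrative threshold)."""
    smooth = blur(np.asarray(signal, dtype=float), sigma)
    slope = np.gradient(smooth)
    curvature = np.gradient(slope)
    sign_change = curvature[:-1] * curvature[1:] < 0
    steep = np.abs(slope[:-1]) >= min_slope
    return np.flatnonzero(sign_change & steep)


def make_demo_signal():
    """Toy 'image row': one large step plus one small, narrow bump."""
    signal = np.zeros(200)
    signal[100:] = 1.0      # coarse structure: a big step in brightness
    signal[40:43] += 0.2    # fine structure: a small three-sample bump
    return signal


signal = make_demo_signal()
for sigma in (1.0, 8.0):
    edges = edges_at_scale(signal, sigma).tolist()
    print(f"sigma={sigma}: edges near samples {edges}")
```

In this toy signal, the narrow bump should register as edges only at the fine scale, while the large step is found at both scales. The collection of edge locations across all blur levels is the kind of data from which Marr's conjecture proposes to reconstruct the original image.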

This principle has been used extensively in computer science for practical applications like facial recognition and image processing, and has informed neuroscience research on vision in humans and other animals. But Marr’s conjecture also set up a fundamental problem for mathematicians: his approach made practical sense, but could his conjecture be proven mathematically?

“People grabbed onto this as a mathematical conjecture,” says Mark Kon, professor of mathematics and statistics at BU. Several scholars attempted to tackle the problem, and a team of mathematicians even succeeded in showing the conjecture false for images that were infinite in size, but no one could prove or disprove the conjecture for real images, which have finite boundaries.

Kon’s research deals with statistics and applied mathematics, so he has long been interested in questions that connect to problems in the real world, particularly when teaching students. Several years ago, while discussing Marr’s problem in one of his classes, Kon hit upon the idea of attacking it with multipole expansions, a mathematical tool used in physics to describe electromagnetic and gravitational fields. He assigned the task of investigating the idea to Ben Allen, then a PhD student at BU and now a postdoctoral fellow at Harvard University. Together they produced a mathematical proof of Marr’s conjecture for finite images. The proof is under review at Annals of Mathematics, three decades after the conjecture was first put forth.

Their work also offers proof of the kind of flourishing and productive conversations that can take place between biology, computer science, and pure mathematics. In proving the conjecture behind a practical vision tool like multiscale edge analysis, Kon says, “We’ve put the icing on the cake.”

Research at Boston University BU 2011 Building Smarter Machines
