NRI: FND: A Formal Methods Approach to Safe, Composable, and Distributed Reinforcement Learning for co-Robots

Sponsor: National Science Foundation (NSF)

Award Number: IIS-2024606

PI: Calin Belta

Abstract:

Many applications require heterogeneous teams of robots to collaborate with each other and with humans to accomplish complex tasks. Consider, for example, a futuristic robotic restaurant, in which the goal is to make hotdogs and serve them together with drinks to incoming customers. A couple of robotic manipulators have sensors and actuators allowing them to manipulate and grill the hotdogs, put them in buns, and add spices. Another robot has a gripper that allows it to handle glasses and pour drinks. Mobile wheeled robots can move around the restaurant, greet the customers, and then serve them hotdogs and drinks. A human supervisor gives the robotic team high-level task specifications, together with some useful facts, and then watches the team work. If something goes wrong, or the robots do not manage to coordinate efficiently, the supervisor can intervene and give more instructions. Many other application areas share similar scenarios, including agriculture, military surveillance, and search and rescue. This project proposes an approach to such problems that exploits the robots’ manipulation and cooperation capabilities and allows for rich task specifications and interactions with humans, while at the same time ensuring the safety of the overall operation. The research plan is integrated with an education and outreach plan that includes a rich spectrum of robotics-related activities for undergraduate and high-school students.

The proposed technical approach brings together tools from machine (reinforcement) learning, formal methods, and optimal control. A rich, easy-to-understand temporal logic specification language will be developed to formalize requirements such as the ones from the example above, and to specify prior knowledge. Central to the computational framework is a metric that measures the satisfaction of the specifications, which will be used to guide the learning process. This metric will be combined with control barrier functions to ensure safety. The proposed approach is compositional: policies for new tasks will be constructed from a library of learned policies with little to no additional exploration.
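
To make the idea of a satisfaction metric concrete, the following is a minimal sketch and not the project's actual specification language or metric, neither of which is defined in this abstract. It assumes a hypothetical task, "eventually reach the goal and always stay clear of an obstacle", evaluated on a discrete-time 2-D trajectory, and it uses the min/max quantitative semantics commonly associated with signal temporal logic. The function names, the radii, and the scenario are all illustrative assumptions.

```python
import math


def eventually(margins):
    """Quantitative 'eventually': the best margin reached at any time step."""
    return max(margins)


def always(margins):
    """Quantitative 'always': the worst margin over all time steps."""
    return min(margins)


def robustness(trajectory, goal, obstacle, goal_radius, obstacle_radius):
    """Score a trajectory against the hypothetical task
    'eventually reach the goal AND always avoid the obstacle'.

    A positive score means the task is satisfied; a negative score means it
    is violated. The magnitude indicates by how much.
    """
    # Margin for "inside the goal region": positive once the point is inside.
    reach = [goal_radius - math.dist(p, goal) for p in trajectory]
    # Margin for "outside the obstacle region": positive while the point is clear.
    avoid = [math.dist(p, obstacle) - obstacle_radius for p in trajectory]
    # Conjunction is the minimum of the two sub-scores.
    return min(eventually(reach), always(avoid))


if __name__ == "__main__":
    goal, obstacle = (2.0, 0.0), (1.0, 2.0)
    safe_path = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
    unsafe_path = [(0.0, 0.0), (1.0, 1.5), (2.0, 0.0)]
    print(robustness(safe_path, goal, obstacle, 0.5, 1.0))
    print(robustness(unsafe_path, goal, obstacle, 0.5, 1.0))
```

A score of this kind could serve as a reward signal during learning, since it grades how well a trajectory meets the specification instead of returning only pass or fail. The control barrier function component mentioned above is not shown here.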
