CAREER: Advancing AI for Accessibility with User Feedback
Sponsor: National Science Foundation
Award Number: 2443719
PI: Eshed Ohn-Bar
Abstract: Artificial Intelligence (AI) and accessible computing have tremendous potential to improve users' quality of life. Despite significant advances in AI, the needs and preferences of people with low vision (e.g., older adults and those who are blind or visually impaired) are not adequately considered. A multimodal AI model may misinterpret critical directional information in navigation tasks, creating potential safety hazards. When users attempt to provide detailed feedback, the model often fails to follow the request: it may respond with long, inaccurate descriptions, or fail to recognize the user's needs and instead focus on irrelevant information. This research highlights a critical gap in AI's ability to quickly recognize and adapt to the specific interaction needs of users with varying abilities. One outcome of this research will be the first publicly available, large-scale multimodal dataset with labeled preferences from users with low vision, aligning training data with the goal of AI that works effectively and safely with all users. The research also emphasizes and supports scalable multimodal systems for goal-oriented language generation and real-world decision-making tasks. By pre-training multimodal large language models and incorporating generalized user feedback, the project will make these AI-based systems more usable. The methods and results will support people with low vision but can be adapted to address other disabilities, allowing AI-based systems to better serve users with different abilities across varied contexts.
This project addresses a critical gap in computing associated with artificial intelligence (AI) by embedding comprehensive knowledge of accessibility, including user preferences, into interactive, goal-oriented AI-based agents. Aim 1 will create the first publicly available, large-scale multimodal dataset, AccessBench, with labeled preferences from low vision users; the approach can be extended to people with other abilities. AccessBench facilitates the training and evaluation of multimodal large language AI models tailored to user needs in different contexts, aligning the data with the goal of collaborating effectively and safely with end-users of different abilities. The approach includes detailed feedback from expert users and people with lived experience of low vision to identify limitations in existing reward modeling and instruction generation methods, enabling better AI models and system designs that account for user preferences and yield more helpful systems at scale. Aim 2 will produce pre-trained and fine-tuned multimodal large language models (MLLMs), known as AccessAgent, together with a feedback-based framework for instilling usability knowledge into MLLMs and facilitating diverse downstream HCI and usability tasks for various users and contexts. A further outcome will be a large-scale preference benchmark derived from Aim 1 in collaboration with industry partners. The project will revisit standard MLLM training pipelines to explore and address robustness issues related to preferences and user abilities throughout the development pipeline. A key outcome of this project is the release of a novel pre-trained, fine-tuned, and aligned MLLM at various capacities and sizes as an open-source project for the community to use.
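Reward modeling from labeled preferences, as referenced above, is typically learned from pairwise comparisons: an annotator marks which of two model responses they prefer, and a scoring model is trained to rank the preferred one higher. The following is a minimal illustrative sketch of this idea using a Bradley-Terry pairwise loss over a toy linear reward model; the feature names and data here are hypothetical, and this is not the project's actual method.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def reward(w, features):
    """Toy linear reward model: score = w . features.
    Features are hypothetical (e.g., conciseness, relevance of a description)."""
    return sum(wi * xi for wi, xi in zip(w, features))

def bt_loss(w, chosen, rejected):
    """Bradley-Terry pairwise preference loss: -log sigma(r_chosen - r_rejected)."""
    return -math.log(sigmoid(reward(w, chosen) - reward(w, rejected)))

def train_step(w, chosen, rejected, lr=0.5):
    """One gradient step; dL/dw = -(1 - sigma(d)) * (chosen - rejected),
    where d is the reward margin between the chosen and rejected responses."""
    d = reward(w, chosen) - reward(w, rejected)
    scale = -(1.0 - sigmoid(d))
    return [wi - lr * scale * (c - r) for wi, c, r in zip(w, chosen, rejected)]

# Toy preference pair: the "chosen" response scores higher on both features.
chosen, rejected = [1.0, 0.9], [0.2, 0.3]
w = [0.0, 0.0]
for _ in range(50):
    w = train_step(w, chosen, rejected)
```

After a few gradient steps the reward model scores the preferred response higher than the rejected one; that learned scoring signal is what preference-based fine-tuning pipelines then use to align a generative model with user feedback.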