Online image descriptions help people who are blind or have low vision access information every day. These bits of "alt text" are a valuable resource, providing the context and detail needed to interact meaningfully, securely and efficiently with the websites and social media platforms that make up everyday modern life.
However, website developers and social media users often neglect the important task of writing alt text. Because the task is so often skipped, researchers are exploring how artificial intelligence can be used to recognize image content and generate useful descriptions to fill the gap.
It's a complex problem that CU Boulder alumna Abigale Stangl has been working to untangle for years. She recently helped lead a multi-university study focused on how to create training materials that humans and artificial intelligence can use to author more useful image descriptions.
The work, published at the ACM SIGACCESS Conference on Computers and Accessibility, extends prior research in the field of human-computer interaction indicating that blind people want different information for images found on different media sources, she said.
“To further investigate how to author image descriptions that are responsive to the context in which they are found, we presented 28 people who are blind with as much information as possible about five images and then asked them to specify what information they would like about the image for the different scenarios,” Stangl said.
“Each scenario contained a media source in which an image is found and a predetermined information goal. For instance, we considered a person visiting a shopping website to find a gift for a friend as a potential scenario.”
Stangl said the work provided several key findings. One was that the information blind people want in an image description changes based on the scenario in which they are encountering the image.
“For alt-text to be accurate, both human and AI systems will need training to author image descriptions that are responsive or context-aware to the user's information goal along with where the image is found,” she said.
Other findings suggest that there are some types of information that blind people want for an image across all scenarios, and thus it may be possible to determine what image content should always be included in image descriptions.
Stangl earned her PhD in Technology, Media and Society from the ATLAS Institute in 2019. She is currently a Computing Research Association Computing Innovation Fellow (CIFellow), funded by the National Science Foundation, working remotely with the University of Washington.
One of her co-authors on this paper was Assistant Professor Danna Gurari, who recently joined the Department of Computer Science at CU Boulder.
Stangl first met Gurari through her PhD advisor, Associate Professor Tom Yeh, as Yeh and Gurari are both part of the broader computer vision community.
“As a PhD student, Professor Yeh encouraged me to pursue my interests in non-visual accessibility and the design of tactile media with blind people. Just as I was finishing my dissertation, he learned of Professor Gurari’s mutual interest in making visual information accessible to people who are blind and made our introduction,” Stangl said.
“We both wanted to work on projects with real-world application and impact. She supported me in getting a Bullard Postdoctoral Fellowship at the University of Texas at Austin and guided me to conduct user-centered research on improving automated image description technologies in partnership with Microsoft’s Research-Ability Initiative.”
Stangl added that, in Gurari, CU Boulder has gained a great professor, mentor and researcher who is leading efforts in accessible and ethical AI research.
During her PhD studies, Stangl volunteered with the Anchor Center for the Blind, the Colorado Center for the Blind and the National Federation of the Blind to better understand the barriers blind people face in gaining access to information and in becoming artists and designers themselves. She said she has always been motivated to make sure that end-users and stakeholders are involved in the design process.
“My research with professor Gurari was essentially a proof of concept that one-size-fits-all image descriptions do not meet the access needs of blind people. In it, we provide reflections and guidance for how our experimental approach may be used and scaled by others interested in creating user-centered training materials for context-aware image descriptions – or at least minimum viable image descriptions,” she said. “I am looking forward to continuing it and exploring new approaches and problems in the near future.”