
Research Associate Tom Sgouros And Brown CS Students Use Sound And AI To Make NASA Imagery Accessible

How would you describe this image to someone who can't see?

"Pivoting is a lot of what I do," Brown CS Research Associate Tom Sgouros says of a current project. It began in a familiar research area, virtual reality, and evolved in two different directions, resulting in work that offered unexpected depths along the route to an important and often neglected goal: aiding the visually impaired.

"Sometimes," he notes, "you don't anticipate the interesting questions that'll appear until you dive into something. Certain things sound trivial until you actually get going with them, and then you see serious challenges that are worth real consideration, and all sorts of philosophical questions arise."

The story begins when Tom met Kimberley Kowal Arcand, a visualization scientist and science communicator for NASA's Chandra X-ray Observatory, during a tour of the Yurt Ultimate Reality Theatre (YURT), Brown's world-class virtual reality tool. "I've got some data," Sgouros remembers her saying, and he was soon working on 3D reconstructions of distant stellar objects for the Smithsonian Astrophysical Observatory (SAO), taking viewers on a magic carpet ride through supernovas and nebulae.

The work was going well, Tom says, but things soon took an unexpected turn, the result of a NASA request to make the visualizations accessible to visually impaired people. Essentially, they wanted a VR experience that could be navigated with the ears.

At first, Sgouros was skeptical.

"But then you start thinking," he says, "and you realize it's the right thing to do, and there are some interesting problems. How do you reveal the positions and the important features of the data you were displaying to people who might not be able to see?" The team that he assembled was ready to conduct user trials in April of 2020, but the COVID-19 pandemic interfered, making the sharing of headsets undesirable.

So they pivoted, switching to ordinary smartphones. Being able to use commodity equipment instead of bleeding-edge VR technology enabled them to make rapid strides, Sgouros says, and in less than two years they'd created a sonification web app that let users explore stellar phenomena with their fingers, hearing different sounds as they navigated distant galaxies. Encouraged by the results of user trials, they pivoted again: this time, to a native phone app in order to incorporate haptic feedback.
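To make the idea concrete, here is a minimal sketch of the kind of mapping such an app might use, with the brightness of the image under the finger driving the pitch of a tone. The function names, image format, and frequency range are assumptions for illustration, not the team's actual code:

```python
# A minimal sketch (not the team's actual app) of the core sonification idea:
# map the brightness of the image under a touch point to the pitch of a tone.
# The image format, function names, and frequency range are illustrative assumptions.
import numpy as np

SAMPLE_RATE = 44100  # audio samples per second

def brightness_to_frequency(brightness: float, lo_hz: float = 220.0, hi_hz: float = 880.0) -> float:
    """Linearly map a 0..1 brightness value to a pitch between lo_hz and hi_hz."""
    return lo_hz + brightness * (hi_hz - lo_hz)

def tone_for_touch(image: np.ndarray, x: int, y: int, duration_s: float = 0.1) -> np.ndarray:
    """Return a short sine-wave buffer for the pixel under the user's finger."""
    brightness = image[y, x] / 255.0  # normalize an 8-bit grayscale pixel
    freq = brightness_to_frequency(brightness)
    t = np.linspace(0.0, duration_s, int(SAMPLE_RATE * duration_s), endpoint=False)
    return 0.5 * np.sin(2.0 * np.pi * freq * t)  # mono samples in [-0.5, 0.5]

# A synthetic "image" with one bright region: dragging across it raises the pitch.
image = np.zeros((100, 100), dtype=np.uint8)
image[40:60, 40:60] = 255
for x in (10, 50, 90):  # simulated finger positions along one row
    samples = tone_for_touch(image, x, 50)
    print(f"x={x}: {brightness_to_frequency(image[50, x] / 255.0):.0f} Hz, {len(samples)} samples")
```

A production app would of course stream continuous audio and respond to live touch events, but the mapping from position in the data to sound is the heart of the approach.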

Earlier this fall, new funding from the SAO added yet another facet to their emerging research: using large language models to generate alt text (text read aloud by screen readers to assist users with vision impairment) for the astronomical images. Tom was immediately intrigued. When he reached out to the Brown CS Gigs mailing list, where users can share job postings with CS students, he was delighted to receive more than two dozen responses, including three from visually impaired CS students. With the help of an independent study project sponsored by Brown CS faculty member Jeff Huang, he added some additional undergraduates to his team.


A slide from a presentation by Tom's research team, showing LLM-generated text

This part of the project, Tom says, has been immensely fun: "Partly because of how well it works, but partly because of how it doesn't work. We noticed pretty quickly that the AI would do things like describing images as having colors that weren't really there. We broke the process into three steps: building an evaluator to grade descriptions, generating descriptions, and then fact-checking them, and what's sort of funny is that we've used LLMs in every step. We have fourteen different axes of performance, and clustering analysis seems to indicate that we can identify 'good' descriptions – they end up in a particular corner of the graph."
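For readers curious what that three-part setup might look like in code, here is a minimal sketch of one plausible arrangement of the pieces. The `call_llm` helper, the prompts, and the axis names are assumptions for illustration, not the team's actual implementation:

```python
# A minimal sketch of the three-step shape Tom describes: generate a description,
# fact-check it against what is actually known about the image, then grade it.
# `call_llm`, the prompts, and the axis names are illustrative stand-ins, not the
# team's actual code; wire `call_llm` to a real model API before running.
import json

AXES = ["accuracy", "completeness", "figurative_language", "everyday_comparisons"]  # 4 of the 14

def call_llm(prompt: str) -> str:
    """Placeholder for a call to whatever LLM the pipeline uses."""
    raise NotImplementedError("connect to a real model API here")

def generate_description(image_metadata: dict) -> str:
    """Step 1: draft alt text from the image's catalog metadata."""
    return call_llm(
        "Write alt text for this astronomical image:\n" + json.dumps(image_metadata)
    )

def fact_check(description: str, image_metadata: dict) -> str:
    """Step 2: flag hallucinations, like colors that aren't really there."""
    return call_llm(
        "List any claims in this description that the metadata does not support.\n"
        f"Description: {description}\nMetadata: {json.dumps(image_metadata)}"
    )

def evaluate(description: str) -> dict:
    """Step 3: score the description on each axis, e.g. {'accuracy': 8, ...}."""
    raw = call_llm(
        f"Score this alt text from 0 to 10 on each of {AXES}; reply as JSON:\n{description}"
    )
    return json.loads(raw)
```

Scored this way, each description becomes a point in a fourteen-dimensional space, which is what makes the clustering observation possible: the strong descriptions end up grouped together in one corner of that space.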

"I'm fascinated by the questions that arose," Tom tells us, "and the way our team members with vision impairments gave us insight into them. What actually constitutes a good description? When our LLM unexpectedly mentioned the 'inky vastness' of space, does that kind of poetic language help? And our consensus is that yes, a little bit of figurative language can be helpful. Our vision-impaired students mentioned that comparisons to everyday objects are useful, but what do we do when our LLM says that something looks sort of like a jellyfish?"

"We live in Rhode Island," Tom laughs, "so a jellyfish might be an everyday object around here, but now we have the question of what constitutes an everyday object! A lot of what makes the project interesting are these discussions about what constitutes a good alt text description.”

It's clear that other philosophical questions lie ahead, and Sgouros and his team are looking forward to them. Tom's current student researchers include Areshva Mir, Anna Ohrt, Mihnea Steiu, Effy Pelayo Tran, and Master's students Tianze (Etha) Hua and Yumeng Ma. Prior contributors include Ashley Chang, Elaine Jiang, Anika Mahns, Alex Stewart, Holly Zheng, and Master's student Grant Lee.

Earlier this month, the students who are getting credit for their work gave a presentation to Jeff's research group, and Tom's alt-text generation team recently presented to representatives of AI working groups from NASA and the Smithsonian Astrophysical Observatory (see here for their slides). A beta version of the sonification app will be ready in the spring, and Tom is hoping that the reception of both projects will lead to additional research funding from NASA and/or related government agencies. 

"When I look back," Tom says, "having visually-impaired students on my team has been one of the best parts of these projects. Getting their advice in-house makes a huge difference, and knowing we have people in the room who might be our users someday gives us an appreciation of how valuable the work we're doing actually is. They work hard and produce, and I'm sure some of the reason why is that they don't need to be told how incredibly important accessibility software is." 

For more information, click the link that follows to contact Brown CS Communications Manager Jesse C. Polhemus.