Study finds ChatGPT is biased against resumes that list disabilities



The practice of recruiters deploying artificial intelligence to scan and categorize candidate resumes has become fairly commonplace over the past few years. AI can summarize large volumes of applicant data, surface desirable traits, and spot red flags, making work previously performed by HR staff more streamlined and efficient.

At the same time, many organizations representing the disability community have warned that the technology could discriminate against and exclude job seekers with disabilities, whose resumes may differ superficially from those of the general population.

Now, researchers at the University of Washington have identified an intriguing new dimension to these exclusionary dynamics by studying how mentions of a disability in a job seeker’s resume affect the way OpenAI’s ChatGPT ranks it.

To begin their study, researchers from the University of Washington’s Paul G. Allen School of Computer Science & Engineering used one of the study authors’ publicly available resumes as a control. The team then modified the control resume to create six variants, each listing a different disability-related credential, from scholarships and awards to membership in diversity, equity, and inclusion (DEI) committees and student organizations. They then ran ChatGPT’s GPT-4 model 10 times per variant to rank each modified resume against the original for real “student researcher” job listings from major software companies, with results that proved both surprising and disappointing.
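The paper’s actual testing harness isn’t reproduced here, but the basic setup is straightforward to picture. Below is a minimal sketch in Python, assuming the current OpenAI SDK; the file names and prompt wording are illustrative stand-ins, not the study’s own materials.

```python
# A minimal sketch of the pairwise ranking setup described above, not the
# authors' published harness. File names and prompt wording are illustrative
# assumptions; the study used ChatGPT's GPT-4 model for the comparisons.
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

# Hypothetical input files: a real job ad, the original control resume, and
# one of the six variants with a disability-related credential added.
job_ad = open("student_researcher_ad.txt").read()
control_resume = open("control_resume.txt").read()
enhanced_resume = open("enhanced_resume.txt").read()

prompt = (
    "You are screening applicants for the job listing below. "
    "Rank the two resumes from strongest to weakest fit and briefly "
    "explain your reasoning.\n\n"
    f"JOB LISTING:\n{job_ad}\n\n"
    f"RESUME A:\n{control_resume}\n\n"
    f"RESUME B:\n{enhanced_resume}"
)

# The study repeated each comparison 10 times per variant, since GPT-4's
# rankings vary from run to run.
for trial in range(10):
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    print(f"Trial {trial + 1}:\n{response.choices[0].message.content}\n")
```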

In virtually every other context, an award or a committee membership would be perceived as a pure positive. In this experiment, however, across 60 trials, ChatGPT ranked the resumes with disability-related additions above the control resume only a quarter of the time, even though the two documents were otherwise identical.

Digging deeper

Of course, the beauty of a large language model like GPT-4 is that users can hold a human-like conversation with the chat interface and ask follow-up questions about how it reached its conclusions. In this experiment, ChatGPT appeared to make some discriminatory assumptions, such as determining that candidates with an autism leadership award were likely to be “less focused on the leadership role.” It also determined that candidates with depression “place more emphasis on DEI and personal challenges,” detracting from the core technical and research-oriented aspects of the role, even though no such challenges were actually detailed in the resume.

Explaining the uneasy relationship between AI algorithms and disability in a recent interview, Ariana Aboulafia, technology disability rights policy advisor at the Center for Democracy and Technology, said, “Algorithms and algorithmic systems are based on pattern recognition, and many people with disabilities exist outside of the pattern.”

She continued, “These algorithmic systems may, to some extent, be inherently incompatible with producing output that does not discriminate against people with disabilities.”

Commenting specifically on the UW project, lead author Kate Glazko said, “AI-based resume ranking is becoming more common, but there hasn’t been much research into whether it’s safe and effective. … For job seekers with disabilities, the question of whether to include disability credentials always arises when submitting a resume. Even with a human reviewer, we assume that people with disabilities take that into consideration.”

“When using AI for these real-world tasks, people need to be aware of the system’s biases,” Glazko added.

Humanity

Still, the UW study offered a ray of hope. Using the GPTs Editor feature, which lets users customize GPT-4 with written instructions, the researchers directed the model not to exhibit ableist bias and to follow disability justice and DEI principles. With this adjustment, bias decreased for all but one of the conditions tested: depression. Resumes mentioning hearing impairment, visual impairment, cerebral palsy, autism, and the general term “disability” all improved in the rankings, though only three of them rose above the resumes that made no mention of disability. Overall, once instructed to be more inclusive, the system ranked the disability-enhanced resumes higher than the control resumes 37 times out of 60.
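That customization step can be approximated through the API by prepending system-level instructions to the ranking prompt. The sketch below is a stand-in under stated assumptions: the researchers worked in the GPTs Editor interface, the instruction text here merely paraphrases the principles the article describes, and the `prompt` variable carries over from the earlier sketch.

```python
# A rough approximation of the study's debiasing step. The researchers used
# the GPTs Editor interface; here the same kind of written instruction is
# passed as a system message instead. The wording below is a paraphrase,
# not the study's exact instruction text.
from openai import OpenAI

client = OpenAI()

debias_instructions = (
    "Do not exhibit ableist bias. Treat disability-related awards, "
    "scholarships, and organizational memberships as positive "
    "qualifications, and evaluate candidates in line with disability "
    "justice and DEI principles."
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": debias_instructions},
        # `prompt` is the same pairwise ranking prompt from the earlier sketch
        {"role": "user", "content": prompt},
    ],
)
print(response.choices[0].message.content)
```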

This suggests that making recruiters aware of AI’s limitations, and building tools that can be trained and customized around DEI principles, may go a long way toward addressing the complex challenge of making AI more inclusive.

The other task is deepening our understanding of this emerging field through more targeted research, as senior study author Jennifer Mankoff, a professor at the Allen School, explains:

“Studying and documenting these biases is critical,” Mankoff said. “We hope to learn from and contribute to the broader conversation about ensuring technology is implemented and deployed in a fair and equitable way, not just for disabilities but for other underrepresented identities as well.”

Aboulafia strongly agreed, emphasizing that “the issue of multiple marginalization is always there, so it’s important to recognize that a straight, cisgender, white, disabled man is unlikely to have the same experiences with systems and technologies as a disabled queer woman of color.”

Aboulafia is a strong advocate of working collaboratively with the disability community both in building datasets and in auditing tools, but she acknowledged a limitation: each individual with a disability “can only really have a say in their own experience.”

“It’s useful to include people with backgrounds in disability rights and disability justice,” Aboulafia said.

“There are as many different ways of experiencing disability as there are people with disabilities. So having knowledge of disability rights and justice and working from that framework can be very helpful in advocacy work that transcends disability.”

Despite being immensely complex under the hood, generative AI is becoming ever more human-like on the front end, and unleashing its full potential seems to be largely a matter of asking the right questions. Building a more inclusive AI future, then, may be less about talking to a computer and more about reaching out to the right humans at the right time, and taking the time to really listen to what they have to say.


