The study, by University of Washington linguists, a Mozilla Foundation fellow, and Google research scientists, urges tech companies to protect the privacy and other rights of people whose data is used to train AI. Current practice, the authors argue, lacks both diversity and respect for participants, a shortfall that reflects broader problems with AI:
“The paper concluded that large language models contain the capacity to perpetuate prejudice and bias against a range of marginalized communities and that poorly annotated datasets are part of the problem.”
Marginalized groups are at high risk of discrimination from the output of biased algorithms, a problem that has come under increasing scrutiny in the past year. The authors note that changing the datasets themselves is not enough; the surrounding research culture must be overhauled as well.
Events over the past year have brought the machine learning community's shortcomings to light, shortcomings that have often harmed people from marginalized communities.