
Whose History? AI Uncovers Who Gets Attention in High School Textbooks

Natural language processing reveals huge differences in how Texas history textbooks treat men, women, and people of color.

Harnessing the power of machine learning, Stanford University researchers have measured just how much more attention some high school history textbooks pay to white men than to Black Americans, other ethnic minorities, and women.

In a new study of American history textbooks used in Texas, the researchers found remarkable disparities.

Hispanic students make up 52 percent of enrollment in Texas schools, for example, but Hispanic people received almost no mention in any of the textbooks: they accounted for less than one-quarter of one percent of the people mentioned by name.

By contrast, all but five of the 50 most-mentioned individuals were white men. Only one woman made that list (Eleanor Roosevelt), along with only four people of color. Former president Barack Obama came in 29th and Martin Luther King Jr. 30th, followed by Dred Scott and Frederick Douglass. Andrew Jackson, a slaveholder who contributed mightily to the genocide of Native Americans, got more mentions than anyone else.
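Figures like these come down to tallying who is mentioned and how often. The sketch below is a minimal illustration, not the Stanford team's pipeline. It assumes the names have already been pulled out of the text (how the study extracted and grouped names is not described here), and it uses a hypothetical `labels` mapping from each name to a group. It counts a group's share over distinct people, which is one reading of "people mentioned by name". The names and labels are placeholders, not textbook data.

```python
from collections import Counter


def top_mentioned(mentions, n):
    """Return the n most frequently mentioned names with their counts."""
    return Counter(mentions).most_common(n)


def group_share(mentions, labels, group):
    """Fraction of distinct named people whose (hypothetical) label is `group`."""
    people = set(mentions)
    if not people:
        return 0.0
    in_group = sum(1 for person in people if labels.get(person) == group)
    return in_group / len(people)


if __name__ == "__main__":
    # Placeholder data: each string is one mention of a named person.
    mentions = ["A", "A", "A", "B", "B", "C", "D"]
    labels = {"A": "group_1", "B": "group_1", "C": "group_2", "D": "group_1"}
    print(top_mentioned(mentions, 2))                # [('A', 3), ('B', 2)]
    print(group_share(mentions, labels, "group_2"))  # 0.25
```

Counting distinct people rather than raw mentions keeps one heavily cited figure from inflating a group's share. The ranking, by contrast, deliberately uses raw mention counts.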

Those are just the top-line numbers. Using the tools of natural language processing, or NLP, the researchers also quantified differences in how various groups were characterized.

White men were more likely to be associated with words denoting power, while women were more likely to be associated with marriage and families. African Americans were most likely to be associated with words of powerlessness and persecution, rather than with political action or government.
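One common way to measure such associations with word embeddings is to compare a target word's average cosine similarity to two sets of attribute words. This is the idea behind the Word Embedding Association Test. The sketch below illustrates that general technique and is not necessarily the method the Stanford researchers used. The attribute words, the placeholder targets `target_x` and `target_y`, and the three-dimensional vectors are all invented. Real vectors would come from an embedding model trained on the textbook text.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(word, attrs_a, attrs_b, vectors):
    """Mean similarity of `word` to attribute set A minus its mean similarity to set B.

    A positive score means `word` sits closer to A than to B in the embedding space.
    """
    w = vectors[word]
    mean_a = sum(cosine(w, vectors[a]) for a in attrs_a) / len(attrs_a)
    mean_b = sum(cosine(w, vectors[b]) for b in attrs_b) / len(attrs_b)
    return mean_a - mean_b


# Made-up toy vectors; the target names are hypothetical placeholders.
VECTORS = {
    "power": [1.0, 0.0, 0.0],
    "govern": [0.9, 0.1, 0.0],
    "family": [0.0, 1.0, 0.0],
    "marriage": [0.1, 0.9, 0.0],
    "target_x": [0.8, 0.2, 0.0],
    "target_y": [0.2, 0.8, 0.0],
}
POWER = ["power", "govern"]
FAMILY = ["family", "marriage"]

if __name__ == "__main__":
    # target_x leans toward the power words (positive score);
    # target_y leans toward the family words (negative score).
    print(association("target_x", POWER, FAMILY, VECTORS))
    print(association("target_y", POWER, FAMILY, VECTORS))
```

A single score like this is noisy. Studies of this kind typically average over many target and attribute words, or test whether the difference is significant, before drawing conclusions.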

“Even for people who grew up with these textbooks, these patterns are surprising,” said Dorottya Demszky, a PhD candidate in linguistics who co-initiated the project. “We hope that this kind of quantification can become a tool for developing textbooks that are more representative.”

Exposing the Patterns, Faster

To be sure, it’s no secret that textbooks are shaped by the priorities and prejudices of the people in power. As recently as the mid-20th century, many Southern schools taught that the Civil War was primarily about states’ rights rather than slavery. Indeed, educators have been scouring textbooks for decades to measure prejudice and distortions.

NLP models, the researchers say, can be useful new tools in that effort. Because the AI models read every word and parse every sentence, they can provide more holistic, nuanced, and reliable measures of possible under- and over-representation of different groups.

The Stanford researchers analyzed 18 American history textbooks that Texas school districts used from 2015 through 2017, applying an array of natural language processing techniques. These included neural network models that quantify subtle implicit associations, as well as linguistic databases that ...