In the years between censuses, researchers, activists, politicians, and interest groups lobby for the rewording of a label, the addition (or elimination) of a category, or the disaggregation of an existing one, such as “Asian” or “American Indian or Alaska Native.” In 2000, for example, “Hispanic or Latino, or Spanish origins” was reclassified from racial to ethnic data. For the first time, respondents were also allowed to select multiple boxes to reflect multiracial heritage. Additional changes that affect how the racial makeup of the country is represented are underway, including the creation of a separate category for people of Middle Eastern and North African descent (referred to as MENA).
The statistical accounting used to correct such errors is commonly referred to as “data cleaning” or “data cleansing.” This process involves identifying and then editing data already collected, through modification, enhancement, or deletion of responses, whenever a response does not conform to predetermined rules that standardize the data set. Ostensibly, the goal is to improve data quality by correcting measurement errors introduced by the people who complete the questionnaires or enter responses into the database. Data cleaning also aims to make the final data set comparable to other, related ones, such as the other national censuses and the American Community Survey.
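To make the mechanics of modification, enhancement, and deletion concrete, here is a minimal sketch in Python of rule-based cleaning. It is illustrative only: the labels, the table of variants, the fill value, and the function names are all hypothetical, and nothing here describes a procedure actually used by the Census Bureau.

```python
from typing import List, Optional

# Hypothetical "predetermined rules": the only labels the final data set accepts.
ALLOWED_LABELS = {"Category A", "Category B", "Not reported"}

# Hypothetical spelling variants, keyed in lowercase, mapped to a standard label.
KNOWN_VARIANTS = {"cat a": "Category A", "category b.": "Category B"}

# Hypothetical value used to fill in blank responses.
FILL_VALUE = "Not reported"


def clean_response(raw: Optional[str]) -> Optional[str]:
    """Return a standardized label, or None if the response should be deleted."""
    # Enhancement: a missing or blank response is filled with a default value.
    if raw is None or not raw.strip():
        return FILL_VALUE
    # Modification: collapse stray whitespace before comparing against the rules.
    text = " ".join(raw.split())
    if text in ALLOWED_LABELS:
        return text
    # Modification: map a recognized variant onto its standard label.
    # Deletion: anything unrecognized yields None and is dropped by the caller.
    return KNOWN_VARIANTS.get(text.lower())


def clean_dataset(responses: List[Optional[str]]) -> List[str]:
    """Apply the rules to every response and drop those marked for deletion."""
    cleaned = [clean_response(r) for r in responses]
    return [label for label in cleaned if label is not None]


if __name__ == "__main__":
    raw = ["Category A", "  cat a ", "", "something else", None]
    print(clean_dataset(raw))
    # ['Category A', 'Category A', 'Not reported', 'Not reported']
```

In this sketch, a response that matches no rule ("something else") simply disappears from the output, while blank responses are assigned a value the respondent never gave.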