Matt A. Nelson, University of Minnesota
Matthew Sobek, University of Minnesota
Diana Magnuson, University of Minnesota
Lap Huynh, University of Minnesota
IPUMS has recently released final versions of all complete-count census data for the United States from 1790 to 1940. These data comprise over 700 million person records. Because of their sheer size and scale, developing these datasets required different methods and approaches for assessing data quality and making fixes. We document cases where data quality was affected both by choices the Census made historically and by transcription errors introduced in the modern day. This paper describes our tools and approaches, the challenges of working at this scale, and the implications of these decisions for research. The tools and approaches can support the development of other large historical datasets, and the implications will guide researchers in using the data effectively and accurately.
No extended abstract or paper available
Presented in Session 68. New Historical Data Infrastructure II