Caglar Koylu, University of Iowa
Evan Roberts, University of Minnesota
Jonas Helgertz, University of Minnesota/Lund University
Alice B. Kasakoff, University of South Carolina
Family trees have not been widely used in demography because of questions about their representativeness, but the availability of the full-count U.S. censuses from 1850 to 1940 makes it possible to compare the trees with the census population. Previous studies have assessed representativeness by comparing the proportions of types of individuals in the two sources at a given date (Koylu et al., 2021), and others have done the same for the linked census panels (Helgertz et al., 2022). While previous studies have noted the lack of African Americans in linked census records, genetic databases, and the databases used by historical demographers, there has been no attempt to link individuals from family trees with historical census records to see what systematic biases exist. In this paper, we use the machine learning algorithm developed within the Multigenerational Longitudinal Panel (MLP) project to link individuals alive in 1880, taken from a crowdsourced family tree database, with the 1880 census records. The algorithm draws on an extensive set of individual, familial, and contextual characteristics, such as first and last names, birth year and place, and information on father, mother, spouse, and siblings. Linking tree records with census records not only adds census information such as occupation, household composition, and township-level residence to the information from the trees, but is also the first step toward linking census individuals to their forebears in both the male and female lines, links that the census does not provide.
No extended abstract or paper available
Presented in Session 94. New Linked Data Sources