Auke Rijpma, Utrecht University
Jeanne Cilliers, Lund University
Johan Fourie, Stellenbosch University
Jonathan Schoots, Stellenbosch University
The Cape of Good Hope Panel Project has been building a panel centered on the 1685–1844 tax censuses from the Cape Colony, while also drawing on the rich source material of the Cape archives. It uses a machine learning-based record linkage procedure to track individuals and their assets over time (Fourie and Green 2018; Rijpma et al. 2020). In this paper we describe how this approach gives us excellent performance in terms of precision and recall. We also present updates to the model and procedure that we have since developed, asking how the procedure can be implemented most efficiently in terms of training data and computational requirements, and how the models can be extended to accommodate diverse source material for which the procedure was not originally designed, including other periods and regions as well as genealogical sources. Finally, now that the output of the model, the linked data, is being used in research papers, we can better evaluate how our record linkage procedure performs in terms of the biases it introduces and how these can best be mitigated.

Fourie, Johan, and Erik Green. 2018. “Building the Cape of Good Hope Panel.” The History of the Family 23 (3): 493–502. https://doi.org/10.1080/1081602X.2018.1509367.

Rijpma, Auke, Jeanne Cilliers, and Johan Fourie. 2020. “Record Linkage in the Cape of Good Hope Panel.” Historical Methods: A Journal of Quantitative and Interdisciplinary History 53 (2): 112–29. https://doi.org/10.1080/01615440.2018.1517030.
No extended abstract or paper available