Publications

  1. School desegregation by redrawing district boundaries. Nature: Scientific Reports. 14, 22097 (2024)
    • Abstract. Schools in the United States remain heavily segregated by race and income. Previous work demonstrates districts can promote group diversity within their schools with policies like redrawing attendance zones. Yet, the promise of such policies in many areas is limited by the fact that most school segregation occurs between school districts, and not between schools in the same district. I adapt Markov Chain Monte Carlo algorithms from legislative redistricting to redraw school district boundaries that decrease segregation while maintaining desirable criteria like distance to school and using only existing school facilities. Focusing on New Jersey, where the segregation of Black and Hispanic students from White and Asian students is among the worst in the country, I demonstrate that redrawing school districts could reduce more than 40% of existing segregation in the median New Jersey county, compared to less than 5% for redrawing attendance zones alone. Finally, I show how my proposed methodology can be applied to as few as two districts to reduce segregation in proposed consolidations, when small districts are merged into a larger district.
  2. Evaluating Bias and Noise Induced by the U.S. Census Bureau's Privacy Protection Methods. Science Advances Vol. 10, eadl2524 (2024)
    • Coverage: Election Law Blog, Science Insider
    • with Christopher Kenny, Cory McCartan, Shiro Kuriwaki, and Kosuke Imai.
    • Abstract. The United States Census Bureau faces a difficult trade-off between the accuracy of Census statistics and the protection of individual information. We conduct the first independent evaluation of bias and noise induced by the Bureau's two main disclosure avoidance systems: the TopDown algorithm employed for the 2020 Census and the swapping algorithm implemented for the 1990, 2000, and 2010 Censuses. Our evaluation leverages the recent release of the Noisy Measurement File (NMF) as well as the availability of two independent runs of the TopDown algorithm applied to the 2010 decennial Census. We find that the NMF contains too much noise to be directly useful alone, especially for Hispanic and multiracial populations. TopDown's post-processing dramatically reduces the NMF noise and produces similarly accurate data to swapping in terms of bias and noise. These patterns hold across census geographies with varying population sizes and racial diversity. While the estimated errors for both TopDown and swapping are generally no larger than other sources of Census error, they can be relatively substantial for geographies with small total populations.
  3. How academics and policymakers can collaborate effectively: Lessons from using behavioral science to improve US federal government policies. Behavioral Science & Policy (2024)
    • with OES Members: Shibeal O’Flaherty, Lizzie Martin, Syon Bhanot, Crystal Hall, Sebastian Jilke, and Mary Steffel.
    • Abstract. The U.S. government administers many public programs and services. Creating programs that work requires an understanding of the psychological processes that influence behavior. To this end, policymakers may collaborate with academics who have expertise in behavioral science to generate ideas for improving existing programs, procedures, or policies; to test existing programs; or to design wholly new programs that address societal problems. Such collaborations also enable academics to test new or established theories in real-world settings. In this article, we draw on our collective experience in the U.S. Office of Evaluation Sciences, where we have worked on studies that evaluate various federal programs, to outline some of the core issues that make research collaborations between academics and government agents challenging. We also offer tips for making these partnerships productive and mutually beneficial.
  4. Census officials must constructively engage with independent evaluations. PNAS. 121 (11) e2321196121. (2024).
  5. Making Differential Privacy Work for Census Data Users. Harvard Data Science Review. 5 (4), (2023).
    • with Cory McCartan and Kosuke Imai.
  6. Widespread Partisan Gerrymandering Mostly Cancels Nationally, but Reduces Electoral Competition. PNAS, 120 (25), e2217322120. 2023.
    • Coverage: PNAS Blog
    • with Christopher Kenny, Cory McCartan, Shiro Kuriwaki, and Kosuke Imai.
    • Abstract. Congressional district lines in many U.S. states are drawn by partisan actors, raising concerns about gerrymandering. To isolate the electoral impact of gerrymandering from the effects of other factors including geography and redistricting rules, we compare predicted election outcomes under the enacted plan with those under a large sample of non-partisan, simulated alternative plans for all states. We find that partisan gerrymandering is widespread in the 2020 redistricting cycle, but most of the bias it creates cancels at the national level, giving Republicans two additional seats, on average. In contrast, moderate pro-Republican bias due to geography and redistricting rules remains. Finally, we find that partisan gerrymandering reduces electoral competition and makes the House's partisan composition less responsive to shifts in the national vote.
  7. Researchers need better access to US Census data. Science, 380, no. 6648, pp. 902-903. 2023.
    • with Cory McCartan and Kosuke Imai.
    • Abstract. For the 2020 decennial census, the Census Bureau adopted a new Disclosure Avoidance System (DAS) based on differential privacy. The DAS was designed to protect the confidentiality of responses by injecting statistical noise into a confidential individual census dataset. A key output of this system is the Noisy Measurement File (NMF), which is produced by adding random noise to tabulated statistics. The NMF is an invaluable resource for Census data users to understand the error introduced by the DAS and perform statistically valid analyses that properly account for DAS-introduced error. The Bureau did not initially release the NMF, but released a demonstration version in April 2023 after several public requests and subsequent litigation. The Bureau plans to release the NMF for the P.L. 94-171 redistricting data and more detailed census data (the DHC file) later this year. We commend the Bureau's decision to provide the NMF, which will help advance social science research, improve policy decisions, and further strengthen the DAS itself. To maximize the benefits of the released NMF, however, we believe that the Bureau must substantially improve the way in which the NMF is formatted and released. In a letter recently published in Science, we explain several obstacles researchers may face when accessing, processing, and using the demonstration data for statistical analyses.
  8. LocalView: a database of public meetings for the study of local politics and policy-making in the United States. Nature Scientific Data, 10, 135, 2023.
    • with Soubhik Barari.
    • Abstract. Despite the fundamental importance of American local governments for service provision in areas like education and public health, local policy-making remains difficult and expensive to study at scale due to a lack of centralized data. This article introduces LocalView, the largest existing dataset of real-time local government public meetings -- the central policy-making process in local government. In sum, the dataset currently covers 139,616 videos and their corresponding textual and audio transcripts of local government meetings publicly uploaded to YouTube -- the world's largest public video-sharing website -- from 1,012 places and 2,861 distinct governments across the United States between 2006 and 2022. The data are processed, downloaded, cleaned, and publicly disseminated at localview.net for analysis across places and over time. We validate this dataset using a variety of methods and demonstrate how it can be used to map local governments' attention to policy areas of interest. Finally, we discuss how LocalView may be used by journalists, academics, and other users for understanding how local communities deliberate crucial policy questions on topics including climate change, public health, and immigration.
    • Project Website
    • Press Coverage: Nature Behavioral and Social Sciences Blog
  9. Comment: The Essential Role of Policy Evaluation for the 2020 Census Disclosure Avoidance System. Harvard Data Science Review, Special Issue 2. 2023.
    • with Christopher Kenny, Cory McCartan, Evan T. R. Rosenman, and Kosuke Imai.
    • Abstract. In "Differential Perspectives: Epistemic Disconnects Surrounding the US Census Bureau's Use of Differential Privacy," boyd and Sarathy argue that empirical evaluations of the Census Disclosure Avoidance System (DAS), including our published analysis, failed to recognize how the benchmark data against which the 2020 DAS was evaluated is never a ground truth of population counts. In this commentary, we explain why policy evaluation, which was the main goal of our analysis, is still meaningful without access to a perfect ground truth. We also point out that our evaluation leveraged features specific to the decennial Census and redistricting data, such as block-level population invariance under swapping and voter file racial identification, better approximating a comparison with the ground truth. Lastly, we show that accurate statistical predictions of individual race based on the Bayesian Improved Surname Geocoding, while not a violation of differential privacy, substantially increase the disclosure risk of private information the Census Bureau sought to protect. We conclude by arguing that policy makers must confront a key trade-off between data utility and privacy protection, and an epistemic disconnect alone is insufficient to explain disagreements between policy choices.
  10. Simulated redistricting plans for the analysis and evaluation of redistricting in the United States. Nature Scientific Data, 9, 698, 2022.
    • with Cory McCartan, Christopher Kenny, George Garcia III, Kevin Wang, Melissa Wu, and Kosuke Imai.
    • Abstract. This article introduces the 50stateSimulations, a collection of simulated congressional districting plans and underlying code developed by the Algorithm-Assisted Redistricting Methodology (ALARM) Project. The 50stateSimulations allow for the evaluation of enacted and other congressional redistricting plans in the United States. While the use of redistricting simulation algorithms has become standard in academic research and court cases, any simulation analysis requires non-trivial efforts to combine multiple data sets, identify state-specific redistricting criteria, implement complex simulation algorithms, and summarize and visualize simulation outputs. We have developed a complete workflow that facilitates this entire process of simulation-based redistricting analysis for the congressional districts of all 50 states. The resulting 50stateSimulations include ensembles of simulated 2020 congressional redistricting plans and necessary replication data. We also provide the underlying code, which serves as a template for customized analyses. All data and code are free and publicly available. This article details the design, creation, and validation of the data.
    • Project Website
    • Replication: Dataverse
  11. Presidential Patronage and Executive Branch Appointments, 1925-1959. Presidential Studies Quarterly, 52(1): 38-59. 2022.
    • with Jon Rogowski.
    • Abstract. We study presidential patronage as a form of distributive politics. To do so, we introduce comprehensive data on supervisory personnel in the executive branch between 1925 and 1959 and link each bureaucrat to the congressional representative from their home district. We identify testable hypotheses regarding the impact of electoral considerations, partisanship, and legislative support on the distribution of bureaucratic appointments across districts. Results from a variety of fixed-effects estimation strategies are consistent with several forms of presidential patronage. Our results provide initial evidence about the mechanisms through which patronage appointments are administered in the executive branch and illustrate how presidential politics affects the composition of the federal government.
  12. The Use of Differential Privacy for Census Data and its Impact on Redistricting: The Case of the 2020 U.S. Census. Science Advances, vol. 7, eabk3283. 2021.
    • with Christopher Kenny, Cory McCartan, Evan Rosenman, and Kosuke Imai.
    • Abstract. Census statistics play a key role in public policy decisions and social science research. Yet given the risk of revealing individual information, many statistical agencies are considering disclosure control methods based on differential privacy, which add noise to tabulated data. Unlike other applications of differential privacy, however, census statistics must be post-processed after noise injection to be usable. We study the impact of the US Census Bureau's new Disclosure Avoidance System (DAS) on a major application of census statistics: the redrawing of electoral districts. We find that the DAS systematically undercounts the population in mixed-race and mixed-partisan precincts, yielding unpredictable racial and partisan biases. The DAS also leads to a likely violation of the "One Person, One Vote" standard as currently interpreted, but does not prevent accurate predictions of an individual's race and ethnicity. Our findings underscore the difficulty of balancing accuracy and respondent privacy in the Census.
    • Selected Press Coverage: Associated Press, The Washington Post, San Francisco Chronicle
    • Originally a Public Comment to the US Census Bureau (May 28, 2021)

Working Papers