Job Market Paper


  1. Widespread Partisan Gerrymandering Mostly Cancels Nationally, but Reduces Electoral Competition. PNAS, 120 (25), e2217322120. 2023.
    • Coverage: PNAS Blog
    • with Christopher Kenny, Cory McCartan, Shiro Kuriwaki, and Kosuke Imai.
    • Click for Abstract. Congressional district lines in many U.S. states are drawn by partisan actors, raising concerns about gerrymandering. To isolate the electoral impact of gerrymandering from the effects of other factors including geography and redistricting rules, we compare predicted election outcomes under the enacted plan with those under a large sample of non-partisan, simulated alternative plans for all states. We find that partisan gerrymandering is widespread in the 2020 redistricting cycle, but most of the bias it creates cancels at the national level, giving Republicans two additional seats, on average. In contrast, moderate pro-Republican bias due to geography and redistricting rules remains. Finally, we find that partisan gerrymandering reduces electoral competition and makes the House's partisan composition less responsive to shifts in the national vote.
  2. Researchers need better access to US Census data. Science, 380, no. 6648 pg. 902-903. 2023.
    • with Cory McCartan and Kosuke Imai.
    • Click for Abstract. For the 2020 decennial census, the Census Bureau adopted a new Disclosure Avoidance System (DAS) based on differential privacy. The DAS was designed to protect the confidentiality of responses by injecting statistical noise into a confidential individual census dataset. A key output of this system is the Noisy Measurement File (NMF), which is produced by adding random noise to tabulated statistics. The resulting NMF is an invaluable resource for Census data users to understand the error introduced by the DAS and perform statistically valid analyses that properly account for DAS-introduced error. The Bureau did not initially release the NMF, but released a demonstration version in April 2023 after several public requests and subsequent litigation. The Bureau plans to release the NMF for the P.L. 94-171 redistricting data and more detailed census data (the DHC file) later this year. We commend the Bureau's decision to provide the NMF, which will help advance social science research, improve policy decisions, and further strengthen the DAS itself. To maximize the benefits of the released NMF, however, we believe that the Bureau must substantially improve the way in which the NMF is formatted and released. In a letter recently published in Science, we explain several obstacles researchers may face when accessing, processing, and using the demonstration data for statistical analyses.
  3. LocalView: a database of public meetings for the study of local politics and policy-making in the United States. Nature Scientific Data, 10, 135, 2023.
    • with Soubhik Barari.
    • Click for Abstract. Despite the fundamental importance of American local governments for service provision in areas like education and public health, local policy-making remains difficult and expensive to study at scale due to a lack of centralized data. This article introduces LocalView, the largest existing dataset of real-time local government public meetings -- the central policy-making process in local government. In sum, the dataset currently covers 139,616 videos and their corresponding textual and audio transcripts of local government meetings publicly uploaded to YouTube -- the world's largest public video-sharing website -- from 1,012 places and 2,861 distinct governments across the United States between 2006 and 2022. The data are processed, downloaded, cleaned, and publicly disseminated for analysis across places and over time. We validate this dataset using a variety of methods and demonstrate how it can be used to map local governments' attention to policy areas of interest. Finally, we discuss how LocalView may be used by journalists, academics, and other users for understanding how local communities deliberate crucial policy questions on topics including climate change, public health, and immigration.
    • Project Website
    • Press Coverage: Nature Behavioral and Social Sciences Blog
  4. Comment: The Essential Role of Policy Evaluation for the 2020 Census Disclosure Avoidance System. Harvard Data Science Review, Special Issue 2. 2023.
    • with Christopher Kenny, Cory McCartan, Evan T. R. Rosenman, and Kosuke Imai
    • Click for Abstract. In "Differential Perspectives: Epistemic Disconnects Surrounding the US Census Bureau's Use of Differential Privacy," boyd and Sarathy argue that empirical evaluations of the Census Disclosure Avoidance System (DAS), including our published analysis, failed to recognize how the benchmark data against which the 2020 DAS was evaluated is never a ground truth of population counts. In this commentary, we explain why policy evaluation, which was the main goal of our analysis, is still meaningful without access to a perfect ground truth. We also point out that our evaluation leveraged features specific to the decennial Census and redistricting data, such as block-level population invariance under swapping and voter file racial identification, better approximating a comparison with the ground truth. Lastly, we show that accurate statistical predictions of individual race based on the Bayesian Improved Surname Geocoding, while not a violation of differential privacy, substantially increase the disclosure risk of private information the Census Bureau sought to protect. We conclude by arguing that policy makers must confront a key trade-off between data utility and privacy protection, and an epistemic disconnect alone is insufficient to explain disagreements between policy choices.
  5. Simulated redistricting plans for the analysis and evaluation of redistricting in the United States. Nature Scientific Data, 9, 698, 2022.
    • with Cory McCartan, Christopher Kenny, George Garcia III, Kevin Wang, Melissa Wu, and Kosuke Imai
    • Click for Abstract. This article introduces the 50stateSimulations, a collection of simulated congressional districting plans and underlying code developed by the Algorithm-Assisted Redistricting Methodology (ALARM) Project. The 50stateSimulations allow for the evaluation of enacted and other congressional redistricting plans in the United States. While the use of redistricting simulation algorithms has become standard in academic research and court cases, any simulation analysis requires non-trivial efforts to combine multiple data sets, identify state-specific redistricting criteria, implement complex simulation algorithms, and summarize and visualize simulation outputs. We have developed a complete workflow that facilitates this entire process of simulation-based redistricting analysis for the congressional districts of all 50 states. The resulting 50stateSimulations include ensembles of simulated 2020 congressional redistricting plans and necessary replication data. We also provide the underlying code, which serves as a template for customized analyses. All data and code are free and publicly available. This article details the design, creation, and validation of the data.
    • Project Website
    • Replication: Dataverse
  6. Presidential Patronage and Executive Branch Appointments, 1925-1959. Presidential Studies Quarterly, 52(1): 38-59. 2022.
    • with Jon Rogowski.
    • Click for Abstract. We study presidential patronage as a form of distributive politics. To do so, we introduce comprehensive data on supervisory personnel in the executive branch between 1925 and 1959 and link each bureaucrat to the congressional representative from their home district. We identify testable hypotheses regarding the impact of electoral considerations, partisanship, and legislative support on the distribution of bureaucratic appointments across districts. Results from a variety of fixed-effects estimation strategies are consistent with several forms of presidential patronage. Our results provide initial evidence about the mechanisms through which patronage appointments are administered in the executive branch and illustrate how presidential politics affects the composition of the federal government.
  7. The Use of Differential Privacy for Census Data and its Impact on Redistricting: The Case of the 2020 U.S. Census. Science Advances, vol. 7, eabk3283. 2021.
    • with Christopher Kenny, Cory McCartan, Evan Rosenman, and Kosuke Imai.
    • Click for Abstract. Census statistics play a key role in public policy decisions and social science research. Yet given the risk of revealing individual information, many statistical agencies are considering disclosure control methods based on differential privacy, which add noise to tabulated data. Unlike other applications of differential privacy, however, census statistics must be post-processed after noise injection to be usable. We study the impact of the US Census Bureau's new Disclosure Avoidance System (DAS) on a major application of census statistics: the redrawing of electoral districts. We find that the DAS systematically undercounts the population in mixed-race and mixed-partisan precincts, yielding unpredictable racial and partisan biases. The DAS also leads to a likely violation of the "One Person, One Vote" standard as currently interpreted, but does not prevent accurate predictions of an individual's race and ethnicity. Our findings underscore the difficulty of balancing accuracy and respondent privacy in the Census.
    • Selected Press Coverage: Associated Press, The Washington Post, San Francisco Chronicle
    • Originally a Public Comment to the US Census Bureau (May 28, 2021)

Working Papers