Publications

  1. A summer bridge program for first-generation low-income students stretches academic ambitions with no adverse impacts on first-year GPA. PNAS. 121 (50) e2404924121. 2024.
    • with Rebecca Johnson and Kosuke Imai.
    • Click for Abstract. A large body of research documents the barriers faced by first-generation, low-income (FGLI) students as “hidden minorities” on elite college campuses. Although existing studies show brief psychological interventions can help mitigate some of these obstacles, universities are investing in more intensive interventions that try to both shift mindsets and mitigate structural disadvantages in FGLI students’ academic preparation. In collaboration with the administrators at a highly selective university, we conducted a randomized controlled trial of a summer bridge program targeted at FGLI students. During summers between 2017 and 2019, we randomly selected 232 out of 418 first-generation or low-income students and invited them to attend an intensive, six-week-long residential summer program featuring courses for academic credit. Students randomized to the control group either interacted with online content offering no academic credit or had no summer intervention. Our preregistered analysis shows that the program encouraged FGLI students to pursue a more ambitious first-year program, increasing the proportion of nonintroductory courses by 7 percentage points. The program also increased the proportion of courses taken for a grade rather than as pass-fail by 6 percentage points. These improvements were accompanied by no discernible impact on first-year grade point averages (GPAs) and academic withdrawal. The findings show the potential to academically integrate FGLI students into selective university communities.
  2. The Promise of Text, Audio, and Video Data for the Study of US Local Politics and Federalism. Publius: The Journal of Federalism. pjae046. 2024.
    • with Soubhik Barari.
    • Click for Abstract. A large-scale study of US local policymaking has long been hindered by a lack of centralized data sources. Our own project, LocalView, supplements data collection efforts by creating the largest existing database of local government meeting transcripts, audio, and video yet released. In this article, we describe promises, implications, and best practices for using nontabular sources of meeting data in the study of federalism. Throughout, we argue that these new sources of data allow scholars to ask new kinds of research questions. We demonstrate this potential with an empirical application focused on the use of national partisan language in local government meetings. We find that nationally salient partisan phrases are common in local policymaking discussions (especially in large cities), although prominent national terms vary drastically in how often they are used at the local level. Finally, the slant of partisan language (i.e., the amount of partisan language that is identifiably Democratic or Republican) across local governments is correlated with local partisan preferences.
  3. School desegregation by redrawing district boundaries. Nature: Scientific Reports. 14, 22097. 2024.
    • Click for Abstract. Schools in the United States remain heavily segregated by race and income. Previous work demonstrates districts can promote group diversity within their schools with policies like redrawing attendance zones. Yet, the promise of such policies in many areas is limited by the fact that most school segregation occurs between school districts, and not between schools in the same district. I adapt Markov Chain Monte Carlo algorithms from legislative redistricting to redraw school district boundaries that decrease segregation while maintaining desirable criteria like distance to school and using only existing school facilities. Focusing on New Jersey, where the segregation of Black and Hispanic students from White and Asian students is among the worst in the country, I demonstrate that redrawing school districts could reduce more than 40% of existing segregation in the median New Jersey county, compared to less than 5% for redrawing attendance zones alone. Finally, I show how my proposed methodology can be applied to as few as two districts to reduce segregation in proposed consolidations, when small districts are merged into a larger district.
    • 2024 Robert H. Durr Award (MPSA), for “the best paper applying quantitative methods to a substantive problem.”
  4. Evaluating Bias and Noise Induced by the U.S. Census Bureau's Privacy Protection Methods. Science Advances Vol. 10, eadl2524. 2024.
    • Coverage: Election Law Blog, Science Insider
    • with Christopher Kenny, Cory McCartan, Shiro Kuriwaki, and Kosuke Imai.
    • Click for Abstract. The United States Census Bureau faces a difficult trade-off between the accuracy of Census statistics and the protection of individual information. We conduct the first independent evaluation of bias and noise induced by the Bureau's two main disclosure avoidance systems: the TopDown algorithm employed for the 2020 Census and the swapping algorithm implemented for the 1990, 2000, and 2010 Censuses. Our evaluation leverages the recent release of the Noisy Measurement File (NMF) as well as the availability of two independent runs of the TopDown algorithm applied to the 2010 decennial Census. We find that the NMF contains too much noise to be directly useful alone, especially for Hispanic and multiracial populations. TopDown's post-processing dramatically reduces the NMF noise and produces similarly accurate data to swapping in terms of bias and noise. These patterns hold across census geographies with varying population sizes and racial diversity. While the estimated errors for both TopDown and swapping are generally no larger than other sources of Census error, they can be relatively substantial for geographies with small total populations.
  5. How academics and policymakers can collaborate effectively: Lessons from using behavioral science to improve US federal government policies. Behavioral Science & Policy. 2024.
    • with OES Members: Shibeal O’Flaherty, Lizzie Martin, Syon Bhanot, Crystal Hall, Sebastian Jilke, and Mary Steffel.
    • Click for Abstract. The U.S. government administers many public programs and services. Creating programs that work requires an understanding of the psychological processes that influence behavior. To this end, policymakers may collaborate with academics who have expertise in behavioral science to generate ideas for improving existing programs, procedures, or policies; to test existing programs; or to design wholly new programs that address societal problems. Such collaborations also enable academics to test new or established theories in real-world settings. In this article, we draw on our collective experience in the U.S. Office of Evaluation Sciences, where we have worked on studies that evaluate various federal programs, to outline some of the core issues that make research collaborations between academics and government agents challenging. We also offer tips for making these partnerships productive and mutually beneficial.
  6. Census officials must constructively engage with independent evaluations. PNAS. 121 (11) e2321196121. 2024.
  7. Making Differential Privacy Work for Census Data Users. Harvard Data Science Review. 5 (4), 2023.
    • with Cory McCartan and Kosuke Imai.
  8. Widespread Partisan Gerrymandering Mostly Cancels Nationally, but Reduces Electoral Competition. PNAS, 120 (25), e2217322120. 2023.
    • Coverage: PNAS Blog
    • with Christopher Kenny, Cory McCartan, Shiro Kuriwaki, and Kosuke Imai.
    • Click for Abstract. Congressional district lines in many U.S. states are drawn by partisan actors, raising concerns about gerrymandering. To isolate the electoral impact of gerrymandering from the effects of other factors including geography and redistricting rules, we compare predicted election outcomes under the enacted plan with those under a large sample of non-partisan, simulated alternative plans for all states. We find that partisan gerrymandering is widespread in the 2020 redistricting cycle, but most of the bias it creates cancels at the national level, giving Republicans two additional seats, on average. In contrast, moderate pro-Republican bias due to geography and redistricting rules remains. Finally, we find that partisan gerrymandering reduces electoral competition and makes the House's partisan composition less responsive to shifts in the national vote.
  9. Researchers need better access to US Census data. Science, 380, no. 6648 pg. 902-903. 2023.
    • with Cory McCartan and Kosuke Imai.
    • Click for Abstract. For the 2020 decennial census, the Census Bureau adopted a new Disclosure Avoidance System (DAS) based on differential privacy. The DAS was designed to protect the confidentiality of responses by injecting statistical noise into a confidential individual census dataset. A key output of this system is the Noisy Measurement File (NMF), which is produced by adding random noise to tabulated statistics. The resulting NMF is an invaluable resource for Census data users to understand the error introduced by the DAS and perform statistically valid analyses that properly account for DAS-introduced error. The Bureau did not initially release the NMF, but released a demonstration version in April 2023 after several public requests and subsequent litigation. The Bureau plans to release the NMF for the P.L. 94-171 redistricting data and more detailed census data (the DHC file) later this year. We commend the Bureau's decision to provide the NMF, which will help advance social science research, improve policy decisions, and further strengthen the DAS itself. To maximize the benefits of the released NMF, however, we believe that the Bureau must substantially improve the way in which the NMF is formatted and released. In a letter recently published in Science, we explain several obstacles researchers may face when accessing, processing, and using the demonstration data for statistical analyses.
  10. LocalView: a database of public meetings for the study of local politics and policy-making in the United States. Nature Scientific Data, 10, 135. 2023.
    • with Soubhik Barari.
    • Click for Abstract. Despite the fundamental importance of American local governments for service provision in areas like education and public health, local policy-making remains difficult and expensive to study at scale due to a lack of centralized data. This article introduces LocalView, the largest existing dataset of real-time local government public meetings -- the central policy-making process in local government. In sum, the dataset currently covers 139,616 videos and their corresponding textual and audio transcripts of local government meetings publicly uploaded to YouTube -- the world's largest public video-sharing website -- from 1,012 places and 2,861 distinct governments across the United States between 2006 and 2022. The data are processed, downloaded, cleaned, and publicly disseminated at localview.net for analysis across places and over time. We validate this dataset using a variety of methods and demonstrate how it can be used to map local governments' attention to policy areas of interest. Finally, we discuss how LocalView may be used by journalists, academics, and other users for understanding how local communities deliberate crucial policy questions on topics including climate change, public health, and immigration.
    • Project Website
    • Press Coverage: Nature Behavioral and Social Sciences Blog
  11. Comment: The Essential Role of Policy Evaluation for the 2020 Census Disclosure Avoidance System. Harvard Data Science Review, Special Issue 2. 2023.
    • with Christopher Kenny, Cory McCartan, Evan T. R. Rosenman, and Kosuke Imai.
    • Click for Abstract. In "Differential Perspectives: Epistemic Disconnects Surrounding the US Census Bureau's Use of Differential Privacy," boyd and Sarathy argue that empirical evaluations of the Census Disclosure Avoidance System (DAS), including our published analysis, failed to recognize how the benchmark data against which the 2020 DAS was evaluated is never a ground truth of population counts. In this commentary, we explain why policy evaluation, which was the main goal of our analysis, is still meaningful without access to a perfect ground truth. We also point out that our evaluation leveraged features specific to the decennial Census and redistricting data, such as block-level population invariance under swapping and voter file racial identification, better approximating a comparison with the ground truth. Lastly, we show that accurate statistical predictions of individual race based on the Bayesian Improved Surname Geocoding, while not a violation of differential privacy, substantially increase the disclosure risk of private information the Census Bureau sought to protect. We conclude by arguing that policy makers must confront a key trade-off between data utility and privacy protection, and an epistemic disconnect alone is insufficient to explain disagreements between policy choices.
  12. Simulated redistricting plans for the analysis and evaluation of redistricting in the United States. Nature Scientific Data, 9, 698, 2022.
    • with Cory McCartan, Christopher Kenny, George Garcia III, Kevin Wang, Melissa Wu, and Kosuke Imai.
    • Click for Abstract. This article introduces the 50stateSimulations, a collection of simulated congressional districting plans and underlying code developed by the Algorithm-Assisted Redistricting Methodology (ALARM) Project. The 50stateSimulations allow for the evaluation of enacted and other congressional redistricting plans in the United States. While the use of redistricting simulation algorithms has become standard in academic research and court cases, any simulation analysis requires non-trivial efforts to combine multiple data sets, identify state-specific redistricting criteria, implement complex simulation algorithms, and summarize and visualize simulation outputs. We have developed a complete workflow that facilitates this entire process of simulation-based redistricting analysis for the congressional districts of all 50 states. The resulting 50stateSimulations include ensembles of simulated 2020 congressional redistricting plans and necessary replication data. We also provide the underlying code, which serves as a template for customized analyses. All data and code are free and publicly available. This article details the design, creation, and validation of the data.
    • Project Website
    • Replication: Dataverse
  13. Presidential Patronage and Executive Branch Appointments, 1925-1959. Presidential Studies Quarterly, 52(1): 38-59. 2022.
    • with Jon Rogowski.
    • Click for Abstract. We study presidential patronage as a form of distributive politics. To do so, we introduce comprehensive data on supervisory personnel in the executive branch between 1925 and 1959 and link each bureaucrat to the congressional representative from their home district. We identify testable hypotheses regarding the impact of electoral considerations, partisanship, and legislative support on the distribution of bureaucratic appointments across districts. Results from a variety of fixed-effects estimation strategies are consistent with several forms of presidential patronage. Our results provide initial evidence about the mechanisms through which patronage appointments are administered in the executive branch and illustrate how presidential politics affects the composition of the federal government.
  14. The Use of Differential Privacy for Census Data and its Impact on Redistricting: The Case of the 2020 U.S. Census. Science Advances, vol. 7, eabk3283. 2021.
    • with Christopher Kenny, Cory McCartan, Evan Rosenman, and Kosuke Imai.
    • Click for Abstract. Census statistics play a key role in public policy decisions and social science research. Yet given the risk of revealing individual information, many statistical agencies are considering disclosure control methods based on differential privacy, which add noise to tabulated data. Unlike other applications of differential privacy, however, census statistics must be post-processed after noise injection to be usable. We study the impact of the US Census Bureau's new Disclosure Avoidance System (DAS) on a major application of census statistics: the redrawing of electoral districts. We find that the DAS systematically undercounts the population in mixed-race and mixed-partisan precincts, yielding unpredictable racial and partisan biases. The DAS also leads to a likely violation of the "One Person, One Vote" standard as currently interpreted, but does not prevent accurate predictions of an individual's race and ethnicity. Our findings underscore the difficulty of balancing accuracy and respondent privacy in the Census.
    • Selected Press Coverage: Associated Press, The Washington Post, San Francisco Chronicle
    • Originally a Public Comment to the US Census Bureau (May 28, 2021)

Working Papers

  1. Tabling Debate: How Local Officials Use Agenda Control to Stifle Conflict
    • with Mirya Holman.
    • Click for Abstract. Public officials influence policymaking by choosing which items receive attention and action -- and which do not. Accounts from national legislatures typically explain agenda control in terms of party leadership and discipline. But do politicians exert agenda control outside highly professionalized legislatures? We bring the agenda control discussion to school boards, which generally lack strong party control and feature few restrictions on agendas. We argue that local officials will increase their use of procedural rules to avoid making decisions in conflictual settings. We test our argument by constructing measures of both agenda control and conflict in a dataset of nearly 65,000 school board meeting transcripts. Consistent with our theory, we document an increased use of procedural control in highly contentious meetings. Responses from these school board members to a novel survey experiment confirm the causal link: they increase their use of tabling when conflict occurs on an issue.
  2. Measuring Conflict in Local Politics
    • with Mirya Holman and Rebecca Johnson.
    • Click for Abstract. Many of the most tangible and immediate political conflicts in Americans' lives occur at the local level. Yet, we lack large-scale evidence on how, why, and where conflict occurs in local governments. In this paper, we present a new dataset of nearly 100,000 videos of school board meetings, and a new measure of local political conflict. We use and validate this new approach using sentiment analysis and structural topic modeling. We then document consistent results: conflict in school board meetings broadly occurs at some point for most school boards, but the most intense conflicts are concentrated in small numbers of districts; this conflict often centers on cultural issues like racial diversity and gender identity. We then show that conflict, particularly cultural conflict, is most likely to occur in larger school districts in cities and suburbs, in places with more white students, and in places with more political competition.
    • 2024 Honorable Mention - Best Paper in Urban and Local Politics Presented at the 2024 APSA
  3. Redistricting Reforms Reduce Gerrymandering by Constraining Partisan Actors
    • with Cory McCartan, Christopher T. Kenny, Emma Ebowe, Michael Y. Zhao, and Kosuke Imai.
    • Click for Abstract. Political actors frequently manipulate redistricting plans to gain electoral advantages, a process commonly known as gerrymandering. To address this problem, several states have implemented institutional reforms including the establishment of map-drawing commissions. It is difficult to assess the impact of such reforms because each state structures bundles of complex rules in different ways. We propose to model redistricting processes as a sequential game. The equilibrium solution to the game summarizes multi-step institutional interactions as a single dimensional score. This score measures the leeway political actors have over the partisan lean of the final plan. Using a differences-in-differences design, we demonstrate that reforms reduce partisan bias and increase competitiveness when they constrain partisan actors. We perform a counterfactual policy analysis to estimate the partisan effects of enacting recent institutional reforms nationwide. We find that instituting redistricting commissions generally reduces the current Republican advantage, but Michigan-style reforms would yield a much greater pro-Democratic effect than types of redistricting commissions adopted in Ohio and New York.
  4. Does Reducing Documentation Burden Broaden Access to Emergency Rental Assistance? Quasi-experimental Evidence from Virginia.
    • with OES (see Analysis Plan).
    • Click for Abstract. We examine the effects of a “fact-specific proxy” (FSP) introduced by Virginia’s Department of Housing and Community Development (VA DHCD) to broaden and streamline access to assistance. The FSP used the applicant’s ZIP code as a proxy for income eligibility, simplifying the requirement of documenting income eligibility for some applicants and not others. Simplifying income eligibility verification represents a substantial documentation burden reduction. Our general goal in the project is to ask: to what extent does simplifying the individual requirement to document income eligibility for applicants in relevant ZIP codes increase applications (especially among underserved groups) and reduce processing times? We analyze application data aggregated to the ZIP code level in order to answer this question.
  5. Using Large-Scale Data to Monitor Conditions in New York City Public Housing.
    • with OES (see Analysis Plan).
    • Click for Abstract.

      On January 31, 2019, the U.S. Department of Housing and Urban Development (HUD), the U.S. Attorney's Office for the Southern District of New York (SDNY within DOJ), the New York City Housing Authority (NYCHA), and New York City (the City) signed an agreement to help NYCHA significantly improve housing conditions for its residents. Housing conditions targeted for improvement ranged from lead paint to heat to pest infestations. In turn, unsafe housing conditions can harm public housing residents' health, increasing their risk for conditions such as childhood and adult asthma.

      Measuring improvement in pest conditions is difficult because there was no “pest census” at the time of the agreement, that is, no complete account of the presence or absence of pests in all units, buildings, and developments. Meanwhile, the administrative agreement requires that NYCHA reduce its pest population by certain magnitudes (e.g., 40-50% depending on the pest type), which makes it important to obtain unbiased measures of pest prevalence. Our collaboration focused on methodologies that could be used to monitor: (1) the baseline levels of pest infestations at the beginning of the legal oversight and (2) whether there are improvements over time. We explored the pros and cons of four strategies for estimating this prevalence: (1) using tenant-submitted work orders as measures of underlying issues, (2) using results from randomly-scheduled inspections, (3) using results from randomly-scheduled inspections but reweighting these results to account for unequal probabilities of being selected for an inspection and agreeing to have one's unit inspected, and (4) using predictive modeling to see whether we can predict inspection results using many predictors (e.g., work order history; building characteristics). We are preparing a manuscript that discusses broader lessons for the role of data science in monitoring compliance with legal oversight and provides recommendations for academics and policymakers.