Publications

  1. Measuring Conflict in Local Politics. Urban Affairs Review, Forthcoming. 2025.
    • with Mirya Holman and Rebecca Johnson.
    • Click for Abstract. Many of the most tangible and immediate political conflicts in Americans’ lives occur at the local level. Yet, we lack large-scale evidence on how, why, and where conflict occurs in local governments. In this article, we present a new dataset of nearly 100,000 videos of school board meetings, and use them to create a new measure of local political conflict. We validate this new approach using sentiment analysis and structural topic modeling. We then document consistent results: conflict in school board meetings occurs at some point for most boards and has become more common since 2020, but the most intense conflicts are concentrated in a small number of districts; this conflict often centers on cultural issues like racial diversity and gender identity. We then show that conflict, particularly cultural conflict, is most likely to occur in larger school districts in cities and suburbs and in places with more White students.
    • Award: APSA 2024-25 Best Paper in Education Politics and Policy
    • Award: APSA 2024-25 Best Paper in Urban and Local Politics, Honorable Mention
    • Coverage: Education Week
  2. City-Defined Neighborhood Boundaries in the United States. Nature: Scientific Data, 12, 1031. 2025.
    • with Stephen Ansolabehere, Jacob Brown, Ryan Enos, Ben Shair, and David Sutton.
    • Click for Abstract. Neighborhoods are frequently cited as impactful for social, economic, political, and health outcomes. Measuring neighborhoods, however, is challenging, as the definition of a neighborhood may change dramatically across places. Researchers lack widespread but locally-sourced data on neighborhoods, and instead often adopt widely available but arbitrary Census geographies as neighborhood proxies. Others invest in the collection of more precise definitions, but these types of data are hard to collect at scale. We address this tension between scale and precision by collecting, cleaning, and providing to researchers a new dataset of city-defined neighborhoods. Our data includes 206 of the largest cities in the United States, covering more than 77 million people. We combine these data with block-level Census demographic data and provide them along with open-source software to aid researchers in their use.
  3. Contextual Stochastic Optimization for School Desegregation Policymaking. Proceedings of the 2025 AAAI Conference on Artificial Intelligence (AAAI). 2025.
    • with Hongzhao Guan, Nabeel Gillani, Jasmine Mangat, and Pascal Van Hentenryck.
    • Click for Abstract. Most US school districts draw geographic "attendance zones" to assign children to schools based on their home address, a process that can replicate existing neighborhood racial/ethnic and socioeconomic status (SES) segregation in schools. Redrawing boundaries can reduce segregation, but estimating expected rezoning impacts is often challenging because families can opt-out of their assigned schools. This paper seeks to alleviate this societal problem by developing a joint redistricting and choice modeling framework, called redistricting with choices (RWC). The RWC framework is applied to a large US public school district to estimate how redrawing elementary school boundaries might realistically impact levels of socioeconomic segregation. The main methodological contribution of RWC is a contextual stochastic optimization model that aims to minimize district-wide segregation by integrating rezoning constraints with a machine learning-based school choice model. The study finds that RWC yields boundary changes that might reduce segregation by a substantial amount (23%) -- but doing so might require the re-assignment of a large number of students, likely to mitigate re-segregation that choice patterns could exacerbate. The results also reveal that predicting school choice is a challenging machine learning problem. Overall, this study offers a novel practical framework that both academics and policymakers might use to foster more diverse and integrated schools.
  4. A Community-Driven Optimization Framework for Redrawing School Attendance Boundaries. Forthcoming, Proceedings of the 2025 ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization (EAAMO). 2025.
    • with Hongzhao Guan, Paul Riggins, Jasmine Mangat, Cassandra Moe, Urooj Haider, Frank Pantano, Effie McMillian, Genevieve Siegel-Hawley, Pascal Van Hentenryck, and Nabeel Gillani.
    • Click for Abstract. The vast majority of US public school districts use school attendance boundaries to determine which student addresses are assigned to which schools. Existing work shows how redrawing boundaries can be a powerful policy lever for increasing access and opportunity for historically disadvantaged groups, even while maintaining other priorities like minimizing driving distances and preserving existing social ties between students and families. This study introduces a multi-objective algorithmic school rezoning framework and applies it to a large-scale rezoning effort impacting over 50,000 students through an ongoing researcher-school district partnership. The framework is designed to incorporate feedback from community members and policymakers, both by deciding which goals are optimized and by placing differential "importance" on goals through weights from community surveys. Empirical results reveal the framework's ability to surface school redistricting plans that simultaneously advance a number of objectives often thought to be in competition with one another, including socioeconomic integration, transportation efficiency, and stable feeder patterns (transitions) between elementary, middle, and high schools. The paper also highlights how local education policymakers navigate several practical challenges, like building political will to make change in a polarized policy climate. The framework is built using open-source tools and publicly released to support school districts in exploring and implementing new policies to improve educational access and opportunity in the coming years.
  5. A summer bridge program for first-generation low-income students stretches academic ambitions with no adverse impacts on first-year GPA. PNAS. 121 (50) e2404924121. 2024.
    • with Rebecca Johnson and Kosuke Imai.
    • Click for Abstract. A large body of research documents the barriers faced by first-generation, low-income (FGLI) students as “hidden minorities” on elite college campuses. Although existing studies show brief psychological interventions can help mitigate some of these obstacles, universities are investing in more intensive interventions that try to both shift mindsets and mitigate structural disadvantages in FGLI students’ academic preparation. In collaboration with the administrators at a highly selective university, we conducted a randomized controlled trial of a summer bridge program targeted at FGLI students. During summers between 2017 and 2019, we randomly selected 232 out of 418 first-generation or low-income students and invited them to attend an intensive, six-week-long residential summer program featuring courses for academic credit. Students randomized to the control group either interacted with online content offering no academic credit or had no summer intervention. Our preregistered analysis shows that the program encouraged FGLI students to pursue a more ambitious first-year program, increasing the proportion of nonintroductory courses by 7 percentage points. The program also increased the proportion of courses taken for a grade rather than as pass-fail by 6 percentage points. These improvements were accompanied by no discernible impact on first-year grade point averages (GPAs) and academic withdrawal. The findings show the potential to academically integrate FGLI students into selective university communities.
  6. The Promise of Text, Audio, and Video Data for the Study of US Local Politics and Federalism. Publius: The Journal of Federalism. pjae046. 2024.
    • with Soubhik Barari.
    • Click for Abstract. A large-scale study of US local policymaking has long been hindered by a lack of centralized data sources. Our own project, LocalView, supplements data collection efforts by creating the largest existing database of local government meeting transcripts, audio, and video yet released. In this article, we describe promises, implications, and best practices for using nontabular sources of meeting data in the study of federalism. Throughout, we argue that these new sources of data allow scholars to ask new kinds of research questions. We demonstrate this potential with an empirical application focused on the use of national partisan language in local government meetings. We find that nationally salient partisan phrases are common in local policymaking discussions (especially in large cities), although prominent national terms vary drastically in how often they are used at the local level. Finally, the slant of partisan language (i.e., the amount of partisan language that is identifiably Democratic or Republican) across local governments is correlated with local partisan preferences.
  7. School desegregation by redrawing district boundaries. Nature: Scientific Reports. 14, 22097. 2024.
    • Click for Abstract. Schools in the United States remain heavily segregated by race and income. Previous work demonstrates districts can promote group diversity within their schools with policies like redrawing attendance zones. Yet, the promise of such policies in many areas is limited by the fact that most school segregation occurs between school districts, and not between schools in the same district. I adapt Markov Chain Monte Carlo algorithms from legislative redistricting to redraw school district boundaries that decrease segregation while maintaining desirable criteria like distance to school and using only existing school facilities. Focusing on New Jersey, where the segregation of Black and Hispanic students from White and Asian students is among the worst in the country, I demonstrate that redrawing school districts could reduce more than 40% of existing segregation in the median New Jersey county, compared to less than 5% for redrawing attendance zones alone. Finally, I show how my proposed methodology can be applied to as few as two districts to reduce segregation in proposed consolidations, when small districts are merged into a larger district.
    • Award: MPSA 2024 Robert H. Durr Award, for “the best paper applying quantitative methods to a substantive problem.”
  8. Evaluating Bias and Noise Induced by the U.S. Census Bureau's Privacy Protection Methods. Science Advances Vol. 10, eadl2524. 2024.
    • Coverage: Election Law Blog, Science Insider
    • with Christopher Kenny, Cory McCartan, Shiro Kuriwaki, and Kosuke Imai.
    • Click for Abstract. The United States Census Bureau faces a difficult trade-off between the accuracy of Census statistics and the protection of individual information. We conduct the first independent evaluation of bias and noise induced by the Bureau's two main disclosure avoidance systems: the TopDown algorithm employed for the 2020 Census and the swapping algorithm implemented for the 1990, 2000, and 2010 Censuses. Our evaluation leverages the recent release of the Noisy Measurement File (NMF) as well as the availability of two independent runs of the TopDown algorithm applied to the 2010 decennial Census. We find that the NMF contains too much noise to be directly useful alone, especially for Hispanic and multiracial populations. TopDown's post-processing dramatically reduces the NMF noise and produces similarly accurate data to swapping in terms of bias and noise. These patterns hold across census geographies with varying population sizes and racial diversity. While the estimated errors for both TopDown and swapping are generally no larger than other sources of Census error, they can be relatively substantial for geographies with small total populations.
  9. How academics and policymakers can collaborate effectively: Lessons from using behavioral science to improve US federal government policies. Behavioral Science & Policy. 2024.
    • with OES Members: Shibeal O’Flaherty, Lizzie Martin, Syon Bhanot, Crystal Hall, Sebastian Jilke, and Mary Steffel.
    • Click for Abstract. The U.S. government administers many public programs and services. Creating programs that work requires an understanding of the psychological processes that influence behavior. To this end, policymakers may collaborate with academics who have expertise in behavioral science to generate ideas for improving existing programs, procedures, or policies; to test existing programs; or to design wholly new programs that address societal problems. Such collaborations also enable academics to test new or established theories in real-world settings. In this article, we draw on our collective experience in the U.S. Office of Evaluation Sciences, where we have worked on studies that evaluate various federal programs, to outline some of the core issues that make research collaborations between academics and government agents challenging. We also offer tips for making these partnerships productive and mutually beneficial.
  10. Census officials must constructively engage with independent evaluations. PNAS. 121 (11) e2321196121. 2024.
  11. Making Differential Privacy Work for Census Data Users. Harvard Data Science Review. 5 (4), 2023.
    • with Cory McCartan and Kosuke Imai.
  12. Widespread Partisan Gerrymandering Mostly Cancels Nationally, but Reduces Electoral Competition. PNAS, 120 (25), e2217322120. 2023.
    • Coverage: PNAS Blog
    • with Christopher Kenny, Cory McCartan, Shiro Kuriwaki, and Kosuke Imai.
    • Click for Abstract. Congressional district lines in many U.S. states are drawn by partisan actors, raising concerns about gerrymandering. To isolate the electoral impact of gerrymandering from the effects of other factors including geography and redistricting rules, we compare predicted election outcomes under the enacted plan with those under a large sample of non-partisan, simulated alternative plans for all states. We find that partisan gerrymandering is widespread in the 2020 redistricting cycle, but most of the bias it creates cancels at the national level, giving Republicans two additional seats, on average. In contrast, moderate pro-Republican bias due to geography and redistricting rules remains. Finally, we find that partisan gerrymandering reduces electoral competition and makes the House's partisan composition less responsive to shifts in the national vote.
  13. Researchers need better access to US Census data. Science, 380, no. 6648 pg. 902-903. 2023.
    • with Cory McCartan and Kosuke Imai.
    • Click for Abstract. For the 2020 decennial census, the Census Bureau adopted a new Disclosure Avoidance System (DAS) based on differential privacy. The DAS was designed to protect the confidentiality of responses by injecting statistical noise into a confidential individual census dataset. A key output of this system is the Noisy Measurement File (NMF), which is produced by adding random noise to tabulated statistics. The NMF is an invaluable resource for Census data users to understand the error introduced by the DAS and perform statistically valid analyses that properly account for DAS-introduced error. The Bureau did not initially release the NMF, but released a demonstration version in April 2023 after several public requests and subsequent litigation. The Bureau plans to release the NMF for the P.L.94-171 redistricting data and more detailed census data (the DHC file) later this year. We commend the Bureau's decision to provide the NMF, which will help advance social science research, improve policy decisions, and further strengthen the DAS itself. To maximize the benefits of the released NMF, however, we believe that the Bureau must substantially improve the way in which the NMF is formatted and released. In a letter recently published in Science, we explain several obstacles researchers may face when accessing, processing, and using the demonstration data for statistical analyses.
  14. LocalView: a database of public meetings for the study of local politics and policy-making in the United States. Nature Scientific Data, 10, 135. 2023.
    • with Soubhik Barari.
    • Click for Abstract. Despite the fundamental importance of American local governments for service provision in areas like education and public health, local policy-making remains difficult and expensive to study at scale due to a lack of centralized data. This article introduces LocalView, the largest existing dataset of real-time local government public meetings -- the central policy-making process in local government. In sum, the dataset currently covers 139,616 videos and their corresponding textual and audio transcripts of local government meetings publicly uploaded to YouTube -- the world's largest public video-sharing website -- from 1,012 places and 2,861 distinct governments across the United States between 2006-2022. The data are processed, downloaded, cleaned, and publicly disseminated at localview.net for analysis across places and over time. We validate this dataset using a variety of methods and demonstrate how it can be used to map local governments' attention to policy areas of interest. Finally, we discuss how LocalView may be used by journalists, academics, and other users for understanding how local communities deliberate crucial policy questions on topics including climate change, public health, and immigration.
    • Project Website
    • Press Coverage: Nature Behavioral and Social Sciences Blog
  15. Comment: The Essential Role of Policy Evaluation for the 2020 Census Disclosure Avoidance System. Harvard Data Science Review, Special Issue 2. 2023.
    • with Christopher Kenny, Cory McCartan, Evan T. R. Rosenman, and Kosuke Imai.
    • Click for Abstract. In "Differential Perspectives: Epistemic Disconnects Surrounding the US Census Bureau's Use of Differential Privacy," boyd and Sarathy argue that empirical evaluations of the Census Disclosure Avoidance System (DAS), including our published analysis, failed to recognize how the benchmark data against which the 2020 DAS was evaluated is never a ground truth of population counts. In this commentary, we explain why policy evaluation, which was the main goal of our analysis, is still meaningful without access to a perfect ground truth. We also point out that our evaluation leveraged features specific to the decennial Census and redistricting data, such as block-level population invariance under swapping and voter file racial identification, better approximating a comparison with the ground truth. Lastly, we show that accurate statistical predictions of individual race based on the Bayesian Improved Surname Geocoding, while not a violation of differential privacy, substantially increase the disclosure risk of private information the Census Bureau sought to protect. We conclude by arguing that policy makers must confront a key trade-off between data utility and privacy protection, and an epistemic disconnect alone is insufficient to explain disagreements between policy choices.
  16. Simulated redistricting plans for the analysis and evaluation of redistricting in the United States. Nature Scientific Data, 9, 698. 2022.
    • with Cory McCartan, Christopher Kenny, George Garcia III, Kevin Wang, Melissa Wu, and Kosuke Imai.
    • Click for Abstract. This article introduces the 50stateSimulations, a collection of simulated congressional districting plans and underlying code developed by the Algorithm-Assisted Redistricting Methodology (ALARM) Project. The 50stateSimulations allow for the evaluation of enacted and other congressional redistricting plans in the United States. While the use of redistricting simulation algorithms has become standard in academic research and court cases, any simulation analysis requires non-trivial efforts to combine multiple data sets, identify state-specific redistricting criteria, implement complex simulation algorithms, and summarize and visualize simulation outputs. We have developed a complete workflow that facilitates this entire process of simulation-based redistricting analysis for the congressional districts of all 50 states. The resulting 50stateSimulations include ensembles of simulated 2020 congressional redistricting plans and necessary replication data. We also provide the underlying code, which serves as a template for customized analyses. All data and code are free and publicly available. This article details the design, creation, and validation of the data.
    • Project Website
    • Replication: Dataverse
  17. Presidential Patronage and Executive Branch Appointments, 1925-1959. Presidential Studies Quarterly, 52(1): 38-59. 2022.
    • with Jon Rogowski.
    • Click for Abstract. We study presidential patronage as a form of distributive politics. To do so, we introduce comprehensive data on supervisory personnel in the executive branch between 1925 and 1959 and link each bureaucrat to the congressional representative from their home district. We identify testable hypotheses regarding the impact of electoral considerations, partisanship, and legislative support on the distribution of bureaucratic appointments across districts. Results from a variety of fixed-effects estimation strategies are consistent with several forms of presidential patronage. Our results provide initial evidence about the mechanisms through which patronage appointments are administered in the executive branch and illustrate how presidential politics affects the composition of the federal government.
  18. The Use of Differential Privacy for Census Data and its Impact on Redistricting: The Case of the 2020 U.S. Census. Science Advances, vol. 7, eabk3283. 2021.
    • with Christopher Kenny, Cory McCartan, Evan Rosenman, and Kosuke Imai.
    • Click for Abstract. Census statistics play a key role in public policy decisions and social science research. Yet given the risk of revealing individual information, many statistical agencies are considering disclosure control methods based on differential privacy, which add noise to tabulated data. Unlike other applications of differential privacy, however, census statistics must be post-processed after noise injection to be usable. We study the impact of the US Census Bureau's new Disclosure Avoidance System (DAS) on a major application of census statistics: the redrawing of electoral districts. We find that the DAS systematically undercounts the population in mixed-race and mixed-partisan precincts, yielding unpredictable racial and partisan biases. The DAS also leads to a likely violation of the "One Person, One Vote" standard as currently interpreted, but does not prevent accurate predictions of an individual's race and ethnicity. Our findings underscore the difficulty of balancing accuracy and respondent privacy in the Census.
    • Selected Press Coverage: Associated Press, The Washington Post, San Francisco Chronicle
    • Originally a Public Comment to the US Census Bureau (May 28, 2021)

Working Papers

  1. Tabling Debate: How Local Officials Use Agenda Control to Stifle Conflict
    • with Mirya Holman.
    • Click for Abstract. Public officials influence policymaking by choosing which items receive attention and action -- and which do not. Accounts from national legislatures typically explain agenda control in terms of party leadership and discipline. But, do politicians exert agenda control outside highly professionalized legislatures? We bring the agenda control discussion to school boards, which generally lack strong party control and feature few restrictions on agendas. We argue that local officials will increase their use of procedural rules to avoid making decisions in conflictual settings. We test our argument by constructing measures of both agenda control and conflict in a dataset of nearly 65,000 school board meeting transcripts. Consistent with our theory, we document an increased use of procedural control in highly contentious meetings. Responses from these school board members to a novel survey experiment confirm the causal link: they increase their use of tabling when conflict occurs on an issue.
  2. Redistricting Reforms Reduce Gerrymandering by Constraining Partisan Actors
    • with Cory McCartan, Christopher T. Kenny, Emma Ebowe, Michael Y. Zhao, and Kosuke Imai.
    • Click for Abstract. Political actors frequently manipulate redistricting plans to gain electoral advantages, a process commonly known as gerrymandering. To address this problem, several states have implemented institutional reforms including the establishment of map-drawing commissions. It is difficult to assess the impact of such reforms because each state structures bundles of complex rules in different ways. We propose to model redistricting processes as a sequential game. The equilibrium solution to the game summarizes multi-step institutional interactions as a single dimensional score. This score measures the leeway political actors have over the partisan lean of the final plan. Using a differences-in-differences design, we demonstrate that reforms reduce partisan bias and increase competitiveness when they constrain partisan actors. We perform a counterfactual policy analysis to estimate the partisan effects of enacting recent institutional reforms nationwide. We find that instituting redistricting commissions generally reduces the current Republican advantage, but Michigan-style reforms would yield a much greater pro-Democratic effect than types of redistricting commissions adopted in Ohio and New York.
  3. Does Reducing Documentation Burden Broaden Access to Emergency Rental Assistance? Quasi-experimental Evidence from Virginia.
    • with OES (see Analysis Plan).
    • Click for Abstract. We examine the effects of a “fact-specific proxy” (FSP) introduced by Virginia’s Department of Housing and Community Development (VA DHCD) to broaden and streamline access to assistance. The FSP used the applicant’s ZIP code as a proxy for income eligibility, simplifying the requirement of documenting income eligibility for some applicants and not others. Simplifying income eligibility verification represents a substantial documentation burden reduction. Our general goal in the project is to ask: to what extent does simplifying the individual requirement to document income eligibility for applicants in relevant ZIP codes increase applications (especially among underserved groups) and reduce processing times? We analyze application data aggregated to the ZIP code level in order to answer this question.
  4. Using Large-Scale Data to Monitor Conditions in New York City Public Housing.
    • with OES (see Analysis Plan).
    • Click for Abstract.

      On January 31, 2019, the U.S. Department of Housing and Urban Development (HUD), the U.S. Attorney's Office for the Southern District of New York (SDNY within DOJ), the New York City Housing Authority (NYCHA), and New York City (the City) signed an agreement to help NYCHA significantly improve housing conditions for its residents. Housing conditions targeted for improvement ranged from lead paint to heat to pest infestations. In turn, unsafe housing conditions can harm public housing residents' health, increasing their risk for conditions such as childhood and adult asthma.

      Measuring improvement in pest conditions is difficult because there was no “pest census” at the time of the agreement (that is, no complete account of the presence or absence of pests in all units, buildings, and developments). Meanwhile, the administrative agreement requires that NYCHA reduce its pest population by certain magnitudes (e.g., 40-50% depending on the pest type), which makes it important to obtain unbiased measures of pest prevalence. Our collaboration focused on methodologies that could be used to monitor: (1) the baseline levels of pest infestations at the beginning of the legal oversight and (2) whether there are improvements over time. We explored the pros and cons of four strategies for estimating this prevalence: (1) using tenant-submitted work orders as measures of underlying issues, (2) using results from randomly-scheduled inspections, (3) using results from randomly-scheduled inspections but reweighting these results to account for unequal probabilities of being selected for an inspection and agreeing to have one's unit inspected, and (4) using predictive modeling to see whether we can predict inspection results using many predictors (e.g., work order history; building characteristics). We are preparing a manuscript that discusses broader lessons for the role of data science in monitoring compliance with legal oversight and provides recommendations for academics and policymakers.