Coding Protest Event Data & Studying Black Protest

The MPEDS (Machine-Learning Protest Event Data System) project was funded by National Science Foundation grant #1423784. Our current NSF-funded project, #1918342, builds on that earlier work to study Black protest. The core team on this new project consists of PIs Pamela Oliver and Chaeyoon Lim, grad students David Skalinder and Morgan Matthews, and consultant Alex Hanna.

Black Protest (2019-2021)

Now that MPEDS is basically working, we are using it to collect data. Our first project used MPEDS to identify news stories about Black protests in newswire sources. A working paper from this project is posted on SocArXiv. Beyond developing MPEDS itself, our main methodological innovations are better data systems that maintain the relational information linking events to the articles describing them, and better workflows and tools for human coding of protest data. As we develop these systems, we will both improve the reproducibility and accuracy of the data we have already obtained from newswires and add news coverage from Black newspapers.

Here are some graphs from the working paper.

MPEDS (2014-2017)

The goal of the MPEDS project was to replace the labor-intensive process of having human coders search news sources for information about protests with a computerized process. Many researchers want to study the conditions under which protest emerges or grows and the ways in which protest affects social policy and/or is repressed. Researchers also want to study whether movements around different types of issues exhibit different patterns and whether different “types” of places give rise to different movements. Do environmental movements have different patterns of growth from feminist or Black movements? Do they use similar or different tactics? Do police respond to them similarly or differently? Human coding of news sources for information about protests is slow, and paying human coders is expensive. For this reason, most projects are limited to a single issue, a short time period, or just one or two news sources. These practical limits make it difficult to know how well the results of one study generalize to other issues, places, or time periods.

The Machine-Learning Protest Event Data System (MPEDS), developed by project member Alex Hanna, is the first system of its kind to come from within the social movement research community and to focus specifically on identifying and coding information about protests. MPEDS uses recent innovations from machine learning and natural language processing to generate protest event data with little to no human intervention. This permits timely coding of information about recent and current events and improves our ability to code information on historical events from the growing pool of sources available in machine-readable format. MPEDS is already working for us. The latest version is posted on GitHub. Alex Hanna has papers (some sole-authored and some co-authored with other team members) that report MPEDS’s accuracy statistics, which vary with the types of material it is trained on, and give examples of how it can be used.

Besides Alex Hanna, the team members are PIs Pamela Oliver and Chaeyoon Lim. Graduate students Emanuel Ubert and Katherine Fallon, along with a host of undergraduates, also participated. The rest of us have worked primarily on human-coding articles from a wide variety of sources so they can be fed to MPEDS as training material. One thing we have learned on this project is that human coders do not always agree with one another about whether something is a protest or about its descriptive characteristics. We will be writing articles about the overall problem of coding protest events and about how MPEDS’s accuracy compares with that of human coding.