Goal
This project has two goals: to predict the sentiment of Wikipedia discussion comments, and to identify sources of bias that may exist in the datasets. From those findings we develop testable hypotheses about how these biases might affect machine learning models trained on this data, whether the models are used for research or to power data-driven applications.
The corpus we use for the project is called the Wikipedia Talk corpus, and it consists of three datasets. Each dataset contains thousands of online discussion posts made by Wikipedia editors who were discussing how to write and edit Wikipedia articles. Crowdworkers labelled these posts for three kinds of hostile speech: “toxicity”, “aggression”, and “personal attacks”. Many posts in each dataset were labelled by multiple crowdworkers for each type of hostile speech, to improve accuracy. Google data scientists used these annotated datasets to train machine learning models as part of a project called Conversation AI. The models have been used in a variety of software products and made freely accessible to anyone through the Perspective API.
For this analysis, we focus on the personal attacks dataset.
There are currently two distinct types of data included:
- A corpus of all 95 million user and article talk diffs made between 2001 and 2015, which can be scored by our personal attacks model.
- An annotated dataset of 1 million crowd-sourced annotations covering 100k talk page diffs (with 10 judgements per diff) for personal attacks, aggression, and toxicity.
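Because each diff carries several judgements, the annotations usually have to be collapsed into one label per diff before a model can be trained. A minimal sketch of one way to do this, assuming a simple majority vote and assuming each annotation is a (rev_id, worker_id, attack) tuple with attack equal to 0 or 1; the function name, field names, and sample rows are illustrative and not taken from the dataset documentation:

```python
from collections import defaultdict


def aggregate_by_majority(annotations):
    """Collapse per-worker judgements into one label per diff.

    `annotations` is an iterable of (rev_id, worker_id, attack) tuples,
    where `attack` is 0 or 1 (assumed layout). A diff is labelled an
    attack only when strictly more than half of its judgements are 1,
    so ties resolve to "not an attack".
    """
    votes = defaultdict(list)
    for rev_id, _worker_id, attack in annotations:
        votes[rev_id].append(attack)
    return {rev_id: int(2 * sum(v) > len(v)) for rev_id, v in votes.items()}


# Made-up sample rows, for illustration only.
sample = [
    (101, "w1", 1), (101, "w2", 1), (101, "w3", 0),
    (202, "w1", 0), (202, "w4", 0), (202, "w5", 1),
]
labels = aggregate_by_majority(sample)
```

Majority vote is only one option. Keeping the fraction of workers who flagged a diff preserves disagreement between labellers, which may matter when studying labelling bias.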
Analyze the personal attacks dataset and answer some of the following questions:
- Predict the sentiment of personal attack comments using Naive Bayes
- Explore relationships between worker demographics and labeling behavior
- How consistent are labelling behaviors among workers with different demographic profiles? For example, are female-identified labelers more or less likely to label comments as aggressive than male-identified labelers?
- If the labelling behaviors are different, what are some possible causes and consequences of this difference?
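The Naive Bayes item in the list above can be sketched as follows. This is a minimal multinomial Naive Bayes classifier with add-one (Laplace) smoothing, written in plain Python for illustration. The whitespace tokenizer, the function names, and the four toy comments are assumptions made for this example, not part of the corpus or of any particular library:

```python
import math
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def train_naive_bayes(texts, labels):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter(labels)
    word_counts = {label: Counter() for label in class_counts}
    for text, label in zip(texts, labels):
        word_counts[label].update(tokenize(text))
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the class with the highest log posterior for `text`."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(text):
            if word in vocab:  # skip words never seen in training
                # add-one smoothed log likelihood
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy comments invented for this sketch: 1 = personal attack, 0 = not.
texts = [
    "you are an idiot",
    "thanks for fixing the citation",
    "shut up you fool",
    "please add a source for this claim",
]
labels = [1, 0, 1, 0]
model = train_naive_bayes(texts, labels)
prediction = predict(model, "you fool")
```

On real data the same idea would be applied to the aggregated labels of the personal attacks dataset, with a held-out split to measure accuracy. Comparing predictions against labels from different demographic groups of workers is one way to connect this step to the bias questions above.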