Chapter 6 Ethics: Amplification

6.1 The Problem

6.1.1 Algorithmic Determinism

The Allegheny County Office of Children, Youth and Families runs a system called the Key Information and Demographic System, or KIDS; effectively, this is the part of the local government that handles issues of child neglect and abuse, and the county uses a risk model called the Allegheny Family Screening Tool (AFST) to forecast child abuse and neglect. As Eubanks (2018) points out, the model at the core of this system is a regression. Ideally, the AFST would predict child maltreatment, but maltreatment itself cannot be measured directly; instead, the system predicts two proxy variables: “community re-referral,” where multiple calls are received for the same child within two years, and “child placement,” where a call results in the child being placed in foster care within two years. According to Eubanks (2018), “the AFST actually predicts decisions made by the community (which families will be reported to the hotline) and by the agency and the family courts (which children will be removed from their families).” Because roughly three quarters of cases are “child neglect” cases, and the line for neglect can be quite vague, poor families are disproportionately reported for neglect, most commonly when parents leave their children at home while they go to work. Once a child has been placed in the system, that earlier contact itself becomes an input, and the family’s risk assessment scores increase. Poor and minority families are already more likely to be reported in the first place, so subsequent referrals create a reinforcing cycle: once a report has been made, future risk assessments are more likely to flag the same family as high risk.
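To see the mechanics of this loop in miniature, consider the sketch below. It is emphatically not the AFST: the score function, its weights, and the variable names are all invented for illustration. The only point is that when prior referrals are both an input to a risk score and an outcome the score helps produce, the score ratchets upward for families who enter the system and stays flat for those who never do.

    import random

    # Hypothetical illustration (not the actual AFST): once prior referrals feed
    # into the score, and the score makes new referrals more likely, the two
    # reinforce each other.

    def risk_score(prior_referrals: int, poverty_indicator: float) -> float:
        """Toy proxy-based score; the weights are invented for illustration only."""
        return min(1.0, 0.1 + 0.2 * prior_referrals + 0.3 * poverty_indicator)

    def simulate_family(poverty_indicator: float, years: int = 10, seed: int = 0) -> list:
        """Each year, a higher score makes a new referral (the proxy outcome) more likely."""
        rng = random.Random(seed)
        referrals, scores = 0, []
        for _ in range(years):
            score = risk_score(referrals, poverty_indicator)
            scores.append(score)
            if rng.random() < score:  # a community re-referral, driven partly by the score itself
                referrals += 1
        return scores

    print(simulate_family(poverty_indicator=1.0))  # scores ratchet upward once referrals begin
    print(simulate_family(poverty_indicator=0.0))  # scores stay low throughout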

This is a case of an algorithmic system constructing a feedback loop and thereby amplifying pre-existing bias, a pattern that has been called “algorithmic determinism.” It is not, however, the only sort of bias amplification that exists.

6.1.2 Amplifying Misinformation

A study by Vosoughi, Roy, and Aral (2018), titled “The Spread of True and False News Online,” found that false content was 70% more likely to be shared than true content, and that true stories took about six times as long as false stories to reach the same number of people. Users increasingly receive their news from algorithmically driven social media; these platforms optimize for engagement, and at the end of the day false content carries a sense of “novelty” that drives greater engagement.

Building on this, Phillips (2018) goes a step further and lays out how the alt-right manipulated journalists and recommendation systems in order to amplify their message. A small but very vocal group of bad-faith actors would begin by manufacturing outrage and bombarding their victims with abuse on social media. Once enough outrage had been generated, the content would go viral, at which point mainstream media outlets would cover the material. Because journalists are inclined to present both sides of an issue, this coverage lent the bad-faith actors a platform, leading more users to sympathize with them, generating even further mainstream coverage and amplifying the message still more prominently in society. Consider an example from Boyd (2017): the computer scientist Latanya Sweeney searched her own name on Google and found a series of advertisements “inviting her to ask if she had a criminal record.” She then ran a series of stereotypically Black and white names through the search engine and found that only the Black names generated criminal justice results, which suggested that the search engine had in effect “learned” racial biases from user search history. In the former case, bad-faith actors gamed recommendation engines in order to amplify their message; in the latter, a search engine learned underlying human biases and amplified them right back.

It is worth noting, though, that this amplification loop is not unique to social media; it applies to all algorithmically generated content and recommendation systems. Any recommender system built on mass user-generated data is vulnerable to this sort of amplification. Even if one does one’s best to sanitize a dataset that may contain racialized content or bias, at the end of the day, “no amount of excluding certain subreddits, removing of categories of tweets, or ignoring content with problematic words will prepare you for those who are hellbent on messing with you” (Boyd 2017).
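A minimal sketch of that vulnerability is below. The click rates are invented and the epsilon-greedy ranking rule is deliberately crude; nothing here models any real platform. It only shows that a feed which optimizes for observed engagement ends up showing the higher-engagement item far more often, regardless of whether that item is true.

    import random

    # Toy engagement-optimized feed: items that provoke more clicks get shown
    # more, which earns them still more clicks. The rates below are assumptions.

    random.seed(42)
    items = {
        "true_story": {"click_rate": 0.05, "impressions": 1, "clicks": 0},
        "false_story": {"click_rate": 0.09, "impressions": 1, "clicks": 0},  # "novel" content engages more
    }

    for _ in range(50_000):
        if random.random() < 0.1:  # occasional exploration of both items
            shown = random.choice(list(items))
        else:  # otherwise show whatever currently engages best
            shown = max(items, key=lambda k: items[k]["clicks"] / items[k]["impressions"])
        items[shown]["impressions"] += 1
        if random.random() < items[shown]["click_rate"]:
            items[shown]["clicks"] += 1

    for name, stats in items.items():
        print(name, stats["impressions"])  # the higher-engagement (false) item dominates the feed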

6.2 What Can Be Done

Unfortunately, amplification is a feature, not a bug, of large technical systems. Such systems are designed to scale, and many machine learning systems are created specifically to enable that scaling. Most systems can eventually be “hacked” to amplify problematic content or results, and there are bad actors who bank on exactly that. That being said, documentation, an open-door policy, stress testing, and early detection are key. Most importantly, what is needed is a mindset among practitioners of actively considering the ways in which their machine learning systems could be “hacked” to amplify the aims of bad actors.

Race and socioeconomic status should be used warily, if at all, as inputs to algorithmic models, as should data that functions as a proxy for them (for example, ZIP code or high school attended). Such inputs have the potential to amplify existing inequality trends and to produce outputs that are effectively determined by those attributes.
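One modest pre-training check, sketched below under the assumption of a pandas workflow with hypothetical column names such as zip_code and high_school, is to flag candidate features that correlate strongly with a protected attribute before they are allowed into the model.

    import pandas as pd

    # Minimal sketch: flag candidate features that act as proxies for a protected
    # attribute. The column names and the 0.6 threshold are assumptions.

    def flag_proxy_features(df: pd.DataFrame, protected: str, threshold: float = 0.6) -> list:
        """Return columns whose correlation with the protected attribute exceeds the threshold."""
        encoded = pd.get_dummies(df, drop_first=True, dtype=float)  # crude handling of categoricals
        protected_cols = [c for c in encoded.columns if c.startswith(protected)]
        flagged = set()
        for p in protected_cols:
            corr = encoded.corrwith(encoded[p]).abs()
            flagged |= set(corr[corr > threshold].index) - set(protected_cols)
        return sorted(flagged)

    # Hypothetical usage: zip_code or high_school may well come back flagged.
    # proxies = flag_proxy_features(training_df, protected="race")
    # training_df = training_df.drop(columns=proxies + ["race"])

A flagged feature is not automatically disqualified, but it should at least be documented and defended before it stays in the model.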

Brainstorming sources of bias is a helpful exercise for exploring how any AI system might be misused. A handful of practitioners can gather to identify and note possible sources of bias that could be introduced and then amplified by their machine learning system. Once the brainstorming is complete, recording the results in the model documentation is essential so the team can later check whether those issues actually crop up. In a more structured manner, Mitchell et al. (n.d.) suggest using model cards for all productionized machine learning models. In addition, impact analyses are a helpful exercise for brainstorming the ways in which the underlying systems could be “broken.”
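A lightweight sketch of what such documentation could look like in code follows. The fields are a simplified assumption in the spirit of model cards rather than the full published template, and all of the example values are hypothetical.

    from dataclasses import dataclass, field, asdict
    import json

    # Simplified, hypothetical model card; the field names are assumptions
    # inspired by Mitchell et al.'s model cards, not the published template.

    @dataclass
    class ModelCard:
        model_name: str
        intended_use: str
        out_of_scope_uses: list
        training_data: str
        evaluation_data: str
        metrics: dict
        known_bias_sources: list = field(default_factory=list)  # output of the brainstorming session
        ethical_considerations: str = ""

        def to_json(self) -> str:
            return json.dumps(asdict(self), indent=2)

    card = ModelCard(
        model_name="referral-risk-screener-v1",  # hypothetical model
        intended_use="Decision support for call screeners, never automated removal decisions.",
        out_of_scope_uses=["criminal proceedings", "benefits eligibility"],
        training_data="County referral records, 2015-2020 (hypothetical).",
        evaluation_data="Held-out 2021 referrals (hypothetical).",
        metrics={"auc": 0.76},
        known_bias_sources=["re-referral proxy reflects community reporting bias",
                            "neglect label correlates with poverty"],
    )
    print(card.to_json())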

Acceptance testing is integral to any technical product. For machine learning models in particular, it is worth creating extreme sample data and running “sanity tests” on the model being implemented. This sample data can be derived from existing sample data, but with artificial changes to specific columns relating to, say, race or geolocation. This provides a straightforward mechanism for stress testing an algorithmic system for possible amplification bias.
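A minimal version of such a sanity test, assuming a scikit-learn-style model exposing predict_proba and a hypothetical sensitive column called zip_code, is to copy the sample data, substitute extreme values into that column, and compare the resulting scores.

    import pandas as pd

    # Minimal stress-test sketch: score the same rows under artificial substitutions
    # of one sensitive column. The model interface and column name are assumptions.

    def counterfactual_shift(model, sample: pd.DataFrame, column: str, values) -> pd.DataFrame:
        """Score identical rows under each substituted value of `column`."""
        results = {}
        for value in values:
            perturbed = sample.copy()
            perturbed[column] = value  # artificial, extreme substitution
            results[value] = model.predict_proba(perturbed)[:, 1]
        return pd.DataFrame(results)

    # Hypothetical usage: large gaps between the columns signal amplification risk.
    # shifts = counterfactual_shift(model, sample_df, "zip_code", ["15201", "15232"])
    # print(shifts.describe())

If predictions for the same underlying case swing noticeably when only the sensitive column changes, that is a sign the model has learned a proxy relationship it may go on to amplify.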

6.3 Resources

An article to read: “Alternative Influence” by Rebecca Lewis https://datasociety.net/wp-content/uploads/2018/09/DS_Alternative_Influence.pdf