forum on data analytics and inclusivity, part 1
Data analytics is a buzzword in the business world these days. One of the industries in which data analytics has made the biggest impact is sports. The publication of Moneyball in 2003 signaled a sea change in how baseball teams used data and statistics to make personnel decisions. Basketball wasn’t far behind in implementing advanced statistics in the front office. The MIT Sloan Sports Analytics Conference has become a hub of industry activity, attracting academics, journalists, and sports insiders.
In general, data analytics has been celebrated as a more enlightened way to approach sports management. But it was only a matter of time before sports analytics got some backlash. The most recent criticism comes from the respected sports journalist Michael Wilbon, who wrote a piece for The Undefeated about how data analytics has fallen on deaf ears in the African American community.
Log onto any mainstream website or media outlet (certainly any program within the ESPN empire) and 30 seconds cannot pass without extreme statistical analysis, which didn’t exist 20 years ago, hijacking the conversation. But not in “BlackWorld,” where never is heard an advanced analytical word. Not in urban barbershops. Not in text chains during three-hour games. Not around office water coolers. Not even in pressrooms or locker rooms where black folks who make a living in the industry spend all day and half the night talking about the most intimate details of sports.
Wilbon makes the point that, in sports, data analytics has become just one more way for the Old Boy Network to reassert its status. Of course, I’ve heard other people make the case that analytics levels the playing field, given that it doesn’t require any sort of credentials to participate and is potentially race- and gender-blind. Other journalists have already criticized Wilbon’s claims and methodology (including this response by Dave Schilling), but it’s undoubtedly true that Wilbon’s point of view is shared by others in sports.
We’re using Wilbon’s essay as an opportunity to have a discussion about data analytics and inclusivity. This is an issue that doesn’t just affect sports. As analytics become more integral to the business world, organizations will use analytics to sort talent in many of the most lucrative jobs. Academia, especially in STEM fields, continually wrestles with questions about inclusivity as well.
I’ve invited a handful of scholars and practitioners in the field of data analytics, many of whom work in the world of sports analytics, to comment on this issue. I’ll post their responses in two parts: half today and the other set tomorrow. Today’s essays are written by three contributors who all have different takes on how analytics can be used to overcome the problems Wilbon identified in his essay. Brian Mills is a sports economist at the University of Florida. His research applies “economic lessons and quantitative analysis to problems that sport managers face in their everyday decision making.” Sekou Bermiss is an organizational theorist at the University of Texas McCombs School of Business. He studies the relationships between human capital, reputation, and firm performance. Laura Nelson is currently a postdoc at Northwestern’s Kellogg School of Management. She uses computational methods to analyze organizational histories and changes in the feminist movement. She’s also, like me, a San Francisco Giants fan.
Thanks to all of the contributors. Come back for more commentary later, and please feel free to leave comments below.
Brian Mills, University of Florida
Avoiding Subjective Biases in Evaluation through Analytics
When Brayden asked if I had interest in contributing a blog post regarding diversity in analytics, I was initially hesitant. I asked myself, “What would I be able to contribute to a diversity discussion, given my rather non-diverse background?”
But, in further articulating this concern with another colleague and with Brayden, I thought I might be able to speak a bit about how the very existence of analytics in sport—and in the workplace more generally—is itself a pursuit in avoiding bias and subjectivity in evaluation.
Let me begin with my own definition of analytics: the pursuit of simplistic and objective statistical information communicated in a context understandable to the decision maker. That’s it. The rest of the analytics discussion is simply about which tools to use and how to build those tools. And, as it turns out, we’ve been doing analytics for a long time. The tools, and the speed at which those tools are implemented, are what have been changing recently.
My research focuses on sports, so I’ll largely stick to this context. In its most raw form, sport at the professional level pines for objectivity. Perform on the field, and you’re loved. Fail to meet expectations, and you’re chopped liver. Perfect objectivity in performance evaluation is the key to the ultimate goal: winning in an uncertain world.
Given this, it is perhaps no surprise that—of the many owners in MLB—Branch Rickey enlisted Jackie Robinson to break MLB’s color barrier in 1947.
Rickey was known as an innovator with an astute business sense and an interest in making the Dodgers profitable. One of the strategies Rickey used to make his team profitable was mathematics: single equations that could summarize the contribution of a player in a single number. An early form of analytics, if you will.
He knew much of what he needed to know about a player from Allan Roth’s statistical contributions. Can he hit? Can he run? Can he field? Can he throw? Some fans were outraged at the decision to sign Robinson, an unfortunate reflection of our society at the time.
But Rickey knew the raw sports outcome: win and the fans will show up. While scouts of other teams expressed their biases in performance evaluation, Rickey found ways to add players using more objective information. The Dodgers would win. And Rickey and the organization would benefit.
This is precisely the point made by Becker in his Economics of Discrimination.
There are, of course, various ways in which any single person may be biased. Race is only the most salient of these in the history of our society.
Specific to baseball, there are stories of scouts reporting which players have the right facial shape, or even the right backside, to be a superstar. There are claims about which players will be detrimental to team chemistry, with no definition of what contributing to that chemistry would require. These types of claims have hovered dangerously around ethnic lines. Further, academic research has identified language-based discrimination in the NHL. And both Brayden and I have worked with data on umpires, finding various biases in their subjective judgment on balls and strikes.
But analytics is objective. It is emotionless. We use it to identify biases in those non-analytics based processes. We avoid subjective judgment through our algorithms. Right?
Only if our inputs are diverse.
A myopic analyst might respond: well, I have lots of different variables. But inputs must include perspectives coming from various analysts with various backgrounds and life experiences to guide the use of those variables.
Implicit bias can ensure that efforts toward objectivity through analytics fail to diversify outcomes. We end up creating precisely the monolith that the process shouldn’t be.
As Becker notes, even under the watch of a seemingly unbiased manager, institutions can still drive discriminatory outcomes. Diversity is therefore also important in creating these institutions, or in the case of analytics, rules by which information processing operates.
Diversifying Our Analytic Inputs
The common misunderstanding that analytics is monolithic and esoteric drives much of the resistance to its use. I call these characterizations Capital-A Analytics. It’s the mysterious Analytics we hear about through marketing materials, bad journalism, and overarching discussions of Big Data (as opposed to large datasets).
Characterizing analytics—again, just the processing of information—in this way lends itself to generalization not just about what analytics is and does, but also who should be involved. As analytics is grossly characterized as the Capital-A version, it alienates many people that could make needed contributions. This reduces the inputs available to the analyst.
There has been recent coverage of the bias within algorithms that help make decisions, with the bias often occurring across racial lines. This is a serious problem not just for individuals dealing with this bias, but also for organizations.
Biased algorithms are the antithesis of the philosophical underpinning of analytics: the use of (sometimes sophisticated) tools to search for objective information. But analytics can often end up exclusionary. The rules are decided on by humans. And humans are inherently biased beings.
Without a diversity of inputs when using analytical techniques (again, inputs are not just some extra variables), it can be difficult both to ask the right questions and to identify the correct strategies for answering them. The lack of such diversity can bring about the same biases that we try so hard to avoid with our fancy tools. Analytics is not a replacement for thinking, but a support system for it.
For example, we can build a statistical model based on MLB outcomes, and try to use this to identify the best college players. But we can ask more of our data by identifying constraints of the model with input from coaches and scouts that watch the game.
Or perhaps something my fellow academics can closely identify with: what if we leave it to academic analytics companies to drive all tenure and promotion decisions? No need for a committee of biased colleagues blocking our promotion. We have an emotionless algorithm that decides if we’ve done enough, designed by a physicist and a computer scientist. These two highly quantitative people can probably program an intensive system and give us some interesting results.
This would be perfectly objective. At least based on the rules the physicist agreed on beforehand.
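To make the point about agreed-upon rules concrete, here is a minimal sketch of such an evaluation algorithm. Everything in it is hypothetical: the candidates, their records, the three criteria, and both sets of weights are invented for illustration and do not describe any real tenure system. It shows only that two equally mechanical rule sets can rank the same records in opposite order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    papers: int       # number of publications
    citations: int    # total citations
    teaching: float   # teaching evaluation on a 0-5 scale


def score(candidate: Candidate, weights: dict) -> float:
    """Weighted sum of the record; the weights are the designers' choice."""
    return (
        weights["papers"] * candidate.papers
        + weights["citations"] * candidate.citations
        + weights["teaching"] * candidate.teaching
    )


def rank(candidates: list, weights: dict) -> list:
    """Names ordered from highest to lowest score under the given weights."""
    ordered = sorted(candidates, key=lambda c: score(c, weights), reverse=True)
    return [c.name for c in ordered]


# Hypothetical records, invented for illustration only.
candidates = [
    Candidate("A", papers=12, citations=150, teaching=3.0),
    Candidate("B", papers=6, citations=90, teaching=4.8),
]

# Two rule sets that differ only in what their designers chose to value.
research_heavy = {"papers": 1.0, "citations": 0.1, "teaching": 1.0}
teaching_heavy = {"papers": 0.5, "citations": 0.02, "teaching": 5.0}

print(rank(candidates, research_heavy))  # A scores 30.0, B scores 19.8
print(rank(candidates, teaching_heavy))  # A scores 24.0, B scores 28.8
```

Neither ranking involves any judgment once the weights are fixed. The judgment sits in who chose the weights and what they thought mattered.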
While we often hear rumblings that these biases mean we should just give up, I tend to disagree. It doesn’t mean we should quit trying to measure important things. Measuring sports performance seems like an important task. As does evaluating teaching effectiveness and student learning outcomes. And so does estimating levels of academic contributions.
But rather than jumping on the keyboard and asking questions of the data, we should be asking: what are the questions? In the name of diversity, making this process inclusive is the only way to find the right answers. It ensures we don’t build institutional walls through strange rule decisions in algorithms. It’s not about replacing evaluation, but supporting it with as much objective information as possible.
 At the same time, Rickey enlisted an in-house statistician, Allan Roth, to perform proprietary analyses for the Dodgers.
 I have some thoughts on Big Data characterizations as well, but let’s leave that for another day.
Sekou Bermiss, University of Texas
Let me first state that I have a huge amount of respect for Michael Wilbon, but I strongly disagree with the primary thesis of his recent article on TheUndefeated.com. I don’t believe that the appreciation and discussion of advanced analytics differs drastically by race. Based on my experience, the average sports fan (regardless of race) is unlikely to bring up Effective FG% or Player Efficiency Rating in casual conversation. However, I believe that both black and white fan bases understand the basic premise of advanced analytics. They both consume the same articles on ESPN.com and FiveThirtyEight.com that use advanced stats to help readers better understand the games that they love.
I also disagree with the specific arguments Wilbon makes about the value of advanced analytics. I believe that the purpose of sports analytics is to complement human emotion and intuition, not to replace it. Just like any good top-flight MBA program, a top-flight NBA organization needs both “Poets and Quants.” I would not expect Draymond Green to be thinking about his Offensive Rating at different spots of the floor as he is playing. But I would expect someone on the Golden State Warriors coaching staff to be aware of the places on the court where Green is most efficient, and work with head coach Steve Kerr to design plays so that Green “naturally” gets the ball in these places. These issues aside, I agree with Wilbon on one point: the question of whether racial disparities around the appreciation of advanced analytics might serve as a barrier to diversity in NBA front offices. In this regard, I think Wilbon’s article represents a central issue related to racial inclusion within organizations: the dual impact of stereotyping and the self-fulfilling prophecy.
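For readers who have not met the metric, Effective FG% is commonly defined as (FGM + 0.5 × 3PM) / FGA, which adjusts ordinary field goal percentage for the extra point a made three-pointer is worth. The sketch below is only an illustration: the function name and both stat lines are hypothetical.

```python
def effective_fg_pct(fgm: int, three_pm: int, fga: int) -> float:
    """eFG% = (FGM + 0.5 * 3PM) / FGA, where FGM already includes made threes."""
    if fga <= 0:
        raise ValueError("field goal attempts must be positive")
    return (fgm + 0.5 * three_pm) / fga


# Hypothetical stat lines: same makes and attempts, different shot mix.
inside_scorer = effective_fg_pct(fgm=10, three_pm=0, fga=20)
perimeter_shooter = effective_fg_pct(fgm=10, three_pm=4, fga=20)

print(f"inside scorer:     {inside_scorer:.3f}")
print(f"perimeter shooter: {perimeter_shooter:.3f}")
```

Both players make 10 of 20 shots, so their raw field goal percentage is identical. The second player's four made threes lift the effective figure, and that gap is the basic premise the stat captures.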
As sociologist Robert Merton outlined in his famous 1948 paper, the self-fulfilling prophecy begins with a false definition of a given situation (i.e., black people “are not feeling” advanced analytics) which evokes new behaviors from people that make the false definition come true. Merton wrote how this phenomenon drove the infamous bank runs in the Depression Era. When enough account holders, for whatever reason, incorrectly believe that their bank may be on the brink of insolvency, they all race to the bank and attempt to simultaneously withdraw their entire account. It is this action, however, that drives the bank into insolvency. Now imagine the self-fulfilling prophecy operating when the initial false definition is based on a widely-held negative stereotype. The stereotype threat research by social psychologist Claude M. Steele and others shows that when individuals are reminded of negative stereotypes related to their demographic, their apprehension of being negatively judged by others leads them to perform poorly, which often supports the negative stereotype. Thus telling black students that most black students don’t perform well academically fills them with an anxiety that negatively impacts their academic performance.
This is my primary issue with the Wilbon article. He argues that black people do not have any appreciation or use for advanced analytics. What is especially damning is the unequivocal language he uses. He speaks in absolutes without any room for exception. Talking about how often black people will use analytics when discussing sports he writes, “It’s not part of any discussion of any game for any reason, ever.” I think the arguments in the article are incorrect, but worse, I think it is the false definition that may continue the self-fulfilling prophecy, one that makes black people apprehensive about embracing advanced analytics and thus prevents them from benefitting from technological advancements. Within all businesses there is a large trend towards using the statistical analysis of Big Data to make strategic and operational decisions. There is a legitimate concern that the racial digital divide is a large obstacle that will prevent racial diversity in sectors focused on computing and high-level math. And while there are a myriad of historical, economic, and institutional barriers that may keep black men and women out of jobs in this sector, I believe the most insidious obstacle is cognitive. The idea that black people have some natural predisposition against advanced analytics is self-destructive and counter-productive, especially when it comes from one of the most respected black journalists in the country.
Laura Nelson, Northwestern University
Developing a Feel It, Smell It, Touch It Analytics
The subtitle of Wilbon’s article, “Why blacks are not feeling the sports metrics movement,” is a great jumping off point to discuss the issue of inclusivity in data science/data analytics/computational social science more broadly. In the article Wilbon articulates a point that bears repeating: new tools and technologies tend to reproduce existing social structures. But, like Twitter during the Arab Spring, new tools can also be used to challenge the status quo. We are at a critical moment in the field of data analytics. Will it be used to exclude, or expose? Reinforce privilege, or undermine it?
There are plenty of examples of how data analytics is reproducing existing inequalities. In addition to Wilbon’s examples I’ll add one more: the giant, gaping, “Gender Data Gap” in sports. In response to Nate Silver celebrating the richness of data in sports, Allison McCann clarified: “Men’s sports have awesome data.” Fans, leagues, and institutions like ESPN are pouring thousands of hours and millions of dollars into high-tech data collection for men’s sports but, reinforcing existing inequalities, not for women’s sports. +1 for Wilbon’s cynical view.
As the field of data science/data analytics develops we should of course work to ensure it’s inclusive at every level: training, hiring, promotions, and data collection. But there’s a deeper point in Wilbon’s article, one that takes us out of the world of sports analytics and into the world of academic social science.
In the article Stanford lawyer Larry Irving argued that exclusion is not the only reason why black folks are not feeling sports analytics: “Sports is emotional. And analytics represent the absence of emotion, the antithesis…And it just seems to me we are the feel it, smell it, touch it people.” This statement is reminiscent of a critique made by feminist and critical race theorists in the 1980s and 1990s leveled at the use of statistics in the social sciences. Traditional statistics, they claim, fails to address everyday life and does not allow for inter-subjective understandings, thus ignoring the experiences, lives, and, specifically, emotions of women and minorities excluded from the public sphere. Feminists Dorothy Smith and Patricia Hill Collins developed methods they use in their own work – institutional ethnography to expose the social relations that structure everyday life and standpoint theory to allow for inter-subjectively distinct experiences – to address the failings of traditional statistics, failings that now plague data analytics.
But here’s where it gets exciting. We are experiencing increasing access to troves of subjectively-created data that deal directly with everyday life and are informed by inter-subjective discourse. Context-rich data analysis techniques like machine learning, neural networks, and word embeddings allow us to analyze these data without losing the context and emotion embedded within. We have the potential, then, to turn the data analytics movement into a potent emotional, critical, feminist, inter-subjective computational social science. (While sports is not my specialty, in the sports world an equivalent may be “new wave” optical player movement tracking, rich and potentially emotional data that, as of yet, the “old boy” network does not know what to do with.)
Rather than simply calling for diversity within the field of data analytics on terms dictated by this “new ‘Old Boy Network’ of Ivy Leaguers,” we should instead make big data, and data analytics, work on our own terms. I believe we can develop a data analytics that helps us touch, smell, and feel the game (and the social world).