forum on data analytics and inclusivity, part 2
This post is the second part of our forum on data analytics and inclusivity. The forum was inspired by an essay written by Michael Wilbon about African Americans and analytics. I’ve asked several people who work in analytics to comment on the problems with and opportunities for inclusivity in data analytics, especially as it relates to sports analytics. The first set of essays can be found here.
Today’s essays are written by three contributors who have direct experience in data analytics and sports. The essays all deal with, in some way, root causes of a racial gap in analytics. Michael Lopez is a statistician at Skidmore College who has written extensively about sports analytics at places like Sports Illustrated and FiveThirtyEight. Jerry Kim is an economic sociologist who has been at Columbia University since 2006 and will soon join the business school at Rutgers University. His research focuses on the consequences of status for evaluation and he has written about the effects of status bias on umpires’ decision-making in the MLB (a paper that I can say with zero bias is amazing). Our final contributor is Trey Causey, a computational social scientist who has done considerable work as a data analyst and consultant for the NFL and who is now a data scientist at ChefSteps.
I know that this won’t be, nor should it be, the last word on this topic. Going forward we need more discussions of this type, especially as analytics becomes increasingly central to how business and sports operate.
Michael Lopez, Skidmore College
Ignorance or lack of accessibility?
What was ironic about Michael Wilbon’s article on a racial divide in sports analytics is that the exact crusade he’s skeptical of – an understanding of how to collect and learn from data – is one that would’ve helped him make his point.
For example, Wilbon could’ve sampled players from each race to show whether or not black players use analytics less than white players. He could’ve spent 15 minutes clicking on league websites to calculate the percentage of analytics workers who aren’t white men. Or he could’ve done some research to show that the exclusion of blacks in league front offices matches that in other STEM (Science, Technology, Engineering, Math) workforces. But instead, Wilbon took his stance behind a confusing combination of convenience sampling, anecdotal evidence, and silly misconceptions (Dwight Howard an analytics-driven max-contract?).
However, making a point badly doesn’t mean you’ve made a bad point. And so instead of just bashing Wilbon’s approach, it’s perhaps more interesting to consider the root causes behind his and others’ anxieties. To start, Wilbon’s argument, as well as his approach to making it, implies unfamiliarity with statistics. Perhaps this isn’t surprising. Wilbon may simply fear what he doesn’t understand, matching opinions shared by many other veteran members of the old guard, and, incidentally, one entire sports league.
But there’s more at play here besides a fear of the unknown. An interest in analytics also suggests an education in related fields, and in that respect, Wilbon has reasons to be peeved. STEM fields, both in academia and the workforce, have long been white male dominated, the result of a damaged pipeline stretching back well before adulthood. African American and Hispanic high school students, for example, have less access to math and science courses, which then negatively impacts their ability to enter similar majors in college. When they do reach college, minority students interested in STEM majors are more likely to be bad matches at their institutions. This results in lower retention rates for minority students in these majors, with the problem only exacerbated at elite institutions. The problem persists through grad school – in 2007, for example, black women received fewer than 2% of doctorates earned in either statistics or mathematics. So when Wilbon gripes about the “Ivy Leaguers” network of mostly white men with fancy degrees, he’s accurate. The path to understanding, implementing, and appreciating sports analytics has leaked too many minorities. Worse, minorities aren’t the only group that STEM fields have left behind – I can count on my fingers the number of women doing analytics professionally for sports teams, and only a handful more are panelists at the Sloan Conference each year.
How can sports analytics do better? Well, an obvious place to start would be easier entry into education, including cheaper textbooks and open access to publications. More subtly, we in sports analytics are also being exclusionary without necessarily meaning to be. One of our most popular websites sits behind a paywall, and our most popular conference, a breeding ground for Wilbon’s so-called Ivy Leaguers network, cost $575 to attend last year. At that price, Sloan has not only opened doors to careers in sports analytics, it has also closed them. Finally, and more personally, those currently in positions of power in sports analytics should be aware of their own implicit biases when it comes to issues of diversity and inclusion, lest they creep into hiring or recruitment decisions. As examples, it’s bullshit that white names on resumes receive 50% more call-backs than black ones, or that science faculty perceive CVs bearing the name Jennifer as less competent and less hireable than those bearing the name John.
Until those of us currently involved in sports analytics make these types of improvements, the silliness of Wilbon’s article won’t just fall on him, but on us, too.
Jerry Kim, Columbia University
I still recall the pure exhilaration and joy I experienced reading Moneyball, Michael Lewis’ exquisite account of the Oakland Athletics and team general manager Billy Beane’s use of unconventional statistics to compete against big budget teams. As a longtime baseball nerd who spent embarrassingly large chunks of time on forums for likeminded baseball geeks, I was already quite familiar with “sabermetrics”, and the power of data analytics in understanding what happened on the field. What excited me so much about the book, if not the novelty of the ideas, was the fact that the management of a professional sports team was so unequivocally willing to embrace these ideas as a core of their strategy. Moneyball showed that geeks–usually relegated to dark basements in their parents’ home, instead of sunny playing fields filled with spectators–had a place in professional sports. Fast forward to today, and pretty much all sports teams have analytics staff that play significant roles in the operation of the team. The “nerds” are increasing their power and status in the sports world, and the “jocks” are in the unfamiliar and uncomfortable position of being subordinates.
Wilbon’s essay speaks to this anxiety caused by the shift in what organizational scholars would call the “conception of control”. As data analytics becomes a key way through which organizations see the world and formulate their strategies, this inevitably causes changes in the existing power structure with winners and losers. While Wilbon focuses on the resistance of African Americans to data analytics, such lukewarm responses to the data revolution are most likely common to any sports insider regardless of their race, gender, or origins. Ask any current or former player about the role that advanced analytics play in their professional and personal lives, and it would be surprising if the response was substantially different from Draymond Green’s and Shaun Livingston’s view that numbers fail to capture the full impact of a play or player, and that intangibles and feel matter a lot. In fact, skepticism and disdain towards efforts to quantify and compare performances are commonly observed in other professions such as teachers (e.g., No Child Left Behind), doctors (e.g., Surgeon scorecard initiatives), and managers. It is an open question whether African Americans are more hostile towards data analytics—unfortunately, Wilbon’s sampling method was not particularly helpful in this regard, as we don’t have much evidence that younger White athletes have a markedly different attitude—but the essay nonetheless hits upon the larger tension between management, focused coldly on performance and the bottom line only, and workers, who are not appreciative of their multifaceted talents and tacit knowledge being reduced to a set of numbers.
This is not to say that there are no racial or demographic disparities in the data analytic ranks that are rapidly expanding in professional sports and in other industries. For one, the opportunities and resources to acquire the necessary tools are unevenly distributed across society. The persistent racial gap in educational attainment, with minority students having significantly lower graduation rates and attending lower quality institutions, implies that minority students are less likely to have the training required for many data analytics positions. It is also true that, as Wilbon points out, getting a job will depend on one’s network, and networks typically exhibit strong tendencies towards homophily. Wall Street types will see quality in those that share similar backgrounds, just as former athletes will look for the intangibles that made them successful when evaluating talent. These biases (unintentionally) reproduce and exacerbate inequality, and should be a cause for concern.
All this said, we should not undersell the fact that data analytics have allowed a much broader range of people to participate in, contribute to, and enjoy sports. No longer is past athletic success a pre-requisite for taking a management role in a sports organization. The pre-Moneyball era was by no means a paradise of equal access, as it was controlled by an “Old Boy Network” of people that could dunk a basketball, run a sub-5 40, or hit a 90 mile per hour fastball. This may be in the process of being replaced by a new “Old Boy Network” of former Wall Street types and college graduates with computer skills, but it is safe to say that the new network is likely to be more diverse in terms of gender, race and sexuality than the old network it is replacing.
More importantly, I would argue that we are asking the wrong question when pondering whether data analytics is inclusive or not. This presupposes that organizations must choose between intangibles and data-driven approaches, or as Wilbon puts it, “emotion vs. intellect”. One of Billy Beane’s greatest strengths was that he combined his past experience as a touted athlete with a strong appreciation for undervalued statistics; the Boston Red Sox broke their World Series curse by having both the father of sabermetrics (Bill James) and Big Papi (David Ortiz), owner of off-the-chart intangibles, on their team. No organization can be successful with just data analyst types alone (ask the Sixers), nor can an organization be successful operating based on gut feeling alone. If the rise of the number-crunchers diversifies the organization in ways that contribute to success, then we should celebrate this infusion of new ideas and talents.
Trey Causey, ChefSteps
Michael Wilbon recently argued that analytics in sports are not only disliked by black people, but are used by the “Old Boy Network” to actively exclude minorities from employment in front offices. Is the field of analytics doing enough to be inclusive? Almost certainly not. Wilbon is correct that the vast majority of analytics professionals in sports are white men. Players, many of whom are black, are at the mercy of the analysts’ decisions. Is it a nefarious plot by Ivy Leaguers to exclude and evaluate those who are unlike them? Almost certainly not. Why is the field of analytics so white and male? And why does it matter?
The two areas that I work in and know the best — technology and sports — have well-documented diversity problems not only in analytics positions but across technical positions. Software engineering, data science, and analytics are frequently said to have a ‘pipeline’ problem: too few underrepresented people even apply for positions in these fields, meaning that applicant pools are dominated by white males. Some organizations try to experiment by doing things like removing names from resumes.
This isn’t enough. Often members of underrepresented groups have taken non-traditional educational and professional paths that don’t match the expectations of more traditional hiring managers. This is not to say that explicit and purposeful discrimination is at work (though it may well be); rather, many organizations are new to the analytics game and don’t know how to hire for these positions. In the absence of expertise, they rely on noisy and unreliable signals of applicant potential: things like an MBA from an elite business school, a degree from an Ivy League college, or an internship with a team. Of course, these internships are invariably unpaid and often rely on having existing personal or professional (through one’s parents) connections to the team. Homophily and an inability to identify appropriate predictors of candidate success all work against underrepresented groups getting a chance.
But the pipeline problem assumes that individuals already have the knowledge, skills, and confidence to apply for these positions in the first place. Many point to the active amateur analytics community online, where anyone can download R or Python, take a free online course in data analysis, and post their work for the world to read and evaluate. This is of course true, but to argue that the community is accepting and supportive is naive. Particularly in the sports analytics world, the style of critique is often brash and overconfident. This encourages “takedowns” of existing work and extant conventional wisdom.
While some thrive in the often snarky give-and-take that has come to define how analytics is discussed online, it certainly creates a barrier to entry for those who are less confident or unused to a certain style of argument. Those who are seen as less informed are often mocked. Michael Caley, a soccer analyst, and Seth Partnow, a basketball analyst, have referred to this as the ‘quant sneer.’ This is to say nothing of the utterly toxic environment that women working in sports are exposed to on a daily basis. Female journalists, commentators, and referees are subjected to absolutely horrible abuse any time they make public statements.
Given these barriers to entry, why would underrepresented groups even want to participate in the analytics industry? And why should analytics professionals want to encourage this? For the former question, the answer is that data and analytics will be used to make decisions with or without their involvement. Data and algorithms are already affecting our lives in many seen and unseen ways. Anything that is codified and quantified becomes an instrument of control. Those algorithms are made by people — people who decide what’s important to measure, what’s not, and what goes into the models and what gets left out. Doing *good* analytical work requires that assumptions are stated, tested, and retested. Having a voice in those decisions is the only way that diverse experiences and backgrounds will be incorporated. For the latter question, diversity makes organizations better — and not just from a sense of morality. Diverse organizations make better decisions and produce more robust outcomes. There is a mountain of empirical and theoretical work that supports this. Put simply, organizations are hurting themselves by only hiring people that look like them.
What can be done? It’s easy to say things like “encourage more underrepresented groups to take statistics classes” and “be kinder when providing feedback” — both of which we can and should do. But those are not solutions that scale to industries. For a positive example, I particularly liked the approach taken by the Airbnb Data Science team — they identified a lack of gender diversity in their organization and tackled it as a problem to be solved using data and experimentation. Not only did they make large gains in gender diversity among their applicants and employees, they saw large gains in employee satisfaction and retention. It’s hard to ignore that data.