Archive for the ‘data analytics’ Category

the most metal words of all time


At Degenerate State, there was an interesting post where someone applied natural language processing models to heavy metal lyrics. From the article:

To get the lyrics, I scraped darklyrics.com. While darklyrics doesn’t have a robots.txt file, I tried to be gentle with my requests. After cleaning the data up, identifying the languages and splitting albums into songs, we are left with a dataset containing lyrics to 222,623 songs from 7,364 bands spread over 22,314 albums.

Before anyone asks, I have no intention of releasing either the raw lyric files or the code used to scrape the website. I collected the lyrics for my own entertainment, and it would be too easy for someone to use this data to copy darklyrics. If people are interested I may release some n-gram data of the corpus.

So what did they find? A few tidbits, starting with the heavy metal word cloud:

Tag Cloud of All Metal Lyrics

Then, the most “metal” words:

Rank Word Metalness
1 burn 3.81
2 cries 3.63
3 veins 3.59
4 eternity 3.56
5 breathe 3.54
6 beast 3.54
7 gonna 3.53
8 demons 3.53
9 ashes 3.51
10 soul 3.40
11 sorrow 3.40
12 sword 3.38
13 goodbye 3.28
14 dreams 3.28
15 gods 3.24
16 pray 3.22
17 reign 3.15
18 tear 3.12
19 flames 3.12
20 scream 3.11

And the least metal words:

Rank Word Metalness
1 particularly -6.47
2 indicated -6.32
3 secretary -6.29
4 committee -6.16
5 university -6.09
6 relatively -6.08
7 noted -5.85
8 approximately -5.75
9 chairman -5.69
10 employees -5.67
11 attorney -5.66
12 membership -5.64
13 administrative -5.61
14 considerable -5.60
15 academic -5.51
16 literary -5.49
17 agencies -5.48
18 measurements -5.47
19 fiscal -5.45
20 residential -5.45

The bottom line? Academia, the law and administration are the least metal topics of all time. Who knew?
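The quoted excerpt doesn’t spell out how “metalness” is computed. A score that is positive for lyric words and strongly negative for bureaucratic ones is consistent with a log ratio: how much more often a word shows up in metal lyrics than in a reference corpus of ordinary English. Here is a minimal Python sketch assuming that definition. The tokenizer, the add-one smoothing, and the names `word_counts` and `metalness` are hypothetical choices for illustration, not the original author’s code.

```python
import math
import re
from collections import Counter


def word_counts(texts):
    """Count lowercase word tokens (letters and apostrophes) across texts."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts


def metalness(metal_counts, reference_counts, smoothing=1.0):
    """Score each word by log(relative freq in lyrics / relative freq in reference).

    Positive: the word is over-represented in the lyrics.
    Negative: the word is over-represented in the reference corpus.
    Adding `smoothing` to every count (an assumption, not from the post)
    keeps words missing from one corpus from causing division by zero.
    """
    vocab = set(metal_counts) | set(reference_counts)
    metal_total = sum(metal_counts.values()) + smoothing * len(vocab)
    ref_total = sum(reference_counts.values()) + smoothing * len(vocab)
    scores = {}
    for word in vocab:
        p_metal = (metal_counts.get(word, 0) + smoothing) / metal_total
        p_ref = (reference_counts.get(word, 0) + smoothing) / ref_total
        scores[word] = math.log(p_metal / p_ref)
    return scores


# Invented toy strings, only to show the call; not the darklyrics data.
lyrics = ["burn the ashes of eternity", "demons scream and burn"]
reference = ["the committee noted the fiscal report", "the university committee met"]

scores = metalness(word_counts(lyrics), word_counts(reference))
ranked = sorted(scores, key=scores.get, reverse=True)
print("most metal:", ranked[0])    # burn
print("least metal:", ranked[-1])  # committee
```

Using relative frequencies rather than raw counts keeps a huge lyrics corpus from swamping the comparison, and taking the log makes over- and under-representation symmetric around zero, which fits the positive and negative scores in the two tables above.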

50+ chapters of grad skool advice goodness: Grad Skool Rulz ($5 – cheap!!!!)/Theory for the Working Sociologist/From Black Power/Party in the Street  

Written by fabiorojas

April 19, 2017 at 1:46 am

w.e.b. dubois’ illustrations of black social science data


The website Hyperallergic has a nice article on the drawings that Du Bois made to visualize some of his data. For a 1900 exhibition, Du Bois made these interesting visualizations by hand. Tufte, eat yer heart out!


Written by fabiorojas

July 11, 2016 at 12:01 am

forum on data analytics and inclusivity, part 2

This post is the second part of our forum on data analytics and inclusivity. The forum was inspired by an essay written by Michael Wilbon about African Americans and analytics. I’ve asked several people who work in analytics to comment on the problems with and opportunities for inclusivity in data analytics, especially as it relates to sports analytics.  The first set of essays can be found here.

Today’s essays are written by three contributors who have direct experience in data analytics and sports. All of the essays deal, in some way, with the root causes of a racial gap in analytics. Michael Lopez is a statistician at Skidmore College who has written extensively about sports analytics for outlets like Sports Illustrated and FiveThirtyEight. Jerry Kim is an economic sociologist who has been at Columbia University since 2006 and will soon join the business school at Rutgers University. His research focuses on the consequences of status for evaluation, and he has written about the effects of status bias on umpires’ decision-making in MLB (a paper that I can say with zero bias is amazing). Our final contributor is Trey Causey, a computational social scientist who has done considerable work as a data analyst and consultant for the NFL and who is now a data scientist at ChefSteps.

I know that this won’t be, nor should it be, the last word on this topic. Going forward we need more discussions of this type, especially as analytics becomes increasingly central to how business and sports operate.


Written by brayden king

June 15, 2016 at 12:24 am

forum on data analytics and inclusivity, part 1

Data analytics is a buzzword in the business world these days. One of the industries in which data analytics has made the biggest impact is sports. The publication of Moneyball in 2003 signaled a sea change in how baseball teams used data and statistics to make personnel decisions. Basketball wasn’t far behind in bringing advanced statistics into the front office. The MIT Sloan Sports Analytics Conference has become a hub of industry activity, attracting academics, journalists, and sports insiders.

In general, data analytics has been celebrated as a more enlightened way to approach sports management. But it was only a matter of time before sports analytics got some backlash. The most recent criticism comes from the respected sports journalist Michael Wilbon, who wrote a piece for The Undefeated about how data analytics has fallen on deaf ears in the African American community.

Log onto any mainstream website or media outlet (certainly any program within the ESPN empire) and 30 seconds cannot pass without extreme statistical analysis, which didn’t exist 20 years ago, hijacking the conversation. But not in “BlackWorld,” where never is heard an advanced analytical word. Not in urban barbershops. Not in text chains during three-hour games. Not around office water coolers. Not even in pressrooms or locker rooms where black folks who make a living in the industry spend all day and half the night talking about the most intimate details of sports.

Wilbon argues that in sports, data analytics has become just one more way for the Old Boy Network to reassert its status. Of course, I’ve heard other people make the case that analytics levels the playing field, since it doesn’t require any credentials to participate and is potentially race- and gender-blind. Other journalists have already criticized Wilbon’s claims and methodology (including this response by Dave Schilling), but Wilbon’s point of view is undoubtedly shared by others in sports.

We’re using Wilbon’s essay as an opportunity to have a discussion about data analytics and inclusivity. This is an issue that doesn’t just affect sports. As analytics become more integral to the business world, organizations will use analytics to sort talent in many of the most lucrative jobs. Academia, especially in STEM fields, continually wrestles with questions about inclusivity as well.

I’ve invited a handful of scholars and practitioners in the field of data analytics, many of whom work in the world of sports analytics, to comment on this issue. I’ll post their responses in two parts: half today and the other half tomorrow. Today’s essays are written by three contributors who each take a different view of how analytics can be used to overcome the problems Wilbon identified in his essay. Brian Mills is a sports economist at the University of Florida. His research applies “economic lessons and quantitative analysis to problems that sport managers face in their everyday decision making.” Sekou Bermiss is an organizational theorist at the University of Texas McCombs School of Business. He studies the relationships between human capital, reputation, and firm performance. Laura Nelson is currently a postdoc at Northwestern’s Kellogg School of Management. She uses computational methods to analyze organizational histories and changes in the feminist movement. She’s also, like me, a San Francisco Giants fan.

Thanks to all of the contributors. Come back for more commentary later, and please feel free to leave comments below.


Written by brayden king

June 14, 2016 at 2:06 am