Archive for the ‘technology’ Category
Jim Moody and I are writing an article on data visualization in Sociology. Here’s a picture that won’t be in the final version, but I like it all the same.
I keep hearing about the coming big data revolution. Data scientists are now using huge data sets, many produced through online interactions and media, that shed light on basic social processes. Big data sets, from sources like Twitter, Facebook, or mobile phones, give social scientists ways to tap into interactions and cultural output at a scale that has never been seen before in social science. The way we analyze data in sociology and organizational theory is bound to change due to this influx of new data.
Unfortunately, the big data revolution has yet to happen. When I see job candidates or new scholars present their research, they are mostly using the same methods that their predecessors did, although with incremental improvements to study design. I see more field experiments for sure, and scholars seem more attuned to identification issues, but the data sources are fairly similar to what you would have seen in 2003. With a few notable exceptions, big data have yet to change the way we do our work. Why is that?
Last week Fabio had a really interesting post about brain drain in academia. One reason we might see less big data than we’d like is that the skills needed to handle this type of analysis are rare, and much of the talent in this area is finding that research jobs in the for-profit world are more lucrative and rewarding than what they’re being offered in academia. I believe that’s true, especially for the kinds of people who are attracted to data mining techniques. The other problem though, I think, is that social scientists are having a hard time figuring out how to fit big data techniques into the traditional milieu of social science. Sociologists, for example, want studies to be framed in a theoretically compelling way. Organizational theorists would like scholars to use data that map on to the conceptual problems of the field. It’s not always clear in many of the studies that I’ve read and reviewed that big data analyses are doing anything new other than using big data. If big data studies are going to take over the field they need to address pressing theoretical problems.
With that in mind, you should really read a new paper by Chris Bail (forthcoming in Theory and Society) about using big data in cultural sociology. Chris makes the case that cultural sociology, a subfield that is obsessed with understanding the origins of and practical uses of meaning, is prime for a big data surge. Cultural sociology has the theoretical questions, and big data research offers the methods.
More data were accumulated in 2002 than all previous years of human history combined. By 2011, the amount of data collected prior to 2002 was being collected every two days. This dramatic growth in data spans nearly every part of our lives from gene sequencing to consumer behavior. While much of these data are binary and quantitative, text-based data is also being accumulated on an unprecedented scale. In an era of social science research plagued by declining survey response rates and concerns about the generalizability of qualitative research, these data hold considerable potential. Yet social scientists – and cultural sociologists in particular – have ignored the promise of so-called ‘big data.’ Instead, cultural sociologists have left this wellspring of information about the arguments, worldviews, or values of hundreds of millions of people from internet sites and other digitized texts to computer scientists who possess the technological expertise to extract and manage such data but lack the theoretical direction to interpret their meaning in situ….[C]ultural sociologists have made very few ventures into the universe of big data. In this article, I argue inattention to big data among cultural sociologists is particularly surprising since it is naturally occurring – unlike survey research or cross-sectional qualitative interviews – and therefore critical to understanding the evolution of meaning structures in situ. That is, many archived texts are the product of conversations between individuals, groups, or organizations instead of responses to questions created by researchers who usually have only post-hoc intuition about the relevant factors in meaning-making – much less how culture evolves in ‘real time’ (note: footnotes and references removed).
Chris goes on to offer suggestions about how cultural sociology might use big data to address big theoretical questions. For example, he believes that scholars studying discursive fields would be wise to use big data methods to evaluate the content of such fields, the relationships between actors and ideas, and the relationships between different fields. Of course, much of the paper is about how to use big data analysis to enhance or replace traditional methods used in cultural sociology. He discusses how Twitter and Facebook data might supplement newspaper analysis, a fairly common method in cultural and political sociology. Although he doesn’t go into great detail about how you would do it, an implicit argument he makes is that big data analysis might replace some survey methods as ways to explore public opinion.
I continue to think there is enormous potential for using big data in the social sciences. The key for having it accepted more broadly is for data scientists to figure out how to use big data to address important theoretical questions. If you can do that, you’re gold.
In the More Tweets, More Votes paper, we established that Twitter share correlates with future Congressional election results (e.g., % of tweets that mention GOP in a district correlates with the GOP vote share in the district). The deeper question – why? We’ve got a working paper that suggests an answer: Twitter, in some respects, mimics conventional text, which means that it is close enough to the grass roots. In other words, people are more likely to use technology if it resembles what they know – an idea going back to a classic paper by Kwon and Zmud.
We can tease out testable implications. Specifically, technologies that are more sophisticated will be less likely to correlate with mass politics. In other words, social media that are easy to use and rely mainly on pre-existing language skills are more likely to correlate with social trends than social media that require higher levels of functionality.
We test this with our tweets/votes data. We measured three types of candidate tweet share – “free text,” @mentions, and #hashtags. Free text is the “people’s” method of tweeting, while @mentions and #hashtags are syntaxes that require more knowledge. The grassroots hypothesis implies free text mentions of candidates will have a stronger correlation with election outcomes than @mentions or #hashtags. The results? Free texts correlate (as per the original paper) but the others are not significantly different from zero. The picture says it all.
Stark result. The implication is profound for social scientific studies of social media. If your data requires distinctly Internet based skills, it is less likely to speak to population level trends. Sophistication is probably the mark of a connoisseur. Indeed, additional analysis of our data shows that @mention and #hashtag users are “intense” Internet users. For example, they have higher median follower counts and are more likely to be “verified” by Twitter.
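To make the three mention types concrete, here is a rough Python sketch of how a single tweet might be sorted into free text, @mention, or #hashtag references to a candidate. This is only an illustration, not the coding scheme used in the paper: the function name `mention_types`, its parameters, and the regular expressions are all assumptions of mine.

```python
import re

def mention_types(tweet, name, handle, hashtag):
    """Return the set of ways a tweet refers to a candidate.

    Hypothetical helper: `name` is the candidate's surname as plain text,
    `handle` is their Twitter handle (without "@"), and `hashtag` is a
    campaign hashtag (without "#"). None of these come from the paper.
    """
    found = set()
    flags = re.IGNORECASE

    # @mention: the handle directly preceded by "@"
    if re.search(r"(?<!\w)@" + re.escape(handle) + r"\b", tweet, flags):
        found.add("mention")

    # #hashtag: the tag directly preceded by "#"
    if re.search(r"(?<!\w)#" + re.escape(hashtag) + r"\b", tweet, flags):
        found.add("hashtag")

    # free text: the name as an ordinary word, not part of a handle or tag
    if re.search(r"(?<![@#\w])" + re.escape(name) + r"\b", tweet, flags):
        found.add("free_text")

    return found


print(mention_types("Voting for Smith today", "Smith", "repsmith", "smith2010"))
print(mention_types("@RepSmith is great", "Smith", "repsmith", "smith2010"))
```

The only point of the sketch is that free text needs nothing beyond ordinary language, while the other two depend on knowing Twitter’s own syntax.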
Last week, it was revealed that the NSA collects important data about all Verizon phone calls and has access to the servers of most major Internet firms like Facebook and Google. Of course, this sort of behavior is exactly what civil rights activists had warned about for years.
But there is a deeper lesson – the Internet has made it remarkably easy for the Federal government to collect enormous amounts of information on many aspects of our lives. If the reports are to be believed, the Prism program, which allows the Feds to search Internet firms, costs only $20 million. I can’t imagine the downloading of Verizon data is that much more expensive. When communication was mainly done through voice and paper, this simply was not possible at the same scale.
So it has come to this. The Internet gives us cheap and easy communication, but it also makes a low cost copy of everything that third parties can hold onto, whether we like it or not. It is clear that the Courts, Congress, and the President aren’t in a rush to make sure that searches are done for probable cause. As I type this, President Obama asserts that it’s ok because they can’t hear your calls, but they just know who you are calling all the time. It’s not clear to me that there is anything that will reverse the erosion of privacy in the Internet age.
Definition: Given a set of X characters, a twiggle is the total possible number of tweets, or X^140. Since most English speakers will use the ASCII characters, one standard English twiggle is 95^140:
This was computed using the Online Long Number Calculator.
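For anyone who wants to check the arithmetic without an online calculator, here is a minimal Python sketch. The function name `twiggle` and the `length` parameter are my own labels, and it assumes, as in the definition above, that every tweet is exactly 140 characters drawn from an alphabet of X symbols.

```python
def twiggle(alphabet_size, length=140):
    """Number of distinct strings of exactly `length` characters
    over an alphabet with `alphabet_size` symbols, i.e. X^140 by default."""
    return alphabet_size ** length


# One standard English twiggle, using 95 ASCII characters
english_twiggle = twiggle(95)
print(english_twiggle)            # the exact integer
print(len(str(english_twiggle)))  # how many decimal digits it has
```

Python integers are arbitrary precision, so this prints the exact value, which runs to 277 digits.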
Over at A Programmer’s Tale, “jewsin” argues that Facebook is a failure because it encourages people to post junk. Two clips:
I am signed into Facebook right now. At a quick glance, the entire list of posts on the first screen are irrelevant to me. If I scrolled down I can find 4 stories I actually care about, from a list of about 30. The most important page on Facebook has more than three-fourths of absolutely useless content.
Surprising. Facebook is a company with a very large number of talented people. They know a lot about me. Yet, their product looks like one of those spam filled mailboxes from the nineties.
Since everybody is on Facebook, one can expect that it will in some way mirror the behavior of society in general. In the real world however, people’s opinions only have a limited reach.
Facebook is godsent for people who love to talk, but have nothing to say. Here is a network that doesn’t care about originality or the quality of content. In the time it takes to create something original, they could share dozens of things.
I agree, but “jewsin” is missing the point. The point of Facebook isn’t to produce high quality content. It’s a tool for getting millions of people to divulge precious marketing data (however coarse) in exchange for creating a platform where they hang out and stalk their high school crushes. On that count, it’s a mind blowing success.
A number of people have asked me a very important question about the More Tweets, More Votes paper. Do relative tweet rates merely correlate with elections or is there a causal link?
The paper itself does not settle the issue. The purpose of the paper is merely to document this striking correlation. Given that qualification, let me explain the argument from both sides and my priors.
- Correlation: Twitter is a passive record of how excited people are. If a candidate somehow garners the attention of the public, they get excited and start talking about it, which translates into a higher twitter presence.
- Causal: The unusual attention that a candidate attracts in social media sways undecided or weakly committed voters. In a sense, highly active twitter users are the “opinion leaders” of modern society.
My prior: 75% correlation, 25% cause. How would we tease out these arguments? For example, what variable could instrument the district level tweet counts? It would be interesting to find out.
When people read our More Tweets, More Votes paper, they often wonder – where is the “sentiment analysis?” In other words, why don’t we try to measure whether a tweet is positive or negative? Joe DiGrazia, the lead author, addressed this in a recent interview with techpresident.com:
DiGrazia said the researchers were “kind of surprised” that they saw a correlation without doing sentiment analysis of the Tweets. “We thought we were going to have to look at the sentiment,” he said. He speculated that one reason for the correlation could be a so-called Pollyanna Hypothesis, “that people are more likely to gravitate toward subjects that they are positive about and are more likely to talk about candidates that they support.”
The idea is simply this: the frequency of speech is often a relatively decent approximation of how important people think that topic is relative to salient alternatives. If people say “Obama” a little more often than the competition, then it’s not unreasonable to believe that he is more favored. And you don’t need content analysis to suss that out.
Unit of analysis: US House elections in 2010 and 2012. X-Axis: (# of tweets mentioning the GOP candidate)/(# of tweets mentioning either major party candidate). Y-axis: GOP margin of victory.
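The x-axis quantity is simple enough to write down in a few lines of Python. This is only a sketch of the ratio as described in the caption; the function name `gop_tweet_share` and its argument names are my own, and the numbers in the usage line are made up purely for illustration.

```python
def gop_tweet_share(gop_mentions, either_mentions):
    """Share of tweets mentioning the GOP candidate among tweets
    mentioning either major party candidate in a district.

    Hypothetical helper: both arguments are raw tweet counts.
    """
    if either_mentions <= 0:
        raise ValueError("need at least one tweet mentioning a candidate")
    if not 0 <= gop_mentions <= either_mentions:
        raise ValueError("GOP mentions must be between 0 and the total")
    return gop_mentions / either_mentions


# Illustrative (invented) counts: 30 of 120 candidate tweets mention the GOP candidate
print(gop_tweet_share(30, 120))  # 0.25
```

Note that the measure is a share, not a raw count, so a noisy district and a quiet one are put on the same 0 to 1 scale.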
I have a new working paper with Joe DiGrazia*, Karissa McKelvey and Johan Bollen asking if social media data actually forecasts offline behavior. The abstract:
Is social media a valid indicator of political behavior? We answer this question using a random sample of 537,231,508 tweets from August 1 to November 1, 2010 and data from 406 competitive U.S. congressional elections provided by the Federal Election Commission. Our results show that the percentage of Republican-candidate name mentions correlates with the Republican vote margin in the subsequent election. This finding persists even when controlling for incumbency, district partisanship, media coverage of the race, time, and demographic variables such as the district’s racial and gender composition. With over 500 million active users in 2012, Twitter now represents a new frontier for the study of human behavior. This research provides a framework for incorporating this emerging medium into the computational social science toolkit.
The working paper (short!) is here. I’d appreciate your comments.
* Yes, he’ll be in the market in the Fall.
The interesting thing about technology is that early adopters tend to be very technical people. The average person who owned a computer in 1982 was probably educated and very interested in technology. A Popular Mechanics reader, if you will. Later, there is nothing remarkable about computer owners. Scientific literacy is not a precondition for computer use.
That leads me to a distinction: computer literacy vs. digital natives. The computer literate is someone who is steeped in the ways of computing. Not a professional engineer, but they approach a computer the way some people approach a car. It’s a machine, you can take it apart, make it do things, and so forth. The digital native is someone who is comfortable with computers because they grew up around them. They are consumers of computers, not builders. They know how to use a computer, but they can’t really write code or otherwise command a computer. This isn’t necessarily a bad thing. It should be expected that when a technology is well diffused it is easy to use and requires little training or knowledge.
Here’s Joel West giving a primer (at Berkeley) on open and user innovation.
I’m a sucker for nutty futurist speculations. So bear with me on this one.
A few nights ago I was watching Neal Stephenson’s talk on “getting big stuff done,” where he bemoans the lack of aggressive technological progress in the past forty or so years. There’s obviously some debate about this, though he makes some good points. He raises the question of why, for example, we haven’t yet built a 20km tall building despite the fact that it appears to be technologically very feasible with extant materials. Nutty. But an interesting question. From a sci-fi writer.
Stephenson ends his talk on an organizational note and asks:
What is going on in the financial and management worlds that has caused us to narrow our scope and reduce our ambitions so drastically?
I like that question. Even if you think that ambitions have not been lowered, I think all of us would like to see the big problems of the world addressed more aggressively. (Unless one subscribes to the Leibnizian view that we live in the “best of all possible [organizational] worlds.”) Surely organization theory is central to this. This is particularly true in cases where technologies and solutions for big problems seemingly already exist – but it is the social technologies and organizational solutions that appear to be sub-optimal. So, how can more aggressive forms of collective action and organizational performance be realized? I don’t see org theorists really wrestling with these types of questions, systematically anyways. It would be great to see some more wide-eyed speculation about the organizational forms and theories that perhaps might facilitate more aggressive technological, social and human progress.
I can see several reasons why organization theorists don’t engage with these types of “futurist” questions. First, theories of organization tend to lag practice. That is, organizational scholars describe and explain the world (in its current or past state), though they don’t often engage in speculative forecasting (about possible future states). Second, many of the organizational sub-fields suited for wide-eyed speculation are in a bit of a lull, or they represent small niches. For example, organization design isn’t a super “hot” area these days (certainly with exceptions), despite its obvious importance. Institutional and environmental theories of organization have taken hold in many parts, and agentic theories are often seen as overly naive. Environmental and institutional theories of course are valuable, but they delimit and are incremental, and are perhaps just self-fulfilling and thus may not always be practically helpful for thinking about the future.
That’s my (very speculative) two cents.
I’ve been reading up on intellectual property of late. Here are some sources worth perusing and reading (some of them can be downloaded for free), along with some interviews and clips.
- Boldrin, M. and Levine, D. (2008.) Against Intellectual Monopoly (you can download all the chapters on the website). Cambridge University Press.
- Boyle, J. (2008.) The Public Domain: Enclosing the Commons of the Mind. Yale University Press. (Here’s a short lecture based on the book.)
- Cohen, J. (2012.) Configuring the Networked Self: Law, Code and the Play of Everyday Practice. Yale University Press. Here’s the open version. (And, lecture at Berkman.)
- Johns, A. (2010.) Piracy: The Intellectual Property Wars from Gutenberg to Gates. University of Chicago Press. (Here’s a C-SPAN interview.)
- Lessig, L. (2001.) The Future of Ideas: The Fate of the Commons in a Connected World. Random House.
- Merges, R. (2011.) Justifying Intellectual Property. Oxford University Press.
- Zemer, L. (2007.) The Idea of Authorship in Copyright. Ashgate Publishing.
Interestingly, there isn’t meaningfully any kind of sociology of intellectual property that I am aware of (feel free to correct me). Several of the above scholars do call for increased dialogue between law and the social sciences (e.g., Julie Cohen), though this seems to be a relatively nascent area.
There is of course the “social construction” argument (e.g., that authorship or ownership is a myth)—a favorite argument of mine (e.g., see Beethoven and the Construction of Genius)—or the ubiquitous and tired references to “networks” (help!), but it seems that there is much opportunity in this space.
I’m sort of intrigued by the various innovations emerging from the Occupy Wall Street movement (I posted at strategyprofs about some of the tech ones, specifically apps).
One of the cooler, more low-tech innovations (ok, ok, these have been around for a long time – but still) is the use of the “human microphone” – note that the wiki entry was initiated just two weeks ago. Occupy also has its own hand signals (and, check out the hand signals for consensus decision-making). Cool. Twinkles.
Here’s a hand signal tutorial:
[link via David Lazer]
Twitter is getting lots of interest from social scientists. Here’s a piece from the current issue of Science about how “social scientists wade into the tweet stream” (the figure below is from this article). And, an NPR piece on a forthcoming Science article by Macy and Golder on affect and mood and twitter.
There’s lots that is nutty about the Quantified Self movement. But I love it nonetheless. Here’s the blog, Quantified Self.
And, here’s an example of someone who carefully tracked social interactions, for years.
Other than financial measures (like ROA) I can’t think of another firm-level variable that is more commonly used in organizational studies than patent activity. Patents are used to track everything from innovation to technological niches to social networks among scientists. Patents are an all-purpose measure because we think they are tightly linked to creativity and knowledge production, the engine that drives both science and capitalist enterprise. But what if this is increasingly not true? What if patent use is becoming decoupled from creativity?
This is one of the questions posed by last week’s This American Life, my favorite NPR show and one of the most consistently interesting programs of journalism out there. The show talked about patent trolls – companies or individuals who acquire patents for the primary purpose of suing other actors who might use technology that potentially infringes on that patent. The show focused on the firm Intellectual Ventures and its founder Nathan Myhrvold. Through a couple of interesting vignettes and sly investigations, they showed how the company uses lawsuits, brought by a number of shell companies, to get large settlements out of technology companies, some of which are struggling entrepreneurial groups. The show demonstrates how, rather than protect and promote innovation, increasingly patents are being used to stifle innovation by wiping out or financially weakening companies that are actually trying to bring innovation to the marketplace. Meanwhile, patent trolls sit on those patents and do nothing to advance the innovations.
This must have some implications for our current understanding of patents as indicators of creativity and innovation. One of the startling revelations in the program was just how much redundancy there is in the patent system. The number of patents issued that cover the same basic function is often in the thousands, especially in the software industry. Patents may be more indicative of turf wars than they are of real innovation.
Even if you’re not a technology scholar, I highly recommend that you listen to the podcast of the show.
I really like what companies like kickstarter are doing — they provide a “crowdfunding”-type platform for artists. Artists and budding entrepreneurs can post project ideas and needs onto the web site and readers can pledge funds to help realize these projects (based on a threshold funding system). The projects range from hundreds of thousands of dollars to much smaller ones. (Warning: thumbing through the various projects is pretty addicting.) The wikipedia site for “crowdfunding” lists other such companies (e.g., kiva.org, sponsume, pledgemusic).
As NSF funding for the social sciences appears to be under threat, it would be great to see a crowdfunding model for academic research as well. There seem to be lots of potential benefits: a new source of funds could be tapped, researchers wouldn’t have to chase funds as funders might find them instead, new populations would be introduced to research, etc, etc. Lots of benefits, downsides of course too.
Pressure seems to be mounting as other disciplines are setting up online “TV stations.” Philosophy TV (philostv.org) features very engaging discussions between philosophers, similar in format to bloggingheads.tv (also a favorite). econstoriestv is a Russ Roberts venture — the site seems to largely be dedicated to the Keynes-Hayek rap videos (perhaps there is a part III to come). I really like the fact that academic content/discussion is now available in this type of format. What’s next? orgtheorytv?
- NPR story on removing traffic signs in Germany.
- Wired story on ‘Roads Gone Wild.’
- Tom Vanderbilt’s book Traffic: Why We Drive the Way We Do (and What It Says About Us)
- And, let’s throw this in too: audio of Friedrich Hayek speaking in 1983 on ‘evolution and spontaneous order’