orgtheory.net

Archive for the ‘mere empirics’ Category

minor puzzle about academic hiring

A small puzzle about academic jobs: if getting “the best” is the true purpose of a job search, why do academic programs stop interviewing after the third candidate? Why it’s a puzzle: there seems to be an over-supply of PhDs with good to excellent qualifications, and many never get called for interviews.

Example: Let’s say you are a top 10 program about to hire an assistant professor. What do you look for? You want a graduate of a top 5 (or maybe top 10) program with one or more hits in AJS/ASR/SF. Perhaps you want someone with a book contract at a fancy press.

You fly out three people. They all turn you down or they suck. The search stops – but this is odd!! These top 5 programs usually produce more than 3 people with these qualifications. Add in the fact that every year the market overlooks some really solid people from previous years. My point is simple – departments fly out 2 or 3 people per year, but there are usually more than 2 or 3 qualified people!

The puzzle is even more pronounced for low status programs. Why do they stop at 3 candidates when there might be dozens of people with decent publication records who are unclaimed on the market or seriously under-placed? While a top program can wait for the next batch of job market stars, low status programs routinely pass up good people every year.

I have a few explanations, none of which are great. The first is cost – maybe deans and chairs don’t want to pay out more money per year. This makes no sense for top programs, which can easily find an extra $1k or $2k for interview costs. For low budget programs, it’s a risk worth taking – that overlooked person could bring in big grant money later. Another explanation is laziness. Good hiring is a classic free rider problem. Finding and screening good people is a cost paid by a few, but the benefits are widespread. So people do the minimum – fly a few out and move on. Tenure may also contribute to the problem – if you might hire someone for life, you become hyper-selective and focus only on the one or two people who survived an intense screening process.

Finally, there may be academic caste. Top programs want an ASR on the CV… but only from people from the “right” schools. This explanation makes sense for top schools, but not for other schools. Why? There are usually quite a few people from good but not elite schools who look great on paper, yet they don’t get called even though they’d pull up the dept. average.

Am I missing the point? Tell me in the comments! Why is academic hiring so odd?

50+ chapters of grad skool advice goodness: Grad Skool Rulz ($4.44 – cheap!!!!)/Theory for the Working Sociologist (discount code: ROJAS – 30% off!!)/From Black Power/Party in the Street / Read Contexts Magazine– It’s Awesome! 


Written by fabiorojas

March 29, 2018 at 4:11 am

i declare complete victory in the more tweets, more votes debate

In 2013, my collaborators and I published a paper claiming that there is an empirical correlation between relative social media activity and relative vote counts in Congressional races. In other words, if people are talking about the Democrat more than the Republican on Twitter, then the Democrat tends to get more votes. Here’s the regression line from the original “More Tweets, More Votes” paper:

[Figure: regression line from the original “More Tweets, More Votes” paper]
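The paper’s headline relationship is a bivariate regression of vote share on tweet share. A minimal sketch with invented numbers (not the paper’s data) shows the shape of it:

```python
# Sketch of the bivariate "More Tweets, More Votes" relationship: regress a
# candidate's share of the two-party vote on their share of Twitter mentions.
# All numbers are invented for illustration, not the paper's data.

def ols(x, y):
    """Ordinary least squares with one predictor: returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope

tweet_share = [0.30, 0.45, 0.50, 0.55, 0.70]  # share of mentions per race
vote_share  = [0.40, 0.47, 0.49, 0.53, 0.62]  # share of two-party vote

a, b = ols(tweet_share, vote_share)
print(f"intercept = {a:.3f}, slope = {b:.3f}")  # positive slope: more tweets, more votes
```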

People grumbled and complained. But little by little, evidence came out showing that the More Tweets/More Votes model was correct. For example, an article in Social Science Quarterly showed the same results for relative Google searches and Senate races:

[Figure: relative Google searches and Senate race outcomes, from the Social Science Quarterly article]

Latest evidence? It works for Wikipedia as well. Public Opinion Quarterly published a piece called “Using Wikipedia to Predict Election Outcomes: Online Behavior as a Predictor of Voting” by Benjamin Smith and Abel Gustafson. From the abstract:

We advance the literature by using data from Wikipedia pageviews along with polling data in a synthesized model based on the results of the 2008, 2010, and 2012 US Senate general elections. Results show that Wikipedia pageviews data significantly add to the ability of poll- and fundamentals-based projections to predict election results up to 28 weeks prior to Election Day, and benefit predictions most at those early points, when poll-based predictions are weakest.

Social media DOES signal American election outcomes! I spike the football. I won. Period.

It’s pretty rare that you propose a hypothesis, you prove it’s right, and then it’s proved right a bunch of times by later research.

#winning


Written by fabiorojas

September 19, 2017 at 4:01 am

unresolved controversy bleg

Installing Order, the sociology of science and technology blog, has a request – can you identify scholarly work about unresolved scientific controversies? 

I need your help: anybody know a few research papers or a book specifically about unresolved controversies? It would be terrific if there was some conceptualization, or even a functional analysis of the manifest and latent consequences of unresolved controversies. In fact, it would be amazing to see research on “intentionally unresolved controversies.”

My hunch is that they should be rare because writers probably want to focus on narrative with clear stories. Anthropology is full of unresolved controversies, so maybe focusing on the writing surrounding Napoleon Chagnon might be helpful.

What would you suggest?


Written by fabiorojas

November 28, 2016 at 12:33 am

critique of a recent ajs genetics paper: levi-martin v. guo, li, wang, cai and duncan

John Levi-Martin has written a comment on a recent paper by Guo, Li, Wang, Cai, and Duncan claiming that the social contagion of binge drinking is associated with a medium genetic propensity. Levi-Martin claims that GLWCD have simply misread their data:

Guo, Li, Wang, Cai and Duncan (2015) recently claimed to have provided evidence for a general theory of gene-environment interaction. The theory holds that those who are labelled as having high or low genetic propensity to alcohol use will be unresponsive to environmental factors that predict binge-drinking among those of moderate propensity. They actually demonstrate evidence against their theory, but do not seem to have understood this.

The main claim is that GLWCD are testing against nulls rather than properly estimating a U-shaped effect:

This is consequential because of the way they choose to examine their data. Although the verbal description of the swing theory here refers to the comparison of magnitudes (“more likely”), the methods used by GLWCD involve successive tests of the null hypothesis across three subsets formed by partitioning the sample by level of what is termed genetic propensity. If we denote these three subsets L, M and H, standing for low, medium and high propensity, then, for the kth predictor, they estimate three slopes, bLk, bMk, and bHk. Because the swing theory does not require that any particular predictor have an effect, but only that if it does, it does not in the extreme propensity tiers, this theory holds that for any k, bLk ≈ bHk ≈ 0.
Publishing note: The comment is on SocArXiv for all to read. If the criticism holds water, it’s a shame that it is not in a journal, preferably the AJS. If journals aren’t interested in error correction, then they simply aren’t into science.

Written by fabiorojas

November 21, 2016 at 3:29 am

social science did ok with the 2016 election but not great

[Figure: fundamentals model prediction versus actual two-party vote share, from Seth Masket at Pacific Standard]

From Seth Masket at Pacific Standard.

People have been having meltdowns over polls, but I’m a bit more optimistic. When you look at what social science has to say about elections, it did ok last week. I am going to avoid poll aggregators like Nate Silver because they don’t fully disclose what they do and they appear to insert ad hoc adjustments. Not horrible, but I’ll focus on what I can see:

  1. Nominations: The Party Decides model is the standard. Basically, the idea is that party elites choose the nominee, who is then confirmed by the voters. It got the Democratic nomination right but completely flubbed the GOP nomination. Grade: C+.
  2. The “fundamentals” of the two party vote: This old and trusty model is a regression of two party vote share on recent economic conditions. Most versions of this model predicted a slim victory for the incumbent party. The figure above is from Seth Masket, who showed that Clinton 2 got almost exactly what the model predicted. Grade: A
  3. Polling: Averaged out, the poll averages before the election showed Clinton 2 getting about 3.3 more points than Trump. She is probably getting about 0.6% more than Trump. So the polls were off by about 2.7%. That’s within the margin of error for most polls. I’d say that’s a win. The polls, though, inflated the Johnson vote. Grade: B+.
  4. Campaigns don’t matter theory: Clinton 2 outspent, out organized, and out advertised Trump (except in the upper midwest) and got the same result as a “fundamentals” model would predict. This supports the view that campaigning has a marginal effect in high information races. Grade: A.
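The arithmetic behind the polling item is worth making explicit. A quick check of the polling error, plus the textbook margin-of-error formula (the sample size of 1,000 is my assumption of a typical national poll, not a figure from the post):

```python
import math

# Point 3's arithmetic: final poll average versus the actual popular-vote margin.
poll_margin = 3.3     # Clinton's lead in the pre-election poll average, in points
actual_margin = 0.6   # her approximate popular-vote lead at the time of writing
polling_error = poll_margin - actual_margin
print(f"polling error: {polling_error:.1f} points")

# 95% margin of error for a single proportion in a simple random sample of
# n respondents (n = 1000 is an illustrative, typical national poll size).
n, p = 1000, 0.5
moe = 1.96 * math.sqrt(p * (1 - p) / n) * 100   # in percentage points
print(f"margin of error: +/- {moe:.1f} points")
# Note: the error on a candidate's *lead* is roughly twice the single-share
# margin, so a 2.7-point miss is comfortably inside it.
```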

But what about the Electoral College? Contrary to what some folks may think, this is a lot harder to predict because state level polls produce worse results in general. This is why poll aggregators have to tweak the models a lot to get Electoral College forecasts and why they are often off. Also, the Electoral College is designed to magnify small shifts in opinion. A tiny shift in, say, Florida could move your Electoral College total by about 5%. Very unstable. That’s why a lot of academics steer clear of predicting state level results. All I’ll say is that you should take these forecasts with a grain of salt.


Written by fabiorojas

November 15, 2016 at 12:01 am

driverless cars vs. police departments

In my view, driverless cars are revolutionary. At the very least, they will eliminate a major health problem – auto injuries and fatalities. No system will be accident free, but driverless cars will be better at driving than most humans: they don’t get drunk, and they won’t drive recklessly.

There is another social consequence of driverless cars that needs discussion. Driverless cars will seriously disrupt police departments. Why? A lot of police department revenue comes from moving vehicle violations and parking tickets. In a recent news item, one judge admitted that many small towns fund their police departments entirely through speeding tickets. Even a big city police department enjoys the income from tickets. New York City receives tens of millions in moving violation fines. This income stream will evaporate.

Another way that driverless cars will disrupt police departments is that they will massively reduce police stops. If a driverless car has insurance and registration (which can be transmitted electronically) and drives according to the rules of the road, then police, literally, have no warrant to pull over a car that has not been previously identified as related to a specific crime. Hopefully, this means that police will no longer use moving violations as an excuse to pull over racial minorities.

Even if a fraction of the hype about driverless cars turns out to be true, it would be a massive improvement for humanity. Three cheers for technology.


Written by fabiorojas

November 3, 2016 at 12:15 am

the parking lot theory of third parties

Right now, we aren’t seeing a collapse of Donald Trump. Instead, we’re seeing (a) Clinton 2 steady at about 45% in the four way race and (b) Trump moving from about 40% to 43%. That means that the third party vote is collapsing. Johnson is dropping from a summer high of 10% to 4%. How can that be?

My current favorite explanation is the “parking lot” theory of American third parties. Most people today are highly polarized, which means they strongly sort themselves into parties and stick with it. A number of people, including myself, have argued that parties are a sort of social identity. Perhaps not as fundamental as gender or racial identity, but important nonetheless. The consequence of party-identity theory is that people usually become defensive about their identity and are loath to leave it.

“Parking lot” theory is a corollary of party-identity theory. When people are faced with a horrible candidate from their party, they become defensive and don’t want to give it up. They refuse to consider third parties. At best, third parties become “parking lots” for voters who are indecisive or embarrassed until they finally pull the lever for mainstream parties. I suspect that is what resulted in those 10% polls for Johnson. The Libertarian Party was simply the “parking lot” for 5% of American voters who fully intend to vote GOP but are too embarrassed by Trump. There is a real libertarian vote out there, but it is in the low single digits. Definitely not 10%.

I’m not the first to make this argument. In fact, one of my BGS* pointed out that this argument appears in Shafer and Spady’s recent book The American Political Landscape. They don’t develop it fully but the historical data is there. In 2016, we see a spike of 10% for the Libertarians but they’ll be lucky to get 5% on polling day. In 1992, Perot peaked in the 30% range but ended up with 19% (still impressive) and then 10% in 1996. In 1980, John Anderson peaked at 20% but ended up with a paltry 7%. Nader 2000 is probably the only modern third party candidate who wasn’t a voter parking lot. He polled consistently in the 2%-4% range and got 3%.

Bottom line: The Libertarians may spoil a state or two this round, but they are doomed to be a voter parking lot.


* Brilliant Grad Students

Written by fabiorojas

November 2, 2016 at 12:01 am

new ways to measure movements via hyper network sampling

Rory, from the home office in South Bend, sends me links to new social movement research. A major question in social movement research is how to measure contentious politics. A lot of our sources, such as media accounts, are notoriously incomplete. Kraig Beyerlein, Peter Barwis, Bryant Crubaugh, and Cole Carnesecca use hypernetwork sampling (asking a random sample of people to list their social ties) to attack this issue. From Sociological Methods and Research:

The National Study of Protest Events (NSPE) employed hypernetwork sampling to generate the first-ever nationally representative sample of protest events. Nearly complete information about various event characteristics was collected from participants in 1,037 unique protests across the United States in 2010 to 2011. The first part of this article reviews extant methodologies in protest-event research and discusses how the NSPE overcomes their recognized limitations. Next, we detail how the NSPE was conducted and present descriptive statistics for a number of important event characteristics. The hypernetwork sample is then compared to newspaper reports of protests. As expected, we find many differences in the types of events these sources capture. At the same time, the overall number and magnitude of the differences are likely to be surprising. By contrast, little variation is observed in how protesters and journalists described features of the same events. NSPE data have many potential applications in the field of contentious politics and social movements, and several possibilities for future research are outlined.

Readers in social network analysis and organization studies will recognize the importance of this technique. As long as you can sample people, you can sample social ties and then adjust the sample for duplication. Peter Marsden used this technique to sample organizational networks. In movements research, my hybrids paper used the technique to sample orgs that were involved in movement mobilization. It’s great to see this technique expand to large samples of events. See Beyerlein’s research project website for more. Recommended.
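The core adjustment is easy to sketch: an event is reached with probability proportional to how many sampled people can name it, so each mention is down-weighted by event size. The toy version below is a simplified Horvitz-Thompson-style estimator of my own construction, not the NSPE’s actual weighting scheme:

```python
import random

random.seed(42)

# Toy hypernetwork sample: reach protest events through a random sample of
# people. Big events are oversampled (more attendees through whom to find
# them), so each mention is weighted by 1/attendance to undo the size bias.
N_PEOPLE = 5000
events = {e: random.randint(5, 100) for e in range(20)}   # event -> attendance

population = []                       # person -> the one event they attended
for e, size in events.items():
    population += [e] * size
population += [None] * (N_PEOPLE - len(population))       # non-attenders

sample = random.sample(population, 1000)
weights = [1 / events[e] for e in sample if e is not None]

# Horvitz-Thompson-style estimate of the number of distinct events
est_events = (N_PEOPLE / len(sample)) * sum(weights)
print(f"true events: {len(events)}, estimated: {est_events:.1f}")
```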


Written by fabiorojas

October 31, 2016 at 1:34 am

the mind blowing achievement of john hattie – or, we know how to run schools

We often act as if running a school is a mysterious thing. It’s not. There have been thousands of studies looking at every sort of education policy. John Hattie is an educational researcher in Australia who took the time to collect data from thousands of studies and do a meta-meta analysis to figure out what works.

He has a number of books and articles that summarize his findings. Below, I have included a diagram where he standardizes the effects of 195 factors that might affect achievement and ranks them. The major take-home points: here is what predicts achievement in a big way:

  • Prior performance – by far, the biggest predictor of future achievement is estimates of past work (#1 teacher assessment, #3 self reported grades).
  • Process oriented learning (“Piagetian programs” – don’t focus on outcomes, but on how you get the outcome – #4)
  • Teacher practices aimed at individual students – such as intervening directly with disabled pupils (#10, #11), micro-teaching (e.g., one on one interaction with students – #9), and integrating classroom discussion (#10).

What clearly has a negative effect?

  • Home corporal punishment (#193)
  • Television watching (#192)
  • Summer vacation (#190)
  • Student depression (#195)

What has surprisingly small effects (defined as about .1 or less)?

  • School type – being in a charter school, a single sex school, or learning at a distance (all have nearly zero effects)
  • Mentoring
  • Student diversity
  • Teacher credentials

In other words, the baseline is student ability, which determines how well they do. But you can also get big effects through hands on, process based, and interactive learning. You should avoid disruptive things, like vacations or television, and school type and teacher credentials don’t get you much.
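For readers unfamiliar with Hattie’s scale: the rankings rest on standardized effect sizes (Cohen’s d), so the 0.1 threshold above means the groups barely differ. A minimal sketch with invented numbers:

```python
import math

# Cohen's d, the standardized mean difference behind effect-size rankings.
# Numbers below are invented for illustration, not from Hattie's data.

def cohens_d(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Standardized mean difference using a pooled standard deviation."""
    pooled_var = ((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2)
    return (mean_t - mean_c) / math.sqrt(pooled_var)

# A 2-point gain on a test with a 10-point standard deviation:
d = cohens_d(mean_t=52.0, mean_c=50.0, sd_t=10.0, sd_c=10.0, n_t=100, n_c=100)
print(f"d = {d:.2f}")  # 0.20 -- twice Hattie's "surprisingly small" cutoff
```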

Thank you, John Hattie.



Written by fabiorojas

September 12, 2016 at 12:01 am

did bill clinton accelerate black mass incarceration? yes, but he did put a bunch of white people in prison to even it out

[Figure: national imprisonment rates by race, from Pam Oliver]

Pam Oliver has a fascinating post that empirically investigates incarceration trends during the Clinton 1 era (1993-2001). It’s an impressive post. Professor Oliver pulls up a lot of data on overall incarceration rates and breaks it down by race. You should read it yourself, but here is my summary; diagrams are from her article:

  • Imprisonment rates, overall, kept on increasing during the entire Clinton 1 presidency.
  • By race, Black imprisonment rates increased till about 2000 and then plateaued. They started at 75 per 100,000, peaked around 200 per 100,000, and then stabilized. There were huge increases in rates for Whites, Hispanics, and Native Americans. Asian rates seem to be stable.
  • The story of racial disparity is a bit more complex. Roughly speaking, the Blackness of the prison population peaked around 1995 (see below). Then the Black/White ratio in prisons began to decline.

[Figure: the Black share of the prison population over time, from Pam Oliver]

My interpretation. First, you have to distinguish between absolute and relative effects. To be blunt, Black mass incarceration in absolute terms unequivocally increased during the Clinton 1 years. Period. Perhaps the only qualifier is that it eventually stabilized, but the Black imprisonment rate never declined or even remotely went back to the levels of the 1980s or early 1990s. Mass incarceration was built in the 1980s and 1990s and it was here to stay.

The real question is why it stabilized. One hypothesis is that it was a policy effect. Perhaps in the late-1990s, there were policy changes that took effect circa 2000. A second hypothesis is that the prison system became saturated and there weren’t any more people to imprison from that population. Professor Oliver’s data are not enough to settle the question.

Second, the real story is in relative rates. Imprisonment became a much more equal system in the 1990s. In other words, prison shifted from being a Black institution to more of an all American institution. My hypothesis is that the drug war machine simply reached its limit in imprisoning Blacks and expanded by targeting low income Whites.

In this data, the American prison system appears as a hungry beast, ruthlessly scooping up low SES populations one at a time. After being built in the 1950s and 1960s by liberal reformers, the American justice system had the power to swiftly punish people. In the 1970s and 1980s, Republican and Democratic administrations turned this machine on urban Blacks, and it ran unchecked until the early 2000s. The machine then turned to poor Whites in the 1990s, and a similar machine was built to imprison and deport Mexican and Central American migrants.

Francis Fukuyama wrote that we reached the end of history because liberal capitalism won over its socialist and fascist competitors. The sad truth is that history must continue, and the next chapter will be the struggle to liberate the world’s people from predatory prison states.


Written by fabiorojas

September 7, 2016 at 12:18 am

trump symposium iii: social science and trump

In this third installment of the Trump symposium, I want to talk about how social scientists should think about Trump. Let’s start with prediction – who foresaw Trump? We need to make a distinction.

So what should a social scientist do?

  • Start with the following mantra: Social science is about trends and averages, rarely about specific cases. If an outsider becomes a major party nominee once, then you can cling to the old theory. If you get three Trumps in a row, then it’s time to dump the Party Decides model, unless, of course, you see the party openly embrace Trump and he becomes the new establishment.
  • Feel confident: One crazy case doesn’t mean that you dump all results. For example, polling still worked pretty well. Polls showed a Trump rise and, lo and behold, Trump won the nomination. Also, polls are in line with basic models of presidential elections where economic trends set the baseline. The economy is ok, which means the incumbent party has an advantage. Not surprisingly, polls show the Democratic nominee doing well.
  • Special cases: Given that most things in this election are “normal,” it’s ok to make a special argument about an unusual event. Here, I’d say that Trump broke the “Party Decides” model because he is an exceptionally rare candidate who doesn’t need parties. Normally, political parties wield power because politicians don’t have money or name recognition. In contrast, Trump has tons of money and a great media presence. He is a rare candidate who can just bypass the whole system, especially when other candidates are weak.

What does the future hold? Some folks have been raising the alarms about a possible Trump win. So far, there is little data to back it up. In the rolling Real Clear Politics average of polls, Trump is consistently behind. In state polls, he is behind. He has no money. He has not deployed a “ground game.” In fact, the RCP average has had Clinton 2 ahead of Trump every single day since October with the exception of the GOP convention and about two of the worst days of the Democratic campaign. Is it possible that Trump will be rescued by a sudden wave of support from White voters? Maybe, but we haven’t seen any movement in this direction. A betting person would bet against Trump.


Written by fabiorojas

September 1, 2016 at 12:44 am

slave names no longer forgotten

The Virginia Historical Society has a website that brings together many documents from the antebellum period of American history so that you can search for the names of African Americans who might otherwise be lost to history. From the website:

This database is the latest step by the Virginia Historical Society to increase access to its varied collections relating to Virginians of African descent. Since its founding in 1831, the VHS has collected unpublished manuscripts, a collection that now numbers more than 8 million processed items.

Within these documents are numerous accounts that collectively help tell the stories of African Americans who have lived in the state over the centuries. Our first effort to improve access to these stories came in 1995 with publication of our Guide to African American Manuscripts. A second edition appeared in 2002, and the online version is continually updated as new sources enter our catalog (http://www.vahistorical.org/aamcvhs/guide_intro.htm).

The next step we envisioned would be to create a database of the names of all the enslaved Virginians that appear in our unpublished documents. Thanks to a generous grant from Dominion Resources and the Dominion Foundation in January 2011, we launched the project that has resulted in this online resource. Named Unknown No Longer, the database seeks to lift from the obscurity of unpublished historical records as much biographical detail as remains of the enslaved Virginians named in those documents. In some cases there may only be a name on a list; in others more details survive, including family relationships, occupations, and life dates.

Check it out.


Written by fabiorojas

June 24, 2016 at 12:07 am

agent based models in sociology, circa 2016

A few days ago, we had a discussion about the different meanings of the word “computational sociology.” A commenter wrote the following:

Are agent based models/simulations a dead end? Are smart people still using that technique? Have there been any important results? I didn’t realize it peaked in the 1980s.

I’m a current doctoral student considering pursuing ABM, but if it’s a dead end then maybe not.

I think that olderwoman’s response is on target. There is nothing out of style about ABMs, but sociology is mainly a discipline of empiricists. You will find scholars who occasionally do ABMs, but people who ONLY do ABMs are very, very rare. Examples of people who have done simulations: Damon Centola, Kathleen Carley, Carter Butts. In my department, I can think of a few people who have published simulations (Clem Brooks, Steve Benard, and myself), and those who do methods research often employ simulations. Olderwoman is also correct in that writing simulations helps you develop programming skills that are now required for “big data” work and for industry.

So don’t write an all simulation dissertation, but by all means, if you have good ideas, simulate them!
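For the doctoral student asking: a minimal ABM fits in a page of plain Python. This toy threshold-contagion model on a random network is illustrative only, not any published model:

```python
import random

random.seed(1)

# Toy agent-based model: threshold contagion on a random directed network.
# Each agent adopts a behavior once the share of adopters among the agents
# it watches reaches its personal threshold.
N, K = 200, 6
watches = {i: random.sample([j for j in range(N) if j != i], K) for i in range(N)}
threshold = {i: random.random() for i in range(N)}
adopted = set(random.sample(range(N), 10))   # initial seed adopters

for step in range(100):                      # iterate until no one else adopts
    new = {i for i in range(N)
           if i not in adopted
           and sum(j in adopted for j in watches[i]) / K >= threshold[i]}
    if not new:
        break
    adopted |= new

print(f"final adopters: {len(adopted)} of {N}")
```

Changing the seed, the thresholds, or the network density shows how the same rules can produce either a fizzle or a cascade, which is the usual payoff of this kind of simulation.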


Written by fabiorojas

June 14, 2016 at 12:01 am

twitter’s glass ceiling for women

[Figure: incidence rate ratios of visibility for male- versus female-profile Twitter users, by follower quartile]

This diagram compares Twitter users with male and female profiles. Female users are the baseline. IRR means “incidence rate ratio.” The quartiles refer to quartiles of users – Q1 Twitter users have very few followers and Q4 users have many followers. See the paper for study details.

I have a new paper that will be presented at the 10th International Conference on Web and Social Media. This paper, written by Shirin Nilizadeh, Anne Groggel, Peter Lista, Srijita Das, YY Ahn, Apu Kapadia, and myself, documents an important phenomenon on social media: there seems to be a “glass ceiling” that penalizes women who strive to be visible on social media.

We took a random sample of 100,000 Twitter users and asked: what is the difference in visibility between those who appear to be male and those who appear to be female in their profiles?* The answer: not much – except among those users who have a lot of followers. The nearly identical level of visibility suddenly shifts, and those with male profiles have significantly more followers. Similar results are found using other measures of visibility, like retweets, and the results hold after accounting for user behavior, offline visibility, and other factors.

* There is the subtle point of users who do not present a gender. The paper deals with that. Read it for details.
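An incidence rate ratio of the kind shown in the diagram is just a ratio of rates, male over the female baseline. A toy computation with invented counts:

```python
# Incidence rate ratio (IRR): the male-profile rate divided by the
# female-profile (baseline) rate. Values above 1 mean male-profile users
# accrue followers faster. All counts below are invented for illustration.

male_followers, male_users = 120_000, 1_000
female_followers, female_users = 100_000, 1_000

male_rate = male_followers / male_users        # followers per male user
female_rate = female_followers / female_users  # followers per female user

irr = male_rate / female_rate
print(f"IRR = {irr:.2f}")  # 1.20: male profiles average 20% more followers
```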


Written by fabiorojas

March 30, 2016 at 12:01 am

the black democratic primary vote is completely predictable

There is a lot of punditry about Bernie Sanders’s inability to make a dent in the Black vote. This is crucial because a lot of Hillary Clinton’s delegate lead comes from massive blowouts in the Deep South. Even a small movement in the Black vote would have turned Sanders’s near losses in Massachusetts, Illinois, and Missouri into narrow wins.

My approach to this issue – Sanders’ poor performance among Blacks is entirely predictable. Post-Civil Rights, the urban black population became heavily integrated into the mainstream of the Democratic party. The connection is so tight that some political scientists have used the African-American vote as a classic example of “voter capture” – a constituency so tightly linked to a party that there is no longer any credible threat of moving to another party and the party takes them for granted.

If you believe that, then you get a straightforward prediction – Black voters will overwhelmingly support the establishment candidate. Why? Black voters are the establishment in the Democratic party. As a major constituency, they are unlikely to vote against someone who already reflects their preferences. Here’s some evidence:

The exceptions:

The pattern is exceptionally clear. Black voters overwhelmingly support establishment candidates. The only exceptions are when you have an African American candidate of extreme prominence, like Obama the wunderkind or Jesse Jackson the civil rights leader. And then there’s a tipping point where almost the entire voting bloc switches to a new candidate. So Bernie is actually hitting what a normal challenger hits in a Democratic primary, but that simply isn’t enough to win.


Written by fabiorojas

March 25, 2016 at 12:01 am

statistics vs. econometrics – heckman’s approach

Over at Econ Talk, Russ Roberts interviews James Heckman about censored data and other statistical issues. At one point, Roberts asks Heckman what he thinks of the current identification fad in economics (my phrasing). Heckman has a few insightful responses. One is that a lot of the “new methods” – experiments, instrumental variables, etc. – are not new at all. Also, experiments need to be done with care and the results need to be properly contextualized. A lot of economists and identification obsessed folks think that “the facts speak for themselves.” Not true. Supposedly clean experiments can be understood in the wrong way.

For me, the most interesting section of the interview is when Heckman makes a distinction between statistics and econometrics. Here’s his example:

  • Identification – statistics, not economics. The point of identification is to ensure that your correlation is not attributable to an unobserved variable. This is either a mathematical point (IV) or a feature of research design (RCT). There is nothing inherently economic about identification: you do not need to understand human decision making in order to carry it out.

In contrast, he thought that "real" econometrics was about using economics to guide statistical modelling, or using statistical modelling to plausibly tell us how economic principles play out in real-world situations. This, I think, is the spirit of structural econometrics, which demands that the researcher define the economic relation between variables and use that as a constraint in statistical estimation. Heckman and Roberts discuss minimum wage studies, where the statistical point is clear (raising the minimum wage does not always decrease employment) but the economic point still needs to be teased out (moderate wage increases can be offset by firms in other ways) using theory and knowledge of labor markets.
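
To make the identification point concrete, here is a minimal sketch (pure Python, made-up numbers) of the simplest instrumental-variables estimator: with a single instrument z, the IV slope is cov(z, y)/cov(z, x), which recovers the causal effect even when OLS is biased by an unobserved confounder.

```python
# Minimal IV sketch with hypothetical data: u is an unobserved
# confounder, z an instrument (correlated with x, not with u).
def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)

z = [0, 1, 0, 1]                                # instrument
u = [0, 0, 1, 1]                                # confounder; cov(z, u) = 0 here
x = [zi + ui for zi, ui in zip(z, u)]           # treatment, driven by z and u
y = [2 * xi + 3 * ui for xi, ui in zip(x, u)]   # true causal effect of x is 2

beta_ols = cov(x, y) / cov(x, x)   # biased upward by the confounder
beta_iv = cov(z, y) / cov(z, x)    # identification via the instrument
```

Here OLS gives 3.5 while IV recovers the true effect of 2. Note, per Heckman's point, that nothing in this calculation is economics: the economics is in arguing that z is a valid instrument in the first place.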

The deeper point I took away from the exchange is that long-term progress in knowledge is not generated by a single method, but rather through careful data collection and knowledge of social context. The academic profession may reward clever identification strategies, and they are useful, but that can lead to bizarre papers when authors shift from economic thinking to an obsession with unobserved variables.


Written by fabiorojas

March 24, 2016 at 12:01 am

algorithms in society – comments on a talk by jure leskovec


Jure Leskovec describes his bail data… in technicolor.

Jure Leskovec is one of the best computer scientists currently working on big data and social behavior. We were lucky to have him speak at the New Computational Sociology conference so I was thrilled to see he was visiting IU’s School of Informatics. His talk is available here.

Overall, Jure explained how data science can improve human decision making and illustrated his ideas with analysis of bail decision data (e.g., given data X, Y, and Z on a defendant, when do judges release someone on bail?). It was a fun and important talk. Here, I’ll offer one positive comment and one negative comment.

Positive comment – how decision theory can assist data analysis and institutional design: A problem with traditional social science data analysis is that we have a bunch of variables that work in non-linear ways. E.g., there isn’t a linear path from the outcome determined by X=1 and Y=1 to the outcome determined by X=1 and Y=0. In statistics and decision-theoretic computer science, the solution is to work with regression trees. If X=1 then use Model 1, if X=2 then use Model 2, and so forth.

Traditionally, the problem is that social scientists don’t have enough data to make these models work. If you have, say, 1000 cases from a survey, chopping up the sample into 12 models will wipe out all statistical power. This is why you almost never see regression trees in social science journals.

One of the advantages of big data is simply that you now have so much data that you can chop it up and be fine. Jure chopped up data from a million bail decisions in a Federal government database. With so much power, you can actually estimate the trade-off curves and detect biases in judicial decision making. This is a great example of where decision theory, big data, and social science really come together.
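
The "if X=1 then use Model 1" logic is exactly what a regression tree automates. Here is a minimal sketch in pure Python of a depth-one tree (a "stump"): try each binary feature, pick the split that most reduces within-cell squared error, and fit a separate mean in each cell. The data and feature names are made up for illustration.

```python
# Depth-one regression tree ("stump"): find the binary split that
# minimizes the within-cell sum of squared errors.
def sse(ys):
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)

def best_split(data):
    # data: list of (features_dict, outcome); try each binary feature
    best = None
    for f in data[0][0]:
        left = [y for feats, y in data if feats[f] == 0]
        right = [y for feats, y in data if feats[f] == 1]
        if not left or not right:
            continue
        total = sse(left) + sse(right)
        if best is None or total < best[1]:
            best = (f, total, sum(left) / len(left), sum(right) / len(right))
    return best

# toy data: the outcome depends on x1, not on x2
data = [({"x1": 0, "x2": 0}, 1.0), ({"x1": 0, "x2": 1}, 1.2),
        ({"x1": 1, "x2": 0}, 3.0), ({"x1": 1, "x2": 1}, 3.2)]
feature, err, mean_left, mean_right = best_split(data)   # splits on x1
```

With four cases per toy cell this is pure illustration; the point in the post is that with a million bail decisions, each leaf of a much deeper tree still has enough cases for stable estimates.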

Criticism – "algorithms in society": There was a series of comments, from audience members and from me, about how this algorithm would actually "fit in" to society. For example, one audience member asked how people could "game the system." Jure responded that he was using only "hard" data that can't be gamed, like the charge, prior record, etc. That is not quite right. For example, what you are charged with is a decision made by prosecutors.

In the Q&A, I followed up and pointed out that race is highly correlated with charges. For example, from the Rehavi and Starr paper in the Yale Law Journal, we know that a lot of the racial difference in time spent in jail is attributable to racial differences in charges. Using Federal arrest data, they show that Blacks get charged with more serious crimes for the same actions. Statistically, this means that race is strongly correlated with the severity of the charge. In the Q&A, Jure said that adding race did not improve the model's accuracy. But why would it, if race and charges are already highly correlated? That comment misses the point.

These two comments can be summarized as "society changes algorithms and algorithms change society." Ideally, Jure (and I!) would love to see better decisions. We want algorithms to improve society. At the same time, we have to understand that (a) algorithms depend on society for data, so we have to understand how the data is created (i.e., charge and race are tied together), and (b) algorithms create incentives for people to find ways to influence the algorithm.

Good talk and I hope Jure inspired more people to think about these issues.


Written by fabiorojas

March 14, 2016 at 12:10 am

#bern out?

To defeat Hillary Clinton in a Democratic primary, a challenger has to do the following:

  • Push HRC’s solid support from about 45% of Democratic voters to about 40%.

Obama did that by swaying Black voters and some educated professionals. #Bern has not moved an inch on that 45%. Without cutting into HRC's base, it is relatively easy for HRC to acquire a few more undecideds and win. So far, the difference between Obama and Sanders is that "last yard."

Also, this time around, HRC is prepared. I still think she is a horrible campaigner: she is now losing the fundraising battle and has blown big leads. Amazing, since she's the person sitting vice presidents have made space for. Still, HRC now understands the importance of caucuses and has held her own in two caucus states. We are not seeing a repeat of 2008, when HRC bungled by leaving caucuses and later primary states uncontested.

So the writing is on the wall: #Bern needs to win caucuses AND needs to move a much larger portion of the Black vote. We haven't seen either yet.


Written by fabiorojas

February 22, 2016 at 12:04 am

#bern nevada

The basic truth of politics is that incumbents have huge advantages, and those favored by incumbents have an advantage too. Thus, the fundamentals favor a candidate like Hillary Clinton over Bernie Sanders. It's not an iron-clad law, but it is a big factor that shapes most political races.

In both parties, the presidential primary process runs about four months and is very complex. Historically, winners rarely knock out all opponents in the early states of Iowa and New Hampshire. Usually, the early states weed out weak candidates and leave only two or three serious competitors who "settle" the election on Super Tuesday. You have to go back decades, to the 1960 election, to find a race where candidates were still struggling into the summer. Since the birth of the modern primary system in the 1970s, whoever leads after Super Tuesday tends to win it all, since the delegate lead is hard to overcome at that point.

So that is where Nevada fits in. Previously unimportant, Nevada now matters. Sanders needs to keep a positive narrative going into Super Tuesday so that he can continue to fundraise and swamp the Clinton campaign in the media and on the ground in the Super Tuesday states. A tie or loss in Nevada would dampen things and make it harder to sway Black voters in South Carolina, who might only defect in sufficiently large numbers after a Sanders win. Also, a Nevada win could soften the blow of a close loss in South Carolina, since Sanders could claim that he's 2-2 against HRC. In other words, Sanders needs a chain of wins to overcome the advantages that Clinton has in name recognition and access to easy money. If Nevada #Berns this weekend, then I will see it as the first actively visible sign that the Democratic party is tipping away from the DLC/Clinton centrist faction of the 1990s. Until then, "advantage incumbent."


Written by fabiorojas

February 19, 2016 at 12:01 am

bmi – a conceptual mess, or, andrew perrin i take back what i said earlier

One of the basic measures of health is body mass index (BMI). It is meant to be a simple measure of a person’s obesity which is also correlated with morbidity. Recently, I spent some time researching the validity of BMI. Does it actually measure fatness? The answer is extremely confusing.

If you google "validity of BMI as a measurement of obesity," you get this article, which summarizes a few studies. The original definition of obesity is having at least 25% body fat for men and 35% for women. That is hard to measure without special equipment, so BMI is the default. Thus, to measure BMI's validity, you need a sample of people for whom you compute both BMI and body fat percentage (BF%). Examine.com reports on a number of studies that compared BF% and BMI. In some ways, BMI survives scrutiny: in non-obese people, as defined by BF%, BMI works well. But it under-reports obesity in others, and in a few odd cases, mainly athletes carrying a lot of muscle, it over-reports obesity. Thus, you get a lot of misclassification: "One meta-analysis on the subject suggests that BMI fails to classify half of persons with excess body fat, reporting them as normal or overweight despite having a body fat percentage classifying them as obese." Translation: we are much fatter than we appear to be.
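
For reference, BMI is just weight in kilograms divided by height in meters squared, with the standard WHO obesity cutoff at BMI ≥ 30. Here is a quick sketch of the misclassification problem, using the 25%/35% body-fat cutoffs above; the two example people are hypothetical.

```python
# BMI = kg / m^2; the WHO obesity cutoff is BMI >= 30.
def bmi(weight_kg, height_m):
    return weight_kg / height_m ** 2

def obese_by_bmi(b):
    return b >= 30

def obese_by_bf(bf_pct, sex):  # body-fat cutoffs cited in the post
    return bf_pct >= (25 if sex == "M" else 35)

# hypothetical muscular athlete: heavy but lean -> BMI over-reports
athlete_bmi = bmi(102, 1.83)                 # about 30.5
athlete_flagged = obese_by_bmi(athlete_bmi)  # True
athlete_actually = obese_by_bf(12, "M")      # False

# hypothetical "normal-weight obese" person -> BMI under-reports
nwo_bmi = bmi(70, 1.70)                      # about 24.2
nwo_flagged = obese_by_bmi(nwo_bmi)          # False
nwo_actually = obese_by_bf(38, "F")          # True
```

Both errors come from the same source: BMI sees total mass, not composition, so it cannot tell muscle from fat.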

Then, soon after I read some of these studies, the LA Times reported on a new study from UCLA that examines the BMI-morbidity correlation. In that study, researchers measured BMI and then collected data on biomarkers of health, using the NHANES data set. See the study here. Result? A lot of fat people are actually quite healthy, in the sense that BMI is not associated with cardio-pulmonary health (i.e., your heart stopping). This reminds me of an earlier discussion on this blog, where there were conflicting estimates of the obesity-mortality link and a meta-analysis kind of, sort of, shows an aggregate positive effect.

How do I approach the BMI issue as of today?

  • BMI is a rough measure of fatness (“adiposity”), but not precise enough for doctors to be making big judgments about patients on a single number/measurement.
  • BMI is not a terribly good predictor of mortality, even if there is a mild overall correlation that can be detected through meta-analysis.
  • BMI is probably not correlated with a lot of morbidity that we care about with some important exceptions like diabetes.

The lead author of the UCLA study said that this was the “last nail in the coffin” for BMI. She might be right.


Written by fabiorojas

February 18, 2016 at 12:01 am

will the texas primary stop the #bern?

The Obama strategy in 2008 had a plan A and a plan B. Plan A was to knock out Hillary with big victories in Iowa and New Hampshire. Didn’t work. Plan B was to pad the delegate lead by exploiting small state caucuses and minimizing the damage in Hillary friendly places like New York. That worked, especially since the Hillary campaign was simply incompetent.

Sanders has a similar plan. His Plan A, the early knockout, almost worked. I suspect that Bernie might even have won the popular vote in Iowa, given that the Iowa Democratic Party is refusing to release vote tallies as it did in previous years. So Bernie is on to Plan B. That means he has to accomplish two things:

  • Max out caucus states.
  • Minimize losses in large primary states.

This is the list of remaining states in February and Super Tuesday and delegate totals for Democrats according to US election central:

  • Alabama 60
  • American Samoa caucus 10
  • Arkansas 37
  • Colorado caucus 79
  • Georgia 116
  • Massachusetts 116
  • Minnesota caucus 93
  • Nevada 43
  • Oklahoma 42
  • South Carolina 59
  • Tennessee 76
  • Texas 252
  • Vermont 26
  • Virginia 110

You will notice that Bernie has at least three easy states: Vermont, Massachusetts, and probably Minnesota. Then it gets really hard, really fast. This is not because Hillary will magically become a great campaigner, but because the fundamentals favor her.

There are two reasons. First, you win Southern states in the Democratic primary by doing well among Black voters. South Carolina (Feb 27) will be the first test of how well Bernie can move these voters. If he comes up short in South Carolina, it's bad news, because more Southern states come up real fast, such as Alabama and Georgia on Super Tuesday and others soon after. Second, in March, you will see the types of big states that Hillary dominated in 2008 because of superior name recognition, such as Texas (51% for HRC in 2008), New York (57%), California (51%), Ohio (53%), and Pennsylvania (54%).

Is it impossible for Bernie to win the nomination? Of course not, but he needs to really dominate outside of the establishment-friendly mega-states like Ohio and California. That means an immediate and massive turnaround in the Black vote, a wipe-out in the caucus states, and some strategy for containing the losses in the big states, which challenged even Obama. That sounds really hard to me.


Written by fabiorojas

February 11, 2016 at 12:01 am

does piketty replicate?

Ever since the publication of Piketty’s Capital in the 21st Century, there’s been a lot of debate about the theory and empirical work. One strand of the discussion focuses on how Piketty handles the data. A number of critics have argued that the main results are sensitive to choices made in the data analysis (e.g., see this working paper). The trends in inequality reported by Piketty are amplified by how he handles the data.

Perhaps the strongest criticism in this vein is made by UC Riverside’s Richard Sutch, who has a working paper claiming that some of Piketty’s major empirical points are simply unreliable. The abstract:

Here I examine only Piketty’s U.S. data for the period 1810 to 2010 for the top ten percent and the top one percent of the wealth distribution. I conclude that Piketty’s data for the wealth share of the top ten percent for the period 1870-1970 are unreliable. The values he reported are manufactured from the observations for the top one percent inflated by a constant 36 percentage points. Piketty’s data for the top one percent of the distribution for the nineteenth century (1810-1910) are also unreliable. They are based on a single mid-century observation that provides no guidance about the antebellum trend and only very tenuous information about trends in inequality during the Gilded Age. The values Piketty reported for the twentieth-century (1910-2010) are based on more solid ground, but have the disadvantage of muting the marked rise of inequality during the Roaring Twenties and the decline associated with the Great Depression. The reversal of the decline in inequality during the 1960s and 1970s and subsequent sharp rise in the 1980s is hidden by a fifteen-year straight-line interpolation. This neglect of the shorter-run changes is unfortunate because it makes it difficult to discern the impact of policy changes (income and estate tax rates) and shifts in the structure and performance of the economy (depression, inflation, executive compensation) on changes in wealth inequality.

From inside the working paper, an attempt to replicate Piketty’s estimate of intergenerational wealth transfer among the wealthy:

The first available data point based on an SCF survey is for 1962. As reported by Wolff the top one percent of the wealth distribution held 33.4 percent of total wealth that year [Wolff 1994: Table 4, 153; and Wolff 2014: Table 2, 50]. Without explanation Piketty adjusted this downward to 31.4 by subtracting 2 percentage points. Piketty’s adjusted number is represented by the cross plotted for 1962 in Figure 1. Chris Giles, a reporter for the Financial Times, described this procedure as “seemingly arbitrary” [Giles 2014]. In a follow-up response to Giles, Piketty failed to explain this adjustment [Piketty 2014c “Addendum”].

There is a bit of a mystery as to where the 1.2 and 1.25 multipliers used to adjust the Kopczuk-Saez estimates upward came from. The spreadsheet that generated the data (TS10.1DetailsUS) suggests that Piketty was influenced in this choice by the inflation factor that would be required to bring the solid line up to reach his adjusted SCF estimate for 1962. Piketty did not explain why the adjustment multiplier jumps from 1.2 to 1.25 in 1930.

This comes up quite a bit, according to Sutch. There is reasonable data and then Piketty makes adjustments that are odd or simply unexplained. It is also important to note that Sutch is not trying to make inequality in the data go away. He notes that Piketty is likely under-reporting early 20th century inequality while over-reporting the more recent increase in inequality.
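
To see mechanically what Sutch is describing, here is a sketch, with hypothetical share values rather than Piketty's actual series, of the two adjustments: inflating top-1% observations by a constant 36 percentage points to manufacture a top-10% series, and a multiplier that jumps from 1.2 to 1.25 in 1930.

```python
# Hypothetical top-1% wealth shares (percent) -- illustrative only,
# NOT Piketty's actual numbers.
top1 = {1870: 27.0, 1910: 40.0, 1930: 36.0, 1950: 28.0}

# Adjustment 1: "manufacture" a top-10% series by adding a constant
# 36 percentage points to every top-1% observation.
top10 = {year: share + 36.0 for year, share in top1.items()}

# Adjustment 2: inflate a Kopczuk-Saez-style estimate by a multiplier
# that jumps from 1.2 to 1.25 starting in 1930.
def adjust(year, share):
    return share * (1.25 if year >= 1930 else 1.2)

adjusted = {year: adjust(year, share) for year, share in top1.items()}
```

The mechanical consequence of the first adjustment is that any trend in the manufactured top-10% series is just the top-1% trend shifted up, so the two series cannot tell independent stories; the second builds a small discontinuity into the series at 1930.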

A lot of Piketty’s argument comes from international comparisons and longitudinal studies with historical data. I have a lot of sympathy for Piketty. Data is imperfect, collected irregularly, and prone to error. So I am slow to  criticize. Still, given that Piketty’s theory is now one of the major contenders in the study of global inequality, we want the answer to be robust.


Written by fabiorojas

February 10, 2016 at 12:01 am

the cuban victory in the republican party

A number of writers noticed that we overlooked an important bit of news last week during the Iowa caucus – two Latinos and a Black man took 60% of the Iowa GOP caucus. At the very least, this is newsworthy and merits explanation.

Here’s how we should understand the rise of Rubio and Cruz. The basic elements of minority party politics are as follows:

  • African Americans started in the GOP but moved to the Democratic party.
  • Groups that were forcibly assimilated into the US tend to go Democrat – Native Americans, Mexicans, Filipinos.
  • Groups that benefited from Cold War politics tend to lean GOP more than others – Vietnamese, Cubans.
  • Other voluntary migrants vary but if they are harassed or repressed they lean Democrat.

Using these rules of thumb, it is easy to see how Cruz and Rubio made a path to the top of the GOP. They are Cuban Americans, who have influence in the GOP, especially in Florida. They are also from states with strong GOP parties – Florida and Texas. As many folks have noted, they downplay their ethnic background and kowtow to the anti-immigration crowd. Briefly, Rubio endorsed some sort of compromise on immigration but walked it back.

The rise of these two candidates does not represent a big swing of Latino voters to the GOP – that would only happen if large numbers of Mexican Americans defected to the GOP. It does, however, reflect an opening made possible by the complex history of US foreign relations. In the messy world of Cold War politics, the US chose to favor Cubans and, decades later, their children are steps away from the White House. And oddly, Castro might be alive to see it!


Written by fabiorojas

February 8, 2016 at 12:01 am

#bern toast

As of 10:45 pm, Hillary Clinton maintains a slim lead over Bernie Sanders in the 2016 Iowa Caucus. In terms of absolute performance, Sanders fans should be happy. When all votes are tallied, Sanders will either win the caucus or lose by a very slim margin. That means that Sanders will continue. He’ll win New Hampshire and make it to Super Tuesday and probably win a few more states.

However, in terms of winning the nomination, this is tough for Sanders. The reason is that Clinton is the party's candidate, and about 45% of voters in the Democratic party are extremely comfortable with her. They will only defect in sufficiently large numbers if they see that she is indeed crumbling, and they need an unambiguous signal. If 2008 is any guide, Hillary can reliably depend on 40% no matter what happens. Even after it was abundantly clear in 2008 that Clinton did not have a reasonable chance of catching Obama in the delegate count, she still kept winning big states like California, Pennsylvania, and Ohio – by large margins (but not enough to make up for earlier losses).

Adding to the problem for Sanders is that Obama’s strategy – maxing out caucus states – only works once. Clinton’s campaign simply wasn’t prepared for it and they weren’t prepared for a campaign that went beyond Super Tuesday. They are prepared this time, poorly perhaps, but prepared. The close race in Iowa shows it.

Here's the bottom line. When you fight the party's candidate, you need to seriously knock them down to break the view that they are invincible. Obama did that with a completely unexpected 8% victory. A near miss or narrow victory by Sanders does not do that, so it will be very, very hard to trigger the mass migration that needs to happen over the next month for a Sanders win.


Written by fabiorojas

February 2, 2016 at 4:01 am

racial biases for low and higher performing students

Former BGS Yasmiyn Irizarry’s research on how teachers treat students of different races is featured in Scientific American’s podcast. From the Social Science Research article:

Education scholars document notable racial differences in teachers’ perceptions of students’ academic skills. Using data from the Early Childhood Longitudinal Study-Kindergarten Cohort, this study advances research on teacher perceptions by investigating whether racial differences in teachers’ evaluations of first grade students’ overall literacy skills vary for high, average, and low performing students. Results highlight both the overall accuracy of teachers’ perceptions, and the extent and nature of possible inaccuracies, as demonstrated by remaining racial gaps net literacy test performance. Racial differences in teachers’ perceptions of Black, non-White Latino, and Asian students (compared to White students) exist net teacher and school characteristics and vary considerably across literacy skill levels. Skill specific literacy assessments appear to explain the remaining racial gap for Asian students, but not for Black and non-White Latino students. Implications of these findings for education scholarship, gifted education, and the achievement gap are discussed.

Check it out.


Written by fabiorojas

January 15, 2016 at 12:01 am

naming your ethnographic informants: a talk by colin jerolmack

It is rare that I sit through a talk and just agree with about 99% of it. That is what happened when Colin Jerolmack visited IU last week and gave a talk about naming people and locations in ethnography. The argument is simple: the standard practice of masking people and places should not be the default for ethnography. Instead, the presumption should be to name people. Fake names should be the exception, not the rule.

Colin’s paper, co-authored with Michigan’s Alex Murphy, makes the following points against masking:

  • Promising anonymity is not honest. A lot of ethnographies can be hacked pretty quickly.
  • Masking people deprives them of the benefit of having their names listed in print. In most cases, people appreciate seeing their names in a book or article. Once in a while, people get a specific pay off from being in a book (e.g., one of Colin’s informants lists his appearance in a book on his website selling pigeons).
  • In practice, most respondents are not worried about privacy. They are concerned about how they are portrayed. Colin and Murphy use evidence from Annette Lareau's follow-up to her study: some folks were angry about what she said about them, not about the level of privacy.
  • Masking suppresses the voices of research subjects. It is very hard to dispute an anonymous characterization of yourself.
  • Masking prevents accumulation of knowledge. Follow ups, return visits, verification, and longitudinal studies are made impossible. Colin has a nice example from his current research. He happens to be doing field work in an area that is covered in an earlier book. He wants to compare, but the IRB prevents that.
  • Access is not as restricted as you might think. If journalists can write about Amazon, the White House, and ISIS using real names and places, an academic ethnographer can at least ask whether the respondent wants to use their name.

Now, you shouldn’t misrepresent Jerolmack and Murphy’s argument. They are not against anonymity in all cases. Rather, they want identification to be the default. If you really need anonymity, so be it. But at least seriously consider identification as your first option.

I’ll conclude with a few thoughts as someone who has done some field work and often uses qualitative methods. In general, when I interview people, I have a specific protocol where I ask people at the end whether they want their name used. In my black power project, I interviewed 19 people and 12 gave me permission to use names. And this includes activists who did some controversial things and spoke about some sensitive issues.  Of course, for public records, I used names. For the antiwar project, we also gave the option of going public or remaining anonymous. Most people used their name and we used names for all people speaking in public (for an example of our fieldwork, see here and yes, names were used). In both projects, the locations are well known, whether they are contemporary or historical. So overall, I feel that identification is a fairly intuitive default. I hope that other sociologists seriously consider this position.



Written by fabiorojas

December 18, 2015 at 12:01 am

Posted in fabio, mere empirics

explaining a possible trump win

I still think that Trump is very, very unlikely to win the GOP nomination. But with about 60 days left till the Iowa Caucus, it looks like Trump will last longer than I thought. Real Clear Politics still shows him polling in the high 20s nationally, winning in New Hampshire, and hanging on to a lead over Ted Cruz in Iowa. At the very least, Trump will make it to Super Tuesday.

The question is why. As I noted before, the dominant political science model of presidential nominations is that party elites choose candidates. Once they choose and publicly endorse, the rank and file move to the candidates, cash contributions and support flows, and only those candidates with elite support can afford to wage a serious campaign.

But this is not an “iron law.” It is a summary of a complicated process that frequently occurs in American politics. Thus, if the conditions that enable elites to guide nominations do not hold, other processes may occur. So what is the “background” that makes elite selection of party nominees possible?

My answer: elites can guide elections because candidates are cash poor and need the help of parties, who can do voter registration, publicity, and legal work. If you buy this argument, then it is easy to see that a candidate can go solo if they have their own money (like a real estate empire) or their own publicity (a long career in books and television). Thus, Trump is a very rare person who has the potential to bypass the normal party process.

But of course, will he actually do it? The following needs to happen:

  • Since New Hampshire tends to vote for “local candidates,” Trump (a New York candidate) will likely take that state without much effort. To win Iowa, he needs to have a superior ground game where he out-mobilizes Ted Cruz, who is often favored by evangelicals.
  • South Carolina is probably irrelevant. He’ll win it if he’s already won New Hampshire and Iowa. If he splits, he’ll be cruising to Super Tuesday anyway, win or lose. If he loses both, he’s probably out anyway.
  • Super Tuesday has 16 states. By this point, all candidates without serious followings have dropped out, which means Rubio/Cruz vs. Trump. That means the establishment has its machine going in an attempt to stop Trump.
  • Super Tuesday has a lot of states that look Trump unfriendly on paper: small caucus states (like Wyoming and Alaska) or Southern states (Tennessee, Arkansas, Georgia) that might go for Southerners like Rubio or Cruz.
  • Ideologically, Trump must simply keep going down the same road – extreme anti-foreigner/Muslim prejudice plus more middle of the road stances on issues like social security and taxes. This appeals to the xenophobic “middle American radical” that is now squarely inside the GOP.

Ironically, a Trump win will likely mirror Obama’s 2008 primary win: out hustle the establishment candidate in caucus states and stand out on a single issue that the base cares about (immigrants for Trump, Iraq for Obama).


Written by fabiorojas

December 8, 2015 at 12:01 am

non-convergent models and the failure to look at the whole picture when fitting models

Very few models in statistics have nice, clean closed-form solutions. Usually, coefficients must be estimated by taking an initial guess and iteratively improving it (e.g., via the Newton-Raphson method). If your estimates stabilize, you say, "Mission accomplished, the coefficient is X!" Sometimes your statistical software instead says, "I stop because the model does not converge – the estimates bounce around."
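
To make the Newton-Raphson logic concrete, here is a minimal sketch in pure Python, using a toy Poisson model whose MLE has a known closed form (the sample mean) to check against. The `tol` and `max_iter` settings are exactly the kind of arbitrary defaults that decide whether software reports "converged"; all data here are made up for illustration.

```python
# Newton-Raphson for a maximum-likelihood estimate: iterate
# x_new = x - score(x) / hessian(x) until the step is below `tol`.
def newton_raphson(x0, score, hessian, tol=1e-8, max_iter=100):
    x = x0
    for _ in range(max_iter):
        x_new = x - score(x) / hessian(x)
        if abs(x_new - x) < tol:        # "converged" -- tol is arbitrary
            return x_new, True
        x = x_new
    return x, False                     # "model does not converge"

# Toy check: Poisson MLE for the rate, whose closed form is the mean.
counts = [2, 3, 1, 4, 2, 5]
n, s = len(counts), sum(counts)
score = lambda lam: s / lam - n         # d(log-likelihood)/d(lambda)
hessian = lambda lam: -s / lam ** 2     # second derivative
lam_hat, converged = newton_raphson(1.0, score, hessian)
# lam_hat converges to the sample mean, s / n
```

Loosen `tol` and the routine declares victory sooner; tighten it and the same data can produce a "non-convergence" message, which is the sense in which convergence reflects settings, not just the data.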

Normally, people throw up their hands, say "too bad, this model is inconclusive," and move on. This is wrong. Why? Whether a model estimate "converges" or not is partly the result of arbitrary choices, such as default tolerance settings. Simple example:

I am estimating the effect of a new drug on the number of days that people live after treatment. Assume that I have nice data from a clean experiment. I will estimate the number of days using a negative binomial regression, since I have count data that may or may not be over-dispersed. Stata says, "sorry, likelihood function is not concave, model won't converge." So I ask Stata to show me the likelihood function, and it bounces around by about 3% – more than the default tolerance allows. Furthermore, my coefficient estimates bounce around a little. The effect of treatment is about two months, plus or minus a week, depending on the settings.

As you can see, the data clearly support the hypothesis that the treatment works (i.e., extra days alive > 0). All "non-convergence" means here is that there might be multiple maxima of the likelihood function, all close in terms of practical significance, or that the likelihood surface is very "wiggly" around the likely maximum point.

Does that mean you can ignore convergence issues in maximum likelihood estimation? No! Another example:

Same setup as above – I am trying to measure the effectiveness of a drug and get “non-convergence” from Stata. But in this case, I look at the ML estimates and notice they bounce around a lot. When I ask Stata to re-estimate under different sensitivity settings, I discover that the coefficients are often near zero and sometimes far from zero.

The evidence here supports the null hypothesis. Same error message, but a different substantive conclusion.

The lesson is simple. In applied statistics, we get lazy and rely on simple answers: p-values, r-squared values, and error messages. What they all have in common is that they are arbitrary cut-offs. To really understand your model, you need to look at the full range of information rather than rely on thresholds alone. This makes publication harder (referees can’t just scan tables for asterisks), but it’s better thinking.
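One hedged sketch of what “looking at the full range of information” might mean in practice: evaluate the log-likelihood on a grid around the reported estimate and inspect the shape of the surface. The helper below is illustrative, not part of any package:

```python
import numpy as np

def loglik_profile(loglik, theta_hat, width=0.5, points=101):
    """Evaluate a log-likelihood on a grid around the reported estimate.

    A nearly flat profile means nearby parameter values are practically
    equivalent (the benign kind of non-convergence); widely separated
    local maxima are the dangerous case.
    """
    grid = np.linspace(theta_hat - width, theta_hat + width, points)
    values = np.array([loglik(t) for t in grid])
    return grid, values

# Illustration with a Poisson log-likelihood for a small sample.
data = np.array([2, 3, 1, 4, 2, 5, 3], dtype=float)

def poisson_loglik(lam):
    return data.sum() * np.log(lam) - len(data) * lam

grid, values = loglik_profile(poisson_loglik, theta_hat=data.mean())
spread = values.max() - values.min()  # how much the surface moves nearby
```

Plotting `values` against `grid` (or just eyeballing `spread`) tells you far more about whether “non-convergence” matters than the error message does.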


Written by fabiorojas

November 5, 2015 at 12:01 am

Posted in fabio, mere empirics

critical thinking courses do not teach critical thinking any better than other courses

Critical Thinking Wars 1 and 2

A recent meta-analysis of studies of critical thinking (e.g., studies testing whether students can formulate criticisms of arguments) shows that, on average, college education is associated with gains in critical thinking. From “Does College Teach Critical Thinking? A Meta-Analysis” by Christopher Huber and Nathan Kuncel in the Review of Educational Research:

This meta-analysis synthesizes research on gains in critical thinking skills and attitudinal dispositions over various time frames in college. The results suggest that both critical thinking skills and dispositions improve substantially over a normal college experience.

Now, my beef with the whole critical thinking stream is the claim that there is a special domain of teaching called “critical thinking.” The authors also looked at studies where students received dedicated critical thinking instruction:

Although college education may lag in other ways, it is not clear that more time and resources should be invested in teaching domain-general critical thinking.

Here is how the Chronicle of Higher Education blog summarizes the issue:

Students are learning critical-thinking skills, but adding instruction focused on critical thinking specifically doesn’t work. Students in programs that stress critical thinking still saw their critical-thinking skills improve, but the improvements did not surpass those of students in other programs.

Bottom line: Take regular courses on regular topics and pay close attention to how people in specific areas figure out problems. Skip the critical thinking stuff; it’s fluff talk.


Written by fabiorojas

October 30, 2015 at 12:01 am

please stop lecturing me

The New York Times has run an op-ed by Molly Worthen, a professor of history, who argues against active learning in college classes and wants to retain the lecture format:

Good lecturers communicate the emotional vitality of the intellectual endeavor (“the way she lectured always made you make connections to your own life,” wrote one of Ms. Severson’s students in an online review). But we also must persuade students to value that aspect of a lecture course often regarded as drudgery: note-taking. Note-taking is important partly for the record it creates, but let’s be honest. Students forget most of the facts we teach them not long after the final exam, if not sooner. The real power of good notes lies in how they shape the mind.

“Note-taking should be just as eloquent as speaking,” said Medora Ahern, a recent graduate of New Saint Andrews College in Idaho. I tracked her down after a visit there persuaded me that this tiny Christian college has preserved some of the best features of a traditional liberal arts education. She told me how learning to take attentive, analytical notes helped her succeed in debates with her classmates. “Debate is really all about note-taking, dissecting your opponent’s idea, reducing it into a single sentence. There’s something about the brevity of notes, putting an idea into a smaller space, that allows you psychologically to overcome that idea.”

As we noted on this blog, there is actually a massive amount of research comparing lecturing to other forms of classroom instruction and lectures do very poorly:

To weigh the evidence, Freeman and a group of colleagues analyzed 225 studies of undergraduate STEM teaching methods. The meta-analysis, published online today in the Proceedings of the National Academy of Sciences, concluded that teaching approaches that turned students into active participants rather than passive listeners reduced failure rates and boosted scores on exams by almost one-half a standard deviation. “The change in the failure rates is whopping,” Freeman says. And the exam improvement—about 6%—could, for example, “bump [a student’s] grades from a B– to a B.”

If you’d like your students to master the art of eloquent note taking, continue lecturing. If you’d like them to learn things, adopt active learning.


Written by fabiorojas

October 21, 2015 at 12:01 am

editorial incentives against replication

It is rather obvious that scholars have almost no incentive to replicate or verify others’ work. Even the guys who busted La Cour will get little for their efforts aside from a few pats on the back. What is less often noted is that editors also have little incentive to issue corrections, publish replications, and host commentaries:

  • Editing a journal is a huge workload. Imagine an additional steady stream of replication notes that need to be refereed.
  • Replication notes will never get cited like the original, so they drag down your journal’s impact factor.
  • Replication studies, except in cases of fraud (e.g., the La Cour case), will rarely change readers’ minds once they have read the original. For example, the Bendor, Moe, and Shotts replication in the APSR essentially showed that a key point of garbage can theory is wrong, yet the garbage can model still gets piles of cites.
  • Processing replication notes creates more conflict that editors need to deal with.

It’s sad that correcting the record and verifying results receive so little reward – a very anti-intellectual situation. Still, I think there are some good alternatives. One possible model is that folks interested in replication simply archive their work in arXiv, SSRN, or other venues. Very important replications can be published as formal articles in venues like PLoS One or Sociological Science. That way, there can be a record of which studies hold water and which don’t, without demanding that journals spend time as arbitrators between replicators and original authors.


Written by fabiorojas

October 14, 2015 at 12:01 am

Posted in fabio, mere empirics

stuff that doesn’t replicate

Here’s the list (so far):

Some people might want to hand-wave the problem away or jump to the conclusion that science is broken. There’s a more intuitive explanation: science is “brittle.” That is, once you get past some basic and important findings, you get to findings that are small in size, require many technical assumptions, or rely on very specific laboratory or data-collection conditions.

There should be two responses. First, editors should reject submissions that depend on “local conditions” or report very small effects, or send them to lower-tier journals. Second, other researchers should feel free to try to replicate research. This is appropriate work for early-career academics who need to learn how work is done. And of course, people who publish in top journals or obtain famous results should expect replication requests.


Written by fabiorojas

October 13, 2015 at 12:01 am

more tweets, more votes: social media and causation

This week, the group Political Bots wrote the following tweet and cited More Tweets, More Votes in support:

The claim, I believe, is that politicians purchase bots (automated spamming Twitter accounts) because they believe that more presence on media leads to a higher vote tally.

In presenting these results, we were very careful to avoid saying that there is a causal relationship between social media mentions and voting:

These results indicate that the “buzz” or public discussion about a candidate on social media can be used as an indicator of voter behavior.

And:

Known as the Pollyanna hypothesis, this finding implies that the relative over-representation of a word within a corpus of text may indicate that it signifies something that is viewed in a relatively positive manner. Another possible explanation might be that strong candidates attract more attention from both supporters and opponents. Specifically, individuals may be more likely to attack or discuss disliked candidates who are perceived as being strong or as having a high likelihood of winning.

In other words, we went to great efforts to suggest that social media is a “thermometer,” not a cause of election outcomes.

Now, it might be fascinating to find that politicians are changing their behavior in response to our paper. It *might* be the case that when politicians believe in a causal effect, they increase spending on social media. Even then, that doesn’t show a causal effect of social media. It is actually more evidence for the “thermometer” theory: politicians who have money to spend on social media campaigns are strong candidates, and strong candidates tend to get more votes. I appreciate the discussion of social media and election outcomes, but so far, I think the evidence shows there is no causal effect.


Written by fabiorojas

September 4, 2015 at 12:02 am

inside higher education discusses replication in psychology and sociology

Science just published a piece showing that only about a third of articles from major psychology journals can be replicated. That is, if you reran the experiments, only a third would yield statistically significant results. The details of the studies matter as well: the higher the p-value, the less likely you were to replicate, and “flashy” results were less likely to replicate.

Inside Higher Ed spoke to me and other sociologists about the replication issue in our discipline. A major issue is that there is no incentive to actually assess research, since it seems nearly impossible to publish replications and statistical criticisms in our major journals:

Recent research controversies in sociology also have brought replication concerns to the fore. Andrew Gelman, a professor of statistics and political science at Columbia University, for example, recently published a paper about the difficulty of pointing out possible statistical errors in a study published in the American Sociological Review. A field experiment at Stanford University suggested that only 15 of 53 authors contacted were able or willing to provide a replication package for their research. And the recent controversy over the star sociologist Alice Goffman, now an assistant professor at the University of Wisconsin at Madison, regarding the validity of her research studying youths in inner-city Philadelphia lingers — in part because she said she destroyed some of her research to protect her subjects.

Philip Cohen, a professor of sociology at the University of Maryland, recently wrote a personal blog post similar to Gelman’s, saying how hard it is to publish articles that question other research. (Cohen was trying to respond to Goffman’s work in the American Sociological Review.)

“Goffman included a survey with her ethnographic study, which in theory could have been replicable,” Cohen said via email. “If we could compare her research site to other populations by using her survey data, we could have learned something more about how common the problems and situations she discussed actually are. That would help evaluate the veracity of her research. But the survey was not reported in such a way as to permit a meaningful interpretation or replication. As a result, her research has much less reach or generalizability, because we don’t know how unique her experience was.”

Readers can judge whether Gelman’s or Cohen’s critiques are correct. But the broader issue is serious. Sociology journals simply aren’t publishing error corrections or replications, with the honorable exception of Sociological Science, which published a replication/critique of the Brooks and Manza (2006) ASR article. For now, debate over the technical merits of particular research seems to be the purview of blog posts and book reviews that are quickly forgotten. That’s not good.


Written by fabiorojas

August 31, 2015 at 12:01 am

working with computer scientists


In North Carolina, this is called the “Vaisey Cart.”

I recently began working with a crew of computer scientists at Indiana after being recruited to help with a social media project. It’s been a highly informative experience that has reinforced my belief that sociologists and computer scientists should team up. Some observations:

  • CS and sociology are complementary. We care about theory; they care about tools and applications. A natural fit.
  • In contrast, sociology and other social sciences are competing over the same theory space.
  • CS people have a deep bucket of tools for solving all kinds of problems that commonly occur in cultural sociology, network analysis, and simulation studies.
  • CS people believe in timely solutions and efficient workflow. Rather than writing over a period of years, their attitude is “yes, we can do this next week.”
  • Since their discipline runs on conferences, the work is fast and it is expected that it will be done soon.
  • Another benefit of the peer reviewed conference system is that work is published “for real” quickly and there is much less emphasis on a few elite publication outlets. Little “development.” Either it works or it doesn’t.
  • Quantitative sociologists are really good at applied stats and can help most CS teams articulate data analysis plans and execute them, assuming that the sociologist knows R.
  • Perhaps most importantly, CS researchers may be confident in their abilities, but they are less likely to think they know it all and need no help from others. CS is simply too messy a field – in that respect, much like sociology.
  • Finally: cash. Unlike in the arts and sciences, there is no sense that everyone is broke. While you still have to work extra hard to get money, it isn’t a lost cause the way it is in sociology, where the NSF hands out only a handful of grants. There is money out there for entrepreneurial scholars.

Of course, there are downsides. CS people think you are crazy for working on a 60-page article that takes 5 years to get published. Also, some folks in data science and CS care more about tools and nice visuals than about theory and understanding. As a corollary, some CS folks may not appreciate sampling, bias, non-response, and other issues that normally inform sociological research design. But still, my experience has been excellent, the results exciting, and I think more sociologists should look to computer science as an interdisciplinary research partner.


Written by fabiorojas

June 24, 2015 at 12:01 am