Archive for the ‘mere empirics’ Category

non-convergent models and the failure to look at the whole picture when fitting models

Very few models in statistics have nice, clean closed form solutions. Usually, coefficients in models must be estimated by taking an initial guess and improving the estimate (e.g., the Newton-Raphson method). If your estimates stabilize, then you say “Mission accomplished, the coefficient is X!” Sometimes, your statistical software will say “I stop because the model does not converge – the estimates bounce around.”

Normally, people throw up their hands, say “too bad, this model is inconclusive,” and move on. This is wrong. Why? The convergence or non-convergence of a model estimate is the result of fairly arbitrary choices, such as the tolerance thresholds and iteration limits built into your software. Simple example:

I am estimating the effect of a new drug on the number of days that people live after treatment. Assume that I have nice data from a clean experiment. I will estimate the number of days using a negative binomial regression, since I have count data which may or may not be over-dispersed. Stata says “sorry, the likelihood function is not concave, the model won’t converge.” So I actually ask Stata to show me the likelihood function, and it bounces around by about 3% – more than the default settings allow. Furthermore, my coefficient estimates bounce around a little. The effect of treatment is about two months +/- a week, depending on the settings.

As you can see, the data clearly support the hypothesis that the treatment works (i.e., extra days alive > 0). All “non-convergence” means here is that there may be multiple maxima of the likelihood function that are all close in terms of practical significance, or that the likelihood surface is very “wiggly” around the likely maximum.

Does that mean you can ignore convergence issues in maximum likelihood estimation? No! Another example:

Same setup as above – I am trying to measure the effectiveness of a drug and I get “non-convergence” from Stata. But in this case, I look at the ML estimates and notice that they bounce around a lot. Then I ask Stata to re-estimate with different sensitivity settings and discover that the coefficients are often near zero and sometimes far from zero.

The evidence here is consistent with the null hypothesis. Same error message, but a different substantive conclusion.
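The check described in these two examples – re-run the optimizer under different convergence settings and ask whether the substantive answer moves – can be sketched in code. This is a hypothetical toy illustration, not the Stata runs above: it fits the log-mean of simulated “days alive” counts by Newton-Raphson and compares a loose tolerance to a tight one.

```python
import math
import random

def newton_poisson_logmean(counts, tol, max_iter=200):
    """MLE of theta = log(mean) for Poisson counts via Newton-Raphson."""
    n, s = len(counts), sum(counts)
    theta = 0.0  # deliberately naive starting guess
    for _ in range(max_iter):
        grad = s - n * math.exp(theta)   # d(log-likelihood)/d(theta)
        hess = -n * math.exp(theta)      # second derivative (always negative)
        step = grad / hess
        theta -= step
        if abs(step) < tol:              # "converged" under this setting
            return theta
    return theta                         # iteration cap hit: "non-convergence"

random.seed(42)
days = [random.randint(50, 70) for _ in range(200)]  # simulated days alive

loose = newton_poisson_logmean(days, tol=1e-2)
tight = newton_poisson_logmean(days, tol=1e-10)

# The substantive question: do the settings change the answer in any
# way that matters? Here they do not, so "non-convergence" at some
# strict default tolerance would be practically harmless.
print(f"loose tolerance: mean days = {math.exp(loose):.3f}")
print(f"tight tolerance: mean days = {math.exp(tight):.3f}")
```

In the problematic second scenario, the analogue would be estimates that swing between near zero and far from zero as `tol` varies – same error message, very different situation.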

The lesson is simple. In applied statistics, we get lazy and rely on simple answers: p-values, r-squared, and error messages. What they all have in common is that they are arbitrary rules. To really understand your model, you need to actually look at the full range of information and not just rely on cut-offs. This makes publication harder (referees can’t just look for asterisks in tables) but it’s better thinking.

50+ chapters of grad skool advice goodness: Grad Skool Rulz ($2!!!!)/From Black Power/Party in the Street

Written by fabiorojas

November 5, 2015 at 12:01 am

Posted in fabio, mere empirics

critical thinking courses do not teach critical thinking any better than other courses

Critical Thinking Wars 1 and 2

A recent meta-analysis of studies of critical thinking (e.g., seeing if students can formulate criticisms of arguments) shows that, on average, college education is associated with critical thinking. From “Does College Teach Critical Thinking? A Meta-Analysis” by Christopher Huber and Nathan Kuncel in Review of Educational Research:

This meta-analysis synthesizes research on gains in critical thinking skills and attitudinal dispositions over various time frames in college. The results suggest that both critical thinking skills and dispositions improve substantially over a normal college experience.

Now, my beef with the whole critical thinking stream is the claim that there is a special domain of teaching called “critical thinking.” The authors looked at studies where students received special critical thinking instruction:

Although college education may lag in other ways, it is not clear that more time and resources should be invested in teaching domain-general critical thinking.

Here is how the Chronicle of Higher Education blog summarizes the issue:

Students are learning critical-thinking skills, but adding instruction focused on critical thinking specifically doesn’t work. Students in programs that stress critical thinking still saw their critical-thinking skills improve, but the improvements did not surpass those of students in other programs.

Bottom line: Take regular courses on regular topics and pay close attention to how people in specific areas figure out problems. Skip the critical thinking stuff; it’s fluff talk.


Written by fabiorojas

October 30, 2015 at 12:01 am

please stop lecturing me

The New York Times has run an op-ed by Molly Worthen, a professor of history, who argues against active learning in college classes and wants to retain the lecture format:

Good lecturers communicate the emotional vitality of the intellectual endeavor (“the way she lectured always made you make connections to your own life,” wrote one of Ms. Severson’s students in an online review). But we also must persuade students to value that aspect of a lecture course often regarded as drudgery: note-taking. Note-taking is important partly for the record it creates, but let’s be honest. Students forget most of the facts we teach them not long after the final exam, if not sooner. The real power of good notes lies in how they shape the mind.

“Note-taking should be just as eloquent as speaking,” said Medora Ahern, a recent graduate of New Saint Andrews College in Idaho. I tracked her down after a visit there persuaded me that this tiny Christian college has preserved some of the best features of a traditional liberal arts education. She told me how learning to take attentive, analytical notes helped her succeed in debates with her classmates. “Debate is really all about note-taking, dissecting your opponent’s idea, reducing it into a single sentence. There’s something about the brevity of notes, putting an idea into a smaller space, that allows you psychologically to overcome that idea.”

As we noted on this blog, there is actually a massive amount of research comparing lecturing to other forms of classroom instruction and lectures do very poorly:

To weigh the evidence, Freeman and a group of colleagues analyzed 225 studies of undergraduate STEM teaching methods. The meta-analysis, published online today in the Proceedings of the National Academy of Sciences, concluded that teaching approaches that turned students into active participants rather than passive listeners reduced failure rates and boosted scores on exams by almost one-half a standard deviation. “The change in the failure rates is whopping,” Freeman says. And the exam improvement—about 6%—could, for example, “bump [a student’s] grades from a B– to a B.”

If you’d like your students to master the art of eloquent note taking, continue lecturing. If you’d like them to learn things, adopt active learning.


Written by fabiorojas

October 21, 2015 at 12:01 am

editorial incentives against replication

It is rather obvious that scholars have almost no incentive to replicate or verify others’ work. Even the guys who busted La Cour will get little for their efforts aside from a few pats on the back. But what is less noted is that editors also have little incentive to issue corrections, publish replications, and run commentaries:

  • Editing a journal is a huge workload. Imagine an additional steady stream of replication notes that need to be refereed.
  • Replication notes will never get cited like the original, so they drag down your journal’s impact factor.
  • Replication studies, except in cases of fraud (e.g., the La Cour case), will rarely change the minds of people who have read the original. For example, the Bendor, Moe, and Shotts APSR replication essentially pointed out that a key point of garbage can theory is wrong, yet the garbage can model still gets piles of cites.
  • Processing replication notes creates more conflict that editors need to deal with.

It’s sad that correcting the record and verification receive so little reward. It’s a very anti-intellectual situation. Still, I think there are some good alternatives. One possible model is that folks interested in replication can simply archive their work in arXiv, SSRN, or other venues. Very important replications can be published in venues like PLoS One or Sociological Science as formal articles. Thus, there can be a record of which studies hold water and which don’t without demanding that journals spend time as arbiters between replicators and the original authors.


Written by fabiorojas

October 14, 2015 at 12:01 am

Posted in fabio, mere empirics

stuff that doesn’t replicate

Here’s the list (so far):

Some people might want to hand wave the problem away or jump to the conclusion that science is broken. There’s a more intuitive explanation – science is “brittle.” That is, once you get past some basic and important findings, you get to findings that are small in size, require many technical assumptions, or rely on very specific laboratory/data collection conditions.

There should be two responses. First, editors should reject submissions which might depend on “local conditions” or very small results or send them to lower tier journals. Second, other researchers should feel free to try to replicate research. This is appropriate work for early career academics who need to learn how work is done. Of course, people who publish in top journals, or obtain famous results, should expect replication requests.


Written by fabiorojas

October 13, 2015 at 12:01 am

more tweets, more votes: social media and causation

This week, the group Political Bots wrote the following tweet and cited More Tweets, More Votes in support:

The claim, I believe, is that politicians purchase bots (automated spamming Twitter accounts) because they believe that more presence on media leads to a higher vote tally.

In presenting these results, we were very careful to avoid saying that there is a causal relationship between social media mentions and voting:

These results indicate that the “buzz” or public discussion about a candidate on social media can be used as an indicator of voter behavior.


Known as the Pollyanna hypothesis, this finding implies that the relative over-representation of a word within a corpus of text may indicate that it signifies something that is viewed in a relatively positive manner. Another possible explanation might be that strong candidates attract more attention from both supporters and opponents. Specifically, individuals may be more likely to attack or discuss disliked candidates who are perceived as being strong or as having a high likelihood of winning.

In other words, we went to great efforts to suggest that social media is a “thermometer,” not a cause of election outcomes.

Now, it might be fascinating to find that politicians are changing behavior in response to our paper. It *might* be the case that when politicians believe in a causal effect, they increase spending on social media. Even then, that doesn’t show a causal effect of social media. It is actually more evidence for the “thermometer” theory: politicians who have money to spend on social media campaigns are strong candidates, and strong candidates tend to get more votes. I appreciate the discussion of social media and election outcomes, but so far, I think the evidence is that there is not a causal effect.
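The “thermometer” logic is easy to see in a toy simulation. In the hypothetical sketch below (not the paper’s actual data or model), a latent candidate “strength” variable drives both tweet share and vote share; the two end up strongly correlated even though tweets have zero causal effect on votes in the data-generating process.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient, computed from scratch."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

random.seed(7)

# Latent candidate strength is the common cause of both observed series.
strength = [random.gauss(0, 1) for _ in range(500)]
tweets = [s + random.gauss(0, 0.5) for s in strength]  # thermometer reading
votes = [s + random.gauss(0, 0.5) for s in strength]   # no tweet term at all

# Strong correlation, zero causation: buying bots to inflate `tweets`
# would move the thermometer without moving `votes`.
print(f"corr(tweets, votes) = {pearson(tweets, votes):.2f}")
```

This is exactly why a politician buying bot accounts would be wasting money under the thermometer theory: inflating the indicator does nothing to the common cause.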


Written by fabiorojas

September 4, 2015 at 12:02 am

inside higher education discusses replication in psychology and sociology

Science just published a piece showing that only a third of articles from major psychology journals can be replicated. That is, if you reran the experiments, only a third of the experiments would have statistically significant results. The details of the studies matter as well: the higher the p-value of the original result, the less likely it was to replicate, and “flashy” results were less likely to replicate.

Inside Higher Ed spoke to me and other sociologists about the replication issue in our discipline. A major issue is that there is no incentive to actually assess research, since it seems to be nearly impossible to publish replications and statistical criticisms in our major journals:

Recent research controversies in sociology also have brought replication concerns to the fore. Andrew Gelman, a professor of statistics and political science at Columbia University, for example, recently published a paper about the difficulty of pointing out possible statistical errors in a study published in the American Sociological Review. A field experiment at Stanford University suggested that only 15 of 53 authors contacted were able or willing to provide a replication package for their research. And the recent controversy over the star sociologist Alice Goffman, now an assistant professor at the University of Wisconsin at Madison, regarding the validity of her research studying youths in inner-city Philadelphia lingers — in part because she said she destroyed some of her research to protect her subjects.

Philip Cohen, a professor of sociology at the University of Maryland, recently wrote a personal blog post similar to Gelman’s, saying how hard it is to publish articles that question other research. (Cohen was trying to respond to Goffman’s work in the American Sociological Review.)

“Goffman included a survey with her ethnographic study, which in theory could have been replicable,” Cohen said via email. “If we could compare her research site to other populations by using her survey data, we could have learned something more about how common the problems and situations she discussed actually are. That would help evaluate the veracity of her research. But the survey was not reported in such a way as to permit a meaningful interpretation or replication. As a result, her research has much less reach or generalizability, because we don’t know how unique her experience was.”

Readers can judge whether Gelman’s or Cohen’s critiques are correct. But the broader issue is serious. Sociology journals simply aren’t publishing error corrections or replications, with the honorable exception of Sociological Science, which published a replication/critique of the Brooks/Manza (2006) ASR article. For now, debate on the technical merits of particular research seems to be the purview of blog posts and book reviews that are quickly forgotten. That’s not good.


Written by fabiorojas

August 31, 2015 at 12:01 am

