prediction vs. modelling

I am currently working on a super cool project, and I was thinking about the following distinction: modelling of data vs. prediction with data. If you give data to a physical science or engineering type, they want prediction. They want to come up with an accurate prediction of some future state, with tiny errors. In contrast, most social scientists are interested in modelling general trends. We understand that statistical models have error terms, so prediction is inherently hard. It’s even beside the point in some sense. If X perfectly predicts Y, you’ve probably just measured the same thing twice. Instead, you want an imperfect, but unexpected, relationship between variables. Neither approach is wrong, but they do represent different philosophies of data analysis.
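To make the distinction concrete, here is a minimal sketch on simulated data. Everything in it is an assumption made up for illustration: the true slope, the noise level, the sample sizes, and the helper names `fit_line`, `rmse`, and `simulate`. It fits a straight line by ordinary least squares, then reports two different things: the estimated trend (what the modeller cares about) and the typical error on fresh data (what the predictor cares about).

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def rmse(a, b, xs, ys):
    """Root mean squared error of the line a + b*x on the given data."""
    squared = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(squared / len(xs))


# Hypothetical data-generating process, chosen only for illustration.
TRUE_SLOPE = 0.5
NOISE_SD = 2.0
rng = random.Random(0)  # fixed seed so the run is repeatable


def simulate(n):
    """Draw n (x, y) pairs from a noisy linear relationship."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [1.0 + TRUE_SLOPE * x + rng.gauss(0, NOISE_SD) for x in xs]
    return xs, ys


train_x, train_y = simulate(500)
test_x, test_y = simulate(500)  # a fresh sample from the same population

intercept, slope = fit_line(train_x, train_y)
holdout_rmse = rmse(intercept, slope, test_x, test_y)

print(f"estimated trend (slope): {slope:.2f}")
print(f"typical error on new data (RMSE): {holdout_rmse:.2f}")
```

In a setup like this, the slope should land close to the true value while the error on new cases stays roughly as large as the noise itself. The trend is recovered well, yet any single prediction remains poor, which is one way to read the claim that an imperfect relationship can still be the interesting finding.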

Written by fabiorojas

December 6, 2012 at 12:01 am

Posted in fabio, mere empirics

5 Responses

  1. In my view, there are two forms of modeling: predictive and narrative.

    Predictive is what your credit card company does when it attempts to determine if you are worth the risk. It is also what the machine learning community does.

    Narrative modeling is what most people do, even if they claim to be making a predictive model. It is about telling a story with your data. As generally applied, classical statistical regression techniques are actually a narrative approach, as people generally force the regression assumptions (e.g. no heteroskedasticity, normal errors, etc.) onto their model without actually checking that they are true or even understanding the assumptions (yes, I know everyone swears they check their assumptions, but this often doesn’t hold up to scrutiny).



    December 6, 2012 at 1:17 am

  2. We had a stimulating discussion on a related topic over on Andrew Gelman’s blog a few months back. For my part, I have a hard time getting my mind around the idea of “modelling” something without the intent of predicting something. (I’ve got the same issues with explanation and prediction.) Usually, in the discussion, we find out that any conflict over these things is semantic, turning on differences in the meaning of words like “truth” and “accuracy”.

    My view: If you’ve got a data set and a model of it, then you are making a claim about the population that the data set represents. So, if you now sample that population again to generate a comparable data set, then your model implicitly makes the same claims about the new data (or that data added to the old data, if this is done properly). Your model is in that sense “predicting” features of this new data.

    Without this requirement of predictive power, I’m not sure what your model says, i.e., what it is making a claim about, or what it is claiming about it.

    I think we can simply distinguish between stories and models. You can derive a model from a story, to be sure. But at that moment you are making some predictive commitments. It’s the basis of testing the story: Suppose I tell you a story about myself as someone who is financially responsible and always pays his bills. On this basis you extend credit to me. You have taken my past as a model of my future. Fine. But if I don’t pay you back then your model was straightforwardly wrong, even if the story it was based on was true. That’s why credit card companies derive their models from larger data sets. They don’t want to hear my story at all. They just want to know a number of facts about it, and then let their model predict my behavior.

    The model that is based on a story is a model like any other. It’s just much more likely to turn out to be wrong.



    December 6, 2012 at 9:49 am

  3. in some ways modeling seems easier as it’s predicting the past, but on the other hand modeling usually eschews observables that are too conceptually close to the outcome. for instance, suppose the outcome is candidate preference. prediction will include party registration but modeling will consider this almost tautological and not really explanatory in the same way as a model based just on stuff like race, marital status, and education.



    December 6, 2012 at 5:02 pm

  4. What’s the difference between prediction & modelling causal effects? Honest question.


    Sean M

    December 6, 2012 at 6:35 pm

  5. We model therefore we predict (as one function of modeling).



    December 6, 2012 at 7:13 pm
