Archive for the ‘mere empirics’ Category
I have a new article coming out in the journal Public Opinion Quarterly. It’s called “Correcting for Postmortem Bias in Longitudinal Surveys.” I think you might enjoy it. Here’s the abstract:
Ever since the pioneering work of Heckman (1965), social scientists have been acutely aware of selection bias in surveys. This is especially true for longitudinal studies where respondents may drop out of the survey after the first wave due to deceasement. This paper proposes a solution to this problem through postmortem follow-up interviews. We illustrate this technique with the Panel Survey of Income Dynamics. Through postmortem interviews, we are able to increase response rates by approximately 15% and we can identify biases in the original PSID data. For example, we find that employment was over-reported in the original PSID. Interestingly, we find that voting increased once the dead were included, an effect attributable to deceased respondents in the Chicago metropolitan statistical area. These results show the promise of including the dead in all future social research. This study was supported by NSF grant 123-666.
Check it out.
Think Progress has been digging further into the back story behind the Regnerus/gay parents paper. The news site got one of the study’s funders to admit that the conclusion was predetermined:
Tellez confirmed to The American Independent that he was referring to same-sex marriage cases. In April 2011 — a year before the study was complete — Tellez wrote in a letter that “we are confident that the traditional understanding of marriage will be vindicated by this study as long as it is done honestly and well.” He also suggested that no prior study had properly compared children raised by a mother and father and those “headed by gay and lesbian couples,” but of course the Regnerus study doesn’t even do that.
The study was submitted for publication in February 2012 before Regnerus had even completed all of the data collection and accepted just six weeks later, while many other articles published in the same issue took a year between submission and acceptance. Peer review was similarly hurried, with one social demographer admitting that he only had two weeks to review the study and offer a commentary — without even having access to all the data.
Previous Regnerus discussion on orgtheory.
Salary.com had one of those lists of majors that don’t pay very well. #8? You guessed it – sociology:
People who enter the field of sociology generally are interested in helping their fellow man. Unfortunately, that kind of benevolence doesn’t usually translate to wealth. Here are three jobs commonly held by sociology majors (click on job title and/or salary for more info):
… social worker
… corrections officer
… chemical dependency counselor
This is one of those cheesy magazine articles on careers, but it is consistent with prior research on college majors and income. Sociology is a feeder into service professions. That’s a good thing, though I do wonder how my sublime lectures on the differences between structuralism and post-structuralism help people get off of drugs.
When I posted the Sociology Department Rankings for 2013 I joked that Indiana made it to the Top 10 “due solely to Fabio mobilizing a team of role-playing enthusiasts to relentlessly vote in the survey. (This is speculation on my part.)” Well, some further work with the dataset on the bus this morning suggests that the Fabio Effect is something to be reckoned with after all.
The dataset we collected has—as best we can tell—635 respondents. More precisely it has 635 unique anonymized IP addresses, so probably slightly fewer actual people, if we assume some people voted at work, then maybe again via their phone or from home. Our 635 respondents made 46,317 pairwise comparisons of departments. Now, in any reputational survey of this sort there is a temptation to enhance the score of one’s own institution, perhaps directly by voting for them whenever you can (if you are allowed) or more indirectly by voting down potential peers whenever you can. For this reason some reputational surveys (like the Philosophical Gourmet Report) prohibit respondents from voting for their employer or Ph.D.-granting school. The All Our Ideas framework has no such safeguards, but it does have a natural buffer when the number of paired comparisons is large. One has the opportunity to vote for one’s own department, but the number of possible pairs is large enough that it’s quite hard to influence the outcome.
It’s not impossible, however.
Sullivan is shocked that the level of support for the war is high. A few comments: First, the rejection of the war has been stable since around 2006. Second, my hypothesis is that there is a baseline level of support for the war, coming mostly from strong conservatives who identify with the Republican party. Third, the wording of the question probably inflates the answer a little. Do you think the United States made a mistake? A lot of people don’t like admitting their country is ever wrong. Sullivan’s post even notes that support for the war drops with a different wording – whether the war was worth the cost. A bit more impersonal. Bottom line: People know things went badly in Iraq, but nationalism suppresses the feeling.
Update: I updated these analyses (fixing the double-counting problem). The results changed a little, so reload to see the new figures.
Last week we launched the OrgTheory/AAI 2013 Sociology Department Ranking Survey, taking advantage of Matt Salganik’s excellent All Our Ideas service to generate sociology rankings based on respondents making multiple pairwise comparisons between departments. That is, questions of the form “In your judgment, which of the following is the better Sociology department?” followed by a choice between two departments. Amongst other advantages, this method tends to get you a lot of data quickly. People find it easier to make a pairwise choice between two alternatives than to assign a rating score or produce a complete ranking amongst many alternatives. They also get addicted to the process and keep making choices. In our survey, over 600 respondents made just over 46,000 pairwise comparisons. In the original version of this post I used the Session IDs supplied in the data, forgetting that the data file also provides non-identifying (hashed) IP addresses. I re-ran the analysis using voter-aggregated rather than session-aggregated data, so now there is no double-counting. The results are a little cleaner. Although the All Our Ideas site gives you the results itself, I was interested in getting some other information out of the data, particularly confidence intervals for departments. Here is a figure showing the rankings for the Top 50 departments, based on ability scores derived from a direct-comparison Bradley-Terry model.
The model doesn’t take account of any rater effects, but given the general state of the U.S. News ranking methodology I am not really bothered. As you can see, the gradation looks pretty smooth. The first real “hinge” in the rankings (in the sense of a pretty clean separation between a department and the one above it) comes between Toronto and Emory. You could make a case, if you squint a bit, that UT Austin and Duke are at a similar hinge-point with respect to the departments ranked above and below them. Indiana’s high ranking is due solely to Fabio mobilizing a team of role-playing enthusiasts to relentlessly vote in the survey. (This is speculation on my part.)
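For the curious, the core of a Bradley-Terry fit is simple enough to sketch in a few lines of Python. This is an illustrative toy (Zermelo's classic iterative algorithm on made-up vote data), not the code behind the figure above, which also produced confidence intervals:

```python
from collections import defaultdict

def bradley_terry(comparisons, iters=200):
    """Fit Bradley-Terry ability scores from (winner, loser) pairs
    using Zermelo's classic iterative algorithm."""
    wins = defaultdict(int)   # total wins per department
    pairs = defaultdict(int)  # number of comparisons per unordered pair
    items = set()
    for winner, loser in comparisons:
        wins[winner] += 1
        pairs[frozenset((winner, loser))] += 1
        items.update((winner, loser))
    p = {i: 1.0 for i in items}
    for _ in range(iters):
        new = {}
        for i in items:
            denom = sum(pairs[frozenset((i, j))] / (p[i] + p[j])
                        for j in items
                        if j != i and frozenset((i, j)) in pairs)
            new[i] = wins[i] / denom if denom > 0 else p[i]
        total = sum(new.values())  # normalize so scores stay comparable
        p = {i: v * len(items) / total for i, v in new.items()}
    return p

# Hypothetical vote data: A usually beats B, B usually beats C, and so on.
votes = ([("A", "B")] * 3 + [("B", "A")] +
         [("B", "C")] * 3 + [("C", "B")] +
         [("A", "C")] * 3 + [("C", "A")])
scores = bradley_terry(votes)
```

The estimated abilities recover the ordering implied by the win counts, which is all the ranking figure needs.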
While we’re running our Crowdsourced Sociology Rankings, people have been looking a little more closely at the U.S. News and World Report rankings. Over at Scatterplot, Neal Caren points out that U.S. News’s methods page has some details on the survey sample size and response rates. They’re bad:
Surveys were conducted in fall 2012 by Ipsos Public Affairs … Questionnaires were sent to department heads and directors of graduate studies (or, alternatively, a senior faculty member who teaches graduate students) at schools that had granted a total of five or more doctorates in each discipline during the five-year period from 2005 through 2009, as indicated by the 2010 "Survey of Earned Doctorates." … The surveys asked about Ph.D. programs in criminology (response rate: 90 percent), economics (25 percent), English (21 percent), history (19 percent), political science (30 percent), psychology (16 percent), and sociology (31 percent). … The number of schools surveyed in fall 2012 were: economics—132, English—156, history—151, political science—119, psychology—246, and sociology—117. In fall 2008, 36 schools were surveyed for criminology.
So, following Neal, this tells us the Sociology rankings are based on a survey of 117 Heads and Directors with a response rate of 31 percent, which is thirty-six people in total. For Economics you have 33 people, for History 29 people, for Political Science 36 people, for Psychology 40 people, and for English 33 people. The methods page also notes that they calculate the scores using a trimmed mean, so they throw out two observations each time (the highest and the lowest). The upshot is that the average score of a department is likely to have rather wide confidence intervals.
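To see how wide, here is a quick back-of-the-envelope sketch in Python. The ratings are invented; the point is just the interval width you get from roughly 36 raters on a five-point scale:

```python
import math

def trimmed_mean(ratings, k=1):
    """Mean after dropping the k highest and k lowest ratings,
    which is what the USNWR methods page reports doing (k = 1)."""
    kept = sorted(ratings)[k:len(ratings) - k]
    return sum(kept) / len(kept)

def mean_ci(ratings, z=1.96):
    """Naive 95% confidence interval for the plain mean."""
    n = len(ratings)
    m = sum(ratings) / n
    sd = math.sqrt(sum((x - m) ** 2 for x in ratings) / (n - 1))
    half = z * sd / math.sqrt(n)
    return m - half, m + half

# 36 invented raters on the 1-5 scale USNWR uses.
ratings = [2, 3, 4] * 12
lo, hi = mean_ci(ratings)  # roughly (2.73, 3.27): over half a point wide
```

With score gaps between adjacent departments often far smaller than half a point, intervals like this make many of the published rank differences statistically meaningless.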
But, don’t let all that get in the way of contemplating the magic numbers. The press releases from strongly-ranked departments are already coming thick and fast.
Update: These numbers are too low. Read on.
I guess it’s possible that U.S. News *might* mean that the *effective* N of, e.g., the Sociology survey is 117, and that’s the result of a larger initial survey which yielded a 31 percent response rate. On that interpretation they initially contacted 378 departments (or thereabouts). That would be a non-standard way of describing what you did. Normally, if you give a raw number for the sample size and tell us the response rate, the raw number is the N you began with, not the N you ended up with. A quick check of the Survey of Earned Doctorates suggests that there were 167 Ph.D.-granting Sociology programs in the United States in 2010, which suggests that 117 is about right for the number who had awarded five or more in the past five years. Same goes for Economics, which has 179 Ph.D. programs in the 2010 SED. Then again, the wording in the methods can also be read as saying every department might have received two surveys (“Questionnaires were sent to department heads and directors of graduate studies … at schools that had granted a total of five or more doctorates … during the five-year period from 2005 through 2009”). Looking again at the available SED data for 2006 to 2010 (one year off the USNWR dates, unfortunately), I found that 115 Sociology Departments met the stated criteria of having awarded five or more doctorates in the previous five years. If both the Dept Head and DGS in all those departments got a survey, this makes for an initial maximum N of 230, which is still quite far from the 378 or so needed, if 117 is supposed to mean the 31 percent who responded rather than the total number initially surveyed.
It seems like the most plausible interpretation is that for Sociology the number of schools surveyed is in fact 117, that every school received two copies of the questionnaire (one to the Head, one to the DGS or equivalent), but that the 31 percent response rate means “schools from which at least one response was received”, and so the total number of respondents for Sociology is somewhere between 36 and 72 people, with a similar range of between 30 and 80 for the other departments.
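The arithmetic behind these competing readings is simple enough to lay out explicitly (numbers are from the USNWR methods page and the SED as quoted above):

```python
# Numbers from the USNWR methods page and the Survey of Earned Doctorates.
schools_surveyed = 117  # USNWR's stated N for Sociology
response_rate = 0.31

# Reading 1: 117 is the *effective* N of respondents, implying an
# implausibly large initial mailing of about 378 departments.
implied_initial_mailing = round(schools_surveyed / response_rate)  # 377

# Reading 2: each qualifying school (115 by the SED criteria) got two
# questionnaires, one to the Head and one to the DGS.
max_questionnaires = 115 * 2  # 230, still well short of ~378

# Reading 3: 117 schools were surveyed, and 31 percent of *schools*
# returned at least one questionnaire.
responding_schools = round(schools_surveyed * response_rate)  # 36
```

Only the third reading makes the stated numbers internally consistent, which is why the 36-to-72-respondent range above is the plausible one.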
Update: While I was offline dealing with other things, then looking at the SED data I’d downloaded, then writing the last few paragraphs above, I see others have come to the same conclusion as I do here by more direct and informed means.
As many of you are by now aware, U.S. News and World Report released the 2013 Edition of its Sociology Rankings this week. I find rankings fascinating, not least because of what you might call the “legitimacy ratchet” they implement. Winners insist rankings are absurd but point to their high placing on the list. Here’s a nice example of that from the University of Michigan. The message here is, “We’re not really playing, but of course if we were we’d be winning.” Losers, meanwhile, either remain silent (thus implicitly accepting their fate) or complain about the methods used, and leave themselves open to accusations of sour grapes or bad faith. They are constantly tempted to reject the enterprise and insist they should’ve been ranked higher, and so end up sounding like the apocryphal Borscht Belt couple complaining that the food here is terrible and the portions are tiny as well.
The best thing to do is to implement your own system, and do it better, if only to introduce confusion by way of additional measures. Omar Lizardo and Jessica Collett have already pointed out that U.S. News decided to cook the rankings by averaging the results from this year’s survey with the previous two rounds. They provide an estimate of what the de-averaged results probably looked like. Back in 2011, Steve Vaisey and I ran a poll using Matt Salganik’s excellent All Our Ideas website, which creates rankings from multiple pairwise comparisons. It’s easy to run and generates rankings with high face validity in a way that’s quicker, more fun, and much, much cheaper than the alternatives. So, we’re doing it again this year. Here is the OrgTheory/AAI 2013 Sociology Department Ranking Survey. Go and vote! Chicago people will be happy to hear you can vote as often as you like. So, participate in your own quantitative domination and get voting.
A few weeks ago, I argued that the era of overt racism is over. One commenter felt that I needed to operationalize the idea. There is no simple way to measure such a complex idea, but we can offer measurements of very specific processes. For example, I could hypothesize that it is no longer legitimate to use in public words that have a clearly derogatory meaning, such as n—— or sp–.*
We can test that idea with word frequency data. Google has scanned over 4 million books from 1500 to the present and you can search that database. Above, I plotted the appearance of n—– and sp—, two words which are unambiguously slurs for two large American ethnic groups. I did not plot slurs like “bean,” which are homophones for other neutral non-racial words. Then, I plotted the appearance of the more neutral or positive words for those groups. The first graph shows the relative frequencies for African American and Latino slurs vs. other ethnic terms. Since the frequency for Asian American slurs and other words is much lower, they get a separate graph. Thus, we can now test hypotheses about printed text in the post-racial society:
- The elimination thesis: Slurs drop drastically in use.
- The eclipse thesis: Non-slur words now overwhelm racist slurs, but racist slurs remain.
- Co-evolution: The frequency of neutral and slur words move together. People talk about group X and the haters just use the slur.
- Escalation: Slurs are increasing.
This rough data indicates that #2, the eclipse thesis, is correct. The dominant racial terms are neutral or positive. Most slurs that I looked up seem to maintain some base level of usage, even in the post-civil rights era. The slur use level is non-zero, but it is small in comparison to other words so it looks as if it is zero. Some slur use may be derogatory, while some of it may be artistic or “reclaiming the term.” I can’t prove it, but I think Quentin Tarantino accounts for 50% or more of post-civil rights use of the n-word.
Bottom line: Society has changed and we can measure the change. This doesn’t mean that racial status is no longer important, but it does mean that one very important aspect of pre-Civil Rights racist culture has receded in relative importance. Some people just love racial slurs, but that is likely not the modal way of talking about people. Is that progress? I think so.
* Geez, Fabio, must you censor? Well, it isn’t censoring if it’s voluntary. I just don’t want this blog to be picked up for slurs. Even my book on 1970s Black Power, when people used the n-word a bit, only uses it once, in a footnote when referring to the title of H. Rap Brown’s first book.
Like most of us in the world of organization studies, I was saddened to hear of Michael Cohen’s passing. I only met him once and he was very gracious. In the spirit of his work, let me draw your attention to his last research project – an analysis of “handoffs.” The issue is that doctors can’t continuously watch patients. Whenever a doctor leaves to go home, a new doctor comes in and there is a “handoff.” Cohen wrote a nice summary for the Robert Wood Johnson Foundation website:
1. To be effective, a handoff has to happen.
It may seem incredibly commonplace, but all too often preventable injuries or even deaths trace back to handoffs that were abbreviated, conducted in awkward conditions, or downright skipped. The easy cases to identify are things like leaving before handoff is done, or rushing the handoff in order to get out the door.
Unfortunately, many other causes are also in play. Some major examples derive from schedule or workload incompatibilities. If patients are sent from the PACU (post-anesthesia care unit) to a floor unit during its nursing report, the nurses accepting the patients will necessarily miss out on the handoff of existing patients. If a patient is moved from the Emergency Department (ED) before her doctor or nurse has time to complete phone calls to the destination unit, the patient endures some period of having been transferred without benefit of handoff. If there is a shift change in the ED just before a patient moves, the handoff is conducted by a doctor or nurse who has only second-hand familiarity with the events. To improve handoffs, we may need to teach participants to think about the organizational structures that make it hard to do them well.
I am currently working on a super cool project and I was thinking about the following distinction: modelling of data vs. prediction with data. If you give data to a physical science or engineering type, then they want prediction. They want to come up with an accurate prediction of some future state. You want tiny errors. In contrast, most social scientists are interested in modelling general trends. We understand that statistical models have error terms, so prediction is inherently hard. It’s even beside the point in some sense. If X perfectly predicts Y, you’ve probably just measured the same thing twice. Instead, you want an imperfect, but unexpected, relationship between variables. Neither approach is wrong, but they do represent different philosophies of data analysis.
Scatterplot has a discussion on one of my favorite topics, low response rates. The observation is that political polls have low response rates, but they produce decent answers, contrary to standard sociological advice. For years, I have argued that response rates do not logically entail biased data. It is simply a logical fallacy to deduce that survey data is biased only because of the response rate. Two examples that show the logical fallacy of deducing bias from response rates alone:
- High response rate, very biased: Let’s say that I fielded a survey that everyone responded to, except for Jews. They didn’t respond at all because I printed a swastika on the envelope. Every single Jewish respondent just threw it in the trash. The result? A response rate of about 97%. High response rate? Yes – textbook perfect. Bias? Yes – any question regarding Judaism (e.g., is R Jewish?) will be biased.
- Low response rate, no bias: Let’s say that I fielded a survey on Oct 1, 2012 in New York City. Say all 1,000 people who got the survey responded. Great! On October 21, I decide to use research funds to draw an extra sample of 9,000 names and send them the same survey. Oh no! Hurricane Sandy hits and nobody responds. Response rate? 10%. Biased? No – because not responding was a random event. The people in wave 1 were randomly chosen.
The issue isn’t the response rate – it’s selection into the study. If selection is correlated with the data (a religion survey that alienates a religious group), then the data is biased. If selection is random, then you have no bias. Selection biases can occur or not occur over the range of response rates from 1% to 99%.
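The two scenarios above are easy to turn into a toy simulation (the population numbers here are invented for illustration):

```python
import random

random.seed(1)

# Invented population: 10,000 people, 3 percent belong to group G.
population = [1] * 300 + [0] * 9700
true_share = sum(population) / len(population)  # 0.03

# Low response rate (10%), but nonresponse is random: the estimate
# of G's share is unbiased, just a bit noisier.
respondents_1 = random.sample(population, 1000)
estimate_low_rr = sum(respondents_1) / len(respondents_1)

# High response rate (97%), but group G never responds: the estimate
# of G's share is exactly zero, i.e., badly biased.
respondents_2 = [x for x in population if x == 0]
estimate_high_rr = sum(respondents_2) / len(respondents_2)
```

The 10% sample lands near the true 3% share, while the 97% sample misses group G entirely, which is the whole point: selection, not the response rate, drives the bias.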
Ok, you say, but maybe it’s not a logic issue. Sure, logically a low response rate doesn’t *have* to lead to bias. But in practice, low response is empirically related to bias. Maybe low response rates mean that only really weird people answer the phone or send back the survey.
This is actually a fair point, but it’s wrong. You see, the bias-low response rate connection is an assumption that can be tested. And guess what? Public opinion researchers have actually tested the assumption through a number of studies. For example, Public Opinion Quarterly in 2000 published the results of an experiment where a survey was run twice. The first time, you just let people do whatever they want (response rate 30%). The second time, you really, really bug people (response rate 60%). The result? Same answers on both surveys. Follow-up studies often find the same result.
In fact, in discussing this issue with John Kennedy, our recently retired director of survey research, I found out that this is an open secret among survey professionals. Response rates are a completely bogus measure of bias in survey data. It’s a shame that social scientists have held on to this erroneous belief, despite the work being done in public opinion research.
A graduate student asked me if the following sources for Congressional district voting data are reliable:
The only book for PhD students: Grad Skool Rulz
There’s a statistical (!) twitter fight this evening – Jennifer Rubin tweets “when do we break it to them that averaging polls is junk?” Hilarity ensues. There are actually some important subtle points about averaging poll data:
- Averaging bad data doesn’t make it better. On this broad point, Rubin is correct.
- Averaging good data does help. The purpose is to not be swayed by outliers that are produced by sampling. If you want to know the average family income in the US, you pool lots of observations so the estimate won’t be swayed if Bill Gates happens to appear in the sample. If you believe that the typical polling firm is doing a decent job, it’s actually intuitive to average multiple polls.
- There’s actually research showing that poll averages close to the election aren’t terribly far off from the actual final numbers. See Nate Silver’s review on the subject.
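The second point is easy to demonstrate with a toy simulation (assuming, generously, that each poll is an unbiased sample; the support level and poll sizes are made up):

```python
import random

random.seed(42)
TRUE_SUPPORT = 0.51
POLL_SIZE = 1000

def one_poll():
    """One simulated poll: share of POLL_SIZE respondents backing the candidate."""
    return sum(random.random() < TRUE_SUPPORT
               for _ in range(POLL_SIZE)) / POLL_SIZE

def avg_of_polls(k):
    """Average of k independent polls, RealClearPolitics-style."""
    return sum(one_poll() for _ in range(k)) / k

# Compare the typical error of a single poll with a ten-poll average.
trials = 200
err_single = sum(abs(one_poll() - TRUE_SUPPORT)
                 for _ in range(trials)) / trials
err_averaged = sum(abs(avg_of_polls(10) - TRUE_SUPPORT)
                   for _ in range(trials)) / trials
```

The averaged estimate lands much closer to the true support on average, which is exactly why averaging good polls helps and averaging junk doesn't: the averaging only shrinks sampling noise, not shared bias.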
A few days ago, I noted that Obama is slightly behind in the polls mainly because of the South. If it weren’t for the South, Obama would easily have about 51% of the vote in the rest of the country. Kieran went back and compared the October Gallup polls in 2008 and 2012 to produce this picture:
You’ll hear all kinds of post-hoc explanations of the election outcome in November. But they’re probably wrong unless they start with the fact that the South really, really, really hates Obama more than the rest of the country for some inexplicable reason.
A common problem in social research is selection bias – the people who choose to respond to your survey may be systematically different from the population. We have some methods, like the Heckman model, for adjusting your final model if you have some data that can be used to model study participation. If you don’t have a decent selection model, you can still make some assessment using the methods suggested by Stolzenberg and Relles, which have you decompose your models and study the properties of the different parts (e.g., look at the degree of mean regression under certain conditions).
Question for readers: What is the state of the art on this issue? Is there something better than Heckman or playing games with Mills ratios?
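For reference, the correction term at the heart of the classic Heckman two-step, the inverse Mills ratio, is simple to compute. This sketch is generic, not tied to any particular selection model:

```python
import math

def inverse_mills(z):
    """Inverse Mills ratio phi(z) / Phi(z): the term the Heckman two-step
    computes from the first-stage probit and appends as a regressor to
    the outcome equation to soak up the selection effect."""
    pdf = math.exp(-z * z / 2) / math.sqrt(2 * math.pi)   # standard normal density
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))          # standard normal CDF
    return pdf / cdf

# inverse_mills(0) equals sqrt(2 / pi), about 0.798, and the ratio
# falls as the probability of selection rises.
```

If the coefficient on this regressor is significant in the second stage, that is the evidence that selection was correlated with the outcome.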
What makes a study interesting? Is it the empirical phenomena that we study or is it the theoretical contribution? For those of you who are really paying attention (and I applaud you if you are), you’ll notice that I’ve asked this question before. It’s become a sort of obsession of mine. For the field of organizational theory, it’s an important discussion to have, although it’s not one that will likely yield any consensus. Scholars tend to have very strong opinions about this. Some people feel that as a field we’ve fetishized theory to the point of making our research inapplicable to the bigger world we live in. Others claim that by making “theoretical contribution” such a key component of any paper’s value, we ignore really important empirical problems. But in contrast, some scholars maintain that what makes our field lively and essential is that we are linked to one another (and across generations) via a stream of ideas that constitute theory. What makes an empirical problem worthy of study is that it can be boiled down to a crucial theoretical problem that makes it generalizable to a class of phenomena and puzzles.
At this year’s Academy of Management meetings, I was involved in a couple of panels where this issue came up. It was posed as a question, should we be interested in problems or theory? If we are interested in studying problems, we shouldn’t let theoretical trends bog us down. We should just study whatever real world problems are most compelling to us. If we’re interested primarily in theory, we need to let theory deductively guide us to those problems that help us solve a particular theoretical puzzle. Some very senior scholars in the field threw their weight behind the former view. I don’t want to name any names here, but one of the scholars who suggested we should be more interested in real-world problems is now the editor of a major journal of our field. He offered several examples of papers recently published in that journal that were primarily driven by interesting observations about empirical phenomena.
One of the new assistant professors in the crowd threw a pointed objection to the editor. And I paraphrase, “This all sounds great. I’d love to study empirical problems, but reviewers won’t let me! They keep asking me to identify the theoretical gap I’m addressing. They demand that I make a theoretical contribution.” Good point young scholar. Reviewers do that a lot. We’ve had it drilled into us from our grad school days that this is what makes a study interesting. If the paper lacks a theoretical contribution, reject it (no matter how interesting the empirical contribution may be)! This is a major obstacle, and I don’t think the esteemed editor could offer a strong counter-argument to the objection. Editors, after all, are somewhat constrained by the reviews they get. I think what we need is a new way to think about what makes a study valuable. We need new language to talk about research quality.
A key empirical question in social network analysis is whether Americans have more or fewer friends over time. Famously, Robert Putnam argued that indeed, we were “bowling alone.” In contrast, critics contend that these are misinterpreted results. Some types of networks disappear, while others appear.
On the social network listserv, Claude Fischer provides the latest round in the debate. Fischer uses 2010 GSS data to claim that the decline in strong personal relationships reported by McPherson et al. (2006 in the ASR) is due to survey question construction. I’ll quote Fischer’s entire announcement:
Attention Stata people (esp. Sr. Rossman): Let’s say I have a database of articles. I have a variable with the author’s name. Then I want to match the author’s name with other data (e.g., Fabio Rojas is matched with height 5′ 8″).
The command is merge 1:m, but there’s a problem. Let’s say that my author database doesn’t use the same spelling (e.g., Fabio G. Rojas or fabio rojas). Then the merged data set will have missing data.
Is there a way in Stata to offer the programmer a choice of possible matches to minimize missing data caused by variations in spelling? If not, what program or language has an easy to use tool box for this sort of stuff?
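One low-tech option, offered as a sketch rather than a recommendation: Python's standard library difflib can score candidate spellings, and you could use it to build a name crosswalk file before doing the merge in Stata. The function below is illustrative, and the 0.6 cutoff is arbitrary:

```python
import difflib

def best_match(name, candidates, cutoff=0.6):
    """Return the candidate spelling most similar to `name`,
    or None if nothing clears the (arbitrary) similarity cutoff."""
    scored = [(difflib.SequenceMatcher(None, name.lower(), c.lower()).ratio(), c)
              for c in candidates]
    score, match = max(scored)
    return match if score >= cutoff else None

# E.g., reconciling author spellings before a Stata merge:
best_match("fabio rojas", ["Fabio G. Rojas", "Gabriel Rossman"])
# returns "Fabio G. Rojas"
```

In practice you would want a human to eyeball the proposed matches before merging, since similarity scores happily pair up different people with similar names.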
Last week, I argued that retractions are good for science. Thomas Basbøll correctly points out that retractions are hard. Nobody wants to retract. Good point, but my argument wasn’t about how easy it is to retract. Rather, it’s about the fact that science is exceptional in that it has a built-in error-correction mechanism.
In reviewing the debate, Andrew Gelman wrote:
One challenge, though, is that uncovering the problem and forcing the retraction is a near-thankless job…. OK, fine, but let’s talk incentives. If retractions are a good thing, and fraudsters and plagiarists are not generally going to retract on their own, then somebody’s going to have to do the hard work of discovering, exposing, and confronting scholarly misconduct. If these discoverers, exposers, and confronters are going to be attacked back by their targets (which would be natural enough) and they’re going to be attacked by the fraudsters’ friends and colleagues (also natural) and even have their work disparaged by outsiders who think they’re going too far, then, hey, they need some incentives in the other direction.
A few thoughts. First, fraud busting should be done by those who have some security – the tenured folks – or folks who don’t care so much (e.g., non-tenure track researchers in industry). Second, data with code should be made available on journal websites, with output files. Already, some journals are doing that. That reduces fraud. Third, we should revive the tradition of the research note. Our journals used to publish short notes. These can be used for replications, verifications, error reporting and so forth. Fourth, we should rely on journal models like PLoS. In other words, the editors will publish any competent piece of research and do so in a low cost and timely way. Fraud busting and error correction will never be easy, but we can make it easier and it’s not hard to do so.
A focus of network research since, say, 1999 or so, has been to identify “laws” that generate large networks with certain properties.* For example, the small world network is built by rewiring a grid. Various processes generate power-law networks (i.e., the degree distribution is described by a power law).
I can see two justifications for this type of research. The first is diffusion theory. The speed at which something diffuses in a network is definitely governed by the structure. The second is a sort of physical science justification, where you think of a network as a “system” and you show that some micro-process (e.g., preferential attachment) creates that network.
Is there any other behavioral implication of studying power laws/small worlds or other specific large scale properties? In other words, why should I care about scale free or small world networks aside from diffusion theory?
* Let’s leave aside recent criticism of power-law centric research for the sake of the post.
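For concreteness, here is a toy version of the preferential attachment process mentioned above (a Barabasi-Albert-style sketch; the parameters are arbitrary): each new node links to existing nodes with probability proportional to their current degree, and a heavy-tailed degree distribution falls out.

```python
import random

random.seed(0)

def preferential_attachment(n, m=2):
    """Grow a network where each new node links to m existing nodes
    chosen with probability proportional to their current degree."""
    targets = [0, 1]          # one list entry per edge endpoint; start with edge 0-1
    degree = {0: 1, 1: 1}
    for new in range(2, n):
        chosen = set()
        while len(chosen) < min(m, new):
            # picking a random entry of `targets` samples proportional to degree
            chosen.add(random.choice(targets))
        degree[new] = 0
        for old in chosen:
            degree[new] += 1
            degree[old] += 1
            targets.extend((new, old))
    return degree

degrees = sorted(preferential_attachment(2000).values())
# A few early hubs accumulate most links: max(degrees) dwarfs the median.
```

The resulting hub-dominated structure is exactly what matters for diffusion: reach the hubs and you reach most of the network quickly.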
I’m still mulling over some of the issues raised at the Chicago ethnography and causal inference conference. For example, a lot of ethnographers say “sure, we can’t generalize but ….” The reason they say this is that they are making a conceptual mistake.
Ethnography is generalizable – just not within a single study. Think of it this way. Data is data, whether it is from a survey, experiment or field work. The reason that surveys are generalizable is in the sampling. The survey data is a representative sub-group of the larger group.
What’s the deal with ethnography? Usually, we want to say that what we observe in fieldwork is applicable in other cases. The problem is that we only have one (or a few) field sites. The solution? Increase the number of field sites. Of course, this can’t be done by one person. However, there can be teams. Maybe they aren’t officially related, but each ethnographer could contribute to the field of ethnography by randomly selecting their field site, or choosing a field site that hasn’t been covered yet.
Thus, over the years, each ethnographer would contribute to the validity of the entire enterprise. As time passes, you’d observe new phenomena, but by linking field site selection to prior questions you’d also be expanding the sample of field sites. This isn’t unheard of. The Manchester School of anthropology did exactly that – spread the ethnographers around – to great effect. Maybe it’s time that sociological ethnographers do the same.
a response to andrew gelman on the statistics discipline, but not scott because he thinks i’m a sad distraction in higher education and that like, totally, hurt my feelings
On Friday, I wrote a semi-humorous post about the interaction between statisticians and non-statisticians. The issue that brought it up was that sometimes statisticians like to work on asymptotic results. This, by itself, isn’t bad. It’s good to know what an estimator does when you have a nice big sample that behaves well. My beef is that sometimes small samples – the ones that most social scientists work with – are treated as an inconvenient afterthought. That rubs me the wrong way because mathematical elegance is accorded more importance than addressing the core problem of statistics – which is to accurately model, measure, and study the relationships between variables.
Andrew Gelman wrote a simple response, which is that I am hanging around with the wrong people. There is some truth to that. The last time I had the “n → ∞” argument was with a visitor. Indiana has hired some exceptional applied statisticians, like Stanley Wasserman. The program has also hired people with PhDs in other fields, like sociology and economics. I have consulted with these folks, and it is easier to get concrete guidance on statistical practice.
But still, as multiple comments noted at orgtheory and Gelman’s blog, there are a lot of people with the title “statistician” who do treat issues of model estimation with small samples as an afterthought. This does happen, though maybe not as much as it used to.
Let me conclude this post with a comment about the sociology of the statistics profession. Statistics is a discipline that is analogous to computer science. Computer science can be math, engineering, applied science, or even philosophy (think artificial intelligence). Statistics is the same way. It can be mathematical, applied, or even visual. Consequently, there is no standardized cultural template for what a statistics department is.
Sometimes, statistics lives inside a math department. Sometimes it is distinct. At Indiana, they are trying an interdisciplinary approach where you have stat, math, and social science PhDs in the same unit. Each organizational environment creates pressures for different research.
If you live in a math department, you almost certainly can’t get promoted unless you study functional analysis or numerical analysis as applied to statistical issues. That produces people who are probably incapable of interacting with anyone who isn’t interested in the mathematics. Once you have your own department, you diverge from this model. Some statisticians are highly applied, and many PhD graduates get jobs in professional schools and social science programs. These multiple pressures mean that you probably get a wide range of people, some of whom think statistics is just a field of mathematics while others can actually help people with real-world statistical problems.
Here’s a conversation I’ve had a few times with statisticians:
Statistician: ” … and these simulations show how my results work.”
Me: “What does your research tell us about a sample of, say, a few hundred cases?”
Statistician: “That’s not important. My result works as n → ∞.”
Me: “Sure, that’s a fine mathematical result, but I have to estimate the model with, like, totally finite data. I need inference, not limits. Maybe the estimate doesn’t work out so well for small n.”
Statistician: “Sure, but if you have a few million cases, it’ll work in the limit.”
Me: “Whoa. Have you ever collected, like, real world network data? A million cases is hard to get.”
Statistician: “The Internet is a network with millions of nodes.”
Me: “Sure, but the Internet is one specific network. Most real world networks have hundreds or thousands of nodes. Like a school, or firms that trade with each other. Network data is expensive to collect. Some famous social science papers analyze networks of dozens of people.”
Statistician: “Um… the Internet! Scaling! Big networks! The Internet is a network! Facebook! FACE. BOOK!”
Me (rolls eyes): “What-EVER!”
This illustrates a fundamental issue in statistics (and other sciences). Once you formalize a model and work mathematically, you are tempted to focus on what is mathematically interesting instead of the underlying problem motivating the science. An economist works on another equilibrium theorem rather than, say, taxes. The physicist works on the mathematics of superstring theory, even when the experimental evidence isn’t there.
We have the same issue in statistics. “Statistics” can mean “the mathematics of distributions and other functions arising in statistical models.” Or it can mean the traditional problems of statistics like inference, measurement, model estimation, sampling, data collection/management, forecasting, and description. The problem for a guy like me (a social scientist with real data) is that the label “statistician” often denotes someone who is actually a mathematician who happens to be interested in distributions. That’s why they are happy with limit theorems, because limits smooth out hard problems and produce elegant results.
What I really want is a nuts-and-bolts person to help me solve problems. I may tease economists for their bizarre obsession with identification at the expense of all else, but at least identification is a real issue that needs to be taken seriously.
Let’s say you are doing discrete-time logit event history analysis. You simply pool all cases and time periods and estimate a logit, where Y = the failure event. See Yamaguchi’s (1991) book, chapter 2.
Question: why don’t people use a fixed-effects kind of model, or cluster by case? There may be person-level heterogeneity that you want to account for. One way to address this is a logit with fixed effects for each person in the population. Another is to try to control for within-person correlation (i.e., person X’s observations at time T and T+K are probably correlated).
This sort of adjustment is standard in panel data. Event history data has the same basic setup and the same issues with correlated errors within cases, but most event history papers (including my own) don’t deal with this. Why?
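To make the pooled setup concrete, here is a minimal Python sketch (the data and the helper name are hypothetical, mine rather than Yamaguchi’s) of the person-period expansion that turns duration records into the rows a pooled logit is fit on:

```python
def to_person_period(cases):
    """Expand (person_id, duration, event) records into person-period rows.

    In discrete-time event history analysis, each person-period is one
    observation; y = 1 only in the final period of a case that fails,
    and censored cases get y = 0 in every period.
    """
    rows = []
    for pid, duration, event in cases:
        for t in range(1, duration + 1):
            y = 1 if (event and t == duration) else 0
            rows.append({"id": pid, "t": t, "y": y})
    return rows

# Hypothetical data: person 1 fails at t = 3, person 2 is censored at t = 2.
cases = [(1, 3, True), (2, 2, False)]
rows = to_person_period(cases)
```

Once the data are in this pooled form, the clustering question above is about how you estimate the logit on these rows: you could add per-person dummies (fixed effects) or compute standard errors clustered on `id`; cluster-robust variance options are available in most regression packages.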
These questions came up during orgtheory training last week. I did not have good answers:
- A lot of performativity research focuses on stock options, less on futures. Why?
- Are there good studies of performativity of theory that aren’t about the economic profession?
My lame answers: (1) everyone is taught Black-Scholes first, but there’s no reason performativity theory couldn’t be applied to other types of markets; (2) economics is the most influential intellectual group with a theory of social behavior that is inaccurate (which makes performativity possible). Post your answers in the comments.
Michael Bishop, of the Permutations blog, has set up a web site to archive R code for Add Health. Rather than have every Add Health researcher reinvent the wheel, he wants to sponsor an open source community that will provide R code.