data bleg: categorical data

Please put in the comments, or link to, a data set that has the following properties:

  1. A few hundred cases, but not too many (300 < N < 1000).
  2. Longitudinal categorical variable X with the following properties:
  3. Categorical variable should NOT be ordered. States should be like {chocolate, vanilla, strawberry}, not {strongly agree, neutral, strongly disagree}.
  4. About 4-7 time periods.
  5. About 4-7 states that X can be in (e.g., five political parties, five ice cream flavors).
  6. “Legitimate data” – no one will bug me about using this data set. Decent response rate, nice set of covariates for X, data collected for a legitimate research project, etc.
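If it helps anyone screen a candidate data set before posting it, here is a minimal sketch that checks the countable criteria (1, 4, and 5) against a long-format panel. The function name `check_panel`, the `(case_id, period, state)` record layout, and the toy data are all assumptions made for illustration; whether the states are unordered (criterion 3) and whether the data are "legitimate" (criterion 6) still need a human judgment.

```python
def check_panel(records, n_range=(300, 1000), period_range=(4, 7), state_range=(4, 7)):
    """Screen a long-format panel against the size criteria in the list above.

    records: iterable of (case_id, period, state) tuples (hypothetical layout).
    Returns a dict of counts plus one pass/fail flag per criterion.
    """
    cases, periods, states = set(), set(), set()
    for case_id, period, state in records:
        cases.add(case_id)
        periods.add(period)
        states.add(state)
    return {
        "n_cases": len(cases),
        "n_periods": len(periods),
        "n_states": len(states),
        # Criterion 1: strictly between the bounds, i.e. 300 < N < 1000.
        "n_ok": n_range[0] < len(cases) < n_range[1],
        # Criteria 4 and 5: "about 4-7", treated here as inclusive bounds.
        "periods_ok": period_range[0] <= len(periods) <= period_range[1],
        "states_ok": state_range[0] <= len(states) <= state_range[1],
    }


# Toy data, made up purely to exercise the function: 400 cases, 5 waves, 5 flavors.
flavors = ["chocolate", "vanilla", "strawberry", "mint", "coffee"]
toy = [(i, t, flavors[(i + t) % 5]) for i in range(400) for t in range(5)]
report = check_panel(toy)
print(report)
```

The check counts distinct values across the whole panel, so it will not flag attrition or states that only show up in a single wave; those would need a per-wave tabulation.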

This is for a methods project I’ve been working on, so I don’t need anything fancy, just something that has these specific properties to highlight the strengths of the method. Feel free to email me as well.



Written by fabiorojas

July 25, 2014 at 12:01 am

Posted in fabio, mere empirics

4 Responses


  1. You could use racial/ethnic classifications from Census data for four decades. You could use the classification that Samantha Friedman used: predominantly white, mixed white-black, mixed white-other, multiethnic, mixed black-other, predominantly black, predominantly other; or select a geographic area with only three dominant racial/ethnic groups (e.g., Chicago). You could use John Logan and Brian Stults’ Longitudinal Tract Database to match census geography over time. Selecting a city smaller than NY or LA will get you in the right range for the number of cases.

    Probably not exactly what you were looking for, but I think that it might work for your purposes.



    July 25, 2014 at 2:37 am

  2. Race in NLSY seems to meet these criteria. See several pubs by Saperstein and Penner.



    July 25, 2014 at 2:38 am

  3. Also, lots of datasets will work like this with occupation. The EGP schema gives you 4-7 categories.



    July 25, 2014 at 2:40 am

  4. Funny, I was also looking for a longitudinal dataset with similar requirements, and also for a methods project (although for mine, the levels had to be ordered, and questions had to specifically be attitudinal). It turns out that attitudinal questions almost never get asked of more than 3 panel waves. NLSY is one of the few exceptions to this.

    You might be able to find something in the fabulous 11-wave(!!) special ANES study done around the 2008 presidential election. I seem to remember reading that they were avoiding Likert scales in their design, so you might be able to do well with unordered data there. Another dataset to take a look at is Laura Stoker’s 4-wave youth socialization data (I forget the name of the dataset, but it’s up on ICPSR).


    Andrei Boutyline

    July 28, 2014 at 10:31 pm

