algorithms in society – comments on a talk by jure leskovec
Jure Leskovec describes his bail data… in technicolor.
Jure Leskovec is one of the best computer scientists currently working on big data and social behavior. We were lucky to have him speak at the New Computational Sociology conference, so I was thrilled to see he was visiting IU’s School of Informatics. His talk is available here.
Overall, Jure explained how data science can improve human decision making and illustrated his ideas with analysis of bail decision data (e.g., given data X, Y, and Z on a defendant, when do judges release someone on bail?). It was a fun and important talk. Here, I’ll offer one positive comment and one negative comment.
Positive comment – how decision theory can assist data analysis and institutional design: A problem with traditional social science data analysis is that we have a bunch of variables that interact in non-linear ways. E.g., the model that fits the cases with X=1 and Y=1 may look nothing like the model that fits the cases with X=1 and Y=0. In statistics and decision-theoretic computer science, the solution is to work with regression trees: if X=1, use Model 1; if X=2, use Model 2; and so forth.
Traditionally, the problem is that social scientists don’t have enough data to make these models work. If you have, say, 1,000 cases from a survey, chopping the sample into 12 models wipes out all statistical power. This is why you almost never see regression trees in social science journals.
One of the advantages of big data is simply that you now have so much data that you can chop it up and be fine. Jure chopped up data from a million bail decisions in a federal government database. With so much power, you can actually estimate the trade-off curves and detect biases in judicial decision making. This is a great example of where decision theory, big data, and social science really come together.
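The piecewise idea above can be sketched in a few lines. This is a toy illustration, not Jure’s actual pipeline: two binary covariates whose effects interact, and a “tree” that fits a separate model (here, just a subgroup mean) in each cell. The point is visible in the leaf errors: with a survey-sized sample the per-leaf estimates are noisy, while with a big-data sample they converge.

```python
import random

random.seed(0)

# Hypothetical setup: two binary covariates X and Y whose effects interact,
# so no single linear model fits all four cells.
TRUE_MEANS = {(0, 0): 0.2, (0, 1): 0.5, (1, 0): 0.7, (1, 1): 0.3}

def simulate(n):
    """Draw n noisy observations from the four-cell model."""
    rows = []
    for _ in range(n):
        x, y = random.randint(0, 1), random.randint(0, 1)
        rows.append((x, y, TRUE_MEANS[(x, y)] + random.gauss(0, 0.5)))
    return rows

def tree_fit(rows):
    # "If X=a and Y=b, use Model (a, b)": each leaf's model is just the
    # subgroup mean -- the simplest regression-tree leaf.
    cells = {}
    for x, y, outcome in rows:
        cells.setdefault((x, y), []).append(outcome)
    return {cell: sum(v) / len(v) for cell, v in cells.items()}

small_fit = tree_fit(simulate(1_000))    # survey-sized: ~250 cases per leaf
big_fit = tree_fit(simulate(200_000))    # big-data: ~50,000 cases per leaf

worst_err_small = max(abs(small_fit[c] - TRUE_MEANS[c]) for c in TRUE_MEANS)
worst_err_big = max(abs(big_fit[c] - TRUE_MEANS[c]) for c in TRUE_MEANS)
```

With only four cells and a thousand cases this still works; chop a real survey into 12 models with more covariates per cell and the leaves run dry, which is the power problem above.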
Criticism – “algorithms in society”: There was a series of comments from audience members (and from me) about how this algorithm would actually “fit into” society. For example, one audience member asked how people could “game the system.” Jure responded that he was using only “hard” data that can’t be gamed, like the charge, prior record, etc. That is not quite right. For example, what you are charged with is a decision made by prosecutors.
In the Q&A, I followed up and pointed out that race is highly correlated with charges. For example, from the Rehavi & Starr paper in the Yale Law Journal, we know that a lot of the racial difference in time spent in jail is attributable to racial differences in charging. Using federal arrest data, they show that Blacks are charged with more serious crimes for the same actions. Statistically, this means that race is strongly correlated with the severity of the charge. In the Q&A, Jure said that adding race did not improve the model’s accuracy. But why would it, if we know that race and charges are highly inter-correlated? That comment misses the point: a model built on charges is not race-blind, because one of its main inputs already encodes race.
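The statistical point can be made with a toy simulation (purely illustrative, not Jure’s data): suppose a hypothetical group flag drives charge severity, and charge severity drives the detention outcome. Then a predictor using charge alone inherits the group disparity, and explicitly adding the group flag barely changes accuracy, exactly the pattern reported in the Q&A.

```python
import random

random.seed(1)

# Hypothetical data-generating process: group 1 is charged more severely
# for the same behavior, and the detention outcome depends only on charge.
n = 50_000
rows = []
for _ in range(n):
    race = random.randint(0, 1)
    charge = random.random() < (0.3 + 0.3 * race)     # disparity enters here
    outcome = random.random() < (0.2 + 0.5 * charge)  # outcome sees only charge
    rows.append((race, charge, outcome))

def majority_model(rows, key):
    """Predict the majority outcome within each cell defined by `key`."""
    counts = {}
    for race, charge, outcome in rows:
        counts.setdefault(key(race, charge), [0, 0])[int(outcome)] += 1
    table = {k: v[1] >= v[0] for k, v in counts.items()}
    return lambda race, charge: table[key(race, charge)]

charge_only = majority_model(rows, lambda r, c: c)       # ignores race
with_race = majority_model(rows, lambda r, c: (r, c))    # includes race

def accuracy(model):
    return sum(model(r, c) == o for r, c, o in rows) / n

acc_charge_only = accuracy(charge_only)
acc_with_race = accuracy(with_race)   # essentially identical to acc_charge_only

# The disparity is already baked into the charge variable:
charged, group_n = [0, 0], [0, 0]
for r, c, _ in rows:
    group_n[r] += 1
    charged[r] += c
charge_rate = [charged[i] / group_n[i] for i in range(2)]  # higher for group 1
```

Adding race contributes nothing to accuracy here not because the process is race-neutral, but because the charge variable has already absorbed the racial signal.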
These two comments can be summarized as “society changes algorithms and algorithms change society.” Ideally, Jure (and I!) would love to see better decisions. We want algorithms to improve society. At the same time, we have to understand that (a) algorithms depend on society for data, so we have to understand how the data are created (e.g., charge and race are tied together), and (b) algorithms create incentives for people to find ways to influence them.
Good talk and I hope Jure inspired more people to think about these issues.