orgtheory.net

privacy issues


Via Jenn, Michael Zimmer engages in a discussion with sociologist Jason Kaufman about the privacy issues involved with collecting and using Facebook data. Jason’s study examines a cohort of students at “an anonymous, northeastern American university,” using Facebook data to assess changes in the students’ tastes and associations.  The research idea is very cool given that there are so few good data on changes in either networks or cultural preferences.  The coolness of the study is augmented because one can presumably see how the two coevolve.

Zimmer, though, takes issue with the manner in which the data were collected and will eventually be released to the general public.  Although the researchers take great pains to hide (and later destroy) any identifying information, Zimmer maintains that the users’ privacy is nonetheless compromised.  First, it is easy to identify the institution as Harvard.  Second, even with identifying information deleted, demographic features may make certain individuals stand out anyway.

The fact that the dataset includes each subject’s gender, race, ethnicity, hometown state, and major makes it increasingly possible that individuals could be identified. For example, if the data reveals that student #746 is a white Bulgarian male from Montana, majoring in East Asian Studies, there probably aren’t that many who fit such a description.
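To make the re-identification worry concrete, here is a minimal sketch of the kind of check it implies: count how many records share each combination of demographic attributes and flag the combinations that occur only once. The records, field names, and the `unique_records` helper are all hypothetical illustrations; nothing here comes from the actual dataset or from the researchers' procedures.

```python
from collections import Counter

# Hypothetical toy records; none of these values come from the real dataset.
records = [
    {"id": 1, "gender": "F", "race": "white", "state": "MA", "major": "Economics"},
    {"id": 2, "gender": "F", "race": "white", "state": "MA", "major": "Economics"},
    {"id": 3, "gender": "M", "race": "white", "state": "MT", "major": "East Asian Studies"},
    {"id": 4, "gender": "M", "race": "Asian", "state": "NY", "major": "Economics"},
    {"id": 5, "gender": "M", "race": "Asian", "state": "NY", "major": "Economics"},
]

# Attributes that are harmless alone but may single someone out in combination.
QUASI_IDENTIFIERS = ("gender", "race", "state", "major")


def unique_records(rows, keys=QUASI_IDENTIFIERS):
    """Return the ids of rows whose combination of `keys` appears exactly once."""

    def combo(row):
        return tuple(row[k] for k in keys)

    counts = Counter(combo(row) for row in rows)
    return [row["id"] for row in rows if counts[combo(row)] == 1]


# With all four attributes, only the Montana student is singled out.
print(unique_records(records))  # [3]

# With gender alone, nobody is unique in this toy sample.
print(unique_records(records, keys=("gender",)))  # []
```

The point of the sketch is only that uniqueness grows as attributes are combined: no single column exposes anyone here, yet the full combination does.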

Also a problem, Zimmer argues, is that cultural tastes alone could identify people, given that the more unique someone’s tastes are, the more they will stand out among their peers.  Jason and his colleagues are not releasing the data until 2011 (and until then the data are stored on a local, secure server) but Zimmer thinks that this time lag isn’t long enough. But what really seems to bug Zimmer is a more humanistic concern that the data collection procedure was in some way unethical.

[J]ust because users post information on Facebook doesn’t mean they intend for it to be scraped, aggregated, coded, dissected, and distributed. Creating a Facebook account and posting information on the social networking site is a decision made with the intent to engage in a social community, to connect with people, share ideas and thoughts, communicate, be human. Just because some of the profile information is publicly available (either consciously by the user, or due to a failure to adjust the default privacy settings), doesn’t mean there are no expectations of privacy with the data. This is contextual integrity 101.

Jason defends the data collection and usage procedures in the comments section by arguing that their study was painstakingly reviewed and approved by a stringent IRB process. He also defends the ethics of the process.

Would you require that someone sitting in a public square, observing individuals and taking notes on their behavior, would have to ask those individuals’ consent in advance? We have not accessed any information not otherwise available on Facebook. We have not interviewed anyone, nor asked them for any information, nor made information about them public (unless, as you all point out, someone goes to the extreme effort of cracking our dataset, which we hope it will be hard to do).

I sympathize greatly with Jason and think that the likelihood of someone cracking the code and using this information harmfully is extremely low.  At some point the research community has to weigh the likelihood that a data collection process will harm individuals against the benefits of advancing scientific knowledge.  In this case, I think Jason and his colleagues are on the right side.  However, reading Zimmer’s thoughts made me think twice about the ways in which we deal with privacy issues. It’s not uncommon to read an organizational study in which the organization itself is easily identifiable.  In many case studies, all one has to do is run a simple Google search on quotations from the paper, and the real name of the company quickly shows up.  Is this a problem? Again, I don’t worry about it too much because I think the upsides greatly outweigh the downsides, but I do respect Zimmer for taking the privacy issue seriously.

The ethical question seems to me to be more complicated but, again, I’ll side with Jason on this.  According to the standards of ethical research put forward by Zimmer, most ethnographies and historical research would be off-limits (did the deceased person in question intend her words to be broadcast publicly when she wrote her thoughts in a journal entry?).  People live with the risk all the time that their actions are observable and that inferences could be made based on those observations. That doesn’t stop people from waking up and walking out of their houses in the morning.  Living in public space is, I think, giving permission to the larger world to observe and record your actions.  It only becomes harmful, from a researcher’s standpoint, when those actions could be used to take advantage of the person or to expose that person to greater risks (e.g., possible physical injury) than simple public exposure.


Written by brayden king

October 22, 2008 at 2:20 pm

9 Responses


  1. People live with the risk all the time that their actions are observable and that inferences could be made based on those observations. That doesn’t stop people from waking up and walking out of their houses in the morning. Living in public space is, I think, giving permission to the larger world to observe and record your actions.

    I don’t think this is the right characterization.

    It’s a tricky question. I wrote something a while ago saying, in part, that inside every quantitative social scientist is a data-grubbing megalomaniac who is really quite difficult to control.

    Kieran

    October 22, 2008 at 2:52 pm

  2. I agree with Kieran. It’s tricky. First, Facebook is *not* public. You need an account to get access. Early on, it was specifically restricted to Harvard students, and then only to other students. So there is actually a strong presumption of privacy. Also, the site has privacy settings, which also implies a level of privacy.

    However, there may be some waiver that users may have to agree to in order to use the site, which may have included a privacy waiver. Then it depends what the waiver actually says, but I’m not sure public access to data is part of the waiver.

    fabiorojas

    October 22, 2008 at 3:05 pm

  3. It only becomes harmful, from a researcher’s standpoint, when those actions could be used to take advantage of the person or to expose that person to greater risks (e.g., possible physical injury) than simple public exposure.

    People who would like the Government (or private companies) to spy on you stereotypically say, “Hey, it’s not like you have something to hide, right?” But this “greater risks … than simple public exposure” standard is even lower: more like, “Hey, it’s not like I’m going to come over to your house and beat you up, right?” The “simple public exposure” of information about people can be extremely damaging all by itself, as any privacy advocate (or blackmailer) will argue.

    More structurally … you could think of the general problem in a Simmelian sort of way. You’ve got these partly overlapping social circles and roles, and your individuality (and ‘inner core’ of privacy) is in part a product of that structure. Now, if entities in those various circles are collecting data about you, that might be bad for your privacy. But, in practice, maybe it’s not so bad if that data is all on paper, or hard to search, or — even if it is in a bunch of different digital databases — it can’t easily be shared or integrated in a way that builds up a full picture. So your freedom and privacy in part depends on a continuing lack of co-ordination and integration of the various state, market, and community organizations that have digital records of some bits of your life. To the extent that this integration problem is ‘solved’ and some agency gets to knit lots of the stuff together, you get unpleasant outcomes where some people are in a position to know huge quantities of information about everyone, and so the modern individual is returned to a weird sort of gemeinschaft-like state where everything is known about them by someone.

    Kieran

    October 22, 2008 at 3:33 pm

  4. You got the data-grubbing megalomaniac right (and I’m referring to myself).

    brayden

    October 22, 2008 at 4:11 pm

  5. Kieran @ 3: that’s a really great insight.

    Peter

    October 22, 2008 at 4:42 pm

  6. Do we know the level of privacy that Facebook users (or Harvard undergrads) perceive there to be on Facebook? It seems that the users’ beliefs about privacy are relevant to the ethical issues here.

    perchesk

    October 23, 2008 at 2:38 am

  7. Christine is right – Facebook has many features indicating that the owners assume some level of privacy.

    fabiorojas

    October 23, 2008 at 3:29 am

  8. Can we ask, “What would Omar do?”

    Mrs. Smith

    October 23, 2008 at 5:29 pm

  9. [...] to the data, figuring out how to process it, and dealing with privacy issues. The latter issue heated up last year when it was revealed that data gathered from university students’ Facebook networks could be [...]

