Big Data Research Creates Ethical Concerns

Jan 2, 2017

Every time you post to Twitter or Facebook, these sites are collecting data about you. At this point you ought to expect that by participating in social media sites, you’re giving up some of your privacy. It’s just the name of the game.

Some see big data from social media sites as a godsend for researchers: a perfect way to study the social habits of huge numbers of people. But what happens when that data, with your personal details still attached, is published for a study, for the world to see?

A recent release of a dataset from the dating site OkCupid has raised the ire of one local research ethicist and has prompted a larger question about how this kind of big data is used.

"A lot of times users don't fully understand what they're signing up for. Or, I understand that Twitter is a platform where I'm gonna share 140 character messages, but it gets kind of lost in the sea of all the thousands of other messages that came out at the same very moment," says Michael Zimmer, director of the Center for Information Policy Research. "They don't really have a clear understanding that that stuff can be archived and then perhaps later accessed by someone."

The dataset that spurred this recent controversy has been removed because of copyright concerns but still exists elsewhere on the internet, which is part of the problem with these kinds of data dumps. 

Although the researcher initially used a bot to scour the website for data, the actual method used to acquire this information is unclear. Zimmer says this raises another ethical question: while researchers may claim the information is public, users can still have some expectation of privacy.

On OkCupid, for example, users are able to make their profiles viewable only to other people using the website. Other social networking sites have similar privacy settings, so while the content is published on a site, that doesn't necessarily mean people expect it to be completely public.

"So when a researcher like this says, 'Well this stuff was already public,' what he kind of really means is like, 'This stuff was visible to other users who happen to also create a profile,' and those aren't the same thing," says Zimmer. "Psychologically I think it's important for users when they sign up for this thing to have this assumption, or these set of expectations, that I know this data is kind of public but it's meant for this community... Doing this kind of research sometimes violates that assumption." 

*Originally aired May 31, 2016