One of the buzzwords of our time is big data. It's hard to define the term, but one often-used definition states that big data is characterized by its volume, the amount of data; its velocity, the speed with which new data arrive; and its variety, the many different types of data. Most importantly, it is not only the size of a data set that makes it big data. For instance, data that were created for a very different purpose are often used in another analytical context. Think of social media data. People do not write tweets in order to create a data set for analysis, but of course, the tweets can be analyzed. In a survey, by contrast, the sole purpose of filling in the survey is to create a data set for analysis.
Let us discuss some of the ethical issues that can arise here. In traditional research, the boundary between private and public is mostly clear. For instance, when you ask someone to fill out a survey for you, you can make explicitly clear that their answers are private and anonymous and that any personal data will not be made public without their consent. A participant can then decide whether and how their data are used. But that is less likely in the case of social media data. A tweet is, technically speaking, public, but the writer probably intended to reach a much smaller audience.
Let's illustrate these blurring boundaries. On the dating site OKCupid, profiles are public. Of course they are, because that's the very reason why people create a profile: they want others to see it. However, when two researchers downloaded the profiles of 70,000 users and put them into a data set, including information like user names and sexual orientation, they were heavily criticized. Intuitively, making a huge data set available that allows anyone to search for your sexual orientation seems wrong to most of us. But on the other hand, the researchers in fact only took information that was already publicly available and made it more easily accessible.
One might say that the ethical issue lies in the fact that new technologies allow for uses of data that had not been anticipated. From a deontological perspective, this goes against people's rights to privacy and autonomy. According to Kant, we need to respect people's rational decision-making and freedom of choice. And in this case, people no longer have any direct influence over what happens with their data. Using people's data for purposes other than those they originally intended implies that you are using people only as a means to an end. This goes against Kant's idea of moral behavior. From a utilitarian perspective, in contrast, the situation would be less clear-cut. For instance, one could imagine a scenario in which the calculation of the costs and benefits reveals that the costs for the participants are so minimal that they are outweighed by the benefits for the company, and maybe even by benefits for the users, for example if the results lead to a better user experience for future visitors of the website.
A similar issue arises when combining data sets. On the one hand, combining information from different sources is the whole point of many big data applications. Data sets that in themselves do not have much value do have value when combined. For example, everyone would agree that many companies need to somehow keep a record of the addresses of their customers. It will also not be very controversial to keep track of their previous purchases. However, once such a data set is merged with data from other sources, like social media data or clicking behavior on a large number of websites, this becomes a different story. Not only because the user has not consented to this, but also because, based on such a data set, a complete profile of the person could be created. Again, we can reason in different ways about this. From a deontological perspective, we would probably conclude that this would qualify as treating a person as a means, not as an end.
From a utilitarian perspective, in contrast, one would have to calculate the sum of costs and benefits. What are the advantages and disadvantages for the users? What are the advantages and disadvantages for the company? And do the total advantages outweigh the total disadvantages? For instance, a cost for the users could be their loss of privacy, but the benefit might be that they get a better service. For the company, the benefits could be more effective marketing and thus more profit. Whether these benefits outweigh the cost for a potentially large group of users is the ethical question that has to be answered.
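To make the utilitarian calculation a bit more concrete, here is a minimal sketch of summing costs and benefits over everyone affected. Everything in it is an assumption made for illustration: the `Effect` class, the `net_utility` function, the utility units, and all of the numbers are hypothetical and not taken from any study or from the cases discussed above.

```python
from dataclasses import dataclass


@dataclass
class Effect:
    party: str          # who is affected, e.g. "users" or "company"
    description: str    # what the cost or benefit is
    utility: float      # positive = benefit, negative = cost (made-up units)
    affected: int = 1   # how many parties experience this effect


def net_utility(effects):
    """Sum utility over everyone affected; a positive total means benefits outweigh costs."""
    return sum(e.utility * e.affected for e in effects)


# Hypothetical, illustrative values only.
effects = [
    Effect("users", "loss of privacy", -2.0, affected=1000),
    Effect("users", "better service", 1.5, affected=1000),
    Effect("company", "more effective marketing", 300.0),
]

total = net_utility(effects)
if total > 0:
    print(total, "benefits outweigh costs")
else:
    print(total, "costs outweigh benefits")
```

With these made-up numbers the total is negative, because a small cost per user adds up over a large group of users. The arithmetic is the easy part; the real difficulty in the utilitarian approach is deciding which effects to include and how to put a number on something like a loss of privacy.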