Good morning everyone, thanks for coming today. My name is Kenning Arlitsch, I'm the Dean of the Library at Montana State University, and Scott Young is our User Experience and Assessment Librarian. Karl Benedict is also in the audience. Karl is the Director of Research Data Services at the University of New Mexico, and he was a co-author on the research that we're going to present to you today. So it's no secret that we are deep into our surveillance society. I don't know if anybody saw this article yesterday in the New York Times: "Your apps know where you were last night, and they're not keeping it secret." It's a very interesting article. This is not specifically what we're going to be talking about today; we're just going to be talking about websites. Looks great, doesn't it? Oh, did you want to show more from the Internet? It's just this. Okay. Okay. There's a great article in the New York Times, and it is called "Your apps know where you were last night, and they are not keeping it secret," published yesterday. I'll switch back to the presentation now.
So as I said, we're not talking specifically about apps today, but we are talking about an audit that we conducted on academic library websites to determine whether we're actually living up to the privacy and security principles that we like to espouse. Scott will start by talking about third-party tracking and web analytics, and then talk a little bit about the privacy principles that we discuss in numerous forums and that come from numerous professional organizations. Then I'll launch into the actual research that we conducted for this audit and the results that we came up with, and then we'll make some recommendations for how to improve things. And at the end we have some discussion prompts that hopefully will lead to some more conversation. So Scott will start.
Hello everyone.
So again, I'm Scott Young, User Experience and Assessment Librarian at Montana State. Kenning's reference to the New York Times article is a useful one because it shows how compromised we all are by surveillance and third-party tracking: the New York Times itself is one of the greatest offenders among news publishers. They have the most trackers of any leading publisher. So they ask for the most data and they're not very transparent about it. So it's quite rich coming from them in particular.
So, third-party tracking. Some of you are really familiar with this. For the rest, we just want to instill just the right amount of fear in you to prompt action, but not so much that you feel paralyzed. Web analytics services are what our research focuses on; they measure how people use our websites. Information gathered through these trackers is sometimes passed to other trackers, and this often happens without the fully informed consent of users. We say "fully informed" because a lot of us have given some consent, some informed consent, but do we really understand the full extent? So that's what we tried to study and that's what we're trying to talk about today. And when we say users, we mean of course the people who visit our websites, but also ourselves. Do we understand the extent of it?
The practice of third-party tracking on websites is really widespread, and it has increased in prevalence, variety, and complexity over time. Trackers, cookies for instance, were pretty simple about 10 years ago. They're extremely sophisticated now and they can talk to each other. There's a vast network of trackers that share data. So if one tracker has a little bit of you in one context, it can match up with trackers in another context to build a complete profile, and then a version of your digital self is sold and traded across these trackers. So there's a picture of you, your digital self, out there being bought and sold.
So libraries, we are a part of this. We're partnering with third-party vendors that play this game. Google Analytics was a focus for us, but really this is an issue for any sort of e-resource vendor. One that comes to mind recently was a browser extension called Lean Library, which promised to connect users seamlessly with e-resources, but they were a private company and they asked for a lot of data. These third-party companies might not share our values. So it's hard for us to operate in accordance with our values when the software that we use doesn't really care about that so much.
So we're just going to jump right to the end here: things we can do. For web tracking or web analytics, we can use a different service. Google Analytics is the obvious one, but there are at least two others that you could look into: Matomo, previously Piwik, and one called Open Web Analytics. So just take a look at those, see if they can work for you. If you do want to use Google Analytics, configure the IP anonymization setting. This is a configuration that is built into Google Analytics. Turn it on. It obscures a little bit of your user information. And then you can add opt-out mechanisms to your website to help users turn off tracking. And then use HTTPS. We've been talking about this in libraries for a little while, but as our research showed, not a lot of libraries have actually implemented it, so we're here to say it again: HTTPS.
So just a quick overview of Google Analytics. It has some benefits. We know this. It's free to use, monetarily. It's easy to install. It's extensive and sophisticated. It has a lot of charts and graphs, which are really fun and helpful. And it can provide useful insights. It can. But of course on the other side of the ledger there are costs. It passes user data to Google. It may not align with our values in terms of privacy and intellectual freedom. There are inaccuracies in the tracking.
We know this as well. There's other research that Kenning and others are working on to try to understand accuracy in web analytics reporting, but that's a really difficult task. And as a result of those inaccuracies, it can produce dubious insights. So it's hard to say sometimes what you're looking at and how to build actionable responses to Google Analytics data. So we collect lots of data through these third-party vendors like Google Analytics. But we as a community can lack understanding of the technology, and we don't always appreciate its privacy costs. So that was sort of our starting point, and then we wanted to put some empirical data on this.
And more of the background here is our privacy principles. These I'm sure you have all memorized, but we will go over them again because they're very important. NISO says libraries, publishers, and software providers have a shared obligation to foster a digital environment that respects library users' privacy as they search, discover, and use those resources and services. IFLA says library and information services should reject electronic surveillance and any type of illegitimate monitoring or collection of user data. These are strong statements. They're good statements. ALA says that the right to privacy is the right to open inquiry without having the subject of one's interest examined or scrutinized by others. And CNI has a statement: libraries collecting data using Google Analytics are realizing that they may be violating the ALA Library Bill of Rights. This is just one example of how easily convenient web-based service offerings can come with unexpected consequences. Michael Zimmer a few years ago did a profession-wide survey where he found that 97% of librarians agree or strongly agree that libraries should never share personal information. Never share personal information. That's the point of that quote.
So we have these guiding principles. We say we want to live these values. So how are we doing that?
We know that privacy has long been a concern of libraries, but given the extent of third-party tracking, it is really difficult to implement analytics trackers like Google Analytics without compromising the user privacy that libraries have long championed. So if there's one diagram that can summarize this, it might be this one. On the one hand, we want to show our value. We want to understand our services. We want to improve our services through Google Analytics. And then on the other side, we want to live our values. And sometimes this is a site of tension for us: understanding what our values are while showing our value at the same time.
OK, so let's dig right into the research. This is the article that was published in Online Information Review just a couple of months ago, and we'll show the citation again at the end. As I said, we conducted an audit of library website home pages to try to find out if we were really living up to those principles that Scott was just showing. So we had two major research questions. The first was: do libraries implement HTTPS, and, this is important, do they have proper redirect practices in place? You can implement HTTPS, but if an insecure request comes in and you're not redirecting that insecure request to a secure fulfillment, then you're still not really utilizing HTTPS very well. And then the second question was: do libraries that use Google Analytics implement the available privacy protection measures? And this is something important to emphasize. Yes, Google Analytics has its problems, but there are features built into it that can be turned on to protect privacy, and we wanted to know whether libraries are actually doing that. So a little bit more detail on these two research questions. The first one is about whether libraries have implemented HTTPS on their home pages.
So specifically, we wanted to know whether they protect privacy with a secure connection between the user's browser and the library's website. Do they use a permanent redirect to enforce that secure connection? And do they sometimes defeat the implementation of HTTPS by redirecting to an insecure connection, even when a secure request comes in? And then a little more detail on the second research question. First, we wanted to know how many of the libraries in our sample use Google Analytics. Do those libraries protect user privacy by implementing a secure connection between the library's web servers and the Google Analytics servers at Google? And then there's also an IP anonymization feature that is built into Google Analytics. Has that been turned on? So these are the questions that we wanted to ask through our audit.
So, the methodology we employed. Webometrics is a subset of a much larger set of research methodologies known as informetrics. Scientometrics, bibliometrics, and cybermetrics are all part of this, as smaller concentric circles. Webometrics was initially developed, I think, in about the 1990s and focused on statistical analyses of words and phrases, but more modern definitions also take into account resources, technologies, and infrastructure on the web. The specific method that we used could be classified as a social sciences research method called covert observation. This is where you basically watch participants without them knowing that you're watching them. Well, of course, here we weren't watching people. We were simply checking for the presence or absence of certain technologies on websites. In the social sciences, covert observation research has sometimes run into ethical problems. But again, here, we weren't looking at people. We were just looking at whether a certain set of technologies and implementations were present on public websites.
So we're totally ethical. So here's our study population: 279 US and international academic libraries, across 16 countries. To be included, a library had to be a member of one or more of the following organizations: the Association of Research Libraries, the OCLC Research Library Partnership, and/or the Digital Library Federation. So that's how we came up with 279 academic libraries.
So, to our... wow, this is cool, this is what happens when you switch from one computer to another. All right, so what's your guess? 60/40 is pretty close. So of our pool of 279 academic libraries, 62%, or 173, had implemented HTTPS, and 38% had not. We'll go to the next one. Then we asked the question: of those 173 that had implemented HTTPS, how many had implemented an appropriate redirect from insecure requests to a secure fulfillment? And the answer there is... oh, you're looking at the article. All right, only 32% had implemented those redirects. Then we asked research question two: how many research libraries had implemented Google Analytics? Yes, I've got a ringer out there. 88% of us have implemented Google Analytics. Do you want to come up and give this? 12% have not. Then we asked, of that 88% that are using Google Analytics, how many have implemented the privacy protection features that are available in Google Analytics? I'm not asking this time. So 85% have implemented none of the privacy protection features that are available in Google Analytics. 14% have implemented the Google IP anonymization feature; that's the blue wedge. And only 1% have implemented the library-to-Google HTTPS feature. And 0% had implemented both of those protection features, IP anonymization and HTTPS. So I think you'll agree that's pretty stunning evidence that even though we care a lot about privacy, we're not doing what we could pretty easily be doing to protect the privacy of our users.
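The two Google Analytics privacy features counted in these results can be looked for mechanically in a page's source. What follows is a minimal sketch, not the authors' actual audit tooling. It assumes the two features correspond to the `anonymizeIp` and `forceSSL` settings of Google's analytics.js snippet (plus their gtag.js and legacy ga.js spellings); the function name `audit_snippet`, the pattern lists, and the sample page are hypothetical illustrations.

```python
import re

# Assumed markers of a Google Analytics or Google Tag Manager snippet.
TRACKER_PATTERNS = [
    r"google-analytics\.com/(?:analytics|ga)\.js",
    r"googletagmanager\.com/(?:gtm\.js|gtag/js)",
]
# Assumed spellings of the IP anonymization setting.
ANONYMIZE_IP_PATTERNS = [
    r"""['"]anonymizeIp['"]\s*,\s*true""",     # analytics.js: ga('set', 'anonymizeIp', true)
    r"""['"]?anonymize_ip['"]?\s*:\s*true""",  # gtag.js: { 'anonymize_ip': true }
    r"_gat\._anonymizeIp",                     # legacy ga.js
]
# Assumed spellings of the "always send hits to Google over HTTPS" setting.
FORCE_SSL_PATTERNS = [
    r"""['"]forceSSL['"]\s*,\s*true""",        # analytics.js: ga('set', 'forceSSL', true)
    r"_gat\._forceSSL",                        # legacy ga.js
]


def audit_snippet(html: str) -> dict:
    """Report which tracking and privacy markers appear in a page's HTML."""
    def any_match(patterns):
        return any(re.search(p, html) for p in patterns)

    uses_ga = any_match(TRACKER_PATTERNS)
    return {
        "uses_google_analytics": uses_ga,
        "ip_anonymization": uses_ga and any_match(ANONYMIZE_IP_PATTERNS),
        "force_ssl": uses_ga and any_match(FORCE_SSL_PATTERNS),
    }


# Hypothetical home page fragment with IP anonymization turned on.
SAMPLE_PAGE = """
<script>
  (function(i,s,o,g,r,a,m){ /* loader elided */ })(window, document, 'script',
    'https://www.google-analytics.com/analytics.js', 'ga');
  ga('create', 'UA-XXXXX-Y', 'auto');
  ga('set', 'anonymizeIp', true);  // the one-line privacy addition
  ga('send', 'pageview');
</script>
"""

if __name__ == "__main__":
    print(audit_snippet(SAMPLE_PAGE))
```

A static text match like this only sees settings written directly in the page source. Settings applied inside a tag manager container would not be visible to it, so a real audit would need more than this sketch.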
So Scott will now talk a little bit about the recommendations that we make.
Well, that was fun. We should do that every time. Kenning and I actually rehearsed that. So thank you for being a part of that. So just to reiterate those five recommendations: this is within the context of Google Analytics. There's a lot more that we can do to protect our patrons' privacy. We know that. But within our context, this is what we're asking the community to do. Implement HTTPS. This is a basic encryption protocol. There are tools like OpenSSL and others that can help you implement HTTPS. Work with your systems team, work with your digital librarians to get that done. Or your UIT, your campus IT, if that's the situation. And then if you want to run Google Analytics, or you currently are: IP anonymization. That's just a one-line addition to the configuration snippet. It's pretty easy. You just have to know how to do it. So now you do. User education: that's one of the things that we in libraries are really, really good at. So leverage your outreach to talk with your communities about these matters. And also your in-reach: talk with your staff so that you're also educated on privacy and the web. Informed consent from users: that's a really difficult one. There are degrees of fully informed consent. But one thing that we recommend is cookie notices on library websites. These are common around the web. I'm sure you've seen dozens and dozens of them by now. But you probably haven't seen any on library websites. I haven't seen any on library websites. I'm not sure if libraries really do this. But we should. We should tell our users how we're tracking them, how they can learn more, and who they can contact. Even though we know that users don't necessarily read privacy policies and just click those notices away, it's still something that should be there for the users who do want it. And then lastly, conducting a risk-benefit analysis when working with third-party vendors.
Take the time to really look at what we're getting in return for the services that we're using. Are Google Analytics insights worth the cost, for example? And that risk-benefit analysis can be applied to any sort of contract or situation you may find yourself in where we're trading user data for services. So those are our five recommendations.
And I just want to add a couple of things about how we conducted this security audit. For HTTPS, we looked for the certificate on the library website and at how the site was responding to both secure and insecure requests. And then for Google Analytics, we looked for evidence of the Google tracking code or the Google Tag Manager tracking code in the library websites. And that's how we came up with these numbers. This research was generously funded by the Institute of Museum and Library Services, as part of a larger grant called Measuring Up that is closing out this month. And again, there is the citation for the publication if you're interested in seeing the actual numbers and not just having me tell them.
So, a couple of discussion prompts. I think we have a little bit of time. This is clearly a thorny issue for us. We have these principles. We have these values about privacy. We're not quite living up to them. And at the same time, we may be hindering ourselves by adhering too closely to those principles. We cannot know our users very well, or how they're behaving, or what they need, if we don't have some sort of tracking mechanism. So it's a difficult problem. But these are some of the questions that Scott and I thought might spur some conversation. So rather than going through them one by one, I would just open it up to discussion if you want to address one of these questions. Feel free. Otherwise, we're happy to take a question.
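As a rough illustration of the HTTPS half of the audit described above, the check on how a site responds to secure and insecure requests comes down to classifying one response per request. This is a sketch under stated assumptions, not the procedure used in the study. The function name `classify_response`, the category labels, and the `library.example.edu` URLs are all hypothetical, and the status code and `Location` header are assumed to come from a request made with redirect-following turned off.

```python
from typing import Optional
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}
PERMANENT_CODES = {301, 308}  # redirects that browsers may cache as permanent


def classify_response(requested_url: str, status: int,
                      location: Optional[str] = None) -> str:
    """Label how a home page answered a single HTTP or HTTPS request."""
    requested_scheme = urlsplit(requested_url).scheme.lower()

    if status in REDIRECT_CODES and location:
        # Resolve relative Location values against the requested URL.
        target_scheme = urlsplit(urljoin(requested_url, location)).scheme.lower()
        if requested_scheme == "http" and target_scheme == "https":
            # Insecure request sent on to a secure fulfillment.
            if status in PERMANENT_CODES:
                return "permanent-upgrade"
            return "temporary-upgrade"
        if requested_scheme == "https" and target_scheme == "http":
            # Secure request bounced back to an insecure page.
            return "downgrade"
        return "same-scheme-redirect"

    if 200 <= status < 300:
        if requested_scheme == "https":
            return "secure"
        return "insecure-no-redirect"
    return "error"


if __name__ == "__main__":
    print(classify_response("http://library.example.edu/", 301,
                            "https://library.example.edu/"))
    print(classify_response("https://library.example.edu/", 302,
                            "http://library.example.edu/home"))
```

Under this labeling, a site would count as enforcing HTTPS only when the insecure request yields `permanent-upgrade` and the secure request yields `secure`. A `downgrade` on the secure request is the case where the HTTPS implementation defeats itself.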