 So, welcome. Thank you for coming. I'm Jen Stringer, Associate CIO for Academic Engagement at UC Berkeley, and basically what I do is the academic technology for the campus. So, I am here on behalf of a bunch of people who have spent an awful lot of time thinking about what I think is a difficult and complex problem that is just starting to be addressed in a broad way in higher education. And so, this is not my work necessarily. I've been part of it, but I want to acknowledge the amazing people who were on the working group, which was an IMS global working group. How many of you know what IMS global is? So, for those of you who don't, it's a standards organization. It's a not-for-profit standards organization that works to create interoperability standards for learning tools. And they also have done a lot of work around the student record and transferability of the student record data and that kind of thing. So, they really think about how to get systems to plug and play. And if you've ever heard of something called LTI, which is the Learning Tools Interoperability Standard, that is a standard out of IMS. And it allows you to plug lots of different kinds of tools into your learning management system. And they've started to think deeply about learning data and how to exchange learning data in a format so that what you get out of one discussion forum is in a standard format from another discussion forum so that you can use it for research and for other purposes. So, and we'll talk about that. But IMS has done a lot of work around thinking about learning data principles around privacy and appropriate use. And then the University of California as a system actually has a statement around student data privacy principles. 
And so my group, which is a bunch of educational technology leaders from across the campus, started thinking about how we take that statement of principles that the UC has already put out and make it accessible and meaningful around learning data. And so we used that as our scaffolding as we were creating our own principles around learning data. It's not clicking, and I don't know why. Let me try this. Okay, I needed to get in the right thing. So what I wanted to do is set the context for why principles around learning data are important. I'll actually do some defining of learning data and analytics for you. And then I want to present the principles, both the IMS principles and the UC principles, and point out some differences. I also want to point out some areas where I'd really love to hear your voices. I don't want to call them areas of disagreement; I think they're just areas across a spectrum of issues that we're trying to bring to the forefront. So I'm really interested in your thoughts and opinions about this, specifically around a couple of principles having to do with ownership and efficacy, and I'll talk through those. So I want to give a nod to the fact that within, I would say, really the past two years, a lot of work has started to be done in this area. There was an Asilomar Convention for Learning Research in Higher Education a few years ago that really started to look at these issues: what was appropriate, what kind of data do you need to do research around learning science, how are we going to get that data, and what are we going to do with it? And it turned into another conference that just happened about a year ago, which was Student Data and Records in the Digital Era. And out of that came a website and some resources called Responsible Use of Student Data in Higher Education.
So again, I'm just pointing these out to you to say that a lot of people have been doing a lot of research and talking and discussing about these areas, these issues, already. And then of course, the Learning Analytics Community Exchange is actually an EU thing. They've got, you know, the Open University and a lot of universities that have been thinking a lot about online learning and the kind of data that is associated with it and what's appropriate and what's not. And so they've come up with their own framework that they call the DELICATE framework. And it stands for a bunch of different issues that you should be thinking about when you're thinking about use of learning data in both research and what I'm going to call student success applications. So I use the term learning data, but oftentimes what you'll hear is the term learning analytics. And what I would say is you can't have learning analytics without learning data. Analytics is the process of analyzing the data, and learning data is the stuff, right? A definition of learning analytics really didn't start coming about until 2011 or so, when the Society for Learning Analytics Research actually held its first annual symposium. And that was LAK 2011. And you can see here that two definitions of learning analytics were bandied about. I'm going to read them, because I think it's important to think about them in the two different contexts. So one is the selection, capture and processing of data that will be helpful for students and instructors at the course or individual level. So this is talking about the data that is generated that will help students and learners and instructors. The other piece is much more around learning science. So learning analytics is the measurement, collection, analysis and reporting of data about learners and their contexts, for purposes of understanding and optimizing learning and the environments in which it occurs.
So this is really much more about the research end of things. And so both are important, because learning data is used in both ways. And learning analytics, as in the analysis of that data, is used for those two distinct purposes. And what's interesting is, as you get into principles around privacy and disclosure and a lot of other issues of that kind, you might come at them from different ways depending on whether you're a learning science researcher or you're actually an instructor or student in a course. So I want to put the data in context, because I think a lot of us are hearing a lot about metrics-based decision making at our institutions. And now we all have institutional data warehouses. How many of you have institutional data warehouses at your institution? Right. So in context, you've got institutional data that's usually in some sort of enterprise data warehouse. And this is my interpretation of the way I think about it. It's not perfect, I will say, but I think it shows the interplay. Usually it's aggregate. Oftentimes it's de-identified. Usually it's a dump. And they're providing analysis, you know, on a quarterly basis; it's certainly not real time. And it's looking at things like graduation rates and yields and a lot of stuff around money, usually. And sometimes things around course enrollment; that kind of information is sitting in there. And then you've got the academic data. And that's usually in your student information system. And it's usually what you think of as the academic record, right? It's personally identifiable. It's usually the stuff that's on a transcript. But there's more than just what's on a transcript, like whether a student was on probation or not, which might not get into the transcript, or whether they were identified coming in as at risk.
It may have some socioeconomic information about that student, their SAT scores, and then of course all the enrollments in their courses and that kind of thing. And in general, it's sitting in SIS applications, student information system applications, and sometimes advising applications that may or may not be connected to the student information system. And then you've got what we're calling learning data. And what that is, is personally identifiable user activity. And what do I mean by that? It's really like the log file activity. It's the interaction activity of a student, or a student and a faculty member, with usually some system of record like a learning management system. Maybe it's a clicker. It also might be a tool that a faculty member has decided to use that the institution hasn't licensed. But it is data about how students and faculty interact with the learning system and learning content. And like I said, in general, it's stuff that you might think of as information that you would pull from a log file. A student answered this question. A student got this result on an automatic quiz score. A student responded to a writing prompt. This student looked at this piece of information for this amount of time. Okay. And there's a new game in town. There's a new thing that's storing this data that's called a learning record store. And there are some that are actually available out on the market right now. In general, those are used for a lot of professional education kinds of spaces. They're very expensive. And then there are some open source learning record stores as well. So I don't think anybody's found the quintessential learning record store. But they are starting to pop up. And they are different than an enterprise data warehouse. And they're different than the information that's in a student information system. And I would actually say that a lot of times it's semi-structured data rather than fully structured data.
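To make the log-file idea a little more concrete, here is a minimal sketch in Python of what semi-structured learning data might look like. Everything in it is an assumption for illustration: the field names (`student`, `action`, `item`, `time`, `detail`), the sample events, and the helper functions are hypothetical and do not follow any particular standard or vendor format. The point is only that each event shares a few common fields while the `detail` part varies, or is missing, from event to event.

```python
import json
from collections import Counter

# Hypothetical raw events, shaped like lines pulled from a log file.
# Field names are illustrative only; they follow no particular standard.
RAW_EVENTS = [
    '{"student": "s001", "action": "answered_question", "item": "quiz1-q3", '
    '"time": "2017-03-01T10:15:00Z", "detail": {"correct": true}}',
    '{"student": "s001", "action": "viewed_page", "item": "week2-reading", '
    '"time": "2017-03-01T10:20:00Z", "detail": {"seconds_on_page": 340}}',
    '{"student": "s002", "action": "responded_to_prompt", "item": "essay-prompt-1", '
    '"time": "2017-03-02T09:00:00Z", "detail": {"word_count": 512}}',
    '{"student": "s002", "action": "viewed_page", "item": "week2-reading", '
    '"time": "2017-03-02T09:30:00Z"}',
]


def parse_events(lines):
    """Parse JSON lines into dicts, tolerating a missing 'detail' field."""
    events = []
    for line in lines:
        event = json.loads(line)
        # Semi-structured: the free-form part may be absent entirely.
        event.setdefault("detail", {})
        events.append(event)
    return events


def actions_per_student(events):
    """Count how many events each student generated."""
    return Counter(event["student"] for event in events)


def total_seconds_viewed(events, item):
    """Sum recorded viewing time for one item, skipping events without it."""
    total = 0
    for event in events:
        if event["action"] == "viewed_page" and event["item"] == item:
            total += event["detail"].get("seconds_on_page", 0)
    return total


events = parse_events(RAW_EVENTS)
print(actions_per_student(events))
print(total_seconds_viewed(events, "week2-reading"))
```

Note that the last event has no viewing time at all, so any total computed this way is a lower bound. That kind of gap is typical of data pulled straight from logs.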
And feel free to stop me or ask me questions at any time. The other thing that I should say is, this is the definition of learning data: it refers to the data generated by students, you know, interacting with documents in the teaching and learning experience and academic achievement. And I don't know that I actually really love that definition. But it is a definition that's right now sitting on those draft IMS principles. And so we'd love feedback on that too. But why do we even care about this stuff? You know, it's been around since people have been interacting with online systems. It's not just about online education. It's also about all the systems that faculty and students interact with on a regular basis. But we care about it because this data, in the era of big data, actually enables institutions to make some pretty interesting decisions. If you look at it over time, and I'll show you some dashboards, it can impact student outcomes. Are any of your campuses involved in any of the "student success" efforts? And I use that in quotes, not necessarily derisively, but just, you know, student success is all the rage right now. And we want to increase retention and increase four-year graduation rates. And so we're spending a heck of a lot of money on consulting firms and on services provided to us by vendors, reputable vendors, you know, but around student success. So do any of you know if you're involved in EAB Student Success? Civitas is another vendor. Do you know what vendor yours is, the one from the advisory department? Yeah. Mm-hmm. I don't know what it's called. Yep. So lots of different ones out there. We also care because if we give students information about their own behavior, it can actually impact their outcomes in a course, for example. It enables faculty to support students and make changes to their courses based on data.
And also it supports educational research and improved pedagogy, if used in the right way. So I just want to give you some examples of these things that I'm talking about. So this is a dashboard from Intellify. And what you can actually see here, and it's hard to see, is that basically this is giving you information about student aggregate data and whether or not they're taking the quizzes and what the scores are. So it's giving faculty members a dashboard about their course. A lot of times these are built into learning management systems, but not to this kind of level. And oftentimes what they don't have is the mixed-in data, like information about the student's major and that kind of thing. These dashboards are pulling from both the learning management system and the student information system, so they know more about the student, and then they're presenting aggregate information back to a faculty member or even a program director. This is an example. This company actually got bought, but Blue Canary was around for a while. This is a dashboard that a faculty member might see that is bringing to their attention students at risk. And this again is where we get into some of the issues that we're going to be talking about later and that I want your feedback on. But these are algorithms that are being created on particular indicators to give a student a red card or a yellow card or a green card. And it's based on a bunch of triggers, including time spent in the learning management system, what their grades are, maybe even what their grades are elsewhere, what their general performance has been, what courses they've taken. And all together it's giving the faculty member a picture of how Blue Canary thinks this student is doing in their course. And I've had some faculty members look at this and say, oh my gosh, that's the best thing ever.
And I've had some faculty members go, whoa, wait a minute, I am not sure that I'm, how do I know to trust this thing, et cetera, et cetera. Or do I even want to know what, I know my students better than any tool is often a response. And we can get into some of the things around what Mitchell Stevens from Stanford calls open futures, which is ensuring that students actually have the ability that you're not closing doors to students because of algorithms or because of some red card, that you're not counseling a student out of a course, for example, or counseling them into a different major. This is an example of a student alert. So again, you could set alerts that say if the student hasn't logged into a learning management system in five days and they've got two late assignments, I want to know about it and I want to be able to send the student a notification. Or I'm an advisor and I want to know about it and I want to send the student a notification. This is an example of one of those. So you can see that the reason is the student has excessive absences and is not completing the class reading assignments or homework. This is an example that is a slightly different example. This is called the engagement index. It's actually something that we built at UC Berkeley that is using student engagement data to gamify an online course. And students actually get points based on their interaction with other students' content and commenting and liking. And then they actually get, and some faculty members actually choose to base the percentage of the student's grade based on the engagement index. So this is another example of using learning data or engagement data, not necessarily in sort of the analytics. How is the student doing in the course? Are they going to get a bad grade? But actually is an integral part of the course pedagogic design. So why is another reason why we care? Holy crap. 
We're collecting a ton of data about our students, and yes, the faculty always say, "Oh, and about me." Yes, and about faculty and instructors as well. And, you know, what is this clickstream log file data? It's a lot of information about what you do. User experience designers like this stuff because it's how you improve your product, right? But it's also now being used in learning science and, in some places, for predicting student outcomes. And who's collecting it? Well, we'll talk a little bit about that, but it's not just the institution. We're collecting it as an institution. Libraries are collecting it in terms of, you know, articles and journals that students might be looking at. Publishers are collecting it. Third-party vendors are collecting it. And this is something that is starting with the idea of the next generation digital learning environment. Does that sound familiar to anybody? The NGDLE is a notion that was put out by EDUCAUSE about two and a half, maybe three years ago, which really has to do with the end of the learning management system and the ability to plug and play all different kinds of tools to create a much better, more engaging environment. And it's wonderful, and it also means that there are an awful lot of players in the market that we don't have contracts with that are collecting a lot of data about our students. And it's also Facebook, if faculty members are asking their students to use Facebook. It's Twitter, if they're asking their students to tweet, right? All of that is really interesting information about how students are engaging in the academic environment. Here are just some examples of what I would call third-party vendors out there that aren't a learning management system but definitely are in the market space and are collecting a lot of data about our students and faculty. Piazza is an online discussion board tool that's very popular in a number of places.
Gradescope is an actual automated grading tool that started at Berkeley but is now on the market as well. And you can see Course Hero. Has anybody heard of Course Hero? Course Hero is a tool that enables students to sell their notes and faculty syllabi online. Yes, make a face, because my faculty make a face all the time when they find out that their students are posting their stuff there. VoiceThread is another one, which allows you to engage with media and enables students to comment on media. So all of these tools really have wonderful purposes, and clearly they're doing well, and so they must be serving a need. But they are collecting a lot of information about student and faculty behavior. In the old days, way back when we were hosting things on-prem, even if we were licensing tools we were running them in our own data centers, and we had access to all of those local logs where a lot of that information was sitting. And so, I don't know if you remember, but I remember the days of discussing with my systems administrator how detailed a logging we wanted to do on our learning management system that we were running locally on-premises, because those log files took up a lot of space and we flushed them occasionally. But that data was actually really important, and we actually were doing some research around how students were using tools. But we had access to those log files, right, because it was on-premises and we could just go do a grep if we wanted to. We did a lot of ad hoc reporting, mainly for systems issues, but sometimes to answer a question like, how many courses are using X tool? I remember asking that, you know, and getting reports on that pretty regularly. But now that we're moving to software as a service tools like Canvas or Desire2Learn, or, you know, name them, or even when it's hosted off-premises, like if you're getting Blackboard hosted somewhere, you don't have access to those logs anymore.
The vendors still use those logs to improve the system, but I will bet you that your contract may not say that you actually have the right to ask for those logs, or set out any sort of relationship between you, the institution, and that data, and what your right is to it. So, you know, if we have contracts, they're not always specific about the ownership of or access to this data. And if you don't have a contract, like with Piazza, for example, where it's faculty members who are actually signing the click-through agreement on behalf of the institution, although they don't know that they're signing it on behalf of the institution, you don't have any sort of right at all. Right? And so let's just say, even if we have ownership, let's say you write the best contract in the world and it says that you, the institution, have ownership over this data, and yes, the vendor has the right to use it to improve their product, but they will either give it to you or flush it or whatever, you still need access to it, right? So ownership and access are different, and we can argue about who should own things till the cows come home, but if you own it and you don't have some sort of reasonable access to it, it really doesn't matter. And so when you're thinking about access for learning data, it really comes down to a number of things. One is timeliness. So some institutions are actually asking for real-time access to certain kinds of learning data to create these early alert systems and plug into advising systems. Some are asking for big data dumps on a regular basis because they have learning scientists who actually want access to the data for research.
But regardless, we want them to be in some sort of standard format, and I think we all know how dirty data can be when you're pulling it right out of a log file. So what we're asking for is what's actually being created in these two standards called Caliper and xAPI, which are sort of warring standards at the moment, that define learner activity. The idea there is to define it in a standardized way so that if I pull it out of one learning management system and I pull it out of another learning management system, I can actually, you know, compare the two or do analysis across systems. That matters especially in this environment where faculty are using a whole array of tools that they ask students to interact with online; it's not just the LMS anymore. Does that make sense? Well, librarians and data people understand standards and interoperability and the need to have some sort of standard definitions and metadata. So I want to talk about a few case studies that actually have influenced this work, around why some of us got so excited, and by excited I mean frustrated, or woken up, around learning data as a whole. So the first case study is about an LMS vendor, and I hope I took their name out, but anyway it was an LMS vendor, and many of you probably know who it is, but it's the software as a service LMS vendor. And they were doing a great job of actually talking to us about giving us our data in some sort of format, and they were going to use Redshift, and we were going to do a bunch of things, and it was very exciting and it seemed wonderful. And all of a sudden a number of institutions that were engaged with them got a bill, and the bill was that they were going to charge us for access to the data. Not even just the services on top of it, mind you; literally just the data dump itself, and they wanted that as an extra fee.
Now, we understand if you're providing additional services on top of that; that might be something that we want to license. But the actual data itself? We felt like, wait a minute, we need to have a conversation with you, vendor. And then of course we went and looked at the contract, which happened to be an Internet2 contract, and there was nothing specific about who owned it or how it was accessed, you know, other than, if we pull out of the deal, here's how we're going to dispose of all of the records and that kind of thing. But it wasn't even that they were going to hand us the data at the end; that wasn't even in the contract. So we actually raised this as an issue, we raised an alarm bell, and we were able to get an addendum to that contract, honestly because at EDUCAUSE we pulled a bunch of schools together and said, you know, we need to go to the vendor and we need to actually talk to them. Luckily they were going IPO, you know, they were going public in a short period of time, and things kind of worked out in our favor in that they very seriously wanted to negotiate. But you can't keep doing this over and over again with every single different vendor. So anyway, without contracts on those free platforms that you see, whether they give you the data or not is at their whim. And in fact I think many of these vendors feel like they've entered into an agreement with an individual faculty member and individual students, and that the institution actually doesn't have a say or a role to play in the conversation at all. So this is another story and I am using your name but I won't use it and well they'll see it anyway. So how many of you know, or have seen, Piazza? Okay. It's incredibly easy and free, and so this is how they sell this to faculty members.
It's free, it's super simple, and it actually is a pretty amazing tool, I have to say, and it serves a need that isn't currently being served by most learning management systems, which is that it enables students to ask an anonymous question, and then the faculty on the back end can see who asked it. But it sort of takes away the embarrassment, perhaps, of asking the stupid question, and it's an incredibly valuable tool. But look at this: 50,000 professors, 1,500 schools, 90 countries. But what's the rev-gen model? Does anybody know how Piazza is even thinking about making money? They're selling the student data to employers, so they're actually serving almost like a LinkedIn brokering service. And it gives this little student profile and it says which courses the student took on Piazza, so it almost becomes, I don't want to call it an unofficial transcript, but it's pretty interesting the way they're actually thinking of generating the revenue. So this was how they used to represent themselves. Anybody want to tell me, like, any warning bells going off? If you're a product licensing person, warning bells would be going off. They're using everybody's brand, right? And so again, the leverage that we had with this particular company was to say, you're breaking our licensing, you can't use the Cal brand, you know, we're not endorsing this product, and this looks like an endorsement of the product. And the one that really killed us was Piazza Careers, which is what they call their LinkedIn service; I'll call it a LinkedIn service. This is the way the old sign-up form used to look for students. I made my daughter log in so I could take screenshots. But here's where it says, where Piazza is an aid in your class, Piazza Careers is an aid in your career; get discovered by companies instantly. Can you see that the button's checked? She didn't check the button. It's opt out, not opt in. Well, who wouldn't want to do that? Who wouldn't want to do
that? And honestly, here's the thing: I want our students to get fantastic jobs. I'm not even saying that this is a bad thing, but I think we felt that this was very disingenuous in terms of the way they were doing it, because students are not going to uncheck the box, and then they can go to the companies and say, we have, you know, 25,000 Berkeley students in our Piazza Careers, who don't even know that they're in Piazza Careers, right? And I have to say, I should not paint them as the villain, but we wanted to bring this to their attention. I don't think that they recognized how institutions actually might feel about this, or even how students might feel if they really understood what was happening. So with some publicity, with some phone calls, and with some cease and desist letters, we were able to make some changes. And when I say we, I'm talking about the higher education community; this is not just UC Berkeley or even UC. There were a number of us involved in having these conversations together, and we made it a concerted effort. So now what it says is, nothing's checked and you have a choice: you're either open to hearing from and connecting with companies and alumni, or you don't need any help getting the most fulfilling and rewarding career opportunities. So anyway, I just have to laugh, because I'm like, who is doing their marketing? It's just so used-car-salesmanish. But anyway, it is better. Right now they have to check something; they actually kind of have to pay attention. We've got a little more information about Piazza Careers, rather than just the name of it and no link anywhere. So some progress there. This was before, where we had faculty endorsements and, again, use of the brand. This is now; you can see the names of the schools have been taken off. So, you know, again, I think a little better, but we can't be Don Quixote jousting at windmills with
every single vendor every single time. And you have to understand that any of these companies that are not charging you are going to make their money from somewhere, and we all know that it's in the data, right? I mean, you're making money off of the data in some way, shape, or form. You're either selling advertising or, you know, even Google and our free Gmail accounts: I think institutions understood what they were trading off, and at least we're not getting direct advertisement and those kinds of things in those relationships with Google Apps for Ed. But these third-party vendors are not there yet in terms of even thinking about institutional licensing, and so we need to just be thoughtful about what this means. And the other thing that I would say is, Piazza has a bunch of great information about our students. If we were running a student success program, if we thought it was an ethical, good idea to use that data in this way, we're missing out on all of that, and mainly it's engineering and CS where they've really made their inroads. We're missing all of that data about what our students are doing and how they're interacting with that content. So I wanted to throw a pitch in about libraries and learning data and analytics. A couple of people in the library world have actually been discussing this for a while, so Steven Bell and Megan Oakleaf are two folks that come to mind, and I think Megan actually did a talk at CNI fall about libraries and learning data. But I think it's interesting, because libraries have a lot of really interesting information about how students are accessing information, both for their courses and for their research, that might be used in a variety of ways, both for learning research and also perhaps for student success activities. And so this ACRL blog post came out (I should have put the date, though; it's about 2014) where Steven Bell actually
sort of raised this as a question and said, while academic libraries aren't encountering it yet, it's just a matter of time before higher ed institutions integrate learning analytics at every level of the organization. And he goes through some issues and concerns that he has around that. So that was in 2014, and in 2016 Megan is actually talking a lot about the ethics, going back to the libraries' ethical standards around protecting privacy and protecting people's ability to do research without question and without intervention. And so she's actually, I think, taking it to the next level and grappling with a lot of those questions right now. So I would encourage you to take a look at some of the work that she's doing if you want to understand more about how libraries are thinking about it, and it's definitely a spectrum. So this is where it gets long, and this is why I gave you the handout. So IMS Global decided to put together a toolkit around learning data, and this is one piece of the toolkit. The idea is that they wanted to throw out some key principles that people should be considering when they're thinking about the use of learning data and learning analytics at their institutions. Ownership: IMS actually makes a pretty interesting statement that says that it's the faculty, staff, and students that generate the learning data, and it should be theirs. Hold that thought. Stewardship: pretty basic, you know, we should be stewards of the data, and we should do it in a way that has a data governance plan, and we should think about IRB protocols and all that kind of stuff. I should say, what's interesting is that a lot of this learning science big data research up until now, because a lot of the data is de-identified, has often been exempt from IRB. I mean, they go to IRB and IRB says, yeah, it's not an issue at all. I think that as we start looking more at triangulation and, you know, a lot of things, it'll be interesting to see if IRBs continue sort of what I think
of as maybe not paying as much attention to thinking through the ethical decisions around the use of this data. Governance: that's pretty standard. Access, which is, you know, it should be available to the institution. Interoperability we talked about. Efficacy is interesting, and I want you to think about this. I almost think of this as ethics, but they're using the term efficacy here to say that learning data should be gathered and utilized for a purpose, that this isn't just about keeping data for data's sake, and that it really should be aimed at student success and research and instructional concerns and improving learning science and pedagogy. Security and privacy: they lump them together here. And then transparency, which has to do with the idea that people should know what you're doing with the data.
So I'm going to skip this for a second, and I want to go quickly to the University of California, because we approached it slightly differently for a variety of reasons, including the fact that we had this framework already in place, which was the student data privacy principles. And I should say that the focus at the UC level was really vendor focused. I mean, that's sort of what got us excited and a little anxious, and so that is the lens that we were using as we were writing these. And this is my statement now, this is just my opinion: I think that it may have been a slightly limiting lens. On the other hand, if you're talking to faculty about collecting their data, it's a lot easier to get their interest if we're attacking the vendors rather than tackling the tougher questions, right? Like, what are we going to do with all the information that we know about faculty? Is it going to be used in tenure and promotion? I mean, I know it won't be at Berkeley at this moment, but I don't know about other institutions and how they might want to use that information. Did you guys just see that tenure is being revoked from some faculty at, I'm
blanking on the name of the institution, but for not being proficient in their research and scholarship? Oh, the Chronicle of Higher Ed just had it. Anyway, somebody's actually going to take tenure away from faculty, and I just think it's fascinating. And then all of a sudden you start thinking about the information that we could theoretically collect around their interaction with students and how frequently they're updating their course, and anyway, it gets interesting.
So here are the learning data privacy principles from the University of California. Ownership: notice how we skirt the issue of ownership, and we just kind of say faculty and students and the UC retain ownership. And I have to say, I don't think we're very specific about this particular issue. We don't say the students own their data and the faculty own their data. We don't say it's co-created and that we both have shared ownership. We just sort of leave it out there hanging.
Well, I wonder, what does data ownership even mean in this context, right? I mean, if I own my data, does that mean I can come to UC and say, delete my data today, please?
Well, exactly. I had a conversation with our registrar where he said, I hope that we're not considering all of this the student record, because if it's part of the student record, students have the right to ask about it and see it. We use the idea of ethical use instead of efficacy, and quite frankly, I kind of like it better. Transparency is the same. We specifically address freedom of expression, protection, and then access and control. And then we talk about practices that we might want to put in place with ownership, with vendors, and that kind of thing. I'm not going to go into those; those are on the second side of your handout.
But I want to stop with 15 minutes left and ask you specifically about the two that we, and when I say we, I mean we as a community who are trying to grapple with this, I think are struggling with the most. One is that ownership question, and
I'll just say that there are some people that are very adamant that the data is student generated and student owned, and faculty generated and faculty owned, whatever that means. And there are others on the spectrum that talk about co-creation, and the fact that this data is co-created in the educational and academic environment, and therefore we have a need to steward it in a different way. For example, if a student owned all their data, theoretically maybe they could come and tell you that they don't want you to use it for anything, right? In a co-created space, where you think of it as co-creation, they probably wouldn't have the ability to say, nope, I want to opt out of everything. I don't want you sending me those alerts. I don't want you sending that information to the advisor. The institution would say, well, we have an agreement with you that we're going to educate you, and as part of that education process there are things that we do to ensure your success, and therefore this co-created data, we are going to use it in certain ways. We're going to tell you about it, but you're not going to have the ability to say that you completely own it and therefore you could yank it at any time or request that it not be used. Thoughts?
So, thank you.