So let's take a look at what we're going to be talking about today. I'm going to try to burn through some slides, and then, if people are up for it, we'll talk about it a little bit and see whether it's a good idea or not. Since this is a core conversation, I wanted to give a little bit of background on the topics first. We're going to be talking about analytics: web analytics and software analytics. We're going to be talking about managing experiences, alluding back to Dries's keynotes from the past couple of years. And then we're going to talk a little bit about a high-level architecture for how I think we can bring analytics, experiences, personalization, and Drupal together into one awesome, beautiful thing. So let's talk about analytics. Analytics is basically this: you have your data, you look at it, and you make decisions based on it. Pretty straightforward. A little bit more context around that: the data you're analyzing doesn't necessarily have to be purpose-collected. It could be application-level data. In a commerce system, for example, you might look at the items in someone's cart. That data needs to be in the system for the commerce application to work, but it's also useful for making decisions. You can do analytics on really small, local sources, like flat CSV files in Excel. It's more common, though, for analytics to be done on purpose-built databases, sometimes offered as services, or on very, very large data warehouses. Examples would be Google Analytics on the service side, or Amazon Redshift, Google BigQuery, Hadoop, et cetera, on the warehouse side. The whole reason you're doing any of this is to run queries against your data so that you can make informed decisions and improve your organization in whatever way you see fit.
For our purposes, we're going to focus on web analytics and software analytics, because Drupal is both web and software. So let's look at some basics. I think Google Analytics (GA) provides a great framework for this, and it's also the one I'm most familiar with, so I'm going to use it for background. We've come to expect our analytics platform to gather data automatically, and GA does that at the page view level: every time someone hits a page, it collects some information. That page view is really just an event, and it's usually keyed by something like a URL or a path. Some data gets collected automatically, like the visitor's browser, their operating system, where they're located, that kind of thing. That data is then aggregated so that we can, for instance, count how many people were on this page, or how many unique visitors came to the whole site over a given period. On top of that, we can do filtering and segmentation: how many of those people were from Tokyo, or how many were from China and also using Internet Explorer. Based on that, we can make decisions like whether or not to stop supporting Internet Explorer 7. Beyond the basics, it's really important for an analytics system to be extensible. Usually these systems let you send custom events, like "this user clicked on this thing" or "this user filled out this form with this value," and you can dump those into your analytics repository and analyze them later. In addition to custom events, you can usually provide custom metrics or dimensions, so that when someone visits a page, you can say that the visitor on this page is associated with this taxonomy term ID, or that the page they loaded is of a certain content type.
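The aggregation and segmentation described above can be sketched in a few lines of Python. This is a toy, in-memory illustration that assumes each page view is a flat record; the field names and sample data are made up, and services like GA do this at a very different scale.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class PageView:
    visitor_id: str  # hypothetical anonymous visitor identifier
    path: str        # the key the page view event is recorded under
    browser: str     # collected automatically
    city: str        # collected automatically


def pageviews_by_path(events):
    """Count how many page views each path received."""
    return Counter(e.path for e in events)


def unique_visitors(events):
    """Count distinct visitors across the whole site."""
    return len({e.visitor_id for e in events})


def segment(events, **criteria):
    """Keep only events matching every criterion, e.g. city="Tokyo"."""
    return [e for e in events
            if all(getattr(e, key) == value for key, value in criteria.items())]


events = [
    PageView("v1", "/about", "Firefox", "Tokyo"),
    PageView("v1", "/pricing", "Firefox", "Tokyo"),
    PageView("v2", "/about", "Internet Explorer", "Beijing"),
    PageView("v3", "/about", "Internet Explorer", "Tokyo"),
]

print(pageviews_by_path(events)["/about"])  # 3
print(unique_visitors(events))              # 3
print(len(segment(events, city="Tokyo", browser="Internet Explorer")))  # 1
```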
And obviously, when you analyze those custom things, they're treated just like the default data you get from GA, the page view data: you can look at events and segment them, see how many people in this industry clicked on this element this week, that kind of thing. Moving further up the stack in a typical organization, we can apply a bit more business value to what we put into our analytics engine. You can set up goals, which basically define a relationship between two events in your analytics system: say, the user visited this page, and then on that page the user clicked this button. Then we can track conversions: how many people completed the goal we set up, how many didn't, and what the conversion rate is. And obviously we can analyze and segment those conversions too. How successful is this page at helping people become engaged constituents, if you're in a nonprofit situation? Or how many people downloaded the software I'm trying to sell, and how many didn't? A really important part of analytics, I think, is developer experience. In Google Analytics, for instance, it's super easy to do the extensible custom things I just mentioned. It's all encapsulated in a facade: you just call ga, tell it to send an event, and give it a category and an action. Or you call ga with set, name your custom dimension, and give it a value, and the next time you send an event or any other data to Google Analytics, it's just going to know. It's also performance-oriented and highly scalable. You stick an asynchronous script on your page and it just magically works, even before the script has actually loaded. It kind of does everything for you.
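A two-step goal and its conversion rate could look roughly like this. The event names, the rule that the second event must come after the first for the same visitor, and the choice to divide by visitors who reached the first step are all assumptions for illustration; GA's own goal and conversion-rate definitions differ in detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    visitor_id: str
    name: str       # hypothetical naming, e.g. "pageview:/download"
    timestamp: int  # only used to order events


@dataclass(frozen=True)
class Goal:
    first: str  # the event that starts the goal, e.g. visiting a page
    then: str   # the event that completes it, e.g. clicking a button


def conversions(events, goal):
    """Split visitors who started the goal into (converted, not_converted)."""
    started, converted = set(), set()
    for ev in sorted(events, key=lambda item: item.timestamp):
        if ev.name == goal.first:
            started.add(ev.visitor_id)
        elif ev.name == goal.then and ev.visitor_id in started:
            # Only counts if the first step happened earlier for this visitor.
            converted.add(ev.visitor_id)
    return converted, started - converted


def conversion_rate(events, goal):
    """Converted visitors divided by visitors who reached the first step."""
    converted, missed = conversions(events, goal)
    total = len(converted) + len(missed)
    return len(converted) / total if total else 0.0


goal = Goal("pageview:/download", "click:download-button")
events = [
    Event("a", "pageview:/download", 1), Event("a", "click:download-button", 2),
    Event("b", "pageview:/download", 3),
    Event("c", "click:download-button", 4),  # never saw the page: ignored
    Event("d", "pageview:/download", 5), Event("d", "click:download-button", 6),
]
print(round(conversion_rate(events, goal), 2))  # 0.67
```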
Software analytics is a little different, but it's pretty much a variation on the same theme. Rather than looking at page views and click events, we're looking at the transactions our application is handling, at network connectivity between different servers, at app-centric events. But we do the same things with them: how many times was this route called to handle a request, how long did it take, looked at over time and segmented. Same kind of thing. The difference is that software analytics focuses more on monitoring, making sure that things stay below certain thresholds, rather than on conversion rates. Vendors in this space include New Relic, AppNeta, Airbrake.io, and so on. So I've mentioned a lot of vendors, but I haven't really mentioned Drupal. Does Drupal do this kind of thing? No, not really. You have the Statistics module, which can count hits to nodes. We used to be able to look at page generation time in the access log, but that's gone in Drupal 8. And typically these are features you would just turn off on a Drupal site, because they bog your site down and make it less scalable. I like how my colleague described it: it's a module that's a burden you choose to turn on or off. There is some interesting stuff that gets logged through watchdog: events that module developers put in their code because they thought them important, either because something was going wrong and you'd want to know about it, or because something noteworthy was happening. That's kind of interesting, but it's not really what we're looking for in an analytics system. Basically, Drupal just doesn't do it. Instead, we rely on third-party vendors and services to do the analytics for us.
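The monitoring side is the same aggregation with a threshold check instead of a conversion rate. A minimal sketch, assuming we already have (route, duration) pairs from somewhere; the 95th percentile, the nearest-rank method, the route paths, and the 500 ms threshold are arbitrary choices for illustration, not anything New Relic or Drupal prescribe.

```python
from collections import defaultdict


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers (pct is an int)."""
    ordered = sorted(values)
    rank = -(-pct * len(ordered) // 100)  # ceil(pct * n / 100) using integers
    return ordered[max(rank, 1) - 1]


def routes_over_threshold(transactions, threshold_ms, pct=95):
    """Group (route, duration_ms) pairs by route and report routes whose
    pct-th percentile duration exceeds threshold_ms."""
    durations = defaultdict(list)
    for route, duration_ms in transactions:
        durations[route].append(duration_ms)
    alerts = {}
    for route, values in durations.items():
        p = percentile(values, pct)
        if p > threshold_ms:
            alerts[route] = p
    return alerts


# Illustrative data: route paths and timings are made up.
transactions = (
    [("/node/{node}", 120)] * 19 + [("/node/{node}", 900)]  # one slow outlier
    + [("/search", 700)] * 5
    + [("/user/login", 80)] * 3
)
print(routes_over_threshold(transactions, threshold_ms=500))  # {'/search': 700}
```

Using a percentile rather than an average keeps a single slow outlier from tripping the alert, which is why `/node/{node}` stays under the threshold here.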
We install New Relic on our servers, we install Google Analytics on our website, and Drupal is totally unaware of important events that we may or may not want to look at. So the whole point of this session is: what do we do about that? Do we decide that Drupal doesn't do analytics, live with that, and keep pushing all of our data to third-party services in a way that we can't use in Drupal, maybe even getting rid of the Statistics module? Should we just delete that module from core, or should we improve on it in a way that's actually useful to us in Drupal? Obviously, deleting it is way easier, but I'm going to make the case that we should build on it. And I'm going to give you a use case, the one that Dries and Acquia have really been cramming down our throats the past couple of days: the experience web, managing experiences, personalizing content. It's kind of a fluffy marketing term, but broadly it just means a content management system that is capable of personalizing content, period. You can think of it in terms of personas, or the user stories you might have for your website. Maybe you want to treat customers differently from prospects. In the nonprofit world, you have the pyramid of engagement, and you might want to treat people at each level of that pyramid differently because they have different goals. You want them to reach their goal more easily, and you personalize the site to make that happen. That's the content delivery optimization part of personalization. The part a lot of people don't really talk about, but which I think is the most important, is that when you personalize to help someone reach their goal on your website, you need to measure it: make sure that what you did actually works, and if it doesn't, fix it, and if it does, keep doing more of that.
I think that's a key component of managing web experiences. So let's think hypothetically about what it would take to build something like that on Drupal right now. At a minimum, you would need a mechanism to customize the way content is delivered. You would need a mechanism to determine who a person is and what kind of person they are. And you would also need a mechanism to analyze the personalization process: we need to measure what we do. Looking at Drupal specifically, in terms of personalizing content delivery, I think we're actually doing pretty well. We have a lot of ways to build layouts and a lot of ways to display content. We have blocks, regions, the Context module, Display Suite, Panels. We can alter what content is shown and how it's displayed very easily. That's a core part of Drupal, and I think we're doing great. In terms of visitor segmentation, we're pretty good, though we could do way better; at least we can put fields on a user. We can add a taxonomy term reference to a user and know that this user corresponds to this taxonomy term. That's pretty powerful. Connecting to third-party services is also an integral part of Drupal's DNA, and we do pretty well with that. There are a lot of enterprise-y, real-time identification services we can hook into: this person corresponds to this IP address, and we can get a whole bunch of information about that IP address and associate it with that person. And there are modules in the contributed space, like Browscap or IP Geolocation, that bring some of that functionality into Drupal itself. So we're pretty good there. I think the biggest problem we have if we're going to support the experience web is that we just have no data. Drupal doesn't really know about anything a person does on a website.
So if we're going to personalize a website with Drupal, we need a better understanding of who that person is, and we need to get it by tracking what they're doing. If a person happens to visit a lot of content that's tagged a certain way, Drupal should know about it. If it doesn't, you're going to make personalizations that aren't really relevant, and if you don't even have a way to store that information, you won't have a way to measure whether the personalization worked in the first place. So I think that if we're going to do web experience management, we need to do it in a data-first, fully integrated way. That might look like a Drupal where collecting analytics data is part of core. I think data collection should be totally pluggable. I think managing analytics in Drupal should be targeted at site builders and website administrators who are not developers: they should be able to click through and configure what they want to know without having to talk to anyone on their development team. And if we make data collection pluggable, we need the same mindset we have with responsive web design, where if the theme you're providing isn't responsive, it's broken. We need to get into that mindset about Drupal and data: if your module exposes interesting data, it should integrate with this system. If I have Mandrill installed on my website, for example, Mandrill knows whether I sent someone an email and whether that person opened it, and that needs to be part of this system. Not only that, I think it needs to be done so that this data is easily accessible throughout the rest of Drupal. If you have a view, you should be able to make a view of this data.
You should also be able to make a view of something you'd normally build now and relate it to this data repository, so that we can more easily personalize the experience for that person. That's my vision. How do we get there? Scalability is hard. Say you load a page and we're telling Drupal: this person loaded this page, they scrolled, they clicked on this button, and while that was happening, something else loaded. If you're sending all of those events to Drupal, it's going to be hard to scale. However, I think we're actually doing pretty well there. In Drupal 8, we have web services in core, and we also have the whole re-architecture of Drupal's bootstrap phase. We have the concept of the Drupal kernel, so every time we respond to a request, we don't have to load every single module and go through a whole process we don't need, because we're not generating pages, we're just collecting data. So at least we have a path to scale at the application layer. The database layer is more interesting. Drupal supports multiple databases, but none of them really supports the level of scalability you need in an analytics database. However, in the past three to four years, we've seen a proliferation of highly scalable databases, many of them open source, that anybody can download and use right now, like MongoDB or CouchDB. I think that would be a good path: we should support that kind of thing. We already have some support for that kind of thing in Drupal core, and that might be a good path to this kind of scalability.
But in addition, we have this consumerization, this commoditization, of big-data warehouses, a lot of it offered as software as a service. If you have push-button Drupal, you should also have push-button, bottomless-pit data storage, and I think that's actually possible now: we have Amazon Redshift, Google BigQuery, that kind of thing. I think the simplest way to fully integrate whatever storage we use for this data with Drupal is through the Entity API. Entities already have a concept of pluggable storage backends, and they're deeply integrated with Views and other parts of core, like the RESTful Web Services module in Drupal 8, which we can just plug into and use. So imagine an entity called, say, a stat entity, where each row is an analytic event and the event has properties. Because storage backends are pluggable, by default smaller sites, nonprofits, whoever, would get a database-backed analytics engine that just writes to MySQL, and they would still have, out of the box, all the functionality that larger enterprise customers need. But it would also integrate with your MongoDB or your New Relic Insights or what have you. If we use entities, we also get the entity configuration UI for free. Imagine customizing your analytics system the same way you customize fields on a content type: super easy, and anybody can use it with a little background. Pluggable storage is also handy because core has entity query (EntityFieldQuery in Drupal 7), which lets us query for entities against arbitrary data sources without baking in any crazy SQL assumptions. I think that's the way to go. In addition to this stat entity, imagine plugins for collecting metrics and dimensions, basically data getters.
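To make the stat entity idea concrete outside of Drupal, here is a small Python sketch of an event record with swappable storage backends. Everything in it (the `StatEvent` fields, the `StatStorage` interface, both backends) is hypothetical and only mirrors the shape of the idea; it is not Drupal's Entity API, and SQLite stands in for MySQL so the example runs on its own.

```python
import json
import sqlite3
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class StatEvent:
    """One analytic event: one row of the hypothetical "stat" entity type."""
    type: str
    created: int
    properties: dict = field(default_factory=dict)


def _matches(event, conditions):
    return all(event.properties.get(k) == v for k, v in conditions.items())


class StatStorage(ABC):
    """Hypothetical pluggable backend interface, loosely modeled on entity storage."""

    @abstractmethod
    def save(self, event: StatEvent) -> None: ...

    @abstractmethod
    def query(self, event_type: str, **conditions) -> list: ...


class MemoryStorage(StatStorage):
    def __init__(self):
        self._rows = []

    def save(self, event):
        self._rows.append(event)

    def query(self, event_type, **conditions):
        return [e for e in self._rows if e.type == event_type and _matches(e, conditions)]


class SqlStorage(StatStorage):
    """Small-site default: rows in an SQL table (SQLite here, standing in for MySQL)."""

    def __init__(self, path=":memory:"):
        self._db = sqlite3.connect(path)
        self._db.execute("CREATE TABLE IF NOT EXISTS stat (type TEXT, created INTEGER, data TEXT)")

    def save(self, event):
        self._db.execute("INSERT INTO stat (type, created, data) VALUES (?, ?, ?)",
                         (event.type, event.created, json.dumps(event.properties)))
        self._db.commit()

    def query(self, event_type, **conditions):
        rows = self._db.execute(
            "SELECT type, created, data FROM stat WHERE type = ? ORDER BY created", (event_type,))
        events = [StatEvent(t, c, json.loads(d)) for t, c, d in rows]
        # A real backend would push these conditions down into the query itself.
        return [e for e in events if _matches(e, conditions)]


def record_and_count(storage: StatStorage) -> int:
    """Caller code only knows the interface, not which backend it is talking to."""
    storage.save(StatEvent("pageview", 1, {"nid": 7, "term_id": 3}))
    storage.save(StatEvent("pageview", 2, {"nid": 8, "term_id": 3}))
    storage.save(StatEvent("click", 3, {"nid": 7}))
    return len(storage.query("pageview", term_id=3))


for backend in (MemoryStorage(), SqlStorage()):
    print(type(backend).__name__, record_and_count(backend))  # 2 for both
```

The point of the split is that `record_and_count` never changes when the backend does, which is the property the talk wants from pluggable entity storage.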
Say you trigger an analytic event and it calls one of these getters, and the getter basically says: hey, Drupal container, I need the request object, and from the request object I'm going to return the user agent, or something like that. A plugin might also define a JavaScript part, so that we can compile a whole bunch of plugins into a single file that we can load on any page. It would say: somebody clicked on this thing, and the administrator said that when someone clicks on this thing they want to know the user ID. Or they have RedHen installed, and RedHen says: I know the RedHen contact ID, so I'm going to store it on this event. With a plugin system like this, I think every module maintainer would just implement plugins for the interesting data they happen to be aware of, so that site administrators have the whole breadth of the Drupal ecosystem at their fingertips when configuring their analytic events. I also think there should be a concept of data management plugins. In core right now, the Statistics module and dblog have a concept of rotation: because of the limitations of MySQL, we don't want to store watchdog events or access log events endlessly, so you can configure it to say, once I've collected more than 10,000 of these, delete the oldest ones and rotate them out. Or, once records are two weeks old, delete them because I don't need them. If rotation is a plugin in this system, we could do it by default, but somebody else could come along and say: I want to aggregate this data on this column and store it in this table, or I want to apply a filter that deletes information I don't actually care about.
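Data getter plugins and rotation plugins might look something like the following. This is a hypothetical sketch: a decorator-based registry stands in for Drupal's plugin system, a plain dict stands in for the container and request, and the RedHen getter only illustrates a contrib module contributing its own data. The row-limit and age-based functions mirror the two rotation behaviors described above.

```python
# Hypothetical registry of data getter plugins, keyed by the name an
# administrator would pick in a configuration UI.
GETTERS = {}


def getter(name):
    """Decorator that registers a data getter plugin under `name`."""
    def register(fn):
        GETTERS[name] = fn
        return fn
    return register


@getter("user_agent")
def user_agent(context):
    return context["request"]["headers"].get("User-Agent")


@getter("uid")
def current_user_id(context):
    return context["user"]["uid"]


@getter("redhen_contact_id")  # hypothetical: a CRM module exposing its own ID
def redhen_contact_id(context):
    return context.get("redhen_contact_id")


def build_event(event_type, configured_getters, context):
    """Collect only the values the administrator configured for this event."""
    event = {"type": event_type}
    for name in configured_getters:
        event[name] = GETTERS[name](context)
    return event


def rotate_by_row_limit(rows, limit):
    """Default data management plugin: keep only the newest `limit` rows."""
    newest_first = sorted(rows, key=lambda r: r["created"], reverse=True)
    return newest_first[:limit]


def rotate_by_age(rows, now, max_age):
    """Alternative plugin: drop rows older than `max_age` seconds."""
    return [r for r in rows if now - r["created"] <= max_age]


# `context` is a plain dict standing in for the request and the service container.
context = {
    "request": {"headers": {"User-Agent": "Mozilla/5.0"}},
    "user": {"uid": 42},
    "redhen_contact_id": 1001,
}
print(build_event("click", ["uid", "redhen_contact_id"], context))
# {'type': 'click', 'uid': 42, 'redhen_contact_id': 1001}

rows = [{"created": t} for t in (5, 1, 3, 4, 2)]
print([r["created"] for r in rotate_by_row_limit(rows, 3)])  # [5, 4, 3]
print([r["created"] for r in rotate_by_age(rows, now=5, max_age=2)])  # [5, 3, 4]
```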
One difficulty we're going to run into with the system I'm proposing is that we don't really have a way for site administrators to say: here's the data I want, and I want you to collect it when these things happen. At least not in core. We have Rules, which would be a super interesting way to do it. I know the Acquia folks have built some really interesting UIs for triggering certain events on the client side, and maybe we could pull that kind of thing in. Or maybe we should rely on Symfony's event dispatcher, which could be pretty cool, and build a UI on top of that. I don't know the best way to do it, but I don't think it's a difficult problem, and it's a problem Drupal has a vested interest in solving. In terms of data analysis, we have a bit of a problem. Views would be the natural way to generate reports from this data. However, it's really hard to build a view if you don't know Views well, especially once you get into the aggregation functions. If we're going to make Drupal analytically awesome, we need a way for people who are not developers or Drupal experts to pull their data out without having to, I don't know, come to DrupalCon. There are also some deeper technical problems with this Views and entity integration. If our application data is stored in one type of database, MySQL, and our analytics data is stored in a larger data store with a totally different architecture, then joining, blending, or relating those two backends is going to be super hard. But I think that needs to happen. I think we'll do it with entity query, but there are some problems we need to solve there.
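One way around the cross-backend problem is to skip the SQL join entirely: aggregate on the analytics side, then look up the matching application records in one batch and stitch them together in code. The sketch below assumes both stores can be reached from the same process and that the aggregated result is small; it's one possible approach, not what Views or entity query actually do, and it doesn't solve sorting or filtering by application fields across large result sets. SQLite stands in for the application database.

```python
import sqlite3
from collections import Counter


def top_content(stat_rows, load_titles, limit=3):
    """Relate analytics rows to application data in application code.

    stat_rows: dicts with an "nid" key, as returned by the analytics backend.
    load_titles: callable mapping a list of nids to {nid: title}, backed by
    the application database.
    """
    counts = Counter(row["nid"] for row in stat_rows)  # aggregate analytics side first
    top = counts.most_common(limit)
    titles = load_titles([nid for nid, _ in top])       # one batched lookup
    return [(titles.get(nid, "(deleted)"), hits) for nid, hits in top]


# Application database (SQLite here, standing in for MySQL).
app_db = sqlite3.connect(":memory:")
app_db.execute("CREATE TABLE node (nid INTEGER PRIMARY KEY, title TEXT)")
app_db.executemany("INSERT INTO node VALUES (?, ?)",
                   [(1, "Donate"), (2, "Volunteer"), (3, "About us")])


def load_titles(nids):
    if not nids:
        return {}
    placeholders = ",".join("?" * len(nids))
    rows = app_db.execute(f"SELECT nid, title FROM node WHERE nid IN ({placeholders})", nids)
    return dict(rows)


# Analytics rows, as if fetched from a separate store; nid 9 no longer exists.
stats = [{"nid": 2}, {"nid": 9}, {"nid": 2}, {"nid": 1}, {"nid": 9}, {"nid": 2}]
print(top_content(stats, load_titles))
# [('Volunteer', 3), ('(deleted)', 2), ('Donate', 1)]
```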
If you go to Google Analytics or New Relic, you'll notice they have these awesome, pretty charts and graphs. If we're going to have an analytically minded Drupal, we're going to need those too, but visualization is really hard to do, and really hard to do well. I don't think it would be in our interest to build a visualization framework. Drupal is a place to manage content and experiences, and Drupal core is definitely not the place to build a visualization framework. However, it's a feature we need if we're going to do analytics. If we do it, I think we should do it so that any other module can introduce its own way to visualize things. It should probably be done through Views, as a light wrapper, and maybe we'd provide a default that's okay at what it does. So that's the vision I see for Drupal and analytics going forward. We don't do it right now. We should be doing it if we're going to support managing experiences the way Dries has outlined for all of us at DrupalCon. And even if we don't necessarily want to support that, I think there's value in standardizing on a way to do analytics in Drupal, so that contrib modules, like whatever provides Google Analytics integration, or other modules that expose interesting data, can integrate seamlessly without doing something totally custom. I think a lot of the work we need is already done for us in the Entity API, in Views in core, and so on, and the remaining problems aren't that big. They're solvable, and I think they're solvable within a release cycle. So either we get rid of Statistics and let large third-party vendors do the analytics for us, or we build analytics so that it's fully integrated with Drupal while still being analytically valuable. Which should we do? Anyone who wants to come up to the mic and talk, please do.
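The "let any module plug in its own visualization, ship an okay default" idea can be sketched as a small renderer registry. This is a hypothetical Python illustration outside of Views: the registry, the text bar chart default, and the CSV renderer are invented for the example.

```python
# Hypothetical registry: modules register renderers by name; a modest default ships.
VISUALIZERS = {}


def visualizer(name):
    def register(fn):
        VISUALIZERS[name] = fn
        return fn
    return register


@visualizer("text_bars")
def text_bars(rows, width=20):
    """Default renderer that is merely okay at what it does: a plain-text bar chart."""
    if not rows:
        return ""
    peak = max(value for _, value in rows) or 1
    label_width = max(len(label) for label, _ in rows)
    return "\n".join(
        f"{label.ljust(label_width)} | {'#' * round(width * value / peak)} {value}"
        for label, value in rows
    )


@visualizer("csv")  # e.g. another module contributing its own output format
def as_csv(rows):
    return "\n".join(f"{label},{value}" for label, value in rows)


def render(rows, style="text_bars"):
    """Thin wrapper: use the requested renderer, or fall back to the default."""
    return VISUALIZERS.get(style, VISUALIZERS["text_bars"])(rows)


rows = [("Firefox", 40), ("Chrome", 20), ("IE", 10)]
print(render(rows))
print(render(rows, style="csv"))
```

Core would own only `render` and the default; anything richer lives in whichever module registers it.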
Let's have a conversation, a core conversation, and if not, that's totally fine. It's the last session; we can take a nap. Do any of the existing analytics platforms have an API to get data back out of them? That way, Drupal could say: we have consumers for the APIs of Google Analytics and these other services, so if you send your data to them, we can read it back and offer you some insight into what to do. I think that might be the way to go, if it's possible, rather than building everything in Drupal core by itself. Right, that's kind of the trend right now. The most compelling product I've seen so far is from New Relic. They have something in beta right now called Insights, and it's basically a giant data store that by default gets populated with the same data you send to New Relic for monitoring, but you can also send it custom data, like your own events, and there's an API to pull it back down as JSON. A lot of vendors are doing that now, Google Analytics included: with Universal Analytics they're building more and more APIs to pull data back down. And actually, if you pay them a whole lot of money, they'll dump that data into BigQuery and you can pull it down that way too, right now. So I agree with you: we shouldn't be building that kind of thing in core. We should build a way to integrate the data that lives in those external repositories and bring it back as core entities, so that we can filter views by them and do whatever else, while keeping the data management somewhere else. Is there a way to do it where we wouldn't have to store the data, and Google would be storing it? It sounded like, in your proposal, you'd be pulling all the data back down and storing it. Is there any real reason to do that, as opposed to just processing what we get back?
So the only reason we would do that is for smaller websites that can't spin up those kinds of instances and store the data off-site. Actually, I don't think we would be storing it in Drupal. There's this thing... are you a developer? Yes. Okay. So in Drupal 8, you have entities. Entities have a storage controller, and you can write your own storage controller for an entity type. By default, there's a content entity storage controller that hooks up to the Drupal database through the database API. But you can have a storage controller that says: I'm going to pull all of my data from Google Analytics, and Drupal will treat it as a core thing it knows about. That's the basic idea. You could go either way. Okay, that wasn't immediately clear. Thank you for the feedback. Are we good? Should we just do this? Does anybody know how we can do this? Question? So the question, for the recording, is whether I've put these ideas and proposals into the issue queue for developers to consider. Yes, kind of. About three months ago, I built an early working prototype of this in the contributed space. It doesn't work anymore, because Drupal 8 keeps changing so much, especially the APIs this needs. So, kind of. The problem is that the core development process is hard to describe: where the responsibility for decision-making lies, who actually decides "this is a good idea, let's do it." It's not very well defined. I was hoping to get feedback on that from core developers here, but I don't know. But yeah. Yes, I should just make a ticket for this. Probably. Thank you. What kinds of assumptions does entity storage make? What I'm getting at is that we get so much integration out of entities that I would be very cautious about building a different storage interface. Hopefully the recording picks all of that up.
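The storage-controller idea, where the data stays with an external service but Drupal sees it as entities, can be sketched like this. It's a hypothetical Python illustration, not Drupal's storage controller API: the class, the `load_multiple` name, and the JSON response shape are made up, and `fake_fetch` replaces a real vendor API call so the example runs offline.

```python
import json
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class StatEvent:
    type: str
    created: int
    properties: dict = field(default_factory=dict)


class RemoteStatStorage:
    """Hypothetical read-only storage that presents rows held by an external
    analytics service as local stat events. `fetch_json` stands in for a real
    HTTP client plus vendor API and must return a JSON string."""

    def __init__(self, fetch_json: Callable[[str], str]):
        self._fetch_json = fetch_json

    def load_multiple(self, event_type: str) -> list:
        payload = json.loads(self._fetch_json(event_type))
        # Assumed response shape: {"results": [{"timestamp": ..., "attributes": {...}}]}
        return [StatEvent(event_type, row["timestamp"], row.get("attributes", {}))
                for row in payload["results"]]

    def save(self, event: StatEvent) -> None:
        raise NotImplementedError("The external service owns this data; this backend is read-only.")


def fake_fetch(event_type):
    """Stand-in for a network call, returning canned JSON."""
    return json.dumps({"results": [
        {"timestamp": 10, "attributes": {"nid": 7}},
        {"timestamp": 11, "attributes": {"nid": 8}},
    ]})


storage = RemoteStatStorage(fake_fetch)
print([e.properties["nid"] for e in storage.load_multiple("pageview")])  # [7, 8]
```

Nothing is copied into a local table here: each load asks the external service, which matches the "keep the data management somewhere else" answer above, at the cost of a network round trip per load.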
Totally, totally agree. So, you're Mark, right? How do you recommend, from a process perspective, that we propose and implement this kind of thing? Okay. Cool. If anyone wants to keep talking, I'm known as I-M-E-A-P pretty much everywhere: Drupal.org, IRC, Twitter, whatever. Let's make this happen. It really, really needs to happen, in my opinion. Thank you.