Hello, WordCamp Europe. Bonjour, Paris. It's an incredible honor to be here today, and thank you all very much for sticking it out to the final session, the last thing standing between you and the after-party. I hope that we can have a good time together. Before I start, I just want to note that I learned to code from the WordPress community. I started making websites when I got out of college and learned more and more from many of the people in this room. And it's events like WordCamps that make that community so strong. So before I go any further, I'd just like to ask for a round of applause for the organizers and the volunteers and the translators and everyone else. These things do not happen without them, and I am very grateful to be a part of this one. This one's been incredible. It's been so good that I figured I'd start off with a data visualization, because whether or not you know it, pie charts are kind of like the Comic Sans of data visualization. People love to use them, but designers think they're awful. I want to make data that is approachable, though, and Comic Sans and Papyrus are often noted as being approachable fonts. So I thought, perhaps, it would be a good chart to start with. This chart shows in blue the talks at WordCamp Europe that I have wanted to attend, and in red the talks that I would have liked to skip. And the only talk that I would have liked to skip is my own, because I would love to be across the way in Michael's. But thank you all for coming to mine. And it's going to be an interesting session, because data is not necessarily something that we talk about a lot. I mean, we talk about it often, but we don't really think about what it means. We think of WordPress in terms of the things that it lets us do. We think of it in terms of publishing and posting and commenting. We think in terms of content, because it is a content management system.
And we are not used to thinking of content in terms of data, but of course it is. All of that content comes from a database, so what else could we call it? In order to do data visualization, you have to have some sort of information and a desired picture you would like to draw with it. And in WordPress, in order to have a website, you have some sort of information and a desired presentation you would like to give it. Those are the sites that we create. And the way that we traditionally take data out of WordPress and put it in front of our users is through PHP and through the loop. The loop is just PHP code that takes information retrieved from the SQL database and converts it into a format that we can map into HTML. So each individual post gets rendered out in a specific part on the page, usually one-to-one, maybe again in the sidebar. But as you're reading through, if you go to the URL for a page, you get the data for that page. The page is driven by the data in the database. In this way, every web page we make that is driven by a database can be thought of as a data-driven document. This is a phrase we'll return to. Everything we do is giving visual representation or audio or semantic representation to information that is abstract and stored somewhere that we cannot see in a form that we cannot imagine. So the goal of data visualization is to take that complex information and represent it visually to let you learn things about it that you might not be able to see or might not have known about. DataViz is all around us. DataViz being the hip way to refer to data visualization. Charts and graphs and interactive graphics are all around us on metro signs and buses and cell phones and apps and dashboards and boardrooms. We're so immersed in pictures of data that it's very easy to take them for granted. But every chart that we see was created by a person for a reason. In a way, it's a storytelling mechanism. And the more complex the story, the more complex the graphic. 
If you have a lot of different pieces of information, what we would call dimensions, different aspects of a picture, putting them together in one image that represents in one graphic things like geographical position, size of an army, temperature, that's a lot of work. And if you do it well, you can communicate a lot of information very concisely. It's an honor to have made this conference in Paris, because some of the best data visualization examples that we have from the beginning of the field are by Frenchmen, including Minard, who created this graphic. And the upshot of this is that we have a number of over 100 years, 150 years right now of good data visualization practices that we've accumulated. And they're taught in schools. We learn to make charts and graphs and plot things and learn how to use Excel, which is the most widespread data visualization tool in the world. We measure site traffic and sales with one eye on analytics dashboards like those provided by Google and Jetpack. And as we're reading news stories, data maps and visualizations jump out at us to help tell stories and help fill in the information around the words that we're reading. They don't tell, they show. But of course this isn't just limited to articles, many data visualizations can actually become a story in and of itself. They let us take things that are very complicated, things that are very abstract like budgets and national agendas, and they give us the ability to explore them and to draw our own conclusions from them. They highlight nuances that might be missed. And they can also humanize numbers or just give us a sense of the scale of a problem. When we read a number like 11,419, which is what the ticker on the left will eventually reach, 11,419 is not a picture that we can hold in our head. But if we can think of each individual point as a person, it begins to seem like a whole lot of people. 
A good visualization can show us the individuals that make up statistics, and it can highlight things about the information around us that we really should be thinking about and make us aware of the scale of some problems. It can also help us solve some of those problems. In the sciences, where researchers in different groups can produce astronomical amounts of unconnected data, we have a lot more ability to gather information than we do to understand and analyze it. And data visualization, like this portal built by Bocoup and Harvard Medical School, can be used to build collaborative tools that let scientists explore that data and find connections in it that might lead to breakthroughs in things like cancer research. The scale of data being vast, what we call big data, is not something limited to the sciences either. You can't reason about something that you can't see. So if we're gathering tremendous amounts of information about things like the economic performance of different companies or the performance of different providers of internet traffic, you can't really reason about that information. There's no point in having collected it until you get it in front of somebody who can actually understand it in a visual way, or an analytical way, I should say. And so that's all nice. But why am I talking about these things at a WordCamp? Something that really drives me is finding ways to bring in things from the rest of our fields to make WordPress and the community around it stronger, and also to make us more aware of what's going on in the broader world and even under our noses within WordPress. So I wanted to propose this talk to marry two major interests of mine, data visualization and WordPress and the open web, and give you a sense of how these tools might be relevant to us in our plugins and themes and other projects. I had the immense honor of working at Bocoup, a data visualization consultancy in Boston, for a number of years.
I actually have moved on this month to new things, but what they taught me was that this is an immensely powerful skill, and it is a tool that we should all be aware of and be able to leverage in our work. Because content is data, and data can be visualized, we shouldn't just be looking at the words; we should be thinking about what else is there and what we can understand about it to better meet our own goals. To see how we can do this ourselves, let's think about what sort of data is in a post. It's got a title, some amount of content, an author, categories and taxonomies, and of course a date. Of those, the date is the only numeric field, and because it's a numeric field, we can do things like Jetpack does in its analytics module or like GitHub does on its contribution graph: we can take that date and plot it in the calendar format that we're familiar with to give us a sense of how regularly we are posting. Regular posting is important for maintaining a blog or a site because it's the way that you generally maintain engagement and get the word out and keep yourself going if you take a break, as I did on my personal blog. I ran this on my own blog and it was basically empty for the past year because I've been working on other things. I think having it in front of you is a good reminder, and I really like what GitHub does, but I think that for our first example we can maybe simplify this down even more and ask, rather than looking at days in a year, what individual dimensions of data might be relevant for understanding posting rhythm over the course of a year on a month-by-month basis. So this is what's called binning, as in putting things in a bin: it's taking a number of individual data records, in this case posts, and grouping them together into bars that we can compare.
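The binning step described above can be sketched in a few lines. This is only an illustration, not the code behind the chart in the talk: the function name and the sample dates are hypothetical, and the sketch assumes the dates arrive as ISO 8601 strings, which is the shape the REST API uses for a post's date field.

```python
from collections import Counter
from datetime import datetime


def bin_posts_by_month(post_dates):
    """Count posts per calendar month (January..December), ignoring the year."""
    counts = Counter(datetime.fromisoformat(d).month for d in post_dates)
    # Return all twelve bins so that empty months still appear as zero-height bars.
    return [counts.get(month, 0) for month in range(1, 13)]


# Hypothetical sample data: three posts in March, one in June.
sample_dates = [
    "2016-03-04T09:30:00",
    "2016-03-18T14:00:00",
    "2016-06-01T08:15:00",
    "2017-03-02T10:00:00",
]
monthly_counts = bin_posts_by_month(sample_dates)
```

Each entry of monthly_counts is the height of one bar. Binning by (year, month) pairs instead would give the per-year frames needed to animate the chart over time.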
The reason that pie charts are sort of denigrated is that they're often used instead of a bar chart for comparing different values and it's much easier for us to compare values in terms of vertical change than area. Humans are very bad at estimating area, so that's just an aside, but what we're doing here is we're saying let's get all of the posts on our site and understand what months they were published in. Looks like there's some peaks in March and June and maybe again in the end of the summer and some dips towards the holidays. This is actually something that we can render in PHP. We could do a WP query and get back the data and map it into some sort of object that we could then use to draw divs that we style with CSS to create a graphic like this. But in this case, we're actually going to be using scalable vector graphics and JavaScript because if we do that, then we can start to animate it and we can start to say, what is the change over time? Company founded, smaller team, more interesting projects, even more interesting projects, more velocity, more interesting things to share, open source kicking off and you can get a sense of the history of a site and you could use this as a way to sort of understand and think about where there might be gaps. For example, there's a gap here in January 2017 where we didn't schedule any posts before everyone went away on holiday, so when we came back in the new year, there wasn't anything that was quite ready to go out yet. This is an interesting editorial tool and this is something that could be built into a plugin to maybe help maintain a little bit more of that rhythm over the year and be more aware of those gaps when we should be thinking a little bit farther ahead. So how do we actually go about drawing this image? 
There's a lot of power in getting our data into the browser, but JavaScript is complicated, SVG is complicated, and getting data is generally complicated. But it's gotten a lot easier, because with the WordPress REST API in core, of course, we can just ask for the data and it comes back to us as JSON. So we can step through all our posts: get the first page, collect it, pull the dates and post times out of it, and then step on to the next page and keep going until we have got all of them. You can make these requests manually using a tool like cURL, jQuery, or any Ajax library, and then you'll realize that that's probably not the best way to go about doing this, because you're making a number of individual requests and you're throwing away almost all of the content. It's been mentioned a number of times at this event, but I don't believe we can stress it too much: we should be very, very mindful about the data that we're sending over the wire, and this JSON here, which is less than a single post's worth of data, adds up. For a WordPress plugin, we shouldn't be assuming anything about the internet connection of our user. We can't know whether they're going to be on a low-bandwidth metered connection or an absolute top-tier free unlimited high-fiber system. Even in a major U.S. city, I've been feeling that phone data plans get less and less unlimited every year, and it's slow. You don't want to have to make people wait, or they're going to go and find something else. To make a graph like the one I showed, this is all we actually need, although technically I think we need even less: we only need the year and the month and the count. So if we send anything extra, we're wasting bandwidth, money and time and, if you saw the lightning talks in this room earlier, also harming the environment, because all of that time is server energy. So we use custom endpoints.
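The page-by-page collection described above might look like the sketch below. It is an illustration under assumptions, not the code from the talk: the function names and the example site URL are hypothetical. It relies on the core posts route (/wp-json/wp/v2/posts), the per_page and page parameters, and the X-WP-TotalPages response header. The _fields parameter, which trims each post down to the named fields, is only available in more recent WordPress versions.

```python
import json
from urllib.request import urlopen


def fetch_posts_page(site_url, page, per_page=100):
    """Fetch one page of posts, asking only for each post's date.

    Returns (posts, total_pages). site_url is a placeholder for your own site.
    """
    url = f"{site_url}/wp-json/wp/v2/posts?per_page={per_page}&page={page}&_fields=date"
    with urlopen(url) as response:
        total_pages = int(response.headers.get("X-WP-TotalPages", "1"))
        return json.load(response), total_pages


def collect_post_dates(fetch_page):
    """Step through every page, keeping only each post's date string."""
    dates, page, total_pages = [], 1, 1
    while page <= total_pages:
        posts, total_pages = fetch_page(page)
        dates.extend(post["date"] for post in posts)
        page += 1
    return dates


# In-memory stand-in for a two-page site, so the paging logic can be tried offline.
fake_pages = {
    1: [{"date": "2017-01-05T10:00:00"}, {"date": "2017-02-11T09:00:00"}],
    2: [{"date": "2017-03-20T16:45:00"}],
}
all_dates = collect_post_dates(lambda page: (fake_pages[page], len(fake_pages)))
```

Against a live site the call would be collect_post_dates(lambda page: fetch_posts_page("https://example.com", page)). Even with trimmed fields this is still one request per hundred posts, which is the cost a custom endpoint returning only year, month and count avoids.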
The first sort of takeaway that people should get is that if you want to go down this route and start exploring drawing graphics in the browser, use a custom endpoint to get the data to the client, because that way you're going to be able to make things as concise as possible, and if the data isn't going to change very often, you'll also be able to use transients to store that data so that it can be sent very quickly. Some of the graphics that I've worked on can take quite a while to compute, and even though we're not working with astronomical amounts of data in WordPress, if you want an overview, that's a lot of queries. You don't want to repeat those on every request. Using the default endpoints is great for prototyping, but be efficient when you're shipping production code. So now that we have that data, we need to draw the graphic, and for this, as I mentioned, we can use HTML and CSS for many things. They're particularly good at bar charts and area charts and treemaps, which are rectangular nested area charts. Anything that can render HTML can be used to create a visualization, and you can use any tool you're comfortable with to render that markup, whether it's on the client or the server. But again, once you start wanting to get into more interesting shapes and lines and different sorts of layout, you're probably going to want to move towards scalable vector graphics, or SVG. There are a lot of drawing tools out there that help you with SVG, and this is not going to be an especially code-heavy talk. It's more going to be sharing tools and resources. One data visualization tool for SVG stands out above the rest, and that is a tool called D3, or Data-Driven Documents.
This is a tool created by a developer named Mike Bostock and an open source community, and it is a really excellent way to declaratively specify the mappings between the data and the way you want it represented, and then to define a procedure for actually making that image and handling updates, like when you get new streaming data in and you get those fancy movie-style dashboards that are constantly changing and animating. But you don't have to do all the rendering yourself, so before we dive into more charts I just want to highlight a couple of other tools that are available for doing data visualization, one of which is of course Excel, but also Tableau and other tools. I don't want you to think that the only way to make a visual is to do it on the web, but if you do want to draw it in JavaScript, you may not want to handle things like put a circle here, put a bar there, make it this big, size it this way. There are a lot of interesting tools in the middle that give us the ability to say more what kind of chart we want, and then they'll do the rest for us. One that I have some personal experience with is a tool called Vega, which is by the University of Washington Interactive Data Lab in the States. It's a lab run by Jeff Heer, the data scientist who actually helped conceive of D3. The same way that we, with the REST API, give a JSON syntax for our posts, his group has created with Vega a JSON syntax for data visualization. So this is something where your tool can simply get the data and give it to Vega, and then Vega will handle the drawing. Or if you're using React or Vue or something, there are a lot of libraries out there that let you handle the drawing of a visualization using these tools. So there's a lot of other information about our posts that we can visualize besides the date.
The date is arguably the least interesting because it's numeric and we kind of have good tools for reasoning about numbers already. But an interesting piece of posts are the categories and the tags that they have and the way that those interrelate. I know that something that many of the WordPress projects I've worked on struggle with is how best to use categories and tags. So now that we know a little bit about some of these tools available, there are some techniques that we can use to visualize those relationships in an interesting way. And graphing a relationship means effectively graphing a network. Relationships define connections between individual entities in space and if you think of those as a network you can start to position them and group them and understand what the relations between them are. So D3 provides what's called a force directed graph. It is a tool to basically do a lightweight physics simulation to take a bunch of points and say, all right, how can we lay these out? How can we give them forces that represent whether they should be connected or not and run that simulation until you get to a point of stability at which point you can start to explore and see that, oh, in this case, which is another Mike Bostock example, these are the characters of Les Miserables. Les Miserables, sorry. And the central characters, like Jean Valjean and Javert, end up in the center, whereas the characters whose names you don't remember are those stray individual points leading out on the edges. Everything recently seems to be about graphs and networks and so knowing how to do this seems like a useful thing. Maybe we can use a network graph to lay out the posts that we want to share and understand maybe visually what types of content we write about can focus our use of tags a little bit better. Or maybe not. 
This was an idea that I had when I pitched this talk, was to do a really nice network diagram of tags and categories, but because most blogs I feel tend to be fairly focused, you get what ends up being called a hairball, where everything connects to everything else and it all clumps together in the center. There's still things we can learn from this. You can see, as you explore, that open source projects are related to... Tags that represent open source projects are related to a category that represents open source contribution and web development and the outliers along the edge are actually turned out to be tags that are not used. And so this is a useful thing if only to know what tags you could probably delete. But there's probably better ways to represent this information. Not every chart is totally appropriate to every form of data and different representations are going to highlight different things. This is the same data about the tag usage visualized instead of a network as a co-occurrence matrix. And what this means is that if two tags occur on the same post, they are connected and we fill in that square. And if we sort it so that the most used tags and categories are in the upper left corner, then we can finally start making some useful things... useful statements about the content on this website. We can say, ah, we write about tutorials. We make a lot of tutorials, but they cut across a number of different topics or, you know, maybe the most focused topics that we've talked about are performance, things that are constant problems, and so they require constant investigation and writing. You can also see which terms are less common and begin to understand whether those represent areas where you might want to expand the articles that you provide or else maybe focus more on the places where you have the most existing content. And I say content, but we haven't really actually talked about that part. 
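The co-occurrence matrix described above can be sketched as follows. This is a minimal illustration with a hypothetical function name and made-up tags, not the code behind the chart in the talk. It counts how many posts each pair of terms shares rather than only marking a square as filled, and it sorts the most-used terms first so they land in the upper left corner.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_matrix(posts_terms):
    """Build a symmetric matrix counting how often two terms share a post.

    posts_terms is a list of term lists, one per post. Returns (labels, matrix)
    with the most-used terms first, so they land in the upper left corner.
    """
    usage = Counter(term for terms in posts_terms for term in set(terms))
    # Sort by descending usage; break ties alphabetically for a stable order.
    labels = sorted(usage, key=lambda term: (-usage[term], term))
    index = {term: i for i, term in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for terms in posts_terms:
        for a, b in combinations(sorted(set(terms)), 2):
            matrix[index[a]][index[b]] += 1
            matrix[index[b]][index[a]] += 1
    return labels, matrix


# Hypothetical tags for three posts.
sample_posts = [
    ["tutorials", "performance"],
    ["tutorials", "open source"],
    ["tutorials", "performance", "javascript"],
]
labels, matrix = cooccurrence_matrix(sample_posts)
```

A row that is all zeros corresponds to a term that never shares a post with another term, the kind of stray outlier the network diagram pushed to its edges.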
The words in the post. We've only been talking about metadata: dates, tags, authors, comments. When we say content in WordPress, we usually mean writing, words. When I saw this slide in Petri's talk yesterday, I knew I had to ask her if I could use it, because the word cloud is another, I think in some cases fairly denigrated or maligned, data visualization tool, but it can be really powerful. It's one of the only tools that most of us are familiar with that is actually oriented around visualizing words. We don't think about it in those terms, but this gives us a sense of the breadth of language of this community. And the tag cloud, which ships with WordPress out of the box, similarly gives us a sense of the breadth of the topics that we write about in a visual way. Again, in this case we're still talking about metadata, taxonomy terms, but if we switch to trying to make a word cloud for post content, then we can begin to see a little bit more about the actual words that we write and what the focuses of our articles are. Unsurprisingly, for a post on the Bocoup blog, terms that occur in JavaScript code snippets occur very frequently. This word cloud is sized using word frequency, excepting that we have removed a number of what are called stop words, which are words like and, the, or, that don't have any real meaning. So if you take the content of a post and you strip out all those words and collapse individual terms that are very similar together, you can start to build a sense, just by count, of what words you write the most, which probably have some correlation to the meaning of the post. But, excuse me, for a blog for a company that does a lot of JavaScript open source work, knowing that we use the word function a lot isn't actually very helpful.
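The counting behind a frequency-sized word cloud, as described above, can be sketched like this. The stop-word list is a deliberately tiny, hypothetical one for illustration (real lists are much longer), and the sketch does not attempt the step of collapsing similar terms together.

```python
import re
from collections import Counter

# A deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"a", "an", "and", "the", "or", "of", "to", "in", "is", "it", "that", "we"}


def word_frequencies(text):
    """Count how often each word occurs, skipping stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(word for word in words if word not in STOP_WORDS)


# Hypothetical post content.
sample_text = "The function returns a function, and the callback is passed to the function."
frequencies = word_frequencies(sample_text)
top_word = frequencies.most_common(1)
```

In a word cloud, each count would drive the font size of its word. Here the most frequent remaining word is function, which illustrates the problem raised above: raw counts surface the common vocabulary of the whole blog rather than what is distinctive about one post.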
So there are some interesting text analytics that we can do that let us take that and map it into a different set of words that still occur in the posts, but maybe give us a little bit more of a sense of not just what words are most common, but what words are most important. Text analytics topics could be, and are, conferences in and of themselves, but just to quickly explain one way that this can work: if you use a term in a post, then it occurs maybe in this category. This is actually representing a category's worth of words for web application development. So if we use a term like JS in every single category, it's not really specifically relevant to the one category that we're looking at now. And so we want to find a way to weigh the frequency with which we use a word by how broadly that word is used. This particular example is a metric called term frequency, inverse document frequency, or TF-IDF, which basically takes the inverse of the proportion of documents in which a word occurs (so if you use JS in every one of your categories, that would be very low) and multiplies it by how common the word is within the document. So words that are fairly rare overall, but also repeated often within a document, rise to the top. And we can see, ah, we don't just write about JavaScript, we write about Webpack, and hard source, which is a tool for making Webpack builds very fast. Or the audio API, or IOPort for Internet of Things work. This talk was really conceived just to give an overview. And so, of course, there's never enough time to go into any of these topics as much as I would like. But text analytics, drawing graphs, these are things that I think are interesting and I think are going to be very relevant to us as a field. So there are a few things to keep in mind as we begin to step into this space and as we begin to learn how to put together these images. The first is performance.
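To make the TF-IDF weighting described earlier concrete, here is a minimal sketch. It uses the common textbook form (a term's share of the words in a document, multiplied by the logarithm of the number of documents divided by the number of documents containing the term). That is an assumption, since the talk does not say which variant was used. The function name and the category word lists are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Score every word in every document.

    documents maps a name to a list of words. Returns name -> {word: score}.
    """
    n_docs = len(documents)
    # Number of documents in which each word appears at least once.
    doc_freq = Counter(word for words in documents.values() for word in set(words))
    scores = {}
    for name, words in documents.items():
        counts = Counter(words)
        scores[name] = {
            word: (count / len(words)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        }
    return scores


# Hypothetical categories, each reduced to a short list of words.
categories = {
    "web applications": ["js", "webpack", "webpack", "build"],
    "audio": ["js", "audio", "api", "audio"],
    "hardware": ["js", "robot", "sensor", "robot"],
}
scores = tf_idf(categories)
top_web_word = max(scores["web applications"], key=scores["web applications"].get)
```

Because js appears in every category, its inverse document frequency is log(1) = 0 and it drops out entirely, while a word repeated within only one category rises to the top of that category.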
Because I've talked about the DOM and the SVG rendering, but that only gets us so far. The browser cannot render thousands of SVG nodes without a little stutter. For smooth animations with thousands of data points, you need to switch to a different technology. And one of the easier ones is something called Canvas. This is a drawing surface within the browser. It is used for a lot of video games and other high-performance graphics applications because it can give you a little bit of a lower-level API that doesn't have the overhead of the document model, the DOM. And it can let you draw those points in a very performant way. This is a demo reel put together by a developer and friend named Peter Beshai in the States. So for complex charts like large network graphs that show thousands of points across maps, Canvas is going to give you better performance than SVG. And you can still use a tool like D3 to help you do that layout. But because you're now drawing a static picture, rather than having a document node that has its own event system, you lose things like mouseover and click detection. And so you have to re-implement those. There's a trade-off between performance and complexity. That trade-off gets even more accentuated when we get above the many thousands of points. If you want to render tens of thousands of points, you really have to start looking at things like WebGL, which requires you to write what's called shaders. Shaders being very powerful parallel graphics code. But they're very complicated. They don't look anything like JavaScript or PHP, and they can take a while to learn. There's a great resource called thebookofshaders.com, and there's interesting tools like Regal or Reactive WebGL that give us the ability to maybe simplify the code a little bit and understand a little bit better. But this is something that you only need to step into if you really have that major performance problem. 
But regardless of how you're drawing the image, you're going to probably want to remember that an image is no good if you can't understand what's in it. So be mindful of your audience. Be mindful of contrast ratios. Make sure that your visualization will be readable on a low-contrast screen in a sunlit boardroom, or for somebody who can't distinguish between different colors. Don't just use those colors to distinguish information. You should also be thinking about how you can convey this difference not just by color, but also by texture or shape. I promised in the brief for this talk that I would talk about doing custom data dashboards. We don't really have a ton of time for that, but very briefly, to define custom data, which you can do with a custom endpoint or a custom post type, you're going to want to use register_rest_field to build out a data object that is more specific to your domain. I built a number of dashboards, including this one, before the REST API was merged, and it took a very long time, and we basically had to implement our own JSON API. So if you're dealing with custom data and you're dealing with anything that comes in and out of the browser, I feel things have gotten a lot easier than they were before, so we should take advantage of that. To close, just think about data as something that you deserve to see. We should not be, you know, too scared of the information that we have. It's easy to get started. There are a lot of things in this field that are complex, but the basics are simple, and there are a ton of resources available for learning them. There are blogs like FlowingData or Bocoup's, or conferences like Eyeo Festival, which has a little bit more of an art focus but still touches on a lot of data concerns, OpenVis Conf, EuroVis, and many others. The tools that I mentioned today are only the tip of the iceberg.
At the contributor day, John Maeda gave a talk about the same topic, and he was using p5.js, which is a canvas drawing tool based on Processing, which is a graphics language designed for artists. And all of these tools are available for us, so whatever your poison of choice, you can find a way to start drawing with it if you want. Something to remember about big data is that big data just means a whole lot of personal data. Most of us haven't ever really seen much of our personal data. We might have something like a Fitbit, but we might not have signed up for a developer account to try to get that data out. It's locked up in their systems. The REST API matters to me because I believe it's the most widespread free and accessible API that people have that lets them access their own information, their own content and data that's out there on the web. So we should take that data and we should see the other side of the picture. We should begin to see what you can do when you have all of this information, because we should be able to empower ourselves and our users to take advantage of it for our own purposes, in addition to the purposes of the companies that are collecting our information. Visualization can even be used to demystify things like neural networks and the algorithms that govern our lives. So know thy data and learn to read it like that. As John taught us yesterday, creativity comes from juxtapositions, so regardless of your background, I would ask you to consider trying to bring a visual side to the information you're working with, and to learn to decipher these systems, and to make things that give us tools to empower each other to understand the information we have. To close, I want to make one more visualization, which is of you, the audience. Each one of us has something to teach each other, and none of it is magic. Thank you.
It was awesome. We did it. You had to wait so long to just talk.
Are there any questions for K. Adam? I can't see the mics. Are there any questions? The room is full. I can't believe there's not anybody that has a question. Somebody here.
Thanks so much for another good talk. For somebody who might be a beginner at data visualization and wants to get into it and find some way to easily get started, do you have any recommendations for libraries or some approaches, something like that?
That's a good question, and absolutely. Some of the blogs that I mentioned, Dashing D3.js and FlowingData, are really good resources. All of the slides are available at the link up there, and the links are in the slides. But also, if you just Google data visualization blog, you'll turn up a bunch. The great thing about DataViz on the web is that a lot of the designers that are working in it are absolutely stupendous and they really like sharing their work. So there are all sorts of really excellent blog posts out there about how individual graphics were made. You can learn a lot by reading them. And D3 is probably what I'd recommend if you're on the web. But it can be a little bit of a bear to get started with. So it's also probably worth looking at something like Processing, which as I mentioned was designed with the intent to be used by artists as a programming language. And there are a lot of good examples out there of using Processing for data visualization as well.
Cool.
Hi. My name is Lisbeth. I work a lot with scientists, and what I'm trying to do is not only do data visualizations but also collect scientific data through WordPress. So basically the scientist uses WordPress to collect our data and immediately visualize it.
I've searched and I've found no easy way to do this, either static or in real time, because I want to use it in classrooms with live voting, polling, immediate visualizations, bar charts of what's going on, and I would love to know more about how to do it in WordPress.
That's a really interesting one. WordPress... PHP is getting faster and WordPress is getting faster, but it's not necessarily representative of the data pipeline that you would traditionally have, particularly for streaming scientific data. When we did things like that counter browser, a lot of data folding and processing went into it; the same with the Measurement Lab internet speed visualization. So there's a lot of processing required for some of these things. They might not all be suitable for a real-time environment in WordPress, but there's nothing to stop us from tying WordPress into other systems that may handle and expedite some of that. And also I would say that through a combination of polling, or, I don't know if anyone's ever figured out a way to do WebSocket stuff with WordPress, but you could certainly poll an API and pull in changes as they occur fairly easily. So I'd love to talk about it more. I'm just very curious about the challenge.
Thanks.
Wow, thank you. That was amazing. I'm curious, for folks that are in general web services, creating sites for clients and things: are there little ways that they might be able to integrate some of these things that are big wows, whether it's a little widget in the admin area or something on a site, that you have ideas around?
Many of the sponsors of this event are hosts, and hosts usually have great dashboards. I think that, from a wow-factor standpoint, adding some flashy charts to your dashboard rarely hurts. It's not necessarily super useful, but it's going to impress somebody at some point, so if it makes it easier to make a sale, sure, go for it. In terms of actual utility, some of those charts can be very useful.
Things like site uptime and latency are good things to know about. But also, even just for little plugins: I was focusing on editorial concerns that I see as being very underserved in visualization. We have a lot of tools to tell us things about stuff we've put out in the world, but not necessarily our editorial content. And so I'd say that if you are defining a data type in WordPress, there's probably some way to slice that that would be interesting to present.
Thank you. All right. Last question?
Hey, I have a question, yes. So how do you decide what data not to visualize? Or how do you prioritize? Because we generate north of 60 gigabytes of data a month, so it's very easy to get pulled into vanity metrics. So how do you prioritize and decide what is important and what is not important? Thank you.
Good question. Prioritizing data should be driven by your goals. So as I mentioned at the beginning of the presentation, a visualization is made to answer a specific question. And different types of question are best answered by different types of graphics, and different types of graphics are supported by different slices of data. And so figuring out what major business questions you have about that 60 gigabytes of data is going to help funnel things down: are you concerned about matters of consistency, or are there major outliers in the data set, or are there questions that you have about being able to drill down into it? Because all of those are going to suggest a different type of visual tool. There's a lot of stuff written out there about how to pick a visual form for particular types of data. There are many different diagrams that can represent things like uncertainty or a range of responses. So I would maybe just look around for that and see whether you can find a particular question that you have, and then look around to see whether anyone else is answering that in a way that you find compelling.
Cool. I think that's our time.
Great. Thank you very much. Give him the biggest applause. I got a gift.
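The answer to the classroom-voting question suggested polling an API and pulling in changes as they occur. As a rough sketch of that idea only, and not anything shown in the talk: the function name `poll_for_changes` and the shape of the records (dictionaries with an `id` key) are assumptions, and `fetch` stands in for whatever call retrieves the latest items, for example an HTTP GET against a REST endpoint.

```python
import time


def poll_for_changes(fetch, interval_seconds=5.0, max_polls=None, sleep=time.sleep):
    """Call fetch() repeatedly and yield only items whose 'id' is new.

    fetch is any zero-argument callable returning a list of dicts with
    an 'id' key (hypothetical shape). max_polls=None polls forever.
    """
    seen = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for item in fetch():
            if item["id"] not in seen:
                seen.add(item["id"])
                yield item
        polls += 1
        # Wait between polls, but not after the final one.
        if max_polls is None or polls < max_polls:
            sleep(interval_seconds)


if __name__ == "__main__":
    # Fake successive responses standing in for a live endpoint.
    responses = iter([
        [{"id": 1, "vote": "A"}],
        [{"id": 1, "vote": "A"}, {"id": 2, "vote": "B"}],
    ])
    for vote in poll_for_changes(lambda: next(responses),
                                 interval_seconds=0.1, max_polls=2):
        print("new vote:", vote)
```

Each newly seen item could then be pushed into whatever chart is on screen. Polling trades some latency for simplicity; a socket-based approach would deliver changes sooner but needs infrastructure beyond a stock WordPress install.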