We are live. Excellent. Hello everyone, this is the tech talk from the Wikimedia Foundation about graphs, graphs, graphs. About a year ago Dan and a few other people got together at a hackathon and decided that Wikipedia could really use some graphs, so they implemented a very simple test project to bring Vega-based graph definitions to Wikipedia. The project was successful at the time, but it needed a lot more work to be productionized. About half a year later, on the Zero team, I needed to implement some graphs on wikis, and that's when I discovered the project and began my quest to bring it to the wikis: first to the Zero wiki, which, being a small wiki, was much easier, and then to full-scale deployment, which happened a few days ago. From here I will switch to my slides, if I can find them.
So, graphs. Up to this point graphs were done either as PNGs or SVGs. They were not easy to modify, because once you upload an image to Wikipedia you're stuck with it: to edit the graph or add data you have to go back to the original data source and the original program and recreate the graph. That was not very convenient, hence this project.
This is the implementation. It consists of a graph tag, which contains a graph definition. It supports the vega.js grammar, which is developed at the University of Washington and which you can read about on the vega.js website, and it also supports template parameters, which brings it into the wiki world. Once the definition is resolved and becomes a proper graph, it gets stored in page properties, which later gets rendered either on the client side
by the vega.js library. Unfortunately, that process is fairly slow because it requires a large download, but on the other hand it's much more flexible. Or it can be rendered on the server side. At this point most of the graphs are rendered on the server side, except when you're editing and you click page preview.
This is the overall schema of how things work. The wiki markup contains the definition. From there the parser puts it into page properties and also into the HTML. The HTML gets updated only for the page preview, whereas page properties get updated all the time. The HTML gets sent to a modern browser, which uses the JavaScript libraries to render it. The page properties, on the other hand, get used by the Graphoid service, the middle block on the right. That's the service that pulls the data and renders it into a PNG or, later, an SVG.
Now, Vega. Vega is a fairly complicated grammar. I tried and failed to become fully proficient with it. The problem with Vega is its complexity; the beauty of Vega is its power. It can express pretty much any graph for any data you can throw at it. There are some additional tools coming out, which I think Dan will talk about a bit later in this presentation, that make it much simpler.
There are data sources: you can either embed the data inside the definition or reference it as external data. Then there is the transformation system, which can transform that data using JavaScript expressions, or do various data manipulations such as zipping two data sources together, filtering, cross products, grouping and many other things. It can also do visual encoding transformations, which is where it prepares data for visualization. Because graphs can be maps, maps being just another set of data, it can do geo encoding and geo transformations.
It can do pie preparation, and it can do things like calculating where each point should be drawn if you have a stacked graph.
Lastly, there are two more aspects. There are scales, which are an important concept. Once you have the data, it needs to be spread across an axis. You can think of it as a projection: you have the data on the y axis, for example, and the scale looks at that data and says, this is the transformation each value should go through in order to become an actual pixel on the screen. And there are marks, which is where you say how the data should be drawn: as a line, a shaded area, bars, stacked areas or many other things.
Now for the fun thing. The simple way to use graphs is to embed data directly: you say "values" and specify the values directly in JSON format. Or you can set the data up as an external URL, where you can pull the action=raw content of some other page, or even an API call. There are more fun things like template parameters, where you can use a named parameter, with or without a default value, or much more complicated parameters built from MediaWiki expressions that expand into values. And lastly, something that I recently started using: a Lua module that prepares the data and embeds it right there in the graph. You basically use the direct-embedding method, except that a template uses a Lua module to create the data on the fly. That data can be pulled from another page, for example, or from multiple pages and combined in Lua, and Lua just gives you the final result, as I'll show in a demo momentarily.
Okay, demo. First of all, let me check whether there are any questions so far. Has anyone been asking questions?
Yeah, there's one question: the doc mentions the ability to do word clouds, but I couldn't make it work. Does Yuri know if that's supported?
Yes, word cloud. Word cloud is not enabled yet. That's because it requires an additional library that wasn't part of the initial deployment, and I need to put it through the regular security checks and other things like that. Once it goes through, it should be fairly easy and quick, and I'll deploy it everywhere. Any other questions so far?
No, go ahead.
All right, screen sharing again. Right underneath the Extension:Graph page there's a demo page that has all the little toys I've played with so far. I'll start at the bottom, because those are actually the simplest graphs. This is the absolute simplest graph you can come up with. It was taken directly from the Vega examples. There's no external data, and I copy-pasted the actual definition of the graph here as well. There are some basic setup parameters like width and padding. There is the data source, which is embedded right there in the definition. There are scales, which explain how the x and y coordinates should be computed. It also defines the axes, which is very simple and straightforward, and the marks, which say that the type of mark should be a rectangle and set up where the x, y, width and height should come from, plus the color of the graph. If you click on edit source here and go to show preview, you'll see that the graph actually becomes live. That's because at this point, and it might change in the future, the preview gets rendered locally as opposed to by the Graphoid service. Let's leave this page; we're not going to edit it now. Okay, next.
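The pieces of that simplest definition (setup parameters, an embedded data source, scales, axes, and a rectangle mark) can be sketched as follows. This is only an illustration in the style of the Vega 1.x bar chart example: the data values, sizes and fill color are made up, the exact field and property syntax differs between Vega versions, and `wrap_in_graph_tag` is a hypothetical helper, not part of the extension.

```python
import json

# Illustrative values only; they are not the data from the demo page.
values = [
    {"x": "A", "y": 28},
    {"x": "B", "y": 55},
    {"x": "C", "y": 43},
]

spec = {
    # Basic setup parameters.
    "width": 400,
    "height": 200,
    "padding": {"top": 10, "left": 30, "bottom": 20, "right": 10},
    # Data source embedded directly in the definition.
    "data": [{"name": "table", "values": values}],
    # Scales describe how data values become pixel positions.
    "scales": [
        {"name": "x", "type": "ordinal", "range": "width",
         "domain": {"data": "table", "field": "data.x"}},
        {"name": "y", "range": "height", "nice": True,
         "domain": {"data": "table", "field": "data.y"}},
    ],
    "axes": [{"type": "x", "scale": "x"}, {"type": "y", "scale": "y"}],
    # One rectangle per data row.
    "marks": [{
        "type": "rect",
        "from": {"data": "table"},
        "properties": {
            "enter": {
                "x": {"scale": "x", "field": "data.x"},
                "width": {"scale": "x", "band": True, "offset": -1},
                "y": {"scale": "y", "field": "data.y"},
                "y2": {"scale": "y", "value": 0},
                "fill": {"value": "steelblue"},
            }
        },
    }],
}


def wrap_in_graph_tag(definition):
    """Serialize the definition as JSON and wrap it in a <graph> tag."""
    return "<graph>\n" + json.dumps(definition, indent=2) + "\n</graph>"


wikitext = wrap_in_graph_tag(spec)
print(wikitext)
```

The printed text is what would be pasted into a page; whether a given wiki accepts this exact syntax depends on the Vega version it runs.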
The next graph from the bottom is a slightly more complex one. This is a simple graph defined in the graph examples template, and it takes one parameter, which is the color of the graph. You can click on it, there it is, and we can edit source again. This time it's an arc graph. Again, the data is embedded. There's a simple transformation to allow pie chart drawing, and basically the only difference from the previous graph is the template parameter expansion, which uses either the ccc color or the first parameter. Simple, not very exciting. Moving on. Now, this one will become possible if we ever allow proper API calls from either Lua or some other method. Yes, you had a question?
Yes. That example doesn't have a graph tag around it because it's in that special graph namespace. So I'm wondering, are we going to configure a graph namespace on every wiki, so that if you just put JSON in it, it'll say, oh, this must be a graph? And if you don't do that, where do you put the JSON if you don't want to fill up every page with big chunks of JSON?
That's a great question. On Zero we had graph namespaces initially. I envisioned the graph namespace as the primary way of doing it, and the graph tag was added as an afterthought. But the more I experimented with it, the less I liked the graph namespace idea. The reason is that the graph namespace is strict JSON, whereas graphs are mostly useful as templates, as I will demonstrate in a second. So having a separate namespace just for graphs, without any actual benefit from the syntax processing, becomes fairly pointless.
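The template parameter expansion mentioned for the pie chart (use the first parameter, or fall back to a default color) can be imitated with a small sketch. This is not MediaWiki's parser, only an emulation of the `{{{name|default}}}` syntax for flat, non-nested parameters; `expand_params` is a hypothetical name, and writing the default as `#ccc` is an assumption about how the "ccc color" appears in the template.

```python
import re

# Matches {{{name}}} or {{{name|default}}}; nested braces are not handled.
PARAM = re.compile(r"\{\{\{([^{}|]+)(?:\|([^{}]*))?\}\}\}")


def expand_params(text, args):
    """Replace template parameters in text using args, honoring defaults."""
    def repl(match):
        name = match.group(1).strip()
        default = match.group(2)
        if name in args:
            return args[name]
        if default is not None:
            return default
        # No value and no default: leave the parameter text untouched.
        return match.group(0)
    return PARAM.sub(repl, text)


# A fragment of a graph definition whose fill color is a template parameter.
fragment = '"fill": {"value": "{{{1|#ccc}}}"}'

print(expand_params(fragment, {}))                  # falls back to the default
print(expand_params(fragment, {"1": "steelblue"}))  # first parameter wins
```

Because the expansion happens before the JSON is parsed, a graph stored as a template can be reused with different colors or sizes without touching the definition itself.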
What I found most useful, on the other hand, is to store graphs as their own templates, because they are just as easy to embed: you use two curly brackets and put the name of the template in. Except that now it falls under the general concept of parameter expansion, with all the benefits of templates. So because of that, I've decided that the graph namespace is more of a hindrance than a benefit. It's better to put these graphs in their own templates, wrapped in a simple graph tag. And if you ever need to add something that cannot be expressed by the graph, like a little header or title right above it, it still fits perfectly inside that template. Hope that answers the question.
Switching right back. These are the API results. Here I try to show the contributions by users, basically a way to engage authors and editors by giving them more credit. The first graph shows, after some filtering, the edits to the Barack Obama article, with each user's edits stacked one on top of the other to signify the increase or decrease in article size. It shows the top 20 editors. And this is a different one: this is the Douglas Adams article, and it shows the absolute contribution, who made the biggest contributions. By the way, that spike was due to Google running a Google Doodle about Douglas Adams.
And now I move to what are, in my opinion, the coolest graphs: graphs that use geographical data. I took a world definition JSON. You can see it's a really crazy JSON right here, basically a whole bunch of JSON data. Some day I hope we'll be able to visualize it properly instead of showing it like broken wiki markup. I then use that JSON blob, which is nothing but the definition of all the countries, together with various additional sources.
So for example, let's say I want to highlight individual countries. I would define a simple JSON parameter like this: BR, Brazil, in pink; US in blue; China in red, whatever. You can put hex colors here, that's fine too. Now, Vega accepts data in a different format, something like this (let me make it bigger in case it's hard to see): id is the country, v is the color; id, country; v, color. So, two-parameter items. The simplest thing I did was first to write a Lua module, GraphUtils, with an expand dictionary function that basically converts this format into that format. Once it does that, I pass the result of that invocation into the map. This is the result of the invocation, as you can see. And if we look at the source of this map, it's not as complicated as it seems. Right there is the source of the invocation; it's actually fairly straightforward. There's the map itself, then this parameter right there, which expands the data, and an additional parameter, just for the fun of it, to make the map slightly smaller, because otherwise it takes up the whole screen. And that generates this graph.
An even more interesting graph is when you have very complex data. For example, here I have a huge blob of data that I copied from the UN, which is the world population by country, from the 1950s up until 2010, I believe. Yes, 2010, for each country. I recoded it by country code; initially it was keyed by country ID, which I think is actually better in the long run. For that I wrote a couple of additional GraphUtils functions, like parse TSV, to which I basically say, okay, give me just the column for 2010. So this graph is basically the world population distribution by country from the 2010 data. And if I wanted a different year, all I have to do is change the number. I would say 1950 or 1960, then show preview, and it works. And this is a slightly different map.
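The two data-preparation steps described for the maps can be sketched like this. The talk's helpers are written in Lua; the Python functions below (`expand_dict`, `tsv_column`) are hypothetical stand-ins that only mirror the described behavior, and the TSV rows use placeholder codes and numbers, not the UN figures.

```python
def expand_dict(mapping):
    """Turn {"BR": "pink"} into the row format [{"id": "BR", "v": "pink"}]."""
    return [{"id": key, "v": value} for key, value in mapping.items()]


def tsv_column(tsv_text, key_column, value_column):
    """Pick one value column from tab-separated text, keyed by key_column."""
    lines = [line for line in tsv_text.splitlines() if line.strip()]
    header = lines[0].split("\t")
    key_index = header.index(key_column)
    value_index = header.index(value_column)
    rows = []
    for line in lines[1:]:
        cells = line.split("\t")
        rows.append({"id": cells[key_index], "v": float(cells[value_index])})
    return rows


# Highlighting individual countries, as in the first map.
highlight = expand_dict({"BR": "pink", "US": "blue", "CN": "red"})
print(highlight)

# Placeholder table: made-up codes and numbers, one column per year.
sample_tsv = "\n".join([
    "code\t1950\t2010",
    "AA\t1.0\t2.5",
    "BB\t3.0\t4.0",
])

# Changing "2010" to "1950" is all it takes to draw a different year.
print(tsv_column(sample_tsv, "code", "2010"))
```

Both functions return a list of `id`/`v` rows, so the same map definition can be fed either a handful of highlighted countries or a full per-country column.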
I mean, we should probably compare them side by side, but because this is growth, the difference is not that pronounced. As you can see, the colors are slightly different, but not by much: for example, this is slightly paler, this is slightly more pronounced. In reality you will need much more design work to make this data stand out and be more usable than this. I'm not a designer, so I didn't even try. And this brings me to the conclusion of my little demo, and more time for questions.
There is a question from IRC; let me read it. Can we make the graph namespace do template expansion? Wouldn't that be the best of both worlds?
We could, but that means that if we wanted to wrap the graph in something else, like a small box around it, or put it on the right-hand side or whatever, you would still use a regular template. Which means half of the time you'll invoke graphs and half of the time you'll invoke templates. And as I said, there is no real benefit from the graph namespace.
Other than separating graphs from everything else, that is, which doesn't really do much, because you can still use a graph tag. From my experience, from everything I've used graphs for on Zero, there is really no benefit in having a graph namespace. It just confuses, because it puts graphs in two different locations, but at the end of the day you still use them as templates. So why not just have them as templates, and maybe have a convention to call them something like Template:Graph followed by a name? That solves it: it would look identical, and the only difference would be a small graph tag wrapped around it.
There's a question in the room about different kinds of projections.
This one is Mercator, but it supports any kind of projection that D3 supports, because Vega is based on D3. So, switching to Dan, and then we'll continue the presentation.
Hey guys. I was going to walk through an example of writing a Vega graph from scratch. Graphs and pie charts are useful, I think the templating approach is really useful, and I love the stuff that you have. But there's another thing that you can do with this extension, which is to tell a story. So that's what I want to do. I want to show you guys something maybe too ambitious, but let's see if it works. Are people here familiar with the Napoleon visualization? It's basically correlating three data sources and showing Napoleon's army going into Russia and what happened to them as a result. On Labs we installed this; I'm going to paste it into IRC and the Hangout.
And you do realize I'm in Russia, right?
I realize you're in Russia. So, Lyra is a visual editor for Vega specifications. It's kind of beta quality right now. It's definitely wonky, and some things you have to do in a pretty specific order, but let's give it a shot.
We want to look at the army. There are some data sources provided here for convenience, but you can add your own. It has a new data source feature, which doesn't work very well with external URLs right now, but you can just paste in the values. We're also going to want to look at cities, and at temperatures. Temps and cities, okay.
Lyra has layers. We're going to make an army layer. There are basically visual things at the top here that you drag down, transform in some way with data, and compose your graph that way. We're going to first drag in some text and drop the city into the text property. You see there's a bunch of text on top of each other there, and what we want to do is call this layer cities and distribute these things geographically. So I make a geographic projection, which transforms the data. There are some settings here that I'm going to leave for later; I'm not going to explain too much. Basically I take the latitude and longitude from the data source and put them into the projection, and what this does is generate x and y coordinates that we can then drag and drop onto the text. You can see it got distributed in the x dimension, and now it gets distributed in the y dimension. So these are our cities, plotted kind of geographically.
We're going to take this font, make it black, and put dots under the labels. We're going to lift these up a little bit. Oh, and Moscow disappeared.
We don't want that, so we're going to make this a little bit wider. Now we're going to add symbols. They have the same geographic projection, so we take the x and y and place the symbols next to the city labels. We turn them into circles and make them a little bit smaller, so you can see exactly where they are. These aren't going to be offset; they're exactly at the geographic location.
What we want to visualize is the army and its size as it goes through these cities. So we go to the army data and transform it, grouping by the group in the army and the direction they're going; they're basically going to march towards Moscow and come back out. We add this transform to the pipeline and make it windowed. Then we add a line. This does something weird in Lyra; this is one of the wonky things, you have to erase this mapping. Don't worry too much about it. Basically we're looking at troop movement, and what we want to do here is also transform it geographically, with the latitude and longitude from that data set, which is a different data set, so you can really see how you're correlating two totally different data sets here. We have to replicate this sort of magical setup that you arrive at by playing with it for a while, and it gives us an x and y again. So basically we're going to plot it. There you go, that's the troop movement. That's not very interesting, so why don't we make it thicker based on the army size?
We're going to make the thickness of the line, as they move into Moscow, follow the size of the army. We have the size property here, and we drop it into the width. We're going to make the range a little bit different, from 1 to 50, so that it's a little more pronounced, and instead of butt caps we're going to round the ends off. That's starting to look a little more like the actual visualization, but this color is a little weird. So why don't we take the direction of the army and use it to change the color going into and out of Russia. What did I do? Oh, I accidentally moved the color to the width, so let's put the size back. In here, let's pick something more brownish, and something more blackish. Pretty ugly colors, but we're not going to worry about that right now. And the cities are underneath, so we can drag those and make them more visible by putting them on top.
So there you go: now you've got troop movement, and the thickness of the line is the size of the army. You can see that going into Russia they started really thinning out, and this black line, because color is mapped to direction, shows them coming back out. They end up basically completely decimated.
What we want to show now is the temperature, to get an idea of what was going on. So we add a new layer and call it temperature. Let's make it a little bit wider. It's going to show up down here, and basically what we want to do is take the temperatures and plot some text. We want the date as the text, and we also want to position the dates geographically. But we're going to do something a little bit different.
We're going to add a transform here, a formula, and just set 55 as a flat latitude, because pretty much all the temperatures were measured around 55 degrees. So now we have a new field called lat, and we can use it as a static value, because we don't want the dates to oscillate up and down, while using the longitude from the data itself. Same magic values here, and we have x and y from the transform. We put the x over here, so you can see the dates are spread out now. For the y, though, we want to use the temperature, to show how cold it was.
Let's add a scale here. Sorry, an axis on a scale. We're going to make a y axis: we take the scale that was generated for the temperature and drop it into the axis. We put it on the right side, which looks a little bit better, and offset it a little more. We can play with how many ticks it shows, so let's use just four, and we can even add a little grid. There you go.
Now let's plot the temperature as lines. We drop a line in here. We have the geographic projection already set up, so we take the x from it, and the y is going to be the temperature. Let's make it linear. The text is kind of messed up; I don't like it over there. So I go back to the date text and offset the y by 15 pixels, so that it's a little bit under the line. And there you go.
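Two of the mechanisms used in this walkthrough, the geographic projection and the linear scales, can be sketched numerically. This is only an illustration of the ideas, not what Lyra or Vega execute internally: `mercator` is the plain spherical Mercator formula without the scaling and translation a real projection applies, `linear_scale` is a hypothetical helper, and the temperature rows and scale domains are illustrative placeholders rather than the demo's data set. Only the width range of 1 to 50 and the flat latitude of 55 come from the talk.

```python
import math


def mercator(lon_deg, lat_deg):
    """Unscaled spherical Mercator projection of a lon/lat pair."""
    lam = math.radians(lon_deg)
    phi = math.radians(lat_deg)
    return lam, math.log(math.tan(math.pi / 4 + phi / 2))


def linear_scale(domain, output_range):
    """Return a function mapping the domain linearly onto the output range."""
    d0, d1 = domain
    r0, r1 = output_range

    def scale(value):
        t = (value - d0) / (d1 - d0)
        return r0 + t * (r1 - r0)

    return scale


# Line thickness follows army size; the domain here is an assumed example.
width_scale = linear_scale((0, 300_000), (1, 50))

# Temperature becomes a vertical pixel position (assumed 100 px tall band,
# inverted so that colder values are drawn lower).
temp_to_y = linear_scale((-30, 0), (100, 0))

# Illustrative rows only: longitude and temperature, no latitude.
temps = [{"lon": 37.6, "temp": 0}, {"lon": 32.0, "temp": -21}]

for row in temps:
    row["lat"] = 55.0  # the "formula" step: a flat latitude for every row
    row["x"], _ = mercator(row["lon"], row["lat"])  # only x is used
    row["y"] = temp_to_y(row["temp"])  # y comes from the temperature scale

print(temps)
print(width_scale(150_000))
```

Fixing the latitude keeps the dates on one horizontal line while the projected longitude still lines them up with the troop positions drawn above.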
That's basically the Napoleon visualization. It's pretty famous. It combines lots of different data into one thing, and if you get familiar with it you can really appreciate how much it's communicating. And the cool thing is that from Lyra I can now export this with inline values, paste it into my sandbox here and check out the preview. There you go: you can publish it directly to your wiki. So, yeah. Do you want to go back and ask if there are questions?
That was awesome.
Is Lyra built on top of Vega? What's the difference between Lyra and the Vega interactive demo site that I think we have a link to?
Let me share again. You're talking about this editor here, right?
Yeah. How is the one you were in different from the Vega live editor?
This is the Vega live editor you're talking about.
Yeah, can you explain what the two of them are for?
Yeah. (If you could mute the room, sorry, it's echoey.) So Lyra is just wonky. It does weird things; it generates random scales that you want to get rid of. I don't know if that came through, but I was doing things in a very specific order, because if you don't do them that way it kind of messes up. The Vega editor is a lot harder to use, because you basically have to keep in your head the whole concept of transforming the data into the shape. But it's not going to do stuff that you're not expecting; it's going to do exactly what you say in the specification here. So they're for different things. If you're trying to create something completely from scratch and you're pretty comfortable with the concept of how it works, I would use this. But if you're trying to learn, I would look at the examples that are in Lyra, play around with stuff there, and then come back to this editor and compare. Based on what we just went through, you can see here the magic values that I kept typing in: these are the transformations for the geographic projection, and you can see that the temperature data set is using that transformation. So it'll be a lot more familiar to you if you start out in Lyra and then come in here and try to understand it that way. I think in the long run there's going to be more focus on stuff like Lyra, on semantic editing, and on trying to get to a higher level, away from Vega. But the fact that Vega exposes basically a blank canvas for whatever you want to do is really powerful, I think, and can really let us do whatever we want.
So can you go the other way? In other words, if I see a graph and I want to tweak it, but I don't understand the JSON, can I paste it into Lyra and have Lyra sort of reverse engineer what it's doing?
They claim you can in their README, but I haven't been able to find a place for it in the interface. But it's open source, and those guys are pretty friendly. Just to give you an idea, the people working on Vega are at the University of Washington, and one of the research grad students there interviewed both Yuri and me, yesterday and the other day, to ask how we would improve debugging in Vega. They're really involved and are putting a lot of effort into this. I would recommend checking out uwdata/vega and uwdata/lyra and filing issues with them or whatever you want. I'll paste them in IRC.
According to the Vega people, they are trying to create a standard for sharing graphical information. Eventually their idea is that people won't share Excel documents with graphs embedded in them, but will share a definition of a graph together with the data, in the Vega format. Okay, I don't see any other questions.
The really cool thing with this is that people are generating graphs, or rather tables of information, all the time, and now there's a way to just put the graph into a wiki page. With Rachel's survey of the hackathon, she got the data from a Google survey, and then I realized, oh, you can put this in a graph, and then I think Yuri went and did quite a lot of work to transform it. So I wonder whether we should encourage people to do something so that this is a lot easier: rather than coming up with Google Docs spreadsheets and then trying to get something else to graph them, just some sort of general list of steps so that you can have graphs of your cool stuff show up on wiki. I don't know how we can make that easier.
Go ahead.
Well, the first step, which the communities have pretty much been asking me for directly, is to come up with templates: show them very simple templates for how to do x, y and z. Say they have a whole bunch of lines and they want to show them; can we come up with a prototype template that draws those lines? Then the community will modify it a little bit and make it more beautiful. But an initial template that does something is what the community needs most, so if we can come up with a set of the most generic graphs, that would be wonderful.
Another thing that has been surfacing much more now than ever before is the fact that we cannot include templates cross-wiki. That's a huge complaint, and with graphs it will become even more important, because with graphs you want to have the data on Commons, for example statistics for the world population or GDP or whatever else, or in Wikidata, but that's another story. And you want the graph itself stored on Commons as well, because you don't need 300 versions of the same template to draw a graph. So we definitely need to work harder on shareable templates, shareable or embeddable wiki pages, and possibly Lua modules as well.
I want to make a bit of a counterpoint. While that stuff is going to make a lot of things really efficient, I think one of the coolest things here is that you can look at a diff of someone changing a graph, if it's the raw graph and not the template, and really start to understand how the changes to the JSON change the visualization. I think a lot of people are going to want to keep doing that, as a sort of parallel thing. That template work is important, but so is the raw graph stuff.
I mean, it's going to stick around no matter what, but let's not forget that it's also a separate, cool use case.
Kevin has a question.
Yeah, this looks really exciting, it's great. What is the status? Where is it released or deployed, or is there a timeline? What's live?
It's all live. It's live on all the Wikipedias.
There's a question on IRC from Guillaume: has the communications team reached out to you about writing a blog post about all of this awesomeness? He thinks it really needs to be advertised.
Yes, they have. Heather is sitting right there in the back of the room; I can see her. I also saw someone making a wiki post about this, though I'm not sure it was the same person. Someone was trying to put graphs together to show how many Signpost releases there have been through the years, and I'm trying to find the link right now. So there's some advertisement already. Plus it was posted on the village pumps; I know it was posted in Russian and in English, and ca (what is it, Catalan, I think) is actively working on graphs as well.
I said I'd keep you here all day asking questions. About the raw data: for example, the Obama size changes data file looks like it's just a CSV file. Can you put data files like that on Wikimedia Commons and have them work?
It's actually pretty sophisticated. It can handle pretty much any format: JSON, GeoJSON and TopoJSON, CSVs, TSVs, whatever you want. And you can put them anywhere as long as they're within Wikimedia domains. Is that correct?
Well, that's a yes-and-no question. If you use the external URL, you can include any data source you want, but that data source has to be raw. In other words, you can say, just take this action=raw URL or API call and include it, and it can come from another domain. On the other hand, if you want something much more elaborate, which is a Lua invocation, then it has to be on the same wiki, because Lua, from what I know, cannot work with external data. So when you use an external link, you're severely limited in what data you can get and in what format, but you can get it from anywhere. Or you can use Lua to do any kind of data manipulation, but then the data has to be local.
Yuri, I have a question for you. Let's say I wanted to use an external file that gets updated every day, because then the graph would always be up to date. Except it wouldn't be, because Graphoid renders it. Is Graphoid re-rendering every day, or every week or so?
Currently it's re-rendering every 30 seconds, because I only enabled caching to avoid any kind of hot spot. Basically, if the graph gets placed on a very high-visibility article, I don't want the Graphoid service to die. So it will update. But eventually we need to decide how we want to handle this kind of external data scenario. We could do something like storing the whole graph, hashed, in a separate storage, basically coming up with a few extra SQL tables. There have been some Phabricator bugs discussing this, because at the end of the day we run into similar problems with links and categories, where graphs get out of sync, and there are a number of minor edge issues that we need to resolve to figure this out. But yes, it is possible to make Graphoid re-render everything every couple of hours or something like that, to be useful.
Rachel, we
can hear you.
So, your examples using an external URL are both CSV files, right? For the Adams count, you somehow created a CSV file containing those article edits. But because our API can return JSON structures, presumably you might be able to get a graph to just query our API directly for something like that, and somehow make sense of the JSON it returns, possibly with a Lua module, and then render directly from the API. That would be extremely cool.
Currently there is no way to insert Lua between the API call and the graph, or at least I haven't figured out a way. You can either call the API directly from the graph and use the graph-only transformations, or you can use Lua, but then Lua can only access static data stored in the wiki. That's something I've been advocating for a while now: Lua should have access to the API, to be able to do all sorts of very cool data extractions and visualizations.
Okay. I guess the other thing is that, depending on what the JSON structure returned by the API call is, I don't know if you can get Vega to understand which piece of it you care about. I guess you'd have to look at every API call and see whether Vega can understand the data.
Actually, someone has done something like that.
I believe it was on Commons, drawing a pie chart based on the API results. So it is possible; someone has already tried doing that. I can try to find the link as well and paste it. You can do this already. It's just that you don't have all the power of Lua to process the data, only the limited JavaScript transformations that Vega has. Don't forget that we had to severely restrict Vega for security reasons, so that it cannot execute arbitrary JavaScript.
Okay, I think that's it. Anything else, Dan, before we finish up?
Awesome, that was really cool. Thank you so much.