First we have Frederik De Bleser, who is the author of NodeBox, which hopefully you guys have seen before — and if not, don't look now, because you'll be distracted. He's currently pursuing his PhD in generative design at the Experimental Media Research Group at St. Lucas University College of Art and Design, and is working on a browser-based version of NodeBox, which I believe we're going to get a glimpse of. And our second speaker is Tom De Smedt, the author of Pattern, a web mining module for the Python programming language. Tom finished his PhD on computational creativity at the Computational Linguistics and Psycholinguistics Research Group at the University of Antwerp, and now works at EMRG. His current research interests include sentiment analysis and stylometry. Thank you. Hi. I'm Frederik, that's Tom, as was pointed out to us. And together we are the Experimental Media Research Group. We're situated in Antwerp, in Belgium. We have a couple of focuses. We focus on computer graphics, meaning data visualization, generative art, and user interfaces, and on artificial intelligence, through data mining, machine learning, and creativity. Our target — oh, sorry, that's us. Our target audience is a bit different, but I'll talk about that in a minute. I'm currently pursuing a PhD about the impact of generative design tools. So, the guy from Lyra, come talk to me afterwards. Tom got his PhD last year, and also developed the Pattern library, which we'll talk about. And then Lucas Nijs — we have to mention him; he couldn't be here today, but he's the head of our research department. Today we want to show you a little bit of the tools that we're using for graphic designers, and how we can do data mining through Pattern. Then we also want to present a case study to sort of explain what the tools are doing, and finally show you a demo. And just to make sure that I'm not running over time, I'll have a stopwatch here. So let's talk about NodeBox.
NodeBox is a free application that generates visuals by connecting nodes. That's the easiest way to describe it. It's been through a couple of revisions; the one you see here is version three. And it's a cross-platform desktop application, running on Mac, Windows, and Linux. We use it a lot with students, and through demonstrating it and using it with students, we found that it's actually really useful as a data visualization tool as well. This didn't come from us, actually. This was a student who wanted to do a data visualization using the software, and we thought it was crazy. But it turns out that it was actually a really good match for the tool. So we started focusing on that, and we do a whole lot of workshops. Last year I think we did four or five workshops all around the world, and we're open to doing more, if you like — mostly for graphic design students. So I think our target audience is a bit different than other people's, because we are really focusing on non-technical people. Here are some examples of work that students have created. This is a sort of standard visualization of shipwrecks off California over a 50-year period. You see some increase. They're colored by the category, or the way they crashed — some vanished, which is interesting. This is a really fun one. It's made by two students from Lithuania, and it focuses on the Eurovision Song Contest, which I'm not sure all of you have heard about. Anyway, you should look it up. It's awesome. It's like the campiest thing to come out of Europe ever. What's really interesting for us is that the voting process is so predictable. You get a lot of countries that sort of vote for each other, not because the song is really good, but just because they're geographically close and they're friends and they want to stay friends. We're from Belgium, so I've highlighted the votes from Belgium.
You see that the votes actually cluster geographically, which is kind of strange, because it's not really about the quality. Close to us are Great Britain, France, the Netherlands; far away are Poland and — I don't know what the other ones are, but we don't care about them. This is another one, also created during the same workshop, by three students — well, actually two students and one who got coffee for the other ones. They all got the same grades; it's kind of hard to separate them. So they took — I can show you — subtitle data from 50 years of film: all the speech data from films, through this subtitle information. And then they visualized word usage over time. So they visualized what words like "sir" or "tank" or "god" would look like over time. They overlaid it in Excel just to see, and it's kind of messy — the reason it's messy is that this is obviously not the best way to represent it. Also, they didn't want the exact numbers; they just wanted more of a feel for what all these things would look like. So they opted for a small-multiples approach, took that into the software, and basically said: well, this is about speech, so let's create a kind of word shape like a sound waveform — which you can do in NodeBox — and then distribute it over time. Here you see a close-up of words like "oh" and "okay", and then how "dad" and "mom" sort of move through time. It's really interesting to see the differences in word usage. Some of the visualizations I showed you — I don't really have a word for them; maybe one exists and I just don't know it yet — but I would call them graphs, in the sense that they use these typical graphical shapes like dots and lines and things like that. But often when we do these workshops, we have other students who want to do something else. Remember, these are graphic designers.
And so they use a concept that I call glyphs. A glyph is basically a parametric shape that they're able to tweak by changing the parameters — feeding data into the parametric shape. The best way to explain it is actually to show you an example. This is a project by a student from Finland, who made a font called the Evil Font, and what's interesting about this is that the control surface actually has an evil slider. That's not a slider you'll find in Illustrator anytime soon. And that's really interesting — I'll come back to that later, after I show the example. So this is at zero evil, and then we increase the slider and you get a little bit of evil: some barbed wire, some blood. And you get more and more evil: some razor blades, all-seeing eyes and teeth, and then the eye here appearing. So basically, by controlling the evil slider, you get this thing. The thing itself is not built in, right? We don't have an Evil node — not yet; maybe we should. But the interesting part is that the student was able to create it. This is something we focus on a lot: they can create their own functions, their own abstractions, basically, around the data they work with, and then link those to a data set they care about. Other people might want to visualize productivity using a monster, or something. This is really interesting because it's different from the other visualizations, in that they're not using the strict graphical objects that we associate with data visualization, but more abstract things. And I think for graphic designers that's really interesting, because they can work on the graphic aspects and make cooler razor blades. We also see usage of NodeBox in the generative art scene. There, NodeBox is used not for data visualization but for something else — generative art — which is basically where we came from.
So this is a project called Spamghetto, by a design agency in Italy called ToDo, who have worked for clients like Lavazza and Fiat. What they did is create a wallpaper based on spam. They took their spam folder and made these visualizations that show headlines of all the spam sort of flowing through it. And it's a wallpaper, so you can order it online. Even better, you can send them your own spam, and they will create a custom version with it. So spam is useful for something, at least. I'll give the word to Tom now, who will talk about Pattern and data mining. — Thank you, Frederik. The examples that Frederik showed all have something in common: they're all based on data, obviously. And there is a big and cool data set that is interesting to us, which is called the Internet. Now, the problem with the Internet is that the data is all unstructured. You don't get a CSV file or an Excel sheet. You get different languages, different dialects. You get facts, fiction, and opinions all mingled together. So how do you get this unstructured data into structured information — from natural language to a table that you can then visualize? The technique you use is called text mining, and I want to give you a very short introduction to text mining and how we are using it to visualize stuff. We have a Python toolkit called Pattern. It's documented online, and it's free for commercial use, so you can do whatever you want with it. It has a range of tools. There are tools for data mining — for getting content from Google, tweets, Facebook statuses, Wikipedia pages, and so on. There's functionality for text analysis, for example part-of-speech tagging, which is a syntactical analysis: it gives you information about the word type. Is it a verb? Is it a noun? Is it a noun phrase? Is it a collection of verbs? Or you can use it to do sentiment analysis — finding out if an opinion is positive or negative.
There are machine learning tools: support vector machines, vector space models, neural networks, and so on. But I'm not going to go into that. One short example of how you would use it. Suppose you were mining tweets from Twitter, and you get a tweet: "my new iPhone is amazing". You could use part-of-speech tagging in Pattern to find out the different word types, look at the adjectives, and deduce from the adjectives whether this is a positive or a negative opinion. I'm going to start with a small example first. The code is really simple. You have a Twitter class, and you have a parsetree command. In the first line, I'm creating a Twitter object for the English language, and I'm searching it for tweets that have the word "phone" in them. So I get a list of tweets. Then, for each tweet, I'm parsing its text. Parsing is part-of-speech tagging. What I get is a list of sentences. In each sentence there are chunks — words that go together: "the black cat" is one chunk; the words all belong together. And inside each chunk you have words. The words have a type, because they are parsed. In this example, JJ is the abbreviation for adjective. So I can filter the adjectives out of each tweet and print them, and then you get a collection of adjectives that are used when people talk about phones. Now, adjectives are interesting because we use them to convey our personal emotions. For example, we say a very good phone, a bad phone, an awesome conference, a horrible talk. So adjectives and adverbs tell us something about our personal feelings. And using this technique — just looking at adjectives and assigning them scores from plus one to minus one — you get pretty good accuracy for determining if somebody is saying something positive or negative. The only drawback is that it doesn't work for sarcasm. If somebody says "nice try, Obama", "nice" has a positive feel to it, but the expression as a whole is negative. Another code example.
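The code on the slide uses Pattern's real Twitter and parse APIs, which need the library and a network connection. As a stand-alone sketch of the same adjective-filtering idea, here it is applied to a tweet that has already been part-of-speech tagged (the hand-made tagged tweet is an illustration; JJ is the Penn Treebank tag for adjectives, the same convention Pattern's parser uses):

```python
# Sketch of the talk's adjective filter, without Pattern or Twitter.
# We assume the parser has already produced (word, tag) pairs;
# JJ / JJR / JJS are the Penn Treebank adjective tags.

def adjectives(tagged_sentence):
    """Return the words tagged as adjectives in a tagged sentence."""
    return [word for word, tag in tagged_sentence if tag.startswith("JJ")]

# A hand-tagged stand-in for a parsed tweet.
tweet = [("my", "PRP$"), ("new", "JJ"), ("iPhone", "NN"),
         ("is", "VBZ"), ("amazing", "JJ")]

print(adjectives(tweet))  # ['new', 'amazing']
```

With Pattern itself, the same filter would run over the word objects of each chunk in each parsed sentence; the list-comprehension shape stays the same.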
So I have a Twitter class, and I have a sentiment command. In a loop, I'm searching for English tweets that have the word "Obama" in them. For each tweet, I'm printing its text, and I'm printing some sentiment values for it. What you get is a list of tweets, and below each one some numbers that represent whether the tweet is positive or negative. So now, basically, we went from a natural language sentence to a numerical value that we can use in a visualization. To summarize: what is text mining? Text mining is big data mining. You mine data from Wikipedia, Google, Twitter, or somewhere else, and you do real-time text analysis on that data. You convert text to numbers. As one case study, we have a dashboard online that tracks what people say on Twitter about Belgian politicians. We have the amount of tweets per political party. We have the sentiment per political party. We have politicians per city, the top politicians, timelines of sentiment. And — I think it was the 2010 elections — we were able to predict the outcome of the elections two weeks in advance, before the official results were in, just by looking at what people online were saying about politicians. In another case study, we looked at news articles: all the Belgian newspapers, I think, over a three-year period — what was the amount of positive versus negative news articles about each political party? What you see is that the yellow-orange bars are the right-wing parties, and the dark part of each bar is negative coverage. And you see that Belgian newspapers consistently report more negatively on right-wing political parties. Whether you're left-wing or right-wing yourself, that doesn't seem entirely fair. So this was a big thing in the media when we published these results. What we're doing right now is new research: text profiling.
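The adjective-scoring approach described above can be sketched without Pattern: each opinion word carries a polarity between −1 and +1, and a text's polarity is the average over the opinion words it contains. The lexicon below is a made-up fragment for illustration, not Pattern's real one (which also handles adverbs, negation, and intensity):

```python
# Toy lexicon-based sentiment, in the spirit of pattern.en's sentiment().
# The scores here are invented; a real lexicon has thousands of entries.
LEXICON = {"amazing": 0.8, "good": 0.7, "awesome": 1.0,
           "bad": -0.7, "horrible": -1.0, "nice": 0.6}

def polarity(text):
    """Average polarity of the known opinion words in a text, in [-1, +1]."""
    scores = [LEXICON[w] for w in text.lower().split() if w in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0

print(polarity("my new iPhone is amazing"))  # 0.8
print(polarity("what a horrible talk"))      # -1.0
```

This is also where the sarcasm problem shows up: "nice try, Obama" scores positive, because the lexicon only sees "nice".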
So what text profiling does is get information about the author — not based on what he's writing about, but based on the way he writes. You can derive information about age, gender, the region the person lives in, personality, education, and so on. To give some examples: teenagers are more inclined to use informal language — chat language, smileys, curses. Adults use more formal language. Women are inclined to use more pronouns — we, she, her, my — in a social context; they talk about people. I'm generalizing, of course. Men use more determiners and quantifiers, in a practical context: they talk about figures, they talk about objects, more than women do. People with an extroverted personality will probably say, "we think it's awesome." People with an introverted personality will say, "I'm not so sure." So it's we versus I, positive language versus negative language. So let's try out a simple case study. We looked at what people comment on different US cable news companies — Facebook posts and tweets that mentioned CNN, MSNBC, or Fox News. The idea is: we gather the data; we use Pattern to do sentiment analysis, to try to find the gender of the person who posted a comment, and to try to determine the writing level — the complexity of the language used — and then use this data to make a visualization in NodeBox. We have about 130,000 statuses from Facebook, most of them from Fox News. We have very little data from CNN, so there's probably a bias in the data there. We have about 13,000 tweets, evenly distributed among CNN, MSNBC, and Fox News. So, some results: you get the percentage of male versus female; you get the amount of negative comments; you get the amount of rude comments — comments that are very negative, that contain swear words or are really harsh. And you get an indication of cruelty: cruelty means people who "like" rude comments. So it's not that spectacular.
The only thing that is a bit strange is that you have a very high percentage of males commenting on CNN. But it might be the bias, since we have very little data for it. In general, men are slightly more negative than women. Another experiment is looking at writing levels. A writing level of 100 would be The Cat in the Hat — children's books. A writing level of 30 would be a PhD thesis. And you see that there's some difference between CNN, MSNBC, and Fox News. Now, it is interesting that even though very few women comment on CNN, they are the ones that use the most sophisticated language. Well, I'm not going to draw any conclusions from that, because again, it can be the bias in the data. For Twitter, there's more negative feedback on MSNBC. There's a little bit of rudeness, but it's the same for everyone. And you see some difference in the topics that they cover. But hopefully that will be clearer in the visualization that Frederik shows. So I'm going to give the word back to Frederik. — Thanks. Let's talk about visualizing the data. What I didn't mention in the beginning, when I was showing student work, is that a lot of it was actually created in one-week workshops. Actually, all of the examples were created during one week, by students who had never used the software before, who can't program — they're graphic designers — and who basically have to learn the software in two days, find a data set, make a visualization, and then finally, on Friday, print it. So they don't even have the Friday to actually do something useful. It's a really short time, and given that, I think the results are really impressive. Now, I don't have a full week to show you how that might look in a visualization, but I'll show you what I can do in five minutes or so. We give the students a process that's based on the Visualizing Data book. Who here is familiar with this book? A lot of people — yeah, awesome. Awesome book.
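The talk doesn't name the readability formula behind these writing levels, but Flesch reading ease uses the same 0–100 scale (simple text near or above 100, dense academic prose near or below 30). A rough sketch, with a crude vowel-group syllable counter standing in for a proper one:

```python
import re

def syllables(word):
    # Crude heuristic: count groups of consecutive vowels as syllables.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def reading_ease(text):
    """Flesch reading ease: higher = easier to read."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    total_syllables = sum(syllables(w) for w in words)
    return (206.835
            - 1.015 * len(words) / len(sentences)
            - 84.6 * total_syllables / len(words))

print(reading_ease("The cat sat on the mat."))  # scores well above 100
```

Simple monosyllabic sentences score high; long sentences full of polysyllabic jargon drop far below 30, which matches the children's-book-versus-PhD-thesis scale described in the talk.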
So this book talks about Processing, but actually, the process he uses is applicable to any kind of visualization work. And I should mention — as Mike and other people mentioned as well — that this looks very linear; in reality, that's not the case, but you probably all know that. Let's pretend that it is, and let me talk you through the steps. Take the first three steps: acquire, parse, filter. What's going on there is that we basically find a way to get the data into the system. Acquiring we do through the APIs that everybody's using, or we might find open data sets — which is awesome — from the World Bank or from data.gov. We parse them, and we filter them. A lot of that is stuff we can reuse from pattern.web: Pattern has a really good web component that allows us to access a lot of this data programmatically instead of doing it manually. The next step, which is optional in some cases, is that we mine the data, and there we use what Tom talked about — the sentiment analysis module — which I'll also show you in a minute. And then come the last steps, and those are really where NodeBox takes over. At a certain point, we have a data set that's clean, and then we can take it into the software: represent, refine, and interact. I should stress that NodeBox really works best with clean data sets. It doesn't work as well if you have something that's still messy and needs cleaning up — that's really horrible to work with in NodeBox, and I think anywhere. So I want to show you a short demo of how that might look in the system. This is a video, and I hope it plays — I think I have to do something like this. Okay, there we go. First off, we import the data set. We can see here that it says — maybe you can't see it — FB-MSNBC. So this is the Facebook data from MSNBC that Tom got. There we can scroll through the data.
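The acquire → parse → filter steps can be sketched with standard-library tools; here the "acquired" data is an inline stand-in for what would normally come from an API or an open-data portal, and the column names are invented for illustration:

```python
import csv
import io

# Acquire: a tiny inline stand-in for downloaded data
# (normally fetched via pattern.web or an open-data API).
raw = """date,message,sentiment
2013-01-01,great show,0.8
2013-01-02,terrible idea,-0.6
2013-01-03,ok I guess,0.1
"""

# Parse: turn the raw text into structured records.
rows = list(csv.DictReader(io.StringIO(raw)))

# Filter: keep only the clearly negative messages.
negative = [r for r in rows if float(r["sentiment"]) < -0.5]

print([r["message"] for r in negative])  # ['terrible idea']
```

Once the data is this clean — one record per row, typed columns — it is ready to hand over to the represent/refine/interact steps in NodeBox.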
And the first thing we want to do is show this on a timeline. So we're going to take out the date. You can basically look at this data set as a tabular data set, so what we're going to do is look up the date column. This basically extracts one column out of the data — just the date column. Now, these are just numbers; we can't really do anything with them yet. So we're going to put them on some kind of timeline. And to do that, we use a time scale, which is similar to what D3 uses: we can set a minimum and maximum date and then output values that are going to be coordinates. We can use absolute values, or we can use things like "two weeks ago" — so it can parse some natural language — or "from two weeks ago to now", or the other way around. Then, to apply that, we convert our incoming data using the scale that we provide. If we click on that, we now see that these dates are converted to coordinates. So now we have coordinates that we can use, and we can attach them to a node called makePoint that takes two inputs, an X and a Y coordinate. If we map them to X, we see that they go in the horizontal direction; if we map them to Y, they go in the vertical direction. Makes sense, right? So we map them to X, and this is our timeline. These are just abstract values — points — so we want to map them onto ellipses. We change the ellipses, make them a bit smaller so you can actually see them. And you sort of see day-and-night patterns already occurring in the data. It's really subtle, but it's an emergent property of the data: people are not talking as much at night. The next part is that we want to visualize the sentiment values on the Y axis. We get those from the data — or we do the analysis — and then we attach them to Y.
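The time-scale node itself is part of NodeBox, but its mapping logic is just a linear interpolation from a date domain to a coordinate range; here is a sketch in Python (the function name and signature are invented for illustration):

```python
from datetime import datetime

def time_scale(dmin, dmax, rmin, rmax):
    """Return a function mapping dates in [dmin, dmax] linearly onto
    coordinates in [rmin, rmax] -- the idea behind the talk's
    time-scale node, similar in spirit to d3.time.scale."""
    span = (dmax - dmin).total_seconds()
    def scale(d):
        t = (d - dmin).total_seconds() / span  # 0.0 .. 1.0
        return rmin + t * (rmax - rmin)
    return scale

# Map January 2013 onto a 600-pixel-wide timeline.
x = time_scale(datetime(2013, 1, 1), datetime(2013, 1, 31), 0, 600)
print(x(datetime(2013, 1, 16)))  # halfway through the month -> 300.0
```

Each date column value fed through this scale becomes an X coordinate, ready to be connected to a makePoint-style node.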
And you see that there's only a tiny, tiny little bit of change. That's because, if we look at the data, we see that these are actually numbers going from minus one to plus one — so they're a bit too small. We have to do some kind of conversion. There are many ways to do that in the system, but right now we'll just use multiply, which multiplies by a constant value. We pipe that into makePoint — put the node in between — and then we can just interactively drag this up and see the sentiment values spread out, as it were. So this gives us the sentiment values. Because we're graphic designers, we can change all the properties, like the colors of these things. This happens in a separate node, so it's a separate step that we can also parameterize if we like. Then we can add a legend. The legend is going to be based on the scale, so we attach it to the scale. And if we render this — if we look at the output here — we see the scale appearing. Now, we can only see one thing at a time, so if we want to see both the points and the legend, we have to use a merge node. By connecting the two together, we can see them overlap — now they're on top of each other. So we go back to the legend, and we can just drag its position down to actually move the legend. We can't drag it left or right, because it's attached, so we just leave it that way. And then we can tweak the color as well, if we like — make it a bit more gray. So this is our basic, super basic visualization: the sentiment values over time. Now, what's interesting is that we can embed this into another context. To do that, we're basically going to make the visualization a bit smaller, so it will fit in a different context. We make it a bit smaller, and we change the number of ticks so that we don't get as many points.
And then we can press the embed button, and this gives us an iframe code, just like YouTube. We can take this piece and copy-paste it somewhere else. We set the size so it fits in what we were doing, and now we can basically just paste this into our text editor and save. Imagine that I'm working on a social media dashboard — this is what the dashboard looks like right now. If we refresh, we get the visualization that we just made. So it's really simple to embed it inside another context. Now, you see there's something wrong: I don't see the legend at the bottom. That's because it's a bit too high. So we can just go back and adjust the size of the legend. That also means we probably want to adjust the multiplication as well, so we do that, and then we go back to the other page, refresh, and we see the visualization as it should be. It's all live. What's interesting is that the data set itself is a parameter. That means — and this is kind of hard to see — that if we put in another data set value, another file that we have provided, we can just interactively make new ones; make them parametric. So we get the same visualization for the Fox News and CNN data, and we see that for CNN we only have very little data, just for one day. Here are some other visualizations: there's a word cloud, and there's gender distribution, also based on the analysis from the Pattern library. To quickly show you the word cloud: we sample the data, because we don't want to see all of it; we extract the keywords; we extract the number of upvotes; we turn these into text; and then we use a node called nudge that will basically make a word cloud by spreading things out to the sides. It's a bit of an experimental one. Male versus female is even easier: we just import the data again, group by gender, and then turn that into a pie chart, and we can make it bigger and smaller if we like.
So that's the basics of how you do a visualization in NodeBox. Now, this is cool for things like social media, but a lot of the data that we work with — and again, we're working with students — is different: they're not really interested in this kind of data, or not only in this kind of data. They're interested in all kinds of data, and everything is data. Music is data. Or they might want to play with sound waves, or with images, and do something with that. There's a whole lot of stuff I could show, and I'll point you to some websites with more work that our students have created. But one fun example I want to show you is where we use an image basically as a data set. Here we go back into the system, and what you see is a bit more complicated: we take in a basic image and extract the colors from it. This is a beautiful sunset. We can make it more detailed — but this is still the web, so we'll only make it a bit bigger in this case, until Chrome gets even faster. Then we extract hue, saturation, and value from these colors, convert them into values that we like, and use a makePoint node again to redistribute them. If we attach this to an animation — which we can do — then we can play it, and it will automatically redistribute the pixels according to the image. The dark ones go to the bottom and the bright ones go to the top, and you also see the hue spread. Now, because hue is often visualized as a circle, we use a different node called coordinates that maps these onto a radial layout, and now you see the hue circle of this image appearing. It's kind of slow because of the screen recording, but it actually works really well with thousands of points.
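The hue-circle mapping that the coordinates node performs can be sketched with Python's standard colorsys module. The function name and the choice to use saturation as the radius are assumptions for illustration — the talk doesn't specify exactly how the node maps the values:

```python
import colorsys
import math

def radial_position(r, g, b, radius=100):
    """Map an RGB pixel (0-255 channels) to a point on a hue circle:
    the hue becomes the angle, the saturation the distance from the
    center -- roughly the kind of mapping the coordinates node does."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    angle = h * 2 * math.pi  # hue in [0, 1) -> angle in radians
    return (radius * s * math.cos(angle),
            radius * s * math.sin(angle))

# Pure red has hue 0 and full saturation: it lands on the circle's rim.
print(radial_position(255, 0, 0))  # (100.0, 0.0)
```

Run over every pixel of the image, this produces exactly the hue-circle scatter shown in the demo: saturated colors on the rim, gray pixels collapsing toward the center.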
And because it's live, we can just keep playing with it. Here we can change how big the outer circle is going to be, for example, just by changing it, and here we can see the non-animated version — just the coordinates mapped onto the pixels — and of course we can change that as well. So that's another visualization that we did in the system. When we started out with NodeBox — that was in 2004, I think — it was a really different environment. We had web browsers, but basically, if you wanted to make something interactive, you would use Macromedia Director, which was awesome. But it's no longer supported these days. So we had completely different tools and tool sets, and basically everything on the web was slow; if you wanted to do something fast — fast for that day — then you would just do it on the desktop. So we started out creating these desktop tools, but as we went along, we noticed that browsers became faster and we could do more and more of this stuff. So we were facing this choice of what we were going to do, and in September last year I bit the bullet and said, okay, let's just do everything on the web. And so I created NodeBox Maak — "maak" is "make" in Dutch — a version of NodeBox that runs entirely on the web. That means that all of the nodes now run online; we can run them either in the browser or on the server side, in Node.js. The whole system is basically a live IDE, a live visualization tool that we can run online. Maak will be open source, but at the moment you can already sign up for the beta release that will come out soon. What I didn't talk about — I added this slide at the last moment, because I'm so used to it — is that there are actually two modes of working. There's the visual way, but what's also important is that there's a code way as well, so you can choose.
Basically, when you create a new — what we call — function, which is the main abstraction in the system, you can choose whether it's going to be a visual function or one that's written in JavaScript. They both use the same interface, meaning they have multiple inputs and one output. So it doesn't really matter if you use a visual one or one written in code, because you can just mix and match the two: the ones built from nodes can use ones written in code, and the other way around. That also means that it's completely transparent what you're doing: at every point you can view source and look at what this thing is built out of. And if it's built out of other components, you can dig all the way down to see what the primitives are. The idea is that you'll be able to clone them, make your own versions, and change them. That's really key to us, because we believe a visual approach works really well, but at some point — especially with the model that we are using — people will probably grow out of it, or they'll want to do some things themselves, and there we have to offer the possibility of using code as well. A little bit about the way the whole system is set up: we are part of an academy, and part of the job of the academy is to distribute tools to the greater community. So both Pattern and Maak are free and open source. You can report bugs and contribute code there, so feel free to cooperate. And then there are some commercial services that we're doing: Tom is in the process of commercializing the text profiling work that he was showing, and for Maak, what we're going to do is basically let people using it for open projects use it freely, and if they want to use it for private projects, they'll pay a small fee — basically for the hosting, for keeping our servers online. That's the same model that GitHub uses, more or less. A lot of these talks cover the positive aspects, and I did that too.
I also like talking a little bit about the challenges we still face — the things that are not going the way we planned. I just got back from doing a workshop in Lithuania, and one thing I noticed is that the students are self-reliant once they get to the NodeBox side, but the first parts — the acquire phase, the parsing, the filtering, the mining — still require a lot of intervention. So at some point we, as a community, have to find a way to make non-technical people self-reliant, able to get data in a flexible way. Maybe people here can point me to tools that I'm still missing, but I haven't found the perfect tool that I can confidently give them for that. We use Google Refine; that sort of works, but apart from that, there might be much more we can do. Another aspect that we think is interesting — because, again, we come from a graphic design background — is typography. One of the things that powered the Evil Font visualization that I showed in the beginning is that it has access to the raw shapes of the letters, because it has to take those points and change them. I looked around, and there wasn't really something available that allowed us to do that: there's no API on the web to do it, and there wasn't really a library for it either. So that was a big challenge for us. We could say, okay, let's give up and not do that at all — or we could do something else, and that's what we did. I created a library called opentype.js, which does the hard, tedious, and insane job of parsing TrueType and PostScript typefaces by literally looking at the binary data structures and examining them. It takes the whole typeface and gives you raw access to all of its elements, and I can show you a little demo of how that looks. This is the demo site that you can visit if you go to the website. And here you see the letter shapes.
You can choose a file, so you can use any typeface you want — we support almost all of them. I haven't tried all the fonts in the world, but a lot. We can view the metrics, of course, and then we can start playing around, because we have access to the letter shapes. We can do things like snapping them to a virtual grid, for example. This is also a node that's inside NodeBox, but here we just extracted it, to show you how fun it is to basically create your own typefaces in a really easy way. And this works with any typeface, so it will look different for each one. You get access to all the letters; Unicode is supported; we support kerning. And we had some recent contributions from people who created an inspector, which gives us something like the web developer tools in Chrome, but for fonts on the web. This was an external contribution, and it's awesome. I think it's a really useful tool, and we already see a lot of people using it commercially. We recently had a Kickstarter-funded project that used opentype.js for doing font design. Lastly, I just want to make a short conclusion, because what I find fascinating is that this whole data visualization scene is not just for experts like us. This is really — or it can be — opened up to any kind of people. We saw this with tools like Lyra, and hopefully with the tools that we are writing as well. I'm constantly amazed at the kind of things my students can do in one week, basically starting from scratch. And if they can apply that knowledge in one week, I think there are a whole lot more people who can apply this and make it useful for others as well. So that's it for us. Thanks.