I'm not a morning person, and I'm really happy that so many people are here. I'm not a morning person because I'm a nerd. How many nerds are here? Okay, a lot of software developers. Almost 50 percent. Okay, that's a good thing. I'm a software developer too, and I'm somehow stuck between worlds, currently between data and journalism, and that's why it's called data journalism.

Currently we're working at Tagesspiegel Data. It's a little bit complicated: we have two labels. One is Tagesspiegel Data, for journalism, and the other is Data Science and Stories, for other companies. So if a company gives us a lot of data that they want analyzed and shown to the public, that goes through the other label, Data Science and Stories.

To show you what we are actually doing, I just pulled up some websites, because we do everything on the web. Here's an organization called abgeordnetenwatch, and they made a project about the members of parliament and how much money they earn from different jobs. You can see there's this long list of names of politicians and amounts of money, and it's absolutely correct, but it's difficult to show what's in there if you just show names and amounts of money. Then there's this big news magazine, Spiegel, which tried to do a better job: also a list, but with some bar charts, like how much money the different parties earn, and also the individual members of parliament. But still, it's a complex topic. How can you condense it into a very simple visualization? That's important, especially on the web: people come to your website, and you usually have just two or three seconds to convince them to stay, because the whole fucking Internet is one click away. So you have to put it somehow into one single data visualization.

We did it like this. Every bubble is a member of parliament, and you can see every party: the CDU, the Green Party, the left-wing parties. You can click on them, and the size of each bubble is based on how much money that person earns. By presenting them as bubbles and putting them in this kind of chart, you can also see how much money a specific party earns; for example, the CDU, the conservatives, definitely earn more money than the left-wing parties. That's our approach: we try to find a visualization that combines all the different aspects of the story into one single graphic. Then it's quite easy to implement. But the first thing is that you actually have to find a way of visualizing it.

Here's something totally different. It's about a conference like this one. The conference is called re:publica, and they also have a Wi-Fi network. We asked the company that built the Wi-Fi network to give us their log files. They're anonymized, but we still know where all the Wi-Fi access points at the conference are and which devices are connected to which access points at which time. We made this small animation where you can see all the people coming into Stage 1; then there's the keynote, everybody likes it, and then everybody goes out to smoke, or to other presentations and stuff like that. You can scroll through the whole day, and the second day, and so on. You might be interested in what kind of people these are, and you can track them and see that people watching a presentation about data, or whatever, also go drinking beer. I'm not sure.
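The Wi-Fi analysis above can be sketched roughly as follows. This is a minimal illustration, not the team's actual code, and every name and sample value in it is hypothetical: it assumes each anonymized log record is a (device_id, access_point, timestamp) tuple, that each access point can be mapped to a room, and that a session is a room plus a time window. A device counts as part of a session's audience if it was seen in that room during the session, and two audiences are compared with the Jaccard index (shared devices divided by all devices seen in either session).

```python
from datetime import datetime

# Hypothetical anonymized log records: (device_id, access_point, timestamp).
LOG = [
    ("d1", "ap-stage1-a", datetime(2017, 5, 8, 10, 5)),
    ("d2", "ap-stage1-b", datetime(2017, 5, 8, 10, 10)),
    ("d3", "ap-stage2-a", datetime(2017, 5, 8, 10, 15)),
    ("d1", "ap-stage2-a", datetime(2017, 5, 10, 14, 5)),
    ("d3", "ap-stage2-a", datetime(2017, 5, 10, 14, 20)),
]

# Hypothetical mapping from access point to the room it covers.
AP_TO_ROOM = {
    "ap-stage1-a": "Stage 1",
    "ap-stage1-b": "Stage 1",
    "ap-stage2-a": "Stage 2",
}


def audience(log, room, start, end):
    """Return the set of devices seen in `room` between `start` and `end` (inclusive)."""
    return {
        device
        for device, ap, ts in log
        if AP_TO_ROOM.get(ap) == room and start <= ts <= end
    }


def jaccard(a, b):
    """Share of devices common to both audiences; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


if __name__ == "__main__":
    talk_a = audience(LOG, "Stage 1", datetime(2017, 5, 8, 10, 0), datetime(2017, 5, 8, 11, 0))
    talk_b = audience(LOG, "Stage 2", datetime(2017, 5, 10, 14, 0), datetime(2017, 5, 10, 15, 0))
    print(sorted(talk_a), sorted(talk_b), round(jaccard(talk_a, talk_b), 2))
    # prints: ['d1', 'd2'] ['d1', 'd3'] 0.33
```

Comparing device sets rather than head counts is what lets two sessions days apart be checked for the same audience; a low overlap across many session pairs would match the "people just stay in the room" finding.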
We actually ran some correlations on it and tried to find out whether two presentations share the same audience: are they actually the same people, even if the presentations are three days apart? But basically we found out that people are lazy and love to just stay in the room. So it's a good idea to usually keep one topic per room.

Do we have more? Of course, yes. There's a project called LobbyRadar. It has, I can't remember exactly, I think 10,000 people, groups and organizations here in Germany. So we have something like the Deutscher Bundestag, the parliament, and the CDU, the conservative party, and some smaller companies, and the yellow bubbles are persons, I think Helmut Schmidt and so on. You can click on them, and... it's broken. You click on them and nothing happens. It's a long story why, but I think I found a screenshot. No... it's an old screenshot.

We did it with ZDF, one of the biggest TV stations in Germany. The idea of LobbyRadar was to have a big database and see all the connections between people and organizations: somebody worked at a think tank, then moved to the parliament, and now he's a journalist, and so on. The funny thing is, we published the project with ZDF, and then they found out that some journalists at ZDF had also worked at think tanks, and that was a difficult issue. Then some politicians said things about it... yeah, whatever. ZDF loved the project, and then half a year later they dropped it. Now we have open-sourced it: we tried to put everything on GitHub so that we ourselves could make a copy and publish it on a website, but it crashes. I'm not sure why. It was a horrible project because of collecting all the data from different sources. I'm not even sure whether it's older than Wikidata.
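A database like LobbyRadar's can be thought of as a graph of people and organizations linked by dated affiliations. The sketch below is a hypothetical, minimal model, not LobbyRadar's actual schema (the talk does not describe it): it stores affiliation records with a start year, reconstructs a person's career path in order (think tank, then parliament, then journalism), and lists people who share at least one organization with a given person. All names are placeholders, not real data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Affiliation:
    """One hypothetical link between a person and an organization."""
    person: str
    organization: str
    start_year: int
    role: str


class AffiliationGraph:
    def __init__(self):
        self._by_person = defaultdict(list)  # person -> list of Affiliation
        self._by_org = defaultdict(set)      # organization -> set of persons

    def add(self, aff: Affiliation) -> None:
        self._by_person[aff.person].append(aff)
        self._by_org[aff.organization].add(aff.person)

    def career_path(self, person):
        """Organizations a person was affiliated with, ordered by start year."""
        affs = sorted(self._by_person.get(person, []), key=lambda a: a.start_year)
        return [a.organization for a in affs]

    def connected_people(self, person):
        """Other people who share at least one organization with `person`."""
        others = set()
        for aff in self._by_person.get(person, []):
            others |= self._by_org[aff.organization]
        others.discard(person)
        return others


if __name__ == "__main__":
    graph = AffiliationGraph()
    graph.add(Affiliation("Person A", "Think Tank X", 2008, "fellow"))
    graph.add(Affiliation("Person A", "Bundestag", 2011, "staffer"))
    graph.add(Affiliation("Person A", "Broadcaster Y", 2014, "journalist"))
    graph.add(Affiliation("Person B", "Think Tank X", 2010, "analyst"))
    print(graph.career_path("Person A"))       # ['Think Tank X', 'Bundestag', 'Broadcaster Y']
    print(graph.connected_people("Person A"))  # {'Person B'}
```

Keeping an index in both directions (person to organizations, organization to persons) makes both questions from the talk, "where did this person work?" and "who else was there?", cheap lookups.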
I'm not sure; it's about five years old. But it combines data from Facebook, Twitter, XING, official public data and so on. We tried to combine all of it, and there should actually be a box here that shows you the sources. Whatever.

Then maybe I can show you something else. These projects were from my previous company, OpenDataCity, and two years ago we moved here to Tagesspiegel as a whole team, as Data Science and Stories and Tagesspiegel Data. One of our first projects was this one. We found some old map data of Berlin from 1928, so almost 100 years ago, well, 90 years ago, and we made a small web tool. It's really simple, it's like Google Maps, but the funny thing is that on the left side it's 1928 and on the right side it's 2015. You can zoom in and see the Bundestag building, still there, but the Siegessäule, for example, moved: it's now over here. You can also see which parts of Berlin were destroyed during the Second World War, and you can see how the Spree moved, before and after.

Currently we are at Anhalter Bahnhof. Where is it? Where am I? This is Berlin, okay. Maybe I should look at it from that time. Oh, that's the Tiergarten. Okay, here is the Tiergarten, and the Tempodrom. So we are currently in this building, and you have probably seen the Liquidrom and the football court outside. But 90 years ago there was a big train station there, the Anhalter Bahnhof. That's why the station here is called Anhalter Bahnhof. If you look outside, you can actually see the remains of the old station, and in the stations down there they show you photos of how it looked at that time. The good thing is, when a train station is gone, you have a lot of space for making parks and planting trees and so on.
So maybe that's even better.

One of the most exhausting projects we did here at Tagesspiegel, I think, is the Wahl-Spezial. It's actually just a usual election portal, for the election in September here in Berlin, in Germany, and we wanted to make it really hard and really complicated, so we built an almost totally automated web portal system. I actually have a graph of that, I can show you. So here are the results. There's also a nerd view, where you can see how many people are not voting: not just the parties, but also the non-voters. The parliament itself: what does it look like, with every member? It was updated live during the whole election process. Coalitions. Here we have a map of the results for Berlin, and not just the current results: you can also go back in time and see how Berlin changes. There was also a referendum about Tegel airport in Berlin. The question was whether to leave it open or close it. Green means close it, blue means leave it open, and as you can see, we also added the flight paths of some planes. Obviously it's quite loud there, and people don't like it, so they want to close it down. Most of these graphics and visualizations updated automatically, and that was horrible.

I can show you a chart here, this one. On the left is the good part, the front end, like the paint of a car, and on the right is the engine. So, for example, here... Do I have a pointer? No, sorry. Maybe I'll take this one. So you can see on the right side a "T", for "abtippen", which means that somebody watched TV and typed in the numbers. That was the easiest interface between a TV and the database.
Of course. Then we have the local election authorities: for some of them we could write a scraper, and for others we still needed people. I think we had three or four people who typed in all the numbers from some websites and from the TV station. Then we have exit polls and stuff like that, also with scrapers, and Tagesspiegel itself: whenever there's an article or a tweet from Tagesspiegel, we fetch it and check whether it actually has something to do with the election, and then we publish it in the live ticker, using a live-ticker tool we have at Tagesspiegel.

Then we put all the data into the database. There's another website called Mandatsrechner, run by a guy who is kind of a professional at calculating what the parliament would look like: how many people are in there, which parties, and so on. So whenever we get new data, we send this guy a notification, "Okay, we have a data update", he pulls the data, calculates the new parliament, and sends the data back by calling a webhook on our side, so we can update our database. In the end, somebody publishes new data, and within five seconds we know the new parliament and can update every data visualization on our website, even with push: not a webhook this time, but a WebSocket, so every data visualization updates automatically. All you have to do is open the website at the beginning of the election night, and it will update the whole time.

This also included a CDN, which was really great. To make sure the servers won't crash when a million people open the website, we had a content delivery network, and the nerd who built it was really, really proud, because in the end it cost just $13.37. So it's leet: 1337. He's a nerd.
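The update loop described above (new numbers arrive, the parliament is recalculated, every open page receives the new state) can be sketched in-process as follows. This is a hedged illustration, not the team's Node.js engine: the Mandatsrechner round trip is replaced by a local function, the seat calculation uses the Sainte-Laguë highest-averages method purely as a stand-in (the talk does not say which method Mandatsrechner applies), WebSocket clients are modeled as plain callbacks, and the vote numbers are made up.

```python
import heapq
from typing import Callable, Dict, List


def sainte_lague(votes: Dict[str, int], seats: int) -> Dict[str, int]:
    """Allocate `seats` proportionally using Sainte-Laguë divisors 1, 3, 5, ..."""
    allocation = {party: 0 for party in votes}
    # heapq is a min-heap, so quotients are stored negated; ties fall back to party name.
    heap = [(-count / 1, party) for party, count in votes.items()]
    heapq.heapify(heap)
    for _ in range(seats):
        _, party = heapq.heappop(heap)
        allocation[party] += 1
        divisor = 2 * allocation[party] + 1  # next odd divisor for this party
        heapq.heappush(heap, (-votes[party] / divisor, party))
    return allocation


class ElectionPipeline:
    """Hypothetical in-process stand-in for: ingest -> recompute -> push to clients."""

    def __init__(self, seats: int):
        self.seats = seats
        self.votes: Dict[str, int] = {}
        self.subscribers: List[Callable[[Dict[str, int]], None]] = []

    def subscribe(self, callback: Callable[[Dict[str, int]], None]) -> None:
        # In production this would be an open WebSocket connection.
        self.subscribers.append(callback)

    def ingest(self, new_votes: Dict[str, int]) -> None:
        # New totals (from a scraper or manual entry) replace the previous ones.
        self.votes.update(new_votes)
        parliament = sainte_lague(self.votes, self.seats)
        for callback in self.subscribers:
            callback(parliament)


if __name__ == "__main__":
    pipeline = ElectionPipeline(seats=10)
    pipeline.subscribe(lambda p: print("push to browser:", p))
    pipeline.ingest({"A": 5000, "B": 3000, "C": 2000})
    # prints: push to browser: {'A': 5, 'B': 3, 'C': 2}
```

The design choice this mirrors is that recomputation happens once per data update on the server, and clients only receive the finished result, so a spike in readers does not multiply the calculation work.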
Okay, but I love it.

But the basic thing I want to show you is this: we make a lot of visualizations and put a lot of effort into making them look great, and from the technology point of view we're always trying to push the boundaries and do something totally new. But to be honest, 50, maybe up to 80 percent of our time is actually spent cleaning up the data. Take the project on the members of parliament and how much money they earn: usually we have data from PDFs or obscure websites and so on, and we have to clean it up and find out whether what's in the data is actually something we can tell, and whether it looks valid. So I think 50 to 80 percent is just cleaning up, and that's something nobody sees. Everybody says, "Oh, it's beautiful, you have a great web designer." No, most of the work, and the most important part, is something you can't see.

Unfortunately, the guy who built, for example, this incredible engine, and who also cleans up a lot of data and builds the automatic processes, is not here today. It would be great to have him talk about how he uses Wikidata, how he uses other websites, how he cleans things up, and so on. But something I see for data journalism in the future is that Wikidata will become more and more important for us when we have a lot of data. Say you have an Excel file with country names or other names in it, and you want to clean it up and make an automatic map, for example. There are tools like Datawrapper; I'm not sure if anybody here knows it. It can make maps, but all you do is put in a spreadsheet, and if it contains names of countries, it's hard for Datawrapper to know what "USA" is or what "Sweden" is.
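To make the country-name problem concrete: one way to clean such a column is to map every spelling to a stable Wikidata item ID, so that "USA", "United States" and "U.S." all become Q30. The sketch below is hypothetical and works offline from a tiny hand-written alias table; a real tool would pull the aliases from Wikidata's labels and "also known as" fields (for example through the Wikidata Query Service), which this snippet does not do. Q30 (United States of America), Q34 (Sweden) and Q183 (Germany) are the real Wikidata item IDs.

```python
import unicodedata

# Tiny hand-written alias excerpt (hypothetical); a real tool would load these from Wikidata.
ALIASES = {
    "Q30": ["United States of America", "United States", "USA", "U.S.", "US"],
    "Q34": ["Sweden", "Schweden", "Sverige", "Kingdom of Sweden"],
    "Q183": ["Germany", "Deutschland", "Federal Republic of Germany"],
}


def normalize(name: str) -> str:
    """Lowercase, strip accents, drop punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    kept = "".join(c for c in no_accents.lower() if c.isalnum() or c.isspace())
    return " ".join(kept.split())


# Reverse index: normalized alias -> Wikidata item ID.
LOOKUP = {normalize(alias): qid for qid, aliases in ALIASES.items() for alias in aliases}


def to_wikidata_id(name: str):
    """Return the Wikidata item ID for a country name, or None if it is unknown."""
    return LOOKUP.get(normalize(name))


if __name__ == "__main__":
    column = ["USA", "u.s.", "Sverige", "Deutschland", "Atlantis"]
    print([to_wikidata_id(c) for c in column])
    # prints: ['Q30', 'Q30', 'Q34', 'Q183', None]
```

Returning None for unmatched names, instead of guessing, leaves the remaining rows for a human to check, which fits the point that cleanup is most of the work.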
But if you combine it with Wikidata, I think you could build very interesting tools to clean the data up. So this was a little look under the hood of a data journalism team. If you have questions, you can ask them right now. Oh, I was so fast again, sorry.

Hi, my name is Ben. My question is: you publish all these graphics and so on; do you also publish the cleaned-up raw data?

Usually we do. Sometimes it's not that easy. If it's just a simple file, for example we got a PDF and cleaned it up, then we definitely publish it as a TSV file. But sometimes the projects get bigger, and we don't actually have a file, just a database. We could provide a database dump, but it would be out of date next month or so. Currently we have a lot of data from scraping Twitter and so on, and we are not sure whether Twitter allows us to publish all the raw data. But in general, I think it's very important to bring these ideas of open source and open data into journalism, because if you make interpretations of the data, it's good if anybody can check whether what you did is okay. So open source and open data are things we try to use.

That would also be my next question: what about the license of the data? Do you have any issues there, like reusing data where it's not clear whether you're allowed to publish it?

Usually we try to keep the license as permissive as possible; the best would be something like CC0. Because when we work with data, it's really horrible: we finish a project and want to publish it in one hour, and then somebody says, "Did we check the sources? Can we use the data? Let's check the licenses. Oh, we have this license and that license, what should we do?" and so on.
I think back when we were a small company, we said CC BY would be a good thing: great presentations, with our name on them as a company. But as a developer, I always prefer CC0. So we try to get it down as low as possible.

What about the other side? One side is the data, but what about the tools? Is there a list of tools you use frequently? Did you write your own tools? Did you open-source some of them?

I think what I can tell you is probably the same issue you have with most of the tools you build. Usually we use JavaScript, Node.js, as glue code. So we have this kind of strange framework and this database, and we use Node.js to combine them, like the engine you can see on the right side. Then we try to open-source it, and the only reasons we don't are that the code is ugly and that you have to write the documentation. I think those are the main reasons: no developer actually has an issue with publishing the code, but it's ugly and you have to write the documentation. So we're trying a new approach: we give a small talk, something like 10 or 15 minutes, just record the slides, and publish it as a YouTube video, saying, "Sorry for the code, but I'll explain the architecture and the basic idea behind it in 15 minutes." Then you don't have to write anything, and it's also a nice video. But for the election night, I think there are something like 20 or 30 projects in the GitHub organization just for the election. So I'm not sure we could open up everything. We could, but it would be like dumping it onto the internet. We should write documentation, but it's hard work, so we usually pick out the gems and publish those.

Thank you very much. Another question? Yes. Okay, thanks.
Thanks for the talk. Do you also publish the provenance of the information you are visualizing?

The provenance of the information? Sorry, you mean the method?

Well, not only the method, but exactly how you handled the data in order to come up with the visualization. Where is the data coming from, and what have you done to it? Because I guess that, in the same way you can highlight some things more or less, you can also manipulate the data in between. How can you make sure that what you show the reader is exactly what's in the data?

Exactly. There's actually a big discussion about this in data journalism, because data always seems so exact and so real, as if you just have to trust and believe in it, but you can manipulate it just by choosing a different data visualization. One part of the answer is that we try to describe it in the text: we have a short method description that explains the idea behind it. The second thing is that we always try to publish it on Twitter and show it to other data journalists, and then we argue with each other: "Well, you did it wrong," and whatever. For example, a typical problem is generating maps. There was a big discussion about visualizing Berlin the way we did, because usually, if you see it on a front page, you have a map with big regions that each have a color. But the problem is that in a lot of these parts of Berlin nobody lives, because there's a park or a big airport or something like that.
So what I did was cut out every part of Berlin where nobody lives, so the map only shows the election results in areas where people actually live. But other people say that's misleading, because it looks like nobody is voting in some parts of Berlin. Yeah, that's true, because nobody lives there, so nobody's voting there. But yeah, that's a good question. For example, there were some maps of Berlin where you could see huge areas of right-wing voters, but if you look closer, that's actually just a lake or whatever, or nothing. So I think there's no perfect solution for data visualization at all. What we do is have a lot of discussions with other data journalists, and we also meet after the election, for example, to gather all the data, show each other our projects and approaches, and talk about them.

Okay, a question actually about the visualization of where people in Berlin live and how they voted. I think you found something unique to work out where people actually live, and it has something to do with telephone books, right?

Yeah, there's a system behind it: geocoding, or, translated, "geo conversion using telephone book entries". We have a problem combining data aggregated at different levels. For example, you have the voting districts from 2013 and the voting districts from 2017, and they don't match, because they change over time. So we need an approach to convert from one year to another, or from postal codes to voting districts, and so on. In the end we used a hacked telephone book, because it was the biggest source of addresses, of where people live. So we used that, and here in Berlin it's also enhanced with local statistical data, but basically it shows statistical regions.
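The telephone-book approach amounts to using geocoded addresses as weights. Here is a hedged sketch of that idea, assuming each address has already been geocoded and assigned to both its 2013 and its 2017 voting district (the geocoding itself is not shown, and the district names, counts and threshold are hypothetical): results from an old district are split across new districts in proportion to how many of its addresses fall into each, and a block is only drawn if it has at least a minimum number of entries, which is one way to leave out parks, lakes and airports.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def crosswalk_weights(addresses: List[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """For each old district, the share of its addresses lying in each new district.

    `addresses` holds one (old_district, new_district) pair per geocoded address.
    """
    pair_counts = Counter(addresses)
    totals = Counter(old for old, _ in addresses)
    weights: Dict[str, Dict[str, float]] = defaultdict(dict)
    for (old, new), n in pair_counts.items():
        weights[old][new] = n / totals[old]
    return dict(weights)


def reaggregate(results_old: Dict[str, float],
                weights: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Distribute each old district's value (e.g. votes for one party) onto new districts."""
    results_new: Dict[str, float] = defaultdict(float)
    for old, value in results_old.items():
        for new, share in weights.get(old, {}).items():
            results_new[new] += value * share
    return dict(results_new)


def visible_blocks(entries_per_block: Dict[str, int], min_entries: int = 10) -> List[str]:
    """Blocks with enough address entries to be drawn; the rest stay blank on the map."""
    return sorted(b for b, n in entries_per_block.items() if n >= min_entries)


if __name__ == "__main__":
    # Hypothetical sample: 3 addresses of old-1 lie in new-A, 1 in new-B; old-2 is all new-B.
    addresses = [("old-1", "new-A")] * 3 + [("old-1", "new-B")] + [("old-2", "new-B")] * 2
    w = crosswalk_weights(addresses)
    print(reaggregate({"old-1": 400, "old-2": 100}, w))
    # prints: {'new-A': 300.0, 'new-B': 200.0}
    print(visible_blocks({"block-1": 250, "park": 0, "airport": 2}))
    # prints: ['block-1']
```

Weighting by addresses rather than by area is what keeps empty land from pulling votes toward districts where nobody lives.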
In Berlin these are like four or five housing blocks together, and we decided whether to show a block or not by checking how many telephone entries it has. So, whatever, that was our approach. We want to publish that too, because I think it's a very interesting idea and project, but we have a problem: we used the hacked telephone book, so I'm not sure we can publish it, and also some...

Sorry, we need to come to an end; the others are already waiting. So finish your sentence.

Oh, no, sorry, I can't remember.

It was nice to have you. All right, thank you very much.