Hello everybody. I'm going to present the Twitter streaming plugin for Gephi. Louder? Is it okay now? Like that, is it better? Okay, perfect. It will be a very simple presentation. Please tweet, because I'm recording. You may have seen the tweet earlier, so maybe you can refine your tweets afterward. So don't be scared; it will be easy, and please be understanding. So, who am I? I'm Mathieu. I'm a simple software developer at a company that doesn't do anything with graphs, but on the side I'm a graph enthusiast. I've done some well-known work, mainly graph visualizations and some web applications for graphs. Type my name into Google and you'll see it, and on Twitter too. But today I'm talking about another project I've been working on for a long time now: graph streaming, and specifically the Twitter part with Gephi. Just to know: who knows Gephi here? Oh, not that many, okay. And who uses it fairly intensively? Okay. And who knows Twitter? Okay, that's fine. So this idea of graphs and Twitter came when I was at university in 2011. At that time I took some classes where we learned about graphs, which were new and not as famous as today, but very fun to use. Gephi was at version 0.7, I think. We also had a class on exploratory data analysis, where you say: I'll analyze my data with some visualization first, then form my hypotheses from what I see, what I can dive into, and what I can filter from a visualization perspective. At the same time I was on Twitter, and I noticed some changes in how Twitter was used. First, there was a burst of usage: more people were using Twitter for professional things, and that was a big change. On top of that, they were using it in real time.
Around 2011, I think, Twitter released its streaming API, which is different from the traditional API. With the traditional API you run a query and it returns results, somewhat like a database; with the streaming API you consume a topic that you define. That's quite different, because you get the real-time aspect, and more and more was being built on top of it. You could notice it, and unfortunately I noticed it during the great Japan earthquake in 2011: a lot of people were tweeting en masse, and you couldn't understand what they were talking about because it was too fast; there were too many streams coming in. But I spent the day watching TV, for example, and following Twitter with some manual scripts, and the people on Twitter were much more accurate, in terms of data and news, than TV, which was just looping the news or airing false news. So I thought: there are two different things I'm learning here, this new kind of stuff, so let's merge everything. My first question was: could graphs help us visualize the social media stream during an event? For the social media stream I used Twitter, because it's the easiest way to get a stream of data; Facebook doesn't allow it, and I didn't know of any other. For the events, I first based my research and work on catastrophe events, but then moved more and more to global events: FOSDEM today, say, or the Trump election, or anything else, like a TV show. So the original application, as I explained, looked like this: you take a Twitter stream, you receive messages, you process them, and then you send them to Gephi.
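To make the "receive, process, send to Gephi" loop concrete, here is a minimal Python sketch. It assumes the JSON event format of the Gephi Graph Streaming plugin as I understand it ("an" to add a node, "ae" to add an edge, one JSON object per line). The message fields and helper names are hypothetical, and the actual sending step (an HTTP request to Gephi in the original external application) is left out.

```python
import json
from typing import Iterable, Iterator


def node_event(node_id: str, label: str) -> str:
    """One 'add node' event (assumed key "an" in the Graph Streaming JSON format)."""
    return json.dumps({"an": {node_id: {"label": label}}})


def edge_event(source: str, target: str) -> str:
    """One 'add edge' event (assumed key "ae"); the edge id is derived from its ends."""
    edge_id = f"{source}->{target}"
    return json.dumps(
        {"ae": {edge_id: {"source": source, "target": target, "directed": True}}}
    )


def process(messages: Iterable[dict]) -> Iterator[str]:
    """Receive messages from a stream, process each one, and emit graph events.

    Each message is a hypothetical, simplified dict {"user", "id", "text"};
    real Twitter payloads are much richer.
    """
    for msg in messages:
        user_id = "user:" + msg["user"]
        tweet_id = "tweet:" + str(msg["id"])
        yield node_event(user_id, msg["user"])
        yield node_event(tweet_id, msg["text"])
        yield edge_event(user_id, tweet_id)  # "a user tweets a message"
```

Each emitted line would then be pushed to Gephi. Doing that over HTTP for every event is the overhead that later motivated integrating the logic directly into Gephi.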
At the beginning it was an external application using the Gephi Graph Streaming plugin, but it was quite slow: when you have more and more users on the stream, or the stream is very fast or very massive, it starts to slow down. So now the Twitter plugin is integrated directly into Gephi, so you don't have any problems with HTTP transactions and so on, but basically it works the same way. The core of it, what you see down here, is the network logic. The network logic is the algorithm that transforms tweets into a graph. What I decided to do was to use natural language as my modeling system. You take all the entities, like the user, the hashtag and the message, and define each one as a node; then you turn the links into how you would say them in a sentence. For example, a user who tweets gives you two nodes and one relation: a user tweets a message. Then you can also say that a message has a hashtag, so you have a link there, and you keep building the graph like that. I also used a trick: when there are retweets or quotes, you emphasize the target of the tweet, so that it gets more importance in the network.
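The network logic described above can be illustrated with a short Python sketch. It is not the plugin's code: the `Tweet` record, the `node_type`/`edge_weight` structures and the `retweet_boost` factor are hypothetical. It follows the natural-language modeling (a user tweets a message, a message has a hashtag). As one possible reading of the retweet trick, it links a retweeting user straight to the original message with extra weight, instead of creating a new message node.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Tweet:
    # Hypothetical, simplified tweet record; not the Twitter API payload.
    tweet_id: str
    user: str
    hashtags: list[str] = field(default_factory=list)
    retweet_of: Optional[str] = None  # id of the original tweet, if this is a retweet


class TweetGraph:
    def __init__(self) -> None:
        self.node_type: dict[str, str] = {}  # node id -> "user" | "tweet" | "hashtag"
        self.edge_weight: defaultdict[tuple[str, str], float] = defaultdict(float)

    def _node(self, kind: str, key: str) -> str:
        node_id = f"{kind}:{key}"
        self.node_type[node_id] = kind
        return node_id

    def _link(self, source: str, target: str, weight: float = 1.0) -> None:
        # Repeated relations accumulate weight instead of creating duplicate edges.
        self.edge_weight[(source, target)] += weight

    def add(self, tweet: Tweet, retweet_boost: float = 2.0) -> None:
        user = self._node("user", tweet.user)
        if tweet.retweet_of is not None:
            # Retweet: point the user at the ORIGINAL message, so the original
            # gains degree and weight rather than a new, isolated message node.
            original = self._node("tweet", tweet.retweet_of)
            self._link(user, original, retweet_boost)
            return
        message = self._node("tweet", tweet.tweet_id)
        self._link(user, message)  # "a user tweets a message"
        for tag in tweet.hashtags:
            # "a message has a hashtag"; hashtags are case-normalized.
            self._link(message, self._node("hashtag", tag.lower()))
```

With this choice, a heavily retweeted message collects many weighted edges, which is what makes it stand out in the graph.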
Here, for example, you can see that Gagarin tweets something and there are a lot of retweets around it, from Baikonur and from anybody else; nice example, by the way. So when someone is active on the network, the graph should give them more importance; that's what I explain here. The way the network logic works allows me two things. First, auto-sizing within Gephi: Gephi has automatic degree computation, and you can set up automatic sizing. That's very convenient: your graph keeps growing, and if something becomes more important it grows, and you can watch it grow live without re-clicking each time to refresh the visualization. The second part is ForceAtlas and ForceAtlas 2, which do two things. First, they contextualize your topic, as you'll see later: if people are for or against a certain topic, you'll see them in different communities. Second, they remove noise; we'll see that later on, actually right now. Other network logics also exist, for example user to user or hashtag to hashtag, but in the end they are all derived from this original source: you can generate the others from it. It's just that doing it live, in parallel, is quite complex for the moment. So let's have a real-time demo. It is built entirely from today's FOSDEM network, which I've been streaming since 10 a.m.
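The live auto-sizing described above can be sketched in a few lines of Python. This sketch assumes an undirected degree and a linear mapping from the smallest-to-largest degree range onto a size range, which is roughly how a ranking-style size transformation behaves. The class name and default sizes are hypothetical, and this is not Gephi's implementation.

```python
from __future__ import annotations

from collections import Counter


class LiveDegreeSizer:
    """Keep node degrees current as edges stream in, and map them to sizes.

    The linear min/max mapping is an assumption meant to mimic a ranking-style
    size transformation; it is not Gephi's code.
    """

    def __init__(self, min_size: float = 10.0, max_size: float = 50.0) -> None:
        self.degree: Counter[str] = Counter()
        self.min_size = min_size
        self.max_size = max_size

    def add_edge(self, source: str, target: str) -> None:
        # Each new edge raises the (undirected) degree of both endpoints.
        self.degree[source] += 1
        self.degree[target] += 1

    def sizes(self) -> dict[str, float]:
        if not self.degree:
            return {}
        low, high = min(self.degree.values()), max(self.degree.values())
        span = high - low
        result = {}
        for node, deg in self.degree.items():
            # Position of this degree between the current min and max (0..1).
            t = (deg - low) / span if span else 0.0
            result[node] = self.min_size + t * (self.max_size - self.min_size)
        return result
```

Recomputing `sizes()` after each batch of incoming edges is what makes a node that suddenly attracts many links visibly grow while the stream runs.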
I don't know what it will give us, so it's purely a test. As you can see, right at the end... okay, no, that's sad. I'll do it like the weather forecast on TV. Each color represents an entity: green is hashtags, red is users, and yellow is the messages. For now, at the macro level, in the center we have the hashtag #fosdem2017, which is the central place because people are tweeting around it, using that hashtag. Around it you have this whole bubble of discussions. I need to zoom in a little, but you can see that little communities exist now. You have this one, which I think is Mozilla: a lot of people are tweeting around it, so let's call it the Mozilla community. Then I think we have big data somewhere, let me see... the IoT community here, with a lot of people talking about IoT. What's interesting is that these communities aren't spread across the central network: you can see some links going directly from this one to this one, so we can imagine that some people are interested in both and tweet about both, and you can see that within Twitter. What else could be interesting to look at? Here, for example, you have another one, dpk, whatever it is; well, you get the concept. And there's one over there... let's see, it's the MySQL dev room; I'll let you judge what that means. Basically, in most of our use cases, in most topics, there will be a convergence into one big giant component. Not every component will be connected: for example, you can see there that some people tweet but are not really connected to the global conversation, let's say.
Sometimes these side components are fairly big, sometimes small, but usually, within the search you run, one big component will exist, and that is the core of your topic: everybody linked there makes some sense for the global topic you're looking for. It will be very rare to find two big giant components within the same topic, and if you do, it means they are more or less unrelated. The second thing, and we'll go back to the live demo for this: when I see people publishing their research on the internet, they keep their search term within their visualization. The problem is that you get this kind of star here that monopolizes everything, because everybody is tweeting it. It's like looking at something that already exists: that node is only there because you're searching for it, so why look at it? It distorts the node sizes, because everybody is tweeting it, and it hides the size of the nodes for other topics. So usually you try to avoid that, and the plugin does avoid it. Here, for example, I configured it to keep that node displayed, but when you track #fosdem2017 the plugin normally drops that hashtag completely, so you get a better visualization. If you remove the node, you see that more nodes now appear bigger, because they're on a more equal footing, and we're really looking at all the conversations within the FOSDEM topic. What happens is that all these nodes drift off and ForceAtlas pushes them away, but that's okay, because they aren't part of any other conversation here. And what we can reveal here is that there are some open source people talking, and the Mozilla people are the most active ones
from what I see here. And yeah, that's it. I'll tell you more later, but basically I'm not an expert; I haven't taken social science classes or anything, so everything I've explained is purely from the visualization side. I'll come back to that. Learning from experience, as I said: you have one giant component that describes the core of your topic. The liveness of the graph makes hot trends very easy to spot. We didn't see it here, because nothing was heavily retweeted, but you'll see a node appear and create a lot of edges with existing nodes. That makes the graph panic, because it has to reshuffle everything, and then it grows and you see: oh, this guy is getting a lot of retweets. Knowing the hot trends is very important when you're following a major event, for example. Also, as we saw before, there is context: the giant component is organized into several communities, and thanks to ForceAtlas the entities are contextualized, because people who talk to each other end up grouped together. If people are for or against a certain topic, any message will be pulled either toward the "for" side or the "against" side, and you'll see it: oh, this message is going to these people, so maybe I need to be careful, because it might be fake news, for example. That was the example I showed you: avoid having your search term within your visualization. Here you see, oh, there's a lot of graph over there, but it's just a star; if you remove it, the component grows out, so you get much more
insight about it. Detecting anomalies: we didn't see it yet, but it's actually very easy to detect bots. Here, for example, there is one user at the top who posts a lot of tweets with hashtags that have nothing to do with my search. In most of these cases the subgraph will be very big but will sit outside the giant component. You'll see a lot of these, and you can tell they don't mean anything for my search because they're outside any conversation, so you can just ignore them. Another interesting bot-like case was during the LeWeb Paris event, where a guy, completely legit, tweets something and then gets a huge number of retweets. Here is the original tweet, here are all the accounts that retweeted his message, and the retweets are visualized here. What can we say about that? I looked briefly at all the accounts retweeting the message: sometimes legit, sometimes not. In terms of metrics, if you take the number of retweets, for example, this guy will be at the top of your ranking, but in terms of how his tweet is contextualized within the conversation, I'd say it basically isn't: none of these users were talking before, and none of them will talk after. I kept this data running for hours and it stayed like that. So what we can detect here is basically this: the guy tweets something and gets people to retweet it so he looks famous in the metrics, but in terms of conversation it has no value. That's very interesting. As I said, you'll see a lot of similar structures. Here, I think it was another LeWeb or another conference, and I used the user-to-user network. Same story: you have a big component and you see communities
that emerge from it. Here you have a community of nito, another one around SEO, and a little bit of a bottom layer over there. But you see all this in real time in Gephi, and you can follow the evolution in real time, which is pretty cool. It can also be a little tricky sometimes. This is another conference: same story, but it starts to be difficult to read because the tweets are sparser, and sometimes it really looks like a cloud of users tweeting. You still have communities around users or around topics, but it's trickier to read. A good, recent example, for the French people here, was the "cha salon" affair: it was about Fillon, who was using bots, and it made a nice case study. Same here again: you see the giant component and the two communities, the counter-sphere and the Fillon sphere, over there. What you notice first is that the counter-sphere is the one producing the tweets: they tweet much more, and mostly among themselves. Some people make the connection between the two spheres, but that's very little compared to the links within each side. Since it was about bots, I thought: let's go look at the bots. You have two kinds of bots here. You have Fillon's bots, which were completely unrelated to the conversation, of course. This was the hashtag created by the bots, but they aren't linked to any conversation here. A few of them retweet, but most of them don't do anything: they just tweet that and die. And here Gephi is very
interesting, because when you run it live, this part just drifts away from the main component, so you can really focus on what's happening and say: oh, those were bots, and most of the talk is actually on the counter-sphere side. Then there was this guy who decided to make a counter-bot on this conversation, basically tweeting: hey, look, I also have bots that can create messages on Twitter, just like you. Because this guy had some friends helping him, you can see he was linked to the counter-sphere. But same problem here: none of these messages or tweets were linked to the people inside the counter-sphere. So he created something fun and got a lot of publicity, but in terms of generating value in the conversation, zero. Then you have this kind of thing that can appear: this was during a TV show, and you can see some hashtags getting highlighted, but the users, the tweets... how do we analyze that? I don't know. And then you have the Women's March: only 15 minutes here and you get an enormous number of nodes and edges, your computer starts to catch fire, and that's the limit. I'll explain the limits in a moment, but I developed the plugin for small events, and at that time, in 2011, a small event was what today's events produce on Twitter in one hour. Here that's really the case. If you stop the stream you can analyze it afterward, but it takes a really long time, and you have to stop the stream, so you can't really stay focused on it live. So it's cool, but we are reaching the limits. Part of the limit is on the Gephi side, not because it's a bad tool, it's an amazing tool, but because it wasn't designed for this kind of use case. The problem is, for example, the metrics and the filters. It
could actually be nice to have filters that say: remove this part for me, or focus on this user, and so on. The problem is that metrics and filters don't work, particularly live: if you try, you'll get exceptions and Gephi will go bananas. The other problem is that the UI isn't complete, in the sense that it would be nice, when you click a node... okay, for hashtags it doesn't make sense, but if I click this user, I could see their profile. There is also the link here, a link to a media file: it would be nice if clicking it showed me the image, for example. In Gephi that doesn't exist yet. I could try some plugins and so on to get that, but is that really the right way to go? I don't know exactly. It means we are reaching a limit here. And the last limit, I think, is the need for science. As I told you, everything I've shown and explained was purely visual. I did it because I know some visualization, and I've spent six years basically looking at any topic I found interesting. But scientifically, I can't tell you why there is one giant component, and I can't tell you why people gather into a community. So what I need to focus on now is finding people who teach or do research in this area, to put some science on top of this tool, so that we really know what's happening, maybe detect more things, and grow a little in that direction. All three of these things are needed now for heavy graph streams, like what we saw with the Women's March, because without tools that are really focused on streaming, you can't do it. Here is a demo I've done: you can see Twitter activity, hashtags and so on, so it gives you another kind of value. I
mean, it's not incompatible with streaming visualization of a graph, but it gives you much more visualization and much more insight by combining other data. And you can create your own social media room. That's what I did: you have the live image and the graph, and here it was during the Euro; you can see people fighting each other because they support France or Portugal. In conclusion: there is interesting added value here. I think you can see a lot with the plugin, and it's a very interesting use case. The problem is that the requirements and the context have changed a lot since 2011. For example, I talked about TV being outdated, but now TV uses Twitter a lot, so how do we react to that, and what is the real value now? Some technical evolutions are needed, and science, of course. And science is actually why I put this project in open source. Since we're at an open source conference, I'll talk a little about that. It was a very nice open source experience for me: contributing to Gephi, looking at what people are doing outside, and having discussions. So thanks to André and Eduardo from the Gephi community, who taught me a lot while I was creating this and gave me insight into the graph streaming side. I recently discovered that universities all over the world use this plugin: in Paris, in California, in Mexico, in Brazil. Actually, if you know why people from South America are so interested in social studies, let me know, because I don't; it would be nice to know exactly. Thanks also to the user community: some people used it, for example "server trump", to look at what Trump was saying, which is very smart; there are also some forensic studies; with Clément Levallois we're starting to have Gephi tutorials dedicated to this plugin; and other
people who sometimes translate tutorials so they make sense, or explain what they've done, and many more. My last point: if you use it and publish something, or are working on something, tweet me or send me a message. I'm very happy that people use the plugin, and that way I can keep track of how it's used and what's needed, and maybe we can grow something together. Thank you very much. I think we have five minutes for questions. Any questions from you? Otherwise I have one, related to hashtags: hashtags tend to accumulate, especially the ones you're searching for, which gather a lot of additional relationships. Did you try to reduce that noise by saying, okay, I won't track hashtags beyond a certain size, I'll just note that they're there but not put them in? And did that make any difference? Well, in a way that exists: when you're tracking a certain hashtag and it appears in the stream, the algorithm skips it. So that's a first barrier, because that hashtag would otherwise aggregate everything. Beyond that it comes down to filtering: you can still filter, but within Twitter it's very hard, and in Gephi you also need to cut something. Gephi has filters, and it would be nice to use them, but unfortunately I've tried that several times on live data and, of course, that's the problem.
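The hub-hashtag handling discussed in this answer can be sketched as a small filter. In the sketch, the tracked query terms are always skipped, which matches what the speaker describes the plugin doing. The optional usage cap is the idea raised in the question, not something the plugin is said to do. The class and parameter names are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from typing import Optional


class HashtagFilter:
    """Decide which hashtags become nodes (hypothetical sketch).

    - Hashtags that are part of the tracked query are always skipped: every
      matching tweet would link to them, forming one huge star.
    - Optionally, a hashtag is skipped once it has been seen more than
      `max_uses` times (the cap suggested in the question).
    """

    def __init__(self, tracked: set[str], max_uses: Optional[int] = None) -> None:
        self.tracked = {self._norm(t) for t in tracked}
        self.max_uses = max_uses
        self.uses: Counter[str] = Counter()
        self.suppressed: set[str] = set()  # noted as present, kept out of the graph

    @staticmethod
    def _norm(tag: str) -> str:
        # Compare hashtags without the leading '#' and case-insensitively.
        return tag.lstrip("#").lower()

    def keep(self, tag: str) -> bool:
        tag = self._norm(tag)
        if tag in self.tracked:
            return False
        self.uses[tag] += 1
        if self.max_uses is not None and self.uses[tag] > self.max_uses:
            self.suppressed.add(tag)
            return False
        return True
```

A hashtag suppressed by the cap keeps the edges it gained before crossing the threshold; removing those would need a separate cleanup step.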