We're here to speak about leveraging loklak to build applications and how we run analytics on top of it. I'll go into a few of the engineering details behind the loklak server, what its architecture is and why we built it that way. I'll show you some analytics with Kibana and how the Elasticsearch stack works for us, and then I'll hand it over to Damini to talk about Susi and what we do with the data that we collect. Introducing myself: I am a core contributor, along with Michael, to the loklak server, and Damini here is a core contributor to the Susi server, so it's a very good collaboration that we have. I'll ask her to introduce herself. Hi, I'm so excited to be here. I've been working with FOSSASIA and with Susi right from its inception, and I'm excited to talk about it. Okay, thank you Damini. So let's dive into the talk. What exactly is loklak? You've seen the past two talks speak about loklak and how they use it to do things. Loklak is a peer-to-peer server architecture that collects tweets and indexes them, and it does all of this without going through the authentication layers that Twitter puts in front of its API. If you want to fetch tweets from Twitter the official way, you have to create a Twitter app, set up an OAuth login, and only then are you allowed to fetch the data you want. At the same time, Twitter has something called a rate-limiting window, where you cannot query for more than a specific number of tweets every 15 minutes. I think you're actually restricted to something like 180 requests every 15 minutes, but if you run loklak for the first time you will probably collect 25,000 tweets in the same 15 minutes. So that is the power of it. What it does is it has a scraper and a crawler: it goes to the public Twitter pages directly, and it stores everything in Elasticsearch.
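The "no authentication" point above can be made concrete: fetching tweets from loklak is just a plain HTTP GET against its open search endpoint, with no OAuth tokens involved. The `/api/search.json` path is loklak's public search API; the helper function and its defaults below are my own illustration.

```python
from urllib.parse import urlencode

# Build a search URL for a locally running loklak server. Unlike the
# official Twitter API, no app registration, OAuth dance, or bearer
# token is needed -- the endpoint is open.
def loklak_search_url(query, count=100, host="http://localhost:9000"):
    params = urlencode({"q": query, "count": count})
    return f"{host}/api/search.json?{params}"

url = loklak_search_url("#fossasia", count=20)
```

Any HTTP client can then fetch that URL directly, which is exactly what makes the crawl so much faster than a rate-limited, authenticated API.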
So let me dive into the architecture. Is it clear, by any chance? Okay, all right, I'll just explain how it works. You have the localhost instance, the server instance that you're hosting. Once a client sends a request — someone has to send a request to start the whole server, otherwise the first query the server makes is for "beer", because we like beer — it goes through a DDoS-prevention check, because it is an open API and you don't want people to spam it. After that, there is a REST interface and a peer-to-peer interface, and there is a crawler queue. Just like any crawler, you crawl a particular query: say, for example, I crawl twitter.com/ a specific page for a specific query. Now every other hashtag and every other mention that appears in those tweets gets injected into the queue, the queue takes over, and it keeps processing. So the bigger your queue is, the more queries it can make in the same amount of time. All of this is written on a generic scraper factory — we use the factory pattern. Just the way we have Twitter scrapers, you could plug in a Facebook scraper, in case it's possible to do it, though we have tried to crawl Facebook and failed at it. You could similarly have Weibo scrapers — Weibo is the Chinese equivalent of Twitter — and we've been trying to do it with Instagram, because there are public posts there too. So you can use this crawler and scraper framework on public websites to crawl and index all the information you ever wanted. And the advantage of this, compared to one machine doing all the querying, is the peer-to-peer interaction.
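The queue-plus-factory design described above can be sketched in a few lines. Everything here is illustrative — the class names are invented, and loklak's real scrapers are Java classes that parse live HTML — but the control flow is the same: a scraper fetches results for one query, hashtags discovered in those results are fed back into the queue, and the factory is the single place where a Weibo or Instagram scraper would plug in.

```python
import re
from collections import deque

class TwitterScraper:
    # A real scraper would fetch twitter.com/search?q=... and parse the
    # HTML; canned texts are used here so the queue behaviour is visible.
    def scrape(self, query):
        canned = {
            "#fossasia": ["great talks at #fossasia and #opentechsummit"],
            "#opentechsummit": ["see you next year! #fossasia"],
        }
        return canned.get(query, [])

class ScraperFactory:
    scrapers = {"twitter": TwitterScraper}

    @classmethod
    def get(cls, source):
        # A "weibo" or "instagram" scraper would be registered here.
        return cls.scrapers[source]()

def crawl(start_query, source="twitter"):
    queue, seen, tweets = deque([start_query]), set(), []
    scraper = ScraperFactory.get(source)
    while queue:
        q = queue.popleft()
        if q in seen:
            continue
        seen.add(q)
        for text in scraper.scrape(q):
            tweets.append(text)
            queue.extend(re.findall(r"#\w+", text))  # hashtags feed the queue
    return tweets
```

Starting from a single query, the queue grows itself from the content it finds — which is why a bigger queue means more coverage per unit of time.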
So say you have a set of 10 servers doing this, and one of them gets blocked by the service provider — Twitter or Weibo in this case. That service will reject all requests from that particular server. In that case, the blocked server can forward all the tweet content it has to another peer, so that it increases the other peer's data. By default, we have configured it to push tweets to the loklak.org website, which is the website we host for the loklak project. So if anybody runs the loklak server, it automatically collects tweets, stores them on your system, and periodically keeps pushing them to our public server so that people can get access to them and use them. It's a repository of open data, and we have 1.45-plus billion tweets from the last two years, which is a very big number if you're a data scientist or a researcher, because most NLP and sentiment-analysis research carried out in universities works with something less than 100,000 tweets — that is the data they use to write their entire research. But we are giving researchers 1.45 billion tweets, publicly available, so that they can do whatever they want with them and push the computer-science frontiers forward, especially in NLP, AI and data collection. And we use Elasticsearch. The ELK stack — Elasticsearch, Logstash and Kibana, as it's popularly called — is completely scalable. You can plug in multiple Elasticsearch indices, and you can balance the data you load across multiple servers if you actually run it on a production system. Similarly, it's very easy to export data from one server to another.
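Handing a node's collected tweets to a peer, as described above, amounts to POSTing a Twitter-style result set to the peer's push endpoint. loklak does expose a push API (`api/push.json`); the exact payload fields below (`statuses`, `search_metadata`) mirror the Twitter-compatible format this talk describes, but treat the sketch as an illustration rather than the precise wire format.

```python
import json

# Build the URL and JSON body for pushing locally collected tweets to a
# peer (by default the public loklak.org server mentioned in the talk).
def build_push_payload(tweets, peer="https://api.loklak.org"):
    body = {
        "statuses": tweets,                        # the tweet objects themselves
        "search_metadata": {"count": len(tweets)}  # summary for the receiver
    }
    return f"{peer}/api/push.json", json.dumps(body)

url, body = build_push_payload([{"id_str": "1", "text": "hello #fossasia"}])
```

A node that gets blocked by the upstream service can use exactly this path to keep its data useful: everything it has already scraped still reaches the network through a peer.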
If you don't want to maintain your server any longer, you can take a copy of it and hand it over to us on a hard disk, and we can just copy it onto our actual servers with absolutely no data loss, because it will re-index itself. Similarly, it's very easy to partition your disks and dedicate only a particular amount of space to this. Say, for example, you have a 1 TB disk and you want only 100 GB of tweets, no more than that: you can dedicate 100 GB so that it stores only 100 GB of data. It also has the capability of looking for other loklak peers in the entire network. So over the internet, if there are other peers and you make a query, you can ask a peer to fetch data for you, or you can trigger a peer to perform a query for you — but that requires some amount of elevated rights. We give most of the elevated rights only to localhost, the server that you maintain, because you own it. You can run something called a campaign on the server: you can tell the server, "until I collect 10,000 tweets, don't stop crawling, and crawl anything you feel like". Or you can make it crawl very focused content — only Japanese tweets or only Chinese tweets, and so on. That depends mostly on your starting query. So how exactly do we do analytics? We do this with Kibana, and I'll show you how easy it is. If you notice, this is the loklak server which I just started; it's running at localhost:9000, and 9200 is the default port for Elasticsearch. Similarly, this window that you see here is Kibana; it runs at port 5601, and you configure it to link to port 9200. So if I just open it up — this is how Kibana looks when you start it, and these are all the tweets that it is still crawling and indexing.
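The "campaign" idea above — keep crawling until a target count is reached — is essentially a loop around the crawler. The sketch below is my own illustration of that contract, not loklak's actual campaign code; `scrape_batch` stands in for one round of crawling.

```python
# Run the crawler repeatedly until `target` tweets are collected,
# as in the spoken example "until I collect 10,000 tweets, don't stop".
def run_campaign(scrape_batch, target=10_000):
    collected = []
    while len(collected) < target:
        batch = scrape_batch()
        if not batch:          # nothing left to crawl; stop early
            break
        collected.extend(batch)
    return collected[:target]  # never hand back more than the target

# Hypothetical batch source yielding 300 fake tweets per crawl round.
fake_batch = lambda: [{"text": "tweet"}] * 300
```

A focused campaign (only Japanese tweets, only Chinese tweets) would differ only in what `scrape_batch` starts from — the starting query shapes everything the queue discovers afterwards.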
But there are some very interesting questions that I, or anybody as a data scientist, would probably want to ask. For example: how many people on Twitter are happy? Out of the data collected on my machine — I have more than 400,000 tweets here — it says 267,000 of the people who posted are happy, which is roughly around 50 to 51%. Some of them are sad, so we can try to understand why people are sad. Some are angry, some are scared, and some it classifies under other emotion classes. Similarly: what is the average length of a tweet? How many people actually write tweets of 140 characters — how many use the entire limit that Twitter gives? That is a huge spike. And what is the smallest tweet length people commonly write? It comes out around 117 characters — hardly anyone writes less than that, while 140 characters is the limit people want to touch at any cost. That's probably because we like to write a lot. You can also put the tweets on a map. I don't think the map is loading, but you fairly get the idea of how it is supposed to work: you can find out from which locations people are tweeting the most. Here it looks like Germany, France, somewhere nearby over there — I can't see the map, so I can't tell exactly. Similarly, it's very easy to make queries: if you want to make a chart, you can choose which query to plot. Say, for example, you want to plot how many people swear. You see that there are a lot of people who talk about sex on Twitter; similarly, there are a lot of people who swear, there are lots of trolls, and there is also a lot of leet.
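Under the hood, a Kibana chart like "how many people are happy" boils down to an Elasticsearch aggregation sent to port 9200. Here is a hedged sketch of such a query body: the field name `classifier_emotion` matches the extra field this talk describes on loklak's tweet objects, but the index layout is an assumption, so take this as the shape of the query rather than a copy of what Kibana emits.

```python
# Build an Elasticsearch terms-aggregation body that buckets tweets by
# their classified emotion -- the query behind an emotion pie chart.
def emotion_breakdown_query(field="classifier_emotion", size=10):
    return {
        "size": 0,  # skip the matching documents; only the buckets matter
        "aggs": {
            "emotions": {"terms": {"field": field, "size": size}}
        },
    }
```

POSTing this to `http://localhost:9200/<index>/_search` would return one bucket per emotion with a document count, which is exactly the happy/sad/angry breakdown shown in the dashboard.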
Leet is hacker language — there's a lot of hacker language that goes around. So what we want to tell you is that loklak is an amazing data source for you as a scientist, a researcher, a data enthusiast, or an open-data fan: you can store data, understand what the data feels like, how it works. And it's built on such a good architecture that you can share data with somebody else even when you don't want to maintain it any longer. And if I go ahead, there are a lot of APIs available from loklak. How many of you have used Twitter apps before? How many of you have made apps on Twitter? Okay, so you use an API to fetch all the data, right, and it goes to api.twitter.com. If you just replace api.twitter.com with api.loklak.org and make very few modifications, you don't have to change any of your code, and all the data works in the same fashion, because we provide the data to you in the same format. Say, for example, I want to find tweets about the President of the United States — okay, these are the tweets that he's made. The structure of each tweet object is exactly the same as the one given by Twitter, with the same keys. But in addition, you have much more available: I don't know why it's not showing here, but you have the text length, which Twitter doesn't provide, and you also have the classified emotion, the probability of the emotion, the probability of the language, and so on. What this tells you is that the system is scalable and you can analyze every single line of text that you get. You can add a newer entry to this if you want, say, translations: you could add a key called "translations" and have an object telling you this is the translation in Chinese, this in Japanese, this in French — all by writing your own services.
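The drop-in-replacement claim above can be sketched directly: swap the host in an existing Twitter API URL, and read the extra fields off the response. The two extra keys shown (`text_length`, `classifier_emotion`) are the ones this talk names; the exact URL mapping is a simplified assumption.

```python
# Rewrite a Twitter search URL to hit loklak's compatible endpoint instead.
def to_loklak(url):
    return url.replace(
        "https://api.twitter.com/1.1/search/tweets.json",
        "https://api.loklak.org/api/search.json",
    )

# A tweet object as described in the talk: Twitter's keys, plus extras
# that Twitter's own API does not provide.
sample_status = {
    "id_str": "1",
    "text": "hello world",
    "text_length": 11,             # loklak extra: length of the text
    "classifier_emotion": "joy",   # loklak extra: classified emotion
}
```

An existing client keeps working because it only reads the Twitter-shaped keys; new analytics code can opportunistically pick up the extra ones.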
So as a researcher, or as someone who is very interested in something like this, this is an amazing tool for you to actually play around with. And I'll hand it over to Damini to talk about what we did with the 1.45 billion tweets that we collected. Thank you very much — you covered a lot. So, we were working on building apps using the APIs provided by loklak, and maybe I can share a small story of mine. I wanted to build an application which would give sentiment analysis on tweets. Before I moved to loklak, I was using the Twitter APIs, and the procedure for getting your account authenticated was really long. Thankfully, they provide all the tool support, and I was pushing my sample application there, getting the client ID and secret, and replacing them every 15 or 30 minutes when the tokens expired. And yes, they have rate limits, so you get in trouble after a certain time. So later on, I moved to loklak. One point is that Twitter doesn't give you an API that tells you how a tweet is classified by emotion — that is one of the good things about loklak. Internally there is a classifier, so whatever tweet comes in is automatically classified under five or six emotions, and you also get the probability, the confidence of the classification. I was using that API and was able to classify tweets, and that was one of my sample apps. You can also go to the apps page and look at the previously built apps using the loklak APIs — the previous speakers have spoken about their applications, and there's a lot more on the apps page: many analytics-related apps, and there were Twitter timeline apps too. Also, one of the main purposes of building apps is that we are trying to collect as many tweets as possible, and providing these APIs in the form of apps helps a lot, maybe for a new user. So you can actually check out this page.
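A sentiment app like the one described would typically keep only tweets whose classification is confident enough. This is a sketch of that filtering step; the field name `classifier_emotion_probability` is an assumption modeled on the probability the talk says the classifier returns, not a confirmed loklak key.

```python
# Keep only tweets classified as `emotion` with at least `min_prob`
# confidence, discarding low-confidence classifications.
def confident(tweets, emotion, min_prob=0.8):
    return [
        t for t in tweets
        if t.get("classifier_emotion") == emotion
        and t.get("classifier_emotion_probability", 0) >= min_prob
    ]

sample = [
    {"classifier_emotion": "joy", "classifier_emotion_probability": 0.9},
    {"classifier_emotion": "joy", "classifier_emotion_probability": 0.4},
    {"classifier_emotion": "sad", "classifier_emotion_probability": 0.95},
]
```

Because the classification arrives pre-computed with each tweet, an app like this needs no ML code of its own — just a filter.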
Now comes the interesting part: Susi. Michael has already covered a lot in his keynote session about the internal architecture, so now we can actually play around with the app. So why Susi? We wanted to make the best use of the data collected by loklak, and we came up with an artificially intelligent chatbot called Susi. One thing: we all love playing around with Siri and Cortana, but one bad part about them is that each is a black box, a closed AI. You don't know the internal structure, how they are trying to read your mind — you don't know all that. Susi, on the other hand, is open: you can train it according to your needs, for any purpose. The other day I was talking to the Singapore Science Centre organiser; she was interested in training Susi on their science facts, so that whenever a kid enters the centre, they can play around with Susi. That was one interesting thing — you can configure it according to your needs. So that's the best part of it being open AI. And then there are Susi dreams: this is where you try to configure your AI chatbot, and I will just walk you through it. Just go to this link — it's a temporary place where you can write your own rules and teach Susi according to your needs. Suppose I give the room a name and create a dream here. What rule do you want to have? Maybe "hi". The star over here will read your input, so it remembers your name and greets you back. So we've created a rule now — let's go to Susi and check. There's a simple command: just type "dream about" followed by the name you gave your dream, and your rules are picked up by Susi. See, it remembers my name. There's documentation available, and you can go through it to learn how to configure your rules for your users. I can show you the link — yeah, here it is: just go to the Susi server repo and you can go through the tutorials on adding rules.
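The wildcard rule shown in the demo — a `*` in the query pattern whose captured text can be reused in the answer — can be sketched as a tiny matcher. This is only an illustration of the concept, not Susi's actual rule engine; the `$1$` placeholder style follows the rule syntax shown in the Susi tutorials.

```python
import re

# Turn a wildcard pattern ("my name is *") and an answer template
# ("Hello $1$!") into a responder function. Each "*" becomes a capture
# group whose text substitutes the matching $n$ in the answer.
def make_rule(pattern, answer):
    regex = re.compile(
        "^" + re.escape(pattern).replace(r"\*", "(.*)") + "$",
        re.IGNORECASE,
    )

    def respond(utterance):
        m = regex.match(utterance.strip())
        if not m:
            return None  # rule does not apply to this utterance
        reply = answer
        for i, group in enumerate(m.groups(), start=1):
            reply = reply.replace(f"${i}$", group.strip())
        return reply

    return respond

rule = make_rule("my name is *", "Hello $1$, nice to meet you!")
```

A dream, in these terms, is just a named bundle of such rules that `dream about <name>` loads into the chat session.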
Or you can give me some commands and we can try playing around with Susi — anybody want to ask it something? — and then I can explain where it's getting the data from. One good thing is that we use loklak as the data source behind it, and you can configure the data sources here: along with loklak, you can maybe add Wikipedia as a main data source, a dictionary source, or Wikidata. It's completely configurable. So coming to the future plans: regarding adding data sources to Susi, as of now we have loklak as a peer-to-peer system with a lot of data being indexed, and we want to build similar kinds of scrapers for other social networking platforms, be it Weibo or Facebook — we even want to go to Instagram. Another enhancement we are planning is to add more capabilities to Susi itself: maybe it should have an account system, so that it remembers your interests whenever you talk to it and gives you suggestions on your queries, things like that. And maybe you can explain more about the IoT support. So, one of the future plans with Susi is to have IoT devices speak on Twitter. They'll post tweets to Twitter, and a local Susi server at home will collect these tweets and trigger commands inside your home network to the devices that you want. Say, for example, you just posted a tweet saying "hi, what are my plans?", and you have a small robotic arm over there which tips a small glass of water into a plant. For privacy reasons, you connect all of the devices to the local server that you host at home; you can give that a public IP address, or you may not even need to. When you post on Twitter using the Twitter app on your phone, the server automatically fetches what you've posted and triggers a query in your local network to do the required IoT operation for you. So this is the plan that we have for this year.
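The tweets-as-IoT-commands plan above amounts to a dispatch loop: the home server polls for the owner's tweets and maps recognized commands to device actions. The sketch below is entirely illustrative — the `#water` command, the handler table, and the action are invented to show the shape of the idea.

```python
# Map command hashtags found in a tweet's text to device actions.
# Returns the action's result, or None if no command matched.
def dispatch(tweet_text, handlers):
    for tag, action in handlers.items():
        if tag in tweet_text.lower():
            return action()
    return None

log = []
handlers = {
    # Hypothetical device action: the robotic arm watering the plant.
    "#water": lambda: log.append("watering the plant") or "ok",
}
result = dispatch("Hi Susi, #water the plants please", handlers)
```

Keeping the handler table on the local server is what preserves privacy: the tweet is public, but the mapping from command to device never leaves the home network.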
And we should probably be able to demo something amazing for you next year, with complete IoT integration for home automation, farm automation or something like that. So these are the three future plans that we have for the project. We're open to any questions, in case you have them. That's the last one. Yes. So the question is: loklak indexes tweets — does it only index them, or does it also store them? That's a very good question. We store the tweets in the Elasticsearch database, and Elasticsearch itself keeps indexing — you can trigger Elasticsearch to index itself into multiple indices. With respect to images, we don't store the binary content of the image yet. What we do is scrape and store the public link of the image, the raw image link. So if somebody wants to use the image, they have to make one more explicit request using the image URL they get from the API and download the image themselves. It's probably something we may want to change, but images are very hard to index, and that's probably the reason why we didn't go with indexing images. Next question: about the emotion classifier used earlier — could you tell us more about the tool that actually does this classification? Okay. That emotion classification is a naive Bayes classifier running underneath. It's a regular off-the-shelf library that you can just pick up for Java. But that's just a proof of concept to show the modularity of loklak — to show how well it has been written and how amazing the library really is. If you want to run much better algorithms on top of it, you can just write one other Java class and it will automatically integrate itself into the rest of the loklak environment. That is how easy it really is. I can actually run you through the code if that's needed — let me just find the file for the Bayes classifier.
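For readers unfamiliar with the algorithm named in the answer, here is a minimal naive Bayes text classifier with add-one smoothing. loklak itself uses an off-the-shelf Java library for this, so this Python sketch only illustrates the technique, and the two-tweet training set is a toy example.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    def __init__(self):
        self.word_counts = defaultdict(Counter)  # per-class word frequencies
        self.class_counts = Counter()            # documents seen per class

    def train(self, text, label):
        self.class_counts[label] += 1
        self.word_counts[label].update(text.lower().split())

    def classify(self, text):
        words = text.lower().split()
        total = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # log prior + sum of smoothed log likelihoods
            score = math.log(count / total)
            denom = sum(self.word_counts[label].values()) + len(words)
            for w in words:
                score += math.log((self.word_counts[label][w] + 1) / denom)
            if score > best_score:
                best, best_score = label, score
        return best

nb = NaiveBayes()
nb.train("i am so happy and excited", "happy")
nb.train("this is sad and terrible", "sad")
```

Swapping in "much better algorithms", as the answer says, means replacing only this classification step — the rest of the pipeline (scrape, classify, store in Elasticsearch) is unchanged.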
So if you look inside the tools folder, we have a Bayes classifier. If you just click on that, you'll see it's a publicly available Bayesian classifier which we just took — it's written by Philip, under an MIT license — and it automatically integrates itself into the loklak environment. Similarly, there are some people who have tried to write their own classifiers, and you can do more things with the data that you have. Thank you so much. Thank you all.