I'm Sudheesh, a final-year computer science and engineering student at the National Institute of Technology, Warangal, and I was a Google Summer of Code student with FOSSASIA working on the project loklak. So what is loklak? loklak is a search and storage infrastructure for social media messages. It's a peer-to-peer based scraper: it scrapes Twitter and stores the messages being posted there. As of now, with the servers we have, we store roughly one out of every 200 messages. The loklak API is public — unlike Twitter, you don't need authentication to start using it — so you have completely open data, and you can use it for a lot of purposes. Today I'll be speaking about the different things you can do with the loklak API: user-centric applications you can build, visualizations, research, search portals. Let's dive into them. First, you can use loklak to store information from any open social network or micro-messaging service — Twitter, the Chinese Twitter equivalent Weibo, or any other micro-messages. There is also sentiment analysis running live on the loklak server via a sentiment classifier, so every tweet is tagged with a recognized emotion such as joy or fear; you can use it, and you can make it better. All of this data comes at massive scale: the volume is large and the velocity with which it arrives is really high, so it classifies as big data. What can we do with so much data? Can we make some sense out of it? That's the main question — and if we want to make sense of it, what are the different ways to do so? One obvious way is visualizations.
Visualizations can give people a better understanding of what is happening out there — something they can closely relate to, something they don't need to read a log sheet or a JSON file for. Lots of organizations, like Uber or Facebook, already run big data platforms, so how they can leverage loklak is one of the main questions. What is the technology stack? It runs a Java Jetty server with a single-node Elasticsearch integrated into it, there is a lot of shell around it, and the front end is written in JavaScript. We also have lots of API wrappers: a Ruby API you can install directly as a gem; a Python API you can get from pip, supporting both Python 2 and Python 3; and a Node.js API you can use for building applications. The easiest way to get started with loklak is to fork the loklak server repo, clone it, and run Ant. Ant is the build tool we use — we also support Gradle, but it's easier with Ant. You run `ant`, then you run the start shell file. I can show this to you: this is the loklak repo. Once you clone it and set the upstream, you build it with `ant` — it builds the complete repo; it took four seconds here — and you start the server with `./start.sh`. It automatically opens the browser with loklak running on it. So here we go: it keeps storing messages from Twitter, you can make a search query right from here, and it gives you a JSON response that you can use for a lot of tasks.
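The JSON search endpoint behind that search box can also be hit directly. Here is a minimal sketch in Python, using only the standard library — the endpoint path and parameter name are my assumptions about the public API, and the response is a canned, trimmed example rather than live data:

```python
import json
from urllib.parse import urlencode

def search_url(query, host="https://api.loklak.org"):
    """Build the public search URL for a loklak server.
    The /api/search.json path and the 'q' parameter are assumptions."""
    return host + "/api/search.json?" + urlencode({"q": query})

# A canned, trimmed stand-in for the server's JSON response:
sample = json.loads('{"statuses": [{"screen_name": "alice", "text": "hello"}]}')

def texts(response):
    """Pull just the tweet texts out of a parsed search response."""
    return [s["text"] for s in response.get("statuses", [])]
```

You would fetch `search_url("singapore")` with any HTTP client and feed the parsed JSON to `texts`; no OAuth token or API key is involved at any point.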
A few of the things I'll be going through in this workshop are building one of the visualizations, one of the tools we built around it called loklak.net, and the support for apps. Apps are something we support: using the API, you can build lots of apps that plug directly into loklak.org. For example, this is a heat map. You just search for the tweets you want — say we search for Singapore today. It's taking some time... there you see where lots of people are tweeting from. This is only from the data we have; there is a lot of data still out there that we could collect. Having a peer-to-peer infrastructure helps us here: if you poll Twitter trying to collect all the tweets, at some point they will block you out, and a peer-to-peer infrastructure beats that. And since there is no OAuth token needed, the API is completely anonymous — no IP address is tracked — so you can use it for your own purposes. Let's head back. I think this number is old, but we have more than 700 million tweets on the loklak.org server, and you can speed that up if you want to contribute: you can host your own server, keep loklak.org as its backend, and contribute data peer-to-peer. So what can you build with this? We've spoken about what the server is, how it works, what kind of response you get, and what data we have. Can you build an enhanced Twitter client? We know that Twitter does not allow us to tweet a map. Using loklak.net — this is loklak.net. It looks like Twitter; it feels like Twitter.
Except that you can do a few more things than on Twitter. The first is that you can get your location — I'm at the Singapore Science Centre — drop a marker, and post a tweet. It converts the map into an image and posts the tweet to Twitter so your followers can actually see where you are. It's an additional feature, a plugin you can build yourself: Twitter doesn't have that functionality, and if you need it, you can build it. That's one of the things you can do with loklak. Other than this, you can also have a messenger bot, so let me go through that. This is my Telegram; I start off with /start — that's how it should begin — and the server replies, and then you can search for something. Say we search for FOSSASIA: it gives us back one of the most retweeted tweets. This one has a FOSSASIA photo-taking picture — probably one of the most retweeted pictures out there — and it sends the image back. At the same time, suppose you maintain servers — you're in DevOps — and you want to know if a server has gone down or hit some kind of error: you can check the status of the server directly, and the bot replies whether it's okay or not. That's one of the very simple use cases. Another thing we can do: you can use loklak.org for all your data collection and mining, run whatever algorithm you want on it, and interact back with Twitter using the official Twitter API. We don't support posting to Twitter from loklak, but your bot can reply using the Twitter API.
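The status-check reply in the Telegram bot can be sketched as a small mapping from the HTTP code the server returns to a human-readable answer. The reply strings here are illustrative, not the bot's real messages; 502 is the failure case mentioned later in the talk:

```python
def status_reply(http_code):
    """Compose the bot's reply for a server health check.
    Messages are illustrative placeholders, not the real bot's wording."""
    if http_code == 200:
        return "Server is up and running!"
    if http_code == 502:
        return "Bad gateway - the backend looks down."
    return "Unexpected status: %d" % http_code
```

The bot would issue a lightweight HTTP request to the server's status endpoint on demand and pass the resulting code through a function like this before replying in the chat.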
One place this could really be used is customer service. There was an application from Mozilla called Army of Awesome — how many of you know it? They used to do support for Firefox through it. Similarly, you can use this for participative governance, which is what we've used it for. In a product we built with IEEE called Gear Systems, for government resources, we gave the government public data from loklak.org about people posting in a particular country or city and the complaints they are facing. If someone has a complaint about a broken road or an uncovered manhole, most people don't want to go to a government office to report it — they don't really like dealing with governments — so they prefer ranting about it on social media, Twitter being one of their favorites. We can collect all of this information and use it for something really, really productive. loklak keeps pulling the data and analyzing it, and you can have a bot which replies back in case information is missing. For example, if I post a tweet saying there is an open manhole at a particular address but leave out a picture — and it needs a picture as proof that the manhole is open — the bot tweets back to the user: can you please provide an image? Once you provide the image, loklak can pull it in and you can check: okay, probably we can do something about this. That's just one of the other use cases we have. And the web client — this is loklak.net; it was the Google Summer of Code project.
Other than giving you something like Twitter, it also has a map feature integrated into it, where you can see your followers and the people following you, and you can post a tweet to Twitter — obviously using your Twitter login. At the same time, it can generate reports for different users. This one is mine; we can go and find Harish. It analyzes the data and generates the charts for you, and there are lots of plugins you can add to this. (It's slow here because of the internet speed and the client parsing; there's no problem with the server as such.) You can also add data sources from your IoT components directly into it. If you want to add information about all the routers around Singapore and find out where they are, you can add the data from all of them into loklak, and the geo-coordinates can be used directly on an OpenStreetMap — because it's just data. You have data the way you need it, and you can use it for whatever you really want to build. Now let's try going through building one of the simple bots. Here is a sample file we'll be using; let's call it tweet.py. We're importing json, the requests library, the OAuth libraries, and a file called twitter_constants, which holds the API key, the API secret, the access token, and the access token secret. To get this information, you go to Twitter, create an app, and put your API keys and consumer keys in here. What this function does is send a request to Twitter from the logged-in ID — the bot — and post a particular message as that user. In this scenario it's the bot, which is why the token and secret strings are present inside the file; it's not dynamic.
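Here is a hedged reconstruction of what tweet.py is doing. The Twitter v1.1 `statuses/update` endpoint is my assumption about the target URL, and the OAuth signing shown in the comments would use a library such as requests_oauthlib; `twitter_constants` is the credentials module described above:

```python
from urllib.parse import urlencode

# Assumed target: Twitter's REST API v1.1 status-update endpoint.
UPDATE_URL = "https://api.twitter.com/1.1/statuses/update.json"

def build_post(message):
    """Append the message to the update URL, as described in the talk."""
    return UPDATE_URL + "?" + urlencode({"status": message})

# The actual POST would be signed with the four keys from twitter_constants,
# e.g. with requests + requests_oauthlib (not executed here):
#   auth = OAuth1(API_KEY, API_SECRET, ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
#   requests.post(build_post("hello from the bot"), auth=auth)
```

Because the four credentials are hard-coded strings rather than a session token, every post goes out as the bot's own account — which is exactly what a reply bot needs.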
It's not based on a session token. It appends the message to the URL and does a POST request so that you can post something. And this is how the polling system works — it's an app integrated within Django. Let's start from here: this is a dictionary of the different types of governmental problems and the different governmental setups out there. Municipalities are the small, city-level governmental organizations, and the panchayats go down to the villages in India. This is where it starts: it's always polling for a user called gear systems. I've just posted a tweet; let's check. We're still polling the API, so it will take a little time before it can grab the tweet. There — you see, it's grabbed it. It says: processing a tweet at gear systems, "there seems to be a broken road with a pothole in it", and it's making a request to Twitter to post a reply back. It says it has replied, but it actually hasn't, because there is no API key set. You can set the configuration for what you want it to reply. You can use NLTK to parse the string the user gives you, check the sentiment, or check for specific keywords or specific lexical items, and use that for different purposes. For example, I can show you a previous run of how it replies. This is one complete conversation: "There is a pothole on the road" is what I had pushed to it, and it says "Thank you for the complaint. Please type the description of the problem." I tweet back that the pothole is big and dangerous, and it says "Thank you for the complaint. Give us a type from the following."
Either it's a municipality — meaning a city-level problem — or it's something related to Narendra Modi's Swachh Bharat campaign, which means Clean India. What exactly does it correspond to? This list is extendable, because it's just a JSON dictionary. Similarly, we can see some more: you can post an image, and since the image is already there, the type of response that comes back is different. It's not going to ask you to supply an image as proof, because it already knows the image exists; it can directly say "Thank you for the complaint. Type the description", because the description of the problem is what's actually missing. This is just one of the applications we are using loklak for. There are a lot of other things, like the apps. This is an app inside loklak itself — a tweet wall. You make a search query directly, and it uses the loklak search API and displays the results for you. You could use this at an event so you have a live tweet wall running throughout. Similarly, this is a query browser inside loklak: it shows the different types of queries being made, the count of how many times each query was made, when the latest query was made, things like that. At the same time, you can use purely front-end libraries like AngularJS or ReactJS: you make a request to the loklak public API, get the JSON response, and use it to do whatever you want — in this scenario, it was just displaying the tweets and the links to them. This was done in Google Code-in by one of the students. Similarly, we have loklak apps, which is an app in itself. There is a specific way in which you are supposed to contribute apps: in case you have a new app idea which uses loklak as its backend, you can contribute a new app.
There is a fixed way to do that, so let's try it. This is how you do it: you create an app in the html apps folder. It needs something similar to the manifest file we have when building mobile applications — a JSON file named app.json. The loklak server automatically searches for all the app.json files and lists them out as apps you can use. Now, the API. We have lots of APIs that are available publicly, with different access modes. The first is open access, where anybody can use the API without any authentication or restrictions. Then we have limited access, where you can still query publicly, but you get fewer results than users querying from localhost. Some things are strictly restricted to localhost, like the settings API. If you host your own loklak server, you can allow CORS, so different applications can make cross-domain requests. We have an API to check whether the server is actively present or having downtime — whether it's giving 502 bad gateway problems, a page not found, or something like that: the server status API. And search.json is the API you have just seen. It gives you a complete JSON response with exactly the same information Twitter gives you, in the same format. Following Twitter's JSON structure, it has the statuses as an array of objects. Each status contains a created_at, the screen name of the user, the text of the tweet, the link of the tweet, and the id_str — every tweet has an ID string. It also gives you the place it was saved from; here, it was saved from Twitter.
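A sketch of such a manifest, written here as a Python dict for concreteness — the field names are assumptions, since the talk doesn't show the exact schema the server expects:

```python
import json

# Illustrative app.json for a loklak app. The exact field set the server
# scans for is not shown in the talk, so these keys are assumptions.
manifest = {
    "name": "tweetwall",
    "headline": "Live tweet wall for events",
    "author": "your-name",
    "permissions": ["search"],
}

# The file would live next to the app's index.html in the html apps folder.
app_json = json.dumps(manifest, indent=2)
```

Since the server discovers apps simply by finding an app.json, dropping a folder with this file and an index.html into the apps directory is, in principle, all a contribution needs.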
And since, as I said at the start, you can use multiple other backends — like the Chinese Twitter equivalent, Weibo. It also gives you the number of retweets, and similarly the number of favorites. If the tweet has any links, it gives them to you separately, and if it has shortened links like bit.ly or goo.gl, it actually un-shortens them and gives you the final links directly. It gives you the length of the text, in case you need it for machine learning purposes or something like that, and it gives you the profile information of the user — exactly the same information Twitter gives you. So in case you're using the Twitter API for something, you can switch from Twitter to the loklak API without a lot of effort. There are also aggregations: the counts of hashtags, the trending statuses, things like that. In this example, the query was for SpaceX, and the most mentioned is SpaceX, followed by ElonMusk, followed by SpaceStation, and so on. This information is not restricted to the limit of six you see here — it's a variable, so you can set a limit of 100 and still get it. Okay, I think this slide is old. There is an official Python API, in case you really want to use it. It's very simple: you create a loklak object, and then if you want to make a search query, you call search on it. You can store the result into a variable x and print x — you have all the information. It's quite easy, and if you want to display it, you can see the total number. (Someone asks me to shrink the window to half the size — all right, I'll do that. Is this all right? Yeah.)
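Working with those fields looks like this — a trimmed, canned document modelled on the Twitter-style keys just mentioned (the exact key names are assumptions):

```python
# A trimmed search.json-style document with the fields mentioned above.
# Key names are assumptions modelled on Twitter's JSON format.
search_doc = {
    "statuses": [
        {"screen_name": "spacex", "text": "Launch day! #SpaceX",
         "retweet_count": 1200, "favourites_count": 3400,
         "links": ["https://example.com/launch"]},
    ],
    "aggregations": {
        "hashtags": {"SpaceX": 512, "ElonMusk": 301, "SpaceStation": 190},
    },
}

def top_hashtags(doc, n=3):
    """Rank the aggregated hashtags by mention count, most-mentioned first."""
    tags = doc.get("aggregations", {}).get("hashtags", {})
    return sorted(tags, key=tags.get, reverse=True)[:n]
```

The same function would drive the trending display shown on the slide: feed it the live response instead of `search_doc` and render the ranked list.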
All right. So that was the previous command: x = l.search('fossasia'), where l is the loklak object. It's that simple to make a query. The place I've found this most useful is research: in case you're a researcher, you can use this directly and run your NLP, machine learning, and data mining algorithms on the tweets you have. At the same time, as I showed, lots of visualizations are possible — heat maps for events. For social events like FOSSASIA, you can have tweet walls running around. You can have other apps built on top of it, like the Gear Systems app. You can have a Telegram bot in case you're a DevOps person and want to see the status of your servers or your application — you can integrate them directly. loklak has a lot of possibilities; it has a lot of scale, and it can work magic with its large amount of data and the stability it really has. It of course needs more work, which we are in the process of doing. As of now, as I said, we're collecting one out of every 200 tweets posted in the world, and the rate will probably improve with more servers and better infrastructure — but at the same time, more peers, more contributors, and more people running servers and contributing to the backend will make it even better. I think I'll end here and open it up for questions. [A question about historical data.] It crawls the latest — whatever people post, it keeps crawling. I'm not sure it can really crawl all the historical data that's out there. If you give it a particular user, it can try crawling back some amount for that user, but it will not be able to go back to the user's first tweet for sure, unless the user's first tweet came after the loklak service started.
That is one of the limitations out there. [A question about how far back a query can go — a month, or several months.] In case you just want to use the API, here is your query structure: you have the query content — what you want to search for — the date from when you want to search, and the date to which you want to search. This could be anything: one month, two months, three months, whatever it is. If there is data, it is going to give you the complete data, no matter how big it is — even if it's something like 20 MB minified, it will give it to you. [A follow-up: how do I make sure the data is there to crawl in the first place? For example, crawling a keyword very specific to Malaysia through loklak ends up with a very poor dataset.] So you want to crawl something very specific? One thing you can do is keep polling for the data. There is also a push API we have, so you can push a dataset of your own to loklak. Since you're polling only for something very specific — like how we kept polling only for the gear systems keyword in this demo, because we wanted tweets starting with or containing that specific word — you can do the same and store that information on your local server. In case you're running your own server and you run this in the background with localhost as your query target, it will store mostly that information.
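The from/to dates in that query structure can be sketched like this — the since:/until: modifier syntax is an assumption about how the server encodes the date bounds inside the query string:

```python
def ranged_query(term, date_from, date_to):
    """Build a date-bounded search query. The since:/until: modifier
    syntax is an assumed encoding of the from/to dates in the query."""
    return "%s since:%s until:%s" % (term, date_from, date_to)

q = ranged_query("malaysia", "2017-01-01", "2017-03-01")
```

The resulting string goes into the same search endpoint as any plain query; the server then returns everything it holds inside that window, however large.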
So in case you query from localhost only for something specific to Malaysia, most of the tweets present on your server would be the ones corresponding to that, because your server has been searching for it more often. So yes, it does need some training if you look at it that way; it can't just go ahead and crawl whatever there is. [A question about how the Gear Systems bot works — does it send the data to the government?] Yes. As of now, it takes the information that is there and registers it as a complaint. I can probably show you that: for the city of Hyderabad, two complaints have been made. You can register complaints through a mobile app, through Twitter, or anything like that. What we really want is for people to use what they already have: we know people will not download a lot of apps, because an app is almost like real estate on your phone, and you don't want to download lots of apps that each do only one very, very specific thing. People would rather move to social media or to apps they already have — something like Telegram, WhatsApp, Slack, Gitter — and all of these allow bots to run on them. So even through Telegram you can register a complaint. But what we've been trying to do is this: there is a lot of data publicly available which the government could use to do something really, really productive, so we decided to give it a shot and see if something like that is possible. People just tweet and follow the instructions the bot keeps replying with. In case you provide all the information, the bot is happy and registers the complaint; in case some of the information is missing, it asks you for what's missing. I can run through the code — it's pretty straightforward.
We have something called a state list, which has six or seven zeros. Basically, we use these as boolean bits, each zero corresponding to one of the fields it needs to register a complaint: the first is the name, the second is the type of the complaint, then the image in case it's there, the location, things like that. In case it finds out — we're using NLTK in the background — that a word is probably the name of a place, and cannot be the name of a person or a complaint-specific word, it takes that and automatically sets that bit to one. I can show that here: here is the update function. It takes the state list, goes to the specific slot you are referring to, updates it to one, and returns the complete array back. At the end of one iteration, you know what data you have and what data you don't. Using that, the bot will automatically ask questions: in case two pieces of data are missing, it will either ask them together — automatically concatenating all the strings required for the query and posting it back to Twitter as a reply — or ask the questions one by one. Depending on the user's reply, it updates the state. How does it update it? Every time a user posts, there is a parent tweet ID; it uses that tweet ID as the primary key and adds all the other replies as a map under it, so that at the end of the tweet session you have the list of tweets ready to be stored in either your MySQL or your Postgres database. [A question about whether the server architecturally supports new crawlers — do they have to be in Java, or can they be a Python thing?] The server is written in Java.
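The state-list idea can be sketched in a few lines of Python — the field order here is illustrative, not the project's actual list:

```python
# Reconstruction of the complaint state list: one slot per required field,
# flipped to 1 as the bot learns that field from the conversation.
FIELDS = ["name", "type", "image", "location", "description"]  # illustrative order

def update(state, field):
    """Mark one field as collected and return the updated state list."""
    state = list(state)  # copy, so the caller's list is untouched
    state[FIELDS.index(field)] = 1
    return state

def missing(state):
    """Which fields still need a follow-up question from the bot?"""
    return [f for f, bit in zip(FIELDS, state) if bit == 0]

state = update([0, 0, 0, 0, 0], "image")
```

After each iteration, `missing(state)` tells the bot exactly which follow-up questions to compose — ask them one by one, or concatenate them into a single reply.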
So I think it would be easier if you program a new crawler in Java and integrate it with the data access objects that are already there. But in case you just want to contribute data, you can write the crawler as a separate app and contribute it using the push API. So, here is the push API. The push API is localhost-only — sorry, it has a few limitations. If you are on localhost, it's obviously going to be faster and you won't face any problems with it; if you're trying to push remotely to some loklak backend which is probably not yours, it might cause some problems, but locally it's definitely faster. [A question about what structure the push API expects.] It expects the data object in a format similar to Twitter's object format — the same fields, though you need just a few of them, not all. Things like the time: the time when you're pushing is automatically recorded from the server time. I have not tested that. Anything else? All right, thank you very much.