Hi, I'm Ritvvij. I run a company called Pykih. We are a data visualization company based out of Bombay. We've been around for two and a half years, and essentially we build visual representations of large data sets. We work with three types of clients: core businesses that want to analyze data and benefit from it; data journalists; and software product companies that want to embed visualization into their products. We have clients in six countries. Some you might recognize include Firstpost and Firstbiz, which are part of Network18. We are partnered with Visually, a very famous data visualization company in the US, and with Journalism++, the people who built Datawrapper, so we work very closely with them too.

During this talk I'm going to cover data visualization. I'll start with the real basics, go deep into the fundamentals, and then cover two case studies where we applied them: how you should actually visualize data, what the principles are, and what you should be looking for in the data. Then I'll cover the challenges in data journalism and what we are doing to solve them.

Let's start with the pie chart, the most basic thing one can do. Here is the data behind a pie chart: E has 38%, D has 25%, and so on. The objective of a pie chart is to break a whole into parts. We call this one-dimensional data, and you're breaking a circle using one visual encoding, which here is area.

Let's define the terms. One second, quick question: how many of you are developers, and how many are journalists? Developers? Okay, and the rest are journalists. So: dimensions are the columns by which you aggregate a data set, and facts are columns, generally numbers, which you count, sum, et cetera. You should always look for the dimensions and facts in your data set first. Take "seat count by party": party is a dimension, seat count is a fact. "Seat count by party and state": party and state are dimensions, seat count is again a fact. That is the idea of dimensions and facts.

Now, visual encodings. The idea of data visualization is that you take data and represent it visually using encodings your eyes can understand: area, position, color, length, thickness, and so on.

So, back to our pie chart. At Pykih we follow a model called the data-first, design-later approach, and the first step is building a categorization of the standard charts and graphs. We call the pie chart a one-dimensional chart. It could even be an amoeba-like shape broken up by area and it would still be a pie chart. Extend the concept further: all of these are pie charts. What we learn is that the same data can be visualized in many different ways, but which chart you should use depends heavily on the business case. If you're representing Maslow's hierarchy of needs, use the triangle. If it's sales data, use the funnel. If it's an election, use the doughnut. The data set behind all of them is exactly the same.
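To make the dimensions-and-facts idea concrete, here is a minimal sketch in TypeScript. The row shape, the column heuristic and the helper names are my own assumptions for illustration, not Pykih's actual tooling; it classifies columns and computes the pie-slice angles that the area encoding implies.

```typescript
// Hypothetical row shape: a mix of text and numeric columns.
type Row = Record<string, string | number>;

// Dimensions are columns you aggregate by (text); facts are numeric columns
// that you count, sum, et cetera. A simple all-numeric heuristic suffices here.
function classifyColumns(rows: Row[]): { dimensions: string[]; facts: string[] } {
  const dimensions: string[] = [];
  const facts: string[] = [];
  for (const col of Object.keys(rows[0] ?? {})) {
    const allNumeric = rows.every((r) => typeof r[col] === "number");
    (allNumeric ? facts : dimensions).push(col);
  }
  return { dimensions, facts };
}

// A pie chart uses exactly one encoding, area, so each slice's angle
// is simply its share of the total scaled to 2 * PI radians.
function pieAngles(values: number[]): { start: number; end: number }[] {
  const total = values.reduce((a, b) => a + b, 0);
  let cursor = 0;
  return values.map((v) => {
    const start = cursor;
    cursor += (v / total) * 2 * Math.PI;
    return { start, end: cursor };
  });
}

const seats: Row[] = [
  { party: "E", seatCount: 38 },
  { party: "D", seatCount: 25 },
  { party: "C", seatCount: 37 },
];
console.log(classifyColumns(seats)); // { dimensions: ["party"], facts: ["seatCount"] }
console.log(pieAngles([38, 25, 37]));
```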
This is an article from The Hindu. It's a very cool-looking pie chart, leaning against a wall, many colors, et cetera. But what is wrong here? They've used color, and the color is communicating absolutely no data. Next, it is 3D, and the 3D is also communicating no data. The idea of a pie chart is focused around one visual encoding, and that is area. Area is the only encoding you should be using, so all your slices should be the same color. The objective of data visualization is to communicate data, and the moment you use wrong visual encodings, you won't communicate your data in the first go. A lot of these principles are already well documented: keep your pie slices the same color.

Here's another pie chart. What is wrong with it? There are way too many values; it's not readable at all. What this teaches us is that area encoding has its own limits: you can't fit infinite values into it. So how do we solve this? If you have more than five slices, combine the rest into "Other". If the reader still needs to zoom in, clicking on "Other" should bring up another chart.

Let's look at another data set: seat count by party, grouped by alliance. Now we're adding a new concept, "grouped by". Say I had a bubble chart where each bubble is a party and the radius of the circle is the seat percentage. If we now add color, so that the alliance is plotted using color, it becomes a grouped one-dimensional chart. We've used color to communicate data. You can always add an extra dimension to your standard charts by encoding it with color, so use color very deliberately.

Which party won in which year? That is a two-dimensional data set: the years and parties plot the dots. Are you getting the concept of dimensions and facts? Now connect the dots and you have a line chart. All of these charts represent exactly the same data set; there's absolutely no difference between them. So again: start by looking at the number of dimensions in your data set and which dimensions you're going to use. That tells you which chart family to reach for, whether pie charts or these.

Which party won in which constituency, by what vote margin? We've added more data now, and we had a scatter plot. What we do is add a weight, the vote margin: make each dot into a circle whose radius is the percentage vote margin, and it becomes a weighted two-dimensional chart. The concepts of groups and weights are things you can keep reusing, and they'll communicate different things. You can always fit one extra fact into your standard charts as a weight, communicated using size or area. This is our weighted scatter plot; this is a circle-comparison chart; they are exactly the same. And if the data set were "which party won in which constituency, by what vote margin, grouped by alliance", we'd have both a group and a weight: a grouped, weighted two-dimensional chart.
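A minimal sketch of the more-than-five-slices rule above; the names and the threshold are hypothetical choices for illustration.

```typescript
// Hypothetical slice shape; the threshold of five is the rule of thumb above.
interface Slice { label: string; value: number; }

function capSlices(slices: Slice[], maxSlices = 5): Slice[] {
  if (slices.length <= maxSlices) return slices;
  const sorted = [...slices].sort((a, b) => b.value - a.value);
  const kept = sorted.slice(0, maxSlices - 1);
  // Everything past the cutoff is summed into one "Other" slice; in the
  // real chart, clicking "Other" would drill down into a second chart.
  const other = sorted.slice(maxSlices - 1).reduce((sum, s) => sum + s.value, 0);
  return [...kept, { label: "Other", value: other }];
}

const parties: Slice[] = [
  { label: "A", value: 120 }, { label: "B", value: 90 },
  { label: "C", value: 60 },  { label: "D", value: 30 },
  { label: "E", value: 10 },  { label: "F", value: 8 },
  { label: "G", value: 5 },
];
console.log(capSlices(parties)); // A, B, C, D plus Other (23)
```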
Next, multi-series two-dimensional charts. When the dimensions increase beyond two, you have to use grouped, stacked, percentage, area and multi-line charts, et cetera.

Let's look at this example, a story from Livemint: "Is the equities rally percolating into the broader market?" The story is about the BSE SmallCap index, which is the line at the back, but the color choice is very bad: the BSE SmallCap line is barely visible, and that's the line the story is about. So this is not visualized well. But there is something they've done very well here: they start the y-axis from 97, not from zero. Why? The purpose of a line chart is to communicate trend, whereas the purpose of bar and column charts is to communicate discrete values. If the purpose is trend and you start your y-axis from zero, then some 60% of your chart area goes into drawing the line up from zero to 97. That area is wasted, and only 40% of the area is left to show the trend. Since the trend isn't zoomed in, the minor variations won't be visible. So for line charts, try to start the y-axis near the minimum of your data.

Next, a tree chart example. A company is divided into regions: North, South, East, West. West brings in the maximum percentage of revenue, then come South, East and North. West is further managed by various area sales managers, so I quickly see that West breaks down into area sales managers four and five, who have various sales officers under them. You can quickly identify what percentage of revenue is coming from exactly which sales officer. This is a weighted tree chart, or rather a weighted, grouped tree chart.

Then there are relationship charts, or graphs, as we call them in developer terms. These are really complex data sets, generally used for things like social network mapping, mapping the routers of the internet, or identifying how an epidemic is spreading. Again, a very complex data set, but it can be represented in all of these charts, so you have multiple options. What I'm trying to communicate is that until you study your data, you'll keep plotting everything as bar charts, pie charts and line charts. You need to find these relationships and properties in your data to identify what you should be showing and which chart you should be using. This is an example of a Sankey chart, which is a weighted, grouped, multi-level relationship chart.

Let's take one example: the Mumbai local, the suburban train fare chart. All the stations essentially form a graph. I don't know if you're aware of Mumbai's geography, but I can go from Andheri station to an intermediate station, get down, run an errand, get back in and then go on to Churchgate, typically on the same line. So it's a graph in nature, but the business case adds a limitation: since the starting point is Andheri and the ending point is Churchgate, I just buy a ticket from Andheri to Churchgate, despite the detour in between. The limitation is that it's a two-level relationship, not a multi-level one. And the moment we realize that, we don't need to draw it as a full graph, which is very difficult to read; we can put it in a matrix instead, as we'll see next.
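First, a quick sketch of the line-chart axis rule from the Livemint example above; the padding ratio is an arbitrary assumption.

```typescript
// For trends, start the y-axis near the data's minimum instead of zero,
// so the chart area is spent on the variation, not on empty run-up.
function trendYDomain(values: number[], padRatio = 0.05): [number, number] {
  const min = Math.min(...values);
  const max = Math.max(...values);
  const pad = (max - min) * padRatio;
  return [min - pad, max + pad];
}

const index = [97, 99, 103, 101, 108, 112];
console.log(trendYDomain(index)); // [96.25, 112.75] rather than starting at 0
```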
So you put it in a matrix. What I'm trying to communicate here is: identify limitations in your business case which you can leverage to improve your design and communicate better. Next point: most matrix charts have a symmetry about them. You have stations A, B, C, D along one axis and A, B, C, D along the other, and you plot the fares across these stations. But then you have A-to-C here and C-to-A there; because of the symmetry, half the data is redundant. So you break the matrix chart into a half matrix. Do not add redundancy to your communication, because it just confuses the reader: do I start from here, or from there? And the Mumbai local fare chart you see outside each station actually is a matrix chart. We think of data visualization as a field as something new, but it has always been there, all around us. Look at the Delhi Metro map and how you go from one line to another; that is data visualization.

This is a brilliant chart; I really love it. It's called the chord chart, and it's a weighted two-level relationship chart. When do you use it? Say you want to see the number of passengers flowing from one station to another. Assume this green arc is one station and the black arc is Churchgate: then this ribbon shows how many people are going from the green station to Churchgate. And black-to-black? That many people are entering and exiting at Churchgate station itself. So it's a weighted two-level relationship chart.

This is the breakdown of charts according to us, the standard charts you find in D3. Why have the breakdown? Again: you start by studying your data, and the breakdown tells you where to go. One quick point I want to cover is the dimension of time. Some charts are aggregate charts: to create the data for them, you aggregate across all your data. So you can never plot a pie chart across time, never plot a map across time, never plot a relationship chart across time, because the time axis simply doesn't exist in them. That's worth noting. And these are the various visual encodings that your eyes understand, a nice breakdown given by Noah Iliinsky; he's a designer at IBM now, I think.

Now the first case study. We did this for Firstpost. The brief given to us was: can you visualize the cricket scorecard? We started looking at cricket scorecards across the various news portals, and all of them are extremely text-heavy. And there are so many pages: for the current scorecard you go to one page, for the full scorecard you go to another, you scroll down, you scroll up. We realized this is not the right way to do it. So, first thing, as I mentioned: whenever you have a visualization to do, start by breaking down your data set. The data set here is: pre-match, you have the toss, the playing XI, location, time. Post-match, you have who won, by how much, and the man of the match. Then per-batsman statistics, per-bowler statistics, per-innings runs, fall of wickets, partnerships, ball-by-ball commentary. It's a pretty large data set with a whole lot of dimensions, and if you start plotting it without understanding those dimensions, you'll end up with way too many charts. But we realized one dimension is very important: time.
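Going back to the half-matrix idea for a moment, a minimal sketch: when fares are symmetric, iterating only the upper triangle keeps every station pair exactly once. Station names and the fare function are made up.

```typescript
// When fare(a, b) === fare(b, a), only the upper triangle carries information.
const stations = ["A", "B", "C", "D"];
const fare = (a: string, b: string) =>
  5 * Math.abs(a.charCodeAt(0) - b.charCodeAt(0)); // made-up symmetric fare

// Iterating j > i emits each unordered station pair exactly once,
// halving the chart without losing any data.
for (let i = 0; i < stations.length; i++) {
  for (let j = i + 1; j < stations.length; j++) {
    console.log(`${stations[i]}-${stations[j]}: Rs ${fare(stations[i], stations[j])}`);
  }
}
```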
Time in cricket is denoted by the unit of overs, and that dimension is the common denominator across all the other data sets: one over has many balls, one over has one bowler, one over has many batsmen. The existence of batsmen across overs is a partnership; the end of that existence is the fall of a wicket. So the common denominator is overs, and we started plotting against it.

We put the overs on the x-axis and the balls on the y-axis and started writing the runs: first over, first ball, one run; first over, second ball, four runs. A wicket is a very important event, so we write it in red. A dot ball is not a very important event, so we just put a dot. We wrote all of this data down. Then we started plotting partnerships. Say batsman one bats from over one to over eight or nine, and batsman two from over one to over two; their partnership lasts till over two, and you see the parallel lines of their existence.

What we've effectively done is use a weighted two-dimensional circle-comparison chart, where the x-axis is overs (plus bowlers) and the y-axis is balls per over. Anything above three runs is a very important event, so fours, fives and sixes get a higher weight; dot balls, singles and so on get a lower weight. The weight is the number of runs scored. Below that, we took inspiration from the Gantt chart for the partnerships. Since overs are the common denominator, we reused the same x-axis and plotted the partnerships against it.

Now, when you visualize data, do not show all your data in one go. You need to focus on the storyline, on how the data flows, and any extra information should come through zooming or interactivity. When we started the company, we made the mistake of trying to show all the information; increasingly, we work through zooming and interactivity. In this case, we put a lot of information into the interactivity.

So this is the end result; actually, I'll show you live. Okay, so these are the two innings. Here are the overs, the bowler per over, and the current two batsmen. The fours and sixes are highlighted; the ones and twos are dulled out. The no-balls are written out there, and wickets are red. This is the partnership of the current batsmen, and this is the current ball. If I want the full scorecard, I just click here, and the full set of partnerships becomes visible.

One nice thing I want to show is the current batsmen. Let's go back to the presentation. You saw the zoomed-in version: all the other batsmen were dulled and grayed out, but we used color to highlight the current two batsmen. So use color to highlight your key data points. Now, interactivity: by the way, this is the current ball-by-ball commentary on top. The moment you hover over a bowler, you see only the overs he has bowled. This is M. Johnson: you see only the four overs he bowled, all the other balls are hidden, so you immediately know his bowling statistics.
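A minimal sketch of that scorecard grid, assuming a simplified ball-by-ball record; the field names and radii are hypothetical choices, not the actual Firstpost implementation.

```typescript
// Simplified ball-by-ball record for the sketch.
interface Ball {
  over: number;   // 1-based over number -> x position
  ball: number;   // 1-based ball within the over -> y position
  runs: number;
  wicket: boolean;
}

// Weight (circle radius) grows with runs: boundaries stand out,
// dot balls and singles recede. The radii are arbitrary choices.
function radius(b: Ball): number {
  if (b.runs >= 4) return 8; // fours and sixes: very important events
  if (b.runs >= 1) return 4;
  return 2;                  // dot ball
}

function plot(b: Ball): { x: number; y: number; r: number; color: string } {
  return {
    x: b.over,                        // overs on the x-axis
    y: b.ball,                        // balls on the y-axis
    r: radius(b),
    color: b.wicket ? "red" : "gray", // wickets are the red events
  };
}

console.log(plot({ over: 1, ball: 2, runs: 4, wicket: false }));
// -> { x: 1, y: 2, r: 8, color: "gray" }
```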
Plus there's a tooltip showing the strike rate, the wickets and all of that. If I hover over a batsman, I see only the balls he has faced, not the others, and again his statistics. If I want to know who hit this four, I hover over it: the commentary changes to that ball, and the batsman and the bowler are highlighted. So: interactivity to focus on finer details. If you compare this with the text scorecards, it's far smaller. There's less reading, less scrolling and more awareness. The best comment we got from a reader was: "It took two minutes for me to figure it out, but once I figured it out, there was no going back."

Let's look at another example: election counting day. We did this for Firstpost too; we were managing the election counting day. Look at the data set: India has lots of regional parties and two national parties. On counting day you have the concepts of "leading" and "won": won is confirmed, leading is transient, you're not sure they'll win. What were readers looking for in this election? How badly would the UPA lose, how big would the BJP victory be, and how big would the impact of AAP be? So we know which data points to focus on. And there are real-world facts we can use for design inspiration: the BJP is a right-wing party, AAP is a left-wing party, and the Sansad, the actual parliament hall where they sit, is a semi-circle.

What are the implications? Seats become a weight. The hierarchical relationship between alliance and party becomes a tree. Won and leading become a group. Since people care only about a few key points, we can club all the minor parties into "Others" and reduce the clutter. And since the Sansad is a semi-circle, we can put the BJP, the right wing, on the right, and AAP, the left wing, on the left; placement is taken care of too.

Now, choosing the right chart. We have a grouped, weighted tree chart, so this is what we can choose from. And since it has to be a semi-circle, we have to use a sunburst; there's no other option. This was the sunburst. A sunburst is generally a tree chart: you start from the root at the center and grow towards the leaves on the outside. Look here: you have the alliances on the outside and the parties on the inside. When you hover, you see the actual values of who has won and who is leading. We broke it into groups: the dark blue is "won", which is concrete data, and the light, transparent-ish blue is "leading". AAP is towards the leftmost, the BJP towards the rightmost. "BJP+", by the way, stands for the NDA; we didn't want to use technical terms.

Now, one problem. In a sunburst the root starts innermost and the leaves sit outside. So initially, when we built this chart, the alliances were inside and the parties outside; that's what D3 gives you out of the box. But then we realized something about what matters most in an Indian election.
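As an aside, a minimal sketch of the election data model just described, with hypothetical field names: a two-level alliance-to-party tree, seats split into won and leading, and minor parties clubbed into "Others". The seat numbers and threshold are made up.

```typescript
// Hypothetical two-level tree: alliance -> parties, with won/leading groups.
interface Party { name: string; won: number; leading: number; }
interface Alliance { name: string; parties: Party[]; }

// Club every party below a seat threshold into a single "Others" node,
// mirroring the de-cluttering rule used for the sunburst.
function clubSmallParties(a: Alliance, minSeats = 10): Alliance {
  const big = a.parties.filter((p) => p.won + p.leading >= minSeats);
  const small = a.parties.filter((p) => p.won + p.leading < minSeats);
  if (small.length === 0) return a;
  const others: Party = {
    name: "Others",
    won: small.reduce((s, p) => s + p.won, 0),
    leading: small.reduce((s, p) => s + p.leading, 0),
  };
  return { name: a.name, parties: [...big, others] };
}

const nda: Alliance = {
  name: "BJP+",
  parties: [
    { name: "BJP", won: 240, leading: 42 },
    { name: "Ally 1", won: 4, leading: 2 },
    { name: "Ally 2", won: 3, leading: 0 },
  ],
};
console.log(clubSmallParties(nda).parties.map((p) => p.name)); // ["BJP", "Others"]
```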
The most important thing is whether your alliance is winning. And if the most important data point is the alliance, the alliance can't be given such a small area; the inner ring is way too small. So we spent something like 200% more time redoing the whole chart, breaking D3's design patterns by putting the parent nodes outside and the children inside. Now the root is the whole semi-circle, the outer ring is the alliances, and the inner ring holds the children, the party leaves. That's what we did for the election. So again, the idea is: study your data set, identify what needs to be done, look at the options available, and then start bending them to give you the result you want.

The summary for data visualization: first, study the properties and relationships of your data set. It is very important. You can't start by saying "I want a pie chart" and then look at your data; you need to look at your data and then decide. Second, use your visual encodings very wisely. Otherwise you'll get clutter.

That's the visualization bit. Now I'll get into the challenges of data journalism. These are the steps in data journalism, according to us: data collection, then cleaning the data, modeling the data, visualizing the data, writing the story. The journalist is involved in all of these steps, the developer in some of them, and the designer in some of them.

According to us at Pykih, there are two formats of data journalism. The first is data-driven stories, which revolve around current affairs. For example, FactChecker, which is part of IndiaSpend, a Mumbai-based data journalism venture: after a widely reported rape case, they found solid data on crimes against Dalits, analyzed it, plotted it and wrote a story around it. Data-driven stories are typically very tightly connected to current affairs. The second is the visualization app: a planned, longer-term project. It's a massive undertaking: you collect the data, model it, identify unknown parameters, design it, build it out. It takes a lot of time.

Going back to our steps: data-driven stories are generally things journalists do themselves. You can't have a developer coming in when the story has to be written by tomorrow morning; they probably can't even scrape data that quickly. Whereas in visualization apps, everyone is involved. So the challenges with data journalism are: how do you quickly access the appropriate data set, which is what Johnny here is helping with? How do you quickly analyze it? How do you quickly and consistently churn out neat charts, graphs and maps, which is what Datawrapper is handling? Other challenges: how do you model live data, like IPL data, the cricket scorecard, stock market data, or election counting day? The data arrives every second, every minute, but it needs to be remodeled to fit your visualization. How do you do that quickly? How do you do SEO for visualization? Have you thought about it? When we built these charts, the data sat inside SVG elements, and Google does not read that well. How do you do SEO for it? It's a real problem. And how do you handle high traffic? These visualizations are extremely heavy.
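Before moving on to the tooling, a quick sketch of that inverted-sunburst trick: a standard partition puts the root innermost, so we flip the radial mapping to give the alliances the outer, more prominent band. The ring geometry here is my own simplification, not D3's actual layout API.

```typescript
// depth 0 = root, 1 = alliance, 2 = party.
interface Node { name: string; depth: number; }

const innerRadius = 60;
const ringWidth = 50;
const maxDepth = 2;

const ring = (idx: number): [number, number] =>
  [innerRadius + idx * ringWidth, innerRadius + (idx + 1) * ringWidth];

// Standard sunburst: ring index grows with depth (alliances inside).
const standardRing = (n: Node) => ring(n.depth - 1);

// Inverted sunburst: ring index shrinks with depth, so the depth-1
// alliances get the outermost, most prominent band.
const invertedRing = (n: Node) => ring(maxDepth - n.depth);

console.log(standardRing({ name: "NDA", depth: 1 })); // [60, 110]  (inner band)
console.log(invertedRing({ name: "NDA", depth: 1 })); // [110, 160] (outer band)
```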
And from Pykih's perspective, our problem is: how do we consistently build beautiful real-time visualizations? The beauty is important and the real-time aspect is very important. So what have we done? We've built an in-house tool, which we use to simplify our own lives.

The first principle behind it: instead of waiting for data to be standardized, we want to make large-scale, high-velocity, multi-format data extraction durable. So we've moved from writing one web scraper per site to a complete infrastructure for managing and pulling data. For the developers here: it's built around event machines.

Second: instead of expecting our data users and journalists to have analytical skills, how do we simplify the exploration of large data sets? I'll give you the example of the Census. The rendering isn't looking right up here, but the idea is: it's a pretty heavy data set and you're filtering it essentially live, as you can see, with more charts below. A journalist can just look at, say, the lighting sources in various states: click on UP, click Scheduled Castes or Scheduled Tribes, and see that around 80% of Scheduled Tribe homes in UP are lit by kerosene, whereas in Gujarat 80% are lit by electricity. So how do you explore a large data set? You build a data explorer dashboard on top of it.

Next: automate the extraction of metadata from data sets. This is very important. When data comes in, you don't know what the data is, and if your computer does not understand the data, you are in trouble. So we built engines that, given a data set, work out: is this geographic data? Is it a date? Is it this, is it that? We have dictionaries around this, which help us clean and model the data. The computer needs to understand the metadata itself.

Then, assisted data standardization. Say the data contains "yes" in one place and "true" in another; both are Booleans, and we understand that, but the machine should understand it too. If the data says "tomorrow", "second of March" and an actual date, how will the machine understand that all of them are dates? So we have tools around standardizing data.

And finally, tools for assisted analysis. There's a nice story IndiaSpend covered about how stolen cars in Bombay are generally Scorpios and fancy cars. What they did was take the data set of stolen cars and look for the cluster or the outlier; the story always revolves around clusters and outliers. To find them, you need programs. Of course you can do it manually, but if a program can tell you, "this part is evenly distributed, this is what you need to focus on", the journalists will be much happier.
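A minimal sketch of that metadata extraction, assuming tiny hypothetical dictionaries; a real engine would ship far richer ones.

```typescript
const BOOLEAN_WORDS = new Set(["yes", "no", "true", "false", "y", "n"]);
const RELATIVE_DATES = new Set(["today", "yesterday", "tomorrow"]);

type ColumnType = "boolean" | "number" | "date" | "text";

function inferType(values: string[]): { type: ColumnType; confidence: number } {
  const score = (pred: (v: string) => boolean) =>
    values.filter((v) => pred(v.trim().toLowerCase())).length / values.length;

  // Each candidate type scores the fraction of values matching it.
  const candidates: [ColumnType, number][] = [
    ["boolean", score((v) => BOOLEAN_WORDS.has(v))],
    ["number", score((v) => v !== "" && !isNaN(Number(v)))],
    ["date", score((v) => RELATIVE_DATES.has(v) || !isNaN(Date.parse(v)))],
  ];
  candidates.sort((a, b) => b[1] - a[1]);
  const [type, confidence] = candidates[0];
  return confidence > 0.8 ? { type, confidence } : { type: "text", confidence: 1 };
}

console.log(inferType(["Y", "true", "false", "yes"]));                // boolean
console.log(inferType(["yesterday", "March 2, 2014", "2014-05-16"])); // date
```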
The next point: instead of expecting data users and journalists to visualize the data correctly, now that the engine has picked up the metadata, can you do metadata-driven visualization? If it's ordinal data, use saturation levels, that is, opacity, to communicate with color. If it's categorical data, use distinct hues. Or if someone picks a pie chart and the number of values is very high, the engine should say: please don't use a pie chart. The metadata should drive the visualization, and the engine should take those smart decisions. Another example: if the data you've plotted covers only Western European countries, and the only map you have is a world map, the visualization engine should automatically zoom into Western Europe, because the rest of the map is irrelevant; you only have data for that region. How do you do that? The engine should do it.

With Journalism++ we're also experimenting with a data-driven blog. Most news sites run on something like WordPress, which is an article-driven engine. How do you do data-driven journalism on that? Do your journalists have the tools to just explore data, plot it, and quickly get the story out? For that you need a data-driven blog.

Next, we use tools like configuration editors, again something we built in-house. The problem we face is that journalists, businesses and clients always want things to be configurable, and developers never do. The reason is that everything you make configurable is a new structure in the database: new tables, new models, you don't know how it will impact your design, and it becomes unnecessarily clunky. So we built a tool called the configuration editor. The business person, that is, the journalist, goes in and says, "I want the background color to be configurable." The SEO person says, "I want to edit the meta tags myself; make them configurable." They just add items to a list, and that whole set gets stored in a JSON file for us. Then, like Google Forms, the tool takes that metadata and generates a form they can fill in, and the values go into the JSON file. When they press publish, it goes to the CDN. The program we write picks it up, and all of these values have defaults, so the visualization works seamlessly and they can change everything themselves.

The underlying problem is that news organizations work at really high velocity; it's scary. It has to be done by this time or the story goes stale, et cetera. How do you build tools designed around making data journalism easy? These are some of our experiments.

One thing we've learned: we started off as a visualization company and we are increasingly becoming a data and visualization company. We never planned to use NoSQL or in-memory databases or NLP, but we need to, because data journalism really involves analyzing your data set. Another example we took on was press releases: you need an NLP engine to analyze them.
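A minimal sketch of the configuration-editor flow just described; the field names and config shape are assumptions for illustration, not the actual tool.

```typescript
// Hypothetical config schema declared through the editor UI.
interface ConfigField {
  key: string;          // e.g. "bg_color"
  label: string;        // shown on the generated form
  type: "color" | "text";
  defaultValue: string; // ships a default so the chart renders before edits
}

// What the journalist and the SEO person declare in the editor:
const schema: ConfigField[] = [
  { key: "bg_color", label: "Background color", type: "color", defaultValue: "#ffffff" },
  { key: "meta_title", label: "Meta title", type: "text", defaultValue: "Untitled chart" },
];

// What actually got filled in and published to the CDN (possibly partial):
const published: Record<string, string> = { bg_color: "red" };

// The chart merges published values over defaults, so a missing field
// never breaks rendering; no database change is needed anywhere.
function resolveConfig(fields: ConfigField[], overrides: Record<string, string>) {
  const config: Record<string, string> = {};
  for (const f of fields) config[f.key] = overrides[f.key] ?? f.defaultValue;
  return config;
}

console.log(resolveConfig(schema, published));
// -> { bg_color: "red", meta_title: "Untitled chart" }
```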
Or think of it this way. Plot the tweets around Narendra Modi pre-election, that is, what people are saying about Modi, not his own tweets, and the tweets around Arvind Kejriwal, as four line charts: positive-sentiment tweets about Modi, negative-sentiment tweets about Modi, positive-sentiment tweets about Kejriwal, and negative-sentiment tweets about Kejriwal. Now run an NLP engine over all the historical articles your news organization has published, and you can work out after which article, that is, after which real-world event, the Twitter conversation shifted from positive to negative for a certain political leader. That is NLP. Or say you have a presidential-style election between Kejriwal, Modi and Rahul Gandhi, and there are these standard notions we have of political leaders: this one stands for secularism, that one stands for communalism, et cetera. Can we take the text of their speeches, crunch it with NLP, and check whether they are actually talking about secularism or communalism? That is also NLP. [Audience question about natural language processing.] Yes, you take the text of the speech, analyze it, and convert it into numbers, basically. So we started off as a visualization company and increasingly we are becoming a data analytics and visualization company.

I think that's it. One fun fact: the name Pykih comes from a captcha. When I started the company, somebody wanted to buy something, I wanted to sell something, I had no money, and I didn't have time to plan the name of the company. So we just used a captcha to name it, and decided that until we do good work, it doesn't matter what we call ourselves. It's actually a very nice way to not waste time on the decision. We had gone through Sanskrit words, Gujarati words, then English words, and had started looking at African languages, and okay, okay, this was getting really crazy. So that's it.

[Audience question about the tools and infrastructure used behind the scenes.] See, as and when we face problems, we learn where we are going to fail. In certain cases the expected traffic for an event was X but we got three X or four X or five X, or the data was coming in at a higher velocity than we could handle, and we failed. So overnight we'd cook up something that would help us handle it better the next time. Actually, I can show you one of the very early versions of the tool. You want a demo? It's a website; I don't know if you'll be able to see it. Let's see.

This is the first prototype of that configuration editor we cooked up. Let's add the labels: background color (bg_color), about text. It's all plain HTML5. Save it... okay. Yeah, much better. Now press here, and we have "What is the background color?", with the color selector in the tooltip. Background color is red; save it, and the value goes in. What have we done? We've effectively eliminated the need for changing databases. The journalists, the SEO people and the business people decide what is configurable, and it becomes configurable. [Audience: But are you also able to decide the dimensions, the types of the data?] No, I'll come to that; this is only for configuration, the UI configuration editor.
Now let's look at the data sets. I want to explain the concept of workers. When you start pulling data at really high velocity, it becomes a problem because you don't know what is failing: is the NSE API the problem, is the cricket scorecard feed the problem, or is your own processing the problem? What's going on? So look at this worker; it's a test worker. Basically, every data set in Pykih has workers associated with it. The worker does the fetch, the web scraping or the API pulling or whatever it is, and records that the fetch took, say, 0.21 seconds, how many records it fetched, and whether there was an error. Then it does the processing: the cleaning, the saving, the modeling. With this we can do live modeling. And when you need to push live data to CDNs, it does the live push. So we know exactly which step is failing. For a typical IPL match there are lots of these running every few seconds, so we can monitor things very quickly. Without these tools, we'd write something in PHP, or Ruby on Rails for that matter, and there would be no way to monitor any of this.

Now let's analyze an actual data set. Let's pull the Pykih Twitter feed, which has no data yet, and say it has to refresh automatically every 4,000 seconds. Again, I don't know if it's going to work; it's a very early prototype. Let's see. There's supposed to be an image here... there. Once the image comes, I can click on it and we'll see the workers. We go back to that same page and see that the first fetch worker ran in so many seconds, how much data it pulled, and whether the modeling was done. You can see the data there.

[Audience question: where is the data pulled from, and how is it processed?] Yes. Without this system, we would keep fetching new data without modeling it, because modeling takes longer; that work would stack up and then something would fail. The way this works is: you fetch, you model, you push to the CDN as a separate worker, and then you start fetching again. The data needs to be modeled; elections are a better example. Once one stage is done, it goes to the next stage, and the next. What we can't see here right now is the visualization layer.

I can show one more. Let's create a data set called "demo" with columns: country, GDP, road access index, is it a safe place, and last updated. Enter India with some random GDP and road index, "Y" for safe, updated "yesterday"; and Somalia with "false". So we've entered a date column as text and a Boolean column in mixed Boolean conventions. Let's see if the worker manages. Right, so this is a UI error, but despite the values being a random mix of "Y", "true" and "false", it flagged the column with a high probability of being Boolean. It has extracted the metadata. And despite "yesterday" being text, it knows it's a datetime. So this is smart extraction of metadata. And what does assisted standardization do? We know this is not standardized data, but we know it is a datetime, so we can convert it: "yesterday" becomes today minus one.
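A minimal sketch of that assisted standardization, assuming tiny hypothetical dictionaries (it also covers the country-to-ISO mapping mentioned next); the real tool would ship much larger ones.

```typescript
const ISO_CODES: Record<string, string> = { india: "IN", somalia: "SO" };
const DAY_MS = 24 * 60 * 60 * 1000;

function standardizeCountry(raw: string): string | null {
  return ISO_CODES[raw.trim().toLowerCase()] ?? null;
}

// Relative dates resolve against "now": "yesterday" is today minus one day.
function standardizeDate(raw: string, now = new Date()): Date | null {
  const v = raw.trim().toLowerCase();
  if (v === "today") return now;
  if (v === "yesterday") return new Date(now.getTime() - DAY_MS);
  if (v === "tomorrow") return new Date(now.getTime() + DAY_MS);
  const parsed = Date.parse(raw);
  return isNaN(parsed) ? null : new Date(parsed);
}

console.log(standardizeCountry("India"));     // "IN"
console.log(standardizeDate("yesterday"));    // a Date, one day back
console.log(standardizeDate("May 16, 2014")); // a parsed calendar date
```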
That's the objective: if you have geo data, if you have country data, it should convert it to ISO codes. It should find clusters. So yeah, I think that's it.
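Finally, going back to the workers from the demo, a minimal sketch of the staged pipeline with per-stage timing and error reporting; the stage names and bodies are placeholders, not the actual infrastructure.

```typescript
// Each stage (fetch -> model -> push) is timed and recorded, so you can
// see exactly which step failed and how long each one took.
interface StageReport { stage: string; ms: number; ok: boolean; error?: string; }

async function runWorker(
  stages: [string, () => Promise<void>][],
): Promise<StageReport[]> {
  const reports: StageReport[] = [];
  for (const [stage, fn] of stages) {
    const start = Date.now();
    try {
      await fn();
      reports.push({ stage, ms: Date.now() - start, ok: true });
    } catch (e) {
      // Record the failure and stop: later stages depend on this one.
      reports.push({ stage, ms: Date.now() - start, ok: false, error: String(e) });
      break;
    }
  }
  return reports;
}

// Placeholder stages standing in for the real scraper, modeler and CDN push.
runWorker([
  ["fetch", async () => { /* pull from the API or scrape */ }],
  ["model", async () => { /* clean, save, reshape for the chart */ }],
  ["push_to_cdn", async () => { /* upload the modeled JSON */ }],
]).then((reports) => console.table(reports));
```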