So, well, it's a small group, so we can make this much more interactive. It doesn't have to be too much theory; the idea is to talk about architecture design for interactive visualization. There are two of us who work together and teach together. My name is Amit Kapoor, and I work at the intersection of data, visuals, and stories: data visualization, visual data science, machine learning, and I talk a lot about that intersection. My partner in crime is Bhargav, who has been doing data science since before the term was invented, has worked across both the US and India in many different areas, and currently runs a startup building a recommendation and personalization product for companies, for B2B sales.

Okay, easy questions first: how many of you have done any visualization? Okay, that's pretty good. How many have done interactive visualization? Far fewer hands. How many have thought about how to build an architecture for doing interactive visualization? Okay, that's great, so we have a few people who have thought about it, and we can talk about it.

Let's take a simple example, the most basic data set. On the left are just 15 data points: area, sales, and profit. One categorical and two quantitative columns. And we have a very simple interactive visualization on the right. It's what you would call a bar chart: area as bars on the x-axis, sales on the y-axis, and profit percentage as color. We also have a rule, the red line, which is the average. And then there's a brushing action (there's also a hover): when you brush, the average is recalculated over whatever area is selected.

If you try to unpack this visualization, we actually did several things. First, a data transformation: we don't have profit percentage, so we needed to calculate another column. Then we created two layers, literally two: a bar chart and a rule chart, if you're comfortable with chart terminology, and we laid one on top of the other. Then we linked them with an interaction, so that filtering on the bar layer recalculates the line on the second layer. So even a simple visualization requires multiple steps to create, and this is a good way to start thinking about architecture for visualization. If you want to think about architecture, think of these four layers: a data layer, a transform layer, a visual layer, and an interaction layer. Those are the four basic components the data has to go through before we get a visualization.
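For reference, here is roughly how that example can be put together in Altair in Python. This is a sketch, not the exact code behind the slide: the column names, the sample values, and the Altair 5 API (`add_params`) are my assumptions.

```python
import altair as alt
import pandas as pd

# A stand-in for the 15-point data set: one categorical, two quantitative columns
df = pd.DataFrame({
    "area": list("ABCDEFGHIJKLMNO"),
    "sales": [28, 55, 43, 91, 81, 53, 19, 87, 52, 48, 24, 49, 87, 66, 17],
    "profit": [4, 11, 6, 18, 12, 7, 2, 16, 9, 8, 3, 7, 15, 11, 2],
})

brush = alt.selection_interval(encodings=["x"])  # the brushing interaction

# Layer 1: bar chart; the transform adds the missing profit-percentage column
bars = (
    alt.Chart(df)
    .transform_calculate(profit_pct="datum.profit / datum.sales")
    .mark_bar()
    .encode(x="area:N", y="sales:Q", color="profit_pct:Q")
    .add_params(brush)
)

# Layer 2: a rule at the average of sales, recomputed over the brushed bars only
rule = (
    alt.Chart(df)
    .mark_rule(color="red")
    .encode(y="mean(sales):Q")
    .transform_filter(brush)
)

bars + rule  # layered composition: the selection links the two layers
```

All four layers are visible here: the data frame, the transform, the two encoded visual layers, and the selection that links them through interaction.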
Now, think of a very simple architecture where you do the visualization entirely in the browser, which is what this example does. In server-client terms, the data sits on the server: we have the data as a CSV, and all the subsequent steps happen in the browser. I can transform my data in the browser, encode it visually in different layers, and encode the interactions that need to happen between the layers, whether the layers sit on top of each other or are composed into a dashboard. Everything from the data onwards happens on the client side. This is pretty much the grammar of graphics (I think the label below has changed, but okay).

If you look at the triangle on the right, this architecture is really flexible: I can do everything I want on the client side, and it's low latency, I can interact with everything. The only challenge is scaling. The data still has to come from the server, and the question is how it gets from there to the browser; but once it's there, I can probably do everything in the browser.

Think of this as the grammar of interactive graphics. There was a previous talk about the grammar of graphics, which is very common if you use tools like R, Vega-Lite, or D3. Interaction adds some more layers to that grammar: selecting, filtering, exploring, connecting, reconfiguring.

Our focus is on how this architecture changes as we make different trade-offs, because everything in design is a trade-off. We will talk about four of them. One: how do we render for data scale? As the data volume goes up, how does the rendering change? Two: interaction requires interactive speed, and when I interact, some recomputation may be needed, so how do we handle computation for interaction speed? Three: if the data shape changes, how do we handle that? Think of this one as variety or complexity. Four: if the data is not static but has a velocity element, a data stream, how do we handle that?

So, rendering for data scale. Five data points are fine, but what if you have a million, or a billion? When we have a million data points, what are our options? Say I have about a million points to render, and the pixel count on my dated MacBook Air, 1440 by 900, is nearly equal to the number of data points. If I just plot them on an x and y axis, I'm literally overplotting; I can't see anything. So what are my options? (An audience suggestion.) Okay, right, so I can use some aggregation features. What else? (Audience: alpha.) Yes, alpha, but actually alpha does nothing at this level: a million points at 0.001 alpha will still be a solid block, because there's so much overplotting.
At this density the alpha is literally of no use. So the first option is to sample. This is a realistic way to think about it: I can always sample the data, and as long as I'm okay with my sampling technique, I can plot the sample perfectly well.

We're at an AI conference, and I'm sure half of you want to build some great new model. So obviously we can model the data, and modeling reduces the visualization space. That's great, because modeling scales really well and a model can be shown far more easily. But it doesn't address the visualization goal, which is that I want to see the data to learn what I don't know. So how do we do that?

Typically the most effective approach in this context is binning: putting the data into bins and using aggregate counts makes the data easy to render, and binning actually starts to show structure. I tend to make this statement: visualizing data at scale is just a process of creating generalized histograms. I'm always aggregating data into lower-resolution buckets, and it's all about how I do it and where I make that trade-off.

There's one thing I haven't talked about yet in rendering. That was about getting the data in, but there's another challenge: can my screen handle a million points when they're moving? Typically we render using DOM elements, scalable vector graphics (SVG), raster graphics (canvas), or a hardware-accelerated canvas, which is WebGL in the browser. If you think just about rendering, the order of magnitude goes roughly like this: with a thousand points I'm okay using SVG; at around 10,000, maybe 20,000 or 30,000, I can go up to canvas; and if I really want a million points, I want the hardware-accelerated canvas, WebGL. So I may bin the data, but even in spite of binning I may still need to think about my rendering options.

So one option, if I can still pull a million data points into the browser, is to just use WebGL to plot them. One project you can experiment with is deck.gl, an open-source project aimed at geographic mapping: it uses a WebGL layer to do 2D and 3D projections on maps. It's mostly geographic, with a little bit of non-geographic support, but you can really use it for this, and moving to a WebGL architecture makes a big difference. Plotly, which some people use, is also able to plot a lot of data points, because it moves seamlessly between WebGL and other rendering options.
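For those of you in Python notebooks, a minimal sketch of the deck.gl route via pydeck, its Python binding. The file name, the columns, and the view settings are placeholders:

```python
import pandas as pd
import pydeck as pdk

# Hypothetical point data with longitude/latitude columns
df = pd.read_csv("points.csv")

# A WebGL scatterplot layer: the GPU does the rendering, so ~1M points stay interactive
layer = pdk.Layer(
    "ScatterplotLayer",
    data=df,
    get_position="[lon, lat]",
    get_radius=50,
    get_fill_color=[200, 30, 0, 140],  # RGBA
)

view = pdk.ViewState(latitude=20.6, longitude=78.9, zoom=4)
pdk.Deck(layers=[layer], initial_view_state=view).to_html("points.html")
```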
Here's an example I did while helping someone visualize secondary freight: how freight moves between different delivery points over time. The points are distribution points, the bigger marks are clustered, aggregated amounts of how much freight goes through each warehouse, the lines show routes from warehouses in India to different distribution points, and the time slider at the bottom lets you play with the time horizon and see how it changes. So this is interaction, but I've still loaded all the data into the browser and everything happens in the browser; I've just used a rendering capability that can handle a larger number of data points.

So the first set of options, if you think about it: should I bin the data, should I summarize it, and should I use a better rendering engine? I still push the data through the same pipeline, but I use effective visualization transformations and rendering mechanisms, and that raises the scale ceiling a bit. People come and ask, "Can I put my six-terabyte Hadoop cluster on this?" No, obviously not; we're not talking Hadoop clusters. But if you're realistically looking at a million points, say 250 MB of data you want to push, that's still realistic, and if you compress it you can do more.

That leads to the other technique for reducing the load: if I still want to send large data, can I compress it before sending it over the wire? Then I can send more data across and still use the same techniques. This approach was called bin-summarize-smooth, if anybody has looked at it. It comes from a paper by Hadley Wickham ("Bin-summarise-smooth", also packaged in R as bigvis), and it basically said: instead of transferring raw data, I will transfer binned, summarized, smoothed data and render that. So I move the transform layer back to the server, and instead of sending the entire data set, I send only the binned, summarized, and smoothed data across. I'm compressing the data and sending only that part, so I can increase the scale while latency stays good. Obviously flexibility goes down, because now I'm making choices about what data goes through, and I can't recompute the raw data from it on the client.

You can take this approach one level further; say you use Python. If I've already binned, summarized, and smoothed the data on the server, then instead of doing the visual rendering on the client side, why don't I make the image on the server as well, send the image across, and leave only the interaction part on the client? This is the idea behind Datashader, which takes the approach to the next step: instead of sending compressed data, it just sends an image across.
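A minimal sketch of that server-side step with Datashader; the data file and the image size are placeholders:

```python
import datashader as ds
import datashader.transfer_functions as tf
import pandas as pd
from datashader.utils import export_image

df = pd.read_csv("points.csv")  # hypothetical data with x, y columns

# Aggregate all points into a pixel grid on the server (binning at screen resolution)
canvas = ds.Canvas(plot_width=900, plot_height=600)
agg = canvas.points(df, "x", "y")  # per-pixel counts by default

# Shade the aggregate into an image; histogram equalization brings out structure
img = tf.shade(agg, how="eq_hist")
export_image(img, "points")  # writes points.png: only this PNG crosses the wire
```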
Right, so I take the image, and this is, I think, the US census, so about every person in the US rendered. All the compression of the data as well as the generation of the image runs on the server side, and I transfer literally one PNG across. Then I build interaction capability on top of that: the interaction happens, a signal is sent back, and a new image is sent if you zoom in or pan. So I've moved everything back to the server and I'm making a different trade-off; I'm moving the line further to the right here. Now, instead of only binning and summarizing, I also have to think about encoding options, what exactly the visualization will look like, before sending it across.

This is a very effective technique for scaling, especially for data scientists thinking about how to do visualization at scale. These techniques are available in R and Python, and you can actually start using these packages. Datashader, for example, is just a generic rendering pipeline, so you could use it as the rendering pipeline with your favorite visualization package and let that package render the result. So there's a lot of opportunity to scale up even in your notebook. You obviously gain scale, and latency can still be good, but it's obviously less flexible: you can't make a lot of visual encoding changes without going back, recomputing, and sending a new image. Does that make sense? These are very practical approaches: if your architecture design is focused on the largest scale, these are good choices to think through. Okay, that's one dimension, one trade-off.

The second trade-off in interaction is interaction speed. When I interact with a visualization, I want immediate feedback; if there's more than 200 to 500 milliseconds of delay, it may not allow me to do interactive data analysis or interactive visualization. If I select an area or zoom in and it takes three minutes for the new visualization to render, it doesn't work out; the interaction becomes more like a batch process than something interactive. There's a lot of research on the latency you need, and that's another trade-off to make when we think about this.

This is especially true for multi-chart layouts, where you interact on one part of a chart and want to see the representation change in the other parts. The common example is cross-filtering. Here's a cross-filtering example: we're filtering on some dimension, brushing on the lower part, and as we brush we want to see the data re-represented in the other elements: in the map, in the other charts, so we can really understand how the selection looks across views.
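The trick that makes this kind of linked brushing feel instant, which is where we go next, is pre-aggregation. A toy pandas sketch of the idea, with made-up columns: build a small cube of group-level aggregates once, then answer every brush from the cube instead of rescanning the raw rows.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1_000_000
df = pd.DataFrame({
    "hour": rng.integers(0, 24, n),         # the dimension we brush on
    "region": rng.choice(list("ABCD"), n),  # the dimension shown in the linked chart
    "delay": rng.normal(10.0, 3.0, n),
})

# Build the cube once: count and sum per (hour, region) cell, at most 24 x 4 cells
cube = df.groupby(["hour", "region"])["delay"].agg(["count", "sum"]).sort_index()

def brushed_mean(hour_lo: int, hour_hi: int) -> pd.Series:
    """Mean delay per region for a brush over [hour_lo, hour_hi], answered from the cube."""
    cells = cube.loc[hour_lo:hour_hi]    # slice the brushed dimension
    agg = cells.groupby("region").sum()  # fold the remaining cells, not the raw rows
    return agg["sum"] / agg["count"]

print(brushed_mean(8, 11))  # fast enough to call on every brush event
```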
So one approach to this, and this is more of a demo, is: can I create aggregation cubes that allow cross-filtering, but build them locally so people can cross-filter really fast? Crossfilter, the original library, was built on this concept; this demo takes both the idea of cross-filtering and the idea of tile generation. The basic idea is that I aggregate the data and keep cubes, pretty much like you would in a database, and then use those to drive interaction faster. So I'm not just getting the data; I'm also creating cubes of the data, or indexed or hashed data, in a form I can cross-filter very effectively. That's one way to improve latency: it's still happening on the client side, but whatever pre-aggregations I do make those operations faster. My flexibility goes down a little, but I can still improve the latency, the interaction speed, of the visualization.

The same idea translated to the server side, which I haven't talked about, is caching. If I pre-compute all my visualizations or data and cache them, and then only send a very small amount of data, I can get interaction speed from the server side too, using cached data instead of in-memory cubes. That's another way to think about this trade-off.

Another technique, which is interesting to try, especially if you have larger data and want interactive speed, is approximate querying. Approximate querying basically means: if I interact with the chart and want an exactly correct answer, it may take a long time; but if I have an approximate query engine on top of my database, or in the middle, think of it as middleware, I get an eventually consistent answer. Eventually it's exact, but initially I get an approximate answer, which is pretty much like sampling: sampling my database to answer that query. It can be up to 50 times faster than the exact query. This is great if you're really dealing with large data sets, which luckily I don't, but a lot of people want to interact with terabytes of data, and approximate querying is one way to do it from a visualization point of view.

There are two challenges. From a database point of view, you need to figure out how to implement an approximate query engine in the middle. From a visualization point of view there's one additional challenge. I'm querying for an approximate result, doing the transformation on the server; everything else on the client is still really fast, so I can interact really well. But because the result is approximate, I need to find a way to show that uncertainty in my visualization: eventually the data converges on the true answer, but currently you're seeing an approximate version of it. So I lose one more encoding channel, because I have to think about how to show uncertainty on the results as well, and that makes it a much harder design question.
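To see what "approximate with uncertainty" means concretely, here is a toy numpy sketch (not a real approximate query engine): estimate an aggregate from a random sample and report a range alongside it, which is exactly the extra uncertainty channel the visualization then has to carry.

```python
import numpy as np

rng = np.random.default_rng(42)
column = rng.lognormal(mean=3.0, sigma=1.0, size=10_000_000)  # stand-in for a large column

def approximate_mean(values: np.ndarray, n: int = 10_000):
    """Estimate the mean from a random sample, with a ~95% confidence range."""
    sample = rng.choice(values, size=n, replace=False)
    est = sample.mean()
    half = 1.96 * sample.std(ddof=1) / np.sqrt(n)  # normal-approximation interval
    return est, est - half, est + half

est, lo, hi = approximate_mean(column)
print(f"approximate: {est:.2f}  (range {lo:.2f} .. {hi:.2f})")  # shown immediately
print(f"exact:       {column.mean():.2f}")                      # the eventual answer
```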
One way to think about it is that it's basically sampling. Suppose I'm trying to find the minimum value in one column. If I scan the entire database, it will probably take a long time. Instead, let me sample the column intelligently and show you an answer, along with the likely range around it. So instead of one answer, I get something like a mean answer and a range. And yes, eventually the system can calculate the exact value given enough time, but if you're focused on interaction, the question is: can the analyst or business user, seeing an approximate result, start making decisions that are just as good as waiting for the eventually consistent answer? That's a different trade-off: if you really want interaction at scale while keeping interaction speed, this is one approach. The other approach is what we talked about: in-memory cubes or caching. With caching I may not have the freshest data, but the linkage to the data layer becomes much, much stronger.

The third approach is really to use a faster database. If my challenge is that I don't want caching and I don't want pre-computes, can I just use a faster database? GPU databases are designed for really low latency. This example is from MapD, and it's open source, at least the database part, so you can actually use it. The visualization here is basically sending an image again, but because the database itself runs on the GPU, it isn't just sending an image: it's really sending the GPU vectors directly, and those get plotted. So if my querying is very fast, the whole loop gets very fast. This is, I think, a large database of tweets, and I can interactively analyze it, scan through it, and look at the results.

So getting interaction speed on larger data is not simple; you'll have to make some choices, and in a lot of ways that's what architecture design is about. Here I'm getting faster responses, but literally all the transformation and visual encoding happens on the server side, and in this case the server is literally a GPU: it does the computation on the GPU and pushes results to the client. It can be highly flexible, very low latency, and scalable, so it looks like the golden solution, but I've omitted one dimension, which is how much data I can actually store on the GPU, and the cost of doing this. So it may not be the most effective option for all your data, but it may be really effective for data that is mission critical, an operational database where you want fast insights. Once you add the cost dimension and the memory dimension, the picture changes. Okay, how are we doing so far? Any questions?
How do we think about complexity? This is the third challenge. It's easy to make a visualization when your data is static: I make it once and I know everything that's in the data set. But if my data changes in variety, or has some other characteristic that makes it harder to visualize, how do I handle that?

One approach is responsive visualization, responsive to both space and data. This is much more of a rendering challenge, but the idea is that my visualization can adapt itself depending on the size of the visualization space available as well as the number of data points. So it's a scatter plot at first, but as the data volume goes up it can become a heat map, and if the data keeps growing, a contour map; it really adapts to the space and data that are there. There are a few libraries that do this now, but at the moment it mostly requires custom coding to get responsive data visualization. The point is that I'm adapting the chart to the data: more data points means a different visualization, not just the same one. So there's another input to the encoding, which is the amount of data, and it can also be the display size: the same thing rendered on mobile looks different from the same thing rendered on a desktop, so it adapts and changes the visualization depending on the format. I think I showed this in the morning.

There are also a number of experiments on how to handle cardinality. Data volume is one problem, but high cardinality is also a big one. Cardinality basically means the number of categories in a particular column. With categorical data, five categories are fine to show, but as a column becomes more cardinal, with more categories, I need tools that can summarize the rest, as is happening in this case: there's an "other" bucket that is dynamically computed and adapts to the visualization I'm trying to build. So it's binning on the fly.

A third way to deal with cardinality is to change how we look at the data altogether. This is the data displayed with tabplot in R, really thinking about the data in a very different way: combing through it. It's a big table, but it's encoded and visualized, and as I scroll through the data I can start to look at individual values, group things, and so on. So I can handle a large amount of data variety while still interactively working through it and learning about it: I'm visualizing summary features and at the same time interactively going through it, and I can start to see the patterns within each column.

So the moment you have data with high cardinality, high variety, or sizes that can change dramatically, you have to think about different alternatives. A lot of this is not readily available in tools, and that's again a decision you have to make: whether it's worth building this into your architecture or not.
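As a concrete sketch of the responsive-to-data idea, here is a small Altair helper; the thresholds are arbitrary placeholders, and the point is only that the mark type is chosen from the data size instead of being fixed.

```python
import altair as alt
import pandas as pd

def adaptive_chart(df: pd.DataFrame, x: str, y: str) -> alt.Chart:
    """Pick an encoding based on how many points there are to render."""
    n = len(df)
    base = alt.Chart(df)
    if n < 5_000:
        # Few points: show them directly as a scatter plot
        return base.mark_circle().encode(x=f"{x}:Q", y=f"{y}:Q")
    # Too many points to plot raw: bin both axes into a heat map of counts
    return base.mark_rect().encode(
        x=alt.X(f"{x}:Q", bin=alt.Bin(maxbins=60)),
        y=alt.Y(f"{y}:Q", bin=alt.Bin(maxbins=40)),
        color="count():Q",
    )
# Beyond a few hundred thousand rows, hand off to a server-side rasterizer instead
```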
The other approach, obviously, is dimensionality reduction, and there are two ways to think about it. One, typically used in business, is faceting the data: looking at it through many smaller viewpoints, multiple panels. That's why all our dashboards look the way they do, a set of small rectangular charts: we're trying to handle multidimensional data. The other, from the machine learning end, is really asking whether the data can be visualized in a different way, using projections. So projection is one approach and interaction is another, and where projection applies, I can use it.

Okay, the last trade-off, and this one is relatively easy: velocity. If my data is really moving, refreshing at a rapid pace, how do I handle real-time or near-real-time data and refreshes? I think I only have one example of this. This is one of the original time-series visualizations, a very common pattern now; this was from Cubism (cubism.js), I guess. From a visualization standpoint it's not really hard, and from a data-handling standpoint streaming data is actually easier to handle than most of the other cases, because the amount of data I'm handling at any time is fixed: I'm mostly adding one data point and dropping one data point. The challenge with real-time or streaming data comes when you want to store large amounts of it; but if I'm only looking at a fixed window, the last n points or the last ten-minute window, then the visualization mostly just means shifting the data along, and it's actually not that hard to do. I guess this also reflects that I haven't done enough in this area, so maybe there are more challenges here.
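A tiny sketch of that fixed-window pattern in Python; the window width and the one-point-per-second framing are assumptions:

```python
from collections import deque

class StreamingWindow:
    """Fixed-width buffer for a streaming chart: push one point, drop the oldest."""

    def __init__(self, width: int = 600):  # e.g. last 10 minutes at one point per second
        self._values = deque(maxlen=width)

    def push(self, value: float) -> None:
        self._values.append(value)          # deque evicts the oldest point automatically

    def snapshot(self) -> list[float]:
        return list(self._values)           # hand this to the renderer on every frame

window = StreamingWindow()
for v in (3.1, 2.8, 3.4):                   # in practice, values arrive from the stream
    window.push(v)
print(window.snapshot())
```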
Yeah, so: thinking about interactive visualization is great, but we don't yet have tools that let us do a lot of what we'd like, from just creating that first simple chart I showed at the start of the talk, to building products where we make these different trade-offs. My advice is really to look at the easier client-side solutions first, before you move to the harder server-side solutions. There are a lot of possibilities, as I talked about, for data scale and computation on the client side, and when that doesn't give you the right trade-off, you can think about what trade-offs to make on the server side. Interactive visualization at scale is not removed from your architecture design around the database, or the choices you make on the raw data and its transfer. That linkage is really strong: if you've made choices on the front end in terms of how you render, what libraries you use, how adaptive they are, you have to make the same kind of choices about caching, transformations, or faster databases on the back end to really do interactive data visualization. So this spectrum requires thinking about both sets of trade-offs.

Okay, I went at a rapid pace, but this is what I have, and we still have some time. Any questions or thoughts, from people who have done this stuff, built products, or worked through these trade-offs, that you'd like to share? We probably have a mic as well.

Audience: In your experience, and I'll stay away from the approaches because you've really cleared those up, which tools and techniques have worked better for you? Places you always go back to, tools or ideas you return to again and again?

So I think it depends a bit on my context, which is both teaching this stuff and mostly handling smaller data. So my bias is towards tools like that first slide: I want just the data and everything else handled on the client side, or, since I'm dealing with small data, I'm comfortable handling everything in memory. That biases a lot of my trade-offs and a lot of my tool usage. I'm not very different from most data scientists, I would say. If I'm in the R universe, the tidyverse is what I'd use; interaction options in R are really not that great, or rather they're great but all bolted on top through htmlwidgets, which means they work well for single charts. If you want crosstalk, as they call it, and multiple linked charts, it's harder, and then you basically have to adopt a dashboard framework like Shiny.

The Python universe is actually harder. My bias there is Altair right now, because it's an offshoot of Vega-Lite and Vega, so it allows composition and lets you create a lot of these charts. For interactive multi-chart dashboards at very large scale, things like Datashader help, but in that ecosystem there are so many options and so many choices that it's hard to pick one and go deep; you ultimately end up picking smaller pieces, and a lot of this is hard, even though the Anaconda Datashader effort tries to handle a lot of the large-scale part.

In the JS ecosystem I go back to Vega and D3: Vega because it's declarative and lets me express interaction declaratively, D3 for custom work, and p5 for generative and creative things. And the other thing to keep in mind is that new things keep coming up. At one point you'd say, for map visualization I want Leaflet and so on, but now there's deck.gl from Uber, and their tool Kepler, for example, which I used to create that freight example, can do a lot of interactive geographic visualization in a business context, and I've seen businesses adopting it as part of their toolset. So it's not a standing-still equilibrium: as we use more of this WebGL stuff, I'm guessing we'll see new options for domain-specific problems. So keep an eye out for what comes out. Kepler is really nice if you focus just on geographic data.

Audience: Yeah, great talk, Amit, a lot of insights to work on. One question on mobile interfaces: how do you see all these efforts evolving as mobile penetration increases drastically, in terms of visualization?
So, to be honest, most of our visualization work is optimized not for touch interfaces but for mouse-driven interfaces, so if you think about pure interaction, a lot of these examples won't really work there. Mobile requires a different way of thinking, and there are two things to consider. One, adaptivity becomes much more complex: I talked about it in a very narrow context here, but when you start designing for mobile interfaces, being adaptive to different screen sizes really comes into focus. That's also partly a UI and design concern that overlaps with data visualization, and this audience is generally not very keen on that. So: how to think about different sizes, how to think about touch. Most of the tools we use in data science don't really work well on mobile interfaces, so we need to think quite differently about it. I haven't done enough work on this either, but the Android and related ecosystems also don't have good visualization libraries, so the web is a good option, but you have to optimize it for mobile and make that trade-off.

Audience: Thanks for the great talk and for sharing the different tools. This is a curious question; I'm not sure if you'll have an answer. Business users often feel comfortable interacting with a system in the language they understand. A lot of corporates have 800 reports in whatever visualization tool they use, but nobody knows which report to click, and what to do with it, to get what they want. So they struggle, and the idea is to have an interface where they can chat, send a message to a system, and machine comprehension would understand it, convert the query written in English to SQL, get the data out of their data source, and actively decide what visualization to show. Do you know of any tools, or anybody really working on that, or could you share some examples?

Sure. I consult for, or help out and mentor, a friend's company building something like this, which addresses the question you have. But it's not just a simple question of architecture for the interactive part. The moment you want a natural-language, search-based query system to generate visualizations, first you need to translate the question into a SQL query, as you said. But that alone isn't enough: you need a data map of all your data sets, joined, with some knowledge graph built on top that the system can use. That part is actually the harder one in a business: the reason businesses have 800, or 8,000, reports is that all this stuff is siloed, not connected. The company I was referring to, Sprinkle Data, tries to combine the data sets and build a knowledge graph on top, which works really well for startups or emerging companies, but I'm not saying it's an easy solution for large companies.
A lot of tools have tried search-based querying with automatic visualization. Power BI has a search-based feature, but it's very basic, because what it lacks is deep domain knowledge: a search-based query only works once the system knows all the data sets in your system and can actually find and create the connections between them.

That's the first challenge; the second challenge is automatic visualization, creating visualizations automatically. Given a query, can I select a visualization out of the combinatorial options available? There's a lot of research on this: Voyager is one tool that does it, and Draco is a newer tool, both from the Interactive Data Lab at the University of Washington. Note that "AutoVis" is also a term Leland Wilkinson uses for generating insights automatically: out of all the possible graphs, which two-by-two views should you, as a user, be looking at? That's one kind of automatic visualization; the other, for business, is more like: for this data set and this question, this is the best visualization. Those are actually two separate problems, and you have to solve both of them, along with all the machine learning parts. It's easier to do for a startup-sized company; it's harder for larger enterprises, where just getting the connections in place will take a long time before you can do any of this. But happy to talk more offline about other questions and resources.

Thank you. Both of us are at these links, amitcaps.com and bhargav.com, or impel.io for Bhargav; go to his startup and get some recommendations. Thank you so much.