We're going to talk about building your own data visualization platform. It's on the data visualization track at ODSC. This is one of the two sessions we have. And this is a beginner session, so you don't need to know anything about the work that has been done already, and you can still get a grasp of it.

Let's start with an introduction. One of the speakers in the previous session said to use whatever is already created, so that's what I did. ODSC created this badge for me, so I'm just using that for the introduction. It's all right here. I am an independent consultant. It's been around three years now. I have an analytics consulting practice. I've spent a lot of time in the pharmaceutical industry doing forecasting for them, but now there's a variety of things that I do.

So why did I create this product? I do call it a product because you can start using it as a product. I built it when I started my independent consulting, and that's when I used it. I was working with a lot of clients who had sensitive data, and they didn't want their data to touch the internet at any point in time. So what do you do? You create visualizations on your local machine in that case. I was using R, and of course the easiest way to go there is to use ggplot2. Everyone heard of ggplot2? So you write some code, and you get your chart. But then you have to keep writing that code again and again every time you need a new chart. So what I did is automate that process and build it into a product. That's what we're going to see here.

So the first why is why I created the product. The second is why the layout is the way it is, and I'll talk more about that when we come to the layout. We'll do a demo of the product as well. But the reason it is designed the way it is is to make sure it is very simple. This tool that we will see today is not a data analytics tool at all.
It's a hardcore data visualization tool. So no data manipulation happens in there. It just creates the plots.

So let's move to the demo quickly. You can fire this up directly from RStudio. I have R, and then I have RStudio, which is the IDE, and I use RStudio. There's a button called Run App, and this comes up. And let's see, I realize I'm not showing on the screen. So this is what the home page looks like. Very simple. You upload the delimited file here. Then you specify the delimiter. And then you select the plot type. It has three tabs, which are the Summary, Data, and Plot tabs. And that's what I want to see. I want to know what my data is. I want to see what the structure of the data is. And then I want to see the plot. I'm not making any manipulations on the data here. Right now, it accepts only delimited files, which means it could have a CSV extension, but the delimiter could be a comma, a tab, or a colon as well.

So let's quickly pick up a file. I have used a data set called mtcars. mtcars is an open data set. It comes as a part of R. It is data that was captured for 32 cars, I think around 32 cars, and it was just test data. It gives you miles per gallon and those kinds of statistics: displacement, weight, and number of gears. So you browse your local machine again and pick up the file that you want. I've already specified the delimiter here. And the moment I upload the file, more options come up. That's something that we call reactivity in this case. That event will only happen once I have triggered it. That's how it is coded. These will only be visible once I have uploaded the file.

The Summary tab tells you what columns I have. And these are all the columns. X is the name of the car. Then you have miles per gallon, number of cylinders, displacement, and all the other features of what car data looks like.
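Outside the app, the upload step and the Summary tab amount to a delimited read followed by `str()`. The following is a minimal sketch, assuming the demo file was exported from R's built-in `mtcars` with row names (which would explain why the car-name column shows up as `X`). The temporary file and the helper name `read_delimited` are illustrative and not taken from the talk's code.

```r
# Write mtcars to a temporary comma-delimited file to stand in for the upload.
path <- tempfile(fileext = ".csv")
write.csv(mtcars, path)  # row names (car names) become an unnamed first column

# Hypothetical helper: read a delimited file with a user-supplied separator.
read_delimited <- function(path, sep = ",") {
  read.csv(path, sep = sep, stringsAsFactors = FALSE)
}

cars <- read_delimited(path, sep = ",")
str(cars)   # structure of the data, similar to what the Summary tab prints
head(cars)  # first rows, similar to what the Data tab shows
```

Changing `sep` to `"\t"` or `":"` covers the other delimiters mentioned in the demo.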
Then this is the actual data. So this is my second tab. And these are the exact parameters. If I have to search for a specific parameter, I can search using the box at the bottom.

And then there's the plot. Right now, the default plot selected was scatter. On the y-axis, it is plotting miles per gallon, so that's the mileage. And on the x-axis, it is plotting X, which was the car name. This chart doesn't really make sense at all. So what I want is, let's say, what happens to the mileage with reference to displacement. This is a two-dimensional chart at this point. As the displacement of the car increases, the mileage goes down.

Now, we can also make it a three-dimensional chart by using the color-by option. So let's color it by number of cylinders. Can you see the color difference here? Is it visible? So now we know that the more you increase the number of cylinders, the more your mileage is going to go down. And that's what we see. That top cluster is the one with the least number of cylinders, and then there are more. Then there are additional elements, which give a trend line. So it plots a trend line as well, just to see where the general trend is going.

Does anyone know what facets or panels mean? Have you guys used this before? Some of you have used this before. So that's what this thing is going to do. It's going to add another dimension to this, and that makes four dimensions that we can have in here. So it just plots a chart for every number of carburetors. Or take am: is it automatic or manual? That's what am says. Zero is automatic, and one is manual. So that's what it splits this by. And then also, your y-axis could either be on a fixed or a free scale. So what does that mean? Right now, if you see, there's a lot of blank space on the top left chart, because there's a single y-scale serving both the charts. So you shift this to a free scale.
Now you have individual y-scales for individual charts.

A scatter was not the biggest reason why this was created, though. This was mainly created because I wanted to create box plots out of it. So let's quickly go back and see what happens with the box plot. Now, all of you have used box plots, even if you haven't made one. A box plot is a one-dimensional plot. You need only one column of data to create that plot. So what happens when you change the data on the x-axis? Ideally, nothing should happen. The x-axis right now is displacement. It's not going to do anything if I pick up something else. So let's say I pick up cylinders. It's not going to make a difference. But what happens if I change this to group? Now I have a box plot for every number of cylinders. So these are now individual box plots, and I can see, within a group, how they compare to each other.

And then the other chart types are bar and line and the usual. Now, it's not that there can be only four. Once we go to the code, we will realize that there can be as many as you want. And what we're going to do today is I'm making this public, and I'm making the code public as well. So you can go back and play around with it, and we can probably collaborate and see where we can take this. This can be a full-fledged thing. This is a very basic version at this point.

Let's shift back to the presentation. So, the packages that I used. The first is Shiny. How many of you have used Shiny already? Shiny is the app development platform. We have ggplot2, which is for the plots. lubridate is for date parsing. And the last one is only for color scales. You're free to choose whatever packages you want.

For those who have not used Shiny, and that was the intention when I said this is going to be a beginner session as well: what is Shiny? It's just an app development platform, which comes from RStudio.
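For reference, the scatter and box plot options shown in the demo map onto ggplot2 roughly as follows. This is a sketch under assumptions: the talk does not say which smoothing method the trend line uses, so a linear fit is assumed here, and the object names `scatter` and `boxes` are illustrative.

```r
library(ggplot2)

# Scatter: mileage against displacement, colored by number of cylinders,
# with a linear trend line and one panel per transmission type (am).
scatter <- ggplot(mtcars, aes(x = disp, y = mpg)) +
  geom_point(aes(color = factor(cyl)), size = 4) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  facet_wrap(~ am, scales = "free_y") +  # scales = "fixed" shares one y-scale
  labs(color = "cyl")

# Box plot: one box of mpg per number of cylinders (the "group" option).
boxes <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot()

print(scatter)
print(boxes)
```

Switching `scales` between `"fixed"` and `"free_y"` reproduces the fixed versus free y-axis choice described in the demo.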
So you can use it within RStudio to create all the applications that you want. The app template looks like this. There's a section of the code, the UI section, which is where you define all the layouts. Where each component is going to be located comes as part of the UI. The server section tells you how the objects are going to interact with each other, and how the reactivity or rendering is going to work. That's what happens in the server section. And then there's a call to shinyApp. This is a necessary command, and the syntax is very simple. It just takes ui and server, the last line here. And that's exactly the basic structure of a Shiny app. You don't need to have both UI and server in the same file. You can have them in different files. If you have them in different files, then you can do it without calling shinyApp.

So that's the basic structure. The components are: you define the layouts, you define the inputs and the outputs, then you render components, and then you decide what the reactivity is going to be. Reactivity is what happens when you click something or change something. You can also have buttons: rather than selecting components and having the graphs update, you can have buttons to update the graphs if you run into latency issues.

The most useful resource out there is the Shiny cheat sheet. Even before you run to Google to check what command to use, this cheat sheet is outstanding. Everything is given here: the components that I use, the app definition, and how you deploy it on the Shiny server. So this can be deployed on the Shiny server. I've just deployed it. It took me two and a half minutes. I went in the lunch break, deployed it, and it's live now. So it's available on the internet. All the components, the outputs, the inputs, how you define the layouts. This continues on the next slide. So these are all the layouts that you can have.
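The template described above can be sketched as a single-file app. This is a minimal illustration rather than the talk's code: the input ID `n`, the output ID `preview`, and the use of `mtcars` are assumptions made so the example is self-contained.

```r
library(shiny)

# UI section: where each component sits on the page.
ui <- fluidPage(
  titlePanel("Minimal app"),
  sidebarLayout(
    sidebarPanel(
      sliderInput("n", "Number of cars", min = 5, max = 32, value = 10)
    ),
    mainPanel(tableOutput("preview"))
  )
)

# Server section: how inputs and outputs react to each other.
server <- function(input, output) {
  output$preview <- renderTable(head(mtcars, input$n), rownames = TRUE)
}

# Tie the two together. In an interactive session, runApp(app) launches it.
app <- shinyApp(ui = ui, server = server)
```

When the UI and server live in separate `ui.R` and `server.R` files, the `shinyApp()` call is not needed, which matches the remark in the talk.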
If you know a little bit of HTML coding, then you can use HTML or JavaScript in here as well. So all of that comes in, and there's an explanation about reactivity and rendering there.

Let's walk through the code that we have used for creating this. I said in the beginning that the first section was the UI section, and the other section was the server section. So this is the UI section. It starts with the title, which comes at the top; if you remember, it says easy plot at the top. So that's what the title is going to do. And then you have the sidebar. Are you able to read this from the back? I am not sure how we zoom this. Let's try. Is this visible?

So at the top, that's the title panel. It just says easy plot. That's what I call it because I'm not creative. It's easy to do. You have the sidebar layout, and all it says is this is the file input, and this is the text input. It's specifically defined. These are the commands from Shiny. So that's the file input. It says upload delimited file. That's the label text. And then the text input says specify the separator. So those two are inputs. Then the plot type is an input. And this is where you change what plots you want. If you look at this syntax, this is the ggplot2 syntax. When you create plots in ggplot2, this is what you use. A geom is what you use. And it has, right now, four. As I said, if you want to change it, you can add one more here, and it'll start giving you that in the options. Later we will see how it is related to the output. But this is all the input that's going to happen.

And then this is the UI output. And again, another UI output, which is set up for the facets. This is written separately because this is reactive. Remember, there was nothing until we loaded the file. And once we loaded the file, this one came up. You guys want me to go back and show that again, if you don't remember it?
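A UI section along the lines just described might look like the sketch below. All input and output IDs (`file`, `sep`, `plot_type`, `axis_controls`, `facet_controls`, and so on) are hypothetical. The talk's code appears to store the geom call itself as the plot-type choice; this sketch uses plain keys instead. Newer Shiny releases recommend the DT package in place of `dataTableOutput()`, which is used here because the talk mentions the built-in data table.

```r
library(shiny)

ui <- fluidPage(
  titlePanel("easyPlot"),
  sidebarLayout(
    sidebarPanel(
      fileInput("file", "Upload delimited file"),
      textInput("sep", "Specify the separator", value = ","),
      # Adding a fifth entry here adds a fifth plot type to the dropdown.
      selectInput("plot_type", "Select plot type",
                  choices = c("Scatter" = "scatter", "Line" = "line",
                              "Bar" = "bar", "Box" = "box")),
      uiOutput("axis_controls"),   # filled in by the server after upload
      uiOutput("facet_controls")   # shown only when faceting is switched on
    ),
    mainPanel(
      tabsetPanel(
        tabPanel("Summary", verbatimTextOutput("summary")),
        tabPanel("Data", dataTableOutput("table")),
        tabPanel("Plot", plotOutput("plot"))
      )
    )
  )
)
```

The two `uiOutput()` placeholders are what keep the extra controls hidden until the server has a file to build them from.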
Let's close this. Let's open this again. So you see there's nothing on the left panel there beyond what we saw in the first section: upload delimited file, specify the delimiter, and select the plot. And once I select the file, then everything starts showing up. That's why it is in a separate segment, because that component is reactive. Those two UI sections are coded in a reactive fashion.

And this is the output structure. There are different structures available. I have used the tabset panel. You can have it as different pages. You can have a navigation menu, or you can just have everything together, one next to the other. But I've just used the tabset, and that's why you have the three tabs there. All you need to define is what the tab would say. And then it picks up the output ID from the bottom, from the server section.

Let me zoom in on the server section. Right. That's the syntax for Shiny. Server is a function of input and output. Then we have the data file, which we're saying is reactive. So once we upload the data, something happens. You have the input file, the input file is required, and then read.csv. We are only reading CSV at this point. We move to the next one, which is observeEvent. observeEvent is one way to define what comes up when a reactive step happens. And this is the x, this is the y, and this is where all the other elements are defined. Trend lines, groups, and facet panels: this is the last section, the additional elements that we saw. This is where all of that happens. And if you notice, it says selected is equal to none and value is equal to false, because these are different input types. You use selected when you have a dropdown, and you use value is equal to false when you have a checkbox. I don't want anything to be checked in the beginning. And then there's the observeEvent for when you click the facets. So when we clicked facets, let me go back here again.
When I click the facet panel, this appears at the bottom and goes away when I uncheck it. So this is where all the facet coding is happening. So facet panel by, choices, names of data file. Data file is the file that we have uploaded. We've taken the names of that, which gives you the column names. And that is being assigned to the choices that you can make in the dropdown. Pretty straightforward coding.

And then there's the last section. This is where you are actually defining the plot. This is what the ggplot syntax is. It starts from here. You have the x, the y, the aes, and everything. If you actually go through this code, you will realize that you can connect it to what we had in the beginning, because the scatter, the bar, and the box plots all have different code components, and we have to have a condition here to ensure that we are picking the right one up. If you look at what we had here, we are saying: if the user picks scatter as a choice, I want to give this as the input to my plot section, which is geom_point with size 4. If they pick line, I'll pick up geom_line, which is the syntax for ggplot. And then I go here, and I start putting all of this together. So data is equal to data file, then parse the text input, then scale the colors, and that's where RColorBrewer comes in.

And then we have the output summary. This is the first tab that we have in here, which is the summary. I'm pasting it as text only. That's why you see here that I just do a renderPrint of the structure. And renderDataTable is a different option. renderPrint prints it as text. renderDataTable actually creates a table out of it, in which you can search and change the number of entries. So that's how the table looks. And then finally, I have the shinyApp call with ui and server. This is all part of one single file, which is right here in my RStudio.
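The server logic walked through above can be sketched as follows. This is an illustration under stated assumptions, not the talk's code: the helper names `plot_layer` and `build_plot` and all input and output IDs are hypothetical, `switch()` is swapped in where the talk parses stored text into a geom, and the `"Set1"` brewer palette is an arbitrary choice.

```r
library(shiny)
library(ggplot2)

# Hypothetical helper: map the plot-type choice to a ggplot2 layer.
plot_layer <- function(type) {
  switch(type,
    scatter = geom_point(size = 4),
    line    = geom_line(),
    bar     = geom_col(),
    box     = geom_boxplot(),
    stop("Unknown plot type: ", type)
  )
}

# Hypothetical helper: assemble the plot from column names chosen in the UI.
build_plot <- function(df, x, y, type, color = NULL) {
  p <- ggplot(df, aes(x = .data[[x]], y = .data[[y]])) + plot_layer(type)
  if (!is.null(color)) {
    p <- p + aes(color = factor(.data[[color]])) +
      scale_color_brewer(palette = "Set1", name = color)
  }
  p
}

server <- function(input, output) {
  # Reactive data: nothing downstream runs until a file has been uploaded.
  data_file <- reactive({
    req(input$file)
    read.csv(input$file$datapath, sep = input$sep)
  })

  # Axis selectors are built from the column names of the uploaded file.
  output$axis_controls <- renderUI({
    cols <- names(data_file())
    tagList(
      selectInput("x", "X-axis", choices = cols),
      selectInput("y", "Y-axis", choices = cols),
      checkboxInput("trend", "Trend line", value = FALSE)
    )
  })

  output$summary <- renderPrint(str(data_file()))  # plain-text structure
  output$table   <- renderDataTable(data_file())   # searchable table
  output$plot    <- renderPlot({
    req(input$x, input$y)
    p <- build_plot(data_file(), input$x, input$y, input$plot_type)
    if (isTRUE(input$trend)) {
      p <- p + geom_smooth(method = "lm", formula = y ~ x, se = FALSE)
    }
    p
  })
}
```

Outside the app, `build_plot(mtcars, "disp", "mpg", "scatter", color = "cyl")` gives the colored scatter from the demo. Using `switch()` avoids evaluating text that arrives from the browser, at the cost of one extra line per new plot type.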
I call all the libraries at the top. And then the same code is written here. There's nothing else. Once you save the file as app.R, that's what the name becomes. And then you reload the app, or run it if it is not loaded already. So if I do Reload App, it will just reload the thing. If it is not loaded already, I just do Run App. Now, this runs in this window, or if I say open in browser, it will also run in Chrome. So I'll start running it in the browser. Also, now that I've made it live, it will run here as well. So there. There's the link, which you can go to and use. It's live, so you can go explore. And obviously, we'll make the code live as well.

Any questions before I move forward? Yes. You can make it interactive, or you can make it static. It's a call that you take in the code.

So the idea is this code is out. Let's customize it. Let's create more things out of it, because I clearly know that this is at a very basic level. It is not a full-fledged data visualization engine. But it's there. It's something that fitted my need when I created it, because I didn't want to go online. And I really needed box plots. And I really needed the other plots as well, and to show them quickly without writing the code again.

So one of the customization ideas that I did have, but haven't worked on, is accepting Excel as an input. So instead of the read.csv, which you, I think we'll have to stick to this. Sorry. Instead of the read.csv that you see here, just go and change this to reading Excel. And that will do the trick. What you can also do is, in the beginning, give a dropdown for what kind of file you want to upload. So you can make it either CSV or Excel, or some other file, and then update this code. Come back here and update. And then we'll start fixing it up. I'll just work through this format, because the moment I go to presentation, it's going to be interesting to come back. Then you can connect it directly to a database.
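The file-type idea mentioned above could be sketched like this. The helper name `read_upload` is hypothetical, and the `readxl` package is an assumption, since the talk only says "reading Excel" without naming a package. The reader is chosen from the file extension rather than from a dropdown.

```r
# Hypothetical helper: choose a reader from the file extension.
read_upload <- function(path, sep = ",") {
  ext <- tolower(tools::file_ext(path))
  if (ext %in% c("xls", "xlsx")) {
    # readxl is an assumed choice; it must be installed separately.
    if (!requireNamespace("readxl", quietly = TRUE)) {
      stop("Install the readxl package to read Excel files.")
    }
    as.data.frame(readxl::read_excel(path))
  } else {
    read.csv(path, sep = sep, stringsAsFactors = FALSE)
  }
}

# Usage with a tab-delimited stand-in for an uploaded file.
path <- tempfile(fileext = ".txt")
write.table(mtcars, path, sep = "\t", row.names = FALSE)
cars <- read_upload(path, sep = "\t")
```

Inside the app, a helper like this would replace the direct `read.csv` call in the reactive that loads the data.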
I have built one more product which actually connects to a database and then creates a time series forecast out of it, instead of taking an input from Excel. We haven't done that with this yet. But this can be connected to a database, too. If it can read data from a CSV, it can read data from a database as well. All it will take is a username, a password, and a server address.

The other customization option that should definitely be explored with this is what the plots are based on. Right now it is ggplot2. Can this be Plotly? The code is now very easy to change, because the section of the code that needs to be changed is just this one, and obviously the input dropdown that is there. If you update this code to Plotly, it will start showing you Plotly plots. This is the component that you change to make this work with Plotly. And then obviously the visualization options: additional plot types, or fewer plot types, or more interesting plot types that are out there now.

And then the plot component customization options. The reason why I have written this is, if you remember the trend line that comes up, let me load this quickly again and just go to the plot. This is more relevant. If you notice the trend line that comes up, it just gives you a line. It doesn't give you any other information beyond that. But what can come up here is the R-squared value, if that helps. And that can be picked up. It's not difficult to pick it up. You just render it as text and post it on the graph, or on the side of the graph. So the R-squared value is one of the things that you can get in here, or a smoothed line, or an order-two fit.

And Shiny by itself is responsive. If you open this link on your phones right now, it will actually adjust to the screen. So you won't see this layout anymore. You will see one below the other.
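The R-squared idea can be sketched with base R and ggplot2. This assumes the trend line is a linear fit of mileage on displacement, which the talk does not confirm, and it places the label in the top-right corner of the plot as one possible choice.

```r
library(ggplot2)

# Fit the same straight line that the trend line would draw.
fit <- lm(mpg ~ disp, data = mtcars)
r2  <- summary(fit)$r.squared

p <- ggplot(mtcars, aes(x = disp, y = mpg)) +
  geom_point(size = 4) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  # Post the value on the graph itself, anchored to the top-right corner.
  annotate("text", x = Inf, y = Inf, hjust = 1.1, vjust = 1.5,
           label = sprintf("R-squared = %.2f", r2))

# An order-two trend would use formula = y ~ poly(x, 2) in both calls.
print(p)
```

In a Shiny app, the same value could instead be shown beside the plot through `renderText()`.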
Also, if I do the same thing here and move to the plot, you will notice that I'm not able to see the entire plot, because I have to scroll. If I zoom out, it's just going to adjust to the screen and widen itself. So Shiny takes care of the screen size that way.

So again, I don't know a lot of HTML coding. I don't know a lot of coding either, to be fair. But this was not very difficult to build. It took me some time. It will take you less. It's definitely something that should be explored, especially by people who are beginning, because I haven't seen a lot of people picking up data visualization as the first step. And it's easy to do, honestly speaking. It's not difficult, but it will be great if someone comes up with a better version of this, honestly. Right?

Questions? Any questions here? Yes? Now it's a public one. Yes? Yes? So Shiny has a server. You can deploy a Shiny Server on premises. They have a commercial version, and they have a free version as well. Yes, you can deploy it within your premises.

So, sorry? Well, yeah, there's an rsconnect package that you need to install in RStudio. And then there's a token code, which you just need to run. And after that, it's very simple. Is this one line of code every time you update it?

Can the session be saved in this? Like if I upload a data set and have it there, can that session be saved? Not right now. Not right now, right? Every time you open the app, it replaces it. Yes, yes, it opens up a new session. Right now, that's how it is configured.

Yes? Sorry? Dashboard. Is it possible in Shiny? Sorry? Dashboard. Dashboard? Yeah. Create a dashboard out of it. Multiple layouts. You can. You can have multiple visuals on the same page, just like you do in Power BI or Tableau for that matter. And again, let me show you this. You see this layout on the right-hand side? Each of those boxes can be a visual.
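A dashboard-style page with several visuals can be sketched with Shiny's row and column layout functions. This is a minimal illustration, assuming `mtcars` as the data and hypothetical output IDs; it is not the layout used in the talk.

```r
library(shiny)
library(ggplot2)

# Each column() is one "box" on the page; widths in a row add up to 12.
ui <- fluidPage(
  titlePanel("Dashboard sketch"),
  fluidRow(
    column(6, plotOutput("scatter_plot")),
    column(6, plotOutput("box_plot"))
  ),
  fluidRow(
    column(12, tableOutput("data_preview"))
  )
)

server <- function(input, output) {
  output$scatter_plot <- renderPlot(
    ggplot(mtcars, aes(disp, mpg)) + geom_point(size = 4)
  )
  output$box_plot <- renderPlot(
    ggplot(mtcars, aes(factor(cyl), mpg)) + geom_boxplot()
  )
  output$data_preview <- renderTable(head(mtcars), rownames = TRUE)
}

# In an interactive session, runApp(app) launches the dashboard.
app <- shinyApp(ui, server)
```

Selectors for an individual visual can be placed in the same `column()` as its plot.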
So there's no restriction on what goes where. I can have my visuals on the left and my selectors on the right, and I can still start using it. All that I'll need to change is the UI code, which was this. In the UI code, I define the sidebar layout, and I define this. Before this presentation at ODSC, I did not have three tabs. I had everything on one single page. And that's essentially creating three visuals on the same page, because every box is a visual. You can definitely create a dashboard. You can have a selector for each of your visuals.

Yes. My question is, if you want to share your app with just a limited number of people, like four or five people, and you don't have a Shiny server, what are your options? Shiny Server is actually very easy to install. I mean, it's not that you need to buy it. Is there no cost involved? There's no cost involved for the free version. There's a commercial version, but for the free version there's no cost involved. And then you can restrict it using user authentication. Inside an enterprise, you can use the free version of it? Yes. The enterprise version, yeah. OK, so you're saying the enterprise version is only for a large scale. The enterprise version is for a large scale, and you get support. It's more like R; R has an enterprise version as well. You get support and a contact helpline and all of that stuff.

So if you have RStudio enterprise already, do you get Shiny Server along with it? No, Shiny Server is a different one. You have to buy it separately if you are buying it. Otherwise, you just take it from R. Most of the links to Shiny Server, even if you search for Shiny right now, will direct you to the RStudio website. Those are the people taking care of Shiny now. OK, thanks. Any other questions?

So, summarizing what I told you today: it's something that I built a long time ago. Very simple. It uses Shiny as the web development platform. It uses ggplot2 for visuals.
I definitely want some of you to explore Plotly and send me a ping. That'd be really cool. That's something that I'm looking forward to. And then we used a couple of other packages to ensure that the dates are correct, and that we get enough variation in the colors when I do a color-by on the plots. We looked through the code. If I don't count this as four lines of code, this is actually less than 50 lines of code, the entire thing. And it creates the entire visualization platform. So there's not a lot of coding involved at all. It's very, very simple, very straightforward. The purpose of Shiny was to give the ability to create dashboards. That was the original purpose, and that was the problem statement. But it's a web app development platform, so you can create a lot of the things that you want.

You have a question? Yes. Interesting question. It depends on how you extract real-time data from your database. If you apply a reactive command to refresh the data every time the base data refreshes, then it can be done for real-time data as well. Although I do think there would be performance issues while trying to use it for real-time data. If you're using it for a lot of data, and I'm guessing when you use real-time data, it is a lot of data, I think there could be performance or latency issues, for sure. That's a risk that you will run into, but it can definitely be done.

So that's my email ID, if you guys want to reach out to me. Those are the links for easyPlot. And as I said, I deployed it in two and a half minutes. So it's not difficult at all. It's very, very easy, very straightforward to do. And then we have the cheat sheets. The cheat sheets are not only for Shiny. The cheat sheets are for pretty much everything that you can do in RStudio. Even R. There's one for ggplot2. There's one for data.table. How many of you use data.table? The data.table package. Cheat sheets are there for that as well.
So there's a lot of fun stuff when you go to the cheat sheets link. How many of you already use cheat sheets? Right there. So fewer than the ones who are using data.table.

Yes, question. An open source R package. Esquisse. No, esquisse. Yeah, that one. I think it can supplement this. It can. So the thing with that package is, do you want to tell everyone which package you're talking about? Hello. So I have the URL, maybe. It is developed by dreamRs, a French company. Right, right, I guess, yeah. So it's a package. Esquisse, right? How do you pronounce it? It's French. It's going to be difficult for me. Yeah, it's built by dreamRs. dreamRs.

So what they did is they brought Tableau-like visualizations to R. How many of you have heard of this package? You have? Good. So what you can do now, using this package, is drag and drop. You can actually drag onto the x-axis from the column names and drag onto the y-axis as well. It's a pretty cool package; you guys should go and check it out. The thing with it is, you have to supply the data beforehand. So you have to run some code to supply the data. Now what you can do, using this, is upload the data here and then have another button which links the data to your new package. Then you can start doing drag and drop there. So it's proper Tableau-like visualization. If you guys haven't tried that in R, you should go and try it. It's pretty cool. It's really, really cool.

I will send the spelling for that. Maybe I can share it. It's E-S-Q-U-I-S-S-E. Esquisse. Hopefully. So it's very Tableau-like. But you have to supply the data beforehand. It reads the data frame, and then you run this package. I didn't try it a lot. I just found it cool. It has performance issues, though.

I have a question. Do you have any thoughts around large-scale visualization? I've tried HoloViews. I've tried GPU-based techniques. But it always fails. Beyond 10 million data points, some other thing fails.
One way is I can just sample it. But what if I want to visualize all of it? Do the entire visualization?

Yeah, no. I haven't tried that with R. I definitely haven't tried that with this. And I'm telling you right now, it's going to be difficult, because, as I said already, the purpose for this was dashboards. So this is not going to work for big data. It's definitely not going to work for big data. Honestly, I'm not sure I can answer that question based on what I've tried to do with this package. For the other things, is it necessary to use R or Python or open source for what you're trying to do? I'm not sure if there's something on the Hadoop side of things which can do this.

But still, visualization should be on the front end, right? Very quick. Yeah, it should be quick. Absolutely. And you should get all the data to JavaScript or whatever the front-end rendering engine is. But I don't know. I'm not aware of any. No, I haven't tried it with big data yet.

Someone else had a question. I saw a hand go up. Thank you guys for listening. Thank you.