Today we have with us Sainthika Bani, who will be presenting a talk on visualization and the underlying psychology. Over to you, Sainthika.

Thank you for the introduction. Good afternoon. I think it's the start of the second day of PyCon, and we are absolutely energetic and thrilled to be part of the conference. As part of today's topic, I will be covering two important concepts: visualization and psychology. The topic is not as heavy as the name sounds. We won't be touching much upon the core aspects, but rather the concepts that build them. This will help you not only in your academic careers; you can also apply these concepts in your day-to-day professional life. Mostly these concepts get violated in a few aspects, and we subconsciously ignore them and don't particularly embed them in our minds.

Here is the agenda. I am expected to complete the entire session within 30 minutes.

First, we will start with an overview, where I'll talk about the history of visualization: how visualization existed back then and what we see in the modern world right now. The aspects are similar, but we will look at what evolved over that journey.

Lie factor. It's very important that we understand the mathematical concept behind it and how exactly we can apply it to the visualizations that we see. Again, these are concepts, and they are language agnostic, be it Python or any other software. They are not violated by default, but when they are violated we have to take certain steps to recognize it and take precautions.

Primal instincts of human beings. We are always taught that up is high and down is low, but when these get reversed in certain visualizations, our mind doesn't process it quickly; it takes some time to process it, or sometimes ignores it completely. We'll touch upon a few aspects of that.
A short case study. Here I will take your inputs and present my thoughts, and together we'll combine them and come to a conclusion. Hopefully, if there are no technical glitches from my end, we will be able to cover it within the stipulated time.

A little bit about me: I'm Sainthika, I am a data engineer at Quonsite, and I'm also a student of business analytics at ISB. So in my day-to-day life I deal with both academics and professional work, and I try to combine my learnings from both. This talk is all about presenting what I have learned in my journey.

Now that we can see this plot, let's take about 10 seconds and just look at it. This is one of the historical representations, presented by John for the Botanic Atlas. I'm calling it a visualization because the first visualizations in history were made by the cartographers who travelled the entire world, went overseas, and came up with the routes that helped in trading and in a lot of other commercial activities. One of the most important aspects that history brings up is that this incorporates the concept of scale. When we talk about visualizations, we talk about bar charts, scatter plots, sparklines, and any number of others, but one thing that remains consistent across every visualization, or should remain consistent, is the scale. We should not violate the scale: the x-axis, the y-axis, and even the z-axis if there is one should have consistent scales to give an accurate statistical representation. Here we have six plates, and each presents a particular section of the Botanica Road Atlas. Don't worry if you're not able to follow everything in one go; as we move forward, we will be able to combine these aspects together.

This is another visualization, by William Playfair, which combines the prices of wheat, weekly wages, and the reigning monarch over a 250-year span.
Every small grid that you see over here is a century: the 16th century, the 17th, the 18th, and so forth. It combines multiple aspects in one graph. Today we also see such dashboards coming in, but this one was done entirely using simple parchment paper and an ink pen, and every scale and every detail is handcrafted. So visualizations have been around for a long time. What is new today?

A standard plot. Here is where we deal with the modern visualizations that we see. What are the components, or dimensions, that we can introduce to a plot? Let's start with the x-axis. We will have a certain quantity, maybe a categorical or a numeric entity, on the x-axis. There would be a y-axis. There would be some points or values that could also be represented by size. We can also divide the entire population into small samples and present each using a different color. And if we have time series data, or panel data with a time component, then we can also introduce animation. Shape is another one; if you are using color, there's not much need for shape, since color and shape are used interchangeably in a plot. Texture is marked with a bright red dot because, in my view, with today's visualizations and how we perceive them, texture is not highly valuable. Don't use textures; they don't communicate a lot of value. These are theoretically the six dimensions that you can combine to make a perfect plot, one which is not cognitively heavy but at the same time gives you a lot of information.

We are going to see one example done in Tableau. This is the Hans Rosling Gapminder, which I have replicated in Tableau, where x represents the fertility rate, y represents the life expectancy, the colors represent the continents, and, as you can see when I hover over a bubble, the size represents the population.
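The channel assignment described for the Gapminder view can be sketched in plain Python. This is not the Tableau workbook from the talk, only a minimal illustration: the `Bubble` class, the `encode` function, the field names, the palette, and the two data rows are all hypothetical placeholders, and the numbers are invented rather than real Gapminder figures. The one deliberate design choice is that the bubble's area, not its radius, is made proportional to population.

```python
import math
from dataclasses import dataclass


@dataclass
class Bubble:
    x: float       # horizontal position: fertility rate
    y: float       # vertical position: life expectancy
    radius: float  # size channel, derived from population
    color: str     # color channel: one color per continent
    frame: int     # animation channel: the year


# Hypothetical palette, chosen only for this sketch.
CONTINENT_COLORS = {"Asia": "red", "Europe": "blue"}


def encode(row: dict, max_population: float, max_radius: float = 40.0) -> Bubble:
    """Map one data row onto the five visual channels."""
    # Area grows with radius squared, so take the square root to keep
    # the bubble's AREA proportional to the population.
    radius = max_radius * math.sqrt(row["population"] / max_population)
    return Bubble(
        x=row["fertility"],
        y=row["life_expectancy"],
        radius=radius,
        color=CONTINENT_COLORS[row["continent"]],
        frame=row["year"],
    )


# Placeholder rows invented for illustration; NOT real Gapminder data.
rows = [
    {"continent": "Asia", "year": 2000, "fertility": 2.0,
     "life_expectancy": 70.0, "population": 4_000_000},
    {"continent": "Europe", "year": 2000, "fertility": 1.5,
     "life_expectancy": 78.0, "population": 1_000_000},
]

max_population = max(row["population"] for row in rows)
bubbles = [encode(row, max_population) for row in rows]
for bubble in bubbles:
    print(bubble)
```

With these placeholder rows, the first population is four times the second, so the first radius comes out twice the second and the drawn areas differ by a factor of four, matching the data.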
The bigger the bubble, the higher the population. The animation, as you can see, is the years shown right here at the top, and it also has trails. Whenever I click on a bubble, I can follow its trail and see how, as the fertility rate on the x-axis kept on decreasing, the life expectancy varied. This graph is an interesting example where we are not loading our users with a lot of information. It's not cognitively heavy, it's light, but at the same time we are communicating five dimensions. This could be done in a lot of different formats. You can do a line chart, you can do various other plots, but this is one example of how these dimensions combine and give you a nice interpretation. I will mostly keep talking, and at the end we can take any questions that you might have.

What we are going to learn from here on is very important. Yes, it is a text-heavy slide, and this concept requires some understanding as well. What exactly is a lie factor? I'm not going to read this out, but I'll give you an example. Suppose 10% is represented by a line of 1 centimeter and 20% is represented by a line of 5 centimeters. Scale is a linear concept, and whenever we play around with the scale, lie factors are introduced. Lie factors can also be introduced when there is a squaring effect, where you draw an area using both x and y: 10% is represented by around 10 square meters, while 30% goes to around 50. That is where the lie factor comes into the picture.

The formula. If we are saying that there is a lie factor in your graph, and I'm trying to communicate that to my client or some other users, I need some validated data or a formula that explains it precisely. The formula is very simple.
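As a minimal sketch of that formula, assuming the usual definition of the lie factor (the relative change shown in the graphic divided by the relative change in the data), the calculation could look like the following. The function names `relative_change` and `lie_factor` are illustrative, not taken from the slides.

```python
def relative_change(first: float, second: float) -> float:
    """Change from first to second, as a fraction of first."""
    if first == 0:
        raise ValueError("relative change is undefined when the first value is 0")
    return (second - first) / abs(first)


def lie_factor(data_first: float, data_second: float,
               graphic_first: float, graphic_second: float) -> float:
    """Size of the effect shown in the graphic over the size of the effect in the data."""
    effect_in_data = relative_change(data_first, data_second)
    if effect_in_data == 0:
        raise ValueError("the data does not change, so the lie factor is undefined")
    effect_in_graphic = relative_change(graphic_first, graphic_second)
    return effect_in_graphic / effect_in_data


# The example from the talk: 10% drawn as a 1 cm line, 20% drawn as a 5 cm line.
example = lie_factor(10, 20, 1, 5)
print(example)  # 4.0
```

The data doubles (a 100% increase) while the line grows by 400%, so the graphic overstates the change by a factor of four. A faithful drawing, with 20% shown as a 2 cm line, gives a lie factor of 1.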
Here we are taking an example of fuel economy standards from way back, starting in 1978: 18 miles per gallon is represented by a line 0.6 inches long, while 27.5 miles per gallon is represented by a line 5.3 inches long. If you have done some standardization or normalization in your day-to-day data journey, this is the same kind of formula. We simply take the relative change in each quantity, the miles per gallon and the inches, and take the ratio of the two. The factors of 100 cancel out, and what you finally get is 14.8.

Let's interpret that. If this had been a linear representation, we should be getting close to one. There can be a tolerance of about plus or minus 0.05 on either side, but in the ideal case it should be one. A factor of 14.8 means that what we are seeing in the visualization, compared with the statistical data it should actually communicate, is inflated. I would use the word inflated; maybe you are just bragging about what you have done, but your visualization and the percentages don't match, and here they differ by a factor of 14.8. You might have seen this in newspapers, when a political journey is being represented or revenues are being forecast. There can be interpretations where the visualization we see and the percentage behind it are very different.

Here is another lie factor. This represents a certain quantity x1 and this represents a certain quantity x2, but where is the difference coming from? The difference is that you are increasing along both the x and y axes. Technically you should have only a one-dimensional increase, along either the x-axis or the y-axis, but here we are taking an x and y combination.
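The fuel economy arithmetic can be checked with a short sketch. It assumes the usual lie factor definition (relative change in the graphic over relative change in the data) and uses 5.3 inches for the longer line, which is the length that reproduces the 14.8 quoted in the talk. The second part is a hypothetical illustration of the x and y combination: two values, 10 and 30, drawn as squares whose side length is proportional to the value. The function name `lie_factor` is illustrative.

```python
def lie_factor(data_first, data_second, graphic_first, graphic_second):
    """Relative change shown in the graphic divided by relative change in the data."""
    effect_in_graphic = (graphic_second - graphic_first) / graphic_first
    effect_in_data = (data_second - data_first) / data_first
    return effect_in_graphic / effect_in_data


# Fuel economy example: 18 mpg drawn as 0.6 in, 27.5 mpg drawn as 5.3 in.
fuel_economy = lie_factor(18, 27.5, 0.6, 5.3)
print(round(fuel_economy, 1))  # 14.8

# Hypothetical two-dimensional case: values 10 and 30 drawn as squares
# whose SIDE is proportional to the value, so the eye compares areas.
small_area = 10 ** 2   # 100
large_area = 30 ** 2   # 900
squaring = lie_factor(10, 30, small_area, large_area)
print(squaring)  # 4.0
```

In the second case the value triples, but the area grows ninefold, so scaling both axes at once overstates the change fourfold even though each side length is individually "to scale".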
What happens when I see these numbers? I will just perceive that, okay, 30 is very big, so maybe my client B is performing way, way better than my client A, while the difference is just 20%. But my mind is already loaded and blown away by how the area has been presented. We are attracted to visuals; we are attracted to colors. We interpret them very well, but we are not very good at interpreting text or the math behind it. That may come at a later stage, but the first instinct is the area and the color, and we just take that for granted and move ahead.

These kinds of visualizations are very popular in malls. Somewhere you see 20% off, and somewhere else it's just 30% off, but it's drawn much, much larger, or the representation is very bold and bright, and that just attracts you: 30%, okay, this is a bold representation, and I must go ahead and buy everything, amazing or not, that the mall has to offer me.

The squaring effect. This is one of the representations that I found on the internet; I found it funny, and I found it has lie factors at the same time. It represents how the different operating systems were performing back in those days. You see "non", meaning people would have used their root systems or Linux, etc., and then Windows and Macintosh. Windows is a tiny, tiny member, it's just there; Macintosh, oh lord. But when you actually try to put an x-axis and a y-axis on it, which in this case is very difficult because you don't know where to put them (at the end? at the bottom?) or how exactly to comprehend that, the first impression that comes is that the bigger one is the most popular one. We just move ahead and say this is the most popular item, and I want to have it, or I want to proceed with my purchase.

We are moving ahead with an interesting case study. I will ask you a couple of questions and get your answers, and we will combine all the concepts that we have learned and
gel them together in this case study. I will move ahead and show you the representation. Take a moment, take a minute; I will be silent for that time. Just try to interpret it. What do you see in this graph? That is my first question. And do you see anything which is incorrect? That is my second question. So one is a subjective type and the other is a Boolean, true/false type. I'll look at the chats now. That's very interesting. Wow, amazing. I'll wait for maybe 10 more seconds. It doesn't have to be correct; go ahead and put in your thoughts.

I see the chats, and every answer, I feel, is very relatable and correct. When I first saw this representation, I made a direct connection between the gun deaths and how the representation flows: I saw it as flowing blood. That was the first instinct that struck me. The second instinct that struck me was that after 2005, when Florida enacted its Stand Your Ground law, the death rates went way down. But when we look at the side, the scale is inverted. That was my third step: analyzing it and realizing that this reading is incorrect, and the rate has actually gone very, very high. This is how a lie factor can get incorporated into many of your visualizations.

All the answers that I've got are absolutely correct. When you see a visualization, you don't jump to the conclusion. You look at these factors, you look at the concept of scale, and you look at the primal instinct. For me, up is always high, so that's what I was looking for, and down is low. Anything that comes down should have a lower value, and anything that goes up should have a higher value. That is how the primal instincts work. But here the author has, very interestingly, inverted the y-axis, and hence the context that we wanted has been lost. The answers that I've got mention the inverted y-axis, the addition of art, and that the statistical meaning is hard to comprehend. I would rather have it the reverse, where up is higher and
low is at the bottom, the usual that we see. But here it's different, and it is incorrect. At any point of time, if you see visualizations doing that, it is incorrect, it is incorporating a lie factor, and you now have a basis to present your argument if you see such interpretations in slide decks or pictures in your own organizations. This really does happen in a few cases, where maybe certain numbers do not gel very well and we want visualizations to back them up. Amazing, all the comments are very interesting.

So, takeaways. What we studied is the history and how it went. History was not all about scale, but soon scale became very important; we saw two interpretations of that. Then we saw an interesting Gapminder in action, where we saw the five dimensions playing a very interesting role, and they are not cognitively heavy either, so when you see those visualizations you don't feel overwhelmed. This is a very important aspect in our day-to-day world, where there is so much data. There is no dearth of data anymore; you can pick any data set and implement visualizations. Sometimes we might be asked to build some really out-of-this-world visualizations, something which is a little bit out of the normal, or, I can use the word here, fancy. In those situations, please don't violate the rules that we talked about, and also don't overload your users' instincts. Humans are not very good when a lot of things are thrown at our eyes. We just perceive on the basis of shape and bright colors, and very few times do we look closer. If we are not very data driven, or we are just shopping around in our day-to-day world and trying to interpret a lot of things, we miss those details. For anyone in the data field who is building visualizations, it's very important that we
embed these in our plots, in any plots that you might be using in your journey.

Another interesting point is that 3D plots may also be used, and 3D plots run into a similar challenge: perspective. Anything that is in the front is perceived as larger, bigger, and more important, while anything that goes behind could be perceived as having a low value. That is because of perception; we see what is in the front as more important than what we see at the back. There are a lot of other factors that could be incorporated, and hopefully you are interested and would want to explore more.

These are the takeaways that we have. Don't violate the primal instincts of the user. Definitely use consistent scales; scales are important. Avoid the lie factor. We saw a lie factor in one dimension, but it could also be in 2D, in which case you take the area, and in 3D it goes one dimension beyond that, so the formula gets more and more complicated. Reduce cognitive load. We're already bombarded with a lot of information out there, and it's kind of our duty to go back to the basic purpose: visualizations are meant to be easily consumable, not hard, and they should be easily digested by anyone who is looking at them. So please make sure that yours are not cognitively heavy.

Finally, I would like to thank my professor, from whom I gained all these concepts. I was new at that time, and when I saw these being incorporated in a few areas, I got very interested and delved more into them. So thanks to Professor John. Here are the few links that I have been using in my presentation; you may go and take a deeper look when the slides are shared. These are the two channels on which I am
active. You can email me at this email address, or this is my Twitter handle; you could also DM me, whichever medium suits you best. Thank you, everyone. It's been really interesting, and I am very happy to have shared a few of these concepts with you today. We have five minutes, and I'm happy to take any questions that you might have.

Thank you, Sainthika, for the wonderful talk. Visualization is actually a wonderful topic, and since I am from an academic background, it is of particular interest to me. There is one question which I have received right now; you can also see it on the screen. Is the lie factor valid for non-linear scales, logarithmic for example? If not, what other statistical measure can we use?

If I am getting this question correctly: the lie factor is basically a ratio, so if you take any ratio, the scales automatically get cancelled. So I believe that for any lie factor, whether it's two-dimensional or just 1D, you can use the same formula and move ahead. Don't worry about the scales there.