Okay, well, we're dead on five past. I actually can't see how many participants are in here, but I'm hoping there are some people here, so I'll just get started. Hi everyone. My name is Louise Capena and I'm a research associate at the UK Data Service. I work within the Computational Social Science team, and our main goal is to spread the word about exciting computational methods that can be applied to social science research. Today we're going to be exploring sexual orientation and gender identity through interactive visualizations, and hopefully I'll be able to inspire you and convince you of their importance. Also listed on the screen are our three other contributors to this project: Jay Kazmaier, Nadia Kenner, who's joining us as a facilitator, and Ali Bloom. Okay, here's a brief outline of what we're going to cover today. First, we'll start by defining interactive visualizations, and I'll show you some examples. Then we'll move on to my mini project, which looks at gender identity and sexual orientation. I'll briefly cover some of the background, so we'll look at what data already exists on gender identity and sexual orientation, and then we'll move on to the app that shows the visualizations I've created. After that, we'll discuss in a bit more depth why interactivity is useful, and finally we'll go to Q&A. Just to note, there are going to be a couple of QR codes shown throughout the presentation. These link out to my app, which shows the visualizations, and, at the end, to my GitHub repo as well, so you might want to have your phones out and ready. But first I want to know: have any of you seen or used interactive visualizations before? Okay, responses are coming in now, and I'm sure there'll be a lot of yeses. I'll just wait a little bit longer and let some more folks come in. Okay, let's look at that.
I assume there's going to be a bulk of people who have come across them before. A lot of websites now have interactive visualizations embedded, and you get a lot of them in apps as well. But we'll have a look at some examples, and first I just want to state the definition. An interactive visualization is simply a visual representation of data that allows users to engage with it. That can be done through various actions, such as clicking, hovering, zooming or dragging. Let's have a look at the first example; I'm going to click out and show you it in more depth. For my first example, I decided to look to Reddit. Those of you who use Reddit might have come across the subreddit r/dataisbeautiful, which is a really good source of inspiration if you're looking to create your own visualizations. The first example I found there is a simple interactive map showing peak foliage across the US during the autumn season. Emma should be putting the website in the chat now, hopefully, so you can switch over and have a little play with it. I'm just going to switch my screen. Okay. What we have here is a simple slider at the bottom, which we can drag to the left or the right to view the predicted foliage across the season. If we drag it towards the end, you can see that a lot of the northern states are past peak foliage, and the better-looking orange leaves are now coming into the southern states. This was basically designed as a tool for travelers to suss out which state and county they need to be in to see the most photo-worthy orange leaves. As you can see, it's quite simple: we've got our legend, which shows, for example, where there's no change yet and where the foliage is at its peak. So yeah, quite a simple one to start out with. I'll just quickly switch back to Mentimeter.
The method of data collection here was actually quite simple. If you have a play around with the website, you can see that they have an online form where they collect information about the foliage status across the counties of different US states. So it's a nice, simple and topical one to start with, but let's move on to a more sophisticated example. The second one I'm going to show you is from another really great website called informationisbeautiful.net. It's a really good source of inspiration: you can find a vast range of really interesting visualizations on a ton of topics, and they're always really fun to play with and just super beautiful to look at. Again, the website's going to be put in the chat. We'll go to it, I'll talk through it a little bit, and then we'll look at some of its features. I'm going to zoom in a little bit. This example focuses on popular health supplements, and it shows which ones have the most evidence behind them and which ones are total fads. There's a y-axis here that indicates the strength of the evidence behind each supplement, going from strong, to good, all the way down to no evidence. There's also a horizontal "worth it" line, and all the supplements above this line are the ones that are most efficacious and beneficial to take. The bigger the bubble, the more popular the supplement, meaning the more people have gone to Google it and research it. You'll notice that this is a lot more sophisticated than the previous example, with way more interactive elements. If you hover over one of the bubbles, you'll get a short summary of the evidence and what it shows. And if you click a bubble, it'll take you to the specific journal article for that supplement, so here we've got this one on long-term coffee consumption and the risk of cardiovascular disease.
If we click on that and then go back, you can see that it also has these really cool filters, so you can filter for specific health conditions. Say, for instance, we want to look at eye health: it'll isolate those particular supplements. It doesn't look like we have any super promising ones for eye health, but we've got a few down here. In terms of the data collection process for this, you can click to see the data sheet; I've got it open in another tab here. You can see that it was done using Google Sheets, and there are various columns, such as popularity. The metric for this was Google hits, so how many times people searched for the supplement, and that determines the size of the bubbles. We have our evidence column, with a score from zero to six, we've got our links as well, and so on. I'll just switch back and close these. So now that we've explored two key examples, I'm going to move on to my own project, which uses interactive visualizations to explore gender identity and sexual orientation. First, we'll have a look at what research is already out there on these two topics. In terms of sexual orientation, prior to the 2021 UK Census, relatively little research addressed the UK's sexual orientation profile. For example, sexual identity information was collected in Natsal, Understanding Society and the Health Survey for England, but not at lower geographical levels, and often with quite limited sample sizes too. The ONS has also produced experimental statistics on sexual identity using data from the Annual Population Survey. This data is actually provided at local authority level, but unfortunately it's considered to be unofficial, as it was created primarily to assess methods rather than to produce results. So there's not a hell of a lot to work with here in terms of sexual orientation data.
If we then look at gender identity, we'll find that, unfortunately, there's even less data available. Some relevant questions have been trialled in various UK surveys, but the best estimates we currently have come from administrative data, such as in higher education. Internationally, though, we do see questions on gender identity now popping up in various censuses, including in Canada, Nepal, India and Pakistan. So, in order to catch up with that international research, the following questions were added to the 2021 UK Census. You can see I've put them on the screen. First we have "Which of the following best describes your sexual orientation?", with a number of categories and a write-in option. Then we have "Is the gender you identify with the same as your sex registered at birth?", with a yes and a no option, but also a write-in option here as well. Both of these questions were voluntary, but it's worth noting that the wording did vary slightly between the different countries. The census also allowed respondents to request an individual form that would be kept confidential from others in their household. So now to the main question: why make interactive visualizations of this data, and how can these visualizations improve our understanding of census data? Well, you'll remember I mentioned that we didn't have much data at lower geographies, nor did we have large sample sizes. In this sense, the census data provides a novel opportunity to really bridge that gap. It's worth saying that while I'm not going to be focusing on geospatial visualizations, my colleague Nadia did some really interesting research before undertaking her part of the project. She noticed that the ONS interactive maps were only univariate, so her part of the project looked at building interactive multivariate maps for these two census questions.
If that's your particular area of interest, there will be a link at the end to a GitHub repo where you can explore Nadia's work further, as she created her own app as well. As to how these visualizations can improve our understanding: in general, interactive visualizations allow you to get hands-on and explore data with more specificity than you can with a static image. Also, working with lower-level geographies, larger sample sizes and multivariate data makes for quite complex data. What interactive elements can do is really reduce that complexity by enabling users to focus on specifics and subsets of the data, where they can then discover patterns and correlations. So Nadia was focusing on geospatial visualizations; what am I going to focus on? Well, on the bottom right you can see my first QR code. That will take you to my interactive scatter plots and tables. But before you rush over there, I just want to explain what I'm actually going to be looking at. Obviously there's a ton we could have delved into with these questions, right? They're two new, really interesting questions. But given that these census questions are novel and voluntary, I thought it'd be interesting to consider factors that might influence the rate of non-response. In particular, I was quite interested in the question on gender identity, as I thought it's probably less familiar and less clearly worded than the sexual orientation question, and perhaps respondents might not answer it because they don't understand it. That led me to think, okay, what factors could influence that understanding? And that led me to look at main language and religion as two really key factors, because they can be closely tied. For instance, Sikhism is closely tied to the Punjabi language, Islam is closely tied to the Arabic language, and so on.
What I hypothesized is that non-response rates are going to differ between respondents whose main language is English and those whose main language isn't. I further hypothesized that we'll see different non-response rates between the two questions, partly due to the wording of the gender identity question. For instance, for a religion such as Islam, I expected to see a much stronger positive relationship between non-response and religion for the gender identity question than for the sexual orientation question, and a higher non-response rate. Before we see whether that's actually the case, I just want to briefly talk about my computational environment. I decided to write my code in Python, as that's what I'm most familiar with, and I did this in Jupyter Notebooks. Then I deployed these visualizations to Heroku, which is a cloud-based platform, so that's where my app lives. If you're particularly interested in the ins and outs, again, get in touch at the end, and there will also be a QR code that links out to the GitHub repo. So now let's take a look at the app. If you scan the QR code, we'll go ahead and have a look at it. We have the scatter plots for sexual orientation on the left and gender identity on the right. Both of these show the relationship between the non-response rate and the percentage of non-English speakers in different local authorities, and I deliberately chose to display them side by side so that you can get an idea of whether the findings differ between the two. The package I used to create these is called Bokeh. It's a Python package, and it has loads of really cool interactive features. I'm just going to zoom in a bit, because I appreciate it can be a bit difficult to see. Hopefully that's enough.
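Since the two plots sit side by side, a minimal Bokeh sketch of that kind of layout might look like the following. It is not necessarily how the app itself is built: it assumes both questions' non-response rates live in one `ColumnDataSource` (the column names and values are hypothetical), and it shares the x and y ranges between the two figures so that panning or zooming one moves the other, which keeps the comparison like-for-like.

```python
from bokeh.layouts import row
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Hypothetical toy data; one row per local authority.
source = ColumnDataSource(data=dict(
    name=["LA 1", "LA 2", "LA 3", "LA 4"],
    pct_non_english=[2.0, 10.0, 25.0, 35.0],
    nr_sexual_orientation=[6.0, 6.8, 7.9, 9.5],
    nr_gender_identity=[5.5, 6.1, 7.2, 9.8],
))


def make_scatter(title: str, y_field: str, **figure_kwargs):
    """One scatter plot of non-response rate against % main language not English."""
    p = figure(
        title=title,
        width=380,
        height=380,
        x_axis_label="% main language not English",
        y_axis_label="Non-response rate (%)",
        **figure_kwargs,
    )
    p.scatter("pct_non_english", y_field, source=source, size=8)
    return p


p_so = make_scatter("Sexual orientation", "nr_sexual_orientation")
# Reusing p_so's ranges links panning and zooming across both plots.
p_gi = make_scatter(
    "Gender identity", "nr_gender_identity",
    x_range=p_so.x_range, y_range=p_so.y_range,
)
layout = row(p_so, p_gi)  # pass to bokeh.io.show() or save() to render
```

Because both figures read from the same source, selections made on one plot would also highlight the same local authorities on the other.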
What you'll notice is that if you hover over an individual data point, you can find out which local authority it refers to and its specific x and y values. For instance, here we can see that for our gender identity dataset, Brent has the highest non-response rate, at around 10%, and around 34% of its inhabitants don't speak English as their main language. You can also see its place in the dataset, represented by its index number, which is quite handy if you want to quickly look it up. We also have these dropdowns for the scatter plots, where we can switch how the data points are grouped: by region, by urban and rural classification, and by something called the Shannon index, which is a measure of religious diversity. If you're not sure what they all correspond to, you can check the little description box here as well. A really nice feature of the region and the urban-rural classification plots is that they have interactive legends. If we look at the region plot, we can remove some regions to reveal some interesting patterns. So if I just go ahead and click on a bunch of these, let's just leave London and Wales. We can see that for London there's a fairly positive relationship between the non-response rate and the percentage of non-English speakers, whereas with Wales you've got a strong concentration of data points along a small range of x values, with quite dispersed y values. That could prompt further questions; maybe we could look into the religious diversity of Wales to find out more about the local authorities in this region. If we scroll down, you can see the second set of scatter plots, which focus on the relationship between different religious groups and their non-response rates. So let's have a look at the hypothesis from the slide before.
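The hover behavior described above corresponds to Bokeh's `HoverTool`. A minimal sketch with hypothetical column names and toy values: `@column` pulls a value from the data source, `{0.0}` formats it to one decimal place, and the special field `$index` gives the row's position in the source. That index only matches the row in the full dataset if every point lives in one source, which is an assumption of this sketch.

```python
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure

# Hypothetical toy data; one row per local authority.
source = ColumnDataSource(data=dict(
    name=["LA 1", "LA 2", "LA 3"],
    pct_non_english=[2.0, 18.5, 34.0],
    nr_gender_identity=[5.5, 7.1, 9.8],
))

p = figure(
    title="Gender identity: non-response vs. main language (toy data)",
    x_axis_label="% main language not English",
    y_axis_label="Non-response rate (%)",
)
p.scatter("pct_non_english", "nr_gender_identity", source=source, size=8)

# Tooltip rows are (label, field) pairs shown when hovering over a point.
p.add_tools(HoverTool(tooltips=[
    ("Local authority", "@name"),
    ("Index", "$index"),
    ("% not English", "@pct_non_english{0.0}"),
    ("Non-response (%)", "@nr_gender_identity{0.0}"),
]))
```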
For the gender identity question, there's a very strong positive relationship here for Muslims, which is quite interesting, right? As I mentioned, when I started working with the census data, I thought I might observe a much higher non-response rate for this community due to the association between Islam and the Arabic language. And when comparing the scatter plots, I said I thought the relationship would probably be less pronounced for the sexual orientation question. But you can see that it's actually a similarly strong and positive relationship. And if we go down and check the data tables, we can see that the non-response rate amongst Muslims for the sexual orientation question is actually slightly higher than it is for the gender identity question. The same is true if we look at Sikhism as well, which is quite interesting. If we look at other religions, for instance Judaism, we can see that there's a much less clear relationship here. In fact, this graph is quite difficult to examine because of this tight cluster of data points: if you hover over them, there's an absolute bunch of them stacked together. Fortunately, we can use Bokeh's built-in zoom functions. You can see we've got this toolbar on the right. It's got the pan function, which is already activated, so you can click and drag to center on the data points you want to look at. If we use the wheel zoom function, we can scroll in on the data points we want to look at. There's also something called the box zoom function. Let's say I want to look at these little data points here: you can draw a box around them and it'll zoom right in for you, so you can get a closer look at those local authorities.
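The toolbar described here is configured through the `tools` string when a Bokeh figure is created. A minimal sketch with toy data and hypothetical column names: pan is made the active drag tool, as in the app, and the wheel zoom is also switched on from the start, which is an assumption of this sketch (by default Bokeh leaves scroll tools inactive so the plot doesn't capture page scrolling). Partly transparent markers are one optional extra for reading a tight cluster.

```python
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Hypothetical toy data with a tight cluster near the origin.
source = ColumnDataSource(data=dict(
    pct_jewish=[0.1, 0.1, 0.2, 0.2, 0.3, 1.5, 5.0],
    nr_gender_identity=[6.0, 6.2, 5.9, 6.1, 6.3, 6.8, 7.0],
))

p = figure(
    title="Gender identity: non-response vs. % Jewish (toy data)",
    tools="pan,wheel_zoom,box_zoom,reset",
    active_drag="pan",           # click-and-drag pans the plot
    active_scroll="wheel_zoom",  # mouse wheel zooms without clicking the tool first
    toolbar_location="right",
    x_axis_label="% Jewish",
    y_axis_label="Non-response rate (%)",
)
# Semi-transparent markers make overlapping points easier to spot.
p.scatter("pct_jewish", "nr_gender_identity", source=source, size=8, alpha=0.5)
```

The `reset` entry restores the original view after any amount of panning and zooming.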
Then if you've played around and feel like you've got a bit lost in the axes of a graph, there's a handy reset function too: you press that and it resets to how it was. We've also got these data tables here, which show the non-response totals and non-response rates. And if you go all the way to the bottom, I've got a brief section on the relationship between non-response and sex for each dataset. You can see that the female non-response rate for sexual orientation is slightly higher, and that this relationship is reversed when we look at gender identity, where there's a slightly higher male non-response rate. So there's tons you can do with this, and there are many different avenues I want to take it in. I don't really know which direction to go in next, but I'm thinking I might expand this section and take more of a look at the links between sex, sexual orientation and gender identity. So if that's something that interests you, do stay tuned for when I eventually get around to updating it. But for now, let's head back to the slides. Let me just... sorted, okay, right. Now that we've played around with some of these interactive tools and features, I'm going to outline what I think are five key strengths that interactivity can bring to your research. First of all, I think it's great at improving user engagement. As researchers, I'm sure we're all used to the static graphs and tables that we shove to the back of the appendix of academic papers. I'm not saying it's wrong at all to do this; it's just that interactive visualizations are a great supplementary addition to these papers. That's where packages like Bokeh can come in, as they can provide you with more eye-catching and engaging alternatives for your readers. It's also beneficial if you're going to be presenting your research in different settings.
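The non-response tables could be built with Bokeh's `DataTable` widget. A minimal sketch, assuming one row per religious group with a count of blank answers and the number of people eligible to answer; the group labels and counts are placeholders, not census results. `NumberFormatter` takes numbro-style format strings.

```python
from bokeh.models import ColumnDataSource, DataTable, NumberFormatter, TableColumn

# Placeholder groups and counts (NOT census results).
groups = ["Group A", "Group B", "Group C"]
not_answered = [50, 120, 30]
eligible = [1_000, 2_000, 1_500]

# Non-response rate as a percentage of those eligible to answer.
rates = [100 * n / e for n, e in zip(not_answered, eligible)]

source = ColumnDataSource(data=dict(group=groups, not_answered=not_answered, rate=rates))

table = DataTable(
    source=source,
    columns=[
        TableColumn(field="group", title="Religious group"),
        TableColumn(field="not_answered", title="Non-response (count)",
                    formatter=NumberFormatter(format="0,0")),
        TableColumn(field="rate", title="Non-response rate (%)",
                    formatter=NumberFormatter(format="0.00")),
    ],
    index_position=None,  # hide the built-in row-number column
    width=450,
    height=140,
)
```

DataTable columns are sortable by clicking their headers by default, which gives readers another quick way to rank groups by non-response.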
For instance, if you're presenting to a large audience or at a conference like I am, it's quite cool to have an app that your audience can use to interact with and explore your data. And let's be real, everyone wants more eyes on their research, on their data, on the cool things they're looking at, so it's a pretty useful thing. Obviously I'm totally biased, but I think from what we've just looked at, you can see that the colors and the graphs look quite sharp and clean. We're zooming in a bit, so there's a little bit of blur, but most visualization coding packages come with a ton of different customization tools, so you can really tailor the final product to your specific needs. You can design your own color palettes to suit; maybe you've got your own specific brand you're trying to plug. You can do a lot with these packages. Secondly, interactivity is great for its flexibility and ease of use. The built-in functions that come with these Python packages make it really easy to play around with the data and to identify certain patterns. In the demo, I showed you just how simple it was to remove different regions just by interacting with the graph's legend. That's just one function in the Bokeh package, and all I had to do when I showed it to you was click on each of the regions I wanted gone, and it removed those data points from the graph. Furthermore, interactivity allows for more in-depth data exploration. With the interactive tools, we can be more granular and specific in our analysis. You'll remember that I demonstrated how we can hover over data points to get more information about them, and that can be extremely helpful.
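Removing regions by clicking the legend is controlled by the legend's `click_policy`. One minimal way to set it up, sketched with toy data and hypothetical region assignments, is to draw each region as its own glyph with its own `legend_label`, so each legend entry hides only its own points. Bokeh's documentation notes that legends auto-grouped from a column (via `legend_group`) don't support these click features, which is why the loop is used here.

```python
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Hypothetical toy rows: (local authority, region, % not English, non-response %).
ROWS = [
    ("LA 1", "London", 34.0, 9.8),
    ("LA 2", "London", 20.0, 7.4),
    ("LA 3", "Wales", 1.5, 6.1),
    ("LA 4", "Wales", 2.0, 7.0),
    ("LA 5", "North East", 4.0, 6.3),
]
REGION_COLORS = {"London": "#0072B2", "Wales": "#D55E00", "North East": "#009E73"}

p = figure(
    title="Non-response by region (toy data)",
    x_axis_label="% main language not English",
    y_axis_label="Non-response rate (%)",
)

# One glyph per region, so each legend entry controls only its own points.
for region, color in REGION_COLORS.items():
    subset = [r for r in ROWS if r[1] == region]
    source = ColumnDataSource(data=dict(
        name=[r[0] for r in subset],
        x=[r[2] for r in subset],
        y=[r[3] for r in subset],
    ))
    p.scatter("x", "y", source=source, size=8, color=color, legend_label=region)

p.legend.click_policy = "hide"  # "mute" would fade the points instead
p.legend.location = "top_left"
```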
For instance, maybe we spot what looks like an outlier on our scatter plot. Instead of going to consult a table to find more information about it, you can get that simply by hovering over the data point. The package I used lets you do this with what are called tooltips, and the information doesn't have to be limited to a point's name, its x and y values or its label. You can add any information you want, as long as it's in the CSV or Excel file you're feeding into that specific graph. I've also demonstrated other useful functions that allow granularity, such as the wheel zoom and the box zoom, which allowed us to interrogate different clusters of data points. Aside from exploration, interactivity also benefits users by providing a multifaceted analysis, where they can get as much out of the data as they want, since they can explore it from multiple perspectives within a single visualization. With that first set of scatter plots, we can pose multiple questions about the relationship between non-response and non-English speakers. We can do that by using the dropdowns to interrogate confounding factors, such as region, the urban-rural split and religious diversity, and that makes for a more nuanced analysis. For instance, we can then go on to ask further questions like: do urban areas with higher religious diversity have a different pattern of non-response compared to rural areas? So, like I said, there's a lot you can get out of these visualizations. And it's maybe not an entirely obvious point, and definitely one with a few caveats, but interactive visualizations can in fact improve accessibility amongst certain demographics.
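The Shannon index used here as a measure of religious diversity is usually defined as H = −Σ p_i ln p_i, where p_i is the share of residents in religious group i. A minimal standard-library sketch follows; which census religion categories go into the sum (for example, whether "not answered" is included) is an assumption, not something stated in the talk. H is 0 when everyone is in one group and reaches ln k when residents are spread evenly over k groups.

```python
import math
from collections.abc import Iterable


def shannon_index(counts: Iterable[int]) -> float:
    """Shannon diversity H = -sum(p_i * ln p_i) over group counts.

    Groups with a count of zero contribute nothing (p * ln p -> 0 as p -> 0).
    """
    counts = list(counts)
    total = sum(counts)
    if total <= 0:
        raise ValueError("need at least one non-zero count")
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log(p)
    return h


# Hypothetical counts per religious group for two local authorities.
print(shannon_index([1000, 0, 0]))          # one group only -> 0.0
print(shannon_index([250, 250, 250, 250]))  # four equal groups -> ln 4, about 1.386
```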
I mentioned before that when you're coding with various visualization packages, you're afforded a lot of control over what the finished product looks like. For instance, Bokeh includes a number of accessible palettes, which are pretty useful for people who are colorblind, and I used one of those for my scatter plots. There are also other packages out there like Dash, which works quite well with another visualization package called Plotly. Dash supports Accessible Rich Internet Applications attributes, known as ARIA attributes, which can provide clear labels for objects; that can be pretty handy for screen readers. Of course, there are going to be limits to how accessible you can make an interactive visualization, but I thought that was an interesting one to point out. Okay, now that I've laid out what I think are the five key strengths, I'm quite interested to know, just for my own sake, what the software landscape looks like. So, out of interest, what do most of you use as your main software or programming language for your day-to-day research? That could be Python, it could be Excel, it could be SPSS. What I can then do is suggest some relevant tools and programming packages depending on your answers. Okay, so we've got Excel; I'll wait for a few more. Snap Surveys, I haven't heard of that one. Esri. Okay, quite a few people using Excel, Power BI, RStudio. Okay, well then, let's start with R. When it comes to R, you can use the plotly package. I've been told by Nadia that it's the best package to get started with if you want to create interactive visualizations in R. It allows you to create interactive charts, plots and graphs within RStudio, and these can then be easily shared or hosted on Dash. Dash allows you to easily create an app to display and host your interactive visualizations so others can interact with them.
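Bokeh's `bokeh.palettes` module includes a colorblind-friendly `Colorblind` palette, indexed by the number of colors needed (3 to 8). A minimal sketch of applying it to a categorical column with `factor_cmap`; the urban-rural labels and values are hypothetical. Pairing color with a second cue, such as marker shape, would be a further optional step.

```python
from bokeh.models import ColumnDataSource
from bokeh.palettes import Colorblind
from bokeh.plotting import figure
from bokeh.transform import factor_cmap

# Hypothetical classification labels and toy values.
CLASSES = ["Urban", "Rural", "Mixed"]
source = ColumnDataSource(data=dict(
    x=[34.0, 2.0, 8.0, 15.0],
    y=[9.8, 6.1, 6.6, 7.2],
    urban_rural=["Urban", "Rural", "Mixed", "Urban"],
))

p = figure(title="Non-response by urban-rural class (toy data)")
p.scatter(
    "x", "y", source=source, size=9,
    # Map each category to one color from the 3-color Colorblind palette.
    color=factor_cmap("urban_rural", palette=Colorblind[len(CLASSES)], factors=CLASSES),
    legend_field="urban_rural",
)
```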
Other packages that Nadia has recommended to me include Shiny. Shiny is designed for building interactive visualizations, which can then be hosted online as a Shiny app so anyone can access them. If you're particularly interested in interactive mapping, another package that was suggested is Leaflet, which has the advantage of quite a simple syntax, so it's quite good for beginners. But I've noticed that probably the biggest one I'm seeing here is Excel. And I see Stata: "Stata, reluctantly". Yeah, fair enough. It's true that these are mostly geared towards static visualizations. If you're super wary, not that you should be at all, of moving towards a programming language like Python or R, there might still be things you can do with this software. With Excel, for instance, I've heard that it does have some interactive features, such as slicers, timelines and pivot charts, if that means anything to anyone, which can be used to create interactive visualizations. Apparently these can also be embedded on web pages if you use things like OneDrive or SharePoint, so users can interact with your visualizations directly on the web page. Another option, and one that's been mentioned, is to use Excel in conjunction with Microsoft's Power BI, which is a useful tool for creating interactive visualizations and hosting them online. You could import Excel workbooks into Power BI and further enhance the interactivity before sharing them online. In terms of something like Stata, unfortunately, its interactive visualization capabilities are quite limited. I know SPSS does have some capabilities with its Graphboard, but it doesn't have any built-in features for embedding interactive visualizations online, which is a shame, and I assume this is probably true for Stata too. Let's see what else we've got.
Qualtrics, I've heard of that. And SurveyMonkey: a lot of people use that, and a lot of people have probably filled some in. I know a lot of my friends at uni who did psych had their end-of-year stuff done on SurveyMonkey, but I'm not sure whether it has any interactive elements. MS Forms as well. These are all great things to learn, and I'm sure you can export CSVs or Excel files from them and then feed them into a visualization package. I'll also talk a little bit about what Python can offer, because I noticed someone has put that. What I used for my visualizations was Bokeh. It has the advantage of not being too complicated for beginners, and it's been designed expressly for creating interactive and real-time streaming visualizations. There's also the Python version of the package I mentioned for R, which is Plotly, and there's something called Altair. These two are probably the best for beginners overall, as they've got really intuitive APIs and really good documentation, especially Plotly. Plotly also allows for quite aesthetically pleasing visualizations, whereas a bonus of Altair is that it has very simple and user-friendly syntax, though apparently it's less flexible when dealing with large datasets. If I were starting out with interactive visualizations in Python, I'd probably opt for Plotly due to its popularity. Using a popular visualization package is a great advantage, because it means there are going to be a ton of user forums that can help you. I don't know if any of you are familiar with it, but there's a website called Stack Overflow; it's great if you've got an issue with your code.
It's always beneficial for the package you're using to have a large user base, because it means there are going to be a lot of previous questions on it that you can get help from. And the great thing about Plotly is its integration with Dash, which, as I explained before, allows you to easily create an app to display and host your visualizations. The last point I'll make before I move on to taking some questions is about programming languages. Languages like Python and R are going to offer you a higher degree of customization and flexibility compared with tools like Excel, Stata or SPSS. Understandably, there's a trade-off in that the learning curve can be slightly steeper. But the reward you get is the ability to tailor those visualizations to your specific needs, as well as create automated workflows. For example, you could create a Python script so that every time you work on a new bit of data, you feed your CSV file in, it pumps out some great visualizations, and then you host them on an app. When it comes to cost as well, Python and R are open source and free to use, so all those packages are going to be free, which is a big advantage if you're out of uni and find that your SPSS or Stata licenses run out, which has definitely happened to me before. And finally, in terms of integration, interactive visualizations built with code can be integrated into web applications and other software more easily, which is a big advantage if you're seeking to create a web-based visualization like mine, where you can just give someone a QR code or a link and they can access it anywhere. Coding languages are just pretty good for this. And you don't need to be from a coding background at all.
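The automated workflow idea (feed in a CSV, get a visualization out) could be sketched roughly as below. `build_report` is a hypothetical helper, not the author's pipeline: it assumes each CSV has a label column plus two numeric columns with simple identifier-like names, and it writes one standalone Bokeh HTML file per CSV using `bokeh.io.save` with CDN-hosted resources. The demo at the bottom uses a throwaway file in a temporary folder.

```python
import csv
import tempfile
from pathlib import Path

from bokeh.io import save
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure
from bokeh.resources import CDN


def build_report(csv_path: Path, out_dir: Path, x: str, y: str, label: str) -> Path:
    """Read one CSV and write an interactive scatter plot as a standalone HTML file."""
    with csv_path.open(newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))

    source = ColumnDataSource(data={
        label: [r[label] for r in rows],
        x: [float(r[x]) for r in rows],
        y: [float(r[y]) for r in rows],
    })
    p = figure(
        title=csv_path.stem,
        tools="pan,wheel_zoom,box_zoom,reset",
        tooltips=[(label, f"@{label}"), (x, f"@{x}"), (y, f"@{y}")],  # adds a HoverTool
        x_axis_label=x,
        y_axis_label=y,
    )
    p.scatter(x, y, source=source, size=8)

    out_path = out_dir / f"{csv_path.stem}.html"
    save(p, filename=str(out_path), resources=CDN, title=csv_path.stem)
    return out_path


# Demo: one throwaway CSV with hypothetical columns and toy values.
work_dir = Path(tempfile.mkdtemp())
(work_dir / "gender_identity.csv").write_text(
    "name,pct_non_english,non_response_rate\nLA 1,34.0,9.8\nLA 2,2.0,6.1\n",
    encoding="utf-8",
)
outputs = [
    build_report(path, work_dir, "pct_non_english", "non_response_rate", "name")
    for path in sorted(work_dir.glob("*.csv"))
]
print(outputs)
```

The resulting HTML files are static, so they can be opened locally or uploaded to any web host; a server-backed app is only needed for Python-side callbacks.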
I'm not: I did politics at undergrad, then just had an interest in it and decided to do a master's in it, but you don't need to have done these things to be able to get into coding. So yeah, that's the last thing I'm going to say, and now we'll take some audience questions if there are any. Actually, because we've only got a few minutes left, you can still ask questions if you want and I'll try to cover them, but just to note, Emma's put in the chat that when you leave the event, we'd be grateful if you could fill out an evaluation survey. It helps us see what everyone wants from future events and so on. Also, if you're interested in any of the background research that's been mentioned today, I've got some references here if anyone wants to check them out, and these slides will also be available online after the event, so you can always go back and take another look. Thanks to everyone for attending. If you want to check out my code or Nadia's code, you can use this QR code here, or go to our GitHub, which is UK Data Service Open, and find our interactive gender repo; everything is there. If you want to take a look at my code and see how intensive it was to create these visualizations, the full code is there, with all the comments, so you can follow along with it. I've tried to separate it out into different notebooks for different things, and there's a file called main.py in the root folder, which is the whole app. So yeah, you can take a look at that. I've also got my contact details and social media info available.
So if you have any more questions, or you want to get in touch about your own project or keep up with our future research projects, you can follow me on Twitter or email me. But yeah, thanks everyone. It's been really interesting to see what sorts of packages you're all using. So yeah, big thanks.