Okay, welcome everyone to session 9A, which is the third in our series of R-in-production talks. My name is Jenny Bryan. I'm going to be chairing this session. We have four 20-minute talks coming, and a slight change from the schedule: the first two are recorded, but all of our speakers are here with us today, so we will watch a talk, maybe have time for some questions, watch a talk, maybe have time for some questions, and then we'll have two live talks. I also want to make sure everyone knows that if you have questions for the speakers, please use the Q&A feature within Zoom. You could also have a live discussion over in the Slack channel inside the lounge for our session, which is talk underscore r underscore production underscore three. That would also be a good place to discuss the talks, and some questions might get transferred from there. Otherwise, the last thing I want to remind you about before we start is that after this session there is the closing ceremony of the conference, and we'll make sure that we end on time so you can go there. Without further ado, let me introduce our first speaker, Aaron Jacobs, talking about production metrics for R with the openmetrics package, and we will queue up that video now.

Hi. My name is Aaron Jacobs, and I work on the R&D team at Crescendo Technology in Toronto, which is a company focused on the technology side of the sports betting industry. Today I'm going to be talking about production metrics and monitoring with R, specifically with the new openmetrics package. Now, the inspiration for this talk comes from a general trend that I've seen in the R community over the past few years. As all of you know, R is a language that has traditionally been used for interactive data analysis, exploration, visualization, and modeling, and it's excelled in those domains. But with the introduction of application frameworks like Plumber and Shiny, R is now being used to write genuine applications: applications that have users other than their authors and that are running in some kind of production environment. I think that overall this has been really beneficial for the R community. It's really increased the range of things that R users can create and the value that they can bring to their organizations or companies. But with these new opportunities come new responsibilities. Specifically, now that we have these applications, how do we keep them running? How do we know when they're misbehaving or broken, or when they need more resources, so that users can rely on them? As usual, the answer is: collect data. And we can start with generic data our systems already provide, things like CPU and memory usage. But generic measures like these are fundamentally limited. They don't really know anything about what's happening inside of our application. So for instance, if your R code is using a lot of CPU, does that mean that it's broken? Or does that mean that it's working? What you really want is for R to be able to emit metrics in its own right. Things you might need to know are: how many users are currently using your Shiny application? Is your Plumber API producing errors? Or perhaps, is it producing more errors than usual? If you're querying a database, maybe you want to know how long that takes, or whether the amount of time it takes is increasing. These data are called metrics.
And the discipline of tracking and visualizing and responding to problems that are surfaced by these metrics is called monitoring. Now, these are all old problems, and there are tons of proprietary monitoring solutions out there, some of which have been around for decades. But none of them, to my knowledge, have ever supported R. So what about the open source ecosystem? There are really two big open source monitoring projects. The first is StatsD and Graphite, and the second is Prometheus. Both of these have been around for a while, although Prometheus is a little bit newer. Prometheus uses a pull-based model, which means that Prometheus itself is responsible for finding your application and asking it for its metrics. It's currently the tool of choice in the Kubernetes ecosystem, and it definitely has more project momentum behind it at the moment. At Crescendo, we were originally targeting Kubernetes, so using Prometheus made a lot of sense. Since we started using this about a year ago, we now run and monitor dozens and dozens of Plumber APIs and Shiny applications, and even some stream processing workloads, with Prometheus. We use these metrics to create monitoring dashboards and generate alerts when things go wrong. The open source outcome of this work is the openmetrics package, which is already available on CRAN. You can install it today. It's a complete and slightly opinionated client library for Prometheus, and it's actually named after the Prometheus data format, which is currently in the process of becoming the OpenMetrics specification at the Internet Engineering Task Force, the same group of people that standardize things like HTTP. The openmetrics package allows you to create custom metrics for your R code. To give a more concrete example, suppose you have a Shiny application that accepts user-uploaded CSVs. This could be a very important measure of load on your application, because if users are uploading lots of CSVs, it may impact the service for others. So you probably want to count it. This is what it looks like to create a counter using the openmetrics package: there's a counter_metric function, and all you have to provide is a name and a little bit of human-readable help. And then, somewhere buried in the reactives of your Shiny application, you increase the counter when a user actually uploads a CSV file. Now, there's lots of good advice out there on how to go about designing and implementing metrics for your applications. So remember how I said that Prometheus uses a pull-based model? What this actually means for your R app is that openmetrics will add a /metrics endpoint to your application. You'll also have to tell Prometheus where to find your application, but that's a little bit outside the scope of this talk. But I can show you what Prometheus sees. Because the /metrics endpoint is just a regular HTTP endpoint, we can query what our app is doing using regular R tools like the httr package. And if you do so, this is what you'll see for something like a Plumber API. It's a little bit intimidating at first, but actually this is the text-based format for Prometheus, and it's surprisingly easy to read. You can pick out important information like the human-readable help strings and the type of metric being used. I actually think that one of the reasons Prometheus has been successful is because it's so easy to look at the text output and kind of suss out whether your application is reporting the right things or not.
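(A minimal sketch of the counter example described above, using the counter_metric() function named in the talk; the Shiny reactive shown in the comments is a hypothetical placeholder.)

```r
library(openmetrics)

# Create a counter with a name and a little human-readable help text.
csv_uploads <- counter_metric(
  "csv_uploads_total",
  "Number of CSV files uploaded by users."
)

# ...and somewhere in the Shiny server's reactives, increment it on upload:
# observeEvent(input$csv_file, {
#   csv_uploads$inc()  # one more upload
# })
```

And a sketch of peeking at what Prometheus would see, assuming a Plumber API with metrics registered is already running locally on port 8000:

```r
library(httr)

# The /metrics endpoint is a plain HTTP endpoint, so we can read it directly.
resp <- GET("http://localhost:8000/metrics")
cat(content(resp, as = "text", encoding = "UTF-8"))
# Prints the Prometheus text format, including help strings and metric types.
```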
Now, if coming up with your own custom metrics is a little intimidating, there is some good news, which is that a central goal of the openmetrics package has always been ease of use with existing R application frameworks. Right now that means there's built-in support for Shiny and Plumber. If you're using Shiny already, you can add useful metrics to your application just by calling register_shiny_metrics; it gives you a bunch of defaults. The same thing is true for Plumber: you can call register_plumber_metrics with your Plumber router and get a bunch of useful default metrics. Because this is a goal of the project, I'm also interested in further community contributions for other popular R frameworks. In particular, I'd love to see support for something like httr or other curl-based packages. And in addition to that, I do think there's room for improvement in the existing built-in metrics. Now, I've emphasized that Prometheus uses this pull-based model, but for some things that is kind of inconvenient. A good example is if you have, say, automated R Markdown reports that run regularly: you might want to collect metrics on how those reports run, but generally they aren't around long enough for Prometheus to be taught where they are and to scrape them. The same is true of the kind of stuff you would use cron jobs for, or, say, R-based ETL. The Prometheus community actually does have an answer for this. There's an application called the Prometheus Pushgateway, and it supports a simple API so that you can push metrics from applications that need it, and then Prometheus itself will scrape the Pushgateway. The basic idea is that at the top of your script you define a bunch of metrics, and then you set an exit handler so that these metrics are pushed to the Pushgateway when your script ends. Then you do your regular script work, using the metrics as you wish.
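(A minimal sketch of the built-in framework support mentioned above, using the register_shiny_metrics() and register_plumber_metrics() functions named in the talk; the ui, server, and plumber.R objects are hypothetical placeholders.)

```r
library(openmetrics)

# Shiny: wrap an existing app to get the default metrics and a /metrics endpoint.
app <- shiny::shinyApp(ui = ui, server = server)
app <- register_shiny_metrics(app)

# Plumber: the same idea, wrapping an existing router.
pr <- plumber::plumb("plumber.R")
pr <- register_plumber_metrics(pr)
pr$run(port = 8000)
```

And a sketch of the Pushgateway pattern for short-lived scripts described above; push_to_gateway() is my assumption about the package API, and the gateway URL and job name are placeholders:

```r
library(openmetrics)

reports_rendered <- counter_metric(
  "reports_rendered_total",
  "Reports rendered by this scheduled job."
)

# Register an exit handler so metrics are pushed when the R session ends,
# even if the script stops early. (push_to_gateway() is assumed here.)
reg.finalizer(globalenv(), function(e) {
  try(push_to_gateway("http://pushgateway.internal:9091", job = "daily_report"))
}, onexit = TRUE)

# ... the regular work of the script, updating metrics as it goes ...
reports_rendered$inc()
```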
We also have traditional measures like CPU usage, as well as memory. Memory is not very useful in this case, but CPU is a good proxy for activity. And finally, we do have a useful application-specific metric: these operations often need to look up metadata from an external source. We want that to be as fast as possible, so we cache it as much as possible, and this panel really just shows what percentage of the lookups are served from the cache versus needing to go out for real data. Hopefully that gives you an indication of what's possible. So that's a broad overview of what you can do with the openmetrics package today. But one of the reasons this is an exciting space is that there have been important developments in the Prometheus community in only the past six months or so. These are intimately tied to the fact that, finally, after three years in development, the OpenMetrics specification itself was released to the public. This has created a huge burst of activity around various clients and vendors implementing new features of the specification that don't exist in the current Prometheus data format. The biggest new thing is the introduction of the concept of exemplars, which I like to think of as effectively asking Prometheus, "can you give me an example?" To illustrate this, suppose you have a Plumber application. If you looked at a dashboard or an alert, it might tell you something like: in the last hour, there have been 221 errors. To which you respond something like: great, I guess there's a problem. But that doesn't usually give you a very good place to start. An exemplar might add: including one with request ID blah, or trace ID blah, or customer ID blah. And this is really useful, because it gives you a starting point to help track down these issues. It gives you something to search your logs for, or maybe search other internal systems, or the logs of other applications. Maybe you can check your database for a particular customer ID; maybe that's part of the issue. That's because exemplars, by and large, are really about improving the integration of metrics with the other systems that help us observe and understand and keep applications running smoothly. Overall, I think this is a really positive development for the Prometheus format, and I think over the next few years we'll see a lot of excitement and activity around these features. It would be great if R users could be part of that. And that's all the time I have. So, as I mentioned, the openmetrics package is available on CRAN. There's also a pkgdown website if you're interested. The development version on GitHub does contain some goodies that aren't yet available on CRAN, but that probably won't be true by the time you see this. Hopefully, the openmetrics package can help you increase your confidence that your R code or application can be effectively monitored in production, and help bring more reliable value to your company or organization. It's my belief that as production use of R grows, expectations around production features of R, such as monitoring and metrics and alerting, will grow as well. I'm Aaron Jacobs. You can find me on Twitter, on GitHub, or on my own site. I work at Crescendo Technology in Toronto, and if this piques your interest, we are generally always hiring. Thanks for watching, and I hope you check out the many other great talks at this year's useR!.

Thank you so much, Aaron, for that talk. It looks like really interesting work.
I can totally see how this will be very useful. I think we only have time for one pretty straightforward question, so I'm going to pose one from Steffi Groot: could you say a bit about what the default metrics are for a Shiny app?

Yeah, so I didn't mention this in the talk specifically. Right now, we can do things like track how many users are using a session. There are also some metrics around how long it's taking reactives to flush, which sounds kind of technical, but it will basically tell you if your Shiny app is getting stuck. We also track things like CPU usage and memory usage. There's also some stuff around something weirdly technical, again, called file descriptors, but that can be super useful to know if, for example, you have too many database connections open or you're leaking database connections. But as I said in the talk, I don't think the metrics for Shiny are perfect. Partly that's because Shiny is a little bit hard to hook into, but I think they can probably be improved in the future, and certainly I think they give you a starting point. For us internally, we often add custom metrics for Shiny apps in particular, for the particular app, yeah.

All right. Well, thank you again, Aaron. And we're going to move on to our second talk now, which will also be a video, but our speaker is with us. So we're about to hear about automating business processes with R from Frans van Dunné.

Hello. Thank you very much for your interest in this talk. I'm going to be talking about automating business processes with R, and I'm very happy that there is such a session as R in the wild to talk a little bit about the other things that we, or people, do with R. I'm one of them. This is a sponsor talk; we're very happy as ixpantia to be one of the sponsors of useR! 2021 global. My name is Frans van Dunné. I'm the Chief Data Officer at ixpantia, and if there are any questions that I could answer after this talk, or anything else that you might come up with, please feel free to contact me. So in the next 15 minutes, what I will talk about is data-driven innovation. Data-driven innovation is a process that we have found helps organizations quickly adapt to change, something that has been necessary in the last year and a half more than ever before because of the pandemic. The things that we have found are things that I want to share with you, because they might be of use. So first of all, APIs are really excellent tools to reuse existing tools that you have within your organization; they allow you to innovate without having to make extensive additional investments. That said, having data scientists or analysts, or just people tinkering with R, come up with new ideas may impact the interaction that they have with IT. So that is part of what will come up in this talk. At the end of things, the main message I want to convey is that domain experts, with the help of R, really can impact the top line in organizations. We only need to let them do it. We need to let them do it, but then we also need to monitor the results of their activities and make sure that we have the metrics in place to evidence whatever success they've had or not. The way that we do this at ixpantia is using a process that we call the data-driven innovation process. It has the following steps. We start off with a common frame of reference. We need to be sure that everybody is talking about the same thing. Concepts like big data are notorious for everybody having their own idea about what they might mean.
The same goes for machine learning, and even worse, artificial intelligence. So if we sense that that is not yet in place within the customer where we're going to work, we offer a workshop to help people come together in how they see things and how they interpret things. Having that, we map the data sources within the organization. It's a full map of everything that is there, including the formal and the informal data sources, so that we know what the basis is for answering the business questions that have priority. When we present the data map, we also make the inventory of those priority questions, to make sure that people can give us the boundaries within which we do the exploratory data analysis. You can go and explore data for a long time, but we need to know that we are actually adding value to the organization. In that same meeting, we typically select three of the many data sources that we have, to get an idea of what is available within the organization. Based on that, we go out, we talk to people, and we make an inventory of opportunities. It could be a list of 30, it could be a list of 100 different opportunities, which we prioritize according to criteria that I'm showing on the next slide. When we have that, we draft a report that we call the innovation route, where we pick the top three opportunities and give an idea of how we could execute them. Now, the inventory of opportunities is basically the Christmas list of each and every one within the organization: what they want, where they see opportunities to innovate. What we do is compare them: what is the strategic value, or what could the strategic value be, of this particular initiative, and is it aligned to strategic objectives within the organization? What is the scope: how many people or how many areas will it touch when we are able to deploy? How feasible is it: can we do it with a small investment, or is it something that takes a lot of time and a lot of money? And we look at the availability of data and the accessibility of data. It's great to have data, but if we cannot access it, if we cannot use it for analysis, well, there's not much value to it. So having that, you could imagine an example like this. This was with a consulting company who had the typical impact of the pandemic: usually they organize events, they invite a lot of people, they present something fun and exciting, and then they start building on the customer relationship. With the pandemic and the absence of these kinds of events, what happened is that they switched to webinars. To their surprise, they worked really well: the first one had something like a thousand people on the call. And Zoom gives you the option to download data on the participants, including how long they stayed to watch the webinar. Uploading that data to the CRM then gives a better overview of the potential customer, because we get to know a little bit about their interests. And then the combined data that we have from the CRM and the historical Zoom data allows us, with R, to do a customer segmentation. Based on the customer segmentation, we can upload groups for specific targeted email messages, to invite them to the next webinar that might interest them or to give them an offer that is targeted at them. But it takes time. So here, a person is downloading and working on the Zoom data to make sure that it fits in the CRM. In the CRM, we need to create a report. We download it. We need to tweak it to run the R script.
Then, based on the R script, we do the uploading to the email marketing platform. If you want to do it regularly, it gets cumbersome. So that is where automation comes in. And with R, there's a lot that we can do, because in each of these steps between these applications we can make use of APIs. APIs are application programming interfaces, and in essence, each of these software packages has a set of instructions so that we can make a GET call to retrieve information, and we can make a POST to put information into the next link in the chain. By doing that, we cut out the handwork of people who need to download, do things, et cetera. It speeds up the process and it also helps us avoid common copy-paste mistakes. I mention GET and POST because it means that for each of the applications we need to go to the documentation and see that, for instance, in Zoom, we can do a GET to a path like webinars/{webinarId}/participants and get a list of participants. In our email platform, we may be able to do a POST so that we can send off the group's name, email, and whatever other characteristics we need to specify to get the correct message to them, and do that on a daily or weekly basis, fully automated. Now, automation has usually been the domain of IT, right? So there's a group in IT and they are in control of everything that has to do with access to data sources, user authentication, servers, database infrastructure, cloud-based services of different kinds. And suddenly we get a group of analysts, and sometimes not even formally analysts but domain knowledge holders who just have a good idea and know how to execute it, who are working on reports, maybe in R Markdown; they've figured out how to send out automated emails; they make interactive dashboards; they start playing with containers, in languages that are usually not so common on the IT side, like R, but also Python, Julia, and others. In our experience, this tension does commonly exist. And one way to mitigate whatever friction it might cause is to help both sides understand that the analysts can package their R code or their Python code in APIs. In R, we would do that with Plumber: take code that runs sequentially from top to bottom, and there is more than enough documentation online on how to take that to a Plumber API, so that whatever IT is working with, maybe Java, maybe .NET, they can make calls to those APIs and R becomes completely invisible to them. Really, it doesn't matter; it's just one more service within the whole architecture of services within the organization. Dashboards can similarly play a role in narrowing the playing field, where we do not depend on anybody else, like for instance a BI organization, to make the dashboard for our data. In the case of R and Shiny, it's so easy to make really bespoke, made-to-measure interfaces that it usually helps to do that in the interaction and communication. The key to this is the library httr, however you pronounce it, and the GET and POST functions are the ones that we use most: GET to get data, POST to put it somewhere else. We've even published the packages that we use in-house, like the one we use with our CRM, Less Annoying CRM, and the one we use with our project configuration tool, which is called Getty. They're both wrappers around the APIs that those software packages provide.
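(A minimal sketch of the GET/POST pattern described above with httr. The hosts, endpoints, and tokens are hypothetical placeholders, not the real Zoom or email-platform APIs.)

```r
library(httr)

# GET the webinar participants (hypothetical endpoint and token).
resp <- GET(
  "https://api.zoom.example.com/v2/webinars/12345/participants",
  add_headers(Authorization = paste("Bearer", Sys.getenv("ZOOM_TOKEN")))
)
participants <- content(resp, as = "parsed")

# ... combine with CRM data and run the customer segmentation in R ...

# POST a segment to the email marketing platform (again hypothetical).
POST(
  "https://api.mailer.example.com/v1/groups",
  body = list(name = "webinar-engaged", emails = c("a@example.com", "b@example.com")),
  encode = "json",
  add_headers(Authorization = paste("Bearer", Sys.getenv("MAILER_TOKEN")))
)
```

And a minimal sketch of wrapping sequential R code in a Plumber API so that whatever IT runs can call it over HTTP; the segmentation logic here is just a placeholder:

```r
# plumber.R
#* Return the segment for a given customer (placeholder logic).
#* @param customer_id The CRM identifier of the customer.
#* @get /segment
function(customer_id) {
  list(customer_id = customer_id, segment = "webinar-engaged")
}

# Run with: plumber::plumb("plumber.R")$run(port = 8000)
```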
If you're interested in what a package like that, used internally within an organization, looks like, please feel free to visit our GitHub. If you just want to use them, you can download them from CRAN; they're both published there. In general, what we see is that if you take small steps, but you measure and you keep communicating with the rest of the organization, learn from experience, then do it again, and again, and again, making sure that you have impact, the kind of impact that you want to have, that is a very sound strategy to implement innovation. And we've found R to be a very useful tool for doing that. One recommendation: as soon as you start to automate processes, make sure that you start to automatically monitor them as well. This is a very simple Shiny dashboard that we use. Unfortunately, I see now that it is the template, so there's nothing running. But you can imagine that for each of the processes running daily, weekly, monthly, hourly, whatever frequency you have set up, being able to see that the last run was okay, in green, helps a lot to identify when something goes wrong, in red. We set the processes up not as plain R scripts, but as R Markdown documents running every hour or every month, so that whenever something goes wrong, if you open the rendered HTML you can see exactly where the process stopped. In some cases we've even included some validation or testing steps, so that it becomes easier to identify whatever went wrong and you can go to the right place to fix it (a sketch of this setup appears after this section). What I've tried to share with you in the last 14 minutes is that, first of all, you need a common frame of reference. You need to be sure that you're talking about the same thing as the people in your team and the stakeholders in your organization. It's very important, and every step here will help with that as well. It is useful to have a map of all the data sources that are available, and I guarantee that once you start making the list, there will be some data sources coming up that somebody else has not seen before. There will be surprises. Make sure that you include the informal ones, like the one Excel file that is calculating customer segmentation, or whatever that Excel does, but very often they exist. Do exploratory analysis in close contact with knowledge holders. It's very easy to retreat to your ivory tower and start working on what might be there, but what you might find interesting might not be interesting at all in a business context. Keeping that communication going will help to get better results. It's also more fun, because the data that you're handling has meaning. With that, organize go/no-go meetings. Present the findings of your exploratory data analysis and be forward. Be honest about what you will and will not be able to do. If there's not enough signal in the data to make a predictive model, then recommend no-go and recommend doing something else. Don't get stuck on somebody who wanted a model of such-and-such phenomenon when there might be other things that are more helpful to work out first. When you want to make an inventory of opportunities, make sure you interview key stakeholders, in the widest sense, in the organization. Finally, prioritize those opportunities and suggest how to execute them.
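(Referring back to the monitoring setup described a moment ago: a minimal sketch of running each scheduled process as an R Markdown document and recording a green/red status. The file paths, the CSV status log, and the cron comment are assumptions, not the actual ixpantia setup.)

```r
library(rmarkdown)

run_process <- function(rmd, log_dir = "logs") {
  dir.create(log_dir, showWarnings = FALSE)
  stamp <- format(Sys.time(), "%Y-%m-%d_%H%M%S")
  out_name <- paste0(tools::file_path_sans_ext(basename(rmd)), "_", stamp, ".html")

  status <- tryCatch({
    # Render the process as an HTML report; on failure, the rendered output
    # (or the error) shows where the process stopped.
    render(rmd, output_file = out_name, output_dir = log_dir, quiet = TRUE)
    "ok"      # shown as green in the status dashboard
  }, error = function(e) {
    "failed"  # shown as red
  })

  # Append one line per run for a simple Shiny status dashboard to read.
  cat(sprintf("%s,%s,%s\n", Sys.time(), basename(rmd), status),
      file = file.path(log_dir, "runs.csv"), append = TRUE)
  invisible(status)
}

# Scheduled e.g. hourly via cron:
# Rscript -e 'source("monitor.R"); run_process("daily_upload.Rmd")'
```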
Last but certainly not least: the more people share a common language to access data, the more opportunities you will see. It seems very obvious, but we've really seen this happen, because suddenly somebody in, I don't know, a commercial position, tinkering with R, will come up with something that you might not have thought of. You can take that as an idea and, as an analyst, as an R programmer, make it much better than it perhaps was in the first iteration. But that's it: it's taking an idea, working on it, making it better, and then going into the next iteration again. Thank you very much. If there's anything I can help you with in the question-and-answer session, I think we've got a few minutes, and I hope to somehow meet with you again soon. Thank you.

Thank you very much, Frans, for that. I assume you'll unmute and reveal yourself. So we have about two minutes before we move on. And a question I have for you is: when you go into an organization and you're trying to help them make better use of data, I would love to know if you have great examples of something you often do for people that has a great ratio of how hard it is versus how delighted they are once it has happened, or the opposite: what's something that everybody asks for, everyone thinks they want, everyone thinks they need, and then they get it, and their world is actually not any different? So can you comment on either of those?

Sure. The first one, and maybe it's one of two: it's taking a report that people might be used to, one they work on for sometimes two or three weeks to create, and turning it into an R Markdown document. The fact that with the press of a button you take the last cut of the data out of a database and then have the full report, with all the graphs and everything else, in just a minute, is amazing. And very close to that is the ability of R to send out email. The blastula package is absolutely fantastic. It's something that very few people think of before they start on analytics and whatnot; it's that part of communication, and it tends to make people very happy. The other one, Jenny, is a little bit harder to answer. I don't want to shoot myself in the foot, but sometimes managing the expectations around artificial intelligence and machine learning models really is part of what we do. So making sure that the appropriate expectations are set, that everybody shares them, and that all these visions of crystal balls and magic disappear before we actually start is, I think, one of the hardest parts of what we need to do.

All right. That's great. Nothing terribly shocking there, but I agree: the magic of having a report that used to be produced through painful manual processes become automated is great. Well, thank you again. We're going to move on to our third talk now, which will be delivered live. It's on RStudio Managed Workbench, and we will be hearing from Patrick Schratz from cynkra.

Yes, thanks, Jenny, for the introduction. Let me share my slides with you, or, in more detail, my browser. So yeah, Managed Workbench. I'm talking on behalf of cynkra, and I want to introduce our comprehensive RStudio infrastructure, what we can offer to any kind of organization, whether it's a small, medium, or large organization, to really enhance the way your data science teams work with RStudio products. But before I go into the details of our product, let's talk a bit about who we actually are.
So we are really a small team based in Zurich, in Switzerland. We are quite young; we were founded in 2018 by Kirill Müller and Christoph Sax, and we are quite international: we are currently five to ten people from seven different countries. We have a strong open source philosophy in everything we do, and we are an official RStudio certified partner, so we went through training with RStudio, and we work with R in everything we do. Some of the clients that we are already working with successfully are listed on the right side, and we mainly do consulting for them; this is one of the big tasks that we do all day. We also give RStudio and R workshops in general, and we write a lot of R packages. So 50% of our work is really contributing to the open source community, especially around R, making everything better, easier, and more accessible. And the other part is where RStudio Managed Workbench comes in: we deploy and maintain the infrastructure for these clients so they can really have a great environment for data analysis and everything they want to do with R, and then we can optionally also administer this environment. To give you a basic introduction to RStudio and their product line-up, in case you're not fully aware of it, there are quite a few products RStudio offers: free products and licensed products. We have the free products on the left: RStudio Desktop, which most of you are familiar with, the Shiny Server, and the basic RStudio Server. On the right are the equivalents at the enterprise level. So, for example, for RStudio Desktop there is RStudio Desktop Pro, which combines very nicely with RStudio Workbench, the licensed equivalent of RStudio Server. Then there is RStudio Connect, which is the professional counterpart to the Shiny Server. And the last one here on the right is the RStudio Package Manager, which you can use to build binaries for your local internal packages, for example, or just in general to make use of CRAN binaries on Linux. If you want to see more details on the differences between the free and the licensed versions, you can click the link up here, you see it also on the slides, and then check out these links on the bottom left. Coming to the RStudio professional products in more detail, because this is what our offering and the product, in the end, is about and based on: as you might know, RStudio Workbench, which was recently rebranded to Workbench (before, it was RStudio Server Pro), Connect, and the Package Manager form a bundle, the RStudio Team bundle, which is really the enterprise product that can help push your data science team to the next level, coming from normal RStudio Desktop use. Being a centralized installation, it can take away a lot of the time spent installing and maintaining local RStudio installations on every laptop of your team members, and it simplifies permission handling: you have centralized logins and so forth. So this is all great. The downside is that this usually requires quite some configuration in the first place. Deploying such a central service, you really need to ensure that it's stable, that it's always available, that the configuration with the authentication is just working. And this is usually not done in a few hours; that's our experience.
And we've seen lots of people trying to do it themselves and really struggling, and then maybe getting frustrated because things are not working as they wished. There is a very good and detailed admin guide, but it is really also sometimes very techy. So we are there to help. We know the pain and we want to take the pain away and just make everything easy. And this is where our idea of an added-value product on top of the RStudio professional products comes in. We call it Managed Workbench, and a lot of details are also listed on our website, cynkra.com, where you'll see a lot more detail. I'll walk you through most of the important points that our solution has to offer. First of all, it's containerized. We have a Docker image, and not just for RStudio Workbench but for all the RStudio products that are out there. So even though we call it Managed Workbench, because that is really the centerpiece and, in our experience, what most people want to have, you can have everything that RStudio offers. These Docker images always ship, of course, with the latest version of each product. They are essentially managed and updated by us, and everything is configured. They run in an Ubuntu LTS environment, so you will always be in a very common, maybe the most common, Linux environment out there, and this will stay the same no matter what your company is using as an operating system in the background. So things will usually just work. We have a lot of experience with all kinds of authentication protocols out there that a company might use: Active Directory, LDAP, local users, SAML, OIDC. There's a lot out there, and in our experience everybody uses something else, often for historical reasons; some might change in the meantime, and companies and organizations might switch to a new one. And usually this comes with some trouble: if you want to get things properly set up and really stable, this is where the pain comes in and where it's usually nice to have help. Another thing we realized when working with and helping people with R is that Pandoc and LaTeX can sometimes be trouble. This differs from operating system to operating system again, and by having an installation based on Ubuntu LTS, we just ensure that both LaTeX and Pandoc are working. This takes away a lot of pain, especially when rendering R Markdown and other interactive documents. So no more worries about these kinds of rendering issues. The next thing is security. We couple our installations with a reverse proxy in front, to make sure there is proper load balancing for the web server and incoming requests, and also to set proper security headers so that everything is safe. In addition to the simple installation of the product and making sure that there are smart defaults in place, we also provide additional helpers for everybody out there, not just for people who would like to have this kind of preconfigured RStudio Workbench, but for everyone. As I said, we are very devoted to open source in our company, and we have a lot of R packages out there for the community. One of these, which couples very nicely with this idea of small helpers and smart defaults to make life with RStudio easier, is cynkrathis. cynkrathis is a small package that comes, for example, with a number of renv helpers that make certain small recurring tasks easier.
It also has RStudio Connect helpers and some other things you might find useful in daily project life. You can check out the package; you see the URL at the top, but you can also easily find it on GitHub at github.com/cynkra. Yeah, what else do we have? We also have a default configuration of our Workbench installation coupled with an RStudio Package Manager. By default we just let users make use of package binaries on Linux: because we're running on Ubuntu, we set these options as the repo options of the installation, and users get a very, very quick installation of any package that's available on CRAN. So for example, if I go in here in the RStudio Workbench and spin up a session, let's take the last one, that's our internal RStudio Workbench instance here, and let's say we install a package, a simple one, you will see what happens. It just downloads. We download it from an RSPM; in this case this is our internal cynkra RSPM, but you could also use the global RSPM from RStudio or even host your own one. And you just saw how quick that was, right? There's no compilation from source or other things; you always just download the binaries. And if you install the tidyverse, it's done within one minute, even if you have no packages available beforehand. So it's just working as you wish. And yeah, we ship with any R version starting from 3.6.3 as the long-term support version, but if you need older versions, that would also be no problem. So let's go back here to our instance: in the dropdown, for example, you see which R versions we currently have available, and this will just continue with all the new R versions coming out in the future. Anything is possible here, so you can quickly switch between R versions, and there's really no trouble for members of your team to use any R version they want. So if you're still not convinced, let's look at some production setups that might convince you to get help when you're thinking about deploying RStudio products yourself. These are some real examples in the wild that we have seen in the past. People and organizations also like to combine the Workbench with other products, for example Connect or the Package Manager. Sometimes they do not. Sometimes they have another open source RStudio Server running as a backup. Sometimes they use a Shiny Server instead of Connect. And everybody really has different SSO types for logging in, OIDC or SAML, or everybody uses different database drivers. And this is when it becomes a really custom installation, where you really need to narrow it down to your specific setup, and this really takes time in our experience. And this is where we can help, where we have the experience, and where we really like to make things easier for you. We build everything with Drone and Gitea; we have the Docker setup to build everything in the end. We build a custom client image just for you that includes the drivers you need and the services you need, and ships everything in a simple bundle tailored to your organization that just works. We base everything on a cynkra Ubuntu image that we've snapshotted from Ubuntu 20.04, and then we just have this image inheritance: we cherry-pick from each image and put it into the next one.
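(A minimal sketch of the repository configuration described above, pointing installs at a package manager that serves Linux binaries. The URL shown is the public RStudio Package Manager binary repo for Ubuntu 20.04 and is only an example; an internal RSPM instance would be configured the same way.)

```r
# Point install.packages() at a repository serving pre-built Linux binaries
# instead of the default source CRAN mirror.
options(repos = c(
  CRAN = "https://packagemanager.rstudio.com/cran/__linux__/focal/latest"
))

# Packages now install from binaries rather than compiling from source,
# which is what makes installs like the tidyverse finish within a minute.
install.packages("tidyverse")
```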
You see here a comparison table between RStudio Desktop, RStudio Workbench, and our solution. This is a summary of our added value on top of the default RStudio Workbench installation, and it should make clear where our product really tries to be smart, to have smart defaults, and to provide a working out-of-the-box configuration. In addition, we have made a comparison of the startup speed, with the settings we apply, between an RStudio Workbench setup here on the left and a normal RStudio Desktop installation on the right. You see that the left one is a few seconds faster, even though this was benchmarked on a very recent MacBook Pro, which is quite fast. So this is quite impressive, I'd say, because if you take older laptops, or if your company has older laptops, then this difference might be even bigger. Also, if you host your own RSPM, you can get an increase in download speeds of up to four times. For example, here we're downloading the Th package from our own RSPM and we are at 0.09 seconds, whereas from the global RStudio Package Manager we are at 0.4 seconds. And this is mainly because we are able to host our RSPM on a local machine near and next to us, with a very good network connection, whereas of course, if you use the global RStudio Package Manager, which is perfectly fine, the travel distance is a bit longer. So these are the small techy things that we can also offer to really get the most out of the RStudio products. Yeah, so that's what we do; that's what we can help you with. We would like to really make things easier for you if you're thinking about RStudio products. We have a lot of experience setting them up, doing everything from A to Z: the whole environment, the whole DevOps part that you usually don't want to do yourself and that is a real pain if you do. We're happy to cooperate with the IT department of your organization, and usually it works out great, because they're happy they're not doing it alone and we are happy to assist them to make everything smooth. If you want to find us, you can find our web pages here, or LinkedIn, and also find us on GitHub. So yeah, happy to take any questions. And we are also hiring, so just hit us up if you're interested in how we do things and how we work.

All right, thanks. Thank you, Patrick. The note you ended on is actually what I wanted to ask about. I do work for RStudio, but nowhere near the pro products side, so I kind of lurk in some channels where I get to see some of this, but at arm's length. So I'm wondering: when you go in to help someone stand one of these up, is it usually because they don't really have that kind of IT support, or there is IT support in the organization but they don't want to install RStudio? Sort of, who are the typical organizations that would use a managed product like that?

Yeah, that's a good question, Jenny. Thanks. So it usually differs, but what we see most is that organizations have IT support, but this IT support doesn't have any experience with R. Most companies have just started with R, or have only used it in local RStudio Desktop installations so far. And then maybe IT gets the request: oh, we want to build a centralized RStudio infrastructure here, and then they see, well, maybe it's not so easy, it's kind of complicated, there's this huge admin guide, can we get any help? And this is really where we can jump in and, in our experience, make things easier and better for everybody.
And yeah, that hopefully answers the question. Yeah.

All right, well, it's time to move on, but thank you again very much for this talk and for this product. And our final speaker is Joe Rickert from RStudio. He's my colleague, and he will be talking to us a little bit about RStudio. So I'll hand it over to Joe now.

Hello, everyone. It's really good to be here at useR!, and many of you may know me from my work with the R Consortium, but today I'm here with my RStudio hat on and I'd like to tell you a little bit about our company. So first off, what is a company? Well, it's people with a mission, a culture, a product plan, and hopefully some customers. So here you see a smiling happy face. Yes.

Joe, you're not sharing your slides. I don't know if you mean to be. Yeah.

Yeah. There we go. Sorry about the barking, people. Oh, well, I apologize. So now you see some of the smiling happy faces of my colleagues. When I started here in 2016, there were only about 40 of us, and it's really gratifying to be part of such a large and growing group of competent people who just have a tremendous amount of energy. And we're all remote workers, or most of us are. We're spread out all over the United States, with a growing footprint around the rest of the world. We deliberately try to seek diversity, and we're trying to take a global perspective on everything. You can see in the corner there we have a small office in Boston, but I bet most RStudio employees have never been there. So this is our mission, and I'm going to read this slide because it's important and it's the one thing I really want you to take away from this presentation. Our mission is to create free and open source software for data science, scientific research, and technical communication in a sustainable way. Why? Because it benefits everyone when the essential tools to produce and consume knowledge are available to all, regardless of economic means. So that's quite a mission, and we're trying to develop and sustain the culture to accomplish it. The one word that I think best characterizes what we do is the ancient Greek word eudaimonia: flourishing. So we're always asking ourselves internally: how can we fit into the R ecosystem? How can our company flourish while helping our employees, our customers, and the community flourish also? And how can we do this over the long run? Internally at RStudio we talk about being a hundred-year company. What do we need to do in order to last that long and to keep focused on our mission and our culture over that period of time? A significant step we took in 2019 was to become a public benefit corporation. What this means is that the management of RStudio not only is allowed to, but has the obligation to, consider the interests of our customers, our employees, and the community in addition to the shareholders. That's very different from the legal hierarchy of a normal Inc., a normal incorporated company in the United States. And we not only took this step legally, we also qualified as a certified B Corp. That means there's a nonprofit that audits us every year on how well we are doing with respect to other public benefit corporations and with respect to how well we are achieving our own goals. So open source: it's the center of what we do. Over 50% of our engineering resources were devoted to open source last year. We have lots of RStudio open source projects, and we contribute heavily to other open source activities.
For instance, we're strong supporters of the R Consortium, where I spend most of my time. We support a number of NumFOCUS projects. And the graph on the bottom shows how we're trying to integrate community participation in what we're doing. Both of those graphs, for the tidyverse on the left and Shiny on the right, show project commits over time from 2017 to, I guess, the beginning of this year. The dark blue shows the commits done by RStudio employees, but you can see in both graphs there's a healthy and growing number of commits being made by non-RStudio people, interested people from the community. This is a slide that Hadley Wickham presented recently at an internal RStudio meeting. You can see Hadley up there on the left, along with the seven and a half smiling faces of the tidyverse team, a tremendously productive group. You can see some measurements there: over 230 packages, and in the month before Hadley gave his talk, there were something like 66 million downloads collectively. And that is all accomplished in line with the goals that Hadley and the team have set for themselves: to provide a seamless end-to-end data science experience, to help R users flourish, and to build an ecosystem. So again, it's about contributing and taking a long-run view of things. If you go to our webpage, this is what you'll see in terms of products. You'll see open source on the left, and hosted services like RStudio Cloud, which means there's usually a free component in addition to something you pay for. And there are the professional products, of which Patrick did an excellent job providing an overview, so I'm not going to go into much detail there. But I will share this with you, and this is my view of what we do here at RStudio. We build tools for data scientists. And our data scientists, they build things, as in the left block: that means building models and analyzing the results of statistical modeling. They publish things: that means sharing all kinds of artifacts, publishing reports and models and bits and pieces of code. And they do all of that while collaborating. So when we're building these professional tools, and the open source tools too, we have in mind that we have to build things that help data scientists, our customers, our open source users, build and publish, and help them collaborate while they're doing this. So Patrick talked about the Workbench; I'll just mention it as a really good example of a professional product that's all about helping with the building process. R Markdown is an open source tool that is really fundamental to the original mission: providing a way of facilitating statistical computing and helping collaboration among data scientists and statisticians flourish over the long run. So this is a fundamental tool, and I think you'll see it develop over the years; it's a root kind of thing from which other ideas flow. And then I offer you RStudio Connect as an example of a collaboration tool, a professional collaboration tool. It's a platform: you can publish R and Python and Shiny artifacts, and you can do this in a way that facilitates secure and managed collaboration. Now here, this is the secret RStudio business plan. This is the secret hiding in plain sight, and it really talks about how we go about our daily business. So here we are, RStudio, and we think about first contributing to the open source community. So this is the real secret.
I had the opportunity to attend an entrepreneurial investment conference not too long ago, and there were young entrepreneurs talking about how they were going to leverage open source. No, that's wrong. First you have to contribute. And then, after you've contributed and helped to build the community, then maybe you get to the place where you have some commercial customers, who in turn will drive the adoption of open source. And then maybe they'll buy some of your products, which you can reinvest. So this is the virtuous cycle: contribute first, help the community grow, help other people make money, so that maybe they'll buy some of your products and you have more to do your mission, your avowed job of contributing to open source. We have customers. We're really lucky to have over 5,000 active commercial customers, many of them Fortune 100 companies, and what you see here are the logos of some of the customers we're really very proud to have; they help sustain us and our mission. And that's it; that's all I have to say about RStudio. Thank you, and I hope to see some of you next year, in person, at useR!.

Thank you, Joe. I have one question I want to ask of you, sort of wearing your R Consortium hat. Maybe when you answer it, you should give, like, the ten-second version of what that is, in case people don't know. But what I wanted to know is: are there any projects that the R Consortium has funded in the past that you think were particularly effective, and/or are there types of projects that you think are really needed and that you would love to see people pitch to the R Consortium?

Okay. So, projects. There are two ways that we develop projects in the R Consortium. Twice a year, we put out a call for proposals: people send us proposals and we fund some of them, the ones that we think will have a big impact on the community and will be sustainable over time. So to answer Jenny's question there: spatial statistics. Spatial statistics was a big winner. We provided a little bit of seed money in the beginning, and we have funded several spatial statistics projects to the tune of tens of thousands of dollars. Altogether since the beginning, the R Consortium has put over $1.3 million into projects like this. And we'd like to think that we really helped spatial statistics become prominent and make R the leading place to do that kind of statistics in the open source world. We also take a broad view of infrastructure projects: for instance, we funded R-Ladies initially with a small grant, and we'd like to think that we've helped them become as successful as they are, taking nothing away from the tremendous drive of R-Ladies all over the world. We see community infrastructure as infrastructure too. The other kind of thing that we do is act as a center for building collaborative working groups. These are projects that we initiate internally, or people will come to us and say, you know, we have a working group, and the ISC, our technical committee that Hadley heads up, will give a go-ahead. And right now, we have a lot of collaborative work going on.
For instance, as we speak, the R Consortium is gearing up to submit a test clinical trial on behalf of some pharmaceutical companies that are collaborating in order to make possible, streamline, and shake out the process of what it would be like to do an all-R submission to the FDA. So the Consortium is the place to go for a company-neutral place to collaborate in industries that may otherwise be pretty competitive.

Those are great. I think those are great examples, and I think they help people understand the point of the Consortium and the type of activities that might happen otherwise, but that are definitely more successful and have greater impact by having funding or other contributions from something like the Consortium. So I think I'm going to take this chance to conclude our session. Joe, you can stop sharing, or maybe that's going to be imposed on you. And I would like to thank all four of our speakers. Again, I thought this was a really interesting session. And I think, especially because of the journey I had with R, sort of moving from being an individual data analyst in academia to now working on the tidyverse team, I find it really interesting to see how R is used in production environments, because it's not sort of where I grew up with R, but I find it very fascinating. All right. So this session, and I think the three sessions running right now, are the last regular sessions of useR! 2021. I'd like to thank the sponsor for this particular session, Roche. And we're going to end now. You have about 13 minutes, by my calculation, for a health break, and then there will be the closing ceremony, where you'll hear probably a little bit about the attendance of the conference and all the different platforms that were used and how global it was, maybe some awards, and important announcements about the shape and form of useR! in 2022. And I'd also really like to thank our Zoom hosts. And I will see you folks soon. Bye bye.