This is the last talk, it's a recorded talk. The talk is Python Data Science with VS Code and Azure, and it's by Claudia Regio — I hope that's the right way to pronounce it. Claudia works at Microsoft. She's a program manager at Microsoft focusing on Python and data science. So we're going to play the video, and Claudia is going to be available for questions in the break room after, so enjoy. Hi everyone, happy EuroPython 2021. I'm Claudia and I'm here with Sid, and we are both program managers on the Python data science and AI team in VS Code at Microsoft. I'm currently focusing on the Python notebooks experience within VS Code. Hi everyone, I'm Sid and I work on the Azure Machine Learning extension for VS Code. So today Claudia and I are really excited to show you demos of both Jupyter notebooks as well as the Azure Machine Learning extension. So let's get started. Let's go through a few slides really fast. Our Twitter handles are right here, so if you want to contact us, these would be the ones to grab. First order of business: what do you need? You're going to need VS Code, obviously. You will also need the Python extension, which comes with Jupyter and Pylance. And you'll also want Gather and Live Share to get the full experience of what we're talking about today. One thing I want to call out really fast on this slide: for anybody who has seen me talk about this in another video or demo, this slide used to say VS Code Insiders. That is not the case anymore. We have officially rolled out native notebooks as the default experience for everyone in VS Code stable. There's no more opting in, no more needing to jump through hoops and download Insiders to try it out. This is the official new default experience. We're really, really excited to roll it out to you, and we hope that you try it and let us know your thoughts and feedback.
As always, GitHub is a great place to reach us, as is Twitter, as previously mentioned. So go ahead, try it out, and let us know your thoughts and feedback on how we can improve. We're excited to hear from you all. All right, why the changes? Previously we had a web-based implementation, and that came with a couple of limitations, unfortunately. So we went ahead and moved to natively supporting the .ipynb file. Some of the benefits that come with that, for example, are the integration of your favorite VS Code extensions from the marketplace. A good example of a really popular one is Bracket Pair Colorizer. That one could not light up within the old implementation of notebooks; however, it does work within the new notebooks. So whatever ecosystem of extensions you have will light up within the notebook environment. You can also expect improved notebook load times and, of course, a new and refreshed modern design. So now we can go ahead, get started, and go over some features. Coming to our Titanic notebook: for the purposes of the rest of the demo, running locally is going to be just fine, but if you ever need to leverage more powerful remote resources, you can come down to the global status bar and click Jupyter Server. This will allow you to connect to an existing Jupyter server as long as you have the URL, or, if you have the Azure Machine Learning extension installed and a compute instance, you can also connect to one from here. All right, continuing on in our notebook: this first cell is just a bunch of imports, pretty typical, so we can go ahead and skip that one. But the next thing we might do is take a look at the data frame that we just imported — get familiar with the columns, the values in there, their ranges. And a lot of times in a notebook, as you start to explore, you create variables and you're renaming them a little bit, or maybe you're just overriding their values and rerunning and rerunning.
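That first-look exploration can be sketched like this. This is a minimal illustration, not the notebook from the demo: the column names echo the Titanic dataset, but the rows are made up, and in the real notebook the data frame would come from something like `pd.read_csv(...)`.

```python
import pandas as pd

# Tiny stand-in for the Titanic data frame used in the demo;
# in the real notebook this would be loaded with pd.read_csv(...).
df = pd.DataFrame({
    "Pclass":   [3, 1, 3, 2],
    "Sex":      ["male", "female", "female", "male"],
    "Age":      [22.0, 38.0, 26.0, None],
    "Fare":     [7.25, 71.28, 7.92, 13.00],
    "Survived": [0, 1, 1, 0],
})

# Get familiar with the columns, their types, and their ranges.
print(df.dtypes)
print(df.describe())      # count/mean/min/max for the numeric columns
print(df.isnull().sum())  # spot missing values before going further
```

This is exactly the kind of state the variable explorer and data viewer surface for you without print statements.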
And it gets really easy to lose track of the state of your variables. So we went ahead and created the variable explorer, which will show you the active variables within the notebook. And that's really helpful: you see the name, the type, the size, and a preview of the value. So you don't have to plague your notebook with a bunch of print statements to understand where you're at anymore — you just open this up and check. And as far as seeing tabular data goes, you can access the data viewer through the variable explorer too. For any tabular data, go ahead and click on the icon to the left, and that's going to open the data viewer. This gives you an Excel-like view of your data, which makes it a lot easier to process. We also have these filters at the top. Typically, when you're getting started with your data, you want to make sure it's clean. You could write code to identify the issues and then write more code to fix the issues, or you can use these filter tabs at the top to help you identify those issues a bit quicker. So you're not writing code that, it turns out, you don't actually need — that saves you a little bit of time at that stage. Coming back to our notebook, here we have pandas profiling. We have support for ipywidgets, obviously. ipywidgets are very powerful; they make even outputs more interactive. So if you're somebody who shares out reports, potentially to people who are not as familiar with the Python language, changing outputs or tweaking them slightly based on parameters can be difficult. That's a lot more easily achieved with ipywidgets. So of course we have support for those. Coming down through the rest of this notebook, we have feature engineering, such as one-hot encoding categorical variables and normalizing continuous ones. We're also going to make sure that we don't have any null values, et cetera, before we keep going.
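A minimal sketch of the feature-engineering steps just described — one-hot encoding a categorical column, min-max normalizing a continuous one, and checking for nulls. The column names are illustrative, not taken from the actual demo notebook:

```python
import pandas as pd

df = pd.DataFrame({
    "Sex":  ["male", "female", "female", "male"],
    "Fare": [7.25, 71.28, 7.92, 13.00],
})

# One-hot encode the categorical column into indicator columns.
df = pd.get_dummies(df, columns=["Sex"])

# Min-max normalize the continuous column into [0, 1].
df["Fare"] = (df["Fare"] - df["Fare"].min()) / (df["Fare"].max() - df["Fare"].min())

# Make sure no null values remain before training.
assert df.isnull().sum().sum() == 0
print(df.head())
```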
And then we can start with training our models. However, these two cells have actually ended up in the wrong spot — I want to move them and put them under the train/test split section, where they're more appropriately categorized. If you want to move multiple cells at once, you can select multiple cells, because we support that now. To select multiple cells, go ahead and hover to the left of the cells you want to select. If you hold down Shift and click, that's the traditional selection of contiguous items. Otherwise, if you want every other cell, or separate cells that aren't necessarily contiguous, go ahead and Ctrl+click instead. That's going to select multiple cells for you. Then you can click and drag any cell in your selection, and that's going to move all of them. So I'm going to go ahead and put these under the train/test split. All right, once we've done our train/test split, we can go ahead and start creating and initializing some models, training them, and checking their accuracy scores, et cetera, to predict the survivors based on the inputs for this type of notebook. So here we have a naive Bayes model, a decision tree — we've even got a nice visualization of a very complicated tree — as well as a random forest and eventually even a neural network. You can also see the model accuracy, training and validation, et cetera. The last cell in this notebook basically compares the accuracy between the models created here. And let's say we want to move forward with the most accurate one — could be the decision tree, could be the neural network; these appear to be pretty even. Let's go ahead and just move forward with the decision tree for now.
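The split-train-compare flow described here can be sketched with scikit-learn. This is an illustration only: it uses a synthetic dataset in place of the engineered Titanic features, and it skips the neural network:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Synthetic stand-in for the engineered Titanic features.
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Train a few models and compare their accuracy, as in the demo notebook.
models = {
    "naive_bayes":   GaussianNB(),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(random_state=0),
}
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
print(scores)  # pick the most accurate model to move forward with
```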
What I could do is scroll back up all the way to the decision tree, or we can navigate through the notebook with the table of contents, which is much, much faster. To access the table of contents, first make sure you are in the file explorer tab, then come down to the bottom and select Outline. This is going to show you the outline based on the markdown headers that you've created, by default. So I can go ahead and just go back to my decision tree — this is the model I want to move forward with. By default, the outline is only going to show you markdown; however, if you're somebody who would also like to see your code cells, you can change that in your settings as well. So, moving on: as I mentioned, we want to move forward with the model we're most likely to succeed with. So let's go ahead and actually gather on this cell. Now, what does it mean to gather on a cell? Gather is the exploratory extension that I mentioned earlier. Gather will analyze the dependencies of the code that you have previously run in your notebook, and it will pick up all the lines of code that are necessary to generate that cell, or that cell's output. So here we have the decision tree classifier we talked about. Let's gather on this cell. What that's going to do is generate a notebook that grabs just the lines of code necessary to produce that one cell. Here you can see our imports are a lot shorter, and even our cells are a lot shorter, because it's only grabbing the essential lines of code required to make that cell. Now, you can also customize this: if you'd prefer that this not be a notebook, you can go to your settings and have it export to a Python script instead. And now we can go back to our notebook.
One more feature I want to show you — another one that is also coming with us from the old implementation — is the ability to export to a Python script. So when you're ready with a notebook that you may not necessarily want to clean with Gather — maybe your notebook is already clean — and you want to get it into a state that's ready for production, you can go ahead and click Export, and we have options for Python script, HTML, and PDF, for easy sharing purposes as well. So I can go ahead and just export this to a Python file. And here we have our new Python file, so I can go ahead and save this. Let's do Save As; we're going to call this titanic.py — sounds like a reasonable name to me. All right, and then we have our titanic.py. So now we have our Python file, and we can go ahead and come back to our notebook. Now, one thing that I'm really, really excited to show you — one of the biggest benefits of making this change to the native implementation — is that we finally have support for notebooks in source control and Git integration. That being said, I'm going to go ahead and save this notebook real quick, close it for the time being, and open up the diff view of this notebook. For those of you who may or may not know this: under the hood, a notebook is a JSON file, and the segments of that JSON are comprised of three components. You have the input, which is where you write your code; you have the output, which is what you see; and then there's also metadata, which is data about the cell. And so when you're using line-based diffing tools for notebooks, it's really, really hard to parse the changes you made in your notebook and genuinely understand the progress your notebook has gone through. So VS Code has created a rich diff editor just for notebooks that will allow you to see the changes in your notebook very, very clearly.
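The three components of a notebook cell just described are easy to see by treating an .ipynb as plain JSON. Here is a minimal hand-built example (the cell contents are made up for illustration):

```python
import json

# A minimal .ipynb is just JSON: each code cell has a source (the input),
# outputs (what you see), and metadata (data about the cell).
notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [{
        "cell_type": "code",
        "source": ["print('hello')"],
        "outputs": [{"output_type": "stream", "name": "stdout",
                     "text": ["hello\n"]}],
        "metadata": {},        # e.g. tags, collapsed state
        "execution_count": 1,  # changes on every run -> noisy line diffs
    }],
}

text = json.dumps(notebook, indent=1)
cell = json.loads(text)["cells"][0]
print(sorted(cell))  # the parts a notebook-aware diff can filter on
```

This is why a line-based diff is so noisy: rerunning a cell changes `execution_count` and output metadata even when the code and output are unchanged, which is exactly what the notebook diff view lets you filter out.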
So what I went ahead and did is: my first version of this notebook actually did not include the neural network; I added that a little bit later. So here you can see the imports that I added after, and you can also see I actually deleted a line when I was doing some testing. It's really nice, because these boxed segments allow you to understand the differences in your notebook really well. And probably the best part is that sometimes you don't always want to see all of the differences a notebook may bring up. For example, if you run a cell a couple of times, the execution count will change; that would be considered a metadata change, and Git would flag it. Or you may rerun a cell a couple of times and the output ID in the metadata changes even though your output doesn't necessarily change. So you can actually customize what kinds of diffs you want surfaced in this diff view. To do that, just come to the top right, hit the overflow menu, and select which differences you want to see. So basically you're fully in control of what you're seeing and when you see it. Now that I've gone ahead and shown the custom diff view, I have all my changes, and I'm going to push the changes that I have locally. Again, we can do this all through the VS Code UI now, because there is native support for notebooks here. So I'm going to click on the plus sign on the changes row — that's going to stage all my changes. Then I'm going to provide a little commit message here; we're going to call this "updated notebook and py script". Then we're going to click this little check, which will commit for us. And here in the source control pane, I can click More Actions, then Push to, and I can select origin.
This is just the origin of the repo — the repo that Sid is going to be using in a moment here. So now that I've pushed all the changes I made locally to the remote repository, Sid will show you how you can further accelerate your model development in Azure through the Azure Machine Learning extension. So thanks so much, Claudia. Now I'm excited to show you how you can use the Azure Machine Learning service to accelerate your model development and training in VS Code. But before I start showing you all the cool stuff, I want to briefly talk about the Azure Machine Learning service for all of you who may be unfamiliar. Azure Machine Learning — or Azure ML, as I may refer to it throughout this presentation — is a cloud-based environment that you can use to train, deploy, automate, manage, and track your machine learning models. This cloud-based environment can be used for all kinds of machine learning: anything from classical ML to deep learning, and from supervised to unsupervised. Alongside the Azure Machine Learning service, what I'm excited to go into today is the Azure Machine Learning extension for VS Code. This extension is a companion tool to the service that allows you to tap into more powerful resources for your model training directly from within VS Code. You can use it to manage, list, use, create, and update all of your machine learning resources. So here I have VS Code open, as you can see. I can get to the extension by navigating to the extension marketplace, searching for "machine learning", and then clicking on the topmost item that you see here. Now, this is the extension page. I have it installed already — that's why there's the Uninstall button here — but if I didn't have it, I could click Install. Once I have it installed, you'll see that this Azure tab is created on the left-hand side.
Now, if you have existing Azure extensions in VS Code, you may already have this tab available to you. What's important to know is that you can now use the Azure tab to open up the Azure Machine Learning pane, as you see here, and get a list of all of your Azure subscriptions. If you're not signed into your Azure account, you'll get prompted to do so, because the extension will not work without an Azure account. But once signed in, you have all of your subscriptions listed here. So the next thing would be: what happens when you expand a subscription? When you expand a subscription, as I'm going to do now, you'll see a list of machine learning workspaces available for you to use. You can think of a machine learning workspace as a top-level resource that organizes all of the underlying resources you'll use for your model building, training, and deployment. The concept of underlying resources becomes more apparent when you expand the workspace node, which presents a list of first-class resources and concepts in Azure ML. For the sake of time, I won't be going through each of the available resources, but I will do my best to explain some of them along the way. So now let's talk about what I hope to achieve with this service. As I talk to you right now, I'm using my work laptop, which frankly is great for things like recording demo videos and writing documents, but not so much for training complex machine learning models. I'm interested in using the Azure Machine Learning service to create a much more powerful workstation that I can use. Now, what do I mean when I say more powerful? Well, I'm referring to a machine that has GPUs, significantly more RAM and storage, and is highly compliant and secure — something the IT administrators on my team are very strict about. So with the Azure Machine Learning extension, I can list all of the computes, or VMs, that are available to me.
You'll see here that I'm interested in using a compute instance. This is the equivalent of a personal workstation or machine that I can use for my iterative model development. I have compute clusters listed here as well, but I'll expand on those a little bit later in the presentation. Right-clicking on the compute instances node presents me with an option to create a new resource. The Azure Machine Learning extension presents a simple set of prompts for me to follow, eventually culminating in the creation of my resource. Let's go through these prompts together. The first thing I'm asked for is a compute name; here I'm just going to input "EuroPython machine" and hit Enter. The next thing I'm prompted for is the VM size. This is where I can search for the machine specifications based on what I want to do. As mentioned earlier, I'm looking for something that has a GPU and more RAM, so I can simply search for "GPU" here, and I'm presented with a bunch of options. The NC6 VM SKU seems like a great one for me to use, so I can go ahead and select that one. That's six cores and 56 gigs of RAM — more than what I have on my laptop right now — and it has a GPU that I can use. The next thing I'm prompted for is whether or not I want to make this machine SSH-enabled. Now, I know that making it SSH-enabled might make it easy for me to connect to my machine, but I also know that my IT team won't be happy about me managing my machine access through key credentials — it's just not auditable. So let me go ahead and select No. And once I've selected this option, you'll see that the machine learning extension will immediately proceed to create the resource for me. It's going to take a little bit of time for this machine to get created, but luckily I have existing compute instances that we can use.
So now you might be wondering: okay, I'm creating this machine through the Azure Machine Learning extension — awesome, I'm able to tap into a much more powerful resource with ease — but what if I want to use VS Code with this resource? How can I do that? Well, the Azure Machine Learning extension makes it really easy for you to connect to this compute instance and get started working with it from within VS Code. You can do that by right-clicking on an instance — here I have this "EuroPython 2021 demo" instance — and then choosing the "Connect to compute instance" option. What this is going to do is open up a separate window, which is a remote VS Code window. Now a couple of things are happening: we're establishing a connection between your local machine and the compute instance, all through WebSockets. So everything is going through the Azure Machine Learning control plane, everything is happening over WebSockets — it's a highly secure connection. And we talked about auditability, right? The key thing here is that all of the access management is done through AAD (Azure Active Directory). So because I'm signed into my Azure account, I can use the credentials associated with my account and successfully connect to the compute instance. Once I'm within this compute instance, I want to navigate to a specific folder, so I can hit File, Open Folder, and then navigate accordingly. Now, what I'm going to show you is a couple of things I can do once I'm connected to this compute instance from within VS Code. So what does the remote connection mean? Well, it means that I have VS Code hooked up to the compute instance that I just created, and I can use anything that VS Code allows. I can navigate to the extension marketplace and look at all of my extensions — these are automatically installed for me on the compute instance — and I can use any one of these extensions.
I can debug processes, I can debug Python files — both of these are debug configurations that I've already created — and I can also use things like the remote terminal. In this case, I'm going to start off by cloning the GitHub repo that Claudia was previously working in. During her part of the presentation, Claudia showed that she was making changes to the Titanic notebook; she committed those changes and pushed them to the remote repository. Now I can get the remote repository URL and simply do a git clone here, cloning that exact same data science repo so that I can immediately get started working with the same notebooks she was previously in. It's just going to take a second to clone. Once it's cloned, I can navigate to that folder, as I was showing you earlier. So I go to Azure Cloud Files, Code — that's the data science repo — and hit OK. My window gets automatically reloaded, and with this window reload, we're again establishing the connection to the compute instance. It's really fast on subsequent connections because we've done all of the setup before — we're just reusing it. Once we've done that, we're working directly within the cloned folder. You'll see here that I can open up the Titanic notebook file that Claudia was working on earlier. When I open up the notebook file, it's just going to take a second, but you'll see that the cells get rendered correctly, and I can now use this notebook and run all of the cells that were there from before. So my notebook is loaded — that looks awesome. I can also open up the Python script that Claudia exported before as well. I'm going to be using the Python script a little bit later, but for now I'm going to work in this notebook. Let me select a kernel now.
I have a couple of different options here. You'll see that I have this Azure ML Py37 kernel — that's the one I want to use. So now I'm connected to this kernel, and I can just run all of the cells. Each of these cells is running, and eventually the entire notebook is going to run the exact same way it did on Claudia's machine, but now on the compute instance I was working with. This is really awesome, because Claudia was previously working on that repo: she made a bunch of changes, she wrote this notebook, she committed those changes, and now I was motivated to use it — not on my local machine, but on the more powerful Azure Machine Learning resource I had available to me. I created it with ease, I connected to it from within VS Code, I'm now working in this remote window, and I can just clone the repo and run all the cells as before. The cell running will take a little bit of time, as you're aware, but I can continue on with the rest of my machine learning development and deployment, all from within VS Code, using the power of Azure. So now that I've shown you how you can do iterative development using Azure Machine Learning compute instances from within VS Code, I want to get into another part of the Azure Machine Learning service that may be of interest. Here I was working in a notebook — Claudia was working in the same notebook — and she actually exported it to a Python script earlier, right? So now I can use that Python script she exported, which I just renamed to train.py, and run it on a compute cluster in Azure ML. So what am I doing here? You can think of this as kind of a fire-and-forget operation for your model development. You work on a notebook, you use that notebook against a more powerful machine, and you validate that the notebook is behaving as you'd expect, right?
So your model is being trained correctly. Now you want to do more complex or more intensive model training — think of that as increasing the number of epochs to train your neural net with, or doing a very complex grid search that's going to take a really long time — and you want to offload that operation to remote resources instead of using your own machine. Because you may want to context switch and work on something else, and you don't want the model training or development to be a blocking operation, even on a really powerful machine — you want to fully offload it. So what you can do is submit what's called an experiment to Azure ML. Azure ML has this concept of experiments, which are comprised of a couple of different things: the scripts you're trying to train with, and the compute you want to run on — that's the compute cluster I was talking about. Notice that it's a cluster, meaning more than one node. The compute instance is just a single dedicated node, but a cluster has multiple nodes I can make use of, which means I can even do things like distributed training. And then the last two things would be data and environment. In this case, the data is part of what we're going to be putting on the cluster ourselves, so I won't get into Azure ML's data concepts. But I'll quickly speak to environments: you can think of an environment as a definition of your Python packages and libraries that gets materialized as a Docker container on your compute cluster. So I've already created an environment. Let me just quickly show you my submission script. Here I'm referencing a compute target — that's a compute cluster I created before. And here I'm creating an environment using an environment.yaml file.
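A submission script of the kind described might look something like this with the Azure ML Python SDK (v1). This is a sketch, not the script from the demo: the cluster name "cpu-cluster", the environment name, and the experiment name are assumptions, and running it requires the `azureml-core` package plus a workspace `config.json`.

```python
def submit_training_run():
    # Hypothetical submission script; requires azureml-core and a
    # workspace config.json. Names below are illustrative, not from the demo.
    from azureml.core import Workspace, Experiment, Environment, ScriptRunConfig

    ws = Workspace.from_config()
    env = Environment.from_conda_specification("titanic-env", "environment.yaml")
    config = ScriptRunConfig(
        source_directory=".",          # packaged and sent to the cluster
        script="train.py",             # the exported training script
        compute_target="cpu-cluster",  # an existing compute cluster
        environment=env,               # materialized as a Docker container
    )
    run = Experiment(ws, "EuroPython-2021").submit(config)
    run.wait_for_completion(show_output=True)  # stream logs locally
    return run
```

The design point is exactly the fire-and-forget idea above: once `submit` returns, the run is owned by the service, and you can close your laptop while the cluster trains.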
If I open up this YAML file, it's a conda specification file that has all of the dependencies I need in order to run this training script. And then I'm going to change this experiment name to "EuroPython 2021". So now what's going to happen when I run this script is that it's going to run an experiment against that experiment name: it's going to package my training script and my environment, materialize all of that on the cluster, and run. Then what I want to show you is how you can use both the Azure Machine Learning extension and the Azure Machine Learning studio to track your run. All right, so let's run the script. I'm going to create a new terminal. When I create this terminal, it's going to connect to the Azure ML Py37 environment that I want to use, and then I can simply navigate to the Azure ML folder and run "python submit_experiment.py". So now this is going to submit my experiment for me — I've used the Azure ML Python SDK to submit the run, and I've included a couple of things to stream here. You'll see that there's this web view link; I'm going to get to that in a bit. But before I even get there, what I want to show you is that if I go to the Azure Machine Learning tab and I go to Experiments, the very top experiment you see here is the one I've created as part of this run. When I click on it, you'll see that there is an experiment run here with its status. Dropping down on this also gives me logs and outputs that I can look at. So here, if I click on Logs, there's nothing yet, but hopefully the logs will come up eventually. I think what's happening is that the experiment is just being queued against the compute cluster, and in just a second there will be an image build log for us to take a look at. All right, so the state of the run actually changed to Running.
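For reference, a conda specification file of the kind described here might look like the following. This is an illustrative sketch, not the file from the demo; the environment name and package versions are assumptions:

```yaml
# environment.yaml (illustrative): the conda dependencies the training
# script needs, materialized on the compute cluster as a container.
name: titanic-env
channels:
  - conda-forge
dependencies:
  - python=3.7
  - pandas
  - scikit-learn
  - pip
  - pip:
      - azureml-defaults
```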
And if I look at the logs, you'll see that there are these Azure ML logs, which I can double-click on and then stream directly within my VS Code console. This is really cool, because these are logs that are being updated in real time, and I'm streaming them from within VS Code through the AML extension. Now, I mentioned a different way in which you could interact with these runs to look at things like metrics and logs, and that would be from the Azure ML studio. You'll see here that if I look at outputs and logs — oh, there was a failure, that's okay, we can look at that a little bit later — but if I look at the outputs and logs, there is information here. These are the same logs I was streaming in VS Code just a second earlier, so I have access to the same set of outputs and logs. And if I was logging metrics as part of my run, I could view them from here as well. And this overview provides me with full details of my run, from things like the environment I was using to the compute target, as well as the properties of the run in both JSON and YAML format. So, let's get back to VS Code. I just want to quickly recap everything we've talked about so far. We started with Claudia doing her thing in notebooks and getting to the point where she transitioned it to me. The first thing I was motivated to do was say: I don't want to run this notebook on my local machine, I want more powerful resources — how can I quickly tap into those? Using the Azure Machine Learning extension, I can quickly create a compute instance — and I don't even have to make it SSH-enabled in order to work against it — and I can establish a connection from my local machine to this compute instance.
After I've done that, I can open up the notebook file, and I can do things like debug, interact with the remote terminal, use extensions, run Git commands — everything I would otherwise do in VS Code, but now on this remote machine. When I'm ready for that fire-and-forget operation — because I want to context switch and do something else on my local machine while still training my model elsewhere — I can easily use the Azure Machine Learning service to do so by submitting an experiment to a compute cluster and specifying an environment for my code to run in. Once that experiment is running, I can stream logs not only from within VS Code through the AML extension, but also from the studio UI, as I just showed you. So that was it for my part of the presentation on how you can use Azure to accelerate your machine learning development. Now, if you'd like to join Claudia and me for Q&A, please hop over to Matrix and join the Microsoft sponsored room or channel. We'd be happy to take any questions that you have. Thank you so much. Hey, so that was the last talk today. As you saw in the video, you can go to the sponsors' Microsoft channel and ask questions there. So, okay, now we're going to have a few minutes — maybe just a one-minute break, because this video was longer than expected. After that, Mark is going to be doing the closing. So thank you everyone. See you later.