We're gonna be kicking things off here in just a minute with a talk by Jeffrey Mu of Microsoft. And speaking of Microsoft, in case you haven't noticed from the track name, I should probably mention that Microsoft is one of our generous sponsors that makes EuroPython 2020 online possible. They've got some really great tools for Python development. I can say firsthand they're awesome, because I do all my Python development in VS Code. It's a fantastic tool. So Jeffrey Mu is a program manager at Microsoft working on the Python data science and AI experience in VS Code, which means he's responsible for building some of the stuff I like. Specifically, he focuses on making the lives of data scientists easier across the ecosystem. He holds his bachelor's degree in computer engineering from the University of Waterloo. He's a lover of dogs and Python, the language, that is; he's still a little bit unsure about the snake. So Jeffrey, thanks for joining us. Thank you for having me. So where are you streaming from? I'm streaming from Seattle in the US, so it's actually morning for me. Yeah, same here, I'm over in Spokane, so we're only about 300 miles apart. Okay, really close. Awesome. Is it warm there too? Yeah, it's pretty good weather. That's what I love about West Coast weather, just really nice all the time. Normally you don't hear people say that about Seattle, but we kind of like the rain, sometimes. All right, well, I will turn this over to you then and rock and roll. Sure, thank you so much for having me. Let me just quickly share my screen. We're gonna be talking about how to supercharge your data science workflow with notebooks, VS Code, and Azure. And like Jason mentioned, my name is Jeffrey, I'm a program manager here at Microsoft, and I work on all the data science and AI tools within Visual Studio Code. There's some of my contact info if you ever want to get in touch.
So I just want to start with a quick demo of what we're actually building today. And along the way, like I mentioned, we're gonna be showcasing all the cool data science tools that we have in VS Code. I've actually created this website, and it's already hosted in Azure Web Apps, and this is what our final product is gonna be. It's just a website that takes in an image URL, passes that image to a model that I've already trained and which is also hosted in Azure, and the pet breed detector gives me a prediction of what it is. So I can just go and search for images of my favorite pet breed, which is Shibas, because they're so cute. Let's just pick this image. I can right-click, copy the image address, go back to my website, and paste it in. It gives me a preview, and if I click submit, it actually passes this image to my model and gives me a prediction. You can see here it accurately predicted Shiba. So this is what we're essentially gonna be building today. So how are we gonna classify dogs and cats, and more importantly, pet breeds? Well, it's really easy for us as humans to tell the difference, especially between dogs and cats. But with pet breeds, you can see between these two that it might be a lot harder, even for humans. So we're actually gonna be training a computer vision model to do the same thing. As humans, we've been trained to tell the difference between these pet breeds, and we're gonna be teaching our computer vision model to do the same. To do this, we're gonna be going through the traditional machine learning workflow, something that almost all data scientists go through. We're gonna start with our data exploration phase.
This includes things such as getting our dataset and doing the data cleaning. Then we're gonna move on to our training step, which is where we're gonna actually create our training script. We're gonna be doing this in Jupyter notebooks in VS Code, and we'll define some compute, because you probably don't wanna run this on your local machine or it'll take forever; we're gonna be running it on a more powerful GPU system. And then finally, we're gonna productionize our code and deploy it to Azure in the cloud, like you just saw with that hosted website. That's our final step. The first step, like I mentioned, is data exploration. For this, we're gonna be using something called the Oxford Pet dataset. It's a dataset from the University of Oxford that has around 37 different categories of pet breeds and around 200 images per class. So we're gonna go back into our demo and into our favorite editor and IDE, Visual Studio Code. You can see here the notebook I've already created, but to get the data science features within VS Code, you'll first need to install Visual Studio Code, which is completely free and cross-platform, so it works on Mac, Windows, and Linux. More importantly, you need to install the Python extension, because that is where all of our data science goodies are held. To do this, you just click on the extensions tab and search for the keyword Python. It should be the first one that pops up, authored by Microsoft, and then you can quickly install it. I've already installed it for the sake of this demo. Once you actually have it installed, you'll see this start page pop up, and here you can do things like create your own notebook.
But I already have an existing notebook, which I can click on to open up in what we call our notebook editor. This is a brand new feature that we released near the end of last year, and you can see it's your traditional Jupyter notebook UI: you have your cells, you have your inputs, any outputs show below, you've got markdown and everything. What's really great about the notebook editor in VS Code is that it combines the flexibility of Jupyter notebooks as a data science and Python editing tool, because you can run cells out of order and so on, with the power of VS Code as an editor and IDE. One example of this is that you have full IntelliSense and autocomplete support within your cells in VS Code. So if I type, for example, import pandas as pd and then pd dot, you can see a lot of these autocompletions popping up as I'm writing. If I type os dot, it gives me the top suggestions of what it thinks I want to do with the os package; if I want to make a path with something, I can type that as well and it gives me suggestions, which is really great because traditional Jupyter notebooks often don't support this. Great. So the first step of the data exploration, like I mentioned, is actually importing our dataset onto our local machine. The first cell is just some generic imports, but the next part actually imports the dataset onto my machine. I have my dataset URL, which is pointing to that same Oxford dataset, and I've written a bunch of helper functions, as you'll see throughout this presentation, just to clean up the code a little bit.
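The helper functions themselves aren't shown in the talk, but the train/validation split step they perform might look roughly like this. This is a hypothetical sketch under the assumption that the archive unpacks into one folder per breed; the function name and layout are made up, not the repo's actual code:

```python
import random
import shutil
from pathlib import Path

def split_train_val(images_dir, out_dir, val_fraction=0.2, seed=0):
    """Split a folder-per-breed image dataset into train/ and val/ subfolders.

    Hypothetical helper: copies each breed's images into
    out_dir/train/<breed> and out_dir/val/<breed>.
    """
    random.seed(seed)
    for breed_dir in Path(images_dir).iterdir():
        if not breed_dir.is_dir():
            continue
        files = sorted(breed_dir.iterdir())
        random.shuffle(files)
        n_val = int(len(files) * val_fraction)
        # First n_val shuffled files go to validation, the rest to training.
        for split, subset in (("val", files[:n_val]), ("train", files[n_val:])):
            dest = Path(out_dir) / split / breed_dir.name
            dest.mkdir(parents=True, exist_ok=True)
            for f in subset:
                shutil.copy(f, dest / f.name)
```

After running this against the unpacked images folder, you'd end up with the train/val directory structure the speaker shows in the files tab.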
You don't have to worry too much about the code, but if you're ever interested in learning more about how it works and what these helper functions actually do, I'll be including a link to the GitHub repo at the end of this talk so you can look at the code in your own time. All this code does is download the dataset and save it into my local workspace. You can see here, if I go to my files tab, I have this images subfolder; I've already run the cells for the sake of time, so the dataset is pre-downloaded and separated into its training and validation sets, and you can see all the different pet breeds are here. So once we actually have our dataset downloaded on the machine, one thing you wanna do as a data scientist is make sure the data is correct, so just do a sanity check. For this, I've written a really simple function using Matplotlib, and what it does is plot random samples from each of the categories of images. So I can do a quick sanity check to make sure these images look right and there are no really weird images, no images of, I don't know, maybe a snake or a pig. Another really great thing is that we also fully support custom IPyWidgets. This plot using Matplotlib, which is a really great library, is static; there's no real interaction with it, you just look at it. But IPyWidgets are essentially a more interactive plotting experience, so if we run this cell, you can see that if I want to look at my training images more interactively, I can actually scroll through my different categories.
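The static Matplotlib sanity-check helper the speaker describes might look something like this. It's a sketch, not the repo's actual function, and it assumes the same folder-per-breed layout; the Agg backend line is only there so the sketch also runs headless:

```python
import random
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # headless-safe backend for this sketch
import matplotlib.pyplot as plt
from PIL import Image

def plot_random_samples(images_dir, n_cols=4):
    """Show one random image per breed folder as a quick sanity check.

    Hypothetical helper: each subfolder of images_dir is assumed to be one
    breed; one random .jpg from each is drawn into a grid.
    """
    breeds = [d for d in sorted(Path(images_dir).iterdir()) if d.is_dir()]
    n_rows = (len(breeds) + n_cols - 1) // n_cols
    fig, axes = plt.subplots(n_rows, n_cols, figsize=(3 * n_cols, 3 * n_rows))
    for ax, breed in zip(axes.flat, breeds):
        sample = random.choice(list(breed.glob("*.jpg")))
        ax.imshow(Image.open(sample))
        ax.set_title(breed.name)
        ax.axis("off")
    for ax in axes.flat[len(breeds):]:
        ax.axis("off")  # hide any unused grid cells
    return fig
```

Eyeballing a grid like this is usually enough to catch mislabeled folders or obviously wrong images before training.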
It's kind of like a more customized interactive UI, and we fully support this within VS Code as well. Great. So now that we've imported our dataset and quickly looked through it to make sure all the data is correct, we can start transforming our data into tensors. Like I mentioned, we're building this pet breed detector, and to train it we're gonna be using a Python framework called PyTorch, which is a really popular deep learning framework. But PyTorch doesn't actually understand these image files; they need to be converted into something called tensors, which you can just think of as matrices of numbers, before we can pass them to the model. To do this, we usually just want to apply some transforms. Here I'm cropping the images, because they can all be different sizes and I want to make everything uniform and standard before it goes to the model. So I'm going to crop them, apply some transformations, and then convert them to tensors. Again, I can run the cell by clicking this run cell icon, or, since we fully support Jupyter hotkeys as well, I can just type shift-enter and it'll run the cell too. Now, with these transforms it's really easy to make a small mistake that's hard to spot, because you're just applying these transforms to numbers. A really great way to check whether the transforms are actually behaving properly is a feature we recently added called Run By Line. You can think of Run By Line as a simplified notebook debugging experience. I'll just show you with an example; there's this Run By Line icon here.
What Run By Line does is step through your code line by line and show you the state of all your variables at each line. So rather than having to write print statements between all these lines, maybe print(image.shape), and generally take more time, you can run Run By Line and it will stop at the first line of code, step through each line, and show the state of all your variables in what we call our variable explorer. You can see I've opened up the variable explorer at the top of my notebook here, and I want to look at the variable image, because that's the one I'm most interested in. If I look at image, you can see right now it's just a JPEG image file, because in the previous cell I opened up a random sample image just to test whether my transforms are working. Let me just make this bigger, because I zoomed in on VS Code so everyone can see better. You can see image right now is around 500 by 375 pixels. When we run the first step, I crop it to 224 by 224, and you can see as this runs that the size is now 224 by 224. So the crop actually worked, and I can continue to step through my code; if there were an error, it would just stop and tell me. So far there are no errors, which is great. Next I transform it into a tensor, and you can see that afterwards the image is a tensor type, where before it was an image type. So now I know that this tensor transformation is actually working. And then at the end it just finishes, and because I got no errors and everything looks right so far, I know I can now apply this transformation to all my images.
I just wanted to test on one image before applying it to everything. Great, so let me close up my variable explorer again; in the next cell I've written a function to do the same transformation, but on all the images in my dataset instead of just that one. So that was the data exploration phase. Like you saw, we imported our dataset, we did some data cleaning where we quickly checked our data with Matplotlib to see that everything looked right, and then we converted everything into something that the computer, and more importantly PyTorch, will understand when we do the training step. So now we're gonna head to the training phase, and again we're gonna go back to VS Code for this. In the training phase, all we're doing is applying something called transfer learning. To do this, we're taking a pre-trained neural network, something called ResNet-18, and setting up the model within VS Code; this is all from PyTorch as well. Then we're going to actually train the model. I've already written this training function, and again, if you want to learn more about how it works, please check out the GitHub repo at the end of this presentation. For the sake of this demo, I've just trained it for one epoch; you can think of epochs as iterations, and the more iterations you do, generally the better accuracy you're gonna get. I only did one just to show a proof of concept. But you can see that training for one iteration took almost 40 minutes to run, which is extremely slow, because I'm running it just on my personal laptop. I think we can do a lot faster than this, and for that we can leverage something called Azure VMs, because they have GPU compute.
And that is where these deep learning models actually thrive. To do this, I just need to install the Azure Virtual Machines extension, so I can search for Azure Virtual Machines; you can see here I already have it installed. Once you have the extension installed, this Azure tab will pop up, you can go into virtual machines, and you can see I've already created a virtual machine for EuroPython. What's also great is that VS Code has really great remote SSH support, so you can connect to remote machines and do everything live on that remote machine exactly as you would on your local machine. So we're also gonna install the Remote - SSH extension, and from there you'll see this button up here that lets you connect to your remote machine. I already have this set up, but for the sake of time I'm gonna quickly skip over it. This will just pop up a new VS Code window that looks exactly the same as what you saw already, but it's now running on your remote machine. And on that remote machine, I ran the exact same training code, the exact same file, but for 10 iterations instead of one. So it took my local machine 40 minutes to run one iteration, but on that remote Azure machine it took around 15 minutes to run 10 iterations, which is an insane speedup from using the remote machine versus my local machine. So now that we've actually trained our model, we need to save it, which is the final portion; I just have this basic function, which saves the model.
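The saving function isn't shown, but the usual PyTorch convention is to save the model's state_dict to a .pth file and reload it into a freshly constructed model later. A minimal sketch, using a tiny stand-in model instead of the ResNet:

```python
import torch
import torch.nn as nn

# Stand-in model; in the talk this would be the trained ResNet-18.
model = nn.Linear(4, 2)

# Save only the learned parameters, the standard PyTorch approach.
torch.save(model.state_dict(), "checkpoint.pth")

# Later (e.g. in the deployed inference code), rebuild the same
# architecture and load the saved weights back in.
restored = nn.Linear(4, 2)
restored.load_state_dict(torch.load("checkpoint.pth"))
```

Saving the state_dict rather than the whole model object keeps the checkpoint portable, which matters once it's being pulled down from Azure Storage by the inference function.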
Once we have the model saved, we can start deploying it to the cloud. So now we've completed our data exploration and our training, and we're ready to productionize the model. To do this, we have this really great feature that we just released recently called Gather, and you can see the button for it right here. Gather is essentially a dependency analysis and code cleanup tool. What it does is look through all the cells and code in your notebook and extract only the relevant cells and lines of code that are required to generate the selected cell. So if I click this button, it gives me a new file, the gathered notebook, and you can see it only contains the code that is required to generate my model: the dataset, the transforms of the data, and the training. It left out all these plots, because it realized they were just intermediate steps that weren't needed; it left out a bunch of my markdown cells, because those aren't needed either; and it left out a bunch of imports, keeping only the key ones. Finally, once you're in this gathered notebook, to productionize your code you usually want to convert it to a Python script, because you can't really do much with a notebook when it comes to deployment. Instead of having to copy and paste your code over, we have a feature called export, where you can just quickly export as a Python script, and boom, it's now a Python file where you can start refactoring your code and getting it ready for production to deploy to the cloud. So speaking of the cloud, we're going to get to the deployment step now; here's the outline showing how we're going to do this.
Previously we used VS Code and Azure virtual machines to leverage the compute, but now we're going to make use of three Azure services: Azure Storage, Azure Functions, and Azure Web Apps. We're going to be using Azure Storage to store our model, because we want it to work kind of like a microservice, where we don't have to update the entire web app each time; we can just upload the model directly here, and our API will read from this storage. Next we're going to use Azure Functions, which you can think of as an API service; this is where we're actually going to host the API endpoint for the model, and it's what the website is going to call. And finally, we're going to be using Azure Web Apps for the front end, which you saw at the beginning of the demo, where I hosted the website that actually did the prediction of the pet breed. What's really great is that because Azure is so tightly coupled with VS Code, I can do all the deployment from within VS Code; I don't need to go anywhere else. Again, to do this I just need to install the relevant Azure extensions, which are all completely free, and you can see I already have them installed: Azure App Service, Azure Functions, and Azure Storage. The first step, like I mentioned, is to deploy my model to storage. In my storage account I want to create a container to actually store my model; I've already created a pet-detector container, but if you want to create your own, you can just right-click and click create. And to deploy the model, you just right-click and click Upload Blob, and from there you can quickly browse to where your model is. So I have my model folder right here.
It's a checkpoint.pth file, because that's what PyTorch saves it as, and I can just click it and then click upload. Again, for the sake of time, I've already uploaded this previously; once you click upload, it'll show up here. So now that I have my model in storage, the next part is actually creating my API endpoint. To do this, I'll go into my Functions tab and create a new function. If I go into my folders, you can see my function is contained within this inference subfolder, and there are kind of a lot of files here. It might seem a little daunting, but what's great is that with the Azure Functions extension you can just click this one button, which creates a new project; I can define where I want it to be, in this case the inference folder, and it automatically generates all these files for me, so I don't need to do anything. This predict.py is the only thing you need to edit: when the template is created it's kind of blank, and this is where you make your custom API. You can see here this is where I'm getting the image URL, passing it through the model that I'm loading from Azure Storage, and then returning a response with what that prediction actually is. Once you have your Azure function created, you can deploy it by right-clicking, then clicking deploy to function app, and it'll just deploy. Again, I've already deployed it for the sake of time, and you don't have to worry too much about the code; I'll link it at the end of this presentation if you want to take a closer look in your own time. Finally, the last part is App Service, where we're gonna be hosting our front end. For this I just created a basic HTML file.
That was what you saw earlier. All this basic HTML file does is call the API you just saw on Azure Functions and then return the result. And again, it's super simple to deploy: I go to my Azure tab, where I can right-click to create a new web app, but I already have one here called JME PyTorch web app. All I need to do is right-click, choose deploy to web app, and then point it at the website folder, which contains just that one index.html file; click select and it'll deploy to the cloud. And as you can see, all of this was done within the context of VS Code. I'm just gonna quickly jump through a summary because we're running low on time. We started off with the data exploration, which we did with the Python extension within Visual Studio Code. Next we went through our training; here again we stayed within Visual Studio Code, but we used Azure virtual machines for the compute, which is how we leveraged GPU compute to speed things up. And finally, to productionize our code, we used the different Azure services: Azure Functions as the API, Azure Storage to store the model, and Azure Web Apps to host our front end. The key point is that this was all done within VS Code. That's why VS Code is so great: it has all our data science tooling needs, and I didn't have to leave VS Code for anything. So what's next? Here's the link for the GitHub repo if you're interested, and if you want to try VS Code notebooks, you can just go to aka.ms/notebooks. These are the only two links you need to remember from this presentation. And that was it. Thank you for putting up with the technical difficulties at the beginning, and thank you so much for your time. I'll leave the slide up if people wanna take notes as well.
I think you're muted. Oh, thank you, sorry about that. So thank you so much for that. Harry has a question: he said Microsoft has announced that Azure Notebooks will be discontinued soon; is the VS Code editor recommended to replace the Jupyter IDE? So the Azure Notebooks service is being discontinued, but we're replacing it with something called the Azure Notebooks component, which is gonna be basically a version two, a better version of Azure Notebooks, embedded within different Azure services; one example would be within Azure Machine Learning. But I would also point to Codespaces, which is something I forgot to bring up because I didn't have time, but I can quickly show it right now; I don't know if I'm still sharing my screen. Codespaces is VS Code in the browser, and it's a really great alternative to Azure Notebooks: you get the same notebooks experience that you just saw in VS Code, but now it's within your browser and running on a virtual machine. Awesome. Well, again, thank you very much. Thank you. Thank you.