Hi everyone. My name is Amit Kapoor, and the topic I've chosen today is deep learning in the browser. I want to look at it through the lens of what I do, which is work at the intersection of data, visuals, and stories. I teach data visualization, data science, machine learning, and storytelling with data, and I've been doing that over the last eight years in a number of different contexts: academia, industry, and with people who are just starting out and want to enter this field we call data science, or, these days, AI.

One of the real challenges in helping people learn is the move from a traditional, classical programming background, or a traditional data-analysis background, to the learning paradigm. What we are doing with machine learning and deep learning is bringing in this learning paradigm: probabilistic thinking, dealing with uncertainty, newer algorithms. We want to build that knowledge in people and help them actually use it. So there is a big transition from how people have been doing things to using these machine learning and deep learning algorithms as users, as creators, as builders. Some people refer to this as Software 2.0; I think of it as just another module that becomes part of your overall stack. But the hard part is not the building; the hard part is understanding what the learning paradigm is.

And as we move to this learning paradigm, we are not trying to remove humans from the loop. We are trying to augment intelligence: to give people new tools and techniques that augment their current workflow and help them make better decisions. I actually don't like using the word AI; the way I define it, AI is really augmented intelligence. And to augment intelligence you have to think about the human in the loop: the person who is going to work with these deep learning or machine learning algorithms.

From my experience there are three kinds of audience. First, the users: the end users who will see the output of these models, or who are interested in learning the technology itself, and whom we want to help understand what is happening. Second, the creators, who want to take the output of these models and build tools on top of it. And third, the builders, call them coders or deep learning engineers, who are actually trying to build something using deep learning. So there are three different types of use cases.
Across all of these, people are trying to do one of three things. They want to learn: to really understand this in their own use case, whether that is their business or their own study of deep learning. They want to play: to take the output and actually do something with it. Or they want to create: to understand the problem space and build new things with it. Learn, play, create. How do we do that today? Right now it is really hard. I joke when I teach that I don't teach anything that hasn't been in working use for at least five years, because before that nobody really knows what they want to do with it. And getting people to understand deep learning has been hard: setting up an environment, getting things into production, the tooling around all of it. Only now are we reaching a stage where it has become easier, but it is still not easy enough.

The reason I want to talk about the browser is to ask: is it possible to do some of this in the browser? Is it possible to learn there, to play there, to create there? And I want to talk about the possibilities right now, because we are still at the start of this. You can think of it as doing deep learning in JavaScript, but that's not quite it; some of it could happen in JavaScript and some on a server. The idea is: can we move some of the computation, some part of the learning, into the browser and really make it work? For a long time this was simply not possible. But if we can make it happen now, we give people immediate access: you can go to a browser and start doing any of these activities. We reduce the friction of getting started; I've been running a two-day deep learning workshop over the last two days, and just getting people's machines set up so they can start learning has become easier, but is still not easy enough. And it can help us reach a wider audience, not limited to deep learning engineers but a broader spectrum, which is a step toward democratizing this, or at least making it accessible to other people who are interested. Does that make sense? Yeah.

So for a long time we didn't have much in the browser to do the things that were already possible elsewhere. Most of the earlier libraries were CPU-based, with no acceleration, so they were really slow. The bulk of deep learning is tensors, numerical operations on high-dimensional matrices, and there was no good numerical support for that. And it's not only the compute: you also need a whole ecosystem around it. How do you get data into the system? How do you visualize it easily, without hand-writing something hard like D3? And how do you play with it in the kind of notebook environment we are used to now, the Jupyter-type environment we learn in? So this was a really hard problem. Let's look at what has happened.
So we now have WebGL-accelerated learning frameworks. WebGL is the closest thing to GPU compute that we have in the browser at the moment; browsers do not expose direct compute. To do computation, you have to translate your code into WebGL shaders, so someone has to write low-level WebGL code to turn numerical operations into shader programs, and then you get the same kind of GPU acceleration your browser provides. People are still talking about how to get direct compute access, and if those specifications go through, it will accelerate things far beyond what we can do today through WebGL, which was designed for graphics rather than for general compute.

So we have WebGL-accelerated frameworks; let's see what the potential is. I'll mention four of them. There is TensorFlow.js, which came out in the last couple of months; when I first thought about this talk, its predecessor was a more academic framework called deeplearn.js. TensorFlow.js is not yet at feature parity with everything you see in Python or C++, but it is improving, and you can start using it today. The code is fairly similar to what you would write in tf.keras style: you can use the Keras-style layers API if you want, or the lower-level core API if you prefer. TensorFire is another project that has shown very good performance, but I don't think it has been open-sourced yet; it's one that could still come out. And the two others I would mention: WebDNN, which can even compile down to WebAssembly, closer to writing C code, compiling it, and running it in the browser, so it gets much better performance, but it is inference-only; and the wrappers on top of specific libraries, like Keras.js for Keras, and I believe MXNet also has a JavaScript implementation. That lower tier is all inference-only, meaning you can run trained models for inference, but you cannot train them in the browser.

So yes, there is now a real possibility of doing deep learning in the browser. Let's go back and look at what we can do with it, and if you would rather stick to the server side, that's fine too. Taking the three aspects, learning, playing, creating: what is possible?

When I think about learning, doing things in the browser has one really key property we want to harness: explorable explanations. How many people have heard that term? A few of you, okay. The idea of an explorable explanation is that when I want to learn something, I want to be able to interact with it in a very active way.
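A quick aside before we go on: this is roughly what that Keras-style code looks like in TensorFlow.js. A minimal sketch; the layer sizes and the random data here are made up for illustration, not taken from any of the demos in this talk.

```js
import * as tf from '@tensorflow/tfjs';

// Define a small network, much as you would with tf.keras in Python.
const model = tf.sequential();
model.add(tf.layers.dense({units: 16, activation: 'relu', inputShape: [4]}));
model.add(tf.layers.dense({units: 1, activation: 'sigmoid'}));
model.compile({optimizer: 'adam', loss: 'binaryCrossentropy'});

// Toy data standing in for real features and labels.
const xs = tf.randomNormal([100, 4]);
const ys = tf.randomUniform([100, 1]).round();

// fit() returns a promise; WebGL acceleration is used automatically if available.
model.fit(xs, ys, {epochs: 5}).then(() => {
  model.predict(tf.randomNormal([1, 4])).print();
});
```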
So, explorable explanations: I want to discover how things are working. I want to discover by active learning. And to be clear, this is not the active learning of machine learning terminology; it is active learning in the sense that I, as a learner, can interact with the medium and learn from it. That's the basic idea. This whole concept was coined, or at least crystallized, by Bret Victor about five years ago, when he asked: when we do programming, or learn any system, can we make it truly interactive, and allow people to understand the system in a deep way? Had I known fewer people were aware of it, I would have put more slides on it. But the idea is discovery through active learning.

So what do we want to discover in deep learning? We want people to develop intuition about what is happening, whether I'm an end user of it, or one of the builders and creators, or a new student trying to build these things. And we want to build intuition at three levels. First, the algorithmic level: there are algorithms running; how do they run, and when do they work? Second, the data that goes into those algorithms. And third, what we call the model: the interaction of data and algorithm, the space of possibilities when the data and the algorithm play together. You can think of that as simulation scenarios. I want to interact with all three levels: algorithm, data, model.

Let's look at a few examples of what is possible right now, and what we started with. A very approachable one is Visualizing Algorithms by Mike Bostock, a long article that shows, among other things, a maze-generation algorithm. It belongs to a big tradition in learning algorithms: can I learn an algorithm by watching what it is really doing, in the browser? The maze generator is, in technical terms, a randomized depth-first search.

If we apply this to deep learning concepts, the first one many of you may have seen is ConvNetJS by Andrej Karpathy, whose demos everyone who takes Stanford's CS231n course comes across; it really let people interact with a training network in the page. I think the GIF isn't playing, but it is genuinely interactive. By 2016 we had the TensorFlow Playground, the "tinker with a neural network" demo, which let me look at the algorithm and understand what is really happening. All of these were built on what I'd call toy deep learning libraries, written from scratch for the demo, and not really something you could pick up and use for your own work.
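To make the Bostock example from a minute ago concrete: the maze demo is animating randomized depth-first search, which is only a handful of lines. A rough sketch (the grid encoding is my own choice, not Bostock's code):

```js
// Carve a maze by randomized depth-first search: from each cell, recurse into
// a random unvisited neighbour, going as deep as possible before backtracking.
function generateMaze(width, height) {
  const visited = Array.from({length: height}, () => Array(width).fill(false));
  const passages = [];  // pairs of adjacent cells whose shared wall is removed

  function carve(x, y) {
    visited[y][x] = true;
    // Shuffle the four directions (a quick-and-dirty shuffle is fine here).
    const dirs = [[1, 0], [-1, 0], [0, 1], [0, -1]].sort(() => Math.random() - 0.5);
    for (const [dx, dy] of dirs) {
      const nx = x + dx, ny = y + dy;
      if (nx >= 0 && nx < width && ny >= 0 && ny < height && !visited[ny][nx]) {
        passages.push([[x, y], [nx, ny]]);
        carve(nx, ny);
      }
    }
  }

  carve(0, 0);
  return passages;  // animating these steps one by one gives the explorable
}
```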
But now, with these newer libraries, you can write an interactive model in something like ten lines of code, and with a few interactive runtime environments make it genuinely playable: put the number of iterations on a slider, make it easy for people to interact, and let them ask, what is this algorithm really doing? Writing this used to be really hard. Now we can build many more examples like it, and use the JavaScript side of the browser to build explorable explanations that help people understand what an algorithm is doing. I'm focusing on deep learning algorithms, but there are many others, including probabilistic approaches, not just the frequentist one, and other parts of learning and statistics that we don't talk about so much at this conference.

Okay, we also want to be able to explore the data, and fast, because I need to build intuition about what goes into these models. Two ways of doing that which I really like, which are very powerful. The first is Facets Dive, again a WebGL-accelerated visualization, but really focused on understanding the individual elements of my tabular data; tabular data is still a key input to machine learning. I can run it in the browser and feed data into it (it loads typed arrays, which is different from the typical JavaScript JSON format), pull in on the order of 10,000 points, and it handles images too: I can see all the images displayed, segment them by category, and start to look at the data space I'm working in. That part is really important. The second: if you have used dimensionality reduction, t-SNE or PCA projections, there is the Embedding Projector, a standalone tool where I can load my tensors and visualize the data really fast. And one benefit of this work is that the t-SNE implementation has been made much faster: they figured out a linear approximation, so it runs quickly in the browser. When I teach, one reason I prefer teaching t-SNE is that it is easier to visualize and easier for people to interact with; there are many other algorithms, but I have no easy way to help people see what happens in them, short of making them write code. Here I can provide much easier access, and it is all running in the browser in JavaScript.

So we want to look at the data, we want to look at the model, but at the end of it we want the output to be something people can interact with and learn from. What is really happening inside my neural network? That is the most common question I get: can you tell me what the network is actually doing?
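I'll come back to that question in a second. First, to make the "ten lines of code" claim above concrete: a rough sketch of a slider-driven training demo. The page elements, the toy y = 2x data, and the layer sizes are my own assumptions, not from any specific demo.

```js
import * as tf from '@tensorflow/tfjs';

// Hypothetical page elements: a range slider for epochs and a loss readout.
const slider = document.querySelector('#epochs');
const readout = document.querySelector('#loss');

async function train(epochs) {
  const model = tf.sequential();
  model.add(tf.layers.dense({units: 8, activation: 'relu', inputShape: [1]}));
  model.add(tf.layers.dense({units: 1}));
  model.compile({optimizer: tf.train.sgd(0.1), loss: 'meanSquaredError'});

  // Learn y = 2x on toy data, reporting the loss after every epoch.
  const xs = tf.linspace(-1, 1, 64).reshape([64, 1]);
  const ys = xs.mul(2);
  await model.fit(xs, ys, {
    epochs,
    callbacks: {onEpochEnd: (_, logs) => { readout.textContent = logs.loss.toFixed(4); }},
  });
}

// Re-train whenever the learner moves the slider.
slider.addEventListener('change', () => train(Number(slider.value)));
```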
So, what is happening inside the network? There are concepts for this, feature visualization and spatial activations, but doing that element by element is really hard, and there has been no easy framework for it. There is, however, an article that runs in the browser: The Building Blocks of Interpretability. It is a research article on Distill, but it is interactive all the way through. You can interact with different images, see feature visualizations, and understand what is happening with spatial activations, what is happening in feature visualization, and what it means to look at a neural network not layer by layer but segmented into channels. It builds a whole different framework for thinking about this. And you don't have to stay in the browser: the framework behind it is called Lucid, the library for feature visualization, and you can use pieces of it in your traditional environment too, or build something that lets you understand and explain to people what is really happening when they use an image network.

So we move from algorithms, to data, to the interaction of data and model, which is what we really want to understand and explain to users and learners, whether they are people learning deep learning or the eventual users of it, with their question of "can you help me understand this a little more?"

I'll touch on one more end-user topic. This one tries to explain what is happening in my model in terms of a business outcome. I want to build a model, and there are lots of questions around fairness: how do I make my model fairer? I run an exercise where people explore classification like this in their Jupyter notebooks and try to build something similar themselves. But if we can get something like this out of the box, or easy to create, something that lets me tune the model, run it, and experiment with fairness strategies to check whether my model is biased, then as a business user I can actually use it and interact with it. It also lets my user see what happens when different decisions are made, because the deep learning part is not just about creating the model; it is about how the model is going to be used in the end, and the end user's question of how to interact with it. This example is a loan decision, but you could apply it to almost any scenario, and with a neural network available in the browser you could run a very simple MLP right there to let people interact with it. The possibilities are really high.

Building these visual, explorable explanations requires multiple skills. They are visual in nature, which is probably why I'm biased toward them; I use them as a way to help people build intuition.
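To ground the loan-decision example above: the heart of such an interactive is just recomputing outcome metrics per group as a threshold slider moves. A rough sketch with made-up scores and groups:

```js
// Each applicant has a model score and whether they actually repaid.
function metricsAt(examples, threshold) {
  let tp = 0, fp = 0, fn = 0, tn = 0;
  for (const {score, repaid} of examples) {
    const approved = score >= threshold;
    if (approved) { repaid ? tp++ : fp++; } else { repaid ? fn++ : tn++; }
  }
  return {
    approvalRate: (tp + fp) / examples.length,  // who gets a loan
    truePositiveRate: tp / (tp + fn || 1),      // repayers who got one
  };
}

// Toy data for two groups; in the real explorable a slider drives `t`.
const groupA = [{score: 0.9, repaid: true}, {score: 0.6, repaid: true}, {score: 0.7, repaid: false}];
const groupB = [{score: 0.5, repaid: true}, {score: 0.4, repaid: true}, {score: 0.8, repaid: false}];
const t = 0.65;

// The same threshold can treat the groups very differently; that gap is one
// notion of unfairness the user can explore interactively.
console.log(metricsAt(groupA, t), metricsAt(groupB, t));
```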
These explorables are visual, and they are reactive: I can play with them, see what is happening, and actually get something out of the interaction. Because they are reactive, I can go to my browser and understand what is going on around them, whether I want to understand the algorithm, my data, or the data plus model and its outcomes and impacts. The challenge, obviously, is that none of us may be fully equipped to build these alone, because it is a multi-disciplinary skill. But if you are doing this in JavaScript, so much custom visualization capability already exists there that it is easier to integrate with what you're doing in the browser. I'll talk later about the tooling for this, because that is the other important piece. It is possible to build these much faster now than we could earlier. It still requires us to think beyond being people who train deep networks and do hyperparameter optimization; we have to expand the scope of what we are doing and use this as a tool.

Okay. How do we create? Creating here really means model inference. I want to take whatever I've created as my neural network, run it somewhere, and let people access it easily, in the browser. It flips the whole question around: instead of sending my data to you, bring the model back to the browser. That is a very pertinent question these days, with so many discussions of privacy. Is it possible to just send the model to the browser and push the compute there? The compute needed for inference is orders of magnitude lower than for training, so the browser can very well run it, provided we make some adjustments, which we'll come to.

I'm going to stick to abstract data, by which I mean the tabular data we usually think about. The really interesting browser DL work is actually happening with perceptual data, video and images, but given the focus of this conference is not creative coding I didn't want to go too deep into that; I'll mention it at the end.

So, model inference on text. There was a recent competition on this; using Keras.js you could deploy a comment-tagging system that flags comments as toxic or not toxic. You build your model, deploy it, and people get immediate feedback as they type: is this text helping the conversation or not, what is its sentiment? It is possible to flip things around: instead of saying "send this to the server and we'll run the model there", in many cases I can run it in the browser itself.
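A rough sketch of what that looks like with a converted Keras model in TensorFlow.js. The model URL, the 100-token input length, and the encode() preprocessing are all hypothetical; real preprocessing must match whatever the model was trained with.

```js
import * as tf from '@tensorflow/tfjs';

// Hypothetical stand-in for real tokenization: map words to small integer ids,
// padded/truncated to the fixed length the model expects.
function encode(text, len = 100) {
  const ids = text.toLowerCase().split(/\s+/).map(w => (w.charCodeAt(0) || 0) % 1000);
  return ids.concat(Array(len).fill(0)).slice(0, len);
}

async function flagComment(text) {
  // model.json produced by the tfjs converter from a trained Keras model.
  // (tf.loadModel was the API at the time; it was later renamed loadLayersModel.)
  const model = await tf.loadModel('/models/toxicity/model.json');
  const input = tf.tensor2d([encode(text)], [1, 100]);
  const score = (await model.predict(input).data())[0];  // P(toxic), by assumption
  return score >= 0.5 ? 'flag for review' : 'looks fine';
}

flagComment('thanks, that was a helpful answer').then(console.log);
```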
Then the standard MNIST example; I really thought I should drop one MNIST example into a deep learning conference. You can do inference this way too: draw a digit and immediately see the output. And there are many product use cases. Airbnb has a prototype where you sketch out a UI design as you talk, with a very small neural network trained on about 150 components, and on the right you see the component design come up as a fully built, working React prototype. My rough sketches of what I think the UI should look like are prototyped in real time, because there is a component architecture and a small network trained on it. That kind of immediacy also lets you build very different products than what we have.

Neural style transfers are obviously far easier to execute now. There is a library called ml5.js, built in the spirit of p5.js, which is for creative coding. A lot of its standard examples of loading data and models and running them in the browser are even simpler than the TensorFlow.js ones, because they are designed for a very different audience, the creative coders. It gives you a really easy way to start building these products if you want.

Two more thoughts, things that haven't really been done yet. First, can I do data augmentation in the browser itself, and use that for training? But even more interestingly, can I collect data there? Data collection is a very hard problem we all face. You build a simpler, semi-supervised model, and as people interact, you give them feedback. I think the image hasn't come up, but this is how Quick, Draw! was collected: people drew interactively, the model gave them feedback on whether the drawing was recognized, and they were asked to provide their own input on it. That is what really lets you collect data. So can I run model inference not just for inference, but as a way of running semi-supervised labelling in the browser, letting people re-segment data as they go? That is a very important use case: data collection and data labelling are huge problems that this can help solve. In the workshop yesterday we did an idli/dosa/vada image classification exercise, and we had to go and collect 300 images ourselves and label them, just for a one-day workshop; real products face far bigger versions of that. Can we do the same for text, and run semi-supervised models as a way of getting information from the user?

I mention these two things because the far more exciting work in model inference is happening in the art domain. If you're interested in variational autoencoders and GANs for music and art and what people are creating, check out Magenta.js and ml5.js, where the people on the creative coding side are doing this.
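Going back to the labelling idea for a moment: the Quick, Draw!-style loop I described is roughly this. Every helper here (predictLabel, askUser, askForLabel) is a hypothetical stand-in for your model call and UI prompts.

```js
const labelled = [];  // examples the user has confirmed or corrected

async function onUserInput(example) {
  const guess = await predictLabel(example);             // hypothetical model call
  const ok = await askUser(`Is this a ${guess}?`);       // hypothetical yes/no prompt
  const label = ok ? guess : await askForLabel(example); // hypothetical correction UI
  labelled.push({example, label});

  // Periodically ship the user-verified labels back; the model's own guesses
  // bootstrap the labelling, which is the semi-supervised part.
  if (labelled.length % 50 === 0) {
    await fetch('/labels', {method: 'POST', body: JSON.stringify(labelled)});
  }
}
```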
There are two issues with running models in the browser like this. One is data privacy versus model privacy: when I emphasize data privacy, I am de-emphasizing model privacy. People may not be willing to send their model to the browser, and that can be a real problem for you. It helps with data privacy, but model privacy is not maintained, and there are no easy solutions for that. The other is that you have to think about making the model size smaller. We won't ship an ImageNet-scale model; we will run smaller models, and we may have to quantize them: for example, instead of shipping full 32-bit floats, quantize the weights down to 8 bits, which cuts the download to roughly a quarter, and the libraries come with tooling to do this.

It also lets me build applications rapidly if I want. If you are in the space of building web-based applications, you can deploy them as Electron apps, as mobile apps, and so on; if the model is part of the product you ship, it is easier to use these libraries to build on. This is especially important for low-latency requirements, where you want to capture the data and get the inference immediately; and because it can run even when there is no connection, you can save the inference and send it back later. So applications requiring low latency, or working with no connection, can still work, because everything runs in the browser.

I'm going to cover one last thing: how do we build this? We want a tool for rapid prototyping. I want to learn this really fast, experiment, and try new things easily. You can obviously build a UI on top; this was the original deeplearn.js model-builder UI, where you could add convolution layers, set some hyperparameters, add your own data, and train. It was just a demo, but you can build specific UIs like it if that suits your domain problem. For a lot of us, though, at least for me, code is expressive in a way that building UIs may not be. Can I get an immediate environment to write code in? I really want you to check out Observable, a reactive notebook in the browser, free to use. You can just go and start typing, and explore a lot of existing material: visualizations, explorables, maps. If you search for tensorflow.js you'll find plenty of examples there. It gives you full access to the entire JavaScript environment; you can load TensorFlow.js or any other library and run it. It is similar to Jupyter if you've used that, with one big difference: instead of linear execution it has reactive execution. It maintains the state of your data and updates every dependent cell whenever a change happens, which is really good for asynchronous programming. It lets you share those notebooks, build things, and really prototype; people have implemented new papers on Observable so that when something new comes out in deep learning, others can see it running and understand what is happening.
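To give a flavour of that reactive style, here is a sketch in Observable's cell syntax (not plain JavaScript; the cell names are my own illustration). The model cell re-runs automatically whenever the slider cell changes, with no re-run button involved.

```js
// Observable cell syntax: each named cell re-evaluates when its inputs change.
tf = require('@tensorflow/tfjs')

viewof lr = html`<input type="range" min="0.001" max="0.1" step="0.001" value="0.01">`

model = {
  // This cell depends on `lr`, so dragging the slider rebuilds the model.
  const m = tf.sequential();
  m.add(tf.layers.dense({units: 1, inputShape: [1]}));
  m.compile({optimizer: tf.train.sgd(Number(lr)), loss: 'meanSquaredError'});
  return m;
}
```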
I'll just quickly wrap up. Training on GPUs can be done if you integrate with Node, so you can get both benefits if you really want to. Obviously this is a very young ecosystem: some of the custom pieces you may have to build yourself, and feature parity with the Python and C++ APIs may not be there yet, but the ecosystem is really improving. Check out Vega and Vega-Lite, the easy way to do visualization; Observable, if you want the reactive runtime environment; and if you are sending a lot of data around, Apache Arrow, a columnar in-memory format that moves data very compactly. I'm really excited about all this because it fits my way of helping people learn, create, and build things. As the ecosystem gets better and the libraries stabilize, I'll have better use cases to convince you to start experimenting, and next year this would be an even better talk, with much more about the possibilities and the things people have built on it. Thank you so much. I'm at amitkaps.com, and I'm pretty much amitkaps everywhere, so if you want to ask questions afterwards you can reach me there or on Twitter. We have time for one or two questions now. Yeah, go ahead.

Audience: In the creative coding part, you said you could do art with images, and you also mentioned music in passing. I was really curious how that works.

Right, so check out Magenta.js. I didn't cover this because I'm already on a topic that may be a little outside people's interests, but they have a music library that integrates deep learning models like variational autoencoders and GANs to generate music. It is really focused on art and music. You should also look at ml5.js, which is focused on the creative coding community; you'll see a lot more interesting work there. I could have put fifty examples from that space, but given the audience I didn't really want to go too far into creative coding.

Audience: What are your thoughts on Distill.pub, and on Colab by Google?

Yeah, a few of the examples I showed were from Distill. I really buy their idea of research debt: the deep learning field especially moves really fast, new things keep coming, and unless we have an interactive environment for understanding research papers and what people have done, it is really hard. People put up code, even reproducible code, on GitHub, but just getting it working on my own machine is really not that easy. If something new comes out with some tweak, how do I really understand it and see whether it is effective? With Distill I have a simple example that runs very easily, and it is executable where I am; I don't need to install anything. That's great. Colab is great too; it's a little slow if you run GPUs, but it's free. For me the question is whether it stays available for everyone the way Gmail has; then it would be really interesting.
I don't know how long it will stay free. It does take care of your infrastructure for this, and it is genuinely collaborative; I really like it, but I don't really use it, because at this stage I find it very slow for just doing normal, non-GPU work.

Audience: Hey Amit, great content. Which audience is best served by these UIs? Say there's a data scientist: they might not need a UI just to tweak the hyperparameters and see the output, right? So who benefits most from them?

So, this is my bias from teaching: your job as a data scientist or data engineer is not just tweaking hyperparameters. Yes, that's part of it. But building the case, explaining to people how and why something is happening, matters, because those questions will ultimately come back to you; you need to give people a way to understand the models you've built. And these are not just UIs. Yes, the UI capability is enhanced because we're talking JavaScript, so we don't need to write the wrapper we'd write in Python or R to access things. But I think of it more as communication, which is an essential job for any data engineer, data scientist, or deep learning engineer: how do I communicate my results in a way people understand, by really allowing them to play with it in a simpler scenario? We haven't even begun to explore the search space of this possibility at this moment.