for tomorrow, but I'm standing in for Justin, who's out sick. It's my pleasure to introduce Ben Sighetti from Microsoft. Let me tell you a little bit about his background, which goes on for pages and pages, which is phenomenal: it runs from aerospace missions with NASA, to IoT devices and fitness trackers and data analytics, to solutions on Hadoop, and now Microsoft. He's going to be talking to us about the Microsoft Cognitive Toolkit.

Thank you very much — kind introduction. Can everyone hear me okay? I'd rather not use the mic, but I'll use it for the recording. So: the Microsoft Cognitive Toolkit. This is a relatively new deep learning framework that has been open source for over a year now. It came out of Microsoft Research, and it's already in production across a variety of Microsoft tools and services. Let me give you some background on how Microsoft is using deep learning today — it spans a whole portfolio of products. One thing I might touch on later is the cognitive services: pre-built deep learning models focused on computer vision, speech, and NLP, available as web services accessed via RESTful APIs. Everything there was built using what we now call the Cognitive Toolkit, formerly known as CNTK. Skype Translator has a lot of deep learning in the background. Cortana, the personal assistant — if you're familiar with Siri or Alexa, it's in the same vein — deep learning back there too. The Bing team uses deep learning extensively for search and relevance. And there's Microsoft's new augmented reality product — if you've seen it, you know what I'm talking about. All of this is coming out of the two Microsoft Research teams, one in Beijing and the other in Boston.
So I briefly mentioned the cognitive services — here are a couple of screenshots of apps that were made about a year ago. On the left is How-Old.net. If you remember, it went viral when it first came out. It's just a little app — I think the whole thing was about 50 lines of HTML and JavaScript — that tapped into one of the cognitive service APIs. It just passed along an image; the service would first detect faces, and from there predict each individual's gender and age. The second one, in the middle, is CaptionBot, again part of the cognitive services: you feed it an image and it responds with a sentence describing what it thinks it's seeing. The third is a prototype put together by one of our partners, Liebherr. They're a refrigerator manufacturer, and they started putting cameras inside their refrigerators to detect all the objects inside, with pretty good accuracy.

Speaking of image recognition: many of you might know the ImageNet data set, a massive data set of labeled images — cats, dogs, bicycles, chairs, that sort of thing. There's a competition that's been running for a while across a variety of academic groups and other organizations, all trying to classify what's in these images with as low an error rate as possible. In 2015, Microsoft hit the lowest error rate on record, under 4% — a big jump from where, as I said, Google had previously excelled. So that was the vision side. On the audio side, there's a speech recognition milestone as well: taking raw audio and transcribing it with an error rate of around 6%. That's practically where humans themselves start making mistakes, so we're matching human error rates in speech recognition.
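As a rough illustration of the RESTful pattern those apps use: the client POSTs an image to the service and parses the JSON that comes back. The response shape below is a placeholder I've invented for the sketch, not the service's real contract — check the cognitive services documentation for the actual endpoint and fields.

```python
import json

def summarize_faces(response_text):
    """Parse a (hypothetical) face-detection JSON response into readable lines."""
    faces = json.loads(response_text)  # assumed shape: list of {gender, age}
    return ["{} aged {}".format(f["gender"], f["age"]) for f in faces]

# Illustrative response body -- NOT captured from the real service:
sample = '[{"gender": "female", "age": 31}, {"gender": "male", "age": 4}]'
print(summarize_faces(sample))  # → ['female aged 31', 'male aged 4']
```

In the real app, the request itself is just an HTTP POST of the image bytes with an API key header; everything interesting happens server-side.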
And everything I've mentioned so far — all these use cases, research papers, competitions — was done with the Cognitive Toolkit, formerly called CNTK; you'll still see it as CNTK on GitHub. It's really easy to use — it's meant to be something where you can come in, define a few layers for your neural network, and not be bothered with the underlying infrastructure. It comes with a pretty rich library of components already built in: feedforward neural networks, convolutional networks, recurrent networks, long short-term memory (LSTM) models, deep structured semantic models, sequence-to-sequence models, and the list goes on. It's optimized for GPUs — you can still use it on CPUs if you like, but GPUs are the real intention. And not just a single GPU or one server's worth of GPUs: it scales across a farm of GPU-powered servers, with the parallelization done using the MPI protocol.

In terms of APIs, the library itself is C++ — the source code is C++. There's an internal language called BrainScript that you can use, but there are now APIs for C#/.NET, C++ of course, and Python; v2.0 provided the Python bindings. It's built for Windows and Linux — internally, a Microsoft team might use Windows for development, but for the actual big jobs they throw it on a big Linux cluster. It's been open source, with the developers actively working on it, since January 2016 — it's all on GitHub, so everything is available. And by the way, this is a tool where the internal and external versions are identical: we're all working off the same GitHub master branch. Okay, so a few things to point out — two important notes here. One: you can build an arbitrary neural network.
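To make the MPI point concrete, here is a single-process sketch — my own illustration, not CNTK code — of what data-parallel SGD does across a GPU farm: each worker computes a gradient on its own shard of the minibatch, the gradients are averaged (the all-reduce step MPI would perform), and every worker applies the same update.

```python
def gradient(w, shard):
    # gradient of mean squared error for a 1-D linear model y = w * x
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def data_parallel_sgd_step(w, shards, lr=0.01):
    grads = [gradient(w, s) for s in shards]  # one gradient per "worker"
    avg = sum(grads) / len(grads)             # all-reduce: average across workers
    return w - lr * avg                       # identical update everywhere

# toy data generated from y = 3x, split across two "workers"
shards = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0), (4.0, 12.0)]]
w = 0.0
for _ in range(200):
    w = data_parallel_sgd_step(w, shards)
print(round(w, 3))  # → 3.0
```

The point of the averaging step is that every worker ends each iteration with the same weights, so the farm behaves like one big machine seeing the whole minibatch.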
Okay, so you can define different layers of different types using this framework. And this is what's actually quite novel about it compared to some other frameworks like Torch, TensorFlow, or MXNet: you can say, I want a neural network where the first layer is dense, the second is convolutional, the third is recurrent, followed by six dense ones, and so on. You can stack them up as you wish, get really creative with the architecture you put together, and try all the various permutations to see which works best for you. There's a big gallery on the site — I'll share the link at the end — of the different architectures people have tried and been successful with, so you can take a look.

Two: it's production ready. Like I said, it's already used internally in a variety of Microsoft solutions, so it's tried and tested on the production side. And since it's C++, any model you build can go straight to production — there's a DLL you call from C++ and there you go, whether your task is classification or regression.

So it comes in three steps. There's a reader: depending on whether you're processing text, audio, or video, there's a set of different readers available for data ingestion. There's the network you build — the multi-layered architecture. As I mentioned, it's optimized for GPUs, but CPUs work just fine, so you can install it on your laptop and play around if you like. And then for the training part, there are various flavors of stochastic gradient descent — because we're talking about very large data sets, the training has to be stochastic.

So, about building arbitrary networks: here's a very simple example of a two-layer feedforward, or dense, network. X here is our input — that's our features.
We multiply by a weights matrix and then add a bias — that's our first layer, very straightforward. And again, it's just matrix multiplication, which is why GPUs come into play. That sigma there is the sigmoid function — for those of you who are familiar, it maps the real line into the unit interval; softmax is similar in function. And for the actual training, the criterion is cross-entropy, a pretty standard approach. So here's the actual code — this is in BrainScript; in Python it looks very, very similar, and I'll give you links to some tutorials with code. And that's it: a two-layer network, plus the loss function I defined, and I'm ready to go — we just defined our whole network. You can of course build these as long as you like, layer them on, bunch layers together and reuse those bunches — so you can go macro in terms of layers. In the Python API I can literally put in a for loop — say, for i in 1 to 6, give me dense layers — and go. Really, really straightforward.

In terms of what the model actually looks like: it's always built as a graph, and the graph is editable. It's executed after the graph is built, so you can hold on, edit the graph, and then execute — everything's editable. Here you see the weights — that's the matrix multiplication piece — and the bias coming in at every step; you can go and change those. And it supports recurrence: you can bring in the previous step, t minus 1, that sort of thing. It's all available here, pretty straightforward.

One thing that's really important here is automatic differentiation. For stochastic gradient descent, as you know, you need the gradient — there's a differential to compute — and here you don't have to go and define that differential yourself.
You just pass it your loss function and it will go and auto-differentiate for you. Yeah — big deal. And everything's cloneable: as I mentioned, you can make a block out of some bunch of layers, and if you like it and want to use it elsewhere as a sub-architecture of a bigger architecture, you just clone that piece and move it over. In terms of the different layers, here's something I screenshotted from the GitHub docs page — it goes through some of the layer types currently available: dense, convolution, max pooling, and it goes on and on. You can just play with these layers.

So, getting to something a little more fun — an actual little demo. We talked about a common computer vision task, classification, or more importantly object detection — the examples I gave earlier with the faces, or all the different products inside the refrigerator, would fall under this object detection category. So I'm going to play a little video. This was developed through work done by Microsoft Research in conjunction with a security firm, Alcatra, and together they built a variety of video detection tools. I hope the video plays here — let's see.

So the first one is facial recognition. There you go — this is facial recognition working even here: a mask is brought in and the person is still recognized, and as I mentioned, it also detects the sunglasses. Here — yes, it will actually identify you even under a mask. Again, this is a joint product with the security firm, so there are actual use cases out there, and all of this gets put into production. Here's a stocking mask — the person's wearing a stocking over his face — and it's still classifying all of this pretty correctly. And there's an alert mechanism for when a fellow comes in wearing a helmet — again, it actually recognizes the individual. This next one's pretty funny too.
I think it's funny because it's a really cool set here. So even with the mask, it's all detected — and it handles multiple people, multiple guests, at once. This one is vehicle tracking: here the vehicle types are classified, and there's a velocity and color prediction made as well, which is pretty neat. And by the way, everything is real time — everything we've seen here is meant to run in real time.

So, event detection. Here you see a somewhat busy on-ramp, and there's a fellow actually crossing, and that gets tagged as an event, as a warning, because that fellow is on foot in traffic. These people were dropped off — this is in Brazil, by the way. So they're about to cross the highway, and again that event gets tagged, even as the traffic dies down. Here's a similar one, and a couple more. And this is a real concern, a real risk — all these events get flagged. This one is in the city; let me fast-forward a little bit. Helmet detection — I'll skip through this, since you already saw something similar. Again, depending on the scenario, if someone comes in with a helmet, you want to tag that and send out a warning — in some settings a person shouldn't be walking around wearing a helmet, and there's no need for one here. Let me jump about halfway through; I just want to show one last piece, which is abandoned object detection. So: the still frame first, the controls are defined, and then someone comes in and just leaves an object. There's a timer on it, and after a few seconds it gets flagged as an abandoned object. So it's ideal for any sort of public transportation, any sort of public area, right?
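The abandoned-object logic just described boils down to a per-object timer: if a tracked object hasn't moved for longer than a threshold, flag it. A minimal sketch, assuming a hypothetical upstream detector that hands us object ids and positions per frame (none of this is the partner's actual code):

```python
ABANDON_AFTER_S = 5.0   # illustrative threshold, in seconds

def update(tracks, obj_id, position, t, tolerance=2.0):
    """Return True if obj_id should be flagged as abandoned at time t."""
    first_seen, last_pos = tracks.get(obj_id, (t, position))
    moved = abs(position[0] - last_pos[0]) + abs(position[1] - last_pos[1])
    if moved > tolerance:                    # it moved: restart the timer
        tracks[obj_id] = (t, position)
        return False
    tracks[obj_id] = (first_seen, last_pos)  # still in place: keep the timer
    return t - first_seen > ABANDON_AFTER_S

tracks = {}
update(tracks, "bag-1", (100, 200), t=0.0)               # first sighting
flagged = update(tracks, "bag-1", (101, 200), t=6.0)     # still there after 6 s
print(flagged)  # → True
```

The `tolerance` term matters in practice: detector positions jitter a little from frame to frame, so "hasn't moved" has to mean "hasn't moved by more than the jitter."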
If something's abandoned, you want to flag it immediately. Here's another similar one, which tries to differentiate: like the first scenario where humans were caught in traffic, it flags humans as alerts, but when cars come by, those just pass through without an alert.

Okay, let me continue. This whole thing was done, I think, in a matter of three or four months. These folks already had the data assembled — so the tough part, the data collection, was done — and they just started feeding it to a cluster of GPUs using CNTK and built out a variety of different models; each of those use cases had its own specific model.

Currently everything's on GitHub, including a variety of tutorials using the different APIs. There are even Jupyter notebooks available, so you can just spin up Jupyter — all the documentation's there — and walk through the tutorials. Great resources. There are also Azure Notebooks — again, Jupyter notebooks hosted on Microsoft's cloud, currently free — and there's a nice CNTK tutorial there as well.

But more importantly, for getting actual use of GPUs: on Azure right now we have GPU-powered VMs. These are NVIDIA GPUs, and the instances scale from one to two to four GPUs on the same VM. They can be used for anything, but in this case there's an image built by our data science team — used internally, then shared publicly with everyone else — which comes pre-installed and pre-configured with a variety of different machine learning tools, including deep learning frameworks. If you see the second paragraph there: MXNet, CNTK, TensorFlow, Keras. Everything is pre-configured to run on these GPUs, so you can just spin one of these things up. All the software's free.
And yeah — just get busy with your deep learning work. A few links: CNTK.ai is the website. As I mentioned, there's quite a gallery of architectures out there — what others have tried, where those architectures have been really powerful, what's really worked for them — so you can just browse those, pretty much clone one, and go forward. And everything else is on GitHub too: all the issues, a big wiki, and of course pull requests are very much welcome — so please get involved and feel free to contribute. And that's it, actually — that's my brief little talk on CNTK. Thank you. Questions?

[Audience question about additional language bindings.] R? That's been asked before — it's in the works, but I don't have a timeframe for you. Right now it's BrainScript, C#/.NET, C++, and Python.

[Audience question about the real-time demos.] The real-time part is actually pretty easy. Scoring is computationally much less expensive than the training bit, so once you have the model you've generated, you can run it on CPUs — you don't necessarily need GPUs for the prediction part of the operation. You could potentially run this on an edge device with a relatively weak processor.
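To close the loop on the two-layer example from the slides, here is a pure-Python sketch of the same network shape — a sigmoid hidden layer, a softmax output, and the cross-entropy criterion. This is my own illustration of the math, not CNTK or BrainScript code, and the layer sizes are made up.

```python
import math
import random

def sigmoid(v):                     # maps each real score into (0, 1)
    return [1.0 / (1.0 + math.exp(-x)) for x in v]

def softmax(v):                     # turns scores into probabilities
    m = max(v)                      # shift by the max for numerical stability
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def affine(W, b, x):                # one dense layer: W @ x + b
    return [sum(w * xj for w, xj in zip(row, x)) + bi
            for row, bi in zip(W, b)]

def forward(params, x):
    W1, b1, W2, b2 = params
    h = sigmoid(affine(W1, b1, x))      # first (hidden) layer
    return softmax(affine(W2, b2, h))   # second layer, then softmax

def cross_entropy(p, label):        # the training criterion from the slide
    return -math.log(p[label])

random.seed(0)
# made-up sizes: 2 inputs -> 3 hidden units -> 2 classes
W1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(3)]
W2 = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
params = (W1, [0.0] * 3, W2, [0.0] * 2)

p = forward(params, [0.5, -1.0])    # p is a distribution over the 2 classes
loss = cross_entropy(p, 0)          # loss for a sample whose true class is 0
```

In the real toolkit, of course, these layers are library primitives, the matrices live on the GPU, and the framework differentiates `cross_entropy` through `forward` automatically, which is exactly the auto-differentiation point made earlier.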