How many of you are coming to the meetup for the first time? Keep coming; we hold this meetup every month and we are quite regular with that. Let's move on to our first speaker. We will talk about machine learning. There will be primarily one talk, in a tutorial style, followed by a very small announcement from Dr. Mahadev. One thing which is exciting this time is that the speaker is going to ask some easy questions (I asked him to make the questions easy) and there is a small giveaway from O'Reilly as a thank-you for coming. So let's move over to Karthik and hear from him. Thank you. So hello, welcome. I'll be giving a talk on machine learning, using TensorFlow and mostly Python, and on how we can integrate all three to actually build something. How do we see whether we can actually do something useful with machine learning on the data we have? Is it actually appropriate, or are we throwing data at machine learning just because there is a hype? Those are the types of questions I would like to answer, because we can always read, try something, and come up with an answer. The most important thing I would like to do with this talk is discuss how we understand what our data can give us and how we map it onto machine learning problems. I actually wanted to show a video, but I think it might not be appropriate given that we are recording this, so I'll share the link; it's on YouTube, called The Next Rembrandt. It's a very beautiful video about how 3D printing and machine learning and deep learning were combined to make a brilliant output: they were able to make a painting. Let me not talk about it too much; it's better you watch it. [Video plays:] The Next Rembrandt brings back to life one of the greatest masters. Only this time data is the painter and technology, the brush. Together with experts from various fields, over 160,000 fragments from all of Rembrandt's 346 paintings were analyzed using 3D scans and digital files upscaled by a deep learning algorithm. Facial recognition software was designed to understand Rembrandt's style and generate new facial features, which were assembled based on his use of geometric proportions. Finally, using a height map to mimic Rembrandt's brush strokes, the painting was brought to life through an advanced 3D printer that printed 13 layers of paint-based ink. And so, 347 years after his death, a new Rembrandt painting made from zeros and ones emerged, unveiled and exhibited in Amsterdam. On NextRembrandt.com, people could dive deeper into the process of creating the digital painting. The launch video helped spark a global social conversation about where data and technology can take us. So this is very interesting, because from a data standpoint it has made quite a lot of impact. For me, I'm not a great art connoisseur. If I were to look at what the machine learning and deep learning technology created, I would have said maybe this is a Rembrandt. I might have compared it with previous paintings of Rembrandt
and still said it is most probably a Rembrandt. But real art connoisseurs might recognize that this is not something Rembrandt created. Still, as you saw, some of them said: well, if I had not seen Rembrandt's paintings, I would have said this is a Rembrandt. That is the level of sophistication, and that is how much a computer can extract and learn from previous data. So let me start with machine learning. The objectives: first, I would like to establish some basic understanding of machine learning so that we are all on the same page, and then apply some machine learning techniques to solve some interesting problems. For example, take the Titanic data set: will Rose survive, will Jack survive? We'll do some fun things like that and see whether the machine can actually predict it, to an extent, from the data we have in hand. For all of this I would like to use TensorFlow, because TensorFlow gives a very integrated setting for doing machine learning and inference. Next, I would also like to do a step-by-step tutorial on TensorFlow basics. I believe you may all have access to the GitHub repo; if not, I can share it in a few minutes, or half an hour later when I start the tutorial. The repo has all the tutorial code that I will be sharing today. We will go through it line by line; if you'd like to execute it, you can do so right here, and if not, you can try it at home. It's for you to play with and to get a hang of what TensorFlow has to offer. In the end, the idea is to get inspired and do something awesome. The beauty of open source is that we get to, as Google Scholar puts it, stand on the shoulders of giants, so maybe it's time to get inspired by open source software. Before we get into all this, a brief introduction about myself. I am Karthik. I work for Panasonic Research in Singapore, and I have been living here for the past seven years. I graduated from NTU with a PhD in computer vision and machine learning, so obviously computer vision and machine learning are my research interests. More recently I have gotten into algorithmic trading; it seems to be a very interesting area, and I would like to see where it takes me. And of course programming: most of my programming is in Python. The problem with Python is that it cannot easily be deployed onto hardware, so I have to move to C++; most of my prototyping is in Python and I end up with C++ for deployment. Here are some machine learning resources that I found very useful when I started out, and that I still find useful today. If you are brand new to machine learning, if this is the first time and you just want to know what machine learning is all about, I would tell you to take the Coursera machine learning course by Professor Andrew Ng. He is the chief scientist at Baidu right now, and he was a professor at Stanford.
I think he is still an adjunct professor at Stanford. If you already have machine learning knowledge and would like to go deeper into, say, deep learning, the university courses listed are pretty good. Otherwise, these are all straightforward lectures and notes, things that are very useful in the long run. For the agenda, I'll talk about the motivation, some basic concepts of machine learning, linear regression, logistic regression, of course deep learning, machine learning with TensorFlow, some demos, as well as some hands-on tutorials. That is the brief agenda for today. So why is all this important? Maybe five years ago, around 2010, people would have been skeptical about deep learning. The problem itself is quite old: the concept of neural networks and machine learning goes back to 1956, when computer vision and machine learning were proposed as a summer project, and in some ways it is still a "summer project". The point is that machine learning has now gotten into quite a lot of industries. It started out with image search; now it is in insurance, healthcare, investments and many other things. Google search itself is being changed from PageRank to something like a deep-learning-based inference engine. Quite a lot of platforms have been added recently: the earlier landscape chart was "machine intelligence 1.0" and this one is 2.0; it was refreshed in about one year, and the number of companies on it roughly doubled. That being said, what actually is machine learning? Machine learning, to put it crudely, is the ability to teach computers to learn from data. I throw in some data and the computer learns from it. But if I just hand it data, there is also a chance the computer simply memorizes the data. That is not our intention. Our intention is for the computer to generalize from the data it sees. It should build a model; it should not just store the data and look it up, which it could, because it has plenty of memory. It builds a model from the observations in the training data so that it can generalize, learn from the data, and infer something new. To put it in a more formal context: a computer program is said to learn from experience E with respect to some task T and performance measure P, if its performance at task T, as measured by P, improves with experience E. So if I have an experience, I have a task, and over time I am able to improve my performance, then I can say that I am learning. If you think about it, that is what we also do: we start in the first grade, and twelve years down the line we are graded with marks, measured again, and we see how far we have progressed. Machine learning does the same thing; this just gives a more formal context to what machine learning is. Moving on.
The most important part, or rather the part I will be biased towards at the start, is supervised learning, because it is the most common type of problem encountered in machine learning. Take a question like: how do I commute to work? I could learn this from the number of steps I walked from my house to the office: did I take the train, did I take the bus? I could use different types of data. If I use steps, then my feature is the number of steps. If I use the accelerometer, then my feature is the accelerometer reading. These are different kinds of features: the number of steps, or the gyroscope value itself, becomes a feature. Another example is predicting salary. "How do I commute to work" has a yes/no style answer: yes, I commuted by bus; no, I walked to work. Predicting salary is about the trend. If I took Singapore data, say I went to the Ministry of Manpower website and downloaded the median wage data from 2000 to 2016, I could ask what trend the industry is following and how much of a salary increase could be possible in 2017. That is the sort of prediction I could do; basically that is linear regression, while the first case is logistic regression, where I do a classification. Image labeling goes towards what is most widely talked about, convolutional neural networks, where we do even higher-level, more computationally intensive tasks and understand what is present in an image. That goes completely into hardcore machine learning and deep learning. So, coming to deep learning: now that we have spoken about machine learning, why is deep learning important? The problem with conventional machine learning was that computation was a bottleneck. It could not take in the amount of data we actually had; we generate a lot of data, and the conventional methods could not do as well as neural networks. But neural networks themselves were proposed in the 1980s, when hardware was very limited; the computational capability simply was not there. With GPUs, the breakthrough happened in 2012. Every year there is the ImageNet competition, where roughly one million images across a thousand categories are given; you build your classifier and upload your results to the ImageNet server. In 2012, the group from the University of Toronto, pioneers of deep learning, proposed AlexNet, which is where all of this started. AlexNet was a breakthrough showing that GPUs can do a convincingly good job.
The difference between the 2011 and 2012 results on that benchmark was enormous: the top error rate dropped from roughly 26 percent to roughly 16 percent. That sort of jump is seldom heard of in machine learning, where you normally talk about improvements of five or ten percent. People woke up. In 2012, when the results were released, people were very interested in why this had happened, and it was largely because of the amount of data AlexNet was able to crunch. One of the good things is that this approach can scale across multiple machines, and the heavy manual optimization is not really required. In conventional machine learning techniques you have to do a lot of tuning: there are too many hyperparameters, too many knobs to turn. Engineers who are just trying to put two pieces together may not know what those knobs are for, so they either give up or, worse, do the wrong thing and end up with poor results. That was one problem with conventional machine learning. Deep learning largely solved it; by now we again have plenty of hyperparameters, but at least we understand what is happening. Customization for new data is another big problem with classical machine learning: if I train on some data and then get new data, I have to start from scratch, including all the feature engineering. With deep learning we can avoid hand-crafted features, which is one of the biggest advantages. We do not care what the features are; the network learns that part for us, and we design models instead of designing new features. Earlier, one paper would say shape is a very good cue, so use shape. Then another paper would say viewpoint invariance is more important, another would say illumination is the real problem, and finally another would say that all of these together are the bigger issue and propose yet another method. That was the state of conventional machine learning until deep learning came along, where the models themselves take care of viewpoint invariance, rotation invariance and translation invariance; there are networks that can handle all sorts of data variation now. The improvement we have gained with deep learning is fascinating, to say the least. In a conventional deep network, we have an input on one side and a set of hidden nodes. The network learns because we train it: the input is a fixed set of data and the output is another fixed set. In our commuting example, the input is the number of steps and the output is binary, yes or no, did we take the bus.
In this case the network learns from the error: we randomly initialize it, we define an error function, and subsequently we learn a model that generalizes across the entire data set. The important point is that the same approach scales across many perception tasks. From pixels we can output "cat"; that is the typical ImageNet challenge, where I am given a million images in a thousand categories and have to categorize each one. From pixels we get a label. From audio we can get a transcript such as "restaurants in Singapore". From text we get machine translation, English to French or vice versa. And from pixels we can now go to a whole sentence. We collect images, ask people to label them with sentences, and learn from that: now that we have both a text-based understanding and an image-based understanding, the obvious next step is to put the two together, a CNN plus an RNN, and go from pixels to a sentence. If we give the network an image, we would like it to come back and say, "hey, it's a cake with a slice cut out." We are already there, by the way: we can already do near real-time sentence generation from an image. That is called image captioning, and it already exists; the progress is very, very fast right now. So we have introduced machine learning and we know what deep learning can do; let's move on to TensorFlow. Why is TensorFlow important? It was released only in November 2015, not even a year ago, and it is released under the Apache 2.0 license. If you work in industry that becomes very useful: nobody is going to come back and tell you there is a patent, and nobody is going to sue you over it. Next, it has bindings for C++ and Python, which is good for people like us. TensorFlow is also right now the number one machine learning repository on GitHub. As of today it is at 31,730 stars; as of July it was around 28,000, so that is a few thousand more stars in just two or three months. Over a span of less than one year it has become number one. That is the level of contributions, the pull that Google exerts, and of course the capability of TensorFlow itself is good enough that people pick it up and want to use it. We will see why. Another advantage: within a span of about four months there was a distributed TensorFlow release. The first version was the 0.7 release, in November 2015.
In March it was 0.8, and the 0.8 release supported training on distributed machines. That means that if I had 20 machines with four GPUs each, I could leverage them together; earlier, we used to take a different task, put it on a different machine, train separately and then maybe combine the results. Now, within about six months of the initial release, 0.8 let us do proper distributed training and data-parallel applications. As of yesterday, I think, it is the 0.10 release, so it is being improved quite a lot. One of the beautiful things is that it can be deployed natively on Android, and as of the 0.9 release on iOS as well, which means our iPhones and Android phones can run TensorFlow models natively. Of course you need to install something called Bazel, Google's own build system, which is what you use to build and deploy the models for your iOS or Android device. The TensorFlow 0.10 release was announced, I think, about two days back; if you go to the TensorFlow Google Group you will see the contributor announcing it. Okay, so coming to the important question: why TensorFlow? It covers quite a broad family of machine learning techniques and has a well-engineered software architecture. The serving architecture is very useful for production-level environments. The logging system and the interactive log visualizer are quite useful; we will see them in the tutorials. And it prominently exposes its kernels and math operations, which means that if I would like to add a new math operation that is not present in TensorFlow, I can do that. There are also quite a lot of higher-level variants. The most popular is TF-Slim; we will look at TF Learn; and there is the Keras variant, which lets you use Theano and TensorFlow interchangeably behind a single Keras front end, which is very cool. TF-Slim was a separate project that was later brought into TensorFlow, and the same thing happened with SkFlow; these were different projects and Google decided to integrate them as contrib packages. So there are a lot of variants. If you are happy with that sort of very high-level code, you can go ahead with it, but if you would like to write raw Python TensorFlow code, that is also possible. This is the basic architecture of how TensorFlow works. Like I said earlier, there is an input, and in TensorFlow we build what is called a computation dataflow graph: a directed graph where the operations sit on the nodes and the data flows along the edges.
As I said, it natively supports deep learning models. So what is a dataflow graph? Suppose we would like to define the equation (A + B) + (A × B), a sum plus a product, with two inputs, 10 and 20, as A and B. As a graph, we represent the multiply as one node and the add as another, which makes it very easy to lay out; in the end we add those two intermediate results and get the output, 230. This is a very simple way of explaining what TensorFlow does. Of course, here we are talking about two scalars, and the output is just another scalar, but the same thing happens at the tensor level, with large matrices. The computation here is trivial, an add, a multiply and another add, whereas real neural networks have many hidden layers and thousands of hidden neurons; still, this is a simple way to understand how computations happen in TensorFlow. At the base we have a CPU and a GPU. The advantage of TensorFlow is that it is device agnostic, which means you can run computations on different devices and then plug the outputs together and combine the results. I think there is an example in the tutorial that does exactly this: it uses the CPU as well as the GPU, so if you have, say, a non-maximum suppression step, you can run it on the CPU, run a CNN on the GPU, and then combine the two results into the final output. The front end supports Python and C++. Okay, so let's start with the tutorial itself. Before I talk about constants and variables in the abstract, I will just show an example of how they work. In this case I have imported TensorFlow, and if you have the code you will see there is a "basic ops" file and a "basic graph" file. Let's start with the basic ops, because they are very interesting and you will quickly get a feel for what TensorFlow is doing. The first thing TensorFlow gives us is constants. As the name implies, these never change. If A and B are the constants 23 and 11, then we can write C = tf.add(A, B). What is happening here? Let me print C. You will see TensorFlow come back and say that C is an Add node, with an empty shape, because it is a scalar, and a dtype of int32. What has happened is that TensorFlow has built the computation graph, but it never executes the graph at this point. TensorFlow uses something called a session: unless you create a session and run the graph in that session, none of your values are computed. So for now C is just a node of type Add, with scalar shape and an integer dtype, because you initialized the constants with integer values.
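As a concrete illustration, here is a minimal sketch of that (A + B) + (A × B) graph, assuming the 0.x-era TensorFlow API used in this talk (where the multiply op is tf.mul; later releases renamed it tf.multiply):

```python
import tensorflow as tf

# Two scalar constants; the dtype is inferred as int32 from the Python ints.
a = tf.constant(10)
b = tf.constant(20)

# Build the graph for (a + b) + (a * b); nothing is computed yet.
added = tf.add(a, b)        # will be 30 once the graph is run
multiplied = tf.mul(a, b)   # will be 200 once the graph is run
result = tf.add(added, multiplied)

# Printing a node shows the op, shape and dtype, not the value 230.
print(result)   # Tensor("Add_1:0", shape=(), dtype=int32)
```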
That is a very simple example. So how do we actually evaluate C? Let's try a tf.Session. What TensorFlow has is a session: you create a session and then use it to run the graph and get the value. In this case, instead of a plain tf.Session, let me use an interactive session, which lets me evaluate things at the prompt. With an interactive session I can simply say C.eval() and I get 34. What TensorFlow does at the back end is: you gave me an A with a value of 23 and a B with a value of 11, and I know C is an add operation, so I will add the two. All operations are done on tensors. Of course, you can also write C = A + B; that is operator overloading, and it works, but in large production-level code TensorFlow then has to go back, see that something was overloaded, and work out that it is an addition. That overhead is unnecessary for large deployments, so when you know the flow, say tf.add explicitly. You can do the same for all the operations: add, subtract, multiply, they are all present in TensorFlow. Moving on, another point, typical of Python: what happens if I keep the integer constants 23 and 11 and divide them? What do you think the answer for C would be? Two? Any other answers? Of course I am going to do a C.eval(). Let me execute it. Yes, you are right: typical of Python, because the constants are integers, the result is 2. If I want a floating-point result I need to cast, for example with tf.to_float, and if I evaluate now I get 2.090909.... Another way to do the same thing, again very similar to plain Python, is to initialize the constants as floating-point values in the first place; evaluating C then gives 2.0909 as well. The way TensorFlow connects with Python is that it calls NumPy under the hood, and there are C++ bindings as well, which is why you initialize with tf.constant and an explicit type where you can.
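Putting the session and evaluation steps just discussed into one small sketch (again assuming the 0.x-era API; tf.to_float is shown here as one way to cast, since the exact cast used in the demo is not clear from the recording):

```python
import tensorflow as tf

a = tf.constant(23)
b = tf.constant(11)
c = tf.add(a, b)

# A session is what actually executes the graph.
sess = tf.InteractiveSession()
print(c.eval())   # 34

# Integer constants give integer division.
d = tf.div(a, b)
print(d.eval())   # 2

# Cast to float for a floating-point result.
e = tf.div(tf.to_float(a), tf.to_float(b))
print(e.eval())   # 2.0909090...

sess.close()
```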
When you do that, TensorFlow at the back end initializes an int32, int64, float32 and so on based on the value you pass, and that is part of the reason TensorFlow is fast; over time it has become one of the fastest machine learning libraries out there. Of course, since we are in Python, I can also take a NumPy array and pass it straight into a TensorFlow call. The problem is that TensorFlow then has to go back and figure out what the type of the data is, and you lose a little more computation. If you know the type and the size of your data, it is always good to initialize it explicitly. Otherwise, if you are just having fun with TensorFlow, go ahead with NumPy and do whatever you want. Other things we can do: a less than b, a to the power b; the common mathematical operations are all there. Now let's talk about tensors. What is a tensor? A tensor is basically a vector or a matrix: a 2D matrix, a 3D matrix, or in general an n-dimensional array. In TensorFlow all of these are called tensors: a 1D tensor, a 2D tensor, and so on. Let's initialize a tf.constant as a tensor, very much like NumPy, say [[1, 2], [3, 4]]. The next thing you will have noticed is the name argument; let me give this tensor a name. The reason we use names is that with a large network in training it would be too difficult to go back and figure out where a particular node came from; you would just see anonymous nodes and not know what each one does. I will come back to this when I run through TensorBoard, and you will see why it plays a big part. So this initializes a 2D tensor; now let me initialize a B tensor as well. There are other examples in the code base that you can download and try out. Now, oops, what is the problem here? Can someone say? Right, okay, fixed. We initialize A, we initialize B, and when we want matrix multiplication we use tf.matmul, and typically we also give it a name. The name gives context to what we are doing: we are multiplying A and B, so call it something like matmul of A and B. If you look at the computation graph you will see two nodes, A and B, holding the matrices, and a node that is the matrix multiplication. If I do a C.eval(), I get 13, 16 and 29, 36.
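A minimal sketch of the named matrix multiplication just shown; B's values are not spelled out in the recording, so here they are chosen so that the product matches the result quoted above:

```python
import tensorflow as tf

# 2x2 constant tensors; names make the nodes identifiable in TensorBoard.
A = tf.constant([[1, 2], [3, 4]], name="A")
B = tf.constant([[3, 4], [5, 6]], name="B")   # assumed values, consistent with the printed result

C = tf.matmul(A, B, name="matmul_A_B")

with tf.Session() as sess:
    print(sess.run(C))   # [[13 16]
                         #  [29 36]]
```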
That is a two-by-two matrix multiplication. These are some of the TensorFlow basics we can play around with. Moving on: constants, as we just saw, are in-memory buffers that are initialized once and obviously never change. Variables are different. Why would we want a variable? Take a simple case like a counter: we want to count how many times something happens and update a value as we go. The only catch is that all variables have to be initialized; you cannot run a graph without initializing its variables. Every time I have a variable, I assign its state to something, and I have to run an initialization op; the best practice is to add one operation that initializes all the variables in one shot. Variables also allow us to save and restore data: if we have trained a model, we can store the variables at some point and restore the model later. So if we are training on a large data set and, say, the power runs out, we can store the variables locally and restore them later. That is the advantage variables provide. They also maintain state across runs of the graph, so if we have multiple graphs or runs we can keep state across each of them. The third concept is placeholders. If you work with CNNs you will immediately be using placeholders. The reason is that at run time the images we train on come in batches: I will not know all the data up front; there will be shuffling and there will be batches. With a placeholder I only declare the shape of the data I expect, say the size of an image, and the actual values are supplied at run time. Here I am declaring only small placeholders, but in heavy workloads you declare a huge tensor and its contents change on every run. The way you supply the values is through a feed dictionary passed to session.run along with the output you want to compute: the feed_dict says, for example, that input number one is fed with 7.0 and input number two with 2. You do not have to initialize anything ahead of time. It has its own cost because it happens at run time, but TensorFlow still knows the type of the data behind it, so it is good to go. Placeholders are probably the most widely used of these with CNNs, so it is good to know what variables, constants and placeholders each do; a minimal sketch putting them together follows.
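Here is a small sketch that puts a variable used as a counter and placeholders fed at run time together, assuming the 0.x-era API (tf.initialize_all_variables and tf.mul were renamed in later releases):

```python
import tensorflow as tf

# A variable we update across runs, like a simple counter.
state = tf.Variable(0, name="counter")
increment = tf.assign(state, state + 1)

# Placeholders are filled at run time through feed_dict.
input1 = tf.placeholder(tf.float32)
input2 = tf.placeholder(tf.float32)
output = tf.mul(input1, input2)

init = tf.initialize_all_variables()   # variables must be initialized before use

with tf.Session() as sess:
    sess.run(init)
    for _ in range(3):
        print(sess.run(increment))                                  # 1, 2, 3
    print(sess.run(output, feed_dict={input1: 7.0, input2: 2.0}))   # 14.0
```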
Now, coming to one of the most important things TensorFlow can do. TensorFlow gives us, like I said earlier, a computation graph, and if I have a large neural network I can go into each part of that network and visualize it. Remember the name argument from earlier? The names show up here: "conv1" or "conv2" in this graph is a name. If I had not given a node a name, I would just see an anonymous box, I would not know what it is doing, and I would have to go back to the code base to find out, which is messy. With names, I can easily debug the network and see where the bottlenecks are. TensorBoard lets us see how the data flows and where there is a problem; you can do a lot with it, and it is one of the most powerful things TensorFlow offers. Nvidia has its DIGITS interface. What DIGITS does is run Caffe in the back end, with a beautiful interface in front that takes your training data and converts it into a model; it only gives you a nice front end. It is like a shopping mall that is beautiful outside but is calling the same old, dilapidated shops in the back. That was my main problem with DIGITS; they have improved it quite a lot now, but the beauty of TensorBoard is that you can do a lot more debugging than with the natively available DIGITS interface. Someone asks: does TensorFlow also support profiling? Yes. Okay, let me just show you. In the examples there is a TensorFlow basic-graph script; if we run that, we can go through the code. One more thing: the good thing with Windows 10 is that it natively supports Ubuntu, so people like me do not have to dual-boot Ubuntu anymore. I can run Ubuntu natively; IPython, Jupyter, everything comes right out of the box. In the Windows 10 Anniversary Edition you just turn on developer mode, go into Windows Features, and install Bash on Ubuntu on Windows. It is a very good thing Microsoft has done; I can run bash and all my shell scripts right out of my Windows box without installing VirtualBox or anything else on top. Okay, so let me go back and run the basic-graph example with Python. What this does is create a graph directory; if you look inside it, there will be a checkpoint/event file there. It looks a bit weird, but let me show you what it actually does by launching TensorBoard. If you put it on your path, you do not have to do so many things;
you just need to add the TensorFlow tools directory to your path, and then you point TensorBoard at the log directory with --logdir. Is it logdir? Yes, that is correct, thank you. At first it looks pretty plain, but if you go into the Graphs tab you can see there is a graph here. Let me show you the code itself. The code is doing very few things. First it initializes two constants, then it adds and multiplies them; this is the example we saw in the first few slides. It then creates a session and runs it: it adds A and B, multiplies A and B, and then adds those two outputs. The beautiful thing is that you can see what is happening without even knowing the code behind it. If I zoom in, the name scopes in_A and in_B are the two inputs; I am adding A and B and multiplying A and B, and it tells me these are scalars, because it knows the data type; finally I am adding C and D, and that is the output. This is a very simple example. When you actually train, there are events here that tell you how long training took, what the training error is, what the accuracy is, where the gradients went and what kind of gradients you have; all of that is available when you train, say, CNNs or RNNs. If you have image data you will see images here too. Given that this is a very simple example there is not much, but you can see the data distribution and the histograms; there is quite a lot you can do with TensorBoard, and it is fun. I will show a small code sketch of this in a moment. Now, if you were trying to build an image classifier, Kaggle is of course a very good place to start, and there was this nice story about a dogs-versus-cats classifier. An enterprising team took an existing ImageNet model, retrained it for cats and dogs in less than ten minutes, uploaded it to Kaggle and won the competition, while other people took ages to train from scratch. The important thing to note is that this is what we call transfer learning: you train a network on a large amount of data and then reuse it on new data. I have, say, one million images in a thousand categories; I can take that network and retrain it on only two categories, dog versus cat in this example. I am only using the existing weights to initialize my network, and then using that pre-initialized network to learn the new data.
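To recap the TensorBoard part as code, here is a minimal sketch that builds the same small graph with name scopes and writes it out for visualization; it assumes the 0.x-era tf.train.SummaryWriter (later releases use tf.summary.FileWriter), and the log directory name is just a placeholder:

```python
import tensorflow as tf

# Name scopes group nodes so the TensorBoard graph stays readable.
with tf.name_scope("in_A"):
    a = tf.constant(10, name="a")
with tf.name_scope("in_B"):
    b = tf.constant(20, name="b")

c = tf.add(a, b, name="add_ab")
d = tf.mul(a, b, name="mul_ab")
e = tf.add(c, d, name="output")

with tf.Session() as sess:
    # Write the graph definition to a log directory for TensorBoard.
    writer = tf.train.SummaryWriter("./graph_dir", sess.graph)
    print(sess.run(e))   # 230
    writer.close()

# Then, from a shell:  tensorboard --logdir=./graph_dir
```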
Coming back to transfer learning: the network does not have to do much work from scratch, which is why it took barely fifteen minutes to train. The point is that transfer learning is a very useful technique. We in industry use it quite a lot because it gives results that are almost uncanny: you can reuse what was learned from data that is not even related to your problem and still get quite phenomenal results. Suppose tomorrow you want to go home and build your own image search engine; you can start with transfer learning. TensorFlow gives you the latest Inception v3 model, trained on roughly 1.2 million images, and its performance on that benchmark is better than human performance: the error is close to 3.46 percent, while human beings make about 5.1 percent. The robots have taken over, most probably. But they have made the model open source, and they have given the network definition as well; it is called Inception. You can take those two and train your own image classifier tomorrow. If you go to the TensorFlow tutorials page, they show you how to do this: they give you the model and the network, and the only thing you have to do is follow the pages and retrain on your own data. So it is called transfer learning and it is available in TensorFlow; of course it is possible with other machine learning frameworks too. The point is that Google's latest image classifier is among the top one or two right now, and you can use that model to retrain on your own data. I also spoke about TF Learn, which is TensorFlow Learn. It came from people asking for an interface similar to SkFlow, scikit-learn-style, typical of Python. What the package developers did is this: you just import learn, and then you say DNNClassifier with the number of classes equal to two and the hidden units listed, and that is all. You have defined a three-layer network with, say, 1,000, 512 and 256 hidden neurons, and this is deep learning in two lines of code: a two-class network with three hidden layers that trains for a thousand steps. Straightforward. There is an example using TF Learn in the repo as well, so let me quickly run through it. In this example, let's see whether Kate Winslet's and Jack's characters would survive the Titanic, or rather what survival probability they get, given the data gathered about the people on the ship. Let me quickly show you the data. The Titanic data set is another Kaggle-style data set. Its columns are: survived, the passenger class, the name, sex, age, SibSp, Parch, ticket and fare.
We do not care about the name and most of the other columns. The things we care about are the sex, the class they travelled in, the fare and so on, and we would like to see whether we can train a classifier with just that data and then predict what survival probability, say, Jack and Rose would have. So let's go through it. There is another script I put up in the repo, called TF Learn Titanic. The most important part is here; most of the data is pre-processed first. We load the CSV, a standard CSV load, and then pre-process the data. We change all the females to 0 and males to 1; it does not matter, you can switch the encoding either way. Then we drop the columns we do not need, column number one and number six, which are the name and, let me count, one, two, three, four, five, six, the ticket number. Yes, ticket number. You simply ignore the data that is not useful, and the ticket number is not useful here. Then we take this data and train fully connected layers. The neural networks I showed you earlier were fully connected layers, which means every input is connected to every neuron in the next hidden layer: if I have 10 inputs, all the hidden neurons are connected to all 10 inputs. In this case the first hidden layer has 32 neurons, the next has 64, and the final layer has only 2, because I want to know whether they survive or not. Let's do a quick training run. You will see that TFLearn gives you a run ID; that ID is again something like what we saw earlier. If you want to go back and see what it did, you can look inside /tmp/tflearn_logs, use that checkpoint, and view the network in TensorBoard. These are the different training steps. In this case we have trained for, I think, 10 epochs, and the batch size is 16. Let me put that in simple terms: if I have 100 data points and my batch size is 10, then 10 iterations make up one epoch. I take small batches and move across the data, and I have gone through one epoch only when I have seen all the data. Here the batch size is 16. The reason for keeping smaller batch sizes is that you do not want too much computation per step; it is also important not to make it too small, because then each step sees too little data and there is not much learning. So there is some trial and error. Some of the people I work with keep saying that machine learning is part black magic, part mathematics, and you do have to learn those tricks. A batch size around 16 or 32 tends to work well;
these are numbers that typically work; there is nothing magic about them beyond that. In this case we are running 10 epochs, which means I see the data 10 times, and once I have seen it 10 times, I stop. That is the kind of fit I am doing here. So that is more or less it: I have built a deep neural network. The input data layer here is a placeholder; if you remember the slide about placeholders, this is one. I do not know how much data is going to come in, so I initialize the shape as [None, 6]: I know there will be six columns, but I do not know how many rows. It simply says, some data is going to come, it will have six columns, so set up an input of that type. Since the batch size is 16, the network takes a 16-by-6 block on every iteration, keeps updating its weights, and that is how it learns. Every time there is an update you see epoch one, two, three, four, five, the training step, and a loss. The important thing in machine learning is that you should always monitor the loss; that is the most important signal. You do not care only about accuracy, because you can always get 99 percent accuracy; that is not the target. Always monitor the loss, and make sure of the basics: you should have a training set and a validation set, and you should keep the test set completely aside and not touch it. There is a lot written about how to train a network properly, and these things matter when you actually do it. Any questions? Someone asks: can we do cross-validation to get the values of all the parameters? Typically, when you finish one epoch, you shuffle the data for the next epoch. Cross-validation, in some sense, means taking one fold out, training on the rest and rotating, so that sort of effect is partly captured here, but it is not the same as a proper validation set: you keep the validation set aside, and these epochs run on the training set only. So yes, you could do that. And those numbers are not really black-magic numbers. The 16, the reason I called it black magic, is that you would not know why you cannot set 100 or 200 instead. When you increase the batch size you decrease the number of computation steps, but there is a chance you look at too much of the data at once, or the network sees the data in a biased way. That is why these numbers matter.
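For reference, here is a sketch of the TFLearn Titanic example just walked through. It follows the standard TFLearn quickstart that the repo example appears to be based on; the "Jack" and "Rose" feature rows at the end are illustrative made-up values:

```python
import numpy as np
import tflearn
from tflearn.datasets import titanic
from tflearn.data_utils import load_csv

# Download and load the Titanic data set; column 0 ("survived") is the label.
titanic.download_dataset('titanic_dataset.csv')
data, labels = load_csv('titanic_dataset.csv', target_column=0,
                        categorical_labels=True, n_classes=2)

def preprocess(rows, columns_to_ignore):
    # Drop unused columns (name, ticket) and encode sex as 0/1.
    for col in sorted(columns_to_ignore, reverse=True):
        [row.pop(col) for row in rows]
    for row in rows:
        row[1] = 1. if row[1] == 'female' else 0.
    return np.array(rows, dtype=np.float32)

data = preprocess(data, [1, 6])   # ignore the name and ticket columns

# Six inputs -> 32 -> 64 -> 2 classes, all fully connected.
net = tflearn.input_data(shape=[None, 6])   # placeholder: number of rows unknown
net = tflearn.fully_connected(net, 32)
net = tflearn.fully_connected(net, 64)
net = tflearn.fully_connected(net, 2, activation='softmax')
net = tflearn.regression(net)

model = tflearn.DNN(net)
model.fit(data, labels, n_epoch=10, batch_size=16, show_metric=True)

# Hypothetical passengers resembling Jack and Rose from the film.
jack = [3, 0., 19, 0, 0, 5.0]      # 3rd class, male, age 19, cheap fare
rose = [1, 1., 17, 1, 2, 100.0]    # 1st class, female, age 17, expensive fare
print(model.predict([jack, rose])) # survival probabilities
```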
Okay, moving on; I think we are nearly out of time. These are some examples of the image classification task I was talking about. If you would like to go deeper into this, Andrej Karpathy, a Stanford graduate, writes brilliant articles about convolutional and recurrent neural networks on his blog; you can go through his posts and understand most of it, and he writes very engagingly. These are the type of tasks we typically do. This sample has about 10 classes, but the real data set has millions of images, and you try to learn and generalize over that structure. You can see there are so many variations, and that is exactly where the classifier gets its generalizability. As I said earlier, ImageNet has about 1.2 million training images across 1000 categories, and the best models now achieve better than human performance, around 3.46 percent error, and you can download that model as well. So what if we want to scale up? TensorFlow scales with the number of machines. There is a paper, from April 2016, showing that TensorFlow can scale almost linearly with the number of machines you have, which means TensorFlow now supports distributed training. There is another example in the code that shows this, using something called a device: I can say which device a computation has to run on. Let me show that quickly. In this file there are some straightforward examples we already spoke about: interactive sessions, where you can do a c.eval() and evaluate values as you go, and the usual session block. The most important part about devices is down below. In this case I have two variables, I create a session, and I use a with-block, just like what we do in Python when we write to a file: with tf.Session() as sess, so that is the session variable, and then the second level of with-block says which device it is going to run on. If I have four CPUs on the machine, I can run four different computations simultaneously. That is the level of fine-grained control: you can say which device a computation is allocated to, and you can run computations on different GPUs without worrying about allocation or about who reclaims the memory; all of that is taken care of by TensorFlow. I can create another session and put a convolutional network on a GPU over there; if I have four GPUs I can spread work across them, use data parallelism, create multiple copies of the weight variables and gather the parameters back. You can do all sorts of things with this block: I use this session, I use this device, and on this device I run the session. That is what is happening here. Another important thing: if I have distributed machines, then I will be using gRPC calls.
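A rough sketch of the device placement idea just described, with the gRPC session target shown as a comment. The GPU line assumes a GPU is present (soft placement lets it fall back to the CPU otherwise), and the IP address is only a placeholder:

```python
import tensorflow as tf

a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.constant([[5.0, 6.0], [7.0, 8.0]])

# Pin individual ops to devices; TensorFlow handles memory and data transfer.
with tf.device("/cpu:0"):
    c = tf.matmul(a, b)
with tf.device("/gpu:0"):            # only meaningful if a GPU is available
    d = tf.matmul(a, b)

config = tf.ConfigProto(allow_soft_placement=True,   # fall back if a device is missing
                        log_device_placement=True)   # log where each op runs
with tf.Session(config=config) as sess:
    print(sess.run([c, d]))

# On a distributed cluster, the session target is the gRPC address of a worker:
# with tf.Session("grpc://192.168.1.10:2222") as sess:
#     sess.run(...)
```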
With distributed machines, I'll have one master machine that knows which workers it has, and you initialize each of them with an IP address; you make a gRPC call and then do essentially the same thing, just with the IP addresses in place. Once the distributed system is initialized, you run the same code on a distributed cluster. That's how easy it is with TensorFlow.

How do you distribute the data in that case? There are two ways. With data parallelism you have copies of the data across machines, so all the machines have access to it. If the data isn't available locally, TensorFlow will transfer it at run time, which adds unnecessary overhead. So you typically copy the data out beforehand and keep it there, and when there's a call, each machine reads from its own copy. Will it do that for you? Yes, it will do that for you.

What if the different workers in the distributed network have different capabilities? For example, one could be a mobile device and one a powerful machine. Is it smart enough to figure that out, or do we have to? What it will do is wait for the results. You saw that there's a computation graph, and the graph is agnostic about the device it runs on; it's only looking for the final output, and when the graph finishes computing it gives back the results. So if I run one computation on a mobile device and another on a server, I'm going to get the server's result back faster. The point is that I could deliberately throw more data at the server and less at the mobile device, but that decision sits outside TensorFlow. The software does account for latency when updating the parameters: it knows updates from a slow device will arrive late and handles that. So is it constantly rebalancing depending on the device? Not quite. And of course, you don't normally run training on a mobile device. If there is latency, it simply waits for that device to finish computing. If I'm doing a gradient computation, it has to wait for the gradients to come back, so it will wait for that computation to complete. It won't notice that a powerful server is sitting idle, stop the Android device, and say, hey, there's a faster resource around, let me use that. You can profile it yourself and realize the Android device is the bottleneck, but TensorFlow won't do that on its own. Sorry, let me restate the question properly: what happens if the devices don't have the same compute capability? The answer is that TensorFlow will report that they have different capabilities, but it's not going to do automatic scaling, and it's not going to say, oh, you have a slower device, let me take care of it for you.
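To make the master/worker and gRPC idea concrete, here is a minimal sketch of the TensorFlow 1.x-style distributed API. The hostnames, ports, job names, and task indices are made up; in practice each machine runs the same kind of script with its own role, and the session only works once the real servers are up, so treat this as a shape of the code rather than something to run as-is.

```python
import tensorflow as tf  # TensorFlow 1.x-style distributed API; hosts are assumptions

# Describe the cluster: one parameter server and two workers.
cluster = tf.train.ClusterSpec({
    "ps":     ["192.168.1.10:2222"],
    "worker": ["192.168.1.11:2222", "192.168.1.12:2222"],
})

# Each machine starts a server for its own role/index (here: worker 0).
server = tf.train.Server(cluster, job_name="worker", task_index=0)

# Pin variables to the parameter server and computation to this worker.
with tf.device("/job:ps/task:0"):
    w = tf.Variable(tf.zeros([6, 1]))
with tf.device("/job:worker/task:0"):
    x = tf.placeholder(tf.float32, shape=[None, 6])
    y = tf.matmul(x, w)

# A session pointed at a gRPC target runs the same graph on the cluster.
with tf.Session("grpc://192.168.1.11:2222") as sess:
    sess.run(tf.global_variables_initializer())
```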
So no, that's not going to happen automatically. Is there a tool on top of TensorFlow which distributes it accordingly, like Google's machine learning service? Google Cloud Machine Learning is a different thing: TensorFlow is something you install on your own local servers, while Google's machine learning offering runs on the cloud and handles the distribution seamlessly, so you don't have to worry about how it trains. That's a separate topic. But if I have my own network of machines, can I put a library on top which does this for me? You mean the distribution? The one that does the optimization for me. Ah, the optimization, that is interesting. I don't think so. And it's a fair question, because you could run this on Amazon or on Google. You could probably add another layer of optimization that understands what each machine is capable of, but I don't think TensorFlow does that by default.

I'm new to TensorFlow, so would it be fair to say that TensorFlow is like an application server for machine learning models? Because it looks like I could use those operations you were showing to do, say, a typical logistic regression calculation. Yes, yes. So it's like an application server, and on top of that there's a deep neural network library you can use. Your question is whether it's a conventional machine learning system or a deep learning system? No, I mean it looks like two different things: one is the execution engine in TensorFlow, which is like a more advanced Hadoop, and then there's a library on top for certain models. If you look at it, it's really an integration of both, which is why it's so useful to so many people. I'm coming to TensorFlow Serving in a moment, which is a production-level serving system; in TensorFlow Serving there's a concept called a servable. What you're describing is basically a web app or an engine that takes a trained model and produces output: I give it an image as input and it gives me back a prediction, executed on the spot. But what I was showing is that, before that, you can use this distributed system underneath to train pretty much any machine learning model. Yes. Loosely it looks like that, because if you can express things as NumPy-style arrays you can do pretty much anything. True, but the problem with NumPy and traditional machine learning tooling is that it doesn't scale; as you said, TensorFlow does that and more. That's the point. But the TensorFlow you were showing looked like ordinary steps rather than, say, a convolution. Yes, and that's because the Ubuntu on my machine is running inside a virtual machine on Windows, so I don't have direct access to my GPU. I couldn't train a convolutional network and show you a model that really exercises all of that capability, so I had to show a very simple training run. But it does show what TensorFlow can do, how the graphs can be visualized, and how the node names can be inspected on the machine. That's exactly my question.
The graph is not necessarily just a neural network with hidden layers, right? You have all the steps before it, data cleaning and so on, and then you can execute the whole graph on top of the distributed engine. Yes, that's correct. So there are really two components: one is the distributed engine, which you could use even for ordinary computation, and then the machine learning on top. The machine learning, yes. And I can do it on a single machine as well as on a distributed cluster; that's the point. I can move seamlessly from one machine to N machines, from here to, say, 32 machines or 1,000 machines in Google's case.

TensorFlow Serving is designed for production environments. The key concept is the servable, which is an abstraction over the objects you train, and the servable stream, which gives you successive versions of the models you train (a minimal export sketch follows after this answer). So say today I train a classifier that gets the error down to 5.5%, and tomorrow I train one that gets it down to 3.46%, and I want to move the new model into production seamlessly without affecting my customers. What I do is upload the new model into the servable stream, and TensorFlow Serving's manager takes care of handing out the models, moving from one version to the next without any downtime. That's the advantage of the TensorFlow Serving model, and it's one of the things that sets TensorFlow apart from almost all the other machine learning libraries. Most of the others are aimed at academia and research, while TensorFlow ships production-level code that you can deploy and see results from right away. That's one of the key differentiators.

So the important question is: where to next? That's something I keep asking myself. If you're a complete beginner to machine learning, apply it to some simple day-to-day examples, like I said. Maybe you have a pedometer on your phone: it tells you how many steps you walked, so you could model the calories you burned, a small prediction system built on the data your phone already gathers for you. The next step is to explore other methods and their performance and see how to move forward from there. If you're an intermediate ML practitioner, it's time to move to deep learning, because there's not much point in doing heavy feature engineering anymore; deep learning is there for a reason and it has shown a lot of success. Not doing feature engineering? Feature engineering being choosing the features you want to use? Yes. In conventional machine learning you would do feature engineering: you'd take a HOG or something like SIFT and then decide, why not this, maybe build something on top of something else. That's the kind of machine learning we were doing earlier: building different models, trying something, seeing what stuck. But now we have very deep networks, and of course we don't fully know why they are so powerful.
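Picking up the TensorFlow Serving versioning point from above: a minimal sketch of exporting a versioned model in the SavedModel format that Serving watches. The paths, tensor names, and toy model are assumptions, and tf.saved_model.simple_save comes from later 1.x releases than the 0.10 version discussed in the talk, so take this as the shape of the workflow rather than the talk's exact code.

```python
import tensorflow as tf  # TensorFlow 1.x-style API; paths and names are assumptions

x = tf.placeholder(tf.float32, shape=[None, 6], name="input")
w = tf.Variable(tf.zeros([6, 1]))
scores = tf.sigmoid(tf.matmul(x, w), name="scores")

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Each export goes into a numbered version directory; TensorFlow Serving
    # watches the parent directory and switches to the newest version it finds,
    # so clients keep getting answers while the model is swapped underneath.
    tf.saved_model.simple_save(sess,
                               export_dir="/models/titanic/2",   # version 2
                               inputs={"input": x},
                               outputs={"scores": scores})
```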
Why deep networks work so well is a question all the practitioners are trying to answer; no one really knows why a neural network is so good, which is the ironic part. Apparently physicists have some answers: there was a recent blog post by physicists arguing why neural networks are so powerful, and it's worth reading, because mathematicians have not been able to explain it. The next thing is to learn about hyperparameter tuning, the black magic I mentioned. It's important to know the tips and tricks; as you build more machine learning models you'll pick them up and understand why they matter. If you're interested, I have a list on Twitter of people currently working in deep learning and machine learning who keep posting about new things happening in academia and in industry. There's also a wonderful piece of advice on machine learning from Professor Andrew Ng, the chief scientist at Baidu.

Some more learning resources. This is the book I'd recommend if you're interested in TensorFlow; I think it's the only one right now. It's called TensorFlow for Machine Intelligence, a very nice book, available in PDF and e-book form, so if you're interested it's worth getting either in hard copy or electronically. Then there's a TensorFlow tutorial site: you can start with lesson one and work through, I think, about 25 lessons; it's a very nice way to understand how TensorFlow works. There's also the Creative Applications of Deep Learning with TensorFlow course; if you're the artsy type you might like it, and it gives a very nice overview of the things you can do with TensorFlow and some of its applications, on the current 0.10 release. And the official site has a lot of tutorials as well, of course.

If you're using multiple hidden layers, how much data do you really need so you don't overfit? That's a question everyone struggles with. There's a rough rule of thumb, again based on this black magic and experience: if you don't want to overfit, the number of parameters in a layer shouldn't exceed roughly three times the number of samples you have. So if your batch size is, say, 64, you typically keep the layer size well under three times your sample size. That's per layer, so you have to check it for every layer, and depending on the network, Inception-style networks, network-in-network architectures and so on, you do different things. It's a messy question to answer; it's all a bit of a black art, and there's never a standard answer. But the Titanic data set is tiny, so if you run a deep network on it, you're probably going to overfit and generalize poorly. Yes, yes.
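As a rough sanity check of the rule of thumb just mentioned, here is a short sketch that compares the parameter count of each dense layer with the number of training samples. The layer sizes are hypothetical and the "3x" factor is folklore from the talk, not a theorem.

```python
# Compare trainable parameters per dense layer with the training sample count.
def dense_params(n_in, n_out):
    return n_in * n_out + n_out          # weights + biases

n_samples = 891                          # e.g. the Titanic training set
layers = [(6, 24), (24, 48), (48, 1)]    # hypothetical layer sizes
for n_in, n_out in layers:
    p = dense_params(n_in, n_out)
    print(f"{n_in}->{n_out}: {p} params, "
          f"{'ok' if p <= 3 * n_samples else 'risk of overfitting'}")
```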
Actually, on that point, I ran some tests to see what happens if you keep adding more neurons: after some time the performance goes down. You can tell overfitting is happening when the validation error climbs well above the training error, and the performance degrades quite a lot once you keep padding the network with more neurons. That's something we still need to understand better. Have you run it on the Kaggle competition? Which one, the Titanic? The Titanic Kaggle competition is actually over; it's completed, too late. I think that one runs forever. Oh, it runs forever? Well, it's not one you compete on for money; it's open to everyone, including people who are just starting out. It's the Kaggle 101 competition. Yes, maybe you can try running this on it.

A related question: you mentioned that we're moving away from feature engineering towards deep learning, but surely that depends on the size of your data set? Take the Titanic data as an example: it's a small data set, and if you just throw every single feature in, there are things like, if I remember rightly, the ticket number in there. If you put the ticket ID in, you get 100% with almost any model, if you train on the same data you then test on. Yeah, of course. But if you don't, then it's a meaningless feature. Exactly. So when you say we're moving away from feature engineering, I'd just like to understand that better; is it not still important? It's not that features are unimportant; it's about the type of features you choose. What we used to do, say, to train a human detector, was take cropped samples of people and compute an LBP feature or something similar, and then use something like AdaBoost to do the actual training. There are whole strings of papers that squeeze out a few percentage points of improvement that way. The point is that when you know what you're doing and you know the type of data, you can always come up with a few more percentage points. The problem is: how do you prove that this feature is the right one in general? It just so happened that you understood that particular data and came up with a feature that worked for it. What happens if I throw in new data? That's the kind of problem we're trying to solve with deep learning. In deep learning you don't treat the feature as fixed; the feature itself can change. If you look inside the network there are different layers, and if you look at the activations across the layers you can see that different kinds of data produce different activations.
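To illustrate the idea of looking at activations across layers, here is a minimal sketch in the TensorFlow 1.x-style API. The layer names, sizes, and the random input are hypothetical; the point is only that an intermediate tensor can be fetched alongside the prediction to see which hidden units respond to a given input.

```python
import numpy as np
import tensorflow as tf  # TensorFlow 1.x-style API; names and sizes are hypothetical

x = tf.placeholder(tf.float32, shape=[None, 6])
hidden = tf.layers.dense(x, 24, activation=tf.nn.relu, name="hidden1")
output = tf.layers.dense(hidden, 1, name="output")

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sample = np.random.rand(1, 6).astype(np.float32)
    # Fetch the intermediate activations alongside the prediction to see
    # which hidden units respond to this particular input.
    acts, pred = sess.run([hidden, output], feed_dict={x: sample})
    print("active units:", np.flatnonzero(acts > 0))
```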
Those shifting activations are also why concepts like dropout came along; they loosely mimic how neurons behave in the brain. So it's not that features are unimportant, but you no longer need to engineer them to that level of detail. That's what I'd like to say. A very related point: it's fine to say the features matter less, but frankly, when you collect data you are already making a lot of assumptions about which features to include; otherwise you wouldn't even know what data to collect and what is or isn't important. So in some sense you're already doing feature engineering just through selection. And my question is, in the older methods like logistic regression, you'd be taking logs of things, putting in lags and leads, especially with dates, like when you're predicting what Walmart sells more of in one of the Kaggle competitions, you'd be trying lags as features too, T plus one, T plus two. Do we not have to do that anymore? Yes, we still have to do that. The point I'm making is that we don't have to do very fine-grained feature engineering. With traditional machine learning you literally have to go back, understand the data, and invent new features every time the data changes. You don't have to do that level of fine-grained feature engineering with deep learning; that's the key takeaway. You're right that you still have to understand things like the lags, or, in terms of CNNs, the kernel size, the computational load, the image resolution. You still have to do all of that, but I don't have to worry about what the network is going to learn internally. If I stack, say, ten hidden layers, I don't have to worry about what comes out at the fourth layer or the eleventh; all of that is taken care of, because, as I said earlier, these networks have been trained on so much data that we can do a great deal with the existing networks that have already been shown to work across a lot of data.

So you don't have to worry about scaling the features and that kind of thing? You still have to do that; those steps remain. Earlier on you had to do everything by hand, whereas now there are even networks that do the scaling for you: the spatial transformer network does the transformation inside the network itself, handling rotation, translation, and scaling. So yes, that used to be a separate problem. You don't need a network just to do scaling, though, it's just a couple of operations? No, but the point is you don't know in advance which transformations are needed or what kind of data will arrive, so otherwise you have to produce the data yourself to cover all those translations and rotations.
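The "produce the data yourself" approach mentioned above is plain data augmentation, and a minimal sketch of it looks like this in the TensorFlow 1.x-style API. The image shape and the particular transformations (flips, crops, brightness jitter) are illustrative assumptions, not the talk's code.

```python
import tensorflow as tf  # TensorFlow 1.x-style API; shapes are illustrative

image = tf.placeholder(tf.float32, shape=[32, 32, 3])

# Hand-made augmentation: each evaluation produces a slightly different
# variant of the same image, the kind of data you previously had to
# generate yourself to teach a model invariance to these changes.
augmented = tf.image.random_flip_left_right(image)
augmented = tf.random_crop(
    tf.image.resize_image_with_crop_or_pad(augmented, 36, 36), [32, 32, 3])
augmented = tf.image.random_brightness(augmented, max_delta=0.2)
```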
These days a lot of that is done inside the network itself. Earlier, for object recognition or object detection, we would set up sliding windows, boxes that we ran across the image at different positions, whereas now the neural network itself predicts the location as well. That's the direction neural networks are moving in. Can we interpret the results? Yes, we can: all the intermediate layers produce outputs, and those outputs can be visualized. No, interpret in the sense of linear regression, where we can say that if this variable moves this much, the output changes this much. That kind of interpretability is possible too, to a degree: you can look at how much the model has learned and why certain units activate when something happens. So, like in the cat image, can we say why it was classified that way? Yes, you can go through each of the layers and look at the activations: in the case of dogs some units activate, and in the case of cats other neurons fire, loosely mimicking what the brain does, so it is possible to interpret.

But isn't interpreting the layers exactly the challenge with neural networks? In traditional modelling you can say which factors are driving the prediction, whereas in a neural network the layers are essentially learned features, and you can't really read them directly. Yes, that was a problem, and it's why people started visualizing the layers independently: there are high-level features, mid-level features, and low-level features, and we extract and inspect the activations across the different layers to see what kind of activation is happening where. It is a challenge, but I think we've made real progress on it. But, for example, if you build a system to approve or reject loan applications, and you do it with logistic regression or decision trees, then if somebody sues you because you didn't approve their loan, you can show why they were rejected. You cannot do that with a neural network. That is a real problem with this kind of model, and it's part of why people are skeptical about self-driving cars: with deep networks in the loop, we can't always say exactly why the system behaved the way it did. Self-driving cars are comparatively easy, though, because you can show they have fewer accidents than people. Well, look at the recent Tesla accident: Tesla couldn't say exactly which part of the system failed. That's not what I mean; Tesla is still safer than 99.99% of human drivers, so it's killing fewer people. That case is easy, but if you reject a loan application and somebody sues you for it, how do you show there is no bias in your model? That's a fair question, and something we still need to work out. How do you show there is no bias? It depends on the type of data you trained on.
You basically show what kind of data the model was trained on. But even if you know the data, you can't always say. For example, with logistic regression you can say: you already have too much debt, so the model rejects you; it's clearly the debt, not your eye color. So yes, that's a fair question, and a problem for the future. And there are areas where you will be asked why the system did something. Race is not a feature you're allowed to feed into the model, but if you put in the person's address, where they live, that is taken into consideration, and in the U.S. where a person lives can be a strong indicator of their race. That's very hard to untangle; it's a challenging problem, and if someone sues, it's genuinely not easy to say which factor was the reason. I think the whole thing boils down to granularity: how much complexity you want, how you build it, and whether the biases get taken into consideration at that level of granularity, which is a decision you make from the start. The question was whether it is interpretable; you can say, for example, that the decision was taken in this layer of the neural network, but you don't really know why. Take the recent Facebook example, where they removed the prize-winning photo of the Vietnamese girl, and then, after the protests in Norway, reversed the action with human intervention. That's a clear case of this kind of learning going wrong. But Facebook cannot tell us exactly why it removed the photo; they just want to fix it, and that's okay, but still. There's a lot of bias elsewhere too, gender bias on LinkedIn for example. The point is: how did it arrive at that decision? There are premises and presumptions built into everything that has been put in, and that cannot be eliminated as long as humans are involved; and the granularity, how much of that you take into consideration, determines whether the computation becomes extremely complicated or whether you simplify it. Perhaps we could take this discussion offline. We have one last question before we end the session, along with the giveaway. One last question, please.

Is there a pattern you follow for deciding the size of the network? Generally, you just look at the data. You typically start from the data: in this case you have six input features and only a small number of samples, so you start small, like I said earlier. You understand the features, here there are six, and you want your first layer to be around three or four times that, without blowing past the sample-size rule of thumb. That's not a bad start. Then you keep doubling the layer width as you go up, and then you come back down. That's the kind of pattern you follow when you build a neural network.
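A minimal sketch of that "go up, then come down" sizing heuristic, purely as described in the answer above. Nothing here is prescriptive; the multipliers are the speaker's rules of thumb and the function is hypothetical.

```python
# Sketch of the "widen, then taper" layer-sizing heuristic described above.
def suggest_layer_sizes(n_features, depth=4):
    sizes = [n_features * 4]                    # first layer ~3-4x the inputs
    for _ in range(depth // 2):
        sizes.append(sizes[-1] * 2)             # widen...
    while sizes[-1] > n_features * 4:
        sizes.append(sizes[-1] // 2)            # ...then taper back down
    return sizes

print(suggest_layer_sizes(6))   # -> [24, 48, 96, 48, 24]
```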
There is no straightforward answer for exactly how to build a neural network, though; you study different networks, read how they work, and then build your own. Any other questions? How mature is TensorFlow, is that a problem? It's actually quite mature right now. The 0.7 release early last year was an early one, but we're now at the 0.10 release, which came out just a day or two ago, and it's fantastic. It performs better than Torch in some cases; for most of the workloads we use, if you're running a deep learning network with 2D convolutions, it does better than Torch. The next best is Neon, from Nervana, which was acquired by Intel recently; that's about the only thing that beats TensorFlow right now. It's better than Caffe, better than Torch; for almost all of the convolution workloads, TensorFlow does a better job than most of the others. And does it handle streaming data? Streaming as in image streams, real-time data coming into your model? See, the network doesn't actually care what type of data it is, because the input resolution and size are fixed. You don't have to worry about whether it's streaming: as long as you understand how the data comes in, you interpret it and feed it into the network, and the network doesn't care whether it's a stream or not. Okay. Do we have time for a few more? Oh yeah, sure, maybe you can take a few more. And we are always looking for speakers who are interested in presenting at PyData Singapore, so please let us know. And,