This talk is going to be more developer-friendly. I'll be showing a lot of code, so in that sense it's a little different from the earlier talks. I hope you like the title. If you went to sleep five years back and woke up today, you'd find that people write more code in this style than in that style. Call it 1.0 versus 2.0, for lack of better names. The stuff you are used to, for loops, if-then-else, you can't find here. Used to lists and dictionaries? You don't find them either. More and more systems are written 2.0 style; it forms the core algorithmic part of them. So it's a new style of programming. I call it programming with examples and knobs. Don't worry, it's similar to what you have been hearing all day. At a high level, the loop is: you have examples, X and Y. A black box guesses an output Y' and a diff engine, delta, compares it with Y. If there's a difference, it goes back and turns these knobs so that the next guessed Y' will be closer to Y. So there is this learning loop. These are learning programs that we write using examples, X and Y, and these knobs, the parameters. That broadly describes the majority of the machine learning efforts you see around, applied to all kinds of fields. It's a new style of programming because you have to get familiar with a completely new set of entities. There's the annotator, the example generator, stuff we have been hearing about all day. There's the model, the model generator, the model bias, the model performance, the improve-and-maintain machinery. There are the optimizers, which turn the knobs. And there's the diff engine, the loss function. Each of these entities is a vertical in itself, and a few years back very few people in the world cared about these things. But in the 2.0 world you have to know each of them very closely. The common language these entities talk through is tensors. Earlier it was lists, dictionaries and sets; now it is these new objects called tensors. You have scalars, vectors, sets of vectors (matrices), sets of matrices (three-dimensional arrays); roughly, we start calling all of them tensors. They can be four-dimensional, five-dimensional. You don't need to visualize them; at most you can view them as cube-like boxes. That's the mental model we keep. Irrespective of whether you work on images, text, sound or tables, all of it has to be boiled down to this tensor format for these learning programs to work. Like sets and lists, you need to know the API, the programming interface: what happens when you insert, what happens when you combine. Tensors also have their own API, and actually a laundry list of one; what you see here is just a subset. Reshape, permute, expand, reduce, select, different kinds of multiply. It's a whole new world, and anybody who wants to work with tensors has to get familiar with this API. And then, when you start looking at code people have written, you see this kind of weird code: a reshape of a tensor with a tuple, a transpose with some indices, these weird shape modifiers that hijack your code. How do you deal with them?
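To make that concrete, here is a minimal sketch of the kind of index-speak that shows up in real model code; the specific shapes and the "attention head split" framing are made up for illustration.

```python
import numpy as np

# A toy "split features into attention heads" step, written the usual way.
x = np.random.rand(8, 16, 64)   # (batch, time, features) -- hypothetical sizes
h = x.reshape(8, 16, 4, 16)     # split 64 features into 4 heads of size 16
h = h.transpose(0, 2, 1, 3)     # now (batch, heads, time, head_dim)
print(h.shape)                  # (8, 4, 16, 16)
# Six months later, nothing in this code says what 0, 2, 1, 3 mean.
```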
Getting familiar with tensors is really important if you want to write tensor programs, machine learning programs, deep learning programs. So if you want to do 2.0, you have to know tensors. We'll slowly ease into knowing tensors, get to know their flaws, and then exploit those flaws by making friends with them. There are two libraries that I'll talk about which help you work with tensors more comfortably: tsalib and tsanley. Good, let's start with the gory details. The cube you see in the picture, the set of matrices, is not actually how it's represented in memory. It's actually a flat list. When you request a value along a dimension, there's metadata, the red and green arrows, which tells you how far you have to jump into the list to get that particular element. For the first axis you jump eight elements; for the second axis, four elements. You have to work in this flat world if you want to understand the nitty-gritty of how a tensor is represented in memory. There's also an alternative representation as a matrix. Why is this so? Because you periodically need to reorder the dimensions of a tensor. An example: an image is a three-dimensional tensor with the height of the image, the width of the image, and the channels, RGB or RGBA. Text similarly becomes word embeddings: you start with string words and have to map them into these tensors. So you periodically need to permute, reorder, the dimensions of the tensor, say make the channel come before the height and width. This low-level representation lets you reorder without changing the whole memory layout: you keep the data layout but just change the axis-zero and axis-one pointers. The next time you ask for axis zero, it jumps only four blocks instead of eight. Clearly this representation takes some effort; it's pretty low level and very far from the high-level box view you have in mind. Unfortunately, the tensor libraries you work with, starting from the all-popular NumPy, then TensorFlow, PyTorch, MXNet, all of them stay really close to this low-level representation. They force you to convert between the box view and the flat view all the time, and I'll give you examples. That's why writing these tensor programs is painful: first you have to embed all the real objects you know into tensors, and then you have to learn to manipulate them. And the problem is universal; even Google engineers find it painful. Some examples. This is a matrix: think of it as two rows, each with three elements. The library allows you to do reshape(3, 2). It's a valid call, because there were six elements before and there are six after. But look at the output: you lose the dimension structure. 0.1, 0.2, 0.3 are no longer in a single row; they have been jumbled. Why does this happen? It takes a while to realize that this is perfectly valid if you think of the single linear layout in memory, but not semantically.
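A small NumPy sketch of both points: the stride metadata behind the "box" view, and the reshape-versus-transpose confusion from the slide (the 2×3 example values follow the talk; the byte strides are what NumPy reports for a float32 array).

```python
import numpy as np

# The flat layout behind the box view: a (2, 3) matrix is one flat buffer
# plus per-axis strides saying how far to jump for each index.
a = np.arange(6, dtype=np.float32).reshape(2, 3)
print(a.strides)          # (12, 4): jump 12 bytes per row, 4 bytes per column

# Transposing only swaps the stride metadata; the buffer is untouched.
print(a.T.strides)        # (4, 12)

# Reshape, by contrast, respects only the flat order, not the semantics.
m = np.array([[0.1, 0.2, 0.3],
              [0.4, 0.5, 0.6]])
print(m.reshape(3, 2))    # rows are jumbled: [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
print(m.T)                # what was probably intended: [[0.1, 0.4], [0.2, 0.5], [0.3, 0.6]]
```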
There's no semantic notion of axes or dimensions; the library pretends to give you one, but it's not actually there, and the library lets you violate that assumption. What you wanted to do here is not reshape but permute: reorder the dimensions from (2, 3) to (3, 2). But now look at the transpose operator: you have to work with the indices of the dimensions, there's no other way. You have to remember that earlier I had zero, one, two, and now I have to convert to zero, two, one. And if it was zero, one, two, three, I have to be careful to permute one and three. It gets painful. The second thing is broadcasting. In basic programming, if you add an integer and a float, the integer gets cast into a float and then you can add; that's what happens underneath. There it's casting; in TensorFlow it's broadcasting. If you add an integer to an array, the integer gets expanded to an array, the top example here, maybe in an optimized way, and then you can do the addition. Likewise you can combine three-dimensional tensors with two-dimensional and one-dimensional ones: everything has to broadcast. Broadcasting is the fundamental underlying primitive here. Unfortunately, the rules for broadcasting, at the bottom, are ad hoc. They work in most cases, but not in some tricky or borderline cases. It's as if, in the 1.0 world, you added an element to a list and didn't know whether it went to the beginning or the end. That sort of confusion. This example shows adding an array of ones of shape (32, 1) to another array of ones of size 32. The sum should be 64, because there are 32 ones in each. But when you do it, you get 2048: the duplication happened 32 times. Why? Because the size-32 array gets broadcast to (1, 32), and then both arrays get broadcast to (32, 32) so the two dimensions match. This follows the rules here, but it's a very tricky thing, and when you write loss functions, this kind of error can pop up and make your debugging miserable. Finally, the shapes of tensors, the dimensions and the size of each dimension, are the main connection a programmer has with the library. You would think they'd be tracked safely, but there's no way to make those shapes explicit in your code. This is code from BERT. How many of you know BERT? Okay, some of you. Very crudely, BERT happening to NLP is like A. R. Rahman happening to the Indian music industry: it just revolutionized everything. This is actual real code from BERT, written by Google engineers, and you see these shape names everywhere in the comments. The query layer has the shape [B*F, N*H]. If you remove that ad hoc comment and come back even a few days later, there's no way to figure out that query_layer has that shape. Developers need these shapes to be explicit, but all they have are ad hoc comments. And it builds on that: essentially the low-level view of the tensor is exposed at the top, so you have to work with indices everywhere and end up with weird code like this. I guess the best term for tensor programming today is a new kind of misery, just invented for this talk. So let's look again at the disconnect.
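The broadcasting trap from the slide, reproduced in NumPy (the explicit reshape at the end is one way to avoid it; it is not the library's fix, just the manual workaround):

```python
import numpy as np

# Summing two "length-32" arrays of ones: you expect 64.
a = np.ones((32, 1))
b = np.ones(32)
print((a + b).sum())                 # 2048.0, not 64.0

# Why: b is treated as shape (1, 32); both operands broadcast to (32, 32),
# so 1024 twos get summed.  Making the alignment explicit avoids the surprise:
print((a + b.reshape(32, 1)).sum())  # 64.0
```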
At the top are the images that I showed you earlier. Learning about tensors and tensor manipulation is a learning curve for everybody in the beginning, and I find it better to view them as trees, as partitions. If the first dimension has two partitions, you have two outgoing arrows from the root; the next dimension adds two more arrows, so finally you have four leaves. If you go down the tree, picking an index for each dimension, you find a scalar value at the bottom. A more compressed representation is on the right: just the ordered dimensions, with the size of each one mentioned. A view closer to the semantics is when you label them, give them names: b, t, d for batch, time (or length) and embedding dimension. Once you give them names, it automatically becomes easier to think about them. The last one is the real conceptual view, where the dimensions are not even ordered; that's also useful. Unfortunately, our mental models sit at the right end, the memory representation sits at the other end, and programmers have to constantly convert between them. Okay, so now we know tensors briefly, we know some of their flaws, and we are ready to befriend them, to exploit those flaws to make them work for us. The key idea is very simple: name the dimensions. Instead of referring to them by indices, you call them b, t, d or n, c, h, w. We have been using such names informally, but they are not an integral part of these libraries. A lot of people have realized that the libraries should have inherent named dimensions, and there have been proposals since 2017: from Google, from Facebook, and a more recent post by Alexander Rush. But all of these proposals involve changes deep inside the libraries. These tensor libraries are really complex; besides these operations they carry the auto-differentiation machinery, and nobody wants to touch that without a clear understanding. So making deep changes is off the table, except that very recently, last month I think, PyTorch added named dimension support on an experimental basis. On the TensorFlow side, people have built libraries on top to add names. Almost everybody agrees it's important, but nobody has a very clear way to do it. Meanwhile, this project started when I was writing an encoder-decoder for Indic language translation. I took some code from somewhere, and this is the forward loop of the decoder. It's almost impossible to modify this code without adding a dozen print statements everywhere to figure out what the shapes are; it's impossible without making them explicit. So we asked what we could do about it, and that led to this library. Our goals: don't make deep changes to the libraries, that's their job if they want to do it, but still have the names around. Once you have the names, you want them as first-class citizens, not as documentation. You want to write crisp transformations over them, which is the interesting part I'll show, and you want to write assertions over them. Many of the bugs a beginning programmer hits are shape violations: I put in the right shape, why is it failing in the middle of my network? Just like in 1.0, assertions are extremely helpful and essential. And it should work with arbitrary backends and integrate with Python.
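A minimal plain-Python sketch of the "name the dimensions once, like a schema" idea; this illustrates the concept only, it is not the tsalib API, and the dimension names and the `check` helper are invented here.

```python
import numpy as np

# Declare the handful of dimensions once and refer to them by name everywhere.
dims = {'b': 32, 't': 128, 'd': 512}   # batch, time, embedding (hypothetical sizes)

x = np.zeros((dims['b'], dims['t'], dims['d']))   # x : (b, t, d)

def check(tensor, names):
    """Assert a tensor's shape against the named schema."""
    expected = tuple(dims[n] for n in names)
    assert tensor.shape == expected, f"expected {names}={expected}, got {tensor.shape}"

check(x, ('b', 't', 'd'))   # passes now; fails loudly if the layout ever drifts
```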
It should work right now, as opposed to waiting for a proposal to be implemented. Okay, a quick example. This is ResNet; how many people know ResNet? It's the core block of vision networks. This is the forward function on the left. You can scroll through it, but it's not easy to read: to understand what shape changes are going on, you have to go and read the layer-one function, which is make_layer, and try to decipher that piece of code. The tsalib version is this: the same forward function, but each variable is labeled by its shape. What you see are annotations like "x: (B, C, H, W)", using the type annotations that Python allows; you can put arbitrary types, or your own defined types, there, and we chose a particular language to define the shapes. The starting point, at the top, is like a database schema: we declare the named dimensions in a structured way once and use them throughout the program. Irrespective of whether these networks are complex question-answering networks or multi-layered Faster R-CNNs, the number of distinct dimensions involved is actually very small. That's a very important observation. We can declare them in one place, use them throughout, and suddenly the program becomes much more readable. You start with the image, which is (Batch, 3, H, W); the number of channels grows to 64 while the height and width are halved; then the height and width shrink by another factor; and you can follow the flow of shape changes until it becomes a flattened array to be classified at the bottom. There's one drawback here, that these are global variables. You can go further, since it's a little busy here, and reduce the clutter by putting the shapes as strings in a small special language where you specify the shape expression. These are not just names but arithmetic expressions, and you can do things like get the integer value of a dimension by name and use it in functions as an integer. So the program stays clean and you can track the shapes everywhere. You do have to write these annotations manually, but once you write them for a module used by everybody, everybody can build on it and use it. More examples: the dot product. If you have looked at the einsum representation, it forces you to write out all the dimensions. Instead, we have these underscores, which let you drop the unneeded dimensions and expose only the dimensions over which the dot product, the matrix multiplication, happens. alignto is an explicit broadcast. Remember the (32, 1) example: the library had no way to know how the two arrays were meant to be aligned. With names, just saying that this one is 'b,d' and that one is 'b,l,d' means that when you expand, you insert a dimension in the right place and nowhere else. That makes broadcasting explicit. So we introduce a bunch of operators which let you write shape transformations cleanly, with named dimensions. And all of this works with TensorFlow or PyTorch or whatever library, because only a very small backend-dependent adapter needs to change. And you can write shape assertions where, instead of the numbers 224, 3 and so on, you write the symbolic names or expressions, which can be matched directly.
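A hedged sketch of those two ideas in plain NumPy, symbolic shape assertions written against expressions like h // 2, and a dot product where only the contracted dimension matters; the sizes, the fake "convolution", and the variable names are all invented for illustration, and this is not tsalib syntax.

```python
import numpy as np

# Define the base sizes once; assertions below reference them symbolically,
# so they survive a later change to the base sizes.
b, c, h, w = 8, 3, 224, 224
img = np.zeros((b, c, h, w))

# Pretend a stride-2 convolution halved the spatial dims and produced 64 channels
# (we just allocate zeros of that shape to show the flow).
feat = np.zeros((b, 64, h // 2, w // 2))

assert img.shape  == (b, c, h, w)
assert feat.shape == (b, 64, h // 2, w // 2)   # symbolic, not a hard-coded 112

# Named dot product: contract over d only, keeping batch and the two time axes.
q = np.random.rand(b, 16, 64)                  # (b, t, d)
k = np.random.rand(b, 16, 64)                  # (b, s, d)
scores = np.einsum('btd,bsd->bts', q, k)
print(scores.shape)                            # (8, 16, 16)
```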
Even if you change those defined values, these assertions and your program don't need to change. That's the advantage of named assertions. Okay, so this is a hairy slide; if you want to make your slides look complex, do what mathematicians do. But the basic idea is that you start with just dimension names, and then you need expressions, you saw that, height divided by two, so you need a small language or grammar: symbols, constants, arithmetic operators. You can even have lists of shapes; a list of shapes gets compressed into a single tensor, which is an operation a lot of these networks do. Once you have this language, look at the bottom. A permutation is just 'btd' going to 'dtb', as simple as that. A stacking operation is 'btd*', a sequence of these (b, t, d) tensors, going to a shape with a new dimension inserted, (b, n, t, d). All of the laundry list of APIs that I showed you is simply expressible once you choose these dimension names. In fact, you can forget what was on the right side, those API names which are very hard to learn, and just use this notation in your programs. So that is what we did. This is an example from GPT-2, actual code lying out there in the Hugging Face repository. This is the old function at the top, which is beautifully cryptic. We introduced this warp operator, which lets you do a sequence of transformations in the named world. All that function is doing is converting a tensor of shape (b, h, t, d) to (b, t, h, d) and then to (b, t, h*d). Once you spell these things out, it's extremely explicit: that's what this complicated function was doing, and this is a much simpler, more readable, concise piece of code that does the same. So, advantages. To actually see the advantages of this named representation, because nobody had written code using these names, we took BERT and tried to rewrite as many parts of it as possible with named dimensions. What we found is that many such big functions translate to one-liners; that's all that is going on. And it's awful, it's painful, it's misery to write them the old way instead of like this. Having a sequence of named transformations helps a lot. We managed to cut one of the main functions down by 25 lines, just as an illustration that this code, which has been stable for a long time, could be made much more concise and readable. Okay. So I told you about a sort of grammar over those names that can be used to specify shape transformations. But what is going on underneath? The library still understands only the indices 0, 1, 2, 3, and you are working with names, so you need to lower these names to library integers. We use the machinery of symbolic expressions, a library called SymPy, which allows us to do this. For example, you give the names indices, B goes to 0, T to 1, D to 2, K to 3, evaluate the right-hand side in this context, and you automatically get the sequence 2, 1, 0, 3, which you pass to the low-level library. That's how the lowering happens. I'm just jumping over this quickly; if you want more details, the right-hand side of the slide is what mathematicians do to make it look more cryptic. And all of this is open source: this is the tsalib library.
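Here is a minimal sketch of that lowering step, a named permutation spec reduced to the integer axis tuple the backend understands; the helper function is hypothetical (the library uses SymPy internally), but the mapping it computes matches the 'btdk' to 'dtbk' example above.

```python
import numpy as np

def lower_permutation(src: str, dst: str):
    """'btdk' -> 'dtbk' becomes the axis tuple to pass to transpose()."""
    index_of = {name: i for i, name in enumerate(src)}   # b->0, t->1, d->2, k->3
    return tuple(index_of[name] for name in dst)

print(lower_permutation('btdk', 'dtbk'))    # (2, 1, 0, 3)

# Usable with any backend that takes integer axes:
x = np.zeros((8, 16, 64, 4))                # (b, t, d, k) -- hypothetical sizes
y = x.transpose(lower_permutation('btdk', 'dtbk'))
print(y.shape)                              # (64, 16, 8, 4)
```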
It has been developed mostly since the beginning of this year, and people are noticing it, the users growing a little, not as much as I'd like, but people like it. If you are working with tensors, if you are writing these ugly programs, I suggest you try tsalib; feedback is welcome, and contributions also. Okay. But the picture is not fully rosy yet. There are still two issues: you still have to write the annotations manually, and you still have to write the named assertions explicitly. That gives rise to the next tool, which builds on tsalib: a dynamic shape-checking tool. Again, some code. You have a function foo, where you obtain y and z by taking the mean of x along different dimensions. You should have used axis one, but you used axis zero: an aggregation along the wrong dimension. This is an error nobody will flag. Your program will run through and crash at a later stage, and you won't know the problem is right here. We track shapes, but we need assertions. The idea is to use these shape labels as implicit assertions, and with those in place, the tool can run and point out the error. At line 36 it's good: that's the shape you specified and that's the shape of the actual runtime value. At the next line, you expected 'b,d', but in fact it's 't,d', so at runtime the tool gives you an error. And line 38 succeeds. So if you write the annotations, you automatically get to check them, and if your program changes and you make inadvertent errors, the tool can catch them. I can tell you about the internals if you're interested; it basically builds on and piggybacks on Python's tracing mechanism. Okay, so tsalib and tsanley are from OffNote Labs. At OffNote Labs, one of our big goals is open-source research, and right now the focus is on building tools which help developers write machine learning programs faster and simpler. These are two such tools. We are also building a tool called Litex. Some of the talks, or the discussions, mentioned the problem of experimenting: machine learning, deep learning, is primarily an empirical science, and there's a huge number of experiments you have to run to figure out which options are correct, irrespective of how much intuition you have. A lot of it is hard right now, so experimenting better matters; people have not paid as much attention to how experiment frameworks should be designed, and Litex is an effort in that direction. Other planned projects include data pipelines, which are another pain point: this talk was more about the modeling complexity, but we want to make writing pipelines also much simpler compared to the state of the art. And the final thing is debugging, the hardest problem; we want to solve it, not sure how. Okay, so that is basically the talk. I told you about tensors: what they look like, what the semantics are, what the problems and flaws are, and about these two libraries that will help you befriend tensors, essentially bridging the disconnect between the low-level model the current libraries have and your mental model. Naming has multiple benefits. We hope TensorFlow, and NumPy in particular, add these named dimensions internally, so everybody benefits. But until then, you have tsalib, and coming soon you have taming tensors: not just befriending them, actually taming them. Thank you.
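To illustrate the dynamic checking idea, here is a tiny runtime checker that catches the wrong-axis mean from the foo example; the `dims` schema and the `expect` helper are a hypothetical sketch of the concept, not the real tsanley mechanism, which hooks into Python tracing.

```python
import numpy as np

dims = {'b': 4, 't': 10, 'd': 8}   # hypothetical batch, time, embedding sizes

def expect(arr, names):
    """Treat a named shape label as an assertion on the runtime value."""
    shape = tuple(dims[n] for n in names)
    assert arr.shape == shape, f"expected {names}={shape}, got {arr.shape}"

def foo(x):                      # x : (b, t, d)
    y = x.mean(axis=0)           # bug: aggregates over b, leaving (t, d)
    expect(y, ('b', 'd'))        # we meant to reduce over t and keep (b, d)
    return y

try:
    foo(np.zeros((dims['b'], dims['t'], dims['d'])))
except AssertionError as e:
    print(e)                     # expected ('b', 'd')=(4, 8), got (10, 8)
```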