Hello. Hey, guys. So next up is Aparna Krishnakumar, who is a sophomore at SRM University, Chennai, and she'll be talking about delving into art and creativity with Python. I'm personally looking forward to her talk, because not only do we share a first name, my name is Aparna Pandey, and I promise you it's not a very common name, and this was not planned, but we're also coffee lovers. So give it up for Aparna.

OK, so welcome to Delving into Art and Creativity with Python. I'm really happy to be here, and I'm having an amazing time so far. Before I get into my talk, I'd like to introduce myself. Like Aparna said, I'm a sophomore studying computer science and engineering in India, and I work at a student-run research lab called the Nextec Lab, where I head the AI division. So a lot of my research involves artificial intelligence, image processing, and computer vision. I'm also a Bharatanatyam dancer, which is a classical Indian dance form, so art, creativity, and expression are a very important part of my life. And the more I started to explore technology and AI, and the more I researched computer vision, the more I wanted to integrate the two. I think it's an intersection that's less explored. So the aim of today's talk is to cover what has been done in terms of deep learning, especially image processing, and what can be done, both in terms of applications and architectures.

I'm going to start with the first circle, which is the libraries, architectures, and building blocks used for the AI techniques; then how we can integrate creativity to create something artistic; and then how we can take this a step further to make something impactful, something that maybe bridges communication gaps between cultures and languages, and maybe even helps the world. So let's get started.

First, why Python? I mean, we're at a PyCon, right? Easy implementation: I find it really easy and beneficial for making my ideas a reality, because you're not worrying about missing semicolons or anything. Python also has really powerful image processing libraries, like OpenCV and PIL, which you can use on their own to generate art, but I can't really cover that, because it's only 25 minutes. Everything's open source, and you have the major AI frameworks, like TensorFlow, Keras, and PyTorch, which you drive from Python. And like Python, they have very active communities, and it's very easy to be a part of them.

So let's start with artificial intelligence and deep learning. Artificial intelligence is basically a branch of computer science that tries to mimic what humans do and how we behave, and bring that to computer systems. Deep learning is the part of AI that's inspired by how the brain works.

Before I start with the building blocks of the talk, I'd just like to talk about how we represent data. Just like a little kid learns by looking at the things around them, AI models learn from data. We usually represent data as a matrix of training examples, the different examples we want to learn from, by features. When I say features, consider a house: you have the size of the house, the number of storeys, the number of rooms. These are all features.
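To make that concrete, here is a tiny sketch of what such an examples-by-features matrix could look like for the house example, with completely made-up numbers:

import numpy as np

# Each row is one training example (a house); each column is a feature.
# Columns: size in square metres, number of storeys, number of rooms.
# The values are made up purely for illustration.
X = np.array([
    [120.0, 1, 3],
    [200.0, 2, 5],
    [ 85.0, 1, 2],
])
y = np.array([250, 480, 160])  # something to predict, e.g. price in thousands

print(X.shape)  # (3, 3): three training examples, three features each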
The most basic architecture, or building block, of deep learning is the neural network, and it's basically inspired by the brain. You have an input layer, an output layer, and one or more hidden layers, and your input undergoes a series of mathematical computations across those layers to produce your output.

So how does this happen? Every connection between any two layers has something called a weight. What we do is multiply the weights with the input and add a bias term. The bias term is important because it shifts the result before the activation, so the network isn't forced to pass through zero and has more flexibility. Once we do this multiplication, we pass the result through something known as an activation function, which introduces an element of nonlinearity, as compared to an algorithm like linear regression. And if we talk about how the learning actually happens, it all comes from the cost function that we try to optimize.

This is code that I wrote using NumPy, just to get us started with how it works. We initially initialize a random set of weights and we forward propagate; that is, like I said before, we multiply the weights with the input, add a bias, and pass it through an activation function. Then we do something called backpropagation. You've got your predicted output and your actual output, and you use your cost function to measure the difference between them and find an error. Suppose it's really big: you backpropagate through the layers and work out how much each layer in between contributed to that error, and adjust the weights accordingly. You do this over a series of iterations until your cost function converges and is minimized. So that was neural networks on numerical data.

Now, when we talk about images, it's a little more complex, because we usually represent an image through its pixel values as a 3D matrix: the height, the width, and the channels. The channels signify whether it's a black-and-white image or an RGB image: a black-and-white image has one channel, and an RGB image has three channels, red, green, and blue. We talked about features earlier; for images, every pixel value is a feature, so the number of features is height times width times channels. And this is extremely big, especially when we're talking about huge, very detailed images and a large dataset. So we want to compress the data, to make it smaller. How do we do this?

There's something called a convolutional neural network which, as the name says, uses the principle of convolution, and what convolution does is make the dimensions of the data smaller. Essentially, from the vast number of features, you're choosing the best ones, the ones you think will help the model learn best. This is done using four operations, the first one of course being convolution. With convolution, you have a given input matrix, like the first one on the slide, and you have a filter. You slide the filter across the input image, and for the patch it's superimposed on, you multiply every element with the corresponding element of the filter and add them all together. So like in the example, nine values are compressed into one.
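Here is a rough NumPy sketch of that sliding-filter operation, a toy version that ignores things like stride, padding, and multiple channels:

import numpy as np

def convolve2d(image, kernel):
    # Slide the filter over the image; at each position, multiply the patch
    # underneath element-wise with the filter and sum everything into one number.
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)   # toy 5x5 "image"
kernel = np.array([[1., 0., -1.],
                   [1., 0., -1.],
                   [1., 0., -1.]])                  # a simple vertical-edge filter
print(convolve2d(image, kernel).shape)              # (3, 3): each 3x3 patch becomes one value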
Max pooling is a similar operation where you use pooling to reduce the dimensionality, and there are different ways of doing it: one of them is max pooling, another is average pooling. Like you can see in the pink square, the element six is chosen because it's the maximum value. This is also done to create something known as spatial invariance. What that means is, for example, when a CNN has an input image of a cat, it doesn't matter where in the picture the cat is; it's still a cat, and it needs to be identified by the model. That's what pooling helps with. The third and fourth operations are very similar to a normal neural network: ReLU is an example of an activation function, as I said earlier, and then you have a fully connected layer, a normal neural network.

Now, how do we use these four things to build a proper convolutional neural network? You stack them repeatedly: a convolution, then a ReLU, then a max pooling; then again a convolution, a ReLU, and a max pooling. I've only done this around twice here, but when you're talking about very big CNNs, like Inception, which is used by Google, or VGG16, they do this many, many times, so that the dimensions of your data are really, really reduced. (I'll show a rough code sketch of this stacking in a minute.) Another really cool thing is that the more times you do the convolution operation, the more detailed, more advanced the features the network picks up become. So again, if we talk about a house: when you do convolution once, the network may identify just the edges, and by maybe the fourth layer it has identified the windows, the texture of the roof, and so on.

OK, so now that we're done with the building blocks, let's look into the creative applications. In around 2016, the Tate had an exhibition called Recognition. Did anybody actually go to this? I would be interested. You did? OK, so for those of you who didn't, here it is. It uses convolutional neural networks to identify features in an image from the Tate's archive. For example, with the two ladies, it identifies that it's a party and that there are two ladies dressed up. Reuters is essentially a photojournalism agency, so they have a big photography archive, and based on what the network identified in the Tate image, it recommends matching photographs from the Reuters gallery.

Let's move on to another creative application of convolutional neural nets. This is something that I overuse a lot, and it's in the name: style transfer. You have the content of one image, which here is the Guns N' Roses band, and you have the style of another, and you're transferring the style of one image onto the content of the other, and you get that output, which I made.
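Before getting into how style transfer works, here is the rough code sketch of the convolution, ReLU, and pooling stacking I promised; the layer sizes are made up, and real networks like VGG16 or Inception repeat the same pattern many more times:

import tensorflow as tf
from tensorflow.keras import layers

# A toy convolution -> ReLU -> max pooling stack, repeated twice,
# followed by a small fully connected network. Nothing is tuned here;
# it just shows how the spatial dimensions shrink layer by layer.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(64, 64, 3)),        # height x width x channels (RGB)
    layers.Conv2D(16, 3, activation="relu"),  # convolution + ReLU
    layers.MaxPooling2D(2),                   # max pooling halves height and width
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(2),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),   # fully connected output layer
])

model.summary()  # the output shapes show the data getting smaller and smaller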
So how does style transfer actually work? Like I said before, everything is based around a loss function. Previously we had one; now we have three. You start with noise, which is again a random initialization of pixel values, and you keep adjusting it to minimize the three cost functions until you get an output. The entire concept is that the content and the style of an image are properties that can be separated and represented mathematically.

So you have the content cost function, which is basically the mean squared error between the feature maps of the generated image and those of the actual content image. This is usually calculated at the fourth convolutional layer because, like I said before, that's where the really advanced, specific details are captured. Then you have the style cost function. There's something called a Gram matrix, and I would love to go into the math of it but we don't have time; essentially, for the output of each convolutional layer, it calculates a similarity measure between the feature channels, because that's what style is: a sort of correlation, if you want to represent it mathematically. So you have these two cost functions, the content and the style. How do you bring them together into one? The final cost function at the bottom uses two weights, alpha and beta: alpha times the content loss plus beta times the style loss. And that's the algorithm.

There are many ways to experiment with this, and a lot of future research if you want to do it. One is style on style: there's an artist on Twitter who did this, where both the content and the style images were essentially his paintings, so they were both styles rather than content, and he experimented to see what that would output. Another thing you could do is use genetic algorithms to search for the hyperparameters you need, like the alpha and beta weights. You could also, instead of using conv4 as per the paper, use conv2, an earlier layer, if you want to create more abstract art, and you could try other types of pooling, like average pooling instead of max pooling.

So I've talked about creating art; now, how can we use it in the real world? Like I said before, I'm a Bharatanatyam dancer, and for me, one of the hardest things is making the audience understand my art. Bharatanatyam is very dramatic: you have one dancer who portrays many characters, and the audience is often really, really confused about which character she's portraying. So what I thought of doing is showing the change of emotion with style transfer. Every time a change of emotion was detected on the dancer's face using OpenCV, I applied a new style to highlight the change of character. So if I can play this... OK, wait, oops. For context, it's basically about kid A, who's throwing sand at kid B, and then the dancer changes into kid B, who is shocked. So how do we show that progression? That's kid A; there's the change of character.

Another application of style transfer that I did was to create personalized greeting cards. A lot of people like customizing their greeting cards for Christmas, maybe putting in a picture of the family, but how do you make it more festival-appropriate? We thought of again using style transfer, transferring the style of a snowy picture onto the content of your family photo. And we also generated the greeting text using an LSTM, but again, that's a different talk. So that's all for style transfer.
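Here is a rough NumPy sketch of those three loss pieces, using made-up feature maps in place of real conv-layer activations, just to show the shape of the math:

import numpy as np

def gram_matrix(features):
    # features: (height, width, channels) activations from one conv layer.
    # Flatten the spatial dimensions, then correlate every channel with every
    # other channel; that channel-to-channel correlation is the "style".
    h, w, c = features.shape
    flat = features.reshape(h * w, c)
    return flat.T @ flat

def content_loss(generated, content):
    # Mean squared error between generated and content feature maps.
    return np.mean((generated - content) ** 2)

def style_loss(generated, style):
    return np.mean((gram_matrix(generated) - gram_matrix(style)) ** 2)

# Made-up "feature maps" standing in for the conv outputs of the three images.
rng = np.random.default_rng(0)
gen_feats, content_feats, style_feats = (rng.normal(size=(8, 8, 16)) for _ in range(3))

alpha, beta = 1.0, 100.0   # how much you weight content versus style
total = alpha * content_loss(gen_feats, content_feats) + beta * style_loss(gen_feats, style_feats)
print(total)  # in real style transfer you would minimize this with respect to the generated pixels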
Now, we're going to talk about Deep Dream, which is one of the coolest applications of AI in art, because usually when you talk to AI or machine learning practitioners, they're all against bias; they don't want bias near their model. But the cool thing about Deep Dream is that it actually likes bias and encourages it. The other really cool thing is that people are always talking about how AI is a black box, but Deep Dream demonstrates what the network actually "thinks". Deep Dream is essentially a technique built around a convolutional neural network, wherein you want to emphasize certain biases in the given input picture. For example, with the animal with the antlers: it passes through the Deep Dream algorithm and, as you can see, the characteristics of the antlers are repeatedly showcased in the output image.

So how does this work? For a given layer of the convolutional network, it detects features. Imagine that someone takes a snapshot of the picture generated at the end of a single layer, and you feed this as the input to the next layer. So what has happened is that the first layer has decided, OK, the most important feature in this object is the antlers, and then a snapshot of that picture is fed back in as input. So the presence of antlers in the picture is repeatedly emphasized over iterations, and that's how you get something as loud as that. This is when you have a neural network that's trained on particular images. The other type, the one in purple, is when you initialize with noise, that is, random pixel values, and you run it through an architecture that's trained on another object, something different, in this case I'm guessing buildings. So the network just had random noise fed into it and was able to generate this from noise alone.

So again, how is this applied to the real world? How many of you know Foster the People? OK, so in their music video for "Doing It for the Money", they used Deep Dream to demonstrate that even computers can dream. Transfer learning was used: the particular CNN had been trained on dogs, hence the characteristic imagery that was produced.

So first we talked about style transfer, then we talked about deep dreaming, right? We still haven't talked about how AI can generate images from scratch, and that's where GANs come into play. GANs are generative adversarial networks, and they essentially produce images never seen before, from scratch. I don't, again, have time to go into the code, but I'm just going to explain this with a very simple example. Take the generator, who is a boy, G, and he wants to enter a club, but the discriminator at the door doesn't think he's cool enough. So what does he do? He keeps changing himself, puts on a disguise, maybe a leather jacket, to make himself look cooler, until he looks cool enough to fool the discriminator. This is the same concept that's used in GANs. You have an entire training set, which is the training data; you have a generator, which starts from random noise; and you have a discriminator, which effectively acts as the loss function. You keep changing the weights according to the losses until the generator generates something that resembles the training data, and if it's able to fool the discriminator into thinking that what it generated is in fact part of the training data, it has succeeded.

So this is an example of things that GANs have produced. Every face that you can see here is fake; there's no human who actually looks like that.
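As a heavily simplified sketch of that generator-versus-discriminator loop, here is roughly what one training step could look like in TensorFlow/Keras; the network sizes and the stand-in data are made up, and a real GAN needs far more care to train:

import tensorflow as tf
from tensorflow.keras import layers

# Toy generator: noise in, fake flattened 28x28 "image" out.
generator = tf.keras.Sequential([
    tf.keras.Input(shape=(100,)),
    layers.Dense(128, activation="relu"),
    layers.Dense(784, activation="sigmoid"),
])

# Toy discriminator: image in, probability that it's real out.
discriminator = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),
    layers.Dense(128, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])

bce = tf.keras.losses.BinaryCrossentropy()
g_opt = tf.keras.optimizers.Adam(1e-4)
d_opt = tf.keras.optimizers.Adam(1e-4)

def train_step(real_images):
    noise = tf.random.normal([real_images.shape[0], 100])
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        fakes = generator(noise)
        real_pred = discriminator(real_images)
        fake_pred = discriminator(fakes)
        # Discriminator: call real images real and fakes fake.
        d_loss = bce(tf.ones_like(real_pred), real_pred) + bce(tf.zeros_like(fake_pred), fake_pred)
        # Generator: fool the discriminator into calling fakes real.
        g_loss = bce(tf.ones_like(fake_pred), fake_pred)
    d_opt.apply_gradients(zip(d_tape.gradient(d_loss, discriminator.trainable_variables),
                              discriminator.trainable_variables))
    g_opt.apply_gradients(zip(g_tape.gradient(g_loss, generator.trainable_variables),
                              generator.trainable_variables))
    return g_loss, d_loss

# Stand-in "training data": random values where real images would normally go.
print(train_step(tf.random.uniform([16, 784])))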
And I know of a startup in India that actually uses this for ads: they don't really use models, they use GANs to generate fake faces, so it really cuts marketing costs. And the last one is essentially coloring, wherein the input is an outline and the GAN colors it in: the outline goes into the generator, and the fully colored image is the ground truth it's trying to match.

And I actually used this for culture again. In India, we have something known as rangolis, which is essentially floor art: when we have festivals like Diwali or Holi, we like to celebrate, and we like to draw art on the floor. So I'm currently working on a project wherein we generate rangolis using GANs and pass them to a chalk bot, which basically draws the outline of the rangoli for you, because as you can see, they're very symmetrical images and it's very hard to get the symmetry right. Then we can color in the rangolis later, which is much more fun.

Also, when we talk about art, I strongly believe we should also give back to the community. So ignore the name, but I'm actually a part of Hope India, which is an NGO; in India, a lot of children don't have the opportunity to go to school or to be educated, and the Hope foundation contributes not only to education but also to sanitation, healthcare, and food. I'm currently working on a campaign with them called Coded Couture. I really wanted to show you guys the designs, but apparently that's not allowed until the campaign actually kicks off. I've used everything I've talked about in this talk: style transfer, creative coding, GANs. And I'd also like to say this is a really good example of augmented intelligence, which is where AI and humans work together rather than replace each other, because I generate the patterns and a friend of mine uses Photoshop to tweak and shape them into designs.

So why should this intersection of creativity, art, culture, and deep learning be explored further? Like I said, evolution: one of the reasons we're here, that we've evolved this far and even built civilization, is that we were able to be creative, we were able to think. And I honestly think that technology and AI allow us to make our ideas a reality. And like I said, there's a lot of research to be done in terms of new applications and architectures. You can experiment with architectures, maybe not only stick with conv nets, use something else, and apply them to solve socioeconomic problems. So, like the Perl talk from this morning said, there's more than one way to do it. And I'd really like to talk to you over coffee or discuss more ways to collaborate anytime. So thank you.