Hello, everyone. Thanks for coming to my talk, even though it wasn't in the app, so I guess it was a bit challenging for you to know what's going on here. As the title says, this wants to be a gentle introduction to data science, just to give an intuition. There are some formulas and some concepts, so I'm not sure how gentle it will actually be. The goal is not that you really understand everything here or all the concepts; it's more to get an intuition, first of where data science comes from, or where artificial intelligence comes from, since the two are quite related and it's my area, and second, just to see some applications, how it can be used. It's not really technical, but there are some parts that go a bit more into formulas and understanding some concepts. So, who am I? I'm currently working at a dating site as a data scientist. I have a master's in artificial intelligence, to give some context, because this is a data science talk but I'll also be talking about the human brain and things that are more considered artificial intelligence than data science. I'm also the NumFOCUS ambassador here at EuroPython. NumFOCUS is a foundation that sponsors many of the PyData projects; all the PyData projects, even the PyData brand, are a NumFOCUS thing, and NumFOCUS also organizes the PyData events. So you can find me most of the time at the NumFOCUS booth later on, if you are interested in any of the projects for doing data science in Python, or even not in Python; some of them are in other technologies like Julia or Stan. So, let's talk a bit about the human brain. I would like to do a kind of small experiment, just to try to make you visualize something, to really be able to show what I want to tell you. It's kind of a silly example; it will probably sound to you more like a meditation exercise than like artificial intelligence. What I want you to think is that your own eyes are kind of like two cameras. Let's imagine that they are like 20 by 20 pixel cameras.
And this image on the screen is just the number five. There is nothing magic about it; it's not anything that fancy. It's from a famous dataset named MNIST, which is quite commonly used as a benchmark in artificial intelligence, in deep learning and other techniques. What I want you to do is just imagine that your eyes are cameras and that you perceive this image, this 20 by 20 pixels. So what arrives at your eyes, at your retinas, is just this 20 by 20 grid of values; let's say 0 for white and 1 for black. If you imagine this, then all that information goes into your brain. It arrives at the eyes, those spheres you can see there, and then there are these channels of neurons (we'll talk a bit about neurons in a second) that just propagate these 20 by 20 pixels. So imagine 20 by 20 neurons, just carrying this to the back of the brain, where the visual cortex is. The visual cortex is the part, as the name says, that is mostly focused on the understanding and recognition of images. So, back to the example: we've got this number 5, just 0s and 1s, so to speak, that get to your eyes and then, at the end, get to the visual cortex. Some years ago, I started to think about the brain this way; I think it's something quite obvious, but it's still worth mentioning. The human brain is just a network of neurons, and neurons are basically cells that transmit electrical information from one side to the other. The tricky part is that the whole human brain, the whole human mind, everything we know or we think we know, comes from these neurons just interacting among themselves at a massive scale, and all those interactions are what really make all our thoughts, all our perceptions, everything.
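The "eyes as cameras" idea is easy to sketch in code: an image is just a grid of numbers. This is a made-up 5-by-5 miniature digit for illustration, not the real 20-by-20 MNIST data the talk refers to:

```python
import numpy as np

# A tiny image as a matrix of numbers: 0 for white, 1 for black.
# This 5x5 grid very roughly sketches a "5"; it is invented for
# illustration, not taken from the MNIST dataset.
digit = np.array([
    [1, 1, 1, 1, 0],
    [1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 0, 0, 1, 0],
    [1, 1, 1, 0, 0],
])
print(digit.shape)  # (5, 5): 25 pixels, each either 0 or 1
```

This is exactly what "arrives at the retina" in the talk's analogy: nothing but a rectangle of 0s and 1s.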
It's probably not very intuitive when you think of a single neuron as having all that power, but when you combine thousands of them, it really is like that. It's like a switch: a switch is not intelligent at all, but a computer's microprocessor is just switches, just 0s and 1s activating electrical signals, and in the end what you can do with a computer is so powerful. It's exactly the same thing. So, Hubel and Wiesel were two researchers quite ahead of their time who did an experiment with this poor cat. The cat made an amazing contribution, but I don't think it was really happy about it. They basically connected some sensors to its brain, to the initial part of the visual cortex. As I was saying in my example, this information, the 0s and 1s perceived by your eyes as if they were a camera, goes to the visual cortex, and the first thing it reaches is the first layer of the visual cortex. What happens there is that the neurons, instead of being just a channel that propagates the information, start to mix with one another; they start to build these networks. So the idea was: OK, probably this cat's visual cortex is recognizing circles, like the one you can see in the experiment, probably recognizing squares, recognizing different shapes. So they had the cat looking at shapes for many hours, looking at circles, waiting for neurons to activate. And the funny part of the story, I don't know if it can be seen on the screen: they used one of those old projectors, with a transparent paper that had the shapes on it. The funny part is that the neurons of the cat, of the first layer of the cat's visual cortex, actually activated not because of the circle, not because of the triangle, but because of the edge, the border of this paper.
You can see this diagonal line over here; that is what really made the cat's neurons activate. So, the takeaway is that in the first layer of these neurons connecting with one another, what's going on is that they activate for very, very small patterns, so to speak: just edges at different rotations and so on. Another important experiment, by Donald Hebb, is the basis of Hebbian theory: these connections in the brain, besides the ones you of course already have when you are born, are also created just because they get activated. So, if you keep watching this edge we've seen before many, many times; this theory is a bit complex and I don't expect you to even read the slide, you have it in the slides. But the whole point is that if you keep getting an impulse, if you see this face, this face, this face, one time and another time, and once and once more, then at some point the connections between the neurons that activated the first time you saw it become stronger and stronger, and they are kind of hard-coding the information in the brain. So something that you've never seen will activate few neurons, and something that you see often will activate many more. That's something very interesting, because in the end memory, or learning, basically everything, starts from this: everything is based on seeing something, having it hard-coded because certain connections between the neurons are there, and then being able to reproduce it. So, that was a bit of the history of the human brain, what's in the human brain; now I'll go to the counterpart, how computers are kind of cloning all those things.
I'll start with what is just a linear regression; I assume most of you know linear regression. I get some input, let's say you tell me your age and your weight, and I want to predict your height. I multiply each of the values by a weight and then add them together. That's a linear regression, and if there is a correlation between these input features and the target, at some point you capture it. This is something quite simple, but if you think of it as a neuron, it's a way for every input to have a weight, so every input will have more or less importance. In the end you pack everything together into a single value, and what's very important here is that the activation function binarizes the result: the output is a zero or a one. This is almost a logistic regression, if you've ever heard about that in data science, and it's already quite powerful. How is it binarized? I won't talk much about that, but it's not usually done by simply mapping positive values to one and negative values to zero. What's usually used are these sigmoid functions, just as a reference, which do roughly the same thing, but you can take their derivative, and that is very key, because in optimization problems, when you want to build these networks and you need to find which weights are the good ones, being able to differentiate is how you really optimize. Now I'm going to talk about Hopfield networks; I'll cover this briefly. Hopfield networks are networks of these artificial neurons where everything is connected with everything. And they have nice properties; it's actually something super simple, nothing really complex.
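The artificial neuron just described, a weighted sum squashed by a sigmoid, fits in a few lines. This is a minimal sketch with made-up weights, not a trained model:

```python
import math

# A single artificial neuron: weighted sum of the inputs plus a bias,
# passed through a sigmoid that squashes any value into (0, 1).

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def neuron(inputs, weights, bias):
    total = sum(i * w for i, w in zip(inputs, weights)) + bias
    return sigmoid(total)

# Toy prediction from age and weight; these weights are invented for
# illustration, they have not been learned from any data.
out = neuron([30, 70], [0.02, 0.01], -1.0)
print(out)  # ~0.574, a value between 0 and 1
```

The sigmoid is what makes the whole thing trainable: unlike a hard 0/1 threshold, it is differentiable, so an optimizer can follow its gradient to find good weights.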
You can implement one of these in a single line of code, just a linear regression, and then you connect them among themselves. This has a property called associative memory. What it does is, as I said before with the number five: you keep showing images, and the network remembers the images in the weights, the same weights of the linear regression; it's learning those weights. In this example, the original image on the left is the image you showed to the network, and then at some point you show it an image that is similar to that one. The network kind of remembers, and it optimizes in a way that after several iterations what you get back is the original image. So it can be used, for example, to recover corrupted documents or corrupted images, because whatever you've shown to the network before will at some point come back to you. That's kind of the point. Of course you need a number of neurons proportional to the number of pixels to be able to store all this information, but if you show it all the digits, and then you take the weights and represent them in a matrix, you get something like this. It's kind of subtle, but you can see this is like the shape of the average digit: at every pixel you have the average of the digits that have values at that pixel. For example, on this side over here at the top right corner, you can see the numbers 7 and 5, which go in that direction, and if you look at the top left you can see the number 3, because the number 3 actually extends in that direction. This is actually kind of a trivial thing, but it really shows how the weights of the linear regressions I was talking about before are actually keeping information about these images, and are able to recover it later.
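The associative-memory behaviour can be sketched in a few lines. This is a minimal Hopfield network, a hypothetical example rather than the exact code behind the slides: one tiny ±1 "image" is stored in the weights with the Hebbian rule, and recall recovers it from a corrupted copy:

```python
import numpy as np

# Minimal Hopfield network: patterns are stored in the weight matrix via
# the Hebbian rule, and recall repeatedly updates the state until it
# settles on the closest stored pattern.

def train(patterns):
    """Hebbian learning: sum of outer products of the +-1 patterns."""
    n = patterns.shape[1]
    W = np.zeros((n, n))
    for p in patterns:
        W += np.outer(p, p)
    np.fill_diagonal(W, 0)  # no self-connections
    return W / len(patterns)

def recall(W, state, steps=10):
    """Update every neuron with the sign of its weighted input."""
    s = state.copy()
    for _ in range(steps):
        s = np.where(W @ s >= 0, 1, -1)
    return s

# Store one 6-"pixel" pattern, then recover it from a corrupted copy.
pattern = np.array([[1, -1, 1, -1, 1, -1]])
W = train(pattern)
noisy = np.array([1, -1, 1, -1, -1, -1])  # one pixel flipped
recovered = recall(W, noisy)
print(recovered)  # matches the stored pattern again
```

Scaled up to 20-by-20 images, this is exactly the "show it a corrupted five and get the original five back" behaviour from the slides.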
After Hopfield networks there are Boltzmann machines, which start to be the basis of deep learning, which I guess you've heard about. Boltzmann machines are the same, with the difference that in the previous example the number of neurons is exactly the number of pixels you are getting: if you are given 20 by 20 pixels, you have those neurons and that is exactly what you have to store the information. In Boltzmann machines you also have hidden neurons that don't have direct exposure to the input; they won't receive the initial perception, but they will have an indirect perception of it. And these are actually much more powerful, because they work as a generative model. A generative model is something similar to what we've seen before, but the idea is that you've got a model that learns some parameters, in this case the weights of the neurons. Think of a Gaussian: imagine that I want to generate data that comes from the distribution of the heights, or the ages, of the people in this room. At some point I can fit a Gaussian distribution and say, OK, the mean age in the room is 30, say, and the standard deviation is 5. Then I can start generating samples, and whatever samples I get would look like real samples, as if they were real people whose ages I knew; the same as if I got the data from people entering the room. What is quite important is that if you do this with images, these models, these networks, start to learn patterns. This is something named Deep Dream, I think; it's an experiment by Google where they showed images to one of these networks and then said: OK, now you tell me about the data. Give me data, as I was saying with the ages. Don't show me the data you have already seen; generate new data. And this is what they got.
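The generative-model intuition from the ages example can be written in a couple of lines; a toy sketch with a plain Gaussian (mean 30, standard deviation 5, as in the talk), not an actual Boltzmann machine:

```python
import random

# Toy generative model: assume the "ages in the room" follow a Gaussian
# with mean 30 and standard deviation 5, then generate brand-new samples
# from it. The fitted numbers are the talk's example values.
random.seed(42)
mean_age, std_age = 30, 5
generated_ages = [random.gauss(mean_age, std_age) for _ in range(10_000)]

# The generated data reproduces the statistics of the "real" distribution.
sample_mean = sum(generated_ages) / len(generated_ages)
print(round(sample_mean))  # close to 30
```

A Boltzmann machine does conceptually the same thing, only its learned parameters are the weights of the neurons instead of a mean and a standard deviation, so "sampling" from it produces new images rather than new ages.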
I'm not sure exactly what they used to train this network, but this is something generated by a computer that just saw some images. I guess it saw some animals, from what you see here. These capsules, I don't know exactly what they are; it looks like it also saw people, maybe a temple. I don't know exactly what it saw, but it can actually show it again based on the images it has seen. Again, just to emphasize the example: if you tell me your ages and I assume the distribution, I can start saying numbers; I think a normal age would be, I don't know, 32, or 28. This is basically what it's doing, but with images, which I think is quite an interesting thing. The problem with artificial intelligence usually is that you find these models, they are super cool, they are quite amazing, but at some point you realize that they are NP-complete, which means a computer would take from the Big Bang to the present day to compute all this information and get the optimal solution. In practice, what's used are restricted Boltzmann machines, which don't have all the connections and lose some of the properties of the full Boltzmann machine, but they are an approximation, and a good approximation, and they can actually be trained; there are techniques to optimize and train them that work very well in practice. And in deep learning you don't train just one of these networks; what you do is stack several of them, one on top of another, so that each layer learns from the representation of the previous one.
A famous example of this is an experiment by Google. It was 40,000 CPUs, and they trained these kinds of neural networks with frames from YouTube videos, over a weekend, to see what internal representation these models were learning. And what they found was a cat in one of the neurons; that is the information it ended up encoding. And it probably did that because YouTube is full of cats. In the end, if you think about it, this is how the brain is trained: the neurons learn transformations based on the information we see. You can see the brain as a set of patterns that have been trained, like these shapes; if you have children, you may recognize this toy. At some point you have these edges, these triangles, these faces, and you are only able to recognize what you have seen before. Your brain has been trained to understand certain patterns, and only those patterns are the ones you can understand.
And that is exactly what your brain does in terms of recognition. This is a very interesting experiment. You take this painting by Van Gogh and you train a neural network with that information. What happens is that the internal representation of the model ends up being only about Van Gogh. Then you show it a photograph, and what it returns is the exact same picture, but painted with Van Gogh's patterns, because Van Gogh's patterns are all the network has. There is no way it can give you back anything else, because the network has never been trained for that. I think it's very, very interesting how this works. So far I just wanted to give you an intuition of what can be achieved, of the direction of all this artificial intelligence research and what the top trends are; now I want to talk more about the practical applications, what you would do as a data scientist in a job today. The first one is classification. I would say it's the most common one; it's something usually named supervised learning, where you have some labeled data. I work at a dating site, and we have a lot of problems with spammers, people trying to abuse the system, people trying to scam the users. We have to detect them and block them, block their accounts. At some point, as a data scientist, you have a dataset where many people have reported profiles and said this is a spammer, this is not a spammer, and you have certain information about them. Say that here the y-axis is the age of the profile and the x-axis is whatever you can imagine.
I don't know, say the time since the user registered. And you see there are patterns: the profiles you see, the data you see, fall more on one of the sides. So you can plot a line, a separator, and with it we can identify them. This applies to the classic example, which is email spam: when you go to your inbox and you see an email that has been classified as spam, the same idea is used. They take some features, they represent them, they plot a line or other kinds of separators, and at some point they are able to distinguish the good ones from the bad ones. This is also applied to cancer detection: for example, they analyze images of tumors and decide, based on these models, whether it is cancer or not, whether it is malignant or not. Or for loans: whether someone will pay their loan back. In these projects, which are the simplest way to start, I would say, what you need is someone who has labeled the data in some way. You know that certain users didn't pay, things like that, and what you do is replicate the same decisions that were made, and start replicating them in the same way. Regression you probably already know; it's something similar, but say I want to know the price of this building based on its features. You can plot all these dots and then fit a linear regression. We can also do this in robotics. I wanted to show a video to give you an idea; it's the state of the art in robotics. It's a different technique, reinforcement learning, which is a way for the robot to keep learning from experience: the robot tries the task, fails, and tries again, improving each time. One message about robotics: people always ask whether robots will control the world, which is one of the concerns about artificial intelligence. I wouldn't worry too much; the robots in these videos still keep falling over.
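The "plot a line that separates spammers from legitimate users" idea can be sketched with a tiny logistic regression trained by gradient descent. The features and labels below are invented for illustration; at the dating site the labels would come from user reports:

```python
import numpy as np

# Toy supervised classification: separate "spam" from "legitimate"
# profiles with a line, as in the talk. Hypothetical features:
# column 0 = profile age in days, column 1 = messages sent per day.
X = np.array([[1, 50], [2, 40], [30, 2], [45, 1], [3, 60], [60, 3]], float)
y = np.array([1, 1, 0, 0, 1, 0])  # 1 = spam, 0 = legitimate (labels)

# Logistic regression trained with plain gradient descent.
Xn = (X - X.mean(axis=0)) / X.std(axis=0)  # normalize each feature
w = np.zeros(X.shape[1])
b = 0.0
for _ in range(2000):
    p = 1 / (1 + np.exp(-(Xn @ w + b)))    # sigmoid predictions
    w -= 0.5 * Xn.T @ (p - y) / len(y)     # gradient step on the weights
    b -= 0.5 * (p - y).mean()              # gradient step on the bias

preds = (1 / (1 + np.exp(-(Xn @ w + b))) > 0.5).astype(int)
print(preds)  # matches y on this tiny, separable dataset
```

The learned weights and bias define exactly the separating line from the slide; any new profile falling on the "spam" side of it gets flagged.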
And maybe someday they will, who knows. So, clustering: finding patterns, associations in the data. The classic setup is similar to what I said before about classification, but in this case the point is that you don't know the labels. Say you know there are two different types of users, to keep talking about dating-site examples: users interested in one kind of relationship, and users interested in something else, in a different way. OK, we want to identify them, but we have no labels, because nobody registers and tells you which group they belong to. So you build models, and these models end up explaining the data through the distances between different groups: these points are very similar to each other, and very different from this other group, which is also internally similar but is a different group. That is, the patterns come out of the data itself. Network analysis is a very interesting topic. It has been used, for example, for some problems with diseases, like the flu. I think it was one of the flu variants that they started to detect with a network-connections technique, when the first cases started in each country, and they predicted it with great accuracy. Also, for the banking system: you look at the transactions between banks, and you can say that if this bank defaults, this other bank will default too, because they make a lot of transactions with each other. Image recognition is a very hard topic in many cases. In this case the idea is to distinguish; it's the Chihuahua and the muffin: there are muffins and there are dogs, and the recognition manages to identify which one is the dog and which one is the muffin, probably better than a person, because in some cases it gets really tricky. Also, autonomous cars are the state of the art; you probably know about them. They are already in the States, in several places, already driving autonomously. They have these cameras, these sensors.
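The clustering idea, grouping points without any labels, fits in a few lines of k-means. The two "user groups" below are synthetic blobs invented for illustration:

```python
import numpy as np

# Unsupervised clustering sketch: group points without labels, as in the
# talk's "two types of users" example. A hand-rolled k-means with k=2.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(20, 2))
group_b = rng.normal(loc=[5.0, 5.0], scale=0.5, size=(20, 2))
X = np.vstack([group_a, group_b])  # 40 unlabeled points

centers = np.vstack([X[0], X[-1]])  # initialize from two data points
for _ in range(10):
    # assign every point to its nearest center, then recompute centers
    labels = np.argmin(((X[:, None, :] - centers) ** 2).sum(axis=2), axis=1)
    centers = np.array([X[labels == k].mean(axis=0) for k in range(2)])

print(np.round(centers).tolist())  # two centers, near [0, 0] and [5, 5]
```

No one told the algorithm which point belongs to which group; the two clusters fall out of the distances alone, which is exactly the "the patterns come out of the data itself" point above.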
And they are really already able, even in cities, to stop when there is a pedestrian who wants to cross, and all that. Finally, just to briefly mention, as I was saying before, there are many projects in Python that you can use. The most famous ones you probably already know: pandas, and NumPy, which is like the internal representation of any of this software in the end. You have Jupyter, which you probably already know about too; it's a web application, like a Python terminal but web-based, and it's very useful in data science because you usually plot things. You also have PyMC3 for Bayesian probabilistic models, quite advanced ones. Gensim is a pretty cool one that is used for topic modelling, so you can really find the topics of conversations, what I was mentioning before: which topics people are talking about. So, that's it. Just a quick mention: we're thinking of organizing something related to data on Saturday. It will probably be partly a pandas sprint, for people who want to contribute to pandas, but if people are new to pandas or new to data science, we can also do a tutorial and things like that. So please get in touch with me if you are interested; I'll be around on Saturday in case anyone wants to contribute to these projects, or get a tutorial, or discuss all this. Yeah, that's it. Thank you very much. Yeah, we have a couple of minutes for questions. Do we have any? I guess no one really understood anything; I'm afraid it wasn't as gentle as I intended. We can maybe get you both in the field. Quick. Hi. I don't know if my question is really related; it's not so much about computers but about the other side, human brains. You said that when there is an image, or a pattern, that has already been seen, more neurons are activated than for a new image. So my question is: why do we feel much more tired when we learn new things?
My guess would be that if more neurons are activated, one should be more tired. Why is it the opposite? I didn't quite get the question; why does the number of neurons activated change? So my question is: you said, and I was expecting the opposite, that you get more tired, at least intellectually, when more neurons are activated. But I would guess, at the same time, that when I'm learning new things I get much more tired. So why is that not the case? Well, I think there are two different steps in this process, so to speak. One is about training. I would say that when you feel tired, it's because you are seeing something new: you go to your new day at a job, say, and of course you feel much more tired, because everything is new, so your neurons are actually in this training stage. All the connections need to be created; you have new connections, they get connected among themselves, and they get activated for the first time. So I would say it's related to that. And then, when you are doing something repetitive, the neurons you are using are already wired, so the information just flows. I'm not a neuroscientist; I think it's really a question for a neuroscientist, but yeah, I think that would probably be the reason. I have another question: do you use languages other than Python, like languages designed for machine learning, like Prolog or something? I don't use many languages other than Python, actually. I think everybody is kind of using C, because NumPy is mostly programmed in C. There is also Cython, which is quite the standard for all these things. So underneath pandas I would use something that is not Python, in terms of performance; but for coding, actually, I love Python, so I would never really like to have to use another language.
In some cases there is, for example, TensorFlow, Theano, all these things. I think Theano is partly written in Python, but TensorFlow is not; usually you have bindings, though. In the end, I think Python is very good for the interaction with the user, with the programmer; I think the most beautiful code you can read is in Python, more than in C or anything else. So, yeah, the trend is moving there. You have Stan, for example, for probabilistic programming. You have R, which is quite good and quite used in this world, but in my case I'm not using anything else, and I think you can really do a lot of data science with Python. That's great, thanks. That's all the time we have for this talk. So thanks again.