OK, I think we are live. Sorry for the delay, but this always happens. Welcome to a new webinar in our physics series. We are very happy, and we would like you to join our next webinars as well; please stay tuned to our YouTube channel, Twitter and Facebook to see our schedule, which is very nice. Today we are very happy to have Marcelo Ponce, who is joining us from the University of Toronto. A little bit about him: he got his PhD from RIT, the Rochester Institute of Technology; I believe that was in 2011. Before that, he got his master's in theoretical physics from the Universidad de la República in Uruguay, in 2008, and he did a bachelor's in physics in 2003 at the same university. Surprisingly enough, he started out in engineering, and maybe he will talk about why he started like that. Also, back in 1996, he was a programmer analyst. He is a world expert on many topics regarding high performance computing, and also numerical relativity and astrophysics; that's how I met him. Right now he is a computational scientist and HPC scientific analyst at the University of Toronto. After he finished his PhD, he went to the Perimeter Institute and the University of Guelph for some postdoc experience. And today he is going to talk about HPC in the physical sciences. So Marcelo, thank you very much. I'd like to remind everybody that they can type their questions on the YouTube channel or on Twitter, and then we will read them to Marcelo. So Marcelo, thanks for joining us.

Hi, Alejandro. Thank you very much, and good morning, everyone. Thank you for the kind introduction and for the invitation to give this webinar. As Alejandro was saying, my experience has gone through a lot of different aspects of physics, but one thing that most of the topics I have worked on have in common is computational physics, some degree of computation. What I'm planning for the webinar today is to try to show you a little bit of what the common denominator in my career path has been. So probably I should start by sharing my screen. I hope you are seeing my presentation by now. The title for today is the role of scientific computing in physics, and I added "and other sciences" because scientific computing, HPC, as we will see, is a bit of everything and almost everywhere.

So let's see. I have to warn you: I have a lot of slides, as you can see, and I don't think we have time to cover everything. The truth is that this presentation should probably be three or four presentations at once, so we will see how far we can go, and at some point we can ask if there is any particular interest in a specific topic that you would like me to cover. I'm going to be talking about high performance computing, and I'm going to be talking about the trends in AI, in artificial intelligence, and I'm going to argue that a clear distinction between these two fields — or fields that appear to be different at the beginning — is hard to draw in some cases. But let's start and see how it goes.

A little bit about myself. As you probably heard, my name is Marcelo Ponce. I work as a computational research scientist at SciNet; it's a mouthful of a name for a researcher doing basically the kind of things that you like to do. What is SciNet? SciNet is the supercomputer centre at the University of Toronto. So not only can we do research, but we can use the most powerful supercomputers available to researchers in Canada.
We are very proud to host two of the largest supercomputers in Canada. Actually, the newest one was deployed this past year, at the beginning of 2018, so it's less than a year old. It's a very powerful supercomputer; I will tell you a little bit more about it later on, but just to give you an idea, our machines have on the order of 30,000 to 60,000 cores, and the latest one is ranked around position 60 right now in the Top500 list, which is the chart that basically ranks the fastest supercomputers in the world.

A little bit about my research background and experience. My background is in theoretical physics and computational astrophysics, if you wish. Most of my work is in the field of numerical relativity: simulating the merger of black holes, multiple black holes, binary black holes, binary neutron stars, accretion disks. And I had the pleasure to work with Alejandro's supervisor, Nico Yunes, and another good friend of mine, Enrico Barausse, on departures from GR, on alternative theories of gravity, scalar-tensor theories of gravity. But again, all that work has been done through numerics, through computations. In addition to that, as Alejandro mentioned, my master's thesis was in quantum gravity: we implemented for the first time what is called the consistent discretization approach, applied to a cosmological model. It was quite a hard implementation, not so much because the equations were complicated, but because the algorithm itself is complicated — if you go and look at it, it's kind of involved. But it's a very interesting problem, because it tackles not only the numerical issues but also the conceptual issues that appear when you try to marry quantum mechanics and general relativity — in other words, quantum gravity. In addition to that, because I was young and full of energy at that time, I was also doing some complex-networks studies, basically studying the synchronization properties of a network that is modeled by its topology and the interactions at the nodes. And again, all this work was done mostly by simulating these interactions on computers and then trying to make sense of them. And finally, in the last couple of years, because at the same time that I do research here at the supercomputer centre I also teach a lot of courses for the university, I have been exposed to and in touch with students from other fields. That's one of the nice things, and something I always like to do is to have this kind of multidisciplinary approach and collaborations with other fields. So we started working on bioinformatics pipelines: we have been developing a bioinformatics pipeline to analyze data coming from chromatin immunoprecipitation sequencing of DNA.

So that's a little bit of my background. Let me show you a bit of the motivation and why one would be interested in considering research computing, if you haven't done so already, because nowadays it's almost everywhere. Just to back up a little bit and come back to what research computing — also known as computational science or scientific computing — is: basically, it's using a device like a computer to figure out numerical values or quantities in the pursuit of a particular interest, right? And this is very generic, and you can think, okay, this may apply to almost anything. And that's the thing. The one thing I want to differentiate here is the terms.
It can be a bit confusing: people can very easily mix up computational science with computer science. And that is a big difference we have with our friends from the computer science department, where those guys do very important work implementing things, but from a more theoretical side; it's more like computer engineering than doing science with a computer itself. So one of the things that we usually like to think about or analyze is, okay, how and why do people use computers? There are different cases, and it's hard, again, to differentiate between all of them because there are so many, but looking at the broad spectrum you can say, okay, people may use computers to deal with large data processing or data mining; to investigate the behavior of models that are too complex to deal with using pencil and paper; to try to understand experimental results using a theoretical model; or to find simpler models from more complex ones. Visualization is a big area of research as well, and also, when one wants to show results, as I'm going to do in a few seconds, visualization is another big thing. Of course, there are more.

The other interesting thing that is shifting the paradigm in the way we pursue science, at least from the computational side, is that research computing has come to be what is called the third leg of science. I like to point out this dichotomy between experiment and theory; in physics in particular this is very well known, right? Usually people on the theoretical side don't talk too much to the experimental side, unless you are forced to do so because you are in a big experiment like the LHC, or in a collaboration like LIGO, for instance. But this dichotomy, this kind of separation, is basically being merged, or fused, by computation. So we like to think of research computing and scientific computing as the third leg of science, this third category, which not only can be used by experiment and theory at the same time, but can also shed light on both and offer more resources to researchers. If you think about a simulation, for instance, the analysis of simulations looks similar to what we would call a well-controlled experiment. But the one thing I will argue — and I'm going to talk about this in a second — is that you need to have some skills unique to this particular new area in order to harvest the best of it.

One of those skills, obviously, is programming, because that's the way in which we communicate with computers. You usually need to do a bit of programming to basically write your simulations; depending on the problem that you are tackling, you may have to do a lot of programming. Very closely related to that is the selection of the language that you use: how do I tell my computer to do what I want it to do? Just to set the ground here, there are two types of approaches. And you will start to see that in my presentation we will start to merge things from physics and my own research with concepts that we usually have to deal with and have to know on the HPC side, the high performance computing side. So the first thing that we like to differentiate is the type of computer language that we use: we have compiled languages and interpreted languages.
People writing codes for very demanding simulations, to run very fast and take full advantage of computers like clusters and supercomputers, may end up using C, C++ and Fortran. These are low-level languages that allow you to take full advantage of the computer and get the most performance out of it. On the other hand, we have interpreted languages, which basically go line by line, or wait for you to input commands at the shell prompt: as soon as you hit enter, the interpreter reads your command and executes it. Examples of these are Python and others. The trade-off between one and the other is that an interpreted language is usually a higher-level language, so you don't need to construct everything from scratch, and it's easier to get used to dealing with them, but you don't get as much performance as with the compiled languages.

On top of the programming, which is the set of instructions that you give to the computer, we also need to design the algorithms — in other words, the techniques, strategies and methods that we want to use for implementing or solving the problem we want to tackle. And I'm going to be talking a little bit about them too. Let me start, and this is by no means a complete review, because, as I said, the number of techniques and algorithms in use nowadays is humongous, in the sense that there are too many and we cannot even come close to scratching the surface of all of them. But let me just review a few of the ones that are most used in the field of computational astrophysics.

The first thing that comes to mind is to differentiate between mesh-free and mesh-based algorithms. So what is the difference? Well, in the first case, when you have a mesh-based method, you basically need a grid: a system of coordinates laid down on your computational domain. You divide, you discretize, this domain, and then at those particular points you get coordinates and physical quantities like volumes, densities, temperatures, pressures, whatever it is that you're interested in. In a mesh-free method, again, you still need points to do the computations, but they are going to be scattered across the computational domain and there is no geometrical relationship between them. It may sound crazy at the beginning, because the question is, how do I keep track of those? So I'm going to show you an example of a very powerful technique — a very beautiful, elegant and simple mathematical technique — of how you can do this. Just to give you an idea of which are grid-based methods and which are mesh-free methods: finite differences, which you are probably all familiar with, finite elements, finite volumes, adaptive mesh refinement and others are examples of grid-based methods; spectral methods are a particular example of these. Mesh-free or grid-free methods are basically particle-based methods: N-body simulations and smoothed particle hydrodynamics, SPH for short, are cases of those. I'm going to be talking about SPH in a second. And of course you can have a combination of these, and I will show you an example of a result that we obtained by combining the two. So let me very briefly review and remind us all about the N-body particle method.
You're basically solving the classical gravitational N-body problem: the second derivative of each particle's position with respect to time equals a sum, over the other masses, of the differences of the positions divided by the distance squared — an inverse-square force along the separation direction. Now, we have N particles, so we have to repeat this summation for each of the particles, and of course we have initial conditions for them. So what is the straightforward algorithm for solving the N-body problem? Well, you calculate the net force on a given particle at a given time, you determine the new position of the particle at a somewhat advanced time, and there you go.

Now, just a historical note, if you wish: the first N-body simulation was done in 1941, way before computers were accessible in the way we have them today. It was done by Erik Holmberg at the Lund Observatory in Sweden, and I found it super cool, because you can see how people figured out ways to do things in a different manner at different times in history, right? What this guy did was, okay, I have light bulbs. The intensity of the light emitted by a bulb goes like one over r squared, so it basically has the same behavior as the gravitational field, one over r squared. So what you can do is put light bulbs on a table, representing the positions of the particles; the luminosity of these light bulbs then represents how the gravitational field decays with distance, and you can use photocells. Basically, they measured the luminosity at each light bulb, wrote that down on paper, did the computation, and then moved the light bulbs according to the new force that they computed. I think it's a remarkable, very early example of an algorithm, a method, implemented with the tools available at the time. Again, I know that nowadays you can probably do that on your cell phone — there are N-body simulators that run on cell phones — but I found it really inspiring as a way of thinking, of figuring out how to do things at very early times.
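Just to make that straightforward recipe concrete, here is a minimal sketch of a direct-summation N-body step in Python — a toy illustration with made-up variable names and a simple leapfrog update, not any of the production codes I will mention:

```python
import numpy as np

def accelerations(pos, mass, G=1.0, eps=1e-3):
    """Direct O(N^2) sum of the pairwise gravitational accelerations."""
    acc = np.zeros_like(pos)
    for i in range(len(mass)):
        d = pos - pos[i]                                  # separation vectors to all particles
        r3 = (np.sum(d**2, axis=1) + eps**2) ** 1.5       # softened |r|^3
        r3[i] = np.inf                                    # skip the self-interaction
        acc[i] = G * np.sum(mass[:, None] * d / r3[:, None], axis=0)
    return acc

def step(pos, vel, mass, dt):
    """Advance all particles by one kick-drift-kick (leapfrog) time step."""
    vel = vel + 0.5 * dt * accelerations(pos, mass)
    pos = pos + dt * vel
    vel = vel + 0.5 * dt * accelerations(pos, mass)
    return pos, vel
```

You would just call step repeatedly from some initial positions and velocities; the whole cost is in the double loop over particle pairs, which is exactly what the fancier methods try to avoid.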
So that was a quick review of the N-body method. Let me show you a technique that I particularly fell in love with when I was doing my PhD: it's called smoothed particle hydrodynamics, SPH. The main idea is that you replace the continuum by discrete moving elements — points, or particles, however you want to call them. It's very well known and used in computational fluid dynamics; the idea is that you can use it for simulating fluids, and it has been used in many different fields: astrophysics, ballistics, volcanology, oceanography, engineering, you name it. It's basically everywhere. As for its features, it is a mesh-free Lagrangian method. You should remember that for fluids there are two ways of describing the motion: the Eulerian description and the Lagrangian description. One basically says, okay, I'm in the lab frame; the other is a frame comoving with the fluid. That's the main difference, and in this case the Lagrangian method means basically a frame comoving with the fluid. One of the most beautiful features of this technique is that the resolution — meaning how fine a control of the results I'm getting — can easily be adjusted with respect to a variable such as the density. And there's a lot of math behind this method that is very nice and very easy to follow.

So if you have students who are starting out on this and want to get results very quickly with a fairly straightforward procedure, I very much recommend taking a look at SPH. The other thing is that there are different frameworks, different libraries; you can even run SPH codes nowadays in Python if you wish. So the learning curve is not steep at all; it's quite shallow, I would say. This is, in a nutshell, how SPH works: it divides the fluid into a set of discrete elements, which we usually call particles. These particles are finite in size, and the spatial scale that basically describes them is called the smoothing length; all the properties are smoothed by a kernel function. That's where the math starts to come into play. You can think of this as a kind of atomized, or separated, view of the fluid. I don't want to show you too much of the math, but I do want to show you some of the results we obtained by using this method.

What you're seeing in this slide — and I hope the movie plays nicely in the transmission — in the top row, is a study of the fate of a gravitationally recoiling black hole surrounded by an accretion disk. One thing that people have shown in numerical relativity simulations is that when you have two black holes that merge, they emit, of course, gravitational waves, which we have been listening to and observing for a couple of years now. But also, depending on the spins and on other configurations of the black holes, the remnant black hole may receive a kick velocity, a recoil velocity, that can go up to 1,000 kilometers per second. So the question that we had in mind when we started this project was, okay, how can we understand, how can we see, the fate of a recoiled black hole that is surrounded by an accretion disk? What we did was, okay, let's simulate that: let's assume that there was a merger of black holes and that the remnant black hole has a gravitational recoil — it is kicked, in layman's terms — and ask what the fate of a thin disk surrounding this remnant black hole will be. On the left side, the two left panels represent the fate of the accretion disk when the kick angle is 15 degrees with respect to the vertical axis of the disk, and on the right you see the face-on and edge-on views of the disk when the kick is at 60 degrees. I'm going to play the movies for you; one of the nice things is that you can create very nice visualizations. You can see in this case that material is accreted and the black hole remains in the center — let's play it again, because it's a short movie. There is a kind of shock wave escaping, the material is depleted in the external parts, and more accretion happens at the center. If you look at the edge-on view, there are these beautiful patterns, with material also being expelled or carried away. For the 60-degree angle you will see that the pattern is quite a bit different. What we also did in this paper was estimate the luminosity, so in principle, if a case like this happens, you should be able — if you have the capability of observing in the infrared — to detect it and to differentiate the angle of the recoil of the black hole with respect to the disk.
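All of this, by the way, is SPH under the hood. Just to give a flavour of the kernel-smoothing idea without all the math, here is a minimal one-dimensional sketch of the SPH density estimate, rho_i = sum_j m_j W(|x_i - x_j|, h), using a simple Gaussian kernel — a toy illustration only; real codes use compact-support spline kernels and neighbour lists:

```python
import numpy as np

def kernel(r, h):
    """Toy 1D Gaussian smoothing kernel W(r, h), normalized to integrate to 1."""
    return np.exp(-(r / h) ** 2) / (h * np.sqrt(np.pi))

def sph_density(x, mass, h):
    """SPH density estimate at each particle: rho_i = sum_j m_j W(|x_i - x_j|, h)."""
    rho = np.zeros_like(x)
    for i in range(len(x)):
        r = np.abs(x - x[i])              # distances from particle i to every particle
        rho[i] = np.sum(mass * kernel(r, h))
    return rho

# toy usage: equal-mass particles spread uniformly should give a roughly constant density
x = np.linspace(0.0, 1.0, 200)
m = np.full_like(x, 1.0 / len(x))
print(sph_density(x, m, h=0.05)[95:105])
```

The same kernel-weighted sums give you smoothed estimates of pressure gradients and other fluid quantities, which is where the elegance of the method comes from.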
Now, the figure I had at the bottom: I like this project very much because in this paper we were a bunch of people, and what we did was combine several techniques. We started by simulating how a neutron star is disrupted by a black hole. So there is a neutron star orbiting a black hole; what happens is that the neutron star is disrupted very dramatically by the black hole, and that first stage was modeled using a spectral code called SpEC, the Spectral Einstein Code. Then we imported the material — basically how the star was disrupted by the black hole: the positions, density and composition of the star — onto a grid, and then we put that information into our SPH code. So this is a late-time evolution of that; what you see is one of the final stages, and you can see, in the center of this kind of cloudy shape, isocontours representing whatever is left of the bound material of the star. In the center is the black hole. The different colors here represent the density of the star, and then the arrows — this is one of the nice things about this visualization — represent the velocity of the debris that is basically escaping away from the black hole. One thing that we were able to show here is that the velocity field is homogeneous as you go away from the black hole; this had been theorized for a while, but this is one of the first times that we could actually visualize it. There was a third technique that we added to this simulation, basically a nuclear reaction network: a code that can take the temperatures, composition and density of the fluid and work out what kind of chemical elements are produced. So this simulation was also used to estimate the abundances of the different elements created from the composition of the star.

I told you at the beginning that my PhD thesis was in computational astrophysics, so I want to show you a couple more examples. The one on the left is a simulation of the merger of multiple black holes. What we were trying to study at that point was whether there was a possible configuration where we could generate a sort of toroidal singularity. We didn't find evidence of that, but it was a fun project to implement. And then on the right you see kind of the same study I was showing you before, with the kicked black holes and SPH; but at some point it will switch — let me actually go there. So these are our simulations of binary neutron stars where, in addition to the fluid, we have electromagnetic fields. One of the interesting things when you do this kind of study is that you can look at the gravitational-wave imprint and pattern, but also tie it, link it, to the electromagnetic signatures. And that's something that LIGO, in collaboration with other astronomical observatories, has done in the last year. So we were able to simulate these things and provide interesting information to help determine what the alignment of the dipoles inside the neutron stars is.

All of this is to say that if you are in the field of numerical relativity, for instance, you will have to deal with things like the ones I was talking about before; of course, you will probably be developing your own codes, and you will be dealing with a lot of math just to figure out which way is better to actually describe the problem. This may sound trivial or simple, but let me tell you, in numerical relativity in particular the field was stuck for more than 10, maybe 15 years, because the formulation that people were using to solve the equations was not the appropriate one.
It was mathematically ill-posed, if you wish. So that's a very important point: sometimes you need to hold your horses and ask, okay, am I writing my problem in the computer in the right way for it to be solved? And there are a lot of tiny, nasty details that also show up when you put your problem into a computer. In addition to that, if you are doing neutron stars, or any kind of matter, things get complicated, because you need to add a fluid, and if you want to deal with electromagnetism you need to add some sort of prescription for how to handle the electromagnetic side. The good news is that if you want to do numerical relativity nowadays, there are several frameworks that are more or less ready to use right away. Actually, they are ready; it's just that if you want to do bleeding-edge research you will need to do some tweaking and implementation of your own. But as for the infrastructure, you don't need to rewrite the Einstein equations — the full Einstein equations with all the variables and connections and transformations; there are frameworks that are already available for you. I listed some of them here, with hyperlinks, so if you're interested you can just look them up. The Einstein Toolkit, previously known as Cactus, is one of them; the spectral code I was talking about before, SpEC, is another one; and then there is another project called Whisky, a European one, that basically allows you to deal with fluids and magnetic fields in general relativity.

So let me move forward here. I think I'm going to switch gears a little bit now — well, not switch gears completely, but take a different approach and start to talk a little bit about high performance computing. Why do we need high performance computing? Why do we need parallel programming in particular? What are the limits of parallel programming — this is very important — and what are the computer architectures where you run these things? Because one message I want to transmit to you today is that when you do this kind of research, you need to be aware of the machine that you are using, aware of the limitations of these machines, and of which approaches you may want to take.

So let's talk a little bit about HPC. Why is it necessary? Well, there are different arguments, if you wish. You may be dealing with big data, with experiments like the LHC at CERN. A few weeks ago — sorry, days ago — there was a breakthrough in one of the Canadian experiments, CHIME, a radio telescope looking at the sky. These guys actually host a lot of their data at our site, and they have a very interesting approach to how they handle the acquisition of radio bursts from the sky: they use something called FPGAs. I won't be talking about them today, but it's a very cool device — basically hardware that can be reprogrammed depending on what you want to do. So big experiments, where you need to digest and process a lot of data, are one of the niches for HPC. Big science: obviously we all want to do more all the time. We don't only want to simulate black holes; we want to simulate black holes in the presence of an accretion disk, and binary neutron stars, the formation of galaxies, core-collapse supernovae, you name it. You always want to be doing cutting-edge research, and that implies in many cases that you need to push the boundaries of what kind of computations you can do.
New science is another case: things that couldn't be done before. The kind of simulations I was doing in my PhD for sure couldn't have been done 10 years earlier, or even five years earlier; as computers changed, many of these things became reachable. The thing to take into consideration, though — because this is all nice and good, but there's always a catch — is something we have been noticing in particular: if you are kind of geeky and you follow computers a little bit, you will notice that the clock speed of processors has not changed much; I mean, processors have advanced, but the clock speed hasn't. Bigger and faster memories have also been lagging compared to 10 years ago. The point here is that there was a time — I'm talking about probably 10 or 15 years ago — when you could just wait one more year and there would be a new computer that was faster, twice the speed. That is not the case anymore, and you will see how we can actually get around this. In particular, more computing resources here means more cores running concurrently, and we're going to talk about concurrency. If you think about your laptops, they probably have two or more CPU cores; even your cell phone usually has seven or eight cores. So that's why we need parallel computing. And you will see that parallel computing is not free; it doesn't come automatically with everything. You need to find your way around it.

So why actually spend time doing parallel computing? Well, you will get results faster. There is a limit on how fast one computer can process things; if you have two computers, in the best case you can process at twice the speed — but that is not always the case, and I will show you mathematically how you can prove that it's not. You can also tackle bigger problems, because now you have twice the capacity: twice the memory, twice the disk, whatever. You can do more. We want to do the same thing that was done on one computer, but on thousands of computers. Remember, our supercomputer here has 60,000 cores, so imagine these 60,000 cores doing a computation for you. If you ask them, okay, solve this equation, but you don't tell them how to split up the work, they will all be solving the same equation at the same time. So one thing that we have to change in order to take full advantage of these supercomputers is the algorithms. One of the messages we always like to be clear about is that the algorithm, the program, that works well in your lab, on your one-core machine, is not necessarily the best and most optimal algorithm or program to run on a supercomputer. And there are some messages associated with that that we will get to by the end.

If you are also geeky, you may have heard about Moore's Law. So what is Moore's Law? Well, Moore's Law is a quantitative trend that the engineer Gordon Moore noticed decades ago: roughly every couple of years, the number of transistors in an integrated circuit doubles. And this is a plot that shows exactly that: it shows the number of transistors, starting with thousands in the early 70s and reaching the order of 10 billion by 2015. The plot is not up to date, but you can see the trend very clearly — it's a log scale, by the way — and this trend holds. So, wait a second: I told you just seconds ago that our computers are not getting faster. If you bought a laptop two years ago and you buy a laptop today, you will see that the CPU clock is still roughly two-point-something gigahertz.
It hasn't been getting faster than that. So what's the problem? The number of transistors keeps increasing according to Moore's Law, but our computers aren't getting faster. Moore's Law basically describes the historical trend in computer hardware, and that's all good, but what Moore's Law doesn't promise — and never promised — is an increasing clock speed. We have gotten more transistors, but it's getting hard to push the clock speed up. Why? It's basically thermodynamics: the power density that you are packing into the integrated circuit simply cannot be dissipated. I would love to invite you to our supercomputer centre: the most impressive thing there, of course, is the supercomputer, but the second most impressive — it depends who you ask; maybe it's the first — is all the chilling infrastructure, the cooling infrastructure, that we have to have to cool the thing down. We have air flowing, we have water circulating in the rear doors of the racks, and we have humongous pipes with water circulating all over the place just to cool down the system. So that's the main reason why designers and engineers cannot keep increasing the speed of computers: they get too hot, they melt down. What is the solution? Well, instead of increasing the speed, we put more cores working in parallel on the job. In engineering or layman's terms, as you wish, this is called the divide-and-conquer approach. This is another way to represent Moore's Law, the trend in transistors and in the number of cores, but you can also see the stagnation in the power and in the clock speeds in the different curves.

So let me come back to the idea of concurrency. Concurrency is nothing else than having all these cores doing something at the same time. It's like having a factory where you now have several workers, but you want the workers to communicate among themselves so they can assemble the product more efficiently, take less time, and do it nicely. So we need to find the parts of your program that can be done independently, separate them, and give them to our workers, the different cores. Ideally, the order of execution on these cores won't matter, but when you have data dependencies — and that is most of the cases — things start to get complicated. Let me show you a simple view of this. Let's say that each of the units here represents a core, if you wish, one to four, and suppose that we have some data to process, so each of the cores processes its chunk of data and at the end you get an answer. If this is a case where you can basically split the data across your four processors and it can be done independently, so there is no communication whatsoever, then this is what we usually call an embarrassingly parallel problem, because you basically give a chunk of your data to each core, wait until they're done, and you're done. And that's easy: you basically scale your problem by the number of cores, which is what is called linear scaling — it's awesome. Now, how do we measure how much we improve by running this way? Well, we have a concept, a definition, called throughput, represented by the letter H: it's the number of tasks divided by the total time that the job takes. If you do this sequentially, as you would with one core on one computer, you have four of these orange units, the circles, each representing the work done by one of these cores.
Each of them is also called a task, and at the end you are done. So the total time here is the number of tasks N times T1, where T1 is the individual time for each task, and your throughput H is, sorry, one over T1. Now, if you split this up and do it in parallel, your total time is N times T1 divided by P, because you have P of these cores working at the same time, and your throughput is multiplied by a factor of P. P is the number of processors or cores, so you increase H by a factor equal to the number of cores. That's one way to measure how much faster we can do things if we do them in parallel.

Now let's talk about scaling. Given a problem, the question is, how can I measure how much faster I can do it with P processors? That is usually called strong scaling. In our previous case the scaling was linear, as I was telling you, and you can see that in this plot: the tasks per unit time are proportional to P, so as I increase P, the throughput H increases linearly. Again, this is the case for embarrassingly parallel problems. What happens in reality? Oh, sorry, before that: another way to measure this is what we call the speedup. Instead of looking at the throughput, you divide the time for running the problem serially, with just one core, by the time for running it with P processors. And again, for embarrassingly parallel applications the scaling is proportional to the number of processors, so it yields a linear speedup, okay? But remember, this is the ideal case, the case where there is no communication. I can't have the 60,000 cores of my supercomputer running completely independently of each other, not needing to know anything about the computation sitting on the core next to them; reality is more complicated than that.

In most cases, this is how your program will look: at the very beginning you have the partitioning of the data, so you take your original data and divide it across your processors; then each of the processors — here they are one to four — does its computation; and at the end there is what is called a reduction, the combination of the results, and your final answer is there, okay? So in reality this is what these things look like: you have what is called a parallel overhead at the beginning, for splitting the data across the processors; you have a parallel region where, if you are lucky enough, there is again no communication between the processors and it is embarrassingly parallel; and then you have a serial portion at the end where you recombine the data, okay? So let's rework our scaling equation, the serial time divided by the time running in parallel. I won't go through the math, it's very simple, but you can show that the equation can be rewritten in terms of the serial fraction: the serial time divided by the total time of the process. And the most interesting thing you can show is that if you rewrite this equation slightly and take P, the number of processors, to infinity, the serial-fraction term does not go to zero, which means that your scaling is never going to be perfect. And keep in mind that this is a case that is pretty much ideal — remember, there is no communication here, which is not the case in the real world; if you have AMR, for instance, there is communication all the time through the boundaries of the domains.
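This relation is Amdahl's law. Writing it out: if a fraction f of the work is serial, the speedup with P processors is S(P) = 1 / (f + (1 - f)/P), which tends to 1/f as P goes to infinity. Here is a tiny sketch just to evaluate it, with toy numbers purely for illustration:

```python
def amdahl_speedup(p, serial_fraction):
    """Amdahl's law: speedup with p processors when a fraction of the work is serial."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / p)

# even with only 5% serial work, the speedup saturates near 1/0.05 = 20
for p in (1, 10, 100, 1000, 60000):
    print(p, round(amdahl_speedup(p, 0.05), 2))
```

No matter how many cores you throw at it, that 5% serial portion caps the speedup at about 20.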
But even in this simple case, you can show that your speedup is limited no matter how many processors you use. That's one of the take-home messages of parallel programming.

Let me very quickly talk a little bit about hardware, because, as I told you, if you want to take full advantage of supercomputers you need to be familiar with the hardware: the way in which these computers organize memory and process things is a bit different. The types of hardware that you will find in supercomputing centres usually fall into these categories. Clusters, or distributed-memory machines, are the most common ones: you basically take a bunch of computers and connect them through a network, a very high-speed network — relatively easy and relatively cheap. Then you have multicore, shared-memory machines; these are machines that share memory among different processors, so they all see the same pool of memory. The issue with these is the limited number of cores, and they are much more expensive than the previous ones. The other ones that are very much in use lately are accelerator machines, using GPUs — graphics processing units — or accelerators like the Xeon Phi; they are quite fast, but they are quite complicated to program. And finally, vector machines were the very early supercomputers; nowadays all processors do vectorization, which means trying to do several operations at the same time, but at the very beginning they were kind of the bleeding edge. Most supercomputers are actually a hybrid, a combination of these, and the most common case is a cluster of multicore machines.

So, very quickly: I told you that for clusters you just grab a bunch of computers, normal computers, connect them with a network — usually a specialized network — and you are in business. In this case, each processor communicates with the others, but each processor has its own memory — that's the red block here. So if you need to transmit or communicate information, you basically need to do it yourself, and the implementation that we use for that is the Message Passing Interface, MPI for short. That's how you handle it. The shared-memory approach is different: the red square is now bigger, and all the cores have access to the same red square, basically the same memory, so there is no need for explicit communication, because communication actually happens through the memory. There are other issues, other kinds of problems that you can have, but the implementation is easier; one of the implementations that you use for programming these machines is OpenMP. Finally, hybrid architectures are a combination of these: each block here is now one of these shared-memory machines, a node with its big memory pool, and then all the nodes are connected together. In the end you use OpenMP plus MPI for programming these computers. On top of that, you can have a hybrid where each of the units is a set of cores plus an accelerator, which could be a GPU or another kind of accelerator, okay? If that is the case for your supercomputer, then you may not only use MPI and OpenMP, you may also need to use CUDA or OpenCL, which are different languages targeting these accelerators in particular.
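Just to make the distributed-memory picture concrete, here is a minimal sketch of splitting work across MPI ranks using mpi4py from Python — assuming an MPI library and the mpi4py package are installed; a toy example with made-up numbers, not production code:

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()            # which worker am I?
size = comm.Get_size()            # how many workers in total?

# each rank works on its own chunk of the data (the embarrassingly parallel part)
n_total = 1_000_000
chunk = n_total // size
local = np.arange(rank * chunk, (rank + 1) * chunk, dtype=np.float64)
local_sum = np.sum(local ** 2)

# reduction: combine the partial results on rank 0
total = comm.reduce(local_sum, op=MPI.SUM, root=0)
if rank == 0:
    print("sum of squares:", total)
```

You would launch it with something like mpirun -np 4 python script.py; the partitioning at the top and the reduce at the end are exactly the partition-and-reduction pattern from the slides.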
So, programming approaches — let me see if I can go through them quickly. It's basically what I mentioned: if you have an embarrassingly parallel application, you can basically live with the code that you have; if you have a shared-memory machine, OpenMP is your target; distributed memory means MPI; and for computing on graphics cards, GPUs, it's CUDA, OpenACC or OpenCL. And of course there are combinations of them.

Let me backtrack a little bit and make the connection between research and science and the hardware side. We notice, at least on our systems, that people run for different reasons: astrophysics, as in the examples I have been showing you; fluid dynamics and ocean modelling, and I'm going to tell you a little bit more about that; high-energy particle physics — and I think we have one of our moderators from that field — where you may need to process the data coming from the LHC (by the way, we host some of that data here; I think we are a Tier 2 or Tier 3 LHC data centre), lattice QCD investigations, all those kinds of things fall under that category. Condensed matter, quantum chemistry and materials science basically try to solve mean-field approximations using codes; soft condensed matter and chemical — sorry, biophysics — use a lot of molecular dynamics or Monte Carlo simulations; engineering deals with a lot of optimization problems. And finally, bioinformatics has vast quantities of genomic data that need to be processed, with problems of its own. But this is more or less the niche where we can see the role of HPC in current research.

Let me tell you about two study cases, two early-science cases, that ran on our supercomputer, Niagara, which was deployed, as I told you, almost a year ago, and a bit about the specs that it has. These are kind of unique to this supercomputer, which was designed in this way. I told you we basically have a cluster where the computers are connected through a network; well, we have a very, very high-end network called Dragonfly+, which has adaptive routing. Basically it means that there is hardware controlling the connections between the nodes, and what it tries to do is figure out the best path, the least busy path, for sending messages between the cores — so there is some smartness on top of the topology of the network connections. The total machine has a performance of 4.6 petaflops, meaning peta floating-point operations per second. That's 60,000 cores in 1,500 nodes, basically 40 cores per node, with hyperthreading enabled, so you can even reach 80 logical cores per node. It has a parallel file system: the nodes don't have local disks, but there is a big shared storage system accessible from all the nodes. On top of that we have something called a burst buffer — I don't remember exactly the capacity — which is an array of solid-state devices, drives that are faster than the usual hard drives, so it can be used as temporary storage or when you need to write a lot of data quickly.

We had two examples from the early science program that we ran last year — and we had other attempts, one of which I was part of, that weren't successful; that happens in science — but let me tell you about the success stories. We were awarded the Best Use of HPC in Physical Sciences award by HPCwire at Supercomputing last year, which is the most important conference on supercomputing, and that was for implementing a high-spatial-resolution model of the Pacific Ocean to study ocean wave movement and assist global warming calculations.
You can read more about that at the link. The second one was a simulation of core convection in massive stars, revealing the turbulent flows of the interior and the excited oscillations, done by Professor Edwards and collaborators. Those were what we call heroic calculations, because they were able to utilize the full cluster at once: the 60,000 cores computing in parallel and communicating with each other in a very harmonious and nice manner.

So, just to summarize this HPC part — and I think I'm running short on time, but let me see if I can switch gears now to AI. If you want to use these supercomputers, you need to learn parallel programming. You cannot take your code as it runs on your computer, just move it to the supercomputer, and expect to get the right scaling, the advantage of running on a supercomputer. It's quite the opposite: in most cases you will need not only to rewrite, but also to rethink a little bit the way you do things. Depending on the hardware, on the type of cluster, this is different, so one-size-fits-all does not work in supercomputing. And of course, the way you program, the selection of your language, also matters.

So let me very quickly — I think we have less than 10 minutes, maybe five — talk a bit about what machine learning is. The reason I want to do that is that we have been noticing more and more people interested in this field. So what is machine learning, first of all? Well, broadly speaking, machine learning is — I put "model fitting" there on the slide, but since we are just talking it's not written anywhere, although it's going to be recorded — let me use the phrase "statistics on steroids". Okay, that's my way of seeing machine learning. In some ways it's identical to data analysis, which may involve fitting curves to your data or determining parameters in an already established model, but it can also differ from data analysis: if you don't know the correct model, there are algorithms that can help you with that; or, as another case, you may use the model just to make predictions from the data without looking for any scientific insight based upon it. This can be particularly useful at the beginning of a research project, when you don't have much of an idea yet and you're basically doing exploratory data analysis.

Another thing that people usually differentiate in machine learning is supervised versus unsupervised learning. In supervised learning, basically you can think that your data comes with labels and you know what the right answer is. Curve fitting is one possibility, because you have the values and you can try to fit a model to them; prediction-type analysis, like decision trees or neural networks, is another. Unsupervised machine learning is when we are looking for patterns in the data but we don't know what they should look like. LIGO has a nice example of this: if you think about the way they search for gravitational waves, they have a basically unsupervised search, which triggers when all the detectors see a signal at the same time, and they have the matched-filtering search, where you have predefined models and you basically try to fit the data to those. So those are two examples, for instance. And of course there are semi-supervised methods, where you basically combine both of these. Let me tell you about an example of one of these methods, for instance classification.
Classification is in some ways similar to regression, because you can basically think that you fit a model to data with known answers and then use the model to make predictions about new data. But what if the labels are discrete? Well, you can still do that — these are categories now — and one example of this is logistic regression. Logistic regression is a case where you basically do regression, but into categories rather than continuous values. What kinds of problems are well suited for classification? Well, bioinformatics, for instance, classifying proteins according to their function; medical diagnosis is a big one; image processing and recognition of objects in images; handwritten text analysis; text categorization like spam filtering; sentiment analysis; language recognition; and so on. In these cases the variables, the data, can be continuous, discrete, or a combination of these.

What kinds of classification approaches or methods are there? One of the best known is the decision tree, where you basically take decisions depending on the features of the data. Logistic regression, which I mentioned, is like linear regression but with two categories — zero/one, binary. Naive Bayes is basically a statistical method based on Bayes' theorem. In k-NN, or k-nearest neighbours, you basically use the k nearest neighbours of a data point to predict the category of a new point. Support vector machines are essentially a linear model of the data. And then there are neural networks, which I will talk a little bit about if I have time. Clustering, on the other hand, is a type of unsupervised learning, basically because we try to group things without knowing in advance what the groups are. Applications of this can be finding patterns in the properties of galaxies, determining proteins with similar interactions, market segmentation — questions like "customers who buy eggs often also buy this" — or Netflix offering you a particular type of movie after you watch another. Other machine learning algorithms or topics can be: when we talk about classification algorithms, ensemble methods, of which the most well known is the random forest — and I think you can think of these as a kind of effective-field-theory approach, if you wish; dimensionality reduction, an example of which is PCA, principal component analysis; non-parametric regression, which is a kind of enhanced linear regression; and then variable selection, of which there are a lot of methods. I know I'm going fast here, but I want to reach this point and make some final comments.
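Before moving on to neural networks, here is a minimal sketch of the logistic-regression classification I just mentioned, using scikit-learn on made-up toy data — the numbers and variable names are purely for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# toy labeled data: one feature, two categories (0/1)
rng = np.random.default_rng(42)
x = rng.uniform(0.0, 10.0, size=200).reshape(-1, 1)
y = (x.ravel() + rng.normal(0.0, 1.0, size=200) > 5.0).astype(int)

model = LogisticRegression()
model.fit(x, y)                        # "training" = fitting the model to the labeled data

x_new = np.array([[2.0], [5.0], [8.0]])
print(model.predict(x_new))            # predicted categories for new points
print(model.predict_proba(x_new))      # class probabilities for each new point
```

The same fit/predict pattern applies to the other classifiers I listed (decision trees, naive Bayes, k-NN, SVMs); only the model behind it changes.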
So, neural networks. Basically, neural networks are inspired by the human brain, by the way in which neurons connect to each other. The nice thing is that these things can be trained: you can take data, use the data to train the network — basically fit the model — and then ask questions of it. Where are they used? Well, they are used in almost everything: image recognition, medical diagnosis, natural language processing, novelty detection, next-word prediction, text sentiment analysis, system control, and I will show you some particular astrophysical applications. One of the most well-known examples is handwritten digits, and the motivation is this: you can see these digits here; think that I had to program a computer to recognize each of them — how would I go about that? Well, if I design it by hand, I can tell the computer, okay, a nine should have a circle at the top and a vertical line, and a one is just a vertical line. But what if it is twisted, if it is rotated? The details become humongous. So instead of that, I can create a network and tell the computer, okay, this is a nine, this is a two, this is an eight — and you figure out the rest, okay? That's the main motivation.

Something that is common to all neural network approaches is that you need to split your data into three groups: training, testing and validation. This is quite important, because you will use the training data to basically set up your network, you will use the test data to see how well the network did, and the validation set is a kind of third, independent test. Very quickly, what does this unit look like? Well, a neuron is nothing else than a function — it's a fancy name for a function. It has different inputs, x1, x2, x3, and then an output, and this is what it looks like in math. The important thing is that each input has a weight, and there is one bias per neuron, so you can see it in vector form as a weight vector multiplied by the input vector, plus a bias. Let me skip this one. And this is what the network looks like: you have several of these neurons all connected together. There is an input layer where the data comes in, there are hidden layers where the data is processed, and then there is an output layer where the network basically offers the result. The important thing to consider is that each of these layers has vectors of parameters that are trainable, so we are coming back to this idea of regression models, of fitting data. What happens here is that you have a humongous hyperspace of trainable parameters, and that's basically how the neural network works: you feed it data, you train all those parameters, you optimize those values, and at the end you have a system into which you can put information that the network has not seen and it is able to produce a result that is good enough.
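Writing that down, a single neuron computes something like activation(w · x + b), and the network is just layers of these stacked together. Here is a minimal sketch of a tiny forward pass in plain numpy — toy sizes and random weights, just to show where the trainable parameters live; training would adjust them to fit labeled data:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# trainable parameters: one weight matrix and one bias vector per layer
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)   # input (3 features) -> hidden layer (4 neurons)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # hidden layer -> output (1 neuron)

def forward(x):
    """Each neuron computes activation(weights . inputs + bias)."""
    h = sigmoid(x @ W1 + b1)       # hidden-layer activations
    y = sigmoid(h @ W2 + b2)       # network output
    return y

x = np.array([0.5, -1.2, 3.0])     # one input sample with 3 features
print(forward(x))
```

All the "learning" consists of optimizing W1, b1, W2 and b2 over the training data — the huge hyperspace of parameters I mentioned.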
There are many more details about that, and I'm running out of time, so let me just tell you about the examples I have in mind. Gravitational-wave detection with deep learning is one of them. Another one is very nice because a couple of students who took our course ended up doing a project with dark matter halo catalogs: they created a convolutional neural network for simulating more dark matter halo catalogs. And the last one, which is a very interesting one, is the paper on neural ordinary differential equations; this is an approach that is proposed for solving ODEs. I won't talk about that here, but there are a lot of tools that you can use to implement neural networks. Let me skip deep learning and jump to my conclusions. I will skip this one; if you have questions, ask me about the future trends and I will be happy to answer in the question section. But I do want to mention this very quickly: one of the things I have been trying to convey in this presentation is that you need to know how to implement your codes on supercomputers. And this is a slide with resources. My first suggestion is to look into the supercomputers that you have close by; in the States there are a lot of them, in Europe as well, and we are in Canada. In addition to that — and this is how Alejandro and I met — try to go to summer schools. The IHPCSS, the International High Performance Computing Summer School, is very nice, and it's free.

It's a very nice opportunity to meet very interesting and nice people. There is also the Petascale Computing Institute, organized by Blue Waters in the States; it's online, so anyone in any part of the world can attend. We have our educational website, and there are different conferences that you can attend. But more importantly, I realize this is a Latin American webinar, so most of our audience must be in Latin America, I imagine, and I realize I don't have many resources about Latin America, even though I come from Uruguay. I don't really know if there are many resources, many systems, there, but what I can tell you is that if you have projects, if you have questions, if you want to do something, just contact us. We have the address research@scinet.utoronto.ca, or, if you have questions about training, courses@scinet.utoronto.ca, or Alejandro can share my email with you. I will be more than happy to talk with any of you. And I think I'm done. Sorry, I ran a little bit longer than expected, I imagine.

Okay, thank you very much. I know it was too long. No, no, no, it's okay; thank you very much for this nice webinar. Let me see if our coordinators have some questions. Yeah, I have a couple of them, as usual. Very nice, Marcelo, I liked the talk a lot, especially since all the content in it is going to be very useful for all the viewers of this webinar cycle. So I have a couple of questions. One is — I mean, you also mentioned it at the end — what would be the role of quantum computers? Because, I mean, there is nothing fully commercial yet, but there are some semi-commercial ones, I don't know how to put it...

No, no, actually — are you seeing my slides again now? Yes. Okay, so for quantum computing there are a couple of options, and actually some of them are commercial: D-Wave is commercial, Rigetti I think is commercial, and IBM — I don't remember if I put the link there; if not, I can send it around — has a version that is available online. You can sign up and there are two versions, with five qubits and, I think, 16 qubits, and they have some nice training material. Let me talk a little bit about the difference: there are basically two approaches to quantum computing right now. One is what we call annealing machines — D-Wave and Rigetti are annealing machines. Basically, what they do is solve minimization problems: think about spins oriented in different directions; you set up your system in a particular configuration and then you let the system relax, like an Ising model, if you wish. That is what Rigetti and D-Wave are solving. The IBM quantum computer is based on gates, on quantum gates; it's the generalization of logic gates like AND, NOT, OR, NAND and these kinds of things, so you can create your own quantum circuit. And that one is available for free, in particular for academia. I don't think I added the link there, but I can share it with you if you are interested. The interesting thing about quantum computing, coming from the perspective of a person who has been trained on classical computers, is that you basically need to give up the idea of discrete math. In classical computers you have, okay, a bit can be a zero or a one, right? It's basically related to the fact that a circuit can have electricity flowing or not, and that's where the one and the zero come from.
That's how we discretize everything in classical computers. On quantum computers, the qubit can be in any superposition between zero and one. And it's not only that it is continuous; it is probabilistic, so you need to associate a probability with each state. That's basically the machine. But coming back to your question, there are resources that are already available to use, in particular for academia and research. I don't know if I answered the question.

No, yeah, you did. But just one more thing: do you expect that, following the same trend as with normal computers, where the idea is to make clusters, the idea would also be to make clusters of quantum computers?

It's a really good question, and I don't really have an answer for that, because on the one hand we try to think about the quantum computer as an evolution of traditional computing, but if you think in terms of the machine itself, it's a completely different animal, so it's hard to see how they could be connected. I know there are initiatives; I think Amazon has one, and IBM also has an initiative to offer cloud services with quantum computers, which is kind of tricky, right? It doesn't necessarily mean creating clusters of quantum computers, but rather clusterized access to those machines. So I think we are at the very beginning of quantum computing, but it may happen at some point. I can't really answer that for sure.

Okay, just one short question, because it seems there are questions from YouTube. One small thing: at the end you mentioned how Niagara works, the cluster that you have there. Is it too hard or too difficult to apply for time on those clusters? Do you have to make a proposal that is then evaluated, yes or no?

Let me tell you how it works. It's very easy if you are at a Canadian institution or you have a Canadian collaborator. If you are in that situation, it's just an application, and it takes at most one week; usually we process it faster than that, a couple of days, and that's it.

Okay, it's similar to what happens sometimes here in Chile with the observatories, where people at a Chilean institution have more priority. It's exactly the same model, exactly the same model. Okay, thank you, I have another question, but I can ask it later. Sure, sure, and as I say, feel free to email me; Alejandro can share my email with all of you. Yeah, thank you.

Okay, I think we have time for two more questions. One is from Yassin Afnan. He's asking for a sort of recommendation: what languages should students learn?

Okay, that's a very good question, but let me answer it in a different manner. I would categorize it by the physics, because it depends on what kind of problem you're trying to solve. If you're doing numerical relativity, as I was doing, I will definitely say Fortran, C, C++. If you are doing data analysis, like heavy, hardcore data analysis, even for high-energy physics, probably Python, probably R; and there is a language that is catching up called Julia, which is very nice because it has the best of both worlds: it's a high-level language, but it can also be compiled. So I'm sorry not to give a specific answer, but it depends on what you want to do. If you want to solve ODEs or PDEs numerically, probably Fortran or C is your best target.
If you want to do some symbolic algebra, probably Python; Python is very versatile and has a ton of packages. Notice that I'm always restricting myself to open-source suggestions, and that's something I think is good to emphasize: try to use open-source tools as much as possible. Nowadays we have a community where open-source tools are basically at the level of any commercial software. We always have to argue with people coming from fields like engineering, where they use a lot of MATLAB or even Mathematica, but nowadays I think the tools coming from Python, especially SciPy, are at the same level.

Okay, thank you. There's another question from Thomas Bailoons. He's asking, is there any openly available resource for parallel programming, especially for shared-memory systems?

There are a couple of resources. There is one, I think we shared it at the international summer school. Oh, you mean resources for running in parallel? That's what I mean. Yeah, it looks like it. Again, it depends. Open to the public, I'm not exactly sure. I know that you can apply for Amazon Web Services, which has a cloud offering, and you may get a free account there. If you are talking about a particular experiment, of course, people from CERN have their grid infrastructure set up. So whether there is a free resource, I'm not exactly sure, to be honest. I can talk a little bit more about the specifics of each country, in particular Canada and the United States, which are the ones I know the most, and maybe a little bit about Europe, but not in general across the globe.

Okay. And I think I got a question by email. If it is possible, could you comment a little bit, just from scratch, on the difference between CPUs and GPUs? Like, when should one use one of those instead of the other? What are the main differences, in a nutshell, and when should people go for those architectures?

For sure, that's a very good question, and I already had the chance to touch on that in the presentation. They are quite different, actually. I will also make the slide available because it has a lot of links and movies, but basically what changes a lot is the throughput. So remember how we define the throughput. The CPU is basically just one processing unit doing operations; it has vectorization, so it can do more than just one operation at a time. But the GPU is a beast, the GPU is a beast doing operations in parallel, it's actually designed for that. If you think about it, the GPU is the hardware controlling our monitors, right? And our monitors have, I don't know, thousands of pixels by thousands of pixels, and that computation, deciding which color each pixel has, has to be done at the same time. So that's the kind of job that GPUs are good at. Now, having said that, because it feels and sounds like the GPU is the way to go, there are caveats. The GPU has very limited memory, so when you deal with GPUs you need to think about a couple of things, like, okay, will my problem fit in the memory of the GPU? Because again, GPUs were designed to deal with just pixels, basically integer numbers, right? Just a stream of numbers that comes boom, boom, boom; it basically paints the colors on the screen and that's it. So let me give you just a quick example of what I'm trying to say. People have been trying, for several years now, probably 10 years, to port numerical relativity codes to run on GPUs.
And there are some papers out there where they basically did a proof of concept, where they say, oh, we were able to rewrite the Einstein equations and run them on the GPU. It turns out that no one is using that approach, because it doesn't perform well for that kind of problem. So when people ask me, okay, is it better to use CPUs or GPUs, I will say, okay, what is your problem? If it is matrix operations, yeah, probably GPUs are the right approach. An example of this is neural networks and machine learning: most of the frameworks are listed in the slides, and most of them have a backend running on GPUs, because GPUs are very good at that kind of computation. Same thing with other accelerators, it's more or less the same idea; the main differences are how the memory is handled. The programming can be very nasty. Programming multiple cores, programming CPUs in parallel, can be complicated; it has a medium learning curve, I would say. Programming GPUs is a completely different beast; it has a steep learning curve. To take full advantage of the GPU you need to be very knowledgeable about CUDA, or about OpenCL if you want to go in that direction. OpenACC is starting to tackle that, and the latest version of OpenMP, OpenMP 4, has an offload to accelerators, so you can take advantage of the GPU that way, but the caveat there is that it's never going to be as efficient as programming pure CUDA on the GPU.

Okay, thank you very much, Marcelo. The slides will be available on our webpage in a couple of days, and we will post all of Marcelo's information if you need something. So thank you, and stay tuned for our next webinar, which is going to happen in two weeks. Bye-bye. Thank you, Marcelo, for this nice one. Thank you very much for having me.
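As a footnote to the CPU-versus-GPU discussion above, a minimal sketch of how a framework offloads a matrix operation to the GPU, assuming PyTorch is installed and, ideally, a CUDA-capable GPU is visible; the matrix size and timing are purely illustrative, not from the talk:

```python
import time
import torch

# Pick the GPU if one is visible, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# A dense matrix multiplication: the throughput-bound kind of work
# that GPUs are designed for and that dominates neural-network training.
a = torch.rand(4096, 4096, device=device)
b = torch.rand(4096, 4096, device=device)

start = time.time()
c = a @ b
if device.type == "cuda":
    torch.cuda.synchronize()  # wait for the asynchronous GPU kernel to finish
print(f"matmul on {device}: {time.time() - start:.3f} s")
```

The same code runs unchanged on the CPU if no GPU is found, which is how most frameworks expose the accelerator: the heavy linear algebra moves to the GPU, while caveats such as fitting the data into the GPU's limited memory remain the user's responsibility.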