All right, so my name is Daniel Mazer, and I'm an HPC analyst (HPC stands for High Performance Computing) at CLUMEQ, at the McGill University site. I'd like, of course, to thank Montreal Python for putting this all on and giving me a chance to talk. I'd also like to introduce my colleague Bart Oldeman, who's here; he might help me answer some questions at the end.

OK, so just by show of hands, how many people use some sort of HPC facility right now? OK, good, a handful of people. So I'm going to start with an introduction to what an HPC center is and what the McGill HPC center is; our machine is called Guillimin. This talk is sort of anthropological rather than computer science: I went and asked everyone on our system how they use Python, and I'm going to present a bunch of their examples. It's a summary of the benefits of using Python on an HPC cluster, and of why people choose it over other options.

CLUMEQ is the name of the consortium that employs me, and it's involved in several aspects of providing research support. The obvious one is technology: we provide machines for people to run code on. Less obvious are training, mentoring, dissemination, outreach, and strategic partnerships. We provide support for people who are developing code, who have errors coming up, and who want to be able to ask questions, and we run workshops for people who want to get involved in scientific computing and HPC.

These are pictures of the actual machines. On the left-hand side, all of our compute nodes are lined up in rows like this. We take up a total of 5,000 square feet of IT space at the École de technologie supérieure. We have a total of 1,200 worker nodes with 12 cores per node, so that's a total of 14,400 cores.
We have two petabytes of parallel storage; on the bottom left-hand side is our storage rack. The peak speed of the system is 130 teraflops. When we first launched the system a year and a half ago, that put us at number 34 worldwide, but we've since dropped to number 179, which shows you how fast things scale up. And within the top six systems in Canada are the top four academic sites.

This shows our usage history. It's pretty steadily high: I've marked our theoretical peak usage at the top there, and we run at about 90% of capacity most of the time. We try to be a very multi-use facility, supporting a very diverse range of computing projects from a very diverse range of research fields; the fields listed there are not at all meant to be an exhaustive list. We currently have a total of 1,300 users, representing more than 350 research groups, from Canada and also internationally. We also get involved in industrial partnerships with non-academic people who need access to HPC resources. And finally, we provide outreach and educational activities for HPC and scientific computing.

Here's a list of some of the services we offer researchers. We have a website full of information that you can check out if you're interested: support.clumeq.ca is where you can find documentation on how to sign up for an account, or, if you have an account already, how to start using the system. We also provide all sorts of support, not just limited to "oh, we have a problem with the cluster, we're getting this error message." You can also contact us if you need help developing a new project and want help planning it and running it efficiently on an HPC cluster like ours. So we can provide longer-term support like that in addition to helping you solve the individual problems that come up.
And we provide a lot of training sessions and workshops. I recently gave a workshop on GPU programming in CUDA C, and a couple of weeks before that we had an introduction to MPI given by Bart. We have more workshops coming up: this Friday we're giving one on OpenMP, which is unfortunately already filled up, but you're welcome to sign up for the waiting list and hope for the best. And we're going to give an introduction-to-Python workshop in June; the details aren't set yet, but you can check our website.

OK, so now I'll start talking about Python and how it gets used. We have several versions of Python set up in a directory that's available to everybody, so as soon as you get an account on our system, you have access to these three versions of Python. And these are the general scientific modules that we've set up: nose is used for unit testing; we've talked a lot about NumPy already; SciPy is a related set of scientific libraries; matplotlib is very useful for plotting; I'll talk a little bit about mpi4py and IPython later in the talk; and Cython we've talked about. All of those are already set up on our system, and of course we can add more versions and modules if there's enough demand. Users can also install things for themselves, so if you want to keep up with the latest and greatest versions, you can do that on your own.

OK, so this is my list of reasons why people use Python in HPC. It's very high level, and that lowers a lot of barriers to adoption; things like parallel computing can be intimidating to a lot of people, and having a high-level way to do them makes adoption a lot easier. It reduces time to solution: people at HPC centers don't just care about writing really fast code that runs in a very short amount of time, they care about their own time, too.
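Coming back to the module list for a second: nose, the unit-testing package in that stack, discovers any function whose name starts with "test_" in a module and runs it, treating an AssertionError as a failure. A minimal sketch of what such a test module might look like (the normalize helper here is invented purely for illustration):

```python
# test_normalize.py -- a minimal sketch of a nose-style test module.
# The normalize() function is a made-up example; nose itself simply
# discovers the test_* functions below and runs them.

def normalize(values):
    """Scale a list of numbers so they sum to 1.0."""
    total = float(sum(values))
    return [v / total for v in values]

def test_sums_to_one():
    # The normalized list should always sum to 1 (up to rounding).
    assert abs(sum(normalize([1.0, 3.0, 6.0])) - 1.0) < 1e-12

def test_preserves_ratios():
    # Each element is its share of the total.
    assert normalize([2.0, 3.0, 5.0]) == [0.2, 0.3, 0.5]

if __name__ == "__main__":
    test_sums_to_one()
    test_preserves_ratios()
    print("ok")
```

On the cluster you would typically run this with the nosetests command, which picks up both functions automatically.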
They're busy scientists, so they want to get things done quickly. The last speaker, I think, mentioned this great duct-tape quality of Python: it interfaces really well with the OS, with libraries, and with other software written in different languages. The modern scientific workflow is sort of characterized by combining a bunch of programs written by graduate students over the past several decades, in whatever languages were popular at the time, and sticking all that software together into a unified workflow. Python often serves that purpose. And then there's this great package Sage that ties together a whole bunch of open-source numerical software into a unified Python interface. All of this helps scientists avoid reinventing wheels; they can just use software that already exists.

Python is open source, so it's portable, free, transparent, and verifiable; the last two are particularly important for science, I think. And unlike something like MATLAB, which is also a high-level way to access parallel computing easily, Python can scale to arbitrary numbers of nodes without your having to pay a whole bunch of licensing fees. Python is interpreted, so you can do interactive data analysis and plotting, and there's the IPython package that lets you do interactive parallel computing quite easily. I see some heads nodding, so maybe some of you are already trying that out.

OK, I'll move on to some specific examples. First up, one group using our system is studying gene flow in animal populations. Gene flow measures how easily genes get transmitted between separated populations of animals or plants: if you put something like a mountain between two populations, that restricts how easily the genes can flow. And there's this piece of software called CircuitScape.
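As a preview of the model I'm about to describe: CircuitScape treats the landscape as a graph whose edges are electrical conductances, and the quantity it cares about is the effective resistance between populations. That calculation can be sketched in a few lines of NumPy (this is the textbook Laplacian formulation, not CircuitScape's actual code or API):

```python
import numpy as np

# Sketch of the resistor-network idea: landscape cells are graph nodes,
# and ease of movement between neighbouring cells is an edge conductance.
# The effective resistance between two nodes then measures how hard
# gene flow is between them.

def effective_resistance(conductance, a, b):
    """Effective resistance between nodes a and b of a resistor network.

    conductance: symmetric matrix; conductance[i][j] > 0 if nodes i and j
    are connected, 0 otherwise.
    """
    C = np.asarray(conductance, dtype=float)
    L = np.diag(C.sum(axis=1)) - C      # graph Laplacian
    Lp = np.linalg.pinv(L)              # pseudo-inverse (L is singular)
    e = np.zeros(len(C))
    e[a], e[b] = 1.0, -1.0
    return float(e @ Lp @ e)

# Three cells in a row with unit conductance between neighbours:
# two unit resistors in series, so the end-to-end resistance is 2.
chain = [[0, 1, 0],
         [1, 0, 1],
         [0, 1, 0]]
```

A real landscape just scales this up: thousands of grid cells, with conductances derived from habitat quality.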
And this makes a model of gene flow by taking the landscape these animals are in and modeling it as a network of electrical resistors, so you end up with a physics model of how gene flow works. And you can make pretty maps like this one. This map shows 11 different white-footed mouse populations in southern Quebec, labeled by the black dots, and you can see some of the geographical features emerging from the gene flow map: blue corresponds to where gene flow is really low, and red to where it's very high.

OK, the next example comes from climate change research. This particular group runs an ocean and ice model called MITgcm. It's written in Fortran, but it outputs its data in the NetCDF format, and you then have to go and analyze that output to extract the information you're interested in. That happens to be done with Python and NumPy: the group does all its statistical analysis on the data, runs diagnostics on the software as it runs, validates the data, and then makes graphs and plots. This is the type of plot they come up with: ice conditions at the North Pole for three different years, 1992, 2005, and 2035, where the "09" in each label means September. The left-hand side is sea ice concentration (that's all the red), and the right-hand side is sea ice thickness, and you can see the evolution as the ice melts and the planet gets warmer. I'm told the 1992 and 2005 results correspond quite nicely to what happened in reality, so maybe we can expect this is what 2035 will look like.

OK, searching for pulsars. This picture is of the Crab Nebula, I believe. At the center of this nebula is an old star that died in a supernova explosion.
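Jumping back to the climate workflow for a moment, the NumPy side of that post-processing follows a familiar pattern: load a (time, lat, lon) array and reduce it along axes. A rough sketch (in a real run the array would come out of the NetCDF file, for instance via the netCDF4 package; here a synthetic random array stands in, and all the variable names are invented):

```python
import numpy as np

# Sketch of post-processing model output (synthetic stand-in data;
# a real workflow would read the array from a NetCDF file instead).
# thickness has shape (time, lat, lon): monthly mean ice thickness in m.
rng = np.random.RandomState(0)
thickness = rng.uniform(0.0, 3.0, size=(12, 40, 80))

september = thickness[8]                   # month index 8 == September
mean_thickness = september.mean()          # basin-average thickness
thick_fraction = (september > 2.0).mean()  # fraction of cells over 2 m

print("September mean thickness: %.2f m" % mean_thickness)
print("Fraction of cells thicker than 2 m: %.2f" % thick_fraction)
```

The statistics, validation checks, and plot inputs the group produces are all reductions of this kind, just over many variables and many model years.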
So the supernova blew off the outer layers of the star, and what's left inside is a very dense core, made of the most extremely dense material in the universe, with the most extremely high magnetic fields in the universe, sitting in the middle of that stuff and rotating very quickly. It has a dipole magnetic field, like the Earth's, that's not aligned with its axis of rotation, and it emits radiation out in cones along that magnetic field. What it looks like to someone sitting on Earth is a blinking-lighthouse type of effect, every time the cone of radiation sweeps past. So when you're searching for pulsars, you're searching for little blinking lights out in the sky.

To search for these things, you start with a telescope somewhere that looks out at the sky, collects data, and uploads it to a server. You download that data and then do a job submission on our cluster, and a software package called PRESTO completes the steps shown in black: it removes radio-frequency interference from the data, searches for the little blinking sources, and runs a whole bunch of analysis on the pulsing sources, and from that analysis you try to identify likely pulsar candidates. This is very computationally intensive stuff; it requires a lot of cores on our computer. And everything is driven by Python: Python scripts download the data from the server, automate the job submission, and automate the launching of the software. PRESTO itself is written in C, I believe; then Python takes over again, uploads the results back to the server, and human beings look at it from there.

This plot shows all of the pulsars we know of; there are about 2,000 in the pulsar catalog right now.
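A quick aside on that Python glue before we look at the plot: the driver script for a pipeline like this mostly builds job scripts and hands them to the cluster scheduler. A rough sketch of the pattern (the scheduler directives, command names, and file names here are all invented examples, not the group's actual code):

```python
import subprocess

# Sketch of Python-driven job submission for a search pipeline.
# Scheduler options, the run_search command, and file names are
# hypothetical placeholders.

def make_job_script(data_file, ncores=12, walltime="12:00:00"):
    """Build a PBS-style job script that processes one data file."""
    return "\n".join([
        "#!/bin/bash",
        "#PBS -l nodes=1:ppn=%d" % ncores,
        "#PBS -l walltime=%s" % walltime,
        "#PBS -N search_%s" % data_file,
        "cd $PBS_O_WORKDIR",
        "run_search %s" % data_file,   # hypothetical pipeline command
        "",
    ])

def submit(script_text):
    """Pipe a job script to qsub and return the scheduler's reply."""
    proc = subprocess.Popen(["qsub"], stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE)
    out, _ = proc.communicate(script_text.encode())
    return out.decode().strip()

# Usage on the cluster (not run here):
#   for data_file in ["obs_0001.fits", "obs_0002.fits"]:
#       print(submit(make_job_script(data_file)))
```

Looping that over hundreds of downloaded observations is what turns a telescope's data dump into a queue full of search jobs.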
The horizontal axis here is the period of the pulsar (how fast it spins), and the vertical axis is the period derivative (how fast it slows down as it spins). The black dots are the known pulsars and where they fall on that plot, and the vertical lines are the new pulsars that have been discovered on our supercomputer in the last couple of years. There are different colors because two different surveys are happening here. One is the Green Bank North Celestial Cap survey; Green Bank is a telescope in West Virginia, I believe. They've found a total of 61 pulsars in their survey, and of those, Guillimin found 48. The PALFA survey uses the Arecibo telescope, famous for being in the movies GoldenEye and Contact. They've found 113 pulsars in their survey, of which 30 to 40 were found on our system.

Finally, there's a group solving parallel partial differential equations using, I believe, Python for everything. The PDEs are solved using NumPy routines, the solver is parallelized using mpi4py, and the visualizations are created with a package called Mayavi2. I believe this picture shows what's called an Ising model: lattice sites are coupled to their nearest neighbors, instabilities make them turn red or blue, and you can visualize it this way. If you have specific questions, I believe Dan is here today, so he'd be willing to answer them.

I'll just quickly run through a few other examples. There's the ATLAS high-energy physics experiment, famous for finding the Higgs boson, or a particle like it; they use Python for automating a lot of their tasks. There's a group using it for automated software management, installing and uninstalling the packages they need quickly. And there's a fairly widely used package called HTSeq for gene-sequence analysis, and there are people on our system using that.
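To give a flavor of that PDE example: the serial core of such a solver is a finite-difference update applied to a grid, and in the parallel version each MPI rank owns one slab of the grid and uses mpi4py to exchange boundary cells with its neighbors every step. Here is a serial sketch of one explicit step for the 1-D heat equation (my simplification; the halo exchange and the group's actual equations are omitted):

```python
# Serial sketch of one explicit finite-difference step for the 1-D heat
# equation u_t = alpha * u_xx.  In the mpi4py version, each rank would
# apply this update to its local slab after exchanging edge values
# with neighboring ranks.

def heat_step(u, alpha=0.1, dx=1.0, dt=1.0):
    """Return u advanced by one explicit Euler step (fixed boundaries)."""
    r = alpha * dt / (dx * dx)
    new = list(u)
    for i in range(1, len(u) - 1):
        new[i] = u[i] + r * (u[i - 1] - 2.0 * u[i] + u[i + 1])
    return new

# A hot spike in the middle of a cold rod diffuses outward:
u = [0.0, 0.0, 1.0, 0.0, 0.0]
u = heat_step(u)
```

The appeal for the group is that the update is plain NumPy-style array arithmetic in their real code, and mpi4py adds the distribution across nodes without leaving Python.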
There are a lot of people who use Python for interfacing with other bits of software written in other languages. And finally, as I mentioned before, there's the IPython package, so some people are doing interactive parallel computing, which is really useful for learning how to do parallel computing and for developing the algorithms you want to use in your research.

OK, so I'll wrap up now. To summarize, I gave a few very diverse examples showing that there are lots of ways to use an HPC center, and lots of ways to use Python on an HPC center, and all of these examples illustrate those main benefits of Python. OK, thank you.