Thank you for coming to my talk. First, I'd like to say that I work for the Max Planck Computing and Data Facility, MPCDF for short, which is the cross-institutional competence center of the Max Planck Society. We collaborate with scientists from the various Max Planck Institutes to support them from the high-performance-computing point of view, and in data science as well. The MPCDF is involved in the development of applications and algorithms for high-performance computing, and in designing and implementing solutions for data-intensive projects. So it not only operates state-of-the-art supercomputers, but also provides up-to-date infrastructure for data management and long-term archival.

In collaboration with one of these institutes, to be precise the Max-Planck-Institut für Eisenforschung in Düsseldorf, we are working in the so-called big-data-driven materials-science domain, and in particular I am supporting them in the emerging area of atom probe crystallography. Their goal is to reveal both the structure and the composition of crystals, and basically they asked us to perform Fourier transforms of large data sets. By large I mean something like billions of atoms inside a crystal. They plan to retrieve high-quality crystalline data and iteratively apply the Fourier transform, which is imperative if you are interested in zooming in on high-quality subvolumes, and this cycle of operations helps you to improve and reconstruct the parameters used for the atom probe tomography. This requires state-of-the-art data mining and visualization. The Fourier transform, or the scattering maps for nanostructures, can in principle be computed with the formula in the upper box, as long as the electron or atomic nuclear density is well defined on a grid fine enough to resolve the atomic positions R inside the crystal.
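The formula in the upper box is the direct discrete Fourier transform over the atomic positions. As an illustrative sketch only (the function and variable names here are my own, not PyNX's API, and the atom count is a toy number; the real computation runs on billions of atoms on GPUs), the direct calculation could look like this in NumPy:

```python
import numpy as np

def fhkl_direct(positions, f_atom, hkl):
    """Direct discrete Fourier transform of an atomic structure:
    F(hkl) = sum_j f_j * exp(2*pi*i * (h*x_j + k*y_j + l*z_j)).
    positions: (N, 3) atomic coordinates in units of the lattice parameter.
    f_atom:    (N,) scattering factors.
    hkl:       (M, 3) reciprocal-space points."""
    phase = 2j * np.pi * (hkl @ positions.T)       # (M, N) phase matrix
    return (np.exp(phase) * f_atom).sum(axis=1)    # sum over atoms -> (M,)

# Toy example: a 4x4x4 monatomic simple-cubic crystal, unit scattering factors.
n = 4
positions = np.stack(np.meshgrid(*[np.arange(n)] * 3),
                     axis=-1).reshape(-1, 3).astype(float)
f_atom = np.ones(len(positions))

hkl = np.array([[0.0, 0.0, 0.0],   # direct beam
                [1.0, 0.0, 0.0],   # Bragg peak: |F| = number of atoms
                [0.5, 0.0, 0.0]])  # between peaks: destructive interference
F = fhkl_direct(positions, f_atom, hkl)
```

The cost of this direct sum is O(N_atoms × N_hkl), which is exactly where the flop estimate in the talk comes from; the point is that this embarrassingly parallel sum maps very well onto GPUs.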
So you can solve the Fourier transform either by the well-known fast Fourier transform algorithm, which is pretty fast, scales like n log n and is well suited for large crystalline structures, or by direct calculation. The latter is a bit slower, but we prefer to follow this way because it is the most general case: you can compute the amplitude of the scattering maps starting from the atomic positions and scattering factors inside the crystal, for any structure model. It can be the ideal or the non-ideal case; the crystal can present some deformation, some tension, some strain, and that is why it is very useful to compute the discrete Fourier transform by direct calculation.

In terms of numbers, we usually deal with a lot of atoms, as I said billions of atoms; in this presentation I am showing results based on 10^8 atoms. The reciprocal (lattice) space, denoted by hkl, usually requires a very fine resolution, let's say 10^6 points, so in terms of floating-point operations you have 10^8 times 10^6 times 10, where the factor of 10 accounts for the algebraic operations involved in the Fourier transform, essentially the evaluation of the sine and cosine functions. So in general you end up with about 10^15 flop. Compared to the peak performance of modern architectures, about 10^13 floating-point operations per second, this algorithm seems well suited for GPU computation, and I am going to show you how it scales perfectly on multiple GPUs, with computation times of the order of minutes. And no worries, at MPCDF we are not cooking perfect blue crystals.

First, why GPU programming in combination with a high-level scripting language like Python? They are polar opposites. GPU parallel programming is highly parallel and very sensitive to the architecture, the hardware, the floating-point and memory throughput, in order to give you tremendously high performance when you address your scientific task. On the other
hand, Python favors ease of use. PyCUDA aims to join these two aspects, and PyOpenCL follows a similar philosophy and can be considered a sort of sister project; for the rest of the talk I will just refer to PyCUDA. And why Python? Well, not only is it easy, user friendly, easy to learn and general purpose, but it is also very valuable for the scientific community: there are a lot of packages that are very useful for addressing your scientific task, and this allows you to write your code in dozens of lines instead of hundreds or even more in other programming languages; in particular, it avoids reinventing the wheel every time. It also excels at displaying your data, since scientific visualization is an essential part of the scientific process. And NumPy, which is the foundational package for scientific computing, gives you a powerful N-dimensional array, broadcasting functions, optimized linear algebra and Fourier transform algorithms, and tools for integrating C, C++ and Fortran code.

A simple and useful program you can write in PyCUDA multiplies your 4 by 4 array by two, element-wise. Two things: you import pycuda.driver under the alias cuda, which gives you access to the driver-level CUDA interface, and then you import pycuda.autoinit, which automatically picks up an available GPU and runs on it. Then you simply define your 4 by 4 matrix, allocate memory on the device (by device I mean the GPU), and literally, host-to-device, transfer your NumPy array to the device. Now, in the red box, the most interesting part: you have purely CUDA C code, so essentially this code executes on the device, and it is called within and from the existing Python code. The same result can be obtained with much less effort using GPU arrays, since PyCUDA offers abstractions, namely the GPU equivalent of a NumPy array, in agreement with the edit-run-repeat style. PyCUDA has a twofold aim: it simplifies the usage of existing CUDA C, wrapping it so that you avoid reinventing the basics of GPU programming; and on top of this first
layer, PyCUDA offers abstractions. So PyCUDA gives you easy, complete and Pythonic access to the GPU, which guarantees automatic resource management and error checking, convenience in the sense that it provides abstractions, as I showed you before, and tight integration with NumPy arrays. And there is very good documentation; here I am just reporting some links where you can get more information about PyCUDA.

What are we using to analyze our crystals? We are using PyNX, which stands for Python tools for nanostructure crystallography. It is an open-source library, the author is Vincent Favre-Nicolin, and the code has been developed at the European Synchrotron Radiation Facility. I am mainly talking about the module in charge of computing X-ray scattering by taking advantage of graphical processing units; just to give a complete overview, I am simply enumerating the remaining modules, and in the rest of the discussion I focus on the scattering one. So the main aim of PyNX is, given a large ensemble of atoms, let's say billions of atoms, to compute the Fourier transform on 1, 2 or 3D coordinates in the reciprocal space with very fine resolution, using the performance of GPUs. The high performance is obtained by using either the NVIDIA toolkit, along with the PyCUDA library, or, as I said, OpenCL.

At MPCDF the default Python environment is provided by the Anaconda distribution. PyNX supports Python version 2.7 and above. It can simply be downloaded from the project website, where you can also create an account and become a developer, or you simply pip install pynx. It requires PyCUDA, of course, if you want to run on GPUs, and NumPy, plus matplotlib if you want to display data. If GPUs are not available, PyNX can run on CPUs as well, using pyFFTW. Optionally you can install an external library under your Python distribution, cctbx, which stands for Computational Crystallography Toolbox. What makes PyNX very valuable is that you can simply use the Python interface: you don't need to learn CUDA. I have just finished saying you don't need to learn CUDA, and now I am showing a CUDA piece
of code, but at least it is useful and nice to have a view of what's going on. Essentially, CUDA_fhkl is your device kernel; there is a device kernel for each module in the PyNX library. You index the array by combining threads and blocks, and for each block you allocate shared memory, because there threads are better at communication and synchronization than through global memory. You transfer your input data to the shared memory, so the threads have access to the same portion of the data, and, importantly, each thread computes a single reflection. You also have fast, optimized trigonometric functions. And this is just the continuation, to take care of the remaining atoms included in your data set.

And now, finally, to the Python interface. Our data providers give us files with the extension .pos, which are essentially made of four columns, where the fourth column is the mass-to-charge ratio, which helps to identify the different atomic species in your input data file. It is a good habit to convert the real nanometer units into fractional coordinates, which depend on the crystal you are exploring; then you define your reciprocal 3D space HKL and run the function Fhkl_thread to compute the Fourier transform. The name of the GPU cards available is passed on the command line, and you can choose whatever you have; for instance at MPCDF I am running on the Maxwell architecture, which is very performant. Essentially, what is computed is the formula in the box, the discrete Fourier transform distributed over several GPUs. What it returns is a tuple: a complex NumPy array fhkl, and also the computation time, which is nice sometimes, especially in the beginning when you want to perform some speed tests.

This is just to give an example of what you can get, or what you can display. For instance, on the left there is, in the HL plane, a contour plot of the scattering maps for a monatomic cubic structure. In this case you can also appreciate the fact that I am working with non-ideal structures: you can see a slight offset
along the vertical direction. On the right, it is just the computation of the complex refraction as a function of the scattering angle.

On the left is what I ran on our infrastructure, as I said the Maxwell architecture, the GTX 980. You vary the number of grid points in the reciprocal space, as well as the number of atoms; please compare with the plot reported in the seminal paper in which PyNX was introduced to the scientific community. Nowadays we can reach a throughput per GPU of the order of 4 × 10^11 reflections · atoms per second.

More benchmarks: please note that the vertical axis is on a logarithmic scale. This is the difference in computation time between PyNX on a GPU and on 64 logical CPUs. For instance, looking at the bar chart in the middle, using a resolution of 64^3 in the reciprocal space, you can do your computation in roughly 5 minutes, compared to 2 hours on the CPUs, so there is a factor of about 24 between the two. To give one more example, on the new generation of GPUs, in this case the Pascal architecture, between the Maxwell and the Pascal architecture you gain roughly half an hour in the most extreme resolution case.

This is how we deploy our data-science project: basically we submit our jobs on the supercomputing cluster, and in the box there is just a small script showing how to submit your job, since the cluster uses the Slurm workload manager.

Now you have raw data and you would like to convert it into a form which is viewable and understandable by humans. First we are going to use scikit-image, which is a collection of algorithms for image processing developed by the SciPy community and written in the Python language. In particular, I am using the marching cubes algorithm, which iterates across your data volume trying to find the regions with your isosurface value. So, in the end, what is the goal? You have a data volume, and from this 3D cube you want to extract a surface of equal value, an isosurface. It takes two input parameters: the data volume, called
pulse pack, which is essentially the scattering amplitude computed with PyNX. But to be more interactive with the visualization, it is convenient to use VTK, the Visualization Toolkit, which is open-source software for computer graphics, image processing and scientific visualization. It is a collection of C++ libraries, but it is also well wrapped in Python, for instance. So what we need to do is to convert our NumPy array, given by PyNX, into a VTK XML-based format, and I am going to use PyEVTK, a very easy-to-use Python package: it saves NumPy arrays straight into your VTK XML. Once you have this VTK XML file, you can process it using one of the most common applications, like VisIt, Mayavi or ParaView.

We are using ParaView as our main workhorse for 3D analysis. ParaView is open source, multi-platform and perfectly scalable, useful for visualizing huge amounts of data in 3D. It is scalable in the sense that you can run it from your notebook up to your cluster or distributed-memory supercomputers, and it has an intuitive user interface. When you are dealing with scientific visualization, you need data with a well-done spatial representation; the data types used in ParaView are meshes, and for our purposes we are using rectilinear grids, mostly uniform. This is just to say that ParaView is very popular in academic and also government institutions.

When you think about ParaView, you may just think about the small client application; in reality, ParaView is a stack of libraries, and VTK is the core, providing all the functionality for your visualization and volume rendering. Concerning Python, ParaView comes with pvpython, a nice application that allows you to automate your tasks and do your Python scripting for visualization. There are three basic parts, as in most applications: a menu bar with all the features included in ParaView; a toolbar with the most common features used for visualizing your data; and a pipeline browser, where a collection of pipeline objects is presented in indented syntax. You can look inside your
data, or your pipeline collection, and change parameters in the inspector or properties panel; and finally there is the 3D view.

You are probably wondering why I am talking about ParaView. If you look at this plot, which is an isosurface from my simulations done with PyNX: behind these plots there is Python scripting. You set up your camera and your parameters, change your parameters, and use all the filters you need; you apply contour plots, thresholds or whatever to visualize your data. This is just a collection of the filters you may want to apply; a collection of all these filters gives you pipeline objects. For my ordinary work I am using just three of them. For instance, on the right, I want to plot my data and then make an isosurface extraction by just looking at a range of values. I can also inspect the data by opening a spreadsheet browser, and you can make a query on your data: for example, I want to extract, from all my data, just certain cells or grid points. You also have this nice feature of clicking and dragging in your view to select the data you are interested in.

So, I am going to conclude with what we want to address: if you are interested in looking for some subvolumes in your Fourier space, let's call them spots, you want to identify these spots, measuring angles and distances, and iteratively pass them back to the data providers in order to improve the atom probe tomography reconstructions. ParaView and Python are very useful for addressing this task. Thank you for your kind attention.

[Audience] Hello, thank you for the talk. You said that you use the direct Fourier transform because your atoms are not equispaced; have you tried to use the non-uniform fast Fourier transform, which does not require you to have an equispaced grid?

[Speaker] No, I had a look at it, but in terms of performance, here you have the benefit of using GPUs. Can you run this NUFFT?
[Audience] It actually runs on top of the fast Fourier transform; there is a mathematical theorem that lets you go from the equispaced grid to a non-equispaced grid, and it is n log n, as fast as the fast Fourier transform, as far as I know.

[Audience] There is a lot of software used in crystallography that has been around for decades, and crystallographers were using computers that are mostly unknown today. Are you aware of any integration of the things that you do with software that has been used by crystallographers more traditionally?

[Speaker] Sorry. There is the integration of the PyNX tool with the external library cctbx, the Computational Crystallography Toolbox, for other scientific tasks, like computing grazing incidence, or using it in this, it is difficult to pronounce, ptychography technique, and so on. This is the only thing I am aware of in terms of functions.

[Moderator] Then let's thank the speaker again.