I find it amusing that we call the brain complicated. Maybe it is, maybe it is not; it is simply that we have not yet solved the mystery of this object, which is why we call it complicated. As technology grows and people get better at understanding the brain, that may change.

So, what will be covered in this talk? We will cover what Nipype is, an overview; what workflows are; what packages are in Nipype; what other dependencies Nipype requires; and how we create these things. We will look a bit at the core of how to work with Nipype.

Let us move on. This is the basic, high-level flow in neuroimaging. I did not mention it at the start, but we begin with an image: any kind of MRI will do — a structural MRI, a functional MRI, or even an EEG or EMG recording. Starting from that image, we try to work out what parameters we are working with. Are we studying something related to autism? To ALS? Some other disease, or a particular connectome? (I will explain what a connectome is a bit later.) After settling on the parameters, we collect the images for processing: we aggregate all the images we want to work on and process them together. Once the images are collected, we specify the goals. I have the images, I have my parameters — what do I need to do next? Process them? Colour them? Determine which connectomes connect to which? It could be a single goal or several. And once we have specified our imaging goals, we create new parameters for those goals.
Once we specify our imaging goals, we move on to creating new parameters. What I mean is this: once our goal is, say, to build a connectome for autism patients, or to build an imaging pipeline, we keep creating and refining parameters until we get our final output. That is the basic process behind neuroimaging.

The dilemma neuroscientists always face is that there are too many choices. As you can see, we have AFNI, FSL, ANTS, SPM, and more, and each does a separate job of its own; SPM, for example, works on functional MRI images, as does AFNI. So there are too many options — that is the situation for neuroscientists, and for neuroinformaticians for that matter. Another problem neuroscientists face is that the data keeps increasing, and there is a lot more of it to compute. How do we process it? How do we put it in a format that the end neuroscientist will understand, so that he does not have to peer through the blinds to see what is needed?

The final point emphasises the heading, big-data computing: the 1000 Functional Connectomes Project. Since I mentioned connectomes before, let me explain them now. Connectomes are essentially the neuronal paths connecting one neuron to another; you map those connections in a computational format that a programmer or a neuroscientist can understand. This Functional Connectomes Project is a project of one of the mentors I work under, and this is data aggregated from one of the papers he presented.
The project started in December 2009, and since then there has been an explosion in access to the data as well as in sharing of it: many more neuroscientists and neuroinformaticians are taking the data, sharing it with each other, and running their own analyses on it. As you can see, the graph rises steadily around December 2011, when results came in for ADHD, then dips, and peaks again at the end of September 2012 with ABIDE, the Autism Brain Imaging Data Exchange — a connectome project built entirely from scans of autism patients, later aggregated and stored inside FCP/INDI, the Functional Connectomes Project's data-sharing initiative.

This leads to a conflict between the neuroscientist and the developer. The neuroscientist thinks: which package should I use? He is not a programmer, at least not a full-fledged one. How do I use this package, and why do I need it? Why do I use SciPy? Why NumPy? What impact will it have on my project? The developer, for his part, thinks: why would I develop my project for a neuroscientist when I could easily build something else — for a corporate client, say, or as a freelancer? And if he does build something for a neuroscientist: how does he share it? How does he upload it online for other neuroscientists and developers to use? The final point is creating a package that can support multiple computer architectures. As we all know, there are many architectures, from low-end PCs to large supercomputers, and as data and processing power keep increasing, we need packages that support all of them uniformly.
Going on, this creates a massive conflict between the two camps, and it raises more questions like these. How do we train people — neuroscientists as well as new developers who want to work in neuroimaging and neuroscience — to use these packages and this software? How do we create tools for performing research that is reproducible by others? Reproducibility is a very key point here: whatever data you obtained from your research and whatever inferences you made, if another scientist decides to take the data and reproduce your experiments under the same conditions, they should arrive at the same output and the same inference, no matter what.

And how do we work with different packages, interfaces, and file formats? As we saw with the brain-imaging software, these tools are all different: they store data in different formats and each has its own interface. So when you work with packages that differ like this, how do you link them and make them work together without creating conflicts in the program itself? This is one of the questions most neuroscientists faced as part of this project, and it is where Nipype comes into the picture. It helps because it brings all the imaging and pipelining software together under a single wrapper.
So you do not have to work on SPM, FSL, or AFNI separately and then hit conflicts when linking them in a single program because their formats are entirely different: all of these packages are linked together under the name Nipype.

Now the question is: why use Python? Obviously a package like this could be written in C++ or other languages. One answer is that Python is very easy to learn; it is taught all over the world — in schools, high schools, and colleges — as one of the primary languages. It is also cross-platform, which is self-explanatory: the same Python script runs on Windows or on Linux, whatever your operating-system version is. It has an extensive infrastructure for scientific computing, supporting everything from small laptops and PCs to supercomputers. And more institutions are adopting it. The institution I work under, INCF — the International Neuroinformatics Coordinating Facility — uses Python for almost all of its work; there are some projects in C++, but most are in Python. There are other organisations too, like the Nathan Kline Institute, and the Poldrack Lab, which I am currently working under — a lab at Stanford University. All of these labs and organisations use Python for their projects. Finally, you can conduct data analysis using R (through RPy), Octave, and Scilab.
What this means is that you can import an Octave package, R, Scilab, MATLAB — whatever you need for your project — and after you are done with the neuroimaging work you can run data analysis on it: filter certain results, modify them, and make them presentable to neuroscientists and neuroinformaticians. That is my next point: it is easily understandable by neuroscientists. Since Python is a scripting language and one of the easiest to read, even a student who starts reading some Python commands will, within a day or two, understand how Python works and runs — what these functions are about, what this class is, how they run together.

The next point is that neuroscientists can easily create their own packages according to their requirements. This links back to my very first point, ease of learning: because it is easy to learn, you can program your own packages for your own needs. If I want a package that runs all the pipelines on a parallel-computing architecture, a neuroscientist can easily create one; or if I want a package that links to MATLAB, then masks and filters the whole image and improves its quality, that can be done easily.

That brings us to the next slide: what is Nipype? This is the basic architecture of Nipype. It consists of three parts: interfaces, execution plugins, and the workflow engine. The next slides explain each of these. The first slide is the engine, which actually has two parts. First of all is the Node: a Node wraps a single function.
Once a function is wrapped in a Node, you can link it to other nodes to run a whole program — that is a workflow. You have one function, you have another function, you wrap each in a node, and you link them together. A Workflow is a graph whose nodes are of type Node or MapNode. You can have many nodes linked together: a MATLAB script linked with a Python script, an SPM program linked to a Python script — whatever the case, you can do it all in a workflow.

Then we have the execution plugins. These are the backends we use to run the program at our convenience: Torque, IPython, Linear, SSH, and so on. We use IPython so we can share whatever program we have written through a notebook; the Linear plugin runs a workflow serially; MultiProc runs a single script across multiple processes and threads; SSH lets us link machines together; and there are many other plugins like these.

Then we come to the installation of Nipype. It is readily available as one of the packages in NeuroDebian (again made with INCF involvement), on the Python Package Index, or you can just fork it from GitHub and run your own deployment. The current version of Nipype is, I think, 0.11 — I am not sure. We need a few dependencies before installing Nipype on our computer: NumPy, IPython, NiBabel, and one I forgot to add on the slide, NetworkX. After you have installed all of these, you have to ensure that all the external tools are installed and accessible.
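To make the Node/Workflow idea concrete, here is a toy, standard-library-only sketch of what a workflow engine does — wrap functions in nodes, connect them into a directed acyclic graph, and execute them in dependency order. This is not Nipype's actual API; every name here is illustrative only.

```python
# Toy sketch of the idea behind Nipype's engine: a Node wraps a function,
# a Workflow is a DAG of nodes executed in topological order.
# This is NOT Nipype's real API -- the names here are illustrative only.
from graphlib import TopologicalSorter

class Node:
    def __init__(self, func, name):
        self.func, self.name, self.result = func, name, None

class Workflow:
    def __init__(self, name):
        self.name = name
        self.edges = {}  # maps node -> set of predecessor nodes

    def connect(self, src, dst):
        self.edges.setdefault(dst, set()).add(src)
        self.edges.setdefault(src, set())

    def run(self, initial):
        # static_order() yields each node only after all its predecessors
        order = TopologicalSorter(self.edges).static_order()
        data = initial
        for node in order:      # each node's output feeds the next node
            data = node.func(data)
            node.result = data
        return data

realign = Node(lambda img: img + "->realigned", "realign")
smooth = Node(lambda img: img + "->smoothed", "smooth")

wf = Workflow("preproc")
wf.connect(realign, smooth)
print(wf.run("fmri.nii"))       # fmri.nii->realigned->smoothed
```

In real Nipype the same shape appears as `pe.Node`, `pe.Workflow`, and `workflow.connect`, except that nodes wrap interfaces to external tools rather than plain Python lambdas.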
This means checking the versions of whatever tools you have installed and their exact installation paths, so that you can link them to your script later on. Once you have wired those installation paths together, it should never happen that you write a path into a program or a script and the tool cannot be found.

One very important point: Nipype is an umbrella project under NiPy, and it is not a substitute for imaging packages like ANTS, FSL, or AFNI. It is a sub-project under NiPy, and in no way is it meant to replace a package like ANTS or FSL; you still need those packages to run your tool or your script.

These are some of the software packages used with NiPy. One of them is Camino, used for diffusion MRI. Nilearn — a package made by Gaël Varoquaux, again one of my mentors — is a machine-learning package for neuroimaging. SPM is for analysing the brain activity recorded in your experiments. Then we have MNE for magnetoencephalography and electroencephalography, and AFNI for processing and analysing functional MRI images. These are just a few of the packages; there are more, and not all of the tools in NiPy are necessarily kept up to date. Some are even deprecated, though developers keep working on them even now.

Now we move on to some code for working with Nipype. Our example is this: we take a functional MRI scan — from any species, a monkey, a human, a rat, anything. Then we realign the MRI scan.
Realignment means this: in any MRI scan there will be some head bobbing and some shaking in the equipment, so we realign the volumes to correct for that motion. Coregistration means we take our MRI scan and link it to an anatomical scan — a basic reference anatomical scan of the kind available anywhere online. Normalisation fits the brain images to a template: every organism's brain is a different size — rat, human, monkey, macaque, whichever organism you pick — so we have to fit the fMRI images, the output of coregistration, to that template size. And then smoothing improves the final quality; it is essentially a spatial filter applied in the process.

The first step of every Nipype program is to create a workflow of what we want to do. A workflow is a pipeline to process data: a directed acyclic graph that represents the data flow, whose nodes are processes or functions. Defining the input and output of each processing node is a must: we define an input, we define an output, and the edges in between link the processes and functions together, directed in an acyclic graph. (I actually drew this as a flow chart because the acyclic graph was not rendering properly — I am sorry for that.) Let me explain what the start and end here mean. The start is the fMRI scan in question — from any organism, once again. We realign it to the target image, coregister it to the anatomical scan, normalise it to the brain template, smooth and filter it, and then we get the final output. This will be a first-level fMRI analysis.
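Assuming Nipype and SPM are available, the preprocessing steps just described can be sketched as a workflow like the following. This is a sketch, not the talk's exact script: the import is guarded so it degrades gracefully without Nipype, the field names follow Nipype's SPM interfaces and may need adjusting to your data, and a coregistration node would be wired in between in exactly the same way.

```python
# Sketch of the preprocessing pipeline described above (realign, then
# normalize to a template, then smooth), assuming Nipype and SPM are
# installed. A coregistration node would be connected the same way.
try:
    import nipype.pipeline.engine as pe
    from nipype.interfaces import spm
    HAVE_NIPYPE = True
except ImportError:
    HAVE_NIPYPE = False

# The processing order described in the talk.
STEPS = ["realign", "coregister", "normalize", "smooth"]

if HAVE_NIPYPE:
    realign = pe.Node(interface=spm.Realign(), name="realign")
    normalize = pe.Node(interface=spm.Normalize(), name="normalize")
    smooth = pe.Node(interface=spm.Smooth(), name="smooth")

    wf = pe.Workflow(name="preproc")
    # The mean image drives the normalization estimate; the realigned
    # volumes are what actually get resampled and then smoothed.
    wf.connect(realign, "mean_image", normalize, "source")
    wf.connect(realign, "realigned_files", normalize, "apply_to_files")
    wf.connect(normalize, "normalized_files", smooth, "in_files")
```

Each `connect` call wires one node's named output field to another node's named input field, which is exactly the "defining the input and output of each node is a must" rule above.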
We first import the necessary modules for our program — as you can see, it is a lot of import statements. Look at the second line here, `from nipype.interfaces.afni import Despike`: Despike is one of the interfaces, used to remove the spikes in your fMRI images. The FreeSurfer imports are for converting your MRI image, binarizing it, applying volume transformations, and so on. And the final import statement, `from nipype.pipeline.engine import Workflow, Node, MapNode`, enables us to create our own workflows, create our own nodes, link nodes together, and map over them.

Our next step is to set the script locations for MATLAB and FreeSurfer. We import the necessary class, `MatlabCommand`, and set its default path — in this case SPM12 — and give the computer the options we require. We also have the location of our FreeSurfer data, in this case `nipype/nipype-tutorial/freesurfer`, and set that as the default for the FS command.

Then we define all the parameters we require. The output directory goes here; the working directory is here; the number of fMRI slices is 40 — you could set it to 20 or 100, but as you increase the number of slices the computing load on the machine obviously increases. The same goes for the repetition time (TR): increasing it increases the computational load. And there is the FWHM size, which tells the computer the width of the smoothing kernel to apply. After this we create the nodes — the functions, which are again processes, for preprocessing the input fMRI image.
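The setup just described looks roughly like the sketch below. The SPM and FreeSurfer paths are hypothetical placeholders for wherever those tools live on your machine, the TR value is an assumed example (the talk does not give one), and the import is guarded so the sketch degrades gracefully when Nipype is not installed.

```python
# Sketch of the configuration step described above. The paths and the TR
# value below are hypothetical placeholders; adjust them to your setup.
try:
    from nipype.interfaces.matlab import MatlabCommand
    from nipype.interfaces.freesurfer import FSCommand
    HAVE_NIPYPE = True
except ImportError:
    HAVE_NIPYPE = False

# Experiment parameters from the talk; tune them to your own data.
output_dir = "output"        # where final results are written
working_dir = "workingdir"   # scratch space for intermediate files
num_slices = 40              # more slices -> more computation
TR = 2.0                     # repetition time in seconds (assumed value)
fwhm = 6                     # smoothing-kernel width in mm, not output size

if HAVE_NIPYPE:
    # Point Nipype's MATLAB wrapper at SPM12 and set the default
    # FreeSurfer subjects directory, as in the talk's setup code.
    MatlabCommand.set_default_paths("/usr/local/spm12")            # placeholder
    FSCommand.set_default_subjects_dir("nipype-tutorial/freesurfer")  # placeholder
```

Setting these defaults once up front is what lets every later node that shells out to MATLAB or FreeSurfer find the tools without repeating the paths.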
In this case we first give an interleaved slice order. Our fMRI image is acquired as a number of slices, and with an interleaved acquisition the slices are collected alternately rather than sequentially, so we apply slice-timing correction to account for the differences in acquisition time between slices. The second node is Realign, which corrects for the motion that occurred during the MRI scanning process. Then Smooth, to smooth the images with a given kernel. I have actually put the last two lines in comments because they work a bit differently, and I will explain them now. What we do in smoothing is run it with several kernel widths. The last line, `surface_fwhm = [4, 6, 8]`, means we have to run that subgraph several times: run the fMRI image with a 4 mm kernel, run it with 6 mm, and run it with 8 mm — three runs, which can execute in parallel or serially. Then `bbregister` coregisters a volume to the FreeSurfer anatomical surface: we have an anatomical surface, we have the output image from smoothing, and we register one to the other.

Finally we connect all of these nodes together. In the last line, `normalize = pe.Node(interface=spm.Normalize(), name="normalize")`, we take the Node class from the engine, wrap the Normalize interface in it, and name the node "normalize". This is the connectivity of workflows: we first name our workflow — it could be anything, first-level analysis, second-level analysis, FSL analysis — and then we connect the nodes together, realign to bbregister and so on. You can put any number of connections into `workflow.connect`.
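The "run the subgraph several times" behaviour described above is what Nipype calls iterables: assign a list of values to a node input and the engine expands the graph into one branch per value. A guarded sketch, assuming Nipype and SPM are installed (field names follow the SPM interfaces and may need adjusting):

```python
# Sketch of running the smoothing subgraph once per kernel width via
# Nipype iterables, assuming Nipype and SPM are installed.
try:
    import nipype.pipeline.engine as pe
    from nipype.interfaces import spm
    HAVE_NIPYPE = True
except ImportError:
    HAVE_NIPYPE = False

FWHM_VALUES = [4, 6, 8]  # smoothing-kernel widths from the talk

if HAVE_NIPYPE:
    normalize = pe.Node(interface=spm.Normalize(), name="normalize")
    smooth = pe.Node(interface=spm.Smooth(), name="smooth")
    # iterables tell the engine to clone this node (and everything
    # downstream of it) once per value: three parallel subgraphs.
    smooth.iterables = ("fwhm", FWHM_VALUES)

    wf = pe.Workflow(name="firstlevel")
    wf.connect(normalize, "normalized_files", smooth, "in_files")
```

Whether those three branches actually run serially or in parallel is then decided by the execution plugin chosen at run time (Linear, MultiProc, and so on), not by the workflow definition.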
You can also write each connection separately so that you do not get conflicts between your functions. Then we start the main processing, the first-level design analysis. This is essentially mathematical processing: we build a design matrix, derive the model from it, and estimate the contrasts of interest. So we apply contrast estimation, volume transformation, and conversion of the MRI to a zipped file format.

I guess this is it. For more information you can look at all the resources I have aggregated here, and I would like to give my special thanks to all of these professors, some of whom I work under. Thank you. Any questions?

I have a question. Python was never meant for heavy processing and speed, and here you are talking about real data — brain monitoring and neural transmission. So I want to know: how is it more accurate than C?

I know it is not more accurate. But, as I said, speed and accuracy come at a cost: you increase speed, you decrease accuracy, or you increase accuracy, you decrease speed. That is the sacrifice some of my mentors and the developers had to make in building Nipype: they had to sacrifice some accuracy. It is obviously not exact, because we are also using third-party packages: MATLAB, FSL, and the rest. Not all of them are kept current — MATLAB is updated, I know, but some packages, like Camino or MNE, are deprecated or not updated periodically. So there can be cases where a new image comes in and you do not get the final output. This happens. That is the sacrifice they had to make, but it has worked so far — no problems, no issues.

So what are you doing to make that improvement? Basically, we are still working on that.
We are slowly removing the deprecated packages where possible. If a package is still useful even though it is deprecated, we do not remove it; but if it turns out to be useless, we remove it immediately. And we obviously keep updating the packages. As neuroscientists and neuroinformaticians, we also work on new packages outside this project's domain. One project I can mention is Fancypipe, made by one of the mentors I am currently working under: a package for running pipelines serially or in parallel — you have a pipeline, and you can run it either way. That is outside the domain of Nipype, but it is what we are doing currently. Thank you.

I expected this to be very difficult; I am surprised I am not seeing more questions. So I thank you all for patiently listening to this talk. Thank you.