Well, yeah, like PJ said, my name is David and I come from Mexico. You can find me on the internet as David; I'm like that on Twitter and GitHub and almost all social networks. I'm here to talk to you about how to process millions of images with Elixir versus Ruby. I shortened the title, but I do have to give a warning before starting: this might not be the best way to do it. This is the project that I decided to use to learn Elixir, and I learned a few things along the way. So this is for you if you're starting, or if you're undecided on whether or not to move to Elixir. It's going to be a good story, so let's start.

I work on an application for the real estate market, and it looks something like this. People can go there and put a property up for rent or for sale, and upload pictures for the properties, like any regular website does. If you're familiar with the Ruby on Rails world, I'm using a gem called CarrierWave that creates different versions of the images, like thumbnails and maybe a bigger or smaller size, and uploads them to a bucket in S3 on Amazon Web Services, and that's where the images live.

Everything was going great, except one day I get a call from the designer and he tells me: hey, you know, we have this weird layout here on the home page. I want to change it. I want to have a map on the right side of the screen. And I'm like: yeah, that's not going to fit. Oh, no, no, of course it's not going to fit. We need to make these images a little bit smaller. Oh. Okay, sure. So I thought to myself: yeah, CarrierWave.
It's easy, you know: I can just call a method in CarrierWave, the recreate_versions! method, with the new sizes, and it will all be magic. So I'll just create a rake task that goes through all the images one by one and calls that method, right?

Yeah, that sounded like a good idea, except for one thing: since you have to download each image from S3, process it and then re-upload it, each one took around one second to process. And when I made a query to the database, I had 2.7 million images there. If you do the math at one second per image and divide by 60, that takes 45,000 minutes, or 750 hours, which totals 31.25 days. And my boss said: yeah, no, that's not going to work. We need it, like, tomorrow. Yeah, that's not going to work either, but I'll do my best.

So I was like: okay, right, let's just use threads, because Ruby's so good at threading. I'll just do it, and this is sort of what I did. What are you laughing at? It's very good. So this is what I did: there's a Queue class in Ruby that's used exactly for this. I pulled the images in batches, created that queue object, pushed every one of the images into the queue, and then just had workers, in this case 20, because that's the max number of CPUs that you can get at DigitalOcean. So I said: hey, a worker per CPU sounds great, and just started processing images. And, you know, if you know how... yeah, no, that's MRI, sorry. Weird that you mention it, because if you know how Ruby works, it's something like this.
This is an old slide, but it sort of gives you a picture of how it works. In Ruby 1.8.7 there was no parallelism at all. In later versions there came some parallelism, but it's not real, because it's not really taking care of using all the CPUs. And then, yeah, there are implementations like JRuby and Rubinius that do that, but I didn't really want to use another implementation.

So the problem with Ruby is that it has a global lock. And it sounds like something where you ask: why would they put a lock in place? But then you come to code like this. I did some research, and sometimes, for some reason, you have code where you have an array and you populate that array with objects from different threads. Ruby, with its global interpreter lock, guarantees that you get the results you're expecting. So with this code running on different versions of Ruby, in MRI you get the five thousand objects that you'd expect, because it's locking every time you're adding to the array and it's not letting any other code modify it before it's done. Whereas with JRuby or Rubinius, which don't lock for you, you get unexpected results.

So the philosophy, for Matz, is that he's taking care of you. He's making sure that you don't go ahead and do stupid stuff like trying to tangle the code or the data in it. Whereas the other implementations say: no, we're going to give you performance, and you are responsible for doing the right thing, like implementing a mutex or whatnot, right?
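A minimal sketch of the queue-and-workers approach and the shared-array example described above, not the code from the slides. `process_image` is a hypothetical stand-in for the real download, resize and re-upload step, and the explicit `Mutex` is an assumption added so the result does not depend on which Ruby implementation runs it.

```ruby
# Hypothetical stand-in for "download from S3, resize, re-upload".
def process_image(id)
  "image-#{id}-resized"
end

# Push every job onto a Queue, then let `worker_count` threads drain it.
def process_all(image_ids, worker_count: 20)
  queue = Queue.new
  image_ids.each { |id| queue << id }
  queue.close # a closed queue returns nil from pop once it is drained

  results = []
  lock = Mutex.new # explicit guard for the shared array

  workers = Array.new(worker_count) do
    Thread.new do
      # Each worker keeps taking jobs until the queue is empty.
      while (id = queue.pop)
        result = process_image(id)
        lock.synchronize { results << result }
      end
    end
  end

  workers.each(&:join)
  results
end

puts process_all(1..5_000).size # prints 5000
```

On MRI the threads take turns on CPU-bound work because of the global lock, so a pool like this mostly helps with the time spent waiting on the network.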
So this is one of the examples where the global interpreter lock is in place in Ruby, but there are many more. There are many more functions or methods in the language that lock you out. So you are not really using parallelism, or maybe you're using it, but at some points it will stop you and then continue. That's why it's not that efficient. Even so, the new version of the code that I did with threads reduced the average processing time to 0.6 seconds per image, which is a little bit better, but still 18 days. So the boss was still not happy at all.

All right, so I decided to give Elixir a try. You know, I did not get onto the Go train. For some reason I never did, even though I listened to the people who keep telling me that Go is so great, concurrency and stuff. I didn't get on that train. But Elixir, on the other hand, caught my attention because it has a better syntax. Since I became a Ruby developer I am a syntax snob: the code needs to look good, and I hate semicolons and all of that stuff that you have in other languages. When I looked at some Elixir sample code, it looked better. So I decided to go with Elixir.

The biggest difference between Ruby and Elixir is that if you have code in Ruby, your code runs in a single process. Let's say that I'm running my rake task and for some reason it can't upload to Amazon and there's an exception: then the whole thing dies, and I may not notice it. Whereas in Elixir you have a model where your process can spawn other, smaller processes. It distributes the work between processes, and those processes can spawn other processes. If one of them dies, you just replace it with another one, and the application continues working perfectly.
So it's better in that aspect, right? So I decided to give it a try. Let's see what I needed to do. I needed to create an app, an OTP application. Then my app would retrieve records from the database, download the original image from Amazon, create the new image sizes with ImageMagick, and then upload them back to Amazon S3. Just like CarrierWave does for me, except there's no CarrierWave in Elixir.

The first part, creating the app, was very simple. You have probably all created a new app already: it's just mix new and the name of the app, and it creates the whole tree with the required files. You're done, you're ready to start coding. So that's unimportant.

Then here comes the real code: you need to retrieve records from the database. The best way to do it that I've found right now is by using Ecto, which is sort of like the Active Record of the Elixir world, except it's not. All I needed to do was configure it: add the adapter, the database name, the username and the password, which of course, as root user, you don't need a password. And create a model. The model is just a module: you use the Ecto.Model module and then you define the schema, the columns that are supposed to be in your database. It doesn't do it for you, except for the id, I think. And since I only needed the file name from the database to create the URL for S3, that's all I added there.

Once you have a model, you can start creating queries. The queries are created sort of like functions, and then you chain those functions to get your results. Let's say you have a main query, which is just the main select-everything-from-this-table, and then you can do other things, like maybe find only one. So you chain the main query into this other query, and then maybe you need a page.
So you can add the page thing, limit and offset. And in your code you just use a pipe to chain all those functions and get the data that you require. In my case I only needed to get everything, because I was not going to page the information. So I just created an all method and brought them all, because I needed everything.

Then I needed to download the original image from Amazon S3, and that was pretty simple. I just needed something to download via HTTP, like curl or wget does. I found this library in Elixir that's called HTTPotion, which can do that for you. It's just a wrapper around another Erlang library, but it works, and I needed that.

And here's an example of something that I love in Elixir, which is the pipe. Look at how it makes your code look cleaner, to me, and more descriptive. What the pipe does is take the result from the first function and pass it to the second function in your pipeline as the first parameter of that function. I could have written this code like this, where this is the first parameter, but it looks so much nicer when you use the pipe. Once you have four or five functions chained, you really start to get the benefits of using the pipe, and it just looks so good. And, like I said, that will download the image from Amazon into my local storage, right?
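For readers coming from Ruby, a rough analogue of the pipe is chaining with `Object#then`. This is a sketch in Ruby, not the Elixir from the slide: all three helpers are hypothetical, the URL is a placeholder, and nothing touches the network. Like the pipe, each step receives the previous result and hands its own result to the next step.

```ruby
# Hypothetical helpers standing in for "build URL", "download", "save".
def original_url(file_name)
  "https://example.com/uploads/#{file_name}"
end

def download(url)
  "bytes-of(#{url})" # pretend response body
end

def save(body, path)
  { path: path, size: body.bytesize }
end

# Nested calls read inside-out.
def fetch_nested(file_name)
  save(download(original_url(file_name)), "/tmp/#{file_name}")
end

# Chained calls read top to bottom, like a pipeline.
def fetch_piped(file_name)
  file_name
    .then { |name| original_url(name) }
    .then { |url| download(url) }
    .then { |body| save(body, "/tmp/#{file_name}") }
end

puts fetch_nested("house.jpg") == fetch_piped("house.jpg") # prints true
```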
So the next problem that I had is that I needed to create new image sizes. I looked into what I could use for ImageMagick manipulation, and I found this package that's called Mogrify, which, like it says right there, is a wrapper for ImageMagick on the command line. It did what I wanted to do, but it didn't have all the tools. Here's another example of how the pipe looks; this is so awesome. It has methods to resize the images, but if you have used CarrierWave, there are other methods, like resize to fill, resize to fit, resize to... I don't remember the other ones, that give you different behavior in how it does the resizing. I didn't want to put code in there without knowing exactly what it was doing, so I decided to port those methods from CarrierWave into Mogrify, sent the pull request and got it accepted. So, yay me! I love doing open source stuff like that. So now I had these methods to use in my own application.

The next problem was uploading to Amazon S3, and this is where it got a little bit tricky for me. The first thing you do when you don't know the language is Google it: how do I upload files to S3 with Elixir? And I got no results. I was like: oh, what's going on? Do Elixir programmers not use Amazon at all, or what? This is a dramatization, but yeah, I found no results about uploading files to S3 with Elixir. So I was like: all right, that's not going to stop me, because I have the command line and I can use Amazon's own tools like s3cmd. I'll just make a system call and upload the files through the system, something like this. So I was done, I ran the script, and it was taking 1.6 seconds per image. Huh?
So at that point I was like: what's going on? Someone lied to me, because this is not working. This was me at that moment: am I wrong? Fortunately, I know people who are experienced with Elixir, so I called them and said: hey, your Elixir thing's not working. This is what's happening to me: I'm doing this and doing that and it's taking longer. What's going on?

As I explained to my friend what I was doing, his face went like this. Basically: dude. He told me: look at your code. What you're doing right there is opening an operating system process, doing your thing, and then closing it. Opening and closing, 2.7 million times. That's going to take a lot of time. What you're doing is totally wrong. And I was like: okay, yeah, I understand what's going on. Of course, it makes sense: open the process, close the process, open, close. That's a lot of work.

He told me: well, let's talk about the library you're using. s3cmd is a Python library, so why don't you use ErlPort, which is used to connect Erlang to other languages, open a Python process, load Amazon's code, and use that one process for the 2.7 million images? You only open it once, and then you'll be processing the images way, way faster. That sounded like a good idea, except I didn't have the time to do that.

But at some point, when we were talking about this, I was like: wait, wait, wait. So you're telling me that I can call any Erlang library from Elixir? And he's like: yeah, you just use the syntax like this, and whatever code is written in Erlang, a library, whatever, you can just call it in Elixir. And I thought: I didn't know that, that's interesting.
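Going back to the friend's point about opening and closing an operating system process for every image, the difference can be sketched as follows. This is an illustration in Ruby rather than the Elixir and s3cmd setup from the talk: the child program is just the Ruby interpreter upcasing a file name (an assumption made so the example runs anywhere), and the function names are hypothetical.

```ruby
require "rbconfig"

RUBY_BIN = RbConfig.ruby # path of the running interpreter, used as the child program

# One child process per item: pays the process start-up cost every time.
def upcase_with_process_per_item(items)
  items.map do |item|
    IO.popen([RUBY_BIN, "-e", "puts ARGV[0].upcase", item], &:read).chomp
  end
end

# One long-lived child: start it once, then send it one line per item.
def upcase_with_one_process(items)
  child_code = "STDOUT.sync = true; STDIN.each_line { |line| puts line.chomp.upcase }"
  IO.popen([RUBY_BIN, "-e", child_code], "r+") do |io|
    items.map do |item|
      io.puts(item)
      io.flush
      io.gets.chomp # read the child's answer for this item
    end
  end
end

files = %w[a.jpg b.jpg c.jpg]
puts upcase_with_process_per_item(files) == upcase_with_one_process(files) # prints true
```

Both functions return the same answers; wrapping each call in `Benchmark.realtime` with a longer list shows how much of the first version's time goes into starting processes.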
So what I did was go back to Google: Google, please tell me a way to upload files to S3 using Erlang. And I found it. There's a library to do that in Erlang. Not in Elixir yet, but there's something for Erlang. Yeah, I should have started there. Once I found this it was very simple: you just need to add the library to your dependencies and then call it like this, and you're basically on your way.

Except for one small thing that I forgot to make clearer. When I found this, I started getting a lot of errors from the call to the library, because there was no match for the arguments that I was sending to the function. Something was missing, and the problem was that I was using strings in Elixir. When you call Erlang libraries, most of the time you're going to want to send character lists. So I needed to convert all of the arguments into character lists before sending them to Erlang.

This way I also learned that there's a difference between double quotes and single quotes in Elixir. I learned it the hard way, basically, because it was just blowing up and I didn't know what was going on. It's right there: there's a string, it has what I'm looking for, and it still said there was no match, because I was sending a string and it was expecting a character list. Hmm.
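In Elixir, a double-quoted literal is a UTF-8 binary, while a single-quoted literal is a character list, meaning a list of integer code points; `String.to_charlist/1` converts from the first to the second. The sketch below shows the same mismatch in Ruby terms, purely as an illustration: `erlang_style_bucket_name` is a hypothetical function that, like many Erlang functions, only accepts the list form.

```ruby
# "s3" as a string versus as a list of code points.
def to_char_list(string)
  string.codepoints
end

def from_char_list(list)
  list.pack("U*")
end

# Hypothetical callee that only understands lists of code points.
def erlang_style_bucket_name(name)
  raise ArgumentError, "no match: expected a list of code points" unless name.is_a?(Array)
  from_char_list(name)
end

p to_char_list("s3") # prints [115, 51]
puts erlang_style_bucket_name(to_char_list("my-bucket")) # prints my-bucket

begin
  erlang_style_bucket_name("my-bucket") # passing the string form fails
rescue ArgumentError => e
  puts e.message
end
```

The string and the list carry the same characters, which is why the error is easy to misread: the value looks right but its type does not match what the callee expects.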
So if you ever use Erlang libraries, you're probably going to need this advice. Obviously, when I told my friend, he said: it was right there in the example I sent you. And I was like: I didn't read your example, what are you talking about?

So anyway, I can now upload files to Amazon, and I can do it linearly. So what about concurrency? Because that was the whole point of this. If you think about it, what I need to do to process every image is retrieve the records from the database, download the image from S3, create two new image sizes, and then upload the result to S3. Those things are sort of unrelated. I can retrieve the records from the database, and as soon as I get them I can start downloading images. As they are being downloaded, another process can start processing them, and then another process can start uploading them. So they're separate things that you can split into different processes, and that would be the optimal thing. I didn't have the time to do that, but what I did at least was separate the processing per image into different Erlang processes. So the only thing that happens is that I retrieve the records from the database, and then I create several processes for the whole thing of downloading, processing and uploading.

For that I found a tool called poolboy, which is basically just a worker pool factory, and you use it for cases like this. I was actually trying to add it to my dependencies, and it was nice that it was already there, because apparently Ecto uses it to handle the connections or whatever. So I already had it, and that was cool. What you do with poolboy is create a worker module, which basically just starts the GenServer, and then you put in some code that you want to initialize every worker with.
This is where you put the code that you don't want repeated every time. In this case, I just initialize the connection to Amazon when the worker starts. Then you have the code that actually processes whatever you want to process. In this case I'm just calling the process method, which does everything from downloading the images all the way to uploading them again, and then returns a reply and the result of that. And you keep the state, because there's no state per se in Elixir, so you need to pass it along through the whole life of the worker.

Then you need a supervisor, which is the one that's going to know when everything's up and working. To initialize it, basically you just need to tell it its name, which I put in another function in case I needed it for future reference, and the module it's going to hand work over to, which is the worker that we just saw.

Then you can set the size of the queue, which is very flexible, and I found it very interesting how the pool works. You first set a size for the queue, let's say 20. Then you can set another parameter, which is max overflow, and you can tell it: hey, if you get a lot of work, you can grow, maybe up to 50 or 100 or whatever. So the 20 processes that I specify there are always going to be up, but the other ones will only exist if the 20 are not enough. So if your pool gets a lot of work, it grows, and as soon as it's done it shrinks back to whatever you want. So I could also have said: hey, the size is zero,
I want the processes to be off, and then max overflow 20, so it will grow as the work comes in. This is relevant because maybe you don't want the resources on your server to be spent if they're not being used. Maybe you want 200 workers, but you don't want them to be up if no one is using them, because that would waste CPU, memory and whatnot. So that's pretty powerful in terms of flexibility: you can grow and shrink the pool and it will just do it for you. You don't have to do anything but state it there.

So the main module, the one that starts everything, looks like this. I just start the program and start that supervisor, which is going to handle the queue. Then I call this enqueue method, which pulls all the records from the database and sends them to the queue, and I just return the supervisor so the process stays up. What enqueue does, like I said, is pull all the records from the database and then go one by one. This piece of code is what creates the actual process that is sent into the queue, and the queue manages when to run it: as soon as it has a process available, it handles the work, and then discards it as soon as it's done. And so it just works, basically like magic.

So my server just started working like insane. Look at that: it's using all its CPU power to do all the processing of the images, with so little memory used. This is amazing. It's great how Elixir just takes care of it. It's really the Erlang virtual machine, but still, to me, it's Elixir that's doing all the magic.
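The sizing rule described above (a fixed `size` that is always up, plus up to `max_overflow` extra workers that exist only under load) can be written as a small formula. This is a simplified model in Ruby for illustration, not poolboy's implementation, and the function name is hypothetical.

```ruby
# Simplified model: how many workers exist for a given amount of work in flight.
def workers_alive(size:, max_overflow:, jobs_in_flight:)
  ceiling = size + max_overflow           # the pool never grows past this
  [[size, jobs_in_flight].max, ceiling].min
end

puts workers_alive(size: 20, max_overflow: 30, jobs_in_flight: 5)   # prints 20
puts workers_alive(size: 20, max_overflow: 30, jobs_in_flight: 35)  # prints 35
puts workers_alive(size: 20, max_overflow: 30, jobs_in_flight: 500) # prints 50
puts workers_alive(size: 0, max_overflow: 20, jobs_in_flight: 0)    # prints 0
```

With `size: 0` the pool costs nothing while idle, at the price of starting a worker whenever work arrives.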
So, in conclusion: it took about four days to process the 2.7 million images. It still took time, but it was way better than the month that I had forecast with Ruby. If you do the math to split it up, it took 96 hours, or 5,760 minutes, and all those seconds, and if you average it, it took around 0.128 seconds per image, which is insanely fast. The only problem... well, it sort of solved my problem, because it took me like 12 days to figure it out. So in total it was like 16, 18 days. So my boss was still not happy, but it was quite the learning experience.

The second conclusion would be that Elixir is so great, and not just because of Elixir, but because you can use technology that's been there for years in Erlang. Erlang has existed for 25-something years, and at this point you'd expect the Erlang developers to have solved all the problems in the world. So you're not reinventing the wheel, you're just making it a little bit better with the syntax, and that's cool, right? And I know, because I know people who do Erlang for a living, and now that Elixir is getting hype, when I talk to them about all the amazing stuff that I can do, they're like: yeah, I've been telling you for years, you're wrong, Erlang is the right path, right?
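The timing figures quoted in the talk (about a month at one second per image, about 18 days at 0.6 seconds, and 0.128 seconds per image over four days) can be checked with a few lines of arithmetic. The helper names below are made up for this sketch, and rationals are used so the results are exact.

```ruby
SECONDS_PER_DAY = 24 * 60 * 60
IMAGE_COUNT = 2_700_000

# Duration of a strictly sequential run, as exact rationals.
def sequential_run(image_count, seconds_per_image)
  seconds = image_count * seconds_per_image
  { minutes: seconds / 60, hours: seconds / 3600, days: seconds / SECONDS_PER_DAY }
end

# Average seconds per image for a run that took `days` days.
def average_seconds_per_image(image_count, days)
  Rational(days * SECONDS_PER_DAY, image_count)
end

one_by_one = sequential_run(IMAGE_COUNT, 1r)
with_threads = sequential_run(IMAGE_COUNT, 0.6r)

puts one_by_one[:minutes].to_i                        # prints 45000
puts one_by_one[:hours].to_i                          # prints 750
puts one_by_one[:days].to_f                           # prints 31.25
puts with_threads[:days].to_f                         # prints 18.75
puts average_seconds_per_image(IMAGE_COUNT, 4).to_f   # prints 0.128
```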
So yeah, they were right. They have been right for 25 years. All these problems that we are still solving in other languages, like threading and concurrency, all of that is already there. It has been there for years and years, and it was actually designed for it. You're not patching a language that already exists to handle threads. No, Erlang was designed to handle multiple processes at the same time. So when you use Elixir, you're using all that experience in your code, and that's great, because it's harder to find weird bugs or unexpected behavior, and if you do find it, it's probably because you caused it, like the whole strings and char lists difference that I ran into.

So this is great. There's also a lot to learn, obviously. And another thing that makes me excited is that there's a lot to give. All the libraries in Elixir right now are still looking for help in terms of code, and I like to do that. I like to find gems or Hex packages that can be improved or can be better, code something, and make a pull request. So for me the current state of Elixir is great. To me there are a lot of opportunities to give back to the community by patching stuff.

The other thing is that the syntax is very, very beautiful. Like I said, I'm a syntax guy, and the whole pipeline thing and everything makes it look very elegant, and I really like that. And I guess the other part, which I haven't explored yet but will at some point, is that whole ErlPort thing where you can open processes in other languages.
I have it on my bucket list: what if I could create a web server that handles all the connections via Erlang, but then opens a Ruby process and sends the requests to it via Rack? Then the web server would be in Erlang, but handling Rails applications. That sounds like something I definitely want to explore, even if just as a hobby.

And that's it. I hope you guys are really enjoying coding with Elixir like I am. Like I said at the beginning, there were probably 10 different ways of doing what I did. It would probably have been faster if I had just refactored the code to use JRuby or Rubinius, but that was not the point. Well, maybe for my boss it was the point, but not for me. I decided to do it in Elixir because I wanted to try it out in a real-case scenario and figure things out, and I did. So that's how I ended up handling all those images and learning a lot about Ecto, about threads, concurrency and all that. I guess my last piece of advice is: if you have a project that you think you can do with Elixir, just go ahead and do it and learn some stuff. And that's it. Thanks.