Thank you. My name is Pieter Hintjens. I'd like to start by thanking the FOSDEM team for their organization; this is an amazing event. When I started programming open source, around 1990, we were marginal, and today we're mainstream, and events like this bring us together to exchange ideas, which is really fantastic. Okay, so today I'm going to give you just one idea, about one way of building large applications: concurrent applications, parallel applications, multi-threaded applications, applications that do a lot of work and that need a lot of cores, boxes, whatever. Basically, what I will argue is that everything we've been taught in the universities, with all respect to universities, in the last 30 or 40 years about doing this is completely wrong, and I will explain the right way of doing it. Okay, so a law about concurrency. I just invented this a couple of weeks ago; it's very original: E equals MC squared. The effort (E) it takes to build a system is a function of the mass (M), that's the size of the system, times the square of the number of conflicts (C) between parts of the system. Most simple systems, like a tool to do flash programming, have no conflict: it's one program doing one thing. As we build larger applications, we start having to share information in some way, and the traditional way of doing that is to share data between threads, and we get this curve, at least in my opinion, where the effort goes up exponentially as we get more and more work on the same data. Now, we all start very happily with simple programs doing simple work, and we start with no conflict; let's say one conflict equals one thread. The problem is that over time, if a system works, then we're asked to make it bigger, to handle more, to grow in capacity, so we start going into multi-threading, and we end up basically here. One of the truisms in IT is that pain is a bad sign and effort is a bad sign.
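To make the E = MC² law concrete, here is a tiny sketch; the effort units and the constants are arbitrary, purely for illustration:

```python
# E = M * C**2: effort grows with the mass (size) of the system
# times the square of the number of conflicts between its parts.
def effort(mass, conflicts):
    return mass * conflicts ** 2

# The same 100-unit system, with more and more threads contending
# on shared data (one conflict per thread, as in the talk):
for c in (1, 2, 4, 8):
    print(c, "threads ->", effort(100, c))  # 100, 400, 1600, 6400
```

Doubling the contending threads quadruples the effort; that is the exponential-looking curve the talk describes.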
Things should be easier, not harder, and when things take a lot of pain, a lot of effort, that's a signal: whoa, stop, something is wrong here. The problem with multi-threading, as we learn it in school and university, and as we're taught by basically every vendor, is that the more you try to scale your system, the more pain it takes, the more effort it takes, exponentially. The reason for this, in my opinion, is that traditional multi-threaded concurrency is based on the old theory that there's a world market for maybe five computers: we take the computer as a very big, serious thing, and then we multitask around that thing and try to share the data that the computer represents. It's this notion that your data structure is central, that your threads have to access it somehow, and that that's concurrency. This is what I was taught at university; okay, that was a long time ago, but it's still what you'll get when you buy a tool from Intel or when you look at how to program concurrency in practically any modern language. So what we've been taught is that concurrency is data shared between threads. We're not really used, in this part of the world, to thinking of concurrency as multiple boxes or multiple processes; this is kind of new. I know that modern browsers now start to have a process per tab rather than a thread per tab. That is a new way of thinking. So we have these threads that try to share data, and they try to prevent conflicting access to the same data. They use locks, they use semaphores, and this code basically fails by default. When you share data, it seems to work; as you scale, it fails. So you're building systems that are not fail-safe; they are going to fail by default, and the more you load your system, the more it fails. The more you make it parallel, the more it fails. This is really terrible, and your diminishing returns hit at about four or eight threads, depending on how much effort you put into it.
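As a minimal illustration of the traditional approach being criticized here (the names and numbers are mine), this is shared state guarded by a lock: every thread contends on the same central data structure, so adding threads adds contention, not speed:

```python
import threading

counter = 0              # the shared data: one central structure
lock = threading.Lock()  # the lock every thread must fight over

def worker(n):
    global counter
    for _ in range(n):
        with lock:       # serialize all access; more threads means
            counter += 1 # more contention on this one line

threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 40000, but only because the lock serialized everything
```

Remove the lock and the result becomes unpredictable under load; keep it and the threads queue up behind each other. Either way, scaling is paid for in pain.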
So I consider this to be bogus, if not actually totally insane. In the ideal world, what we want is that we start with cheap and we end with cheap. And this isn't a dream. It's just a matter of deciding: you know, we're tired of 30 years of autocracy, we want democracy. Go Egypt. We want to end up with basically a democratic, liberal system. And what very few people know is that there is actually a model for this, called the actor model, which I only discovered myself maybe a year ago; we kind of reinvented it for ZeroMQ. Now, the actor model is actually almost 40 years old, which is really tragic: completely ignored by mainstream programming. It's based on the theory that there are lots of computers and they're cheap, which is actually where we are today. The key thing about it is that there are thousands and thousands of computers, they're very cheap, and they're connected by a very fast, very efficient, and cheap communications infrastructure. Think about that for a second. If you can get this, then your conflict count is one: no conflict. You have one thread at any point accessing one piece of data. There are no locks. The only modern language of any size and import that does this is Erlang. And Erlang is weird. It's a good language for doing certain things, but having to learn a weird language to do basically mainstream work is not good either. The actor model is really quite simple: think of boxes sending messages to each other. A mobile phone sending messages to other mobile phones, via whatever, is basically an actor model. And this model is based originally on the physical world, where there are lots of things that are all independent. It does match the way that IT is going, which is hundreds of billions of computers connected in weird ways. And if you can manage those connections, you can build applications that distribute over hundreds, thousands, tens of thousands of computers.
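The "one thread accessing one piece of data, no locks" idea can be sketched in actor style using only the Python standard library (this is not ZeroMQ, and all the names are mine): each actor owns its state privately and the outside world talks to it only by sending messages:

```python
import queue
import threading

def adder_actor(inbox, outbox):
    """An actor: owns its own data, reacts only to messages."""
    total = 0                  # private state; no other thread touches it
    while True:
        msg = inbox.get()
        if msg is None:        # conventional stop message
            outbox.put(total)
            return
        total += msg

inbox, outbox = queue.Queue(), queue.Queue()
threading.Thread(target=adder_actor, args=(inbox, outbox)).start()

for i in range(1, 101):
    inbox.put(i)               # talk to the actor only via messages
inbox.put(None)
result = outbox.get()
print(result)  # 5050: the actor summed 1..100 with zero locks in our code
```

There is exactly one thread touching `total`, ever, so there is nothing to lock; the queues carry the conflict away as messages.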
So what are the ingredients for a successful concurrent architecture? You need lots of cheap boxes. Well, we have this today: boxes are really cheap. You can get threads, you can get processes, you can get cores, you can get Amazon. Boxes are very, very cheap; there's no longer the requirement to share a box. That's old. We have tools to build contractual APIs. Now, when you start talking between boxes, you want some formality about what you're sending; you can't just send random data. The level of formality depends on how important your system is. You can make it quite ad hoc, or very, very formalized, with parsing of messages and schemas that can generate parsers, and get very sophisticated. We have this; we have tools to make any level of meta that we need. And we need a high-performance network. It's really shocking, but no one's ever made this before. It's really weird. You'd think this would be a main thing to make, but no one actually ever made it. You could buy commercial products that did this, but they're very expensive, which excludes all of us. Or you can get tools that do certain things, but they're limited. A generic communication infrastructure is actually a new thing. And so we have ZeroMQ, which is basically fast and cheap networking between threads, processes, cores, and boxes. So my argument is that ZeroMQ gives you C equals one: you pay your effort according to the mass of your program, and that's it. Yet you can build highly concurrent systems with this. You can connect threads to threads, threads to processes, processes to boxes, using the same API, using the same model, which is basically a socket-style API. You say: talk to this piece, send it messages, and it will deliver entire messages for you. It transports blobs. There's no notion in ZeroMQ of the contract; that's up to you as a developer.
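The "it transports blobs, entire messages" idea can be sketched with a length-prefixed frame over a standard-library socket pair. This is my own toy framing, not the real ZeroMQ wire protocol; it just shows what "whole messages, meaning up to you" means:

```python
import socket
import struct

def send_msg(sock, blob):
    """Send one whole message: a 4-byte length prefix, then the blob."""
    sock.sendall(struct.pack("!I", len(blob)) + blob)

def recv_msg(sock):
    """Receive exactly one whole message, never a partial read."""
    header = sock.recv(4, socket.MSG_WAITALL)
    (length,) = struct.unpack("!I", header)
    return sock.recv(length, socket.MSG_WAITALL)

a, b = socket.socketpair()
send_msg(a, b"hello, world")  # the transport moves opaque blobs;
msg = recv_msg(b)             # what the bytes *mean* is your contract
print(msg)                    # b'hello, world'
```

The receiver always gets the complete blob or nothing; interpreting the bytes (the contract) is a separate layer you choose yourself, ad hoc or schema-driven.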
ZeroMQ has these things called messaging patterns, which are more or less natural ways of connecting pieces. Pub-sub is for distributing data. Pipeline distributes work. Request-reply is about getting reliability in there. There are three or four or five of these basic patterns, and they cover 95% of all messaging and all communications in this respect. So when you start looking at ZeroMQ, it's kind of weird: it's a small library, not very big, but it does a lot. It connects, I think, 20 or 25 languages with APIs, and it takes maybe a day to make a new language API. When you look at multithreaded applications in ZeroMQ, there are no locks, no semaphores. What you have are tasks that take messages, process them, and distribute further messages using the simple API, and it's shocking how easy it is to write code like this. It's shocking. I mean, I've written multithreaded code and spent 18 months debugging applications that were supposed to run on four or eight cores. And the more we stressed them, the more they crashed. I spent nights debugging conflicts; that's why I have no hair left. And with ZeroMQ, you write code and it works. You write it on one box using a few threads, then you put it onto processes: the same code still works. Then you break it across boxes, the same code, and it still works. Then you load it, you hit it with data, and the thing still works. This is really shocking. And you can scale to any number of cores. The kind of thing we do with ZeroMQ is build stock exchanges, where you have tens of thousands of cores and you're processing tens of millions of messages per second. This is unimaginable in any other way. And yet the cost of building these systems is not hundreds of millions of dollars; it's single figures of millions of dollars. Thank you.
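The pipeline pattern mentioned here (distributing work downstream, collecting results) can be sketched with standard-library queues. In real ZeroMQ this would be PUSH/PULL sockets, and the sockets could just as well cross process or box boundaries; this stand-in only shows the shape of the pattern:

```python
import queue
import threading

work = queue.Queue()     # ventilator -> workers (PUSH/PULL in real ZeroMQ)
results = queue.Queue()  # workers -> sink

def worker():
    """A task: take a message, process it, send a message on."""
    while True:
        item = work.get()
        if item is None:         # stop message
            return
        results.put(item * item) # the "processing"

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for n in range(10):
    work.put(n)                  # distribute work downstream
for t in threads:
    work.put(None)               # one stop message per worker
for t in threads:
    t.join()

collected = sorted(results.get() for _ in range(10))
print(collected)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

Notice there are no locks or semaphores in our code: each task only takes messages, processes them, and sends messages on, which is exactly the style the talk describes.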
So this idea I'm going to give you today is very simple. Concurrent systems, parallel systems, are a very big part of our future in IT. You want to exploit these cores; you cannot, except in some niche cases, keep software running on one core. Software is distributed. The world is distributed. And as programmers, we need tools to make the distribution work. Those tools didn't exist until a few years ago. ZeroMQ is one of the best, obviously, because, you know, my community made it. It's LGPL open source. It's been in development for about three or four years. It's based on our experience making communication software for the financial industry, where they have this problem. But really we wanted to make this an open source tool. It's a successful community: 70 people on IRC at any particular time, active mailing lists, many contributors, which is really nice. There are APIs in every possible language, and it runs on every possible system. About iMatix: we've been doing free software for about 20 years, I guess. We do protocols, communities; it's what I really like doing. We turned ZeroMQ over to the community last year and said: look, everyone owns this, we're not going to take control of it. If I die or if my company is bought, it won't affect the software. It's owned by its community, which is a very important step. It's LGPL, like I said, and we make our money from support, which is quite a nice business model. So, that was a quick talk, right? We have five minutes for questions, if you have questions. Gentleman in red.

Thank you. Two questions. One question is: how would you compare ZeroMQ to D-Bus? D-Bus is probably slow, I guess. And the other one is: how about processing images? For example, you have multiple cores processing one image.

Okay, we'll go quickly. So first of all, ZeroMQ versus D-Bus; second question, how do you distribute work, for example, image processing. So D-Bus is a product which does a certain thing.
It connects processes and threads in one box. Now, you'll see much overlap between ZeroMQ and other messaging systems; connecting pieces is not a new concept. Where ZeroMQ is special is its genericity. You can connect pieces, be they processes, be they boxes, be they threads, using the same API. You can use different transports, TCP, multicast, inter-process, inter-thread, all with the same API. So you write code one time, structured conceptually as tasks; you put those tasks anywhere you like, then you distribute them. The tasks don't change. That's the big difference. It's really aimed at scaling applications out to any size, rather than building them on one box. Second question, about processing images: this is a classic workload-distribution case, a classic case of using many cores to do the work. If one CPU takes one second to process one image, and your client wants a thousand images processed per second, you have to have a thousand CPUs. So typically, in a ZeroMQ architecture, you'd have a first point where work comes in, which distributes it to nodes that do the work; the nodes send back the results, and you send them back to the client. Classic architecture. Very simple to build in ZeroMQ. It'll take you maybe half a day to make this in Python. Maybe two hours. Next question from somebody? Gentleman there.

Are all the endpoints tied? In other words, if you want to send a message from one process to another process, can that process move, or is it fixed in place?

What I suggest you do, to understand ZeroMQ a little bit better, is to read the user guide I wrote; there's a book called The Guide. It explains the basic patterns and how they work, and it's got lots of examples in every single possible language. You will see how to build basic applications using ZeroMQ. That's a good starting point. I will stay outside here for the next half an hour or so; if you have questions, feel free to come and find me. Thank you very much. Good afternoon.
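The image-processing architecture described in this answer, a distributor fanning work out to workers and collecting the results, can be sketched in a few lines. This stand-in uses a local thread pool instead of ZeroMQ sockets across boxes, and `fake_process` is my placeholder for the real one-second-per-image work:

```python
from concurrent.futures import ThreadPoolExecutor

def fake_process(image_id):
    """Placeholder for one unit of real image-processing work."""
    return image_id, f"processed-{image_id}"

images = range(8)

# The "first point" fans work out to workers and collects results;
# with ZeroMQ the workers would be processes on many boxes, and the
# fan-out/fan-in would be PUSH/PULL sockets instead of a local pool.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = dict(pool.map(fake_process, images))

print(results[3])  # processed-3
```

The point of the ZeroMQ version is that the worker code would not change when you move it from four local threads to a thousand CPUs on other boxes; only the connection endpoints do.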