Why multi-core? The reason for worrying about all of this concurrency stuff quite so much is largely to do with the rise of multi-core. Probably most of you have phones in your pockets that have quad-core processors, and I'm sure I don't need to tell you what Moore's Law is. That trend is going to continue, and speed-ups in programs will start to be gained by using more cores more efficiently, rather than simply buying a faster processor. That's really important. What's also important is that as hardware becomes cheaper and people do more interesting things with it, different sorts of platforms are becoming available. So this board that you can see on the slide is called the Parallella board, and it's by a company called Adapteva. This was a Kickstarter project, and I'm afraid I haven't brought one with me. But this board is about the size of a credit card, so it's the same sort of form factor as a Raspberry Pi. And the aim of this company is to do for supercomputing what the Raspberry Pi is doing for general development: to make it cheaper, to make it more readily available on the desktop to all sorts of people, whether they be hobbyists or scientists or whatever. So the interesting thing about this board is that it's kind of like the Raspberry Pi; it's got the sort of features you'd expect of a single-board computer, with a dual-core Zynq chip. But the really interesting thing is this chip that Adapteva made themselves, which is a 16-core, many-core co-processor. The idea is that if you want to speed up your programs, you use this co-processor to help you do that. They've got a 16-core version of that chip and a 64-core version. So that makes multiprocessing very cheap, very fast and also very low energy.
And that's really nice. But how would you program this board? The default libraries are all in C, and if you want to be experimenting with a large amount of data, or experimenting with making a difficult, complex scientific analysis a lot faster, then the C workflow of writing very carefully crafted code that doesn't blow up as soon as you run it, compiling it and all those other things, is probably not really much fun. People who use this sort of thing often want to explore their data and explore their programs, and that's where dynamic languages really come in. So to use Python on a board like that would be really fantastic. But can we do it? Can we do it natively and really nicely? We don't know yet. So, message passing concurrency is not a new idea; it's a very, very old idea, and it came from two lines of work. One line of work was very practical. There's an old chip called the INMOS transputer, and the idea was that you would have many of these transputers, each a sort of CPU, and you would put them in a grid and wire them all together. So this was a very early form of multi-core. The other line of work was very theoretical. CSP, Communicating Sequential Processes, is one way of mathematically formalising concurrent and parallel processing. There are many other ways, and they're all reasonably similar in terms of the ideas in them. These two things went together, but they never became popular, because we got threads instead. So I'm not going to go through all of the mathematical details of CSP, but I want to give you a flavour of what it's all about and why it might be important. In the process algebra view of the world, computation is made up of imperative commands that run sequentially, and you have lots of those, but you have them in processes which run concurrently and can communicate with each other.
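As a minimal sketch of that process-algebra picture, here is plain-stdlib Python: two processes, each running ordinary sequential code, sharing no variables, and communicating only over a channel. The function and variable names are my own, and a `multiprocessing.Pipe` stands in for a CSP channel.

```python
from multiprocessing import Process, Pipe

def producer(conn):
    # Purely sequential imperative code; shares no state with the consumer.
    for i in range(3):
        conn.send(i * i)
    conn.send(None)          # sentinel: no more messages
    conn.close()

def consumer(conn):
    # Blocks on each recv() until a message arrives, CSP-style.
    while True:
        msg = conn.recv()
        if msg is None:
            break
        print("received", msg)

if __name__ == "__main__":
    parent_end, child_end = Pipe()
    p = Process(target=producer, args=(child_end,))
    c = Process(target=consumer, args=(parent_end,))
    p.start(); c.start()
    p.join(); c.join()
```

The two `Process` objects here are real operating system processes; as the talk notes later, a CSP "process" could just as well be a coroutine or a thread.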
So you have different processes which are all running their own imperative programs, and they don't share any variables or any data. If they want to communicate or synchronise with each other, they need a special way of doing that, and a common way of doing it, but not the only way, is to pass messages via channels. A good way to think of this, as a broad overview, is to think about Unix processes communicating by pipes: you have different processes all running in parallel, and if they want to communicate they do so by pipes, because they don't share any memory with each other. But I'm eliding a few details there, because CSP is an abstract idea, a mathematical formalism. So when I talk about a CSP process, that's not necessarily an operating system process, and when CSP talks about events and synchronisation, that's not necessarily like the sort of events you would see when you were programming a GUI. But it might be, and I'm going to talk about that in a bit more detail later on. So why would anyone care about all of this stuff? Well, if you've worked with threads and you've worked with locks, you know they're really tough to get right. Correctness is important, and when you're dealing with really low-level things like locks and pointer arithmetic, it's hard: locking is hard, and deadlocks and starvation and race hazards and all those things are hard. And they don't sit well with our very high-level Pythonic view of the world, which is to use the abstractions of the language to make our life really simple and hide a lot of the really hard things, which is a good way to do computing. It's what computer science is all about. So the good thing about message passing concurrency is that message passing removes some of these possible faults that you can have with locking.
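The Unix-pipes analogy above can be made concrete in a few lines of Python. This is a Unix-only sketch (it relies on `os.fork`, which is not available on Windows); the helper name is my own.

```python
import os

def pipe_roundtrip(message: bytes) -> bytes:
    """Fork a child that writes `message` down a pipe; the parent reads it
    back. After fork, parent and child share no Python state: the pipe is
    their only means of communication, just like a CSP channel."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:                  # child: only ever writes
        os.close(r)
        os.write(w, message)
        os.close(w)
        os._exit(0)
    os.close(w)                   # parent: only ever reads
    chunks = []
    while chunk := os.read(r, 4096):
        chunks.append(chunk)
    os.close(r)
    os.waitpid(pid, 0)
    return b"".join(chunks)

print(pipe_roundtrip(b"hello world"))  # prints b'hello world'
```

Note that the parent closes the write end and the child closes the read end, which is exactly the unidirectional read-end/write-end discipline the talk later describes for Rust channels.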
So you can't have race hazards with message passing concurrency, but you can still have deadlocks, so it's not perfect. And hopefully, if you have a good message passing language or library, then a lot of the difficult stuff is hidden away for you, in a runtime system perhaps, or in a library. So all the cool kids are doing message passing at the moment. It's an idea that's come back around because of multi-core and other things. The next few slides are some examples of message passing concurrency in different languages. I've already mentioned Unix and pipes, and that's a really simple idea of message passing, and hopefully one most people are familiar with. This is a simple sort of hello world in Go, and the syntax here may be a little unfamiliar to Python people. On this line we're creating a channel: a new channel that we can pass messages around with, like our pipe, and it'll be bi-directional like a pipe is on the command line. This thing that looks like a function is a function, but the special keyword go in front of it means that it's also a goroutine. A goroutine is like a coroutine: a sort of lightweight kind of thread. It's not a thread created by the operating system, so it's much cheaper to create or destroy a goroutine than an operating system thread. And then what we're doing on this line, with this sort of funny syntax, is sending the string "hello world" down this channel, and then we're going to run this goroutine straight away, so it's going to be running in the background of our program. These sorts of funny bits of syntax, I'm afraid, really pervade these ideas in languages, and you'll see a lot of weird syntax. The CSP syntax for sending down a channel is a pling, an exclamation mark, and to receive something down a channel, a question mark, so it kind of hasn't got better through the ages, in my view, I'm afraid.
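For Python readers, the Go example just described can be approximated with the standard library: a `queue.Queue` plays the channel and a daemon thread plays the goroutine. This is only a loose analogue (a real unbuffered Go channel also blocks the sender until the receive happens, which a queue does not), and all names here are mine.

```python
import threading
import queue

# Rough analogue of: myChannel := make(chan string)
my_channel: "queue.Queue[str]" = queue.Queue(maxsize=1)

def sender() -> None:
    # Rough analogue of: myChannel <- "hello world"
    my_channel.put("hello world")

# Rough analogue of the `go` keyword: run sender() in the background.
threading.Thread(target=sender, daemon=True).start()

# Rough analogue of: msg := <-myChannel
msg = my_channel.get()
print(msg)  # prints: hello world
```

A thread is, of course, far heavier than a goroutine, which is part of the talk's point about how processes are reified.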
So on this line here, what we're doing is receiving something from this channel, and whatever we receive, we're just printing out. So this is a simple sort of hello world in that language. Rust is another new language that does similar things: we've got a channel being created here, we've got a background process running here where we're doing some sending, then we're receiving this value and printing it out. What's slightly different with Rust is that if you're familiar with working with Unix pipes in C, you'll know that when you create a Unix pipe in C, what the operating system gives you is two ends of the pipe, a sending end and a receiving end. That's what's happening here with Rust: we've got a sending end of this channel and a receiving end of this channel, and the idea is to prevent you from doing silly things like sending down the receiving end or receiving down the sending end. So you, the programmer in Rust, have to decide ahead of time where in your program you're going to want to send down this channel and where you're going to want to receive; it's something you need to think about at compile time. Scala, being a JVM language, is taking up the whole screen with its verbose braces, but this is exactly the same sort of thing. We have an actor, which is similar to the sort of background processes we were talking about, and coroutines and so on; and we've got this slightly backwards here: here we're sending, so this is really using CSP syntax with the pling, and in here we're receiving something and printing it out. And python-csp: python-csp is my own library, and some of you in this room have been really generous in contributing to it, particularly Stefan over there, at previous EuroPythons. This is an attempt at doing something like this in a Pythonic way, built onto the language as a library. So here we've got two CSP processes, and we're not saying in the
code here how those processes are reified, whether they're coroutines or threads or operating system processes, but we've got two processes here that can run in parallel, with the decorators; we've got channels that can be shared between them, and we can read and write with those channels. Again, we're just sending hello world and printing it out. And then on this line, which is a much more CSP-ish way of doing things than perhaps the other examples, we're saying: we're going to take these two processes, run them in parallel, and start them off. If we had a huge program with many processes, we might decide to run them all in parallel, or run a few, then run a few more in sequence, whatever we wish; we'll have flexibility there about how our program is put together. This last example is by a student of mine, Sam Giles, and what he was looking at was really interesting: can we build a language like this, with the sort of Go- or Rust-style channels and concurrency, on the RPython toolchain? Can we use the technology the PyPy team developed to do this same hello world example? There we've got channels: we're going to send "hello world" down that channel, and then we've got Sam's slightly unusual receive syntax there to receive something from the channel and print it out, and this function here is being run in the background as a sort of asynchronous coroutine. That's a really nice project and a really nice way of working. Sam was a very, very good student, but I think it's also a testament to the good engineering of the PyPy team that an undergraduate student can produce a working language like that in the small amount of time available for a final year project. I'm not going to talk in great detail about optimisation and speed and efficiency and those sorts of issues, but I just wanted to show you quickly one of Sam's benchmarks, which shows quite nicely that with a tracing JIT, naulang can
perform well compared to other languages of this sort. So Go here, and occam-π, which is a descendant of occam, are both compiled languages, and naulang compares pretty reasonably with them. This is only a small benchmark, so we perhaps shouldn't take it as gospel, but it's a good indication that this way of working might be positive. On the other hand, I haven't got for you here the same benchmark with python-csp, but we looked at similar things with python-csp, and that was engineered very differently. I'll talk a little more about the sort of design decisions that message passing concurrency implementers might take in the next section of the talk, but what we found with python-csp is that our implementations of channels were very, very slow. Now, I wouldn't expect a Python implementation of this to be as fast as something like Go or occam-π, which are compiled and have a lot of engineering going into these features, but I would hope that message passing would not be the bottleneck in any program. What we actually found was that occam-π is incredibly fast; it's designed exactly for this. But we also compared against other interpreted languages: we looked at JCSP for the JVM, thinking that because JCSP is built on Java threads, which are operating system threads, maybe we could get something like that performance, and we actually got something like a hundred times worse, and didn't do very well at all. So there are some lessons learnt there, and some interesting stories, but part of the takeaway is that it's very difficult to engineer that kind of performance if you're starting from an interpreted language that hasn't been built with this sort of concurrency in mind. This next section of the talk is all about the varieties of message passing concurrency that can be created, and the different decisions that an implementer would have to make if they were going to implement
something like this in Python. One choice is synchronous channels versus asynchronous channels. In the CSP way of thinking, in that very mathematical process algebra formalism, the idea is that all channels block on reading and writing, and you don't move forward with the computation until your read or your write is finished. Some people, including me, think that the nice thing about this is that it's then very easy to understand what your program is doing and to reason about it, because you know exactly in what order everything is going to happen: you know this piece of code will not move forward until it's finished this read, then it'll do this, and then all the other things waiting on it will be able to move forward as well. Asynchronous channels, though, are quite common in different languages, and some people suggest they're a bit faster; sometimes that seems to be true, and certainly the benchmark I showed you before showed that, in that particular benchmark, Sam's asynchronous channels were a little bit faster than his synchronous ones. If you do have synchronous channels, though, you need to think a little about avoiding some of the common problems people have with concurrency, like starvation, and you don't necessarily want a process to block for a very long time if it doesn't have to. Sometimes it might have to, sometimes it might be waiting on a long computation, but if you don't have to block, you would probably prefer not to. So a common feature of message passing languages and libraries is some way of selecting the next ready event to process. If we're thinking in terms of events being messages passing down channels: if you have a lot of channels that you're waiting to read from, in a sort of map-reduce type problem or a worker-farm type problem, then you might say, give me the one that's ready first. That's called alternation in occam, the old-fashioned
language, or selection more generally. So you can say: select for me the channel that's ready to read. Usually, if you're implementing that selection, you do it rather carefully, because although you might want to select the next ready channel to read from, if you've got one channel that's always ready and some that are taking a little while, you don't want those other channels to never be processed. So usually a little bit of work goes into that, to avoid starvation and do some good load balancing. That's one issue: synchronous or asynchronous channels, or you might say buffered or unbuffered channels. Another issue is whether your channels are bi-directional or unidirectional. We saw that in Rust you get what you get in Unix C, which is a read end and a write end of a channel, and that's quite a common way of working with channels, to avoid some mistakes in your code. If you look at the JCSP library, which is a very nice library because it's been engineered very well, with a lot of thought going into its correctness: the JCSP library is very Java-like. Java people don't mind having thousands of classes to choose from and large amounts of documentation, and they don't mind pressing Control-Space in their IDE and getting a long, long list of things, so JCSP works with that paradigm, and it has lots of different channel types that are all classes. I haven't listed them all, because the slides are small, but you can have things like a one-to-one channel, which has one reader and one writer process attached to it at any one time; an any-to-any channel, which has any number attached at any time; and so on. And then you always have the read end and write end of the channel, wherever you are, and the idea here is to use the type checker to design out a lot of potential faults that might creep into your code. That's nice for Java, because it fits well with the Java way of doing things; it's what Java people would expect. So
when I wrote python-csp and designed it, I made all the channels bi-directional, and I didn't give people a read end and a write end; I let them shoot themselves in the foot, because that seems to me a bit more of a dynamic way of doing things, and a bit more Pythonic, but it's not foolproof. So those are a couple of different design choices. Another is mobile versus immobile channels. This is something that wasn't built into CSP originally, but it was built into a different process algebra, called the pi-calculus, by Robin Milner; and then the team at Kent University, who sort of took over the development of occam, created occam-π, which fused together the pi-calculus and the CSP ways of doing things. A mobile channel is a channel that can be sent down another channel to a different process. The idea is that you can think of your message passing program as being like a graph, where the nodes are your processes and the arcs between them are the channels that link those processes together. At runtime, you may wish to change the topology of that graph and change its shape. There are two good reasons why you might do this. One might be to do with load balancing: if you have a computation split among a lot of processes, you might find some of them are more active than others, and you might decide to change the load balance between them, which might also mean changing the topology of the graph, and who's reporting their data back to whom, who's aggregating the data, and so forth. Another reason is that you might be running these processes across a network: you might not only be working with one machine, you might have some processes farmed out to another machine on your network, and then you might have issues like latency, or network failure, or whatever, and that might make you think: during the running of my computation, I'd like to change the topology to make the most efficient
use of that network of machines. So that's another reason. This leads to two issues. Mobile channels can be great if you can use them well and you've got a good use case for them. And if you're in a situation where you need to shut down this network, this graph of running concurrent processes, then you need to notify each node in your graph that it needs to shut down, and doing that safely is quite an important thing to do. In the message passing world, one way to do this is called poisoning: the node that decides to shut everything down, or shut a few things down, tells a channel, or all of the channels it knows about, that they need to start shutting down, and they need to propagate the message that this program is going to halt. This is called poisoning because you poison a channel, and the idea is that it poisons the well of the whole program, and each process shuts itself down safely. That's something that takes a little bit of care and a little bit of good engineering, because you need to think: if I'm a process and you're all processes, and you all need to die and then I'm going to die, then that needs to happen in the right order. If I kill myself first, you won't know what to do. So: we've talked about channels, and we've talked about mobility and different sorts of channels. The other thing is how to represent the processes, and there are a lot of different choices there too. In some languages, one CSP process is one coroutine, and that makes sense in some paradigms; I think this is kind of how Node.js works. That leads to very fast message passing, because in the runtime system all those processes share memory, they're all really in the same operating system thread, so they can do a lot of things very fast and pass messages down channels very fast. But then it's hard to take advantage of multi-core if you're all in one thread. You could have a one-to-one mapping, where one CSP
process is one OS thread. That gives you much slower message passing, because whoever implements it does have to deal with locking and all those low-level issues, but then you can start taking advantage of the features your operating system has. You could make one CSP process one OS process, and that's a really good choice if you're thinking about migrating processes around a network and running your code on more than one computer at once, sort of MPI-style, if you're into MPI. Or you can have some multiplexed version of all of those options: some CSP processes that are coroutines living inside one OS thread, other CSP processes that are really coroutines themselves living inside another OS thread, and all sorts of combinations therein. And this is really why python-csp was not as fast as we'd hoped: because we were looking at taking advantage of multi-core and the network, we were using these one-to-one mappings, which are not the best in terms of speed. So I'm not going to talk for a huge amount longer, because hopefully we can have a good discussion, but I wanted to say a little bit about message passing in Python. Although Python is not a message passing language in the way that Go and Rust and all those other things are, Python does have a lot of these ideas built into its ecosystem: sometimes in libraries, sometimes in different implementations of the interpreter, sometimes in all sorts of other ways. I was really pleased, looking through the EuroPython schedule, to find that there are a lot of different talks at this conference that in some way have quite a lot to do with the ideas I've been talking about today. They're not necessarily straightforward implementations of message passing in the way that python-csp was, but they take on some of those ideas, either by implementing coroutines, or using coroutines, or using channels, and so on. So in a sense, message
passing is already here. And of course, in Python 3.4 we have coroutines built in, so there's perhaps a big opportunity there to think about building these things into the core of the language. If you're interested in this stuff, then I'd certainly be interested in talking to you. My next steps: python-csp has been sort of in abeyance for the last few years while my day job has taken me to do different things, but Python for the Parallella board will be coming out this summer. I've got a project working on that, starting mid-August, and we'll be looking at nice and hopefully efficient ways of using Python for the Parallella that ideally would use message passing in some way, but we'll see how that works out; the jury's rather out on that one. And python-csp is certainly moving back into regular development, and Sam's language, naulang, will be continuing; I'll be at the PyPy sprint this Saturday doing a little bit more on that. If you are interested in this stuff, then please do come and catch me some time. Thank you very much.

Can you please line up at the mics on either side?

Hello. I have seen that we are always relying on the operating system layer for the threads or the coroutines and so on. Has there been any enhancement on how processors can pass information from one to another, besides the caches and all those things?

There was an interesting development in the Open MPI library a few years ago, when they found that their message passing was a little bit slower than they would like, and Open MPI people tend to work on Linux. So the Linux kernel brought in a new way of doing that, which is called cross memory attach, and I think it is only a Linux thing for now. The idea of it is that you've got two different operating system processes, and then, rather than doing what you would do in a pipe, which is that you copy the memory, you keep the memory in one place and then
you pass around a sort of handle to that memory between processes. So that's a much quicker way of doing it. That was built specifically for MPI, but it would possibly be a really good way forward for any other implementation, like an implementation in Python. So yeah, there's definitely some interesting work there.

Hello. I wanted to ask what the python-csp library provides that gevent doesn't, apart from a simpler API?

Well, it's a good question. It's a different API; I don't know if it's simpler or not. python-csp started because I wanted something like this in Python, but the only things available were direct ports of the Java JCSP library. The idea of this is that it's much more similar to a CSP way of doing things than anything else. So what does it provide? It provides processes, which can be various sorts of processes; it provides channels; it provides selection, or alternation; and it provides a small library of built-in processes that might be useful. The reason for that is that the way CSP people tend to think about this is that the more concurrency you have, the better. Rather than saying, here's a nice sequential program, how do I split it up to make it efficient or concurrent or sensible, they say: make everything you possibly can concurrent. So they tend to have libraries of processes that do things like: take two channels, read a number down each of those channels, add them together, and send the result out down a third channel. So a process that just does addition, and then processes that just do all the other arithmetic things. There's support for that way of working, if that way of working is something that's interesting to you. I suspect, though, that that way of working is probably only interesting to people who are interested in CSP for its own sake, because it's not a terribly pragmatic way of working. So I mean, the answer to your question really is that python-csp implements all the sort of basic things that you expect of a
message passing library; it's just a matter of how it implements them, and how well. I think we probably score about 5 out of 10 for that at the moment, but hopefully it will get better.

A quick question on python-csp and multiple processes: is it currently implemented with something like multiprocessing? How are you doing the message passing across processes? Is it difficult to serialise them?

So we've got two different ways of doing it, one with threads and one with processes, but I didn't use multiprocessing; I just used os.fork and that sort of thing.

So Windows is out of the question?

Yeah. The idea of that was that multiprocessing is really built for a particular way of working, and it has a lot of internal code that supports that way of working but isn't so useful if you want to do things the CSP way. For example, I think when you spawn a process in multiprocessing, that process also spawns a watchdog thread, but in a CSP library you don't need that. So the idea was to be just a tiny bit more efficient by not having those multiprocessing internals. I think in reality, if you compared a version of python-csp using multiprocessing with one without, you probably wouldn't find a vast amount of difference. So you could easily do most of these things using multiprocessing, because you've got pipes in the multiprocessing library; in that sense Python has some of these things built in already.

And so you used os.fork to do the memory copying, to do the message passing?

No, to create the processes, and then pickle...

So I think I might know why your message passing is the bottleneck.

Yes, I think that's a very good point, yeah, absolutely.

I mean, shared memory is a problem for object passing, because you still have the question of who then owns the reference count. But with some kind of library where you could have immutable data structures, and you have a convention that the receiving channel owns the message
that's been passed, so it's responsible for its destruction, then you could do reliable message passing between channels.

Yeah, I think that makes a lot of sense. The version of python-csp that uses OS processes uses shared-memory type things, and that's still quite slow, partly because I think shared memory is more efficient when you're copying a large amount of data, or copying data many times through the shared memory. It's not really intended for one-off sends and receives, which is kind of what you're doing when you do a message pass in CSP, so it's not really the right tool for the job. So cross memory attach might be interesting.

So where do you see the future of message passing in Python? Would it be something like PyPy STM, or rather something like asyncio?

So asyncio sort of does this kind of thing already, but for IO processes that you want to run in the background, and for that particular use case it's great. But for more general computation, I think it would be interesting to see message passing used together with Python 3.4 coroutines and see how that goes. I think that would be a really interesting experiment to do, and really interesting to benchmark and see if it could get really fast and usable for, as it were, the ordinary programmer, rather than someone who's got a particular use case like background IO.

And PyPy STM, would there be any hope there?

So PyPy STM is a fantastic piece of work. As I understand it, the purpose of PyPy STM is to make the core interpreter concurrent, in a sense, which means that you can then build these high-level sorts of concurrency that the programmer would see on top of that. I hope PyPy STM is really successful, but I wouldn't expect that to mean that ordinary Python programmers use STM in their own applications; I think that's the wrong level of abstraction for the programmer. So I think building message
passing on top of PyPy STM would be really interesting.

I have another question. You already mentioned Stackless Python. Did you look into the coroutines Stackless Python provides, which are called tasklets, and the channels provided by Stackless Python?

So I didn't quite catch that.

Stackless Python is an alternative implementation of the Python interpreter, and it already provides coroutines and channels and message passing over these channels. Did you look into it?

Yeah, yeah. So my understanding is that Stackless has a different implementation of the Python interpreter, so it's not quite CPython, which is why it's Stackless.

Actually, it is CPython, with some additions.

Yeah, okay, with some changes.

So it's fully binary compatible with CPython.

Yeah, so that's also a really interesting piece of work. Yeah.
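As a footnote to the alternation/selection discussion in the talk: the idea of picking the next ready channel while avoiding starvation can be sketched in a few lines of stdlib Python. This is a toy round-robin select over `queue.Queue` objects standing in for channels; the function name and polling strategy are mine, and a real implementation would block on condition variables rather than poll.

```python
import queue

def fair_select(channels, start=0, max_polls=1_000_000):
    """Return (index, message) from the first ready channel.

    Polling begins at `start` and proceeds round-robin, so a channel
    that is always ready cannot permanently starve the others: callers
    can rotate `start` between calls to spread the load.
    """
    n = len(channels)
    for i in range(max_polls):
        idx = (start + i) % n
        try:
            return idx, channels[idx].get_nowait()
        except queue.Empty:
            continue
    raise TimeoutError("no channel became ready")

a, b = queue.Queue(), queue.Queue()
b.put("ready first")
print(fair_select([a, b]))  # prints: (1, 'ready first')
```

Rotating the starting index between calls is the simple load-balancing trick the talk alludes to; occam's ALT and Go's select solve the same problem inside the language runtime.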