Thank you very much. Thank you. So yes, my name's Nicholas. I'm a freelance Python developer from the UK. This is an introductory talk, really, about asyncio through the medium of distributed hash tables. Asyncio, as I'm sure you all know, arrived in Python 3.4. My only expectation of you is that you know a little bit of Python. So what we'll do is explore some of the core concepts of asyncio, and I'll tell you the story of how I used asyncio to create a fun little personal project, which was to build a distributed hash table, which Holger very usefully referenced in his keynote this morning, which is perhaps why there are rather a lot of you in this room. This isn't an exhaustive exposition of asyncio. There's an awful lot I've had to leave out, and it simplifies a lot as well, but my aim is to arm you with just enough information so that you can continue exploring the module and the concepts found therein afterwards. It's also a personal pedagogical exercise: if I can explain asyncio in a simple sort of a way, it demonstrates my own clarity of thought about, and understanding of, asyncio. As you can hear, I'm talking very quickly because I have rather a lot of information to give you. So yes, perhaps, like a Gumby, your brain will hurt by the end of this talk. So the first thing that needs to be answered is: what does asyncio do? The Python documents are very clear about this. They state that this module provides "infrastructure for writing single-threaded concurrent code using coroutines, multiplexing I/O access over sockets and other resources, running network clients and servers, and other related primitives". You got all that? I've actually got that written down; I didn't do that from memory.
Now, while I understand all the terminology from the documentation, it doesn't give me a sense or a feel of how I might use this module, and such documentation can make the module appear a little bit intimidating, sort of the realm of esoteric leet über-hackers like Trinity, who's so leet she talks in Courier fixed-width. See if you can figure out what Neo talks in. Anyway, what does asyncio do? Well, we can do a lot better than that. We can keep the answer simple and stupid. What does asyncio do? As Trinity says, it lets you write code that concurrently handles asynchronous network-based IO. So let's be clear about what I mean by those terms. Concurrency is when several things appear to happen at once. Asynchronous literally means not synchronised: there's no way to tell when something may happen. The network, of course, as you know, is a medium for communicating with another device, usually via the internet, and IO is, as I'm sure you all know, input/output, when a program communicates with the outside world. So the problem, clearly stated, is that messages arrive and depart via the network at unpredictable times, and asyncio lets you deal with such interactions simultaneously. So given this simple purpose, I want to place asyncio into a practical context. So let's talk about distributed hash tables. Again, let's start with the obvious question: what is a distributed hash table? Well, I'm assuming you know what a hash table is, because it's more or less the same thing as a dict in Python. We are at EuroPython, after all, and I think it's reasonable that I can expect you all to understand what a dict is. It's a simple key-value data store. A distributed hash table is distributed because the whole is built from several independent yet related parts. So it's a little bit like this abstract encyclopedia: the whole encyclopedia is made up of volumes that are independent yet related to each other.
In our case, the distributed hash table is a structure that consists of many independent nodes that collaborate with each other over the network. A DHT is also decentralised, so no node is more important than any of the others. There are no client-server relationships in the DHT; it's a loose peer-to-peer network of nodes. So a distributed hash table, a DHT, is a peer-to-peer key-value data store. Why would I want to implement one of these? Well, it's a really interesting programming problem with fascinating properties. As I mentioned, there's no single point of failure. The DHT algorithm that I use, which is called Kademlia, efficiently scales to a huge number of nodes. It has good handling of fluid network membership as nodes leave and join the network. It's a solid foundation for more complex services, such as the ones that Holger referenced this morning. It's tested in the real world: BitTorrent uses the Kademlia algorithm, and Freenet does as well. Obviously, if you're like Holger, it's an example of decentralised platforms, which are fascinating on more than just a technical level, on a political level as well, perhaps. So guess what? Distributed hash table nodes have to concurrently handle lots of asynchronous network-based IO, which is the sweet spot for asyncio. So we have a context. So how do I use asyncio to make all this work? Let's introduce some core concepts, the first one being the event loop. Quite simply, this is some code that just keeps looping. Each iteration of the loop basically does two things. The first is that it polls for IO events that occurred during the time it took to complete the previous iteration of the loop. The other is that it runs any callbacks that need to be run during this current iteration of the loop. The loop also carries out various housekeeping needed for callbacks that have yet to be executed, but that's something we can ignore.
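A minimal sketch of that looping-and-callbacks behaviour, not from the talk's slides, might look like this (the names `tick` and `ran` are just illustrative):

```python
import asyncio

ran = []

def tick(n):
    # Each scheduled callback is executed sequentially by the event loop.
    ran.append(n)

loop = asyncio.new_event_loop()
loop.call_soon(tick, 1)    # schedule a callback for the next loop iteration
loop.call_soon(tick, 2)
loop.call_soon(loop.stop)  # stop the loop once the pending callbacks have run
loop.run_forever()
loop.close()
```

After the loop runs, both callbacks have executed, in the order they were scheduled.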
So it's important, again, this is sort of an introductory talk, to define what polling and callbacks are. Polling is discovering the status of something external to the program, and in asyncio this is network-related IO events. A callback is some code that's to be executed when some event has occurred that's been detected via polling. In this metaphor, obviously, the kids are polling: are we there yet? Are we there yet? Are we there yet? And the mother is saying, I'll let you know when we get there, and this creates a sort of a callback, in that she is promising to do something when some condition is met, i.e. they get to the end of their journey and arrive at Grandma's house. It's important to note that polling takes place once during each iteration of the loop. IO events are discovered by polling to determine which callbacks to execute during the current iteration of the loop. All pending callbacks are executed one after the other, as it says in PEP 3156, and the loop can't continue while this is happening; it is blocked. The next iteration cannot start until all the sequentially executed callbacks finish, in some sense. So you're probably thinking there's something wrong here: this doesn't sound very concurrent. Unfortunately, concurrency is a hard problem, and there's actually more than one way to do it, so it's worth taking some time to examine why asyncio works the way it does. In the traditional threading model several problems can happen. Task A reads a record. Task B reads the same record. Tasks A and B change the retrieved data in different ways. Task B writes its changes, then task A writes its changes. What's the problem here? Well, task A has overwritten the record containing task B's changes, so we have loss of data, which is something we want to avoid. So why not wait until one task is finished before continuing? First do A, then B, followed by C, and so on. This is easy to understand and deterministic.
But what happens if A needs to wait for something, such as a call to another machine on the network? Well, the program has to wait until A's network call completes, and in that situation we can't get on with other stuff because we're still hanging around for A. Given the situation in this slide, the program is described as blocked, and this is unacceptable if you're writing code that needs to react quickly to network-based events, which is precisely the sort of program asyncio has been designed to help with. So you're probably asking yourself, why not just get on with tasks B and C while we wait for the result of the network call for A? Well, if you're thinking that, then you've actually described quite succinctly what asyncio does. So welcome to the most important slide of the talk; I hope you're all paying attention. Asyncio is event-driven, and this means that network-based IO is non-blocking. How does this work? Well, the program does not wait for a reply from network calls before continuing. Programmers, us, define callbacks to be run when the result of the network call is known, and in the meantime the program continues to poll for and respond to other network-related IO events. Callbacks execute during the iteration of the event loop immediately after the expected network IO event is detected. Confused? You shouldn't be, because actually this is how we as humans think about concurrency. In the real world we make plans all the time. The washing machine finishing is an expected event, and hanging the clothes out to dry is a callback for when this expected event happens. How hard can this be? Says the stock photo dude with the washing basket. Also, we multitask in the same way asyncio does. We skip between the things we need to do while we wait for other things to happen. We know we will have time to squeeze the orange juice while the toast and the eggs are cooking while we make breakfast.
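That "get on with B and C while A waits" behaviour can be sketched like this, using the modern async/await spelling (which the talk comes to later); the sleeps stand in for network calls of different lengths, and all the names are illustrative:

```python
import asyncio

order = []

async def job(name, delay):
    # each await hands control back to the event loop, so other
    # jobs can make progress while this one "waits on the network"
    await asyncio.sleep(delay)
    order.append(name)

async def main():
    # A takes longest; B and C finish in the meantime rather than
    # the whole program blocking behind A
    await asyncio.gather(job("A", 0.03), job("B", 0.01), job("C", 0.02))

asyncio.run(main())
```

The completion order reflects when each "network call" finishes, not the order the jobs were started in.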
We don't just stick some porridge in the microwave, put it on for five minutes and then watch it. Well, I don't. You probably get on with other things. This is exactly the sort of thing that I'm talking about. We multitask. There's only one of us, there's only one asyncio event loop, and this is how we react to the world around us. The Microsoft... The microwave. It was a long night last night. The microwave goes ping, and that is the expected event, and then I take the porridge out and eat it. So asyncio avoids potentially confusing and complicated threaded concurrency while retaining the benefits of strictly sequential code. Thank you. This is the fundamental advantage of asyncio. We plan ahead for expected events by defining callbacks to be called when such events eventually occur, and in the meantime we sequentially handle the callbacks related to other events that may have happened in the interim. So questions you're probably asking yourself: how are asynchronous concurrent tasks created? How do such tasks pause while waiting for non-blocking network-based IO? And how are callbacks defined to handle the eventual results? To answer these questions you need to understand coroutines, futures and tasks. So I'm going to attempt to explain coroutines in about three slides. A coroutine, essentially, is an object representing an activity that eventually completes, or it's a decorated function that returns such an object. The important things to know about coroutines are that they may be suspended using the yield from syntax, and that when a coroutine is suspended, this allows the event loop to get on with other things. Coroutines are sorts of generators, so they lazily generate results: calling a coroutine doesn't actually start its execution. They yield from other objects, and when the yielded-from object has a result, the coroutine continues from the yield from statement that suspended it, and this is called re-entry.
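Suspension and re-entry can be sketched like this. Note this uses the newer async/await spelling, since Python 3.4's yield-from coroutines have since been removed from the language; await marks the same suspension point the talk describes, and all the names are illustrative:

```python
import asyncio

log = []

async def fetch_value():
    # the innermost awaitable: eventually produces a result for its caller
    await asyncio.sleep(0)
    return 42

async def outer():
    log.append("before suspend")
    value = await fetch_value()   # suspends here; the loop can do other things
    log.append("re-entered")      # execution resumes at the await: "re-entry"
    return value

result = asyncio.run(outer())
```

Calling `outer()` on its own starts nothing; execution only happens once the event loop drives it.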
And at the end of the chain of yield froms is an object that returns a result or raises an exception, rather than yielding from some other coroutine. So this, oh dear, look at that, that's just Chrome being a bit daft. Okay, so this is a decorated coroutine method that handles an incoming HTTP request as part of my DHT. Upstream, something is yielding from the coroutine created by this function in order to do something with the response that it generates. This block of code will pause by yielding from the coroutine created by the payload.read call, which is a method that reads the raw data posted as part of the request; you can see that just after the try. Okay. When all that data has arrived, the code pauses again while waiting for the coroutine created by the self.process_data method, itself waiting on other things, such as perhaps a call to the database. And when the task encapsulated by this coroutine is complete, i.e. it returns a response, the upstream coroutine gets the returned result and resumes execution from where it yielded from in its code. So we kind of know how asynchronous activity happens, but how do we handle the result? How do I handle the result of a coroutine? What about callbacks? Well, for this you need to understand futures and tasks, which I'll try and explain in just two slides. So, a future: if you've done Twisted, this will all be very familiar to you. A future represents a result that may not be available yet. Callback functions are added to a sort of a to-do list, to be executed when the result is known. And a task is simply a future that wraps a coroutine; the resulting object is realised when the coroutine completes, and we say the future has resolved, which sounds all very Doctor Who. So here's some example code which I hope makes this clear. The first thing I do is create a callback function, handleResolvedTask, and the only thing it does is log the result.
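The handler on the slide isn't reproduced in this transcript, but the shape it describes, pause for the posted data, then pause again while processing it, can be sketched like this. All the names here are stand-ins for the slide's payload.read and self.process_data, not real aiohttp APIs:

```python
import asyncio

async def read_payload():
    # stand-in for payload.read(): first pause, raw POST data arriving
    await asyncio.sleep(0)
    return b'{"key": "value"}'

async def process_data(raw):
    # stand-in for self.process_data(): second pause, e.g. a database call
    await asyncio.sleep(0)
    return {"status": "ok"}

async def handle_request():
    try:
        raw = await read_payload()   # suspend until all the data has arrived
    except ConnectionError:
        return {"status": "error"}
    return await process_data(raw)   # suspend again while processing

response = asyncio.run(handle_request())
```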
The next thing I do is create a task associated with a certain coroutine that will pause, and then I add to that task the callback I defined above; the task object is automatically resolved when the coroutine associated with it completes. And at the end I execute the code, because, remember, coroutines are lazy generators, by setting up and running the event loop itself. So here's another perspective on this, a little bit mind-bending. My buddy Terry Jones likes to point out that we're used to working with first-class functions in Python: we pass them into functions and we return them as values. These are like first-class function calls: we just might not know what the result is yet, but we can start to add callbacks to them. So let's do a quick recap before we come to the end of the talk; I see him waving the five-minute flag. An event loop is looping code that polls for IO and manages event-handling callbacks. Coroutines are a little bit like first-class function calls; we can also pass them into functions and add callbacks to them. A coroutine is an object that represents activity that eventually completes, or is a decorated function that returns such an object. A future represents a result that may not be available yet; the associated callbacks are executed when it resolves, like I said, the future resolves, in a kind of mysterious, Gallifreyan sort of a way. Tasks are future classes that wrap coroutines, and these are realised when the coroutine is done. So, given all this theory, let's very quickly look at how asyncio works in practice with my DHT project. For us to be able to do that, I need to very quickly explain how a DHT works. As I mentioned before, a DHT is made up of nodes, and the classic way to visualise this is with a clock face; you'll see why in a moment. Each node has a unique ID that is within the set of all possible values for a certain hashing function. You all know what hashing functions are because you were all in Holger's excellent talk this morning. Okay?
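The task-and-callback pattern just described (a callback like handleResolvedTask, added to a task wrapping a pausing coroutine) can be sketched like this; the names are illustrative rather than the slide's actual code:

```python
import asyncio

logged = []

def handle_resolved_task(task):
    # the callback: "log" the result once the task's future has resolved
    logged.append(task.result())

async def slow_coroutine():
    # a coroutine that pauses, standing in for real network IO
    await asyncio.sleep(0.01)
    return "done"

async def main():
    # wrap the coroutine in a task and attach the callback to it;
    # the task resolves automatically when the coroutine completes
    task = asyncio.ensure_future(slow_coroutine())
    task.add_done_callback(handle_resolved_task)
    await task

asyncio.run(main())   # nothing runs until the event loop is running
```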
In my case, with my DHT, I use SHA-512. The ID's value indicates the node's position on the clock face, and in this way we can tell where the node is located in the abstract network, as it were. So we can tell who is close to whom and how far away nodes are from each other: there is some sort of notion of distance in the DHT. Data is a key-value pair. The key is turned into a hash, again using the same hashing algorithm, SHA-512 in this case, and the value is stored at nodes whose IDs are close to the hash of the key. This is similar to understanding where to look things up in a multi-volume encyclopedia, going back to the example at the beginning. Articles are words, or keys in our case, and the associated definitions are values that are stored in volumes that cover some alphabetical range within the global encyclopedia space, as it were. So aardvark belongs under A, whereas zebra belongs under Z. Okay? How do nodes know where to look? Well, each node maintains something called a routing table that tracks the state of its peers, and each interaction with its peers results in an exchange of information about the nodes it's been talking to. That's how the routing table is populated initially and kept up to date, because nodes keep bumping into each other like ants in an ants' nest. The routing table splits the clock face of nodes into buckets. Buckets contain the same number of peers, but buckets cover a smaller range closer to the local node; therefore the local node knows more of its closer nodes. As you can see, me, I'm the red node at about half past one, and the nodes I know about are in the different buckets; they are the squares. Okay? Each node behaves according to some very simple rules. It doesn't really matter what those rules are; I'm just putting a few up there to illustrate what I mean. But the important thing about a hash table is that you need to be able to put values and you need to be able to get values.
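Kademlia's notion of distance is the XOR of two IDs treated as an integer; a sketch of that, using SHA-512 IDs as in the talk (the helper names are mine, not from the drogulus):

```python
import hashlib

def node_id(seed: bytes) -> int:
    # a node's ID: the SHA-512 hash of some seed, read as a big integer
    return int.from_bytes(hashlib.sha512(seed).digest(), "big")

def distance(a: int, b: int) -> int:
    # Kademlia distance between two IDs: their XOR
    return a ^ b

alice, bob = node_id(b"alice"), node_id(b"bob")
key = node_id(b"aardvark")   # keys are hashed into the same ID space
# the value is stored at whichever nodes are XOR-closest to the key
closer = "alice" if distance(alice, key) < distance(bob, key) else "bob"
```

XOR gives a genuine metric: a node is at distance zero from itself, and the distance from A to B equals the distance from B to A.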
You need to get and set, and each one of those fundamental actions requires a lookup, okay, so that you know which nodes you want to interact with. All these interactions are asynchronous, and lookups are also concurrent, because you can ask several peers at the same time about the information that you need. So how does the lookup work? A recursive lookup is like the six degrees of separation game, which always features Kevin Bacon if you're talking about Hollywood actors. Say I want to put a value with a key whose hash is at the six o'clock position. Okay? There's the target at six o'clock. The first thing I do is ask the nodes in my routing table that are closest to that hash, and they reply with nodes from their routing tables that are closer to the target. This keeps going recursively until I can't find any nodes that are closer to the target key. So, get and set require a lookup, and how is this handled in the realm of asyncio? Well, a lookup is a future, because it's something whose result we can't know until we've finished looking it up; we need to interrogate our peers on the network. Okay? The state of the lookup, the progress in finding nodes that are close to the target, is held within the lookup instance, and the lookup resolves, because it's a type of future, when the result is known. The result is either going to be a value or a not-found exception in the case of a get operation, or it's going to be a list of the closest known nodes in the DHT in the case of a put. What I do then is contact each of those 20 nodes and say, store this information for me, please. So what about networking? How does asyncio handle different networking protocols? How do nodes in the DHT handle the down-the-wire aspect of input/output? Core concepts five, six and seven are streams, transports and protocols.
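A toy sketch of the "a lookup is a future" idea, a Future subclass whose instance holds the lookup's state and which resolves with the closest known nodes. This Lookup class is entirely hypothetical; the real drogulus one is considerably more involved:

```python
import asyncio

class Lookup(asyncio.Future):
    # A lookup is a future: the state (nodes found so far) lives on the
    # instance, and it resolves once the closest known nodes are settled.
    def __init__(self, target):
        super().__init__()
        self.target = target
        self.contacted = set()

    def add_contact(self, node_id):
        # record each peer interrogated during the recursive lookup
        self.contacted.add(node_id)

    def finish(self):
        # resolve with the closest known nodes (closest by XOR distance)
        closest = sorted(self.contacted, key=lambda n: n ^ self.target)
        self.set_result(closest[:3])

async def main():
    lookup = Lookup(target=0b1010)
    for node in (0b0001, 0b1000, 0b1111, 0b0110):
        lookup.add_contact(node)
    lookup.finish()
    return await lookup   # awaiting the future yields the resolved result

result = asyncio.run(main())
```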
Streams are high-level abstractions that allow you to send traffic down the network, or receive it from the network, using reader and writer objects. Transports are provided by asyncio to handle low-level networking activities such as TCP and UDP; they handle the low-level IO layer and the buffering, and the event loop sorts these out for you, so you shouldn't really have to interact with them. Protocols handle network protocols at the application layer, so for example HTTP or netstrings, and so on and so forth. Streams are flows of data that can be read from or written to, and they are built upon the transports and protocols. So what is a transport? A transport is concerned with how stuff moves over the network, whereas a protocol works out what to do with the stuff that arrives from the network: it works out how to turn the raw bytes into something meaningful, such as a netstring message or an HTTP call. Really, you only need to work with streams, and perhaps protocols if you're doing something a bit fun. So, my final thoughts. With my DHT I've managed to get 100% unit test coverage, and this is because asyncio is part of Python and testing is normal. I'm using the built-in unittest library and mock, and how you organise your code is a key factor for this. This is in contrast with, for example, Twisted, which has its own test runner and its own unit test class; it doesn't work quite the same way as the one in the standard library, which can be a bit confusing. My DHT implementation is small and simple: there are only 871 lines of code, as of this morning, in the DHT itself, and this is because asyncio makes it easy, I've got about one minute, makes it easy to think about concurrent problems. I believe the abstractions make it easy to write very simple code that's short and comprehensible.
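A small sketch of the high-level stream abstraction: an echo server and client on localhost, with the transport layer handled for us by asyncio. Everything here is illustrative; it is not the DHT's actual networking code:

```python
import asyncio

async def handle_echo(reader, writer):
    # application-layer logic via reader/writer stream objects:
    # read a line, write it back, close the connection
    data = await reader.readline()
    writer.write(data)
    await writer.drain()
    writer.close()

async def main():
    # asyncio sets up the TCP transport and buffering underneath
    server = await asyncio.start_server(handle_echo, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(b"hello\n")
    await writer.drain()
    reply = await reader.readline()
    writer.close()
    server.close()
    await server.wait_closed()
    return reply

reply = asyncio.run(main())
```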
And finally, I need to mention the difference between IO-bound and CPU-bound: don't use asyncio if you need to do something with lots of CPU overhead, because of course that's going to block the event loop. If you have lots of networking, however, that's the sweet spot. There are changes for the better in Python 3.5: coroutines with the async and await syntax, which Guido mentioned yesterday as his favourite feature, so it must be good. If you remember, here's the coroutine that I showed you earlier on; simply, all I've done is take out the yield from and put in the correct keywords. In Python 3.5 these are proper native coroutines, and they're not to be confused with generators. The new async syntax sits outside the code block, not as some internal yield from, so it actually makes it easier to read, in my opinion. Also, among the new features, it comes with async context manager and async iteration protocols as well, so you can do some really cool things, though that's not shown here. PEP 492 is Guido's favourite feature; go read that. I've not tested this code; as it says on the slide, it's for illustrative purposes only. The DHT project is called the drogulus, and that's the GitHub account. Any questions? Does anybody have a short question, a short question like my DHT code, that's short? No questions at all? Okay, thank you very much.
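The before-and-after on the slide isn't reproduced in this transcript, but the swap described, replace the decorator and yield from with async and await, looks schematically like this (the function names are illustrative, not the slide's):

```python
import asyncio

# Python 3.4 style, as described for the earlier slide (schematically):
#     @asyncio.coroutine
#     def handler():
#         data = yield from read_data()
#         ...
#
# Python 3.5 native coroutine: async def replaces the decorator,
# await replaces yield from, and the syntax sits outside the body.
async def read_data():
    await asyncio.sleep(0)       # stand-in for awaiting network IO
    return b"raw bytes"

async def handler():
    data = await read_data()     # "await" marks the suspension point
    return data.decode()

result = asyncio.run(handler())
```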