TAPS, the Transport Services API, and how that relates to retiring the BSD socket API. Please give him a great round of applause; I'm welcoming him to the stage.

Hi, my name is Phil, or Philipp Tiesel, and I'm here today to talk about TAPS, the Transport Services API. All the work I'm presenting today is work in progress. It has become a fairly large group of people working on it, and it's really awesome how much traction it has got, but it's still work in progress. So we have no RFC yet; we are just talking about Internet-Drafts and preliminary work we're doing for the API. On the other hand, we already have a few implementations, and it's really nice seeing how much traction this gets.

But before talking about why we want to replace the BSD socket API, or want to retire it, let's see what's wrong with it. Looking at the BSD socket API, we have to go back in its history. The BSD socket API originates from 4.2BSD, from 1983. I was two years old when this was released, so it has become quite old.
At this time, they were really strong on the idea that everything is a file, and as we didn't have virtual file systems yet, they decided: okay, if we want to do networking, just make it a file handle, have the socket call instead of an open call, and everything works like a file. It was implemented as an extension for inter-process communication, so instead of doing inter-process communication on a single machine, we're now doing it over the new shiny internet, which was quite small at this place and time and worked, yeah, as an experiment. They decided to have two kinds of APIs for inter-process communication: a message-based one, which you can also easily create with Unix domain sockets, and a stream-based one. For this they implemented the User Datagram Protocol, UDP, which is unreliable, and the Transmission Control Protocol, TCP, which is a reliable stream.

As it was the first really nice API that was actually usable, and it didn't require fiddling with network drivers or putting assembler code in to accomplish that, but enabled everyone who was able to write a file to also do networked inter-process communication, it became the template of most modern networking APIs. If you look at the BSD socket API and then at the Windows socket API or other socket APIs on embedded platforms, they look quite the same. They're all modeled after the original BSD socket API.

So let's see what we need to open a socket. This is the classical example from Stevens et al., Unix Network Programming, which is still the reference on how to do this. You first create an integer variable to hold the socket file descriptor, as file descriptors are integers. Then you reserve a struct for the name resolution, configure it a little bit, and call getaddrinfo with all this stuff to get the return parameters. Now you have a linked list of things that allow you to connect, as the result of the name resolution; if it fails, you get an error. And then it gets worse: you start a while loop
iterating over this list and trying: can I connect the socket? If it works, we're fine. If it doesn't work, try the next one. Try to connect; if it works, we're fine; if it doesn't work, it blocks, it times out, and we go to the next one and try that. Afterwards we have a socket file descriptor, we have a special free function that allows us to free the whole list, and we're done.

So, simple question: what's wrong with that? And I'm not thinking about modern APIs, but just about what's wrong if you accept that C is programmed that way. What's still wrong with that? The answer is: today's internet transport has changed. We're not in 1983 anymore, and therefore a few things changed.

First, we have many more protocols out there than just TCP and UDP. At the narrow waist of the internet we now have IPv4 and IPv6, and not just IPv4 anymore. At the transport layer we have SCTP as a reliable message-based protocol. We will get QUIC pretty soon; it's rolled out at the moment primarily by Google, but Akamai is on the way, and a lot of other CDNs, or content distribution networks, are on the way to roll this out. So pretty soon we'll have a second transport protocol that is made for replacing TCP, at least if HTTP is on top of it, and you probably want to use this transparently.

As the second change, we have devices that have multiple paths today. If you had a mainframe in 1983, you were really lucky if you had a permanent connection to the internet and were not dialing up through a 300 baud modem. Today you have a cell phone that has multiple network interfaces: at least one cellular carrier and at least a Wi-Fi interface. You can probably use several at the same time, and you might really want to decide on a per-socket basis which path to use.

And finally, you have multiple endpoints serving the same data. If you're talking to Google or any other CDN, you just get a front-end cache, and you have several to choose from.
And finally, last but not least, everything today should be encrypted. In the old days of the internet, everyone was trusted: we don't need to encrypt communication, you can trust the other few hundred guys on the internet, you know them. Today we need encryption to secure everything. These are all things that were not thought of when the BSD socket API was invented.

So let's look at how we can fix that. We go to our textbook example again and first have to look at the name resolution. We want to resolve names over multiple paths in parallel, and to make things worse, if you're talking to CDN nodes, you cannot use the same IP addresses over all links. If you have Wi-Fi from the camp and, for example, T-Mobile LTE on the camp, those will most probably deliver you different DNS results for CDN nodes, and you might not be able to reach the other CDN nodes at all when using the wrong link. So you have to resolve in parallel, you have to keep the results separate, and you have to use them in the right way. The BSD socket API doesn't provide this feature, so if you want to do this in an application, for example a web browser, you have to implement the whole DNS resolution yourself. Welcome.

Second, if we have results, these DNS results contain information that will later be useful for your secure connection setup. If you want to use features like SNI encryption, which prevents an observer on the path from seeing which host you're connecting to, you have to get a key from DNS. The current socket API doesn't provide you this, so again you will have to implement DNS yourself.

Third thing: if you want to connect, users usually like fast connections. If you have to time out multiple times, if you have to time out 90 seconds per connection attempt, the user gets annoyed. If you click on a link on a website and it takes three or five minutes to display something, the user is really, really angry. So what all browsers do today, or what you have to do today if you want to be
state-of-the-art, is to try multiple connections in parallel. You try IPv4 and IPv6 with a small head start for IPv6, wait to see which socket connects first, and use that one. So no simple loop anymore, but a fancy loop around select. Who in this audience likes select or epoll or poll as a programming concept? Oh, we have quite a few. So it really gets ugly to code this, and with things like QUIC emerging, we will have to do the same for the protocol. So this is ugly.

Then we need to set up a secure transport, for example SSL, which is not part of the socket API; this is a separate library you're using, and you take care of that separately. And finally, you have to pass the transport protocol chosen, whether you use TCP or QUIC or whatever, back to the application, because the socket may have slightly different semantics depending on what you chose.

I thought about presenting a code example of that and was looking for something. There's no textbook example, and there's nothing on Stack Overflow. By the way, the nicest and shortest example I found was in libcurl, and it was about 1000 to 1200 lines of code.

Okay, so how the heck do we solve that? The usual reaction of people when asked that question is: oh, no way, there's no way of replacing the BSD socket API, just don't think about it, it won't work; several researchers tried that and failed. And I only have one reason why I'm hoping that this approach might succeed, and why I'm standing here. Last year, a bunch of people who got really annoyed about the BSD socket API, because it broke their research code or was really ugly in its API design, met at the TAPS working group at the IETF and said: oh, we have to do something about this. The group who initially met was first a bunch of academics, and then Apple joined and was saying: oh yes, we built our new network API, we want to standardize it here, together with you, who have all worked on different aspects of this connection setup and of this automatic probing of different transport
options; let's work together and let's build something new. And they really are now releasing their new socket API, the new Network framework, and that's basically what we're currently standardizing at the IETF.

Okay, I talked about the IETF. What the heck is the IETF, and what the heck is the TAPS working group? First, the IETF is a standards body. That sounds pretty boring: a lot of people talking about standardization and how to interoperate. Not quite, because the IETF is a pretty unusual standards body, and if you visit the meetings you will find out it's a pretty nerdy place.

I think these two quotes characterize the IETF pretty well. The first is from David Clark: "We reject kings, presidents, and voting. We believe in rough consensus and running code." That is quite unusual for a standards body, because usually you either standardize something that someone already has as a product, or you standardize something everyone agrees on and nobody knows whether it will ever be implementable. The second one is: "Be conservative in what you send and be liberal in what you accept." So if I want to interoperate with others, I should keep to the standard as much as possible, but I should be able to tolerate failures in what the others send.

Also in the way the IETF works, you see it is quite different from other standards bodies. There are no voting shares for different companies; there are individuals in a room who try to find consensus, and they usually determine whether they have reached consensus by humming. So instead of saying "raise your hands to vote", the question is "who is in favor of the following proposal? Please hum now", and you just get a feeling for how many people are humming in the room. If you're really in favor of it, you can hum loudly, with your full voice. If you say "okay, I can live with that, but it's not so convincing", you can hum quietly. And as it's a very deep sound, you can't really tell who is
voting for what. So it's in person, it's in the room, it's easily verifiable, and it's more or less anonymous. That's awesome.

So how is the IETF organized? First, it's divided into several areas. There is the Applications and Real-Time area, or ART area, which is mostly concerned with application protocols and with protocols that do real-time transport. Then there's the General area, which is mostly concerned with things that touch everything else and don't fit closely into one area. There's the Internet area, which is really about the IP protocol, version 6, and perhaps about what fixes legacy IP needs. There's the Operations and Management area, which is mostly about the practice of how to manage networks; the Routing area, which is mostly about BGP and other routing protocols; the Security area, where things like TLS or IPsec get standardized; and the Transport area. This is the area we're talking about today, because we are talking about transports like TCP, QUIC, and SCTP, and this is also the area where TAPS works.

So what's the TAPS working group? The TAPS working group is, by its own charter, concerned with transport services. It started off as a group of people mostly thinking about ways to actually deploy SCTP and get SCTP deployment, but now, with a few people joining and really talking about methods for choosing sockets and methods for using multiple access networks, it has got a lot more traction.

The name is a little bit funny. Who knows what "Taps" means? For a typical American, Taps is the melody that is usually played at a soldier's funeral, which is why many Americans laughed about the TAPS working group. But today, as we try to retire the BSD socket API, it might get a different meaning.

Now, the idea is to enable application developers to use protocols other than TCP and UDP without caring too much about it. You don't want to rewrite your whole code if you want to use QUIC instead of TCP. Actually, if it works,
you would just like it to work the same way it did before. We want to enable transport evolution and describe an API for how to use transports.

Wait, an API? Isn't the idea of the IETF to just standardize protocols? Why APIs? That's the reason why we really talk about an abstract API, and there's a really tight separation of concerns in what the IETF specifies. The IETF specifies abstract protocols, so basically which primitives there are and what the basic interactions are, and APIs for concrete languages are usually made by their own standards bodies. For example, if you look at the Unix C API, that is done by POSIX; if you look at the Java API, there's the Java Community Process for that. Therefore, what TAPS is producing as a specification is just input for other standards bodies on how this could be implemented in their language. It tries to have fairly broad, abstract concepts that can usually be mapped into different languages. So the APIs will look a little bit different and can integrate neatly with the language, while you still have roughly the same interaction patterns. If you're going from, for example, Objective-C to Python, you will see a similar API, but one that feels like Objective-C on your iOS device and feels like Python when you're in Python. That's the idea.

So how does this standardization work in the IETF? We start with writing an Internet-Draft. For the academics among you, an Internet-Draft is something like a tech report or an arXiv paper. Everyone can write one, everyone can submit one. It has to go through the IETF processing toolchain and produce this beautiful ASCII text, but otherwise there are no real limitations on who can write an Internet-Draft and what you can do with it. Then you have to decide how you want to publish it, and if it gets published, it gets an RFC number. So how do we do this? You can say, I want to do this within a working group; then you
carry your Internet-Draft to the working group and say: oh, I have written this draft, do we want to work on it together and adopt it as a working group item? Either the working group says, go away, we're not interested, or the working group says, oh yes, it's fine, and the following five people also want to join you in working on that. This is what we did for TAPS. There's also the individual submission way, where individuals can submit documents, but that's rarely used for real work, because mostly you're not writing RFCs alone.

Then, if you think it's ready, you ask the related areas that your work touches: please do a review, look at whether it works and whether you have insights into whether it's a good or a bad idea. And then you iterate until the other people are happy. So this is a loop (the blue part here): you iterate another round if someone is unhappy and if it doesn't find consensus. Usually a document goes through about five to 25 rounds of rewriting before it might become an RFC.

After this effort, after the reviewers from the other areas said it's fine and the working group said it's fine, it's sent to the Internet Engineering Steering Group, the IESG, for review. That is again a group of people who are IETF veterans and are selected by a very interesting process, who say: okay, we have an overview of most things going on in the IETF, and we think it's a good idea and works. Or they say: oh no, you have overlooked something, please fix it in the following places. After the IESG review comes in, you send it to the RFC Editor for publishing. They will tweak the wording a little bit, have a few additional rounds with you on how to fix details, and finally, if all references are fixed and everything is fine, it gets published as an RFC.

So if you're asking where TAPS is at the moment: TAPS is here. We have it accepted as a working group item, we are actively working on it, and we'll soon get the first reviews from other areas. So for IETF work
it's quite early in the pipeline.

So what's the idea, how does it work, and what are the design principles? First, it's an event-driven API. Modern networking applications work asynchronously: you always do select, you always have to look after several sockets and several connections, so you don't want to block anywhere. TAPS has nothing that blocks. Whatever you do, whether you initiate a connection, listen for incoming connections, read, or write, if something happens for you, you get an event. Whether the event is implemented as a callback, as a listener-pattern object, or as some kind of work queue is totally up to the specific language implementation; whatever fits into the language should be used to implement these kinds of events. In addition, if you're looking at BSD sockets, it's really complicated to get information that arrives via ICMP, like ICMP blocked or ICMP rejects. It's really hard to get this information, and TAPS is going to make this information easily available as events to the application too.

The second basic idea, and that's why it's in the TAPS working group: it should enable protocol evolution. So we're focusing on what the application needs. The application doesn't say "I want a TCP socket"; the application says "I want a reliable, in-order stream socket", and whether this is implemented as TCP or as QUIC or as SCTP, you don't care. You get the service you're asking for, you can narrowly define with properties what you really need, and the system makes a smart choice for you.

Second, you want to be really flexible at connection establishment time. You want to use Happy Eyeballs for everything: for transport protocol selection, for IP protocol selection, maybe also for endpoint selection. And if you have other preferences, for example you want to use the cheapest possible link or something like that, you just encode this into the timeouts or the head start you give to the different kinds of connections. So if you
give a 30 millisecond head start to a certain link, these connections will establish first and will be used. But if the link is broken, or if something else is broken, it just uses the less preferred one that doesn't have the head start. Still, you don't have to wait for some timeout; you just get a connection, and it feels fast.

And the third idea: you want to be flexible after connection establishment, and you want to use the features that modern transport protocols bring you. Like connection migration: if you change your IP address, you don't want to reconnect, you just carry on with your connection. You want to be able to use multipath if you have multiple links, and you want to be able to use multistreaming to save connection setup time. Your application shouldn't care whether it needs to open a second TCP connection if it wants parallelism, or whether it opens an additional QUIC stream if QUIC is available. All of this should be glossed over in TAPS.

As the third main concept, we want to be able to do data transfer using messages. All interactions in modern networking applications are message-based. Yes, you have a TCP socket, and that's a stream, but you usually chop the stream into messages and then work on these messages. Therefore we want to support framing and deframing for message-based protocols on stream transports, because we are not going to change the transports themselves; we're just providing a nicer interface to the existing ones. That's also the reason why a TAPS system can interoperate with any other system without any problem: we're not changing the protocols, we're just changing the way you access the protocols.

It allows you to control a lot of things on individual messages that are available in the protocols but not exposed by the socket API today. Like deadlines: for example, you might not want to transmit a message when the receiver doesn't have any use for it anymore, so you can associate a
deadline with a message, and it might be dropped from the send buffer after this deadline is over. You might want to send certain messages unreliably, because you don't care, or don't care too much, whether they arrive, and you can select all of this on a per-message basis in a nice and suitable way. It also allows you to assign messages to underlying transport connections for multistreaming and for pooled connections. Today, if you have a web browser that opens a connection to some CDN node or to some other website, it opens a bunch of connections and distributes the requests among them. We can do this in the API, and you don't have to implement it yourself; this should be done by a TAPS system.

How do we do this? With two concepts that are central. The first one is framers. Framers are pieces of code that allow you to chop a stream transport into individual messages. The nice idea about this is that you can just write that piece of code, and it then integrates with the buffer management and back-pressure management, and you only get a message in your main application code once a complete message has arrived, or a chunk of the message if the messages are too large. You can really implement this in a nice and sensible way. I think we should also be able to offload this, for example to hardware: if you are able to offload a lot of things into your hardware, it might also look for the messages and only send the application a software interrupt when a whole message has arrived for it. And finally, this concept is mainly for chopping up stream protocols, but we might tweak it a little bit and might be able to implement simple protocols within this framing layer too.

And finally, to control all of this we need some mechanism, and the means for configuring are our transport properties. We have selection properties that influence path and protocol selection, we have connection properties that influence
per-connection behavior, and we have message properties that influence per-message behavior. They are just used like a dictionary, and we have well-defined namespaces. We have the default namespace for everything we are currently writing in this RFC, and we have different namespaces, for example, for transport-protocol-specific things. You just write "tcp." and the property, and you know this transport property is only used if the connection is done using TCP; if it's done using QUIC, it is just ignored.

So how does this interaction work? If we put it in textual form, we say: I want a connection to example.org using HTTPS, I need a reliable transport, and please optimize for low latency. And the TAPS system says: oh yes, nice, here's a connection object; you can now send and receive your messages on this, and you don't have to care about anything else. So it's not 1000 or 1200 lines of code, but about 20 lines of code to establish that connection.

There is this nice ASCII-art diagram from the current architecture draft that basically shows the idea. You start off with the pre-connection. That's an object where you do all the configuration for how you want to connect. You specify a local endpoint and a remote endpoint, for example a hostname; you specify selection properties; and you specify defaults for the connection and message properties. So you can, for example, say: even if I get a reliable protocol, I want all my messages sent unreliably if possible. You can already specify this at this stage.

Then, on the pre-connection, you can either call initiate and get a connection object once the connection is established, or you can call listen on it and get a listener object that is listening for your connections. Once this works, you get an event: in the listener case you get a connection received event, or in the initiate case you get an initiate completed event, and you have a connection object. On
this you can simply call send with a message, you receive messages as events out of it, and after close it goes to a closed connection. Fairly simple.

So how does this look in abstract code? We have a remote specifier: I want a remote specifier with the hostname example.com and the service https. I want transport properties "reliable in-order stream"; this is a shorthand, and you can also specify a bunch of properties instead, that you want it reliable, that you want it in order, and so on. And you say: I want the capacity profile "low latency" added to it, so that it optimizes for latency. Then you configure your security parameters for the encryption and create a pre-connection object with all these parameters. On the pre-connection you can say: I want an HTTP framer, so that I just get HTTP requests and responses as messages and don't have to care about anything else. Then you call preconnection.initiate and get your connection object back. The connection object then sends you an event saying: oh, I'm ready, the connection is set up. And then you can send your messages. You generate a new message context and configure it: you want a lifetime of 200 milliseconds, and TCP no-delay if TCP is used; if not, you don't get it. And then you call send on the connection.

Theresa, who might be in the room here, a few of our students, and I have written this Python asyncio TAPS implementation as one example implementation, which is still on top of the BSD socket API, just to see how this fits into Python. We have this example application that does exactly the same as the pseudocode two slides before.

We're not the only implementation at the moment. Much more complete than our Python implementation is Apple's Network framework, which has been available since iOS 12 or 13, I guess. It was a beta in iOS 12 and will now come in iOS 13 as the default network communication framework, and it's based on the ideas of TAPS. And there are also some other projects,
which are NEAT and Socket Intents, which were forerunners that gave input into the whole process; you can get some ideas of what a TAPS system could do from these implementations. But basically, if you want to use TAPS today and you're programming for iOS, you can just use the Network framework, and you get most of what I was talking about today just by using it.

If you're interested in more, the documents are in the IETF Datatracker, so you can look at them. If you're interested in collaborating, or if you have comments, subscribe to the TAPS mailing list from the IETF Datatracker and start discussing. We love input, we love ideas, and we love it if you see that there are problems, because we really want to have the next-generation socket API there, and therefore we need input on whether use cases are thought of and whether this works nicely. And if you want the latest version and the latest discussion, there's also a GitHub repository, where we have the most detailed discussions as GitHub issues. So if you have a need, or if you found a spelling error in the documents, just open a GitHub issue and we'll take care of it. With this I'm finished with my talk, and I'm happy to take questions.

Hello? Hi. Okay, sorry, we just had a technical problem. So thank you so much; this was Phil, and we have a bit of time left for questions, so please don't be shy, come to the microphones and ask away.

Thank you. You mentioned that the new API is supposed to include deadlines. Are these real, absolute deadlines, or just timeouts?

This is a good question. We don't think that anyone will really implement a real-time networking API, because that's really, really hard. If someone is going into the area of deterministic networking, where you can have real deadlines, this might be exposed as a transport property as well. But the timeouts I was showing here are mostly advisory timeouts, like: dear stack, we really don't need to retransmit this packet if
it has lurked for about 300 milliseconds in the output buffer. So it could be real deadlines at some point in time, but at the moment, for the actual implementations today, we're thinking about per-message timeouts.

Thank you so much. Next question, please.

So in your Python example, you awaited the send and then registered some sort of callback on receive after you awaited the send. That looked a bit weird to me. Why do you need to await the send if you use callbacks anyway?

You don't need to await at this point; it was just in the example, but you don't need to await it at this moment.

And maybe a related question. You said you wanted to be protocol independent, and obviously there are many communication patterns that are interesting beyond just bidirectional byte streams. So do you only plan to support bidirectional byte streams, or also something message-oriented, broadcasting, what?

As I already said in the talk, we are really looking at messages.

If it's possible? Oh, okay.

So the default interaction should be messages, and we have the framers to chop a byte stream into messages. That should also allow you to migrate from a stream-oriented transport like TCP to a real message-oriented transport once this is available in your current deployment scenario. If you're looking beyond point-to-point: we have some people who joined the TAPS working group and joined our work who are really looking at source-specific multicast, and we definitely want full multicast support within this API. But that's still at an early stage; it's not in most of the implementations yet, but it's coming. So it's definitely on our agenda to include multicast in a usable and nice way, and in a portable way.

Thank you so much. And I think we have one question left. Yes?

Thanks for your talk. Do you already foresee how much of the implementations will be user mode, or which parts of an implementation will be user mode and which parts will be kernel mode, especially
regarding the async parts, which are usually done differently on different operating systems?

That's a very good question. If you're looking at the implementations that are currently out there: our Python implementation is mostly a wrapper, and the NEAT implementation is also mostly a wrapper, around the socket API. The Apple implementation is actually doing most of the work in user mode and is just using kernel mode for demultiplexing, and I think this is the way it should go in the next years. On the other hand, I think seeing that both are possible should also allow an implementation to change that at some point in time. So hopefully it's possible to start with a wrapper on the BSD socket API and go to some user-space networking implementation at a later point in time without changing the API towards the application. That's what we hope to achieve.

Thank you so much. So, thank you so much to Phil, and please give him a great round of applause.
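
As an aside on the framers described in the talk, here is a minimal Python sketch of what such a piece of code might do: it buffers bytes from a stream transport and only hands complete messages to the application. It is not taken from the TAPS drafts or from any of the implementations mentioned; the class name LengthPrefixFramer, its feed and frame methods, and the 4-byte big-endian length header are all assumptions made purely for illustration.

```python
import struct
from typing import List


class LengthPrefixFramer:
    """Hypothetical framer: each message is a 4-byte length plus payload."""

    # Assumed wire format: unsigned 32-bit big-endian length header.
    HEADER = struct.Struct("!I")

    def __init__(self) -> None:
        # Bytes received from the stream but not yet delivered as messages.
        self._buffer = bytearray()

    @classmethod
    def frame(cls, payload: bytes) -> bytes:
        """Sending side: prepend the length header to one message."""
        return cls.HEADER.pack(len(payload)) + payload

    def feed(self, chunk: bytes) -> List[bytes]:
        """Receiving side: add stream bytes, return all complete messages."""
        self._buffer.extend(chunk)
        messages = []
        while len(self._buffer) >= self.HEADER.size:
            (length,) = self.HEADER.unpack_from(self._buffer)
            end = self.HEADER.size + length
            if len(self._buffer) < end:
                break  # Payload incomplete: wait for more stream data.
            messages.append(bytes(self._buffer[self.HEADER.size:end]))
            del self._buffer[:end]
        return messages
```

In a TAPS-like system, something in the role of feed would presumably be driven by the transport's receive path, so that the application only sees an event once a whole message is available, however the stream happened to be split into segments.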