So for those of you who weren't here for the last talk: I'm Drew Gallatin. I've been a FreeBSD committer since the 90s, and I really, really like making things go fast. The first thing I worked on in FreeBSD was the FreeBSD port to the DEC Alpha, with Doug Rabson. I kicked around doing stuff in the network stack, and now I'm really lucky, because I work for Netflix and I get to play with really fast machines that serve real traffic to real people on the real internet.

So I'm here to talk to you about what I'm calling NUMA siloing in the FreeBSD network stack — or really, what this is, is how to serve 200 gigabits per second of TLS to Netflix customers from a single machine. Using FreeBSD, of course.

Why do we want to serve this much traffic? Basically, since 2016 we've been serving at roughly 100 gigabits per second, with kernel TLS, from a single one of what we call our flash appliances. We want to continue to drive our costs down, to consolidate things and increase density, so we want to try to do 200 gigabits per second from a single box.

In order to explain why this is a challenge, I first need to talk a little bit about our workload. We run FreeBSD-current, and we're basically a web server: we use the nginx web server, and we serve all of our video via sendfile and kernel TLS. And if you were here for the last talk, you know we enable kernel TLS with the TCP_TXTLS_ENABLE socket option — try to say that five times fast.

So why do we need NUMA for 200 gigabits per second?
And what is NUMA, in fact? I'll explain NUMA in a little bit, but first let me talk about where we are at 100 gigabits and where we need to be for 200.

For 100, we started off with a Broadwell Xeon for our original 100G box, in 2016 or so. That has about 60 gigabytes per second — about 480 gigabits per second — of memory bandwidth, and about 40 lanes of PCI Express. We've since moved on to newer Intel generations, Skylake and Cascade Lake, which have 90 gigabytes per second of memory bandwidth — which, if you noticed, isn't quite 800 gigabits — and a little bit more PCIe Gen 3, but not enough.

This diagram will seem a little familiar if you were here for the last talk, and if I can figure out this laser pointer I'll try to annotate it. Basically, the workflow for kernel TLS, as I mentioned before, is: sendfile pulls data in from the disks into memory; then to encrypt it, you've got to read it into the CPU; once you've encrypted it, you've got to write it back to memory; and once it's been written to memory, the network interface card needs to read it to send it. If you add all these 25-gigabyte-per-second transfers up, it's pretty easy math: you get to 100 gigabytes per second of memory bandwidth that you need, and from the last slide you could see that the Xeon only had 90 gigabytes per second.

So how do we get that much memory bandwidth?
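The arithmetic above can be made explicit. Each byte served makes four passes over the memory bus: the NVMe DMA write into RAM, the CPU read for encryption, the CPU write of the ciphertext, and the NIC DMA read for transmit. A rough model of that budget (the four-pass pipeline and the 200 Gb/s target are from the talk; the helper name is mine):

```python
# Rough memory-bandwidth budget for the sendfile + kernel-TLS pipeline.
# Each byte crosses the memory bus four times:
#   1. NVMe DMA writes plaintext into RAM
#   2. CPU reads plaintext to encrypt it
#   3. CPU writes ciphertext back to RAM
#   4. NIC DMA reads ciphertext to transmit it
PASSES = 4

def memory_bw_needed_GBps(network_gbps: float) -> float:
    """GB/s of memory bandwidth needed to serve network_gbps of TLS."""
    per_pass_GBps = network_gbps / 8   # one pass moves the full stream
    return PASSES * per_pass_GBps

# 200 Gb/s of TLS -> 4 x 25 GB/s = 100 GB/s of memory bandwidth,
# more than the ~90 GB/s a single Skylake/Cascade Lake socket provides.
```

At 100 Gb/s the same pipeline needs about 50 GB/s, which is roughly why the 60 GB/s Broadwell could keep up at the old target.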
Well, the simplest thing to do is just throw another CPU socket at it. Basically, you double everything: you've got twice as much memory bandwidth, twice as many PCIe lanes, and you've got two UPI links connecting the two sockets — I'll go into more detail about that later. And on these prototype machines, we have eight really fast NVMe drives and two 100-gigabit NICs.

We thought, why not give AMD a chance? So let's build a prototype around AMD. When we first started this, we were looking at the AMD Naples series. The interesting thing here is that you can do this in a single socket with AMD. Just like the Intel box, we have eight NVMe drives, but on the AMD we actually have four NICs — I'll get into why a little later in the presentation. We're not running 4×100 gigabits, though; we're running four times 50, basically.

So, you know, once we doubled everything, we thought, yeah, we're going to get a big performance boost. But actually, the performance went down. On our normal workload we were getting about 85 gigabits on AMD and about 130 gigabits on Intel, at 80% CPU, and crazy stuff was happening: we'd get crazy disk latency spikes that would drive our nginx latency way up, which would cause clients to run away in terror.

And I should mention, by the way, in case it wasn't clear before: all the testing I do is with real Netflix clients. Well — not the very beginning testing, but most of the real testing I do is with real Netflix clients.
So if you live in San Jose or if you live in Chicago, you've probably been served a video from one of my machines, and I apologize.

Anyway: with no optimization, NUMA was just a non-starter. We threw more hardware at it, and we got either negative results or not enough positive results to matter. We hadn't considered doing NUMA for a long time, in fact, because of earlier results from 2014–2015 that were very similar to these.

So now we've got to understand the problem. What is NUMA? What does this NUMA stuff mean? Basically, it means non-uniform memory architecture — or memory access, depending on who you talk to. It means that stuff can be closer to one CPU than another. Back in the good old days — like, you know, 15 years ago, before AMD did HyperTransport and before Intel did QPI — the way a multi-socket system looked was kind of like this: you've got the central I/O hub, or northbridge, or whatever you want to call it, sitting in the middle. All the CPUs plug in equally, all the memory plugs in equally, all the disks plug in equally, all the network cards plug in equally. Everybody has equal access to everything. It doesn't matter if you're on this CPU — why can't I figure this out? there we go — if you're on this CPU and you want to talk to this disk: hey, great, go for it. If you want to store something in that memory: yeah, it doesn't really matter.

The problem is that these hubs were slow and expensive and complicated to build, so CPU manufacturers figured out that it was better to basically build a sort of network on the motherboard. You wind up with something that looks kind of like this, where you essentially have two separate systems that are tied together by this thing we call a NUMA bus. And what that really means is that
stuff on the left side is basically its own computer, and stuff on the right side is its own computer. These red circles — we call them locality zones, or NUMA domains, or NUMA nodes. What that really means is that if you're on this CPU and you want to read something from this disk, it's got to come across, ideally into your own memory. If you're on this CPU and you want to access that memory, it's got to go across this NUMA bus. Or if you want to send something on this network card and it's stored in that memory, it's got to go across this NUMA bus. And the problem is, there's only so much bandwidth on this NUMA bus.

Then once you get to AMD, you get something that looks even weirder — and this is why we've got four network cards on the AMD, one for each of these red circles. Basically, with AMD you have NUMA links between the four different NUMA nodes on a package, and you've got four NUMA nodes, which is kind of a disaster. That's why the AMD performance actually went down so much compared to the Intel.

There's a latency penalty to go across these links. From everything I've read and what I've seen — it depends on the manufacturer and revisions and stuff — it's about 15 nanoseconds, give or take. The real problem is when you're sending a lot of bulk data across these links: 15 nanoseconds can turn into 500 nanoseconds, can turn into even milliseconds in some cases. Which is really, really bad if what you're trying to do is read kernel text that's on the other domain, or write to a global variable, or touch a vm_page, and you've got to wait for some bulk data transfer to pass. That's really, really bad, and the CPU utilization goes crazy.

And the bandwidth — speaking of bulk data — is, you know, roughly, from what I've read... and they try to
obscure these things by talking about gigatransfers per fortnight or something, which makes it really, really hard to figure out what you actually get in bandwidth. But from what I've been able to figure out, it's about 20 gigabytes per second per UPI link, and about 40 gigabytes per second per Infinity Fabric link. The AMD number is even more complicated, because it depends on the memory speed, and on the new ones there are multiplying factors — it's kind of crazy.

So, anyway. After playing around with lots of little optimizations — things like making the vm_page array be backed by local memory on each domain — I decided, well, I'm just kind of playing with the small stuff. What I really need to do is figure out a way to organize things and keep the bulk data off the NUMA links. Because the bulk data, like I was saying, will congest the NUMA links and will slow down anything that you haven't managed to localize.

So I'm going to go through, basically, the worst case — if you do everything you possibly can wrong — on a two-node machine.
So basically, what happens here is: this CPU — damn it — this CPU wants to read memory from this disk, encrypt it, and send it out the network. So he starts reading from the disk — whoops — it goes across the NUMA link and into the other node's memory, because he wasn't paying attention when he allocated his memory. Then he wants to encrypt it, so he's going to have to read it back across the NUMA bus, and — whoops — he forgot to allocate on the right node again, so he's going to write it back into the wrong node's memory. And then he wants to send it out on the network, and maybe he should be using this network card up there, but — whoops — he's going to send it out this other network card. So we end up crossing the NUMA bus four times, and we end up burning basically 100 gigabytes per second of bandwidth. At this point the fabric is going to saturate, and you'll have CPU stalls, you'll have latency spikes, you'll have all kinds of crazy stuff.

The best case is basically the case that I showed you at the beginning, where you read from the disk into close memory, the CPU reads it from close memory, encrypts it, writes it into close memory, and then sends it out on the network card that's closest to him. And that's beautiful: there are no NUMA crossings. This is how AMD and Intel would really like you to use these machines, in an ideal world.

So how can we get as close as we can to this best case?
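The two walkthroughs above are easy to put numbers on: each NUMA crossing of the bulk data costs one full copy of the stream on the fabric. A sketch of that accounting (the four-crossing worst case, the zero-crossing best case, and the ~25 GB/s stream rate are from the talk; the function is illustrative):

```python
# Fabric traffic from NUMA crossings: every time the bulk-data stream
# crosses the inter-node link, it costs one full copy of the stream --
# about 25 GB/s when serving 200 Gb/s.
STREAM_GBPS = 200

def fabric_traffic_GBps(crossings: int,
                        stream_gbps: float = STREAM_GBPS) -> float:
    """GB/s of inter-node fabric traffic for a given crossing count."""
    return crossings * stream_gbps / 8

worst = fabric_traffic_GBps(4)   # everything done wrong: 100 GB/s,
                                 # far beyond a ~20-40 GB/s fabric link
best = fabric_traffic_GBps(0)    # fully siloed: nothing on the fabric
```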
Let's just pretend it's two machines: let's have one VM per NUMA node and pass everything through. Except if you do that, you're going to double your IPv4 address usage, and every IPv4 address is precious. At Netflix, when you get a movie or a video or a TV show or whatever — you press play, and your client talks to Netflix stuff running in the Amazon cloud. That stuff in the Amazon cloud figures out which machine has the file you want, and which is closest to you, and which is next closest to you, and so on, and gives you a list of URLs where you can find that file. So if we double the number of machines, then we're kind of doubling the work that we have to do in AWS. In fact, if we're running VMs, we're more than doubling it, because now you've got the hypervisor to manage too. So it's kind of a non-starter for that reason.

The next idea: well, what if we use multiple IP addresses? Oh — wait a second — multiple IP addresses. We don't want to do that.
Same reason as before, basically.

So: how can we get as close to the best case as possible, while using lagg and LACP to combine the NICs so we just use one IP address, and while keeping the catalog the same, so that AWS doesn't have to do any extra work? We need to somehow impose order on this chaos.

The first idea I came up with — which was not the winner — was what I call disk-centric siloing, which is basically: try to do everything you can on the NUMA node where the content actually lives. And the other idea I came up with was network-centric siloing, which is: try to do everything local to the network card that the connection came in on.

If you don't know anything about LACP, basically what you need to know is that when you're speaking LACP, the switch or router that you're talking to will take a connection and hash it, based on some n-tuple, and it will decide which of the lagg ports you're connected to the traffic will go over. You have no control over that. So in network-centric siloing, we try to do as much work as we can on the NUMA node where the LACP partner decided that the connection was going to live.

So let's talk about the thing that didn't work first. The idea was to associate a disk controller — really, an NVMe drive — with a NUMA node, and then to propagate the NUMA affinity up through the VFS layer, until we got to the point where, if we looked at a file — at a vnode — we knew what NUMA node it was associated with. And again, we have to do all the work to associate network connections with NUMA nodes. The idea is that we want to move the network connection to be as close to the content as we can, so that even if it comes in on one lagg port, it'll end up going out on another. After we move everything, there are going to be zero NUMA crossings for bulk data.
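The LACP behavior described a moment ago — the switch hashes each connection's n-tuple and pins it to one member port, with no input from the server — can be modeled in a few lines. This is purely a toy: real switches use vendor-specific hash functions over whatever tuple fields they are configured with.

```python
# Toy model of LACP egress hashing: the switch computes a hash over the
# flow's n-tuple and uses it to pick a member port, so a given flow
# always lands on the same port -- and the server can't influence which.
import zlib

def lacp_member_port(src_ip: str, dst_ip: str, sport: int, dport: int,
                     n_ports: int) -> int:
    ntuple = f"{src_ip}|{dst_ip}|{sport}|{dport}".encode()
    return zlib.crc32(ntuple) % n_ports

# The same connection always hashes to the same port:
p1 = lacp_member_port("198.51.100.9", "192.0.2.10", 49152, 443, 2)
p2 = lacp_member_port("198.51.100.9", "192.0.2.10", 49152, 443, 2)
assert p1 == p2
```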
The problem with this was that, like I said, there's no way to tell the LACP partner "I don't want this to come in on this node, I want it to come in on that node." You can't do that. So basically, while you're setting up the connection, while you're doing the GET — before you know what content you're talking about — your ACKs and your replies are going to be going out one port. Then, as soon as you figure out where the content lives, it's going to be going out the other port. So you've got stuff going out both ports, and with TCP that can lead to reordering, and that's kind of bad news. I think Randall would be upset if I did that.

The other problem is that, unbeknownst to me, clients will actually reuse the connection and make multiple requests in the same connection. For those of you who love or hate the newish feature where, if you're on the Netflix homepage, this crap starts playing all the time — it'll reuse connections for all that junk that's playing all the time. So you'll end up having stuff coming from all the NUMA nodes on the same connection. I was seeing connections being moved around willy-nilly, and TCP retransmits going crazy, and I decided it was a bad idea.

So I went back to the other idea, which was network-centric siloing, which is basically just dumb plumbing — and that's good, because I'm just a plumber. Essentially, you have to associate the network connections with the NUMA nodes; you allocate local memory to back the media files; you allocate local memory for crypto; you run the TCP pacers on the local node; and you manage to choose a local NIC to send the data on.

So how do we do all this?
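One pattern recurs through all of the plumbing that follows — the KTLS worker pool, the lagg port choice, and the listen-socket choice: filter the candidates down to the connection's NUMA domain, hash within that subset, and fall back to the global set if the domain has no usable candidate. A sketch of that hierarchy (illustrative only; the real versions live in the kernel):

```python
# The recurring "silo" selection hierarchy: filter candidates by NUMA
# domain first, hash within the survivors, and fall back to hashing
# across everything if the domain has no candidates (e.g. link down).
def pick_local(flow_hash: int, conn_domain: int,
               candidate_domains: list[int]) -> int:
    """Return the index of the chosen candidate (worker/port/socket)."""
    local = [i for i, dom in enumerate(candidate_domains)
             if dom == conn_domain]
    if not local:                    # fallback: nothing on this domain
        local = list(range(len(candidate_domains)))
    return local[flow_hash % len(local)]

# Eight KTLS workers: CPUs 0-3 on domain 0, CPUs 4-7 on domain 1.
domains = [0, 0, 0, 0, 1, 1, 1, 1]
chosen = pick_local(flow_hash=0xBEEF, conn_domain=1,
                    candidate_domains=domains)
assert domains[chosen] == 1   # crypto stays on the connection's domain
```

The same shape shows up three times below: for TLS workers, for lagg egress ports, and for SO_REUSEPORT_LB listen sockets.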
To associate the network connections with the NUMA nodes — I'm going to go through some kind of nitty-gritty details of what's been committed and what's in review and all that kind of stuff, so if you're not a developer, you may want to check your phone.

Basically, I added a NUMA domain field to struct mbuf. There was just a tiny little bit of room, and I stole it; that was added a few months ago. I also added a NUMA domain to the ifnet struct, also a few months ago. This is kind of all groundwork, so try to stay awake. Once I did this, when a driver receives a packet, it can tag that packet, as it receives it, with the driver's NUMA domain — that's in the tree too. And I also added a NUMA domain to the inpcb struct, which is also in the tree. Basically, the idea is that when the TCP connection is born — in the syncache, at syncache expansion — you've got a NUMA domain there in the mbuf that caused the connection to get established, and you can then propagate it into the inpcb. The next trick is to make sure that you give that connection to the right nginx worker, and I'll detail that in a little bit.

The other trick is what I thought was going to be a hard job, which is to allocate local memory for sendfile to back the video files. I actually came up with this gigantic patch to plumb a NUMA node all the way from sendfile down into the VM page allocation routines. And it turns out that I don't need any of that stuff: if you have a first-touch policy and nginx is bound to the right domain, then everything just works automatically. I want to thank Alan Cox and Konstantin for pointing out my stupidity and making me realize that the VM system already did everything I needed it to do. So those were weeks of my life
I'll never get back.

The other trick is to allocate the local memory for the TLS buffers. So basically: we run the KTLS worker threads I mentioned in the last presentation — we basically have a thread pool of per-CPU TLS workers — and the idea is that, normally, connections are just hashed to them, in software, on the n-tuple, so that the same connection always goes through the same TLS worker. What I did was add a filter based on NUMA domain in front of that, so that connections associated with node zero will be hashed to a worker that's running on a CPU on node zero, and similarly with node one. And I also set the KTLS workers to have a domain allocation policy, so that they'll allocate stuff local to their domain. That way, we're doing the crypto on the same domain the connection lives on, and we're doing the crypto into and out of local memory. The KTLS stuff is in review currently.

So how do we choose the right
lagg port to go out of? Like I said earlier, mbufs can be tagged with the NUMA domain, so when we go through ip_output or ip6_output, we tag the outgoing mbufs. And I've done a patch to lagg — which is in the tree, enabled if you have the use_numa option set on the lagg — where, similar to KTLS, you've got this hierarchy rather than just hashing directly to any lagg port in the system: first you filter by NUMA domain, and then you only choose a lagg port that's connected to a NIC in that domain. And obviously, if there's no NIC in that domain, it'll fall back to just hashing to anything, so that you can still send even if that lagg port's down. And that's in the tree.

So how do you choose the right nginx worker? This was the hard part for me. Right now we've got this SO_REUSEPORT_LB stuff, which came in, I don't know, about a year ago or so. Essentially, what that means is that you can have multiple threads, multiple processes, share the same listen socket, and — again, it's kind of like lagg — new connections are hashed fairly to these listen sockets. That allows you to have, you know, a bunch of nginx workers all listening on port 80 and port 443. So the obvious thing to do — and everything's obvious in hindsight — is to filter that by NUMA domain. You end up with a new socket option, unfortunately, because of the way nginx works. And I can go into detail — yeah, sure, why not?
So the way nginx works is: the master process starts up, creates all the listen sockets, and then forks off its children. And, at least for a mere mortal reading the nginx source code, there's no way to tell which listen socket is going to go to which child, on which domain. So the easiest thing for me to do was to make a new socket option, which is called after the child has inherited its listen socket and sort of taken possession of it, and after it has bound itself to a CPU. Then I can call the socket option, and the kernel says: ah, you're running on this CPU, which is on this domain, and you want your listen socket filtered there. So that builds up another one of these hierarchical models, where first you filter per NUMA domain into a listen socket, and then you hash among all the different workers that are listening on that domain. And, like lagg, there's a fallback: if there's nobody on that domain, it'll go back to hashing among all the listen sockets on that port, globally. That's also in review.

So let's go back to that same diagram where I talked about the worst case. In this model, the worst case is basically if you always get unlucky and your content is always on the wrong domain. Going back to what we talked about before: we're running on the bottom NUMA domain, on the bottom CPU, and a request comes in, and we're reading data from this disk on the top. So we go for one NUMA bus crossing, and read it into local memory. Then we read it out of local memory, and — yay — we're encrypting it on the right CPU, and now we're writing it back to a crypto buffer that we were smart enough to allocate on the right node. And now we're going to send it on the local NIC, because the connection came in on this bottom domain originally. So now, in the worst case, we've got one NUMA
crossing. So basically, you're doing a hundred percent of the disk reads — the NVMe reads — across NUMA, which is about 25 gigabytes per second on the fabric. That's much less than the 40 gigabytes per second of fabric bandwidth.

But the nice thing is the average case, which is better. The average case is about half a NUMA crossing, because you're going to get it right about half the time and get unlucky about half the time. So it's about 50 percent across the fabric, and that's about 12.5 gigabytes per second of data on the fabric. The nice thing is, in this case the CPU doesn't saturate, and we get 190 gigabits.

For four nodes, the average case is a little bit worse, because you've only got a 25% chance of getting lucky. So 75% is across NUMA, and you get a little bit higher bandwidth going across the NUMA bus, but that's still less than the 40 gigabytes per second, and we can still get better than 190 gigabits.

So, here's what everybody's here to see. One thing I should mention before I go into the performance results:
This has been sort of a game of moving goalposts. When I first started looking into this, we were looking at Naples, the first version of the AMD part, and we were looking at Skylake on the Intel side. Since then, both of these motherboards have had their CPUs swapped for the latest and greatest from the two manufacturers. So those first initial results were on FreeBSD from, like, fall of 2018-ish, with the older CPUs; these new results are from just last week, with an AMD Rome CPU and an Intel Cascade Lake CPU.

And this is why the Xeon baseline performance is lower — that's something that I don't quite understand. The way I got these numbers was to basically go through and intentionally torpedo all the optimizations I've done, and when I did that, I was surprised a little bit by the fact that it's 105 rather than 130. I think some of that is some of the work that Mark and Jeff have done to make things better for NUMA — where, I guess, if you make things better, you kind of make the un-optimized case worse, if that makes any sense.

There's some stuff in UMA that we have turned on at Netflix which will basically try to — if you do a UMA allocation of, like, an mbuf or something on one domain, and you do the free on the other domain — it will try to return the memory to the proper domain, rather than mixing up the UMA zone, so that you can still have nice per-domain zones. But the problem is that once you have freed a lot of stuff on the wrong domain, that option gets really expensive, because you're taking a lock and you're moving things back to the proper domain. When you're doing things right, and you're not doing a lot of cross-domain frees, it's great; but when you're doing a lot of cross-domain frees, it's expensive.

So basically, I've actually measured, with the Intel PCM
tools, the UPI utilization. They give you this metric that tells you how much of the memory controller accesses were remote versus local, and it goes from 40 percent to 13 percent. And on Epyc, because of the four nodes, things start out even worse — so you go from even worse to even better, I guess you'd say: you go from 68 gigabits to 194 gigabits.

For people who like visual representations: this is the Xeon before and after — roughly 100 to roughly 200 — and this is the utilization on the QPI bus, again going from about 40 percent to about 13 percent. And here's the bandwidth on the AMD, going from, you know, 60-ish gigabits to 195 gigabits.

And for people who like green screens with raw data: this is the output from pcm.x showing the memory controller traffic, as I was mentioning — this is the UPI traffic over memory controller traffic, which is 0.4, and that's bad.

And this — for people who aren't familiar with it, I wrote it,
so it's my favorite tool. It's something I call nstat. I got sick of having a window for vmstat and a window for netstat — and either running netstat with a delay of eight seconds, or doing the conversion in my head from bytes to bits — so I wrote a tool that spits out all the stuff I care about. It's my tool, I can do what I want. It's in ports, though, so anybody can use it. Basically, this is the output: gigabits per second; the important fields here are the number of TCP connections, the percent CPU, and things like system calls, how many interrupts and context switches, how much memory is free in the machine, and input and output in millions of packets per second.

So this is, of course, the before. And this is the after: you can see the 13% remote — and that's a good number — and you can see 190-ish, 191 gigabits, with 150,000 TCP connections, in the 70-ish percent CPU range, with, you know, 100,000 context switches a second — thank you, TCP pacing.

And for people who like looking at internal Netflix metrics:
These are our internal bandwidth graphs, showing each link separately, stacking to about 190 when the machine has finished ramping up.

And here's the same stuff from the AMD. I've crossed out the model, because it's not a released model; it's roughly equivalent to the model number I said at the beginning of the presentation, except it has a lower clock speed. So the actual AMD results would be better than this, because the real AMD CPU that's like this one would be higher clocked. This may be doing AMD a slight disservice by showing it, but I'd imagine the CPU number would probably be maybe eight or ten percent lower on the real AMD part.

And again — the other big frustration with AMD is that they don't export enough counters for us to be able to measure the fabric utilization. We've complained about it to them, and I've heard the Linux folks are also complaining, because Linux doesn't have it either. So if you happen to have a good relationship with AMD, complain about it too, please.

Anyway, here is the green-screen data showing, you know, 194 gigabits per second when it's getting close to ramped up. And this graph is not nearly as pretty, because we're not used to the way these NICs are numbered. They're two-port NICs, so they're numbered, you know, zero, two, four, and six; no other machine has that many NICs, so it doesn't fit, and nobody's ever picked a color for it. But this bar is the roughly 200-gigabit line. And it goes up to 400 because there are four 100-gigabit links active in the lagg, but it's not really going to go up to 400, because some of them are Gen 3 x8 links.

So that's it.
I've rambled on for a long time about something really simple. So if anybody has any questions, this would be the time.

Q: You mentioned that you're using Amazon machines. Because you have a lot of data, you have a cost with those. What would it cost to install some machines for management in different parts of the world — why are you using the Amazon services?

A: That's above my pay grade.

Q: In terms of, like — are you worried about, you know, a million connections on one domain and no connections on the other, or just exhausting resources?

A: We deal in orders of thousands, or tens of thousands, or hundreds of thousands of connections, and at that level it's roughly going to be fair, because lagg is going to be, you know, hashing to the different NICs in a fair way. Obviously, if one link goes down, then you're going to lose half your bandwidth, but you've still got enough capacity in the NUMA node where the link is up that you're going to be fine. Does that kind of answer the question?

Q: I think it would be a different story if you were CPU-constrained, because you were doing a lot of work that wasn't — basically, if a connection was doing more work than you anticipated, I guess would be the way to say it.

A: Right — if one connection, or a small number of connections, could somehow cause an inordinate amount of CPU use. But that's not something that can really happen here.

Q: You've got four NICs, so you have a theoretical bandwidth of 400?

A: Well, 300 actually, because it's an older motherboard — it's only PCIe Gen 3, so they're not hooked up with full bandwidth.

Q: I was going to ask: is there a theoretical reason why you got close to, but not over, the 200 limit?

A: Actually, in what I was testing earlier, if I let that guy ramp up, I think I could get it.
Well — I think, I know, I got over 200. The problem is that when you do that, if lagg is hashing everything fairly — which it is — then you are screwing over the people that come in on the links that are limited to 50 gigabits, because they're going to be bandwidth-constrained, and TCP is going to be, you know, seeing congestion, because the NIC is going to be dropping packets on the way out.

Q: [inaudible]

A: 100 percent, all of them. For capacity planning purposes — and for my performance work — we do everything with 100 percent TLS.

Q: [inaudible]

A: 60-ish percent on AMD and 70-ish on Intel. That's come down over the years. The CPU use now for 100 percent TLS — thanks to a lot of the work that's been done in the VM system by Jeff and Mark and Konstantin — is down in the upper 50s. And the Broadwell machines that I was talking about earlier — they're so close to the memory bandwidth limit that the performance is kind of like a hockey stick: memory bandwidth on this axis, CPU on this axis. The hard limit is like 60 gigabits per second, but as you get much further over 50, you sort of climb up on this hockey stick, and any little thing matters — on those machines, every cache line is sacred. So basically, any cache miss you can avoid —
you move further and further down that hockey stick, and you save an inordinate amount of CPU. There was an early optimization I did where I eliminated looking at the third cache line of an mbuf, which saved like two or three percent CPU on those machines. The same optimization on, you know, a Cascade Lake would probably save almost nothing, because it's got excess bandwidth. Does that answer your question?

Q: ...where a single listen call would return 16 sockets, on the per-CPU PCBs, and then he added, like, a call where the worker could get a socket and query which CPU's PCB this socket was on, and then bind the worker there. Would an approach like that help you for the nginx matching of the worker threads?

A: It might. This was part of his RSS work, I think, but it never made the tree — I think because of UDP fragment-reassembly problems. Maybe we can talk afterward, because I'm not familiar with that piece of it.

Q: Instead of bouncing — [inaudible]

Going once, going twice — all right, I think I'm done. Thank you.