Hi, hello. I'm David Stewart, and this is John Dickinson. I work at Intel, where I manage a group of engineers working on server languages, and today we're primarily talking about the work we're doing with Python. Intel is doubling down on Python, and I think that's really important from the OpenStack community's perspective, because so much of OpenStack is written in Python. So I'm excited to talk about some of the work we've been doing jointly. Thanks. And as David said, my name's John Dickinson. I'm the project technical lead for OpenStack Swift, and I work for a company called SwiftStack. Just as a little bit of background to start with: Swift is storage for the internet, and I recognize several people here who are actually deploying Swift for internet users. I was really excited earlier this week when OVH got up in a keynote and said, yeah, we've got 75 petabytes stored in Swift today. And I get to talk online every day to people who are using Swift to store data. You've got web content, backups, file and document sharing, scientific data sets, genomics research, movies, all kinds of crazy things. So in my opinion, Swift is kind of a big deal, and it's being used all over the world. The one thing I've tried to say over and over again is that my vision for Swift is that everyone will use it every day, even if they don't realize it. And we see that happening: when you help your kids with homework and pull up a picture on Wikipedia, it comes from their Swift cluster. When you watch a movie, that movie may have been made with the help of data stored in Swift. When you watch a TV show, or store backups, or share documents with your bank, Swift is used in all of these places today. So we're getting there. And for a lot of these use cases, there's something that's really important, and I think Maverick said it best: these people need speed. Swift is good for scalable, distributed storage, but everybody always wants it to be faster. So as an introduction to the work we've been doing, this is basically what Swift looks like. It's a two-tier architecture. The client normally talks through a load balancer to the proxy servers. The proxy servers implement most of the API and coordinate all of the requests between the client and the storage nodes. The storage nodes, on the other hand, are responsible for actually persisting the data. The great news is that this is a very flexible architecture: if you need more end-user API performance, you add more proxy servers; if you need more capacity, you add more storage nodes; and you don't have to scale those two things at the same time. This isolation of concerns also means the different tiers within the storage system have different hardware usage patterns. The proxy server is very CPU intensive, and that makes a lot of sense: it's taking network packets in off the wire and pushing them back out onto the wire after doing a little bit of computation on them. It has to do some things like checksums, but mostly it's packet shuffling plus a little computation to figure out which storage nodes it's actually going to talk to.
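To make that lookup concrete, here is a toy sketch of the hash-based partition mapping a Swift proxy performs on every request. This is not Swift's actual ring code: the device names, partition power, and round-robin replica placement are invented for illustration, whereas the real ring is built offline and balanced by device weight and failure domain.

```python
import hashlib
import struct

PART_POWER = 16   # 2**16 partitions (made up for this sketch)
REPLICAS = 3
DEVICES = ['node1/sdb', 'node2/sdb', 'node3/sdb', 'node4/sdb', 'node5/sdb']

# Real Swift builds this table offline with the ring builder so partitions
# are spread by weight and replicas land in distinct failure domains; here
# we just stripe them round-robin to keep the sketch short.
PART2DEV = [[(part + r) % len(DEVICES) for r in range(REPLICAS)]
            for part in range(2 ** PART_POWER)]

def get_nodes(account, container, obj):
    """Map an object path to its partition and the devices holding replicas."""
    key = ('/%s/%s/%s' % (account, container, obj)).encode('utf-8')
    digest = hashlib.md5(key).digest()
    # The top PART_POWER bits of the hash select the partition.
    part = struct.unpack('>I', digest[:4])[0] >> (32 - PART_POWER)
    return part, [DEVICES[d] for d in PART2DEV[part]]

part, nodes = get_nodes('AUTH_test', 'photos', 'cat.jpg')
print(part, nodes)
```

The point is that the proxy's per-request CPU work is hashing, header handling, and checksumming, which is exactly the kind of tight, repetitive Python code a JIT can help with.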
The storage nodes, by contrast, are mostly dominated by IOPS: you've got a whole bunch of spinning drives, and spinning drives are a lot slower than networks and CPUs in general, so that's what you have to optimize on the storage tier. But for this talk, we're going to focus exclusively on the proxy server: what happens when you're CPU constrained? This is not the first time we've looked at PyPy and Swift. We played with it a little a few years ago, and some great progress was made. Conceptually, if not literally, this builds on work that started back then, which we were able to reignite when David and I started working together. OK, thanks, John. As I said, Intel is doubling down on Python; it's something we're really investing in to make it run great. Sometimes that means utilizing features in our processors, but in any case, the work we're doing is totally upstream. Upstream all the things. I talk about putting the cookies on the low shelf so everyone can get at them, and that's the approach we're taking. Python, of course, has a 25-year history, but it's really a language definition as opposed to an implementation, even though CPython is the default. PyPy, a JIT implementation, has been around for 10 years. It was actually kind of funny: we'd been trying various approaches to optimizing Python, and then we ran Swift on PyPy and saw just stunningly great performance. So we thought, what's wrong with this picture? Why isn't everyone doing this? We felt that with a little more engineering love, we could make this available to everybody. That's been our effort and our focus: partnering with SwiftStack, with John, and with others to deliver this as a capability. So let's dig into a few of these things and what we've seen. What are the goals of PyPy? What is the reason for it being out there? PyPy started out as a project that was funded by the EU 10 years ago, and it's generally been a very grassroots sort of thing, much like Python itself. Their total focus has been speed; speed and memory usage are absolutely their priority. There's a set of benchmarks called the Grand Unified Python Benchmarks, which we call "guppy," and it's what the Python community looks at for performance improvements. With PyPy, we're seeing on average a 15x speedup on those microbenchmarks, and some are as much as 300x. So they really have achieved this goal, and they achieve it through some different techniques: for one, PyPy doesn't use the very stack-oriented virtual machine approach that CPython uses.
And so they're achieving a lot more. The most important technique is the JIT, the just-in-time compiler. Instead of having every Python instruction interpreted, PyPy takes the hot functions, generates native code, and then runs that native code. This is a technique that interpreted languages all adopt at some point: Java, JavaScript, you name it, they all go the route of a JIT. Take HotSpot in Java as an example; it delivered amazing performance improvements as a result of using a JIT. There are some challenges, though, when you try to use PyPy, as people often find. One of them is that its approach to garbage collection is totally different from CPython's. In fact, one thing you'll notice in the instructions for setting up PyPy yourself is that we recommend an environment variable setting that avoids some of these garbage collection issues. And if you have any modules that depend on the garbage collection behavior of the default CPython, you might find an issue or two there. This is one of the things we're very focused on in our work with the PyPy community. We actually had a great workshop with the core developers: we brought everybody together for a week-long sprint to discuss the microarchitectural tools we have and the techniques we can use to accelerate PyPy even further, and garbage collection is one of the things we're continuously trying to improve.
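The classic way this difference bites is code that leans on CPython's reference counting to finalize objects promptly. Here's a minimal illustration of the general pattern, not Swift code, of why a server that opens files this way can exhaust file descriptors under PyPy, where finalizers run whenever the incremental collector gets around to them (PyPy's GC can also be tuned through its PYPY_GC_* environment variables):

```python
def read_refcount_style(path):
    # Under CPython the file object's refcount hits zero when this function
    # returns, so the descriptor is closed immediately. Under PyPy the close
    # happens at some later collection, so a busy server doing this in a
    # tight loop can pile up thousands of open descriptors.
    return open(path, 'rb').read()

def read_portable(path):
    # Deterministic cleanup that behaves identically under both interpreters.
    with open(path, 'rb') as f:
        return f.read()

print(len(read_portable(__file__)))
```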
Another challenge a lot of you may have had with PyPy in the past is deployment. If you're an operator with an OpenStack cloud, you're usually not looking for more complexity, and I think that's been a barrier to PyPy adoption. Some of the things we've had to solve here: sometimes there are libraries that need to be recompiled in order to work correctly with PyPy, a few additional steps you have to go through. It's not as easy as just using the default Python, and this is something we really want to keep improving. Actually, my hope, and I've had this conversation with Guido van Rossum, the benevolent dictator for life for Python, is this: the first thing he says is that he doesn't have a dog in the fight; when it comes to performance, he's more interested in language design. But he's said that in the future he sees, at some point, PyPy may be the default way you run Python. That's the kind of community work we're trying to accomplish to resolve these issues. For now, it takes a little bit of effort to get past the deployment issues, and one of the things we've done is put together a set of instructions that should make it easier for you. Some examples, to make this specific, involve the third-party C libraries you might need to recompile. When we first started our testing work, there were a couple of Python bindings to C libraries that we use inside of Swift. One of these provides extended attributes on the underlying file system on the hard drives; Swift uses extended attributes for storing lots of metadata, so we need that language binding. When we first started testing together with David, we had to recompile the Python xattr module to be able to use it with PyPy. The good news is that, with upstream community developments, it's now included in the box; it fulfills Python's batteries-included model, so it just works. That said, the only dependency in upstream Swift right now that hasn't been completely ported over in that batteries-included sense is the liberasurecode module we use for erasure coding. It's still possible to use it, of course; it just takes an extra step from whoever is doing the deployment. Those are the kinds of things we're working on. And honestly, another challenge is the fact that PyPy runs so fast. As we were porting various modules and parts of OpenStack, we found some challenges related to just how fast PyPy was running. There was some Python code that set up a timeout, and suddenly things were timing out under PyPy, because PyPy was running so fast that a bug in the code was exposed. It's a little embarrassing; I hate to say people's code isn't necessarily good, but it had bugs. It happens. An example of this, and I want to be specific that this is an older problem that illuminates the concept but isn't directly related to the PyPy work, is from several years ago, when we first started commonly deploying Swift on clusters with 10G networking everywhere. That wasn't the original configuration; normally it was 10 gig in and one gig networks inside, as you did five or six years ago. When we put Swift on a fully 10 gig network, it turned out we had been assuming that when you put something on the network, the buffer would fill up locally, you would block, and the system would get a chance to handle other work. That wasn't technically wrong; we just never had the opportunity to switch to the other things that were happening. On the 10 gig network, the buffers never filled up, the system never blocked on a network write, and the network effectively became too fast. You'll occasionally see that kind of problem with PyPy too, if your code assumes that something will take a few extra cycles or perform a blocking operation, assumptions that may not hold when you're running native code instead of interpreted code. You wouldn't think running too fast would be a problem, but it can be a real challenge.
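Swift's servers are built on eventlet's cooperative green threads, which is why "never blocking" matters: a green thread only gives up the CPU when it blocks or yields. The snippet below is a generic illustration of the pattern, not Swift's actual proxy code. When a hot loop stops blocking, whether because the network or the interpreter got faster, an explicit yield keeps everything else responsive:

```python
import eventlet

def hog():
    # Pure-CPU loop. With cooperative scheduling, nothing else in the
    # process runs until this green thread yields -- and a JIT can make a
    # loop like this run far longer between blocking calls than the
    # original author assumed.
    total = 0
    for i in range(5000000):
        total += i
        if i % 100000 == 0:
            eventlet.sleep(0)   # explicit yield; other requests get served
    return total

def heartbeat():
    for _ in range(5):
        print('still serving other requests')
        eventlet.sleep(0.01)

pool = eventlet.GreenPool()
pool.spawn(hog)
pool.spawn(heartbeat)
pool.waitall()
```

Comment out the eventlet.sleep(0) and the heartbeat doesn't print until the loop finishes; that's the "too fast" failure mode in miniature.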
But again, we're going to go through as much of OpenStack as we can to eliminate these. Now, the fact is, CPython is slow. I don't want you to hyper-focus on the details here, but we run a ton of different algorithms in different languages, and a lot of external organizations have done the same; you can see these comparisons all over the place. Native compiled code is always faster; an interpreted language is typically much slower. Now, why do people go with interpreted languages? Because the turnaround time to a working program is very, very fast. That's one of the things that's great about Python: there's no compile phase you have to go through, no lightsaber sword fight while you wait for a compile. Python just works. That convenience is true for interpreted languages in general, and it's why they go in the direction of a JIT, which tries to spend more time in native code. In this picture, one algorithm is implemented in a ton of different languages. The red bars are compiled native code; the blue bars are just-in-time implementations. In almost all of these cases, across a lot of different languages, Lua, CPython versus PyPy, and so on, the interpreted version is far slower. Look at Ruby: the gap is huge in that example. So there are a lot of examples where PyPy is simply accomplishing the goal it set out to achieve, which is to get the speed that should be there. OK, enough about us saying what PyPy is and that it's fast; let's actually show some results. As David said, Intel and SwiftStack have been working together closely for several years now, and a few months ago we were connected with one another: these folks are looking at improving interpreted languages, and Intel, obviously, wants to make sure those things are fast on Intel CPUs, and we, on the other hand, are looking at Swift, and customers like faster Swift. Swifter Swifts, I should say. So we started working on this, and I want to go over a little of what our results actually are. This is a setup we have in my lab in Oregon. There's one proxy server based on our latest-generation server processors, code-named Broadwell. I know not everybody's into the naming thing, but we have code names for everything; I think it's the fifth generation, and someone from Intel in the room is going to complain at me if I've got that wrong. It's a two-socket server. Then there's a cluster of 15 storage nodes based on Atom-based servers. So let's take a look at PyPy versus CPython. One note I want to point out: with 15 storage nodes and one proxy node, this cluster is naturally going to be CPU bound, and if you remember from the beginning, that's specifically what PyPy is good at, which is why we have this test setup. This is using ssbench, the standard benchmark the Swift project uses, so there's nothing funny here. What you see first is the result that got my attention: response time. You can talk about performance in a number of different ways, by the way, and the group I'm part of, these guys and gals, are just amazing at understanding performance and how the CPU really affects it.
So in this case, what we're seeing is response time as you scale up users. The green is PyPy and the red is CPython. You can see that as we scale up more and more users, response time really degrades in the CPython case, but the PyPy case remains relatively flat: an 87% response time difference. You can measure performance a number of different ways, and throughput is common for parallel benchmarks, but response time, I think of as what a real user experiences: are they going to see a lot of delay getting their objects, or a very fast response? So this one, I think, is pretty meaningful to users. The other thing we've seen is throughput. That's the traditional, scientific approach to measuring parallel benchmarks: how many things in parallel can I get through this cluster? Again, green is PyPy and red is CPython, and you can see that as you continue to add users, throughput really plateaus on CPython, but PyPy keeps doing extremely well. We have one data point where the improvement is 111%. Now, I've got to tell you, having worked in data center performance for about the last 15 years: you claw and scrape and scratch your way to get 10% on a real customer workload, something like running WordPress or Wikipedia. You work all year long to get something like 10%. Getting 111% is kind of stunning, a complete gobsmacking kind of experience, and it's exactly the kind of thing that gets my attention. How is PyPy behaving? This is a latency-throughput chart where we're doing instantaneous throughput measurements. Again, green is PyPy and red is CPython. Another interesting thing to notice here, and this was actually taken with the previous version of PyPy, and we've made some improvements since, is that for the first 30 seconds there's not much of a difference; in fact, they're relatively the same. You can see a warm-up period with PyPy. That's because, as it hits hot functions, it's jitting them, emitting native code. This is why, if something runs for less than 30 seconds or so, you may not see much of a difference between PyPy and CPython; in fact, PyPy may even be slightly slower. It's an example of why startup time is really important, and startup time is one of the things we're optimizing with the PyPy community. I'll touch on a few more of these nuances.
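You can see this warm-up behavior for yourself with a few lines of Python. This is just an illustrative microbenchmark, with an arbitrary loop body and iteration counts; run it under both interpreters and compare the per-pass times:

```python
import time

def hot(n):
    # A small, trace-friendly loop that PyPy will compile once it gets hot.
    total = 0
    for i in range(n):
        total += i % 7
    return total

# Under CPython every pass costs roughly the same. Under PyPy the first
# pass or two include tracing and compilation; later passes drop sharply
# once the loop runs as native code.
for rep in range(10):
    start = time.time()
    hot(2000000)
    print('pass %d: %.3fs' % (rep, time.time() - start))
```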
This is a different set of data, from the SwiftStack side. Intel came to us with these initial results and said, hey, look, we've got this great stuff; can you reproduce it? And we're like, yeah, this looks amazing; let's see if we can do it too. Our test setup was a little different. Whereas David's initial results and graphs were on 100% read workloads, we started by looking at write workloads: how quickly can you put stuff into Swift, measured in requests per second? Our testing hardware was also much different. I don't have 16 servers in my actual physical office, so I was running this on something a bit different, but I still needed to be CPU constrained. I have two boxes: one has, I think, 22 hard drives in it, and the other is used as a proxy server with, I think, 16 cores. Sixteen cores against 22 hard drives on one server is not CPU constrained in a Swift cluster, so I got to learn a few interesting new things and play with some kernel options. One reason, in addition to the ramp-up time, that an initial quick run of PyPy versus CPython might not look good is that the kernel may just be doing good work for you, rescheduling things across cores. If you're not CPU bound, and we weren't, you need to pretend to be, so I turned off all but one of the cores. Even then, you still might not see better results with PyPy than with CPython, and the reason is power management inside the CPU. The CPU isn't locked to a certain frequency: you might buy something rated at 2.4 gigahertz, but if it isn't doing much, the system will slow it down, maybe to 1.8 or even 1.1 gigahertz, just to save power, which in general you probably want. But it means you may see the exact same performance; it just used less power. If PyPy can run at 1 gigahertz and deliver the same performance as CPython at 2.4 gigahertz, your initial results will show both getting the same number of requests per second, so what's the difference? So in this case, I locked the frequency of the one core I had turned on, and at that point we can make apples-to-apples comparisons.
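For reference, here's roughly how you can do that kind of pinning on Linux. This is a hedged sketch, not the exact procedure from the talk: it assumes root access and a cpufreq driver that exposes the usual sysfs files, and turbo behavior (for example, with intel_pstate) may still need to be disabled separately:

```python
import glob

CPU = '/sys/devices/system/cpu'

def write(path, value):
    with open(path, 'w') as f:
        f.write(value)

# Take every core except cpu0 offline so the workload is genuinely CPU
# bound on a single core (cpu0 itself usually cannot be offlined).
for path in glob.glob(CPU + '/cpu[1-9]*/online'):
    write(path, '0')

# Pin cpu0 to a fixed frequency so CPython and PyPy runs are comparable:
# performance governor, with the min and max scaling limits set equal.
max_freq = open(CPU + '/cpu0/cpufreq/cpuinfo_max_freq').read().strip()
write(CPU + '/cpu0/cpufreq/scaling_governor', 'performance')
write(CPU + '/cpu0/cpufreq/scaling_min_freq', max_freq)
write(CPU + '/cpu0/cpufreq/scaling_max_freq', max_freq)
```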
When we did this with one core, we saw a 2.3x improvement from CPython to PyPy with the exact same scripts: from 130 PUTs per second to 303. The individual numbers aren't as important as the relative comparison. Then we started scaling up: with two cores enabled, a 1.6x improvement; with three cores, 1.5x. And if you just turn all 16 CPU cores on, at which point we're no longer CPU constrained, PyPy ran at 0.96, so 96% of the CPython performance. Yes, that's a little lower, but honestly, I think that's probably jitter in the testing framework; they're basically close enough. So it's not like PyPy runs at half speed if you're not CPU constrained. Overall, we saw very similar, very positive results. Which leads to the question: why? If I didn't change a single line of code, how in the world is this thing faster? Great lead-in. I'm going to go through two slides quickly and then spend a little more time on a third. Here, we're just looking at top. We can see the CPU is thoroughly saturated, as close to 100% as we can get, with 90% of the time in user space as opposed to kernel time, and all of the workers at 99%. Drill into that, and you can see that you're essentially spending 80% of your cycles in Python when you're running this workload. That's really interesting: of the time being spent, 80% of it is in the Python interpreter. Drill in further, and a third of that time is in the main interpreter loop. So what's going on? All right, reminding you of your computer science class: what is an interpreter? There's a Python virtual machine, as they call it. All Python does, as it compiles your script, is generate intermediate bytecode for that virtual machine. It's a stack-oriented machine, which means that when you do an add instruction in Python, you're popping the integer objects off the stack, looking at what the objects are and what their types are, doing the add, creating a new object, and pushing it back on the stack. That's what happens every time you do an add in Python. So we did a little experiment, just for fun. With native code, how many instructions does an add take? One instruction, and sometimes it can even be less than one, because with techniques like opcode fusing we can fuse it with other operations. Then you ask what happens with CPython by default: on average, adding two integers together takes 76 instructions. That's a 76x longer instruction stream that has to run.
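You can watch that stack machine at work with CPython's own dis module. A quick illustration (opcode names vary a little between CPython versions; newer 3.x releases spell BINARY_ADD as BINARY_OP):

```python
import dis

def add(a, b):
    return a + b

# Dump the bytecode CPython's virtual machine executes for one addition:
# push both operands onto the value stack, pop them, add, push the result.
dis.dis(add)
# Typical output:
#   LOAD_FAST    0 (a)
#   LOAD_FAST    1 (b)
#   BINARY_ADD
#   RETURN_VALUE
```

Each of those opcodes dispatches through the interpreter loop, with type checks and object allocation behind it, which is where the 76-instructions-per-add figure comes from.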
Now, what happens when you run this on the processor? To get the best performance, we have pipelined processors; going back to your introductory computer architecture class, there's a five-stage pipeline. The first two stages fetch and decode instructions; the later stages fetch operands, do the arithmetic, and update state. If you can keep all the pipeline stages busy at the same time, you get the best performance. What happens when you run Python is that 50% of the cycles stall in the first two stages, what we call a front-end-bound workload. That means this really nice, expensive processor we put all these great instructions into is sitting there twiddling its thumbs half the time, waiting for instructions, because the code footprint is so huge. This is why you save a ton of time if you turn that Python add into a native instruction, and that's exactly what a just-in-time compiler in an interpreter does. By the way, I mentioned we're working on other languages, and this is a common theme across interpreted languages. Some of the work we do is with the processor architects, to design processors that won't spend so much time front-end bound, but that's future architecture work; right now we want to make sure you get the best possible performance on the current generation of processors. So I think a great first step is this: let's see what we can do to make PyPy the default for all of OpenStack. One of the things we're doing in this space is porting all the core services of OpenStack to PyPy. Nova and Neutron are the ones we're currently working on, and also Glance, Horizon, and the rest. What else is on that list? Keystone, that's an interesting one that people often have performance issues with. We ported Keystone and found a 37% performance improvement. A lot of operators implementing OpenStack find Keystone is a bit of a bottleneck; just switching over to PyPy, we got a really nice improvement. Sorry, this is response time, not throughput. Since everything in OpenStack has to go through Keystone, it's really nice to think about one change where you don't touch any of the code in Keystone and you get an immediate response time improvement. And I think one of the great things about this, and yes, there are other alternatives, and we should continually strive to improve our code so it runs faster, is that using PyPy instead of CPython gives us some huge advantages. Number one, it doesn't require us to rewrite everything. It means we can keep using the vast, deep expertise the OpenStack community has already built up around Python. This may or may not be the be-all and end-all of performance improvements for OpenStack projects, but it looks to me like a very, very good start. So what's next? How can you take advantage of this? Again, I want to put the cookies on the low shelf so you can try this out yourself. We have some instructions; that's the shortened URL, but if you Google "Intel Swift PyPy," I think it comes up as the first hit. It's a set of instructions we put together; go take a look, and it should have everything you need to try this out. If you can't see the slide, it's bit.ly/intel-pypy-swift-instructions.
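Once you've followed the instructions, it's worth a quick sanity check that your Swift services actually came up under PyPy rather than silently falling back to CPython. A minimal check, run inside the same environment the services use:

```python
import platform
import sys

# PyPy identifies itself both through the platform module and through a
# PyPy-only attribute on sys.
print(platform.python_implementation())   # 'PyPy' or 'CPython'
print(sys.version)
if hasattr(sys, 'pypy_version_info'):
    print('running under PyPy', sys.pypy_version_info)
```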
So where do we go from here? Well, let's figure out where we are right now. Remember those deployment challenges David was talking about? We're still working through some of them. Specifically, inside of Swift we're working on a few interesting errors related to garbage collection, something to do with file descriptors and the way sockets are used. We're trying to debug those, find a reproducible test case, make sure it never happens again, and do the right thing so Swift runs seamlessly under both interpreters with no errors. We're also continuing to work on the deployment issues, the third-party C libraries and so on, making sure those are streamlined and available. And what we would love to see is your experience. Go test it, and share your results with everybody else: you've seen a little of my results and a little of David's, and we want to see yours. If you have clusters, especially if you can throw interesting hardware configurations and different workloads at this, that gives us an idea of where we stand and where to go next. Where should they send their feedback, John? The best place is the Swift developer community overall. You can submit bugs to Swift on Launchpad, you can always find us on IRC in #openstack-swift on Freenode, and even a mailing-list post is completely fine. Tell us what you've seen, and we can work together on everything. Or send John a tweet, or me a tweet, either one. My engineers will jump on this sort of thing; we really want to get it worked out. So, longer term, we're looking for a couple of things. Short term, as soon as we get these low-hanging-fruit issues with garbage collection straightened out, I'd like to reintroduce the PyPy gate job for Swift in the OpenStack CI system. That means every single patch that goes into Swift will be gated on the question: does this work with PyPy? That's the base level, and it's not going to be a hard thing to do; we just need to fix the last remaining bugs so we don't have to play with garbage collector settings. Once we've taken care of those last little things and reintroduced the gate job, you'll know that if you want to deploy on PyPy, you won't have to jump through extra hoops. Longer term, taking off my Swift community hat and thinking from my employer's perspective: if this is a way to make measurable performance improvements for my customers, I want to put production customers on PyPy, and I want to do that as soon as possible, based on the results we get. So if this thing is as good as it's looking so far, I want to run production clusters on PyPy. And from our perspective, for the broader OpenStack community, I'm really interested in all the rest of the services, essentially having an all-PyPy OpenStack cluster. I would hope we could have a proof of concept of that by the next summit and report results on it. I was actually hoping to have it for this one, but I don't control all the resources, so it didn't quite work out. If we do that, I think we ought to be able to add PyPy back to the gate for everything in OpenStack. So I think that's very exciting, and as I said, I've got engineers working closely with the PyPy community to work through these issues, with the goal of PyPy being the default way you run Python. So I want to thank you for listening. We have about five minutes for questions, and there are two microphones here, so please use them so you can be picked up on the recording. Questions, comments, or nasty remarks? I'll direct all the nasty remarks to John. Yeah, please.
So, last time I checked, PyPy was still Python 2.7. Won't that just trap us in 2.7 forever? Actually, this is one of the things I'm really happy about when you compare PyPy with other Python JITs: there's an effort for both 2.7 and 3, though I can't remember which 3.x release they're at. I could go through several other Python JIT projects: all of them are either Python 2 only, like Pyston, or Python 3 only, like Pyjion, which another company is going to be talking about at the next PyCon. So PyPy not only has the best compatibility with everything that's out there; our commitment to the Python community is that it's not either-or for us, it's both-and: both CPython and PyPy, and both Python 2 and Python 3. That's my commitment to the community. Yeah, go ahead. Is there any plan for getting this into any operating system, commercial or non-commercial? I think you can already get PyPy from the commercial OSes as a package, so you can install PyPy from your favorite repos. I don't know the exact version for any particular distro, but I think that's a great way to go. I was talking with some folks from Oracle about Solaris, and it would be great to see something like that at some point. Those are the kinds of things I'd love to see. I'd love to see partners like John. It's been great working with SwiftStack; they're a great partner for us, and that's part of why we wanted to stand on stage together, because it's a community way of doing things. There are plenty of OpenStack products out there, and I'd think this would be the competitive threshold for everybody: getting in at this level to have the best possible Python performance. John here is a pioneer, and I think he's making it happen. What else? Please. I've found a few references on the internet saying PyPy is definitely faster than CPython, but that it's faster for CPU-bound operations and a bit slower for IO-bound operations. Do you have any experience with that? Yes. What we've specifically been working on is where we can accelerate the CPU-bound things, and much of a request's lifecycle inside of Swift involves reading and writing packets to the network and the drives, so there is a lot of IO you end up doing. There are other efforts underway in the community to improve the IO-dominated tasks inside of Swift, and we'll continue to work on those as well. One advantage of Swift's design is that we can, to some extent, compartmentalize and isolate the different processes doing the different sorts of workloads, which means we can selectively apply performance optimizations where they're best needed. But all of that said, it really comes down to the numbers. If we get a 5x improvement on CPU-bound things but a 15% decrease on IO-bound things, and I'm just making up numbers here, well, it depends on how much data and what size of request you're handling. So I don't have a specific answer, other than that's a fantastic question.
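If you want to probe this on your own hardware, a crude harness like the one below, with purely illustrative workloads and sizes we've chosen for this writeup, run under both interpreters will show you which side of the line your workload falls on:

```python
import os
import tempfile
import time

def timed(fn, *args):
    start = time.time()
    fn(*args)
    return time.time() - start

def cpu_bound():
    # Pure-Python arithmetic: the kind of work a JIT accelerates.
    total = 0
    for i in range(3000000):
        total += (i * i) % 97
    return total

def io_bound(path):
    # Dominated by write()/fsync() syscalls; the interpreter barely matters.
    with open(path, 'wb') as f:
        for _ in range(50):
            f.write(os.urandom(1024 * 1024))
            f.flush()
            os.fsync(f.fileno())

tmp = tempfile.NamedTemporaryFile(delete=False)
tmp.close()
print('cpu bound: %.2fs' % timed(cpu_bound))
print('io bound:  %.2fs' % timed(io_bound, tmp.name))
os.unlink(tmp.name)
```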
And you're absolutely right, because those are exactly the things we have to worry about when deploying this to production, and we're going to make the right decision for the operator and deployer community. This is the sort of thing the performance experts in the group at Intel are really good at: whole-stack analysis to understand where we're spending time and what to optimize, whether we have to work at the Python level, the interpreter level, or the OS level. We're trying to hammer all of those out, and I'd hope to see a white paper about this specifically, hopefully jointly with SwiftStack, so everyone can have the exact numbers in front of them. Yeah, because you can show graphs with all these performance improvements in throughput using smaller object sizes per request, and someone else can show the other side of it, right? You make an excellent point. So I think we have time for one more question. Oh, that's it? This is our time. I'm sorry. Of course. Can I put the bitly slide back up? intel-pypy-swift-instructions. So, I will end on a request: please help us out and let us know what's going on. I recognize several of you and know you have very large Swift clusters yourselves, so let's work together to make this thing awesome. Thank you. Thank you very much.