just as everyone's wandering back in from the break. I hope you're all adequately caffeinated; I certainly am. So my name's Giles Heron. I'm going to talk about Media Streaming Mesh, which is a new open source project that we spun up within Cisco, and we're looking for others to collaborate with us on it. The strapline here is real-time media in Kubernetes. So really, this project came from a recognition that Kubernetes is more focused on web applications than on real-time media, and that was really what we wanted to address. If you look at applications in the classic kind of 2x2 matrix that an MBA might use, you can divide them into non-real-time and real-time, interactive and streaming. And I guess my contention would be that Kubernetes, particularly if you're using things like service mesh, is very much focused on the non-real-time interactive space that's epitomized by web applications. And of course, what that tends to mean is that in terms of scaling, you'll put out more and more horizontal replicas of your front end, for example, as clients connect to them. That's a very different behavior from what you see in streaming, where you tend to have one canonical source pushing data out. But as I say, we particularly wanted to focus in this project on the real-time space. Initially we looked more generally at anything real-time, so it could have been gaming, for example, as well as media. What we discovered was that Google in fact have a project called Quilkin — I don't know if anyone here has come across it — doing UDP proxies for gaming. And when we looked at media, and specifically at RTP, we decided to focus there. So we're very much focusing on real-time media, anything RTP-based. There are various challenges there, which we'll come on to. And I guess the other thing to mention is that this is a fuzzy application taxonomy, and I mean fuzzy in both of these dimensions. So, for example, when is something interactive and when is it streaming? If you're on a Zoom or a WebEx call, the experience is a very interactive one, but if you look at the packet layer, each user is effectively streaming their video to all the others. And on the real-time versus non-real-time axis — well, there are two of us working on this project so far. We're both Europeans; unfortunately, the other one is actually based in the US. So it might well be that I'm watching a football match on my cable TV subscription, seeing things as they happen — more or less. Those of you of my kind of age will remember the days of analog TV, where things were literally up to the second and the news would count up towards the hour beforehand. We don't quite have that now with the decode delay of digital TV, but my colleague might be watching over the top. So his challenge is going to be that if I text him and say, wasn't that an amazing goal, he'll be like, well, I haven't seen it yet — what are you talking about? So real-time itself can be a bit fuzzy. And I think that's really the contention here: if we look at real-time media over the internet, clearly things like WebEx and Zoom prove we can do effectively real-time. However, if you want to watch the football, you always find it's being delivered over HTTP, and there's that lag. So the issue becomes: how do we make those kinds of streaming apps sufficiently cloud-native that we can use all of our cloud-native tooling?
We can drive the cost down, and then hopefully we can deliver real-time over the internet for mass markets. So what we're trying to do here, effectively, is all the same things that a service mesh would do for you for web traffic — things like observability, things like security. So we want to do things like SPIFFE for identity, and we're going to use things like SRTP to encrypt our traffic, because if you're writing an app you shouldn't have to care about that stuff; that should be a deployment consideration. But we want to do all of that while achieving very low latency. So an RTP proxy is going to take a packet in and fire one or more packets out; it's not going to be buffering. And in terms of deployability, we're keeping the footprint really low. And I guess that's particularly pertinent to what we're talking about today in terms of edge. Really, that was where this project started: looking at RTSP cameras being deployed at the edge, where you didn't really want to deploy the whole Envoy stack — I think it's about 40 megabytes per pod that the sidecar adds for Envoy. So we want a much lighter footprint, as well as the lower latency. In terms of use cases, I mentioned we've taken non-RTP stuff out of scope. Though interestingly, there's even an IETF draft out there from one of my colleagues on doing gaming over RTP — the recognition being that games are actually kind of similar to media streams, because they'll send out pretty much the same data on a regular tick to the players. So you can use RTP for that, for example. But really we're focusing on these video use cases, whether in the TV production space, streaming over the internet, but also things like the retail and industrial edge — particularly around, as I say, RTSP cameras and doing analytics at the edge. But also, of course, real-time collaboration. So if you were watching this in real time, rather than with HTTP lag, it would be that kind of stuff. So in terms of the video applications, I guess we're looking to see where this technology might be useful. If you think about it — take the football example — you've got cameras at the football match, streaming back. Somewhere in the studio you'll be mixing stuff together: you'll be bringing in the score from other games, the score in this game, that kind of thing. And then, of course, you need to encode it to send it out, whether it's going over the air or out over the internet. And because you're sending it out over the internet, there might be various different bit rates, various different encodings, various different protection schemes. All of that you might want to do as a cloud-based thing. And then there's delivery out to the caches, and then the final hop to the user. So in terms of applicability, I'd say the first one's a bit tricky, because there's a lot of dedicated hardware — if you were to look at the mixing desk over there, you'd probably find quite a lot of real hardware that's quite hard to run in containers. But the interconnect with cloud-based encoders, that feels like a very natural Kubernetes case, being able to scale stuff on demand. Then there's streaming from there to the caches. Even if the final hop of your football match is being delivered to users using DASH or HLS, it might be that you at least want the caches themselves to be right up to the second. And there are things you can do there.
For example, stream it over RTP, but send it over two paths, maybe add FEC, that sort of thing, to make sure the caches are up to date and have the content. Where it might get interesting is that final delivery to clients. We've been experimenting with things like RTP over QUIC to do that, so that you could deliver it potentially into browsers and through firewalls, et cetera. So if you look at the cloud-based encoders, as I mentioned, for one input stream we might want to create multiple lower resolution, lower bit rate streams. We'll do that through a transcoding stage, through our proxies, and then we'll carry on streaming that through another stage that does protection and then pushes out the encoded streams. In terms of delivering to caches, I mentioned the key there is that you can send potentially over multiple paths, adding FEC, that sort of thing. But when we come to that final hop, really that's where a lot of the challenges are going to be. So, for example, if I'm watching it on my television set, which is streaming over the internet, well, how do I update the software on that? When can we get software updates to support RTP over QUIC? Ultimately, what we want to do is take in an RTP stream that might be regular RTP over UDP, and then stream it out over QUIC to those users. But you end up having to modify the protocols. When you look at some of these video protocols, typically what you'll see is that you have a TCP control plane, and that control plane then negotiates the UDP ports that you use for your actual media streams. That's actually the key reason Kubernetes struggles with this today. If you try to run these as services using kube-proxy, you'll find that they simply won't work, because kube-proxy can't inspect that TCP control channel and see what UDP ports are being negotiated. So that's where Media Streaming Mesh will help — its control plane actually understands that negotiation (there's a rough sketch of what that looks like for RTSP just below), and I'll show you that in a demo. But in this case, with QUIC, of course, you've got to negotiate QUIC stream IDs or something instead of those UDP ports. What's interesting, for anyone who's watching this whole media space in the IETF, is there's a lot of activity now around: do we do RTP over QUIC, or do we create new media protocols that run over QUIC? So there are proposals from the likes of Facebook and Twitch — I think it's RUSH, and WARP is another one. They sometimes do really cool things: they'll take one video frame and send it over one QUIC stream, so you'll only get head-of-line blocking for that one video frame, not for the whole thing. There's another proposal, QuicR, that came out from my colleague at Cisco, which is really trying to do publish-subscribe over QUIC. But yeah, to come to the edge — and this is really, as I say, where we started this project — video monitoring. So, RTSP security cameras. Typically, I guess, there are two really extreme ends of the spectrum. One is you've got a lot of small sites, so maybe it's coffee shops. You could be a large coffee, what would you call them, retailer, who has tens of thousands of coffee shops. You might only have one or two cameras in each shop, but you might still want centralized monitoring capabilities, as well as perhaps having analytics in those individual stores. Or you might be an airport or a factory where you have thousands of cameras in one site. And the challenge you have is that you've got both human and machine viewers, if you want to call them that.
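To make that concrete, here's a rough Go sketch of the kind of parsing an RTSP-aware control plane has to do — pulling the negotiated client ports out of a SETUP request's Transport header. The function name is just illustrative and this isn't the actual Media Streaming Mesh code, but the header format is standard RTSP, and it's exactly the piece of state that kube-proxy never gets to see.

```go
// Illustrative only: a minimal sketch of the kind of parsing an RTSP-aware
// control plane has to do, not actual Media Streaming Mesh code.
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// clientPorts pulls the negotiated RTP/RTCP client ports out of an RTSP
// Transport header, e.g. "RTP/AVP;unicast;client_port=4588-4589".
// kube-proxy never sees these, which is why plain services break.
func clientPorts(transport string) (rtp, rtcp int, err error) {
	for _, param := range strings.Split(transport, ";") {
		if !strings.HasPrefix(param, "client_port=") {
			continue
		}
		ports := strings.SplitN(strings.TrimPrefix(param, "client_port="), "-", 2)
		if rtp, err = strconv.Atoi(ports[0]); err != nil {
			return 0, 0, err
		}
		if len(ports) == 2 {
			rtcp, err = strconv.Atoi(ports[1])
		}
		return rtp, rtcp, err
	}
	return 0, 0, fmt.Errorf("no client_port in transport header %q", transport)
}

func main() {
	rtp, rtcp, err := clientPorts("RTP/AVP;unicast;client_port=4588-4589")
	fmt.Println(rtp, rtcp, err) // 4588 4589 <nil>
}
```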
So you could have AI apps that are receiving the video streams and doing things like checking: are people wearing face masks? I'm not, because I'm speaking. Is there something that looks like a confrontation kicking off in a coffee shop? And they might then be streaming out using something like MQTT and doing all that analytics at the edge, so that the central site can then see, OK, something's happening, I now need to view it. But again, those could be humans who are then viewing it at the central site. The challenge is that you don't know ahead of time how many consumers you have for each stream — it's going to be zero-to-many, typically, or one-to-many. So today, what people very often do is use UDP multicast. If you have one of those RTSP cameras, you can configure it to just multicast out the stream rather than running RTSP. I guess the challenge there, as a network person, is that you then end up putting state into your network for each of those multicast streams. And if you've got tens of thousands of cameras, that's tens of thousands of (*,G) or (S,G) entries in your network, and you really don't want to do that. Whereas the proxies inherently support that fan-out. So you can start off with one subscriber — in that kind of edge case, like I said, in the coffee shop you've got some kind of AI app consuming it locally. But if that app then detects that somebody isn't wearing a face mask, or two people look like they're having an argument, then what it might do is send a trigger, and something at the central site might start viewing that. And that just gets replicated out through the proxy. So effectively, we're multicasting at the application layer, and of course we can do that really ad infinitum. So how are we building it? Well, as I mentioned earlier in terms of architecture and footprint, we really wanted to minimize the footprint of this. So we've split what we're doing into, I guess, three main components. We have a data plane proxy — that's there in red, the RTP proxy — and we'll come on to how we want to build that. The bits we've already started building are all the other bits, I guess; we're literally about to start on that data plane proxy. So the key thing, I guess, is the control plane. And we'll have one of those for each of the different control plane protocols, whether it's SIP or RTSP or WebRTC. But of course, we want to build this in a framework where you only have to add the code for that particular control plane protocol. And again, we really want a pluggable model, so that we can provide the framework and anyone can contribute a control plane. And then we have a little stub that runs in each pod, but that's much, much smaller than something like an Envoy proxy — hence calling it a stub. Then we have some other components just for injecting the stub, injecting iptables rules, et cetera. So that first one, stub inject: this is really pretty much standard Kubernetes stuff you'd expect to see. When I started the project on my own, I didn't know how to do any of this stuff, because I'm a network guy, not a software guy. So I was literally creating my own YAML files where I'd add the sidecar container in by hand and run the sidecar proxy I was using then — running it privileged. And this is just to solve those sorts of issues. So in this case, if you have this custom annotation that says inject the stub, then the stub will just automatically get injected.
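To give a flavour of what that looks like, here's a minimal Go sketch of the decision a mutating admission webhook might make, keyed off a pod annotation. The annotation key and image name are made up for illustration, and the AdmissionReview plumbing around it is omitted — this isn't the project's actual injector.

```go
// Illustrative sketch only: how a mutating webhook might decide to inject the
// stub sidecar, keyed off a pod annotation. The annotation name and image are
// invented for this example, not the project's actual values.
package main

import (
	"encoding/json"
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// patchOp is a single JSON-patch operation returned to the API server.
type patchOp struct {
	Op    string      `json:"op"`
	Path  string      `json:"path"`
	Value interface{} `json:"value"`
}

// buildStubPatch appends the stub container when the pod opts in via the
// (hypothetical) "msm.example.com/inject" annotation.
func buildStubPatch(pod *corev1.Pod) ([]byte, error) {
	if pod.Annotations["msm.example.com/inject"] != "true" {
		return nil, nil // nothing to do; admit the pod unchanged
	}
	stub := corev1.Container{
		Name:  "msm-stub",
		Image: "example.com/msm/stub:latest", // placeholder image
	}
	patch := []patchOp{{
		Op:    "add",
		Path:  "/spec/containers/-", // append to the pod's container list
		Value: stub,
	}}
	return json.Marshal(patch)
}

func main() {
	pod := &corev1.Pod{}
	pod.Annotations = map[string]string{"msm.example.com/inject": "true"}
	p, _ := buildStubPatch(pod)
	fmt.Println(string(p))
}
```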
Likewise, in terms of the iptables rules that are going to direct traffic into the stub, those will get triggered off the same annotation. In that case, we actually have a CNI plugin that runs on every node. It's a chained CNI plugin, so you can still run your Calico or Cilium or whatever as your main CNI, and this just gets chained in to add that iptables rule. So the control plane: we decided, as I say, to make something fairly pluggable where we can implement all the different protocols, but run it once per cluster, because that feels like the natural way to do things in Kubernetes — whether that's a tiny little cluster at the edge with one or two nodes, or whether it's a much bigger one. And again, trying to be as cloud native as possible, we're writing all this in Golang and using gRPC to communicate between the different components. I'd say we've got the control plane up and running now. The stub — I've mentioned we call it a stub just because it's pretty small; I think it's about 500 lines of Rust code, so pretty minimal. Really, the main function it's doing is terminating TCP connections from the app and then multiplexing them over gRPC towards that control plane, which will help us as we start to do control plane redundancy, because then we can move things around. There may be use cases, however, where we do need to intercept the data plane. Anyone who's looked at RTSP will know there's a mode in RTSP where you actually multiplex the data over that TCP channel — which is actually what you'd see if you attempted to do this through kube-proxy. Now, of course, that adds delay, particularly on a WAN, but there may be cases where you're happy to do that in the pod. There may be other things we want to do — adding monitoring in each pod, or even, if you're really paranoid about having diverse paths, you might want to start those diverse paths at the pod rather than at each node. Moving on to nodes: for the data plane proxy, we figured one instance per node was the right way to go. Really, that's because it's the right place to replicate everything more than anything else — if you have one node with a bunch of pods, that's the obvious place to replicate. We deploy this as a DaemonSet, so we can do east-west and north-south all through the same proxy. Anything that comes in from outside will hit that proxy, and in that case we'll have a stub alongside to handle the control plane for it. Effectively, because it's terminating the UDP layer, it's in effect sort of terminating RTP. We're not going in and looking at the clocks and those sorts of things, but we are terminating that socket. So that means we can do things like v4 to v6, or IP tunneling, or converting MTUs, et cetera — we kind of get that for free. It'll also support RTP over TCP or QUIC, and we can support proxying at those lower layers if we need to. A key thing security-wise, of course, is that this minimizes our attack surface, because north-south, if the only thing we're exposing is, for example, the RTSP port, then that's really minimized the attack surface. The key thing then is how we want to build this. And really, the goal here is for this to be a proxy with a filter chain where people can contribute their own filters — so I guess this is where the call for community assistance comes in.
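As a rough illustration of what I mean by a filter chain — this is sketch Go with made-up names (Filter, Chain, Duplicate, FanOut), not the project's actual API, and the real intent is for filters like these to be pluggable Wasm modules — the contract is basically a packet in and zero or more packets out, with the proxy composing filters into a chain:

```go
// Illustrative sketch only, not the project's actual API: a packet filter
// chain where each filter takes a packet in and returns zero or more packets
// out. In Media Streaming Mesh the idea is that filters like this would be
// contributed as Wasm modules rather than compiled into the proxy.
package main

import "fmt"

// Filter is the "packet in, zero or more packets out" contract.
type Filter interface {
	OnPacket(pkt []byte) [][]byte
}

// Chain runs a packet through each filter in turn, fanning out as it goes.
type Chain []Filter

func (c Chain) Process(pkt []byte) [][]byte {
	pkts := [][]byte{pkt}
	for _, f := range c {
		var next [][]byte
		for _, p := range pkts {
			next = append(next, f.OnPacket(p)...)
		}
		pkts = next
	}
	return pkts
}

// Duplicate is a toy "FEC" filter: it simply emits every packet twice.
type Duplicate struct{}

func (Duplicate) OnPacket(pkt []byte) [][]byte { return [][]byte{pkt, pkt} }

// FanOut models application-layer multicast: one copy per subscriber.
type FanOut struct{ Subscribers int }

func (f FanOut) OnPacket(pkt []byte) [][]byte {
	out := make([][]byte, f.Subscribers)
	for i := range out {
		out[i] = pkt
	}
	return out
}

func main() {
	chain := Chain{FanOut{Subscribers: 3}, Duplicate{}}
	fmt.Println(len(chain.Process([]byte("rtp-packet")))) // 6 copies out
}
```

The Duplicate filter here is exactly the send-every-packet-twice hack used in the demo later on, and FanOut is the application-layer multicast described earlier.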
I don't know how many people here were at the Wasm session yesterday — it doesn't overlap so much with today's, so maybe one or two. And I was really impressed by Fluvio's model for streaming — sort of streaming analytics type stuff — where they basically have these little plug-ins, written in or deployed as Wasm, to do filtering. And I thought, well, that feels like the right model for us: rather than writing those plug-ins in Golang or Rust specifically and putting them in a proxy written in that language, could we do all this in Wasm, so that anyone can contribute their own plug-ins with a simple interface that says take a packet in, send one or more packets out? And you can write those in any language that supports Wasm. So that feels like the right way forward. And of course, we can have multiple application pods all using that same proxy. And we can have multiple control plane instances, and we might want to spawn a proxy for each one or have a shared one — we can figure that out. But as I mentioned, internally it should be a filter chain that's entirely reconfigurable and reprogrammable by the user. Your application might be different to mine: you might want to do, for example, forward error correction, or you might want to validate your RTP streams, and somebody else might not want to do that. So basically, we strip off the transport headers, we pass the packet into this filter chain that might do things like FEC, fanning out to multiple destinations, adding security, and finally we put transport headers back on for each destination and stream it out. So I'm doing well for time — 20 minutes early. Demo. Now, I'm not even going to attempt a live demo; I felt sorry for the last speaker having to deal with the network. I noticed the network does say something about Crate with Cisco — is that the name of the SSID? I can only confirm I have nothing whatsoever to do with it. So, yes, let's pull up a demo — here's one I made earlier. What you can see here is we're streaming video through the system. We've got a couple of different setups: one using Envoy, which will force everything through TCP, of course, and one using NodePorts, which in effect will also force everything through TCP because of this challenge with dynamic port numbers. For the native RTSP-over-UDP one, I had to use a separate VLC player, because the plug-in we had for Chrome wouldn't support it for some reason. Now, in fact, I think you can see that the UDP one is first and the others are following behind. It's quite a short loop, so it's always hard to tell. What we'll do now is increase the loss — we're just simulating packet loss. And this is over a WAN, so it should have an effect on TCP. So as that gets applied, you should see the TCP streams start to have some issues. It's not so visible — I don't think the WAN latency was high enough — but it will get worse. Now, this is running in a very lightweight VM. In fact, I think you saw the domain name come up at the beginning; you can even look at that yourself if you want. The only thing is, because it's lightweight, we couldn't put an analytics app behind it. Normally you'd see a red box if you haven't got a mask on and a green box if you have, but that model was so big I couldn't run it on this little VM. So you can see those streams struggling with this. But if you really jack the loss up — and of course this is insane, you would never see 30% packet loss normally, except perhaps at a conference like this — but if we jack it up to, I think it's 30%, let's try it here.
What you should see, yeah, is that sometimes the streams will just die because of the packet loss. Forward error correction can solve this, of course, if that loss is truly random — the loss here doesn't seem to be; it goes in bursts. But you can see even UDP will not cope with that kind of packet loss. If you put in some forward error correction, though — and we hacked this: I called it forward error correction, but I just sent each packet twice, which was cheating. I just ran on a different port number and doubled up my packets. And this was using a sidecar proxy that was a single proxy, rather than the decomposed one we're building now, but the behavior should be the same. And you can see that with the forward error correction, it's quite happy even at 30% packet loss. As I say, not something you'd typically see in the real world, but just a proof point that this kind of approach can work. Let's see where my mouse is. I guess that's what happens when you don't hit pause. Let's get the slides back. I think that's probably about it, yeah. So just in summary, the goal for this project really is for real-time media apps to be first-class citizens in this cloud-native world we're building — to be able to have the same tooling, use the same pool of developers, et cetera, and still do these real-time apps. And hopefully, as I mentioned earlier, drive the cost for that down such that when I watch football over the internet, I won't have to wait for the goals anymore. We have a website for it, and it's up on GitHub. Some of the repos are still private; I think the stub and the control plane we're about to push out publicly over the next few days. But really, it's just an appeal: if anyone wants to come and work with us on this, that would be really, really good. I think I'll be demoing it on the Cisco booth at KubeCon itself. And it'd be great to have conversations with people who are interested in the idea, or, as I say, who'd like to contribute as coders. But yeah, thanks so much.