Hello, hello, hello. Good afternoon. How many of you came here from North America? One, two, three, four. Thanks for coming. My name is Fatih Nar, solution architect based in Dallas, Texas. It took forever to get here, but I hope it's worth your while to listen, and maybe we'll have some engagement.

So raise your hand: you picked this session because you saw the keyword 5G. One, two, all right. Raise your hand if you are here for the keyword observability. Pretty good, all right. This session is a mixture of both: 5G and its relation to observability. We will be talking about the open source tools and projects we have leveraged in this work. This is not a PowerPoint-only exercise; we did the actual footwork. There's a more detailed report on it, which we'll show at the end, and there's a GitHub repo where you can see what we used and how we used it, for the 5G part as well as the observability part. And go.

Thanks, Fatih. I'm a colleague of Fatih's; we're both from Red Hat. I'm an American in EMEA, based out of Germany, covering telco. Fatih's the global guy, as he said, out of Texas. I think you slept overnight in a different country because your flight was late for a connection. I had a two-hour flight from Munich to here, and you had how long? Fifteen-plus. So, who's traveling for the first time since the pandemic? Anyone? OK, so a lot of you have already had the excitement of traveling again and figuring out what all your reward numbers were from two years ago.

I'm going to briefly cover a little bit; Fatih has a lot of cool stuff to show you if we get to it in time. If you have any questions, feel free to ask in the middle, and at the end we can also follow up with you.

With that, a couple of pointed questions. If you're looking at doing 5G, or looking at observability, have you asked yourself any of these? What is your expectation for end-to-end systems? Have you thought outside of a single Kubernetes or OpenShift cluster, about multiple clusters? The other thing is how to make your environment more observable; this goes back to what Fatih will show with his service mesh and feedback loops. Imagine you have 1,000 clusters. How are you going to see what's going on in each of them, in one place? Do you really want the people running operations of your clusters to have to go look in 1,000 places? And then: how can you make it self-aware? What does self-aware mean? One way to think of it is your feedback loops. Are you doing the standard thing, seeing one signal and taking one action? Or are you taking in multiple attributes and acting on them together? That's the kind of stuff Fatih has been brainstorming and looking at. And if we have enough time, we'll probably even talk about what your plans are after this. So with that, Fatih?

OK, so I can just fast-forward. My journey in relation to this work: I have a background in telecom, media, and entertainment. I started my career as a software engineer in telco. I did around 15-plus years, and then I did system integration in the field as a consultant. I was part of Canonical/Ubuntu for OpenStack, and then I joined Verizon Wireless as a distinguished engineer for edge computing and infrastructure design for 5G, integrating that infrastructure with the OSS/BSS systems. Then I joined Google.
At Google Cloud, we did the first rounds around Anthos, Anthos Service Mesh, and Anthos Config Management. Obviously, you're not doing that for the sake of doing it; you're doing it so workloads will sit on you, migrating them to your platform, and then, supposedly, migrating to your cloud infrastructure. Then I joined Red Hat. There's a reason for it, but I'm not going to get into it here. Long story short, for the sake of something more mature, trustable, dependable, well documented, with a proper support organization, I find myself more confident being part of Red Hat.

So what you're seeing here, as we asked before, if you're familiar with 5G: 5G is a combination of n microservices. It's not just a front end and a back end; it's tons of microservices, including the control plane. Okay, so when people talk about the 5G control plane, it's different from the Kubernetes control plane, all right? That's all right, we were getting in the mood. The 5G control plane is mainly about the radio access network: registering the gNBs, the gNodeBs, the radio tower stations, and also your terminals. That's the control plane: registering yourself, introducing yourself into the network so you start getting service. And then the user plane is your payload: your browsing, say, the internet, all that communication is the user plane in 5G. In the Kubernetes control plane, as you're aware, we have masters and we have, well, workers now; the masters control all the kube-scheduler and kube-proxy configuration, all of that. And the Kubernetes "user plane," so to speak, is where your real traffic is ingressing into and egressing out of your clusters, right?

When you look at this deployment model, each of these different colors is an individual microservice, which is subject to scaling itself within the cluster, within the namespace, outside the namespace within the same cluster, as well as outside the cluster. So when you're thinking about a global-scale telco solution, it is distributed clusters around the globe, around the universe, because 5G doesn't serve only the people here, not only this geography, this city, this municipality, or this country, but all the countries, because all the major TME players are multi-country operations. Think about this complexity at that scale: it's totally unmanageable chaos. By the number of different microservices, by n replicas, by interfaces defined by 3GPP, by different platforms hosted on different infrastructures.

Okay, so why are we scaring you first? Obviously we need to identify the problem, which is the complexity: how can we slice and dice the complexity into a simpler approach? Obviously: divide and conquer. Identify the common roles and responsibilities across these microservices, group them together, tie them together, and then observe them together, together with infrastructure observability and platform observability as well as application observability. Those roles include security and policy management, multicluster management, orchestration, and telemetry.

So when we're talking about observability: when you Google it, you will find that observability is taking a snapshot of the system to understand where you are, so you can compare yourself to where you wanted to be as a declared state. Okay, keyword: declared state.
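To make that declared-state idea concrete, here is a minimal sketch of a Kubernetes Deployment; the namespace, names, and image are hypothetical, not the actual testbed manifests:

```yaml
# Minimal sketch of Kubernetes declared state: you declare three replicas of a
# hypothetical 5G AMF microservice; the control plane continuously reconciles
# the actual state toward this declaration.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: amf
  namespace: 5g-core                # hypothetical namespace
spec:
  replicas: 3                       # the declared state
  selector:
    matchLabels:
      app: amf
  template:
    metadata:
      labels:
        app: amf
    spec:
      containers:
      - name: amf
        image: example.com/open5g/amf:latest   # placeholder image
        ports:
        - containerPort: 38412
          protocol: SCTP            # NGAP signaling from the RAN
```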
That's the thing with Kubernetes, right? You declare something, and that happens, somehow. That's the beauty of Kubernetes. Now imagine this complex system is hosted across clusters, and some of these microservices, not all of them, are hosted in another cluster. Say this is Berlin, this is central Berlin, and this is one of the municipalities. You don't need to replicate all the services, only the required ones, say for the user plane. But they have to talk to each other over network communication, and these different clusters could be OpenStack, could be Kubernetes. It all starts with the creation of the infrastructure as well as the platform from a single hub, as zero-touch provisioning: provision the infrastructure, get it ready with networking, compute, and storage, then deploy the workloads, such as the 5G control plane and 5G user plane nodes, tie them together over networking so they can function properly, and then the radio part, with the users and terminals registering and getting service.

So this is the thing. The biggest challenge, I think, in my experience in this world we're living in with OpenStack, VMware, bare metal, and now Kubernetes, is networking; 99.9% of it is networking. The organizations I have been part of worked heavily on OpenStack Neutron to tie it together and make it work properly for telco workloads. Now similar efforts are happening in the Kubernetes CNI domain with Multus, with SR-IOV, with MacVLAN. This comes from the nature of these complex workloads: they carry baggage from the old legacy domain, the way they implemented networking, they carry those bad habits into this new world. But as a platform, as OpenStack or Kubernetes, you need to address those requirements and make the workloads comfortable sitting on you.

So, observability. The three key pillars are logs, metrics, and traces. How many of you have been using, say, Jaeger, Zipkin, or Kiali in your production environment? None of you, no? Any in staging? Any POCs? So you do? What are you using? Which one? Jaeger, okay. Perfect, perfect. And how many of you have heard of Loki, not from the Marvel universe, Loki as a project, Grafana Loki? All right, awesome, that's good. The beauty of this world is that not everything necessarily revolves around traces. You may leverage logs in a better way with labels, and present them, say, over Grafana in a time-series approach, to see things in more detail.

So in this study, what we did: when you're looking at observability, obviously you have to collect all these logs, metrics, and traces into one central location, digest them, make them usable and presentable so you can understand them, and then feed back into your, I would say, scheduler or your provisioning OSS system to leverage this data. The key part is collection. Data you collect can be consumed, cooked, and eaten in different ways. Collection could be done through an ambassador container, or through a sidecar container, as in the service mesh concept. How many of you are aware of service mesh as in Istio? Please raise your hand if you know what Istio is or how it works. Keep your hand up if you know what a sidecar container is, because if you're using Istio, you are dealing with the sidecar container.
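For reference, a minimal sketch of how those sidecars typically arrive, assuming Istio with automatic injection enabled; the namespace name is hypothetical:

```yaml
# With Istio installed, labeling a namespace like this is usually all it takes:
# every pod created in it afterwards gets an istio-proxy sidecar container
# injected next to the application container, and traffic is routed through it.
apiVersion: v1
kind: Namespace
metadata:
  name: 5g-core                 # hypothetical namespace
  labels:
    istio-injection: enabled
```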
Okay, so the sidecar container is a companion to your application. It injects itself; you don't even need to be aware of it, and all the traffic passes through it. All the traffic, not some of the traffic: all of it. This is important. So if you're familiar with your applications and you already have issues with bottlenecks, imagine putting another little bottleneck in front of them, which is the Istio sidecar container. You were already choking, and then somebody puts more pressure on your throat, something like that. Because the sidecar container comes with the overhead of resource consumption and usage, as well as bottlenecking your ingress traffic and your egress traffic through itself.

That's why, when we were doing this study, we seriously considered service mesh to be part of 5G, collecting all the data we could collect, consume, and digest. But the overhead of Istio was, call it a showstopper or a heartbreaker, whichever word you want: we couldn't go further. When I say we: we have a true 5G testbed with an open source 5G core that we used for all this experimenting, and we also have legitimate 5G core partners who have been working with us on this journey. They also evaluated the Istio sidecar container and the ambassador container approach, and they experienced the same issues. So we decided: we will not go that way. But if you want to go that way, we have a workaround for how to export services from a service mesh into another cluster, and use them not only for logs, metrics, and traces collection, but for traffic steering as well. The article covers that service mesh part of our work too, but in this slide we are not.

So what is remaining? Okay, we excluded the service mesh part. What's left: we are not going to use a sidecar container per pod, per application. We're going to focus on the simplest way of collecting these logs, metrics, and traces, at least at host level, all right? Raise your hand: how many of you have deployed, and are using and maintaining, DaemonSets on your infrastructure, on every worker? All right. DaemonSets are mainly there per host; they do their job and collect all the data, meaning they collect data from every namespace accommodated on that worker, and they ship that data to wherever they are shipping, like in the old days of syslog streams in the OSS world, SNMP alarms, all of that; it's similar to that.
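A minimal sketch of that agent-per-host pattern, using Grafana Promtail as one possible log shipper; the namespace, image tag, and ConfigMap are placeholders, not the deployment from the testbed:

```yaml
# The agent-per-host pattern: a DaemonSet runs one collector pod on every
# worker, mounts the node's pod logs, and ships them to a central Loki.
# The Promtail config (with the Loki URL) would live in the ConfigMap.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-collector
  namespace: observability
spec:
  selector:
    matchLabels:
      app: log-collector
  template:
    metadata:
      labels:
        app: log-collector
    spec:
      tolerations:
      - operator: Exists                 # schedule on every node
      containers:
      - name: promtail
        image: grafana/promtail:2.7.0    # placeholder version
        args:
        - -config.file=/etc/promtail/promtail.yaml
        volumeMounts:
        - name: pod-logs
          mountPath: /var/log/pods
          readOnly: true
        - name: config
          mountPath: /etc/promtail
      volumes:
      - name: pod-logs
        hostPath:
          path: /var/log/pods            # logs from every namespace on the host
      - name: config
        configMap:
          name: promtail-config          # hypothetical ConfigMap
```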
So what you're seeing here is Advanced Cluster Security, which installs agents on those workers and collects these traces for you. What you're seeing is the 5G infrastructure: on every worker there's an agent collecting this 5G OSS data. If you look at it, there's the UDR and there's the AMF, which are names of different microservices in 5G, presented visually: all the throughput, the availability, the replica sets, as well as whether there's a throttling issue, from a single pane for multiple clusters. So, ACS. I know, look, I was like you, I'm still like you: when you see some product name from some company, it just feels itchy, right? It's just marketing. But the thing with Red Hat is, every product name you see has an open source version; we upstream every change, then we pull it downstream, harden it, and ship it as a separate product. This product is called StackRox. If you go to stackrox.io, that's the open source part of it, okay? We acquired it not long ago and we open sourced every component of it at stackrox.io, and this is what it is.

Okay, so that was agent-based. Remember, we talked about the service mesh, the sidecar container overhead, all that choking business, because it doesn't really play well with complex, distributed, large-scale infrastructure. We talked about agent-based DaemonSets. What if I told you there could be another way, where you don't have to install a damn thing? Wouldn't that be fantastic? No additional software to maintain, and think about the overhead for the host too. And that, my dear friends, if you're coming from the networking world, is what you used to do and are still doing: pulling all these metrics naturally from a well-defined interface on a switch or a router; say it's Cisco, it could be IPFIX, it could be eBPF, all these things are there for you to pull through a well-defined API already. You may ask yourself: why the heck haven't we done it that way yet?

There's a challenge, though. Going from OpenStack Neutron sitting on OVS, and converging onto OVN: there are different networking solutions, but we are settling down as an industry, at least on the telco side, on a common virtual switching platform. It's becoming OVN, which is enhanced OVS with more of a well-defined API around it. And this is what it is: pulling all the data, the IPFIX flows, from the underlying OVS switch, and presenting it to you from a single pane. It doesn't install anything; it just collects the data from multiple clusters. As you can see, there's a production 5G core here, one dedicated cluster sitting on AWS, and there's another one, production two, a 5G core sitting in my home in Texas on VMware infrastructure, and they are all collected from a single point. That point is actually a single-node OpenShift: one PC hosting Kubernetes as this hub cluster.

So this is the outcome. Right now it's called the NetObserv Operator, Tech Preview; you can go to OperatorHub, download it, and experience it. What it does: you provision your Loki backend on this hub cluster, and you point these clusters at that Loki backend to ship those metrics and traces to that destination. And once the data is there, we pull from that Loki backend and present it to you.
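A sketch of what that wiring can look like, assuming the Tech Preview (v1alpha1) FlowCollector API; the schema has changed across releases and the Loki URL is a placeholder, so treat this as illustrative only:

```yaml
# NetObserv's FlowCollector pointed at a central Loki: flows are taken from the
# OVS/OVN layer as IPFIX exports, with no per-pod or per-application agent.
apiVersion: flows.netobserv.io/v1alpha1
kind: FlowCollector
metadata:
  name: cluster
spec:
  agent: ipfix                              # pull the flows OVS already exports
  loki:
    url: http://loki.netobserv.svc:3100/    # placeholder hub-cluster Loki
```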
Okay, so I don't want to go too deep into the nuts and bolts of 5G, the user plane function and all that. Long story short: for 5G to be more effective, to have better coverage, some of these 5G microservices have to be distributed pretty much everywhere. For the promise of low latency, the user plane breakout should be wherever you are. Say you're in this room and I have a server here: although you're connecting through local indoor 5G coverage, your session is actually going all the way down to the Berlin gateway, or wherever in Germany, maybe Frankfurt, whatever operator you're using, breaking out there and then coming back in here. But remember, one of the promises of 5G is low latency. For low latency to happen, your breakout has to be close to your application, your edge application, which is why this distributed 5G with the UPF is important. And when you do that, you're multiplying the number of UPF instances, microservice instances, from hundreds to millions. Not thousands: millions.

Wherever you have good 5G or 4G coverage inside a building, I bet, and you should bet too, there's indoor radio access network coverage in that building if you have full bars. Without indoor radio coverage, inside a building with concrete and metal, your coverage will be minimal. Why am I telling you this? Because there are already in-building deployments of 4G and 5G radio. What's coming is local breakout within that building, and it's happening right now in the US and Europe, especially at hospitals and in transportation systems, to have low-latency, high-throughput applications hosted wherever you physically are.

For things to work, you have to monitor them first. If you don't monitor something, you don't know how it's behaving; you cannot make it better, you cannot make it scale properly. And for the monitoring part, the OSS, that's where we did this work and concluded that we're going the most optimal, cheapest way, by means of less overhead: collecting data centrally, in a pull or push manner, with zero agent add-ons.

All right, I know I talk too much. I usually don't talk too much; that's what my wife complains about. What's wrong with you? I don't know. Any questions?

Okay, so Eric mentioned what's coming. One of the promises of 5G is network slicing for industries and verticals: slicing from the radio terminal all the way down to the 5G core side. Raise your hand if you've heard the keyword S-O-N, self-organizing networks. Okay, that's what we're going to do next this year: self-organizing 5G networks delivering network slicing. Our primary focus is obviously the 5G core, not necessarily the radio: from the 5G core perspective, how can we do network slicing with a 5G core deployment, starting from the platform and then the applications, the 5G application, in a distributed global manner across AWS and on-prem? And how can we automate this? Everybody, have you ever heard of zero-touch provisioning, ZTP? Raise your hand if you've heard of ZTP. All right. ZTP, you may agree or not, hints at more automation and less manual work, right? And it's going to be kind of a click-button thing, like our account managers say. Are there any account managers in here? Not that I'm insulting you, but they always overpromise: it's just a one-click solution. There's no such thing as a one-click solution; some work has to be done. So a lot of automation-minded development is coming in the 5G core, self-organizing networking; that's what we're heavily focusing on right now. Obviously the challenge is the networking protocols and stacks, and the different infrastructure types. Are you guys doing all right? All right, so go ahead, any questions?

All right, so one key challenge. Everybody talks about hyperscalers, hyperscalers, and somehow everybody ties hyperscalers to AWS and Google Cloud and Azure. No, it's not only that. Facebook is a hyperscaler; Netflix is a hyperscaler. Hyperscaler means folks with money in their pocket who have no limitation on buying gear and getting it as fast as possible,
meaning they can scale up so fast, and their economy of scale for buying these things, a CPU core, hertz, memory, storage, is incomparably lower in cost than the biggest telco operator out there. I'm not kidding: AWS's cost of buying a CPU core hertz, compared to AT&T, a decent-sized telco, is like one tenth. That's why the economics are incomparable on the hyperscaler side. So the relation to hyperscalers here is that we are leveraging AWS infrastructure and Google infrastructure as some of the infrastructure types supported in the solution, with Kubernetes as an abstraction layer on top of those infrastructure services.

So maybe one last thing, then we'll show the article we published. This work, actually, thanks to Eric: he said, there's this conference in Berlin, do you want to come? I said, what are we going to do? We've got to do some actual work, right? Great, there's no internet. All right, that's all right. What do you want to show? How about this observability part? So we did the groundwork, and burned some servers, literally burned them, thanks to AMD's lousy coolers. And then we wrote the detailed article and published it on medium.com, and within the article you will have access to the GitHub repo, where you will see site one, a full-fledged 5G core, and site two, only the UPF, and you will also see how to do service mesh with the service export in that GitHub repo. There's also a hub folder in the repo for quickly installing the Loki backend on the ACM hub, which is your management hub cluster. Any questions? Come on. These conferences are great. The last OpenStack Summit I attended was in Boston; I was part of Verizon, and I was wishing and hoping there could be some workload-related sessions, so we could not only talk about all the open source but also the relation of the open source to the workloads.

Yes, sir. [Audience question.] Okay, so the question is: don't you need to understand the networking type while you're doing observation and all that? Yes, we do. That's a great question, and it's twofold, all right? The old-school way of steering things is traffic and protocols. Protocols come with well-defined ports, right? Ports, protocols. This is what hyperscalers do. If you go to AWS, the ELBs and NLBs, which we were talking about before the session, those network load balancers support load balancing traffic based on the port. Meaning they do support some protocols, say SCTP, based on the port number, but they do not honor SCTP as a protocol by looking inside the header, which is what I need addressed. Obviously those hyperscalers don't support that yet. On the open source side, we have some solutions, say with MetalLB. And here, all the observations that you see, the traffic and everything, are mainly around port numbers.
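As a concrete illustration of that port-based approach, a sketch of exposing a hypothetical AMF's NGAP endpoint through a LoadBalancer Service, the kind of thing MetalLB would answer on-prem; 38412 is the 3GPP-assigned NGAP-over-SCTP port, but the names and namespace here are illustrative:

```yaml
# The load balancer distributes SCTP traffic by port; it does not parse the
# SCTP association or look inside the protocol headers.
apiVersion: v1
kind: Service
metadata:
  name: amf-ngap
  namespace: 5g-core          # hypothetical namespace
spec:
  type: LoadBalancer          # on-prem, answered by e.g. MetalLB
  selector:
    app: amf
  ports:
  - name: ngap
    protocol: SCTP            # supported by port number, not deep inspection
    port: 38412               # 3GPP-defined NGAP port (RAN-to-AMF signaling)
    targetPort: 38412
```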
Okay, 5G, telco. Telco application development and telco application stacks are driven by organizations like 3GPP, ETSI, ITU-T, and then later, at the beginning of the 2000s, we got IETF RFCs for SIP and STUN and all that. All these organizations define the architectures as well as which protocols will be used. The difference with 5G is big, because when 3GPP defined the 5G standalone architecture, they called it, keyword, SBA, service-based architecture, more of an honoring of the web way of doing things. But there are still legacy protocols, say SCTP from the RAN coming into the AMF function, where the hyperscalers are suffering, and solutions like that will keep suffering unless those protocols are addressed, which we are doing, adding upstream contributions at the CNI level with Multus. If you're aware of Multus: Multus means multiple interfaces for pods, not only a single eth0; you will have net1, net2 interfaces. You can have MacVLAN, IPVLAN, SR-IOV. Those secondary interfaces are where the special protocols come in and hook in. And those secondary interfaces, right now, nobody is covering them from an observability perspective. Even this NetObserv, which is pulling metrics from OVS, won't cover them, because if you're using SR-IOV PCI passthrough, the traffic is bypassing the OS, bypassing OVS. The only option left there, again, is DaemonSets and probes on the network. The ACS solution has a probe approach; Google's own security solution has a probe approach on the network fabric. So that's another approach. Does that answer your question? Good. Any other questions?
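To make those secondary interfaces concrete, a sketch of a Multus attachment and a pod using it, assuming the Multus CNI is installed; interface names, subnets, and images are hypothetical:

```yaml
# A macvlan secondary network on the host's eth1. Pods that reference it get
# an extra interface (net1) in addition to the cluster-default eth0.
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: macvlan-n3
spec:
  config: |
    {
      "cniVersion": "0.3.1",
      "type": "macvlan",
      "master": "eth1",
      "mode": "bridge",
      "ipam": { "type": "host-local", "subnet": "192.168.10.0/24" }
    }
---
# A hypothetical UPF pod attached to it; traffic on net1 takes the secondary
# path, which is exactly the traffic that OVS-based observability misses.
apiVersion: v1
kind: Pod
metadata:
  name: upf
  annotations:
    k8s.v1.cni.cncf.io/networks: macvlan-n3   # appears in the pod as net1
spec:
  containers:
  - name: upf
    image: example.com/open5g/upf:latest      # placeholder image
```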
[Audience question:] On this architecture you showed us, all the 5G network components, you deploy them as single-node OpenShift; on the operator network you have a hub and this kind of stuff. Have you considered a similar architecture where you would, for example, have one control plane for the operator, and deploy those sites either, let's go as far as HyperShift, or at least as worker nodes in the same cluster, instead of having this small but still existing footprint of every 5G cell being a separate cluster?

Let me try to wrap up your question; it was more of a statement than a question. So the question is: what I showed, and I hinted at this, there's a single-node OpenShift somewhere, which was our hub cluster, a single node, really a single node. But the real production 5G core sits on three masters and three workers, and the other breakout site sits on three masters and three workers too; it's a multi-node Kubernetes deployment, right? So your question was mainly hinting at a dilemma, or maybe a war: shall I have a giant Kubernetes cluster with HyperShift so I can slice the control plane per tenant, or shall I have small, ephemeral clusters? It depends on how you eat your food, right? As a person who grew up in Texas, I love barbecue one way, but if you go to Kansas in the U.S., they love barbecue another way. Remember, I worked at Google, and at the time we had this dilemma with Anthos and Google Kubernetes Engine too, and it kept hinting at the keyword LTS, long-term support. There's no such thing as LTS in the Kubernetes domain with the hyperscalers; it's fast-paced development and shipping, and you just upgrade behind the scenes, right? Meaning those clusters are assumed to be ephemeral, okay? Meaning those are not really big clusters. However, TME is not only about the T part; there's the M part, media and entertainment.

One of the biggest, I will not name it, but originally born out of Sweden, one of the streaming services has clusters of around 15,000 nodes, and they suffer a lot from the attack surface as well as reliability issues. They tried to lower the footprint into smaller, more ephemeral clusters, but they couldn't scale down below 5,000 nodes, for example. So what I'm trying to tell you is: the 5G core sites, where the full-fledged 5G core lives, will probably be multi-node Kubernetes clusters with different namespaces, with namespace isolation as well as control plane isolation, with slicing via HyperShift. Yes, that's happening, that will be happening. But the small breakout clusters will likely not be big HyperShift clusters; those are going to be small-footprint clusters, okay? Good? Any other questions?

All right, thanks for coming. I hope you had some fun, and I hope you don't catch COVID. I hope you get back home safely, and if you have any questions, feel free to reach out. If you just Google the name of the session plus medium.com, you will find the article, unless, do we have internet now? No, you don't, if you can't get on the Wi-Fi. Okay. It's medium.com slash open-5g-hypercore. All the work is up there, more is coming for the self-organized network under the same place, and all the GitHub repos are available in those blog posts. Thank you! Hey buddy! Let's take a...