Good morning, and thanks for joining me on this beautiful morning in Vancouver. I was supposed to be accompanied by my colleague Eric LaJoy, but he bailed out because he's in Germany; he doesn't want to travel to Canada, I guess. And one of our partners in this domain, the CEO and founder of AOPS One, Abdul Reim Susslu, unfortunately got COVID, so he couldn't make it. So I'm by myself; hopefully I can address all your questions.

Before I go on, you may be wondering who this fellow is. This is me. I'm originally Turkish, born and raised in Turkey. I've worked around the planet, including Sweden, China, Dubai, the US, the UK; beautiful places, beautiful people, challenging projects, multiple companies. If you remember, there was once a company called Nortel Networks. Then Ericsson. Then Verizon, as a platform engineer for Verizon Cloud. Then Google Cloud for almost three years, with Anthos and all that. And now Red Hat, almost reaching three years there, fingers crossed. You never know what's going to happen this year, because one of the major existential threats to us this year is cost optimization, the year of efficiency.

So how do you achieve efficiency with your workloads, your deployments, your services? It's mainly about collecting data: how are your systems doing? Are they addressing your customers' and consumers' needs? Are they meeting your SLAs? To answer that, observability becomes more and more important, and this year everybody's talking about observability, but not everybody means the same thing by that keyword. The majority of the surveys of what these companies are doing show that mean time to recover (MTTR) between incidents can be lowered by at least 40% if you have a good observability solution, and mean time between failures can improve by more than 50%.
That is, if you have a strong observability capability; and according to Gartner, 60% of enterprise IT organizations will have some sort of observability solution. Another point before we go into more detail. Please raise your hand if you're here because of the keyword observability. One, two, three, four. Now raise your hand if you're here because of the keyword 5G telco. All right, they don't match; the hands raised for observability are different from the people here for telco. That's a good mix.

Last year we did a similar session at the Summit opening in Berlin, and we talked about a couple of solution options, which we'll briefly go over too. What has changed since Berlin, in terms of the work we've done inside Red Hat as well as with partners and customers, is mainly this: observability is about collecting data to understand where we are and how our systems are behaving, but what data should I collect? How shall I gather that data? How shall I store it? How shall I use it? And do I have enough talent in house to crunch that data into insights? Because everybody thinks, okay, I have an ML and AI solution, I have data, I can create insights. No, it doesn't work like that.

When I was at Google, people would come in saying, hey, please help me monetize my data. Okay, you're coming from a pharmaceutical background, from a logistics company, from a telco. A lot of telcos came and said, help me monetize my data, i.e., help me create insights out of my data that are valuable to sell or valuable to use. Unfortunately, as a technology provider I can give you tools and frameworks, but since I don't have your domain expertise, I cannot create those insights for you by myself. You, the telco people, mostly here, have the domain expertise. Data engineers and tools people, we have to work together.
Data engineers show you what data is out there, how to clean that data, and how to make it usable by the machinery, so that you, with your domain expertise, can create insights for yourselves. Okay, what if you don't do it? The year of efficiency doesn't mean you're allowed to fail a lot. So what happens when a 911 outage happens? This is real data: if you go to the VOID incident database (thevoid.community), you can see that a lot of incidents happen mainly through the mistake of normalization of deviance. You know what that means? In the previous session, our friends from the Starlink Project showed that they're pulling metrics and data out of IPMI, showing, say, the current heat or the current network interface card status, right? You're looking at your Datadog, Splunk, or Grafana dashboards, and someone comes and says, hey Fatih, what's wrong with this red button? Something is flickering red here. And you may say, yeah, that's about the heat of the CPU on this particular server in the farm, and you know what, that thing has been there for ages, it doesn't necessarily mean anything. That means I'm normalizing an incident occurrence, i.e., I'm ignoring it.

So what happens if you ignore it? Think back to the Challenger disaster, right? The O-ring, if you're familiar with that accident. They were launching the shuttle in Florida that morning, and the O-ring seal froze. Unfortunately, nobody predicted that the rubber material would be vulnerable to that much of a temperature change. And even though the sensors were showing, hey, something abnormal is happening, the control room said, no, we've been seeing these sensor alarms all along, it doesn't mean anything, let's go on. The seal failed, hot gas leaked past it, and we had a major incident.
The interesting thing in this picture is on the right side: you see Datadog. Datadog has had its own outage issues, even though it's a very well-known and well-respected observability platform. In other words, everybody is vulnerable if you ignore these incidents and errors until they become normal for you.

All right, the 10,000-foot holistic full-stack observability view. What we're aiming at here is not just the application layer, the 5G telco applications, but also what is beneath it: the platform, Kubernetes, OpenStack, VMware, or direct bare metal deployments, and beneath that the hardware layer and the network fabric, okay? Everything has to be considered part of observability, because at the end of the day you're tying alerts together as a data mesh, and you can see that there's something wrong with the performance of my 5G stack. Say my AMF is not admitting new gNBs registering over the SCTP interface. There's a performance issue, but the performance issue isn't necessarily about the AMF application's performance. It could be that the network fabric is choking because a link has been flapping and triggering failover redundancy on the fabric. So we have to have a full understanding from the bottom up, use that data to correlate the layers with each other, and then identify or predict what's coming your way.

The core values of an observability solution: observability is not a product, obviously, because it encompasses a lot of data, as we said, from the network fabric, the OS, the platform, up to the application layer. It's a combination of multiple solutions tied together and offering you value. We started this conversation with the year of efficiency. Please raise your hand if you've noticed layoffs in your company this year. Come on, be honest. Only a few. Okay, raise your hand if you've been told that the budget you were supposed to get this year was chopped down by at least 25%.
Wow, this group is really rich. They're hiring, they have no issues, they're spending millions of dollars. You must be from a pharmaceutical company, or from a company making aircraft carriers for the war economy; those are two industries that never have money issues. Or bankers.

The main point is this: if you observe and understand the level of traffic coming into your applications, you can scale on demand using observability, because it tells you the level of CPU and memory utilization versus the demand from your consumer side. That's the important thing for capacity planning and scalability. Then proactive monitoring, alerting, and root cause analysis tie into being ready before something happens. What does that mean? A lot of solution blueprints include storage underneath, right? If you were here for the opening demo, you saw the storage on each worker node. Storage could be hard drives, SSDs, or NVMe drives. Everything has a lifespan in terms of I/O, writes, and reads. If you look at the hardware specification, it says you can write this many times and read this many times. So what if you're reaching the write limit on your storage, and your etcd is about to go because it cannot write your application state into it? You will not be able to recover from that. Even if you say, I have a backup solution, if, like in the previous case, you're backing up to local storage and your local storage drive is about to die anyway, you cannot restore from it. What I'm hinting at is that by monitoring the lifespan of your hardware together with the metrics collected at every layer, you can predict what's coming your way. And I'll show you a couple of good examples.

All right, raise your hand if you've heard the term service mesh. That's pretty good. Please raise your hand if you've heard the term data mesh.
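As an aside, the write-endurance arithmetic behind that storage example is simple enough to sketch. This is a minimal illustration with made-up numbers, not any vendor's formula:

```python
# Illustrative sketch: days until an SSD reaches its rated write endurance
# (TBW, terabytes written). All figures below are made-up examples, not
# vendor specifications.

def remaining_endurance_days(rated_tbw: float,
                             written_tb: float,
                             daily_write_tb: float) -> float:
    """Days left before total writes hit the drive's rated TBW limit."""
    if daily_write_tb <= 0:
        raise ValueError("daily write rate must be positive")
    remaining_tb = max(rated_tbw - written_tb, 0.0)
    return remaining_tb / daily_write_tb

# A 600 TBW drive with 540 TB already written, wearing at 0.5 TB/day:
print(remaining_endurance_days(rated_tbw=600, written_tb=540,
                               daily_write_tb=0.5))  # 120.0
```

In practice an observability pipeline would feed `written_tb` from the drive's SMART counters and raise an alert well before the projection reaches zero.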
Okay, so both phrases share the common word mesh. A service mesh means services correlating with each other, so you can understand the chain of services coming together to offer a combined service or portfolio. A data mesh, on the other hand, is more of a graph concept, where you can associate each piece of data with the others through some sort of relation. In this diagram you can see there's a data governor, data cleansing, a data integrator, a data anonymizer, network ops, and an external API. Each of these data points does something different with the data, changing its state, format, or content, but they are related to each other through these relations. What this means is that you're collecting data from multiple endpoints and trying to associate them: say I collect data from the network fabric, the OS level, switches and routers, and the application layer, and I can associate them through this data meshing. I can see the impact of each piece of data on the other data formats, consumers, and exporters, so I can predict and make better judgments, because I'm enriching the data.

Last question, I promise. Please raise your hand if you've heard the term OTel, or OpenTelemetry. Awesome, all right, good job. This is why it's important. One of the key challenges in data engineering is the format of the data: how you're collecting it, how you're exporting it, how you're processing it. There have been a lot of ways to do that. Making this more standardized is what OpenTelemetry is about: standardizing the way to generate, collect, process, and pass along telemetry data. And OpenTelemetry is not only about specifications; it also offers you utilities and instrumentation libraries to be used by your code, okay? So you don't need to write those instruments yourself.
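To make that concrete, here is a stdlib-only toy that mimics the shape of the span records an OpenTelemetry SDK generates for you. In real code you would use the opentelemetry-api and opentelemetry-sdk packages rather than hand-rolling this; the field names here are only an approximation of the spec, and the span name and attributes are invented for illustration:

```python
import json
import time
import uuid

def make_span(name: str, trace_id: str, attributes: dict) -> dict:
    """Toy span record approximating what an OpenTelemetry SDK emits:
    a named, timed unit of work tied to a trace and carrying attributes."""
    start = time.time_ns()
    # ... the instrumented work would run between these two timestamps ...
    end = time.time_ns()
    return {
        "name": name,
        "trace_id": trace_id,
        "span_id": uuid.uuid4().hex[:16],
        "start_unix_nano": start,
        "end_unix_nano": end,
        "attributes": attributes,
    }

span = make_span("amf.register_gnb", uuid.uuid4().hex,
                 {"net.transport": "sctp"})
print(json.dumps(span, indent=2))
```

The value of the standard is that every vendor emits this same shape, so a collector can correlate spans across stacks without per-vendor adapters.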
This is becoming really critical because, in the early days, telco folks, remember the term OSS/BSS, operation support systems and business support systems? Ericsson has their own OSS/BSS solution, Nokia has their own, and then Amdocs came and offered other OSS/BSS solutions. What I'm hinting at is that each vendor has their own way of collecting, generating, processing, and sharing their data. A special question for the telco folks: raise your hand if you know the term CAMEL. No? So anyway. CAMEL was supposedly born in an open, shared-specification domain, but it became so proprietary between vendors that Ericsson data could not really be used with, or be compliant with, other vendors' data or systems. OpenTelemetry hopefully addresses that challenge by providing a common way of collecting, generating, and passing data along the different stages of the data mesh and data processing.

I usually don't talk, and that's one of the key things my wife complains about. Why are you so quiet? Is there something wrong with you? I say, I'm a man, I don't talk, okay? When I look at the ceiling, I'm thinking, and she says, what's wrong with you? I say, nothing's wrong with me. And when I come to a conference and talk this much, I feel like I'm kind of lying to my wife about myself. But anyway.

So, another approach. Data can be generated by the existing application stacks and all the other stacks we talked about, from the network fabric up to the application layer. One of the key things in data engineering is that three factors about data are important: the velocity of the data, i.e., fresh data; the volume of the data, i.e., big data; and the variety of the data, i.e., having a lot of different data that can be correlated to build more enriched and sophisticated insights for you. So what if your existing data sets are not good enough to create valuable insights?
Then you may think, okay, if I have a data set here, I can use external data to enrich this decision making; that would be a great case, right? In the US, starting with President Obama, there has been a policy of making data open, the open data initiative. That includes everything from weather forecasts and soil humidity to which hospital district people are going to; all that open data is out there for you. Some of that data could be useful to enrich your decision making. Or you can say, I'm going to use external tools such as application performance testing, APT, and pipeline that data into my decision making to make more insightful decisions. In this case, DOSFI, for example, offers you global latency measurements from variable endpoints into your applications. This was born in the online gaming world, because online gamers are spread around the world, and yet they experience latency, right? They experience quality-of-service issues. So what this DOSFI offering does is: I want to test my application, which is sitting, for example, in Vancouver, but not only for the people close to Vancouver; I want to measure the latency from, say, Toronto. Not from Seattle or Dallas; from Toronto. It allows you to initiate latency measurements based on your geographic selection, run them dynamically, preplanned or on the fly, and feed those measurements into your decision making. What does that mean? If your application is suffering because of a traffic increase, but the local metrics aren't giving you that insight, this will show you that latency is increasing for the people sitting in Toronto, or anywhere in Ontario. Then you can ask: is it because the application couldn't scale up, or is it because there's a backbone fiber issue within Canada, right?
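That triage question, is it the application or the path, can be sketched as a comparison across vantage points. The regions, numbers, and threshold below are illustrative, not part of any product:

```python
from statistics import mean

def classify_latency(samples_ms: dict, baseline_ms: float,
                     threshold_ms: float) -> str:
    """If every region degraded, suspect the application itself; if only
    some regions did, suspect the network path serving those regions."""
    degraded = sorted(region for region, samples in samples_ms.items()
                      if mean(samples) - baseline_ms > threshold_ms)
    if not degraded:
        return "healthy"
    if len(degraded) == len(samples_ms):
        return "app-wide degradation: check scaling"
    return "path degradation from: " + ", ".join(degraded)

# Toronto users suffer while Seattle users do not -> likely a backbone issue:
print(classify_latency({"Toronto": [95, 102, 99], "Seattle": [31, 29, 33]},
                       baseline_ms=30, threshold_ms=20))
# -> path degradation from: Toronto
```

If both regions had degraded together, the same check would instead point back at the application's capacity.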
Then you can trigger another latency measurement from Washington, which rides a different fiber backbone. You might say, oh, this latency is better; so there's a backbone issue within Canada between these geographies. So what can you do? Okay, deploy another set of the application on a cluster close to, say, Toronto. You can make these decisions based on these external data sets.

All right. Data can also be collected at the platform layer. What you're looking at here is the OpenShift console: we can collect data per platform and per tenant namespace and feed it into our dashboards. We can also offer you a network-centric view with the NetObserv Operator, which is based on an eBPF agent we offer. This is an open source project, also delivered through an operator, free of charge; you can download it and use it within your cluster or clusters. What you're seeing here is two different cluster sets. This cluster here has the Open5GS SMF, UPF, and signaling gateway, plus another UPF, and the other cluster up there is talking to it for local UPF breakout from a central location. We're showing the network interactions and network-based metrics between the interacting services from a single console. That's very good: what you're seeing is network-centric observability.

Fine, this is good, but it's still the platform layer. Another angle is the service mesh perspective, since there are a couple of folks here familiar with service mesh. Service meshes, especially the ones around Istio or Linkerd, are based on a concept called the sidecar. A sidecar means there is somebody attached to your workload, and the traffic coming in and out goes through that dude; he's like a gatekeeper. And since that agent, that sidecar, sits in front of you, it collects the metrics for you and reports them to a central location. That's the basic 10,000-foot view of a service mesh.
Now there's another service mesh model coming along called Ambient; I'm going to put that aside. But the main concept is sidecars intercepting the traffic and talking to each other to give you this network visualization perspective. So a sidecar is an additional software agent that comes with the service mesh, while NetObserv is a totally agentless approach, because we're pulling the metrics from underneath the OS, from eBPF flows. Yet another approach is agent-based, with probes. StackRox, which Red Hat acquired two years back and made fully open source under stackrox.io, is a probe-based network security solution which can also give you a perspective of what metrics and telemetry are available for your services to be consumed externally, again through a single dashboard.

All right, I've got to go a little bit quicker. So, we've talked about the platform, but we haven't talked about how complex the telco world is. When I was at Google, and at Verizon, we were trying to explain this to the hyperscalers, AWS, Google, and Azure, because they didn't necessarily understand how complex telco workloads are. The reality is that telco workloads need special treatment. And a Google VP of Engineering may say, I don't give a shit about giving special treatment to any application type; my platform is generic, and if you want to use it, you can use it. Then you ask this question, this is a conversation from about two and a half years back: okay, VP of Engineering, how can I get multiple interfaces for my telco workloads? And the guy says, what do you mean, multiple interfaces? You have only eth0, the primary CNI. You don't need an additional interface; that's what containers are designed for. You say, no, no, no. Look at this 3GPP diagram: this architecture for IMS, for 4G, 5G, and the coming 6G service-based architecture.
You see multiple lines going out of the IMS? That means multiple interfaces out of a single container or service. And he says, no, no, we don't do that. Then you take this answer to your vendor, say Ericsson or Nokia: my dear NEP, for your 5G and 6G solution I can only offer you a single interface, I will not let you change the MTU, and by the way, there's no such thing as a VLAN. And they will tell you, sorry, I cannot work with you, because there is a requirement in the 3GPP specifications saying you have to isolate each interface, not only for security but also from a performance perspective. Say your signaling sits in a signaling domain and you're passing through, say, a DMZ domain, okay? There's a segregation of domains in telecom. Eventually these platform vendors, Google, AWS, and Azure, realized this. And all along, as Red Hat, we have been partnering with these vendors and with the network equipment providers you see here, from Mavenir to Nokia to Ericsson and others, so their software can be deployed on any virtualization platform, whether OpenStack or OpenShift, along with 3GPP compliance. That certification is why it's important to sit on a platform where you can collect these metrics properly, so you can observe them, scale them, and predict what's coming your way.

All right, so far, platform and applications. You may say, give me more data points on what makes telco workloads special applications. What you're seeing here on the left side is an alert state for a Diameter interface of a particular CNF. On the right side, the top panel is SMF metrics and the bottom is AMF metrics, taken not from production but from a pre-production sandbox environment, so you may see different data.
The metrics obviously show the 3GPP metric names, along with the time series data, long time series data. Metrics are usually time series; logs sometimes embed them, sometimes not; alerts are generated based on events, from the particular application as well as from the hardware. Being able to use all this comes back to the earlier hint: you need domain expertise. We at Red Hat are a platform software company, yet for certification we work with these NEPs, helping them use our platform and frameworks with their domain expertise on their data so they can create insights like this one. If you can read it, this insight says that network congestion is predicted due to a lower MTU size causing fragmentation. That's interesting. There's another one: an increase in jitter and latency is predicted in the next 12 hours for the UPF, the user plane function, which carries your user payloads breaking out to the internet.

How do they do that? Because this partner, AOPS One, and we have other partners like this as well, was born in this domain, the telco domain, doing service assurance. They are deeply talented and knowledgeable about telco data from the OSS world. Then we come up with use cases around their domain expertise, such as root cause analysis tied into the network fabric: how can I predict and identify root causes using the network data we collect? It could be, for example, the NetObserv operator collecting eBPF data, or switches and routers exporting their metrics from their serial interfaces, while a VNF is seeing a performance bottleneck, like you just saw in the UPF jitter and latency case. Or hardware failure prediction, back to the storage case: if there's a heat increase, the lifespan of a storage device goes down as well. All right.
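The MTU insight above rests on simple fragmentation arithmetic. A hedged sketch, assuming plain IPv4 fragmentation and roughly 16 bytes of GTP-U/UDP tunnel overhead on the N3 interface (real overhead varies with the encapsulation used):

```python
import math

def ipv4_fragments(payload_bytes: int, mtu: int) -> int:
    """Fragments needed to carry one IP payload over a link with the given
    MTU. IPv4 fragment offsets are counted in 8-byte units, so the usable
    payload per fragment is rounded down to a multiple of 8."""
    ip_header = 20  # minimal IPv4 header, no options
    usable = ((mtu - ip_header) // 8) * 8
    return math.ceil(payload_bytes / usable)

# A full-size packet's 1480-byte payload fits a 1500-byte-MTU link as is...
print(ipv4_fragments(1480, mtu=1500))  # 1
# ...but add ~16 bytes of GTP-U/UDP tunnel overhead and every full-size
# packet splits in two, doubling packet counts -- the congestion pattern
# the insight engine flagged:
print(ipv4_fragments(1516, mtu=1500))  # 2
```

This is why either raising the underlay MTU or clamping the workload MTU below the tunnel budget makes the predicted congestion disappear.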
There are not many solid-state chip makers in the world; there are a few, and most of the other vendors buy those chips from them and build their own storage cards or storage solutions. The difference between these vendors, say EMC, Hitachi, or Fujitsu, is how the board is designed to address particular environmental requirements: vibration, heat, noise, dust, okay? If there's not enough sealant or covering on top of the storage, and you're running the storage solution at the edge, which is subject to higher heat and vibration than a normal data center, that storage will not last long. So if you can collect those metrics and data, you can swap it out right on time, before there's an outage at the edge for your local breakout in, say, a stadium.

All right, we're almost at the end of the time. One of the key things is having a single pane of glass starting from the hardware, with IPMI integration and so on, Dell iDRAC, HPE iLO, whatever you're using, up through the platform level with OpenShift, OpenStack, or VMware, then collecting all this data, running it through anomaly detection, and offering you a single dashboard. Say I have a hundred OpenStack deployments; I can see them from a single pane for the 4G and 5G sites. If you can see it, it says this one is a 4G site and the one below is a 5G SA site. You can see it here, actually: 5G SA south, 5G SA north, 4G core east and west, all from a single dashboard. This is application layer observability again. And back to the Radio Access Network KPI dashboards: these are the uplink and downlink delays for jitter and latency. There's also integration with other data sources; it could be Prometheus, it could be Grafana. And then this is the 5G core, and then insight generation.
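The anomaly detection behind such dashboards can be far more sophisticated, but the core idea can be sketched with a simple z-score over a collected time series. The data and threshold here are illustrative, not from any real deployment:

```python
from statistics import mean, stdev

def zscore_anomalies(series: list, threshold: float = 3.0) -> list:
    """Indices of points lying more than `threshold` standard deviations
    from the series mean. A deliberately simple stand-in for the anomaly
    detection an observability stack runs over its metrics."""
    mu = mean(series)
    sigma = stdev(series)
    if sigma == 0:
        return []
    return [i for i, v in enumerate(series)
            if abs(v - mu) / sigma > threshold]

# Uplink latency samples in ms; the 95 ms spike stands out:
uplink_ms = [12, 13, 11, 12, 14, 13, 95, 12, 13]
print(zscore_anomalies(uplink_ms, threshold=2.0))  # [6]
```

A production system would run this kind of test continuously per metric, then correlate flagged points across layers, fabric, OS, platform, and application, before raising one enriched alert.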
So to wrap up what we've talked about so far, or at least what I've monologued about: we as Red Hat and IBM offer platform-, hardware-, and cloud-level observability frameworks, tools, and knowledge to be used with telco solutions, telco CNFs and VNFs, because we certify them and work with the vendors hand in hand. For application layer domain expertise, for ML, AI, and insight generation, we have partners, and we can also sit down with you. You might say: I'm living in, say, Iceland, where out of 12 months I have 10 months of freezing weather, so I don't need to run my data center cooling all the time; I want predictive data center cooling based on the metrics I'm collecting from my 5G core deployment in this country. We can carve out a fairly simple solution for you to deploy and leverage, and that will lower your electric bill per data center.

All right, any questions? This is so true: you cannot buy, say, 4G/5G observability off the shelf, but they will try to sell it to you anyway. If you go to AWS: we have SageMaker to build you ML and AI insights. Okay, show me what you're generating as insight for a 5G RAN or 5G core. Show me what you're generating as insight for the scalability of an Ericsson solution, of a Nokia solution. Oh, you've got to do it yourself. Obviously I've got to do it myself. But who am I? Am I a single Verizon, a lone service provider, or am I a partner of yours on this journey, building the solution and maintaining it together? If you look at the journey of OpenStack, there were many distros. By 2023, how many distros are left? Which one is the biggest for production deployments? Which one can you trust with your money-making applications? The same is going to happen with Kubernetes distros, and with any enterprise application, enterprise platform, or enterprise OS. There will always be newcomers calling themselves disruptors. But the most important question is: who will still be there with you in a five-year time span?
Five years is not long for an application or a business; if you're not succeeding in five years, you're not succeeding at all. Any questions? Come on, challenge me. Say, Fatih, this is bullshit. Or, Fatih, I have this problem and no one has helped me solve it, can you answer this question? Questions, demands. If you have problems, we can sit down, articulate the solutions, and see if we can come up with something that's a win for you. Come on. Feel free to reach out to me through LinkedIn or my email. I'm a humble guy who doesn't necessarily talk much in normal life, but I love fixing problems. It could be a technology gap; we can sit down together, build a solution, and upstream it together. This is what we do: we generate solutions and we upstream them. We buy technology companies and we upstream their code. We don't do anything proprietary; everything is upstream. People sometimes say Red Hat is not an open source company. That's a fully ignorant answer. Red Hat is an open source company. What Red Hat makes money on is the enterprise side of the solution: proper support, documentation, and consultancy. The software, the technology, everything is open source.

So the question is about the network fabric: looking at the network fabric and at OpenTelemetry's aim and purpose, is there a gap, or can they come together? Remember what I said about OpenTelemetry: it's not only specifications and APIs, it's also instrumentation that's ready to use in the form of libraries. The network fabric vendors, the Ciscos, Aristas, and Palo Altos, are looking at those libraries to use and instrument, or they can come up with their own instrumentation compliant with the specifications. We are working with them, because at the end of the day, as I said, full-stack observability from the network fabric to the OS to the application, end to end, is the most critical thing to have. So we are working with them.
And most of the NEP vendors are actually going virtual with VNFs and CNFs, say virtual routers and virtual firewalls, running on our platform already. So we are already instrumenting them with our platform capabilities for observability. If you search the CNF/VNF catalog in Red Hat, search for Juniper for example, you will see a lot of virtual appliances there. Any other questions? No? I want to be respectful of the time. Thanks for coming. Thank you.