Well, hello and welcome, everybody, to yet another OpenShift Commons briefing. We're really pleased today to have with us Knox Anderson from the product team at Sysdig. He's going to give us a walk-through and a demo of Sysdig's container-native monitoring solution, Sysdig Cloud. We've set the schedule so that he'll give his presentation, pause halfway through the demo to take a few questions, and then go on. Please use the chat to put your questions in, as we usually do in these sessions, and at the end we'll read off all the questions that remain and have an open Q&A conversation. So with that, I'm going to give Knox the floor and let him drive the conversation and the presentation this morning. Thanks, Knox. Awesome, thank you, Diane. So Diane and I met each other at a conference. She came over to the booth and said, "I've been hearing about Sysdig; all these people using OpenShift have come by and said I need to look at this product," and that's kind of why we're here today. A lot of times when people tell a story about an integration between products, they'll use an analogy like burger and fries: these two things go great together, they're a tasty meal. But what we've done here with OpenShift actually takes it a step further and layers those fries on top of the burger. It's super tasty, really melts well together, and gives you a unique meal that you couldn't get from any other product. The reason we're all here today is that monitoring containers is different: microservices and containers break legacy monitoring and analytics tools. So what we did at Sysdig is build a new product with a unique kernel-level approach that can natively monitor any infrastructure and app, including container-based ones. Before I go any further, I'll give you a little background about our company.
Our founder, Loris Degioanni, was also behind a popular tool called Wireshark, a network packet analyzer with 20 million open source users, and he used what he learned there to build Sysdig, a next-generation deep system visibility tool. The Sysdig open source product was launched in 2013 and got a couple hundred thousand users. Its kernel module is the basis of our Sysdig Cloud product, which I'll be showing you later today, and also of our new security product that we just launched. It's a super cool technology, and I'm really excited to get into it further. Since we launched the Sysdig Cloud product, we've gained over 150 enterprise customers. They're really excited about it; it's really cool for the container market, and one of the things people have specifically enjoyed is our OpenShift integration. We have a couple of really large enterprise customers using it to keep deep visibility into their OpenShift environments, and we're really excited to show you more about that today. So, first off: as many of you who have deployed containers have figured out, getting visibility inside them is really hard. Legacy monitoring approaches can't see inside your containers, and the things that make containers really portable, like the ability to move them from your laptop to a server or from a development environment to production, also make it really hard to see what's inside that container. That process isolation is really nice, but it makes it really difficult to pull metrics from the inside. One thing you could do is connect to the Docker stats API, but then you'll just get something like CPU, memory, and network, not really deep visibility. So one of the things people do to get deep visibility is install an agent inside each container. But if you do that, you're adding a bunch of dependencies; you're actually putting another process inside that container.
So you're kind of losing the fundamental reason people went to containers in the first place, and breaking that process isolation. And last, if you're using tools like OpenShift and you have a scheduler spinning up multiple containers and deploying a microservices environment, correlating your different containers and services and grouping them into a whole is something the legacy monitoring tools really don't understand. So what we've done at Sysdig is build a unique system instrumentation, which we call Container Vision, and this allows us to see inside your containers from the outside. We can see network information, what containers you're running, how they're talking to each other, the applications running inside them, even up to the orchestration tool you're using to manage these containers, plus the network traffic and basically everything that's going on on that host. What we're looking at now is just a really simple application that I have running on a host. You have a couple of apps, you have some containers with things running inside them, and they're all talking to the kernel. What we've done at Sysdig is build a unique kernel module, and this allows us to see all the system calls happening on that host. We're individually tracing every single system call that's going on and getting deep visibility into basically everything happening on that host. We install our Sysdig agent, which runs as either a privileged container or a process on the host, and that allows us to install our kernel module. From there, we collect and analyze all those system calls and send them to our back end. You can use Sysdig Cloud as a service, or you can deploy the whole service on your own machines. So with that instrumentation covered, this is usually a good time to stop for questions.
So if anybody has any questions about how we instrument the system, now would be a great time to ask. So Knox, this is Diane. Can you tell us a little bit about the requirements for installing the kernel module? It seems like a unique approach, so how does that work, especially with OpenShift? Yeah, great question. Really the only requirements are that it's running on a Linux host and that you allow our agent to run as a root process on that host, or as a privileged container, and that allows us to install the kernel module. Is there any kind of performance impact when the Sysdig agent is running? Yeah, another great question. Our agent is actually really efficient, and that's one of the things that makes our product unique. The agent usually has somewhere between 1 and 1.5% CPU overhead on the host, so really minimal impact, and data sent to the back end is usually four kilobytes per second, so really minimal on that end as well. Thanks for asking those questions. There's one more question, too. Okay: how does it integrate with cgroups? Judd is asking that question. And are you working with the kernel devs who are rewriting cgroups, or do you know the answer to that? Yep, that's actually a great question. We look at cgroups and namespaces to see what's running on that host; we leverage those facilities, and that's how we detect which calls are coming from which place. We're prepackaged in Debian, we're working really closely with the RHEL team as well, and we're prepackaged in CoreOS. So yes, cgroups and namespaces are a fundamental part of the Sysdig kernel module. But are you working with the Linux kernel devs who are rewriting cgroups? Is there anyone at Sysdig participating in that? Yeah, one of our advisors is Greg KH, and we work really closely with the Linux Foundation. We're also part of the CNCF.
So we work really closely with basically that whole team. And here's my chance to plug the Cloud Native Computing Foundation for a second: are you going to be at the CNCF day in Toronto coming up? Sorry, could you repeat that question, please? Are you going to be at the August 25th Cloud Native Day in Toronto at LinuxCon and ContainerCon? Yep, we'll have a booth at LinuxCon and ContainerCon, and we're going to have multiple members of our team out there. So if you're going to be at that conference, please stop by and we can show you more. We've also introduced some pretty cool new functionality to our open source tool, so that's another great thing we can show you there. Cool. All right, so to get back to the presentation. This instrumentation gives us much better visibility. Typical infrastructure monitoring products can tell you things like whether a host is up or down and what your CPU and memory usage are doing, but they're not really giving you a good view into how your services are performing. This instrumentation gives us that service-oriented view. We're also able to auto-discover your applications and then apply alerts across your whole infrastructure based on the metadata we pick up. And lastly, we're able to move past a spike in a dashboard and get down to all the system calls that were happening when an alert fired. To start off, I'll go to service-oriented performance management. A lot of you are deploying applications that look like this: spread across multiple nodes, with multiple containers on each node that make up different services. Making sense of these services and how they interact isn't something you want to do on a host-by-host basis; you want to do it on a service basis.
And what Sysdig can do is trace each individual process, aggregate those services as a whole, and give you an answer to something like: what's the response time of my Cassandra service that's currently distributed over three data centers? What are the slowest queries? We also have a deep, real-time understanding of the orchestration metadata, working really closely with people like Docker and Kubernetes. We're actually one of the original members of the OpenShift Primed program that launched a couple of weeks ago. We're really excited to see where that goes; I think it's a great place for anyone who's thinking about deploying OpenShift to go and learn best practices and see the different tools, and I'm really excited to see how that community grows over time. Getting down into how we work with OpenShift: you set up a Sysdig Cloud project in your OpenShift environment, you have to modify a little bit of the security policy, and then, the coolest part here, you install our agent as a Kubernetes DaemonSet, and we get full visibility into your entire cluster. So it's really a single change that you make, and then you get complete visibility across all your hosts. It's a really cool installation process: when I go through it with people and they install, basically our whole UI fills up with all this information they didn't have before. It's something that's really exciting to go through with a person. So, back to application auto-discovery. We discover all the applications running on your hosts, no plugins required; we look at each system call and detect them. And since we detect them, we have a bunch of pre-built templates, which you can think of as mini dashboards, that will show you application-specific metrics, metrics about how OpenShift is running, things like that. We also have zero-config custom metrics: if you're using StatsD or JMX, you don't have to install an agent.
You can even write your StatsD metrics to localhost or /dev/null, and since we see that system call being made, we'll auto-discover those metrics and allow you to alert on them, graph them, whatever. And then last, with anomaly detection, Sysdig Cloud stores your past performance, and from there we can set alerts and actually alert you based on deviation from that performance. You can spin up new services and we'll auto-discover those as well. This is something that really excites our dev and ops teams, because it means they don't have to be in sync all the time; developers can spin up new things without having to alert everyone and slow down the process. So this is another good time to take a little bit of a break. If you see any tools you're using here, chat them in and I'll try to get to them in the demo while I take a quick drink of water. Okay. So the last part here is tracing and troubleshooting, and this is one of the things I think is the coolest about Sysdig Cloud: it's a really nice blend between our open source tool and our commercial product. When an alert fires, we can actually get a trace of every single system call that was happening at that point in time. So you can get past the spike in the dashboard, pull down that SCAP file, and start tracing down to each system call that a Redis container made when the alert went off. I have a really cool example of that that I can show you later. Another example of what we could do: let's replay the state of my system last night, when the auth service alert fired five minutes before the app went down, and show all the system calls from the containers that were destroyed. Sysdig automatically understands containers, Kubernetes, things like that, and can even map each metric to the replication controller or the namespace it came from. Okay.
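Before the demo, the zero-config StatsD point is worth making concrete: an app can emit plain StatsD lines over UDP to localhost with no client library and no collector listening, and a system-call-level instrument can still pick the metric up. Here's a minimal sketch; the metric names are invented, and the port is just the conventional StatsD default, nothing Sysdig-specific:

```python
import socket

def send_statsd(metric: str, value: float, kind: str = "c",
                host: str = "127.0.0.1", port: int = 8125) -> str:
    """Emit a metric in the plain StatsD wire format over UDP.

    Because a syscall-level monitor sees the sendto() call itself, the
    packet does not need a StatsD server on the other end; it can go to
    localhost and still be discovered.
    """
    payload = f"{metric}:{value}|{kind}"
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.sendto(payload.encode("ascii"), (host, port))
    sock.close()
    return payload

# Count a login event, then time a request (names are hypothetical):
send_statsd("myapp.logins", 1)             # wire format "myapp.logins:1|c"
send_statsd("myapp.request_ms", 42, "ms")  # wire format "myapp.request_ms:42|ms"
```

The whole point of the approach described above is that the app side stays this simple: no agent dependency inside the container, just a fire-and-forget UDP write.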
So I breezed through the slides, because what we're all really here to see is the demo, and I'll switch over to that now. All right, can everyone see my screen? Yep, looks great. Okay. What we're looking at now is a simple four-node OpenShift cluster, and we get this deep visibility into OpenShift largely through the same things Kubernetes uses. I can drill down into any physical host, see all the containers running on that host, down to the individual network connections between containers, and even down to the process running inside each container. As you can see, this is a pretty messy environment: OpenShift is scheduling containers everywhere, they're being spun up on whichever host has room, and it doesn't really matter where the resources come from; the containers just consume them. Where this gets much more valuable is when we apply the metadata that OpenShift exposes. If you look at this map now, we're looking at the four different namespaces I have running, and I can drill into my production namespace and actually see how my different services are talking to each other. This completely abstracts away the host and gives me a nice service-level view of what's going on. I can drill into any service, see all the pods managing the containers, and then go inside each pod, down to the container and the individual process running there, with the same network connections going in and out. Any of these views you can dynamically change to show whatever you want: if I wanted to go from CPU percent to total memory bytes, I could switch that; if I wanted to look at maximums or minimums, I can do that as well; and I can change the timeframe I'm looking at in the top-right corner. So, a lot of different options.
And then with the links, you can look at things like net request counts, the total connections being made, net bytes total. So, a lot of different options here. Okay. One of the other things our OpenShift users really like is the ability to look at how your different OpenShift services are performing. If I click on the OpenShift overview here, I can see the top namespaces I have running in terms of the number of containers on them, and my top services as a whole. I can see my Java service has, looks like, six containers running, and my WordPress service as well. You can look at request counts by service, containers per host, host capacity utilization, and the slowest resources; this WordPress service, for instance, is one of the slower ones. Top pods in terms of CPU usage, top pods in terms of memory, and so on. And then, since we're on services, we can build really cool dashboards about how your logical OpenShift service is performing, regardless of how many hosts or containers it's made up of. We can get really deep network- or database-level metrics, looking at things like SQL errors, average and max request time, the top queries you're running, the top tables, slowest queries, slowest tables. So things you'd expect from more of an APM-like tool, we pull automatically out of the box without any configuration on your end. Now let's switch to the Explore tab. This is the place our users use the most to get really deep visibility into your system; think of it as a massive pivot table for your environment. You can change how this table looks and drill down into different services however you want. What we're looking at now is the kind of view you would get from a typical monitoring tool.
So you can see I've got multiple hosts running, with multiple containers on those hosts, and the CPU, memory, and network metrics for each container. But where this gets much more valuable is when we apply the labels that OpenShift provides. Since OpenShift uses a lot of the same things as Kubernetes, I can apply those labels, and what we're getting now is a nice table of all the different namespaces I have running. I can open up my production namespace and see all the services running there, then go into each service, see the pods, down to each container running there. One of the cool things we can do here: let's look at this WordPress service. When you select a service, a table opens below; think of it as a bunch of mini dashboards that we automatically provide so you can get information about how that service is running. I can drill down further with this WordPress service: I can click on an individual pod and get the performance of how that pod is running. And this will dynamically change, so you can do it for a service, at a pod level, or down to the individual container. You'll automatically get things like CPU usage, memory, and network, but since it's a WordPress service, we might want to look at HTTP-specific metrics. If I click on HTTP here, we start getting things like request counts, error counts, average request time, top URL, slowest URL, and status codes. Any of the applications you see on the side will automatically build these mini dashboards for you and let you get deep visibility, with no work on your end, into what's going on. If you see anything interesting here that you might want to look at later, it's really easy to just click "pin to dashboard," and then the specific panel you're looking at is pinned to a dashboard for you to monitor over time.
One of the things I'll do is, if I ever find a bug that pops up more than once, I'll pin it to a dashboard, and I basically just have a bugs dashboard. Then I can flip in and check it; it makes it a whole lot easier to keep track of different things going on and gives me pretty nice flexibility. One of the other cool things we can do is alerting. If I click this "add alert" button, it automatically sets the scope to my WordPress service, because that's what we've drilled into. We have a couple of different alert types you can set. A manual alert is your basic threshold-type alerting. You can use baseline alerts, which look at past performance and try to spot anomalies against it. Or you can do a host comparison alert. We also have advanced alerting capabilities that will be familiar to anyone who has done Boolean-logic alerting, so you can express things like: only alert me if my transactions are down and my memory is high. This really helps reduce pager fatigue and lets you fine-tune the alerts you set for your service, your host, or really whatever you choose. From there, we can send the alerts to Slack, to PagerDuty, or via email, and we'll be shipping a VictorOps integration for alerting as well. We also just released a really cool new Slack integration where you can route anything that goes into Slack to Sysdig Cloud; I'll cover that a little later. The last thing here is the Sysdig capture. This gives me that trace file of all the system calls that were going on at that point in time. If I enable it, I can change the name of the alert and set the timeframe I want the capture to run for, and then the capture is taken and dropped into an S3 bucket. You can keep it stored in Sysdig Cloud, or you can configure your own storage endpoint.
And this lets you pull down that capture file and go through and trace it without even needing host access to that environment. A lot of times I'll get locked out of a box, or I just don't have SSH access, and you can still pull down that trace capture and really get down to what went wrong. The other really cool thing is that a lot of times I'll just kill a container that has a problem, and then I can go back, look at that trace capture, and get a really nice, deep picture of what was going on. So if I switch over to my terminal, this is a capture that I had previously downloaded. If you've ever used htop before, this has a really similar look and feel. You can see this is a view of all the processes that were running on the machine when that alert went off, but I can apply different filters. Let's go look at the containers that were running on that host when the alert went off. From here, I can drop into my Redis container and see the single Redis process that was running. Maybe I want to print out all the system calls that went on: now you can see these were all the system calls that that Redis process made when the alert went off. It's doing a heartbeat, getting the time of day; it opened a file, read from it, and closed it. This lets me get a really nice picture, especially in containerized environments, of what was going on. Another cool thing you can do is inspect each individual Redis transaction that went through. Let me make this a little easier to read: you can see that 26 bytes were read from this host over this port, the individual Redis transaction that went through. There are a lot of different things you can do with the open source tool; we have a ton of videos on YouTube and a lot of blog posts about use cases for it.
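For readers following along at a terminal, the capture walkthrough above corresponds roughly to commands like the following with the open source sysdig CLI. The capture file name and the container and process names here are assumptions for illustration; the flags and chisel names are from the standard tool:

```shell
# Replay a previously saved capture file
sysdig -r capture.scap

# Curses UI over the same capture (the htop-like view shown in the demo)
csysdig -r capture.scap

# Only events from a container named "redis" (name is hypothetical)
sysdig -r capture.scap container.name=redis

# Top processes by CPU within the capture, via a bundled chisel
sysdig -r capture.scap -c topprocs_cpu

# Print the data read/written on each I/O call by the Redis process,
# which is how the individual 26-byte transaction above was inspected
sysdig -r capture.scap -c echo_fds proc.name=redis-server
```

The same filter syntax works live (without `-r`) on a host where sysdig is installed; reading from a capture file is what makes the "no SSH access needed" workflow possible.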
You can look at a whole host of different things that were going on: I could look at the different system call types that were made, or see everything that did a write on the host. Really, you name it, you can pull it out of there. All right, let's go back to the application and look at some of the things we can do with custom events. We automatically pick up OpenShift events out of the box, so you can look at things like pod restarts and containers being killed, set alerts on them, go to an alert, and see the specific chart. If I click Explore, it actually takes me back in time to when that alert fired. So there's a really deep set of capabilities here that lets you troubleshoot at any point in time; since we store all data forever in Sysdig Cloud, you're able to go back and troubleshoot things really easily. The other thing we have is custom events. I don't actually have any custom events running in this environment now, but what you can do is add a description to basically anything you're running, and that will send custom events to Sysdig Cloud. One of the really cool things we added recently is the Sysdig Slackbot: anything you send to that Sysdig Slackbot channel gets routed to the Sysdig Cloud application. So we'll have things like an alert from a logging system come in and be overlaid on the chart; you can click on it and actually see the URL of the individual alert that went off. And I can show you a little of the events overlay: if I switch to two weeks, you'll get a view like this. If I hover over, I can see there was a custom event that went off here, an alert fired; the Kube demo saw network load on the Redis pod, really anything you set here. So there was a little outage in our demo environment, and those events fired automatically.
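Mechanically, the custom-events idea (pushing a deploy, a code push, or an external alert into the monitoring timeline) usually boils down to a small JSON payload sent over HTTP. The sketch below only builds the payload; the field names and severity convention are assumptions modeled on typical event APIs, and the actual endpoint and contract should be taken from the Sysdig Cloud documentation, not from this example:

```python
import json

def build_event(name: str, description: str, severity: int = 6) -> str:
    """Build a JSON body for a hypothetical custom-event API.

    The shape here (an "event" object with name/description/severity)
    is an illustrative assumption, not the documented Sysdig schema.
    """
    event = {
        "event": {
            "name": name,
            "description": description,
            "severity": severity,  # syslog-style level; assumption
        }
    }
    return json.dumps(event)

body = build_event("code-push", "Deployed build 1234 to production")
# The body would then be POSTed to the events endpoint with an API
# token in the Authorization header, e.g. via requests.post(...).
```

A deploy script that calls something like this on every release is what makes the "correlate a code push with a performance change" workflow described above possible.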
And so it allows you to correlate something that happened outside of your environment, whether that's a code push or really whatever you want, with the metrics that we're tracking. And that about rounds out what I have to show you today. I'd like to open it up now for questions, and I'd really recommend starting a free two-week trial with us: it's 15 hosts for two weeks, and you can deploy it as the cloud service or on-prem. If you just go to sysdig.com, it's on the home page and really easy to get started. You can also go to the OpenShift Hub, where we'll have more OpenShift-specific material and links to where you can start a free trial. So thanks, everyone, for joining, and I'm excited to see what questions you have. So, this has been awesome. My mind has been completely blown here, because this is the kind of thing that people often ask us why we don't have built and baked into OpenShift already, and this is why: we're not reinventing the wheel. People like you are building awesome tools to monitor and manage the back end and all of this stuff. I'm actually kind of interested: we've got a cluster of OpenShift on OpenStack going up on the CNCF test cluster, so I'm thinking we should probably try to hook that up to Sysdig and showcase some of what you're doing on that cluster. I'd love to see what this looks like on the back of OpenShift Dedicated or OpenShift Online. I'm going to go back to our ops team and see if they're using it already; I don't think so yet, but I'm going to send them this video and see what we can do. There are a couple of questions. Judd, I'm going to unmute you and let you ask your questions if you unmute yourself; you also put a couple in the chat. Yeah, I can see some of the questions you asked. Hey, Judd, how's it going? Good, man. So, just to add a little humanity to the video for the folks who will see this in the future.
I'm really curious how you implemented the collector, the storage, and the front end; I'm hoping it's also microservices on the back end so I'll be able to scale this. Yeah, so our back end is a Cassandra cluster, and then we have a little bit of MySQL in there as well for events, correlation, and things like that. If you choose the on-prem option, we basically package our entire application into containers, and then you can scale it however you want. So it's a Cassandra cluster on the back end. Can I scale it with OpenShift? I don't know if anyone is currently doing that on-prem, but it's probably something you can do. I think my customers are going to want it. Okay, cool; another science experiment for me. Yeah, we have a couple of pretty large OpenShift environments we're working with now. I can definitely connect you with someone on the sales side who's managing those relationships, and you can hear what they've learned. Really, whatever we can do to help you guys. Absolutely, because that'd be super awesome. Now, we've asked the question about the sales side of things: what part of the Sysdig offering is open source? Any of it? Is any of the Sysdig code base open source? Yeah, so the kernel module is fully open source; you can see that on our GitHub page. That's the basis for how we collect all this information. The sysdig command-line utility that I showed is also fully open source. We actually released some really cool functionality last week that allows you to trace basically any individual transaction through the open source tool: I could tag a method or a network request and see how long each one takes to go through the system. So that's really cool. Our security project is also fully open source; that's called Sysdig Falco, and it does anomaly detection.
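To make the Falco idea concrete: rules are written as YAML entries pairing a filter condition with an output template. The rule below is an illustrative sketch of the shell-in-container case, not a rule copied from Falco's shipped rule set; check the Falco documentation for the exact field syntax:

```yaml
# Illustrative Falco-style rule; exact syntax per the Falco docs.
- rule: Shell spawned in a container
  desc: Alert when an interactive shell starts inside any container
  condition: container.id != host and proc.name in (bash, sh, zsh)
  output: "Shell in container (user=%user.name container=%container.name cmd=%proc.cmdline)"
  priority: WARNING
```

Because the conditions use the same system-call filter fields as the sysdig CLI, the same approach extends naturally to the outbound-connection example in the next paragraph.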
So you could write a rule that says: hey, if someone runs a shell in my container, alert me; or if Elasticsearch makes an outbound connection on a port it shouldn't, send me an alert. So there's really cool stuff going on with that as well. The cloud product that we showed today is not open source; that's a commercial product. But besides that, basically everything else is fully open source. Totally, that's awesome. So how about you share your screen and show us where someone could sign up for the trial and where they can get more information about the Sysdig offering, and we'll end on that happy note. Oh, he's got one more question. Okay, go for it, Judd. My standard reference architecture that I'm deploying OpenShift on offers 96 cores for potential VMs and containers. Do you have a rule of thumb for how much infrastructure I'd need for an on-premise Sysdig solution at, say, that size of deployment, including where the Sysdig back end would run? Yeah, I'm not sure about that answer. I'll definitely have someone follow up with you, look at your reference architecture, and see how we fit in there, but I don't have an answer for that right now. So, what we're looking at now is just our home page. If you click Free Trial, you basically sign up with first name, last name, and email, and then you can get going. Perfect. And then there are OpenShift-specific installation instructions: once you start your free trial, there'll be a button that says OpenShift install, there are a couple of things you need to change with some networking and security settings, and then you basically use the DaemonSet installation and you're done. So, a pretty cool install process. We like those easy ones. Yeah. This is awesome. So really, Knox, thank you for coming in and doing this.
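For reference, the DaemonSet-based installation mentioned in the walkthrough generally looks like the manifest below: one privileged agent pod per node, with host access for the kernel module. This is a hedged sketch, not the actual Sysdig manifest; the image name, access-key variable, and volume mounts are assumptions modeled on typical per-node agent deployments:

```yaml
# Hypothetical per-node agent DaemonSet; image, key name, and mounts
# are illustrative, not the official Sysdig manifest.
apiVersion: extensions/v1beta1   # DaemonSet API group current at the time of this talk
kind: DaemonSet
metadata:
  name: sysdig-agent
spec:
  template:
    metadata:
      labels:
        app: sysdig-agent
    spec:
      hostNetwork: true          # see the host's real network view
      hostPID: true              # see processes across the whole node
      containers:
      - name: sysdig-agent
        image: sysdig/agent              # assumed image name
        securityContext:
          privileged: true               # needed to load the kernel module
        env:
        - name: ACCESS_KEY
          value: "<your-access-key>"     # placeholder, from your account
        volumeMounts:
        - mountPath: /host/dev
          name: dev-vol
        - mountPath: /host/proc
          name: proc-vol
          readOnly: true
      volumes:
      - name: dev-vol
        hostPath:
          path: /dev
      - name: proc-vol
        hostPath:
          path: /proc
```

The DaemonSet is what makes this a "single change" install: the scheduler places one agent on every node, including nodes added later, so coverage follows the cluster automatically.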
There's one other thing you said you would show us too, and I think that's the Slack integration for the alerts. I thought I saw a blog post about that recently as well; is it a blog or a doc? Yeah, it's one of the more useful things that's coming up and gets requested often, too. If I click on here: the really cool thing with this is that you can basically designate a Slack channel to send anything to Sysdig Cloud. So I could write in something like, "I had a big Chinese lunch today and I might be sleepy in the afternoon and push some bad code," and that'll be sent to Sysdig Cloud. What a lot of customers are starting to do is take everything that sends them notifications and pump those into Sysdig Cloud, so they can correlate basically anything that's going on with the performance of their environment, and this just makes it really easy to do through Slack. There are a couple of different examples here that walk you through how to do everything fully; you could do support tickets, really whatever you want. Awesome, that is really nice. Event correlation is super important for the full DevOps solution. Yeah, this is something that customers had requested a lot from us, and our founder actually took it on as a weekend project, and now it's one of the coolest new features we have coming out. You get a lot of founders that do that on the weekend. So this is a pretty awesome tool, and it's wonderful not to have to reinvent the wheel and to have this available for OpenShift, and for all the Kubernetes stuff that's under the hood of OpenShift as well. So thanks for doing this today, and when you get to your next releases and new features, we'll do this again. And I'm going to reach out and see if we can't hook this up to the cluster on the CNCF cloud that they're standing up for us; that would be another way of showing some deeper dives as well.
So that could be a future briefing, and we can show something more than WordPress. Yeah, WordPress is everybody's favorite thing to demo and nobody's favorite thing to actually use and deploy. So it's all good. Thanks so much, Knox, for taking the time today. We will push this out through the OpenShift blog and on our YouTube channel, and we'll be back again next week with another session with the OpenShift Commons. So thanks for taking the time to watch this. Thanks, Diane.