And welcome to The Homelab Show, episode 119, the Netdata interview with Costa. How are you doing, Costa?

Hi, nice to meet you. It's very nice being here.

He is the Chief Troublemaker. I love the title. He's like, is this title alright? Right on with the branding over at Netdata.

Yeah, it is. I've talked about this in videos, Jay's done videos on it, and we're both big fans of Netdata, so we're really excited to have Costa here. For those of you who haven't watched all the other videos we've done, or listened to our discussions on it: in brief, I would say Netdata is one of the easiest tools to deploy to get metrics out of your Linux systems. It is pre-templated and easy, but it gets a lot more in-depth pretty quickly because of how much it does with so little from you. We actually had to stop talking before the show, because I was like, oh man, I could go into detail about why I love Netdata compared to other similar solutions, simply because out of the box it does so much. Which is probably where we should start: that is kind of how you came up with Netdata. You started looking at solutions, you had a problem. This is how wonderful open source projects start: you had a problem you needed to solve, you couldn't believe there wasn't a solution, and so you started building one.

So, the idea... Well, everything started because I had some issues. I was migrating some infrastructure of a fintech company from on-prem to cloud, and we were facing really a lot of issues, random ones. They were not predictable, so suddenly, on a Friday or a Thursday, everything would go slow, or just not work, and we couldn't figure out why. The same application was working on-premise.
So I started buying tools, installing them, building a team for monitoring and the like, and after a lot of money and quite some time, I concluded that, you know, I had built these video walls with all the dashboards and the like, everything was there, every tool that exists, a lot of consultants and the like, but we couldn't find the problem. So the problem was still there. The monitoring systems could detect that something was wrong, but they could not tell us anything about what could possibly be wrong. So I had the feeling that all the monitoring systems are there to make us feel happy. They satisfy the need we have for monitoring without actually monitoring.

Then immediately I started thinking: okay, what does monitoring mean? What do we need to monitor? How does this thing need to work? Since my engineers ended up using console tools to actually see what's happening, I decided that the granularity we need for all metrics, as a standard, is what the console provides: per second. And we need as many metrics as the console provides, that is, unlimited. If there is a metric of some kind, we need it. Then we need a history for all of this.

So I started building a tool to actually kill console access. My intention was not to build a monitoring tool. Come on, let's eliminate the console. We need something that can have history, and present all the metrics, all the information that is there, in real time, exactly like the console, so we can grasp, we can understand, what's happening there. This is how Netdata was born.

What happened at the end is that I found, from within the VMs, that the cloud provider was performing updates on the VMs, and they were freezing the VMs for a few seconds, one second, two seconds, so the whole infrastructure was like in slow motion. Interesting, right? So there was no error, no exception.
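The freezes Costa describes, one- or two-second stalls with no error anywhere, are exactly the kind of thing per-second data exposes and coarser averaging hides. Here is a toy Python sketch with made-up numbers (not Netdata code) showing the effect:

```python
# Hypothetical illustration: 20 seconds of per-second CPU readings.
# None marks seconds where the collector was frozen and produced no sample.
samples = [12, 14, 11, 13, 12, None, None, 13, 12, 14,
           11, 13, 12, 14, 12, 13, 11, 12, 14, 13]

# Per-second view: the stall is plainly visible as a gap in the chart.
gaps = [i for i, s in enumerate(samples) if s is None]
print("gap at seconds:", gaps)  # -> gap at seconds: [5, 6]

# Smoothed view: average each 10-second window, skipping missing points,
# the way a coarser scrape interval effectively does.
for start in range(0, len(samples), 10):
    window = [s for s in samples[start:start + 10] if s is not None]
    print(f"{start:>2}s-{start + 10}s avg: {sum(window) / len(window):.1f}")
# Both windows print an unremarkable ~12.5 average; the freeze has vanished.
```

The per-second series makes the two frozen seconds impossible to miss, while the windowed averages look perfectly healthy.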
No issue, nothing. But with Netdata, I managed to see the gaps, because Netdata has a fixed per-second collection. If you go with Prometheus, for example, they smooth out these micro-latencies between data collections, so you cannot see them. But in Netdata, these latencies become gaps in the chart. So: what gaps? Why do I have gaps? And they were everywhere. This is how I found it.

That's really interesting. And I think a good point is that some of these other dashboards were built by managers who think they know what they want, but the people who are facing the problem, they know what they want. And you're right, having that time series show the gaps, as opposed to smoothing them out, that's huge. That's one of the reasons I was able to find an obscure problem in ZFS with one of my systems: the way Netdata presented it gave me a better idea of, oh wait, I can't pull these files at speed. I ended up doing some bug reporting on the way Linux and ZFS interact for decoding things, because the only way to see it was like, wait, this process is supposed to be multi-threaded, but it's not multi-threading. How can I see that? Well, Netdata shows one thread getting pinned, not all of them, every time I moved this file. But it was tricky to find, because once you've moved the file once, it goes in the cache, and when it's in cache, it's multi-threaded. It's only the first-time read that's single-threaded. Those are the fine-grained details. I can't name another tool with which I would have been able to find that, outside of raw-watching processor usage while I move files. And of course, I thought I was dealing with a hard drive issue, so I wouldn't have thought to go to the processor. But those slices it gives you over time really are valuable.

You know, what we do within Netdata, especially for physical servers, is amazing. If you have physical hardware, it has a ton of metrics, really thousands of them, that are just zero. They are errors: errors of the PCI bus, errors of the memory modules, errors of this
and that, you know, all over the place. For every component there are sensors and there are error counters. If you chart these, the charts are always zero. There is nothing, because there are no errors. And what we do now is, of course... it's not that we don't collect all this. No, we collect all these thousands of metrics every second. We go through the process of getting the metrics, putting them into Netdata, fetching them, etc., but we don't store them while they are zero. The chart will appear the moment a counter goes non-zero.

You get it? Yes. And what we also did, because this is very important: what I wanted is, when I have network interfaces, disks, or whatever, I want to have component monitoring. So let's monitor a disk. How can we monitor a disk holistically, the best we can? Whatever I learned should be there for that disk, it is there. So Netdata is built from the ground up, bottom-up, while all monitoring solutions usually go for the helicopter view: let's give you a helicopter view, so that you can understand what's happening without much information. I wanted something bottom-up, and I think this is the best way to actually monitor things: bottom-up, not top-down.

Oh yeah, I agree with that completely. You're on mute, Jay.

Oh, that's... I was on mute on purpose, so I didn't interject at the wrong time. So I'm just going to say, I think one of the things I love most about Netdata is how easy it is, and sometimes people can take that for granted. Before I say the rest of what I'm going to say, though, I do want to say that Netdata did not sponsor this podcast, by the way. We have Costa on here because we legitimately use it.
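Going back to Costa's point about the thousands of always-zero error counters: the "collect every second, but only materialize a chart once a counter first goes non-zero" idea can be modeled in a few lines of Python. This is a toy model of the concept, not how Netdata is actually implemented:

```python
# Toy model of zero-suppressed error counters: collect every interval,
# but a chart only appears once its counter first turns non-zero.

class ErrorCounterStore:
    def __init__(self):
        self.history = {}  # chart name -> list of stored values

    def collect(self, readings):
        """readings: dict of counter name -> current value."""
        for name, value in readings.items():
            if name in self.history:
                self.history[name].append(value)  # chart already live: keep storing
            elif value != 0:
                self.history[name] = [value]      # first non-zero: chart appears

store = ErrorCounterStore()
store.collect({"pci_bus_errors": 0, "ecc_mem_errors": 0})
store.collect({"pci_bus_errors": 0, "ecc_mem_errors": 0})
print(list(store.history))               # -> []  (all zero: nothing stored yet)

store.collect({"pci_bus_errors": 0, "ecc_mem_errors": 3})
store.collect({"pci_bus_errors": 0, "ecc_mem_errors": 0})
print(list(store.history))               # -> ['ecc_mem_errors']
print(store.history["ecc_mem_errors"])   # -> [3, 0]
```

The collection cost is paid every second for every counter, but storage (and the chart) only exists for counters that have ever reported a problem.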
I was looking for guests, and I have two more in addition that are coming; I can't say who yet. But we reached out because we legitimately use Netdata in our business and our homelab.

Coming from an MSP space, I remember going to a convention, an Amazon Web Services convention, and there was this monitoring system, and someone tries to pitch me on it. I don't even remember which one it was now, and at the time I thought it was great. So I get back to the office, this was before I was on my own, and I call them up. I'm like, yeah, let's look at this. And it was going to require, I think, a $60,000 statement of work just to install it, and it was going to require more AWS infrastructure be created to house the system. So they want our AWS account, but they're going to put everything in there for us, and then we have to maintain the servers that are maintaining the system that's monitoring the other stuff. It just seemed like a big disaster to me. So having a solution like this, where somebody can download it and install it, and they don't have to sign up, they don't have to have an account, you can just use it, you can just install it from the repository.
That's great, and it makes so many of these checks so much easier. Like, I don't know if you know this, but Netdata has invalidated a Python script that I'm very proud of. Back when I was managing Nagios for a company, they already used it before I got there, I created a Python script that would make an API call to AWS, find out if there are any new servers, and if there are, it would use a Jinja2 template for the Nagios host file and automatically add a host file for every single node. It took me, I think, two or three weeks to build that solution. And then Netdata just auto-detects everything, and I don't have to do anything at all.

And another thing I like about Netdata, too: the alerts make sense. Because sometimes people may not know, at least when they're starting out, they don't know how hard this is, because at first it sounds easy. I want to know if SSH is available, yes or no. I want to know if this service is running. I want to know if the hard disk is beyond a certain percentage threshold that I'm not comfortable with, and give me an alert when that happens.
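The Python-plus-Jinja2 setup Jay mentions retiring might have looked roughly like this. This is a hypothetical sketch: the instance list is hard-coded where the real script called the AWS API, and `str.format` stands in for a Jinja2 template so the example stays self-contained:

```python
# Hypothetical sketch of auto-generating Nagios host definitions
# from a cloud inventory, in the spirit of the script Jay describes.

HOST_TEMPLATE = """define host {{
    use        linux-server
    host_name  {name}
    address    {address}
}}
"""

def fetch_instances():
    # In the real script this would be an AWS API call (e.g. via boto3);
    # these two instances are made up for illustration.
    return [
        {"name": "web-01", "address": "10.0.1.10"},
        {"name": "db-01", "address": "10.0.2.20"},
    ]

def render_host_files(instances):
    # One Nagios config file per node, keyed by filename.
    return {i["name"] + ".cfg": HOST_TEMPLATE.format(**i) for i in instances}

files = render_host_files(fetch_instances())
print(sorted(files))       # -> ['db-01.cfg', 'web-01.cfg']
print(files["web-01.cfg"])
```

Weeks of glue code like this is exactly what agent-side auto-detection makes unnecessary.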
That's all easy, but when you get into the trenches on this, you find out it's not. I had a client one time, and this is kind of funny looking back, wanting to know why the system wasn't detecting the CPU properly. It was detecting the CPU at, you know, a lower percentage, but the server was completely topping out. But the check checks the CPU percentage, and Costa probably knows where I'm going with this, at that point in time, at that one second. So if it's like a five-check thing, and across those five checks the CPU happens not to be at a hundred percent at just the right nanosecond, even though normally it is most of the time, it's not going to alert. And I had to explain to the client: well, technically the alert is operating normally, because every time it checks the server, it's at a lower percentage. The problem is, if you watch the CPU on a Linux server, you'll see it go to a hundred percent, back down, a hundred percent, back down. A new task enters the queue and it spikes a processor for half a second, so it's always going to fluctuate. And I realized at that time that spot-checking the CPU percentage is not a good idea, and from there, there are all these other kinds of edge cases. So it's great to have a tool that knows about the edge cases already, and not have to think about that. It feels kind of like I'm lazy now.

When I was building the tool, in the initial days, I was installing it on all the servers; we had hundreds of them. For me, it was crucial not to do configuration on every server. I wanted the tool to do everything from just being installed: what is there? Okay, I detected this, I detected that, let's start monitoring that, and proceed. And if you go even to the largest enterprises today, they use the same components.
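The spot-check pitfall Jay described is easy to demonstrate with a small, hypothetical simulation: a CPU that spends half of every second pegged at 100% can read low at every polling instant and never trip the alert, while per-second data tells the real story.

```python
# Simulated CPU: 100% during even-numbered seconds, 10% during odd ones,
# i.e. constantly spiking, exactly the fluctuation Jay describes.
def cpu_at(second):
    return 100 if second % 2 == 0 else 10

# Nagios-style spot checks: poll once a minute, starting at second 1.
# Every polling instant happens to land on an odd (quiet) second.
poll_times = range(1, 301, 60)            # seconds 1, 61, 121, 181, 241
spot_readings = [cpu_at(t) for t in poll_times]
print(spot_readings)                       # -> [10, 10, 10, 10, 10]  (no alert, ever)

# Per-second monitoring over the same 5 minutes sees the truth.
per_second = [cpu_at(t) for t in range(300)]
print(sum(per_second) / len(per_second))   # -> 55.0 average, with constant 100% spikes
```

Five consecutive "healthy" checks on a server that is saturated half the time: that is why the alert was "operating normally" while the box was topping out.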
We all use the same components: they use an Nginx server, or a PostgreSQL, or a broker, or Redis, or something. They use the same Linux boxes that we all do, the same virtual or physical hardware. It's the same thing. Of course, when you put all the Lego together, each one builds a different building, but the components are the same. And the question is: why do you need to configure, again and again, what a disk is, what an Nginx is? It's stupid that solutions do that. It's a waste at the global scale. I can't believe it.

And that makes a lot of money when they sell that, too.

Is my Nginx different than yours? Exactly. You don't have needs so different from mine for your web server or your database server. Whether your Nginx is being used as a proxy, or a web server, or something else, the logs are the same.

I love that Netdata just pulls from whatever engine you have on there. Like you said, it looks at what exists and says, hey, you have an Nginx, you have a MySQL database, whichever one you're using, and starts putting it on there. I think that is just the beauty of it: you don't have to configure it; it's already seen it. Someone in the comments is talking about other tools, and all those tools may be able to produce some pretty graphs, but they also come at the expense of your time, and that matters, especially when you're repeating this across fleets of servers. When you get into the large-scale admin space,
That's fleets of time Have a really a funny story really quickly about about this so I Hired somebody or I didn't hire them myself, but I was part of the process and they came on the team and This individual is really sharp really smart and his is gone to incredible links in his career today But he was just starting out so he gets the project to set up a graphite and Grafana and I'm like, okay, that'd be a good first project for him and I was having trouble with it This guy he he was able to get in there and really understand it, but there's some things he didn't understand So he's looking for a book. He didn't find one. So he finds somehow an author That's in the process of writing a book. So he's so starving for information for this He finds somebody who's in the middle of writing a book and he's like, I really need information on how this works Can we just like you're like while you're writing it? Can I just and then eventually he starts helping out with the author Testing some things for the book because there just wasn't that much information but the setup process And to the first point these are great solutions I'm not saying that nobody should use these and if you set them up They're wonderful, but if I have a video to record or I have a project I need to get on to I Mean I'd rather not do all that work and if you have the time to set that up It's great, but that data is just easy like like I don't even I never had to read I mean I looked at the documentation creating the video hosted. I've done but to actually use it I never had to look at the documentation other than maybe the One-line install command that was probably do you know what we see we see even in big enterprises Fortune 500 companies that use my data and they have some other setup there the amount of Let's say not the insights that the data can provide the issues it can detect are unbeatable When you have a system from this or Grafana for example are amazing. 
I have great respect for them They are great for customized ability. You can do whatever you want, etc. Etc. Etc. Etc. But if you try to emulate the completeness that the data has with them You are gonna spend years You need all the expertise that we built into the tool you need to replicate it So even for biggest enterprises, what do we see is that they install the data and suddenly they say oh, yeah We had a problem no monitoring system found it except in the data But I mean people are using net data's Technology whether they realize it or not because I just love the fact that I read in a blog post of yours or one I don't remember who wrote it but it basically was talking about the new journal plug-in and I can't remember the verbiage But the article was saying that you know parsing the journal is slow so that you guys didn't like that So you submitted patches to system D to make it faster Faster when I want to go to do whatever it is they're doing and it's because net data saw a problem and then Did something about it and then I saw another forum post where somebody was Asking about something and it wasn't even a net data forum and someone from net data found it like hey I could probably figure that out for you and maybe do something to get them in the code base. I'm like gosh That's that's been the trenches. I love that, you know Even the entire system did journal thing was like that if you check the repository of net data We have our own log management solution. It is merged into the code, but we don't use it because if you think a bit system the journal is everywhere everywhere and You use it today like it or not. It's there Right the ability to give you access to your logs to analyze your logs Without you doing anything system the journal plug-in doesn't have a single configuration and item nothing You can configure nothing. 
There is nothing to configure. So the fact that Netdata gives you access to your logs, on your entire infrastructure, without you doing anything, is the most important thing, I believe.

Yeah, absolutely.

This is why it was very important to make systemd-journal fast. We found the bugs, we fixed the bugs, we submitted them to systemd, and we managed to bypass, in our own code base, the bugs and the slowness for all the existing systemd versions. So the current version of Netdata is fast on all systemd versions, even the slow ones.

Yep. And I think this brings us to an important point. So you came up against the problem, we have a problem, and you came up with a solution, that is Netdata, and you're working in the open source ecosystem here, and you decided that this should be open as well. So give us a little bit of your history with the open source world.

So, yeah, you know, open source for me is like a miracle. I owe to it everything that I am so far. I have been a C-level executive, I've had a good career, etc., and I believe that I owe everything in my life to open source. And if you look a hundred years from now, I am sure they will say that one of the biggest miracles of humanity of this period is open source. So for me, I always tried in my life to give something back. I have a few other projects, all of them have had some success, and Netdata skyrocketed, so I feel that I hit the nail on the head with something.

For me, it's very important to be open source. I believe that the open source movement is what makes us human. It's the opposite of what art is doing. Look at movies: the rights, they are in the licenses and the like, it's a mess, and actually they are there to limit access to art, or whatever it is, movies or whatever. Ah, this thing here... it's my cat.
Sorry, guys.

I mean, yeah, I don't understand it either, because we're a tribal species. We've evolved from tribal man, where people are in groups and they're sharing with each other. And even nowadays, if we watch a movie at the theater that we really like, we feel inclined to tell people about it. It doesn't do us any good, but we like to share, we like to spread what we like and what we enjoy, and do that in a group. Unless you're a complete introvert, and that's totally fine, but more often than not we like to recommend a movie, recommend a video game, share this, share that. So I feel like open source just makes more sense from a human perspective.

Yeah, it lets the pleasure of it come out. Because instead of saying, oh, I have this program, it does a thing, I could say, I have this program, it does a thing, but check out the code I wrote here, check out this function I did. And oh, you have a better idea for the function? Let's see it, let's do it. Maybe you can save me some lines of code. I just don't think proprietary computing is going to stand the test of time, to your point.

Really, look at what open source does today, especially for businesses: it's a way to change how you go to market. This is the biggest difference at the end of the day. Initially it was community only: Linus and his friends doing things, etc. Today, however, for open source projects, this changes. Look what happens. Let's assume that Netdata was not open source. How would we make a Fortune 500 company open its doors to us? Impossible. As a startup, you don't have access to that. But today, they come to us, with open source.
So this changes the dynamics. It changes what access good projects have to the market. Now, the tricky part is how to make money, because when you are funded, you need to make money. And not only that, guys: the project, Netdata for example, has been funded. We have already spent 30 million dollars to build this. It's an expensive thing. There's a big team behind the scenes that builds it and offers it for free, and good coders are expensive. That is very, very important.

Oh yeah, the investors want something back, come on.

So the idea is that you need to find a way to monetize it while being open source. I don't want to get money from the people that cannot pay. I love this being a gift to the community, to humanity. But at the same time, the people that make money, or can make money, out of this, or that want to use the most advanced features of it, come on, they need to support us. Because we need the next round of funding, and we need to progress, and we need a sustainable company to continue supporting this and improving it.

Yeah.

So this is the tricky part, and we're trying to find the balance. We have not found it yet; we're not yet at the moment where we can say, okay, we have it.

But I really like the way your cloud offering works, and the subscription model you have for it. I subscribe to it, I have my servers tied to it, and I really like it. I think that's one way to do it, but it is a really tricky balance. A lot of companies in the open source space sometimes have a hardware component they sell. The problem is, hardware is easy to commoditize, and someone can undercut you and it becomes very, very cheap. I really like the support and subscription models, and I've seen some success with those. I mean, Red Hat is famous for it.
They pretty much set the standard for how it can be done in the market. But it's still not easy, and I see a lot of people complain about some of these open source projects. I'm like: contribute back. Don't complain, figure out ways to contribute back. I was recently in another debate with a large company whose developer was complaining about things not being updated that they needed specifically for their company, which was a unique use case, but they didn't want to pay for it. And I'm just like, well, you should buy support licenses. "Oh, we have like 100 servers, I'm not buying support licenses for 100 servers." I'm like, you have 100 servers and you're complaining and don't want to give money back? Don't be the problem with open source. You literally work for a big company; you have 100 servers.

I usually tell companies: if you adopt an open source solution in place of a proprietary one, obviously you're going to save money on licensing. So take a percentage of the savings and give it back to the project that helped you save the money in the first place. If you were going to pay $10,000 for a license to something, give a couple thousand to the project that you ended up going with. I mean, you're still saving seven or eight thousand dollars. You can still brag to your employer. You don't have to nickel-and-dime anything. Just give back.

Like you said, Tom, I really appreciate you saying that, because it's just a mindset. I just don't know if people realize it. It's not like they don't want to help; it's "oh yeah, free thing, it does the thing, I can move on to the next project." But wait a minute: if you're saving money, just help them out. I think it should be the normal thing that companies do going forward. If they have the extra cash, and they're saving money
anyway, just throw some money at them.

You know, especially for projects like Netdata: our primary goal, what we try to do every day, is simplify the lives of engineers. This is the goal. So when we look at the problem, we don't say, how are we going to lock the customer in to us, how are we going to make it difficult to buy services from anyone but us, etc. We are trying to kill the problem for everyone. Now, this is something that, especially for open source, to my understanding, people should respect. They don't always do. In many cases, you know, even on Reddit, etc., there are a lot of people that object about a ton of different things. But I think the open source movement, especially where people have put money in and are trying to fund and support open source, needs the love of people. And if people don't love it enough, don't support it enough, eventually the company will vanish. It's inevitable.

I think your business model is going to be one that's going to be extremely common at some point, because I think companies are going to realize that it can work. What often doesn't work is when you sell it to the CTO. Even if it's free, "sell" in air quotes, the CTO will then say, yeah, I want to go with that, and then force everyone underneath that person to go with it, whether they like it or not. But if you go to the system administrators and fix something that they hate, then they will recommend it to the CTO. At that point, you don't have to deal with resentment. Because what can happen is, the system administrators are like, really, we have to use this? We don't want to use this, but we have to use this, and now they're upset. But if they like the solution, and they battle-test it, and then they recommend it to someone above them, I feel like that's always going to work out better, every single time.

It's more complicated. What I see a lot of the time is that people who work within big enterprises have trouble
convincing their managers about the need for an open source tool. And being open source allows them, maybe in a limited way, to use the tool and get some benefits out of it, because the moment you go higher, there are policies and, you know, decisions and whatever else that prevent them from buying this thing. And I am okay with this. To my understanding, Netdata is a tool that saves time for engineers. That's the balance. You don't want to save the time? It's okay, go with Prometheus and Grafana, do something else. They're great tools, amazing tools, if you know what you want to do. But if you want to save time, and you want to do it today, then this tool can help you.

And even what you said about the cloud offering: look at what we did. If you go, for example, to Datadog, all your data are going to Datadog. Datadog has the database. What do we do? And actually, this was one of the trickiest parts: to be high resolution, high cardinality, high granularity in Netdata, without limits. What do we do there? Every Netdata is monitoring-in-a-box. So you have a database, query engines, everything is there, visualization. Now, you can have a lot of them, and you can have parents. Parents are nodes that receive streams: if you have ephemeral nodes, for example a Kubernetes cluster, you need something outside that cluster to hold the data, because the cluster may vanish at any point in time. But these parents can also act as centralization points within your infrastructure, as many as you want.

Now, what we do with Netdata Cloud is that we query your Netdata database servers, we query your Netdata agents, to actually give you the aggregated view that you want. The metadata we keep in Netdata Cloud is tiny. We only want to know which nodes there are, where we can access them, stuff like this,
which metrics they have, but not the data. We know that a node collects system CPU, but not the actual samples. So the idea with Netdata Cloud is: how can we allow people to have all the data fully on-prem, nothing exposed on the internet, and at the same time allow them to have high fidelity, unlimited metrics per second, everything, machine learning everywhere, etc., and give them an infrastructure view and access from anywhere, role-based access, etc., from a SaaS offering. So it's complementary. We're not replacing the Netdata agents. Everything in Netdata Cloud is just an add-on, a layer on top of what Netdata already has.

And I think, from a fundamental standpoint, having it distributed like that makes it so much easier. I mean, I've worked with and dealt with some large enterprise companies that complain about how large their data lake is, and the amount of storage it takes, and it's a challenge for them, because as these companies grow, sometimes even the queries you run against it start taking more and more time. And it's because they've tried to go, hey, let's take all the data and stick it in our data center, and we're going to take everybody's data, and when they get really big, this is a challenge. But you've eliminated that problem, which lowers the cost, and it's faster. This is a better way to do it. It doesn't take much compute on an individual basis; I can set up a thousand of these.

If you think about it, the way you connect your Netdata agents together with parents, etc., and you use Netdata Cloud, what you are actually building is a huge pipeline, a data lane. This is it. Yeah, but the good thing is that it's totally distributed. Not all data are in one point, so all of your infrastructure is doing part of the work. Not only that: Netdata usually uses resources that are available and spare. It is installed on all your servers; it takes one percent CPU and a hundred megabytes of RAM. Come on.
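As a rough idea of what the parent setup Costa mentions looks like in practice: streaming between agents is configured in each agent's `stream.conf`. This sketch follows Netdata's documented format, but the hostname and API key below are placeholders, not real values:

```ini
# Child (any monitored node): stream its metrics to a parent.
# /etc/netdata/stream.conf
[stream]
    enabled = yes
    destination = parent.example.lan:19999           # placeholder parent address
    api key = 11111111-2222-3333-4444-555555555555   # placeholder UUID

# Parent node: accept streams for that API key
# (a section named after the key, in its own stream.conf).
[11111111-2222-3333-4444-555555555555]
    enabled = yes
```

A parent like this inside the same cloud region is what keeps the per-second stream off the egress bill: the data stays local, and only what you actually view leaves.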
That capacity is spare already, everywhere. So this makes monitoring a lot higher fidelity, but a lot more cost-efficient at the same time. Yes: high fidelity, lower cost.

Yeah, and this is why I believe this model that we have is the winning model. Eventually this thing will win out; it's inevitable, because the alternative has huge, tremendous scalability costs. Look at the egress costs. Let's assume that you are in AWS and you have servers in AWS. What are you going to do about the egress cost? Monitoring is a steady beat of data, all the time. What are you going to do? What we allow is: put a parent there. Then what you are going to pay for egress is only what you view, not the whole stream.

Right, it's streamed instead of being stored, which is brilliant. Through Netdata Cloud it streams, but it doesn't save anything there. You can still see it; it looks like Netdata has all your data, but they don't. The data is actually coming from your server, not from them. The only thing coming from Netdata is the UI and the logic around it, the algorithms, or whatever you guys do. But it's easy, because you don't have to worry about any of that.

And the machine learning, the alerts, everything is at the edge, within Netdata. Everything is spread all over the place, in your infrastructure. So this makes it a lot more affordable at the end of the day. Tremendously.

Absolutely, that's a great solution, and I've been using it for a long time. So I think one final question I have, I've seen this pop up in the chat: is there a plan, I know it's a completely different beast, is there a plan for a native Windows agent? I know there are ways you can connect Windows, I've seen that as an option, but are there any plans for a Windows agent for monitoring?

Yes, guys, of course; we can port Netdata anywhere. The problem is that we are 30 people.
We have a huge plan ahead of what needs to be done for the things that we do: support OpenTelemetry, support more collectors, improve things. Now we're adding dynamic configuration, so on the next version of Netdata you will be able to configure data collection and streaming and all the alerts from the UI. No need, if you don't use CI to push configuration files anywhere, you can do it from the UI. So all of this requires resources. And at the same time, we have a runway, so we need to cherry-pick what will give us most of the revenue, to sustain us and to convince our investors to actually invest more.

What do you think about that, though? What if a company really wants this feature, and it's the only thing stopping them from using it because they have a mixed environment, and they say, we'll fund it, we'll throw some money at you guys to hire someone or something to get that done? I'm just curious, because maybe somebody will.

It depends. If something is missing for a customer, the first thing we evaluate is whether we are going to reuse it. Are we going to reuse it? Does someone else need the same thing? If yes, then we do it: they will pay some of the cost, and we will do it. If not, then it depends how critical this customer is for us. But generally, I'm trying to avoid this. Rather than one company wanting something for themselves that nobody else needs, we'd rather build something that's going to be for everyone. Yes, we have said no to some very big contracts at Netdata, mainly because they wanted to build something on Netdata that I believed no one else would need, and I said no. Come on.
There is no point.

I think that's a hard thing, and sometimes that's what you have to do to make the project thrive: you have to say no to some things.

Yeah, and sometimes you have to say "not right now" to some things, like the Windows functionality. I think I've seen you guys talk about it in forums in the past, but yeah, I get it. I mean, there are going to be some required things that have to be done first. But the very fact that you guys are looking into this is interesting. That's great. It just shows that maybe that's something we can look out for; maybe we'll have another conversation at some point and that'll come up.

Well, thank you very much. This was wonderful. We loved having you on and learning about Netdata, and everyone should check it out; you know, we've got some videos on it. Go sign up, or just download it with the one-line installer, which is beautiful, by the way: you can just do the one-line install and it figures it all out for you. I'm pretty sure you guys have Ansible roles too, don't you? I thought I saw that.

Yes, of course, of course you can install Netdata with CI, etc., and you can actually configure all the collectors you want.

Yes, you can do it. You can dive in there. You don't have to use the automation, but if you're curious what it looks like, just use the automation.

Yeah, all right. Well, thanks very much.

Guys, it was very nice being here. I love you.

Yes, we love the product, so we'll definitely be following up with you on this. So, thank you, everyone.

Oh, I didn't mention that we have a homelab plan that's coming, coming up today.

You want to announce a product?

It's unlimited, with a very small price, to give you full access to everything in Netdata Cloud for almost no money.

So I'm looking forward to that. We should have led with that! So I like the, uh,
So I like the uh We'll talk about that a big reveal towards the end your view towards the end here You can probably put that in the description. I think we will. All right. Well, thank you very much and everyone take care Thank you very much, mate