Hi, and welcome to "That's Not a Lot of Data". This talk will focus on three main themes: on the one hand, the operation of cloud native and FinOps stacks; on the other hand, financial data visualization; and why the real power is in the intersection between those two. For everything you'll see, I'm asking you to keep in mind that you can use literally the same technologies to run your stacks — to inform, optimize, and operate them — and to do the same with the resulting financial data. Again: inform, optimize, operate. Above all, the point is that you can bring this seamlessly together to allow a deeper understanding of the interdependencies between those two sides of your medal.

As usual when speaking in front of a new audience, I'm breaking out the bragging slides. This is less about me having specific laurels and a lot more about validating that, yes, this person actually knows what they're talking about. If you truly care about this, you can simply look it up in the conference proceedings, and the slides are linked at the end of this talk. Maybe of specific relevance for this audience: I was responsible for the internet endpoints of oil wells in a, basically, war-torn country, and I architected, implemented, and operated the complete IT, networking, and security side of the gold vending machines. You might have seen them — at the Burj Khalifa and elsewhere. And well, there's more.

Anyway, back to the basics. If you look at this — the Stahlschlüssel — 200 years ago you could easily have won wars with these tables of steel recipes, of required tempering, of resulting physical properties and such. Go back a little bit further: logbooks from centuries ago. They talk about cargo taken on, cargo unloaded, cargo being sold, about whales sighted and caught, but they also talk about temperatures of the water, wind levels and such. And today, scientists are working through all those old logbooks, because this data holds immense value today. Even farther back, 4,000 years ago, the first letter which we know of: it's about the wrong grade of copper being delivered, about promises of payment being made, and about the seller being rude to the buyer's servant. Going back 2,000 years more — 6,000 years ago — this is probably the oldest writing which we are currently aware of. It talks about who owned which slaves and when.

The point is, I don't need to convince you that numbers and events are important, and that they are inherently tied to humanity: to how we operate, to the value which we create and distribute and then trade with each other. Without all of this data, humanity couldn't run. And the thing is, no matter if it's operational data or financial data or something else, fundamentally this is all the same from the point of view of technology. And no, I will not get into the details of "data" being singular versus plural; I'm just following the Wall Street Journal style guide.

So let's look at the observability side of things. Observability is an absolute buzzword; it's all the rage in tech. If you look at control theory, it's basically just a way to deduce the internal state of a system by looking at the outputs and inputs only. You don't know anything about the thing itself; you just look at inputs and outputs, and that allows you to deduce the complete internal state of the thing, which is obviously nice if you have to deal with lots of different systems.
And it also sets very good requirements for when you can actually claim that you understand a system, that you understand a thing. Contrary to this, "monitoring" has taken on more the meaning of collecting — and not really using — data. Two particularly extreme examples would be where you take everything you have and just do full-text indexing, without actually caring about the cost in money, time, and resources, which is enormous. Or the other extreme, where you have data lakes — which is, in my opinion, a euphemism for "no one will ever look at this again". Maybe if you have data scientists who really look at long-running, old, historical data — like, for example, the logbooks of old ships — in those cases, yes. But beyond this, oftentimes it's just an excuse not to have an actual plan for what to do with the stuff, and to just collect it more or less. Which is not what we actually want. I mean, even if I'm just storing it without an index, it still incurs cost: in storage, in power, in just having to manage it, migrate it, back it up. All of this is cost. So those extremes — not actually deriving value from the data — that is the kind of meaning which "monitoring" has started to take on. Personally, I use the two terms interchangeably, but you see this distinction in the market.

So at a very basic level, observability is about enabling humans and machines to understand and predict complex systems. Why machines? A lot of this is pretty close to AI/ML. It's not all the way there, but a lot of the properties are the same, which again feeds into this inform, optimize, operate cycle. And the thing is, oftentimes, when you have different fields, everyone has fundamentally the same needs, but when they come up with concepts and solutions — sometimes literally the same ones — there are different names and terms for the same thing. In our case, you would say "observability" and "proper operations" in the cloud native world, whereas in FinOps this inform, optimize, operate cycle is better known. But at the very fundamental level, those are equivalent, which is nice. Of course, that means for all those nice cloud native things and for FinOps, the problems being solved are actually the same. One thing is different: the scale of cloud native stuff is, as per usual, a lot higher than in FinOps. So you just have more leeway in how you choose to use those tools — they are designed for a lot more data, which in turn means you can use their deeper analysis and their quicker systems to your benefit.

So let's look at those systems, starting with Prometheus, as per usual. Prometheus 101: it's a time series database, inspired by Google's Borgmon. It's the standard in cloud native deployments; it's the standard for Kubernetes, which you most likely have heard of. I'm part of the Prometheus team. We have PromQL, a purpose-built functional query language which you use for everything: processing, graphing, alerting, exporting. All data operations go through this one language. Yes, that means, on the one hand, you have to learn that language. On the other hand, it means that no matter how you work with the data, you literally use the same mechanism every single time, which is incredibly powerful, because you don't have to rewrite something when going from an alert into analysis — it's literally the same query. And we don't have hierarchical data models in Prometheus; with those n-dimensional label sets, you can slice and dice your data as you currently need.
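To make that concrete: getting a value into Prometheus really does take only a few lines. Here's a minimal sketch using prometheus_client, the official Python client; the metric name, the ticker label, and the port are invented for illustration, not anything from the talk.

```python
# Minimal sketch: exposing a financial figure as a Prometheus time series.
# "stock_price_usd" and the ticker label are hypothetical examples.
import random
import time

from prometheus_client import Gauge, start_http_server

# One gauge, sliced by an n-dimensional label set (here: one dimension, "ticker").
price = Gauge("stock_price_usd", "Last observed trade price", ["ticker"])

if __name__ == "__main__":
    start_http_server(8000)  # metrics are now scrapable at :8000/metrics
    while True:
        # In reality this value would come from your trading feed.
        price.labels(ticker="ACME").set(100 + random.random())
        time.sleep(5)
```

Once Prometheus scrapes this, the exact same PromQL selector — `stock_price_usd{ticker="ACME"}` — serves your dashboards, your alerts, and your ad-hoc analysis, which is precisely the "same mechanism every single time" point.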
Important point here: labels in Prometheus land are not precisely the same as labels in ML. The labels as we use them in Prometheus are a very good basis for labeling in ML, but the terms are not precisely equivalent — they just share a lot of the same properties. And I've been told by our ML people that working with Prometheus data to do ML is a lot easier than with many other databases, precisely because so many properties are shared between the two.

So what are time series? Time series are recorded values which change over time. If you have a lot of individual events which matter, you can merge them into time series and just count them, or track something which goes up and down. Temperatures in a data center, service latency, but also the price of a stock or how many coins have been mined — those kinds of things are basically all time series: just how a value changes over time. And it's super easy to emit all of this towards the Prometheus ecosystem. So if you want to write an exporter for, say, your Excel data or what have you, that is actually doable. I know people who do this kind of thing, and it's really, really easy.

As for the scale of Prometheus: ingesting a million samples per second is not a problem on current hardware; roughly, it's 200k samples per core per second. We compress quite aggressively, which is nice for storage and long-term storage. The largest single Prometheus instance which we know of ran at a hundred million active series at the same time — that worked. For long-term storage, there are two solutions which Prometheus team members are actually working on. There are others, but the primary ones would be Cortex and Thanos. Thanos is historically easier to run, but slower in querying and such — I forgot to write "slower in querying" on the slide. It works by scaling storage horizontally. Cortex is historically harder to run, though it's gotten a lot easier recently. It initially started with scaling the querying and ingestion layer and then took the storage scaling code from Thanos — and guess what Thanos is doing with the querying code of Cortex. The largest Cortex instance at Grafana Labs is 65 million active series, at a cost of 670 CPU cores and 3.4 terabytes of RAM. We have a customer who's running at three billion active series, which is not a little.

Next, Grafana Loki. Loki is based on the same label sets as Prometheus. It does not have a full-text index, which is part of why it's so quick. And it works at quite the scale of logs, if I do say so myself: we are looking at high numbers of terabytes with surprisingly little cost. As you have the same labels on your logs and your metrics, you can seamlessly jump between the two. Say you have a stock tick which is emitted, and you have, I don't know, a news article or what have you — you can tie this directly to how the stock develops. The metadata attached to them, and the way you actually access those things, are literally the same if you do it right. And that also means you can trivially turn your events into metrics and then do a lot deeper analysis on the already pre-contextualized data which you have in your metrics. Super nice. And you can ingest basically every type of event into Loki; it all looks pretty much the same. This is not a super deep technical talk, so let's just do the quick version here: you have a timestamp, you have the same kind of label set, and then you have the opaque string, which is not indexed — and that's part of the magic of Loki. Querying internally, we regularly see 40 gigabytes per second and more in query speed.
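That three-part structure — timestamp, label set, opaque line — is visible directly in Loki's HTTP push API. Here's a hedged sketch of pushing a single event; the endpoint path and JSON shape follow Loki's documented /loki/api/v1/push format, while the labels, the message, and the local URL are made up.

```python
# Sketch: one event into Loki — a nanosecond timestamp, a Prometheus-style
# label set, and an opaque, unindexed log line. Labels/message are invented.
import json
import time

import requests

LOKI_URL = "http://localhost:3100/loki/api/v1/push"  # assumes a local Loki

payload = {
    "streams": [
        {
            "stream": {"job": "trading", "ticker": "ACME"},  # the label set
            "values": [
                # [ timestamp in nanoseconds as a string, the raw log line ]
                [str(time.time_ns()), "order filled: 100 x ACME @ 100.42"],
            ],
        }
    ]
}

resp = requests.post(
    LOKI_URL,
    data=json.dumps(payload),
    headers={"Content-Type": "application/json"},
)
resp.raise_for_status()  # Loki answers 204 No Content on success
```

Because the `stream` labels here are the same shape as Prometheus labels, jumping between this log stream and the matching metrics is exactly the seamless hop described above.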
We work through terabytes of event data in under a minute and then do complex analysis on the resulting data sets. There is no official number I can share for how many terabytes of data are in those clusters. They run at a cost of 270 CPU cores and 1.5 terabytes of RAM, which — for this amount of data, and for the fact that we keep ingesting all the time and keep querying all the time — is pretty decent, I would say. I'm kind of biased, though.

Next, Grafana Tempo. That's more relevant for the operational side and not so much for the finance data itself — it's just the nature of the data. Tempo is for traces and for spans, which basically tell you how the program code was walked through in one particular execution of that piece of code. That is relevant for developers if they want to debug, or if they want to understand why they have high latency or why a certain decision was made. It's super useful to know how the computer walked through this particular execution of the code. But again, it's more for the operational side and for the debugging of your stack, not so much for the finance data itself.

There's a super nice concept introduced with Tempo — or rather introduced to the wider world; it was available at certain search engines for quite some time: exemplars. You basically just attach an ID to your traces, and then you can jump directly from your logs and from your metrics into those traces. Usually you have this needle-in-a-haystack problem: you have an immense amount of traces, you don't know which are the relevant ones, and then you need to search through them — and that takes time and incurs cost, both human and computer. But if you attach this ID already to your logs and to your metrics, you can say: I have this one high-latency request here, and you — or your developers — can jump directly into the matching trace. Or you have this one error state, and you can jump directly from that log line into your trace and see why it's happening. (There's a small code sketch of this right after this section.) Tempo is also currently getting search and such for the people who want it, but the recommended way, so to speak, is to go through exemplars.

You don't need any hugely expensive infrastructure on the backend. It's literally just object storage — S3, what have you — and you just go. It's 100% compatible with all the relevant players in the market, both emitting and ingesting. And at least at our scale, we don't do sampling: we keep all the traces, which is super nice, because you don't get that problem where you have a super highly relevant trace and then you cannot get at the data. Our largest cluster is ingesting 2.2 million spans per second and 350 megabytes per second — and this data is already, like, two months old. With 14 days retention and three copies stored, at a cost of 204 CPU cores, 450 gigs of RAM, and 132 terabytes of object storage, we still have a P99 query latency of under 2.5 seconds, which is nice.

Grafana itself — you most likely know what Grafana is. If you don't, the short version: it's a fully open source tool to visualize data and to democratize the creation of these visualizations, enabling everyone to create beautiful, intuitive, informative dashboards. Yes, it's pitchy, but it's the truth. People can do this themselves without having studied computer science or anything; it's pretty easy. I've seen non-tech people do this, and it just works — and you can do it yourself.
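Here's the promised sketch of the exemplar idea. The Python client supports attaching exemplars to counter and histogram observations (they're only exposed via the OpenMetrics exposition format); the metric name, endpoint label, and latency value are hypothetical.

```python
# Sketch: attaching an exemplar (a trace ID) to a latency observation, so a
# dashboard can jump from this metric straight into the matching Tempo trace.
# Requires a prometheus_client version with exemplar support and OpenMetrics
# exposition; all names here are illustrative.
from prometheus_client import Histogram

request_latency = Histogram(
    "request_latency_seconds", "Request latency", ["endpoint"]
)

def record_request(trace_id: str) -> None:
    # ... the real request handling would happen here ...
    request_latency.labels(endpoint="/trade").observe(
        0.42,  # seconds; a made-up latency
        exemplar={"trace_id": trace_id},  # the needle-in-a-haystack shortcut
    )
```

The design point: instead of searching an immense pool of traces after the fact, the pointer to the interesting trace travels with the metric sample itself.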
You don't have to share your secret sauce about this or that analysis with anyone else; you can actually do this yourself. Another super nice property is that you can take the same data and visualize it in a lot of different ways, which enables you to get a different understanding of the interdependencies, correlations, and causations in your data — literally a different view into the same data sets. I could show you endless, endless amounts of examples where I have literally the same data and just show it differently, or different data shown differently, which allows me again to look into my data, but also to correlate between completely different data sets. Maybe I realize that during my backup times, I have a little bit more latency in my trading software, and I don't want that. This kind of thing — where I can correlate trades executed against something completely different in the IT stack — that's the power of this system: I can pull in all the things at once and do the analysis across all that data at once. (There's a small sketch of pulling data out for exactly this kind of correlation right after this section.) That's the true power here.

You can play with this yourself, literally: play.grafana.org, free for the taking. You can just click around and see what you like and what you don't like. And that's the visualization side of Grafana, but there's also the business side of Grafana. Half of the Fortune 50 are paying for our technology; almost everyone in the Fortune 50 is using our technology. It's part of our business model, and it's part of open source, that not every one of them has to pay us — and that's fine. You get indemnification, a few more features, more speed and such, but you can run this yourself. And as you can see, almost everyone large does, and half of them find it good enough that they actually want to pay for that privilege.

We keep getting awards, so I'm not going to walk through every single one of them. The most relevant one for this audience will be the JPMC Hall of Innovation award, top vendor of 2022. Not a small feat; they don't give this out lightly. Anyone who has worked with JPMC will know that they're extremely demanding and extremely particular about what precisely they want and need — which is great to work with. And apparently we're not doing it wrong.

You might also have heard of this thing called Robinhood and the GameStop stock. Robinhood gave a talk recently at GrafanaCon — it's recorded, just Google for it — where a two-people team was running a Prometheus instance with a hundred million active series. Then came GameStop, and with our help they scaled up to 700 million time series, and basically kept Robinhood able to observe, run, and operate their own stack during this GameStop frenzy.

I told you there would be more: we'll be talking publicly about the results of a recent internal hackathon soon, and if you always wanted to build a Bloomberg terminal with your own data and official data, and just mix this together into your own thing — now you can. And the nice thing is, all of this — or most of this — is open source, and you can run it yourself. You don't have to pay us, you don't have to pay anyone else, you don't have to have anyone watch your secret sauce. You can run all of this yourself if you want to, no problem. It's also all available on-prem and as a service in the cloud — that's how we get paid. I like food and shelter, so if you were to buy from Grafana, I'm definitely not complaining. And that's it. Unfortunately, for COVID reasons, I cannot join you in person.
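The promised correlation sketch: pulling two series out of Prometheus's standard HTTP query API so you can line them up — say, trades executed versus backup traffic. The /api/v1/query_range endpoint and its parameters are Prometheus's documented query API; the two metric names are hypothetical stand-ins.

```python
# Sketch: fetch two time series from Prometheus for offline correlation,
# e.g. trade throughput versus backup traffic. Metric names are invented;
# the query_range endpoint and parameters are the standard Prometheus API.
import time

import requests

PROM_URL = "http://localhost:9090/api/v1/query_range"  # assumes local Prometheus

def fetch(query: str, hours: int = 6) -> list:
    """Return the raw result list for a PromQL range query over the last N hours."""
    now = time.time()
    resp = requests.get(PROM_URL, params={
        "query": query,
        "start": now - hours * 3600,
        "end": now,
        "step": "60s",
    })
    resp.raise_for_status()
    return resp.json()["data"]["result"]

trades = fetch('rate(trades_executed_total[5m])')
backup = fetch('rate(backup_bytes_transferred_total[5m])')
# From here, align the two series by timestamp and look for the latency
# bump during backup windows that the talk describes.
```

In practice you'd do this directly in a Grafana dashboard by putting both queries on one panel; the script form just shows that the same data is equally available to your own analysis code.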
We have a few coworkers in the room, and they will be more than happy to take your questions and give replies. My contact information is here. All the other talks which I've given are also listed here. And in particular, if you want a deeper technical view on what I just talked about, and different angles which are maybe a little more useful for the more technically minded people, a lot of this can be found at the last link on the slide. So again, thank you very much, and I hope to maybe see you next year in person. Until then, thank you.

All right, thank you so much. I just want to bring up a slide. Cool — so thank you, Richard. Let me switch on the microphone. So thank you, Richard. I really enjoyed that talk, because Grafana is well known for infrastructure-level monitoring — your CPU, memory, disk. That's certainly how I started with Grafana. But over the last number of years, and as you can see in the talk from Richard, the question is: how do we bring in data from the more business side of things? So traffic into my website, or trades being executed on a trading platform — how do I correlate business-level data with my infrastructure data? And we can do this all with Grafana and the technologies that we have. So I don't know if there are any questions; I do have a microphone if you have any questions. And how this would actually work on a technical level, I'm happy to answer. My name is Veli, my colleague is Sam — he's from engineering — so we're happy to take any questions. Or if anybody on the stream also has some questions. All good? Well, thank you so much for coming. Hope you enjoyed that. We'll see you around.