OK, so maybe it would be good to do the introductions before the end of the session. Don't be afraid: the first row is still empty, so it didn't change in three years, people are still afraid of the first row. And the problem is, if we remove the first row, it doesn't change anything; what you do is wait until people sit down and then take the seats away. For all of you in the back, there are seats here, you can sit, come join us. The camera will be here, so you need to speak around here.

It's the first time the four of us are together in person in two and a half years. Welcome, everyone. As organizers of the Java user group, just think: you have the chance to have the four of us live after two and a half years. You don't know what a chance you have, but it's great. So thank you. Now I'll let Michael start.

Thanks, everyone, for coming. We're super happy. It's been two and a half years, and we didn't do any face-to-face meetups. We tried very hard to do online meetups, but it was not so fun: no pizza, no mingling, just a talk. So we are very happy to have you. And for the first time, we have a visitor to Singapore: we have John, the CTO of Hazelcast. And yes, we will talk about Kafka and Hazelcast. Jo, you want to say a word? No? Taking a picture. OK.

We're just really happy that we can kick-start a normal life for meetups. And as you see, we learned from COVID, so now we're trying to run our meetups hybrid: it's also livestreamed on YouTube for the people who don't have the chance to eat pizza. Sorry you didn't get pizza. Patrick is from Palo IT.
He's our sponsor for tonight, for the venue and for the group.

Hi guys, welcome to Palo IT. This is our first meetup here, not only for the Java group but across all the communities we host. We've welcomed some of you both at our office in Scabida and at our new office. Some of you might have realized when looking for it that our office used to be at Boat Quay, so I don't know if you went to the wrong place and walked all the way here; we moved here a year-plus ago. Anybody who doesn't know Palo IT? I guess some of you don't know what we do and who we are. We are a global innovation company with a footprint in countries around the world. We are not very big, but we are big enough to matter across the world, and we have a strong focus on technology for good: a lot of the work we do is about how to make our environment better, the way people work better, and so on. And this event is an example of it. One of the values of the company is sharing, and in that spirit of sharing, we regularly host meetup events like this one. With that, I give it back to you.

Thank you very much. OK, so now we have John.

Thank you very much. Well, I'm very excited to be here today. Everyone, are you able to hear me OK in the back? Yes? I came to Hazelcast four years ago, from Cloudera; sometimes it seems longer. And when I saw what was possible with this technology, which I'll explain more about, I was just blown away that you could do a lot of the things we were trying to do at Cloudera, but more real-time and in a much simpler architecture. That's pretty exciting, and it enables a lot of use cases that you couldn't possibly do on Cloudera, which I'll talk more about in my talk. But first (I'll bring this up at the end as well), we're going to do a drawing.
For anybody who's willing to put your information in, someone in our marketing department will do a drawing tomorrow for a $100 gift voucher. I don't know which one it will be; it varies from country to country, but it'll be something appropriate for Singapore. And you just have to identify yourself so that we can then spam you. That's basically how that works.

So before I get into some of these examples, I just wanted to know: how many people have used Kafka before? OK, so most of you have at least heard of it, and at least half of you have used it. How many people had heard of Hazelcast before today? Only a few. Of the people who've heard of Hazelcast, how many knew that we had streaming analytics, real-time data processing capability? Oh, one-third; that's about what we see normally. And were you aware that we had SQL capabilities? I ask those things because I think it's interesting to see what the awareness levels are, and hopefully more of these talks will get people to realize Hazelcast has come a long way since we were created and introduced into the world about ten years ago.

Well, why am I here to talk about real-time? I want to give a few examples of what we mean by real-time. First of all, our CEO would say real-time is any deterministic business SLA, not a technical SLA, that drives an architecture and a requirement around data processing. So real-time could mean consistently processing a certain window of data within one hour, but being able to do so consistently and predictably, even under heavy loads, on the busiest days of the year, even if it's securities trading during a crazy high-volatility trading day, for example. Another example: we have a pizza delivery company in the U.S. that uses Hazelcast in a pizza-tracking app, and their busiest day of the year is Super Bowl Sunday. The U.S. Super Bowl is the championship for American football, the one we play with our hands and not our feet. I'm actually more of a normal football fan, or what the U.S. calls soccer.

Then there are latency SLAs. The first one I always bring up is Google's RAIL model. They've done extensive research on user experience, and they've determined that if an app takes more than 100 milliseconds to respond to you clicking a button or interacting with the UI, whether it's mobile or in your browser or wherever, then it's perceived as broken. So if your app isn't responding within 100 milliseconds, it feels broken. A lot of people don't realize that even a regular website or mobile app needs to think about low latency and real time. For a frame of reference, a blink of an eye is 100 to 400 milliseconds; so faster than a blink of an eye is how quickly an app needs to respond. That means every company with apps that people use should care about real time.

Some other latency windows: 40 to 50 milliseconds is your typical window to authorize a credit card, and also to run any fraud checks to determine whether it's a suspicious transaction that should be blocked. If you take any longer than that, you could still identify the fraud and send someone a text message or push out a notification through the app, but you would already have had to authorize the transaction. That means if somebody tries to use a stolen card to buy an expensive TV (in the U.S. we don't even require a PIN, even for large-dollar transactions), somebody could buy a $5,000 television with a stolen card. You might get an alert and say, nope, that's not me, block the card; and legally, if you respond to those alerts within a certain period, they can't charge you as the customer, which means your bank or credit card provider has effectively bought the fraudster a TV.
So there are all kinds of windows of opportunity. If I want to make a personalized real-time offer, we have banks who are analyzing what you're doing and pushing offers while you're banking, offers around credit products or other products; maybe they're cross-marketing with retailers to do buy-now-pay-later types of things. The window of time, again, might be milliseconds. Then there are other things, like how quickly you want to calculate risk; that might be minutes or hours. In banking, risk calculation typically isn't real-time, but on the other hand there can be real-time risk-related calculations, for example around FX pricing or haircuts on leveraged trading. And medical devices, obviously; medical systems need to respond in real time. So I give these different examples, like responding to equipment trouble to prevent a failure, just to stress that this technology is relevant to a wide range of use cases.

What's unique about Hazelcast is that we have capabilities for processing data in motion, such as data that's moved around and stored with Kafka, as well as data at rest, in one runtime. Our history is that we were an in-memory data store that was distributed and partitioned (sorry, I'll try not to move around too much), so data would land on any of a number of different nodes of a cluster, and we would also replicate the data to other data centers so that you could have very, very high levels of uptime: a zero-downtime architecture where you have real-time active-active clusters that are synchronized, and the data within each cluster is partitioned, with backup copies on the cluster, so that you never lose data. That's the heritage. Over time we've made that more and more resilient and consistent, and also added a lot of capabilities to query and do computations on the data we were holding in Hazelcast. As we did that, we started to realize there was huge value in bringing computation together with data and combining it with distributed, partitioned processing, and that led us to adding ingest and processing capability into the platform: it led us to introducing our stream and batch processing engine.

The bottom component of that architecture, a lot of people would compare to something like Redis. The top layer is comparable to Apache Spark or Apache Flink, except that we can provide the same kind of data pipeline processing as those technologies but in a way that is very high throughput and low latency, and we'll talk about why that is. We've done a lot of benchmarks, and I'll just say that we crush Flink and Spark; in fact, vendors who promote those technologies typically don't even attempt to go after use cases of less than 500 milliseconds latency, and certainly not high throughput combined with low latency. So this technology really has some significant advantages. We're combining compute and data while also parallelizing the processing, and at the same time we have a data-aware capability, meaning that wherever the data lands in the cluster, we know what data is in what partition, and therefore we can make sure the compute is co-located with the data. For example, my data, the data for my account or my securities trades, might be sitting in one partition in one place, and Paul's data on a different node. We don't move the data; instead, we process and analyze the data where it lives.

There are some other advantages. We have a collaborative work-sharing mechanism, which means we don't need a master-worker architecture, and that simplifies the operational aspects of the cluster. Combined with the fact that we have a single-jar-file architecture, it means that if you're a Java developer, you can get started with Hazelcast with one line in your Maven file, and when you're deploying it, it's just another Java application to manage. We optimize for in-memory, which further drives down latency. We do SQL as well, so that we can combine data coming in from Kafka with data sitting in Hazelcast, enriching the streams with contextual data. You might have payments coming in and want to add information about the customer's current balances and other core banking data; but maybe you also want to add data about other things you know about that customer, or even data that's not related to that customer at all but related to broader trends. Any data you can store in Hazelcast can be integrated with real-time data to make richer decisions, and that's very powerful when you get into some of the use cases.

The main goal of this talk is to talk about the enemies of performance, and then introduce technologies that can help you address them. The first enemy of performance is the network. Every time you're moving data over the wire, that adds latency, and you can't get rid of that latency, because networks are governed by the speed of light; unfortunately, that's something we haven't figured out how to work around. There are also storage bottlenecks, and there are processor bottlenecks: if the CPUs have to switch between different sets of instructions, the way the code works can add performance bottlenecks due to context switching. And then overly complicated architectures definitely add to the latency. A good example of this is a lambda architecture. A lot of people who are big fans of Kafka will have data coming in from Kafka, and then they need to analyze it, so they start to put the data into a database.
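(A quick aside on the single-jar point above: pulling Hazelcast into a Java project is roughly one Maven dependency. This is a sketch; the version shown is only an example, so check Maven Central for the current release.)

```xml
<!-- Hazelcast platform dependency; the version here is an example only -->
<dependency>
    <groupId>com.hazelcast</groupId>
    <artifactId>hazelcast</artifactId>
    <version>5.3.0</version>
</dependency>
```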
At the same time, the data is coming in on Kafka in real time, and the database has to store the data before it can query it. So what will happen is you'll have a mismatch between what the data coming in right now looks like and what your query results say. In general that's not a huge problem, but when you get into these low-latency use cases, that impedance mismatch can be a very big problem. It also means a more complicated architecture. And if I'm trying to use this architecture to support an application that people interact with, a real-time application may have a tier-one or tier-zero, very-high-uptime requirement, and these architectures are terrible in terms of delivering five nines.

Here's another architecture that is not a good real-time architecture, even though some of the vendors behind it, including my former employer, would claim they can deliver a real-time experience. You have data landing, then some processing, then it's stored, processed, stored, processed, so you have this ping-ponging of data. Every single step involves at least a movement over the network, and maybe also movement of data onto disk or solid-state disk and back off. It's a very complex architecture to maintain, and it's not going to be a five-nines architecture; it's hard enough to get it working reliably and resiliently in one data center with a predictable response-time SLA, let alone what happens if you lose a data center: the other data center is going to be out of sync across all of these different components.

Another example compares Hazelcast with the Apache Spark architecture. I'm not going to go into the details, but you can take a look and see that there are a lot of moving parts on the one side, and depending on what fails, you don't really know the implications for whether you're available or not, or continuing to process or not. In Hazelcast, every node has some data and is also doing some compute, and it doesn't matter which node fails: the data is always backed up, and the compute will be transferred and restarted on another node. Basically, if you have n nodes and you lose one, 1/n is your loss of capacity in both compute and data.

This is really the rubber-meets-the-road picture for me. Every single one of these nodes is processing data; they're all peers. Because they're all processing data and also storing data, and they're aware of where the data lives, it's very powerful for real time. It's also in memory, at least the data that's actively being processed, which further improves your performance.

Now we can also bring that together with things like machine learning: what if the machine learning is fed this data? One of the things about machine learning is that it needs very rich data. You can't just take raw payments and feed them to machine learning; all you're going to get is a very simplified pattern that doesn't really tell you a lot. The same thing is true if you're feeding it sensor data. If you send a machine learning algorithm a bunch of readings from a temperature sensor: is that an air temperature? A fluid temperature? If it's fluid, is it a coolant or a lubricant? Is it attached to your car, or to a giant industrial piece of equipment, or even a power plant? That context is what machine learning needs in order to do the great things ML can do. If you don't have rich data, your ML is not really going to add a lot of value, and you might as well just be doing very simple computations on the data, basic math. So anytime you're talking about machine learning combined with real time, you need rich data pipelines, and I don't know how you can do that, other than with a heavily custom architecture optimized and coded for your use case, or by using Hazelcast.

With the single jar file, when you're deploying this into production, it's just a Java application. If we look at payments or trades, or say a shopping cart where you're putting stuff in and I'd like to make real-time offers: my data is going to be on a different node and a different partition than someone else's, and that's really the power of the real-time architecture.

In terms of performance: I've talked a lot about performance, and you always want to hear some numbers. We've done benchmarking where we were able, on up to 45 nodes, to see linearly scalable increases in throughput. The power of that is that we started at a very small number of nodes and were already doing millions of events per second. Kafka is very good at moving millions of events per second, but Kafka Streams is not going to process millions of events per second, let alone a billion. Hazelcast was able to scale linearly to over one billion events per second; a billion was reached at about 45 nodes, and we were delivering under 30 milliseconds latency at the 99th percentile on this query. This was a real-world industry benchmark that we found; it wasn't our own benchmark, and there's a link to the blog where we publish all the details, so anybody can recreate it, because we don't believe in talking about benchmarks unless we provide all the transparent information around them. When we published this, we thought we were really just trying to prove that we're efficient; we didn't think anybody needed a billion events per second. But a company in the media space in Silicon Valley reached out within a week after we published and asked, could you guys do 10 billion? So there are companies out there that do need these kinds of crazy throughputs. But I want to talk a little bit more about the architecture behind it.
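To make the data-aware partitioning idea concrete, here is a minimal, hypothetical sketch in plain Java (not the actual Hazelcast internals) of how a key can be hashed to a partition and a partition mapped to an owning node, so that compute can be routed to where the data lives. The routing scheme is simplified for illustration; real systems keep a partition table that also tracks backup replicas.

```java
import java.util.List;

public class PartitionRouter {
    // Hazelcast's default partition count is 271; the rest of this
    // class is an invented illustration, not Hazelcast's real logic.
    static final int PARTITION_COUNT = 271;

    // Map a key to a partition deterministically.
    static int partitionId(Object key) {
        return Math.floorMod(key.hashCode(), PARTITION_COUNT);
    }

    // Map a partition to the node that owns it (simplified round-robin).
    static int ownerNode(int partitionId, int nodeCount) {
        return partitionId % nodeCount;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("john-account", "paul-account", "trades-AAPL");
        int nodes = 3;
        for (String key : keys) {
            int pid = partitionId(key);
            // Compute for this key would be shipped to its owner node,
            // so the data never moves over the network.
            System.out.println(key + " -> partition " + pid
                    + " -> node " + ownerNode(pid, nodes));
        }
    }
}
```

Because the mapping is deterministic, every member can compute where any key lives; losing one of n nodes loses only the 1/n of partitions it owned, and backups take over.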
One of the things you can do with any technology like this, looking at Kafka and pulling the data off, is take that input and partition it, and process it in a way where you're avoiding serializing the inputs. So you can say John's data needs to be partitioned out, or maybe we'll partition data about Apple trades versus Amazon trades. We can get to partitioning with some clever programming, but if we want to go one step further, there's still a lot of looping that has to happen; how do we get around the bottleneck of processing things in order? We can go further and combine partitioning with pipelining, and work with the data in a pipeline, because we often need to do a series of calculations on the data: we need to first filter the data, then do some sums or averages or aggregation, and maybe we need to join streams together. Each of those is a stage of the pipeline.

It's really a fairly simple API that we have; we were inspired by java.util.stream in how we built our Pipeline API, and reading from Kafka is a fairly small bit of Java code. We can break this into stages, where each stage is potentially doing a separate set of transformations. So, filtering, and then enrichment: we've got data coming from Kafka, and we want to look at some data in an IMap, which in Hazelcast is basically a distributed hash map concept, and we can enrich with that data. Because we've partitioned the data coming from Kafka in a way that is intelligent, and we also have the data in the map partitioned with data-affinity rules, we can make sure that when we filter the data, John's data is fed to the right place: we can look at the keys of the data and start to spread the work out, and then look at the information in the maps, which again is partitioned. Then we can start to do some calculations, and then of course we can write the results back out to Kafka, in this case.

So what does this look like? Because we have broken it into steps, we can now put queues between the different steps. But these queues are not Kafka; this is all within Hazelcast, all optimized in memory. Between each step you have tasklets, and these tasklets can be distributed and executed in parallel on different threads or even on different nodes. As we move the data through, from one queue to the next, one tasklet to the next, we may also have different types of data, and all of this is represented in a directed acyclic graph. You don't need to know what that is; it's basically a one-way data pipeline where you have immutable data moving through the pipeline. The data is changing, but from one stage to the next it's immutable.

To compare: a colleague of mine ran some simple benchmarking, looking at how many time units of work get done. When you don't do any partitioning or pipelining, you process three things in sequence: all the data for one thing, all the data for the next, all the data for the third. If you start to do some partitioning, then we can use more cores on a single node of the cluster; if there are two cores available, we can potentially process in parallel, depending on what keys we're using for the partitioning. When we do pipelining, we can use more cores again, and if you combine the two together, you can eventually get up to 15 time units of work done in the time that sequential processing does three or six. It's not a super easy diagram to read, but I think it conveys the point: we want to parallelize as much as possible.
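Since the Pipeline API was inspired by java.util.stream, the filter-enrich-aggregate shape described above can be sketched with plain Java streams. This is only a single-JVM analogy (Hazelcast distributes the equivalent stages across threads and nodes), and the `Trade` type and reference map here are invented for illustration:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class TradePipeline {
    // Invented trade type for illustration.
    record Trade(String symbol, long price, int quantity) {}

    // filter (drop unlisted symbols) -> group by key -> aggregate volume.
    // Each stage is parallelizable, which is the point of the pipeline model.
    static Map<String, Integer> volumeBySymbol(List<Trade> trades,
                                               Map<String, Boolean> listed) {
        return trades.parallelStream()
                .filter(t -> listed.getOrDefault(t.symbol(), false)) // enrichment/filter stage
                .collect(Collectors.groupingBy(Trade::symbol,
                        Collectors.summingInt(Trade::quantity)));    // aggregation stage
    }

    public static void main(String[] args) {
        List<Trade> trades = List.of(
                new Trade("AAPL", 150, 10),
                new Trade("AMZN", 130, 5),
                new Trade("AAPL", 151, 20),
                new Trade("XXXX", 1, 100)); // pretend XXXX is delisted
        // Reference data; in Hazelcast this would live in a partitioned IMap.
        Map<String, Boolean> listed = Map.of("AAPL", true, "AMZN", true, "XXXX", false);
        // Volumes per listed symbol: AAPL totals 30, AMZN totals 5, XXXX is dropped.
        System.out.println(volumeBySymbol(trades, listed));
    }
}
```

In the real Pipeline API the same shape reads from a Kafka source stage and writes to a sink stage, with the queues between stages managed by the engine rather than the JVM's fork-join pool.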
If we can break the processing down into steps, and each of those steps can be parallelized and is also data-aware, then we can start to get tremendous improvements in both throughput and latency.

Some of the basic building blocks of this streaming engine: we used to call the engine Jet, and it used to be an add-on component to our data grid. Last year we unified them, so that there's just one runtime, one product. One of the reasons we did this was to make it easier for people to adopt the technology, but another reason was that it actually allows us to innovate faster, because we're not maintaining two different code bases, and we can deliver more value to the market.

We're then able to do stateless transforms. Some things can be extremely parallelized because they're stateless: filtering, mapping, transforming, merging streams together. If I'm working with John's data, merging my shopping cart with my shopping cart history, with my products of interest, with information about what I'm clicking on at that one moment in time, all of those pieces of data can easily be merged together. That can be very efficient and parallelized, and each step of that type of processing can be running on lots of different cores and lots of different servers, because Paul's data or Michael's data would potentially be running somewhere else.

Now we're going to get to a demo, but I wanted to talk a little about it first. It's a continuous-query-with-drill-down demo. We have this kind of lambda architecture, but now it's running on Hazelcast, all in one runtime. The trades that are coming in are being continuously aggregated at the same time, so you're able to see everything coming off of Kafka and also do aggregation and enrichment of the trades with data held in Hazelcast, so that you have a consistent picture of the data.
Oh, maybe I need to end the presentation mode. So this is our trade monitor demo; there will be a link in the presentation, which we'll share with the organizers for you, and it's easy to find on Hazelcast's developer pages and documentation, or on our GitHub. You can see that trades are coming in, and we're looking at all these securities being traded and aggregating: we're basically looking at all the pricing and rolling it up, calculating the total volume, and also analyzing whether the prices are going up or down. One of the things we're also doing is filtering out any trades in securities that may currently be delisted, or that for some other reason are considered securities we don't want to present to our algorithms or traders, so there's an enrichment step happening as well.

What traders will typically want to do, as they start to see that a stock is moving up or down and some big trend emerging, is drill in and see what's really going on. Is it a few big investors making some big trades that are driving that volume? Is it a lot of little trades from small investors? Or maybe it's a lot of little trades from algorithms, and those algorithms are driving a bunch of volume ahead of a big trade they're going to make; that's the one the traders really care about, and the other trades are all just there to trick other algorithms. Most high-frequency algorithms, at least, are not trying to trick people anymore; they're just trying to trick other algorithms.

One of the things you might also want to do, of course, is actually query the data, and we do support that now. You can't really see this data very well here, but it's actually doing a join between data that's in Kafka and data that's in Hazelcast. So we're looking at this and going, ah, well, that's really interesting.
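The rolling-up just described (total volume plus price direction per symbol) can be sketched in plain Java. The types and field names here are invented for illustration, and the real demo computes this continuously over the stream rather than over a finished list:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TradeMonitor {
    record Trade(String symbol, long price, int quantity) {}

    // Aggregate state per symbol: running volume, last-seen price, direction.
    record Summary(int volume, long lastPrice, String trend) {}

    static Map<String, Summary> aggregate(List<Trade> trades) {
        Map<String, Summary> out = new HashMap<>();
        for (Trade t : trades) {
            Summary prev = out.get(t.symbol());
            String trend = prev == null ? "flat"
                    : t.price() > prev.lastPrice() ? "up"
                    : t.price() < prev.lastPrice() ? "down" : "flat";
            int volume = (prev == null ? 0 : prev.volume()) + t.quantity();
            out.put(t.symbol(), new Summary(volume, t.price(), trend));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Trade> trades = List.of(
                new Trade("AAPL", 150, 10),
                new Trade("AAPL", 152, 5),
                new Trade("AMZN", 130, 8));
        // AAPL ends with volume 15 and trend "up"; AMZN with volume 8, "flat".
        System.out.println(aggregate(trades));
    }
}
```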
But I want to go and run a new query, and I want to pull some of that data from Kafka, and we're able to just go ahead and do that. I can go into our Management Center, let me make sure this is connected, and in our little SQL browser we can start to execute this query, and you can see a better view of what the SQL looks like. KFTrades is a Kafka topic, and you can see it here in Kafdrop; the KFTrades are actually coming in as JSON. We're taking the JSON data, pulling it into Hazelcast, and easily combining it with data in Hazelcast that could have been loaded in advance from various other databases, or in some cases written directly into Hazelcast. We have a lot of people who have their online applications writing to Hazelcast, and we also replicate out to other systems: when you write into a Hazelcast map, we can asynchronously write that data out to another database, whether it's a typical SQL-type database or something like Cassandra; we're often used in front of Mongo or Cassandra. You can write to Hazelcast and read immediately back from Hazelcast in sub-millisecond times, and of course you can also take all the data landing in Hazelcast, feed it into streaming logic, and combine it with data coming in from Kafka, so that you can do these kinds of real-time use cases in many different industries.

This gives you an idea of what the data looks like raw in Kafka. Let me jump back over, and we'll take a quick look at just one of those examples. This is from the sample application code. You can see we're taking the trade in, then we get the symbol, and we want to check whether or not this trade is considered a normal security type; that's reference data, an example of how we're enriching the arriving data with other contextual data.
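For a sense of what that Kafka-to-Hazelcast join can look like in Hazelcast SQL, here is a hedged sketch. The topic, map, and column names are invented; the mapping syntax follows Hazelcast's SQL documentation for Kafka mappings:

```sql
-- Expose the Kafka topic to SQL as a streaming mapping (names are illustrative).
CREATE MAPPING kf_trades (
    symbol   VARCHAR,
    price    DECIMAL,
    quantity INT
)
TYPE Kafka
OPTIONS (
    'valueFormat'       = 'json-flat',
    'bootstrap.servers' = 'localhost:9092'
);

-- Join the live stream with reference data held in a Hazelcast IMap,
-- keyed by symbol (__key is the map's key column).
SELECT t.symbol, c.company_name, t.price, t.quantity
FROM kf_trades AS t
JOIN companies AS c ON c.__key = t.symbol;
```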
Then we're able to do some grouping, and then an aggregate, so each of those is going to be a stage that can be parallelized and processed very efficiently. You can see the code; the API takes a little while to get used to, but once you do, it's very compact, and of course incredibly performant. If you're familiar with java.util.stream, and with the Java 8 concepts and syntax, then it should be pretty easy for you to pick up this API. And then of course we also have SQL as an alternative.

I'm actually at the end of a three-week business trip, by the way. I spent a week in Turkey doing an engineering offsite, and then I've been in London and Sydney before here. That's why I put all of the code into PowerPoint: to make it easy to read, but also because then nobody ever asks me, well, why don't you edit that code and make it do something different. I'm a little too sleep-deprived to be coding on the fly in front of a group, so that's my little way of preventing you from stumping me. But we encourage you to have a look and try this out. You can get these examples on our GitHub repo. There are a whole lot of different use cases, everything from flight telemetry types of data, to trading, to an example where we're looking at telecom data and trying to predict whether a customer is unhappy and likely to churn; we've got one where we're analyzing retail data and looking at propensity to buy. All kinds of interesting use cases. There are also simpler examples of just how to read data from Kafka and how to write data out to different destinations. And I always want to give a plug for our excellent documentation as a great source of examples, use cases, and simple bits of code that you can use; the Hazelcast docs are really a good place to go to learn more. We've got information on YouTube, and we have a Twitch channel.
It's going to get reinvigorated now that we have a new developer advocate who's just joined. Let's see, what else? We have Hazelcast Cloud, so you can try the product in the cloud as well. There's free access, forever free, for up to 200 megabytes of data, and I think that's going to go up to a gigabyte with our serverless launch; we'll keep looking to make it easier and easier for you to try. You can install Hazelcast via Maven or Gradle, you can install it by downloading it, you can install it using Docker from Docker Hub, or in Kubernetes; if you use Kubernetes, we have both a Helm chart and an operator. So there are a million different ways to install Hazelcast. There's also Homebrew: you can install Hazelcast that way, along with our Management Center and our command line. However you want to install Hazelcast, we will make it easy for you, and we encourage you to try it.

So I'm going to pause there and see if there are any questions. Yes? Yes, so when I say distributed, I mean there are, say, 3 nodes or 50 nodes in the cluster. The data will be split up into partitions, and those partitions will be spread out over the cluster so that each node has different data than another node. If I were to go to our Management Center and look at the data we have stored, the data is distributed, with entries on different nodes of the cluster, and if I add nodes, it gets spread out further. The computations are also distributed across the cluster. What you're asking about is geographic distribution: we do have cross-cluster replication, so you can run clusters in different places and configure how data gets replicated between them. If I'm writing data here in Singapore and somebody else is writing data in Sydney, each retrieves the data locally, but we replicate it, so that if someone tries to read the data here in Singapore that was written in Sydney, it will be available.
will be available. There's always latency when you start to distribute things across geographies, because again, we're up against the speed of light. I don't know the exact latency between here and Sydney, but you can literally look up whatever the top telecom provider between the two regions will give you. I've had customers ask, "Why can't you replicate any faster between these two cities?" and I'll say, here, let me look it up; in the US I'll look up AT&T or Verizon and say, this is the best latency that two of the top providers can offer between those cities. No matter how much money you have, unfortunately, Einstein comes into play.

We also have people running these clusters not just in a highly available, DR type of configuration, but closer to where the data is created, for edge processing: running inside a petrochemical plant, inside a warehouse where goods are being packed up and moved around, inside a manufacturing facility or a medical facility. There are real-world customers running Hazelcast in decentralized locations, and then they may replicate that data back to centralized clusters to consolidate a view of it. Typically in that case you don't do active-active; it's more like one-way replication from the leaf nodes, if you will, to the central clusters in that edge architecture. We've even got a customer that runs us on a heavy truck: they drive the truck out to an exploratory oil field and it analyzes the data from the drilling equipment. Every single oil rig produces 60,000 events per second. But you don't want to take all 60,000 events and feed them back to the data center, because actually a lot of those events don't change.
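A change-based downsampling filter of the kind described here, forwarding a reading only when its value changes, plus a periodic heartbeat so the central view stays fresh, could be sketched like this (a standalone illustration, not Hazelcast API; the tick counts and temperature are made-up numbers):

```java
import java.util.ArrayList;
import java.util.List;

public class Downsample {
    /**
     * Forward a reading if its value differs from the last forwarded one,
     * or if heartbeatTicks have elapsed since the last forward.
     * Each reading is {tick, value}; returns only the forwarded readings.
     */
    static List<double[]> downsample(List<double[]> readings, long heartbeatTicks) {
        List<double[]> out = new ArrayList<>();
        Double lastValue = null;
        long lastTick = Long.MIN_VALUE;
        for (double[] r : readings) {
            long tick = (long) r[0];
            boolean changed = lastValue == null || r[1] != lastValue;
            boolean heartbeat = tick - lastTick >= heartbeatTicks;
            if (changed || heartbeat) {
                out.add(r);
                lastValue = r[1];
                lastTick = tick;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // A fast sensor repeating the same temperature: only the first sample
        // and the periodic heartbeats get sent back to the data center.
        List<double[]> raw = new ArrayList<>();
        for (int t = 0; t < 1000; t++) raw.add(new double[]{t, 85.0});
        System.out.println(downsample(raw, 200).size()); // prints 5
    }
}
```

A thousand identical samples collapse to five forwarded messages, which is exactly the bandwidth saving the edge architecture is after.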
In the IoT world, values often stay steady. The RPMs of a pump or a motor: typically, once they reach a certain operational speed, they just hold. Temperature readings too: once your car engine warms up, the temperature tends to stay constant. So if I have a 200 Hz sensor giving me 200 events per second reporting a temperature, and an hour later it's the same temperature, I can just aggregate and send a message every five minutes back to the data center saying: yep, same temperature, it hasn't changed. Why would I waste network bandwidth sending the same data over and over? So in that decentralized edge architecture, at the edge you process everything and can respond instantly: this oil rig needs to be slowed down, and if it isn't slowed down, it might fail and cost a million dollars. Meanwhile the central operational view is just: yes, all the oil rigs look good, they're all working properly. So that's a good question, and we do see more and more people looking at these edge architectures and moving data that way.

[Audience question:] Can you say a few words about languages: what is Hazelcast written in, and which languages can you use it from?
Yeah. So Hazelcast is written in Java, and the original APIs were all Java. We have data APIs that let you read and write data in just about every language, and we have SQL support in those languages, so if you combine SQL with every language, you can work across languages; but under the covers you're still talking to a JAR file. Now, we do have a native-memory mode, and that's obviously not all Java: some of it is Java, but data moves into an optimized architecture where we handle the memory management ourselves, and there's some native code there. As we introduce other optimized features, like storage, we might occasionally write a little more optimized native code.

We can also call out to other languages. You can have a data pipeline, and at a certain stage decide to hand the data to a user-defined function, if you will, in whatever language. We have customers doing this, calling out to C++ or C, a lot of Python being used, and also various microservice or REST-type APIs; with gRPC we can call out to other languages that way too. Sometimes it's because we're calling ML models, but in other cases it's just that there's existing code that works and nobody wants to rewrite it; they just want to call it. We generally run on Linux, but we occasionally have customers running us on other UNIX flavors, including a couple of POCs underway right now on z/OS, under UNIX System Services. You can run us anywhere there's a Java 8 or later JDK. Raspberry Pi, by the way: when you get into edge environments, ARM and Raspberry Pi or hardened ARM-based architectures come up, and we're starting to see people asking about ARM in the cloud as well. Intel is a big partner of ours and we see great performance with Intel technology, so we don't push one path or the other; we can run anywhere.
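One common shape for those cross-language call-outs is a pipeline stage posting each event to a small service over HTTP. This JDK-only sketch stands up a tiny "foreign" service in-process and maps events through it; the `/upper` endpoint and all names are invented for illustration, and a real Hazelcast pipeline would do this with its service-calling stages (e.g. `mapUsingService`) rather than a plain stream:

```java
import com.sun.net.httpserver.HttpServer;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

public class RestUdfSketch {
    public static void main(String[] args) throws Exception {
        // Stand-in for an external "user-defined function" service,
        // which in practice might be Python, C++, or an ML model server.
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/upper", exchange -> {
            byte[] in = exchange.getRequestBody().readAllBytes();
            byte[] out = new String(in, StandardCharsets.UTF_8)
                    .toUpperCase().getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, out.length);
            exchange.getResponseBody().write(out);
            exchange.close();
        });
        server.start();
        int port = server.getAddress().getPort();

        // The "pipeline stage": map each event through the foreign service
        HttpClient client = HttpClient.newHttpClient();
        List<String> enriched = List.of("alpha", "beta").stream().map(e -> {
            try {
                HttpRequest req = HttpRequest
                        .newBuilder(URI.create("http://localhost:" + port + "/upper"))
                        .POST(HttpRequest.BodyPublishers.ofString(e)).build();
                return client.send(req, HttpResponse.BodyHandlers.ofString()).body();
            } catch (Exception ex) {
                throw new RuntimeException(ex);
            }
        }).collect(Collectors.toList());

        System.out.println(enriched); // prints [ALPHA, BETA]
        server.stop(0);
    }
}
```

The design point is that the pipeline doesn't care what language sits behind the endpoint; existing code keeps running where it is, and the stage just calls it.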
In terms of the engineering team: I think almost half the company is engineers, certainly at least half if you include engineers and product managers. Most people are building the product; there are just enough of the rest of us, an exec like me or a sales guy like Paul, to get the software out to customers and keep them happy, plus of course a support organization, consultants, and so on. It's a very engineering-oriented, technology-oriented company. It's also an open-core product: the commercial edition offers things like automatic DR, zero-downtime rolling upgrades, advanced security, and other features, but the core product anybody can try is open source. And we actually have a lot of companies using our open source today. Who you're allowed to talk about changes so much from one day to the next that I try to avoid naming names, but if I named our top five open-source users, they'd be companies that would make you go, "Oh wow, I've been using Hazelcast and didn't even know."

Any other questions? In the back, uh-huh. Yeah, so the question is: when you've got all of these distributed steps or stages in a pipeline, how do you tune things when some stages are slower than others, and optimize the overall end-to-end throughput? The number one thing we offer in that area is visibility into the jobs that are running. The pipeline stages are visualized here: each of these little boxes is one stage of the pipeline, and you can see that at a certain point there are parallel paths. At each stage you can see what's going on end to end, but if you pick a particular step, you can also see processing-time information for each step of the pipeline, and
all of that information, the statistics being gathered, is also available via JMX, and you can feed it into other tools. We've actually got an example here where we feed the data into Grafana, but you could also feed it into a tool that's also collecting data from a profiler, so you can combine your pipeline-stage information with profiling information, and that can help you identify what's going on. The other thing you can do is look at where you have computationally intensive stages or other kinds of bottlenecks and start to analyze what you could do to distribute that work further. One option is to look at your data and see whether there's an opportunity to partition it further. For example, I could take all the trades coming in from the Singapore exchange and all the trades from the Sydney exchange, and that might start to hit a bottleneck; but what if I instead keyed on a particular type of security? That could be another dimension for shrinking the amount of data per group so you can parallelize it. Then you could look at other dimensions still, say industry, and figure out ways to parallelize the input to your pipeline. There are techniques where you say: this is the key I want to use for partitioning the pipeline, so that all the data from banks, all the data from manufacturing, and all the data from hospitality and retail get processed separately. Industry might be one way, but you could go all the way down to individual securities as your dimension for distributing and partitioning the data pipelines.
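The re-keying idea, trading a coarse key (exchange) for a finer one (individual security) to widen the available parallelism, in a plain-JDK sketch (the `Trade` record, symbols, and values are invented for illustration):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class Repartition {
    // Hypothetical trade event: exchange, security symbol, notional value
    record Trade(String exchange, String symbol, double value) {}

    public static void main(String[] args) {
        List<Trade> trades = List.of(
                new Trade("SGX", "D05", 100), new Trade("SGX", "O39", 250),
                new Trade("ASX", "BHP", 300), new Trade("SGX", "U11", 50),
                new Trade("ASX", "CBA", 120));

        // Coarse key: only as many groups (units of parallelism) as exchanges
        Map<String, Double> byExchange = trades.stream().collect(
                Collectors.groupingBy(Trade::exchange,
                        Collectors.summingDouble(Trade::value)));

        // Finer key: one group per security, so far more units to spread out
        Map<String, Double> bySymbol = trades.stream().collect(
                Collectors.groupingBy(Trade::symbol,
                        Collectors.summingDouble(Trade::value)));

        System.out.println(byExchange.size() + " vs " + bySymbol.size()); // prints 2 vs 5
    }
}
```

With two exchanges you can never use more than two parallel workers for this stage; keying by symbol gives the runtime as many independent groups as there are securities.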
So a lot of it is about thinking about the data. What's the bottleneck? Is it that I'm feeding data into a machine-learning algorithm, and what does that algorithm need? Or is the bottleneck some really heavy aggregations? At the end of the day, aggregation-type computations are where you typically create a bottleneck. Say I'm analyzing and aggregating all the information about a particular security, and then I eventually want to sum that up across a whole class of securities; or I want to look at all my customers and everything they're doing with data in real time, but for all customers from age 15 to 50. As you widen that, you're adding more data in, and that creates more of a bottleneck. If you can break the work down across different steps of your pipeline, you avoid that, and you only do the final aggregation at the end: you do a lot of parallel processing, and only at the very end do you say, now add that up for everybody within a certain range. There are tools in the product that let you do that kind of aggregation.

That's also a big area for us in terms of roadmap: we want to provide more tools to help you analyze and tune the product. Over the longer term we may eventually build some kind of automated query optimization or pipeline optimization into the product, but the key to that type of automated analysis is that we need a lot of data from our customers about their use cases to figure out how to build it, and that's always the tricky part. If you ask people, "Can you tell me exactly what you're doing with our technology?", a lot of them are reluctant.
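The two-phase aggregation pattern described above, parallel partial aggregates per key followed by a single cheap final merge, in a plain-JDK sketch (the `Purchase` record, ages, and amounts are invented for illustration):

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collectors;

public class TwoPhaseAggregate {
    public static void main(String[] args) {
        record Purchase(int age, double amount) {}
        List<Purchase> purchases = List.of(
                new Purchase(17, 10), new Purchase(34, 40),
                new Purchase(34, 25), new Purchase(49, 5),
                new Purchase(60, 99));

        // Phase 1 (parallelizable): partial sums, one per key (age)
        ConcurrentMap<Integer, Double> perAge = purchases.parallelStream()
                .collect(Collectors.groupingByConcurrent(Purchase::age,
                        Collectors.summingDouble(Purchase::amount)));

        // Phase 2 (final, cheap): merge only the partials for ages 15 to 50
        double range = perAge.entrySet().stream()
                .filter(e -> e.getKey() >= 15 && e.getKey() <= 50)
                .mapToDouble(Map.Entry::getValue)
                .sum();

        System.out.println(range); // prints 80.0 (10 + 40 + 25 + 5)
    }
}
```

Only the small per-key partials cross into the final step, which is the same reason a distributed pipeline wants its wide-range aggregation deferred to the very end.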
They don't really want to tell you every single thing they're doing, and in some cases they're not even allowed to go into much detail. So that's always the challenge with automated optimization: you have to run it against a lot of workloads to make it intelligent enough, otherwise it can automatically make bad decisions, and we don't want that. That's why, for now, we favor giving you the tools to do those optimizations yourself, but also to help you find the bottlenecks. At the end of the day, the quicker you find the bottleneck, the quicker you can optimize.

I wanted to leave you with the QR code again, in case anybody would like to enter the drawing, because our marketing team will be disappointed if we don't make that available. I mean, the fewer people who enter, the higher the chances of somebody winning, but we would love to have you enter, and in exchange you'll learn all about what's new in Hazelcast. All right, well, thank you.

Thanks. We captured a few questions, so let's see if any others are coming. This was, correct me if I'm wrong, very good, very talented. So, seeing this graphic recording, do you have any other questions that come to mind? Impressive. Fine, so, no other questions? Then I actually have one question. When you spoke about your cluster with distributed storage and compute, I'm really happy that you mentioned the speed of light, because that's one of the things nobody understands, especially management. When you have complex company firewalls, F5s, and all that kind of stuff creating latency inside the data center, is there some recommendation on how to install Hazelcast to avoid creating latency? Well, if it's within
a data center or within one cloud region, we have recommendations for that, but in general you want the minimum amount of latency between the different nodes of the cluster. We also do a lot of work to make sure that things like our TLS support use the fastest possible libraries, ones that can take advantage of hardware acceleration. That matters for security too: nowadays more and more people are moving to a zero-trust model, where even within a data center they may want TLS on all communication, so we definitely want to be using the right high-performance libraries, and maybe the right network cards and so on. In a cloud region it's the same thing: you have to pick based on reading the specs, but we have papers and recommendations. Other than that, I'd say that because we're not moving the data over the network, the processing goes to the data, we do minimize the impact of those things.

[Audience question, partly inaudible, about architectures spanning multiple data centers and how Hazelcast handles a network partition between nodes.]

Yeah, so each cluster is replicating and communicating, and if there is a network partition between the nodes, we have to deal with that. We have two different approaches. One of them favors performance and availability, so that's AP-type behavior. Then we also have a CP-based option, so that if you really care that a write cannot be acknowledged without a backup copy, that there can be no data loss and no dirty reads,
then you can use CP. We have what we call our CP Subsystem, which is based on the Raft algorithm, so we let you make those trade-offs. The other thing we offer, even on the AP side, is a way to have some of the logic executed using something called an entry processor. That basically ensures that all writes against a given partition's data are queued onto that partition's thread, and since everything runs on that one thread, an operation can't execute ahead of or interleave with the others, so the entry processor can offer consistency. What you then have to do is tune how many partitions you have based on your compute patterns and your data, so there's some tuning involved; that's another technique. And we're constantly looking at different customer use cases in this area to figure out how to improve this in the future. Pretty much for each object or data type you create, you can choose the AP or CP behavior. There's also another interesting theorem called PACELC, which talks about not only the trade-off between availability and consistency, but also how those choices impact latency. Sometimes people think it's just a choice between consistency and availability, but latency is part of the picture too.

Thank you. Thanks, everyone, for coming. Thanks, John. Thank you for hosting. I'm very excited to actually be doing this face to face again. So we officially close for the internet, but you can still stay a little bit if you want. Thank you, John, thank you for your presentation.