One tip I'll start off with, and if there's nothing else you remember from my talk, remember this, okay? Become data scientists, earn lots of money, retire early, enjoy life. Do it in that order. Don't become like me: old and gray and fat and still working. So there you are.
All right, so the agenda. Very simple: three and a half things to cover. The half is Kubernetes, because I've actually got it running, and it's just to show you what it is. So what we'll look at is turbocharging SQL queries. We'll talk a little bit about Spark. Spark is also an awesome, cool technology. It does a lot of stuff: an in-memory execution engine, MLlib, which you might be working with already, and streaming as well. But it is not a database system. And so Ignite and Spark can work very well together. Think of Ignite as a persistence layer, a sink if you like, for data that can come in from Spark. And actually, you can work with other streaming technologies. So if you're working with Flink, Kafka, any of those things, it's exactly the same. In fact, the Kafka connector was recently certified by Confluent, so there you've got some confidence in the technology.
We'll look a little bit at machine learning. So Ignite does machine learning. It was in beta last year, it went GA this year, and it's being extended and enhanced, and they're adding deep learning to it. So TensorFlow integration is coming in 2.7 any time now. Actually, my boss is Denis Magda. Denis is the guy who runs the project at the Apache Software Foundation; he's the PMC chair. I've got a whole lot of Jira tickets I need to clear up when I get back to London, but at the moment, since I'm on the road, I can put it off for a while. And we'll look briefly at Kubernetes as well. Very, very useful for DevOps. All right then.
So as the Americans say, here is the kind of 30,000-foot view of what Ignite is, the big high-level architectural view of the technology, and at its core is this thing here: memory-centric storage. So Ignite was started by GridGain about 10 years ago, classified, I guess, as an in-memory data grid. And there are other in-memory data grids out there as well. So you have Oracle Coherence, for example. You might have heard of it. Hazelcast, you might have heard of it. There's another Apache project as well.
It was designed to solve two problems: scale and performance. The scale comes from the fact that it's cluster computing. Just add more resources. It's a peer-to-peer system. Add more nodes, you get more power. The performance comes from the fact that it uses memory super efficiently. So you're running operations at memory speed. You do not have to go to disk to load stuff.
But they added a persistence capability as well. Because the idea is, suppose you have a cluster of 10 machines and everything is held in memory. Let's say something catastrophic happens. Say an asteroid hits your data center. All your data is gone. It's all lost. It was in memory. Now you can persist that data, okay? So Ignite can save the state. The other benefit that you get from this is that if you've got, say, 10 petabytes of data, it's highly unlikely you'll cache it all in memory, unless you're Facebook, Twitter, or Google. And probably none of you are. It's expensive, right? So save most of that data to disk, and cache what you need to work on.
The other thing is that it works really well with other persistence technologies, particularly if they are transactional, relational systems. I'll show you a quick demo in just a moment. So the idea is that you have invested time, money, and effort in an existing business system. Chances are you're using Oracle or MySQL or Postgres or one of these major relational products. What Ignite can do is cache data from that system in memory.
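As a rough picture of what caching a relational table in memory involves, here is a minimal sketch in plain Python. It is not Ignite's API: `WriteThroughCache` is a hypothetical helper, SQLite stands in for the backend database, and the population figures are made-up sample data. Reads are served from a dictionary, and every write goes to the database first and then to memory.

```python
import sqlite3


class WriteThroughCache:
    """Hypothetical helper: an in-memory dict kept in step with one table."""

    def __init__(self, conn):
        self.conn = conn
        self.entries = {}

    def load_all(self):
        """Warm the cache by reading every row from the backing table."""
        self.entries = dict(self.conn.execute("SELECT name, population FROM city"))
        return len(self.entries)

    def get(self, name):
        """Serve reads from memory, with no trip to the database."""
        return self.entries.get(name)

    def put(self, name, population):
        """Write to the database first, then to memory (write-through)."""
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT OR REPLACE INTO city (name, population) VALUES (?, ?)",
                (name, population),
            )
        self.entries[name] = population


# SQLite stands in for Oracle, MySQL or Postgres; the figures are illustrative.
backend = sqlite3.connect(":memory:")
backend.execute("CREATE TABLE city (name TEXT PRIMARY KEY, population INTEGER)")
with backend:
    backend.executemany(
        "INSERT INTO city VALUES (?, ?)",
        [("Singapore", 5600000), ("London", 8900000)],
    )

cache = WriteThroughCache(backend)
cache.load_all()
```

After `cache.put("London", 9000000)`, both `cache.get("London")` and a direct `SELECT` against `backend` return the new value. Writing to the database before updating the dictionary means a failed write never leaves the cache ahead of the backend.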
Plus, you can run operations at memory speeds, and it will keep the cache and the backend system in sync. And it will do that for you automatically. It generates all the plumbing, all the infrastructure for you. You, as a developer, do not have to do anything. The only things you need to plug in are your credentials: username, password, the port number, and the IP address of the server.
Now quickly then, some of the features. So it does SQL. Whether you like SQL or you hate SQL, the fact of the matter is SQL is intergalactic dataspeak. It's not going away any time soon, right? I'll tell you this: if you want a good, skilled SQL developer, you can find one. Very reasonable, easy to do. So it does SQL. It's a key-value store, and the value can be anything: simple types, character, integer, floating point, or something that you define, for example a financial instrument or a healthcare record.
Ignite does transactions. Let me give you an example of why transactions are important, particularly ACID transactions. So I'm a great fan of MOOCs, massive open online courses. Early this year, I was on the website for one of these merchants. I mean, there's lots of them: Coursera, edX and so on. I got my credit card out because there was a course I wanted to do, put the details in, hit the pay button, and it came back and said, there is an error. Okay, I thought, there's an error, the transaction didn't complete, it doesn't matter. Some time later, I looked at my credit card statement and I noticed that I'd actually been charged. I go to the merchant and they say, we haven't received the money. The bank says, you paid for this. Where's the money? Who's got it? Right? There is an example of a poorly designed system. That should not happen. If I want to send a hundred Singapore dollars from my bank account to your bank account, my account needs to be correctly debited and your account needs to be correctly credited. It's an atomic unit of work.
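To make the bank transfer concrete, here is a minimal sketch of an all-or-nothing transfer in plain Python, using SQLite's transactions rather than Ignite's transaction API. The `account` table, the account names, and the `transfer` and `balances` helpers are assumptions made up for this illustration.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one atomic unit of work."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
            (balance,) = conn.execute(
                "SELECT balance FROM account WHERE id = ?", (src,)
            ).fetchone()
            if balance < 0:
                # The debit above is undone by the rollback.
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
        return True
    except ValueError:
        return False


def balances(conn):
    """Return {account id: balance} for every account."""
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
with conn:
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("mine", 150), ("yours", 0)])
```

With these starting balances, a first `transfer(conn, "mine", "yours", 100)` succeeds and moves the money. A second identical call fails the funds check, and the rollback leaves both balances exactly as they were, so there is never a state where one account has changed and the other has not.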
We can't have one operation happen and not the other one. Either all of it happens or none of it happens. Ignite will guarantee these types of ACID transactions. Very, very useful. 50% of the use cases for this technology are from the financial world, because of the SQL support, fast performance, and transactions. This is really, really useful. It does the other types of transactions as well, optimistic ones, but there's no time to discuss them.
Compute and services we'll skip. Streaming: if you're working with Spark Streaming or any of these streaming technologies, Ignite will happily work with those. There are adapters and connectors for that. You can do things like complex event processing. You can define windows. You can say, all right, I'm interested in the last five minutes' worth of data, or I'm interested in the last 10 events. As that data arrives, you can process it quickly, throw it away if you need to, or store it if you want to, and you can then read in the next batch of data.
Machine learning. That's new. Deep learning, machine learning: GA this year. It's really extending and expanding considerably. You just need to sign up to the user and dev lists and you'll see all the activity going on.
Okay, so Postgres. I will show you a quick Postgres demo to show you how this actually works. All right, now, Murphy's law. Who's heard of Murphy's law? Anything that can go wrong will go wrong at the worst possible time. So no guarantees this will work, guys. I'll do my best. Okay, there we go. That's life. Sometimes these things do go all right. I'll have to bend down. Sorry about that, if you're trying to capture me on camera, but there we go. All right, so Postgres. Now, can I blow this up? Yes, great. There we go. So I have this world database that I'm running. It's very simple. It's countries and cities and population and basic information like that.
And what I've done is I've taken that schema, and Ignite has created a project for me automatically, and I'll show you where you can get the information to do this yourself. We don't have time to show you today. But then, oh, thank you very much. All right, that's much more comfortable.
And then what I've simply done here is read in the project. So all of the code there is auto-generated by Ignite. I've simply had to plug in my credentials, the server IP address, and the port number, and that's it. So now what I can do is start a server node. Here we go. So "run Ignite server" is down a bit, and you'll see this window pop up. What's happening now is I am actually launching a cluster. Not a big cluster; it's just one machine, but it's enough for our purposes. We could launch more. And the way this is configured, obviously, it's running locally on my laptop within the IDE, and it's configured to hold everything in memory.
All right. So that's up and running. I'm going to load the data now. So it's going to connect to the Postgres database, take the data from that database, and cache it in my cluster of one machine. All right, that's fine. It's actually super fast, because there's not a lot of data to work with. If we have a look down here, you can see it's done it fairly quickly. It's not a very big database.
So that's done. Now my data are in memory. Let's switch over here. This is a REST-based interface. It's just a browser, and I've got a little bit of software running in the background that this browser connects to, and that software connects to my cluster. Now I can look at stuff that's going on in my cluster. One of these is this project that I've got, which, if you look closely here, is just standard SQL. There you go. Let me just drill in and have a look.
So here we've got a sort of query: name, max population as max pop, from country, group by name, and so on. Standard SQL, SQL-99. If we run this, we get back some data: China, India, United States, the three most populous countries. We can do things like this. So I can change all occurrences of USA, for example, to United States. And again, if I run that now, it's just going to give me back... oh, sorry. Just try that again. There we are. So it's just giving back a result of one to say it was successful. And I can change it back. There we go. And then just simple select queries like this as well. Just look for cities named London, and we find there are actually a couple. So we've got London, Ontario and London, England.
Now, what's happening here is that data is being modified in memory. As it's being modified in memory, it's being pushed back to Postgres. The two are kept in sync. Pretty cool stuff. Very, very useful, because the idea behind this is no rip and replace. It's highly unlikely you're going to throw away your existing business system just because for maybe 10% of cases it doesn't work very well. So in those situations, something like Ignite can really help, because it can boost the performance of queries. You're running things at memory speeds. You add more resources as you need them. Elastically scale up, scale down, and this thing will work anywhere. Anywhere Java will run, it will work. You can run it on a Raspberry Pi if you want. But it will work in the cloud as well. You can configure it very easily. Really super simple to run. Actually, I've got very few slides to show you. It's the demos that take more time. That's the killer in terms of time.
All right. So Spark. And you guys might be working with Spark. So the integration with Spark is very tight. You have this notion of shared RDDs and DataFrames. So the thinking here is that, okay, you're working with Spark.
Spark applications terminate. You might want to save that state and data somewhere. And you can do that now with Ignite. So an Ignite cluster can save that state. When Spark dies, the state and data still exist. Spark can be relaunched again. Or you can have third-party applications connecting here as well. And that's very, very useful. Again, this integration is very tight. It saves you guys a lot of time and effort in terms of working on this. And the thing is that you can do useful stuff as well. So boosting performance, and running SQL on top of this data as well. That capability is provided. There's no data movement. In-place query execution; you get lots and lots of benefits from this approach.
The other thing that I mentioned a little bit earlier on in terms of Spark was the streaming capability. And let me show you a quick example of how that works. All right. So let's just bring these down a little bit, so we've got fewer things to distract us. I need to terminate that server that was running there. That's fine. And we'll minimize this as well. We don't need this anymore. All right. Almost done, guys. A couple more slides. A couple of quick demos.
So the first thing I'll do then is, again, I'll launch a cluster. It's going to be a very simple cluster of one node. Again, as I said, Murphy's law. So we don't want to push the machine too hard. I don't know what its behavior is going to be. I've had some problems and I complained to my boss. So he says, get yourself a Windows machine. So maybe I'll do that. Macs are great. I've been using them for a long time, but seriously, guys, sometimes I wish Apple wouldn't make the decisions they make. These keyboards are terrible. And all this fancy graphics stuff is rubbish. There you go. But that's me. All right. Anyway. So we've got a cluster of one running again. And actually, why don't we launch another one? We can launch another one. So we'll do two nodes.
Okay. And these nodes will just find each other. It's just a configuration file. They pick up the same data, and they're able to join and form a cluster. And now, if we look, we can see here, and I'll try to zoom in for you, if you're sharp-eyed, you'll see it now says servers=2. So there are two servers running. And they're just sitting there, waiting for work to come their way.
All right. So the next thing we'll do is launch a sensor generator. And you are going to be seriously underwhelmed when you see what it says. There you go: the sensor sample generator is up and running. What is it doing? It is generating random data to simulate IoT. Could be wearables. Could be some sensors. Lots and lots of data being generated. But it's going nowhere. It's writing it to a socket and it's just disappearing. Nothing's happening.
Now, what we'll do is launch Spark. So Spark will connect to that socket, read the data, and do a little bit of processing. Let me zoom in a bit more. Now notice there are lots of INFO messages, and that's just log4j. Don't worry about that. It's very useful. Don't disable it. It gives you lots of useful information to look at, because you want to know what's going on, right? But for our purposes, essentially what Spark is doing now is taking the data and streaming it in to that Ignite cluster. So Ignite is kind of the endpoint. It's consuming that stream of data now.
Obviously, once the data are in Ignite, we can do something useful with them. So let's minimize this and let's query that data quickly here. So now we've got another window open. Keep your eyes glued to the screen. Watch this space carefully. Periodically, you will see a table pop up, a relational table. What's happening is that Ignite is now doing some processing on that data and it's pushing some output to the console. There.
And it'll just loop forever. It'll just keep going and accumulating the values, and it's giving us the top 20 values. All right. So there is a simple example, a proof of concept, of the ease with which these technologies can be integrated: how Ignite can act as a sink, Spark can stream data in, and Ignite can consume that data and then process it and do useful things with it. Analytics: we could run machine learning, we could run SQL code on that.
Okay. All right, guys. So I'm nearly out of time. So let's kill this one. I think the easiest thing to do is just close it and terminate. It'll ask me a couple of times. There are a few processes running. There we go. I need my resources back. There we go. Finally.
All right. So here we go. Machine learning. The thinking here is that as a business, typically you're going to have large quantities of data. As data scientists, typically you work with training and test data. Maybe you build your models on a laptop. That's okay. But think about real-world problems. Think about a banking environment where you've got millions of transactions coming into the system. You want to detect whether these are fraudulent or non-fraudulent, right? You want a classifier that can run in real time and give you some indication of probability, some possibility for you to identify these very, very quickly and stop them in their tracks. For these kinds of cases, you want to scale the system up. And you've got lots of data that you're streaming in. They built these algorithms from the ground up to work in a distributed manner. Large-scale parallelization. Again, you're using the power of the cluster and all its processing capability to really help you in terms of doing analysis of this. In the past, you had to do ETL. That's awful, right? Imagine you've got 10 petabytes of data in an Ignite cluster and you have to ETL all of that out to be able to do machine learning. Not very useful.
Now it's all possible in place, okay? Very quick demo, and then we'll just wrap up. So let me exit from here and show you an example. If you download the source or the binary distribution from the Apache website, as part of that you get lots and lots of examples. They ship everything with it. And one of the libraries that comes with it is this machine learning library. So you get clustering, k-NN, preprocessing, regression, SVM, trees. There's lots more stuff here.
The simple example that I will choose here is this genetic algorithm. So this simulates biological evolution. It's got chromosomes, genes, mutation. You're trying to arrive at a target string, which is "hello", space, "world", right? And my seed, what I'm starting with, is the letters A to Z and the space character. That's what I'm using. And then the system just goes away and iterates over this to find out how many generations it takes to actually arrive at the answer. It varies from run to run. So we'll kick this off and let it run, because it takes a little bit of time, and just see that it starts. And then I'll wrap up with the slides, and we'll come back and have a look at this in a minute to see where it's got to. All right? So there we go. It's on its way now. It's running.
And let's skip to the last slide here. So Kubernetes. Very, very quick demo. I had to set this up a little bit earlier. The problem here is that I'm using this technology called Minikube, which is a great way to run a kind of Kubernetes cluster locally on your laptop, right? But the thing is, it's version 0.28.2 or something, right? You have to be careful. It's very, very great. So I've got a couple of pods. Each pod is running an Ignite server. So here you can see there are two pods. And if you drill down and have a look, now you've got Ignite running as a service inside Kubernetes. So here you can see, again, just to show you, there is an Ignite node.
And if we zoom in and just have a look, there it tells you there are two servers running. So one pod contains one Ignite server. Very, very simple setup. Awesome for DevOps people, because you've got a standard set of commands now to manage this kind of environment. It doesn't matter which cloud you're using, whether it's Oracle or IBM or Microsoft or AWS or whatever. Kubernetes will work across the board. And now you have a standard way to manage your cluster, run operations, scale it up, scale it down, whatever you want to do. And it will work the same way. I mean, this is really, really useful for you guys. So here's an example of Ignite working in the Kubernetes environment.
All right. So very quickly then, let's have a look. Where did our genetic algorithm get to? So here it tells me it took 320 generations to actually arrive at the answer. And it tells me the chromosome, the fitness score, how many genes, and so on. And this is the string that we were looking for: hello world. Right? Now there's lots of complex stuff there as well. There's a knapsack problem, all sorts of other things that you can look at in your own time. But this is a kind of good, simple example to show you guys.
And let me just wrap up then. So in terms of Apache Ignite, these are numbers from the Apache Software Foundation. We are number one as far as the developer mailing list is concerned. If you have a question, no matter how simple or how complex it is, please, please post it there. Awesome community. They will jump in and help you and answer your questions straight away. We are number two for the user mailing list and number four by the overall number of commits that we have. Only Hadoop, Ambari, and Camel are ahead of us. And there are over one million downloads per year. All right? And remember, everything I've shown you today is free and open source. There is no cost associated.
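Going back to the genetic algorithm demo for a moment: the idea fits in a few lines of plain Python. This is a minimal sketch under my own assumptions, not the example that ships with Ignite. The population size, mutation rate, selection scheme, and the names `fitness`, `mutate`, `crossover`, and `evolve` are all made up for illustration. Only the target string and the gene pool (A to Z plus the space character) come from the demo.

```python
import random
import string

TARGET = "HELLO WORLD"
ALPHABET = string.ascii_uppercase + " "  # gene pool: A to Z plus space


def fitness(chromosome):
    """Count the positions where the chromosome already matches the target."""
    return sum(a == b for a, b in zip(chromosome, TARGET))


def mutate(chromosome, rng, rate=0.05):
    """Replace each gene with a random one with probability `rate`."""
    return "".join(
        rng.choice(ALPHABET) if rng.random() < rate else gene for gene in chromosome
    )


def crossover(mum, dad, rng):
    """Single-point crossover: head of one parent, tail of the other."""
    cut = rng.randrange(1, len(TARGET))
    return mum[:cut] + dad[cut:]


def evolve(pop_size=200, max_generations=5000, seed=None):
    """Return (best chromosome, generation at which the search stopped)."""
    rng = random.Random(seed)
    population = [
        "".join(rng.choice(ALPHABET) for _ in TARGET) for _ in range(pop_size)
    ]
    for generation in range(1, max_generations + 1):
        population.sort(key=fitness, reverse=True)
        if population[0] == TARGET:
            return population[0], generation
        # Keep the fitter half unchanged and breed the rest from it.
        parents = population[: pop_size // 2]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    population.sort(key=fitness, reverse=True)
    return population[0], max_generations
```

Calling `evolve()` without a seed gives a different generation count on each run, which matches the run-to-run variation seen in the demo. The `max_generations` cap is only there so the sketch always terminates.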
Now, if you want more than the free stuff, please talk to Alan and his colleagues. They will help you. If you want to pay us money, you know, of course we'd like your money, but my mission is just to tell you about the open source. And there we are.
And finally, I think, here's where to go for Ignite: ignite.apache.org. And remember, I said right at the start I would show you where to find a couple of screencasts. If you look in the top right there, it says screencasts. If you drill down and scroll to the bottom, there are three videos, 10 minutes of your time. Just go down here. This uses MySQL and shows you how to set up that project, generate the code for you, connect to a database, and then use that caching capability directly. It's awesome.
I'm on LinkedIn. Feel free to reach out and connect with me. And yes, that really is my job title. There we go. The Chinese is there because my wife happens to be Chinese, so I have to keep her happy. I'm also on Twitter, so feel free to reach me there if you so desire. You're very welcome. Any questions? My email is just firstname.lastname at gridgain.com. As I said, Albin and his colleagues are here as well, so please chat to them if you want to know more about GridGain, the company, and the products and services that they offer. If there's something specific that you would like to know about the open source, please reach out to me. I'd be very happy to assist you as best I can. Other than that, I thank you very much.
So, right. All good? Any quick questions? Maybe one? Yeah, take one. Yeah, yes. A simple question? Right. Okay. So the question was, what's the difference between Ignite and Memcached? So Memcached is a great kind of caching technology as well. But remember, Ignite does far more than just caching.
It does transactions, it does SQL. It can persist data as well now. It integrates with and provides streaming capabilities. You can do microservices on it. Now, can Memcached do all of that as well? I don't know. I don't think so. I mean, think of Ignite as like a Swiss Army knife: it has a range of capabilities. You don't have to use all of them. You can use one of them or two of them or all of them. It's like the saying in English: the whole is greater than the sum of the parts. So these components are integrated. If you want to use machine learning, for example, on streaming data, you can do it. You don't have to go to a third party to do it. And so really think of Ignite as far more capable than just a caching technology, far more than what Memcached can do, or what just a key-value store can do, or what just a relational database can do. I hope that answers your question. Okay, thank you.
Yes, question. The question is to ask a very funny question. Yeah. So I myself appeared around like 50 to 200 machine learning models in the last time. So I always have an issue with regression models, right? Because of the interaction consideration, and then if you actually have eight variables, you know that the permutations of interaction can go on to n permutations. What you have on there is this crash. Usually we try to do factor analysis. But we don't use the factor score, because if you use the factor score, it will go into issues on my campaign because the predictors cannot actually have correlation thereafter. Now, you only have regression. Do you have something to cure interaction?
My advice to you is to post to the dev list, okay? Those guys will be able to help you. We've got an entire machine learning team at GridGain that is basically doing the development work on this. Those guys know far more than I do. I know the point you're trying to make, and I understand.
I think the key thing in this is always that as developers, as data scientists, we have to make some choices, even in building the architecture. Things like distributed systems are hard, okay? I mean, you can't solve all your problems. Ultimately, you have to make some decisions about how you're going to do your modeling, the pluses and minuses that you see within each of these implementations and libraries, and how you're actually going to run your code and your models. Those are choices that you'll have to think about.
How about decision trees? Oh, decision trees, all of that stuff is supported. That's good. Yeah, yeah. I mean, the GA example is a super simple one, because it's quick to run and, you know, within about three or four minutes you get back an answer. Other stuff I could run might potentially last too long. So my goal was simply to show you that this idea of genetic algorithms is kind of a useful one for some business problems. It's like all algorithms. If you know how to apply them and the kind of business problem that you have, you will, as a data scientist, which I think you are, understand the strengths and benefits of the different algorithms, and you will know straight away: okay, it's better to try this one than that one, because I know that one has some limitations. Or the limitations may be in the product's ability to run those efficiently and effectively.
All I would simply say to you is keep in mind that, yes, you've made a great point, in that you could scale to the nth level and that could cause real problems. Ignite will scale horizontally. You can add more nodes, but even then you can still hit limits in terms of what you're trying to achieve, and it won't help you. It's a decision that you have to make, and ultimately you decide whether you want to run it.
And the thing is, Ignite won't stop you, all right? It won't stop you from doing that. There are cases where people have made really serious mistakes. I'll give you a very, very quick example just before I finish up. One of the really useful features that Ignite provides is this thing called peer class loading. Essentially what that means is that as you modify code, Ignite can automatically push that code out to the cluster for you, so that you don't have to take your cluster down and bring it back up again. That can be time consuming. Now, in a test environment, that's okay. In a live environment, please do not do that, okay? Someone tried something similar in a live environment. They set all of people's bank balances to zero. The bank had an awful time just rolling back and trying to recover some state, all right? So note that this is not Ignite; it's just software, right? We are human beings. We make mistakes as well, but ultimately it's the choices that we make in terms of implementations and which algorithms we want to apply that really determine how useful the technology is. But great question. Thank you very much for that. And I'm out of time, guys.