Here we go. Thank you, Saktis. Hi, everyone. Thank you for joining tonight. Thank you, Saktis, for organizing the session, and I also thank Microsoft for hosting the talk tonight. I'm quite excited today because I get to talk about my three favorite topics at the same time: big data, in-memory analytics and cloud computing. To do that, we will tell the story of an experiment that we at ActiveViam did with Microsoft. We wanted to explore new possibilities for in-memory computing in the cloud, and yes, we ended up creating a system that can operate 4,000 cores and 60 terabytes of memory, and that you can start from nothing to usable in less than half an hour. So we'll tell you everything about it, all the steps to do it yourself if you want. And I think that as we go, it will send a strong message: you can add up the advantages of in-memory computing and of the cloud. The fast, interactive analytics of in-memory computing together with the agility and the cost efficiency of the cloud. And this combination, I think, is the foundation of something new, a new model to build the next generation of operational applications. So I'm Antoine, I'm the head of R&D at ActiveViam, and with my R&D teams in New York and Paris, we build the ActiveViam analytical platform. It's a very powerful platform that helps make faster and better decisions, to keep it short. And I came today with Nidha Abouzid, my colleague and the technology manager of the ActiveViam office in Singapore. Nidha and I joined ActiveViam at the beginning, 10 years ago. It was in Paris, and we wrote the first lines of code of the product together. We did the first client implementations together. But then, seven years ago now, he left Paris for Asia and he established the technical team of ActiveViam in Asia Pacific, and he has been leading it since the start. So we thought we would do the presentation tonight in two parts: why, and how. Why did we do this, and how to do it. In between, we have pizza. So: why, pizza, and how. In the why section, I'll just spend a few minutes going back to in-memory technology as a trend, the change it brings and the new use cases that benefit from it. But then we will quickly jump into the heart of the presentation, which is the step-by-step tutorial to do it yourself at full scale and operate thousands of cores and tens of terabytes of memory, all that with the agility of the cloud. Let's get started. So how important is this in-memory computing story? After all, the idea of loading data in memory to process it is as old as computer science. Yes, but it used to be one little chunk of data at a time. Between the 80s and today, you may know that the price of memory has been divided by a million. It's a million times cheaper to use RAM today than it was in the 80s. So of course, at that time, nobody could load an entire data set into memory. Well, you could dream of it, that's all you could do, but now everyone can do it. And it's not even an investment anymore, because you can get with a finger snap an instance in the public cloud, here in Asia, that has as much as two terabytes of memory in one instance. That's the current maximum. No planning, you can get them on the fly. And when you think about it, two terabytes of memory is probably enough to handle 90% of all the data sets in the world. Not web indexing, not Facebook, but for the rest of us.
90% of the data sets in the world would fit into two terabytes of memory. So it is no wonder that all the technologies that we know and have used before are now being redone with in-memory computing. There are a few examples on this slide. For instance, when it comes to OLTP, very fast, small transactions, VoltDB was one of the first to introduce in-memory technology to accelerate and improve OLTP workloads. And I'm sure you've heard about SAP HANA, which does it for analytical queries. There are also other technologies such as key-value stores, Redis for instance, but also Coherence or Hazelcast, which use in-memory computing to give you low-latency access to data elements. And finally, let's not forget Apache Spark, the very famous batch processing framework that accelerates batch processing on big data by keeping intermediate results in memory. So everything you could do before, you can now do faster with in-memory computing. But is that the end of it, doing the same things as before, just faster? I don't think so, and I think there is another way to look at it. I've put up a few examples from my experience of what the benefits of in-memory computing actually are in real-life use cases. All businesses, little and big, have their rules, their formulas, to quantify how well they run, to quantify their opportunities or their risks. But very often those calculations are run at the end of the week, at the end of the month, before the board meeting maybe, and in an Excel spreadsheet. It's very common. But what will happen when those organizations realize that those calculations they used to steer the business, they could now run them all the time, continuously, on live data? That's what in-memory computing brings. It's not about accelerating the practice that already existed. It's about changing the way you do things: introducing real-time notifications, calculations on the fly, what-if analysis, alerts. Let's start with an example. The banking industry is a big user of in-memory computing. With in-memory computing, traders can have a more precise and more real-time calculation of their risk. For them, it means that they can operate closer to the limit; they can take a bit more risk, for better margins, of course. But beyond that, with the new ability to test decisions directly on the live data, what-if analysis, when they are about to make a trading decision, they can test several options, see the outcome of those decisions instantly, and take the best one. And it's not just for traders. Risk controllers in banks, who review and monitor the risk indicators of the bank, can also do their job faster and with fewer mistakes, because with in-memory computing they can run the calculation on the fly as many times as they want, they can look at the steps of the calculation, and they can go to the level of detail that they want. That's how they improve their productivity. And if we look at retail or e-commerce websites, there are marketing teams in charge of setting product prices. They set the right price for the products to be sold, and for that they use business rules and formulas to determine the right price. It involves maybe the stock level, the prices of the competitors, a very important input of course, the time of day, the time of year, and web analytics activity.
All of that goes into a big formula that gives you the right price. And of course, with in-memory computing, every time any of those parameters changes, they can rerun the pricing algorithm again and again, continuously. That's how they enter the dynamic pricing era, this new field that was pioneered by Amazon.com, of course, but that is now spreading to all organizations, e-commerce and online, but also brick-and-mortar retailers. And let's look at one last example in the field of supply chain. Here, for instance, food retailers use special tools to plan in advance the delivery routes from the warehouses to the stores. So it's planned in advance, but with in-memory computing they get the new ability to make fast and informed changes to those plans. For instance, I will tell you a story that a friend working for a big food retailer in the US told me. He told me that when the weather forecast announces a sudden cold snap, in Northern California for instance, it is statistically known that when the cold comes, people will drink more hot cocoa. So there is an opportunity to put more cocoa on the shelves of the stores in that region. But the question, when you are operating the business, is how do you do it? Will you reroute some trucks that were going somewhere else? Or will you shift stock from another store in the same country? And of course it's much easier to take the best decision when you can calculate the outcome of each option. I could give you many more examples of how in-memory computing can change the way you do things in different businesses, because I draw them from what our customers are doing with the ActiveViam technology. But if you think about just those three examples, they have something in common. They have all operated a transition from batch processing to intraday applications, from static, pre-canned reports to interactive analysis, and from spreadsheets to collaborative environments where the different people who run the business can collaborate. And enabling this transition is really what we do. In fact, it's the best introduction to ActiveViam: we make the technology to do this transition. The ActiveViam analytical stack is a comprehensive and powerful suite. The primary component, at the top, is called ActivePivot. Maybe you've heard about it before. It's an in-memory analytical database that is quite unique in the sense that it fuses database technology, filters, group-bys, aggregations, with a true calculation engine, something that is more like HPC. And that's a key ingredient of in-memory computing, I believe, because if you think back to the cases we just looked at, it's not BI, it's more than BI. There are actual calculations: there were SLAs for supply chains, risk indicators in finance, price formulas for dynamic pricing. All of that is more than BI and requires an actual calculation engine combined with the database technology. So that's what ActivePivot is. And over the 10 years we've been working hard on this, we have become one of the leaders of in-memory computing, in particular in finance; we are the industry leader for in-memory computing in finance. And being recognized as such has opened up key technology partnerships for us.
For instance, we are working with Oracle, with the people at Oracle who make Java, because ActivePivot is based on the Java platform. We have access to the people who build the JVM at Oracle, and for instance Java 9, releasing soon, contains major improvements that we asked them for and that make it easier to run ActivePivot and in-memory computing on the JVM. And it also opened up key partnerships with the big cloud computing players, in particular Microsoft Azure. What we are trying to do is bring in-memory computing to everyone, to lower the barrier to entry for this technology so you can use it on your own use cases. Because we believe that by combining the strengths of cloud computing with the strengths of in-memory, we will create something new. If you're here tonight, I guess you already know the advantages of cloud computing. To me, the most important points are the agility and the elasticity. The agility of how you can get resources: you can get a server with two terabytes of memory in a few seconds, while before it might have taken months in an organization to get that server. And the agility of the applications that you run, because you can start them on demand, run them on the hardware only for the time they are being used, and shut them down. And of course the elasticity of the platform, where, if your instance is too small, overnight you can replace it with a bigger instance. Or the elasticity that is even more real-time, which is managing a cluster where you can add or remove nodes during the day to match peak capacity and stay cost-efficient. So we all know the benefits of the cloud, but maybe for those who haven't been playing with cloud computing so much, all of this is still abstract; those are just words. What we really wanted to do was to demonstrate this combination on real stuff, on a real project. And so did Microsoft Azure, because they have a long-term investment in high-performance computing in the cloud. That's why they partnered with us, and that's why they opened up access to their expertise and their resources. Here is what we did with them. We took a big data use case: backtesting and trend analysis in finance on a historical dataset. We took a 400-day historical dataset, 100 gigabytes per day, for a total of 40 terabytes of data. With 40 terabytes of data, which is big data territory, if you want to do better than just batch processing on your 40 terabytes, you need in-memory. Period. There is no way you can work interactively on 40 terabytes without in-memory computing. But at the same time, buying and operating enough servers to process 40 terabytes in memory is very expensive. It can be a showstopper for most organizations. Instead, what we should be able to do is rent it. Let's not buy those servers; let's rent them just for the time needed to do the analysis session. And of course, at this scale, it sounds impossible: using 40 terabytes and thousands of cores just for an hour and then shutting it all down sounds like too much work. But, big spoiler: in fact, it's possible. It's possible today, we did it in Azure, and you can do it in Azure too. To load the 40-terabyte dataset into memory and be able to run interactive queries on it, it took us a cluster of 128 servers, some of the biggest servers available in Azure, instances called G5. You may know the G5 instances.
They come with about 500 gigabytes of RAM and 32 cores each. So overall, this system has more than 4,000 cores and 60 terabytes of RAM. And you will see that in Azure it is possible to start this cluster from nothing at all, just the data stored at rest on cloud storage, and start everything on the fly until it's ready for analysis in less than half an hour. Why so long? I don't know, is there anyone from Microsoft in the room? It could be improved. But that's setting the record, so maybe next year. So we will now enter the heart of the presentation, the step-by-step walkthrough of how it can be done. For that, I will hand over to Nidha, who will start with step one: how do you start that many instances on the fly? I'll give you the price per hour of the cluster along the way. So Nidha, handing over to you, and Nidha will show you that you can start 128 instances in the cloud with just one line of code.

Thank you, Antoine. Indeed, these 128 instances were started with just one line of code, but as you will see, this line of code takes about three slides. As you can see here, we used the Azure Java SDK to start those 128 instances. We started by defining the network interfaces, and then we defined the virtual machines. We built our own virtual machine image, based on Linux of course, where we installed our software, ActivePivot. And this is how we deployed the 128 instances, based, as you can see, on the standard G5 Azure instance. The last part is just to validate that everything is up. So this is how we started all our instances. Of course, when each instance starts, the VM turns on, and then the ActivePivot in-memory database is started as well and begins pumping the data from blob storage. Let's talk a bit about what blob storage is. Blob storage is a solution offered by Microsoft Azure to store your data. This is what you attach to your VM instance, and it is organized around what is called a storage account, which is the main entry point to blob storage. Every account can hold a limited number of containers, and every container can hold a limited number of blobs. A blob is basically your file; it can be any type of file. And containers are there if you need to group files by type: videos, music, CSV data, XML, and so on. So this is what we used to store and load our data. Now, the first challenge is to source the data as fast as we can. If you recall, we had to load 40 terabytes of data. If only one instance has to load those 40 terabytes, with the bandwidth we were getting on the instance side, about 50 megabytes per second, we would load everything in about nine days. That was not acceptable, of course. You can do it with this code: this is the basic naive code that you will find when you visit the Azure website and ask how to use Java to download data from blob storage. The naive approach is fine if you only have a few megabytes of data to load, but you have to be smarter than that if you want to make 40 terabytes available in 30 minutes. So we turned to the old recipes. An instance connects to blob storage through HTTP, so instead of having one single connection, we decided to saturate the bandwidth of the VM instance and to open as many HTTP connections as we can.
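To make that multi-connection idea concrete before Nidha's own snippet, here is a minimal sketch, not the actual ActiveViam connector, assuming the classic azure-storage Java SDK and its CloudBlob.downloadRangeToByteArray call; the connection string, container and blob names are purely hypothetical.

```java
import com.microsoft.azure.storage.CloudStorageAccount;
import com.microsoft.azure.storage.blob.CloudBlobContainer;
import com.microsoft.azure.storage.blob.CloudBlockBlob;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelBlobDownload {

    // Hypothetical connection string and names, for illustration only.
    static final String CONN = "DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...";
    static final int CONNECTIONS = 20;            // 20+ parallel HTTP connections per instance
    static final long CHUNK = 32L * 1024 * 1024;  // download the blob in 32 MB ranges

    public static void main(String[] args) throws Exception {
        CloudBlobContainer container = CloudStorageAccount.parse(CONN)
                .createCloudBlobClient()
                .getContainerReference("marketdata");
        CloudBlockBlob blob = container.getBlockBlobReference("day-001.csv");
        blob.downloadAttributes();                          // fetch the blob length
        long size = blob.getProperties().getLength();

        ExecutorService pool = Executors.newFixedThreadPool(CONNECTIONS);
        List<Future<byte[]>> parts = new ArrayList<>();
        for (long offset = 0; offset < size; offset += CHUNK) {
            final long off = offset;
            final int len = (int) Math.min(CHUNK, size - off);
            parts.add(pool.submit(() -> {
                // Each task uses its own blob reference, so each range travels
                // over its own HTTP connection to the storage account.
                CloudBlockBlob range = container.getBlockBlobReference("day-001.csv");
                byte[] buffer = new byte[len];
                range.downloadRangeToByteArray(off, (long) len, buffer, 0);
                return buffer;
            }));
        }
        for (Future<byte[]> part : parts) {
            byte[] chunk = part.get();  // hand each chunk to the in-memory engine here
        }
        pool.shutdown();
    }
}
```

With 20 or more such tasks in flight, one VM can saturate its network interface; multiply that by many instances and many storage accounts and you get the aggregate bandwidth Nidha describes next.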
Opening many parallel connections like this is what download accelerators do, and this is what we did. So we were saturating the bandwidth of the instance, with 20-plus connections per instance. Just to give you an idea of the snippet of code we used: the download method you see here is what every task uses to start downloading data from blob storage. If you're familiar with the Java API, I can tell you that we relied on the out-of-the-box Java NIO channels package, and, as you can see highlighted, on downloadRangeToByteArray, which downloads a range of data from blob storage and writes it into a byte buffer. And then you can work on it. Sorry, I just have a question, I'm not getting it: this line, the third line from the top, does it consider the case where the data is geo-replicated? Actually, the storage that we are using... what is the exact question? If the storage is in another geography, like my instance is in Singapore and my storage is in the US, is that what you mean? No, I mean something like a select star from something where the data is replicated: is it taking that into account? Yes, but it is transparent. When you call blob storage to get a block of data, it will automatically serve it, probably from the replica closest to the VM, and of course, to make it faster you want the closest one. But it is not explicit; you don't decide which replica you get it from. It's transparent, like a service. So the overall connector that we wrote has, of course, many more lines than what you see, but this is the core logic that we implemented. So keep that in mind: you have to open many HTTP connections, saturate your network interface on the instance side, and then start pumping the data in parallel. Yes, sorry, I didn't mention: if you recall, on the first slide we said that theoretically we would load the 40 terabytes in more than nine days with one single connection. When we parallelize the work, we load that in about nine hours, if we rely on one storage account and one instance. But as you can see, we used more instances and more storage accounts. So from nine days, we moved to nine hours. What we noticed as well is that every storage account also has a throughput limit of about 30 gigabits per second. So to saturate the storage as well, we need more than one instance: remember, one instance is about 10 gigabits, so three instances make 30. This is how we saturate every storage account. And then from nine hours, you move to three hours, with three instances loading the data in parallel from each storage account. And then the final metrics we ended up with: we were loading the 40 terabytes in 13 minutes. We relied on 128 instances, so roughly 40 storage accounts were used, and 40 accounts times three instances gives you roughly the 128. What you have to keep in mind is that we had an aggregate bandwidth of about 50 gigabytes per second, and this is what allowed us to load the 40 terabytes in almost 13 minutes. Of course, here I'm talking about the 13 minutes of loading, but the full startup takes more than that, closer to 30 minutes, because, as Antoine will describe later on, with our distributed technology the nodes have to exchange some data between themselves and so on. So to have the cluster up and running, you need more than the loading time; this is just the loading part. I know you like this one. Okay, let me introduce the graph you're looking at: on the x-axis you see the time, and on the y-axis you see the amount of data loaded.
Each single color here is one instance getting bigger and bigger as it loads its data. It looks like a swarm; it looks like a nice painting. But you can see here that we were really loading as fast as we could. And I guess the neighbors who were living around us on the cloud at that time were suffering a bit, because we were pumping all the resources; not everyone working on the cloud runs tests like this, and every day this was just us seeing how fast we could load that amount of data. And so we did it. You could say we were a noisy neighbor. Of course, yes. Just a side note on security: sometimes when you run tests like this, you disable the security just to load as fast as you can. When we captured our metrics, we did not disable the security in our tool, and the object storage we were using remained encrypted. Which means that this is potentially something that you can have in production, and these could be real production metrics. Since we were respecting the security, it was turned on while loading the data, so we didn't remove any overhead. That is my point.

Thanks, Nidha. So I would like to stress the point that moving data at 50 gigabytes per second is huge. I have never seen this on premise or in a data center on projects I did before, but here it's available to everyone, and it's not even using the premium storage of Azure, just the basic blob storage. That's how much of a change cloud computing can bring to your workloads. And thank you again, Nidha, because that brings us now to step three. All that data that we have brought into memory, we now have to put into data structures, and all those 128 nodes, we need to gather them in a cluster so that it is actually possible to run queries and do interactive analysis on the data. And of course, ActivePivot does that in the experiment. We've been hard at work on this topic for 10 years, so if you allow me, I'll try to summarize 10 years of R&D in two minutes. The first thing that's very important for in-memory is to manage the memory the right way, quite obviously. And when you're running on the Java platform, it means that you have to go beyond the standard memory management of Java. In particular, you may have heard about garbage collection, this algorithm that cleans up objects after they have been used in Java. It has limitations that would not allow us to run on several terabytes. So ActivePivot, while running in Java, does much of the memory management by itself, using what is called off-heap memory. It means memory outside of what the JVM sees, which you can manage yourself more efficiently if you know what you're doing. And all of the important data structures of ActivePivot, the columns of numbers (because of course we are using a column store, which is the natural layout for analytics), the indexes, the hash tables, the simulation numbers, are all stored off-heap; they are not managed by Java. To do that, we rely on very well-known operating system primitives such as memory mapping, but on top of memory mapping we have implemented something a bit like our own malloc, an ActiveViam malloc algorithm that is optimized for our workloads. So you have to manage the memory the right way. And of course, if you want to do something fast on those terabytes of memory, you need to use all the processing power that you can get: you need to use all the cores of all the processors in your cluster.
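To illustrate the off-heap point, here is a toy sketch using only standard java.nio primitives; it says nothing about ActivePivot's real allocator, just the general principle of keeping a column of numbers outside the JVM heap through a memory-mapped file.

```java
import java.io.IOException;
import java.nio.DoubleBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// A toy off-heap column of doubles backed by a memory-mapped file.
// The values live outside the JVM heap, so the garbage collector never scans them.
public class OffHeapDoubleColumn {

    private final DoubleBuffer values;

    public OffHeapDoubleColumn(Path file, int capacity) throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer mapped =
                    channel.map(FileChannel.MapMode.READ_WRITE, 0, (long) capacity * Double.BYTES);
            this.values = mapped.asDoubleBuffer();
        }
    }

    public void set(int row, double value) { values.put(row, value); }

    public double get(int row) { return values.get(row); }

    // A simple aggregation that scans the whole column sequentially.
    public double sum() {
        double total = 0;
        for (int i = 0; i < values.capacity(); i++) {
            total += values.get(i);
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        OffHeapDoubleColumn column = new OffHeapDoubleColumn(Paths.get("prices.col"), 1_000_000);
        column.set(0, 42.0);
        System.out.println("sum = " + column.sum());
    }
}
```

The heap only holds a small wrapper object while the bulk data sits in memory the engine manages itself; a real engine layers its own allocator, partitioning and NUMA-aware placement on top of this primitive.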
For that too, there are well-known techniques, and I think we are using all of them. For instance, using a special thread pool called the fork-join pool, I don't know if you've heard that name before. It's a special thread pool where the threads can steal work from each other to make sure they are busy all the time, so your cores are busy all the time. It's called work stealing. And of course we also use lock-free data structures everywhere we can, to avoid contention, like mutexes, between the threads; we do that for dictionaries, queues, and our indexes. And in fact, if you really want to take advantage of all your processors and all your cores, here is something else that most in-memory databases do. The idea is to partition the data in memory: partition the data into blocks and have one core, one of the many cores on your server, handle one partition. That way the core can operate on its partition very efficiently, without synchronizing or contending with the other cores. That's if you want to take advantage of all your cores. If you want to take advantage of all your memory bandwidth, on bigger servers you also have to take into account NUMA, non-uniform memory access. You know that servers with more than one socket, servers with multiple processors, have the memory chips distributed among the processors. If a processor reads data from its local memory chip, the performance is optimal, but if a processor gets data from a remote chip, there is a performance penalty, and you pay that price heavily for in-memory workloads. So in ActivePivot we have a special NUMA-aware allocation pattern that ensures that the threads running on one core always access and operate on their partition of data from the right memory chip. Very important for an in-memory database. On the kind of hardware we are operating in this experiment, the G5 instance, there are two processors, so it's a small amount of NUMA; the NUMA impact is not very great with two processors. But if you move up to larger systems, and I'm not sure it has been announced yet, but in Azure some new, bigger instances are coming, called the MS series, that will have up to two terabytes of memory, and those will have four processors, four sockets in the server. In that case the NUMA impact can be higher. So we have that covered. And finally, if you want to take advantage of all those servers, 128 servers in our case, you need, of course, a proper distributed architecture, something that can distribute the queries of the users over all the nodes in the best possible way. And here the challenge is in fact almost the same as for NUMA: what you want is to run as much calculation as possible within the nodes while sharing as little data as possible between the nodes. That's the secret of a good distributed architecture. To do that, ActivePivot has a two-tier architecture where we distinguish the data nodes, the 128 nodes, from the query nodes, which are there to schedule and dispatch calculations to the data nodes on the fly and reassemble partial results before sending them to the end user. It's only with all of that that you can really do something with all this data loaded in memory. And now that I've said that, I think it's time to put our big system on the benchmark. Here is the benchmark we did. We took one query doing trend analysis; trend analysis means a query that does aggregation and calculation over all the historical days in the dataset.
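Before the results, here is a toy sketch of the partition-per-core aggregation pattern just described, again not ActivePivot's actual code: the data is cut into partitions, each partition is aggregated independently on a work-stealing fork-join pool, and the partial results are merged at the end, which, at cluster scale, is also what the query nodes do with the partial results coming back from the data nodes.

```java
import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Toy partition-per-core aggregation: sum a large array of doubles by splitting it
// into partitions and letting a work-stealing fork-join pool process them in parallel.
public class PartitionedSum extends RecursiveTask<Double> {

    private static final int PARTITION_SIZE = 1 << 20;  // about one million values per partition

    private final double[] data;
    private final int from;
    private final int to;

    public PartitionedSum(double[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Double compute() {
        if (to - from <= PARTITION_SIZE) {
            // One partition: aggregate it locally, with no synchronization with other cores.
            double sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i];
            }
            return sum;
        }
        // Otherwise split into sub-partitions; idle threads steal the forked tasks.
        int mid = (from + to) >>> 1;
        PartitionedSum left = new PartitionedSum(data, from, mid);
        PartitionedSum right = new PartitionedSum(data, mid, to);
        left.fork();
        return right.compute() + left.join();  // merge the partial results
    }

    public static void main(String[] args) {
        double[] data = new double[8 << 20];
        Arrays.fill(data, 1.0);
        double total = ForkJoinPool.commonPool().invoke(new PartitionedSum(data, 0, data.length));
        System.out.println("Sum = " + total);
    }
}
```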
So this trend-analysis query touches every single bit of data that we have loaded. We ran it on the raw data, without any optimization, that's query one, and we ran the same query while enabling a special optimization in ActivePivot that we call bitmap, which is a mix of special indexing and some pre-aggregation. What we wanted to prove with this benchmark, with this query, was how well our system follows Gustafson's law, from the researcher Gustafson, who said that when faster equipment becomes available, more servers, bigger servers, then a bigger amount of work should be doable in the same time. That's Gustafson's law. And it's exactly what we are measuring here, because what we want to see is: if we add more historical days to the workload, to the dataset, is it possible to keep the same query time, the same interactivity, just by adding more nodes to the cluster? That's what we want to measure. To do that, we started by running the queries on one single node, just one out of the 128, and one node can hold about three days of historical data. That's the first point of the chart. Then we did it with a cluster of two nodes, four nodes, eight, 16, up to the 128-node full-size cluster. As you can see, the chart is pretty flat. It means that our solution follows Gustafson's law very well; it scales very well, and it says a lot about the performance and the scalability of public cloud infrastructure as well as the design of ActivePivot itself as an analytics engine. And if you enable the bitmap optimization in ActivePivot, you can do trend analysis on 40 terabytes of data in about five seconds. That's what we mean by interactive analysis on big data; nothing short of that can qualify as interactive analysis on big data. So instead of looking at the query time, with its flat curve, you can look at the throughput, the rate at which our system processes the data. That one gives you a nice ramp, a nice scaling figure. Because if you think about it, this system processes data, aggregates and calculates over it, at almost four terabytes per second. You know those big four-terabyte hard drives that you use to store movies or PC games? This cluster can scan the entire thing and aggregate it in one second. It's pretty crazy. And with those great results, we are reaching the end of our presentation, and I think we have answered our initial question: can you combine the strengths of in-memory computing and of the cloud? Yes, a big yes. And in fact, by combining the two, it becomes accessible to everyone. It's not just that it works; it becomes accessible to everyone, because for many organizations the acquisition of special large-memory servers was the showstopper in the first place. It was impossible to justify buying such big servers just for one new workload. But now this barrier is gone: you can get it on the fly, a server with two terabytes, or 100 servers if you want, in 10 minutes. 10 minutes to start the servers, 13 minutes to load the data, and just a few minutes to prepare the ActivePivot cluster. And then, even for organizations that could afford in-memory computing, there were still many use cases where they could not justify the cost of in-memory. This one, for instance: 40 terabytes.
This is the type of analysis that you do only from time to time, only for a specific requirement; you could not justify acquiring that much hardware just to do it once. And this barrier to entry is gone too, because now you can rent it on demand. And of course it also works for small problems: you could start a single instance with a few gigabytes of data, operating software such as ActivePivot on demand, and that would cost you maybe a few dollars per hour. But yes, it also works at scale, at big data scale, because you can start something like that in half an hour, use it for half a day and shut it down. Well worth it. So that's it, that's our message today. I would like to thank Microsoft again for the partnership, for making this possible, thanks very much to our hosts, and thank you for your attention tonight and for listening to us. Thank you. And we will take as many questions as you want now.

On the code slide, there was one part which was creating the instances on the fly? Yes, this is really what we call infrastructure as code. So the create statement, is it on demand? Yes, completely. So as a query requires it, it can spin this up? This is a Microsoft Azure API, in fact. You can run it from your laptop, and by running it on your laptop, it calls services in the public cloud that fire up the instances and return to you when it's done. But is the demand handled on a manual basis or on an automatic basis, as your data increases on the fly? Can it automatically take care of creating the instances? Yes, it is very common to have automatic elasticity, with the size of a cluster growing and shrinking with respect to the workload. But here we did not do that; here it was a static, start-it-all-at-the-beginning kind of configuration. What's the structure of your data? You've got 40 terabytes; what does it look like, what's the detail, what's inside? Financial data, vectors, arrays. So effectively multiple structures? Yes, but structured data, definitely structured data. But a lot of it: you could think of a bank having 10 million trades on the book, and each of those trades you will price in different pricing scenarios. So you may have a thousand prices just for one trade, you see. All together, that gives you about 100 gigabytes per day of financial data, and then 400 days of that. And this is something that is pretty realistic, in fact: when you look at market risk, for instance, risk calculations or value at risk, those are that type of figures. So there you go, effectively a set of tables. Yes. Now, for the processing, you must have some custom processing to be able to read what is a trade, what is the security, what is the basic data for that. Yes, exactly: ActivePivot itself keeps this knowledge when it puts data into memory, but it keeps it in a relational format that we are all familiar with, like tables. And on top of that it offers, I would say, a multidimensional interface, like a pivot table. And with that... You're touching on an important topic. In the case of ActivePivot, we keep the incoming structure; we keep it as it was at the source. Yes, exactly. So no cleaning, no dropping of data? No, no. And it's very important, because it allows us to load the data into memory in the right way, like in streaming.
While the data is loading from blob storage to the instances, at the same time, like in a streaming process, ActivePivot is preparing its internal data structures. And that's why at the end of the 13 minutes of loading, ActivePivot is already ready, because it has done all the work in parallel while the data was loading. By the way, we were told we have nice Windows 10 fans to give away. I'm just curious, how does the company operate? Is it like a platform or a service provider? Because that's kind of huge; 128 instances is also a lot. And just to mention banks, in Singapore, do the domestic banks run things like this? Well, first, we were trying to set a record here. I'm not saying all of our customers are operating at this scale; we were just trying to send a message. Actual applications running in investment banks, some of them in Singapore, Standard Chartered for instance, are using ActivePivot, and they are running even below a terabyte, or a few terabytes. As a company, we are a technology maker, so we write the technology; we do not operate it at this scale as a service. Not yet. But many people are asking, because it's very appealing to be able to do interactive analytics at this scale. And you said it would be very expensive; how much does this cluster cost per hour, like a ballpark figure? How much do you think it is? Unless you have a special bid price? No, no, list price. List price is about $1,000 per hour; in real life, probably half that. But how long would it take to produce the expected reports? Well, you've seen we've done the full trend-analysis query on this dataset in five seconds, so if you had a thousand reports to run, you could do them within an hour on this. But it's not necessarily how our technology is used most often; it's more for interactive analysis, giving it to end users. This dataset is static, so you can run loads of different queries? Yes, absolutely. And by queries, don't just think about a sum or an average. Remember, we have an engine that can do actual calculations: cross-currency conversions, quantiles, statistics, everything you need for value at risk, all on the fly, as if it were a simple sum. You have CPUs, cores and memory, so there is a relationship between the cores and the memory: you bring the data, aggregate it and then do the computation. But is there a master-slave relationship for distributing the work and collecting it back? Yes, we use a master-slave setup for the distributed architecture between the servers. But within one server, are you using further partitioning for the pieces of information on this system? Yes, of course. In this case the historical days are in fact the partitioning key. Historical days, that's a very easy one. And within one box, how do we dispatch work on the processors? We use a special thread pool of our own, like one big thread pool that gets the tasks and itself spreads them among the per-processor thread pools that we have; we have one thread pool for each processor. It's really internal, but you asked for it. What if one of the instances where the computation is running fails? How is this taken care of, for fault tolerance, if the Azure instance fails?
One VM fails, and then what happens to the rest? Because the data is distributed, one instance is carrying several days, and the queries need all of it. Actually, within the cluster there is no failover, but we did in the past some tests with products like Chef or Puppet, where you write a recipe so that if an instance fails, another instance is started and loads the same data that was in the previous one. Here we're talking about a failover problem; of course, for this whole cluster we were not expecting any instance to fail while we were running such tests. But what our clients do is that they have one primary site and one disaster recovery site, and if something fails in the primary, they switch the users to the recovery site. As simple as that. Sorry, I missed the second part: are you considering other languages? Thank you. We use Java for our core technology, our database, but for instance when it comes to the UI we use JavaScript, I wanted to mention that. But we haven't found a use case yet where, from inside one of our calculations, we could leverage Lambda functions. That being said, we are using Lambda for other parts of the software. For instance, for licensing we have a special pay-as-you-go license when you operate ActivePivot in the cloud, and the way it works is that every time you start an ActivePivot somewhere in the cloud, it calls a Lambda function that records the usage in a database. So we do benefit from all the services and the ecosystem of the cloud, but within the engine itself it's pure Java, for best performance. And does it call the same API for starting? Yes, of course. One last question; I have two of these to give away, and when we are out I also have name cards. Excuse me, excuse me.
One more question, the first one. I have five. You can keep talking over pizza; I've been told the pizzas are here, so we can keep talking over there. The pizza is already there, so if you want to have some, you can. A couple more, just one question. The ladies first; I'm French, I have to. Since you don't have a benchmark we can compare with, how does it compare with something like SAP HANA, in performance and features? That's a nice one. I can at least do a feature comparison with HANA. HANA is really a SQL database that runs faster, but it's still doing SQL, so you have the building blocks of a SQL database: group-bys, sums, averages. While ActivePivot has a true calculation engine where you can put your business logic and calculate an SLA, a price, or financial risk. And then Apache Spark: it comes close in the flexibility of the calculations you can do, but it's more batch processing. It's more one batch taking a dataset, processing it and writing the results, and it's not something 100 users could run at the same time; you could not have 100 users launching 100 Apache Spark batches on the fly. When you use Spark, in general you do one transformation batch and then you give the output to end users who will use a standard API. So, very quickly, that's how I would draw the line between the systems. And then, operating ActivePivot is probably cheaper, not just because of license cost, but because of the agility and the flexibility with which you can deploy it. You cannot do that with SAP HANA: indeed, SAP HANA requires certified, big scale-up hardware and nothing else, while ActivePivot runs on a laptop, on a cluster, or on big iron. How do you update your data? Do you update your data? We deal with financial data, say, but it's changing day by day, second by second. I'm sorry, this is turning into an ActiveViam product description, but it is one of our big strengths. Opposite to traditional OLAP solutions, where you have to build a cube but then it's read-only, we are completely incremental: we can keep loading data and at the same time run queries on it, with an MVCC concurrency model. And the reason for that was that our primary market was financial markets, where everything flows, trades, prices. So it's one of the big features of the platform that you can build your cubes incrementally: you keep sending in new data and age out the old stuff, for instance for a rolling 30-day or 90-day analysis. Last one, last call. Could you go back to the slide with the storage accounts and the instances, the final data-loading slide? This one, right? Which one? This one, no? So how do you do that? No, no, it is distributed across the nodes. Roughly, yes. Yes, exactly, that's right. In this case it's very simple: each server has three historical days to load, so you just have to distribute those days to the servers at start-up time. This case is very easy because the data is naturally partitioned per historical day. Some other workloads are more difficult to partition; you have to look at other business fields, maybe the book, maybe the region. But here it's very natural: you just dispatch the historical days to the servers, like a round robin. So the files are named so that the name of the file contains the date, some simple pattern like that, nothing fancy. Sure, one last question. Does anyone have a critical question to ask? The one who answers will get the jacket. I'm asking the question. Yes, some critical question. Where? How many nodes, what's the total number of nodes?
You raise your hand, you choose. One critical question: just raise your hand, don't answer the question, just tell me the full names of the two speakers. The French guy? You? What was that? I mean, it was good, and Antoine is fine. Take it first. Fine? Thank you guys, see you around.