First chance I've had to say hi to everybody. My name is Tony Shaw, I'm the founder of DATAVERSITY, and along with my colleague Dan McCreary, who is going to kick off the talks right now, I'm the co-chair of the conference. So I hope we're off to a good start, I hope you've learned something valuable today, and that the next couple of days are even more valuable for you.

The format for this evening is lightning talks. Want me to get this to you, Barry? So basically the format of a lightning talk is five minutes, and it's a real hard deadline. My colleague Victoria here is operating our timer. She has a computer that faces the speakers, so the speaker can see it, and you can watch the countdown as well. For the benefit of the speakers: when it hits one minute the color changes to yellow, and when it hits zero Victoria will chime the conclusion of your talk. She's going to come up with a tune.

The whole point is that in the space of about 45 minutes we're going to go through eight different presentations and hopefully get eight great ideas. If we come out with three or four, that's not bad either. The names of the sessions are all in the Guidebook app, the mobile app, so if you've downloaded that you'll be able to see them, but briefly: we're going to kick off with Dan on the topic of NoSQL and agility. Then Steve Brodsky of IBM is going to talk about new tools for Hadoop and big data. Neil Raden is going to give a presentation on in-memory databases, the pros and cons of those. Jans Aasman of Franz is going to talk about the power of linked data. Then we're going to transition to Barry Morris, who just got a patent the other day; I'm not sure if he's going to tell us about the patent exactly, but he's going to talk about when SQL meets NoSQL.

By the way, I heard a lot of both "no-sequel" and "no-S-Q-L" today. Is there a consensus on which it is? And I'm getting feedback somewhere, if that's controllable. Hands up for "no-sequel"? Hands up for "no-S-Q-L"? No preference? Okay.

Jeff Malafsky of Phasic — Jeff, were you over on the right here? — is going to talk about corporate NoSQL enabling agile data governance. Mike Hummel of ParStream, just in from Germany, welcome Mike, is going to talk about real-time big data analytics and indexed column stores. And then Vladimir Bacvanski is going to wrap it up with the seven habits of successful NoSQL adoptions.

Let me move up here. There's something really tinny coming out of there, isn't there, or am I the only one who can hear it? Okay, is there any echo in this one? I guess they're okay. Yours sounds better than mine; maybe I'm standing too far forward. Does this make any difference when I stand back here? Okay, lesson learned. All right, Dan, your remote is there. Victoria, you're ready to start? Then let's get going: Dan McCreary.

Okay. Now, one of the things about these lightning sessions is audience participation. So if you like anything, you're always welcome to clap; if you don't, boos are acceptable. I want to talk about NoSQL and agility, and I want to start out by saying I'm really mad. You know, Tony and I, when we started this conference, we asked: should we call it big data or should we call it NoSQL? Because both of them are big themes.
And if you look at Google Trends, you see that big data is getting this huge spike, and NoSQL is still growing, but maybe not quite so fast. I'm really mad at big data. Big data has started to take the limelight and has been pushing a lot of the other reasons why we're moving to NoSQL out of the way. It's just one use case, for heaven's sake. Big data has become the big bully. Isn't that true? You go to these events and there are all these wonderful things we can do about high availability and other things, and big data has pushed everything else out of the way. What we really need is a fair and balanced approach.

I'd like to talk to you about one of my favorite reasons for joining NoSQL: agility. Agility is smart, agility is quick. Agility gets you out of trouble when your projects are behind schedule. Agility opens new doors and new business opportunities that you didn't know you had until you started to move to NoSQL systems. The most important thing about NoSQL is that instead of one big, massive relational database, just like the old shortwave radio receiver where you had all these knobs and could tune in remote stations, we can tune in the right service levels. Isn't that what agility is about: if the web volume goes up, we can tune things in. Agility means that we can quickly change the systems we build based on changing business requirements, and it's not just the changing demands in volume. It's also the changing reliability requirements: the web service that we thought was going to be a really nice pilot, now everybody depends on, and we want to be able to check that little box that says also make this run in multiple data centers. That's the reason we're using NoSQL: for agility.

Do you remember this old four-translation model we had? When we had these web pages, we had to translate them into objects, we had to take those objects and move them into the relational database, get them out of the database, and back through this translation into objects again. Isn't that kind of like driving in the mud, this quagmire where if you change any of these pieces you have to change them all together? I know of no other systems that help you be more agile than those that remove all this translation, where you say everything in my browser is going to be exactly the same structure that's in my database. That allows you to change your requirements quickly, and it also empowers other people, who are not Java programmers, to do those things.
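[Illustration, not part of the talk: a minimal sketch of the "no translation" idea Dan describes, where the document the browser works with is stored as-is, with no object-relational layer. It assumes a MongoDB-style document store accessed through pymongo; the database, collection, and field names are hypothetical.]

    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    orders = client["shop"]["orders"]          # hypothetical database and collection

    # The JSON the browser sends is stored as-is: no shredding into tables,
    # no object/relational mapping layer in the middle.
    order = {
        "_id": "order-1001",
        "customer": {"name": "Ada", "email": "ada@example.com"},
        "items": [
            {"sku": "N-100", "qty": 2, "price": 9.50},
            {"sku": "N-200", "qty": 1, "price": 24.00},
        ],
    }
    orders.replace_one({"_id": order["_id"]}, order, upsert=True)

    # Adding a new feature is just adding a field to the document;
    # no schema migration is needed.
    orders.update_one({"_id": "order-1001"}, {"$set": {"giftWrap": True}})

    print(orders.find_one({"_id": "order-1001"}))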
For the people that I work with, allowing NoSQL to go direct, document to document, is the best way to increase the ability to add new things and to engage other people, without that big middle tier. Get rid of all the shredding and object-relational mapping: no assembly required, you just make it resilient to change. Add new features just by changing the document structure, and you're done. So agility is really what I think people should be thinking about.

One of the things we're starting to see now is shown in this chart: object-relational frameworks like Hibernate, the blue one on top, are going down, as the amount of traffic around NoSQL is going up, and frameworks like Ruby on Rails are starting to flatten out. The more we can use NoSQL, the more we can get rid of the middle tier, get rid of the complexity, and make our systems easier to adapt to changing business requirements. Agility also has a lot of other friends that come with the NoSQL world: simplicity, where we're using simple key-value stores that are portable; testability, because we have a lot fewer interfaces to test; performance, quality, and empowerment. You can come to the NoSQL party for big data, but we want you to stay for agility.

Nicely done, Dan. I'll give Steve this one. Okay, so Steve Brodsky from IBM is going to talk to us about new tools for Hadoop and big data. Steve.

What I'm going to do is show you one chart and then a demo. The basic idea is that we've got all this capability in big data, and a lot of it is open source. That's great, but mostly the open source is about runtime. What happens if you put together the open source, put analytics on top of the open source, and then add tools that bring together the people who are going to use the open source and the data? With that in mind, what can we do quickly in a scenario? What we're going to do is crawl the web and pull down financial statements in their original text, in actual HTML form: IBM's financial statements for the last ten years, by quarter. We're then going to graph them, and we're going to make it reusable so that we can build apps, sort of like Apple-style apps or Android-style apps, depending on your perspective, and make it so that you can do things like issue Hive queries and see the results. The idea is that we're bringing the people together as well as the data, through analysis. So let's see if we can show the video, and I will narrate as it goes, assuming that it shows.
Yes, great. We have a web console as the main tool that lets people easily see what's going on in their cluster. We're going to go tab by tab through the scenario. First, let's make sure our cluster is up. What you're seeing on the left is all the major kinds of servers. We have a lot of the popular open source in there, and the state of every service: MapReduce, Hadoop HDFS, Hive, Flume, ZooKeeper, HBase, Oozie, a lot of the common, powerful things that people find in the NoSQL world, and of course we're always adding more. You're seeing that it's easy to add nodes to the cluster, check whether things are up or down, and start or stop them.

Now that we've done that administrative step to make sure our cluster is up, everything's ready to go, and it's time to look at our files. What you see here is a browser for HDFS, the file system. It lets you look at any particular files, and it's easy to browse; people are used to this Explorer-style concept. Now we're going to switch into app-style concepts and do that web crawling. The first thing you notice is that there's a palette of apps you can choose from; each app does something useful, such as import and export from databases. In this particular case we're going to switch and look at one that executes a Hive query; over here we're looking at one that does database import/export, and one that does distributed file copy. It's easy to build much more sophisticated apps as well, and you can also wrap existing code as apps, which makes it easy for a lot of different users to leverage. You can think of the left side as your app store.

Now we're going to crawl the web. We have a web crawler app that's pointed at the IBM financial website; since this is a lightning talk, we've already crawled it. It was built by wrapping the Nutch open-source web crawler as an app. What you see next is what we call BigSheets: the ability to take a common spreadsheet, Excel-style concept and view what's inside your cluster as a spreadsheet. We've looked at the results of that crawl, picked out the URLs showing where we crawled from, and looked at the original HTML; that flashed by really quickly, to show that was the original web page we crawled. We now have this raw HTML, and we're going to look at some of the sheets that analyze it. In this particular sheet, you might notice year, quarter, billions of dollars, and sales; these are some of the breakdowns we've produced by taking that raw HTML and creating another spreadsheet based on it, which represents a view of the actual structured information we've extracted. We've done this with much more complicated documents as well. You can then do many common things you'd expect from a spreadsheet-style tool, such as graph your financial results by quarter. The ability to do things like use social media to get a 360-degree view of your customer is also possible using the same text analytics capability.

Then we said, okay, great, let's make an app that runs HiveQL against this structured financial information. We have up at the top what the SELECT statement actually looks like, and you can fill in a bunch of parameters. It's an XML-driven, form-based input; you can schedule runs, you can update other sheets, and you can see in the bottom half all the existing runs.
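[Illustration, not from the demo: a sketch of the kind of HiveQL query such an app might run over the extracted financial data, issued here through the PyHive client. The host, table, and column names are hypothetical.]

    from pyhive import hive

    conn = hive.connect(host="localhost", port=10000)   # HiveServer2 defaults
    cur = conn.cursor()
    cur.execute(
        """
        SELECT fiscal_year, fiscal_quarter, SUM(revenue_busd) AS total_revenue_busd
        FROM financials_extracted      -- hypothetical table built from the crawled HTML
        WHERE fiscal_year >= 2002
        GROUP BY fiscal_year, fiscal_quarter
        ORDER BY fiscal_year, fiscal_quarter
        """
    )
    for year, quarter, revenue in cur.fetchall():
        print(year, quarter, revenue)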
You can make this application run quickly. We will be releasing a more significant version with dashboards, application chaining, and workflow, all coming up very soon. So, as we reach the end of this demo: we wrote a new app and made it analyze our financial results. Thank you.

Well, since this is a NoSQL conference, I decided to go no slides. But because of that I have no idea whether I have five minutes or twenty minutes or three minutes. I have some jokes in case I run short, though, so it'll be all right. This is difficult for me because, as Tony knows, I always come to these conferences exquisitely prepared, with polished slides, and well rehearsed; you can see Tony is laughing.

In-memory databases. You know, I think it's inevitable that someday everything will be in memory, and whether it's silicon-based dynamic RAM chips or something else, who knows. It is kind of ridiculous to store lots of data on devices that are a thousand or ten thousand times slower when you have something faster; it's just a matter of cost, and the market should take care of that. But in the meantime we have a tiered memory arrangement, and we see a lot of in-memory databases starting to appear. It's not new; there have been in-memory databases for a long, long time. In fact, in my research I've counted over 30 of them. But of course the big daddy right now is SAP's HANA; that's the one getting all the attention, so I'll probably talk more about that in my examples than anything else.

One thing to keep in mind — wow, it's going fast, I'd better go faster — is that all CPUs more or less operate the same way. Well, all standard CPUs; obviously there are specialized ones. A CPU gets a piece of information from memory, it gets an instruction, and it operates on it, and it has no idea whether it's working with a database, some other kind of application program, or pushing up a web page. So the real trick in any kind of software is how you orchestrate those little bitty pieces of instructions; that's where you get efficiency. Whether something is in memory or not doesn't really matter, because if it's not in memory it has to get there before the CPU can operate on it, correct? So what that really means is that in-memory databases can't just take existing software, port it to run in memory, and expect it to run faster. In fact, I've actually seen cases — you know how things can be I/O bound — where if you get a terabyte of memory and start slamming stuff through it, you end up with something that's CPU bound: the CPU gets flooded with data and bogs down. So it really takes a lot of effort to make it work.

The other thing to think about is compression. I know HANA talks about ten-times compression. That's actually a myth; it's more like three and a half times, because the first thing they have to do is load everything else into that terabyte of memory, not just the data. Okay, now I'm really going fast; I thought I had less than five minutes of material. The other thing is that in-memory databases aren't really in memory.
They're only partially in memory; they have to stage things on disk to be persistent. And we're seeing, especially when you look at high levels of compression, columnar in-memory and so forth, that the amount of data out on traditional disk, whether it's SSD or platters, can be as much as ten times the amount of data inside the database.

Now let's talk about total capacity. With HANA you can put a terabyte of memory on a node, and you can now expand up to 16 nodes. Sixteen terabytes of memory times three and a half is about the equivalent of 50 terabytes. That's nothing, right? You can't even run a data warehouse with 50 terabytes of data anymore. So the idea that you can build an in-memory database that handles analytics and OLTP in the same mechanism is crazy, because none of them are ACID compliant. They say they are, but they're sort of lazily ACID compliant. What that means is: we can get it into memory that fast, but then we have to sit around and wait for it to update on the persistent disk, which may be mirrored and all sorts of things. So that doesn't work either.

And the cost is tremendous. It may be a lot less than it used to be, but like Steely Dan said, it's cheap but it's not free. When you start looking at 16 terabytes of memory, you're looking at millions of dollars. So it's not like putting up a database and saying, boy, this is going to be great, this is going to be really fast. And one last thing you may not have thought about: it takes a hell of a lot of power to keep 16 terabytes of dynamic RAM running 24/7. In fact, it takes more power than an equivalent hybrid storage setup with hot, warm, and cold tiers. So if you're looking for green, this may not be the best place to go. Now look, obviously when it comes to relational databases we know they've got a lot of drawbacks in terms of being rigid and so forth, but in-memory databases are an emerging thing, and as far as I'm concerned, today they're not really ready for prime time except in some fairly small niche areas. So I'm expecting a lot of grief about this later on, and I'm ready for it.

Yeah, okay. So Tony asked me to talk a little bit about the power of linked data and to give a demo of our graph database. Who knows what Linked Open Data is? Only a few? Oh man. Basically, what you do is take all the knowledge in the world, connect it in one big graph, and have a mechanism to query it. Facebook, Bing, and Google are now building big proprietary knowledge graphs where they take all the people, the places, the organizations, everything else, and link them together so that they can create better search engines and answer questions better. But the semantic community already started about ten years ago. Tim Berners-Lee's big dream was that we would take the entire web and add metadata to it in a standard way: take every object, give it a name as a URL, make it dereferenceable by putting HTTP in front, put interesting information around the object, and then link it to other objects. This principle is now being used in the enterprise and in government; data.gov is now working with RPI to take all the government data and put it into RDF, into triples.

Around 2007 — who knows this picture, by the way? Only a small group, okay — we already had about 40 or 50 of these big files somewhere on the web that contained data in the form of RDF triples. For example, DBpedia is about 300 million triples that describe Wikipedia, and GeoNames describes 7 million places on earth as triples in RDF.
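[Illustration, not part of the talk: a minimal sketch of querying a Linked Open Data endpoint with SPARQL, using the public DBpedia endpoint and the SPARQLWrapper Python library. The query is a generic example; the drug/side-effect query Jans describes later would use the predicates of his five downloaded datasets.]

    from SPARQLWrapper import SPARQLWrapper, JSON

    sparql = SPARQLWrapper("https://dbpedia.org/sparql")
    sparql.setQuery("""
        PREFIX dbo:  <http://dbpedia.org/ontology/>
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        SELECT ?drug ?label WHERE {
            ?drug a dbo:Drug ;
                  rdfs:label ?label .
            FILTER (lang(?label) = "en")
        } LIMIT 5
    """)
    sparql.setReturnFormat(JSON)

    # Each binding is one row of the result set: a drug URI and its English label.
    for row in sparql.query().convert()["results"]["bindings"]:
        print(row["drug"]["value"], "-", row["label"]["value"])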
So for example, the DB pedia It's about 300 million triples that describe the Wikipedia Geo name 7 million places on earth described as triples in RDF. This is 2010. Yeah, so now we have Farmer data government data multimedia media data, etc. Etc. And I'm doing a quick demo where I take five databases that I downloaded from the web one with side effects one with drugs one with clinical trials database Medicine and with diseases and they all link together through the URLs For example, you take drug bank as a website But you can just go to Google and say download RDF drug bank and you get the triple version Let me just show you how that works See Yes, so I have all these databases downloaded and I can look for something like ibuprofen and cancer I Got a bunch of triples back that contain these both these two words. I can check take I Can show say three clinical trials to talk about them So here you see what some information you see this clinical trial Discusses these diseases these drugs these side effects. I can click on something like aspirin And I see triples about aspirin. So I'm jumping from one day to base in the other I see well the chemical formula the mechanism of action. I can look at other clinical trial to discuss that etc Etc. Yeah, so I can go from one step to the other. I can explore the graph by Basically saying which predicates I want to follow so I can say give me the diseases drugs and side effects and targets both coming in and going out and I can Show a few values So now you see the graph and I can take any other topics. So say let's something like a romantic kissing Which has proven to reduce your blood pressure. Yeah, so we have this particular clinical trial And how would this thing link to something that I be proven? So you let the graph database do its work. It finds 12,000 paths between the two Exactly something like this We diffuse all this path and I get a much bigger graph and then there's a query language called sparkle Anyone heard of sparkle? Oh good. That's good Yeah, so this is a query that actually looks at five different databases Three different databases give me a drug with the name B. P. Torr a side-effect Diabetes and they give me every trial that talks both about this drug in the side effect And then you get the results and here's the official graph of this. Yeah, so this is the shortest demo ever of a semantic technology Quickly back to my presentation So anyway on Wednesday, I'm going to talk Oh About when you want to use a graph database and when you want to use a no sequel database And on Wednesday also talk about some use cases where we actually loaded up to a trillion triples Yeah, and did something very very interesting work with that and we actually take our Triple store and we also completely integrated with solar and mongo and come to a booth if you want to see how that works Okay, thank you. Okay. Thank you And I joke here, but you don't want to see it Well, we have we have about 10 seconds worth of transition time here. You have a joke Okay, I'm gonna wire up Barry while yon's tells a joke No, I just want to say about the D a drop the case because they didn't have 40 terabyte of storage So I'm just this at a conference about big data What happened to my interface? What do you want to tell me where the slideshow of you? Oh, well, it's you gave me a PDF. Yeah I can maximize the screen for you. All right. 
That's about all I can do. That's okay. Thank you. Okay, this should come off your time, but we're going to cut you some slack, since I couldn't figure it out either. I'm not sure the clicker is going to work, but we'll try it, and if not, I'll do it for you. That's okay, I can handle it. All right, let's go.

So my name is Barry Morris, CEO of NuoDB. I'm here to talk about when SQL meets NoSQL, and I do realize that might be a controversial topic here. We're all aware that there's a problem with traditional databases in the context of modern data centers and modern workloads. You've got your own ways of describing it, but it's data center problems, it's workload problems, it's developer productivity problems, and so on. And by the way, the people in this room, the community in this room, have done a great job of solving those kinds of problems in a different way; these are scalability and flexibility and simplicity kinds of problems. The difficulty is that there are a bunch of things that are not solved by this, and they are the things that traditional databases have done very well for 30 years: powerful set-based query languages, reliable data and transactions, integration with enterprise ecosystems, and so on. We've kind of said to customers: you know what, you've got to choose. You can pick the scalability, flexibility, and simplicity stuff, or you can pick the powerful, reliable, enterprise-ecosystem stuff. Most customers I talk to say, do I really have to choose? Wouldn't it be nice if I could have all of the above? So that's the question we're here to talk about.

Let me introduce Jim. I won't spend a lot of time on him. He's one of the top database scientists in the world; he has spent 30 years building database systems, back to DEC Rdb and InterBase; he invented blobs, he invented MVCC, and he was one of the top architects at MySQL. When he went at this question, he asked: why is it that databases don't scale out? Why is it that traditional databases have to scale up? And he came out with the answer that SQL is not inherently unscalable, and ditto for transactions. The problem is the design of the databases: we've built databases the same way for 30 years. Is there a new way of building databases that actually solves this problem? He started in a completely different place, brought in a whole lot of new ideas about distributed systems, and came up with this.

What is an emergent system? You're probably aware of the idea; think of a flock of birds. The flock takes off at the same time, lands at the same time, flies south at the same time, and yet no one is in control. It's a completely peer-to-peer system, and in fact that's the only way nature puts big systems together. It's how antelope migrate, it's how ants build communities, it's how crystals grow, and so on. And this is the opposite of how people build databases. Databases are typically monolithic, centralized, synchronous, and so on. We're talking about a database system that's asynchronous, peer-to-peer, loosely coupled — exactly the opposite of what you're used to.

I'm not going to have time to talk you through the whole solution; I always get the question, what is it, how does it work? You'll see that there are some ideas in there that are actually NoSQL ideas. For example, the storage in the back end is simply key-value stores. And by the way, give me whatever key-value store you want: we'll use it quite happily.
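[Illustration, not NuoDB's actual interface: a sketch of the minimal key-value contract a pluggable storage back end like the one Barry describes might expose — the layers above it only need put/get/delete semantics. All names here are hypothetical.]

    from typing import Dict, Optional

    class KeyValueStore:
        """In-memory stand-in for any key-value back end (file, cloud store, etc.)."""

        def __init__(self) -> None:
            self._data: Dict[bytes, bytes] = {}

        def put(self, key: bytes, value: bytes) -> None:
            self._data[key] = value

        def get(self, key: bytes) -> Optional[bytes]:
            return self._data.get(key)

        def delete(self, key: bytes) -> None:
            self._data.pop(key, None)

    # Usage: the database engine above this layer never cares what implements it.
    store = KeyValueStore()
    store.put(b"atom/42", b"serialized database object")
    print(store.get(b"atom/42"))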
I don't have time to talk through how this all works, but the magic sauce in all of this is the emergent architecture, this peer-to-peer system. So what we have — and you're going to have to take my word for it; I don't have the ability, like Jans, to give you a demo right now — is something that gives you all of the above. Anything you can do in your NoSQL system, we can do; anything you can do in your SQL system, we can do. The problem is solved. It may not be what you want, but the problem is solved. We have all of the power of SQL, the reliability of ACID, and the integration with enterprise ecosystems, but we also have the scalability, reliability, simplicity, and particularly the developer friendliness that you're used to.

I've got a bit of time to talk about the scalability. By scalability I mean elasticity. I mean you can walk up to the console — that's the console you're looking at there — and simply say, take this machine, add it to a running database, and immediately your transactions per second go through the roof. You can keep doing that; we're scaling this at the moment on hundreds of cores, easy to do. You can take machines away again, and as long as one machine is running in each tier, you've still got a reliable database, because of the peer-to-peer system. There's nothing on that diagram that is a single point of failure, nothing you can take out — like a flock of birds, you'd have to take out the whole flock if you wanted it to stop flying. And simplicity: well, all of your tools work. It's a SQL database, so you can use all the stuff you're using, all the applications you're using, all the skills you're using and always have. That's pretty much all I've got to say. Do come and have a look at it on our website if you have an interest.

All right, Jeff, if you could make your way up, please. I'll bring it over. So Barry, can we assume that the patent you got is on that quote-unquote secret thing? We actually have disclosed everything there is to know about how this works in a very legalistic patent, if you're good at reading legalistic patents. We announced that two weeks ago. We got it in less than a year because there's no prior art on doing this kind of thing, and in fact there were no office actions, which means no pushback on the patent at all. We're one microphone short up here, so we're struggling through that, but anyway, my pleasure to introduce my colleague Jeff Malafsky. Jeff.

Hi, thank you. I'm going to go into New Yorker mode.
So that's plenty of time; I'm going to give you a 90-minute brief in five minutes, which I've actually done in my life. I'm going to save a lot of my time by doing an object-oriented brief, because a lot of my preface Dan McCreary already gave, so I am extending the Dan McCreary talk and I'm just going to jump over all that. We are focused on the corporate structured data environment, and we're going to dive down into something called data semantics. Everyone talked about that, but what you cannot do in the real world is excise the human from the loop. And everyone in here — like me, I'm a technical guy with a PhD in chemistry — it's great to talk technology, but that is not the solution space in a real corporate environment. Everyone's talking about data center consolidation: not happening. We talk to major CIOs, and the only things happening in data center consolidation right now are two: outsource to somebody, or pick up the servers, rent a U-Haul, drive them to the other side of town, and claim success. There's actually no real consolidation occurring.

So we're going to drive through this first credibility slide. Have we done this? Yeah, we did this. We did this with all the HR data in the Navy. I've given this talk before, but that's the biggest one; it was also the hardest one. I have an open bet with a lot of people — a steak dinner for your family for a year — if you can ever find data that is worse than the Navy human resources data and that has had more of your tax money spent attempting, and failing, to fix it.

But to drive into data semantics: a lot of people talk about abstractions, and I'm going to disagree with a lot of people in the semantics world, because I'm looking at the last mile. How do you really get the person who wants to merge the data from various sources into a meaningful repository? And here is one place this differs even from the NoSQL world, and even from purveyors like Yahoo and Google. There's a huge difference between whether you can accept stochastic results, which is all web-based search engines — it's okay to get near the result, that's good enough — or whether you need a deterministic result, which is what you need in an infrastructure, intranet, or mission-critical system, where I've got to get exactly the right answer at the right time and I need an awareness of whether I'm missing it or hitting it. That's a different environment.

So here are some open-source examples that we got from a university about semantic conflicts. These are three of five systems holding ordinary real estate data. I like it because you would think that real estate data is kind of trivial, but look at these three columns in three tables about garage spaces, something as simple as that. The one on the left clearly doesn't look like the one on the right. So how are you going to build the canonical data model? How are you going to decide what the semantic intent is? How are you going to build your conceptual, logical, and physical models? And the answer is, it doesn't get done, because the meaning of the data is not being captured by business process modeling, it is not being captured by enterprise architecture, and it is not being captured by all the requirements work being done everywhere. So who has to solve this problem? The guy doing ETL, who never reads any policy anywhere at any time. All right, but even if we ignore the left one and look at the right one, we can say, okay, well, that's relatively simple.
They're both integers, so it must be simple to merge them? No, because there could be different legal definitions of what constitutes a garage space in each state. Do we know that answer? No. So even investigating enough to merge that data is a problem, and then multiply that across everything: that's data semantics. That is the single biggest hurdle in all merging of data warehouses, corporate and government, today. I personally know of about 10 billion dollars in the government sphere that has been spent on this and produced zero, and whatever your politics are, that came out of your pocket.

So what we have is corporate NoSQL, which goes into existing database servers on existing hardware. Like all NoSQL, it increases performance, by at least a factor of ten to the fourth. And now I'm running out of time. So what is corporate NoSQL? Because of the reality of having to mesh with existing organizations, existing people, existing data engineers, it simply blends the best of both types, which is this, and this was really done for all that Navy HR data. It has tables — that's what people like, so you can keep data concepts organized — but look at each table: some primary keys and a type-value pair. What goes into the type element is vocabulary, which we define and manage. That model and the vocabulary were built in real time with business people and put into production Oracle database servers. And I'm going to talk slowly for four seconds. And here's some stuff you can read, because it's not on my time.

All right, Jeff, well done. Thank you. You can of course follow up with Jeff; he has a booth out on the floor this week. All right, so I asked Mike to speak maybe 24 hours ago, so it was sort of a fortuitous event that he's with us today, but I'm delighted that you're here, Mike. Take it away. Okay, thanks. Yes, that'll advance your slides.

So thank you very much for inviting me. It was actually truly 24 hours ago that I reached out to him, because we just closed our first financing round in the US and relocated the company to the US. We actually closed it with Khosla, and that's a major step for us. It's all about real-time big data analytics, and we actually started four years ago. We did it because we had a real problem in the tourism space. Germans like to travel, but even more they like to search for the right travel offer: billions of offers, thousands of queries per second, less than a second response time, with multi-dimensional filters and group-bys on millions of data records. That's what we started with.

And what did we build? We actually built a database from scratch, in C++, which is a columnar database with standard interfaces but with specialized index structures on it. The index structures are bitmap indices, highly compressed, and we can analyze them in their compressed format; there is no need for decompression, and you can do it massively parallel. It's an MPP, distributed, shared-nothing system that works with bitmap indices, and it's highly specialized for analyzing data. It does not support transactions; therefore, you can take that shortcut.

How did we build it? If you take a column in our database and equip it with a bitmap index, you can do many operations just by doing XOR and AND operations for the multi-dimensional filtering. And on the bitmap indices you can also do a lot of mathematics, like sums, mins, maxes, and averages: you can do all these basic mathematical operations on the index, which is highly compressed and much, much smaller than the data you actually have to analyze. We have a patent on the compression technique, which allows us to analyze the compressed bitmap indices massively in parallel, so we use all the cores of the system.
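[Illustration, not ParStream's implementation: a toy sketch of multi-dimensional filtering on bitmap indices, where each distinct column value gets a bitmap and a filter becomes a single bitwise AND; plain Python integers stand in for the compressed bitmaps Mike describes, and a count is computed straight off the index.]

    rows = [
        {"country": "DE", "stars": 4},
        {"country": "US", "stars": 5},
        {"country": "DE", "stars": 5},
        {"country": "DE", "stars": 3},
    ]

    def bitmap(column, value):
        """Bitmap with bit i set when row i has the given value in the column."""
        bits = 0
        for i, row in enumerate(rows):
            if row[column] == value:
                bits |= 1 << i
        return bits

    # Multi-dimensional filter: country = 'DE' AND stars = 5, as one bitwise AND.
    hits = bitmap("country", "DE") & bitmap("stars", 5)

    print(bin(hits))             # 0b100 -> only row 2 matches
    print(bin(hits).count("1"))  # COUNT(*) computed on the index alone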
And why is this relevant? Because we really achieved multi-dimensional filtering with extremely high throughput, thousands of queries per second, and with continuous data load: we load in streams and analyze the stream together with historical data, with sub-second response times even on billions of data records. That's not theory; it's used in production, 24x7, at customers in Europe and Australia, and we started testing about two months ago with the first clients here in the US. One example is keyword co-optimization: you have seven terabytes of data collected from the Google, Bing, and Yahoo search engines, and you try to find out who is your nearest neighbor on the internet for your domain — who is using exactly the same keywords as you and stealing your traffic. I would be happy to show you the SQL statement that does that; it's frightening for every database, and we do it in less than a second on 10 billion records. And now it's about you: you have one minute to ask questions.

Sorry? We use standard infrastructure, meaning AMD Opteron or Intel CPUs and chipsets; they have good memory throughput. Of course it's an in-memory database, but it's not limited by the size of the memory. We just use the memory as basically a buffer for all the indices and all the data, which are persisted on the hard drive. So more memory just allows you to keep more data in the system and access it faster, but it's all persisted. Sorry, can you repeat that — can you remove historical data from the index? Yes, you can: the whole system uses a multi-dimensional index, and you can drop partitions of, for example, historical data, and index separately from the data if you like. Okay, thank you very much. Thank you, Mike.

Okay, so Vlad, make your way up here if you would. We're going to wrap up, and I thought this was the right topic to finish the sessions tonight with: the seven habits of successful NoSQL adoptions. Okay, Vladimir Bacvanski, take it away.

Okay, great. The background of this talk is based on experiences we have had introducing NoSQL. What is interesting here is that we are not dealing with high-tech research projects; we are trying to introduce NoSQL into ordinary organizations. Sometimes you are going to find that in such an environment people are very skeptical. You will find that the rest of IT often feels a little bit threatened by the new technology, and they think there might be shifts in the organization.
So you have to deal with a number of soft issues when you are introducing NoSQL into an organization, and here we are sharing a couple of our experiences. We have summarized the things we need to do in order to be successful as a set of habits, and we have seven of them.

The first habit: as you want to introduce NoSQL into the organization, you have to actively look for the problems where you can help with the introduction of NoSQL. Often you have technology groups in organizations that are enamored with the new technology, and they think that just by playing with it they can bring value to the enterprise. You need to find the problems where you can truly ease the pain. Then, when you start working with your clients, make sure that you are truly delivering something tangible, something that will be very useful for them. Usually they need to spend some money to pay you, and they are skeptical: it is new technology, it is different from ordinary IT, so in order to gain their confidence you need to provide tangible value. Then you have to prioritize: figure out what you can accomplish with NoSQL, choose the most useful thing for your clients first, and work on it to prove the validity of your approach and the technology.

One habit that we find is really important is on the soft side: stakeholder management. You need to cultivate relationships with your users. Often they will be quite hesitant to give you a new project, and in order to gain their confidence and maintain their trust you need to constantly remind yourself to work with them and inform them of your progress. You cannot just go away and spend a month or three months playing with the new technology; you need to be in constant touch with your users.

Habit number six: one thing we often see is that, looking into the new technology, people get into the depths of the mechanisms of a particular NoSQL system and get bogged down in those details. What they miss is the actual problem domain in which they need to apply NoSQL. They often don't show enough interest in the problem area, and then when they try to achieve something useful for the client, they don't have enough understanding of the problem domain. So that is something the NoSQL teams need to work on from the beginning: explore the problem domain and the various domain modeling techniques that can be used, have good communication with your stakeholders and users, and make sure you are versed both in the NoSQL technology and in the problem domain, because often the problem domain we are attacking is non-trivial, and the technology alone will not make us successful in that space.

Integrate with existing IT: the NoSQL solutions are not going to be isolated islands. They need to be integrated, not only in order to provide something useful to the client, but to fit into the whole ecosystem you have in the company. For that, you need to pay particular attention to how you are going to get incorporated into the workflows you have for data movement. You need to integrate by ingesting plain files and ingesting data from relational databases, and your outputs should match what already exists in the infrastructure, as opposed to trying to create something revolutionary.
So integration with existing IT, I would say, is one of the paramount things for success, and it is also one of the ways you will gain the trust of the skeptics who say: we should stay with relational databases, this new stuff is not going to be a match for our enterprise. So always integrate, even though you will get pushback occasionally. And finally, technologies are changing very quickly, so you always need to look out for new solutions. Something that did not have good technology support a year ago might be supported very well in a product that is available now. And then, when you have these seven habits, repeat them every week, every week in a meeting of your NoSQL team. Make sure that you go through this checklist and verify that you apply all of it. Thank you.

Well, I think you could pretty much adapt this presentation for any new technology, because those are good internal consulting skills no matter what the purpose. So, first of all, thank you to all of our instructors this evening; I really appreciate having you involved. And I hope it's left the rest of you with at least a few useful ideas to follow up on over the next couple of days. We are reconvening at 8:30 tomorrow, and I look forward to seeing all of you then. Have a good evening. Thanks. Bye-bye.