Hi, good afternoon. I'm Matt Ingenthron. I think we'll go ahead and get started since it's three o'clock on the dot. I want to thank everyone for coming this afternoon. I'm really glad to see such a good crowd here. We are two sessions away from the end, and they came through and reminded me to tell you that the reward for staying through this session is ice cream. So don't leave during the middle; after this session there's ice cream before the next session. So it's not a reward specifically for this session, but there is a nice social networking opportunity afterwards. So I'm Matt Ingenthron, and this is my colleague, Daniel Sustl, from Cloudera. I'm Daniel from Cloudera, as he said already. We're a couple of old crew from Sun. We met a long time back. Yep, quite a while back. So we want to spend a few minutes this afternoon talking about how multiple data stores can come together. There have been a couple of sessions on Couchbase already at the conference. I hope those were useful to you, so you probably have a good grounding on Couchbase; I'll talk about it just very briefly. Chances are you've also heard of what Cloudera works on, Hadoop, although we'll give a quick overview of that as well. But frequently, especially if you're new to this environment, you're probably trying to figure out how these things come together, so we'll talk a little bit about that in this session. This session is mostly about data storage for polyglots; we'll get to the definition in just a moment. Motivation-wise, if you think about it, a lot of the systems that we use today were designed in an earlier era. Well, not exactly for this system; chances are for bigger systems. But the systems that we've used until the last five years or so have all been predicated on, have all been designed for, the kinds of systems that existed previously. These were typically systems that had extremely limited memory in many cases, limited storage capacity, very limited I/O throughput, and really couldn't do a lot of complex transformation and still hit their usage goals. If you go far enough back, you would do things like transformations as you read off tapes and stuff like that. It doesn't look like anyone here in the crowd is still doing that. But the main point is that a lot of the systems that we've been using for persistence until recently, they've evolved of course, they've gotten a lot better, but they were really predicated on that architecture. So the title of this presentation has a couple of terms in it that I should take a moment to define. The first one is polyglot. A polyglot is a person who knows several languages. Frequently you hear this in conjunction with programming languages; I think the first person I heard using it this way was actually Adrian Cockcroft. But it makes sense that polyglot can also be applied to persistence. So if you're using polyglot persistence, that really means that you're taking the data that underlies your applications, that underlies your business, and using it in multiple systems, each for different purposes. A lot of these systems have different strengths and weaknesses, and it doesn't necessarily make sense in today's modern era to try to do everything with just one single system. Now, in reality, as an industry we've probably done things like this for quite a while, but it's being taken to an extreme now.
So we wanted to talk about one way that a number of these systems have come together. If I were to try to divide up this NoSQL world, there are lots of different ways I could slice it up. Just a few terms to throw out: there are document stores; Couchbase falls into the document store category, and there are a couple of other players in that area. They're also called document-oriented stores. There are systems that provide ACID semantics as you work with the data, and there are systems that land on a different side of the CAP theorem; I'm sure you've heard that before. I don't know how many folks are new to this whole area. And then there are even some newer terms. One that was coined not so long ago was BASE; I don't remember exactly where it came from. And then there was one from Brian Hedgfield, I should have added that; he called things DIRT, he said we had DIRTy systems. So there are lots of different persistence systems that we can choose from today, but these choices don't have to be exclusive of one another. You can choose to use combinations of them; we'll talk a little bit more about that. So let me give you just one quick example of this, and we'll talk more about this use case in detail later. Couchbase is a document-oriented, scalable NoSQL database. One of our key capabilities is that we do things at very low latency and very high throughput, so we work very well for ad targeting. So one use case where you might bring multiple NoSQL systems together is an ad targeting platform. In that particular example, we end up in a scenario where we have a Couchbase Server cluster that's spread out across multiple systems, and then maybe we're also using something like Hadoop. We've partnered with Cloudera and we've done some integration work between Hadoop and Couchbase so that people can take the logs from the ad targeting platform and build additional information, to be able to use these NoSQL systems together. We actually do that through something called Sqoop. And so with that, I'm going to turn it over to Daniel, who's going to talk a little bit more specifically about Hadoop and Sqoop. All right, so let's get to the Hadoop story. So I assume, this is a NoSQL conference, right? You guys all know everything about Hadoop, right? You're all Hadoop implementers, you're writing MapReduce in your sleep, right? Yeah, okay. I wish, because we would hire you if you could; there aren't enough people out there who can do it. If you can, come see me afterwards, we'll hire you. So what I'm going to try to do here is up-level, step back: what is Hadoop, what is Sqoop, and try to keep it at a little higher level. So you probably know that Hadoop is all about unstructured data and about dealing with it scalably, right? If you're plugged into the Cloudera marketing fire hose, you're also aware that it's the big data operating system and that we're changing the world one petabyte at a time. So what does any of that mean? It's a couple of adjectives dangling in space, so what is this thing? To explain that, we're going to take a step back, and this is actually amusing: Matt and I developed our slide decks completely independently and did the exact same thing, starting with the Trash-80. And I do miss that big orange button; I often wish my system had a big orange button.
So in explaining Hadoop, let's start with the way-back-when, or the very simple end of things. If you think about a single-server deployment, there are CPUs in there, there are disks in there. You can look at this box as divided up into application and data, as a single unit, right? And in this space that makes perfect sense, like when you're doing your demo from your laptop. The most commonly heard comment from speakers here was, oh my God, I can't believe my entire three-tier NoSQL application is running so well on my MacBook Air. Just about every presenter said this. I don't have a MacBook Air, so I can't say that. So when you're running this on your laptop, this is your structure. What happens when you scale that? Well, you blast out the application side of things, but traditionally you don't want to spread your data too much; that starts getting hard. Well, what if you have to? The very traditional answer is you bring in a very, very large and expensive machine, the biggest machine possible, to hold that database, because your data tier doesn't traditionally scale well. Well, then you have NoSQL to the rescue: you call Couchbase, and now you replace the gigantic, horrible thing with a data grid that sits alongside your application grid. This is a very natural paradigm, and it's a very natural growth out of that one system that was compute and data combined. But it's a different path than Hadoop. Hadoop took a different model, which is that instead of diversifying, or specializing, these as your compute and those as your data, we just have a lot of compute and data together, right? So your compute is always local to your data, or at least we try very hard to make sure it's local to your data; it's as local as we can make it be. And this is a really fundamental shift in the way you do computation. So when you hear about how amazing and wonderful MapReduce is, or how amazing and wonderful Hadoop is, it's usually less about the algorithm and more about the fact that you're not moving data. When you do your computation, you send it to your data; it happens there in that same space, right? So that's kind of the core of what makes Hadoop exciting. So, getting back to what I was supposed to talk about: what's Sqoop? And this is copied off of the Apache Sqoop website; the TL;DR is you move data with Sqoop. Getting into a little more detail, let's do the same thing that we just did for computing and getting to Hadoop: let's look at ETL and get to Sqoop. So if you look at a traditional ETL system, you've got data on the left and data on the right and some ETL engine in the middle. You suck data from the right, do some kind of transform on it in flight, and shove it into your target on the left. This is what we're all familiar with as an ETL model. When you start talking about Sqoop, and you start talking about Hadoop, remember the data is with the compute. So that left side falls off entirely. You're sucking your data from the right and you're just dropping it on your disk, so you leave half of that data transfer behind. Now, you may have noticed: suck, transform, spit became suck, spit. Where did the transform go? Well, I'll get back to that; sorry, I got one slide ahead of myself. And by the way, it scales, so you can do this en masse. And this is one of the things that's exciting about Sqoop: it's not just about doing this on one node.
It's about passing out this sucker, if you will, this code that's going to suck data down from the database, and running it en masse across a cluster, which gets you one of two things: it either gets you really high-bandwidth data transfer from your database into your cluster, or it gets you a really wicked DoS. So, back to where did that transform go? Well, Hadoop tends to be commonly used as an ETL engine, but it's really not strictly ETL; it's really more ELT. You suck it over, you drop it on your local disk, and now you transform it as much as you want, as many times as you want, as many different ways as you want, because your data is local to your compute. You can do these operations very cheaply. You no longer have to insert them in-stream while your data is flying by. You've got your data; it's a captive audience. Do whatever you want to it, as much as you want. So that is what Sqoop is about. It's about enabling you to suck data out of your database, drop it into Hadoop, and then do whatever transforms you want. Then you reverse the process: you suck it out of Hadoop and you shove it off into your database. So, Sqoop, let's see, the little details. Sqoop 1.4 is the current version, which is what's bundled in CDH4. CDH4 is the only Hadoop distribution that you need to know about. Sqoop 2.0 will be coming soon; you'll see this later in Matt's remaining slides. Sqoop is a very command-line-oriented utility. It is as Linux-y as Linux-y can be: there's lots of double-dash import, this-equals-that, double-dash. You get a command line that's about four lines long to execute your Sqoop operation, and some people find this boring. So with Sqoop 2.0, one of the things they've built is a web UI, so that you can just point a browser at a URL and say, I want this to happen, press the button, and go. So among the things that Sqoop 2.0 brings, that's one of them. I actually asked one of the committers this morning for a screenshot of that. She said, sorry, we haven't built it yet. But coming soon, coming soon to a Sqoop near you. And one of the things with Sqoop is that by default it plugs in via JDBC, so that covers 90% of what's out there. There's also a slew of custom connectors: Couchbase, Vertica, Teradata, Netezza, Oracle, MySQL, Postgres, et cetera. The interesting thing is you could use JDBC with most of these, so why is there a custom connector? Well, because JDBC tends to not be all that blindingly fast. So for example, with MySQL, we do a mysqldump and then move the dump over, as opposed to actually doing SQL selects against the native interface. So that's Sqoop in a nutshell, and if we have questions, we'll hit it again.
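To make Daniel's "four lines of double-dashes" a bit more concrete, here is a minimal sketch of what a plain JDBC-based Sqoop 1.4 import typically looks like. The hostname, database, table, credentials, and HDFS directory below are hypothetical, made up purely for illustration.

    # Pull a hypothetical "events" table from MySQL into HDFS over plain JDBC.
    sqoop import \
      --connect jdbc:mysql://db.example.com/adlogs \
      --username etl_user -P \
      --table events \
      --target-dir /user/etl/events \
      --num-mappers 4
    # -P prompts for the database password; --num-mappers controls how many
    # parallel map tasks Sqoop launches to do the copy.

Each of those options maps directly onto what Daniel described: --connect points at the source database, and the rest says which table to pull and where in HDFS to drop it.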
Great, thanks. So let me talk a little bit more about some specific use cases. But before I do, I probably should have done this earlier: let me get a sense of where people are coming from. So this is the interactive part, show of hands: how many folks have either tried or currently deployed Hadoop in one way, shape, or form? I'm sure they're all running CDH. And then how many folks have tried or deployed something like Couchbase already? Okay, cool, five or six. And then, just to understand from another angle, how many folks are using relational databases and file systems for everything and are here to learn? Come on, you can admit it, it's okay. Okay, so a good number of folks. So that's cool. Because of that, let me actually pause for questions just for a moment: is there anything we need to address to bring you up to speed? Does what we're talking about here make sense the way we're explaining it? Cool. Okay, got head nods, so that's good. So I'd like to spend a couple of minutes talking about some key use cases, because I think this will help bring the pieces together for you. I'm sure all of your environments have lots of different disparate datasets, and your minds are turning over how you would use this, but it's useful, I think, to talk about some pretty common deployments. Ad targeting is the one I mentioned earlier. Ad targeting is kind of interesting because, number one, it's a very big business; it's a very competitive industry and there are a number of companies all trying to compete in segments of the ad targeting market. But they're also very, very intensely data oriented. One thing that AOL Advertising does is they actually have an SLA: when there's that little box of what am I going to display for the advertisement, they have 40 milliseconds to come up with an answer for what they're going to display in that particular box. Now the challenge is that's 40 milliseconds to actually decide what to display. So in order to make that decision, they need to get a certain amount of data and they need to apply some algorithms to it to be able to make a decision. And the better they do with making that decision, the more likely you're going to click on the ad. I always say that in the old days Facebook had some of the goofy ones, where their ad targeting was getting better but you'd still see something and wonder, why are they targeting me with this? These days, all I see are Hortonworks ads. Really? Yeah. I had a friend at Facebook who told me that people figured out that with Facebook you can target people who work for a particular company, and that Facebook employees had gotten targeted by people who wanted a job at Facebook. So people would buy ad space targeting Facebook employees, which was kind of funny. But I never did that. Maybe some of you did; you're gainfully employed or wealthy from an IPO, I don't know which. So the point is they have that 40 milliseconds to respond with a decision, and that's not a lot of time. Even if we were to think about just using a traditional relational database, that's really just not a lot of time to be able to extract that information, en masse, from a system designed to support the whole world, right? These ad targeting systems do need to support a very large data set. Couchbase, because of its design, and I can go into some detail later if folks are interested, can actually serve up data, if you're talking about things that fit in a packet, across a gigabit network at about 250 microseconds. So if you think about it, that means I have 40 milliseconds to apply that algorithm to that data; if I grab two or three bits of data, I've now spent two or three milliseconds, and I can spend the remainder of the time on my algorithm, making the algorithm that much better at deciding what to display for that particular request. So Couchbase has been in use behind a number of different ad targeting and ad retargeting platforms.
It's a very interesting, competitive industry that I only partially understand. But the main idea is that over the course of their ad targeting, in many of these cases, they have a few different components of data. One is that they generate events, and they generate a lot of events. As people visit the different pages, they generate events associated with that user profile. The user profile then gets stored, but it may also be used directly in Couchbase, so they may write those events into Couchbase over the course of the day and use them over the course of the day. They also will then generate different profiles on those users and campaigns associated with those users, or rather, campaigns associated with their customers who are targeting ads. So of the people who have visited the site, from the events I've seen, I build up a profile; now I have an ad campaign based on what my advertisers, my clients, are willing to pay to target different profiles. All of that data needs to get loaded back into something that I can use at runtime to be able to hit that 40 millisecond SLA. So in the case of generating the events, the front end will put those either into Couchbase or into Hadoop, or in some cases both. And then they'll generate profiles and campaigns, typically by running a set of really heavy-lifting Hadoop MapReduce jobs, or frequently chains of these jobs, the kind of thing Daniel has done in his sleep, and then load that data right back into Couchbase to be able to serve it out to the user. So, have you seen the quote from Jeff Hammerbacher? The best minds of my generation are thinking about how to make people click ads. It is true. What was that? Well, there was a Wil Wheaton tweet the other day, did you see that one? Wil Wheaton, who was Wesley from Star Trek: The Next Generation, said he turned off JavaScript and surfed for about 30 seconds and discovered that most of what JavaScript is doing on the web is trying to figure out how to serve pop-ups and display ads and track what you're looking at. So it was kind of funny. So let me talk a little bit more about a very specific use case. I talked about AOL briefly; these slides are actually directly from AOL. The architect over there, Pero, had shared these, and actually a superset of what I'm going to talk about here was presented here at NoSQL Now last year, where I talked about this case in more detail. So I'll just cover a high-level piece of it, because I do want to get on to one other use case. So at AOL, they adopted Couchbase initially for their ad targeting and then worked through how they could build out new sets of products in their environment. AOL built what they term a real-time framework for their ad targeting. The system has since evolved from here, but this is just a quick overview of their real-time framework. At the time they were feeding data in through Flume, and some of that has actually changed since then, but they would feed data in regularly, more of a 15-minute batch. Once that data is loaded, it's then used as sort of read-only staging before it's run through different compute jobs. At this point, the data is actually in a Couchbase back-end cluster.
Depending on the job, it may actually flow into Hadoop, where they're going to do a lot of additional heavy lifting and develop those user profiles, or it may flow directly back over to the front end for the ad-serving logic. And since this is done on a regular basis in their environment, they're able to actually change that profile for the user over the course of the day. So as you're clicking your way through the site, or various sites, because you never actually know what's happening behind a particular page, it might be ad targeting or ad retargeting; it's a very interesting industry. You'll see that they have a number of different approaches to gather that data, build that profile, and then serve it straight back over in a real-time fashion. One specific example is what they call contextual segmentation. In this specific use case, you'll see that there's a user-to-content-ID mapper, and this is all run out of their own proprietary system, which will create, based on the active event stream, the user context map and some other data. The user segment mapper, et cetera, they'll run in other systems outside Hadoop to be able to generate that data and build that profile. That data then goes directly back over to Couchbase, where it's then used live, in real time, for ad serving. So in that environment, they're able to very quickly, over the course of the day, continually ingest the data, update profiles, push the updated profiles straight back out to the front-end system, and then target the individual users with that individual data. Now there's also a user segmentation long-term map that lives in Hadoop. So the short-term piece is that over the course of the day I'm rebuilding the profile, but there's also a part where they build a long-term profile of that user, and they do their user segmentation on that long-term map, which also lives in Hadoop. So they'll do daily map updates and event-based updates for that. So with that, I want to talk about one other use case, but before that, I probably should have described this better in the slides: it's worth discussing how this data movement actually happens. In the case of Sqoop, as you'll see in a bit, it's a very simple command-line tool; it's really just a matter of, as I said, simple. Well, it's simple after we did the integration work, right? The neat thing is that I've seen a couple of these environments, not AOL, one other customer; they had, I think, an 18-node Couchbase cluster and a 20-node Hadoop cluster. It was pretty neat to see. For those who haven't seen Couchbase, we have a really nice graphical UI of what's happening operationally inside the system. In that deployment, when you would ask Sqoop to go ahead and move a set of data from one side to the other, and it is a simple export and import, we would see that spike up to about 3 million operations a second. The reason that happens is that the quick little Sqoop command line is not really executing directly there in that little shell; it's really creating a set of MapReduce jobs that end up getting deployed out to the Hadoop cluster and run as separate mappers and reducers. The connector itself, and the work that we've done, will automatically split that up into whatever makes sense for the individual cluster.
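As a side note on that splitting behavior: on the plain JDBC path, the degree of parallelism is something you typically set yourself on the Sqoop command line, whereas the Couchbase connector picks its own splits as just described. Here is a hedged sketch with hypothetical table and column names:

    # Ask Sqoop to run 20 parallel map tasks, splitting the source table
    # on a numeric key column (table and column names are hypothetical).
    sqoop import \
      --connect jdbc:mysql://db.example.com/adlogs \
      --table user_profiles \
      --split-by profile_id \
      --num-mappers 20 \
      --target-dir /user/etl/user_profiles

Run en masse across the cluster, this is how you get either the high-bandwidth transfer or the "really wicked DoS" Daniel mentioned earlier.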
That really means that, in parallel, what we had was 20 Hadoop nodes trying to shove data into 18 nodes of Couchbase as fast as it would allow. It works out quite well. With that, let me talk a little bit about another set of uses, and that's really content and recommendation targeting. So the other place Couchbase ends up getting frequently deployed, as the data store behind an application, is a number of content sites; companies like Navteq, NHN, Vimeo, even Mozilla and Salesforce use Couchbase behind their sites. In many cases a lot of these sites actually have applications that have been around for quite a while. They're content-oriented sites; it depends on who it is, of course. But take NHN, one of our early partners; that's the Next Human Network. They're effectively known in the U.S. as the Google of Korea, so they're very large in Korea, and they have Couchbase behind a number of their content sites. Many of these apps have been around for a long time and have traditionally used a relational database, but as the requirements for the application have grown, and especially as user demands on that data have grown, they need to be able to keep pace with the throughput and also hit the better response times that users expect now, as they want to roll out applications to mobile devices, et cetera. So what's happening in many cases is that they'll take these content-oriented sites, Orbitz is actually another one, and they'll continue to use the relational database, but they'll also start to build out new portions of those applications on something like Couchbase, a document-oriented database that can handle the throughput and latency they need for those new requirements. So now that same old application, well, it's not really the same old application, a newer version of that application, continues to use the relational database. They still have the same data they've been using for quite a while, and they can still use a lot of the rich capabilities of that relational database, but because of some of its limitations and because of the way it was originally designed, it's very hard to scale that thing out, very hard to hit these new demands, so that's where they'll start to mix in something like Couchbase. In these content and content-with-recommendation-targeting kinds of sites, it's actually, I guess I never thought about it this way, but it's sort of incestuous: the reason people do content or recommendation targeting on a content-oriented site is that they're trying to keep people on the site longer, and the reason they're trying to keep people on the site longer is so that they can target them with more ads. So the two are actually reasonably related. These sites as well will have a large number of events that frequently go directly into Hadoop. And they as well will continue to build not only user profiles but also, in many cases, the very specific content they want to target to very specific users, and when I say that, they'll frequently build that out to the extent that it's almost rendered HTML. It likely exists as a JSON document sitting in Couchbase that is a subset of what's going to get rendered and sent directly up to the user, so that they can deliver the kind of throughput that's needed.
Now, at the same time, that relational database behind the application, they want to use that data in something like Hadoop as well, to build out those user profiles and additional targeting for the application. So as the site is further developed, or new content is deployed, that needs to go in here too, so those assets are available to target in an individual use case. So let's see what the moving parts are in that particular scenario. If I do have that sort of content-driven site, I have the original RDBMS and I probably have that Couchbase Server cluster. The web application that's developed internally in their environment is obviously interacting with that RDBMS, it might be MySQL, might be Oracle, et cetera, and it's also interacting directly with that Couchbase cluster, depending on what it's serving up, for instance if it's trying to serve up a recommendation. Now, as users interact with that site, it's also typically generating just lots of log data, and this is data that historically we've had for a lot of these kinds of applications, but honestly we've thrown it away, or we didn't treat it as something that could be, what was the term, big data, changing the world one petabyte at a time. So we've been throwing away petabytes of data that we could have been changing the world with. Now we have an opportunity to capture that data and really understand it efficiently, by taking a set of that content and running it as a Sqoop import from the original RDBMS into something like a Hadoop cluster, as well as having this round-tripping of being able to pull data with Sqoop. In some cases people don't necessarily have this leg of taking data out of Couchbase; it depends on where they're keeping their user session data. Frequently we've seen requirements where people do start to use Couchbase for session data, and they'll have that round trip where they pull that data into Hadoop as well, and then, after a lot of the heavy lifting occurs, export back to, if you will, the operational system that's behind the site. So in this case we're actually able to use Sqoop as a common tool to move data between these different systems. Sqoop actually stands for SQL-to-Hadoop, if I remember correctly, although ironically the first certified connector that wasn't just the default JDBC one was for a NoSQL system; it was for Couchbase. We went through that work together to pull those pieces together. So Sqoop allows us to pull together the different pieces and then run the different kinds of algorithms at the Hadoop level across that data. And since I know you're all dying to know how the logs get in there: there's another project called Flume, which is all about transferring streaming log data into Hadoop. I like to call Hadoop the big data Petri dish; it's a place where you stick everything and you see what grows out of it, and Flume is that kind of bridging element. And by the way, Flume is included in CDH4, the only Hadoop distribution you need to know about. So, just briefly, let's look at moving the data itself, what the process there is. If I were using Sqoop to import data, an import to Sqoop is all about bringing data into Hadoop, and this is about as simple as it needs to be. All I need to do is specify, in the world of Couchbase we actually distribute the cluster configuration through REST, so really all I need to do is specify where the cluster is, just one node of the
cluster. I don't have to worry about all 18; I just say here's one node of the cluster, and I can specify that I want to dump that data. In the case of Couchbase there are a couple of other parameters, if I wanted to get into a specific bucket, things of that nature; buckets are roughly analogous to databases in Couchbase. I may also choose, and this is sort of interesting, there are a couple of scenarios where this has been used, to do what's called a backfill. A backfill differs from a dump in that a backfill will connect to a cluster for a brief period of time and get a copy of all of the data changes that are occurring on that cluster, without actually dumping all of the data out of that cluster. So by doing this I can actually sample over the course of the day. BACKFILL_5 is a way of saying: rather than dumping the data, backfill, or sample, the subset of the data as it's getting modified in Couchbase, import that into Hadoop, and do that just for that five-minute period of time. So I may effectively have a cron job, or other ways of scheduling it, I'm sure there's a scheduling project included in CDH4, the only Hadoop distribution you need to know about, and I may run that job on a regular basis to sample what's actually happening in my environment; maybe there are certain critical points during the day when I want to do that sampling, since it affects certain kinds of user profiles. The opposite direction is actually rather easy too. It's an export, and it's really just a question of specifying where your cluster is, that you're going to load that data, and what you're exporting from, which is the directory in HDFS. So it may look relatively simple, because it looks like I'm just moving some data around within a system, but the reality is that behind each one of those command executions a MapReduce job is generated and distributed through the cluster, and it will split up the data and decide how to most efficiently move it into or out of HDFS. And this is also simple because none of the Hadoop connectivity options are in here, because you can tell it all of that through environment variables: the Hadoop home is set here, the Hadoop config is set here, so all the stuff that tells it which Hadoop cluster you're connecting to, where to find it, and where the data is, is all implicit in environment variables, the previous ten lines that people set up ahead of time and don't actually interact with on a regular basis. The way that you specify which bucket you want is also relatively simple: it's through usernames and passwords, which are just additional arguments to the Sqoop tool itself.
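For reference, here is a rough sketch of what the DUMP, BACKFILL, and export invocations just described tend to look like with the Couchbase connector. The node address, bucket credentials, and HDFS paths are hypothetical, and the exact flag spellings and URL format are from memory, so check them against the connector's documentation.

    # Full dump of a bucket into HDFS. Only one node needs to be named;
    # the connector picks up the rest of the cluster topology over REST.
    sqoop import --connect http://cb-node1.example.com:8091/pools \
      --table DUMP --username mybucket --password mybucketpass \
      --target-dir /user/etl/cb_dump

    # Sample only the mutations seen during a five-minute window.
    sqoop import --connect http://cb-node1.example.com:8091/pools \
      --table BACKFILL_5 --username mybucket --password mybucketpass \
      --target-dir /user/etl/cb_backfill

    # Push computed profiles from HDFS back into the bucket.
    sqoop export --connect http://cb-node1.example.com:8091/pools \
      --table DUMP --username mybucket --password mybucketpass \
      --export-dir /user/etl/profiles

As in the AOL and content-site examples, each of these one-liners turns into a MapReduce job behind the scenes, so the actual data movement runs in parallel across the Hadoop cluster rather than through the shell that launched it.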
So I know this was a pretty high-level review, and that's certainly most of what we have. I can go into some more detail, but if you're interested in quite a bit more detail about either Couchbase or how Couchbase and Hadoop are used together in a couple of different environments, especially if you're here locally, we actually have an event coming up next month, CouchConf in San Francisco, where there are three separate tracks and a number of other use cases that we'll describe. We have a number of companies coming in to show how they use this in their environments. I can't talk too much about it at this point, but in about a month I'll be able to talk quite a bit more about a pretty interesting content-oriented use of this. So with that, questions? Besides gaming and large-scale online apps, are there any other use cases that you see? Well, the third would be the one I mentioned, which is really content-oriented sites. The other that we've seen in a few different situations is financial, some financial deployments; it turns out a lot of financial deployments have many of the same patterns, right? Trying to recommend different products to their users, and they have a lot of the same web aspects. I would thoroughly agree with you: financial would be one. The other one that has come up relatively frequently is manufacturing. We've even seen a few situations where people are applying the kinds of things you might do with big data to gathering metrics during the production process, and using that to feed back into the loop later, or to understand long term what's happened. One other sort of funny deployment, unfortunately I can't say exactly who they are, is an appliance manufacturer. When the service folks go out into the field, they get different data off these appliances, and that gets fed back into the system, and then they use that to understand or predict what's going to happen with those systems in the field. I remember earlier today, I don't know if he's in the room, I don't think he is, but it was one of the railroads, and he had a really interesting one: apparently they take these high-resolution images of trains as they go by, and they keep that data. They don't necessarily know what they're going to do with it yet, because a failure will occur later and they can go back and look at how that failure came to happen, so at some point they're going to use those images. He said in their environment that allows them to better predict how they have to service a train, and it turns out it's a lot easier to service it if you do it before it fails; if it's blocking a track, that's not a good thing. So that was another environment. So there are lots of different uses. I'd actually be interested to hear more, and that's where this conference is very good from my perspective, because it's interesting to hear how this applies to businesses other than the ones we've traditionally served. How about, say, telco or government? One of the big use cases I would think would play well with your strengths: your cell phone is reporting lots and lots and lots of data, and of course they would never look at that data or do anything useful with it. No, of course not. And Hadoop wouldn't be a great way to do that. That's how I would imagine it, and ad targeting is pretty Big Brother already; it all ultimately comes down to ad targeting, click on the silly little ads and make them happy. Is that a use case that you guys see? I can't say that I've dealt specifically with government. Telco has been one; frequently they're pretty careful about how they use that data. I do remember one that was really about emergency response: in that case they were trying to deal with regulatory requirements, being able to very quickly understand what data is behind a call when somebody dials 911, even knowing which cell towers they'd hit previously. That was really about trying to meet specific regulations, but I don't know much more than that. Any other questions? No? Great, okay. Well, thank you very much, and by all means grab us if you have any other questions. Otherwise, there should be ice cream. Thanks for your time.