Welcome, everyone, to the first-ever US user conference for Apache Flink, sponsored by Data Artisans, the creators of Flink. The conference kicked off this morning with some very high-profile customer use cases, including Netflix and Uber, which were quite impressive. We're on the ground at the Kabuki Hotel in San Francisco, and our first guest is Dean Wampler, VP of fast data engineering at Lightbend. Welcome, Dean.

Thank you, good to see you again, George.

So, some big-picture context setting: Spark exploded onto the scene, blew away the expectations even of its creators with its speed and deeply integrated libraries, and essentially replaced MapReduce very quickly. So what is behind Flink's rapid adoption?

Right, I think it's an interesting story. If you'd asked me a year ago, I probably would have said, "Well, I'm not sure we really need Flink; Spark seems to meet all our needs." But I pretty quickly changed my mind as I got to know Flink. It's a broad ecosystem, and there's a wide variety of problems people are trying to solve. What Flink does very well is low-latency streaming, but still at scale like Spark, whereas Spark is still primarily a micro-batch model, so it has longer latency. Flink has also been on the cutting edge of embracing some of the more advanced streaming scenarios, like proper handling of late-arriving data, windowing semantics, things like that. So it's filling an important but fairly broad niche. Also, not everybody needs the full-featured capabilities of Spark, like batch analytics, so having one tool that's focused just on processing streams is often a good idea.

So would that relate to a smaller surface area to learn and to administer?

I think that's a big part of it, yeah. Spark is incredibly well engineered and works very well, but it's a bigger system, so there's more to run. There is something very attractive about having a more focused tool: fewer things to break, basically.
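To make the windowing and late-arrival point concrete, here is a minimal sketch of event-time windowing with allowed lateness, roughly as it would look in the Flink 1.x Scala DataStream API. The Click type, the timestamps, and the window sizes are made up for illustration.

```scala
import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows
import org.apache.flink.streaming.api.windowing.time.Time

// Hypothetical event type: a user click carrying its own event-time timestamp.
case class Click(userId: String, timestamp: Long)

object LateDataSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)

    val clicks: DataStream[Click] = env
      .fromElements(Click("alice", 1000L), Click("bob", 2000L))
      .assignAscendingTimestamps(_.timestamp) // window on the event's own time, not arrival time

    clicks
      .map(c => (c.userId, 1))
      .keyBy(_._1)
      .window(TumblingEventTimeWindows.of(Time.seconds(10)))
      .allowedLateness(Time.seconds(30)) // events arriving up to 30s late still update their window
      .sum(1)                            // per-user click count per 10-second window
      .print()

    env.execute("late-data-sketch")
  }
}
```

The point is simply that the window is defined by when events happened, and data that shows up late within the allowed bound still lands in the right window rather than being dropped or miscounted.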
So, you mentioned lower latency and fewer bells and whistles. Can you give us some examples of use cases where you wouldn't need all of the integrated libraries of Spark, or the big footprint that gives you all that resilience and the functional programming that lets you recreate lineage? Tell us how a customer approaching this should pick the trade-offs.

Right. Normally, when you have a low-latency problem, it means you have less time to do work, so you tend to do simpler things in that time frame. But just to give you a really interesting example, I was talking with a development team at a bank recently that does credit card authorizations. You click "buy" on a website and there are maybe a few hundred milliseconds in which the user is expecting a reply, but it turns out there are so many things going on in that loop, from browser to servers and back, that they only have about ten milliseconds, once they get the data, to decide whether this looks fraudulent or legitimate. Ten milliseconds is fairly narrow. That means you have to have your models already done and ready to go, and a quick way to actually apply them: take this data, ask the model "is this okay?", and get a response. So a lot of it boils down to one of two things: either I'm doing basic filtering and transforming of the raw data coming into my environment, or I have some more sophisticated analytics running behind the scenes, and in real time, as data comes in, I'm asking questions against those models about that data, like authorizing credit cards.

So to recap: the low latency means you have to have, perhaps, already scored your models, trained and scored in the background, and then with this low-latency solution you can do a key-based lookup, I guess, against an external store.

Right.
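As an illustration of that pattern, here is a minimal sketch of scoring a pre-trained model inside a streaming job, using a Flink RichMapFunction. The AuthRequest, ScoredAuth, and FraudModel types, the 0.9 threshold, the model path, and the model-loading logic are all hypothetical stand-ins, not anyone's actual system.

```scala
import org.apache.flink.api.common.functions.RichMapFunction
import org.apache.flink.configuration.Configuration

// Hypothetical types: a card authorization request and the scored result.
case class AuthRequest(cardId: String, amountCents: Long, merchantId: String)
case class ScoredAuth(request: AuthRequest, fraudScore: Double, approved: Boolean)

// Stand-in for a model that was trained offline; only fast, in-memory scoring happens here.
trait FraudModel extends Serializable {
  def score(req: AuthRequest): Double
}

// Usage (hypothetical): authStream.map(new ScoreAuthorizations("/models/fraud"))
class ScoreAuthorizations(modelPath: String) extends RichMapFunction[AuthRequest, ScoredAuth] {

  @transient private var model: FraudModel = _

  // Load the already-trained model once per task, outside the per-event hot path.
  override def open(parameters: Configuration): Unit = {
    model = loadModel(modelPath)
  }

  // The per-event work stays tiny: score and threshold, no training and no remote calls.
  override def map(req: AuthRequest): ScoredAuth = {
    val score = model.score(req)
    ScoredAuth(req, score, approved = score < 0.9)
  }

  // Placeholder: a real job would deserialize a model artifact from disk or an external store.
  private def loadModel(path: String): FraudModel = new FraudModel {
    def score(req: AuthRequest): Double = if (req.amountCents > 500000L) 0.95 else 0.1
  }
}
```

The design point is that the expensive work (training, loading) happens outside the per-event path, so the hot path within the ten-millisecond budget is only an in-memory score and a threshold check.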
So how is Lightbend making it simple to put together what appears to have to be, for any pipeline, multiple products, seamlessly?

That is the challenge. It would be great if you could just deploy Flink and that was the only thing you needed, or Kafka, or pick any one of them, but of course the reality is that we always have to integrate a bunch of tools together, and it's that integration that's usually the hard part: how do I know why this thing is misbehaving when maybe it's actually something upstream that's misbehaving, that sort of thing. So we've been surveying the landscape to understand, first of all, which tools seem to be the most mature and most vibrant as communities and address the variety of scenarios people are trying to deal with, some of which we just discussed, and what kind of integration problems you have to solve to make these reliable systems. We've been building a platform, the Fast Data Platform, which is approaching its first beta and is designed to solve a lot of those problems for you, so you can focus on your actual business problems.

And from a customer point of view, would you take end-to-end ownership of that solution, so that, if they chose, you could manage it on-prem or in the cloud and handle level-3 support across the stack?

That's an interesting question. We think eventually we'll get to the point of more of a service offering, but right now most of the customers we're talking to are still more interested in managing things themselves, just without as much of the hassle of doing it all themselves. So what we're trying to balance is tooling that makes it easy to get started quickly and build applications, but that also leverages some of the modern machine learning and artificial intelligence techniques to automatically detect and correct a lot of common problems and handle other management scenarios, so at least it's not quite as "you're on your own" as it would be if you were just trying to glue everything together yourself.

So if I understand, it sounds like the first stage in the journey is "help me rationalize what I'm trying to get to work together on-prem," and part of that is using machine learning as part of management. Then, over time, that management gets better and better at root-cause analysis and auto-remediation, and then it can move into the cloud, where these disparate components become part of a single SaaS solution under that management.

Yeah.

So, looking out at where all this intense interest is right now in IoT applications, we know that we can't really send all the data back to the cloud, get an immediate answer, and then drive an action. How do you see that shaping up in terms of what's on the edge and what's in the cloud?

Yeah, that's a really interesting question, and there are some particular challenges, because a lot of companies will migrate to the cloud in a piecemeal fashion, so they've got a sort of hybrid deployment scenario with things on-premises and in the cloud and so forth. One of the things you mentioned that's pretty important is: I've got all this data coming in, how do I capture it reliably? Tools like Kafka are really good for that; Pravega, which Srikanth from EMC mentioned, is filling the same need: I need to capture data reliably, serve downstream consumers, and make it easy to do analytics over this stream, which looks a lot different from a traditional database, where the data is at rest; a stream isn't static, it's moving. That's one of the things you have to do well, and then you have to figure out how to get that data to the right consumer and account for all of the latencies. If I needed that ten-millisecond credit card authorization but I had data split across my on-premises and cloud environments, that would not work very well. So a lot of that kind of data-flow architecture becomes really important.

Do you see Lightbend offering that management solution that enforces SLAs, or do you see sourcing that technology from others and then integrating it tightly with the particular software building blocks that make up the pipeline?

It's a little of both. We're in the early stages of building services along those lines, and some of the technology we've had for a while, our Akka middleware system and the streaming API on top of it, would be really good for basing that kind of platform on, where you can think about SLA requirements and trade off performance against getting answers in a reasonable time, good recovery in error scenarios, things like that. It's all early days, but we are thinking very hard about that problem, because at the end of the day that's what customers care about. They don't care about Kafka versus Spark or whatever; they just care about "I've got data coming in, I need an answer in ten milliseconds or I lose money," and that's the kind of thing they want you to solve for them. So that's really what we have to focus on.
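Dean's point about Akka and its streaming API being a natural base for SLA-aware plumbing can be sketched roughly like this: a minimal, hypothetical Akka Streams pipeline that races each authorization against a 10 ms budget and falls back to a conservative default when the budget is blown. The Auth and Decision types, the score function, and the fallback policy are assumptions for illustration, not Lightbend's actual platform code.

```scala
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Sink, Source}

import scala.concurrent.Future
import scala.concurrent.duration._

// Hypothetical request and decision types for the sketch.
case class Auth(cardId: String, amountCents: Long)
case class Decision(cardId: String, approved: Boolean, fromFallback: Boolean)

object SlaSketch extends App {
  implicit val system: ActorSystem = ActorSystem("sla-sketch")
  implicit val mat: ActorMaterializer = ActorMaterializer()
  import system.dispatcher

  // Stand-in for the real scoring call (e.g. a lookup against a model-serving store).
  def score(auth: Auth): Future[Decision] =
    Future(Decision(auth.cardId, approved = auth.amountCents < 500000L, fromFallback = false))

  // Race the real answer against a 10 ms budget; if the budget is blown, return
  // a conservative default instead of stalling the pipeline.
  def withinBudget(auth: Auth): Future[Decision] =
    Future.firstCompletedOf(Seq(
      score(auth),
      akka.pattern.after(10.millis, system.scheduler)(
        Future.successful(Decision(auth.cardId, approved = false, fromFallback = true)))
    ))

  Source(List(Auth("a", 1200L), Auth("b", 990000L)))
    .mapAsync(parallelism = 4)(withinBudget) // keep the stream non-blocking under load
    .runWith(Sink.foreach(println))
    .onComplete(_ => system.terminate())
}
```

The idea is that the latency requirement is expressed in the flow itself, so a slow downstream call degrades to a safe answer rather than violating the SLA.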
So, last question before we have to go: do you see potentially a scenario where there's one type of technology on the edge, or many types, and then something more dominant in the cloud, where basically you do the model training in the cloud and, out on the edge, you do the low-latency predictions or prescriptions?

That's pretty much the architecture that's emerged, and I'm going to talk a little bit about this today in my talk. Like we said earlier, I may have a very short window in which I have to make a decision, but it's based on a model that I've been building for a while, and I can build it in the background, where I have more tolerance for the time it takes.

Up in the cloud?

Actually, this is kind of independent of the deployment scenario, but it could work like that. You could have something that is closer to the consumer of the data, maybe in the cloud, deployed in Europe for European customers, but it might be working with systems back in the US that are doing the heavy lifting of building these models and so forth. We live in a world where you can put things where you want, move things around, and glue things together, and a lot of times it's just a matter of knowing the right combination of pieces.

All right, Dean, it's great to see you and to hear the story; it sounds compelling.

Well, thank you very much.

So this is George Gilbert. We are on the ground at Flink Forward, Data Artisans' user conference for the Flink product, and we will be back after this short break.