Welcome, everyone. We're back at the Flink Forward user conference, sponsored by data Artisans. This is the first US-based Flink user conference, and we are on the ground at the Kabuki Hotel in San Francisco. We have a special guest, Stephan Ewen, who is one of the founders of data Artisans and one of the creators of Flink. He is CTO, and he is in a position to shed some unique light on the direction of the company and the product. Welcome, Stephan.

Yeah, so you were asking how stream processing, or how Flink and data Artisans, can help companies or enterprises that want to adopt these kinds of technologies actually do that, despite what we've seen with the big internet companies that adopted them first: they had to go through this whole process of productionizing these things, integrating them with so many other systems, making sure everything fits together and works as one piece. What can we do there? I think there are a few interesting points to that. Let's maybe start with stream processing in general. Stream processing by itself actually has the potential to simplify many of these setups and infrastructures. There are multiple dimensions to that. First of all, there is the ability to more naturally fit what you're doing to what is actually happening. Let me qualify that a little bit. All these companies dealing with big data are dealing with data that is typically continuously produced, from sensors, from user devices, from server logs, from all these things, which is quite naturally a stream. Processing this with systems that give you the abstraction of a stream is a much more natural fit, so you eliminate chunks of the pipeline that, for example, do periodic ingestion, grooming into individual finite data sets, and then periodic processing. You can, for example,
get rid of a lot of those pieces, and you get a paradigm that unifies the processing of real-time data and historic data. This by itself is an interesting development that I think many have recognized, and that's why they're excited about stream processing: it helps reduce a lot of that complexity. So that is one side to it.

The other side is that there was always an interplay between the processing of the data and wanting to do something with the insights, right? You don't process the data just for the fun of processing it. Usually the outcome informs something. Sometimes it's just a report, but sometimes it's something that immediately affects how certain services react: how they apply their decisions in classifying transactions as fraud, how they send out alerts, how they trigger certain actions. The interesting thing, and we're actually going to see a little more of that later at this conference, is that in the stream processing paradigm there is a very natural way for these online, live applications and the analytical applications to merge together, again reducing a bunch of this complexity.

Another thing that is happening, which I think is very powerful and is helping a lot right now in bringing these kinds of technologies to a broader ecosystem, is how the whole deployment stack is growing. We see more and more users converging onto resource management infrastructure. YARN was an interesting first step toward making it really easier: once you've productionized that part, it's easier to productionize more systems. But even beyond that, the uptake of Mesos, the uptake of container engines like Kubernetes, and so on, gives you the ability to prepare more functionality bundled together out of the box. You just pack into a container what you need, put it in a repository, and then
various people can bring up these services without having to go through all the setup and integration work again. You can template the integration with other systems far better with this kind of technology. So those two things seem to be helping a lot toward much broader adoption: stream processing as an easier paradigm with fewer moving parts, and developments in the technology you deploy on.

So let me see if I can repeat back the summary version, which is: stream processing is more natural to how the data is generated, so we want to match the processing to how the data originates. At the same time, if we do more of that, it becomes a workload or an application pattern that grows more familiar to people who didn't grow up in a continuous processing environment. But it also has the third capability of reducing the latency between originating or ingesting the data and getting an analysis that informs a decision, whether by a person or a machine. Would that be fair?

Yeah, I think you can even go one step further. It's not just about reducing the latency from the analysis to the decision. In many cases you can actually see that the part that does the analysis and the part that makes the decision just merge and become one thing, which means much fewer moving parts, less integration work, less maintenance.

And would this be, for example, how application databases are taking on the capabilities of analytic databases to some extent? Or how stream processors can incorporate machine learning, whether they're doing online learning, or calling a model that they're going to score in real time, or even a pre-scored model? Is that another example?

You can think of those as examples here. A nice way to think about it is to look at what a lot of the analytical applications do versus, let's say, online services that match offers and trades,
or that want to generate alerts. A lot of those are, in some sense, different ways of just reacting to events. If you're receiving some real-time data, you want to process it, interact with some form of knowledge that you accumulated over the past or from some other inputs, and then react to it. That paradigm, which is at the core of stream processing frameworks like Flink, is so generic that it covers many of these use cases. It covers building applications directly: we have actually seen Flink users who built a social network directly on Flink, where the events they receive are a user being created, a user joining a group, and so on. And it also covers the analytics of just saying, I have a stream of sensor data, and on certain outliers I want to raise an alert. Once you start thinking about both of them as just handling streams of events in this flexible fashion, they are so similar that it helps bring many things together.

That sounds like it would play into the notion of microservices, where each service is responsible for its own state and they communicate with each other asynchronously, so you have a cooperating collection of components. Now, there are a lot of people who grew up with databases, sharing the state among modules of applications. What might drive the growth of this new pattern, the microservices, considering that there are millions of people who just know how to use databases to build apps?

Yeah, so the interesting part that I think drives this new adoption is that it's just such a natural fit for the microservice world. How do you deploy microservices with state? You can have a central database with which you work, and every time you create a new service you have to make sure that it fits with the
capacity and capabilities of that database; you have to make sure that the group that runs the database is okay with the additional load. Or you can go the other way, where each microservice comes up with its own database, but then every time you deploy one, and that may be a new service or it may just be experimenting with a different variation of the service in A/B testing, you have to bring up a completely new thing. In this interesting world of stateful stream processing as done by Flink, state is embedded directly in the processing application, so you don't worry about these things separately. You just deploy that one thing, and it brings both together, tightly integrated. And it's a natural fit: the working set of your application goes with the application. If you deploy it, if you scale it, if you bring it down, these things go with it. The central part in this picture is nothing more than, if you wish, a backup store where you take snapshots of the microservices and store them, in order to recover them from catastrophic failures; or in order to have a historic version to look into, if you figure out later that something happened and want to ask, was this introduced in the last week? Let me look at what it looked like the week before. Or to just migrate the service to a different cluster.

We're going to have to cut things short in a moment, but I wanted to ask you one last question. If microservices are a sweet spot, and near-real-time decisions are also a sweet spot for Kafka, what might we expect to see in terms of roadmap that either generalizes those use cases or opens up new use cases?

Yes, so what we're immediately working on in Flink right now is definitely extending the support in this area for the ability to keep much larger state in these applications, state that
really goes into multiple terabytes per service; and functionality that allows you to manage and evolve this state more easily. If the application starts actually owning the state, and it's not in a centralized database anymore, you start needing a bit of tooling around that state, similar to the tooling you need around a database: schema evolution and all of that. So, things that make that part easier, and handling larger state. And then we're looking into what APIs users actually want in this area. Flink has, I think, pretty stellar stream processing APIs, and as you've seen in the last release, we've actually started adding more low-level APIs. One could even imagine APIs in which you don't think in streams as distributed collections and windows, but just think about the very basic ingredients: events, state, time, and snapshots. More control and more flexibility, by taking the basic building blocks directly rather than the higher-level abstractions. I think you can expect more evolution on that layer, definitely in the near future.

All right, Stephan, we'll have to leave it at that, and hopefully pick up the conversation again before too long. We are at the Flink Forward conference in the Kabuki Hotel in San Francisco, and we will be back with more after a few moments.
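The "basic ingredients" Stephan describes, events, state, and snapshots, can be illustrated with a minimal sketch. This is not Flink code and not Flink's API; it is a hypothetical Python toy showing the idea from the conversation: per-key state lives embedded in the processing application, and a snapshot store exists only as a backup for recovery or for inspecting a historic version.

```python
# Illustrative sketch of the "events, state, snapshots" idea from the
# interview. All class and method names here are invented for this
# example; real Flink applications use the DataStream API instead.
import copy


class StatefulProcessor:
    """Reacts to a stream of (key, value) events, keeping per-key state."""

    def __init__(self):
        self.state = {}      # working set, embedded in the application
        self.snapshots = []  # stand-in for a durable backup store

    def on_event(self, key, value):
        # React to an event: update the knowledge accumulated for this key.
        total = self.state.get(key, 0) + value
        self.state[key] = total
        return total

    def snapshot(self):
        # Persist a consistent copy of the state, e.g. to recover from a
        # catastrophic failure or to look at "the week before" later.
        self.snapshots.append(copy.deepcopy(self.state))

    def restore(self, index=-1):
        # Recover the working set from a stored snapshot.
        self.state = copy.deepcopy(self.snapshots[index])


p = StatefulProcessor()
p.on_event("sensor-1", 5)    # state becomes {"sensor-1": 5}
p.snapshot()                 # back up that version
p.on_event("sensor-1", 3)    # state becomes {"sensor-1": 8}
p.restore()                  # roll back to the snapshot
print(p.state["sensor-1"])   # → 5
```

The point of the sketch is the deployment shape discussed above: the state travels with the service when it is deployed, scaled, or torn down, and the only centralized piece is the snapshot store.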