Well, okay. I don't know, probably we should be starting now, or somebody from the organizers will give me a go. Okay, I've got the go, so we can start. I'll be talking fast so we don't ruin the schedule more than necessary.

Let me introduce myself first. I'm Maciej Próchniak, I come from Warsaw, from Poland. It's my first time in Madrid and I'm quite happy about it. For almost 10 years I have worked at a not-so-small software house called TouK. We work on many different platforms, from content delivery networks through various enterprise-grade systems to installing Hadoop clusters and doing stream processing, but our main focus is integration in large enterprises. Some of our largest customers are in the telecommunications sector: we work for, I would say, half of the largest Polish mobile operators. They are not as big as Telefónica, as I've heard many times at this conference, because they are based mostly in Poland, but they still have quite a lot of interesting data to process.

So let us remind ourselves what data mobile operators process in a real-time, streaming way, because this is what we are going to talk about. First of all, there are various events connected to making calls, sending SMSes and so on (call detail records), and then various billing events: will they charge you for that call or not?
Also, because everybody is online now, there is various data on network usage (did you go to booking.com, or Google, or whatever), and a lot of localization data from the mobile phone, either from GPS or from more network-related measurements, like how close you are to a base transceiver station. As I've said, our customers are not as large as Vodafone or Telefónica, but they still have quite large amounts of data. In terms of calls, in Poland it's like three to five thousand per second at peak. When it comes to network usage and localization, the situation is a bit more complicated, because you can measure localization data in various ways, actively or passively, and if you track your customers' localization at the finest granularity, you can easily end up with something like 100,000 events per second.

What do we use this data for? I'm going to talk about two main use cases. The first is fraud detection. There are many interesting ways fraudsters tend to use telcos: frauds on premium services, spamming people with SMSes, and also more elaborate stuff like cloning SIM cards or SIM box termination fraud. I won't go into the details because, frankly speaking, I don't even understand how all of those frauds work. But still, we want to detect them very quickly. We want to compute aggregates, for example: how many SMSes to unique numbers have you sent in the last hour? If it's like three thousand, then probably something is going on. And also: where were you at your last location, and the one before that? If within five seconds you've moved from, say, Madrid to Bilbao, then probably, again, you're up to something.
It's just not possible. And if we detect that you are probably a fraudster, we want to block your account as fast as possible, maybe send you an alert beforehand so you can react, but the most important thing is that we want to react very quickly.

The other important area of use cases is marketing. We want to find an interesting event (for example, you're running low on your balance, or you're approaching our point of sale) and we want to offer you a brand new Mac with a coffee. We first have to check that you're a customer we want to interact with: for example, not some low-income prepaid customer, but a real postpaid one that we want to offer something nice. And again, we have to react very quickly, in real time, because if you've just moved past our point of sale, it's probably no use sending that SMS a few hours later.

At this conference you hear a lot about artificial intelligence, machine learning algorithms and so on, but in many cases, at least at our customers', it's enough in the first place to be able to filter the events, enrich them with some context about the customer, and perhaps also score them with some machine learning models. For fraud detection, of course, it's also very important to be able to compute windowed aggregates in real time: how much data are you using, how many calls are you making. The interesting thing is that mobile operators have done similar stuff for many, many years in their core billing systems, right? If you run out of balance on your prepaid account, you are blocked in real time. So why do we need any more advanced or modern system for that?

The thing is that the billing system is a core system, and it's very hard to change. People are afraid to touch it because it's at the heart of the company, so nobody is allowed to just go and change its rules, for example when to block accounts. On the other hand, operators have many analytical tools: warehouses, data lakes, you name it. It's quite easy to get data out of those. You write SQL, or use some more elaborate visualization tools, and they are easy to use. But the data flows into those places not in real time, and they tend to have rather low SLAs, so they are not fit for what I'd call production use. By production use I mean sending real SMSes and blocking real client accounts.

So what we want is to be as fast as the billing system (real time, with really good SLAs), while letting users interact with it as easily as they do with their analytical systems. That's the challenge, so how do we approach it? In most deployments of such systems we end up with an architecture more or less like this. In the center there's the stream processing engine, in our case Apache Flink. Data flows in from source systems, usually through Apache Kafka. On the right, we either perform some actions directly or send the results to output topics. There's usually a fast, real-time client profile store to enrich events with more static data, and above it all sits the tool of our own that I'm going to talk about. This is not so special, probably many of you have seen such an architecture, but the two core components are, on one hand, Apache Kafka as the standard message broker, and on the other, Apache Flink as our powerful stream processing engine.

Apache Flink is probably a little less well-known than, for example, Spark Streaming, but you should really check it out if you haven't heard of it or used it. Aljoscha Krettek, one of its creators, is at this conference, so he's a great person to ask about it; I think he's sitting right here. Why did we choose Flink? Because at the point when we started, there was really almost no competition. We wanted very low latency, milliseconds or tens of milliseconds. It had a really impressive window API that let you define very complex stateful processing. It could (and of course still can) guarantee that data read from Kafka and sent to Kafka is processed exactly once, no matter whether the job failed or you redeployed it. And it can handle pretty large state in real time.
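To make the "large keyed state" point concrete: a typical fraud aggregate is "how many unique numbers did this subscriber call in the last hour". Here is a minimal, plain-Java sketch of such a sliding-window aggregate. No Flink is involved; in a real job this state would live in Flink's keyed, checkpointed state, and all the names here are invented for illustration:

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Set;

// Per-subscriber sliding-window "unique callees" aggregate.
// Single-threaded, in-memory, purely illustrative.
class UniqueCalleeWindow {
    private static final class Call {
        final long ts;        // event time, epoch millis
        final String callee;  // called number
        Call(long ts, String callee) { this.ts = ts; this.callee = callee; }
    }

    private final long windowMillis;
    private final ArrayDeque<Call> calls = new ArrayDeque<>();

    UniqueCalleeWindow(long windowMillis) { this.windowMillis = windowMillis; }

    // Record a call and evict everything older than the window.
    void record(long ts, String callee) {
        calls.addLast(new Call(ts, callee));
        while (!calls.isEmpty() && calls.peekFirst().ts <= ts - windowMillis) {
            calls.removeFirst();
        }
    }

    // Distinct callees currently inside the window.
    int uniqueCallees() {
        Set<String> distinct = new HashSet<>();
        for (Call c : calls) distinct.add(c.callee);
        return distinct.size();
    }

    // A toy fraud rule: too many distinct callees in one window.
    boolean suspicious(int threshold) { return uniqueCallees() > threshold; }
}
```

A real rule would use a far higher threshold, evict by watermark rather than by the latest event's own timestamp, and switch to an approximate structure once the state grows.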
This is especially important for fraud detection, where you have to keep running aggregates, for example of all the unique numbers that people have called in the last few hours. That state can easily get out of control; people sometimes even reach terabytes. We never got that far, ours was more like tens of gigabytes, but it's still quite large.

So Flink is great. But if you look at all those points, you can see they're mainly about operational excellence and about APIs for developers to work with. I don't know if that appeals to your business users or stakeholders. Of course, if you explain to them what it all means, then it's okay, but how do you actually let users design their fraud detection rules?

So let's talk a bit about how you create Flink jobs. Like Spark and similar frameworks, Flink comes with a really nice Scala DSL that lets you write concise, self-explaining job definitions. But of course they're self-explaining for developers. Poor business analysts, who in our case are many times used to Excel and SQL, won't necessarily understand what the underscore before "customer" means. The situation is even harder in our case because we are not part of the mobile operator: we are an independent software vendor. Our customers have their own silos (business people, analysts, IT teams, and heaven forbid, ops), and there are boundaries between each of those departments. You can tear all those boundaries down, but you cannot tear down this one, because we are a separate entity. So it's always better for our customers to design the rules themselves than to come to us, prepare a change request, put it into JIRA, and wait until we have time, right?
So we want to let users, or at least analysts, define the business rules. Here's a bit of our history. More than two and a half years ago we started a proof of concept of such a solution with Flink, and we achieved great results, but of course all the processes were hard-coded. Then our client approached us and said: okay, and if we want to write new processes, or change them, what do we do? Well, we said, you'll have configuration files and you can change those. But configuration is not enough, they said: for example, we have to add new rules, new conditions, and so on. So we said, okay, we'll let you configure a little more: you'll write these Scala expressions and we'll compile and integrate them. You'll learn it fast, Scala is an easy language. They said: hmm, yeah, really? Maybe we could do it. But they still didn't look convinced, and we were also not convinced that they would be able to write Scala code. So we thought: okay, we'll prepare a graphical user interface where you can just drag and drop, and see on a diagram what you're doing. And they said okay.

And this is how we created our open source project called Nussknacker. I know the name is kind of strange, but it comes from the fact that Flink is from Germany and has a square in its logo, and for some stuff you need something more than a square to crack a nut (Nussknacker is German for nutcracker). If you like it, I have stickers, so just grab me afterwards.

What we wanted to achieve, what we still want to achieve, and what I think we managed at some of our customers, is to create a kind of closed feedback loop.
First, our analysts come up with some idea, maybe after doing some data exploration in static warehouses. Then we want them to be able to design the process that will run on Flink, and test it locally in a kind of sandbox. Then we want them to be able to deploy it to a staging environment, let it run for two or three days and see what happens. And after that, we want them to be courageous enough to click the deploy button and gather the results from production. At the time, letting users deploy to production themselves felt like a rather ambitious goal; we are talking about analysts who were more used to writing SQL queries. But at some of our clients we managed to do it.

The main assumptions we started with were these. First, all the expressions, and generally the whole user interface, should be accessible to people with semi-technical skills: a little SQL, a little Excel, but no programming. Second, we had to make it very easy for them to test and experiment.
That way they won't be afraid to click test, click deploy, and see what happens. The last point is that we still assume some parts of the code will be written in Scala or Java by us, the developers: the integrations, the model, the things that usually don't change that often. And if they do change, you probably need some coding anyway. We've seen some graphical tools for designing this kind of thing, and once they decide to be totally zero-code they tend to become very, very complex. We felt it was better to stop somewhere in the middle: we still need developers from time to time to write new integrations or new aggregates, but we let users do most of the stuff.

So we ended up with something more or less like this. This is the graph of one of the processing jobs: there's a toolbox, and users just drag blocks onto the diagram and can do some filtering, enrichment, aggregations and so on. In the toolbox, users have both common blocks, like filtering and defining variables, and blocks that are unique to their use case, like "get client data from Redis" or "compute this scoring model"; those are developed beforehand. Once the user drags and drops such a block, they have to configure it; for a filter, they write some expression.

After a while we settled on something called the Spring Expression Language (SpEL). Those of you from the Java world probably know it as the simple expression language used in configuration files. We were quite amazed to find that it performs very well: we execute such expressions something like 200,000 or 300,000 times per second and it can deal with that. The language is also simple enough that we can do basic code completion in the browser, so the users know what they're up to, and if they make a mistake we can correct them even before they save the process. We do static analysis of both the whole graph and the expressions. I think it's a nice way to work with your users, to let them know about their syntactic mistakes right away. With languages such as Python this is more problematic, but underneath we use Scala, so we can do quite a lot of nice static typing.

But how do all these custom parts arrive in our toolbox? The idea is pretty simple: when you deploy Nussknacker at your organization, you prepare a jar file with Java, Scala, or whatever JVM classes containing your business logic, and you do it just once (of course, you can update it later). Then the users just draw processes, which are saved as JSON files, and the model, the code, and the JSON are deployed together to Flink and live happily there. So the idea is pretty simple.
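To give a flavor of the "catch mistakes before the process is saved" idea: the toy check below validates that every identifier in an analyst's expression refers to a known field of the event. This is not SpEL and not the actual validator, just an illustration of that kind of pre-save static analysis:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy "validate before save" check: every identifier in the expression
// must be a known field of the incoming event.
class ExpressionValidator {
    private static final Pattern IDENT = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");
    private static final Set<String> KEYWORDS = Set.of("true", "false", "null", "and", "or", "not");

    private final Set<String> knownFields;

    ExpressionValidator(Set<String> knownFields) { this.knownFields = knownFields; }

    // Returns the identifiers the analyst used that we don't know about.
    List<String> unknownIdentifiers(String expression) {
        // strip string literals so 'PREPAID' isn't treated as a field
        String noLiterals = expression.replaceAll("'[^']*'", "");
        List<String> unknown = new ArrayList<>();
        Matcher m = IDENT.matcher(noLiterals);
        while (m.find()) {
            String id = m.group();
            if (!KEYWORDS.contains(id) && !knownFields.contains(id) && !unknown.contains(id)) {
                unknown.add(id);
            }
        }
        return unknown;
    }
}
```

The real system goes further (type checking against the event schema, completion hints in the browser), but the principle is the same: reject a bad expression before it ever reaches the cluster.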
Of course, under the hood many different things happen. In this model, the developers define the data: it can be static, like POJOs or case classes, or it can be discovered automatically if you use Avro and something like a schema registry. They also define sources and sinks of data, where we more or less mimic Flink's APIs; various services for enrichment and for actions like blocking accounts or sending messages; and some custom transformations, which I'll talk about a little later because they involve more advanced Flink concepts. So when we deploy Nussknacker at one of our customers, we just have to implement this one trait: define the services, the sources, the sinks, some global functions and so on.

Now let's look at how we define, for example, a new way to enrich the data, say going to Redis to fetch something. We just write normal Scala (or Java, if you like) code and annotate it with parameters, and this becomes the API of the component that will appear in the user's toolbox. For developers it's very easy to write, and the end user just has to parameterize it with expressions: for example, where does the customer ID come from? The idea that a component is just a function is very powerful, because you can use it for many different things. As I've said, you can fetch additional data about the customer from a secondary storage, you can use it to perform actions like blocking client accounts or sending SMSes, and you can also use it to score your data with some models.
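Here is a rough, plain-Java sketch of that pattern: a developer writes an ordinary method, annotates its parameters, and the designer discovers the parameter names by reflection in order to render a form in the toolbox. The annotation and class names below are invented for the example and only loosely modeled on the real component API:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.lang.reflect.Parameter;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Developers write a plain JVM method and annotate its parameters; the
// designer reads those names by reflection to build the toolbox form.
class EnricherSketch {
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.PARAMETER)
    @interface ParamName { String value(); }

    // "Enricher": fetch client data for a customer id. A real one would hit
    // Redis or another fast store; here it's a hard-coded map.
    static class ClientDataService {
        private static final Map<String, String> FAKE_STORE =
            Map.of("42", "POSTPAID", "7", "PREPAID");

        public String invoke(@ParamName("customerId") String customerId) {
            return FAKE_STORE.getOrDefault(customerId, "UNKNOWN");
        }
    }

    // What the designer does at startup: read parameter names from the jar,
    // so the analyst sees a "customerId" box to fill with an expression.
    static List<String> toolboxParameters(Class<?> component) throws Exception {
        Method m = component.getMethod("invoke", String.class);
        List<String> names = new ArrayList<>();
        for (Parameter p : m.getParameters()) {
            ParamName ann = p.getAnnotation(ParamName.class);
            names.add(ann != null ? ann.value() : p.getName());
        }
        return names;
    }
}
```

The analyst never sees this code; they only see a node named after the service with one input box per annotated parameter.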
For example, if you were at yesterday's talk about PFA, that's also something you can do here: you can take models exported from, I don't know, R or Spark, embed them in your Java code, and then just use Nussknacker to score your events.

Of course, these are just simple stateless functions. What about the more advanced, stateful Flink stuff: windows, state handling, all the Flink goodies? We figured out that we don't want to expose the vast Flink APIs to our users, because they would be overwhelmed. The number of different windows you can configure in Flink is very large, and it's easy to make mistakes. So currently we just expose some preconfigured windows, where the users can define, for example, the length, or the key by which the windows are partitioned, as well as a few nitty-gritty details of Flink jobs. The idea is that most of the stuff is preconfigured, so the users don't have to know it all and don't make too many mistakes, while we, the developers, write normal Flink code, again annotated with parameters, and it gets integrated into the Flink job when the user decides to deploy his or her process.

Nevertheless, we can achieve pretty interesting results with this, because we can configure not only simple things like keys or window lengths, but also how we aggregate. Along the way we even managed to teach our analysts to use things like HyperLogLog to compute approximations of the number of unique numbers a client called. Of course, they don't know it's called HyperLogLog, because for them it's just called "unique count approximation"; you don't have to understand the nitty-gritty details to use it. But they are able to use it, and it works.
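For the curious, here is a sketch of linear counting, a simpler relative of HyperLogLog, to show the idea behind "unique count approximation": constant memory in exchange for a small error. This is only an illustration, not what runs in production:

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;
import java.util.zip.CRC32;

// Linear counting: hash each element into one of m buckets, then estimate
// the number of distinct elements from the fraction of buckets still empty.
// Memory stays at m bits no matter how many events we see.
class LinearCounter {
    private final int m;
    private final BitSet buckets;

    LinearCounter(int m) { this.m = m; this.buckets = new BitSet(m); }

    void add(String element) {
        CRC32 crc = new CRC32();
        crc.update(element.getBytes(StandardCharsets.UTF_8));
        buckets.set((int) (crc.getValue() % m));
    }

    long estimate() {
        int zero = m - buckets.cardinality();
        if (zero == 0) return m; // saturated; real implementations switch algorithms here
        return Math.round(m * Math.log((double) m / zero));
    }
}
```

HyperLogLog improves on this by tracking leading-zero patterns per bucket, which keeps the error small even when the true count far exceeds the number of buckets; the trade-off the analyst sees is the same.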
It works not only in theory; the techniques William told us about yesterday can be used in such graphical user interfaces as well.

Okay, so now our users can define this nice, colorful process. But what now? Of course, we don't want them to deploy it to production immediately. We want them first to be able to test, and only when they are really sure that it won't break should they deploy. So we need testing, and we came up with two different flavors of it.

The first one I would call sandbox testing. A normal production pipeline looks like this: we take data from Kafka, we process it with Flink, we enrich it with data from different data sources, and then we perform some actions, invoke some services, or write to output Kafka topics. In the sandbox, we don't want to take real data from Kafka; we take it from a file of prepared test data, and we create a sandbox Flink mini cluster within our user interface. We still take the real data about the customer, so that the results are more or less realistic, but we mock the output Kafka topics and the external actions, and just let the users check what would have been invoked. So they have this test data file, they run the test, and they can see how many events passed through which nodes, what the results were, whether everything was filtered out, or whether all the events were flagged as fraudulent. Then they can dig inside and see the results of invoking every expression: did they make a mistake, or should they just adjust the thresholds for detecting fraudsters?

It wouldn't be so easy for them to prepare this data by hand, because sometimes the data is quite complex. Fortunately, because most of the data comes from Kafka, we found a nice and easy way to generate it: we just take, say, the 10 or 100 latest events from a particular Kafka topic, download them, let the user look at them a bit, and let them tweak them a little (for example, enter their own phone number), and then do the sandbox testing with this prepared data.

So this is a nice feature that lets them test easily. Once they are reassured there are no large mistakes, we of course want them to deploy to some kind of real environment. And again, thanks to the nature of Flink and Kafka, especially the nature of Kafka, a simple step allows us to do something more like user acceptance or integration testing: we take our production environment and make a small copy of it as a kind of staging environment. We don't want to run all the production processing there, just the processes being worked on at the moment, because it's a smaller environment, but it should still be able to process the full volume of data for one process. Then we duplicate the event stream between the Kafka clusters, so our users can deploy the processes they are working on to the staging environment and see, on real data, how they perform for a day or two, analyze the outcomes, and only then click the migrate button and deploy to production. And we use an exact clone of the customer profile store, so we replicate the production environment as closely as we can.

Then, of course, comes the big day (or big afternoon) when they really want to deploy to the real production environment and block some clients. At our largest deployment they have done it quite a few times: there are more than 50 Flink jobs running, both in fraud management and real-time marketing, and all those processes handle, I would say, more than a hundred thousand events per second during some peaks. They were able to draw quite large diagrams, so it's not so easy to see what goes wrong when something goes wrong. So what do we need? Good monitoring, of course. We use Grafana for that; InfluxDB and Grafana is our default monitoring stack. When somebody creates such a diagram and deploys it, we automatically create a simple Grafana dashboard where they can track basic statistics: throughput, latencies, the number of errors and so on. For many errors this is enough. Of course, for more detailed analysis they need all the data pushed to, I don't know, some kind of Elasticsearch cluster or a more advanced tool, so that they can look at particular events and analyze whether they were handled correctly. But for first-line monitoring, these simple Grafana dashboards are enough.

For example, here is a case from last month at one of our deployments. This is the minute when one of our users deployed a version of a diagram that couldn't handle the load, because it tried to invoke an expensive web service to fetch some data, and he could see it more or less immediately.
He didn't even have to go to the metrics web page to see that the performance had dropped dramatically. And after he corrected his mistake (he just added an additional filter, so it was a simple fix), we can see that performance is okay again. So some simple metrics, counts, and latencies can really help a lot in detecting the bigger errors.

Okay, we still have a little bit of time, so I'll try to show you how it works. I hope so. Maybe the resolution is not perfect, but I hope you can see something. This is a simple diagram that tries to detect some fraudsters. We take our source of data, the calls made at the mobile operator, and we do some initial filtering. You can see there are some fields that we can use: for example the balance charge, or the MSISDN that was calling, the phone number that was called, and so on. We can do some basic aggregations. For example, here we want to detect how many unique numbers the customer called in the last four hours, and whether this number is large enough (larger than two here; of course that's not a good threshold in a real use case). We can take his customer profile to see whether it's a postpaid or prepaid account, and then maybe apply some scoring model, in my case a very simple one, exported to this PFA format, and score the call to decide whether we either just send him an alert that he calls too much, or, if we are sure he's a fraudster, just block his account. So this diagram is pretty simple, but nevertheless it can save our customers quite a lot of money.

Now, we're probably not really sure yet that it's working, so let's generate some test data, say 20 samples. Okay, we have a file with the data that came from Kafka.
We won't look at it too closely. Then we drag and drop, a Flink mini cluster is spawned inside our environment, and we can see how many of these samples went through our filters. This is the filter that filters out the most, because it's the one checking how many unique numbers were called. And here we can see (okay, this is JSON, so this is again something your users have to get used to) what the outcomes of each expression and filter were. If we are sure, we can push the deploy button. I did it some time ago, so let's see the metrics; I have a random data generator that works pretty well. We can see how many events per second we are processing, on which nodes the events were rejected, how many events each of the final nodes received, and some basic latencies, for example how much time passes from the call to the moment when we process the data. And after a while, if we left our process running for a day or two, we could check how many events passed through each of the nodes in, say, the last hour. I had some problems with the time zone, so we'll look at today: we can see that about 160,000 events passed through our process, but only about 150 passed through all those filters. So we can hide it, and this is more or less how our clients work nowadays. They would also migrate the process to production and so on, but I think you get the idea of how this works.
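The sandbox run shown in this demo (push a file of sample events through the diagram, count what survives each node, and mock the outgoing actions) can be caricatured in plain Java like this. There is no Flink or Kafka here, and every name is invented for the sketch:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// A toy "sandbox run": push recorded events through a chain of named
// filters, count what survives each node, and collect the actions that
// WOULD have been triggered instead of invoking them for real.
class SandboxRun {
    static final class Result {
        final Map<String, Integer> passedPerNode = new LinkedHashMap<>();
        final List<String> mockedActions = new ArrayList<>();
    }

    static Result run(List<Map<String, Object>> testEvents,
                      Map<String, Predicate<Map<String, Object>>> filters,
                      String actionName) {
        Result r = new Result();
        for (Map<String, Object> event : testEvents) {
            boolean alive = true;
            for (Map.Entry<String, Predicate<Map<String, Object>>> node : filters.entrySet()) {
                alive = alive && node.getValue().test(event);
                if (alive) r.passedPerNode.merge(node.getKey(), 1, Integer::sum);
                else break;
            }
            if (alive) r.mockedActions.add(actionName + "(" + event.get("msisdn") + ")");
        }
        return r;
    }
}
```

The real thing runs the actual diagram on a Flink mini cluster with the real enrichment sources, but the contract is the same: per-node counts plus a log of mocked side effects, and nothing leaves the sandbox.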
We don't have much more time, but that's fine, because I don't have much more to say. I want you to remember this idea: we want to close the feedback loop and let our users, with minimal help from developers, go from the idea, to process design, to testing in a sandbox, to testing in an integration environment, and then to deployment to production and monitoring. It's very important that there's minimal aid from developers, who, like me, are usually expensive, lazy, and not always ready to help when the business needs it. But on the other hand, even if you manage to achieve 80 percent of your stuff with a UI, you still have to remember that you need competent, expensive developers to do the last 20 percent with code. You cannot just go with zero-code systems; some code needs to be written.

I think solutions such as ours sit mainly on the middle ground between two kinds of cases: on one side, ad hoc analysis, data exploration, and visualization; on the other, places where you have to be really sure your processes work correctly, where you probably want real coding practices like reviews, continuous integration, and continuous deployment. On the middle ground you have processes like marketing or fraud detection: you need production-ready stuff with good production SLAs, but on the other hand, if your user makes a mistake, it probably won't have dramatic consequences. Of course, you have to do some things to make sure it's more or less safe, but still, most of the processes can be designed by analysts or business people.

If you want to learn more about how we do it, it's on GitHub, and this QR code should point you to some case studies. I don't have much experience with QR codes, so I don't know if it works, but hopefully it does. We still have, I think, almost two minutes for questions. Thank you very much for coming and listening, and enjoy the rest of the conference.

[Question] How long did it take to build this, and was it a customer initiative or your initiative?

Well, it was a joint initiative, I would say. We came up with the idea to build it, and the customer accepted the idea that we would build it and open-source it; we covered part of the costs and they covered part of the costs. But you know, it was surprisingly easy to build, because Flink is really a great framework, in Scala it was quite easy to do, and there are also mature frontend frameworks that let you build this kind of user interface. Most of the stuff I've shown you took less than a year for a team of two or three people, so I think it was worth it.

Okay, thank you very much. I'll be around if you have any more questions.