Okay. Hello. I'm the CEO of Qubita. Thank you very much for the invitation, and thanks to the organizers for letting us present here today. Sorry, just give me a second to get the setup working. Parts of this talk were prepared together with a friend of mine who works on the same kinds of systems; we use a lot of these technologies together and bounce ideas off each other. Today I'll be talking about recommendation systems. I'll start with a bit of history of these systems, which are actually fairly old. Then we'll look at the types of recommendation systems, and then at the machine learning part, which is where we'll do a demo and look at it with real data. After that, I'll explain the difficulties of actually productionizing these systems. A lot of the theory is nice, and I have built these in the past, but productionizing the solution is where the real challenges begin, and we'll discuss that. And then we'll look at a couple of solutions that are genuinely helping people overcome these problems, solutions we have selected for our own future work. Why Scala? Because Scala combines functional programming with object-oriented programming, and it sits on a very rich ecosystem, so you can do a lot with it. As a team we love working with it, and we see a future in it. This slide is actually from 2004: the long tail. It's an old idea, but I believe it's one of the reasons recommendation systems took off. Think back thirteen or fourteen years, when Amazon and Netflix were just starting to conquer the world. A physical store cannot stock all the inventory; it stocks what is popular with the masses, and products that are not popular with the masses simply don't make it onto the shelves. That is the long tail: a huge number of niche products, each highly relevant to someone specific, that you would rarely find in a physical store. Online, the constraint changes: there are no physical shelves, just a page on a screen, and you need a way to surface all those other products. That's how recommendation systems started becoming popular. And today, I guess each one of us has used them. Netflix, Amazon, the hotel booking sites: everything is based on recommendation systems.
A large share of the movies watched on Netflix come through recommendations, and the same is true of a significant share of Amazon's sales; these are real numbers. There is real business value in recommendation systems, which is why they are so popular, and it's interesting to see these technologies being adopted in non-internet businesses as well. Typically, a recommendation system has been focused on the user-item relationship: a user rates a movie, which gives you a user-movie matrix. From that you can build user-based recommendations, or item-based ones, where similar movies are recommended because they resemble movies you already watched. But all of that concerns the product itself, its content; those are content-based systems. And then there is context: imagine you have a recommendation system, but it only looks at the quality of the product and not at the situation the user is in. You need user context too. There are a lot of things that go into making a recommendation system beyond the user and the product, and combining them is how you get the most sophisticated, hybrid recommendation systems. So here is the evolution. The early systems were attribute-based: products were described by some set of attributes, and recommendations were made from those attributes. The example here is an action movie: if you watch action movies, you get recommended other action movies, based purely on those attributes. Then collaborative filtering became more mainstream; I think a very popular paper on collaborative filtering was released, after which it was adopted widely. That's how you hear the stories about diapers being bought with beer. I mean, who would think about it? But in Target, this is what happened: they observed users purchasing diapers together with beer, so they moved the beer cartons next to the diapers to sell more. Then came the item-based systems, as I mentioned before. Nowadays we also have social-based systems, where you add a social context, a peer-group context: we all like Scala here, so you might be recommended Scala books, for example. This enhances both the range of recommendation possibilities and the accuracy of recommendations. And last but not least, machine learning: there are various machine learning algorithms, very popular now, such as SVD and LDA, used for recommendation purposes. We'll talk a bit about machine learning in the demo as well. Let's skip this one. Okay, what happens in a typical data mining process? This is CRISP-DM, which stands for Cross-Industry Standard Process for Data Mining. It's a fairly popular methodology for data science. It says data mining here because the methodology came out in 1996; some things change, but the methodology doesn't. When you're starting an analytics project, you have to begin with business understanding and data understanding. That's the beginning of any process: you try to understand where the data is coming from and what type of data exists.
What kind of business environment are you working in, and what business problems do you want to solve? Based on this, you then start the data preparation journey: collecting, massaging, transforming, preparing; the various steps around data preparation. Then we move on to modeling, which is the algorithmic modeling based on the data that's been prepared. And finally, you evaluate these models against metrics and get a prediction score. This is an iterative process: you go back and forth until your metrics start to make sense, and you collect feedback so that the real-world behavior matches the predictions you're making. It's a constant loop, and that's why it's drawn as a circle. The journey never ends; you're always improving, constantly tuning, constantly preparing and deploying. Before I move to the next slide, how many of you are actually working with Spark? Spark 1? Spark 2? OK, maybe then I'll give a quick introduction to Spark. The creator of Scala actually made a provocative statement that Spark is nothing but a domain-specific language built on Scala. In practice, it's much more than just a DSL on Scala. It's almost a platform in its own right: it helps with data engineering, with machine learning, with building graph-based models, and on top of that it enables fast computation. A lot of things have come together in what we call Spark today. What I'm going to talk about is the Spark machine learning pipeline, which was introduced in Spark 2. If you haven't worked with Spark 1, the changes in 2 won't be so relevant, so I'll go straight to the new things in the Spark machine learning pipeline. What Spark has done is standardize, as a best practice with high-level APIs, the way you build a machine learning pipeline that can be applied across various use cases. It provides five different pieces: DataFrame, Transformer, Estimator, Pipeline, and Param. All of these combine to make a pipeline. A Transformer is nothing but an API that transforms your dataset. The raw data you collect is normally a transactional or business record, but machine learning algorithms don't understand that kind of data; it has to be transformed into vectors, into integers and numbers, basically digits, and Spark provides excellent APIs for this. Then you have the Estimators, which are the model-building blocks: you fit them on data to produce models, which you can then test, evaluate, and deploy for your use cases.
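To make those five pieces concrete, here is a minimal sketch of such a pipeline in Scala. This is not code from the talk; the toy data and column names are illustrative, following the standard spark.ml API:

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("PipelineSketch").getOrCreate()

// DataFrame: a toy dataset of labeled text documents.
val training = spark.createDataFrame(Seq(
  (0L, "scala spark big data", 1.0),
  (1L, "cooking pasta at home", 0.0)
)).toDF("id", "text", "label")

// Transformers: turn raw records into numeric feature vectors.
val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
val hashingTF = new HashingTF().setInputCol("words").setOutputCol("features")

// Estimator: a model-building block, configured through Params.
val lr = new LogisticRegression().setMaxIter(10)

// Pipeline: chains the stages; fit() produces a reusable PipelineModel.
val pipeline = new Pipeline().setStages(Array(tokenizer, hashingTF, lr))
val model = pipeline.fit(training)
```

The fitted PipelineModel is itself a Transformer, so the same object can score new DataFrames with model.transform(...).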
I think this is the demo piece. The demo is going to be based on the Million Song Dataset from Kaggle. How many of you have heard of Kaggle, or are Kaggle members? OK. For the non-Kaggle members: Kaggle is a data science community, a social network for data scientists, recently bought by Google. It lets you build and share algorithms, and companies release competitions on it, so you can test your machine learning and predictive analytics skills. It's a great collaborative social platform for data science. Just a second. Sorry about the resolution; we had to change it for this display, and I'm not good at managing resolutions. What I've used here is a Zeppelin notebook. Zeppelin is an open-source notebook; it falls into the same category as Jupyter notebooks. How many of you are familiar with these technologies? OK, I always see this reaction. That's great. Zeppelin is amazing, and I'd recommend all of you try it out. It's one of my favorite tools because it lets you run Scala, AngularJS, Markdown; it can drive almost any kind of backend engine. You can quickly prototype and at the same time build impressive reports out of it. Highly recommended; we've been using it in our team. As I was explaining about collaborative filtering, there are two types of collaborative filters: explicit feedback systems and implicit feedback systems. With explicit feedback, we expect users to provide information explicitly: the user rates a product, likes it, gives it a thumbs up or a thumbs down. An implicit feedback system doesn't have this, which makes it more challenging to recommend products, because you get no direct signal from the user about what that person likes. For this example, I've used the Million Song Dataset from Kaggle. Later we'll put this notebook on GitHub as well, so you can try it out. In this example we'll be using the alternating least squares (ALS) algorithm to make recommendations. What typically happens in a user-item matrix is that you don't have user feedback for all of the items; you get what's called a sparse matrix, and it's very difficult to build algorithms on a sparse matrix. ALS is a very interesting concept based on factorizing that matrix: you divide it into two matrices, one for users and one for items, whose product approximates the original, and then, following the alternating least squares principle, you slowly fill in the gaps, the deltas, in these matrices. That gives you approximate user preferences, which is exactly why it's called predictive, or machine learning: the result is an approximation.
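For reference, this is the standard ALS formulation (not spelled out in the talk): with observed interactions $r_{ui}$, ALS looks for user factors $x_u$ and item factors $y_i$ whose inner products reconstruct the known entries, alternating between solving for the user side and the item side of the regularized least-squares objective

$$\min_{X,\,Y}\;\sum_{(u,i)\ \text{observed}} \bigl(r_{ui} - x_u^{\top} y_i\bigr)^2 \;+\; \lambda\Bigl(\sum_u \lVert x_u\rVert^2 + \sum_i \lVert y_i\rVert^2\Bigr).$$

The missing cells of the sparse matrix are then filled in with the predictions $\hat r_{ui} = x_u^{\top} y_i$, which is why the recovered preferences are approximate.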
With Spark, it's very easy; I'll show you how you can build it. Unfortunately, I cannot see the full screen. Is this one OK? How do you come up with the resulting two matrices from there? Right: you take the existing user and item interactions as input, and the ALS algorithm slowly starts filling in each of those points, which gives you a user-versus-item preference. But again, it's approximate. Okay, so it's a black-box algorithm that you're not going to explain? No, that's not really the intention of this meetup, and I'm not a data scientist. But yes, ALS here is a black box: we feed parameters into it and get items out of it. I hope you can still see the screen. Okay, these are simple Scala imports of the Spark library; we've just imported the Spark ALS package. As I showed in the presentation, this is the ALS model, the estimator we're going to build by providing the user ID, the song ID, and the plays as the three columns of that matrix. And Zeppelin is very convenient here because it maintains the context between paragraphs as you run them, so I now have access to the ALS for further processing. I already have the songs list, which is basically a unique ID for each song and the song ID of each song that was played. All of this has already been anonymized and built into a dataset that's ready for building algorithms; similarly for the users. And the interesting thing here is that you can get the data from any source: Hadoop, S3, various data sources. Zeppelin can connect to them, and you can work with that data interactively. So here we're preparing the three datasets, the songs, the plays, and then the user-song-play matrix, by building data frames around them. Once you have the data frame, you can go into model training. Model training is fairly simple: you split your data. For typical predictive model testing, you need training data and test data, so we split the dataset into training and test sets using this simple function. The reason I'm showing this is that it's very easy to build your own machine learning pipelines in the course of, let's say, one or two hours; you can test quickly if you have the datasets, and Spark provides a lot of rich algorithms to play around with. Once we have the training dataset, we train the model. We take this ALS black box and say: okay, take this data and give me predictions, which I will then test against the test data. Now it's running; it will take some time to complete. It's a VM with 8 GB of memory and a 2-core CPU, and it takes about a minute on my machine if I'm not wrong. It's an iterative process: in every iteration it's calculating the missing pieces of the matrix. Once the model is ready, you use transform to make your predictions. It's as simple as that: model.transform. Again, this is all the Spark API: you connect the data, prepare the data, use a machine learning algorithm from Spark, train your model, and then validate or cross-validate the predicted results against the test data. Let me see; this is playing with me. Okay, it completed in about a minute and twenty seconds, and you get an RMSE score, the root mean square error. This is a benchmark to validate how good or bad your models are; a lower RMSE is better. We're not going to talk about hyperparameter tuning of these models for better performance, or benchmarking; this is just a demo to show how simple it is to build a model on Spark using Zeppelin. I'll let it run and come back when it's finished. It's run out of power, I guess. Hello. Okay. Do you know VirtualBox? Does anyone know how to get out of VirtualBox full-screen mode? Technical problem. Yeah, I tried Ctrl and a right-click, but... So, that was a quick demo of how you can build and train an ALS model on a dataset.
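For readers following along, here is a hedged sketch of roughly what that notebook does, in Spark 2 Scala. The file path and column names are assumptions, not the exact notebook code, and play counts are treated as ratings the way the demo's RMSE evaluation implies (for a true implicit-feedback treatment you would also call setImplicitPrefs(true)):

```scala
import org.apache.spark.ml.evaluation.RegressionEvaluator
import org.apache.spark.ml.recommendation.ALS
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("SongRecommender").getOrCreate()

// Hypothetical anonymized triples: userId, songId, plays.
val plays = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("data/user_song_plays.csv")

// 70/30 split into training and test sets.
val Array(training, test) = plays.randomSplit(Array(0.7, 0.3))

// The ALS estimator over the three columns of the user-item matrix.
val als = new ALS()
  .setUserCol("userId")
  .setItemCol("songId")
  .setRatingCol("plays")
  .setMaxIter(10)
  .setRegParam(0.1)

val model = als.fit(training)   // iterative; this is the slow step

// Score the held-out data; drop rows with NaN predictions for
// users or items that never appeared in the training split.
val predictions = model.transform(test).na.drop()
val rmse = new RegressionEvaluator()
  .setMetricName("rmse")
  .setLabelCol("plays")
  .setPredictionCol("prediction")
  .evaluate(predictions)
println(s"RMSE = $rmse")   // lower is better
```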
So is that the end of it? What happens after you build the model? A typical example: you want to recommend products in an app. Your data scientist builds a model, and then how do you actually serve those recommendations? That's the biggest problem most companies face: not building models, not testing models, but deploying them in production. Because after prototyping, the challenges are always the same. How do you scale it? How do you distribute it? How do you make sure it works in a distributed environment? How do you make sure the different components talk to each other? How do you maintain the separation of concerns, so that if your recommendation engine goes down, the whole app doesn't go down? These are the challenges a typical architect or systems engineering team faces, because the models are good, but not sufficient on their own to serve recommendations. In reality, the machine learning workflow looks like this. You have data coming in from different sources, streaming as well as stored. You ingest it, using Spark or Kafka or various systems. You do data processing, again possibly with Spark. Then you do the model training, as I just showed you on Zeppelin. And that's where it stops: you build these models, everything is ready, the data scientist says, okay, here's the model. But how do you move it into production? How do your front end and your client-specific applications receive it? Deployment and going live is always a challenge. Then you also need a feedback loop because, as I said before, user preferences keep changing, and the recommendations can be improved based on those preferences. If you recommend a product and the users don't react to the recommendation, you need to make changes so that newer recommendations can come in. All of these factors make it difficult to move recommendation algorithms, or machine learning algorithms in general, into production. What I'm going to show next are two solutions that we believe help with productionizing algorithms. One is PredictionIO. It's an open-source project, recently bought by Salesforce. What PredictionIO does is provide prediction APIs built on Spark, HBase, and Elasticsearch, I believe; you can also plug in different backends for the different pieces of the pipeline. It provides a rich ecosystem as well as a standard API to build a pipeline. As you can see here, there are two servers. One is where you deploy a model: you could train it in Spark or in other machine learning systems, but you deploy it on PredictionIO. At the same time, you receive the feedback events from users in the same system, which makes it very convenient for companies to move things into production. The second one is fairly new. Clipper is, I think, not much talked about yet because it's coming out of a university, and their vision is to be the middleman. Today we talk about TensorFlow and various other frameworks and groups: some people prefer Scala, some folks develop only in Python or Java, and similarly in machine learning there are factions of popularity, like Spark, TensorFlow, Caffe. These are all machine learning libraries, but Clipper provides an abstraction where you can deploy models from any of them and at the same time receive events from these different applications. It helps build the pipeline and makes this workflow a reality by bridging the gaps between deployment and live systems. There are many more tools, but these two are quite interesting in my opinion. So that was, in short, an introduction to recommendation systems with Spark and Scala.
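As an illustration of what "deploy and query" means in the PredictionIO case: a deployed engine exposes an HTTP endpoint (by default queries.json on port 8000), and an application asks for recommendations with a small JSON query. The query shape below is the one used by PredictionIO's recommendation templates; the user ID and response fields are hypothetical:

```scala
import java.net.{HttpURLConnection, URL}
import java.nio.charset.StandardCharsets
import scala.io.Source

object QueryEngine extends App {
  // POST a JSON query to a deployed PredictionIO engine server.
  val url = new URL("http://localhost:8000/queries.json")
  val conn = url.openConnection().asInstanceOf[HttpURLConnection]
  conn.setRequestMethod("POST")
  conn.setRequestProperty("Content-Type", "application/json")
  conn.setDoOutput(true)

  // Ask for the top five recommendations for a (hypothetical) user.
  val query = """{ "user": "42", "num": 5 }"""
  conn.getOutputStream.write(query.getBytes(StandardCharsets.UTF_8))

  // The engine answers with a JSON list of item/score pairs.
  val response = Source.fromInputStream(conn.getInputStream, "UTF-8").mkString
  println(response)
}
```

The point of the architecture shows up here: the app talks to a stable serving endpoint, while training and redeployment happen behind it.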
But as this slide says, it's not always about algorithms, because without the right data you're effectively making recommendations on intuition, and one needs to pay attention to that. I won't dwell on the Dilbert strip, but here is how you can improve things. To make a recommendation system more robust, you add serendipity to it. By serendipity I mean occasionally making a random recommendation, a product that has never been recommended to that person before, just to test it out and see whether users like it. This counters the bias effect, which is one of the challenges of a recommendation system: it's a self-fulfilling prophecy. You keep recommending the popular products, the popular products get bought more, and it becomes a constant loop. To break that monotony, that bias, you introduce the concept of serendipity. Remember also that recommendations are temporal: a recommendation engine is not something you deploy and forget. It has to be constantly tuned and checked, and feedback has to be captured. You have to perform A/B tests, because two models might perform differently, so make sure your systems have the opportunity for A/B testing in live operation. And last but not least, feed the feedback back into your system. So that was it. Thank you. Thank you everyone for your time; you can reach me on Twitter or email me at this address.
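To make the serendipity idea concrete, here is a small hypothetical sketch (the names and the two-in-ten mix are made up, not from the talk) of blending never-recommended items into a model's top-N list, which you would then A/B test:

```scala
import scala.util.Random

// Mix a few never-seen items into the model's top-N recommendations
// to break the popularity feedback loop described above.
def withSerendipity(
    ranked: Seq[String],       // model's item ids, best first
    catalog: Seq[String],      // the full item catalog
    alreadySeen: Set[String],  // items this user has already been shown
    n: Int = 10,
    surprises: Int = 2
): Seq[String] = {
  val topPicks = ranked.take(n - surprises)
  val novel = Random.shuffle(
    catalog.filterNot(i => alreadySeen(i) || topPicks.contains(i)))
  topPicks ++ novel.take(surprises)
}
```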
Yes, question number one. With initial training, how much data do you use for cross-validation and for testing, in percentage, roughly? 70-30, normally, yeah; the split is 70-30. How much data overall is very subjective. Whatever historical data you can get is good, because it makes your system more robust, but one to two years of data is a good measure for building robust models. And 70-30 or 80-20 is quite a good split. And the second question: do you take into account parameters that don't belong directly to the item, like the moon phase or the weather, which can affect the user's choice when he or she picks music or movies, or extra information from IMDb? Some try to do this, but I don't know the results. Do you use anything that's not directly relevant to the item itself? I would say yes, because that relates to the hybrid and context-based systems I was talking about at the start. People are trying to make more and more sophisticated models: they try to observe the mood of the person, the kind of environment they live in, housing, everything. That's why data is so important; data is money. They collect as much information about a person as they can to feed into these kinds of systems, because the more and better data you have, the better the recommendation systems and algorithmic models you can build from it. So yes, I would recommend taking those parameters into account. Thank you. Any other questions? My question is: how do you effectively test the performance of a recommendation system? Is there some kind of benchmark purposely built just to test its effectiveness? There are two stages to this. One is before you move it into production; that's where cross-validation, hyperparameter tuning, and all those things come into the picture. The second is capturing it from live systems: as I said, events based on user responses to the recommendations, fed back into the system. Both of these play a critical role in answering that question. Why are you using Scala? Why not Python? As I said, it's a very difficult question to answer. I actually wanted to start the talk by saying that I'm not here to preach Scala versus Python, or that one is better than the other; it's a preference. But to be fair, Spark is built on Scala, and Spark is such a rich ecosystem for big data engineering and analytics that it's simply faster and simpler to use Scala with it. And once you start using Scala, you realize how rich the ecosystem is. You can rely on the Java ecosystem as well whenever there's a lack of support or a missing library; you can execute Java code too. Those are the pros. The cons are that some of the algorithms are more mature in R or Python or the other libraries. So it's a very difficult question to answer, and that's why the industry is trying to make sure no group is left out, with all users supported by intermediate solutions that can talk to each other. Is there any library that exists only in Scala and is not implemented in Python? No, no. So Python has the same coverage, and the advantage of Scala is performance? Correct, the performance in Scala is much better. At the end of the day, it's going to run on the JVM; the Java virtual machine has been around for decades, tested and proven everywhere. The speed is just going to be much better, because it's compiled code versus Python. But when we use DataFrames, the performance should be the same? If your question is specifically about Spark, then Spark 2 is the answer to that. In the Spark 1.x versions, Scala performed better: any API you used on Spark would perform better in Scala compared to PySpark. With Spark 2, they've provided new APIs and moved the emphasis away from raw RDDs, which allows similar performance across Python, R, and Scala. So with Spark 2, you get similar performance in all these languages. You mentioned these production systems, Spark 2, and your models. Did you use them for your own cases, or did you build something else? What we are doing is building: the Qubita platform is this end-to-end pipeline. We have built a platform that abstracts all of this into one big black box, and our customers normally use it to build vertical apps on top, so the productionizing, the time to market, is faster. So then I guess you write a lot of code in Scala? That's where Scala is used? Because in the demo you showed, it's really just... Yeah, correct, the demo was just a notebook. We use Scala primarily for the backend. We use Akka, and we do a lot of distributed computing work. We use Kafka heavily, and we have a backend where communication happens through Akka. And secondly, we have Spark jobs that process terabytes of data, so we have a substantial application built on Spark to handle those kinds of jobs. In the future, we'll also be building a machine learning pipeline, again completely in Scala. So I have a follow-up question there. What is your model of running the Spark context?
Because I remember, when I was dealing with that, there was this issue that you could have only one Spark context per JVM, and if you have this kind of long-running thing that can answer online queries, which Spark is not really suited for, it gets tricky. So how do you work around that? Yeah, I guess we still have this issue; we have to run multiple Spark jobs for that. But there are some solutions out there. We are exploring Livy, the REST interface to Spark, which gives you a REST, RPC-style way of submitting work to Spark. And with Spark 2.0 we have better APIs to manage the data frames as well, so you don't really have to build a spark-submit application; you could literally use the Play framework to build your application context, and you could have multiple contexts running at the same time. To your request, for example: if you want to recommend something on a website, you have a request for the product page and you want to serve a list of recommendations on that page. Will this request hit Spark, or some other database? The actual interface, or let's say the final predicted outputs, would not be served from Spark. They'd be in another persistence store. We use Elasticsearch; it could be any other in-memory system, and that's where the recommendations actually go out from. So the application querying for recommendations does not interact with Spark directly. That's how we have designed the system. So the Spark processing is batch processing? In a way, yes. It is batch processing, because we are doing offline analytics on the historical data. We also want to do Spark Streaming, which is where I think the real-time interaction will come in, but even then it's going to be either a Kafka topic or Elasticsearch in between; we will not be using Spark as the interface to the application layer. Yep. If you have a series of Spark batch jobs, do you use something to monitor them, to manage the schedule of the different jobs, or to restart failed jobs? Our team is smiling here, because this is exactly what we have done. We have a monitoring service in the platform that monitors the ongoing Spark jobs; we can kill them, we can schedule them. Are those internal tools? Yes, we had to build it ourselves, in Scala. So we have built our own custom solution for monitoring, and the architecture is completely decoupled between the modules. We use Kafka mainly to decouple all the systems, and we use Kafka again to monitor, as well as to interrupt, schedule, or cancel these Spark jobs. Yep. You described your infrastructure; how did you solve performance issues? Do you have any optimization settings or something? Well, we started this journey three years ago, and we now have deployments at customers with 20, 30, 40, 50 nodes; for example, one customer runs 30 nodes on AWS. When we started building what we call Qubita today, three years ago, we went through a lot of iterations and a lot of pain making sure things work in production, and most of the time the trouble was with Spark. The biggest learning for us has been decoupling. What we used to do was keep things very tightly coupled to make development faster.
Today we rely on Kafka for a lot of our use cases, because what we are after is an actor-model style of development. All our backend is based on Akka: you send messages for jobs for the various use cases, different actors pick them up and process them, and we receive the results over a Kafka channel. This has helped us scale. LinkedIn developed Kafka, and it ran there on, I guess, tens of thousands of cluster nodes. So we depend heavily on Kafka and on decoupling for scaling. Please, please. Sorry. Do you currently use the machine learning libraries provided by Spark ML? We have both. We have a data science team, and we also partner with a few data science teams. The parent group is actually headquartered in Switzerland, and we have a lot of data science teams working with us. We build our own models for customers and also use the existing Spark libraries. So the models themselves are already written in Scala? Yeah. The one I just showed is a black box: you can throw the parameters, the data, at it, but to get the best results out of it you need hyperparameter tuning; otherwise the performance won't be optimal. And that comes only with experience, because black boxes will exist. As you said, you can push something into them, but to really reach optimal performance, one needs experience, context, everything. To incorporate new data into the recommendations, do you have to run the whole pipeline again from scratch, or is there some way to apply incremental updates to the model? If you look here, there are two kinds of feedback that go into it: one into the historical data, and one into the algorithmic layer, bypassing the historical data. Typically, when you build these kinds of systems, you'll have an offline part and a real-time part, the lambda architecture, and you want to use both to provide the best recommendations possible, because you need to capture the events so that the model can respond to live feedback. Normally, you wouldn't retrain your model. What you would do is change the weights in the model. You'll have weights, and those weights get adjusted; with neural networks, the weights are updated just before deployment, so that you don't have to constantly retrain. And then if the performance is still not good, you go back to the data scientists and say: okay, we need newer models, or can we redo the analytics on new data? A bunch of questions have mentioned Elasticsearch. Do you use Elasticsearch as the persistence store? Mainly for analytics; we use MongoDB for persistence. We have many different storage systems in our platform: we use Postgres, MongoDB, Elasticsearch, Hive, and we want to use HBase. No, Elasticsearch is the event store, because you want it to be fast; MongoDB would not give you that performance. Eventually all the output is JSON, and when you want to do JSON transformation or computation in real time, you need an intermediate cache or computation engine between your analytics layer and your application layer. Otherwise you would be relying on Spark.
What we are exploring is whether we can use Spark Streaming as one option for this kind of real-time work, because Spark provides excellent JSON libraries and we don't need to reinvent the wheel. How do you resume Spark Streaming when you restart it? Sorry, I didn't get the question. How do you resume from the offsets of the Kafka data when you're streaming, if you want to change the algorithm? Yeah, we haven't explored that yet. What we are thinking of is Kafka offset persistence, so that we can store these offsets using an Akka Persistence layer; we haven't developed that yet. Good question. Any more? Okay. Thank you.