Thank you for coming this afternoon. It's my first time in Madrid and my first time at Big Data Spain, so I'm really excited to be here. Today we'll be talking about productionising machine learning pipelines with the Portable Format for Analytics. I'm Nick Pentreath, MLnick on Twitter and GitHub. I'm a principal engineer at IBM, where I work for the Center for Open Source Data and AI Technologies, or CODAIT; I'll talk a little bit about that team in a moment. I focus on machine learning, data science and AI applications, and I have a long history in the Apache Spark project, where I'm a committer and PMC member. I've written a book on machine learning with Spark, a little bit out of date now, and I've travelled around the world speaking at various conferences and meetups on machine learning related topics. When I joined the CODAIT team within IBM, it was known as the Spark Technology Center. It was founded by IBM to focus on the Apache Spark project and the surrounding ecosystem, and it's a purely open source team: we work in the open, in the community, and everything we do is public. More recently the focus of the team has evolved to encompass not just Apache Spark, but the entire end-to-end enterprise AI life cycle and all the technologies that are part of it. We aim to make it easier to create, deploy and manage AI applications in the enterprise. That still involves projects like Apache Spark, but increasingly the Python data science stack, including pandas and scikit-learn, the deep learning libraries (TensorFlow, Keras, PyTorch, Caffe and so on), and a couple of our own projects, the Model Asset Exchange and the Fabric for Deep Learning, which I'll mention at the end. Today, though, we'll talk about another important aspect of this life cycle that we focus on, and that's model deployment, in particular open standards for model deployment.
So the machine learning workflow is, as we all know, really simple. You start with data, you do some machine learning, and you profit, right? We can all go home, we've done our jobs. But in reality this is a really complex workflow, and it spans teams. You do start with data, that's true, but that data can be sitting in various data stores and in various formats; some of it is historical, and some of it is arriving in real time in a streaming fashion. That's normally the domain of your data engineers, who manage the data stores, metadata, cleansing and all of that kind of thing. You then need to ingest that data, in its various formats, into the core data science, machine learning and research workflow. That starts with pre-processing, visualization and exploratory analysis: what are the properties of this data, what can we use it for, and how can we join disparate datasets together? It then moves to feature engineering and feature extraction, and then into the part of the workflow that everybody obviously enjoys the most, which is model training, development and model selection. That is a workflow within a workflow: constantly tuning different types of models and pipelines, and selecting the one that is going to perform best on the particular business problem or use case we're dealing with.
And then we hit the end of that piece of the workflow, and that's where a lot of the typical discourse on this topic stops: once you've got a model, you're done. But really you need to deploy that model into some production environment, into some application, to do anything useful with it. That is normally the domain of your machine learning and production engineers, who are running live systems 24/7 with high availability and high throughput, and who care about performance and uptime. As we'll discuss a little later, this is not just about deploying machine learning models but actually machine learning pipelines. You have to worry about versioning, lineage, and tracking all the input features and parameters that went into the deployed model. Then, once it's in the live system, it predicts on new data, and you need to monitor the performance of the model and gather feedback. That feedback effectively goes back into your data store, where it is used to retrain or update models; again, it arrives in real time and is stored in your batch systems.
So this workflow has a lot of moving parts; it's much more complex than the common perception, and it involves all these disparate teams working together with very different goals in mind. The workflow, and in particular the data science sub-workflow, also spans tools. Every data scientist, machine learning engineer and researcher has their own favorite tool, whether it's R, Python, a deep learning framework or Spark. They're all dealing with a wide variety of data formats, so the data sits around in all kinds of formats and needs to be ingested into these different frameworks, and each framework has different ways of approaching model selection, cross-validation and even building pipelines. All of these tools are typically in play in any given organization and team. And finally, that entire workflow is only a small piece of the puzzle. I really like this image, which is from a Google paper, "Hidden Technical Debt in Machine Learning Systems". It illustrates that machine learning code, while critical to the intelligent application and the business use case, and while adding a lot of value, is really just a small percentage of the overall picture. In particular, we need to worry about how that code integrates with the serving infrastructure and how we get it from training to production and deployment, which is what we're going to talk about today: machine learning deployment. Machine learning deployment covers three important questions. What are you deploying: what is this "model" that we all talk about all the time? Where are you deploying it: what is the target environment, the runtime? Is it batch inference, streaming inference, real time, or a combination? And how are you deploying it: this is the DevOps deployment mechanism, the details of exactly how you take that model and put it into production and serving frameworks.
All of these are important questions, but today we're really going to focus on the what, and perhaps a little on the how. So what is a model? Everyone talks about machine learning models, and when we say "model", what we typically think about is: I've trained a logistic regression, and that's my model, right? That's my algorithm, or my deep learning architecture, my neural net, my image classifier. But almost no machine learning algorithm actually takes raw input and spits out a result. They operate on vectors or tensors, that is, on a numerical representation. And even though the common perception of deep learning is that it can operate on raw data with no feature engineering involved, that's not true either, because even in the simple case of image classification you typically need to pre-process the image: resizing, cropping and so on. So there's always a pre-processing step, which takes raw data and raw features, in the form of categorical variables, text or raw images, and transforms them into a form that is actually usable to train the model. And before we've even done that, we've got to start right back at the beginning with the raw data: joining disparate datasets together, transforming the raw data, and using feature extraction techniques to extract features, combine them and create new ones. So there's a whole set of steps that come before model training. At the training phase, we apply all of those steps to the raw data to get to the model. We then make predictions and update the model weights using some optimization technique until it fits the data well. Now, if we don't apply those exact same steps at prediction time, the model isn't going to be useful. Its output will be complete garbage: garbage in, garbage out.
If we don't follow that exact same set of steps and transformations before prediction, we're not going to be successful. So deploying a model is really about deploying a pipeline, not just a trained machine learning algorithm; deploying the model part alone is simply not enough. The entire pipeline must be deployed: the data transformations that came right at the beginning on the raw data, all the feature extraction, feature engineering and combinations, the pre-processing steps, and of course the machine learning model itself. And then there's the often-overlooked inference result transformation, or prediction transformation. A business user typically doesn't care about tensors and doesn't know what a vector of numbers may or may not mean. That vector of, let's say, class probabilities in a classification problem needs an inverse transform applied to get it back into the domain, the raw data space, so that you can, for example, associate the actual class labels with those numbers and do something useful. So that entire pipeline needs to be deployed as one unit, and it needs to be consistent between training time and inference time. Even the ETL steps are actually part of that pipeline. Fortunately for us as practitioners, pipelines are available in many machine learning frameworks: scikit-learn and Spark ML are prominent, and so are TensorFlow and R. These frameworks make it much easier to chain the pieces together, train them on data, and get back a unit that we can then deploy. But we still have many challenges. We need to bridge all the gaps between languages and frameworks.
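To make the point about deploying the whole pipeline concrete, here is a minimal pure-Python sketch, assuming a toy dataset with one categorical column (`colour`) and one numeric column (`size`). The class `FittedPipeline`, the helper `fit_preprocessing`, and the hard-coded weights (standing in for an already-trained two-class linear model) are all hypothetical. The only point is that the fitted feature indexing, the scaling, the model and the label inverse transform travel together as one unit, so inference applies exactly the steps used in training.

```python
import math
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]


def fit_preprocessing(records: List[Record]) -> Tuple[Dict[str, int], float, float]:
    """Learn the pre-processing state from training data (hypothetical helper)."""
    colours = sorted({str(r["colour"]) for r in records})
    sizes = [float(r["size"]) for r in records]
    mean = sum(sizes) / len(sizes)
    # Population standard deviation; fall back to 1.0 to avoid dividing by zero.
    std = math.sqrt(sum((s - mean) ** 2 for s in sizes) / len(sizes)) or 1.0
    return {c: i for i, c in enumerate(colours)}, mean, std


class FittedPipeline:
    """Bundles every fitted step so training and inference cannot drift apart."""

    def __init__(self, colour_index: Dict[str, int], mean: float, std: float,
                 weights: List[List[float]], bias: List[float], labels: List[str]):
        self.colour_index = colour_index         # feature extraction state
        self.mean, self.std = mean, std          # pre-processing state
        self.weights, self.bias = weights, bias  # the model itself
        self.labels = labels                     # prediction (inverse) transform state

    def features(self, record: Record) -> List[float]:
        # One-hot encode the categorical column, then append the scaled numeric column.
        one_hot = [0.0] * len(self.colour_index)
        one_hot[self.colour_index[str(record["colour"])]] = 1.0
        return one_hot + [(float(record["size"]) - self.mean) / self.std]

    def predict(self, record: Record) -> str:
        x = self.features(record)
        scores = [sum(w * xi for w, xi in zip(row, x)) + b
                  for row, b in zip(self.weights, self.bias)]
        best = max(range(len(scores)), key=scores.__getitem__)
        # Inverse transform: hand back a business label, not a raw index or vector.
        return self.labels[best]


train = [{"colour": "red", "size": 1.0}, {"colour": "blue", "size": 3.0}]
index, mean, std = fit_preprocessing(train)
# Hypothetical weights for a 2-class linear model over [blue, red, scaled_size].
pipe = FittedPipeline(index, mean, std,
                      weights=[[2.0, 0.0, -1.0], [0.0, 2.0, 1.0]],
                      bias=[0.0, 0.0], labels=["no", "yes"])
print(pipe.predict({"colour": "blue", "size": 1.0}))  # prints no
```

If any one of these pieces were re-implemented separately in the serving system (say, a different category ordering in the indexer), the same weights would silently produce different predictions, which is the training/inference skew the talk warns about.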
Data scientists and machine learning researchers may be using Python or R; they do a lot of things in notebooks, and maybe they're using a bit of Scala with Spark. But your production environment often wants a compiled language for performance: Scala, Java, C, Go, perhaps Rust. As for frameworks, there are too many to count, and in any medium to large organization you're typically going to see all of them. Everyone wants the latest and greatest version of their latest and greatest framework. You have to manage all the dependencies of these different frameworks and components, and the versions of both your frameworks and your models. And the performance characteristics can vary widely across these dimensions. A model in R, or in raw Python, may not be that performant, but it may be extremely performant if it's wrapping C++ code, or TensorFlow or PyTorch on GPUs. So it's really difficult to reason up front about what the performance of each framework and each machine learning pipeline is going to be. And in the same way that you have to bridge all these languages and frameworks, you have to bridge the teams that we saw earlier, each of which has different objectives, needs and goals. Data scientists and researchers always want the latest and greatest. They may want to deploy models off the TensorFlow master branch because it's got some new version of an RNN, some new transformation or some performance improvement. Production engineers really don't like that. They care about control and stability and minimizing changes; they don't like being woken up in the early hours of the morning by a pager because the server's on fire. And the business user doesn't care about the technical details.
They don't care whether you're using TensorFlow, PyTorch or Keras, or whether it's running on Kubernetes. All they care about is that it's up and working, and what the impact on the business is: what are the metrics that we care about? And along with this proliferation of languages and frameworks comes a proliferation of formats. Each framework has its own format. These are open source but non-standard; you can go and look at them: MLeap, Spark, TensorFlow, Keras, PyTorch. They all have their own internal format for representing and saving models and pipelines and for exporting them. You of course have proprietary formats, which may be very performant but completely lock you into one particular product. And then we have open source, open standard formats, which is what I'll be talking about today, including PFA. This lack of standardization inevitably leads to custom solutions; there's always custom code. And where standards do exist, their limitations often lead to custom extensions. The minute you extend a standard with some non-standard custom component or plug-in, you lose all the benefits of standardization, and there was no point in adopting it in the first place. Now, this is a general talk on ML pipelines, but Spark is important here because it sparked, so to speak, the work that we did on PFA export. Spark solves many problems, and to some extent this applies to other pipeline frameworks such as scikit-learn too: the components within Spark (Spark DataFrames, Spark ML, the streaming components) let you use one holistic system to manage all of these different parts of the pipeline, and all of the different teams can use effectively one set of APIs, which is really great. But it introduces additional challenges. Again, these are not completely unique to Spark, but Spark certainly exacerbates them.
The main problem with trying to score models built with Spark ML is the tight coupling to the Spark runtime, and that introduces two problems. The first is that you have to manage a lot of complex dependencies: in order to score a model, you have to deploy a full Spark instance. Yes, it may be running on a single machine in Spark local mode, but you still have to worry about compatibility and versioning issues, and about a lot of dependencies, with potential conflicts, that are pulled in for no particularly good reason. The second, and probably more important, is that scoring Spark models in real time is really slow. Spark is optimized for batch scoring, for scoring over DataFrames or streams. This is changing a little with the new streaming optimizations in Spark, but the overhead of DataFrames with respect to task scheduling and query planning is just not suitable for real time. If you need latency of a few milliseconds up to maybe a few hundred milliseconds, you're just not going to achieve it. In addition, you have to do a lot of custom work just to get models out of Spark. To avoid these problems, you have to export the models using custom export formats or readers, and then either load that saved model into a custom machine learning framework for scoring, or write some sort of wrapper or converter into something like scikit-learn if you can. So everything ends up being custom, everything ends up being slow, and there's a lot of complexity. Before we talk about open standards, I'll mention one solution to this that is becoming very popular, and that's containers. Containers have seen a lot of success and wide adoption in software deployment in general, and certainly for machine learning software, and they provide a lot of benefits.
You get repeatability and ease of configuration: in theory, if something works in Docker on your laptop, it'll work in production. And containers allow a separation of concerns between data scientists and production engineers. The data scientists can focus on their favorite framework and on the model they're training, the what of the model. The production engineers, in the DevOps environment running Kubernetes clusters and all of that, can focus on how it is deployed and managed, and neither really has to worry about the other. But that's not entirely true, because what goes in the container is still the most important thing. As I mentioned, performance can be really variable across the dimensions of language, framework, version and so on. Doing this correctly still requires good DevOps knowledge, deployment pipelines, continuous integration and testing, and good practices. And most importantly, it doesn't solve the issue of standardization. If you can throw anything into a container and throw it over the wall, that's all very well for easing deployment from a technical perspective: you just push a container onto your Kubernetes cluster. But it doesn't tell you how to consume that new microservice in your business application; there's no standardization that makes that easy. You end up having to inspect what's running in that Docker container to figure out how to even call the model: what input format does it expect, and what does it give back? There's no standardization with respect to the formats, what's in the container, or the APIs exposed, so you still have to write or use some complex serving framework on top. So for the problem of standardizing model formats, I'll talk about the Portable Format for Analytics as a solution.
PFA is being championed by the Data Mining Group, of which IBM is a founding member. The DMG previously created PMML, the Predictive Model Markup Language. That's an XML-based standard, arguably the only really viable open standard in use today, and it has wide adoption; it's used for many traditional machine learning applications. But it does have many limitations. The key limitation of PMML is that if something is not in the spec, if a certain model or transformation that you want to use is not baked into the spec, you simply can't use it. It's very difficult to extend PMML without a custom extension, and the minute you have a custom extension, again, you lose all the benefit of having a standard. So PFA was created by members of the DMG specifically to address these shortcomings, and it aims to be a much more flexible version of PMML in many ways. It uses JSON as its serialization format instead of XML, which is probably a little more modern, and it uses Avro schemas for all type definitions. Defining the input and output data types that your model works with, as well as the input types of the various function calls, is all done with Avro, so it's all standardized, very flexible and extensible. PFA encodes the functions, called actions, that are applied to each input to create the output. Actions are built from a set of built-in functions and language constructs, so PFA acts like a mini language, with the ability to combine built-in functions into user-defined functions. Essentially, you can think of PFA as a mini functional math language with a schema specification, a type system, attached to it. Because of this type system, PFA is effectively strongly typed, which means that any valid PFA document can be loaded by a compliant scoring engine.
And it can be verified at load time that it will execute correctly and give you the correct result. You may still get certain runtime errors, but this approach eliminates much of the potential for them. What this gives you is true portability and a true separation of concerns between the model producer, which exports the model from the pipeline, and the model consumer, which is your production environment, your API or your business application. It's portable across languages, frameworks, runtimes and versions. If you have a compliant PFA scoring engine, you don't care where the PFA model, or document as it's called, came from: as long as it meets the spec, you can execute it. I'll give you a simple example of multi-class logistic regression. The key components are the schema, that is, the input and output that this model operates on, and the action, which is the core model logic. As you can see on the top left, there's an input and an output. In this case the input is a double array, a feature vector, and the output is just a number, a double, which is the predicted class. On the right we have the action. The JSON representation is really meant to be generated and operated on by machines, but it is human readable, if a little verbose. Essentially, what it does is very similar to what you might do in Python code. First we do a vector dot product and add a bias, which is exactly what the model.reg.linear function does. It takes the input vector, which is referenced as "input", and its other argument is a cell holding the model coefficients. We'll talk about cells on the next slide, but a cell is a form of data storage; here it stores the coefficient vectors and the bias. So model.reg.linear performs the dot product for us; it's syntactic sugar, a little helper function.
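The multi-class logistic regression document described here can be sketched as a Python dict that serializes to PFA JSON. This is a sketch under assumptions: the cell name `model`, the record name `Model` and the toy coefficients are illustrative, and the output is declared as `int` because `a.argmax` returns an integer index. `reference_score` is not a PFA engine; it is a plain-Python mirror of this one action (a linear score per class, then argmax), useful for checking the document's logic. Taking the argmax of the linear scores picks the same class as taking it over softmax probabilities, since softmax is monotonic.

```python
import json
from typing import Any, Dict, List


def multiclass_lr_pfa(coeff: List[List[float]], const: List[float]) -> Dict[str, Any]:
    """Build a PFA document for multi-class logistic regression (one coeff row per class)."""
    return {
        "input": {"type": "array", "items": "double"},  # the feature vector
        "output": "int",                                # predicted class index
        "cells": {
            "model": {
                "type": {
                    "type": "record",
                    "name": "Model",
                    "fields": [
                        {"name": "coeff",
                         "type": {"type": "array",
                                  "items": {"type": "array", "items": "double"}}},
                        {"name": "const", "type": {"type": "array", "items": "double"}},
                    ],
                },
                "init": {"coeff": coeff, "const": const},
            }
        },
        # model.reg.linear: coeff . input + const per class; a.argmax: index of the largest.
        "action": {"a.argmax": [{"model.reg.linear": ["input", {"cell": "model"}]}]},
    }


def reference_score(doc: Dict[str, Any], x: List[float]) -> int:
    """Plain-Python mirror of the action above (not a general PFA engine)."""
    state = doc["cells"]["model"]["init"]
    scores = [sum(c * xi for c, xi in zip(row, x)) + b
              for row, b in zip(state["coeff"], state["const"])]
    return max(range(len(scores)), key=scores.__getitem__)


doc = multiclass_lr_pfa(coeff=[[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]], const=[0.0, 0.5, 0.0])
print(json.dumps(doc, indent=2))
print(reference_score(doc, [0.2, 0.4]))  # prints 1
```

Keeping the coefficients in a cell rather than inlining them in the action means the same action can be reused for any fitted model; only the `init` state changes.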
We could break it down into individual functions if we wanted. Once we've got that, we do an argmax on the resulting vector, and we end up with the predicted class. So while it's a bit more verbose, it looks very similar to what we would do in a language like Python. I mentioned cells, so what are they? In PFA, data storage is specified by cells and pools, and cells are much more common. A cell is effectively a named value; you can think of it as a key-value pair, and it acts as a sort of global variable during the execution of each PFA document. Each time you run the PFA document, it performs the set of actions that we saw on the input and returns an output. At the start of each run, the cell value is available as a fixed starting value, and during the course of execution of the action you can update it, so it is mutable within a specific execution of a PFA document. Here's an example: let's say we're doing text processing and want to convert text into a bag of words. What we need is a vocabulary mapping, which maps each term or word to an index in a vector, and for that we want to use a map. We can see here that we've got a vocab mapping cell with a type definition, which in this case is an Avro record. The record has a field which is a map from string to integer. The cell also has an initialization value, the init value circled at the top, which is the state. So cells are a way to store model state, whether it's coefficients, vocabulary mappings or vector normalization values. Anything with state that we typically need to apply in a machine learning or analytics pipeline is stored in cells.
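A sketch of the vocabulary-mapping cell described here, written as a Python dict in PFA's JSON form. The cell name `vocabMapping`, the record name `Vocab`, the field name `vocab` and the three-word vocabulary are assumptions for illustration. The `bag_of_words` helper is a hypothetical plain-Python stand-in showing how that stored state would be applied to a tokenized input; it assumes the indices run from 0 to n-1.

```python
import json
from typing import Any, Dict, List

# Cell definition: an Avro record with one field, a map from term (string) to index (int).
# Avro map keys are always strings, so only the value type is declared.
VOCAB_CELL: Dict[str, Any] = {
    "vocabMapping": {
        "type": {
            "type": "record",
            "name": "Vocab",
            "fields": [{"name": "vocab", "type": {"type": "map", "values": "int"}}],
        },
        # The init value is the fitted state, e.g. produced by a CountVectorizer-style stage.
        "init": {"vocab": {"spark": 0, "pfa": 1, "model": 2}},
    }
}


def bag_of_words(tokens: List[str], cells: Dict[str, Any]) -> List[float]:
    """Term counts indexed by the vocabulary stored in the cell (hypothetical helper)."""
    vocab: Dict[str, int] = cells["vocabMapping"]["init"]["vocab"]
    counts = [0.0] * len(vocab)  # assumes indices are 0..n-1
    for token in tokens:
        idx = vocab.get(token)
        if idx is not None:  # out-of-vocabulary terms are ignored
            counts[idx] += 1.0
    return counts


print(json.dumps({"cells": VOCAB_CELL}, indent=2))
print(bag_of_words(["pfa", "model", "pfa", "unknown"], VOCAB_CELL))  # prints [0.0, 2.0, 1.0]
```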
Pools are a very similar concept, but they are mutable across action executions. You can think of them as similar to cells, but more like a database field that can be updated atomically, and they're shared across all models and all executions of that model. Other features of PFA are the special forms, which make PFA act much more like a mini language. You have control structures, loops and conditionals. You can create local variables and update them. You can define user-defined functions, your own arbitrary functions built from any of the built-in functions. There are casts, null checks, and some error handling, but it is extremely basic. And it has a comprehensive built-in library; on the right you see the built-in function library. For example, string processing is a big part of typical feature engineering and extraction in machine learning, so PFA has comprehensive string processing functions, much like you would find in a language like Python, R or Java. So what is the current status of PFA? The spec is at roughly the 1.0 level. It's still young and still evolving, but it is a fairly well-rounded specification. A member of the DMG that was heavily involved in the initial creation of PFA, Open Data Group, has created reference implementations in the Hadrian project, covering Python, R and Java: it provides export and a domain-specific language in Python and R, and scoring in Python, R and on the JVM. So what does PFA do well? It has really strong support for traditional machine learning; as you saw on the previous slide, there's a whole set of built-in functions that are well suited to traditional analytics and machine learning applications. It allows control flow, flexibility and composability using a functional approach and user-defined functions.
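The special forms mentioned here can be illustrated with a small hypothetical PFA document, again built as a Python dict: a user-defined function declared under `fcns` and called as `u.clip`, a local variable created with `let`, and nested `if`/`then`/`else` conditionals. The function name `clip`, the 0.5 threshold and the string labels are invented for illustration, and this document has not been run through a PFA engine here; `reference_action` is a plain-Python equivalent that only states the intended behaviour.

```python
import json
from typing import Any, Dict

CLIP_DOC: Dict[str, Any] = {
    "input": "double",
    "output": "string",
    "fcns": {
        # User-defined function: clamp x into [lo, hi] using nested conditionals.
        "clip": {
            "params": [{"x": "double"}, {"lo": "double"}, {"hi": "double"}],
            "ret": "double",
            "do": [
                {"if": {"<": ["x", "lo"]},
                 "then": "lo",
                 "else": {"if": {">": ["x", "hi"]}, "then": "hi", "else": "x"}}
            ],
        }
    },
    "action": [
        # Local variable holding the clipped input.
        {"let": {"clipped": {"u.clip": ["input", 0.0, 1.0]}}},
        # The last expression in the action is its result.
        {"if": {">=": ["clipped", 0.5]},
         "then": {"string": "high"},
         "else": {"string": "low"}},
    ],
}


def reference_action(x: float) -> str:
    """Plain-Python statement of what the action above is meant to compute."""
    clipped = min(max(x, 0.0), 1.0)
    return "high" if clipped >= 0.5 else "low"


print(json.dumps(CLIP_DOC, indent=2))
print(reference_action(0.7))  # prints high
```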
And the type system allows strong verification at load time. But there are limitations. At the moment there's no built-in support for generic vectors, that is, mixed dense and sparse vectors, which I'll come back to shortly. There's no support for generic tensors or for any of the typical deep learning operators that one needs in that space. And there are some open questions around industry usage and adoption, although that is growing, as well as around performance. I mentioned Spark ML earlier, and some of the shortcomings and issues around exporting Spark ML pipelines for production scoring. It was this gap that led us to create the Aardpfark project. I'm from South Africa, and "aardvark" is Afrikaans for "earth pig"; it's a wild animal, a bit like an anteater. So Aardpfark is a bit of a play on words. It's PFA export for Spark ML pipelines. The Hadrian project has a JVM engine for scoring PFA, but not much support for actually exporting models or creating PFA documents. So in Aardpfark we created a Scala DSL for creating PFA documents, which uses avro4s and json4s, and then the Aardpfark Spark ML component uses that DSL to export models from Spark to PFA. On the right we see a recreation of the multi-class logistic regression example from before, as it looks in the Scala DSL. We define the cells, we define the input and output data types, and we define the action using the DSL, so it starts to look a lot more like native Scala code and is a lot more fluent. Some of the challenges that we ran into with Spark ML in particular highlight some of the issues in PFA. One of them is the generic vector support I mentioned. In Spark, and similarly in scikit-learn with NumPy and SciPy arrays, you can operate on dense and sparse vectors without worrying which they are.
It'll do the right thing if you have mixed vectors in an operation like a dot product or addition. PFA doesn't allow you to do that, so it results in complex and quite hairy workarounds and code, which don't fully work. That's a shortcoming that needs to be addressed. Combining components into a pipeline is also quite tricky. We had to do some work there, trying to match Spark's DataFrame operations as closely as possible, in the sense that a pipeline stage transforms rows of data in a DataFrame and typically updates or appends columns. We match that by taking Avro records, which are very analogous to rows with fields or columns, and effectively copying the record and adding a field at each stage. It becomes a bit verbose, but so far it's the best way we've found to do it. Another issue is that in Spark ML, a trained model itself has no knowledge of its schema. There are functions where you can pass in the input DataFrame and get back an output DataFrame, and you can then analyze that DataFrame's schema to figure out the input and output types you need to work with and the columns involved. But that's fairly cumbersome and adds a few extra steps to export. It would be nice if a Spark ML model, once trained, had a set type, or a way to introspect itself and report the input and output types it was trained on, and provided that as an API. Aardpfark is open source; we released it in June, and you can find it at github.com/CODAIT/aardpfark. It currently covers almost all predictors in Spark ML, many feature transformers, and pipeline support, with equivalence tests between Spark and PFA and some tests for the core DSL.
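The record-copying approach to chaining pipeline stages described above can be sketched in plain Python. This is not Aardpfark's actual code: `append_field_schema` and `append_field_value` are hypothetical helpers that mimic, on the Avro schema and on a record value respectively, what each exported stage does, namely copy the incoming record and add one field. A new record name is generated per stage because Avro requires named types within one schema to be unique.

```python
from typing import Any, Dict

Schema = Dict[str, Any]
Row = Dict[str, Any]


def append_field_schema(schema: Schema, name: str, field_type: Any, record_name: str) -> Schema:
    """New Avro record schema: all existing fields plus one appended field."""
    return {
        "type": "record",
        "name": record_name,  # each stage's output record gets its own name
        "fields": list(schema["fields"]) + [{"name": name, "type": field_type}],
    }


def append_field_value(row: Row, name: str, value: Any) -> Row:
    """Copy the incoming record and add one field, leaving the input untouched."""
    out = dict(row)
    out[name] = value
    return out


# Input schema, analogous to a DataFrame with two columns.
schema0: Schema = {"type": "record", "name": "Input",
                   "fields": [{"name": "colour", "type": "string"},
                              {"name": "size", "type": "double"}]}

# Stage 1: a string indexer appends "colourIndex" (hypothetical fitted mapping).
index = {"blue": 0, "red": 1}
schema1 = append_field_schema(schema0, "colourIndex", "double", "Stage1Output")
# Stage 2: a vector assembler appends "features".
schema2 = append_field_schema(schema1, "features", {"type": "array", "items": "double"},
                              "Stage2Output")

row0: Row = {"colour": "red", "size": 3.0}
row1 = append_field_value(row0, "colourIndex", float(index[row0["colour"]]))
row2 = append_field_value(row1, "features", [row1["colourIndex"], row1["size"]])
print([f["name"] for f in schema2["fields"]])  # prints ['colour', 'size', 'colourIndex', 'features']
print(row2["features"])  # prints [1.0, 3.0]
```

The verbosity mentioned in the talk is visible even here: every stage restates the full record shape, so a long pipeline produces a long chain of ever-wider record types.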
We obviously need more help to round it out and to look beyond Spark, at scikit-learn, LightGBM, XGBoost and other popular frameworks and libraries. But it's interesting to see it starting to get adoption in the open source community. Salesforce released TransmogrifAI, an AutoML toolkit for Spark ML, and for local scoring and model export they're using Aardpfark to export to PFA and Hadrian to score those models. So it's interesting to see other projects starting to pick this up, and if you're interested, please go and check it out and give us some feedback. I'll very briefly mention the other open standards out there. PMML, which I mentioned before, was in many ways the precursor to PFA from the Data Mining Group. There's a lot of support for it across frameworks (scikit-learn, R, XGBoost, LightGBM, Spark ML), but it has the shortcoming that anything that cannot be represented in PMML can only be done with custom plug-ins, and then you lose the benefit of standardization; there's really no point. MLeap is a project developed by Combust ML, a startup, to export Spark ML pipelines to its own format for model scoring. It has really good performance and really good coverage, but it's an open source format, not a standard. That forces a tight coupling between the versions of the model producer and the model consumer: every time your Spark version is upgraded, you have to upgrade your MLeap scoring engine too, which is not the case for something like PFA. And finally, the Open Neural Network Exchange, or ONNX, is really interesting. It's a Protocol Buffers-based format for exporting neural networks, focused on deep learning, so it has really good support for deep learning.
In many ways it's similar to PFA, in the sense that it serializes the machine learning model and its state as well as the functions or operators applied to the input to create the output. But at the moment it has very poor support for traditional machine learning: there is some support in the standard, but there's essentially no support in the actual scoring engines out there. That may change, but for now, for traditional or non-deep-learning machine learning, support is limited. Finally, a quick note about scoring performance. We compared PFA with Spark and MLeap on a test dataset of 80,000 records, with string indexing (effectively categorical feature indexing) on 47 columns, then assembled those vectors together with some numerical columns and applied a linear regression. The Spark time per record was 1.9 seconds, which is literally off the chart, so we didn't plot it. We can see that MLeap is still a lot faster than PFA, almost three times as fast, but both are still within real-time latencies. So this highlights that there is some work to be done on PFA performance, which I'll mention briefly on the next slide. In summary, PFA provides an open standard for model deployment and for serialization of analytics workflows, with true portability across languages, runtimes and frameworks, decoupling the model producer from the model consumer. As such, it solves a significant pain point in the machine learning community. We started with Spark ML, but this applies to everything out there: R, scikit-learn, XGBoost, LightGBM and all the others. But there are still risks. PFA is a young standard that is still gaining adoption, as we saw, and it has not really been tested in production at scale. There are certainly some open questions around performance: can scoring engines be optimized for PFA?
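Per-record latency comparisons like the one above can be approached with a small timing harness. The sketch below is an illustrative assumption, not the setup used for the reported numbers: it warms up the scoring function, times each single-record call with `time.perf_counter`, and reports the median. `toy_score` is a hypothetical stand-in for a real scoring engine call.

```python
import time
from statistics import median
from typing import Any, Callable, List, Sequence


def per_record_latencies(score: Callable[[Any], Any], records: Sequence[Any],
                         warmup: int = 100) -> List[float]:
    """Seconds taken by each single-record call to `score`, after a warm-up pass."""
    for record in records[:warmup]:  # warm caches (and a JIT, if the engine has one)
        score(record)
    latencies = []
    for record in records:
        start = time.perf_counter()
        score(record)
        latencies.append(time.perf_counter() - start)
    return latencies


def toy_score(features: List[float]) -> float:
    # Hypothetical stand-in for a scoring engine: a fixed linear model.
    return sum(0.5 * x for x in features) + 1.0


records = [[float(i), float(i % 7)] for i in range(1000)]
lat = per_record_latencies(toy_score, records)
print(f"median {median(lat) * 1e3:.4f} ms over {len(lat)} records")
```

Timing one record per call, rather than dividing a batch time by the record count, is what matters for real-time serving, because per-call overheads such as task scheduling and query planning are exactly what batch measurements hide.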
And what about deep learning? PFA doesn't support deep learning operators, so there are limitations there. There's no technical reason it can't be extended to support deep learning; it's just not in the spec at the moment, it's a significant amount of work, and it hasn't been done yet, so that is still an open question. And finally, a standard has many, many positives, but one of the negatives is that it can move really slowly, because it's designed by committee, so once it's released it can be quite difficult to change. Finally, future directions. In Aardpfark our initial work focused on Spark ML, as I mentioned; we'd like to look at scikit-learn pipelines, XGBoost and LightGBM; add more comprehensive performance tests; and propose generic implementations to the standard, such as generic vector and tensor support, along with some improvements to performance and schema definitions. And then, finally, looking at PFA for deep learning: can the spec be extended to encompass deep learning, with GPU support, deep learning operators and tensor support? What is required there, and is it feasible? And can PFA actually become a holistic standard covering both traditional machine learning and deep learning? So thanks very much. I encourage you to go and check out codait.org to see the other projects our team has worked on, including the Model Asset Exchange and the Fabric for Deep Learning; there are a couple of links there. I don't really have time to talk about them now, but please go and check them out. Thanks very much. I think we might have time for one or two questions. [Audience question] How do you actually import these models into other languages? I didn't see any solutions for that. The question is: we're talking about export here, model export to PFA or some other format, but what about importing models? Well, PFA itself can be used as an interchange format.
But it's not really designed for that. PMML, ONNX and PFA are all, at the moment, really focused on the export phase. The objective is to bridge the gap between all the different frameworks and languages and allow you to export them all to a standard format, so that you can use one scoring engine to score them all. So at the moment you could, hypothetically (and it's not that challenging), export a model from, let's say, Spark into PFA and then load that PFA model into, let's say, scikit-learn. It can be used in that manner, but it's not really designed for it. That's the short answer. For data interchange or model interchange, I think that's still an open question. Even ONNX has a goal for the future of serving as an intermediate representation, where, let's say, you could train a model in TensorFlow, export it to ONNX, then load it into PyTorch and do transfer learning. That's the ultimate goal, but the standard is not there yet. So the short answer is that there's no real solution out there that I'm aware of. PFA could be used for it, and there's not necessarily a technical challenge, but it's not designed for that. It's designed for a many-to-one relationship: exporting from many frameworks into one standard format, then loading that into a standardized scoring engine and executing inference. Anything else? Great. Thank you very much.
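As a rough illustration of the "import" direction discussed in the answer, the sketch below reads the state of a linear-regression PFA document (the same shape as the PFA documentation's linear-regression example) back into a plain Python object that a consumer framework could use. The `LinearModel` class and `import_pfa_linear` helper are hypothetical, not part of Aardpfark, Hadrian or any framework mentioned in the talk. It only works because this model's state maps directly onto coefficients and an intercept; a general pipeline would not map this cleanly.

```python
import json
from dataclasses import dataclass


@dataclass
class LinearModel:
    """Hypothetical consumer-side model: y = intercept + dot(coef, x)."""
    coef: list
    intercept: float

    def predict(self, x):
        return self.intercept + sum(c * v for c, v in zip(self.coef, x))


def import_pfa_linear(pfa_json):
    """Rebuild a LinearModel from a PFA document whose action is model.reg.linear."""
    doc = json.loads(pfa_json)
    action = doc["action"]
    if not (isinstance(action, dict) and "model.reg.linear" in action):
        raise ValueError("only a single model.reg.linear action is supported")
    _, model_ref = action["model.reg.linear"]
    state = doc["cells"][model_ref["cell"]]["init"]
    return LinearModel(coef=[float(c) for c in state["coeff"]],
                       intercept=float(state["const"]))


# A made-up PFA document, e.g. as a producer might have exported it.
pfa_json = json.dumps({
    "input": {"type": "array", "items": "double"},
    "output": "double",
    "cells": {"model": {
        "type": {"type": "record", "name": "Model", "fields": [
            {"name": "coeff", "type": {"type": "array", "items": "double"}},
            {"name": "const", "type": "double"}]},
        "init": {"coeff": [0.5, -1.0], "const": 2.0}}},
    "action": {"model.reg.linear": ["input", {"cell": "model"}]},
})

model = import_pfa_linear(pfa_json)
print(model)
print(model.predict([4.0, 1.0]))
```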