Hi, and welcome to this Open Source Summit North America presentation on Compare, Contrast, and Contribute: Model Deployment Standards and How You Can Help. I'm Nick Pentreath, MLnick on Twitter and GitHub. I'm a principal engineer at IBM's Center for Open Source Data and AI Technologies, where I work on machine learning and AI open source software. I'm an Apache Spark committer and PMC member, and the author of Machine Learning with Spark. Joining me today is my colleague, Svetlana.

Hello, everyone. I'm Svetlana Levitan. I was a senior software developer advocate, and very soon I will be open for new opportunities. I've been working on PMML for many years, and today I'll be happy to tell you about it. Recently I have also been working on ONNX. Thank you.

So before we start, a little bit about CODAIT, the Center for Open Source Data and AI Technologies. We're a team of over 30 open source developers within IBM, working on the enterprise AI lifecycle in open source. We advocate for and contribute to projects that are critical to IBM's data and AI offerings. This includes the Python data science stack, open exchanges for data and deep learning models, deep learning and machine learning frameworks (Apache Spark is a big part of that), as well as AI fairness libraries and what we'll be talking about today: model deployment and standards for deploying models.

So we'll start with a discussion of why we need open standards for model deployment, then walk through three commonly used and widely applied standards for machine learning model serialization. Then we'll go through a feature comparison of them and wrap up with ways that you can actually contribute to these open source projects.

So we'll start with the machine learning workflow, which starts with data. We then analyze that data, and typically data doesn't arrive in a format that is amenable to machine learning. It doesn't arrive neatly packaged up as vectors and tensors and whatnot.
So we then need to transform that data: pre-process it, extract features, and convert it into a form we can feed into a machine learning algorithm. We then train a model, and once we have a trained model, there's not much point in not actually deploying it. We need to use it in the real world to do something useful. So the next step is to deploy the model and use it to predict on new data in a live environment. We then need to maintain and monitor the model and ensure it keeps doing the job we wanted it to do. New data arrives as part of this process, and in some cases the model is actually generating new data itself. And that closes the workflow and turns it back into a loop.

So the workflow spans teams. The data side is the domain of our data engineers, who are responsible for storing and securing data and for providing access to it for machine learning researchers and data scientists. They take the data into their data science workflow of analyzing, pre-processing, and training models. And typically, once a model is trained, it's often thrown over the wall to the production engineers and machine learning engineers, who are responsible for model deployment and for monitoring and maintaining that model.

The workflow also spans tools. For each of these phases there are many, many different tools: various data formats, various ways of analyzing and visualizing data, frameworks and toolkits for creating machine learning pipelines and pre-processing steps as well as for training models, and finally tools for deployment. In any small to medium-sized, and certainly any large, machine learning and data science team, you will pretty much be using almost all of these frameworks and tools, and you're required to support almost all of them, both for training and for deployment.

So if we think about model deployment, we really need to think about three questions: the what, the where, and the how.
And today we're mostly talking about the what, which is probably the most critical of these. The what really refers to what we mean by a model. When people talk about a model, they're often talking about a trained algorithm: a machine learning model that's been trained on data, together with the set of weights that go with it. But in fact, as we've seen, there's a whole process that comes before training the model. We need to take the data, transform it, extract features, and pre-process those features, and only then can we train the model. And at prediction time we need to perform the exact same steps in exactly the same order: we still need to apply the same transformations, feature extraction, and pre-processing before we can make a prediction. If we don't use exactly the same steps, we end up with model skew, and we get effectively garbage results at prediction time.

So, as you've seen, we actually need to think about deploying full pipelines, not just models. We need to deploy the data transforms, the feature extraction and pre-processing steps, and of course the machine learning or deep learning model itself. And finally, something that's often overlooked: we need to deploy the prediction transformation and post-processing that occurs after model inference. If we think about it, even the ETL and data steps, and in some cases SQL-style operations, are in fact part of the pipeline, because they feed the raw data and the features that we're using.

So we've seen that there are many challenges involved. We need to manage and bridge all these languages, frameworks, dependencies, and versions, and performance can vary a lot along these dimensions.
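That skew problem can be made concrete with a small sketch. The function names here are illustrative, not from any particular library: the point is that a transformation fitted on training data must be re-applied with the same fitted parameters at prediction time, otherwise the model receives inputs it never saw during training.

```python
# A minimal sketch of training/serving skew (illustrative names, no libraries).

def fit_scaler(train_rows):
    """Learn per-feature means from the training data."""
    n = len(train_rows)
    return [sum(col) / n for col in zip(*train_rows)]

def transform(rows, means):
    """Apply the *fitted* transformation: subtract the training means."""
    return [[x - m for x, m in zip(row, means)] for row in rows]

train = [[1.0, 10.0], [3.0, 30.0]]
means = fit_scaler(train)              # fitted once, on training data -> [2.0, 20.0]

new_data = [[5.0, 50.0]]

# Correct deployment: reuse the fitted parameters on new data.
good = transform(new_data, means)      # [[3.0, 30.0]]

# Skew: re-fitting the transformation on serving data produces different
# features for the same raw input, so predictions become garbage.
skewed = transform(new_data, fit_scaler(new_data))   # [[0.0, 0.0]]
```

Serializing only the model weights loses the `means` learned at training time, which is exactly why a deployed artifact has to capture the full pipeline, not just the model.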
So if we have a model in R or Python, it could be very performant, perhaps because it's implemented on top of an underlying C engine or TensorFlow or PyTorch, or it could be very slow, and we don't really know which just from the language or framework it's written in. Updating versions can also have an impact on performance in all of these situations. We also have friction between the teams of data scientists, machine learning engineers, and the business side, and we need to bridge the gaps between these organizational silos. Each of the frameworks we need to support tends to do its own thing in terms of exporting models and formats, and even for open source frameworks, where you can at least look at what the format is, each tends to be very different from the others. This inevitably leads to many custom solutions and extensions.

To address this challenge, if we can standardize the format, then we can take all of these different frameworks and toolkits and export to one standard format, and once we've done that, we have the benefit of being able to optimize across a single stack. We get a separation of concerns between the model producer and the model consumer. The model producer, coming from one of these machine learning or deep learning frameworks, doesn't need to care where the model is going to be deployed; all it needs to care about is exporting to the standard format. Likewise, when we're executing or deploying in our production engine, all we need to care about is that we're taking in a correctly formatted model in the standard format; we don't have to worry about where it came from. And once we have a single stack built on the standard format, we can optimize the performance of that single stack.
So it's great that we can have a standard, and ideally it would be open source, but there's a difference between open source and open governance. The open source license is only one aspect. Of course we want a permissive open source license: it allows us to inspect the code, modify the code, and use it in any way we want. But we may not have control. Just because something is open source on GitHub doesn't mean it is completely open in the sense that we can exert any influence over the decisions that happen in that project. Open governance is really critical in that it avoids concentration of control in the hands of a few large vendors or companies, and it gives you clear visibility into the development process, how decisions are made, and the strategic roadmaps and plans for the future. Of course there are downsides to a standard: it needs a certain critical mass and adoption to succeed, things can move slowly, and you might have design-by-committee issues. But overall, open governance and an open standard bring far more benefit than these downsides. So next I'm going to hand over to Svetlana, who will talk through the first of our three model deployment standards.

Thank you, Nick. Yes, so let me tell you about PMML. Back in the 1990s, some people realized that there were challenges with model deployment, and the Data Mining Group was created. The Data Mining Group is a group of companies working together on open standards for machine learning model deployment. The first such standard that came out of the Data Mining Group was PMML, the Predictive Model Markup Language. It uses the XML format, because that was the popular format in the 90s, and since that time it has grown a lot, and many people from different companies have worked together on this standard.
Just this month we released version 4.4.1, and it has 17 different statistical and machine learning models, plus many ways to combine models together with ensembles and compositions, and a lot of support for data transformations. I've been fortunate to work with PMML for many years and to lead the recent releases of PMML.

So what is PMML? What does a PMML document look like? It is an XML document with PMML as the root element, and inside you can find the header, the data dictionary, the transformation dictionary (which is optional), and one or more models. The header describes things like what your application was, copyright, and so on. The data dictionary describes your data, which must be structured data. The transformation dictionary describes data transformations, and the models describe the models themselves.

Inside each of the models you can find several different elements. The mining schema is a required element: it tells us what the predictors are and what the target is. The output element can describe post-processing transformations of the predictions, and there are several other optional elements that are very useful in many cases. Here we have an example of a logistic regression model in PMML, which shows that, yes, there is a header and a data dictionary, and then the regression model is described by the RegressionModel element, with the coefficients and other information necessary for scoring the model.

In general there are 17 models, as I mentioned, in the latest releases of PMML. In the 4.4 release, which came out last year, we added anomaly detection, and we added quite a lot to the time series model, which previously supported only exponential smoothing. And the mining model provides ways to combine different models together into ensembles and model compositions.
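As a sketch of that layout, here is a minimal, hand-written PMML-style document (heavily abridged, not a complete valid PMML 4.4 file) parsed with Python's standard library, just to show the root PMML element containing a Header, a DataDictionary, and a model element with its required MiningSchema:

```python
# Illustrative, abridged PMML-style XML; element names follow the PMML spec,
# but this toy document omits most of what a real, valid model would need.
import xml.etree.ElementTree as ET

PMML_DOC = """
<PMML version="4.4" xmlns="http://www.dmg.org/PMML-4_4">
  <Header copyright="example" description="toy logistic regression"/>
  <DataDictionary numberOfFields="2">
    <DataField name="x1" optype="continuous" dataType="double"/>
    <DataField name="y" optype="categorical" dataType="string"/>
  </DataDictionary>
  <RegressionModel modelName="toy" functionName="classification"
                   normalizationMethod="logit">
    <MiningSchema>
      <MiningField name="x1"/>
      <MiningField name="y" usageType="target"/>
    </MiningSchema>
  </RegressionModel>
</PMML>
"""

ns = {"p": "http://www.dmg.org/PMML-4_4"}
root = ET.fromstring(PMML_DOC)

# PMML is the root element; header, data dictionary, and models sit inside it.
print(root.find("p:Header", ns).get("description"))   # toy logistic regression

# The mining schema tells us which field is the target.
target = root.find(".//p:MiningField[@usageType='target']", ns)
print(target.get("name"))                             # y
```

Because the whole document is plain, structured XML like this, any consumer that understands the schema can score the model without knowing which tool produced it.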
And here you can see a list of companies and open source packages that support PMML. This list is taken from the DMG website, where you can find all the information about PMML. At IBM, we have a number of products that support the PMML standard pretty well, and you can see here pictures of IBM SPSS Statistics, which, by the way, recently released version 27, and IBM SPSS Modeler. You can export PMML from many models in those products, and you can also score PMML in them. In open source, there is the JPMML family of packages maintained by Villu Ruusmann from Estonia; it has support for PMML export from many different open source frameworks, as well as some support for scoring, but that is mostly paid. There is also the Nyoka package for scikit-learn, and there is the R pmml package. For scoring PMML there are the open packages PMML4S, which is in Scala and has a Java API, and PyPMML, a Python wrapper for that package.

Here I wanted to tell you one short example of a practical application of PMML. There was a big insurance company that built a random forest model in R, and they wanted to use very efficient in-database scoring with Modeler, and thanks to PMML we were able to do it very successfully. And now back to Nick with the next format.

So the next format we'll talk about is the Portable Format for Analytics, or PFA. We've seen that the Data Mining Group previously released PMML, which was very widely adopted but certainly had some issues in terms of flexibility. Essentially, if whatever you wanted to represent in terms of the model or the data transform was not part of the PMML standard, there was not much way around that apart from creating a custom extension, but unfortunately once you do that you lose the benefit of the standard, because a scoring engine may not be aware of your custom extension. So PFA was also created by the Data Mining Group, in an attempt to overcome this challenge.
So PFA at its heart uses a JSON format, as opposed to XML, and specifies the data types for inputs and outputs, as well as function arguments, using Avro. You can think of PFA as a kind of mini functional math language plus a schema specification. The idea is to provide a lot more flexibility and to be able to effectively represent almost any analytic transform, model, or pre-processing step in PFA using, essentially, programming language constructs.

So let's take a simple example, as we saw before: logistic regression. We take an input vector x and we want to apply a function to that input vector to get an output, which is going to be our prediction. Now, in PFA we split the computation up into the various pieces that we see here. The input is specified as a vector, in an array. We then have something called a cell; in PFA, a cell is the way to represent data. Any model weights, or dictionaries mapping from terms or words to indices, and things like that that we want to use in our model or pipeline, are represented in a cell. Here we have a model cell which holds the parameters: the weights and the bias for our linear model. We then apply a function to the input and the model cell, and that function is built into PFA as the linear model function. We then apply another function, softmax, to the result, then we apply an argmax function, and we get an output, which is the predicted class.

If we look at how that might be represented in PFA, we can see on the left a very simple representation of a PFA document, which is a JSON object, and we specify the inputs and outputs using Avro schemas. The input is an array type with double contents, and the output is a double. On the right we specify what is known as an action to perform. The action is, as we saw, the set of functions that we apply to the input in order to get the output.
So here we start from the inside and work our way outwards. The first function we apply is the linear model function, model.reg.linear, and it takes two arguments. The first is the input, which, as we saw, is this double array, and the second is the model cell containing the weights and the bias. To the output of that function we then apply our softmax function, and finally our argmax function. So the flow we saw on the previous slide is exactly what's being represented here in JSON.

There are some other features of PFA. It supports control structures such as conditionals and various types of loops. You can create and manipulate local variables, as well as fairly arbitrary user-defined functions, including anonymous functions, i.e. lambdas. You can cast between types, perform null checks, and do some basic error handling. And there's a very comprehensive built-in function library. The basics are the types and manipulation of those types, including numbers and numerical math functions, string processing, array and map functions, and statistical functions, as well as linear algebra and good built-in support for various traditional machine learning models such as linear models, clustering, and decision trees.

In open source, there's been quite a lot of interest in PFA, and it has been taken up by a few companies. The original reference implementation was created by the Open Data Group and is called Hadrian. But there was not much in the way of being able to actually create PFA on the JVM, and in particular from Scala for Spark, so our group created a project called Aardpfark, which provides PFA export for Spark ML models. There are a couple of other projects, such as Woken, which is for PFA execution and validation, and Salesforce's AutoML library for Spark, TransmogrifAI, uses PFA for scoring and in fact uses the Aardpfark library for exporting the models from Spark ML.
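To make the flow of that action concrete, here is a plain-Python sketch of what the three chained functions compute. The function names mirror PFA's built-ins (the linear model, the softmax link function, and argmax), but this is an illustrative evaluator with made-up weights, not a real PFA scoring engine:

```python
# Illustrative evaluation of the PFA action: model.reg.linear -> softmax -> argmax.
import math

def reg_linear(x, model):
    """Like PFA's model.reg.linear: one score per class, coeff . x + const."""
    return [sum(w * xi for w, xi in zip(ws, x)) + b
            for ws, b in zip(model["coeff"], model["const"])]

def softmax(z):
    """Turn raw scores into probabilities (shifted by max for stability)."""
    e = [math.exp(v - max(z)) for v in z]
    s = sum(e)
    return [v / s for v in e]

def argmax(p):
    """Index of the largest probability, i.e. the predicted class."""
    return max(range(len(p)), key=p.__getitem__)

# The "model cell": made-up weights and biases for a two-class linear model.
cell = {"coeff": [[1.0, -1.0], [-1.0, 1.0]], "const": [0.0, 0.0]}

x = [2.0, 0.5]                  # the input: an array of doubles, per the Avro schema
scores = reg_linear(x, cell)    # [1.5, -1.5]
probs = softmax(scores)
print(argmax(probs))            # 0, the predicted class
```

The nesting in the PFA JSON (argmax wrapping softmax wrapping the linear model call) encodes exactly this inside-out composition of function calls.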
Also, to walk through a concrete use case where PFA was used: this is the Human Brain Project, out of Switzerland, led by Ludovic Claude, who Svetlana and I have worked with before. This is a network of hospitals that each need to do updates of a model, or various models, locally in their own hospital: effectively federated learning. They save the model exports, the models that they want to share, as PFA, and those PFA models are then shared with other hospitals, which are able to analyze and benchmark them and then use the models in new diagnostics to help new patients. This is all done using PFA as an interchange standard, while also applying federated learning and privacy techniques for sensitive medical data.

So, the current status of PFA: as I mentioned, there are reference implementations, including Hadrian. The parts PFA does well are effectively the parts that are very similar to programming languages. Because it can do most of what many programming languages do, it gives a high degree of flexibility: the type system, control flow, and user-defined functions allow you to express almost any analytic application, feature transform, or model that you can think of, including stateful operations, so you can actually create and update state and access external data sources and databases as part of the PFA standard. It has good support for traditional machine learning models and operations. However, there are some missing features, the most critical of which is the lack of deep learning support. There's basic support for standard vectors and linear algebra on one-dimensional vectors and two-dimensional matrices, but there are no deep-learning-related operators such as convolutions, LSTMs, and other recurrent neural networks, and no tensor support for higher dimensions. Finally, there are open questions around how performant and scalable both the scoring engines and the PFA spec itself are for very large models. And while we've seen some industry usage and adoption, really, at the moment, the question is: is there enough critical mass for PFA to continue and grow?

So, we mentioned deep learning, and that's obviously a very important topic at the moment, one that has seen a huge resurgence in recent years. The final model standard we'll look at is highly related to deep learning: it's called the Open Neural Network Exchange, or ONNX. As we saw, the proliferation of deep learning frameworks in recent years has led to each of them doing very different things in terms of representing, serializing, and exporting their models. To help solve this problem, ONNX was created, originally by Facebook and Microsoft, in around September 2017. The idea is to provide a standard format for exporting and representing deep learning computation graphs: effectively an intermediate representation for these graphs. It's based on protocol buffers, predominantly for efficiency, and it covers mostly deep learning, which is its focus, but, as we'll see, a little bit of traditional machine learning too. It has grown very significantly and is now used and worked on by many, many companies. At the recent ONNX community meeting, usage and engagement statistics were presented showing significant growth across the board over the last year: a lot of GitHub stars and forks, indicating usage of the project; published papers using it in research; new models added to the model zoo; and much more, with around 20% more contributors and around 11% more pull requests. So it's a very active community, with a lot of different companies large and small, ranging from Facebook, Microsoft, AWS, IBM, Intel, and Nvidia all the way down to startups in the deep learning space: a very active and widespread community.

Deep learning frameworks typically represent computation as a computation graph, and we can see an example of a graph on the left here. The nodes in the graph are inputs and operations: the green box represents an operation and the blue circles represent inputs and outputs. So this is a matrix multiplication: it takes inputs X and Y and performs a matrix multiplication operation to produce a matrix output Z. A simplified protocol buffer representation of that is shown on the right. The top-level object in an ONNX protocol buffer is this graph object, and the graph is made up of a set of nodes. A node, as we can see, specifies its inputs, its outputs, its name, and what the operation is, and then the actual inputs and outputs of the graph are given as types, where the element type is typically a primitive type such as a number or a string. The main first-class citizen within ONNX is an arbitrary-dimension tensor: it's built from the ground up to support one-, two-, three-, four-, or whatever-dimensional tensors are required for deep learning, because its focus is on deep learning. And there is strong support within the deep learning frameworks: it's actually baked into PyTorch and Caffe2 as of version 1.0, there's strong support for TensorFlow (in fact, IBM Research has played a role in creating the converter there), Keras, Apple Core ML, MXNet, the Cognitive Toolkit from Microsoft, and many, many more.

Now, even deep learning models actually need to do some pre-processing, and of course traditional ML pipelines typically involve a lot of feature engineering. To address this, a part of the spec was created called ONNX-ML, and this provides some support for traditional machine learning. It adds two additional types: sequences and, probably most importantly, maps. Maps allow mappings, for example from strings to integers or integers to strings, which are useful for things like categorical feature encoding, string and text encoding, one-hot encoding, and so on. And it adds specific operators for traditional machine learning, including vectorizers for dealing with
numerical and string data, one-hot encoding and label encoding for categorical features, normalization and scaling of numerical feature vectors, as well as some of the standard models: linear models, support vector machines, trees, and ensembles of trees.

The converter library ecosystem for ONNX-ML has also grown and is very active. The most popular and active of these converters is for scikit-learn, where over 60 operators or components within scikit-learn are supported; then LightGBM and XGBoost, the very popular gradient boosting libraries; Apache Spark ML, with over 25 components supported; and Keras, where all layers are supported, as well as any TF custom layer that is supported by the TensorFlow converter for ONNX, which is also itself supported; plus TVM, Apple Core ML, and some others. So there's pretty good support in the converter ecosystem, although of course there's always a lot of work to be done.

If we look at the ONNX ecosystem, we have the converter ecosystem for taking all of these frameworks and converting them into ONNX; we have the ONNX spec and a bunch of models in the model zoo represented as ONNX; we have the single stack, which at the moment is the ONNX Runtime, also an open source project from Microsoft; tools for network visualization; and many other runtimes that support ONNX, including TensorFlow and PyTorch among the deep learning libraries, as well as various accelerated libraries from providers such as Intel and Nvidia for doing computation on edge devices or specific hardware.

Finally, for ONNX there's still a lot of work to be done; it's a very active community. ONNX-ML is still fairly new, and there are plenty of new operators and functions that need to be created. The converter ecosystem has good coverage, but there's a lot of work that could be done, for example in Spark, which only supports Python currently. There's also performance testing of ONNX versus PFA and PMML, as well as against the native libraries, comparing all of the standards we've discussed today. Svetlana, in fact, has been leading the training working group within ONNX, adding the ability to train models to ONNX. So there's a lot going on and a lot of interesting work to be done in the future. I'll hand it back over to Svetlana to wrap up with a comparison between the three standards we've discussed.

Thank you, Nick. So let's compare and contrast various aspects of those three standards. First of all, about the format: PMML uses XML, and PFA uses JSON and YAML. Those are human-readable formats, which is very good for model transparency and helps to improve best practices in model creation. PMML is very good at containing all the model metadata; PFA has some of it; and ONNX doesn't really have much metadata in it. Model quality information is also very useful for model interpretation and visualization, and that is represented very well in PMML, but not as well in PFA and ONNX. In terms of large model support, ONNX is of course the best, because it's a binary format, so a large model size is not a problem for it. In terms of model visualization, as I said, PMML has this model quality information, which is helpful, and ONNX has various open source visualizers for the model, which is also very helpful. In terms of feature pre-processing, PMML, PFA, and ONNX all have pretty good support, but in terms of string processing, PMML and PFA are pretty good while ONNX is not as good as it could be, and there is room for improvement for ONNX; with categorical feature encoding it's more or less okay. For image processing, ONNX is of course the only one that can currently deal with it, although for the next PMML release we are discussing adding some support for image processing with deep learning, but it will be a while before that is in the standard. In terms of custom transformations, we have very flexible built-in functions in both PMML and PFA, but ONNX does not have those yet. In terms of the models themselves, many traditional machine learning models are well represented in all three standards, while some, such as time series, may present problems for PFA and especially for ONNX; and of course for deep learning, ONNX is the good one so far.

In terms of community, that's where ONNX is shining. We have the Data Mining Group, which is a group of companies. It's a kind of open governance, but a company needs to be a member of the Data Mining Group and pay some fees to actually work on PMML and PFA, while ONNX is fully open, under the LF AI Foundation, and has a huge community, as we just saw in the previous slides. And again, there is a very large, active developer community for ONNX and smaller communities for the other standards. As for converter ecosystems: for PMML there are a lot of converters for all kinds of systems, and for ONNX there are even more; for PFA, I'm not so sure.

How can you contribute to these open standards? For PMML, we have an issue tracking system at mantis.dmg.org, and I invite you to check it out: you can create new issues if you find problems with PMML, and you can help to work on existing issues. For PFA, right now I'm not even sure how to contribute, because we haven't had PFA meetings at the DMG for a while. But for ONNX, if you go to its main web page at onnx.ai, you can find all the information there, and there is a big ONNX repository on GitHub with many open issues, so you can go and look at them. And I think, if we want to make ONNX-ML more widely used, perhaps we want to create an open source converter from PMML to ONNX, and maybe from PFA to ONNX as well. That would be a new open source project, so if you are interested in doing it, let's get in touch and discuss how we can do it. With that, I will conclude by showing you some links that you may find useful. Please get in touch with us if you have any questions. Thank you, and have a nice day.