All right, well, thanks for joining us for Escape the Data Dungeon: Unlock Scalable R Analytics and Machine Learning. And thanks also to the R Consortium for hosting this R database webinar. Let's first introduce ourselves. I'm Mark Hornick with Oracle Machine Learning Product Management. I've been involved with the R Consortium since its founding and with R more generally for around 15 years. I've focused on in-database machine learning since Oracle's acquisition of the company Thinking Machines in 1999, and more recently also on Oracle Autonomous Database AI features. Sherry? I'm Sherry LaMonica, a consulting member of technical staff on the Oracle Machine Learning product team. I work with internal and external customers on leveraging Oracle Machine Learning technologies. I've been working with R for 20 years and with Oracle Machine Learning for R since its beginning. All right. So Oracle has been a supporter of the R community for over a decade now, and of the R Consortium since its founding. To help R users work more seamlessly with their database data, we introduced Oracle Machine Learning for R in 2011, then called Oracle R Enterprise. And with OML for R 2.0, Oracle continues to enhance the ability of R users to take advantage of powerful database capabilities, both with Oracle Database and Autonomous Database. We also maintain the ROracle database connectivity package and provide a redistribution of R for use with Oracle Database. Now today, we're going to start with a characterization of the data dungeon, a rather amusing analogy, but one that sometimes feels quite appropriate. We'll touch on the benefits of databases as a focal point for data access, organization, and analysis, as well as Autonomous Database, which eliminates so much of the overhead and complexity of using traditional database management systems. We'll then dive into a few use cases and demonstrations involving demand forecasting, customer churn, and product bundling before wrapping up. So, on to the data dungeon. Now, you have lots of amazing data, right? But it may be all over the place: some of it's in spreadsheets, CSV files, departmental databases, cloud storage repositories, and even on individual laptops. It's a challenge when there are so many data silos. You may feel as though your data is trapped in some sort of dungeon, requiring a lot of effort to figure out what you have and where you have it. If the data is large, moving it from one place to another can become expensive, both in terms of time and money, making your data feel somehow locked up. Duplicating database data in local environments can also pose increased security risks. And there are production deployment challenges: reliable data access, the need to integrate R engines with production applications, handling of more complex error situations, and overall production-level scalability. Now, this is where databases and their data catalogs come in. They enable you to organize your data where it's easy to find, use, and make sense of, whether it's physically stored in the database or accessed through external tables that reference data stored elsewhere. Combined with Oracle Machine Learning for R, you can more easily work with your database data with minimal data movement and easier solution deployment, as we're going to see. And Autonomous Database offers those same benefits and more. So let's take a look at what that's all about.
Now, while you're likely familiar with Oracle Database, you may be wondering: what's an Autonomous Database? Well, we're using the cloud to eliminate the complexity of data management while also providing a suite of integrated analytics tools, including support for R, all in the same platform. Traditionally, each database deployment was more or less unique, where you needed to build, secure, repair, patch, and tune each database. This was labor-intensive and generally not scalable. Autonomous Database re-imagines Oracle Database for the cloud, with automation of infrastructure as well as database and data center operations. It takes care of database administration so you don't have to spend time and resources tuning your database, applying patches, or updating software, among other database administrator tasks. So Autonomous Database helps you escape the data dungeon by allowing you to reach a wide range of data sources, data in multiple formats, generative AI, and other supporting tools. Now, of course, SQL is the predominant language used for accessing databases and manipulating data, but not everyone is a SQL expert, and using a native language interface like R makes working with database data that much more convenient. When you need data, you may end up importing data snapshots from Excel or CSV files. Some R users may have direct programmatic access to pull data from the database to the R client and push results back to the database. But this round trip can result in scalability issues, such as access latency and local memory limitations. Also, by definition, data snapshots become stale and often require refreshing. Data may have errors that require going back to the source for correction and re-retrieving that data, and these round trips, of course, take additional time. In addition, data privacy laws can require keeping data in secure systems or locations. So to help address these issues, our focus is on enabling in-database processing from R. We're going to highlight the use of OML for R from a few interfaces and how they can connect to Oracle Database and Autonomous Database. For R, we have the popular RStudio IDE and OML Notebooks, which is built into Autonomous Database and supports R paragraphs. We can also use SQL Developer to invoke R from SQL. One of the deployment scenarios that we'll cover involves using REST endpoints for deployed models and invoking user-defined R functions in database-spawned R engines, and we'll demonstrate that using Postman. Now, I suspect you're familiar with the typical machine learning workflow. We're going to highlight aspects of each step, but on the deployment front, we'll highlight working with in-database models as well as native R models, illustrating these from R, SQL, and REST. The three scenarios we'll cover are demand forecasting using in-database time series exponential smoothing, customer churn using in-database classification algorithms and a native R algorithm, rpart, and product bundling and recommendation using the in-database Apriori association rules algorithm. Now, our first scenario involves aggregated call center interaction counts, with the goal of forecasting the volume of call center interactions, or customer incidents. These incidents are categorized by the type of request, like billing, coverage, or policy question, and the channel where they were received, like chatbot, email, or phone. Now, rather than build a single forecast model for all of this data, we want to do this at a finer granularity.
So we're going to use partitioned models to automate the building of a model on each partition of the data, specified by one or more columns, like category and channel, while these are used and managed as a single top-level model. Now, you can use partitioned models for classification, regression, clustering, and other machine learning techniques, as well as time series. In the demo notebook, we'll also highlight the use of Conda environments, which allow us to customize the set of packages we want to work with by adding ones such as ggplot2. Now, the second scenario involves customer churn, where we'll build in-database models to predict likely churners. We'll use prediction details to understand why an individual prediction was made and deploy an in-database model to OML Services for access from REST endpoints. The second part of the churn example includes building a native rpart model, which we'll invoke using what we call embedded R execution from R, SQL, and REST. And the last example involves product bundling and recommendation. We have sales data on customer purchases. This will allow us to identify frequent itemsets that can be used for candidate product bundles, product placement for online shopping or store layout, as well as inventory management for co-occurring purchases. Then we'll use the association rules to rank products to recommend based on their support, confidence, and lift. So let's move on to the demonstrations. Okay, so we're looking at the Oracle Machine Learning user interface on Autonomous Database, and I'm going to go into our notebooks listing. I'll log in again, since it logged me out automatically, and then go into the notebooks listing for demand forecasting. So this is the first notebook we're going to work with. Now, I'm just going to run all of the paragraphs from the top, and that's going to start up the environment and continue to support all of the different paragraphs that we have here. To start off with, though, for the initialization, we're going to look at the Conda environment that we created, which has a wide range of R packages, in particular the packages we've added specifically because we wanted to use them, in this case ggplot2. We see that here. We're then going to download this environment and activate it so we can use it in the notebook session, and that allows those additional packages to be made available. We're then going to initialize the ORE and OREdplyr packages by loading those. And then we're going to start off with accessing our data table. Now, the table we're working with is called CALL_DATA2. Using the ore.sync function, we get a database table proxy object that corresponds to an R data frame, but the data remains in the database. We see that its class is ore.frame, which is a subclass of the R data.frame class, and we can list the other proxy objects we have available to us in the environment. Now, to give you a sense of how a proxy object differs from a data frame, here we're looking at its structure. One of the key things we want to highlight is the data query: this corresponds to the query one would use to access that table in the database. From the description, we see the columns and their types, as well as the table, CALL_DATA2, that this corresponds to. Now, next we can view some of the records from CALL_DATA2.
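As a rough sketch, the setup just described looks something like the following in OML4R. The table name CALL_DATA2 comes from the demo; the connection credentials and service name are placeholders, and the exact argument values would depend on your environment:

```r
library(ORE)        # Oracle Machine Learning for R
library(OREdplyr)   # dplyr-style verbs over database proxy objects

# Connect to the database (user, password, and service name are placeholders)
ore.connect(user = "OML_USER", password = "...",
            conn_string = "myadb_high", all = TRUE)

# Create a proxy object for the table; no data is pulled to the client
ore.sync(table = "CALL_DATA2")
CALL_DATA2 <- ore.get("CALL_DATA2")

class(CALL_DATA2)   # "ore.frame" -- a data.frame subclass backed by the table
str(CALL_DATA2)     # shows the underlying data query, columns, and types
head(CALL_DATA2)    # retrieves only a few rows for display
```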
Again, using the proxy object and the overloaded head function, we're retrieving some of those values and displaying them here in the table view of the notebook itself. Mark, what does z.show do? Yeah, so z.show is a function that's provided to allow us to take the results that we get from the R environment and map those into the notebook environment for display in this interface. Now, we also have other overloaded functions, like distinct in this case, for asking: what are the distinct categories and channels that we have? Next we're going to focus on ggplot2. So we'll load that library, and we're going to pull the CALL_DATA2 data from the database into R memory and do a transformation on the date-received column. Why do we need to pull the data to the client? Yeah, that's a great question. When we're using third-party R packages, what we need to do is make the data available in R memory to use them. We don't change the underlying implementation of packages to work with database data. So if you're using a native package, that data has to be brought into R memory, and that's where ore.pull allows us to very conveniently take that data from the database and bring it into an R data frame. Of course, you do have to be careful about the volume of data you're pulling, because memory limitations certainly do apply. Okay, now using ggplot2, we're just going to do a few simple visualizations here, just to illustrate that one can do that, and we'll do a plot of the data, in this case highlighting the number of calls received over time by category and channel. Moving on to filtering data, we're going to use the dplyr functionality again. We have incidents by category as what we're going to get back, and using the CALL_DATA2 proxy object, we're going to filter, group by, and summarize that data. Essentially, we want to aggregate the counts for each category and associate those with the date, so when we build our time series model, the exponential smoothing model will have the data it needs. We're also going to create an incidents-all object that does something similar, but it also includes the category and channel. Recall that we talked about partitioned models: when we build a partitioned model on the first, we'll have three partitions, one for each category, and in the case of incidents-all, we'll have nine, because there are three categories and three channels. We can also get a bar plot of incidents by day of week. So again, we're using dplyr on incidents-all, and using a mutate, we're going to transform the date received by day and get the resulting plot here. Where is the computation occurring for the OREdplyr functionality? That's part of the transparency layer that we have with OML for R. What that means is that these overloaded functions are actually translating those requests into corresponding SQL that is then executed in the database. So because incidents-all here is a data frame proxy object, that is going to take the corresponding SQL for the mutate, the group by, and the like, run it in the database, and provide back to us another proxy object, because we don't want to pull data to the client unless we absolutely have to for certain analyses. Okay, so that brings us to modeling. In this case, we want to build an in-database exponential smoothing model using Holt-Winters, and we're going to have a partitioned model here, where we build one sub-model per category.
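Before moving to the model build, here's a minimal sketch of the transparency-layer aggregation just described, assuming the CALL_DATA2 proxy from earlier; the column names (CATEGORY, DATE_RECEIVED, CNT) are reconstructions from the narration rather than the demo's exact schema:

```r
library(OREdplyr)

# Each verb below is translated to SQL and executed in the database;
# the result at every step is another ore.frame proxy, not local data.
grouped   <- group_by(CALL_DATA2, CATEGORY, DATE_RECEIVED)
incidents <- summarise(grouped, CNT = sum(CNT))

class(incidents)    # still an ore.frame; filter(), mutate(), arrange()
                    # are likewise overloaded and run in the database

# Materialize locally only when a native package such as ggplot2 needs it
incidents_df <- ore.pull(incidents)
```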
To build it, first we're going to delete any model that we may have created previously, and we have a number of settings that we can specify. Most notably: what's the prediction step, that is, how many forecast periods do we want to have? We're going to have four for that. The model is going to be Holt-Winters. We have a seasonality of 26. And this is where we're identifying the column that we want to partition the data on to form one sub-model per category. We're also going to name the model explicitly, as ESM_INCIDENT_FORECAST_1. Now, to invoke this model build, we're using ore.odmESM, specifying the formula; the data, which is a proxy object again, because this model build is going to occur in the database, so we just need a reference to the data that's there; and of course the corresponding settings. Mark, what is the ODM in the model build function name, and why are we getting a warning about the model build? Yeah, so ODM corresponds to the original name of in-database machine learning, which was called Oracle Data Mining, and we incorporated that into the names to highlight the fact that this is an in-database algorithm being used. In terms of the warning message, we see that it's saying ORE does not manage this mining model's lifecycle. That's because when we designed OML for R, we wanted it to mimic the behavior of a native R experience. When you build models or create data frames in R, those objects disappear when you terminate the R process; there's nothing there unless you've explicitly saved them. And we modeled that same behavior: if you don't specify a name for the model, it uses a default name, and when you exit the R process, it simply deletes those in-database models for you automatically. If you provide a name, then you assume the responsibility for explicitly deleting that model later. All right, so having built that model, we can look at its partitions, and we see that we have the three sub-models for billing issue, coverage question, and policy cancellation. We can even build a second partitioned model on both category and channel, and the only thing we need to change is the partition column specification, in this case category and channel. Again, we run that, and we'll see that we have new partitions, nine in all, with the corresponding category and channel. For those of you with teams that include not just R users but also SQL users, perhaps part of the IT organization or others, you also have access to this information through SQL data dictionary views. In this case, we have ALL_MINING_MODEL_PARTITIONS, and you can access this information not only from R, but also directly from SQL. Now, there are other aspects of the models that you might want to explore. What were the settings that were used to produce this model? There's the USER_MINING_MODEL_SETTINGS view itself, which we can create a proxy object for and explore the contents of. There are also other, what we call model detail views, that allow us to get more insight into what the in-database algorithm did when it produced this model. In this case, we have global diagnostics and model quality metrics that are available per partition, so each of those partitioned models will have its own specific metrics and corresponding values that you have access to. Now, we can also list the model detail views that are available for this in-database model, because there are several.
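Before looking at those views, here's a hedged sketch recapping the partitioned ESM build. The setting names follow the in-database exponential smoothing algorithm's conventions, but both they and the formula are my reconstruction from the narration, not the demo's exact code:

```r
# Drop any previous model with this name using the database's PL/SQL API
try(ore.exec("BEGIN DBMS_DATA_MINING.DROP_MODEL('ESM_INCIDENT_FORECAST_1'); END;"),
    silent = TRUE)

# Settings narrated in the demo: 4 forecast periods, Holt-Winters,
# seasonality 26, partitioned by CATEGORY, with an explicit model name
settings <- list(EXSM_PREDICTION_STEP   = "4",
                 EXSM_MODEL             = "EXSM_HW",
                 EXSM_SEASONALITY       = "26",
                 ODMS_PARTITION_COLUMNS = "CATEGORY",
                 model_name             = "ESM_INCIDENT_FORECAST_1")

# The build runs in the database; 'incidents' is the ore.frame proxy,
# so only a reference to the data is passed
mod.esm <- ore.odmESM(CNT ~ DATE_RECEIVED, data = incidents,
                      odm.settings = settings)

summary(mod.esm)   # inspect the model, including its per-category partitions
```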
Listing the available model detail views can be done through a select query against USER_MINING_MODEL_VIEWS for the specific model we have, in this case the ESM_INCIDENT_FORECAST_1 model, and we see the views we could access at this point. The next thing is, since we built these forecast models, the key is that we want to see the specific forecasts, and for this we're going to use the specific model detail view. We're going to work on the forecast proxy object and perform a mutate from OREdplyr, getting the date ID, the count, the forecast count, and the lower and upper bounds, and arrange that, sorting by partition and date ID. And so here we're seeing the result from that, but let's look at what the actual forecasts were. So here the count is NA, because those are the four periods that we're predicting, and we have the forecast count and the upper and lower bounds. And if we wanted a quick visualization of the count and forecast count, we see that here. So at this point, I'm going to turn it over to Sherry, and she's going to highlight how we can do very similar things through OML for R using a third-party IDE, in this case RStudio, leveraging the OML for R universal client. Sherry? Okay, so here is RStudio Server, and I'm going to connect to both my Oracle Database and my Oracle Autonomous Database from here. It's called the universal client because it can connect to either one, and you can work from this client. First, I'll load my packages. Then we'll create the on-premises Oracle Database connection using my RQUSER schema. Mark showed all of these commands running in the notebook, and I'm just showing you that they can run from the client as well. So first I'll create a proxy object from the CALL_DATA2 table in my schema, and then, using the overloaded OREdplyr functionality, I'm filtering my incidents by category. I can see I get my category count and the date received for each. And I can build an in-database model from here: I'm first dropping the model if it already exists, passing all of the model settings to a list, and then using those settings to build the model using ore.odmESM; the object that comes back is mod. And I can look at the partitions. Each partition for this object contains its own model, so for the billing issue partition we can see that I've got the model call and the model settings. Next, we can disconnect from our Oracle Database and create an Autonomous Database connection. The first thing that I'm going to do is set the TNS_ADMIN environment variable to the location of my Autonomous Database wallet on this host. And then I'm going to connect to the database using the database service level high. Sherry, could you explain what a wallet is? Yeah, the wallet is actually a file on this host, an encrypted file that holds my OML user credentials, and that's the reason I don't need to pass my credentials in clear text here. I can just create the connection string that corresponds to the Autonomous Database service level that I want to use, in this case high. Okay, so then I'm going to create a proxy object for the ESM model view result, call that forecasts, and then I'm going to use OREdplyr to view the forecast for the billing issue category. And those are my results. This is just to show you that you don't have to use OML Notebooks; you can use a client.
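For reference, a minimal sketch of the two connection styles just demonstrated. The host, schema, wallet path, and service names are placeholders; only the overall pattern is taken from the demo:

```r
library(ORE)

# On-premises Oracle Database: host/SID-style connection (all placeholders)
ore.connect(user = "RQUSER", password = "...",
            host = "dbhost.example.com", sid = "ORCL", port = 1521,
            all = TRUE)
# ... work with proxy objects, build models ...
ore.disconnect()

# Autonomous Database: point TNS_ADMIN at the unzipped wallet directory,
# then connect using a service-level alias from tnsnames.ora (e.g., *_high)
Sys.setenv(TNS_ADMIN = "/home/oml_user/wallet")   # placeholder path
ore.connect(user = "OML_USER", password = "...",
            conn_string = "myadb_high", all = TRUE)
```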
In this case, I'm using RStudio, and you can connect to Oracle Autonomous Database or Oracle Database on premises. Sherry, we have another question: since the ORE packages are not available on CRAN, how do you get access to them? Yeah, you can actually get them from the Oracle Machine Learning for R downloads. That download contains the script you need to install OML for R on Oracle Database, plus the third-party supporting packages. For earlier versions of the database, the installation script is included with the database as well, so you'd want to look at the installation instructions and make sure you're following the correct ones for your Oracle Database version. Great, thanks, Sherry. All right, so let's continue with the churn prediction example. In this case, we're going to, again, import our libraries and do some data preparation. We're going to get a proxy object to the table that we're calling CUSTOMER_CHURN_45K; it has 45,000 rows in it. We're going to enable row indexing by assigning row names and then convert the churn indicator column to a factor. Now, using standard R, we're going to generate the sampling indexes that we want to use and then create the train set and the test set using standard R syntax, passing in the index that we created earlier. You can see that both of these objects are also proxy objects, ore.frame instances, and the training data has about 27,000 rows and the test data 18,000 rows. Now, to highlight how the in-database table representation maps to the R representation, I did a describe here using SQL, because in the notebook environment you can have paragraphs of different types in the same notebook: SQL, PL/SQL, Python, and R, of course, and Markdown as well. So here we're describing what this table looks like from the SQL perspective. We can also describe this from the proxy object, and we see that the customer ID is numeric and the churn indicator is a factor, because we modified that above, along with the remaining columns and their respective types. So, moving on to modeling, we're going to build a random forest model. How do we do that? Well, again, as we saw in the previous notebook, we first drop the model if we've created it before, RF_CHURN_MODEL_1. Then we're going to use ore.odmRF, passing in the R formula with the churn indicator as our target, taking out the customer ID; our proxy object for the training data; and the model name, RF_CHURN_MODEL_1. From there, we get the prediction. We're going to invoke the overloaded predict function, passing in our model proxy object, because, importantly, when we finish building the model, it exists in our database schema as a first-class database object, and what we're actually returned is a proxy object for that in-database model. That's what we're supplying here in predict, along with the proxy object for our database test data. And the supplemental column is going to be the churn indicator. Now, this is an important concept as well, because when you have an R object, a data frame, the first row is always the first row, and the second row is always the second row. However, when you're dealing with data in a database, the relational model is essentially unordered, right? There is no ordering unless you explicitly ask for that data to be ordered according to some key.
And so, rather than incur the overhead for that, we rely on this notion of supplemental columns that we can associate with the predictions we get back in that same table, and we'll see a little bit of what that means shortly. If we look at the model proxy object itself, we have the call, of course, and the settings that were used to produce the model. In some cases, these settings were automatically chosen by the in-database algorithm, because we certainly didn't specify all of them. Now, we can also build support vector machine models, among others; we're just highlighting two of the available algorithms. So we'll again delete the model that we perhaps built in the past. Using the ore.odmSVM function, we'll do basically the same thing we did before. One thing you'll notice is that the type is classification, and that's because the SVM algorithm supports both classification and regression. We'll do the predict as before, but one additional item is to do the predict and ask for the top N attributes. This allows us to return the most influential predictors that help explain each prediction, and we're going to get the top two of those. In the case of the basic prediction result, we have the churn indicator, the actual target, and then we have the prediction being made for it, along with the probability of zero and the probability of one. So what is the positive case for the churn indicator? Oh yeah, thanks, Sherry. So the positive case, of course, is going to be one in this case, with the negative case being zero. But if we go further, to the explanation of why given predictions were made, we can see, in this case, that for the first prediction, which is a negative outcome, household size played a significant role. Or, in the third one, whether or not they purchased a certain application was influential, and their education had a role as well. We'll look at this in another context shortly. If we want to evaluate the results of the model, we have the predictions that were made by the algorithm, and we can use the overloaded table function to compute an in-database crosstab using proxy objects. So again, this computation occurs in the database without having to move the data to the client, and here we see the resulting confusion matrix. There are other metrics, of course, that we may want to compute as well, such as the ROC curve, lift chart, and probability densities, and these were all produced using R. Deployment is one of the key areas for leveraging R in enterprise applications. If your data science team is working from an R perspective, these in-database models can then be used from SQL. So if you're handing these off to others in your IT environment or your application development team, they can leverage the prediction operators available in the SQL language for Oracle Database and use the model objects that were created above, because these are first-class database objects, just like tables. And so we're going to get the prediction and the prediction details, which, again, tell us what factors most contributed to each prediction. Here we see that the first customer is predicted as a churner, yes, and that years of residence, with a value of five, and its corresponding weight are what most contributed to that prediction. Same thing with this product purchase: whether or not they purchased it, and the corresponding weight that carries.
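To make that whole sequence concrete, here's a hedged sketch of the in-database random forest build and scoring just described. The table and column names (CUSTOMER_CHURN_45K, IS_CHURNER, CUST_ID), the 60/40 split, and the odm.settings naming are reconstructions from the narration:

```r
# Proxy object for the 45,000-row churn table
ore.sync(table = "CUSTOMER_CHURN_45K")
churn <- ore.get("CUSTOMER_CHURN_45K")

# Enable row indexing and make the target a factor for classification
row.names(churn) <- churn$CUST_ID
churn$IS_CHURNER <- as.factor(churn$IS_CHURNER)

# Standard R sampling to create train/test sets -- both stay in the database
set.seed(1)
idx   <- sample(seq_len(nrow(churn)), 0.6 * nrow(churn))
train <- churn[idx, ]
test  <- churn[-idx, ]

# In-database random forest; the explicit name makes it a persistent,
# first-class database object in the schema
mod.rf <- ore.odmRF(IS_CHURNER ~ . - CUST_ID, data = train,
                    odm.settings = list(model_name = "RF_CHURN_MODEL_1"))

# Score in the database; supplemental.cols carries the actual target along
# with the predictions, since relational results have no inherent row order
pred <- predict(mod.rf, test, supplemental.cols = "IS_CHURNER")

# In-database confusion matrix via the overloaded table() function
table(pred$IS_CHURNER, pred$PREDICTION)
```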
Now we can use this model not only from SQL and R, but also from REST, and Sherry is going to show us how that works in Postman. Okay, so here is Postman; this is a REST client. The first thing that I need to do is get an authentication token, and I do that by sending a POST request to this token endpoint. I have saved my Oracle Machine Learning user credentials in this environment here in Postman, and that includes the OML URL, which I saved as a variable. When I send that request, I exchange my username and password for this access token, which is good for an hour; it can be refreshed before it expires, or you can get a new one after it expires. Now, when I deployed this model through the models interface in OML Notebooks, I saved it with the model URI RF_CHURN_MODEL_1. If I want to take a look at the model deployment details, I send a GET request to this model URI, and I can get information about the deployment: I can see that I saved it as version 1.0, that it's an in-database model created by user OMLUSER, that there's a model ID associated with it, and all of the model metadata. Now, to score a single record, I pass a single record, here's my record, to the scoring endpoint with the model URI, and, just like Mark did, I'm also requesting the top two most influential predictors. So I can see my scoring results come back, and here, household size had the most impact on this prediction, followed by Y Box Games. We can also score in mini-batches of records. In this case, I have two records that I'm passing to the scoring endpoint, and when I get my results back, I can see the results for both of those records. Sherry, how many records can we include in a mini-batch? You can include up to 256 records in a mini-batch, and beyond that you would need to use our asynchronous APIs for batch scoring. Okay. And how did we get the model deployed to be able to use it from REST endpoints? Yeah, great question. You can actually do that here using REST calls, but what I did was use our one-click way of doing this in OML Notebooks. So let me log in. After I'm logged in, I'm going to go to the models interface. This models interface shows us all of the models that we have access to in our database. So I'm going to look at RF_CHURN_MODEL_1; here it is, and here is how I deployed it. The deployments are in here; you can see that it's been deployed. The way that I did that was I selected the model, hit deploy, gave it a URI, which is the same as the model name, a version, and a namespace if we'd like, and then chose whether I want to share this model with other users in my database. And now I get a message that the model's been deployed, and I can go here and look at the model metadata. I can also see the OpenAPI specification for this model. Again, you can do all of this through REST endpoints if you want, but this is a simple one-click way to deploy your in-database model to the OML Services model repository. Okay, back to you, Mark. All right, thanks, Sherry. So we're going to continue with churn prediction, but we're going to look at using a native R model from rpart. For data preparation here, we're going to use a data set that has 4,500 rows in it, and we're going to pull that data to the client. We're also going to work with the 45K data set, and you'll see how we use those going forward. Now, building the rpart model should be very familiar.
If you're familiar with R, we load the rpart package, we build the model, and we can look at the results as well. But for deployment, and to be able to use the embedded R execution that I mentioned earlier, we're going to go through a few steps. One is that we're going to save this model in a database R datastore. You can store any of your R objects in the database, as opposed to having to store them in flat files and manage them separately. So in this case, we have the churn model datastore, and we see that reflected here. And if we look at the contents of the churn model datastore, we see the rpart model's variable name, that it's an rpart object, and its corresponding size. Okay, so that's part one. Next, to use embedded R execution, we're going to create a user-defined R function that we'll use for scoring. We're going to invoke that user-defined function locally to verify that it behaves as expected, and then we're going to save that UDF in the database R script repository, so we can actually store our function in the database. Then we're going to invoke it using two different functions: one is ore.tableApply, which passes all of the data that we identify in our proxy object to that function, and the other is ore.rowApply, which passes the data in chunks, potentially with multiple R engines supporting that. So in this case, we have the score-data function that we're creating, which takes a data frame as its first argument, and a datastore name, which is the name of the datastore where our model is stored. We're going to load the datastore into memory, so that takes our rpart model and brings it into our function so we can use it. Our result is going to consist of a customer ID column and the churn indicator column, because we want to do a crosstab after we get our prediction results, say, to produce a confusion matrix. And then we get our predictions as well, using the rpart model on the data that was passed in, which is an actual data frame at this point, and of course type is class. If we look at the results that come back from this, okay, this is pretty much what we expect. From there, we're going to load our user-defined function into the R script repository. We're going to give it the name score data, the same name as we have for our function, but you could use anything you'd like there. And then we're going to list it from the script repository, and here you see the name, score data, and the corresponding script. Now, to invoke this, we'll first use ore.tableApply. You'll notice that this is the proxy object for the 4,500-row table, so the data exists in the database; the function name is the name of the function in the script repository; and the datastore name is where our rpart model resides, and we called that datastore churn model. Now, what happens when we invoke this is that we start up an R engine that processes the request: it loads the data from the database into R memory, processes it, and returns the result to us. And here we see that this is the prediction we're expecting from that. Now we can do the same thing with ore.rowApply. The main differences here are that we're going to use the 45K proxy object, our score-data function, and the churn model datastore, but we're also specifying the number of rows that we want in each chunk. What this is going to end up doing is causing our function to be invoked five times.
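Here's a hedged sketch of that embedded R execution flow. The function body follows the narration; the object, script, and datastore names (rpart.mod, score_data, churn_model), the proxy names, and the column names are reconstructions:

```r
library(rpart)

# Build the native rpart model on the client from the pulled 4,500-row data
# (churn_4500_df is the assumed name of the pulled data.frame), then persist
# it in a database R datastore
rpart.mod <- rpart(IS_CHURNER ~ . - CUST_ID, data = churn_4500_df,
                   method = "class")
ore.save(rpart.mod, name = "churn_model", overwrite = TRUE)

# User-defined function: receives a chunk of data as a plain data.frame
# plus the name of the datastore holding the model
score_data <- function(dat, ds.name) {
  library(rpart)
  ore.load(ds.name)                    # restores rpart.mod into this engine
  res <- dat[, c("CUST_ID", "IS_CHURNER")]
  res$PREDICTION <- predict(rpart.mod, newdata = dat, type = "class")
  res
}

# Store the UDF in the database R script repository
ore.scriptCreate("score_data", score_data, overwrite = TRUE)

# Invoke on the full 4,500-row proxy in one database-spawned R engine
res1 <- ore.tableApply(CUSTOMER_CHURN_4500, FUN.NAME = "score_data",
                       ds.name = "churn_model")

# Invoke on the 45K proxy in 10,000-row chunks, up to two R engines in parallel
res2 <- ore.rowApply(CUSTOMER_CHURN_45K, FUN.NAME = "score_data",
                     ds.name = "churn_model", rows = 10000, parallel = 2)
```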
We've got 45,000 rows, and we requested a degree of parallelism of two, meaning that two R engines can support this, and that's how we get some parallelism out of it. And again, we see the results coming back from that. Can we use the scripts saved in the script repository outside of embedded execution? Yes, absolutely. It's not only that you can store these user-defined functions in the script repository for invocation through embedded R execution; you may also want to store other functions that you can conveniently load, as opposed to having separate script files elsewhere that you would have to manage. Okay, so the next thing is to use embedded R execution from SQL. When you do that from Autonomous Database, you need to set an authentication token. Sherry showed how we did that from the REST interface a moment ago. Once we get that token, we'll be able to use it in subsequent invocations. The sequence here takes advantage of a helper function, get_token2, and we've already provided the URL, username, and password in a table, because we didn't want to display them in open text here. So we'll get back the token, and then we're going to set that token so we're ready to use it in other functions. The next step is to invoke rqRowEval2. This is the SQL equivalent of the ore.rowApply function that we showed a moment ago from R, so you'll notice a few similarities. There's the input data, which is now the table name, CUSTOMER_CHURN_45K. We have the parameter list: the datastore name, churn model; that we want this done in parallel, so we set the parallel flag to true; and the asynchronous flag set to true, because this is going to be an asynchronous invocation. Some of these operations can take a long time to process, so what you'd like to do is set up the job, run it, and then check for the result being completed. We're also going to use a service level of high to encourage parallel processing in multiple R engines. The output format will be JSON. The number of rows is, again, 10,000, which is what we specified for the ore.rowApply function earlier, and the name of the function that we're invoking is score data. So after invoking this, we have a job ID, and this job ID is now going to allow us to find out when the result is complete. What we get back when it is complete is a URL that we can then process to retrieve the result. And here we see that we're opening the cursor, getting the job result itself, with the customer ID, the churn indicator, and the prediction, and getting the first 10 rows of that. That's what comes back from our SQL invocation. But we can also do this from REST, and now Sherry is going to take us through that. Okay, thank you, Mark. So here I can score my data in asynchronous mode, and here are the parameters that I'm passing to the function. You can see I'm passing that I want to run this in async mode with a timeout value of 300 seconds; this is an optional parameter that says, if this runs for over 300 seconds, please time out. Then there's the number of rows that I want to process at a time for the row apply; the input data, CUSTOMER_CHURN_45K; the parameters that I'm passing to the script itself; the datastore name; whether I want this to run in parallel; and the service level, the Autonomous Database service level that I want to use.
So, Sherry, could you contrast the low and high service levels for us, and what are the options we have available there? Right, so Autonomous Database has service levels for parallelism and concurrency. With low, you get the lowest amount of parallelism and concurrency; medium is a moderate amount; and high is the highest level of parallelism and concurrency. So this time I'm using the service level high; the default is low. We can see that we get HTTP status 201 back, which means that my job has been created and a job ID has been generated. In the background here, I saved that job ID to a variable, so using that job ID, which is part of a URL, I can poll the job status and see if it's still running, or whether I'm given an HTTP status telling me to go fetch the result. HTTP 302 indicates that my job result has been found, so I can send a GET request to the job result URL and get the return value from my script here. It comes back in JSON format, and you can see that this first customer here, by customer ID, is not a churner. And down below, you can see all the other customers and what most contributed to those predictions. Sherry, how can we get the curl command that's associated with this REST invocation? Right, so if you wanted to run these with curl, you would go to the code button, and here's the curl command. You can see we're passing the token and our headers, and this is my OML URL, this is my job ID, and I'm polling the result location. So you can just copy this, paste it into your Mac terminal or a Linux terminal, and run that curl command. All right, so with association rules, let's go a bit further. Again, we're going to import our libraries as before. One thing that I'm going to highlight a little differently here is that instead of doing an ore.sync from a table or a view, we're going to specify a query. This gets us a proxy object that corresponds to the result of that query. So we're going to start with two tables, SALES2 and PRODUCTS2. From the sales table, we want the customer ID, product ID, quantity sold, and amount sold; from the products table, we've got the product name and the product category as well. And so here we've got about 186,000 rows from the sales table, 72 rows from the products table, and we see just a few rows coming back from each of those to highlight what content is there. But what we'd like to do is combine these two. So we want to join them, merging them using the overloaded merge function of OML for R, the transparency layer. And so we have the products and sales proxy objects, we take the unique rows from the merge, and we have a result of about 56,000 rows that we'll be working with. Looking at sales trans cust, which is again a proxy object resulting from the join we just did, we see we have the customer ID, the product name, and the product category. And for the distribution, if we wanted to view that as well, we can use the overloaded table function for a crosstab, look at the counts of each product category, and use standard R to display the result. Now, with association rules, one of the settings we can specify is the rule length, and so we're going to build two models, one with rule length two and one with rule length three.
What that means is that in the case of two, we have one element in the antecedent, the if part of the rule, and one in the consequent; with rule length three, we've got two in the antecedent and one in the consequent. You can increase the rule length as you'd like, depending on the type of analysis you want to do. In this case, we're providing to the in-database association rules algorithm the data frame proxy object, sales trans cust; we identify those special columns, the case ID, which is really the transaction ID, if you will, and the item ID column, which in this case is the product name. We can specify minimum support and confidence, and of course the rule length, and there's also a setting for the name of the model that we want to produce. In each of these cases, the objects we get back are model proxy objects, because these are in-database models, and we're able to manipulate them and use them from R. So let's create a table with candidate bundles from itemsets. One of the computed results from the association rules algorithm is the set of frequent itemsets. Accessing these, we're going to order them by support and number of items, in decreasing order. And to highlight something: not only can you pull data from the database into R memory, you can also take data frames from the client and create tables in the database from them, and you can create tables explicitly materialized from other data frame proxy objects. So if we did want an explicit table representation, we can do that as well, and that's what we're highlighting here with the candidate bundles, creating the table CANDIDATE_BUNDLES, and we're showing a few of the results from that. One of the things you'll notice is that this is a transactional representation, where multiple rows correspond to the same itemset ID; these first three rows relate to the single itemset we're seeing here for these items. Now, we might want a more convenient way of representing that; perhaps we'd like one row per itemset in the results, and one way of doing that is to concatenate the items together. Now, just to highlight: not all of the dplyr functionality that you'll encounter in R has been replicated in OML for R, so there are cases where you're simply not going to be able to do the equivalent. This is one such example. Here we have the candidate bundles, right, a proxy object. The group_by and the summarize, if you're familiar with dplyr, you might have used something like this, but the paste function in this case is not available for it. So to take advantage of this from R, you could pull the candidate bundles into R memory, and then you'll see the corresponding results here, where these are all concatenated as you would expect in R. Alternatively, you might want to do this from SQL, and if you have members on your team who are leveraging these results, they might be familiar with the LISTAGG function, which produces a similar result here, with the items forming the individual item list. If we want to display the top rules, sorted by confidence and support, here we just pull out the rules themselves, and we say, let's get the top rules from this. And again, because this is a transactional representation, multiple rows correspond to the individual rules, and we're seeing that.
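As a rough sketch of this association rules flow, assuming the SALES2 and PRODUCTS2 tables described above; the query text, thresholds, and column names are reconstructions from the narration:

```r
# Proxy object backed by a query joining sales transactions to product names
ore.sync(query = c(SALES_TRANS_CUST = "
  SELECT DISTINCT s.CUST_ID, p.PROD_NAME, p.PROD_CATEGORY
  FROM   SALES2 s, PRODUCTS2 p
  WHERE  s.PROD_ID = p.PROD_ID"))
stc <- ore.get("SALES_TRANS_CUST")

# In-database Apriori: the case ID identifies the transaction, the item ID
# the purchased product; rule length 2 means one antecedent, one consequent
mod.ar <- ore.odmAssocRules(~ ., data = stc,
                            case.id.column  = "CUST_ID",
                            item.id.column  = "PROD_NAME",
                            min.support     = 0.02,   # illustrative thresholds
                            min.confidence  = 0.1,
                            max.rule.length = 2)

# Frequent itemsets (candidate bundles) and rules with support/confidence/lift
its <- itemsets(mod.ar)
rls <- rules(mod.ar)
head(ore.pull(rls))   # pull a few rules locally for inspection

# A proxy object can also be materialized as a database table, e.g.:
# ore.create(<ore.frame>, table = "CANDIDATE_BUNDLES")
```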
So in this case, we're saying that the Baseball Is Life cap plus the bucket of 24 synthetic baseballs implies the purchase of linseed oil. So how can we use these types of rules to make predictions? Well, one way is to pull out those rules that have a certain item in the left-hand side. So here, let's say the person has an indoor cricket ball in their basket; what else should we recommend to them? We just pull out those specific rules, and we see that if the left-hand side has the indoor cricket ball, we might be interested in recommending any of these items here on the right. And that concludes the set of three demonstrations, so let's go back and wrap up our presentation. This brings us to the broader context in which OML for R exists. You'll find SQL and Python interfaces as well. The SQL API is really the foundation for the in-database algorithms. OML4Py has similar functionality to OML for R, but it also includes AutoML, automated machine learning capabilities. And on Autonomous Database, you've seen OML Notebooks. There's also a no-code AutoML user interface, where the resulting classification and regression models can be immediately used from SQL queries or deployed to OML Services for real-time scoring using REST endpoints, as Sherry demonstrated for us. The no-code data and model monitoring UIs support the broader MLOps requirement for tracking changes in the data that supports applications and models, as well as changes in machine learning model quality. And lastly, there's the original OML user interface, Oracle Data Miner, which is a SQL Developer extension. So, in summary, we've seen how you can use R for accessing and manipulating database data. With OML for R, we leverage the database as a high-performance computing environment for data exploration, preparation, and machine learning modeling. With in-database machine learning algorithms accessed from an R API, we gain scalability and performance, in part by eliminating data movement and leveraging algorithms designed for parallelism and memory optimizations. We can also easily deploy machine learning models and invoke R user-defined functions with system-provided data parallelism and task parallelism. By operating in the database, users benefit from database backup, recovery, and security, so we don't have to handle these separately in our applications. Further, the OML for R interface is included with your Autonomous Database instances and Oracle Database licenses. Note that on Oracle Database you do need to explicitly install OML for R, and, as we mentioned, from the client perspective this is supported on the Linux operating system. So if you have your data in, or accessible through, an Oracle Database, you can really take advantage of this R-based functionality. For more information, here are a few resources. You can also try this functionality in your database or on the Autonomous Database free tier, and explore workshops through Oracle LiveLabs, which will give you guided tours through various aspects of the OML components. So thanks for joining this session, and if there are any more questions, we're happy to take them now. Yes, feel free to put your questions in the Q&A. Mark, there was an interesting question at the beginning of the presentation about the differences between the OML Notebooks interface and Jupyter notebooks.
So with OML Notebooks, this is an environment that was actually developed by Oracle Labs and has a number of distinct features, including built-in visualizations. It allows you to import and load both Zeppelin and Jupyter notebooks, and to export not only the native representation but also Zeppelin and Jupyter notebooks, so you can interoperate with those environments. When you're opening a notebook, you can open it in Zeppelin mode, which allows you to restructure some of the paragraphs and how they're displayed, their widths and whatnot, or you can open it in Jupyter format as well. Those are just some of the characteristics we've incorporated into the notebook environment. We also provide ready-to-use interpreters for R, Python, SQL, and PL/SQL, and Conda for installing third-party packages. Exactly, yes. You don't have to configure anything to take advantage of those additional languages; that's all provided with the Autonomous Database notebook environment. And then, in terms of running from a Windows client, which was asked a few times: for what I demonstrated here today, I took an OCI Linux compute node, installed the OML4R client there, and then just forwarded a few ports so that I could access it directly from my laptop browser. You can reach out to me if you need the instructions to do that. Any other questions? Well, if not, thank you very much for joining us, and until next time.