I know I have turned the screen share on, as you saw, and thanks for the heads-up about the microphone, that's dealt with. I will not see the chat comments, and you know, one of the special features of such an online tutorial is that I have zero interaction with you, the audience, so please use the chat and the co-organizers and moderators will pick that up. So yeah, welcome to this. This is novel for me in the sense that we opted to do this for two hours, because online live streaming is hard on everyone. It will be hard for you to, you know, stay alert for two hours and follow, so we'll have a break in the middle. But the good thing also is that we'll be recording, so we can replay this. I've taken the material from what is usually a three-hour session, so I have 100 slides; we may not be getting to the end, but I think that's not the end of the world. If there's sufficient interest — it's just online — we can just resume and do a round two with the follow-up material. But otherwise I will try to give you a nice intro. And again, the link is at the bottom; that is live, that's at my website, so you can get to the PDF, which was updated yesterday already. So let's get started, but before we do that: a big thank you to everybody in St. Louis for the useR! 2020 conference that of course we could not hold because of the events of this year and the COVID-19 epidemic, as well as the satellite conference in Munich — and that's a first for us. You know, as I am a bit involved with the R Foundation, we have a little bit of oversight of these user conferences, so we're quite pleased with how everybody could recover from that. And with that also a really big thank you to the R-Ladies groups — thank you for having picked up the slack, and going forward, I think, for the tutorials and organizing these. So mine today is co-hosted by the R-Ladies groups of Santiago, Valparaíso and Concepción, as well as the R user group in Santiago, so thanks for doing that.
And with that, let's get going. So, um, Rcpp is about ten years old — well, its ten-year anniversary was, you know, a while ago, so call it eleven or twelve years. I've done these tutorials a few times, and there's a certain flow that makes sense. So even though everybody in the audience, I guess, is already an R user, I'm going to recap a little: you know, why we're really grouped here, what makes R special. We have some key salient features of R, of which extensibility is one, and going with C++ is not a bad choice — so that's basically the motivation, and a little bit of empirics: you know, is it used, is it widely used; we passed a couple of really decent milestones just a few days ago. And then along the way I'm easing into how one actually uses it, and some usage instructions. While I'm at this: often when we do these in a classroom, you would be sitting there with your laptops and exercising, and you know, one thing that is a little tricky if you haven't used it before is that we're going to use a package along with a compiler. So what I usually say — and a really good test that will come up in the slides as well, and I'll come to this particular function too (and I'm nervous presenting, even though I've done this a million times, so that was a typo; this is what I meant): I'll be covering this again, there are sort of three key functions, but the simplest of them all is a simple evaluator, we call it evalCpp. It evaluates an expression in the C++ context, and we can use this as a litmus test of whether your compiler and system are set up. If this works — and we're actually taking this, and I'll come back to it — it evaluates two plus two and really computes that; but if you see an error, then you just know that your system isn't quite set up.
So in this case, a really good fallback alternative — still full-featured, unlimited and free until next month, when a recently introduced pricing scheme comes in — is rstudio.cloud; that is the URL, the rstudio top-level domain with cloud in front. Log in there with a GitHub ID or a Google Gmail ID and single sign-on, and you will then find yourself in a cloud-hosted RStudio Server instance on a Linux box where all the compilers are also set up. I can vary this a little, and you'd see it's not a fixed expression — and on this one I actually hadn't prepared everything on the system, so we have to install Rcpp first. This will take a moment: a cloud instance sort of has to wake up and get its motions in with the Docker container, so we'll let that run now and won't have to wait for it later. So those are basically setup issues. You know, in a live tutorial we sometimes have a helper roaming the room who can help you with installation issues; we don't have that here if you hit any grave issues. See, and now I can bring this back — there, now we can run the exercise. There are mailing lists and other support channels, but we don't really have the time within the scope of the tutorial. So with that, back to the overview, and the first topic is sort of "why R". I'm a reasonably old and long-time R user — I came to R around the end of the 90s and into the 2000s — and an even longer-time R user is Pat Burns, whom I had the pleasure of meeting a few times. He goes all the way back to before R, when S-PLUS was still done at the company where he first worked after his PhD. Pat is a consultant who runs Burns Statistics and is the author, in particular, of a booklet some of you may have heard of, The R Inferno, which is a really excellent free PDF write-up about corner cases and intricate details worth learning about R — and that comes from his being an R consultant.
He of course also evangelizes for R. When I was, you know, doing this revision of the slides and looking around for how you would motivate R, I remembered his website and went back there, and it's really good — a little essay about, you know, why use R, what is R, what is the R language, and why would we use it. You know, those of us who have already, you know, taken the pill and fallen for R — we know its strengths. It's not just a package, it's a language, and it really is designed to operate in the way that we think about problems: it's a language designed by statisticians for statistical problems, and it's both powerful and flexible. And it deals really well with statistical problems because it's interactive — John Chambers called that turning users into programmers. There should be a really smooth transition from working a little bit with data, to exploring, which of course has to do with graphics, and there's much more solid support for things like missing values and other features important for modeling. And everything is really done as functions and function calls, the packaging system is really strong, and as Pat points out in this set of bullet points from a few years ago, the community is really a strength — and we all know that here, given that this is an event hosted by R-Ladies. So what is R? It's programming with data, and it's really strong at ingesting, aggregating, visualizing, modeling and reporting in just about any way, shape or form.
I think of R as a middleman sitting in all types of data flows: just about anything you can have on the left-hand side — binary data in various forms, commercial stores, text, what have you, network sources, you name it — can be converted into just about any output on the right-hand side: ASCII text, PDF, websites, dashboards, web front ends, what have you. It just does all of that; it's really a central point, the central place, the central middleman for programming with data. Or, you know, I work in industry, and often we're not in a single-language environment — it is also a really good player as one programming language in concert with others. Now, what's the historical perspective — where did it come from? And this one I've used quite a bit, because very early on, I think in 2011 or '12, I had an opportunity to travel to the West Coast because of the Google Summer of Code, where I was mentoring, as I am again this summer, and a couple of us mentors, a couple of representatives of R, got invited to a mentor summit at the end. And that is of course in the neighborhood of Stanford, where John Chambers lives, so because it was clear that I was coming there, a couple of the R users at Google reached out and we had a meeting the day before and discussed. And John had just given a talk a week earlier and had used this slide, which I hadn't seen before — that was ten years ago. And the funny bit — and I'm going in circles now — is that as of last month, actually, that very slide, that drawing, that sketch, is now in a paper in an ACM collection on the history of programming languages. Most excellent. The URL is at the bottom of this slide. And it's from frigging 1976, May 5. So the vision of how we should compute with data, and the systems we would like to build, is decades old.
This is a quest that has consumed, I mean, well more than a generation of statistical programmers, so really none of the questions and problems are new. And a lot of it, you know, has been framed and phrased and analyzed and also implemented — sometimes with detours, sometimes straight away. The article goes into a little more detail around this, but basically the context back then was AT&T's Bell Labs — which, remember, was really unique at the time because the telephone companies hadn't yet been broken up. There was one national monopoly, which was of course making money hand over fist and could support a very large research effort; this was like, you know, the equivalent of Google Brain and Google Research and a couple of those, back in the day. It's also the home of UNIX and the C and C++ programming languages, and they had a really, really strong statistics and statistics research department, of which John was a member, and they were thinking about a new system for statistical computing. And they wanted basically something that made it easier for analysts to interface with canned routines. So the sketch here describes an inner FORTRAN algorithm — because back then everything was FORTRAN, in the pre-C and C++ days — called ABC, made accessible through a subroutine that provides access to it, which is then called XABC, the interface to ABC.
The rest of the sketch has to do a little bit with how information could flow to the routine and back, with little diagrams; those details are not really that important, but the sketch really has a very special place in my heart now. Not only because it's so old, but because John was very sweet and very generous with attribution, already framing it, eight or nine years ago — and I'm not putting words in his mouth, what did he say — that Rcpp was really the first project sort of fully or completely implementing this vision: providing an interface from an inner implementation language, C++ for Rcpp, to an outer user interface. And we'll see this, but this sketch basically predates S, which picked up a lot of these ideas of giving an interactive prompt as the interface to canned routines behind it. So now you not only have me rambling with a borrowed sketch from many, many years ago — John has put that in his own words, and it's a good part of this seventeen-page article that you'll find there. How did the world evolve after that meeting in '76? Well, there are markers, and many of them are actually books by John Chambers. So this one came the following year, '77, and is still all about FORTRAN routines.
Then there was a first departure towards S, but that one was, how shall I put it, retracted or revised, or seen as not that great a solution — that's the so-called brown book here in the middle, and I'd forgotten that it was there; when I gave the talk with these slides back in London a while ago, Pat Burns himself reminded me that I had actually skipped one. But then the evolution really came with the new S language: this blue book, replacing the brown one, was the first one describing the basics of S, the language as we know it. Important additions were made in the subsequent books. This is Chambers and Hastie, "Statistical Models in S", which introduced things like lm, the formula notation, and the fact that an object comes back from a fitting routine — the white book. Then, a couple of years later, came — this was late 90s, 1998, still only S, and John at that time I think not yet quite an R Core member — the book that introduced the S4 classes, which became very important; for example, Bioconductor is built entirely on S4. And then there's a ten-year break, and this book came; by that time John was a member of R Core, and the title no longer talks about S but has "Programming with R" in the subtitle. A really excellent book — I enjoyed this one a lot; it already has a strong sort of historical view and description of the evolution of the language. And one that is really dear to my heart, and where John sent me a couple of preprints that I went over before it came out, is from 2016: "Extending R" — because that's really, you know, close to my heart and what we're doing here, and we'll be getting to extending in just a second. But I really liked SoDA, "Software for Data Analysis", from 2008; it has two chapters towards the end, Interfaces I and Interfaces II.
But it doesn't make the point quite as strongly as "Extending R" then does, because "Extending R" takes the two lines that many of you know or have seen, which John had described earlier, actually in SoDA as well — and when I asked him whether that was their first occurrence, he basically said no, no, those sort of were around, but we don't have a clear starting point for them. So these first two main descriptions were: everything that exists in an R session is an object, and everything that happens — every change, every alteration, every computation that you do around objects — is a function call. Those are really key, important descriptions at the very core of what R is. And then this book added a third one: interfaces to other software are part of R. And that's a really fundamental change if you think about it, because beforehand, you know, we might have come to S and then R as statisticians, not really knowing other languages, so we mostly dabbled in R and were quite happy doing things there — and now the statement really is being made that no, you shouldn't think of the world as monolingual. If something is done really well in another language, it's natural for R, and important for R, to be interfacing with it. What this book then does is contextualize that for what I think he calls programming in the small, the medium and the large, and it goes to three different languages and three different interfacing approaches. The three are Julia, Python — and I always forget whether Julia comes first or Python comes first — where one of them is done with remote procedure calls. The Python example is done with what I'd call a precursor — not yet reticulate, but the current support package would be reticulate. And then basically part three, the main part, for programming in the large, is going with C++. So there's a big endorsement.
And John also contributed some code to Rcpp. And here again I'm highlighting this — my slides almost jumped to the next one again, sort of the third time for me now, so I'll stop that — you know: programming in the large, complex projects require a broad and flexible setup, no single language or software system is likely to be ideal for all aspects, interfacing multiple systems is the essence. And that's what gets us to C++, because, you know, if you're really serious about extending R and willing to make that commitment, you may as well do it with Rcpp. Before Rcpp, the main mandated way of doing things was the .Call() interface, sort of a second generation of interface; there was an earlier function, .C(), that just used basically C's atomistic types, or pointers to those, if you know what that is. And .Call() is bigger, richer and more powerful, and by now really the one that is used, and everything happens with so-called S-expression pointers, SEXP. These are basically an abstract representation of the things that are there in R — the objects — and the nice trick is that the S expression encompasses all of those. What Rcpp then did was basically add a very powerful, generic way to deal with all the different kinds of these S expressions. And that's basically what we're doing here. It uses an old C programming trick from the 90s, before OO: it's really a union type with a type tag that describes which kind of representation it is, whether it's a list or a numeric vector. Those details we don't really need to know, because Rcpp shields us from them, but that's how it gets there.
There are a couple of internal ones behind it — sort of, things that are still accessible, like environments and functions, and then a couple of others that aren't directly exposed — but the key thing to keep in mind is that Rcpp is basically the magic potion that allows us to deal with all the different types from the C++ side, and I'll show some examples; more will follow from that statement. One of the reasons we're using C++ is that C++ has language structures that make this unpacking of the SEXPs particularly powerful. But C++ is generally just a useful language because it's fast: it compiles down to really efficient code that is second to none on these systems. That's a design feature of C++: it aims to be, if you want to use it that way, functional enough that you can reason somewhat abstractly in it, almost like functional programming, yet at the same time so efficient that it crowds out, and doesn't leave space for, another language you would interject just for performance reasons. Bjarne Stroustrup, the creator of C++, is very adamant about that point in articles and talks: it goes all the way down to the machine level for efficiency. It's also old and widely used, and it had a really beautiful renaissance. You know, at one point around the year 2000, Java was coming up and was seen as the next big thing, then C# came, and it wasn't quite clear how C++ would respond; then things got, you know, reformulated, new initiatives got started, and the new C++ standard — C++11, which as "C++0x" had been expected a few years earlier before it was ratified in 2011 — was sort of a really big response, and then a rebirth.
And now the language is again really actively developed, there are newer standards, and it's fairly vibrant — plus decades' worth of use, so many libraries, many tools. It's very universal: you will likely have a C++ compiler available on whatever operating system or machine you use. The combination of R and C++ in Rcpp, we find, hits sort of a nice sweet spot. You know, of course you're going away from R to another language, so there's a little bit to learn, but it's really not as complicated as going full force into C++. And R still helps us and is a great host, because build- and operating-system-specific complexities like linking are taken care of by being embedded in R. It's fairly expressive, because, as I'll show, we can do things that we as R users have gotten really used to: you have expressions on a vector that take the vector as a whole. So for example, if you have a vector x, the sum of all its elements is sum(x), whereas in some other languages you would have to loop over it, access every element explicitly and sum up by hand. This is what I mentioned earlier: all these internals of R — vectors, matrices, lists, environments, functions — are accessible because we can unpack the S expression pointer, the SEXP, so well. Because C++ is performant, Rcpp is pretty and performant, and it allows us to extend things — so it's a pretty potent cocktail, and as we've seen over the last couple of years, it got good uptake. So, somewhere around 2008 I started running a script, near-daily and then daily, that basically writes into an SQLite database — I think that's where I store these — the number of packages using it. And JJ reminded me that at one point I was in Boston; I think I was running the race.
I've done that a few times, so it must have been one of the last ones, and I think he reminded me that it had really brought a smile to my face and I was proud, you know, like a young father, saying oh my God, we have — whatever the number was — 30 or 60 packages using it. So the uptake was relatively slow. And you see that there's a little bit of a structural break here, around 2014 — not quite sure what caused it, but the slope became much more pronounced. I could go on about this chart forever because there are a couple of really interesting tidbits hidden in there. Where it gets really jagged and jumps up and down, that was right in a January, and it happens again this year: that's actually CRAN taking out the broom and throwing a number of packages off CRAN, even more than usual — that happens a bit more than it used to. Here again, where it looks like it flattened, and then again sort of this year — so it's reasonably steep, which is quite nice. Then I started putting a second chart behind it, on the right axis: the percentage of packages using Rcpp as a proportion of all packages on CRAN. These two lines actually shadowed each other for a little bit, maybe around this time, but then that changed too. And this one's quite interesting now, because you can almost see sort of an S-curve, as often happens with growth curves, because clearly there's an upper bound to how many packages we can hit. I blogged about that just a couple of days ago, because we hit two numbers that I was watching. We've now crossed 2,000, which is just completely mind-boggling — I mean, there are just so many packages — and the other one I was looking at: it's passed 12.5%, so one in eight packages on CRAN uses it. And that's really just a compliment, and an expression of faith by the community in what the package provides and continues to provide — so it's very humbling.
So I updated this, I think, yesterday, and it was 2,013, and of course whenever CRAN admits things these numbers change a little, so I think as of this morning it's 2,014. Bioconductor is somewhere between 2003 and 2005 — it differs a little depending on whether you look at Bioconductor release or devel — plus a hard-to-know or unknown number of projects on GitHub and other places, closed-source projects at work, what have you. So it's pretty widely used. This next chart came from a presentation that Andrie de Vries gave at a useR! a couple of years ago, when I was in the audience and had not seen it before. The computations are based on a package of his which lives on GitHub — I think it never made it to CRAN, for whatever reason. He's encoding the logic of the Google PageRank algorithm, which, once you look at the details, is just, you know, a singular value decomposition application once again: it expresses the graph of who refers to, links to (in the web-page sense) or uses (in the package sense) whom. And the outcome is actually pretty strong: when he showed that a couple of years ago, I think we had just arrived at pole position, and oddly enough the gap to the next one still keeps increasing over the years, which confuses me a little, because you'd think that something that is used sort of just about everywhere — like, you know, knitr or roxygen2 — would be stronger, but somehow we're still far out here. And in the next few, some things have changed: I think ggplot2 was, when I first made these charts, a little behind MASS, and is now slightly ahead; but you know, there's clearly a strong effect, of course, of tidyverse things, as well as the recommended packages MASS and Matrix — and then it just goes all the way down.
And you know, data.table, one of my favorites, is there, and RcppArmadillo, which we'll talk about in a couple of minutes, is also on this list. I think I tried to cut this off at 30 so that the axis labels are still readable, because after that it gets pretty boring and it's a relatively fixed percentage. One new computation that I was able to do since — or more easily since, or really at all since — R 3.4.0: R 3.4.0 gave us the function tools::CRAN_package_db(). Even after 20-plus years with R, I still get a bit confused between the tools package and the utils package, and it's sort of slowly dawning on me: tools is a bit more for what CRAN itself uses for dealing with packages — package installation, tests, a lot of the R CMD sub-functions. And tools::CRAN_package_db() basically returns to you one big honking object with as many rows as there are packages on CRAN, and then I think 65 columns, of which one important one that I didn't really have access to beforehand is the yes-or-no toggle of whether the package needs compilation or not. And that basically means: does the package have a src directory, which may contain C code, Fortran code, Rust code, I don't know what, or C++ code. With that we basically get the proportion of packages that actually do need compilation out of the total CRAN packages — as of, again, two days ago, that was about 3,900 and a few out of 16,000. And that then allows us to compute, out of those that need compilation, how many use Rcpp. When I started computing that for this slide, I think following R 3.4.0 a couple of years ago, we started in the low 40s. It's been steadily creeping up, and it was a really big deal for me when this hit 50% a couple of months ago, because it now means that among those packages that are elaborate enough to involve compiled code, one out of every two uses Rcpp.
That's actually pretty much the strongest expression of usage and uptake. By all means, you're not required to use Rcpp, and you can do everything sort of the standard way as well — I'm not saying anything against that; it's just that the empirics show we seem to provide something that enough packages find useful. So it's good; it's nice to have these options. And then basically, that was sort of setting the landscape: why would we do this, and is it being used. Now let's get on to how we actually use it, and with that, back to the point I made earlier when we started, about evalCpp. This is, the other way around, a really good first test if you have to check your machine — because you're suspecting something may have changed, or you're installing it for the first time — whether it's actually working right or not. evalCpp allows you to submit a really simple, limited string: not a complete program, not a complete function, not even an assignment to a variable. It's just a part of an expression that can be evaluated, around which Rcpp, with glue code (which is something it generally does), creates a callable function, worked up enough so that it can actually be compiled, linked and then loaded. And there will be some C++ code evaluating the expression that we gave it: it adds two plus two, as in my early example, or three plus three; you can use just about any expression, and if the answer comes back as you would expect, then your setup is right. There are options to evalCpp where you can run it in verbose mode, or ask it to show you the little program that it sets up and writes — all of those are good for checking. There's a little bit more that you can do with it — you can evaluate other expressions too, by, say, quickly returning random numbers from C++ rather than from R itself — but it's mostly a tester. And again, Rcpp doesn't really impose that much on your R system
that isn't already imposed by wanting to compile packages from source, with or without Rcpp. So on Windows, you will need to install Rtools. On macOS, you generally have a compiler, but that too is a little complicated — I'm not a Mac user, so I mostly refer to the FAQ and blogs and write-ups: you need the setup as provided by Simon Urbanek and R Core for use on a Mac, but if you do that the right way, it just works. And where I mostly work, which is on various Linuxes, on laptops, servers or in the cloud, compilers are basically always there and it just works. And as mentioned, a really nice solution, accessible for everybody with a web browser, is rstudio.cloud. Starting in August, I think, you will be limited to a number of hours per month, but it's still quite nice to have as a fallback, and if you'd rather use that than buy a second computer, it's still not a bad value proposition — I think it's $100 a year for a more generous setup, which is not bad. I've used it twice with a class I taught, and it's really quite nice. So, there are really three key functions in Rcpp that permit you to access compiled code. evalCpp is the simplest; we've mentioned it before — it just takes an expression. You can also use it to look at, on your particular machine, the maximum value for a numeric of type double. This is how C++ can look, which is a little scary. And if you look at this, the number that comes back is what you get for a double, and it's actually the exact same number that you get back from R itself: if you look at .Machine, a subfield in there will have the same 1.797693e+308, because it's the same 64-bit double — that's the upper range. And with that, let's break for a minute or so, and if you're set up on a machine as I am here, just play a little with evalCpp and see what works, what doesn't work, how you can break it — just get your feet wet and create some compiled code.
I'll keep an eye on the clock now and come back in a minute or so and continue. Alright, let's get back to this. It's really a bit limiting that I can't have immediate feedback from the chat, but I hope you got a chance to pick something up there. The next function, which actually begins to allow you to write some more useful C++, is cppFunction(). It generalizes a concept that was introduced with the very useful inline package, which I myself relied upon at the very beginning: inline already allows you to take C, C++ or — now, thanks to a contribution — Fortran functions given as a text string, and compiles, links and loads them. JJ saw that, liked the idea, and then generalized it a lot in something we call Rcpp attributes, which gives us that and then a bit more. We will get to that in just a second, but cppFunction() basically allows you to take an entire string, submit it to R, and it turns it into a function. It's really clever, because it does a little bit of parsing and grepping: it sees from the submitted code what the name of the function in the C++ source is, and assigns the result to an R object of that very name automatically. So if you write exampleCpp11 here, it'll become an R function with the same name. There are a couple of other options and toggles: you can tell it where to look for particular headers, or turn features on or off. This slide is now a little dated, because before the current (or even the previous) R release, you would have to opt into C++11 as the compilation standard; since then it has become the default, so R does that for you already and this line is actually somewhat redundant, but I keep it here to show how this works. And this is an illustration of what we can do with C++11, if you know a little bit of C or C++ — and we'll get to it a little later too: one aspect of a compiled language is that everything is typed. So we're saying here that an int gets returned.
And one of the nice things that C++11 brought is that the compiler gains the ability to determine the type of a variable from the assignment: because this variable is assigned 10, not 10.0, it knows this is an int and makes x an integer variable. We still have to declare the return type of the function, because inferring that wasn't in C++11, it's in a later standard, but those are small details. The main exciting aspect for us is that we can just say cppFunction, from opening quote to closing quote, and we get an R-callable function from the code provided. Let me actually do that real quick; let me bring my local session over here so you see how that works. On this machine right now the environment is empty. If I then call cppFunction, and I'm going to do something simpler just to keep the typing short: a function that takes an int a and returns the sum of a with itself, so a + a. Hold on, do I have my parentheses in the right place? That was my mistake, these didn't belong there, so that looks better. When this compiles, you'll see in a second that the function pops up in the environment. If you look at doubleMe, it's actually a little tricky: it's really a .Call invoking the compiled code, plus a little bit of glue around it, so it's one level of indirection, but the identifiable R object under that name is created for you, and the name really comes from analyzing the code. If we then wrote tripleMe analogously, you'd see a third function created that looks just the same: a .Call to a memory address with the variable given. So: very powerful, very useful, quite popular. And yes, there's a little summary here. I have a suggested example here with library(Rcpp) first, which is how you would do it if you don't want to prefix with the double colon as I have done.
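Here are the two ideas from this part in plain C++, a small sketch without the R glue; cppFunction would compile strings containing functions like these and expose them in R under the same names:

```cpp
// C++11 type deduction and the doubleMe example, without the cppFunction
// wrapper around them.
int exampleCpp11() {
    auto x = 10;      // assigned 10, not 10.0, so x is deduced as int
    return x;
}

int doubleMe(int a) {
    return a + a;     // returns the argument added to itself
}
```

In an R session, cppFunction("int doubleMe(int a) { return a + a; }") would create an R function doubleMe wrapping the second of these.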
I'm going to write this function, call it this way, and you can reason over whether it works: yes or no, and why it would or would not. I'll set the same function up over here while I leave you the slide, and then we can look at it in just a second. Let me bring that back here; so that's the same function, and there's basically one minor trick, and we'll get back to some other implications of this as well. We wrote this function as returning an int and taking an int, but when I call it with f(21), what am I calling it with? Not an int; I'm calling it with a double, a numeric. The C and C++ languages cast automatically when the cast can be done unambiguously. A real number 21.0 can very easily be converted into an integer 21, so that's the same as if I had given an integer explicitly, as in f(21L). What is less clear is when you actually don't have a whole number: then the value gets forcibly converted to an integer by being truncated. So this works, but it may not always be what you intend; maybe you really wanted this function to take a double a, so it depends. Another useful aspect is that Rcpp actually has enough machinery behind the scenes to pick up the fact that "hello world" is neither int nor numeric. Already the numeric didn't really satisfy the expectation of receiving an int, but the way the language is defined that cast happens automatically and you can't actually stop it. For things that don't cast, though, and a character string clearly is not an integer, an error gets generated: it gets recognized as an error condition, a C++ exception is thrown, it gets turned into an R error, and it comes back to the prompt just nicely. So it doesn't blow up, it doesn't abort R; it allows us to continue quite directly, which is actually quite handy.
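The Rcpp layer is what rejects the character string, but the double-to-int narrowing itself is plain C++ behavior; a small sketch of that truncation, written with an explicit cast so it is visible:

```cpp
// What the implicit conversion does when an int parameter receives a
// double: 21.0 converts exactly, while 21.5 is truncated toward zero
// to 21 (and -2.9 to -2), not rounded.
int truncate_to_int(double x) {
    return static_cast<int>(x);  // the same conversion, spelled out
}
```

So f(21.5) silently behaves like f(21), which is exactly the "works but may not be what you intend" case above.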
And, yeah, basically the context I had implied here was the question of other cases where there may be complications, and these corner cases have to do with the types not perfectly matching. So sourceCpp is then the next function behind evalCpp and cppFunction. It too is described in the one vignette describing all of this machinery, and sourceCpp is the next level of generalization because it allows us to work on a whole file, which may contain several functions at once. sourceCpp reads the whole file, extracts all the functions in it, and provides them similarly to how cppFunction had provided a single one; just as cppFunction generalizes a function from inline, so sourceCpp generalizes cppFunction. A key feature here are plugins: the plugin we've already seen with cppFunction you can also declare for sourceCpp, and bring in other packages set up to cooperate with Rcpp. RcppArmadillo is one of them, RcppEigen and RcppGSL as well, along with toggles for turning on C++11, 14, 17 or the draft C++20, OpenMP, and more, as documented. How does that work? Well, the easiest way, really, I find, is... let me show you again, I'm not managing to bring the window down to size, but that's good enough. An easy way to get a working C++ file is just to say, in RStudio, File, New File, C++ File, and it brings in a pre-canned, pre-cooked file with a little bit of comment. So this shows how comments get defined in C++, behind a double slash, with pointers to further resources: our website, and the write-ups on the Rcpp Gallery. And then there's a function here, and this is how we declare those. So, easiest way: File, New File, C++ File. And with that, back to the slides, because this is what it brings in. Other aspects are that C and C++ files work by having include statements at the top that declare, basically, the available interfaces, the available functions that can be used.
Include statements basically provide to the compiler information about what can be accessed; that's very standard, and we set Rcpp up in such a way that you generally just need this one include of Rcpp.h. Then one thing that's fairly common, especially in code that you first write to experiment, maybe less so in packages: you can say using namespace Rcpp. This is a namespace declaration in C++; as R users we of course know of namespaces, where we can also write import statements, and it has the same effect here. With using namespace Rcpp we don't need to prefix Rcpp entities with Rcpp::. It's good for short files; I personally don't use it much when I program with Rcpp, I like to be more explicit and have the prefixes in the source code. The actual export by sourceCpp happens after a tag called Rcpp::export, which in this version is written as a comment following the two slashes. And then there's one other convention: C code comments not declared line by line following double slashes can go over a longer range, from an opening slash-star to a closing star-slash. We had the clever idea of generalizing this into a block starting with slash, star, star, star and R, and what follows is something that should be given to R, and that's actually super handy. Because if I now, and I have to save the file first, that's always the same, because if it's not saved we can't sourceCpp it, and I mistyped, but here we are: sourceCpp on the demo file. And of course it's not source(): if I call source(), really the presentation I had in mind, it takes it as an R file and of course that doesn't work; sourceCpp takes it as a C++ file. There was one function in it declared as exportable, and that one is now listed in the environment as well. And the trick I was just trying to get to is that this R block corresponds, if you wish, to examples or unit testing.
In a poor man's setting, everything that follows the comment characters and the R marker is executed when sourceCpp goes over the file. So you could also put another statement in there. I can call sourceCpp from the button here in RStudio, which issues the properly prefixed command for me, or I can do it from the command line as well; the command is also clever in that when the file hasn't changed, it notices that it doesn't need to recompile, so there is little cost to re-running these commands. So that's sourceCpp, the third of the key functions for using Rcpp, from the attributes vignette. So what just happened? We defined a simple C++ function timesTwo; it has a single numeric argument that is actually a vector. The example functions years ago did this with a scalar, but it's just as easy with a vector. We then run it: timesTwo applied to a small vector gives all elements of the vector doubled, as expected. So already you have vectorized operations at the C++ level; we double the vector in a single one-line statement, quite nice. Rcpp creates the R wrapper, compiles, links and loads it, and that makes the function. And with that, maybe two things here. Let's have a quick break, because I've gone on for an hour; let's make it ten minutes, have a coffee, go to the bathroom. But also, at the same time, try to work with this a little: go to RStudio, File, New File, C++ File, and work off the template. Otherwise go back to the slide and just type the few lines by hand. What's really salient: you need these two lines and these four; this part is already optional. If you just put those in a file, you can sourceCpp the file, whether created from RStudio or not. So have a go with that and play a little with timesTwo: make it timesThree, do another operation on the argument, just do variations around it, and we'll continue at, say, seven past the hour.
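For reference during the break, here is the timesTwo idea without the Rcpp types, a sketch over a std::vector; with Rcpp's NumericVector the body can be the one-liner x * 2, whereas plain C++ spells out the loop:

```cpp
#include <vector>

// Take a vector, return a new vector with every element doubled. This is
// what the template's timesTwo does, minus the Rcpp wrapper that makes it
// callable from R.
std::vector<double> timesTwo(const std::vector<double>& x) {
    std::vector<double> out(x.size());
    for (std::size_t i = 0; i < x.size(); ++i)
        out[i] = x[i] * 2.0;   // element-wise doubling
    return out;
}
```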
Alrighty, let me slowly come back. That was a fantastically timed pause, because being at home as we all are these days, on retail internet, I'm in the US, at the mercy, as many of us are, of Comcast, and every now and then they come in and just reboot the system, and they seemingly just did that to me, so I was actually cut off from all of you for a minute or two. But now it's back; sometimes they just use it to update the firmware. Let's now get back to a quick example. Gwin in the chat had a really good question, running into a nice example of a gotcha that can happen, so I'm just going to retake the question a little differently. I'm going to do it, for argument's sake, on a scalar here; we can then look at the vector case as well. So the question was: when I just want to take a value x and cube it with the caret operator, what happens? And you see here RStudio is already helpful and yells at me that this looks like trouble. If I source this now, we see another helpful feature: RStudio picks up the error message from the compiler and situates it right at the offending line, and basically tells us this didn't quite work, which was my immediate ad-hoc reaction in the chat as well. Cubing that way works in a couple of languages, but C and C++ are not among them; if you want something to the nth power, you generally take the pow function. And this is another trick, and I'm running a little ahead of myself and later slides: we're now taking a standard C++ library function from the std namespace and accessing it. If I now go back and do it more the way the question was asked in the chat, with a NumericVector on both sides, then it's no longer an atomistic C++ type, so I can't use std::pow. But I think this will work, because we have... hold on.
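The scalar version of the fix, as a sketch: in C and C++ the caret is bitwise XOR, not exponentiation, so cubing goes through the standard library's pow:

```cpp
#include <cmath>

// x^3 does NOT cube x in C or C++ (^ is bitwise XOR on integers and a
// compile error on doubles); std::pow is the exponentiation function.
double cube(double x) {
    return std::pow(x, 3.0);
}
```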
I thought we had such a function, and we do. Oh, sorry, my bad, of course it works: we can have the same pow function on vectors, in a thing called Rcpp sugar, and I'll talk about that really briefly, we'll get to it. Basically, functions that we want to be available, because we use them from R and other places, have been declared to work on whole vectors. So what I did here on scalars is also standard, valid C++ syntax, not just Rcpp, and is close to a C file, whereas here we're generalizing that and working on vectors; but we'll get back to that in just a second. So let's go back to the slides. I hope you all had a chance to work a little on the examples. It's a little tricky, sadly, that we don't have the feedback loop working all that well, so maybe we can just pivot that, after the tutorial, over to Stack Overflow or issue tickets; I just answered a GitHub issue ticket as well. So that was basically the gist: evalCpp just as a litmus test, cppFunction for quick, maybe interactive use in the R session, for short functions of just a few lines that you can type immediately, and then sourceCpp for something more elaborate and longer. That's really how it works. Now back to the "why": we started with the motivation of why we want to do this, we just covered how one actually does it, and now let's go back to the motivation, to what's really the upside here. One good example that provided a lot of material and use for many a session is something I once found late one evening, sitting at this same computer at home, looking at Stack Overflow. It was years ago, and a fellow had written in with a function that implemented this equation, this formulation, which some of you will undoubtedly have recognized as the Fibonacci sequence, the Fibonacci recurrence. It's something really simple and elegant and recursive.
It defines a function f(n) such that n itself is returned if the argument is less than two, so for n equal to zero or one. Small footnote here: sometimes the Fibonacci sequence is defined starting only at one; I follow the convention of starting it at zero, and a big hand wave goes over arguments outside that. So the stop of the recursion is that for the smallest arguments, those less than two, we return the value itself; all other arguments invoke a calculation of the sum of the two preceding terms of the same sequence. It's all very simple, and it translates perfectly into R code. In that Stack Overflow question the user, I think, already had something pretty close to this, in essence the same: an R function f that takes an argument n; if n is less than two we return that value, and in all other cases we return the sum of f for the two preceding arguments. We can invoke this on the sequence from zero to ten, eleven values. For zero, it's less than two, we return zero; for one, less than two, we return one. Now for n equal to two this is no longer the case, so we go into the recursion: f(2) is the sum of the values for the two preceding arguments, n minus one and n minus two, so one plus zero, one. For three it's the sum of one and one, two; for the next one it's three, then five; three plus five gives eight, and so on; the recurrence grows quite quickly. Why was that user despairing in the Stack Overflow question? Well, he was very ambitious and used a very high value of n, already years ago. If we just benchmark f(10), a hundred runs with a simple benchmark take nine milliseconds; that's really nothing. But if we increase it just from 10 to 15, a 50% increase in the argument, we get a ten-fold increase in time: instead of nine milliseconds it's 94 milliseconds. Do that again, add five more, and go to f(20).
All of a sudden it takes a full second, over a hundred times more than f(10). Why is that? Well, the function's cost grows worse than exponentially, because every invocation is unaware of the previous paths and has to recompute the results for smaller arguments. It's simple: f(7) is the sum of f(6) and f(5), but f(6) again has to calculate the sum of f(5) and f(4), so many of these nodes get revisited. This makes it a great example for memoization, and that's the discussion I use, for example, in a chapter of the Rcpp book. The user had then done this for f(30), maybe f(35), went off for over half an hour on a single call, and was rather dismayed. So it's a great example for making R look bad, because the invocation of an R function is not free, and R, for all its greatness and strengths, has this one known weakness: the flexibility of the language comes at the cost of some overhead in calling functions, keeping the state of functions and their context, the enclosing environment, all of that; it's not the cheapest operation. So whenever we invoke a recursive function we basically pay a known, high price repeatedly. That makes an alternative, fast implementation like this one look very, very good, better even than in real life, because it hits R exactly at a weakness. So, generally speaking, back to our motivation here with Rcpp: just as we had written f in R, we can write a function g, a variant in C or C++ code, just on scalars: an int goes in and an int goes out. If the argument is less than two, we return it; so apart from typing the function and putting semicolons at the end of the lines, this line is essentially the same. If it's not less than two, we return the function applied to the previous argument plus the one before that.
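The C++ counterpart of the recursive R function, as a sketch: same structure, just typed and with semicolons, which is what cppFunction would wrap:

```cpp
// Recursive Fibonacci, a direct translation of the R function f:
// the stop condition returns n itself for n < 2, everything else
// sums the two preceding terms.
int fib(int n) {
    if (n < 2) return n;              // f(0) = 0, f(1) = 1
    return fib(n - 1) + fib(n - 2);   // f(n) = f(n-1) + f(n-2)
}
```

Note that int overflows past f(46) or so, which is the switch-to-double caveat mentioned a little further on.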
And we can wrap this in cppFunction, which we've seen in the previous section, just by having it between single or double quotes as a valid string. Invoke it again on zero to ten, and we get the same sequence of eleven values. Great, so we've validated that we get the same result. We can then compare them in terms of speed. Taking f(20), which we had seen was the most expensive one in the previous benchmark at about a second, it still takes just about a second, but g(20) takes essentially no time at all. So we're seeing a relative gain of maybe 500. But, as I cautioned when I started talking about this, that is not what you usually get. On more typical problems you can get 10, 20, 30 times easily, on certain examples 50 or 80 times; it really depends. Switching tools to a compiled language definitely helps a lot with speed, but getting the right algorithm matters just about as much. We can do the same thing again with a different benchmark wrapper, microbenchmark, which gives a slightly different table with quartiles and medians as well as the ability to stick the result into autoplot, and you see it's really a very big difference, and every now and then the timing spikes because there's a memory allocation happening. So for certain problems Rcpp is clearly very compelling, even though in our day-to-day work we don't always have recursion, so the gain isn't necessarily as large. Because we're running late, and we're only having two hours instead of three, let's skip this next example; you can try it at home. One thing you can run into here relatively easily, which I hinted at, is that the int may overflow, so you may want to switch the int arguments and return to double. So: I talked a little about C++, but I never really introduced it all that well.
A little bit of context helps, because invariably you will run into cases where you have to debug a little or something goes wrong. So how do we generally work with C++? Well, you need a compiler: often it's g++, for Mac users it will be clang++. You have to feed the compiler, often with a capital -I argument for an include directory where it finds the header files. When we then say -c with a given C++ file, it will only compile, leading to the next step where we link, creating with -o an output file, the executable, based on the object file created in the first step, also supplying a directory where libraries may sit and the libraries to be used. So this is already a bit more of an involved example, because we're using an external library across these two steps. And this stuff can get tricky, because where these pieces sit may differ between operating systems and all the rest of it, so it's really great that we don't have to do it ourselves, because R takes care of it for us; but that's a bit of the context. The other thing, as already mentioned, is that the typing is different. R is dynamically typed: we can perfectly well assign 3.142 to x, following up with an assignment of "foo" to the same variable x, and R doesn't care; it's just that x changes storage mode, here from numeric to character. In C++, each variable has to be declared with a particular type before you use it, and a variable generally speaking cannot hold those two values at two points in time; you need two different variables, or a so-called variant type. Common types are int and long in C and C++. In R we only have int as a 32-bit integer; sometimes you want a 64-bit integer, as other languages have. Sometimes these integer types get combined with the qualifier unsigned, limiting them to the non-negative range and extending the upper range.
Generally speaking, int goes from a largish but not hugely large negative number to just about the same positive one; you lose one value because of the zero in the middle. Similarly, back when I was starting, people used to care more about float as the shorter floating-point type. These days I do everything with double, because that's the natural size on a 64-bit computer anyway, so there's not much left for float. There is a little bit of a resurgence with GPUs, where space is more limited, so shorter variables are still used a bit in deep learning, but that's only when you really have to push hard. And then there's char, which leads to something like string. All of these variables are natively scalars; there is no such thing as a vector in the base language, but vectors got added later with C++ extensions. Another important thing is that you can create classes. C++ brought classes; C already had struct. A struct is, if you wish, a little bit like a list type in R: a composite of several types arranged together. Really grossly simplifying, and keeping it short in the interest of time: a struct holds types; a class generalizes that by being a struct plus code. We'll get to that in just a second. Constructors are very similar between them. As I said, C and C++ both came from Bell Labs in New Jersey, and the wording is similar too: blocks are defined by opening and closing curly braces, and functions are similar as well, though there are differences. In R we can call functions with arguments by name or by position; that's a little stricter in C++, where you can't as easily skip elements in the middle. There's a lot to be said about pointers and memory management, but if you believe in C++, in more modern C++ as an alternative to C, with things like typed vectors and lists, you don't have to do that much memory management yourself if you use these containers correctly, so there's not as much of a scare of "oh my god, I have to learn pointers".
But for some more advanced library-building cases it is of course useful. So, really quickly, about this notion of structs and classes and what object orientation is. Basically something that would have worked in C code already: we declare a struct date containing three integers, and just to show that we can, because years, months and days cannot be negative, I'm not using an int but an unsigned int, constraining them to be non-negative. We can then use a date in another struct; we can nest them, just as you can have a list in a list in R, so here we have a struct in a struct. A date can be used to define a person as a combination of a first and last name, an ID number of some kind, student ID or employee ID, as well as a birthday. And you can then go one step further, from structs to classes, because with classes you get a new tool to work with: you can shield the data. I can now declare year, month and day private, which means that from outside an instance of the class I can't access them. What is then often used is a public interface with setters and getters: I could set a date as the triplet combination of y, m and d, and then have three getters to get the three elements back. So that's a very basic introduction of what objects are. One useful thing I mentioned in the beginning, for motivation, is that Rcpp maps the types that R has: int scalar or vector, and remember, R really doesn't have scalars, everything is a vector. So an integer vector in R becomes an IntegerVector in Rcpp; same for numeric, a.k.a. doubles; we have lists and functions, as well as nested things, so that just works. So int becomes IntegerVector, or IntegerMatrix if you want; double becomes NumericVector; char and strings become a CharacterVector; bool becomes LogicalVector; and complex, not used all that much, becomes ComplexVector. So let's work with that a little. First, an example of working with a vector.
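A sketch of the slide's progression from struct to nested struct to class; the field names here are my guesses at the slide's, not a verbatim copy:

```cpp
#include <string>

// A plain C struct: just data, a bit like an R list of typed fields.
struct date {
    unsigned int year, month, day;   // unsigned: these cannot be negative
};

// Structs nest, like a list in a list in R.
struct person {
    std::string firstname, lastname;
    int id;                          // some student or employee ID
    date birthday;                   // a struct inside a struct
};

// A class generalizes a struct: data plus code, and the data can be
// shielded as private, accessed only through setters and getters.
class Date {
private:
    unsigned int year_, month_, day_;   // not reachable from outside
public:
    void setDate(unsigned int y, unsigned int m, unsigned int d) {
        year_ = y; month_ = m; day_ = d;
    }
    unsigned int getYear()  const { return year_; }
    unsigned int getMonth() const { return month_; }
    unsigned int getDay()   const { return day_; }
};
```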
And again, this is just for illustration; the teaching example is: given a vector, how would I find its maximum? That's, if you wish, a reduction operation: taking a vector of a specific but not yet known length and returning a single element, the value of the largest one. How does one go about this? Well, when the vector comes in, I can query the vector itself to tell me the size it currently holds. Objects such as vectors are self-describing, which makes it very powerful to abstract this way; I don't just say it's a pointer to an array of doubles plus a length, where you have to worry about overrunning it or not; it's much better contained. So the vector can tell us its size. We then have to initialize the computation properly: without loss of generality, the first value can be the initial maximum; we loop over all the other ones, and if we don't find a bigger one, the biggest one will be this initial value. And I guess I left one inefficiency here, because having assigned the initial element, and again, indexing starts at zero in C and C++, not at one, I then look at all the elements, one too many, because the initial comparison against itself will always be false. So for each element I look at: is the current value larger than the working value for the max that we have? If so, we print to the screen that we got a new value, it is now m, and store that value. Once that loop is over, we're guaranteed to have looked at and compared every single value, and thereby found the maximum. Again, note that indices in C and C++ run from zero to just below the number of elements: if you have 10 elements, the indices run from 0 to 9. So that works. You can play with that: just stick it into cppFunction really quickly by copy-and-pasting; you can even copy and paste from the PDF. And if you run it with 4, 5 and 2...
...we now know that on the second comparison, five will be bigger than the previously assigned value of four from the initial element, so the print triggers once, and the function returns the value five. You can play with that a little: give different arguments, pre-sorted, not sorted, and see what it prints; but that's how finding the max works. Similarly, you could work with the sums of the columns of a matrix: going in with a matrix and now returning a vector. So instead of reducing a vector to a single element, we reduce a matrix to a vector of, here, the column sums, or the max, or other things. And here I'm showing you already one higher level of aggregation: because we're going over a matrix with rows i and columns j, there could be two loops, but because we can work with vectorized operations, we really only have one, over the columns. We first ask the matrix how big it is; a matrix has member functions for the numbers of rows and columns that give you these counts. These countable things are often stored in so-called size_t variables, just a variant of int, a minor detail, but you'll see it relatively often; and because we compare against it, I also use size_t as the type of the loop counter. With the number of columns we can create a result vector of that many elements, and then our loop runs from zero to less than the number of columns: look at every column, compute the sum of the column elements of the matrix, and assign it to the result. That's really what happens here, and this is pretty close to the bread-and-butter stuff you may do for real work. NumericMatrix and NumericVector are the go-to types for matrices and vectors, the common structures for work there; equivalents also exist for int and for characters, if you have text variables. We use Rcpp:: to make the namespace very explicit and access the row and column dimensions.
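The getMax teaching example as plain C++ over a std::vector, a sketch; I start the loop at index 1, which quietly fixes the "one comparison too many" inefficiency mentioned above:

```cpp
#include <vector>
#include <cstddef>

// Reduce a vector to its largest element. The vector describes itself
// (it knows its size), we seed the running max with the first element,
// and the zero-based loop compares every remaining element against it.
double getMax(const std::vector<double>& v) {
    std::size_t n = v.size();          // self-describing: ask for the size
    double m = v[0];                   // initialize with the first element
    for (std::size_t i = 1; i < n; ++i)   // indices run 1 .. n-1
        if (v[i] > m) m = v[i];           // new running maximum
    return m;
}
```

Called with 4, 5 and 2, the second comparison finds 5 bigger than the initial 4, and 5 is returned.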
We use the column accessor to extract a column, and sum to operate over it; that sum is internally vectorized and goes over the entire column's elements. And that leads to another exercise that maybe we'll skip in the interest of time: you can take the stanza of getMax and work with it a little, do other tricks, loop backwards, find the min, do the sum, just get a little familiar with sourceCpp and a function operating on vectors. Going further, where can we go from here? Especially when you interface other C++ code, you will often encounter std:: types from the STL, the Standard Template Library. The most important of those is a vector of a given type, so a std::vector of doubles. And it's so important that we set Rcpp up to seamlessly interface with these as well: just as you can pass in a numeric vector from R and instantiate an Rcpp NumericVector, you can do the same with a std::vector of doubles, because that's how algorithms you see in a blog or a paper may be described, or how a different C++ library may be interfaced. It basically works the same way: we deliberately called the member size() for the Rcpp vector type because the STL type has the same one, and the operation is basically exactly the same. I just skipped the print to standard output here, but otherwise it's the same, and we get the same result. These STL types are very widely used in C++ code and libraries, so we support them. There's just one minor caveat: if you go from an R object to one of these STL vectors, the memory content has to be copied. That's because C++ uses its own memory management, very efficient, well implemented, all the rest of it, but you don't always want that. We have reasons in R land and Rcpp land to use the memory of the R object directly, so if you instantiate an Rcpp NumericVector, no copy is made. That's also partly why we call it seamless.
So if you avoid the copy and transition to the Rcpp vector in C++, it actually accesses the same R memory location internally, and if you make changes there, they come back to R; so the copy is only a minor performance worry, unless you accumulate really millions of calls. However, the fact that we're reusing R's memory leads to a tricky difference one needs to be aware of. Let's go over the slide for just a second. A really simple function that I called setSecond: set the second element. A NumericVector v comes in, and very arbitrarily, unconditionally, I set v[1], the second element because indexing starts at zero, to 42. If I run this, assign one, two and three to v, and say setSecond(v), and recall this is a void function, it doesn't return anything, so the function doesn't print. What I print then is the object v that I had before calling the function: looking at it after the call it is, as expected, 1 42 3. What happened here is a side effect: I pass in the vector v, I change it, and the change is effective on the outside. And that really is a sword with two edges. I personally really like it, because it makes these operations really efficient, but you have to be aware of it. It's a bit similar in spirit, if you wish, to how data.table is highly efficient because it operates by reference on its vectors and mutates them directly. It is not as computer-science, functionally clean, because it doesn't treat the vector as immutable, but that's just the way it's designed. What you have to be aware of, though, is one very important gotcha. Remember that I declared this as NumericVector v. What happens when I call it with 1L, 2L, 3L? Because now, what is this vector? It's a vector of integers. If I then go in and say setSecond(v) and print v, I get back 1 2 3, not 1 42 3. Why is that? Well, here's the trick: that's the typing and the casting again. This is an integer vector.
The function wants a numeric vector. The whole runtime system behind us has to create a copy of this integer vector to create a numeric vector. It has always been like that; that's just the way it goes. And it's this distinct numeric vector that gets altered here. But because it was a temporary copy, it doesn't survive on the outside. So be aware of this gotcha. It can be very confusing and head-scratching: in the first case what happens is documented and expected, but in the second case you don't get the documented and expected — albeit weird — effect, because you're working on a copy of the supplied vector. It takes a little getting used to, so you may be scratching your head now, but it makes perfect sense once you've worked with the examples a little. And I basically just gave you exercise six by discussing it. Now a bit more on Rcpp vectors in particular. Remember that we wrote the teaching example to get the max of a vector. We can do much better by just invoking the max() function on the vector. That is the sugar stuff that I mentioned earlier. We have a couple hundred of those defined within Rcpp that just do the normal things under the names and behavior of R — min and max of course, also pmax and things like that. However, they iterate over Rcpp numeric vectors, so they don't automatically work on STL vectors. There's another extension package that deals with that, but by default they don't. Having said that, sugar is really nice and gives us a little bit of superpowers. I'm seeing that there are more questions in the chat, but I can't get to them right now, so later on that. We have vectors, we have matrices, so you'd think you could do things like linear algebra — say a regression — with them, and out of the box you can't.
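The reference-versus-copy gotcha above can be mimicked in plain C++, without any R involved. The names below (`setSecond`, `convertAndSet`) are hypothetical stand-ins for the Rcpp function discussed in the transcript: a vector of the matching type is mutated through a reference, while a vector of a different element type must first be converted into a copy, and the mutation then stays on that copy.

```cpp
#include <vector>

// Mutates the caller's vector: v is taken by reference, so the change
// is visible outside -- the "side effect" behaviour described above.
void setSecond(std::vector<double>& v) {
    v[1] = 42.0;  // second element, zero-based indexing
}

// A vector<int> cannot bind to vector<double>&; a converted copy has to
// be made first, and changes to that copy do not propagate back -- the
// plain-C++ analogue of the integer-vector gotcha.
std::vector<double> convertAndSet(const std::vector<int>& vi) {
    std::vector<double> copy(vi.begin(), vi.end());  // conversion copy
    setSecond(copy);
    return copy;  // the original vi is untouched
}
```

In Rcpp the conversion copy is made silently by the runtime, which is what makes the behaviour surprising; in plain C++ the compiler forces you to spell the copy out.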
The fact is that I once, as a young man and PhD student, worked a lot with matrix libraries in C++ and learned the hard way just how involved and how much work that is, so I shied away at the very beginning from putting all that into Rcpp and kept it focused more on the flow of data back and forth. So if you want to do matrix math, you're much better off relying directly on a dedicated math library, two of which we've actually packaged pretty well: RcppEigen and particularly RcppArmadillo, the latter now used by over 750 packages. And great that we still have time for this. I'm aware, of course, that with the two hours we can't cover quite as much; so far I've given you evalCpp, cppFunction, and sourceCpp. And the key thing really is packages, because I don't want to leave you after this tutorial having only seen sourceCpp, because then you'd think that's all the work you can do. No, no, no — packages are how R works. Packages are what you put on CRAN, packages are what you share with your friends, co-workers, bosses, you know, aunts, uncles, everybody. So it's very important that you know how to package with Rcpp, and luckily, because it is so important, we put helper functions in right at the beginning. You can invoke Rcpp.package.skeleton(), which generalizes the base R function package.skeleton(). And we have similar ones for a couple of the other packages: RcppArmadillo has one, RcppGSL too. The other route is via RStudio: just as I showed you earlier how to get a basic C++ file to experiment with, you can go not under File > New File but File > New Project, and then choose a package — I'll show you that in just a second. There's a vignette on more aspects of this, because there's always a bit more that needs to be said than what I have time for in five sentences, and packages really matter.
And that's really how we got taken up: over 2,000 packages use it now, which is just huge, and over 200 on Bioconductor. Yes, I updated this screenshot — the older PDFs from the previous workshops still had one from years ago — so these are the colors of my current session. So basically, if you want to do a package, it's really straightforward thanks to all the IDE work that RStudio put in: just go to File > New Project. The resizing always annoys me a bit — I always lose the bottom — okay, let's just do it full screen. File, New Project, and then it generally asks you: do you want to create a whole new directory, work with an existing directory, or pull something from git? We generally use New Directory, and then, depending on what you have on your machine — because I have Rcpp and RcppArmadillo installed, next to things like Shiny, I have a bit of choice here — generally what I go to is either Rcpp or RcppArmadillo. You select that and say where you want it. Say, demo, typed correctly, for the demo package; this is a subdirectory of /tmp. Then you can say whether you want to take over the current session or create a new one. You do that, and the same works for whatever you're going to call it. You get a couple of files pre-created as a combination of base R's package.skeleton() and the pieces we still put in, and then you basically see what's there. There's a very basic hello-world for Rcpp that returns a list — something we didn't get to — containing a numeric, a character, and a vector. And similarly we get one just like that for RcppArmadillo, with a vector example, I think. So if you do that with Armadillo, being vector-oriented, I made it a relatively cute little example of taking in a vector where I explicitly say: I want this to be a column vector, not a row vector.
It's a column vector, so of course it's n by 1. If I do (n by 1) times the transposed vector (1 by n), out comes an n by n matrix — that's an outer product, turning a vector into a matrix. Whereas if I come in with the column vector transposed first and then multiply by the column vector, that's (1 by n) times (n by 1), creating a 1 by 1. And the Armadillo convention is that if you want to assign the result of such a vector-matrix operation to a plain C++ atomic scalar, you have to convert via as_scalar() to be explicit. In the files generated by File > New Project there's a third function that returns a list with both products, so that's something really nice to work with if you'd like to get used to Armadillo. A little bit of context here, with a bit of hand-waving, because it's a complicated topic that is actually worth more than a tutorial's length: packages work really well if they're well enough contained, and by that we mean that if you want to use external code, you can just take it all and stick it in the src directory and build it along with your package. We did that, for example, with RcppMLPACK, using the 1.0.x series of the upstream mlpack, which just contains the full source of mlpack. That made it go stale, too, because mlpack later made changes and added external dependencies that we couldn't satisfy at CRAN, so that package fell behind, and we're just catching up this summer — that's the aforementioned Google Summer of Code project.
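The outer-versus-inner product distinction can be sketched in plain C++ without Armadillo. The names `outerProduct` and `innerProduct` are mine for illustration; the outer product stores the n-by-n result as a flat row-major vector, and the inner product is the 1-by-1 case that Armadillo's `as_scalar()` would extract explicitly.

```cpp
#include <vector>
#include <cstddef>

// Outer product: (n x 1) times (1 x n) gives an n x n matrix,
// stored here as a flat row-major vector of length n*n.
std::vector<double> outerProduct(const std::vector<double>& v) {
    std::size_t n = v.size();
    std::vector<double> m(n * n);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            m[i * n + j] = v[i] * v[j];
    return m;
}

// Inner product: (1 x n) times (n x 1) gives a 1 x 1 result -- the
// plain scalar that Armadillo makes you extract via as_scalar().
double innerProduct(const std::vector<double>& v) {
    double s = 0.0;
    for (double x : v) s += x * x;
    return s;
}
```

In Armadillo the same two results come from `v * v.t()` and `as_scalar(v.t() * v)` on an `arma::colvec`, which is exactly what the generated skeleton example shows.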
And that's the follow-up, where mlpack version 3 actually brings us an update again. In the middle sits the approach of relying on the external library being present on the system, which works but makes distributing tricky, because if it's not there, well, then you can't use it — and mlpack is not available as a system library everywhere, so that route never worked for it, whereas GSL is, and RcppGSL and others use it that way. The trick there is that you have to deal with the auto-discovery of where the libraries are on the system and record that in your package; RcppGSL is a reasonably good example of that. Or, a third and really easy alternative: do what RcppArmadillo does and pick C++ libraries that are template-header-only — libraries defined entirely in headers that don't need to be linked against. And there's a bit more to it than that. We just had a package update last week where a new vignette got added, from a paper I wrote last year, that goes over in a bit more detail the steps of creating a package with an external library — that's definitely a day-two-or-three, more advanced Rcpp topic, not really for today. And then, just before we close — because we often want to do math — I should show you a little bit of RcppArmadillo. This is a screenshot, from a while back, of Armadillo's website. It's meant to look and feel like Matlab: highly expressive C++ that just reads like high-level linear algebra code, so it's really neat — a deliberate balance between high performance and ease of use. It's very widely used, as I said, by over 700 packages already at CRAN, but there are also millions of downloads, Matlab converts, and use in other commercial projects. It has another very liberal license, so you can basically do whatever with it — Conrad is very open to that. And it's well implemented, performant, and widely regarded, and there's a lot of documentation.
We also wrote a paper about RcppArmadillo a couple of years ago; Conrad Sanderson is the author of Armadillo, so we have RcppArmadillo, and there's Ryan Curtin of mlpack. One reason why we like mlpack is that it uses Armadillo as its internal matrix and vector representation: given that we already have RcppArmadillo, getting these objects to mlpack is a really nice extension, and there will be more coming later in the summer. Again, when you work with these, you just have to add a LinkingTo: entry for the package, and the skeleton generator sets that up automatically; there are many packages, small and large, that you can look at to see how they're set up. How does all this work? Well, this takes up the earlier example of the column sums that we did with NumericMatrix and NumericVector from Rcpp. Here we're doing the same in Armadillo. Here I'm using a row vector rather than the column vector from the outer-product example. Otherwise it's very similar. The dimension information has slightly different names: here it's .n_cols — a dot and then the identifier, but no parentheses, so it's not a member function of the mat object but a member variable. You just access it, and it tells you how many columns the matrix has. In other words, the flow is the same as before: you create a vector of size n_cols, loop over all the columns, and sum each column. Armadillo, too, has high-level functions that give you the sum of a column directly rather than having to look at every single element. I find the Armadillo documentation really useful and accessible; it's quite well done. We stuck an Rcpp::depends attribute in at the top, which basically says: this is not just a plain Rcpp file, but one that needs to look in the RcppArmadillo directory as well as the Rcpp directory to find the headers — and that's all it takes for a header-only package. As I mentioned, n_rows and n_cols are slightly different names here. And here it shows how you can use a colvec.
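The column-sum flow described above — query the number of columns, make a result vector of that size, loop and sum — can be written in dependency-free C++ with the matrix held column-wise. The representation (a vector of column vectors) and the name `colSums` are illustrative choices, not the Rcpp or Armadillo API; `cols.size()` stands in for `mat.n_cols`.

```cpp
#include <vector>
#include <cstddef>

// Column sums for a matrix stored as a vector of columns: the same
// flow as the Rcpp/Armadillo versions -- get the number of columns,
// allocate a result of that size, loop over columns and sum each one.
std::vector<double> colSums(const std::vector<std::vector<double>>& cols) {
    std::size_t nCols = cols.size();       // stands in for mat.n_cols
    std::vector<double> out(nCols, 0.0);
    for (std::size_t j = 0; j < nCols; ++j)
        for (double x : cols[j])           // walk one column
            out[j] += x;
    return out;
}
```

No pointers and no explicit memory management appear anywhere — the result vector is sized at runtime and cleaned up automatically, which is the point made again in the Q&A at the end.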
And that's just that. Other things are there as well: you want eigenvalues, you just call eig_sym() on a symmetric matrix, and much, much more. There are many examples at the Rcpp Gallery, which we'll get to, as well as in RcppArmadillo itself and on Armadillo's website, and you can compare the output with what one gets from R itself. There's a little math subtlety here: eigenvalues are defined as a set of values, not an ordering. So this is a correct and equivalent answer, because if you get 0 and 2 back for this matrix, it really is the same as 2 and 0 — just ordered the other way around. They're equivalent. If we had more time, I would let you work a little on the outer product of a vector that we saw, working with transposes and as_scalar(), which is there again. And again, for a package, we just call RcppArmadillo.package.skeleton(), or do it from RStudio via the same wizard as earlier. If you're on a Mac you may have to do a little extra work at times — sometimes the Fortran toolchain is out of date, at other times you have to do extra work for OpenMP — whereas on Linux it just works and flies, no issues. And in closing, one more example of something motivating that's really dear to my heart. I first wrote this function — actually I may have written it before we had RcppArmadillo, when Rcpp was really embryonic, not quite there yet. There's a fellow who writes to the r-help mailing list every now and then, a financial economist, which was once my background too, and in economics we sometimes worry about the power and size of tests and simulate them. He wanted to do a lot of lm fits, but needed not just the point estimates but also the standard errors of those estimates to evaluate them — and those you can't get back from R's fast lm.fit function.
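The order-invariance point about eigenvalues can be checked with a tiny closed-form example. For a symmetric 2-by-2 matrix [[a, b], [b, c]] the eigenvalues are (a+c)/2 ± sqrt(((a-c)/2)² + b²); the matrix [[1, 1], [1, 1]] used here as an illustration (my choice — the transcript doesn't name the matrix) gives exactly 0 and 2. The function name `eigSym2x2` is hypothetical.

```cpp
#include <cmath>
#include <utility>

// Closed-form eigenvalues of a symmetric 2x2 matrix [[a, b], [b, c]],
// returned here in ascending order.  A library is free to return the
// same two values in either order: the set is defined, the order isn't.
std::pair<double, double> eigSym2x2(double a, double b, double c) {
    double mean   = (a + c) / 2.0;
    double radius = std::sqrt((a - c) * (a - c) / 4.0 + b * b);
    return { mean - radius, mean + radius };
}
```

So if one tool reports (0, 2) and another (2, 0) for the same symmetric matrix, both are correct — exactly the point made above about comparing Armadillo's output with R's.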
So the idea was to write a faster one, which I first did with the GSL — and, not to give the story away, the performance I got with that code was actually going to be a little disappointing — but anyway, I carried it over to the other examples too, and this was the very first version that we had in RcppArmadillo. This was before we had Rcpp attributes, contributed by JJ, I think around 2012 or 2013, a couple of years in. Anyway, just about the time the book was finished, so it just barely sneaks in. In the beginning we still interfaced explicitly with SEXP rather than typed objects, with try/catch blocks; all of that went away. The next version is already much more readable and concise because we're working in R's types directly, but we're still doing a double dance: we come in with Rcpp vectors and matrices that we then assign to Armadillo types in two steps. That was the way to do it efficiently, because going from R to these Armadillo types naively is a copy, and we set these up as zero-copy by passing false — as in, don't copy. It took another iteration, and some of the always-excellent work of Romain, to get to the third generation at the very end. We just have arma::mat X and arma::colvec y — call it "regress y on X" — going directly through everything in Armadillo types. The actual regression, in Matlab-style lingo, is the solving of a linear system of X for y; the residuals are then y minus X times coef, just as you would write it on paper. It's beautiful. You can extract the scalar sigma squared as as_scalar() of residuals-transpose times residuals, scaled by the degrees of freedom, get the standard error estimates out, and with that return both the coefficients and the desired standard errors — which, again, lm.fit does not have as an element. That was a bit of a discussion of how the interface changed, and how performance changed; we looked a little at the performance of all of these. This is for a small example with 5,000 runs.
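The sequence of steps in that third-generation fastLm — solve for the coefficients, form residuals as y minus the fitted values, estimate sigma² as the residual sum of squares over the degrees of freedom, then the standard errors — can be sketched for the one-regressor-plus-intercept case in plain C++ with closed-form formulas. `simpleLm` and `LmFit` are hypothetical names; this is not the RcppArmadillo implementation, just the same arithmetic.

```cpp
#include <vector>
#include <cmath>
#include <cstddef>

struct LmFit {
    double slope;  // coefficient estimate
    double se;     // its standard error
};

// Minimal OLS of y on x (with intercept): slope = Sxy/Sxx, residuals
// = y - fitted, sigma^2 = RSS / df, se(slope) = sqrt(sigma^2 / Sxx).
LmFit simpleLm(const std::vector<double>& x, const std::vector<double>& y) {
    std::size_t n = x.size();
    double xbar = 0.0, ybar = 0.0;
    for (std::size_t i = 0; i < n; ++i) { xbar += x[i]; ybar += y[i]; }
    xbar /= n;  ybar /= n;
    double sxx = 0.0, sxy = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        sxx += (x[i] - xbar) * (x[i] - xbar);
        sxy += (x[i] - xbar) * (y[i] - ybar);
    }
    double slope     = sxy / sxx;
    double intercept = ybar - slope * xbar;
    double rss = 0.0;                       // residual sum of squares
    for (std::size_t i = 0; i < n; ++i) {
        double res = y[i] - (intercept + slope * x[i]);
        rss += res * res;
    }
    double sigma2 = rss / (n - 2);          // two estimated coefficients
    return { slope, std::sqrt(sigma2 / sxx) };
}
```

The general p-regressor version is what `solve(X, y)` in Armadillo handles, but the flow — coefficients, residuals, sigma², standard errors — is identical.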
These are all roughly equivalent — not really changed that much — with alternative cases that lag a little behind. But the big uptick is this: when, a little later, I tried to be cute and provide these functions not just in matrix-and-vector form but the way we like it in R, with y ~ x invoked as a formula, the performance takes a massive hit, and all the gains we got from doing the regression in C++ are lost, because we have to do so much R work beforehand to unpack the formula and create the X and y arguments from it. That's a really nice illustration, again, of: be careful — measure, measure, measure — and think through what you implement, because sometimes you may do all the work to get a speed gain from compiled code and then lose it through the way you invoke the code. The devil is always in the detail. And here, just as a replication, all these variants are run again in an example file which is, I think, still in the package; they're more or less equivalent, and the more modern version with attributes is as quick as the other versions. So that's basically all I had — time is up — it's just further references now. The package has a bunch of PDF vignettes and help pages. Some of these vignettes have been through peer review and came out as papers: Rcpp and RcppEigen have one each in JSS — the RcppEigen one was with Doug Bates — and the RcppArmadillo paper with Conrad is in CSDA. And then James and I have an updated Rcpp introduction in The American Statistician. There's a mailing list, pretty low volume and no-nonsense, that you should absolutely use; as a standard spam defense, you need to be subscribed to the list to post. We also fairly religiously answer solid questions on StackOverflow. And there are a number of blog posts and other things throughout the web. Another brilliant JJ idea was to set up a site called the Rcpp Gallery.
This predates the Hugo and blogdown days, so it was still built with Jekyll, but by now it holds 110 or so little stories — two-page descriptions of a particular problem and how to solve it. I put in the initial ones on how to use Rcpp, C++11, and other things, so there are quite a few gems there, and because it's been there so long it has relatively decent Google juice — when you search for something, it's a good resource. I have a book out that you can get from a university library or order directly; it expands on some of the things I covered here, and you can see the amount of detail there. The examples are still mostly pre-attributes — attributes are covered only a little — but that doesn't really take away from the general coverage. And then here is my usual final slide with the link to where all the presentations, including this one, live on my website, and where you can reach me by email, on GitHub, or on Twitter. And that's really all I had. Yeah, with that, I think I'll just close and hand it back — maybe keep the Zoom open and try to hit a question or two as I can, or else invite you to ask them on StackOverflow or the mailing list. That's all I had. Thanks for hanging with me for two hours. — Thank you, Dirk. We still have an unanswered question in the chat: what is the difference between Rcpp and STL vectors? — I saw that out of the corner of my eye. I touched on it really briefly in the slides, but in two more minutes: you really have to step back and realize where each is defined. STL vectors are the C++ vectors — professionally written, rewritten, and reviewed library code that you can't really get any more efficient — so that is always what you should use as a vector in C++.
There are guaranteed performance characteristics due to the implementation, but it's the C++ one, and it lives with its own memory allocation and management and all the rest of it. So in order to get them to and from R, we always have to make a copy. That's just the price of the transition, and we're willing to pay it because it's very important to have access to them: you can instantiate them directly from R via Rcpp as standard vectors of int and double types, and even other types. It's important because a lot of C++ code out there will expect you to hand it, say, an STL vector — or an STL list, and maps, and the other containers we cover as well. But they live in different spaces, also physically in the implementation, which means the limitations can differ; they are closely interoperable, but not entirely. So there are limits to what you can do. I don't think we ever allowed you to mix them: you can't just write an Rcpp vector as the sum of x1 and x2 where one is an STL vector and the other an Rcpp vector. So basically: if you don't need the STL vectors, because the code you're writing to solve a problem is self-contained and you're expressing everything in Rcpp vectors, you may want to stick with those; but whenever you're en route to interfacing other libraries via Rcpp, you'll probably want STL vectors. They have slightly different goals, aims, scopes, and certainly implementations, but in order to be comparable they share a bunch of identical operations — I mentioned size() and length. And the STL vectors have very well implemented operations that append at the end or the beginning — push_back and, on related containers, push_front — and appending at the end is an amortized O(1) operation because of the way these vectors are implemented and lay out their memory.
We offer the same functions for the Rcpp vectors, but warn directly that, because these are R objects, which had a simpler memory representation when we wrote them 12 or 13 years ago, appending a single element and growing them is expensive. So you shouldn't use push_back and push_front on Rcpp vectors — use STL vectors instead: if you need to dynamically change the size, you're better off with the STL ones. The devil's in the detail, but it's an important question — a deep question — and among the 2,000-plus StackOverflow questions there will be at least 100 dealing with the STL vectors. It requires a bit more study, but by and large they're similar yet distinct, and distinct for a reason. I hope that helps. — We have three more questions in the chat; I don't know if you prefer to read them? — Yeah, I can see them now, so let me. So: how do I manage pointers? That's loaded, because I said in the presentation earlier that you generally don't need to know about pointers, because you get these STL containers or Rcpp containers. For example, I showed you, when we did the column sums of a matrix, that we just query how many columns there are, then set up a result object — a vector of that size, if the variable is called n. So no pointers anywhere, and no explicit memory management: I just say, if this object has 10 columns, I make a result vector of size 10. It's dynamic, it happens at runtime, and I don't have to manage anything behind it, so in that sense it's automatic. If you actually do need pointers, there are also Rcpp types to deal with them: R has a concept of external pointers, and we support those as well, because sometimes you have to do more involved work. But basically, the simple stuff is covered automatically and you don't have to go there, and the more complicated stuff is possible. So that's that.
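The amortized-O(1) claim for `std::vector::push_back` can be made visible by counting reallocations as the vector grows: capacity jumps geometrically, so the number of reallocations grows logarithmically while size grows linearly. The helper name `countReallocations` is mine for illustration.

```cpp
#include <vector>
#include <cstddef>

// Push nPush elements and count how often the vector reallocated its
// buffer (detected by a change in capacity()).  Geometric growth keeps
// this count small, which is why push_back is amortized O(1).
std::size_t countReallocations(std::size_t nPush) {
    std::vector<int> v;
    std::size_t reallocs = 0;
    std::size_t cap = v.capacity();
    for (std::size_t i = 0; i < nPush; ++i) {
        v.push_back(static_cast<int>(i));
        if (v.capacity() != cap) {   // capacity changed => reallocation
            ++reallocs;
            cap = v.capacity();
        }
    }
    return reallocs;
}
```

An Rcpp vector, by contrast, grows by copying the whole R object each time, so an equivalent loop would pay a full copy per append — which is exactly the warning above against `push_back` on Rcpp vectors.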
Now Joachim's question about calling an external library — two or three hours of workshop is really too short for that; one day we should just do a webinar only on external libraries. Basically, you have the three cases I had in the three bullet points. Taking a full copy and sticking it in the src directory is definitely one way to go about it; it does of course require that all ingredients needed to compile the external library are already present. If that library itself depends on something else, well, then you have a recursive problem — if it depends on Qt or GTK for GUIs, or some other science stack, you have to bring that in as well, so it can get complicated. Structures and classes are handled: you have the ability to write automatic converters for the way in and out. We call those as and wrap: as<>() converts something coming in from R, and wrap() takes something from C++ and represents it as an R object. It's a bit involved but entirely doable. Some of the packages show it — RcppGSL, for example, shows how to take R vectors and map them into GSL vectors, and there's an entire vignette on that topic. It's a really tempting route, because you may have an interesting library and want to bind to it, but it's not the easiest one. A good write-up is on the Rcpp Gallery under the title of custom as and wrap converters, working through a relatively simple structure. In essence, the magic of Rcpp is that the C++ compiler will invoke these translators for you automatically, each and every time — you don't have to do anything, but they have to be there. So you have to write them in the first place. Because we wrote them for things like RcppArmadillo, you can just say, on an interface, "I have a numeric vector", and the compiler will just say: sure, you have it.
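The converter pattern just described — an "as" direction into the library's representation and a "wrap" direction back out — can be sketched in dependency-free C++. Everything here is hypothetical: `lib_vector` stands in for a C library's type (think of a `gsl_vector`-like pointer-plus-length layout), and the free functions mimic what custom `as<>()`/`wrap()` specializations do, minus the template machinery that lets the compiler pick them automatically.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical C-library vector type: raw pointer plus a length.
struct lib_vector {
    double*     data;
    std::size_t n;
};

// "as" direction: build the library's representation from ours (a copy,
// just like crossing from an R object into an STL/library container).
lib_vector as_lib_vector(const std::vector<double>& v) {
    lib_vector lv{ new double[v.size()], v.size() };
    for (std::size_t i = 0; i < v.size(); ++i) lv.data[i] = v[i];
    return lv;
}

// "wrap" direction: bring a library-side result back into our world.
std::vector<double> wrap_lib_vector(const lib_vector& lv) {
    return std::vector<double>(lv.data, lv.data + lv.n);
}

// The library side owns the buffer, so it must be released explicitly.
void free_lib_vector(lib_vector& lv) {
    delete[] lv.data;
    lv.data = nullptr;
    lv.n = 0;
}
```

In real Rcpp code these become template specializations of `as` and `wrap` so the compiler inserts the conversions at every interface boundary without you calling them by hand.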
So you basically have to fill in, for things the compiler hasn't seen — libraries we haven't set up — the missing link connecting that library's representation and R's. If that's there, and written correctly, then it just works and it's magic, and then you can put it on CRAN, publish a paper about it, become famous, whatever. So there's definitely a mechanism for you to implement this, but no free lunch — it's not written for you automatically. I think that was the third one, right? Alright, good. Yes, great questions. Thanks to everybody for sticking to the end; for those who didn't, this was taped, so you can get the recording in a while, and again the links are there. Come and ask on the mailing list, StackOverflow, r-package-devel, and other places; we try to help as we can. Okay. — So thank you everyone for joining us. I just put the link to the other tutorials in the chat in case you want to join one of those. Thank you again for joining us today; we're going to post the link to the video in the next few days, along with the link to the slides on the Meetup page, so you can check them there. So thank you everyone, and see you another time. Bye. — Thank you for being a lovely host and setting this up. Thanks so much.