Thank you, George. First of all, can you all see my screen? OK, I have a reply. OK, great. Perfect. So hi, everyone. Thank you all for joining today. I would also like to thank the ArcRessortium for having us, especially Colleen and Laura, for organizing this event. I'm really excited about this presentation, as this topic has real and profound implications. As you all know, companies in general, and I can assure you, insurance companies in particular, spend a lot of time and resources rewriting the same reports over and over again, often following the same processes for many years. But many of these tasks can be automated and significantly improved with more up-to-date tools like R Markdown or Quarto. In this session, I will provide a brief overview of R Markdown and of the messy workflow, a story inspired by real events. The point is to show how R Markdown can improve such a messy process. Then, for the bulk of the presentation, I will share a report generated with R Markdown by querying the database of a fictional insurance company. And don't worry if you don't have a background in insurance: insurance, like any other business, earns revenues, called premiums, by selling a product, in this case policies. And we can think of claims as manufacturing costs. So what we show here applies to any business or any environment, say your school or hospital, with different attributes and metrics. But the idea is always the same: to produce reports automatically from a given dataset that can help the business make the right decisions in a timely manner. So let's start by asking, what is R Markdown? Well, R Markdown enables us to create dynamic documents, presentations and reports with R. It combines the core syntax of Markdown, which applies formatting elements to plain text, with R code that will be included in your final document. I simply like to describe it as a text editor that supports R code.
And the reason why it is so powerful is that it enables these documents to become reproducible. And by reproducibility we mean the document can be automatically regenerated whenever the underlying code or data changes. So some benefits are immediately obvious. It is less prone to error: if the document is automatically regenerated, there is less manual work, hence fewer mistakes. It also makes it easy to adjust your code or data and rerun all the analysis. So you're much more likely to explore different ideas, since you don't have to invest too much effort to see the outcome. It also makes it easier to share your work with others, who can either peer review it or further extend your research. And if you're like me, then by others I'm also including your future self. Reproducibility has become extremely important in scientific research, as there have been many cases of scientific misconduct over the last few years. But reproducibility is also crucial in a business environment. We can easily see the importance of reproducibility for a clinical trial in a pharmaceutical company or a pricing model in an insurance company. But when it comes to automating a recurrent report, such as a quarterly financial statement or a monthly sales report, instead of a one-time technical study, it's not always possible to have the final document fully reproducible. Either some parts require careful wording, or we may have to add or remove sections, or any other reason. But the goal is always to automate as much as possible and only break the chain of reproducibility when we really have no other choice. Now someone might say, well, that's all very interesting, but I already have an automated workflow in place. You do? OK, so let's have a look. And let's call it the messy workflow. So here's a team whose goal is to produce a monthly report using policy and claims data that is stored in an SQL server, and financial data that's available as CSV files.
The SQL server is queried with a tool called Golden Six, and the resulting files are processed by SAS, which applies filters to the data, groups, renames variables and so on. It then sends its output to Excel to run some macros along with the CSV files. And these are what I call the three-strikes macros: they are slow, unstable and, worst of all, complex to update. Now you may ask, but why don't they do that in SAS instead of Excel? Well, because Excel can produce charts and tables that are consistent with corporate branding, and in order to do that in SAS, you would need to purchase additional libraries. Not to mention additional licenses for some of the remaining team members to help migrate things. And they're not excited to learn a new language that they may not be able to use again in case they change jobs. And there's no budget for it anyway. So we are stuck with this workflow. Now all the exhibits are done; it's time to copy-paste them into Word. We have to make sure that everything aligns nicely, that we don't miss any table or leave in something that belongs to a previous version, and of course update all the remaining text. Finally, peer reviewers make sure that the totals in the text match the totals in the tables, that the numbering of figures as well as their references are all consistent, and whatnot. So plenty of systems and people are involved in the production of this report. But finally it's done. It's been a really intense week. Everyone is tired, but at least there's a sense of accomplishment. We did it. Until there's a new booking. And here we go again. It feels like Groundhog Day. And if you don't have a new booking, you always have next month to look forward to. So we are spending hours and hours doing the same things over and over in this so-called automated process. So we have to rethink what we call an automated process. And the question to be asked is: why not do it all with R?
R is an open source platform that allows us to import and transform data, build models and visuals, and communicate our findings without leaving a single application. Now, conventional wisdom would say, jack of all trades, master of none. But in this case, we have a Jedi of all trades. So let's look at an R Markdown file and see how this works. An R Markdown file contains three components: the YAML header, the text component and the code component. The YAML header is surrounded by three hyphens, and this is where we define the structure of the report, such as the output format. In this case we have HTML, but it also supports PDF, PowerPoint, Word and several other formats. Then we have our text component, to be formatted with Markdown. For example, preceding text with two hashtags makes it a level-two header, which is perfect for a section title. Or surrounding text with two asterisks makes it bold. So this is the Markdown syntax; this is what I mean when I say that Markdown applies formatting elements to plain text. But we can also format the text with a graphical user interface by launching the visual editor, which is really nice. Last but not least, we have our code component. And we can add code in two ways. One is via code chunks, surrounded by three backticks, where we just write R as we would in any other R script. Just don't forget to type r in curly brackets, and we will see why in a minute. Of course, there's a shortcut, Ctrl+Alt+I, that does it all for you, assuming you're using RStudio. The second way is what is called inline code: you type an R expression surrounded by single backticks, alongside hard-coded text on the same line. And then you just hit Knit and, voilà, our document comes to life. And the best part is, if any of the underlying code or data changes, all we have to do is hit Knit again and the document is updated accordingly. There's no copy-paste involved.
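To make this concrete, here is a minimal sketch of what such a file might look like. The title, chunk contents and variable name are made up for illustration:

````markdown
---
title: "Quarterly Report"    # YAML header between two lines of three hyphens
output: html_document        # could also be word_document, pdf_document, ...
---

## Technical Result

```{r}
# a code chunk: plain R between three backticks, with r in curly brackets
result_m <- 4.7  # placeholder value for illustration
```

The book shows a **positive** technical result of `r result_m` million.
````

Hitting Knit runs the chunk, substitutes the inline `r result_m` with its value, and renders the formatted document.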
Now you may say, that's all very nice, but what about my SQL skills, Python skills, Java skills? Well, you're welcome to bring them along, because R Markdown supports many other languages besides R. And now it makes sense why we need to type r in curly brackets. So in fact we can restate our initial definition and say that R Markdown is a text editor that supports several programming languages. So let's see how we can use this to automate a report for a P&C insurance company. Remarkable Re is a fictional insurance company that provides commercial liability. And the database of this company contains policy data, claims data, costing data and metadata. So, just a quick overview: policy data has all the information attached to each policy that is sold. These descriptive attributes are pretty self-evident. UWY is the year that the policy was written; it comes from the inception date. I think only the numerical variables require a bit of attention. Premium is just the revenue from selling policies, the sales amount. Acquisition cost is just the commission paid to brokers, so that's easy. Paid losses is really what it sounds like: what the insurance company has paid in claims. Even easier. Case reserves is the money set aside for each reported claim before it's fully paid off, and it is estimated by the claim handler. No big deal. The only odd one out here is IBNR, which stands for incurred but not reported losses: a claim has happened, but nobody has told us yet, and we still need to account for it. This is estimated on a regular basis by the reserving actuaries. The term for the total loss is ultimate loss, and it's the sum of all three of these amounts. When we say reported losses, we mean paid losses and case reserves; we don't include the IBNR. Then we have our claims data: each reported claim has a claim ID and a description, as well as the policy number that generated the claim. Then we have costing data.
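As a sketch of what those other-language chunks look like (knitr, the engine behind R Markdown, supports several language engines; the table names here are illustrative, and the SQL chunk assumes a DBI connection object `con` created earlier):

````markdown
```{sql, connection=con, output.var="policies"}
-- runs against the DBI connection `con` and stores the result
-- in an R data frame called `policies`
SELECT policy_no, premium FROM policy_data
```

```{python}
# a Python chunk, executed via the reticulate package
print("hello from Python")
```
````

The word in curly brackets tells knitr which engine runs the chunk, which is exactly why R chunks need the r.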
There's a statistical tool that estimates the ultimate loss for each policy at the moment that it's sold. And finally, we have our metadata. It just tells us how many financial periods are available. For the time being, we only have two: end of September 2021 and end of December 2021. And now we want R Markdown to generate a report by querying this data and nothing else. So what type of reports can be produced with this data? Let's look at an example of a document knitted to Word. Word is great if you want to post-edit your document. And if you want to knit to Word or PowerPoint, you most certainly want to use the packages from the officeverse. We note that the document is consistent with corporate branding, from the colors and font type to the logo. And this is done in the YAML header by pointing to a reference file, so that our output adopts the same styling as that document. The YAML component allows for several other options. For example, toc instructs Word to display the table of contents. Another great feature is that figures are numbered automatically, and it supports cross-referencing, so we never have to struggle with these things ever again. We note that in the introduction, "Remarkable" is written in red, which is a formatting style not supported by Markdown; you can't just color a little piece of text with Markdown. And as Word does not support HTML, we have to use the officer package to do this. And here's the code for how to do that. But the main idea is that, even if a feature is not supported by Markdown, chances are that someone in the extensive and resourceful R community has already provided a solution. Since we have current-quarter data and last-quarter data, the first thing we want to know is the technical result. But let's look at the document itself. Here we go. And what is nice is that this is a native Word table; it's not an image.
And this was rendered with the flextable package from the officeverse. And then we have the comment: the book experienced a positive technical result of 4.7 million for contract years 2014 to 2021. The word positive is determined with a simple if-else condition, while the 4.7 million and the contract years we get by subsetting the data. And then we use inline R code to write the sentence. This approach is used over and over throughout the report. But we can think outside the box and take this concept a step further. And this is what I have here with this sentence: the positive result is mainly driven by IBNR releases due to low loss emergence. Now, how does it know that? The short answer is that IBNR contributes 4.5 million to the result and no other number comes close. The longer answer takes me to these additional slides. This is a concept I came up with several years ago, when I needed to comment on the financial impact of my reserving portfolios, which was very time consuming, and it was actually implemented in Excel. The main idea is that we start with a series of amount types that add up to a total financial result. If the sign of an amount matches the sign of the total, then the item is a driver. If it has the opposite sign, it is an offset, because if there are large movements in the opposite direction, I want to mention them. Then we can normalize our values, for example by taking the absolute amounts and dividing each absolute value by the absolute total. Next, we split the table in two, one with the drivers, the other with the offsets, and we sort them in descending order. On the side, we create a series of prewritten sentences. Some of these items can have either a positive or a negative impact and therefore require a different sentence for each case. Then it's just a matter of performing a VLOOKUP to add the right sentence to the drivers table. Now we define a threshold, say 0.15, and only include the items whose normalized amount exceeds the threshold.
To make the text sound human, we can add adverbs and adjectives. These labels of magnitude depend on the value of the normalized amounts: for example, if a normalized amount is between 0.5 and 0.85, the adverb mainly will be added to the text. So now we are ready to start a loop that will glue our sentences together, and then we apply the same logic to the offsets. We can also go to a more granular level than the table that's in the report. If new business is a driver, we can start a loop displaying the top new contracts, for example, to add insights that the reader cannot simply get from the table. If your report contains many of these comments, the text will sound quite repetitive, so I like to add some randomness to certain words: for example, mostly and mainly, or the connectors and, as well as, and but also, can all be used interchangeably. For my reserving comments, it took many iterations before I covered most of the combinations that reality can come up with, but at some point it stabilized, and then it was absolutely magnificent. So it is possible to create a sophisticated comment algorithm with only if-else conditions. Well, let's go back to the report. Next, we show the top 10 loss movements. And what is nice is that if it's a new loss, it will mention that it's new, and if it's not a new loss, it will say whether it is an improvement or a worsening, as well as the total to date. And this is simply done with a loop that links the words and sentences together. Then we have the inception-to-date results, showing the main KPIs by underwriting year. The combined ratio is just the percentage of premium that is used to pay out commissions and losses, and it's automatically highlighted in yellow whenever it exceeds 100%. From the text, it should by now be easy to identify which words are hard-coded and which ones are added with inline code. Then we have the same, but by region.
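The driver/offset comment algorithm described above can be sketched in plain R. The amounts, labels and thresholds here are illustrative, not the actual implementation:

```r
amounts <- c(IBNR = 4.5, `paid losses` = -0.3, `case reserves` = 0.5)
total   <- sum(amounts)                            # 4.7, a positive result

drivers <- amounts[sign(amounts) == sign(total)]   # same sign as the total
offsets <- amounts[sign(amounts) != sign(total)]   # opposite sign

norm <- sort(abs(drivers) / abs(total), decreasing = TRUE)  # normalize
kept <- norm[norm > 0.15]                          # apply the threshold

# magnitude label depending on the normalized amount
adverb <- function(x) if (x > 0.85) "almost entirely" else
          if (x > 0.5) "mainly" else "partly"

# a pinch of randomness so repeated comments don't sound identical
driven <- sample(c("driven by", "explained by"), 1)

phrases <- paste(vapply(kept, adverb, character(1)), driven, names(kept))
comment <- paste("The positive result is", paste(phrases, collapse = " and "))
```

The same loop would then run over `offsets` with its own set of prewritten sentences, exactly as described above.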
In this next section, we compare the expected loss ratio coming from pricing, the initial one, which is the black dots, with the latest evaluation coming from reserving, which is the red dots, for each underwriting year, including an overall average. So this is an actual-versus-expected type of analysis. While the average difference is close to zero, the two are trending in opposite directions. So the pricing actuaries and the reserving actuaries should come together and agree on how to assess these risks going forward. But because the average difference is very small, the adverb only is added to the comment, again to make it sound more human. Then we have the same view, but by region. And what is nice is that we can insert links to a particular section; this is a native Markdown feature. So clicking here takes me to Appendix A, which shows the same information, but by industry. Because there are many industries, I swap the axes and only highlight those that have material differences. In all these charts, the idea is that the reader should not have to make a lot of effort to interpret the information. Next we have our top-five section. For these tables, I used another package called kableExtra. These are no longer native Word tables; they are images, so they can't be edited anymore. And in the last three tables, I use one of my favorite features of kableExtra, which is the ability to include mini plots in cells that already contain figures. For example, in the most-profitable table, we see that Luka Modric was not renewed in 2021, despite being the third most profitable insured. But by looking at these mini plots, we can have a good guess why that was: the premium and technical result have been declining every year and even generated a small negative result in the last year. So while the overall result looks good, it was not an attractive risk to write this year.
On the other hand, Claudio Marchisio was renewed in 2021, despite being quite unprofitable. But why was that? Well, because it had only one claim out of four policies, which can happen; that's why people buy insurance. The premium was adjusted, and hopefully it's just a risk that got unlucky. A more controversial decision, however, was to renew the least profitable insured of the portfolio, which generated eight claims out of six policies in only three years. But the main takeaway is that it's really amazing how much information we can derive from these tables by mixing numbers with mini plots. I find it very powerful. In the next plot, each dot represents a reported loss and is colored according to its underwriting year. The x-axis represents the value as of last quarter, while the y-axis represents the value as of the current quarter. For example, we see that the largest loss in the portfolio, around 4.3 million at the top right corner, did not move from one quarter to the other, since it lies on the identity line. If a dot is below the line, the loss has decreased, and if it's above, it has increased. So all losses lying on the y-axis are new losses, since they were zero in the prior quarter, and we see that the largest new loss is just under one million. So this is very useful to assess the current status and the latest movements. Then we have the same, but for case reserves instead of reported losses. Next we have a loss banding section, where reported losses are grouped into buckets by size. And here we can see that the layer of 500,000 to 2 million contains 43% of the reported amounts, which is good to know when offering higher limits or when we need to buy reinsurance, and also that 77% of the policies have not generated any loss so far. But we can visualize this information for each individual policy by underwriting year, which is very insightful. So we see here that in the first two years there were no losses at all, but these only make up 5% of the total portfolio.
As the portfolio grew, the proportion of loss-free policies decreased every year, except for 2021, but that year still contains many unreported losses. The idea is that we don't have to wait until the upper layers are hit; the activity on the lower layers can serve as a warning sign. The following chart shows exactly the same, but by region. And we note that Disneyland has the highest frequency: only 71% of its policies are loss free. On the other hand, Shogi Land has the lowest frequency, with 97% of its policies in the nothing-reported band. So what can we say about severity? Well, Neverland makes up only 7% of the total number of policies, but contributes 25% of the total reported losses, which is not that great. And all that is highlighted and explained in the text below. You can also use LaTeX with Markdown. In this section, I wanted to show that we can draw a chart for each of the regions by only adding one line of code. So let me go back to the PowerPoint. Yeah, so this is what we have here: this line is all it takes to create one plot per region. In this case, we have six regions. With Excel, you would have to create a chart for each region, copy-paste them into Word, make sure they all look symmetrical, and then repeat the whole process every time you have new data. But here, once we created our plot, we just needed to add one line of code to have it for all the regions. And of course, if there's any new region, it would just be added automatically. Let's go back to the Word file. So in this section, we have text mining. And the idea here is that because we are using R, we have an arsenal of packages that we can use to enrich our reports. It opens up a whole new world of possibilities, so we should make use of it. And because I don't assume we can automate everything, I leave this section to be populated manually. But even if it seems that these items are impossible to automate, we should never give up without a fight.
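The one-line-per-region idea mentioned above can be sketched with ggplot2's facet_wrap; the data frame here is a made-up stand-in for the policy table:

```r
library(ggplot2)

# toy stand-in for the policy table; names and values are illustrative
d <- data.frame(
  region = rep(c("Disneyland", "Neverland", "Shogi Land"), each = 40),
  band   = sample(c("nothing reported", "0-500k", "500k-2m"),
                  120, replace = TRUE)
)

p <- ggplot(d, aes(x = band)) + geom_bar()
p + facet_wrap(~ region)   # the one extra line: one panel per region
```

Any new region appearing in the data gets its own panel automatically, with no extra work.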
So the challenge is: what can we do to partially automate this section? Well, the information is likely to be available somewhere in the IT landscape, so try to get it. Now, while Word is nice for post-editing, HTML is the format that supports the most features, and it's by far my favorite format, for several reasons. So let's look at the same report, but in HTML. Note that now we are using a CSS file to style the report; this CSS file contains a corporate-approved style. But you can also set a Bootstrap theme, which is the simplest way to style your report. We can also set the table of contents; for example, the float option just means that the table stays on the left when scrolling down. For inline coloring, we can use the HTML span tag. But this is not very nice to type, so to minimize typing and memorizing, which is always nice, we can define a function to make it easy to apply. And this is what I'm showing here. But let's go to the HTML document. A nice thing to include in an HTML document is a collapsible section, for content that may be too detailed for certain readers. Another nice feature are these pop-up messages. For example, if I hover my mouse over the combined loss ratio, I can see how it is calculated, which is much cleaner than adding footnotes. But let's revisit the quarterly movement plot. Since we are using HTML, we can use the ggplotly function from the plotly package to make the chart interactive. For example, double-clicking on 2020 shows only the losses for risks written in 2020. But my favorite, though, is to hover on each dot and see detailed information about that particular loss. I can also zoom in as much as I need until each dot is revealed. So we have detailed information on all the quarterly movements at our fingertips. We can also include a searchable table: in the appendix, I have a table containing the main KPIs for each of the 636 insureds. So let's compare Cristiano Ronaldo with Messi, for example.
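The inline-coloring helper mentioned earlier might look something like this. A sketch: the function name is mine, and it only works for HTML output, where the span tag is rendered:

```r
# wrap text in an HTML span with a given color, for use in inline R code
color_text <- function(x, color = "red") {
  sprintf('<span style="color: %s;">%s</span>', color, x)
}
# in the text: `r color_text("Remarkable")` renders the word in red
```

Defining it once means no more typing or memorizing the raw span syntax in every sentence.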
Or I can also use the header filters, the numerical filters, to see those that have the highest combined ratio. So it's clear that HTML adds a lot of interactivity to our report. But as you may know, R has a library called Shiny, which allows us to create web applications. So the question is, can we incorporate Shiny elements in our R Markdown document? And the answer is yes. We do this by adding runtime: shiny to the YAML header, and now we can embed Shiny inputs and outputs in our code chunks. In one code chunk, we build our inputs using standard Shiny syntax, and for the output, we can use any of the Shiny render functions to react any time the input changes. So Shiny components really extend the interactivity of our reports. So let's look at an example. Instead of repeating this chart for reported losses and case reserves, we can let the user select the type of amount. So the reader has total freedom to choose what they want to see. However, my advice is to put these more elaborate dashboard components in the appendix, as they may distract the reader from the message we're trying to convey. For example, this is nice, but it's not very clear what I'm trying to say; moving it to the appendix allows us to have the best of both worlds. But changing the coloring from underwriting year to region, I think, is perfectly fine, and it's a nice feature without being distracting. The same goes for the word cloud: I can add a slider to select the number of words that I want in the cloud. So what we have done here is already quite remarkable. We created these reports using only the information available in the database, just by pressing a button. But for me, the most powerful feature of R Markdown is parameterized reports. And the good news is that it's quite easy to implement.
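A minimal sketch of the Shiny-in-R-Markdown setup described above (the input choices are illustrative, and plot_movements is a hypothetical plotting helper, not a real function):

````markdown
---
output: html_document
runtime: shiny
---

```{r, echo=FALSE}
# input: let the reader pick the type of amount
selectInput("amt", "Amount type:",
            choices = c("Reported losses", "Case reserves"))

# output: re-renders whenever the input above changes
renderPlot({
  plot_movements(type = input$amt)  # hypothetical plotting helper
})
```
````

The render function reacts to the input, so one chart replaces what would otherwise be two static ones.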
If the head of Seagoland comes to me and asks me to produce a report just like this one, but only for his region, then I know it's just a matter of time before the head of Disneyland asks me for the same. And before you know it, all the regions will want their own dedicated report. So instead of creating a different report for each region, I just add a parameter for region to the current report. And we do this by typing params in the YAML header. So here I created a parameter called region, and in my code, instead of typing region equals Seagoland in the filter, I refer to params region. Then the Knit button has an option called knit with parameters, which will show a menu reflecting the parameters we created, just like this one over here. So now I can just select a region from the dropdown menu to create a specific regional report. Is it possible to automate the workflow even further? Absolutely. We can use the render function and pass the parameters as a list in the params argument. It's not a big deal to use a dropdown menu for a handful of regions, but if we need to create a report for our key accounts, and let's say we have a hundred of them, well, that's going to require a lot of clicking. So in this case, it's better to insert the render function inside a loop to create as many reports as needed in one go. What else? Well, RStudio Connect, which is a paid publishing platform that we have available at Swiss Re, allows us to deploy our reports securely without IT support. So the users can access these reports with only a browser. They do not need to have RStudio or R installed; in fact, they don't even need to know that R exists. It's all based on point and click in a nicely laid out browser. And what is great is that each parameterized report can be saved in RStudio Connect, so the users can make their own selections.
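The render-in-a-loop idea can be sketched like this (the file name and region list are illustrative; the .Rmd is assumed to declare a region entry under params in its YAML header and filter on params$region):

```r
# knit one report per region in a single loop
regions <- c("Disneyland", "Neverland", "Shogi Land")
for (r in regions) {
  rmarkdown::render("quarterly_report.Rmd",
                    params      = list(region = r),
                    output_file = paste0("report_", r, ".docx"))
}
```

Swap the region vector for a vector of a hundred key accounts and the same three lines produce a hundred reports in one go.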
Moreover, we can add a mailing list and instruct RStudio Connect to run and email these parameterized reports according to a predefined schedule. Well, that is full automation: you don't even need to press a button. So, from the messy workflow to the R Markdown approach, we created reports much faster with fewer resources, but more importantly, we delivered a significantly better product. And something I like to say is that if cars can drive themselves, so can your reports. And this concludes my presentation.