Good morning, everybody. As Krone has explained, I'm Susan McKeever. Just to explain a little more about my role: for part of my job I work in CeADAR, which is a data analytics group. We work with member companies, and I'm a principal investigator with them, so I do projects with companies whereby we help them with data analytics-based projects. In my day job, I'm a lecturer in computer science, and my research area, as Krone explained, is data analytics. OK, so what I wanted to do first was to talk about some questions that companies ask me when we're going through the area of data analytics with them and what they might do. Now, given that I'm a researcher on the technical side, the questions have that focus, so allow for that. But these are the sorts of common questions I get. This first one is fairly basic, but it's no harm to go over the terminology, because one of the things I get asked by companies getting into the area is: are data analytics and machine learning the same thing? To put a picture on it, machine learning is an area within data analytics; data analytics is the wider circle. There are other areas within data analytics, such as data cleaning: you need to have good data. Visualization is absolutely key, whereby good visualization lets you see what you're actually telling your users. Then there's domain expertise, the involvement of the domain experts who know about the data, and various other areas. But machine learning itself, a major area within data analytics, boils down to clever algorithms that spot patterns in data. That's what machine learning is: clever algorithms that spot patterns in data. To break that down a little, there are two big areas in machine learning. I won't say they are the only two, because there are never only two, but the first one is clustering.
Clustering is about finding natural groupings in your data. To translate that into plain English: on the left-hand side you feed in information, such as the set of all your customers, and your clustering algorithm will spot the patterns that naturally group them. You might have customers purchasing certain types of products at certain times of day; you might have customers grouped by other aspects. It will find those for you. And it does that in what's called an unsupervised way, hence the blindfold: you're not labelling data, you're simply feeding in information, and your clustering algorithms do the rest. Good examples of data for that would be customer data, as I mentioned, sales data, or insurance claims data. For example: who's claiming? What are the natural groups of people making claims in your insurance sector? And are there ones that stick out in a particular group, which may relate, for example, to fraud? So that was clustering. The other major area is the one covering prediction, classification and ranking, which I've put together. What that's about is how we as humans learn. Look at the picture on the left: the baby is reaching for the star and somebody is saying to the baby, "Look at the star." Over the next year or two, the baby will see different forms of stars and it'll know what a star is, that it has five points on it and so on. It'll know what the features of a star are. It's learning through examples, which is exactly what we're doing with machine learning. When machines learn, instead of examples, we just call it data. But that's what it is: examples.
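Before going further into the supervised side, the clustering idea described above can be sketched in a few lines. This is only an illustration, not anything from the talk itself: the "customer" features (spend, visits) are invented, and k-means from scikit-learn stands in for whatever algorithm a real project would choose.

```python
# Illustrative sketch: clustering toy "customer" records with k-means.
# The two features (monthly spend, visits) are invented for the example.
import numpy as np
from sklearn.cluster import KMeans

customers = np.array([
    [20, 1], [22, 2], [19, 1],       # low spend, few visits
    [200, 10], [210, 12], [190, 9],  # high spend, frequent visits
])

# No labels are supplied: the algorithm finds the natural groupings itself.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = model.fit_predict(customers)
print(labels)  # e.g. the first three customers in one group, the rest in another
```

The point of the sketch is the absence of labels: nothing tells the algorithm which customers belong together, yet the two natural groups fall out of the data.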
All supervised machine learning is doing is spotting patterns from known, labelled examples and then using those to predict for the next set, which aren't labelled: "Oh yes, I recognize that, that's a star; that's a customer about to leave our organization." OK, to continue on with that, some practical examples of supervised machine learning in the area of prediction, classification and ranking. In the insurance claims sector, we're about to start a project whereby we work with a company to help them do more on the side of identifying fraudulent claims. To do that, we'll be going back through the data to see which claims were fraudulent and which weren't, and building algorithms to try to decipher the rules and patterns that lead to one versus the other. So when another claim comes in, the algorithm will be able to say, yes, it's this percentage likely to be fraudulent or not. Likewise, machine learning is used heavily in the medical domain. For example, with reading x-rays: if you feed in examples of problematic x-rays versus healthy x-rays, you can build your model. The same goes for customer churn, and in each of your companies you will have examples of things that you want to classify, predict or rank in some sort of order. OK, the second question is: where do we start, and what data should we be analyzing? Is our own company data enough? There are a couple of angles I would emphasize there. The first one is: know thyself. In other words, know what data, and what quality of data, you have when you're starting. This is on the technical side. How clean is your data? How much of it is missing? How noisy or inaccurate is it, and so on? If the data you feed in is not clean, then you're not going to get clean or correct answers out. No more than with the baby and the star.
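The fraud-scoring idea described above, learning from labelled past claims and then scoring a new one, can be sketched as follows. Everything here is invented for illustration: the two claim features, the labels, and the use of a decision tree are assumptions, not details of the actual project.

```python
# Illustrative sketch: supervised learning on labelled insurance claims.
# Features and labels are made up; real projects would use historical
# claims marked fraudulent/genuine by domain experts.
from sklearn.tree import DecisionTreeClassifier

# features per claim: [claim_amount, days_since_policy_start]
X_train = [[100, 400], [150, 380], [9000, 5], [8500, 3]]
y_train = [0, 0, 1, 1]  # 0 = genuine, 1 = fraudulent

clf = DecisionTreeClassifier(random_state=0)
clf.fit(X_train, y_train)  # learn the patterns from labelled examples

# a new, unlabelled claim comes in: large amount, brand-new policy
prob_fraud = clf.predict_proba([[8800, 4]])[0][1]
print(prob_fraud)  # the "this percentage likely fraudulent" score
```

This mirrors the shape of any of the examples mentioned, fraud, x-rays, churn: labelled history in, a probability for the next unlabelled case out.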
If you gave the baby a jaggedy-looking star that didn't really look like a star, then it wouldn't understand what it was trying to learn. OK, the second thing I would emphasize is domain experts. They tend to get devalued out of the machine learning conversation, but domain experts are absolutely key: the people in your company who understand what the information is about. And labelled data: if you're going to do supervised machine learning, then you need the ability to know which were the good examples, the bad examples, or whatever labels you're going to need. Now, the second angle on data I would emphasize is that it's not just your tabular, database-based data. It's also your unstructured data. I'll explain a little more on that in a second, but unstructured data is the stuff that isn't naturally sitting in tables in your data stores. What I sometimes say to companies is: imagine you had a hypothetical army of ultra-smart people who could read every customer email, listen to every customer conversation, and read every document you produced. What might they be able to find? That's what your algorithms are doing for you. The third thing I would mention is the data outside your company, in the outside world, and that's probably going to be big data. Big data is the term for the endless supply of data coming out of social media and sensor-based IoT devices. External data matters in your business too. For example, if you're an insurance company and you have telematics units going into cars, with sensors that let you monitor driving styles, then you can say whether someone is a risky driver, and perhaps reconstruct what happened in an accident, and so on. Every company is going to find external data giving them more information, whether it's demographics, weather, financial markets, or patient-related data in health.
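Coming back to the "know thyself" point, a first pass at sizing up data quality is easy to automate. This is a generic sketch with an invented toy table, not anything specific to the talk; pandas is assumed purely for illustration.

```python
# Illustrative "know thyself" check: profiling missing and implausible
# values in a small invented customer table.
import pandas as pd

df = pd.DataFrame({
    "age": [34, 29, None, 120],      # one missing value, one implausible one
    "spend": [250.0, None, 80.0, 95.0],
})

missing = df.isna().sum()                 # missing values per column
implausible_ages = int((df["age"] > 110).sum())  # noisy/inaccurate entries
print(missing["age"], missing["spend"], implausible_ages)
```

Checks like these are the cheap, unglamorous step that decides whether the clever algorithms downstream get clean answers out or not.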
Okay, the third question, which has come up more frequently in the last year: do I need to be thinking about unstructured data in my business? Is it the next big thing? What I would emphasize is this. Structured data is the stuff sitting in your databases, in nice tables with nice columns, easy to work with once you've cleaned it; that's the traditional data. Its features, features being the five points on the star and so on, are predefined: they're your columns. And those are handled with the traditional machine learning techniques, algorithms you'll have heard of: SVMs, naive Bayes, decision trees and so on. Unstructured data, on the other hand, is the data that doesn't naturally lend itself to that. It's your images, your text, your audio. There are no tidy tables. And you need to know what yours is in your business. So, for example, take the Irish Times: the comment data posted against their articles online is unstructured data they may want to explore for some reason. But first of all you need to know what yours is. Is it customer emails, free-text fields, comments, call-center conversations, social media posts and so forth? There's buried treasure in there if you can get at it: things hidden amongst the text and the information that are much harder to exploit, although we have the techniques now. The reason I'm showing this happy man is Eigen Technologies; this is just one example of many hundreds of companies that are getting funding in this space. He's turning, as he calls it, documents into data with his company, and they've just received very good funding for that. He is actually opening the treasure chest.
And he's discovered that they can create a company that appeals right across the sectors with unstructured data. OK, the last question I have is: everyone seems to talk about deep learning; should my company be using it? Deep learning is everywhere. To put it in context, deep learning is a sub-sector within machine learning, a category of machine learning based on neural networks, which have been around for a long time. The reason deep learning is making such inroads in machine learning is that it's great for unstructured data. It's great for analyzing video, text and images, because it doesn't need to be told what the features are: it figures that out. It effectively works out the columns for you; it does that work for you. And there's an increasing number of frameworks out there, whether it's Google's TensorFlow or Microsoft's Cognitive Toolkit and so on, that allow you to do off-the-shelf deep learning if you're a company that wants to try it out. So the actual application of deep learning within companies is getting much more accessible. The caveat, of course, is that you need a lot of data to do deep learning. A lot of data. Now, there are exceptions to that, for example if you're using pre-trained vector models for text, where you can bring some of those models in, but that's a whole other conversation which I won't go into now. OK, so with those questions in mind, what we do in CeADAR is develop software demonstrators. They're basically prototypes where we address common data analytics problems, using state-of-the-art techniques to crack the nut on particular problems that our member companies tell us about. We've built up about 35 to 40 demonstrators over the last five years.
So I'm going to very briefly bring you through three of them. OK, the first sample demonstrator, built last year and particularly popular: every company needs to be able to generate artificial data, whether it's because you want to do a sales demo, test systems, or run training, and you usually need to do it quite quickly, under a deadline. With this in mind, we built a demonstrator called Datagen. If you have existing company data, such as a customer table, you can put it into Datagen, and in a very clever way it will quickly figure out what all the features are, what the relationships and correlations are, and generate a synthetic copy. Except it's not the real data, which is great, because then you don't have to worry about privacy or anonymization issues. You can of course also use it in manual mode if you want to handcraft the features, which is very useful too. The member company we worked with most closely on Datagen, Analytics Store, now offers a synthetic data generation service, and they're one of several companies involved in it. OK, the second demonstrator I want to mention is one called Docopool, a tool that focuses on analyzing text. Imagine you're a company with many text documents, for example emails coming in from your customers, or the descriptions customers put into claims, but you don't really know what's in them, and you would like an easy way to explore them. What Docopool allows you to do is load in all the text documents, and it will automatically slice them up and figure out what the topics are.
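The topic-discovery step just described can be sketched with standard tooling. To be clear, Docopool's actual internals aren't given in the talk; this uses off-the-shelf latent Dirichlet allocation from scikit-learn on four invented one-line "documents" purely to show the shape of the idea.

```python
# Illustrative topic discovery over a pile of text documents, in the
# spirit of the Docopool demonstrator (its real internals are not
# described here; this is plain LDA on invented mini-documents).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "asbestos exposure claim building site",
    "asbestos claim roofing insulation",
    "car accident claim rear collision",
    "car collision accident motorway claim",
]

counts = CountVectorizer().fit_transform(docs)  # word counts per document
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)  # per-document topic mixture
print(doc_topics.shape)  # one topic-mixture row per document
```

Each row is a document's mixture over the discovered topics, which is exactly the kind of output that can then be drawn as the circles of grouped documents on the slide.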
So if you look down on the bottom left, the topics are the various circles, and it tells you, well, this one was about asbestos claims, this group of documents was about something else, and it does that by itself, in a sense, through the algorithms we apply. OK, the third demonstrator, three of three so you know it's the last one, is one we've just released called Deep Learning for Opaque Data. What we wanted to do was make deep learning accessible for companies, because it's quite a big deal to get into, and allow easy creation of deep learning models without your needing to be an expert user. So we've made this very accessible demonstrator that does that, lowering the barriers to entry for trying out deep learning on your data, typically your unstructured, opaque data. For example, an image file, or a binary file from a log, is not easy to break down into features for machine learning; that's why we're calling it opaque data. All of those demonstrators, and other sample CeADAR demonstrators, are available in the poster area later if any of you want to discuss them. OK, so that was a whirlwind tour through various questions and demonstrators. I don't think you can ask me questions right now, so that's just a token slide. Thank you very much for listening. Okay. Thank you, Susan. That was great.