So, it's really cool to be here. I'm going to spend about half an hour, maybe 35 minutes, talking a little about what we're working on, but focusing not just on what we're doing but also on our journey. It will be mostly a business perspective, but I'll bring in the tech perspective too, and I'm happy to take more in-depth technical questions towards the end, although I'm not the CTO of the company, so there's a limit to what I can speak to. To introduce myself first: my name is Anita Schjøll Brede, and I am the CEO and co-founder of Iris.ai. This very day, our company officially celebrates five years. We're still in the startup phase, but we've been around for a while, doing both research and development in the AI space for five years, so we've been through quite a journey. Where we started five years ago was with a challenge: my co-founders and I participated in a summer program at Singularity University, and we were challenged to come up with an idea that would positively impact the lives of a billion people within a decade. Small challenge, simple enough. As the introduction said, the place where we found we could really make an impact, where we now know more than ever it is vital to the success of our species, is quite simple: science. The problems of scientific research are plenty, and I'm not going to list them all, but the biggest challenge we have in scientific research today is that we live in a world with absolutely exponential growth of knowledge. That's true across many fields, and it's certainly true in science.
We publish around 5,500 research papers every single day, and that output is itself growing by about four percent, so we have growth of the growth: it is becoming exponential and increasingly hard to navigate. With the introduction of the internet, a lot of research, though not all of it, was put online, and that was great, but navigating it and finding it is becoming close to impossible. There are plenty of tools out there: Google Scholar, Semantic Scholar, plenty of good search engines. If you know what you're looking for, that's great. But actually finding exact bits of knowledge when you don't know what you're looking for, having the world's scientific knowledge at your fingertips at all times, that is simply not possible. So five years ago we set out to ask: what do we need to solve, what could we possibly solve, to make scientific research more accessible to more people? At that time I was not an AI company founder; I had learned a little about artificial intelligence and machine learning, very little, but enough to understand the potential of AI and natural language processing applied to scientific text. So we set out to build what we called then, and still call now, the AI Researcher. Our vision was then, and continues to be five years later, a research assistant that will be an essential part of any research team in academia and industry across the world, making sure we have all the knowledge of the world at our fingertips at all times. A little undertaking, a little side project, if you like, but that's where we started. That was five years ago, and we were definitely Norway's first AI startup; we were one of the few in Europe that got a lot of attention in 2015 and early 2016. We were part of TechCrunch Disrupt and 500
Startups, we got all this visibility, and we've kept going since: getting funding, getting research grants, getting clients, and so on. It's been quite an interesting journey being part of this for so long; five years feels like ages in both the AI world and the startup world. What I want to cover today: I'll talk a little about the tools we started building and have built, which is a suite of tools for academia, but then I want to talk about some of the challenges we faced, some of the assumptions we made, some of the technical challenges, and how we took what we had built and applied it to a new market in a way we didn't quite predict accurately, if we can say it like that. So I'm going to talk about the products, not at a "this is how they work" level, but conceptually: this is what we tried to build, and this is what we're still building towards. We say, at least to clients and in marketing, that we've spent five years building an award-winning AI engine for scientific text understanding. By award-winning we mean that we are a semi-finalist in the AI for Good XPRIZE, which is all about applying AI for good causes, and there we had several lovely people who are experts in NLP come in and vet our technology. So we know that what we're doing is unique and world-leading, though of course there's so much exciting NLP development happening on both the research side and the applied side, so we're one of many working on it. But science is our field, and not just science at large, research at large, but now specifically chemistry, the chemical world, including pharma and material science and a variety of tangential areas. All right, what does our vision look like? I mentioned it a
little bit: ultimately, by 2025, we're building an AI researcher, an AI system that can do interdisciplinary inference beyond what humans can do and that is an invaluable research team member. That's a long-term goal. Going all the way back to the beginning, as I mentioned, we built this assistant, a suite of tools for academia, for academics, for researchers that need to do literature reviews systematically and thoroughly, and we built this machine that has a broad interdisciplinary understanding of the world of scientific research and helps humans do literature reviews. The status of those tools today is that we have product-market fit, meaning we are selling the tools by yearly licenses, software as a service. They are commercially available, sold mostly to universities and then distributed to the entire university, but we also have an individual model, so any individual researcher wanting to use the tools can pay for monthly or yearly access. So those tools are, I would say, a commercial success, though I'm going to preface that with a little asterisk that I'll get back to when I cover some of our assumptions; they are a success from a usage standpoint. Then comes the middle part: we started with the literature review tools we were going to build, and we have this AI researcher we're going to reach, but what's in between? What ended up being in between, and that transition is what I'm going to talk mostly about today, is Iris.ai, the AI research tools, becoming a specialist. What we decided to do, through a lot of delivery work that I'll get back to, is dive into chemistry specifically, and also material science, pharma, biotech, those kinds of things, to find knowledge and draw conclusions. The status today is that we have our first commercial proofs of concept both sold and delivered, and we have productified solutions that are becoming commercially available this fall, so
we're at a real inflection point with those tools, both technically and market-wise, and it's super exciting, but it's been quite an interesting journey to get there. First, let's look a little at the tools we have built. Again, this is neither a sales pitch nor a full demo; I just want to give you an impression of what we developed. We already pointed out that research papers are growing exponentially across a variety of fields, and that makes the literature impossible to navigate unless you're a deep domain expert who knows all the experts there are to know in your field and you're just looking for that one paper (what was it called again? it was from 2003, wasn't it?). For that, Google Scholar or whatever you want to use is fine. But the moment you're interdisciplinary, the moment you're exploring a new field, the moment you're a PhD or master's student and you don't know the field yet, this is just a mess. It is incredibly hard to navigate, tedious, time-consuming, you don't know the accuracy, and even if you are a domain expert, working across disciplines is going to be really tricky, because keywords limit you to what you already know. So what we're saying is: we're going to let the AI research assistant do your literature review for you. Here are the two tools we have developed and built, which are, as I said, live in the market and at least a decent commercial success: an Explore tool and a Focus tool. The researcher's process is: I have a problem statement. Usually you'd have to build out a full keyword query based on guesswork; we're removing that part. Instead, you take the problem statement and branch out to find 500, a thousand, maybe 2,000 articles that could potentially solve your problem and cover
everything you need. Then, when you have all of that, you have to focus down, iterating to a very precise reading list: that's the Focus tool. And at the end, when you have your precise reading list with only relevant papers, you will still have to read the papers, but this way you don't have to read all the abstracts and titles of those papers in order to build your literature review list. In short, what we do is take the input text, which can be your own self-written problem statement or the abstract and title of a research paper or patent, so a scientific-language text of three to five hundred words. We extract the most meaning-bearing words from the text, enrich them with contextual synonyms, enrich them with hypernyms, or topic-modeled words if you like, and combine all of that into what we call the fingerprint (hence the fingerprints on the slide): a word-importance-based matrix. Then we have our own proprietary document similarity metric, called WISDM, or fingerprint matching if you prefer the business term: we basically do fingerprint matching. We then have a heuristic geometrical function to distribute the articles we've found, and their fingerprints, into categories and subcategories, and we build the Voronoi diagram that you saw on the last slide: a visual overview showing the main topics, the subtopics, and the papers matched with your problem statement within each subtopic. That's, in short, what we do. We have amazing university clients all over northern Europe, and we have also proven that the tools work. One of the things we wanted to do really early on was prove that these tools were not just a flashy interface claiming to have some cool tech behind it.
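To make the fingerprint-and-match idea above a little more concrete, here is a deliberately simplified sketch. The real WISDM metric and the synonym/hypernym enrichment are proprietary and not described in the talk, so everything below (the IDF table, the weighting scheme, cosine similarity as the matching function) is an illustrative assumption, not the actual implementation:

```python
import math
from collections import Counter

def fingerprint(text, vocab_idf):
    """Toy word-importance 'fingerprint': term frequency weighted by a
    corpus rarity score (IDF). The real engine also enriches the text
    with contextual synonyms and hypernyms before weighting."""
    words = [w.lower() for w in text.split() if w.isalpha()]
    counts = Counter(words)
    total = sum(counts.values())
    return {w: (c / total) * vocab_idf.get(w, 1.0) for w, c in counts.items()}

def match(fp_a, fp_b):
    """Cosine similarity between two sparse fingerprints, a stand-in for
    the proprietary WISDM document-similarity metric."""
    dot = sum(v * fp_b.get(w, 0.0) for w, v in fp_a.items())
    norm_a = math.sqrt(sum(v * v for v in fp_a.values()))
    norm_b = math.sqrt(sum(v * v for v in fp_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical IDF scores: rarer, more meaning-bearing terms weigh more.
idf = {"graphene": 3.2, "battery": 2.1, "anode": 2.8}
problem = fingerprint("graphene anode materials for battery storage", idf)
paper = fingerprint("battery anode design using graphene composites", idf)
print(round(match(problem, paper), 2))  # similarity score in [0, 1]
```

In a real system the fingerprint of the problem statement would be matched against fingerprints of millions of papers, with the heuristic geometrical categorization layered on top.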
So we actually decided to go out and run a bunch of experiments. We had multiple teams compete to solve the same research challenge over a few hours, to see who used which tools how much and how they performed when evaluated by an external jury. We did that and found that, compared to using PubMed, Google Scholar, and similar classical search engines, the users of our tools found more spot-on papers, drew superior conclusions, and had a much better overview of the field. This was true both for master's and PhD students and at professor level, though the difference between the teams using Iris and those not using it was larger for younger, less experienced researchers. So we built this, we started selling it, and everything was going great. Here is one assumption we made, which has nothing really to do with AI and machine learning. We assumed that if we could take the systematic literature review process, which today only academics really have time to do, but which industry says they really want to do (they want to map out all the research, they want to understand the scientific field they're operating in, but they don't have time), and reduce that manual process by 78 percent, which is the time reduction we measure for doing a systematic review with our tool, then we could sell these tools to industry. That was our assumption, and it turns out that is not quite right. It has been really interesting to learn that just because someone says they want to do something, it doesn't mean they will: you bring them the tool, you develop it, you show it to them, and they say, "Yeah, I don't need to map out everything; can you just give me the answer, please? Can you just give me that one paper I need to read? I don't want to put in the work." What it turns out to be is that a lot of R&D professionals in
industry don't have the time: even if the tool saves you 80 percent of the time, you don't have the time to do it. That's unfortunate, but it's the reality. So where we found ourselves about two years ago was with some really good tools, if we can say so ourselves, that we were selling to universities, but with a couple of issues. Universities unfortunately don't have a lot of money. Personally, if I could, I would give our tools away for free to universities, because science is important, and to individuals as well, for the same reason, but we are running a business, and we all need food on the table, so we can't do that. We charge money for it, but not a lot; we're barely charging what it costs us to run our servers, train our models, and do the research. We have a research team of six people with PhDs, and that costs a bit of money, so we have to make that money. The point being: we thought we could take the tools we'd developed for academia to industry, because we shortened the time so much and made the process so much more efficient, and that's something we hear with so many AI and machine learning projects, that we can take this really big process and make it quicker. What it turned out was that a lot of R&D personnel and R&D departments in industry, not everyone, but a lot, said "yes, we want to do that process and make it quicker", but when it came down to it, they didn't really want to do it. So we had to rethink a little. We had these tools, we were selling them, but we couldn't charge enough money to really become a profitable company, and we couldn't sell those same tools to industry. We had to rethink entirely what we were doing. We tried to sell it, we ran a few pilots, and we realized: this isn't working, we have to think differently. And here is one of the ways we've ended up thinking
differently. We still talk about some of the same problems: scientific and technical documents are vital for your innovation and R&D success, but they're really hard to leverage efficiently with human power. So we reformulated the problem, but the problem is still the same: lots of documents, lots of content, what can we do about it? What we realized as we started interacting more in depth with our potential industry clients was that AI has a lot of promise, it really does, and the breakthroughs we're seeing within NLP, and the things that are becoming possible and will become possible over the next few years, are amazing. But out-of-the-box NLP does not work for complicated lingo, and scientific research definitely is complicated language. A few years ago we compared our data set with the Reuters data set of news articles: that is a vocabulary of about 60,000 words you would have to understand, while our data set had a vocabulary of about 200,000 words. It is radically more complex, so out-of-the-box NLP doesn't work for scientific content, is what we've found so far. Custom consultancy is incredibly expensive; there are so many technical consultancies out there telling big corporates "yes, yes, we can build this solution", and it's expensive and it usually doesn't work: the results are disappointing. So this is how we are framing what we're doing, and as you'll notice I'm still not really talking about any tools, because what we've found is that every R&D organization, especially in the really big corporates, has a very specific problem it's focused on, and it's slightly different from company to company: based on their research process, the kinds of products they make, their strategy, their digitalization strategy, their product strategy, there are so many components that lead to "here is a specific problem we want to solve".
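As a small aside on the vocabulary comparison above: counting distinct word types is one crude way to see why scientific corpora strain out-of-the-box NLP. The two mini-corpora below are invented toy examples, not the actual Reuters or Iris.ai data sets, and the tokenization is deliberately naive:

```python
def vocab_size(corpus_texts):
    """Count distinct word types: a rough proxy for how much vocabulary
    a model must cover in a given domain."""
    vocab = set()
    for text in corpus_texts:
        vocab.update(w.lower().strip(".,;:()") for w in text.split())
    vocab.discard("")
    return len(vocab)

# Invented mini-corpora: everyday news language vs. dense scientific prose.
news = ["stocks rose on tuesday", "stocks fell on friday"]
papers = ["perovskite photoluminescence quantum yield measurement",
          "graphene oxide exfoliation kinetics measurement"]
print(vocab_size(news), vocab_size(papers))
```

Scaled up to real corpora, the same kind of count gives the roughly 60,000-versus-200,000 gap mentioned in the talk.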
So here are some of the problems we've seen within these large corporates. As you'll notice, all of these problems relate to a document that has some science in it, or a technical, scientific, or research nature, but the needs differ. Some companies need to extract experiment data: before going into the lab, we need our competitors' data so that we can test our material in the same way they've done, hopefully to show that our material is better. For some clients it's about identifying new application areas: we have a material, or a compound, or an alloy, or whatever it is, and it's amazing for this use, but we are sure there must be other use cases out there that could open new business lines and make more money. Then there's still the need to do literature reviews; it's just that corporate R&D literature reviews have to be so much more efficient, and I'll get back to that. Here's a set of documents, I just need a quick overview of it. We have decades of old in-house knowledge in the team (you'll notice I'm going around the clock on my slide here): how can we figure out what we already know? Can we find research funding grants for our project, is it fundable? Can we extract all pain points from a patent? These are just some of the plethora of different challenges we are actually tackling with our tech. So that's where we got to, and again you'll see the slide: we spent five years building this engine, it is now custom-trained on chemistry, and it can be reinforced on a specific client's domain with a small set of patents and an ontology, which is pretty cool, and having the machine trained on a general data set gives us a good starting point. The tool is modular, and that's actually something we've learned: we had to modularize it, because everyone has a unique
challenge. So we've picked apart the different modules and said: this is a modular tool, we can assemble it so that it works for your specific challenge and problem. And of course, since we've already built these modules, we can deliver at a cost that is very competitive compared to consultancies that want to build things from scratch. We have worked here with a couple of clients, and here's a meta consideration: when you're early-stage and you go out there to talk to clients and meet them, especially if you have some core tech like ours, NLP for scientific text, it can be used in so many areas, and it has been vital to us to meet clients exactly when they are in the process of having a project. The Extract tool that you can see here came from a client that needed to extract experiment data from competitors' patents before going into the lab. They were looking for a solution, and we happened to stumble upon them: tiny little Iris.ai at that time stumbled upon this massive multinational company, and it turned out we could build exactly what they needed. It's a tool that extracts and links all relevant data points from text and tables to populate, in their case, a structured spreadsheet, and the results are pretty neat: we can do what for them is two months of manual work. And when we say work, we really mean someone with at least a postdoc, someone with serious experience in their field, who has to sit for two months and extract the data from a hundred patents. We can do it in four minutes at roughly 90 percent accuracy: about 97 percent precision, and around 78 percent recall. Pretty exciting and neat. That is one of the use cases we've delivered, executed on, I should say. And then there is Discover, and this is interesting: you will see in the picture here that it looks very similar to what we built earlier.
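For reference, the precision and recall figures just quoted combine in the standard way. The counts below are hypothetical, chosen only so the resulting numbers land near the ones in the talk; they are not real evaluation data:

```python
def extraction_metrics(tp, fp, fn):
    """Standard extraction quality metrics:
    precision = TP / (TP + FP), recall = TP / (TP + FN)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts for one hundred-patent batch, chosen so the results
# land near the figures quoted in the talk (~97% precision, ~78% recall).
p, r, f1 = extraction_metrics(tp=780, fp=24, fn=220)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

The asymmetry matters in practice: high precision means extracted values are trustworthy, while the lower recall means some data points are missed and a human still scans for gaps.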
But when we came out with this general tool and said "it understands all science, all research, it's a literature review tool", the potential clients in R&D said: "That's great, but we want it to be specialized. We are specialized, so we want the tool to be specialized." So what we developed was a way to reinforce, or strengthen, the model within the client's specific domain, and all we need is a set of about 2,000 documents from the client's domain. And remember, we already have this tool to do literature reviews, so it's quite easy to find 2,000 articles, reinforce the machine on the client's domain, and suddenly we have a custom, bespoke search or discovery tool specifically for that client's research area. What this essentially is, is a content-based recommendation engine for scientific papers and patents, custom-delivered for the specialty of that potential client. So that's pretty cool, some of the things we're doing. And basically, the way we've entirely reshaped how we think about what we do is that it's not "a tool we're selling here and a tool we're selling there". What we have are all of these functional modules, like text similarity comparison, table information extraction (which is a machine vision algorithm), and table-and-text linking (which is of course the tricky part of data extraction), and so on. I'm not going to bore you by going through all of them, but we're working with knowledge graphs, we have a variety of classification, clustering, and categorization algorithms, and then of course this easy domain specialization: the core engine is trained on a data set of interdisciplinary papers, so we can easily specialize it using either a small seed ontology or those 2,000 articles. Then we have a variety of ways to visualize and develop the UX, and we're also working on abstractive summarization,
hypothesis extraction, and turning some of these modules into tools with conversational AI, so you can actually go back and forth with the tool. If you look at what that actually means in terms of modules for the use cases from earlier, here's the extraction use case: first we specialize the core engine on the client's domain; then we use table information extraction and entity extraction from the text; then we link the table and text, and for that we use causality, compositionality, and a knowledge graph; and then we combine all of that to extract all the data points, all the processes, all the products from the tables and the text in a patent. That's the extraction tool. Then of course we want to add in some assessment, some explainability really, and we have a self-assessment module where the tool continuously tells the user how certain it is that the rows and columns were extracted correctly, so that it's easy for the user to go in and modify, double-check, and verify the parts the tool is not yet sure of. That's how we're using this modular approach for extraction. Then there's the patent discovery tool, which, as I said, bears quite a resemblance to the academic Discover tool, but here we have domain specialization, text similarity, and document categorization, plus a very clear visualization component, showing the main categories and subcategories as you saw earlier. And then of course there are more interesting things we're going to be doing, and here we are still looking for those first courageous, future-visionary clients in chemicals, material science, pharma, and so on, where we're looking at application identification.
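The self-assessment module just described can be sketched minimally as follows. The cell structure, field names, and the 0.8 threshold are all illustrative assumptions, not the actual implementation:

```python
from dataclasses import dataclass

@dataclass
class ExtractedCell:
    row: int
    col: int
    value: str
    confidence: float  # the model's own certainty that the cell is correct

def flag_for_review(cells, threshold=0.8):
    """Return the cells a human should verify, least confident first,
    mirroring the colour-coded hints described in the talk."""
    uncertain = [c for c in cells if c.confidence < threshold]
    return sorted(uncertain, key=lambda c: c.confidence)

# Invented extraction output for one patent table.
cells = [
    ExtractedCell(0, 0, "TiO2", 0.98),
    ExtractedCell(0, 1, "450 C", 0.62),
    ExtractedCell(1, 0, "ZrO2", 0.91),
    ExtractedCell(1, 1, "3 h", 0.74),
]
for c in flag_for_review(cells):
    print(f"review row {c.row}, col {c.col}: '{c.value}' ({c.confidence:.0%})")
```

The design point is that the human only inspects the uncertain cells, which is what keeps the four-minute automated run compatible with postdoc-grade quality requirements.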
Again we do the domain specialization, but we also work with text similarity, knowledge graphs, and then essentially a content-based recommendation engine in the shape of a conversational engine. Combining multiple of these modules, you can say "here is my material, what other things could this material be used for?" and then have a conversation with the AI, which draws on the knowledge graph to accurately recommend not "here is a full paper" but "here is the part of a paper that describes what might just be an application area for your product", for your alloy, your chemical, your material. So that's super exciting, and we're having a lot of fun with this, and a lot of fun navigating our way around these massive clients to figure out exactly what it is we can deliver. What we're really navigating here, and I think this is relevant if there are any other AI startups on the line listening, is not becoming a consultancy, staying a product company, while still having these really customized approaches that fit these large companies well. How can you modularize something that is so specific yet also universally applicable? I know that's a total contradiction, but that is what it feels like, and that is what it is. And with that: this is our business development manager, I'll put his name on the slide; he is our main business development manager if you have specific questions about the tools, and you're also welcome to contact me at anita@iris.ai. Where we're at right now is the early stage of the growth phase. We've made it past the first few hurdles, and we are actually generating money: an AI startup that's actually generating good money, and not just in marketing, which is pretty neat if I can say so myself.
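The application-identification idea described above (material to properties to candidate application areas via a knowledge graph) can be sketched with a toy triple store. The materials, relations, and application areas below are invented examples, not client data or the actual graph schema:

```python
# Hypothetical (subject, relation, object) triples linking a material to
# its properties, and properties to candidate application areas.
triples = [
    ("aerogel", "has_property", "low thermal conductivity"),
    ("aerogel", "has_property", "low density"),
    ("low thermal conductivity", "enables", "building insulation"),
    ("low thermal conductivity", "enables", "cryogenic storage"),
    ("low density", "enables", "aerospace components"),
]

def applications(material, kg):
    """Two-hop walk: material -> property -> application area."""
    props = {o for s, r, o in kg if s == material and r == "has_property"}
    return sorted({o for s, r, o in kg if s in props and r == "enables"})

print(applications("aerogel", triples))
```

In the conversational setting described in the talk, each recommended application area would be backed by the specific paper passage that linked the property to the use case, rather than a bare graph edge.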
We're really tackling something we feel is fundamentally important, and we continue to feel that we are a for-impact company, for profit too, for sure, but in the world we live in right now, scientific understanding, scientific literacy, and scientific research continue to be of vital importance, I believe, for the survival of the human species, and AI and machine learning, the way we implement it, but also the way a lot of other really amazing people are implementing it, is going to be of fundamental importance to the success of the human species. So with that, I thank you very much for listening, and I'm looking forward to seeing whether there are any questions for me.

Thanks so much, Anita. We have a question, yes, we have one: how do you guarantee that minimizing research time does not come at a cost in accuracy? So, is there a balance between time saving and accuracy? And a second one: 90 percent accuracy is good, but the pipeline looks very unsupervised; what's your opinion on this?

Yeah, so if we take the first question first, which is about time versus the solidity of the results, what might we be missing out on: if we look at the academic tools, the systematic review tools, that is actually a question of how far you go back and forth. In these tools there is always the option of manually reviewing, and that's why, if you need, say, about 85 percent accuracy, so 85 percent precision and recall, for a literature review search, then you don't have to do a lot of manual review; getting to 85 percent is not that hard with the tools we have now. Getting to 90 percent, 95 percent, to academic precision of 99 percent, there you do need a lot of manual verification, and that's why this normally takes six months, and with our tools a couple of months, for academic precision. So there we actually have a very clear manual process, you
know: you go back and forth with the tool to verify, there's a lot of manual verification, and there's no automation in that. When it comes to the extraction tool, the question, I guess, was that 90 percent accuracy is good but it's mainly unsupervised. And yes, that one is entirely automatic: you plop the patent into a folder and you get a spreadsheet at the other end. That's why we're implementing this self-assessment module, where the tool gives you a very clear indication of how confident it is, so you can very easily see, marked with colors, where the tool is less certain, and go in and validate it. And of course, when we work with a client, we make sure that we validate: we start with an ontology and we reinforce the machine on their domain, but we also have the client spend a few hours actually reviewing the points in the knowledge graph, to say: are these accurate, are there any inconsistencies, was the training successful? So we always ask the client for a couple of hours of their time to evaluate the model before going into the extraction, but then the extraction itself is automatic, as I said, with the self-assessment, so you can go in afterwards and see: does this make sense, are the results accurate? So I hope that answered the question, if I understood it right.

Yeah, thank you, Anita, for being with us, and thank you for the answer. It's time for a break; in 10 minutes we'll be back with the last session, so it's probably the moment for the last coffee of the day. If you are a little bit worried about sleeping, maybe it's not a good moment for coffee. So we'll be back in 10 minutes. Thank you, Anita. Bye. Thank you.