I mentioned that I am a technologist and have been at it for several decades now, but I work in the area of libraries and museums at Microsoft. My role is to look at the libraries and museums industry: what are the needs of that industry, and how can technology be applied to further outcomes in that particular industry? Microsoft is actually organized by industry, so there are people like me who do it for transportation, for finance, for schools, universities, etc. I came to this role after my time at the American Museum of Natural History in New York where, as Claire mentioned, I led digital transformation for quite a number of years.

So let's get started. You already know who I am. I'm based in Seattle, and it is still dark here, so sorry about the reflection in my glasses; hopefully it's more about the presentation and less about my glasses.

One of the things that I focus on at Microsoft, and frankly when I was in the museum world as well, is how we help people discover collections. One of the great privileges of my role before the pandemic was that I spent a lot of time visiting libraries and museums around the world, and I cannot tell you how often somebody took me down to their treasured rare books room or rare collections room and showed me some incredibly rare book. This actually happened to me at the British Library; I remember it vividly. Clearly I couldn't touch the book; I could just see it in the racks. We were all amazed that it had been preserved so long, but the contents were not available. It's only available to people who get to go and visit, which is only a few, and even then you can't see the contents, and even then you may not be able to understand the contents.

I want to be very conscious in this conversation that I am coming at this from a technology perspective. I'm
in no way trained as a librarian or a researcher, so feel free to challenge me on my thoughts here.

I really thought about this issue of how we improve discoverability. It's something I heard a lot from the organizations that I talked to: how can we improve finding things in the collection, and how can we improve understanding relationships across the collection? My job is to marry technology with the problems, issues or constraints that exist in the libraries and museums world. As I said, we don't talk about the technology so much. We are not, or hopefully we don't claim to be, an organization that says "here's a technology, you should use it"; we come much more from the angle of how it solves a particular problem. So I looked at this, awestruck by a lot of the collections that I had seen around the world, and really thought about how we can improve discovery, because we're really scratching the surface. I hope that doesn't sound controversial, but we are scratching the surface in terms of what we can make available.

When I look at this, I use this definition from lexico.com, which I think is the Oxford dictionary. It talks about the quality of being able to be discovered or found, and that was really a guiding principle for me: how can we surface more things?

Along the way I also did a master's in data science and learned a lot about knowledge mining, which I'll talk about a little here, along with its use cases and how it can help bring information to the surface, but in a way that's unexpected and not necessarily in the traditional ways that we may consume information. So, maybe naively, I'm always curious about the world, and I ask: what if we could increase the world's knowledge? It's not that we don't have the knowledge, but how can we increase the visibility and the
surfacing of it? Really, if we think about it, we only have a fraction.

As I said, I'm always curious about these things. Many years ago now, probably 15 years ago, the National Library of Australia (I'm Australian originally) digitized all of its newspapers and made them available on the internet. The challenge was that the OCR, the optical character recognition, was not as good in those days, so they introduced the idea that people from the general public could help correct the OCR. I got into all that; I went crazy on there. I think I got a book from the Prime Minister for my efforts, or something. Along the way I discovered a lot of things about Australian history in colonial times that I had never been taught, was never aware of, and that were not written in the history books. One particular thing I found out about was a bachelor tax. Apparently, in the early 1900s, if you were a bachelor and you didn't get married, you had to pay an extra tax, because that was your duty to society, because women didn't have the independence that they have today. I thought this was fascinating; I had never heard about it before. And I keep thinking about how you start to surface knowledge that is not written about in the books but that is in all of the world's media.

So, enough on that. Putting my technology hat on: I've worked in technology since the mid-80s, and it has been a real privilege to have lived through the era. I started working with mainframes and punch cards. PCs were only introduced to Australia around the mid-80s; they didn't exist when I was at university in the early 80s. Then I lived through the PC era, the internet era, the mobile era, and we're now getting into the virtual, augmented,
metaverse era. There will be more; there are always more eras. When I look at technology, it's almost always been about addressing a constraint of some sort.

So I start with televisions. There used to be a time when you had to be home at a certain hour on a certain day to watch a certain show, and if you weren't there, you never had another opportunity to watch that show again. When you think about it, that is very different from what we experience today, where I can watch a show anywhere, anytime. I hope I'm not dating myself too much here, but we actually didn't have colour television until I left high school, so we had black and white television. I'm sure there was an era before my generation when there was no television. Then we had colour television, and then we got to the idea that you could record a TV show while you were out: VHS came along sometime in the 80s, I think. Then we got into DVDs, and then cable. Cable TV, which frankly was the late 90s by the time it hit Australia, was this idea that you could watch a show anytime, anywhere, but only at the times they told you, so it wasn't quite anytime. And then you got to where we are today. Not to belabor the point, but now you can watch a show anytime, anywhere, on any channel. Not any channel, but you get the idea. It's always been about addressing constraints.

If you look at phones, it's the same thing. If you remember rotary phones, the only way you could get a phone call was if you were actually there when the phone rang. Then we got into people leaving voicemails, and then to where we almost are today, where you don't even need a phone anymore because everyone texts. It's about addressing constraints.

So I'm looking at this from two angles: what about the possibilities for surfacing knowledge
and what about addressing constraints? One of the constraints is description: how you describe collections, both in museums and libraries. Metadata, as you all know (and I realize I'm totally preaching to the choir), has been core to describing collections in order to solve that problem of discoverability. I'm not in any way suggesting metadata goes away, but I am saying there is now so much more, and there are also limitations of metadata. It takes time, and it offers a certain perspective. And clearly, in the case of a book or a document, it's not the full text, and it has very limited space. So it's not perfect, but it's a very good solution. So how do we add to metadata?

Then we look at other media. There was always audio and video, maybe not always, but in recent times, and the description of audio and video is really difficult. If you try to transcribe audio by hand, it takes someone who's really qualified at least four or five times the actual length of the audio. When you try to describe video, it's even more difficult. You can say this is a video around these topics and it has these speakers, etc. But what happens when you're someone like the Imperial War Museum (I mention them because I've worked with them), who have a whole video collection from the wars? There is video after video after video of scenes from the battlefields. Who are those people? What are the trends and what are the patterns? It's very difficult to describe that. You can say that this is a video about a particular battle or whatnot, but how do you find all of the videos that have Winston Churchill in them, or anybody else for that matter? Catherine Devine, not that I
am in any of those, but, you know, name a person: how can you find all the instances where that person appears, even if you don't know who that person is? So you get into these kinds of issues.

I like to take what we know from other industries. Sometimes commercial organizations have the money to invest in innovation that is not necessarily available to the cultural sector, which is a very sad reflection, frankly. I heard about a company that had the archives of every fashion show ever. If you wanted to find a particular model with a particular dress from a particular designer from, like, 1969 or 1951, they offered a service where they could get that clip for you in about an hour. Now think about the size of those archives and the challenge of trying to find those things. That really got me thinking: how can we better understand video in a way that is more scalable?

And also, frankly, how do you provide for multiple perspectives? I work a lot with museums, and art is often described from the point of view of a curator. It's described as something like: it was bought or gifted from this particular person, it was done by this artist, it's a certain style, it uses these materials. But none of this tells you, from a general public perspective or maybe another researcher's perspective, what you might actually be interested in. Is it a depiction of war? Sorry, I'm not focused on war, I'm just using it as an example; I've used it twice now, though. What are all the depictions of war over the years? Is it something that uses a very similar style? The metadata that is used in art museums provides one perspective, but not multiple.

I think the other thing, when we
talk about these kinds of technologies and how they can more rapidly describe collections, is that one of the first challenges we often get, and rightfully so, is: it's not perfect. My response to that is always: well, technology improves every year. The technology that you have in 2022 will be completely different from what it is in 2030. It'll be much better; it's always going to be better. There is a chase for human parity; we'll never get there, but we'll approach it.

But I think the thing to remember is also that the way we experience knowledge today (reading, listening, watching, in the case of documents, videos or images) is not necessarily the way that you have to consume the knowledge that comes out of it. That gets us into this topic of knowledge mining. I've talked to many institutions that have extensive oral history libraries. The challenge with oral histories is the assumption that the use case is listening end to end, and therefore that the transcription needs to be perfect end to end. And it does, from an accessibility point of view, absolutely. But knowledge mining is the process of abstracting out the key concepts and the patterns, trying to get more information about what the content is, and then looking not just across one document but across a set of audio or a set of video. So we should absolutely search for perfection, but there are other ways to consume this data and provide visibility.

When I talk about digital, what I often say is that the way to apply digital is to do something you can't do in the physical world. Knowledge mining is something that's not as easily done in the physical world, so how can we use that to understand the collections
of an entire organization? And I'm going to get shortly into what happens when you take that beyond the organization, which apparently was the next slide.

We often look in the context of a single organization, but my other dream is what happens when you start to bring the world's collections together. You go beyond that into an entire country: what are the collections of an entire country, what knowledge can be mined from them, and what does that tell us about history that we may not otherwise be aware of? What are the insights that we don't have today? My mother used to teach me when I was young that history books are written from the perspective of the person who wrote them, and those were inevitably the people in power, or who had access to reading, writing, publishing, all of those kinds of things, and so there were other perspectives. That always stuck with me. To me, the thing that gives me goosebumps in my job is thinking about what happens when you bring the knowledge of the world together, and what it will help us understand about the past, and potentially the future, that we might not otherwise have known because of these limitations. So, something to ponder, and I'm really hopeful that in the questions we'll get some thoughts about what the challenges are there.

The thing that got me here was working in natural history museums. There's a natural history museum in pretty much every major city in the world, and on the face of it, to the general public, they all have similar collections, but they don't. And when you start to look at an artist like Gainsborough, for example, or Monet, or any artist, you can't see all the collections in one place and you can't compare. So how can you do that? Digital allows you to do those kinds of things and
analyze what the patterns are.

Then, of course, the other topic that's very big at the moment, and should be, is this idea of accessibility. We've seen it during the pandemic: organizations have realized that reaching out across digital channels for educational purposes is a big thing. I don't just mean schools and universities, of course those, but general public education and making this information available and consumable to people anywhere, anytime. It's something that really extends the mission of these organizations. I often use the example of Italy. In Italy, culture is very important. We work globally, actually, but we work a lot with Italian organizations and the Italian government, and it's very important to them that they take the culture of Italy to the world. That's something they've really realized during the pandemic: not everyone can come to Italy, obviously, but what if they take Italy to the world? There are lots of great examples in the museum space, and I think there are equally examples around the knowledge of the world: how do we bring that to the internet, because the internet is the great delivery system here, and how do we make access available to anybody, anytime, and not just to those who have the privilege of being able to visit collections in person?

I realize that I'm probably way over time, but because I want to get to some questions and some discussion and show you some demos as well, I'm not going to go through this slide too much, because it's a little too technical. These are the kinds of capabilities that we see with these technologies, which we call cognitive services. It's the types of things that you can do: speech to text, text to speech, translation,
description, identifying objects, and knowledge mining, being able to abstract out the concepts and key themes. I'll leave this for your reference later, but there are a lot of technologies, they're very accessible very quickly, and they're designed at a level that is certainly not general public level but doesn't require as much technology expertise as you might think. This is a big thing in technology these days: services that are a lot more accessible to people we would call citizen developers, so not necessarily professional developers, but people who have the interest and ability to incorporate these kinds of services. There is a whole suite; it's growing and getting better every day. So, something to consider.

I want to show you a couple of quick examples, and then we'll go into the discussion. I actually have these in demos; I just want to quickly walk through them. One is using computer vision to describe images. This is back to the conversation about what happens when the metadata is about the medium of the painting, for example, or about keywords in the document or the book. Here we have automatically generated tags, which you don't have to display, by the way. That's always something everyone worries about: what if I don't like those tags? But the idea is that it can generate them automatically. We did this with the Metropolitan Museum of Art; it can run through 20,000 images in an hour or something like that, so it's pretty quick. It's also fairly inexpensive. I'm sorry to use American currency, but it's like pennies for a thousand images or something like that. We also have this idea of, well, how do you create
these relationships across images in a way that you could never scalably do with metadata? I'll get into this in the demos.

We also have things like handwriting recognition. You can see this document here has both typed and handwritten text, and it can transcribe the handwritten text too. Here's another example where it's using handwritten text, and here on the left is an example where it's abstracting out all the concepts. This was not done manually; it's abstracting out all the concepts across the entire corpus of documents. You can obviously do typed content recognition as well. It's the same idea as what we saw with the art: it's about the relationships. In this particular case, these are the files of the JFK assassination. It's taking an entity that's often described, Lee Harvey Oswald, and showing what that term is most associated with. The idea here, and you can apply this to anything, is that getting this kind of map allows you to rapidly explore the possibilities in terms of other closely related topics that you might not otherwise know about, and certainly you can do this at significantly more speed.

Then we have this other example, which I'm going to show you, which is video indexing. This is a presentation I did some time ago. It analyzes the video fairly quickly and then brings out all the topics that I talked about and the keywords. So it's not just a transcript, which is also important; it's also mining that for the key topics I talked about. And then obviously you can do that over a whole set of videos, which gives you those insights. I'm running through this quickly because I wanted to show you that and I'm at time, but I also want to get into some real-life demos so you can actually see it. So with that, thank you for this. We're going to go into a chat with
Claire, and then I'm going to show you some live demos of some of those things. I'll stop sharing. Claire?

Hi Catherine, thank you so much for that. It's great to see, and I'm keen to see some of these demos as well. Looking at your cognitive services, I think we're all looking at them going, that's quite a list. For those of us with a big backlog of items where we haven't got the metadata we'd quite like for discovery, where would you say is a good place for us to start adopting some services?

Well, I think you don't have to choose, like, I want this one and this one. They were displayed in columns, so there was a set of services that are similar, and you can apply that service and all of them will get applied at once. So maybe that makes it a little easier and brings it down to four or five. But I think it comes down to, and I always advise this, not doing everything. What is the thing that you most want to achieve? What is the problem that you're most trying to solve? It could be something like keywords at speed. Clearly writing keywords manually, and I understand why it's done, is time consuming. So how could you get keywords generated? You could consider those the first draft, if you're concerned, because that's always a concern. Clearly a computer is never going to have the expertise of a researcher, but it's a great first draft, and eventually you'll build up confidence in it. So that could be: let's start with keywords. In the case of video, closed captioning is another one, because that's a problem everyone has; again, treat it as a first draft. Or it could be translation; that's another great area of accessibility. So I think it's all about what's
your biggest problem, and then picking that rather than trying to do everything. But it's all available in the technology. I liken it to this: maybe you're not like me, but if you have a new iPhone or something like that, with every new release of software I go through and ask, what are all the features and how could I use them? I'm always fascinated by it, but realistically you're only going to use the ones that solve a problem for you, that address some constraint. So just pick those; just pick a handful.

And what skills do we need in our libraries to be able to take these tools on? The barrier's not as it was, but there's still, I think, for a lot of those things, the question: how exactly do I get started with these to get that first list of keywords?

I recommend, and I'm willing to be challenged on this, that it gets done at an institutional level, so looking at the databases that an organization has and how you can apply this. An organization is going to need some level of a developer. It doesn't need to be a whole team or anything like that. I'm sure people here are familiar with the concept of APIs: it's basically an API, a service that can run and then append data. It doesn't damage your original data in any way; it appends data, and then you can decide if you want to use that data or make that data visible. I often recommend doing it with a pilot. Start with an area and someone who's willing and open-minded about the possibilities, pilot it, see what it looks like on your data, and see if it's actually valuable. But I also think the key here is that it's not perfection. I think it's at something like 80 or 90 percent these days; it's a lot better than it used to be, which was around 50 percent. And it's always the question of: is it better and faster than
doing it manually, and then how can I use it? As I said, we had a real issue in museums with "I don't like those tags" and "I don't know if we want to recognize that". So how can we think about not surfacing those tags but still using them for searchability and discoverability? You just don't have to make them visible.

Yeah, yeah. And then the question here is: where do we go to find out more about the different services and APIs?

Yeah, well, I probably should have put that in. We'll resend you the deck with the links. Clearly I'm going to come at it from Microsoft's perspective, but you can come at it from any tech company's perspective. We have a suite of services called Microsoft Cognitive Services, which is the thing I showed you, and I'll put the link in the chat in a minute, before the end. And of course, I don't want this to be just about Microsoft; other companies clearly offer these kinds of services too: Google, probably Amazon as well. The point is that there's a set of technologies that could be used and that I think is worth experimenting with. So we'll put the link in. It's very accessible, as I said. It does require some technical nous, but it's not like you need a specialist in this area.

Right, thank you. Do you want to go on to any of your demos?

Yeah, let's do that. Let me share my screen. I wanted to organize this earlier. Let's start with the JFK files. Oops. Oh, good morning, Catherine, at 6:30. I'm just trying to get this so that you can see my screen. Right. So the story of the JFK files is that it's the corpus of documents that was released several years ago now, three or four years ago. It must have been on an anniversary, when it was no longer classified. So it is all the documents from the
US government about the investigation of the assassination of JFK. As I showed you before, it has all these documents, and you can see that they are typed, handwritten, images, annotations on top, all sorts of things. What it has done, rapidly, as in an hour or two, is go through and analyze all of the contents. Traditionally, typed content was the thing that was OCR'd, but now we can deal with so many other things, handwriting, etc. Then it has automatically generated tags, and these tags represent how often each entity appears. This was not done by hand; it was done by computer, and it allows you to understand the frequency. It's in descending order, so clearly CIA, for example, is more frequent than Congress or Cuba. I realize that in this example people do know the history, but if you don't know the history, it may surprise you to find that Cuba was referenced. It may surprise you to find that New Orleans was referenced. This is the idea I talked about with knowledge mining; this is effectively what it is: abstracting out the concepts. And it's not just one-word concepts, it's multi-word concepts.

Then you could take any of these and explore it further. So I could take Cuba and I can, excuse me, see here... Let's take the obvious one. Maybe this has stopped working. No, I've got it on my other one, so I don't know what happened. Usually it's what I showed you on the slide: it's going to generate a mind map type of thing showing the relationship of these terms. Oh, I can't believe it; I should have checked that beforehand, I'm so sorry. But it's going to show you what I had on the slide, which is a mind map of how those terms are related to other terms, and how those
terms are related to further terms. The thing that I often do with this demo is pick whatever country I'm in. I did this in Greece. Clearly, I'll look for Australia, because I'm Australian. Oh, it's working now. So Australia is most associated with topics around "connected Russians" and "authenticating", and then I could take any of these, like "connected Russian", to see what it is related to, etc. I can also see all the documents, oops, all the documents and all the references to the word Australia. I'm just picking an obvious term here, but this is where you're going to uncover things that maybe you didn't know. I did this in Greece, where nobody expected there to be any references to Greece. I'm sure there are plenty for the United Kingdom. But it's going to pick up any reference to Greece in here, and then, back in this view here, show how Greece is most associated: it was most associated with Britain and Turkey, and "absolute certainty", whatever that means. But you get this idea of being able to rapidly explore data. The real thing here is that you can do this yourself, but the amount of time it would take you to get through thousands of documents to get this kind of analysis is the issue. So even if you just want a quick understanding, or if you take something like a set of oral histories and want to quickly understand what is being discussed, that's what I think is the game changer here.

So that's documents. Then of course we have the same with images. Here is the example that I was using before. Where it says medium, classifications, etc., this is actually the metadata from the Met's collection management system, and these are the tags that were generated for this. Here you can see it's talking about George Washington because
he's, not a celebrity exactly, but what we call a celebrity: he's a known person. Winston Churchill would be another example. And it's talking about boats and battle and army and things like that. So it's doing two things here. One is automatically tagging, and then allowing you to do that same exploration of similar things, which I'll show you in a minute. It's also able to show you things that are similar to this. From an art perspective, it's showing you things that are similar in style. So this painting down here is similar because, as you can see, it's got similar colors and styles and things like that. One is a French painting and one is an American painting, and typically in a museum the people who curate the European collection and the people who curate the American collection would not necessarily be aware of each other's collections. So this allows us to look across collections, look across countries, and all of these kinds of things. And this one is visually similar because of its depiction of military and battle. Then you'll see it's very similar to the other one. Where is it? Here it is, oops. It's the same idea that we saw in the other one, where you can take that and ask what the relationships are. These are related because of similar metadata that was previously defined, but these are related because of similarities in the tags or the visual style. So it's bringing another dimension to that level of exploration.

The last demo that I have is the video one that I showed you. I could only show you a little bit of it. You can actually do this today if you wanted to; it requires no technical expertise. "Good morning, good afternoon, good evening everybody, thanks." Sorry, that was something that I did a couple of years ago. So I uploaded the video into this. It's called videoindexer.ai; it's
a Microsoft website, it doesn't cost you anything, and you can just try this out yourself so you get an idea of what the technology is. What it did was analyze this 20-minute presentation that I did. It doesn't know who I am, but it knows that I appeared all the way through it, and I can probably tag this as me at some point and then it'll do that for future videos. But it also picked out the key topics that I talked about. Let me expand this. So I talked about digital divide and augmented reality and technology and art museums and skills; apparently I talked about skills as well. It also came up with all of these keywords that I talked about. And I actually talked about a Chinese proverb in that one; I didn't realize it even picked that up. I've forgotten the proverb now. I think it was something about a tree: the best time to plant a tree is now, the second best... no, the best time to plant a tree was 200 years ago, the second best time is now. That was what it was. But you can see that it's actually picking up quite a level of detail around what I'm talking about. And then it's coming up with various labels, which again may or may not be useful. Interestingly, this seems to be actually coming out of some of the slides that I had used. Hmm, interesting. And then it comes out with specific named entities that I referred to, you know, Clayton Christensen, Ray Wang, who's an analyst, brands that I talked about. And then, I like this one, only two percent of it was joy, but it comes up with a sentiment as well. You can see this runs pretty quickly. It also gives me a timeline, so it's going to give me a traditional transcript and all of this. But you can run any video through this today, so videoindexer.ai, and you get this idea that you can analyze videos much more quickly, and you can do the same with audio, much more quickly
than you could by hand, and get insights that are pretty good, and certainly better than if you did it manually; not so much better quality, but 80 percent quality and much faster. Yeah, I'll stop sharing there.
Thanks. You mentioned that that service is free. Are the other services and their APIs free to use as well?
Yep. This particular service will let you do one video at a time for free. You may actually want to do it across multiple videos. But the other services, so these are to demonstrate the possibilities. There is another one, let me just quickly show you, I'll put it up on the screen, and I'll put these links in before we leave as well. It's called computervision.ai... I guess that's not it. I'll have to find the links, sorry, forget that. Something similar... customvision.ai.
Sorry, I'm just going to ask some questions while we're having a few come in. The other one is related to cost: how can small institutions particularly get funding to do these sorts of projects?
I think there's the traditional grant system. I would love to see, and the UK is pretty good at doing this, more organizations coming together to approach this, because with similar skills you don't need to repeat it in every institution. I realize that makes coordination more difficult, but I think that's one opportunity, and I know that the UK is very good about aggregating, or looking at things at a country level, so I think that's one way to look at it. But I'd also like to say that this is very inexpensive as technologies go. I really need to look up the right pricing, but the pricing is something like one US cent for a thousand images, something in that range, or ten thousand images. Now, if you have a million images that starts to get real, but we're not talking thousands and thousands of dollars here. I'm not going to be naive and say it's within everybody's budget, but it's also not as expensive as you might think compared to, say, other things. So I think one is the grant system; one is looking at it from a country level, you know, is this something that the United Kingdom would want to apply, and I would love to see more of this because it's the most efficient way to do this; and then I think there's also experimenting to see how much it costs, because you might find that it is actually tens of dollars for your collection, for a very small institution.
Right, yeah. And then we've had some questions around... the joys of algorithms is that they can come with issues of equality and inclusion, et cetera. How does Microsoft go about identifying those and combating those, you know, the built-in bias?
Well, Microsoft has some very strict guidelines around how it uses AI, and there are lots of standards around it. When I share this deck out, I'll put
some of these links, and I probably should have thought about these beforehand, so you can find out more. So I will say, and I appreciate that I'm biased, Microsoft is a very responsible company when it comes to thinking about those and the implications. The challenge with... not the challenge, sorry. The more data something is trained with, and the more applicable that data is, the more perspectives it can represent. And so that's why I'm saying you should try it with your own data first, just with a small subsection of your data, and see if it is useful and see whether you have any concerns about the perspective that it brings. Here's the reason why I'm saying that: for some collections it's not going to make any sense, because it hasn't been trained on sufficient data to do that. We have the privilege at Microsoft of having access to a lot of data and training on it, so we're probably going to have better-trained models for a generalist application than any individual organization can have. Except, for many libraries, it's a very specialist topic, and when you start to get into very specialist topics, it needs to be trained in those topics to actually provide the perspective. So it's an issue. I think sometimes we look at artificial intelligence and say that it is inherently bad because it's biased, but I think, like everything, there is actually bias everywhere. Where there is bias it will be amplified by AI, but it also has many benefits, and so I think it's all about taking a critical human eye: what is the output, how do we feel about that output, and does it solve a problem for us? But, yeah, it's a difficult topic and we try to do the best we can. I know that there's often a reaction which is that AI is inherently bad. It's more about... Brad Smith, who is the president of
Microsoft, wrote a book called Tools and Weapons, and he says that all technology has been a tool and/or a weapon, and it's all about how you apply it. So those have been the guiding principles of how we've approached it at Microsoft. But it's a valid question, absolutely.
And have you worked with some libraries and museums, because obviously they do have a lot of metadata to do that training of the tools as well?
I would love to, I would love to. I will admit, everyone usually says, well, this is great, but then the ability to move into doing something with it or trying it out has been very limited, sadly, because I see it as a very inexpensive, accessible technology that can make a big difference. But yes, anyone who's interested here, I'm willing to talk. It's been hard to get that critical mass and interest in terms of actually doing something, as opposed to "oh wow, that's great".
Yeah, I can see lots of hands will be coming. And then there's another question here around digital preservation, which is an issue that we're looking at, and obviously a lot of what we would like to preserve is developed by Microsoft tools. So is that something that Microsoft's looking at as well, how we preserve documents coming out of Word, Excel, Outlook?
Oh, you mean that kind of digital preservation, sorry. I can't say that I can speak to that, but with Microsoft's 200,000 people, that doesn't mean that Microsoft isn't working on that. I don't feel like I'm qualified to respond, I'm sorry.
And I hope I asked that question right, Arlene. And then some questions, yeah, we've had quite a lot come through about how the tags are generated. So do you know more about what happens behind the scenes when it looks at those video images?
So what it's doing is, all of this artificial intelligence comes from a set of training data, and you can take this into millions and even billions of items, training data that has been previously tagged to say this is a boat, this is a military example, or something like that. And then it looks for similarities. In computer vision, what it's actually doing is a pixel-by-pixel kind of analysis for something that looks similar to a boat, and then it's continually trained and gets feedback. There are lots of standards that are set in data science around what's an appropriate confidence level for something that's been trained, which is why I keep coming back to: if you don't have your original training data, you're not going to get anything useful. That's why we need more and more training data in specialist fields, and more and more training data in the areas that are underrepresented in society, so that we actually get better results. Sorry, I forgot your question, I apologize, I went off on a tangent. But yes, what was the process: one part is about training data, and then it's about applying that. As I said, there's a whole science around how you tag, how you test, how you get a confidence level, and you do your training, and your predictions against your test set, your predictions against real data, what is the difference, and all those kinds of things. So it's a whole process. But the thing that technology can bring, back to those constraints that I talked about earlier, is that technology can do this at speed. It can train at speed, it can do predictions at speed, and so you can imagine that you're on this little circular
wheel, you're just going over and over and over again, and you get better and better results, and the technology also gets faster because it has the ability to compute and process faster. This is where you get that kind of benefit. But it clearly needs more data. So one of the challenges is when you start to get into things like medieval-era script, for example: there is better training data than there used to be, but there still isn't as much as there is for the current day. And the same with handwriting: you need lots and lots and lots of examples of how people write to be able to do that handwriting. But hopefully that helps you understand the process.
And it comes back to it getting better, doesn't it? Like our TV systems on our phones, you might run it at this time and then come back in a few years and want to run through the process again to see it improve and what that looks like as well.
Yeah, it's always guaranteed it's going to get better, but that's not a reason not to start today.
And then, yeah, we're coming towards the end, but I just wondered, we've had some questions about discoverability, and I always find these network graphs really interesting for discoverability, which is different, obviously, where a lot of our systems are still that long scroll. I think we move towards that network graph and people knowing where to start, because at the moment you basically put your search term in and then you get this list, don't you, and it's that sort of following it through.
Yeah, I think the use case that we have with search is that you're looking for something specific, right? You know what you're looking for. Whereas a network graph is about things that you may not have known to look for, and it's about surfacing those. So I think it's also about another use case. And I think with all technology, regardless of this or anything else, you have
to start with how you want to use it and what problem you're trying to solve. It could be speed, right, speed of describing. Again, another museum example, because I come out of museums, but maybe this applies in libraries as well: the collections that are backlogged for description in museums are mind-blowing. At the museum I was at, I think there was a gift that they were given in the 1960s that is still there waiting to be cataloged, 50 or 60 years later. And, to come back to something I said before, all this knowledge is just waiting to be unlocked, and how can we get this done faster? No one's ever going to say it's the same as someone who's been academically trained, but it's significantly faster and it's a good first draft, and definitely worth testing on your data to see how it applies. And if it doesn't apply now, come back in five years and see how much more the technology has progressed, because technology just goes like this; it's crazy.