Today I'm going to talk a bit about multimedia as a data source, particularly about data management and support of research where multimedia is a data source and is the research material. And I guess I need to confess right up front: I'm not a new media expert, I'm what could be referred to as a digital librarian. So I come at this with a very different kind of understanding, potentially an exploratory approach, and I'm very interested in the kinds of questions, comments or thoughts those listening in today have.

What I wanted to start off with today was to have a bit of a think about what multimedia is, because I have to say that this is what went through my head when I was thinking about offering up this webinar. I also wish to acknowledge colleagues over at Edith Cowan University who triggered the idea for this webinar, because they were interested in looking at different types of data, particularly the sort of data that is generated in support of performance studies. And so that's where this idea came from.

But I really wanted to look at how multimedia was defined, to get a bit of an idea about the way that it's referred to by different groups of people and also what it is from a technical point of view. So you can see there on the slide I've given a mixture of definitions of what multimedia is, and I think the second one in there is the one that really holds my attention the most: that it's about different types of data that are actually contained in a file, and there are different ways of referring to those components. So I've picked some of those out in the third point. I was quite intrigued to discover that there are chunks and atoms and parts, and perhaps because I don't have any media background, I come at this as someone who's looking at the material nature, in some ways, of multimedia.
I wanted to have a think about where multimedia appears in the research environment, and there I've listed films as an example of multimedia; web pages, where you can have a mixture of moving image and sound and possibly some graphics; digitized documents, which can have both image and annotations, markup and transcripts; and also satellite images, which is something quite new to me. We had the great benefit of a lecture within the Australian National Data Service recently from Stuart Minchin from Geoscience Australia, and he opened my eyes to what a satellite image is and all the layers that actually exist in a satellite image. That was really interesting. He gave a great talk on the data cube that they've developed, but that's a whole other topic.

I just thought I'd pause at this point to see if any of those in the group have a background in multimedia or have some questions about the definitions of multimedia.

So I guess looking into where multimedia appears in the research domains got me thinking about the fact that this topic could come up through those working with people who undertake performance studies. So I listed some of the research domains there where multimedia is generated, to help me unpick who's actually using or creating multimedia, where that's happening and why they're using it, to get a better understanding of how it's created and also perhaps used.
So I guess I'm looking at this from a cultural production perspective, and I started to look at the methods that researchers were using to create or process data or information. I found that a very iterative process: I kept bouncing across from the idea of something as information to something as data, and having a bit of trouble identifying what was what. I kept circling back to asking myself, what is the researcher doing? I think it really helps to look at research methods to understand when a digital object is being treated as a piece of information and when it's being treated as a source of data. I'm sure that there could be quite an exhaustive conversation about this, but I really thought it helped to understand the purpose to which this material was being put, how it was being used, whether there was a human looking at the multimedia or whether it was a computer looking at it, and whether that was a useful distinction or not. I certainly don't have all the answers, and if anyone out there has some of the answers, that'd be great. I can see lots of chats coming through, that's really good.

I think you asked earlier for people to suggest other types of multimedia that they've been working with or are aware of, and some that have come up include photography, video gaming and opera.
Where I got to in looking at multimedia was trying to understand what happens to a digital object, and that's what I refer to as a piece of digital material. So I tackled annotation as an area of research, I guess of interpretation, or a process in which information is applied or data is applied to data, and I became very tangled. So I picked out four areas where I could see the word annotation being used.

The first was genetics, which was really interesting. I'm quite fascinated by the idea of automated annotation and how that actually operates in genetics. It's more out of curiosity than ever having a desire to be someone who studies genetics, but it was really interesting to understand the capacity for generating very large amounts of automated annotation.

Then I moved into geoscience, to look at what happens when people annotate geospatial information and the kinds of terminology that are used, to understand what's actually happening: whether data is being applied to data, or information is being applied to information, or data is being applied to information. I really don't have the answers to this, but I wanted to unpack what was actually going on to get a better understanding of how multimedia was being used and enabling research.

So I moved on to linguistics, where it was again quite fascinating to discover the different types of annotations that are applied to languages, and I've listed them there: descriptive, analytic, time sequence and text. That was really interesting for me, to pick apart, say, a descriptive annotation from a time sequence annotation and try to understand what's data and what's information. It helped distinguish different annotations, but it certainly didn't help me answer the data and information dilemma: what's data and what's information? But I recognized that material was becoming, or seemed to be becoming, increasingly multimedia in nature, if it hadn't started off that way in
the first place.

So the last area I looked at was biomedicine, and I've got some images following this slide where researchers annotate images, and they do that in different ways: by drawing, adding notes and marking areas. I thought this was really fascinating, and it made the idea of simplifying the management of multimedia into what is data and what is information kind of meaningless in a way, because it might be a theoretical concept rather than something which actually helps the researcher to do their research.

So I thought I'd put some examples in front of us here. This is a biomedical slide and it's been annotated, and you can see that it's been annotated with a line shape and also with some words, and that there's a scanned image underneath.

Okay, the next image I've got is this wonderful gene annotation image that I looked at and really couldn't make head or tail of, but it made me pretty interested in understanding how geneticists actually manage their data and what information they derive from that data. It's really, really complex. The fact that machines actually generate these annotations, which I think I've mentioned, made me want to understand where those annotations are actually put and how they're linked to the gene sequence, but I think that's a whole investigation unto itself.

The next image I've got, which is slightly more familiar for many people, is a Google Earth image. It's a satellite image which has got street markings, bubble pop-ups and line tracing. It made me want to understand a little more about how that information was being captured, to know how to support researchers who want to manage their data effectively and potentially make it available, cite it or present it as part of their research.

This is the last one, which I hope those of you who've ever been to Portland enjoy. I found this on Flickr, and it's a graphic image in the background, and on top of that it looks like there are letters that are very
carefully placed in alignment with what's called a spectrogram, which is of someone saying "it rains a lot in Portland". I thought this was really interesting. I wanted to understand a whole lot more about how these discrete pieces of data were actually brought together, whether you captured that all as one thing or separately, and whether the researcher uses those separate components as part of that multimedia. It made me understand that where the annotations or the combinations occur may be really critical in supporting some of the research findings.

The first question that's come up: someone's interested to know whether anyone in the audience has worked with magnetic tape archives, or if anyone could suggest some points of contact.

I haven't worked with magnetic tape archives, but I'm betting that national archives, and maybe major agencies, have. I'm intrigued to understand a little more about what's prompted that question in relation to multimedia, and if there's an opportunity to unpick that a bit more.

There's another question here: is it accurate to perceive multimedia as not the raw source or primary data, but as a visualization of the raw primary data?

You know, I really don't have the answer to that, but I do think that's why I've danced around what's data and what's information and how that fits into a discussion of multimedia, because it made me want to understand less about defining that and more about what was important to the researcher to enable them to do their research. What is the information or what is the data that's going to enable them to do their research, and when does data become information in the context of that research?

Thanks Ingrid. Another question that's come in: someone would like to know if you have any thoughts about how multimedia might work or be used in forensic linguistics.

Well, I guess if you're capturing sound files, and I know that this has potentially
been on the news of late with analyzing the voice of someone who's been involved in aggression overseas, I guess what is underlying that is actually looking at how a sound file is created and what you can pick out from the different sounds that are captured. I really don't know how sound files work, and I'd love it if there is someone who's got a bit more expertise in sound to contribute, but I imagine this is where those layers, and being able to pick out different spectrums and potentially changes in modulation, are really important to see if there's a signature associated with people's voices. But I think we need a linguist after that question. That's a great one.

Following on from the earlier question, where someone wanted to know who might be working with magnetic tape archives: one of the audience members suggested the ABC, and another audience member just happens to be doing a project to recover NASA magnetic tape archives to mine image, spectrogram and audio radio astronomy data.

Wow, what an interesting project. What I'm fascinated by in looking into multimedia is also discovering language that I've never used before. I've never used the word spectrogram, and it's still something that makes me think I'm at the doctor's, but I'm not sure whether that's an appropriate description or not.

I thought what I'd do is introduce a project that's happening here in Australia that, for me, emphasizes this idea of what's information and what's data and what the researcher is looking at. It's a project based up at Griffith, but I think with people dotted around Australia, led by Mark Finnane, called the Prosecution Project. It's with the Centre of Excellence in Policing and Security, and they're looking at criminal trials over time. They've been digitizing archival materials and transcribing them, and you'll see there on the slide a nice kind of spliced image that Mark's supplied to give you a view of the digitized image, which to me is information, but also on the lower
part of the image is where the data entry occurs for transcription. What I've found interesting in the exchange with Mark about this project, and I met him through an interaction with Alana Piper recently up in Brisbane, is that they're really looking at making the absolute most of this digitized material: looking at it from an informational point of view, to be able to read the records of these criminal cases here in Australia, and also looking at what the data underlying that information can tell them. It's been a pretty interesting process to get to grips with what it is that they're doing, and I hope that this offers some insight into why it's important to understand what the researcher is trying to do, and that they're interested in using whatever method and whatever feature of multimedia enables them to do their research.

So Mark has emphasized here in the outcomes that they're looking at a mixture of research methods, both quantitative and qualitative. He's sent me an article, and I will pop the link into these slides so that others can have a chance to go and have a read of it. The qualitative aspect of it was something that was a little more familiar to me; the quantitative aspect was something quite different, and it made me realize that perhaps looking at mixed research methods was also a way of understanding how multimedia is operating as both an information source and a data source. But it's the data side of it which I'm finding, I guess enlightening is the word to use, and they're getting that data through transcription, human transcription, but in other cases of digitization it can be character recognition.

So this is where I got to with multimedia as a data source. I got to the point where I decided that it could be both information and data at the same time, because it's about the way that the researcher is using it and building on whatever they learn from that multimedia, whether it's being looked at as a piece of information or as a
data source to do their research. So reading the cases or reading the court records, and also doing text analysis or data mining, is enabling this researcher or that research group to do their research, which I think is pretty incredible potential from one source of digitized material, and I think that's quite an exciting prospect.

So from a management point of view, it made me think about how they were going to approach managing that, and Mark has been kind enough to give me a description of how the back end to the Prosecution Project is going to work. They've got archival materials as digital images, and they're going to transcribe those images into an SQL database, which supports them doing quantitative analysis of longitudinal and comparative patterns. This is an email that he sent over the last week. They're looking to extend that database by accessing or linking other data sources, and he's mentioned the Trove archive and possibly other projects or other digitized material, like the police gazettes, to enable qualitative, what he's referring to as case-level, as well as quantitative analysis. They're also looking to enrich the data by accessing and transcribing the trial transcripts and other text archives.

So I guess what I understood from this was that my notions of splitting something into information and data were helping me to understand what it was that was going to enable this research group to do their research, but also that I needed to dig even deeper into what sits underneath this application, to understand how they're storing the digitized images, how they're storing the transcriptions and where they're wanting to store the linkages between those two things. I realized that multimedia in this context is very complex, and that all that language that I introduced at the beginning about layers and components is important, I think, to inform how we support the management of this material. So that's where I got to with the Prosecution
Project. So I'm just going to stop there, before I get onto the three applications at the end, and ask if there are any questions.

Let's see, there have been some further suggestions for sources of people working on magnetic tape archives, including the Smithsonian and the National Film and Sound Archive.

Last but not least, I decided to have a look at three applications that enable a person to manipulate multimedia. I picked three that seemed to be reasonably familiar to me, and just wanted to have a look at how they enable material to be brought in, how they enable information or data to be applied, and what happens in these three applications. These are, I guess, reasonably ubiquitous applications: Final Cut Pro, ArcGIS and WordPress. They're certainly not the very domain-specific and, I guess, less commonly used applications that you might find in biomedicine or genetics more specifically.

So I had a look at Final Cut Pro, to try and understand some of the language that's used and what's actually happening when you use it. If we've got any competent users in the group today, it would be great if you offered some advice, but I just wanted to look at what Final Cut Pro does to digital material. From what I can understand, it technically consists of separate files: there's something called a project file, a media source file, and render or cache files. To me, that gave an understanding that the multimedia was being captured in different ways and potentially for different purposes. I have to confess I haven't ever used Final Cut Pro, but I think this is an interesting way for us to understand how multimedia is brought into an application and where it is saved, and also to try and understand what happens when you want to get that material out of the application, how you store that, and whether you store that as a combined object or as separate objects.

The Open Archival Information System model
which is used in the digital archiving world has been interpreted in different ways. To give you an example, a long time ago, when I was working on the National Digital Heritage Archive in New Zealand, we decided to be very clear that we would capture metadata separately from the material that we were hoping to keep. I think it was the Dutch National Library that decided to go a different way: they decided to build the digital object with both the metadata and the object that was being collected. To me, those are two very simple ways of approaching capturing multimedia: to separate concerns, if you like, or different types of digital information, or to actually build it into a bundle. It made me realise that if I was trying to get material out of Final Cut Pro, I would want to understand how this material could be linked back together again, in case I ever wanted to work on that multimedia material again. I won't carry on with that one, but I did look at this and wonder how the output of Final Cut Pro is captured and how the components are captured, and I don't have the answer to that today.

So, ArcGIS. This is another tool that I haven't used, but I went in to have a look at how material was viewed in that application and what happened to it when it was being used. What I can understand from this is that it's possible to pull in geospatial images, and it's possible to pull in geospatial data, if you like, lat/long, that kind of data, and to build up layers within this application. Again, it made me think that being able to maintain those components separately, but also to maintain the final output, which may be a combination of those components, could be critical to a researcher. It might not. But when you're dealing with different parts of material, how is that used by the researcher? Looking at a map from a human point of view, we can read it, but is that an important aspect to the research, or is it the annotations
on the map that are more important? I can't answer that, but I guess in terms of being able to support researchers who use or create multimedia, it's important to ask what it is that they want to do with it and whether they want to deconstruct or reconstruct from those original components.

So the last one is WordPress, and this is one that I suspect many more people have experience with. I've always wondered how people get the content out of WordPress, so I went and had a look, to get an understanding of what happens if you've had a website up using the WordPress application and you want to suck all the content out so you can capture it and perhaps put it into a different application. It may be very important to keep the discrete narrative that's in posts or pages or comments separate from categories and tags, and from what I can glean, I think you can get that out as separate pieces of data. But it made me wonder how a researcher might actually use that material: whether they would just want to reimport it into another application, or whether they'd actually want to process those tags or categories to see how much content has been given those categories or tags.

Hi Ingrid, there's a couple of comments and one or two questions. Firstly, there's someone at ANU using the OCCAMS content management system that's been developed at ANU, and they use an extended Dublin Core metadata schema. OCCAMS can edit digital file objects such as video, audio, image and document; it can edit digital file object metadata and also create record metadata to which files can be linked, and the metadata itself can be embedded in the object or exported to an XML sidecar file. They're currently working on being able to publish objects and records into WordPress, including export of the metadata.
That's really interesting, because it sounds like that kind of understanding of the components seems to have, if I understand correctly, really informed the way that that application has been designed, so that you can maintain digital material discretely, irrespective of whether it's straightforward data or multimedia, and I'm using that distinction very spuriously. But it sounds like it's possible to pull everything apart and put it back together again. So I don't know if that's a feature of multimedia that's useful for the research data management community to be aware of, it being a bit like Lego perhaps: being able to pull it apart, and understanding when you pull it apart how it was constructed so you can put it back together again, might be quite important depending on what it is that you want to do with that material. Gee, that's really interesting, and that the content is being made available to go into WordPress, fascinating.

The question here is from someone wanting to know if you have any thoughts on archiving websites as data sources. Is a website really data?

Just before I get on to that and bang on, I guess I'm intrigued by the discussion of OCCAMS going from a data capture application to what can be construed as a publication application, and that the material becomes, I guess, a means to communicate the research, but the data capture application is where all those pieces of information and data actually get brought together. So, sorry, to the next question, on web archiving.
A long, long time ago I used to be a web archivist at the National Library of New Zealand, so this has been something that's interested me for quite some time. I've been hoping, I guess, to see, or to assist with, enabling material to be published to the web in a way that gives you a pipeline of that content, whether it's data or information, going into an application on the web, and then being able to harvest that website as a whole and also suck that material back out again as discrete components.

While I was at the National Library back in New Zealand, we developed something called the Web Curator Tool, which was used to do that harvesting from the web. So you scoop up the website as best you can from the way that it looks on the web, but the example that I have in mind here is about what kind of result you get from that. The website that I was particularly interested in was the encyclopedia of New Zealand that was being put online. The front end of that is really important to capture from a collecting point of view, because it shows you how the interface is designed and how the information is presented, but the back end to that is a content management system, and before that is a records management system, if that's still correct, at the Ministry for Culture and Heritage.

So capturing websites can be done in more ways than one. You could use the Web Curator Tool; it has an engine underlying it called Heritrix, which is used by the Internet Archive. But I'm yet to see that as a workflow to support a researcher capturing information using a data capture device, like OCCAMS potentially, or we developed one a while back called Xite9 for some linguists here in Australia, and then being able to port that into a content management system and then use that content management system to publish to the web. I think if there are multiple purposes to which this multimedia material is going to be put, at each step through that it's important to
understand what needs to be brought together, what needs to be able to be pulled apart, and also what it is that you want to keep at the end of that. I really hope we see that eventually, that it's possible to actually see that life cycle and ensure that that material is retained in different ways.

Thanks Ingrid. There's a comment and a couple of other questions. Firstly the comment: someone's saying that extremely large file sizes could be another consideration to mention when talking about managing unprocessed multimedia project files over time, for example in Final Cut Pro.

Ah, whoever's just made that comment, you might be reading my mind. Tomorrow I'm giving a talk at a workshop at the University of New South Wales on digitization and data management and large images, and I've been digging around to learn about a file format called BigTIFF. It sounds like an enormous argument, but it's a kind of TIFF file. I think that's a really important point to make. What I've learned by looking into large image formats is that there are areas of research that use them, like the example I've given of biomedicine, where they have these microscopic images that are just enormous, and they need to retain the image and also the annotations on the image, and they may wish to align those annotations if they're looking at particular shapes or the morphology of cells, perhaps. It made me realize that multimedia, in the way that I understood it from a library perspective, was really limited, and that the way a cancer researcher might look at large microscopic images as multimedia, and the way that they work with that material, is quite different. And there are real constraints: certain applications can't handle the file sizes, so the images need to be converted and compressed, and it's also sometimes necessary to cut them up and tile them so that you can actually move those large images across a network. That really made me think
quite hard about how you support researchers to manage their data, especially if you're pulling apart a large scanned image. So yeah, I think that side of research is very new to me, and it's certainly challenging the way that I've understood multimedia, given, I guess, my cultural heritage background.

A couple of questions from the same person, and the questions are related to each other. Firstly: we want to archive and preserve a WordPress website, our 23 research things that we've just been running for the past 30 weeks. So this person would appreciate any insights that you have into data extraction and preservation. Relating to this, on the same track, they're also asking: should we be considering exploring export options or similar from any applications that our research IT services are considering offering to researchers, for example Omeka, Shared Shelf, and, importantly, if yes, how might this be done?

First and foremost, I think there is an export function in WordPress. I haven't had a go at it, but I think it's important to test that out, and I'd be very happy to have an exchange about that, because I don't know what happens when you hit the export button, what kind of package you get, and how discrete those components are or whether they're mushed up together. I think that's quite critical to understand, to know how well you can prise things apart and know about their relationships, even though you are prising them apart, if you want to dump them out of an application. But I do think it's important to think about import and export and what happens within the application with multimedia, and it may be other data types too; multimedia may be a bit of a furphy here. Because if it's coming in from different sources and being combined together to create an output, from a data management point of view you want to understand the provenance of all those components, irrespective of whether it's data or information, and
potentially want to be able to keep those apart, particularly if you are value-adding to something which you have access to but don't own. So, for example, you have access to some data and you create your own annotations. Well, those are your annotations, and you may share those annotations with the party that's loaned you the initial data, but keeping those things discrete might be just as important as being able to link them together. So I do think we need to explore how material is brought in and processed, and how we might support that coming out the other end, so that we can enable researchers to potentially make their own data available, if that's appropriate, but also to maintain the data that they may wish to add to as they go through and undertake different types of research.

I don't know how this could all be done, because it's so diverse. It just made me realise that perhaps putting on the hat of not understanding, and just trying to find out what was going on, what needed to come in, what was happening and what needed to come out of that research process through an application with digital material, was the starting point. But in each case I think it would have to be explored to know what to do and how much effort to apply. I do think the WordPress example was where I started first, which was to understand what I would do once I got that material out and how I would want to keep it, and that may or may not be important to a researcher. Our colleagues over in WA sparked this, because it set me off on a bit of an exploratory process, and I really hope that some of the people participating today undertake a bit more of that to extend our collective understanding, because, well, I have to say I found it quite intimidating. I realised that there was so much to know that it was quite overwhelming.

Do you have any thoughts about 3D rendering files?
A long time ago I had to write some information on annotation, which is why I picked it for this presentation, to try and understand how you would locate an annotation in a three-dimensional object. It was a research project that I very incidentally worked on with some of the researchers at the University of Sydney, looking at the likes of games or virtual platforms like Second Life to see how you would apply an annotation in that kind of 3D environment. It made me realise that I needed to understand a whole lot more about working in three dimensions, x, y and z, where you would locate a piece of information in that, and with what sort of tools. I don't even know the kinds of tools that generate three-dimensional objects, probably CAD tools that are used in architecture or industrial design, but I'm certainly not familiar with those types of tools. ArcGIS potentially does, if it's dealing with space. But 3D files to me are again a whole other area of multimedia, where I think having time with those who work with that kind of material, or having a background in, say, three-dimensional design, would be incredibly useful, because I'm very much looking at that from the point of view of an outsider. I really don't know, and I certainly can't comment on rendering 3D files, except perhaps that, depending on what's in there, they could be pretty large.

Well, thanks Ingrid. Someone's saying that JISC has been very keen on 3D works. At the New Zealand Archives, how did you separate the metadata from the object? Where did the object end and the metadata begin? And how did you decide what was essential to the object?

Okay, I'll try and start at the beginning: how did you separate the metadata from the object? I think the cheaty answer to that is that we had a collection management system which had different modules to it. It had a module which collected descriptive information, so that was where metadata was captured.
There was also a module that enabled digital objects to be loaded into it, which was its kind of online catalog, but underneath that was also what we call the object management system, where the objects themselves were loaded in and linked to the metadata that was in the collection management system. So we had two separate systems, and that to some degree dictated the way that we managed that material: we managed the metadata very separately from the actual digital object itself. In other systems that's not the case; they manage that material within the same application.
As to where the object ended and the metadata began: when I started exploring BigTIFF and, I guess, reacquainting myself with TIFF as a file format, I woke up to the fact that a TIFF file has metadata in it as well as data. I really didn't understand enough about file formats to begin to understand how I would describe it, and once I started peeling back the layers of a TIFF file I realised that there was a whole lot more information in there. When I looked at Landsat images I discovered that there were layers within those file formats. I think they're called NetCDF, that's right. They've got three layers in there, something called the data access layer, the coordinate system layer and the scientific data type layer, and I am reading that from a piece of paper. But I realised that these were treated quite separately in the structuring of a file format. Depending on the way data and information is captured, I think you could move that line between the object and the metadata that describes it and supports its retention.
As to the last question, how did you decide what was essential to the object: when you're in the collecting business you have to make decisions about what it is that you think is important to keep, because the whole point of keeping material as a collection is to enable that collection to be potentially used or
viewed in some way, and ideally you want to capture the essence of the object. So my comments about web archiving were just that, when we were looking at what we could capture, someone that I worked with back here in Australia, Jason G, said to me, "You're just cutting holes in the internet. That's all you're doing, you're cutting holes. It's not the network." And I think he was absolutely right. What we could represent was a piece of the internet at a point in time, and I found that pretty interesting.
But also, some websites were really difficult to harvest using the kind of tools we have. We used a tool called HTTrack, which is open source (so is the Web Curator Tool, by the way), and you get variable results depending on how you use the settings on those tools. Flash was notoriously difficult to capture, so in some instances other collecting institutions around the world have filmed a website to capture it, and in other instances they've gone to the back end to capture what's in it and taken screenshots to reflect what the interface was like at the front. So I think in each case it's important to understand what it is that's of value to the researcher, in a research context and also in a collecting context, because if you're going to go to the bother of capturing information effectively, it's with a view to making that accessible and potentially reusable again.
That's a really, really good bundle of questions, and they're, crikey, and will be made available online for people to see who haven't been able to attend today. It's been a real pleasure. I really hope we get to have a bit more discussion and input, because there's plenty to learn out there. We need the collective brain at work, definitely.
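
The point made in the answer above, that a TIFF file has metadata in it as well as data, can be illustrated with a minimal sketch. It assumes a little-endian classic TIFF (not BigTIFF), builds a one-pixel greyscale image in memory, and then reads the tag directory back out. The function names `build_tiny_tiff` and `read_tags` are hypothetical, chosen for illustration, and only Python's standard `struct` module is used. It is not a general TIFF reader.

```python
import struct

# A few baseline TIFF tag numbers and their names.
TAG_NAMES = {
    256: "ImageWidth",
    257: "ImageLength",
    258: "BitsPerSample",
    259: "Compression",
    262: "PhotometricInterpretation",
    273: "StripOffsets",
    278: "RowsPerStrip",
    279: "StripByteCounts",
}

SHORT, LONG = 3, 4  # TIFF field types used in this sketch


def build_tiny_tiff(pixel: int = 128) -> bytes:
    """Build a 1x1 greyscale, uncompressed, little-endian TIFF in memory."""
    entries = [
        (256, SHORT, 1, 1),  # ImageWidth = 1
        (257, SHORT, 1, 1),  # ImageLength = 1
        (258, SHORT, 1, 8),  # BitsPerSample = 8
        (259, SHORT, 1, 1),  # Compression = 1 (none)
        (262, SHORT, 1, 1),  # PhotometricInterpretation = 1 (black is zero)
        (273, LONG, 1, 0),   # StripOffsets, filled in below
        (278, SHORT, 1, 1),  # RowsPerStrip = 1
        (279, LONG, 1, 1),   # StripByteCounts = 1 byte of pixel data
    ]
    ifd_offset = 8  # the tag directory starts right after the 8-byte header
    data_offset = ifd_offset + 2 + 12 * len(entries) + 4
    header = struct.pack("<2sHI", b"II", 42, ifd_offset)
    ifd = struct.pack("<H", len(entries))
    for tag, field_type, count, value in entries:
        if tag == 273:
            value = data_offset  # point at the pixel data
        ifd += struct.pack("<HHII", tag, field_type, count, value)
    ifd += struct.pack("<I", 0)  # no further directories
    return header + ifd + bytes([pixel])


def read_tags(tiff: bytes) -> dict:
    """Read the first tag directory: this is the metadata part of the file."""
    byte_order, magic, ifd_offset = struct.unpack_from("<2sHI", tiff, 0)
    if byte_order != b"II" or magic != 42:
        raise ValueError("this sketch only handles little-endian classic TIFF")
    (entry_count,) = struct.unpack_from("<H", tiff, ifd_offset)
    tags = {}
    for i in range(entry_count):
        tag, field_type, count, value = struct.unpack_from(
            "<HHII", tiff, ifd_offset + 2 + 12 * i
        )
        if field_type == SHORT:
            value &= 0xFFFF  # a single SHORT sits in the first two bytes
        tags[tag] = value
    return tags


tiff = build_tiny_tiff(pixel=128)
tags = read_tags(tiff)
for tag, value in tags.items():
    print(TAG_NAMES.get(tag, tag), value)

# The data part: the tags say where the pixels are and how many bytes there are.
start = tags[273]
pixel_data = tiff[start:start + tags[279]]
print("pixel bytes:", list(pixel_data))
```

In this sketch the line between metadata and data is simply an offset recorded in the file itself: everything in the tag directory describes the image, and the `StripOffsets` tag points to where the pixels begin.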