Well, welcome, everybody. This is the Archiving Your Work panel. My name is Jordan Mitchell. I am the archivist and co-director of operations at Town Meeting TV, or CCTV Burlington. Thank you all for being here. Before we get started, I know Mike wanted to give a little pitch for some preservation work, so Mike, the floor is yours.

Thank you, Jordan. I'm not going to take too much away from this great panel, which I'm looking forward to sitting in on. But if you're specifically interested in historic preservation within your organization, in community archiving of media, and in helping to document the people who have made a difference in your organization and your community through the community media work you've done, we'd like to encourage you to connect with others in the field across the country who have similar interests. Since the national conference this last summer, we've been gathering people on a quarterly basis to encourage historic preservation efforts, as well as archiving efforts, for organizations in the field at large. The next quarterly meeting is going to take place on December 7th at 2 p.m. Eastern. If you want more information, I'll promote it in the ACM newsletter, or I'll give you my card if you don't get the newsletter, and you can be part of that discussion. We really want to encourage people to continue this discussion even beyond today's great discussion that's going to come. I hope I'm not pumping you guys up too much.

Pumping you guys up too much? Oh my god, so much pressure.

But again, please join us if you're interested in continuing the conversation with colleagues across the country who are interested in historic preservation in the field: December 7th at 2 p.m. Eastern. I can talk with you after the discussion today, but I just wanted to take a moment, and thank you, Jordan, for the forbearance. I really appreciate it.

Thank you, Mike. Well, thank you for being here.
We got started a little closer to two, so we'll end a little after three, but I want to make sure you also have time to get to the raffle, so I'll be mindful of time. To start, I want to introduce the other panelists here with me. To my right we have Patrick Wallace, then Rachel Onoff, and John Hauser. If the three of you want to introduce yourselves in your own words, please go ahead, starting with Patrick, maybe.

Hi, I'm Patrick Wallace. I'm the digital projects and archives librarian for Middlebury College. One of the great advantages of working for an institution like Middlebury is that it's small, but we're pretty well known and have a lot of well-known programs in the northeastern liberal arts college community. I work with both digitized and born-digital material dealing with college history, plus a lot of things that we get in from alumni, events, and everything else. We function as a kind of research lab for using archival collections, trying to build and find novel uses for archival content and share that knowledge out. So we have partnerships with others in the state, and yeah, happy to be here.

Great, thank you.

Good afternoon. I'm Rachel Onoff, and I'm the director of the Vermont Historical Records Program, which is based at the Vermont State Archives and Records Administration. Our program strives to provide technical assistance to historical records repositories of all stripes, and we use a very broad definition of what formats are included within that determination of recordness.
We also try to build statewide capacity in support of those typically smaller organizations through statewide collaboration and cooperation. We work closely with people like Patrick on projects like statewide digital repositories, which we have in a planning stage at this point, as well as on disaster preparedness and response and collections care issues. So I'm a generalist. I'm an archivist by training, but I'm quite interested in working with groups, and I've worked a lot with CCTV on the particular challenges they face managing their assets over time. I've been thrilled with the white paper that was done a number of years ago here in Vermont, thinking about how the different community media centers could work together to build or coordinate this at some level. It's thrilling to hear that at the national level there are these discussions about longer-term preservation of these tremendous assets, and I'm thrilled to be here.

Hi, I'm John Hauser. I'm the Archive Guy. I'm the amateur in the group; these are all professionals.

You're doing it, man. I mean, come on.

Yeah. I started because I got involved with Access Humboldt, and we didn't have enough bandwidth to provide video on demand for our members. Our county is 100 miles long and 40 miles wide, and only about a third of the population is served by cable. So from the civic engagement standpoint, we've got people who don't have cable who need to hear the government meetings and need to see the public access shows. I started going around trying to figure out who we would partner with to do that. This was only about two or three years after YouTube had started, and at that point you were limited to 20 minutes or one gig, and, you know, a five-hour government meeting isn't going to cut it. It started out as providing video on demand, but then there was the Blip.tv issue. It was an early video sharing service, right up until one day they woke
up and they said, "Yeah, we don't want your unprofessional, non-commercial video anymore." There wasn't any time. If you hadn't been archiving that, or if all your assets were up on Blip.tv, you were dead. The same thing will happen with YouTube, and you won't be ready if you're not already archiving. Everybody went to YouTube because that's where the audience was. I'm skipping ahead a little bit, but I spent the first seven years trying to teach access centers how to do it themselves, to archive their own stuff up to the Internet Archive. Then I said, "You know, this is like herding cats. There's no consistency; it depends on who's doing it." So I looked and saw how many access centers were using YouTube or Vimeo, a couple thousand of them, and I said, "Okay, well, that's where we should go." So I developed a way of archiving your YouTube channels into the Community Media Archive, and that's who I am and what I do. We've got 1,700 collections in there, 2.1 million videos, about 1.9 petabytes: 90 of the largest hard drives made.

Thank you. Thank you all for being here. The times on this agenda are now totally out the window. A general overview for this afternoon: we've all introduced ourselves, we're going to go into more of an open discussion, and then towards the end wrap up with final questions. I just want to get a feel for the room. What, if any, archiving or digital preservation work is your public access center doing now? Maybe a show of hands if you're doing anything at all, or if you'd like to share. And if you're doing nothing, I'd also love to hear that. What are folks here doing?

Yeah. You guys, I feel it. You let John take your YouTube channel? Great.

Ours is more local. We have hard drive systems. It's just within our office; anything that we do goes into this system, and it's backed up, to be backed
up.

Great. Local hard drives. Glad to hear "backup of the backup"; that's great. Anyone else? Yes, in the back.

We're going to use TelVue to push into our archive network.

For any TelVue customer, that's available as an added module. You do want to configure it, because if you don't configure it, it goes up into the collection where there are hundreds of thousands of videos, and a jihadist beheading video may follow your town government meeting. So get yourself a collection established through me, and then you can configure it to upload to that collection.

Yes?

DVD library.

Great. Yes?

Looking back, there are different hard drives and there's a whole DVD collection, and you can see everyone who took over this position of archiving giving it their own kind of stab. So I'm just kind of continuing that on and hoping to get some other ideas and stuff like that.

Great, thank you. Anyone else like to share? Yes.

Sorry, I don't want to talk again after talking, but this is a really fun session now. So, also a ton of drives: some at home, in two locations, some at work. A lot of spreadsheets. Internet Archive, love it, but we have recently also added YouTube because it was faster. We have both, and love the archive because people can get under the hood on things, and they won't pull you all down if there's a problem.

Thank you. Anyone else? Yes.

I'm a software vendor, so I'm curious. I'm hearing a lot about hard drives; these are the video files themselves. I'm curious to hear the panelists and others talk about what you're doing about archiving your metadata about those videos.

Great. These are all great things we're hearing: metadata, spreadsheets, backups and backups. In your home is interesting, but this is all great. YouTube, we'll come back to that one later. So thank you. And just one second question: what are you hoping to take away from this session? I think people have already said a little bit, but is
there anything that hasn't been mentioned that people are hoping to hear from the panelists?

Priorities about what to archive.

Great, great. So I'm going to leave up here, for the remainder, just some open discussion prompts. Based on what we just heard, panelists, if anyone wants to take one of those questions and dissect it a little bit, that might seem a little open-ended, or we can go down some of these other questions, whatever feels right. I know metadata was brought up. I know storage and different storage solutions were brought up, software of course, the Internet Archive. So if there's anyone itching to jump on in...

I can speak to some of the metadata question. We have a very small shop in Middlebury: it's me and a couple of student workers, but we have lots and lots of content. With metadata, something I learned in school studying cataloging was that you want enough to get the work done: being able to find things and being able to classify them. Things like embedded metadata, video files that come with additional metadata in their container format, there's not necessarily any reason to unpack that or worry about that when it can just be pulled from the file at a later time, as long as you're not doing too much transcoding. If we have an original file and it's being stored, don't worry about that. But creators, dates, all the simple stuff, or anything that's really critical to the use cases you're presenting, such as rights, where release forms are stored, and those types of related materials, all of that is really important. It's really good to get that done as soon as possible and keep it, but it doesn't need to be exhaustive. John and I have talked a lot about going through older videos and having transcription software run through them to create captions. From
captions, on a lot of things, you can extract things like subject terms, with increasingly better success. We have the Bread Loaf School of English at Middlebury, and the writers' conference, which is one of the more famous writing programs in the country, so we have all these really well-known American authors who are speaking, and we have recordings of them. If I get a shipment of those, I don't worry about what they're talking about. I worry about who's talking and when they're talking, those types of things: just enough to be able to get the work done fast. From there we can later go through transcripts, pull subject terms out, and add those to the records as we have capacity to do it. So much of digital preservation is really about access. Coming from an archiving background, aside from just AV stuff, my analogy is that a photograph you have printed can sit in grandma's attic for a hundred years and there's still an image there. A digital photograph in a hundred years, who knows? We have no idea. It hasn't been part of our lives long enough to even have a real guess at what will happen. So a lot of what we're doing is just providing access, putting this stuff in front of users, especially en masse, because there are all types of uses that we can't even predict. And technically, we work with CSVs, honestly: spreadsheets. It's easy to train anybody on a spreadsheet. I can train a student worker to do metadata entry in a spreadsheet so much faster than in any application's web interface.

Another thing about spreadsheets is that you can separate the data entry from the data review and approval process. That's especially important if you're working with interns or students, because students and interns aren't standard units. Your first intern can be great, and you
develop a kind of trust. When you ask, "Hey, how are things going?" they tell you things are fine, and you learn not to worry. Then the semester changes, you get a new student in, you ask how things are going, it's "great," you don't worry about it, except two months later you find out that the metadata is all messed up or hasn't been entered correctly. So the CSV format, instead of anything fancier, is great for that review and approval process separate from data entry. And CSVs are kind of the lingua franca of data interchange; everything works with them.

I also wonder if there's opportunity in waiting for technology and innovation to catch up. If we have minimal description available for these assets, in time AI may do the pulling out of important subject matter. The content is what we really care about, right? But it's the hardest thing to get into, and it takes the most time to provide the right breadcrumbs to. So I wonder if there is some advantage to being strategic about waiting: rather than doing super robust description and transcription for all of your programs, or attempting to start that process, instead do minimal description of the entire universe of your holdings, work down from there, and wait.

Since you deal with local audiences, you're used to using acronyms that make sense in your community. But for metadata, in addition to the who, what, the real basics that Patrick talked about, it's important to gather two things. The first is the producer name, because that's something you can't pull out. Actually, you can, if they're a video producer and they entered credits instead of writing it down on your form. They say, "Why should I write it out? It's in the video." The second thing is to expand those local acronyms, because nowadays you've got a global audience. If you're putting this thing up on the web, nobody knows what your CAC is. Does it mean Citizens Advisory Committee, or does it mean Cable Access Commission? And so
that's something that I've found working nationwide: expand those local acronyms. It just makes it easier.

Thinking a little bit more about metadata: there are different metadata schemas, such as PBCore, so there are standards in place for how to organize your metadata, but there's also the option to ignore those and come up with your own way of organizing it. For example, some people call a program an episode, or a series is an episode or a program, and maybe the staff is the worker or the producer. Some of those words can get jumbled up depending on how you're organizing your metadata. So can you talk a little bit about the benefits of following some sort of standard, or, depending on the size of your collection, the benefits of coming up with your own?

There are two sides to that. One is the schema side, which is having a metadata schema where people know what the values are, and then there's local vocabulary control. When it comes to the schema itself, it's easy enough to crosswalk most of them that whatever suits your use case best is probably fine. If you need the extra features of PBCore, that's fine. If you don't need the extra features of PBCore, Dublin Core might be more than enough. I use Dublin Core, usually with some extra expansions that we just use locally, for all of our collections. It's simple, it's portable, and it's easy to map to other things if somebody who knows how the original metadata was created is looking at it and can map it to a new schema for another process. When it comes to vocabulary control, internal consistency is the thing that's really key. If I call this person a creator, if there's a column using a term like creator or writer or whatever, just
keeping those internally consistent is going to be the most important thing, because when we have bulk data like that, things can be cleaned up and terms can be mapped to other terms. Those are find-and-replace operations on the technical end that aren't too onerous.

In access, we have submitters or presenters. The person submitting a national program isn't the creator of it, but they're the submitter to the local organization. So we capture submitter in a metadata field that goes up on the archive, and you can capture that as well as creator. So that's a little bit different, if you don't know the field.

Working with the Internet Archive, which I also champion all day (the Internet Archive is Middlebury's core repository and will remain so), they're very flexible. They use Dublin Core as their basic metadata schema, but they can take arbitrary metadata elements. So it should be very easy, if you're using the Internet Archive, to make whatever metadata you have on hand work with those records.

Series is an example. Their catalog was basically set up for texts. Brewster's a book guy; their catalog is set up for books rather than video. Series doesn't exist in their metadata standard, so slap a tag on there, by all means, called series, if you know what the series is. Producers are famous for screwing up the season and the episode number. Even if they give it to you, you'll wind up finding it out of order, too. But series is usually pretty clear, so go ahead and use that. For your government meetings, put the jurisdiction, the board, and the commission in there. Sometimes you have Zoning Board of Adjustment and sometimes they write ZBA; they abbreviate, they use those acronyms. So expand those acronyms.

Let's say you're looking at or assessing repository systems for your uses: the ability to go back and make those bulk
changes is really important. How smoothly and easily you can do that should be a serious consideration. With the Internet Archive it's very easy if you know the tricks of using the Internet Archive and batch operations. Others, I've found, are a lot more challenging. So it's something to look at when you're talking to vendors.

Yeah, you don't have to get it right the first time. The key difference between using the Internet Archive, which is an open system, and a particular vendor system is the question you want to ask: the edits that I put into the system, can I get those out of there? My point is that metadata should be vendor neutral, ideally. That CSV file is your crown jewel. You could argue that metadata is more important than the video assets themselves. I won't make that argument, but one could. The idea is that over-the-top, your phone apps, those things change. Adopt a hub-and-spoke model, where your website, your various apps, and your playback server are just spokes on the wheel. The hub is your metadata collection, so put your effort into that. These things become commodities; they can change, and you're just mapping from one set of metadata to others. If you bring that into your organization, you're set. You're not trapped, where you put in this time to get your metadata right and you can't get it back out. That's the advantage of working with the Internet Archive.

I just want to highlight the importance of something Patrick said about being consistent. Many archivists abide by a standard called DACS, or Describing Archives: A Content Standard, and that helps you form what you put in those different fields or elements in your schema, so that you're using them the same way every time. When you activate a date element, how are you forming the date, to make sure you're doing it similarly every time? And there's a mantra in DACS that I love, which is to make a decision about
what you're going to do locally, document that, write it down, and then apply it consistently. If you can do that when you are developing description, you're going to thank yourself, and your successors will thank you as well. But I do have a question about internal consistency for John, because you are aggregating, what did you say, 1,700 different collections? If there's internal consistency within each of those collections, does it then present issues for you as an aggregator when you are trying to make it so the searcher can query that beautiful crown jewel of data and get dependable results?

Yes, there are issues. But the good news is that searching on the archive uses a fielded search, but you don't need to use that. As a matter of fact, I recommend people overload subject fields. You don't get penalized for having something in more than one place. There's actually a weighting, where a subject matters more than a title and a title matters more than a description. But you guys don't have to worry about that.

No, you worry about it, right?

Well, yeah, I do, and especially if you pay me to worry about it. But with board of education meetings, if you're a multi-jurisdiction organization, you've got to separate out your city councils or your town councils from one another, so you need the town name in there. If you did a very strict search, that's not going to get you hits across jurisdictions, but a loose text search does, and that's what the archive provides. So you can be very specific, but when you're searching you can also get a larger set across jurisdictions. I haven't tried to maintain consistency, because I value my sanity. It's enough to have the stuff up there and searchable, because with the archive you don't need to get it right the first time. You can always go back,
do those bulk changes, and clean it up. So I guess I'm like Patrick in advocating that you get it in and get started, because I've seen too many people who don't want to screw it up, are afraid to start, and never start. So do what you can. It's an iterative process.

That's what she said.

Pulling back to earlier in that process, for people with hard drives and local storage, here are things that I've found incredibly valuable, having different types and different qualities of stuff put in front of me. There's a lot that you can embed in file names: the date of creation for something, even last names of key people involved. Those sorts of things are super helpful. You can extract them later. When you have a list of file names, you put that list in the spreadsheet and use Excel functions to break those out, so now the date column is populated by the date that's in the file name, and you don't have to worry about that. Same thing with folder structure and naming schemas. On your local hard drives, even if it's not essential to your local file storage organization, as much as you can add to those file names and folder names to give contextual information and metadata for the object, even as a failsafe, is really helpful. When I pull files off my digital camera, they go into the images folder on my local network storage device, in a folder that is just the date when I took them, sometimes with a keyword for the place I was if I was on vacation, something like that. Just that is enough that I can make an image collection ready for the Internet Archive. Whereas if it was just the SD card with the IMG_whatever.tiff, I would have no idea what to do with it. I would have to have a student worker look through them and apply a bunch of new metadata. So doing that work up front really
helps. Keep a text file, a plain readme file, in the folders with content when you put it in, even if it's just your thoughts off the top of your head on what this stuff is that you're adding to your storage. That will help a lot in the future. Or a CSV, if you want to go through that work in the beginning, even if it's just one line for the whole set of items. Say we have 10 video files from this one event: you can have a one-line file that gives the details for that event, who recorded it, who's speaking, whatever it is.

Do you guys know what we're talking about when we say CSV files? It's just the generic spreadsheet. It's just row and column information. It's a text file that can be imported into any spreadsheet or database program.

Do the fields for the CSV have to be of a particular identity?

No. It's just columns, and each column is a piece of metadata, like date.

Okay, title. And somewhere in the CSV we put what we say those are?

Absolutely, yeah. It's for you. You can feed it into a program, but it's how you organize the information instead of putting it into a Word document, because you've got a column of dates, you've got a column for each of those metadata fields.

Okay.

And they can be created with Excel or Google Sheets; anything that can save as a CSV-type file. You lose the formatting information: what things you make bold, the background colors of the cells, and all the fancy stuff you can do in Excel or Google Sheets. Just the column and row data gets saved. And the first row is the column names for what comes underneath.

Okay.

And CSV, I believe, stands for comma-separated values.

Okay. So may I ask: when we're doing archiving, I work for Lake Champlain Access TV, we did a lot of archiving, and we just did Excel spreadsheets. Just basic: this is where it took place, this is the date it occurred, and
we wrote thousands of them. So I have no idea what a CSV is. What is the advantage?

You did it. Really, all we would need to do is open up one of those spreadsheets in Excel and save it as a CSV.

Why?

So here's something that comes up in my work. A few months ago I got a hard drive from a writer, an alum who had donated to special collections. It's a hard drive full of Microsoft Works 2.0 for Mac files. These are very difficult files to open and deal with. I can get the content out, mostly. You open one up in a text editor and there's a bunch of junk, and then there are chunks of text. I could copy that out and paste it into something else, but I would rather not. With an Excel file, the extra stuff that goes into the file is the organization of workbooks. A CSV will only have one page of your Excel workbook in it, so you can't have multiple sheets in the same file. And then there's the formatting stuff I mentioned: what's bold, fonts, colors, all of that. Just the basic tabular data will remain. If you open the CSV file up in a text editor, it's comma separated, so it'll be value, comma, value, and that defines the columns going across. Then a line break defines the next row, so each line is one row in the spreadsheet.

So the reason you save as a CSV file is that Excel changes; the person using it down the line may not have the same version of Excel as you.

It's just a more basic exchange format.

The data still transfers, and you get your compatibility.

Yeah, absolutely. It's read by other types of programs and devices.

Say you're going to manage a scenario where you're trying to map out where things are, say you want video mapping. You could
then, you know, take that CSV data, where you had the location...

Yep.

...and add that to a mapping tool. You couldn't do that if it's an Excel file; with the CSV you actually have data that's readable.

Right. The CSV is just plain ASCII or Unicode text, so you could look at the CSV file on an old Texas Instruments big-screen calculator. It's almost universally compatible with computing devices.

Can I present a question on a totally different subject?

Yeah.

I think somebody was talking about transcription. If somebody mentions a word in a video, and it's thousands of videos later, and you're looking for that one subject, then thinking about searching for keywords within transcripts, I wouldn't know how to do that.

That's because it's tough to do technically if you're trying to roll your own search system. But something like the Internet Archive provides keyword searches for text documents that are in the Internet Archive, and you can associate multiple files of multiple types with one item. So if I have an item that is this video taken on whatever date, and I have a subtitle file or transcript, and maybe an audio-only version that just has the audio stream ripped out and put into a WAV file or MP3, whatever, all of those can be uploaded into the same item. Then the Internet Archive will make derivative formats of those, and things will come up in full-text searches.

That's not something you put in a CSV? That's something that has to be separate?

It'd be separate, in the subtitle file. You'd have a video file and a subtitle file. And John, I think you probably know this in more detail than I do.

You have to text things. That's way down in the weeds; don't worry about it yet.

Great. So we've
talked a lot about metadata. For my next question: before the metadata comes the program, and it sounds like a lot of us have DVDs, tapes, and other material, stuff sitting on hard drives. Really, you should be thinking about your metadata while also thinking about the other thing, so it goes together. But a lot of people have said they have other stuff quite literally lying around. At CCTV, Town Meeting TV in Burlington, for example, we have tapes and DVDs dating back to the 1980s, on 6,000 tapes and 6,000 DVDs. To digitize it is a five-to-ten-ish-year process that I'm currently in the midst of. A thing to note there is that your staff doesn't have to be trained in that. My background is not at all in archiving or historical knowledge or preservation. I have a film degree, which is great, but the idea is that it's something anyone on your staff can learn. It doesn't have to be an archivist. At Town Meeting TV we're fortunate to have a full-time archivist and a part-time worker to help with the labor of digitization, but at the end of the day we all have some sort of technical knowledge in the film and TV area, so it's something we can all do.

Someone had asked how your organization decides what to save, so I think that'd be a great point to talk about, especially when there's so much. At CCTV, ideally we could save everything, but we also know that time is a ticking clock. The tapes are deteriorating. So right now our process is that we're going in numerical order: we're starting with the stuff from the 80s and 90s and working forward. We have a lot of community outreach. When members of the community say, "I'm looking for clips of Bernie Sanders," or former Senator Leahy, or an old mayor, or a historical event, we'll digitize those for them as they come up. But otherwise we're
just going in order. That's what our organization has decided, but that's also just what we're able to do. Some people maybe don't have any idea; they just know they have stuff sitting on drives or DVDs. We're fortunate: we have a very organized system and we know what's on our analog material. Some people don't. So I'm wondering if we could talk a little bit about whether you're saving everything, what you save, or how you decide what to save. This also could be other things. Maybe you have old posters or photographs. I know we often think of video, but there could be other related materials as well. If anyone has any thoughts. Even archivists throw stuff out. I wish they did more. There can be things that aren't really that interesting and are taking up space. I mean, resources for doing all of this are finite. That said, I'm working in a very different environment than you are, for example, but things that have broad community interest are one thing. Things that we know for sure are going to be of interest: your videos of Bernie Sanders are likely to be of interest. Things that we know are important to the community, and certainly things that are important to our particular users. And we do a lot of prioritizing based on what we like as professionals. There are quirky things that maybe only we know about, but maybe if we keep it and put it over here, somebody else will be interested. That's okay too, I think. Yeah, embrace your subjectivity. We are not objective, and we are making value decisions and judgments all the time by how we choose to spend our time and what we choose to archive. So yeah, use is a great thing to look at, if you have your assets in some
sort of format and platform where people can use them, and you can see what people are looking for, what the search terms are. Then there's your perceived value of these materials, their long-term value. This is in some ways the most interesting determinant for selecting for digitization or for further preservation work: trying to identify over time what might become of greater historical value and interest to your community. Condition can be another thing that determines what your priorities should be. If you have some tapes that were down in the basement for some period of time, you might want to think about starting with those; they're most at risk. The other thing to think about is the format that they are on, and which formats are most at risk of becoming completely obsolete, either because you cannot find hardware to run them and read them, or because they are magnetic media of some sort, which are going to be the most likely to fail first. But DVDs are not necessarily going to be stable for very long either. So any of those things we used to look to, as in, we've got the digital files, we've got them on the gold. Yeah, gold standard. Yeah, it's like, no, that's still hardware that is dependent upon some system to spin it up, and then it can just spin and not give you anything. So there are different things to consider as you think about what to select. Do you have a question? Can I ask a philosophy question? This may seem odd to the archivists in the room; John knows this problem, though. Because many of our organizations were utilities, or defined themselves as conduits for other people's expression, and they didn't maintain copyright, they never kept anything. That seems strange to me, but it's not uncommon to
still see that, actually, within organizations that are devoted to other people's expression rather than their own. Or, in your case, it's slightly different because of the role the organization played over time. How does one build a philosophy about being that repository for other people's expression? Because it has not been part of the historic charge of many of our organizations. There are two roles here, and academia tends to wind up taking data stewardship and data ownership, and they're not separated. I think that's what you're talking about, because when you're an access center and you only keep staff stuff, you're a data steward only for your own stuff. But in the distribution world you were a gatekeeper; you were an aggregator of your producers' content. A lot of people didn't want to wind up keeping their producers' content because, for whatever reason, they didn't want to get into issues about that. But if you think about it as a data steward role, capture the producer and put that into your metadata. Then as issues come up, you have a policy-level thing which says, we've got a takedown policy if a producer requests it. First of all, you get permission for web distribution, just like you used to get permission for DVD duplication in the old days. Add a thing for web submission or archive access, and recognize that with data stewardship you're keeping that stuff for other people. All of a sudden a lot of those issues about producer content go away. On the Internet Archive, if you've got the producer name, you can give them a URL which gives the last 50 items they produced. As they add new stuff, that URL doesn't change, and they love that, because that's what they give to their friends and family. So that helps. But a lot of the issues that you get into come from not recognizing that data stewardship is
separate from data ownership. Academic institutions will not take collections nowadays, because of cultural appropriation and because of issues about ethics and permissions. They won't take collections that somebody gives them if they don't have clear ownership of that material and documentation. And also, a lot of the things that you do are probably more interesting than you think they are. I just recently started working on some grant funding from NSF to do digitization work within Vermont of PEG tapes and stuff that's still stored on analog media. Things that we're super interested in are, like, community calendar cards. I don't know if 20 years ago anybody would have thought that was interesting enough to keep in bulk for that long. I think there is a lot that you probably have, or could have, that is very much of interest in the long haul. But yeah, I think John's absolutely right about the streaming agreements. It's one line in a release form that can really make all the difference. On the academic side of things, at least at Middlebury, we lean heavily into the fact that we do respect takedown notices, and one needs to be sent before anybody sues us. But it's a real thing. In the archival world there's lots of discussion about orphan works. We have postcards that were printed in the 1950s at local drugstores. Somebody owns the copyright to these, technically; by the law, they're in-copyright works. Is anybody ever going to complain? And what is that risk compared to the value of having these beautiful historic photographic shots of the state up en masse? We make those judgment calls. So yeah, it is a risk assessment. Oh, I'm sorry, I'll let you talk. This is kind
of like a tangent. For my station the big thing is storage, like, where are we going to store this? I know the Internet Archive is, a lot of people say, a great place, but I like to have backups to backups, right? So I'm trying to kind of scrape by, but do you have any advice for personal storage, like what's the most cost-efficient and, I want to say, readily accessible? I don't think at my station someone's going to come every day to get some type of footage, but accessible in the sense that you could search through to find what they want and eventually get it. Do you have anything to recommend in that field? An external hard drive, and a CSV file or an Excel spreadsheet on that drive describing its contents. A 16-terabyte drive, if you wind up getting it on sale, is going to cost you 250 dollars. If that's too much, a four-terabyte drive is going to cost you, I don't know, 30 bucks. Do they even sell them these days? That winds up being USB, something that you can plug into a computer. If you're using Macs versus PCs in your shop, you want to be cognizant of that, because you want it to be readable by multiple systems. Most external hard drives these days ship in the exFAT format, which can be read by Macs, PCs, and Linux. And one point on external hard drives, talking about backups of backups: they will fail. Oh, absolutely. Okay, let's talk about that. Yeah, that was my question: how often should they be refreshed? Three-two-one is the backup standard: you wind up having three copies, on at least two different formats, with one copy off-site. An unsophisticated way of doing that is to bring your off-site copy on site, copy over the data that's changed since the last time, unplug it, and take it home, and you don't access it again until you come in again on the cycle, either
quarterly or half-yearly; it depends on how much new stuff you get. So that covers your off-site. Well, for off-site you could also upload it to your Google Drive account or your Microsoft OneDrive account or one of those things, if that's convenient and you don't have too much. Again, that's a thing that you can recover from. That's a short-term thing, because you may not have that account, or they may not provide that service, five or ten years from now. But it's plan C, it's plan D, it's plan E; it's another option. I would like to just mention that you should also talk about the difference between hard drives and SSDs. Hard drives have moving parts, moving platters, and if you use them enough they wear down. And then what we have is a Drobo. What is in that, hard drives or SSDs? Do you know what those are? Those are standard. Yeah, that uses three-and-a-half-inch hard drives. Hard drives, yeah, mechanical hard drives. So I would first jump in to say backups aren't preservation. That's something you get as an intro to the archival digital preservation world. You can have a copy of a file replicated across as many storage devices as you want, but if there's bit rot in that file, if bad files are getting replicated, if something goes wrong with the file in place or in transfer and you're then backing up a corrupted version, you're backing up a corrupted version. So the NAS device will keep things pretty consistent with your storage. It's not a preservation solution, though, because a preservation solution requires some ability to go back and audit it. Is it generating checksums? How often is it checking file integrity to make sure nothing has gotten corrupted? Is it keeping audit logs of access or changes? Those types of things. And Internet Archive is one of those
systems that does all that. Any of the vendor-supplied digital preservation systems also have all of these other features built in. Yeah, this is second-level stuff, assuming you already have backups. Yeah, do that first. Right, that is not to say don't do that first. If you don't have that, go do the off-site thing. But if you have an up-to-date copy, it's on a NAS, whatever. If you have an SSD that's not powered on and is on the shelf, it's going to last as long as your DVD backups, probably. Its readability will live as long as the interface does, so until the point where it's no longer able to plug into a port on whatever computer you have lying around, it's probably going to live. In constant use, hard drive failure rates are high compared to any other storage medium, but when they're powered off they're pretty good. Anything more on that? Lots of copies. The acronym we use is lots of copies keep stuff safe. Oh, that's Stanford, LOCKSS. Uh-huh. And so, yeah, I was not disparaging backups. The more copies you have of everything, the better it is. Unless you're concerned about the environment. Well, in which case one gets one's hard drives from the Internet Archive. They regularly increase their data density, and I was down there three weeks ago and wound up picking up 340 terabytes of six-terabyte drives, sixty, sorry, sixty six-terabyte hard drives, in the back of my 2007 Honda Fit. Five-hour drive, suspension like that, headlights pointed at the moon going back home. I calculated that the data transfer rate was 50 terabytes an hour. Can't touch that with broadband. Sometimes you have to get in the car and drive, and that was one of those times. So, a practical example: at Middlebury, our digital side of things meets all of the standards for national record keeping and everything else. We're high up on
the lists, far higher than we need to be for what we do. We'll get a hard drive back from a vendor who's digitized content for us, so we have a hard drive full of video files. That drive gets copied onto some cloud storage solution, usually a lower-cost, lower-access tier like Amazon Glacier or whatever the Microsoft equivalent is, so there's a copy there. We upload the stuff to Internet Archive; there's a copy there. We have a copy on our local dark network storage, so there's a copy there. The hard drive gets unplugged and set on a shelf. That brings us into a Social Security Administration level of digital preservation, and it's relatively simple to have those different systems. If you have a NAS, just keep the hard drive if you can. And nowadays those NAS units have ZFS. It's a file system that addresses the bit rot problem that he mentioned before. Files on your hard drive can be silently corrupted. You've seen it when a single bit flips in your picture and then the rest of the picture is gray. Have you ever had that happen on your SD cards? Well, it happens, and it's a bummer when it happens, especially if you cared about the picture. But it's the sort of thing that ZFS winds up detecting and automatically correcting without you doing anything. That's fancy, it's higher end, but it's coming down to consumer-level products. And it's open source, so you don't have vendor lock-in. At this point I want to make sure we have time for questions. I think we've naturally transitioned from discussion to questions, which is great. I'm going to give us about 10 more minutes to make sure y'all have time to get out for the raffle, but I'll open it up a little more formally. Are there any other questions that people have? Yes. This is potentially kind of far out, but
maybe not. It seems like artificial intelligence is coming down to the consumer level, like even right within Premiere, being able to run a file and get transcription, or going to Trint or Otter.ai, etc. I'm curious what role you see that playing in preservation, and also whether AI could potentially be dangerous to, or manipulate, digital content in a way that's not useful. Are there checks and balances there, as far as we can see right now with where the technologies are? I don't know about the manipulation thing; I mean, as much as it could anything else. So yes, it can use my pictures to make fake news, for sure. That is part of the capability. But as for its usefulness in archiving practice, John and I were just talking about that too. Whisper is really good for doing transcriptions of things like town hall meetings, even in a noisy room. AI-enhanced speech-to-text transcription has gotten really good, and it's getting a lot better, fast. Computer vision and optical character recognition for chyrons and all kinds of other text that gets displayed on the screen is getting really good, even at detecting things that traditional optical character recognition for text documents like newspapers would always struggle with, because text moves in different forms: there are columns, there are irregular column breaks, all of that. You get the same thing in most screen media, where we have a column over here and a runner at the bottom. So AI tools, I think, will make all of that so much easier. Then upscaling, obviously. When I'm doing digitization I tend to pull things at the highest bit rates and highest sampling rates I can accommodate, but usually close to native resolution, because software upscaling is getting so good. If I can get a high enough sampling rate, probably whatever Adobe's
next upscaling plug-in is going to be will blow any of the hardware I have access to out of the water. So can you explain what upscaling is? Yes. It's increasing the resolution. With analog tape, for example, we have an analog signal; it has a technically infinite resolution. But when you're digitally sampling it, you get a video file that's like, what does NTSC come out to, like 320 by 240 pixels? It's thumbnail size by modern-day standards; that's how big the capture resolution is. So upscaling means we can display this at HD resolution and not have it just look like a bunch of blocky pixels moving around. As video resolutions get better, as screens become higher resolution, all of our old digital content starts looking smaller and smaller. The VGA image that was really cool in 2000 is now that big on my screen at today's resolution. So there is that process, and philosophically I don't know exactly how to deal with it, because you are manipulating the original. But we're doing that with a lot of things; it's just part of life with digital information. I think just being able to keep up with that is a huge benefit of a lot of the new AI tools coming out. Thank you. Question in the back. Yeah, so say you're a Vermont access center and you find some old community bulletin board footage while you're digitizing something else. Often they're at the end of things, and you get these cheesy ad graphics and cool things, like, well, this is so neat. What is the best format that we could capture those in, to share if there was a bigger project? We were just talking about this. I don't know yet, but we're going to figure it out. Keeping it as part of the video file, or keeping it as another video file, I think, because it's going to come through that kind of video capture
process and digitization. Keep it as it comes, as close to the source material as possible. So keep the video file, and we'll see what kind of AI tools we can wrangle to look at it and turn it into printed text. Yes, question? I'm looking at the last question there, and I'm thinking about the types of state agencies that organizations should be contacting. Talk a little bit about how state agencies are approaching archival work. Is it within historical societies? Is it a state function? Is there a group of folks that get together on Thursdays to talk about things? How can we be building those sorts of parallel relationships with organizations across the country? Well, it will probably come as no surprise to you that with state archives and other state agencies, it varies wildly from state to state in our country what they are doing. In Vermont we don't have a digitization lab, even for state agencies to use. I often find people assume we must have a lab. So the VHRP has built a really modest mobile digitization unit, consisting of scanners, including a four-thousand-dollar one, and a camera setup as well, and we offer training and get those out into the community. Again, this is basically for your paper-based historical records. But there certainly are states that do, and I would encourage folks here who are from places other than Vermont to reach out. I would start with your state archives and see what they are doing and what they know of. But I think there's also great opportunity within your community, partly because of the particular needs of the larger files of moving image material you are trying to capture, and the fact that many of you have large chunks, very official archival term, that are on magnetic tape. Maybe at some point those were migrated or reformatted, and now you also have them on DVD, and there's also a question of which
one, you know, and maybe you still even have some other backup format on tape. There's also sometimes, I think, a question of which is the one we should go with, if you have held on to all of those over time. So I don't have a great answer for you at all, Mike, but it's certainly something you should look around for, wherever you are, to see who's available, who's already doing something, or who might have the experience, like, okay, we walked through this, we figured this out, so you don't have to. Or the equipment that they might be able to make available to you. Yeah, that's something I want to talk up to you guys, from an access center point of view. As you've transitioned from the analog stuff to the digital side of things, if you don't need your old decks, talk amongst yourselves and keep them at least regionally, so that there's some place. Vermont's great because you guys already have an access network. So keep the dub rack and some spares, and keep those regionally. Every center doesn't need to keep the old stuff, but it should be available when this center wants to do a project, or this center is coming up on its 40th or 50th anniversary and wants to go through and digitize some tapes. I think that is a self-help thing that makes sense, instead of trying to send them out to a commercial audio preservation company. So we have time for one more quick question. Great. I just want to ask more of a technical question, but then I'm from where god and say I'm nowhere near the media, and we started in '04. So a lot of our stuff, the early stuff, was on tape, but 90 percent between like '05 and 2012 was all DVD. The hardest part of archiving that as files is the time it takes: you take the DVD, you have to let it play through, and then let it be. I didn't know if there was a faster way to
do this. Oh, I don't, let me, uh, three-quarter inch down there. For DVDs, at CCTV we use HandBrake, which is free software. We could connect and I could tell you the specific settings we use. So I don't have to wait for the DVD to play? Yes. Is that the one that just converts a video file to an MP4 file? Yes. I use ImgBurn, which is probably the same thing. That's what I use to take stuff off of DVDs, or make digital copies for people, or take digitized stuff and put it on DVDs. HandBrake's free, it's cross-platform, and you can get really deep into settings. The good news is you can use the GUI, but it also has a queue, so once you get a preset set up, you can start loading a ton of things in there and let it run overnight or over the weekend in batch mode. And once you understand the similarities between the GUI and their command-line batch program, you can do tubs full of hard drives; just run them through. Yes? If you have TelVue, you can just take the DVD and dump it into TelVue, and then digitize it and put it in the archive. That's what I'm doing: taking that as a TS file and putting it into TelVue, and it'll do the work for you. Great. So for those who didn't hear, if you use TelVue, you can take the TS file, dump it in, and it will work the same. And if you've got another vendor, ask them about archiving capability, because I talked to TelVue for seven years before they developed that module, and that module was just an extension of an existing thing they already had. With Cablecast, Ray and I have been trying to get together to get their archiving capability. Castus is still a distant third, but with Cablecast it looks like Ray and I are going to be able to get that done this year. Thank you all for the questions. Before you go, there's one more point I want to make. We talked a
lot about preservation, which is great, and you know that just having multiple copies of something is not preservation, but it's still important to keep copies. Something that didn't come up that I thought might was YouTube, and whether anyone was using YouTube as a way to store files. YouTube is not an archive; YouTube is an access point. For example, at CCTV our current website is a little outdated. We can't livestream to it, so we livestream to YouTube, just so people can watch it live. Sometimes our website fails and we can't put our videos on it, so we'll put them on our YouTube channel just for people to access. We still have our MP4 files backed up on our server and Amazon Glacier and elsewhere. YouTube is an awesome resource just to get people to see your stuff. I'm glad it didn't come up here that someone said, we use our YouTube to store stuff, because with YouTube you can download your files back out, but you're not getting them back in the highest resolution, and other things like that. What if YouTube one day just disappears? All your stuff is gone. Or you get locked out of your account. So that's one last point I wanted to leave you with. And then lastly, here are some references, some stuff we did talk about and some we didn't. Internet Archive we talked a bit about. There are three resources there for preservation, digitization outsourcing, and training. BAVC, Bay Area Video Coalition, is great. Yeah, please take a picture. George Blood, NEDCC. And then John had a "What I've Learned Archiving Two Million Videos" reference sheet, which is what that QR code is specifically. I'd recommend taking a picture of this if you want; here are some resources. I'll stick around if anyone else has any other questions. It's a build-your-own-adventure game: it starts out as a one-page sheet that has links to other stuff that I've done over the years, and so you can kind of build your own
path through there. But thank you, and I want to of course thank Patrick and Rachel and John for being here.
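
For anyone who wants to try two of the ideas from the discussion, the CSV file on the drive describing its contents and the checksum checks that separate preservation from plain backup, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the function names, the manifest columns (`path`, `bytes`, `sha256`), and the choice of SHA-256 are hypothetical and were not specified by the panelists, and a real preservation system would also keep audit logs and run on a schedule.

```python
import csv
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks to handle large video files."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root, manifest_path):
    """Write a CSV inventory (hypothetical layout: path, bytes, sha256) of every file under root.

    Returns the number of files recorded.
    """
    root = Path(root)
    manifest_path = Path(manifest_path)
    rows = []
    for path in sorted(root.rglob("*")):
        # Skip directories and the manifest itself if it lives on the same drive.
        if path.is_file() and path.resolve() != manifest_path.resolve():
            rows.append({
                "path": path.relative_to(root).as_posix(),
                "bytes": path.stat().st_size,
                "sha256": sha256_of(path),
            })
    with open(manifest_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["path", "bytes", "sha256"])
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)


def verify_manifest(root, manifest_path):
    """Re-hash every file listed in the manifest and return a list of (path, problem) tuples."""
    root = Path(root)
    problems = []
    with open(manifest_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            target = root / row["path"]
            if not target.is_file():
                problems.append((row["path"], "missing"))
            elif sha256_of(target) != row["sha256"]:
                problems.append((row["path"], "checksum mismatch"))
    return problems
```

Run `build_manifest` once when files are copied onto a drive, then run `verify_manifest` on each copy during the quarterly or half-yearly refresh cycle; an empty list means every listed file is still present and unchanged. Because the manifest is plain CSV, it stays readable in Excel or any text editor.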