Can everybody hear me all right? Okay. So my name again is Olivia Dill and I work for the UC Berkeley Library on Project IRENE. My project is a collaboration between many different entities on the UC Berkeley campus. We work with the Department of Linguistics, the Phoebe A. Hearst Museum of Anthropology, the UC Libraries, and Lawrence Berkeley National Laboratory physicists. And our project goal is, over three years, to create digital versions of the audio recorded on around 3,000 wax cylinders in the Anthropology Museum collection. And the way that we're doing this is using a specialized and developing optical technique that relies on images of the materials rather than contact with them. And today I'm going to be walking you through the optical method in general, how we're applying it in new ways for the Hearst Museum of Anthropology project, and what directions we see it going in the future. So the method that we're using was developed with the aim of increasing access to media from the early history of sound recording. Sound was first recorded in 1859, and the materials on which sound was recorded during this early period were oftentimes delicate to begin with and have only become more delicate as time passed. And so this leaves us with more than a century's worth of information on delicate and tricky-to-handle formats. And the technology that we're using for the Hearst Museum project was developed at Lawrence Berkeley National Laboratory over the last decade with the goal specifically of accessing these recordings. Because this method is non-contact and image-based, it allows for the playback and preservation of historic sound recordings in different formats and different states of stability. And it's been used to play back some of the earliest recordings in history and on collections of different kinds of media in the US as well as abroad.
And the mass digitization of the wax cylinders in the Phoebe Hearst Museum of Anthropology Collection is just the latest and newest application of this technique. So kind of an obvious place to start is with the question of what is a wax cylinder? But to really answer that question we have to take a step backwards and first ask what is sound? Sound is a motion-based effect. It's due to the compression and rarefaction of particles as they travel through some medium, creating oscillating areas of high and low pressure. Two of the main properties that we use to perceive sound are a sound wave's volume and pitch and how those two properties change over time. The amplitude of oscillation translates to the volume of the sound that you hear and the frequency at which particles are oscillating translates to the tone or pitch that you're hearing. And the feat of early sound recordings is to create a retrievable trace of the amplitude and frequency of oscillations and how those values change over time. And so a wax cylinder creates this trace by engraving a groove in wax. A wax cylinder is exactly what it sounds like. It's a cylinder made out of wax. They're usually about four inches long and two inches in diameter. So think about the size of like a soda can, just with a much thicker wall. They're made out of a wax that's hard but malleable enough to be easily scraped away. So we should be thinking about the texture of like hardened candle wax. To actually record sound onto a cylinder, a small needle called a cutter on a machine called a phonograph is sunk down into the wax by some small amount and then the cylinder gets spun underneath the cutter. The cutter drags along the surface of the cylinder, removing wax and leaving a groove behind it. And it does this in a way that every time the cylinder completes one revolution, the cutter has moved over horizontally by some amount, so the groove that it creates spirals around the outside of the cylinder in one continuous groove.
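The idea of a groove as a retrievable trace of amplitude and frequency can be sketched as a toy model. This is not the project's code; all of the numbers here (sample rate, depths, tone) are illustrative, chosen to match the scales mentioned in the talk.

```python
import numpy as np

# Toy model of a cylinder groove trace: a tone of some frequency and
# amplitude (in microns) rides on top of the nominal cutting depth.
sample_rate = 10_000               # samples per second (illustrative)
n_samples = 100
t = np.arange(n_samples) / sample_rate

nominal_depth = 50.0               # microns the cutter is sunk into the wax
amplitude = 10.0                   # ~10-micron vertical displacement
frequency = 440.0                  # Hz, an A4 tone (illustrative)

# Groove-bottom depth over time: louder sound -> bigger amplitude of the
# oscillation; higher pitch -> faster oscillation.
depth = nominal_depth + amplitude * np.sin(2 * np.pi * frequency * t)

print(depth.min(), depth.max())    # stays within roughly 50 +/- 10 microns
```

Recovering the sound later amounts to reading this depth trace back off the wax.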
To record audio onto these things, a speaker speaks into a horn on a phonograph that collects the sound waves that are being produced and transfers them through a coupling mechanism into the recording device and into the cutter, driving the cutter to bob up and down, changing the depth of the groove that it's cutting behind it. And so to play back the audio that's recorded on these, we do that process in reverse. We place a playback stylus in the groove and we rotate the cylinder underneath the playback stylus and the playback stylus follows along the groove. It rests against the groove bottom, and so as the groove bottom changes height, the playback stylus retraces the motion of the original cutter and reproduces the original motion of the audio. So the physical trace that sound leaves on the surface of a wax cylinder, which is the link to the information that we want to recover from it, is the shape of the surface, and specifically the shape and changing height of the groove bottom. And the relevant scale that we're working with is micrometers or microns. One micron is one millionth of a meter, and the average human hair is fifty microns wide. The needles on recording devices and the grooves on cylinders are typically a few hundred microns wide, and the bobbing up and down, the vertical displacement, is about ten microns. So that's about one fifth of a human hair's width of motion that we're trying to track. So this vertical displacement is very small, but it's measurable. The phonograph technology of wax cylinders was first invented in 1877, but by 1890 a device called the Edison phonograph had become commercially available in a reasonably portable and easy-to-operate form, and one of the most common uses for this technology was by anthropologists. Anthropologists saw it as an opportunity to more effectively capture the songs and the languages that they had previously been transcribing by hand, and so they took this new technology and they incorporated it into their research.
During the early years of the phonograph's initial popularity, many anthropologists across the U.S. and abroad engaged in these dedicated recording campaigns where they would go out into the field and engage with indigenous cultures and use these wax cylinders to record their languages, and through these recording campaigns they produced tens of thousands of these wax cylinder recordings. The collection that I'm working with for my project is the result of one of these field campaign efforts. The cylinders were collected by Alfred Kroeber and his students. Over 39 years starting in 1900, Kroeber and his students traveled throughout California and collected 2,746 separate recordings of native Californians as part of a massive overall ethnographic survey of California native cultures. The content of these recordings tends to vary. They contain both song and speech in a wide variety of languages. Our collection has a slight bias towards song. The majority of these recordings are sung rather than spoken. And the things that they record also vary. We get dance songs, descriptions of activities, medicinal songs, myths, creation stories, narratives, histories, or just lists of names of places and lists of things. And so when the anthropologists were recording these, they envisioned themselves as building a research resource for linguists and anthropologists and others in academia, and the collection certainly still remains that today. But it has also become a valuable resource to community members. And by community members I mean oftentimes descendants of the tribes that are recorded on the cylinders and direct descendants of the actual people that are recorded on the cylinders. Many of the languages that are recorded have transformed or fallen out of widespread use since recording. So the collection has become a valuable resource for language revitalization projects and cultural revitalization projects that often have a very strong community base.
And until now, scholars and community members alike have been doing their work with the collection and engaging with it through transfers that were made in the 80s onto reel-to-reel tape using the traditional method of playback, a hard stylus. And so the problem with these transfers is that oftentimes they skipped over broken cylinders. The audio content is oftentimes difficult to discern. And most of them are not digital; some of them have been digitized, but the vast majority have not and exist just on reel-to-reel tape. And the logistics of actually getting them to the people that are interested in them are difficult, time-consuming, and costly. The physical cylinders themselves in many cases have been broken or cracked. The wax that they're made of has become brittle over time, so they're easier to break now. Also the surface of the wax is constantly being eaten away by a type of mold that thrives on wax and lives in the linings of the cases in which the original cylinders were stored. And so they're just in an overall delicate state of preservation at the moment. And so the overall picture that we're looking at is that the Hearst Museum Collection records unique content. It's invaluable to community members and to scholars. They want to engage with it, but it's difficult for us to currently facilitate access using our old transfers. And if we're to create new transfers, the method of doing that would have to be sensitive to the current state of the collection. And that's where the optical scanning method comes in. The optical method uses a physics-based understanding of the mechanics of sound recording and a physicist's training in interpreting and making precision measurements of observables to take targeted data about these cylinders and use it to recover audio information from these objects. The optical method asks two questions of these materials.
First, can we measure the surface of the media in a way that allows us to create a sufficiently detailed picture of the surface? And then second, can we display that picture in a way that's useful to us, in a way that allows us to systematically analyze it and to recreate the playback process and recover sound from the recording? Digitizing recordings through this process answers a lot of the problems that come up in dealing with collections like the Hearst Museum collection. The only physical demand on objects under this method is that they be maneuvered around and have light shone on them, and so this opens up access to fragile materials. This is computer-based, so that allows us to systematically and automatically access large collections of materials, and the method works in images, and moreover, detailed images of the objects. So we see the surface in enough detail to pull out information that we can't have just by playing them back with the traditional playback methods. And using this one machine allows us to eliminate the need to maintain and operate outdated recording devices. So in the Hearst Museum collection, there are a total of around 3,000 cylinders that need to be digitized, and we have project funding from the National Endowment for the Humanities and the National Science Foundation that's scheduled to last for three years. So in assessing how plausible the optical method is for our collection, we considered the relevant dimensions of the cylinders, what measurements we would need to take, and how precisely and quickly we could take those measurements with available measuring devices. For wax cylinders, we use a confocal probe that collects height measurements along a line on a surface, so long as that surface is within a set focal distance.
We conducted pilot studies where we tested this out, and we found that we could work within that focal range and collect data with sufficient accuracy to give us the images that we need. And the rate at which we could collect those measurements with specific probes was high enough that the time taken to scan each cylinder would allow us to scan the requisite number of cylinders per day to stay on our project timeline. So given the condition of the cylinders, the number that we needed to work with, and the desired format and specifications for the final audio, the optical method was the method that we chose to solve our problems. And so we then developed and applied a workflow to systematically work through the cylinders in the collection. For each of 3,000 cylinders, we'd need to work clockwise around the graphic that you're seeing. So we'd start out by making many thousands of measurements around the surface of the cylinder, and then stitch them together to make a high resolution map of the surface. The final product is a one to two gigabyte file with that map that we then take and plug into a piece of software that calculates how the stylus would have moved over the surface and then uses that information to create a digital audio waveform. We can then circulate that file as needed and archive the map for future use. So how do we actually do this? To implement this process, the first half of the work that we need to do is to acquire the necessary data. And so for wax cylinders, we use a tool called a confocal microscope or probe. And the way it works is that it shines white light through a series of lenses that break the light up into its constituent wavelengths. And each wavelength of light corresponds to a different color. And each color focuses at a different height.
And so when you put a surface in the focal range of this probe, the surface is going to be at a height where one color is in focus and where the rest of the colors are out of focus. So the color that's in focus is going to reflect directly back where it came from, towards the light source, and the rest of the colors that are out of focus are going to scatter off. So the probe collects light that's reflected off the surface and analyzes it to tell which color is in focus. And from that information, the probe can tell the height of the surface in front of it to within an accuracy of 75 nanometers. And so again, the shapes that we're resolving are on the order of microns. So 75 nanometers is more than enough resolution for us to do the work that we need to do. And so the probe does the shining and collecting and analyzing at 180 points along a line across the surface. And the output that we actually work with and that we care about is just a list of 180 height values. And so if we take this probe and we point it at a wax cylinder so that the line goes across the grooves, you'll see the image on the upper right-hand side. And so what you're seeing in that image is a very regular pattern of peaks and valleys. And so the way to interpret this is that as the groove winds around the cylinder, it's going to intersect that 180-point line many times. So the many valleys are many sections of the same groove bottom, and then the many peaks at the top are many portions of the cylinder that have not yet been cut away. So they're the spaces in between the groove. And so to actually do a wax cylinder scan, we load the cylinder on a rod that's attached to a rotating motor. And so we can spin the rod and the cylinder together, kind of like a lathe. And we mount the probe so that it's pointing at the cylinder. And in this image on the left, I want to point out that you can actually see the light that the probe is shining. It's a little white dot.
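Picking groove crossings out of one of those 180-point line profiles can be sketched as a simple local-minimum search. This is an illustrative stand-in, not the project's software; the synthetic profile, groove positions, and threshold are all invented for the example.

```python
import numpy as np

def find_groove_valleys(profile, depth_threshold):
    """Return indices where a 1-D height profile (a NumPy array) has a
    local minimum sitting at least depth_threshold below the mean height.
    These are the groove crossings in the peaks-and-valleys pattern."""
    valleys = []
    for i in range(1, len(profile) - 1):
        is_local_min = profile[i] < profile[i - 1] and profile[i] < profile[i + 1]
        if is_local_min and profile[i] < profile.mean() - depth_threshold:
            valleys.append(i)
    return valleys

# Synthetic 180-point line: a gently wavy surface crossed by four grooves,
# mimicking the regular pattern the confocal probe sees.
x = np.arange(180)
profile = 100 + 5 * np.sin(2 * np.pi * x / 60)       # large-scale waviness
for center in (20, 60, 100, 140):                    # groove crossings
    profile[center - 2:center + 3] -= np.array([5, 15, 25, 15, 5.0])

print(find_groove_valleys(profile, depth_threshold=10))  # -> [20, 60, 100, 140]
```

The real data would of course be noisier, but the regularity of the pattern is what makes this kind of search workable.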
And from that, you can see that we're only ever looking at a very small portion of the cylinder at any one time. And so while the probe is pointed at the cylinder, we tell it to rapidly collect measurements. We take about 1,000 measurements a second. And then while it's constantly acquiring, constantly making measurements, we spin the cylinder in front of it. And so if we do that and we collect our height measurements as the cylinder rotates and plot the results, you'll see something like the animation on the right-hand side. And so what you're seeing is the same pattern of peaks and valleys from before. But now there's one groove that I've marked with an arrow that's oscillating up and down. It's bouncing up and down. And the way to interpret this is to remember that the needle sits in the groove while the cylinder rotates. So this up and down motion in that one particular groove means that as the needle sat in that groove and the cylinder rotated underneath, the needle would oscillate up and down with the groove bottom. And so that vertical motion that we're seeing is the audio content that we want to recover. And so another thing to notice about this is that we have a regular pattern. We have our line with our peaks and valleys. And then there's one that's active, that's bouncing up and down. But all of them together, overall, are moving down the screen. And the reason for that is that these cylinders look like they're circular, but they're actually not. In reality, on small scales, they're slightly elliptical. And then also the wax isn't perfect. So it has bumps and deformities. And when we look at it at this tight a scale, we see these things. And so we're looking at a portion of the cylinder where we're coming down over a ridge as we're spinning it. And the probe only has a finite range that tends to be smaller than this out-of-roundness that we see in the cylinder.
So it's not possible for us to just put a cylinder in front of the probe and spin it and walk away. If we do that, we might lose the surface of the cylinder before the end of a rotation. And so to correct for this, we use another instrument to measure the average height of the cylinder ahead of where the probe is measuring. And then we feed that data to a motor that moves the probe to chase after the cylinder as its surface rises and falls. And that second instrument is a laser and you're seeing the red light from it in the image on the left. And so we call this focus control. And when it works correctly, we can keep a cylinder in range and we can measure an entire rotation around the cylinder. And the need for focus control is just the first of many ways in which we have to continually tune this process to be able to work with many different cylinders in many different conditions. And we have to be able to tune it for the specific cylinder that we're scanning at the time. So when everything is working together correctly, we can rotate through one full rotation of the cylinder and it stays in range the whole time. And after you rotate through that full 360 degrees, you're measuring the same place that you started. And over that rotation, we collect between 20,000 and 60,000 measurements of that 1.8 millimeter wide line, depending on the resolution of the image that we want to create. And so that full rotation effectively creates an image of a 1.8 millimeter wide circular band of the cylinder. And so if we save the measurements that we take and we display them next to each other, then we can create a topography of the cylinder surface one ring at a time. And what you're seeing on the left is the real time acquisition for a cylinder that has focus control activated and that has more active audio in all of the grooves, or rather all of the portions of the one groove, and the corresponding topography on the right hand side. 
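The focus-control idea (a look-ahead sensor reads the surface, and a motor incrementally moves the probe to chase it) can be illustrated with a toy feedback simulation. Everything here is invented for the sketch: the focal range, the out-of-roundness amplitude, the gain, and the look-ahead angle are not the project's actual values.

```python
import math

FOCAL_RANGE = 100.0      # microns the probe can tolerate (illustrative)
ECCENTRICITY = 300.0     # out-of-roundness amplitude, larger than the range

def surface_height(angle_deg):
    """Height variation of an out-of-round (slightly elliptical) cylinder,
    in microns: two rises and falls per rotation."""
    return ECCENTRICITY * math.sin(math.radians(2 * angle_deg))

probe_position = 0.0
max_error = 0.0
for step in range(3600):                      # one rotation, 0.1-degree steps
    angle = step * 0.1
    lookahead = surface_height(angle + 1.0)   # sensor reads slightly ahead
    # Move the probe a fraction of the way toward the look-ahead reading,
    # keeping the motion incremental and gradual (no sudden jumps).
    probe_position += 0.5 * (lookahead - probe_position)
    error = abs(surface_height(angle) - probe_position)
    max_error = max(max_error, error)

print(max_error < FOCAL_RANGE / 2)   # surface stays comfortably in range
```

Without the chase (probe fixed at one height), the 300-micron swing would leave the 100-micron focal range almost immediately; with it, the residual tracking error stays small.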
And so in the topography, you can kind of see that there are these several channels or gutters that the needle would have sat in and rode through as the cylinder spun around. And so to apply this for an entire cylinder, we do this rotate-and-measure, and we sweep out our full ring, and then we step over to the next portion of the cylinder that we haven't imaged yet, and we rotate and we measure again, and then we step, and then we rotate, and we step, and so on until we're all the way at the end of the cylinder. To do this for an entire cylinder, it takes between one and four hours depending on the resolution of the map that we're creating. But the process is automated, which means that we can load a cylinder, we can set up our scan, make sure that it started correctly, and then we can walk away, come back one to four hours later, and we have a map of our cylinder. And so this is the machine that we built to do this for the Anthropology Museum Project, and you can see there's a long arm that has three cylinders resting on it, and they're supported by little cylindrical plugs that insert some distance into the cylinder and keep them settled on the mandrel. And previous versions of the scanning machine that have been built only allow for loading and scanning one cylinder at a time. So we expanded for this project and we built a machine with a longer arm, which allows us to load three at a time. And so at one to three hours per cylinder, this means that we can typically run around nine hours without supervision, which opens up the possibility of being able to run things overnight. And so that was a significant goal of our project, to find a way for us to work through this collection systematically and quickly. And so we're scanning the collection in a high volume way. Three cylinders overnight and three during the day, pushing out six cylinders a day, which is something that we haven't had experience with before.
And in addition to reimagining how we could build the system physically to accommodate this many cylinders, we also had to update the software interface. And so this is the interface that lets us control acquisition. It's written in LabVIEW. And on the surface level, it allows operators to adjust the parameters that control the scan: which region of the cylinder to scan, what resolution to scan at, how many seconds to acquire per measurement, and many other things. And so for the method in general, but specifically for this project where we're looking at cylinders in the hundreds and eventually in the thousands, the scanning apparatus has to be adaptable. The properties of cylinders change slightly depending on when and where they were made, how old they are, the circumstances of the recording, and how they've held up over time. And so the scanning interface includes many options to let us optimize our data taking for the cylinders at hand. And this is again another area where we have to make sure that our method can work for different kinds of cylinders and can be tuned to the individual cylinder. Another thing that we had to rework for the current project was making sure the software was conducive to the higher volume. And we had to do some reorganization of the software to make sure that we could scan three cylinders, potentially all at different combinations of parameters, without needing user input in between. And the last thing that the software is doing, under the hood, is making sure that our measurements are taken carefully. One of the tricks of this method is that because we're measuring on such a small scale, we're very sensitive to small vibrations or variations in our data. Small jumps in the focus motor or hiccups in data flow can show up as small changes in the height measurements, and as noise that shows up later as perceivable audio. So it's something that we have to be careful about.
Measurements have to be timed carefully so that the rate of acquisition matches the rate of rotation, and the focus control has to be very incremental and gradual. And so there's a lot of software running under the hood to make sure that that's the case. And so the last sort of thing to note about the acquisition process is that we can also make it work for broken cylinders. One of the great things about this method is, again, the only physical demand on media is that we be able to shine light on it and rotate it. And so the risk with delicate media becomes that as it rotates, if it's not stable, it might wobble around, which would affect our measurement. Or if it's something that's broken into pieces, we might not be able to keep the pieces in place as it's rotating. And so we compensate for that by using special fixtures to keep the cylinders intact while they're on the machine and to hold them in place while they are spinning around. What we do with cracked cylinders is we wrap plastic bands made out of polyethylene around the outside of the cylinder, and we cinch it tight so that the crack is as small as we can get it. And so we scan up to the band, and then we move it, and we move over to where we hadn't scanned before, and then we continue and finish the rest of the cylinder. We can do the same thing for cylinders that are broken into a small number of pieces, few enough pieces that it's easy for us to reassemble. And we reassemble the cylinder kind of like a puzzle around a cylindrical form, and then we hold it in place with those same polyethylene bands, and we put it on the scanning machine and scan it. So when all of these concerns work together correctly, the end result of a full cylinder scan is two groups of files. We get several metadata files that describe the scanning process. They keep track of the positions of motors, of data flow, and many other things about how the actual data was acquired. And we archive those.
Currently we're not actively using them in processing, but we do archive them for future use. And then, aside from that, we also get the main output file. And that's usually between one and two gigabytes for a full cylinder that's been recorded end to end. And what that is is just an ordered list of height values. And it contains many hundreds of thousands, if not millions, of height values, all around the surface of the cylinder. And the order in the list corresponds to the physical position on the cylinder. And so the question then becomes, we have this ordered list of millions of values. What do we do with it to make sense of it? And so an obvious reflex is to make a picture, to visualize that information in some way. And there are many options for what kinds of pictures we could create. I've shown you animations and topographies. But we know that what we want out of this data is something very specific. We care very specifically about what's happening inside of the groove. We care about the changing height of the groove bottom. And so we want to choose a type of image that makes this clear, and that makes it easy for us to quickly point to where the relevant data is. And so the type of image that we choose is a grayscale depth image. And so in this, we map the position around the cylinder spatially to a position in the image. Moving vertically in the image is as if you're moving around the circumference of the cylinder. And then moving horizontally in the image is as if you're moving down the length of the cylinder. And then we translate the height values into the color. So darker is something that's lower, that's sunk down, that's recessed into the page. And white is something that's higher, that's pulling up out of the page. And so in this view, we get many black stripes and light stripes. And we know very clearly that the black stripes are something down low that's consistently winding around the cylinder.
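The height-to-grayscale mapping just described can be sketched in a few lines. This is an illustrative version, not the project's software; the array shapes and groove positions are invented.

```python
import numpy as np

def heights_to_depth_image(height_map):
    """Map a 2-D array of surface heights to 8-bit grayscale values.

    Rows correspond to position around the circumference, columns to
    position along the cylinder's length. The lowest surface (groove
    bottoms) maps to black (0); the highest maps to white (255).
    """
    h = np.asarray(height_map, dtype=float)
    lo, hi = h.min(), h.max()
    normalized = (h - lo) / (hi - lo)       # 0.0 = lowest, 1.0 = highest
    return (normalized * 255).astype(np.uint8)

# Synthetic map: a flat surface (height 100) with two grooves cut 25 deep,
# running "around the cylinder", i.e. down the image.
height_map = np.full((8, 12), 100.0)
height_map[:, 3] -= 25.0
height_map[:, 8] -= 25.0

image = heights_to_depth_image(height_map)
print(image[0])   # the groove columns show up as dark (0) stripes
```

In the real depth images the stripes additionally shimmer between dark and light along their length, and that shimmer is the audio.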
Those are our grooves, and the white stripes are the space in between that we aren't paying much attention to. And something else to mention about viewing our data in this way is that it gives us that very recognizable pattern of light, dark, light, dark stripes. And so in this view, our data is very regular, and so the pattern readily emerges. And at the same time, if there are deviations from this pattern, that's very obvious too. And so it makes it easier for operators to track if something went wrong; if there are anomalies in our data set, it immediately becomes visibly obvious. And so to actually produce sound from one of these completed surface maps, what we do is we load it into a piece of in-house software. And that software stitches all the data together and displays it as one of those depth images that users can look through and interact with and view in detail and inspect to make sure that everything is making sense. And so this is a screenshot of the software that we work with. The main image in the upper middle of the window is an overview. And so you can kind of see that there are six vertical regions, and those correspond to six rotations around the cylinder. There are six bands that we measured, and they're displayed here as if we opened them up and flattened them out. So it's as if we took the cylinder, cut a seam in it from bottom to top, opened it up and flattened it out, and then we're looking at it. And so again, the relevant data is very clear to us as operators, but we then need a way to analytically isolate it and point our computer program to it. And so the regularity of our data, which creates this pattern of lines that are all about the same width, the same separation, and about the same average depth, allows us to apply minimum-finding algorithms and pattern recognition algorithms that let the software go through and identify the bottom of the groove and mark it with a blue line.
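A minimum-finding groove tracker of the kind described can be sketched like this: for each column of the depth image, look for the darkest (lowest) point inside a small window around where the groove was in the previous column. This is a simplified stand-in for the project's algorithm; the window size and test image are invented.

```python
import numpy as np

def track_groove_bottom(depth_image, start_row, window=3):
    """Follow one groove bottom across the columns of a depth image.

    Searching only within a small window around the previously found
    row is what keeps the tracker locked onto one continuous groove
    instead of jumping to a neighboring one.
    """
    rows = []
    row = start_row
    for col in range(depth_image.shape[1]):
        lo = max(row - window, 0)
        hi = min(row + window + 1, depth_image.shape[0])
        row = lo + int(np.argmin(depth_image[lo:hi, col]))
        rows.append(row)
    return rows

# Synthetic image: bright surface with one groove whose bottom drifts
# slowly downward, as the spiral groove does.
img = np.full((20, 10), 200.0)
true_path = [5, 5, 6, 6, 7, 7, 8, 8, 9, 9]
for col, r in enumerate(true_path):
    img[r, col] = 50.0            # groove bottom: much lower than surface

print(track_groove_bottom(img, start_row=5))   # recovers true_path
```

This windowed search is also exactly what gets "lost" at a bad break, where the groove on one side of the crack doesn't line up with the other side; that is the case where an operator has to trace the path by hand.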
And so you can see where the software has marked the bottom of the groove with a cyan or blue line. And then at the right-hand side, you can see a zoomed-in detail of part of the overview that you're seeing at the top. And if you follow along the bottom of one of those grooves that I've marked here with a red line, you can see that in the image it's not just a solid black line; it's actually oscillating from dark to white and back again. And again, because color corresponds to height, that dark-light oscillation is vertical oscillation. It means that the cylinder surface, as we look along the bottom of that groove, is oscillating up and down. And that vertical change is audio content. And so the plot at the bottom middle of the interface is just a point-by-point display of the data that's along that red line. And that translates into a point-by-point reconstruction of the path that the needle would have taken as it went through the cylinder. And that's gold. That's exactly what we're looking for. It's very, very close to being all the information that we need to reproduce sound. The last step that bridges the gap between data and audio is that we have to take a derivative. Because of the mechanics of the recording device, it turns out that the needle's speed, and not its position, is directly proportional to pressure. And the pressure is the signal that we're trying to recover. So we take our path, and we take its derivative to get the speed that the needle would have had at every point on the surface. And that signal is identical, basically, to the sound that we're trying to recover. This analysis works for broken cylinders, too, with the added benefit that we can look at the surface in enough detail to diagnose how big the crack is and how severe the break is. And in the best case scenario, it's a small break, and the processing runs normally as if it were an unbroken cylinder. There aren't any hangups.
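The derivative step described above, from a tracked groove-bottom path to an audio-like velocity signal, can be sketched with a discrete difference. The function name and all numbers are illustrative, not the project's code.

```python
import numpy as np

def groove_path_to_audio(heights, sample_rate):
    """Convert a groove-bottom height trace to an audio-like signal.

    Per the talk, the stylus's vertical *speed*, not its position, is
    proportional to the recorded sound pressure, so the last step is a
    discrete derivative of the tracked height values.
    """
    heights = np.asarray(heights, dtype=float)
    # np.diff gives point-to-point differences; multiplying by the sample
    # rate turns them into a velocity (height units per second).
    return np.diff(heights) * sample_rate

# A pure tone in position becomes a (phase-shifted) pure tone in velocity,
# scaled up by the tone's angular frequency.
sample_rate = 10_000
t = np.arange(100) / sample_rate
position = 10.0 * np.sin(2 * np.pi * 440 * t)   # ~10-micron oscillation
audio = groove_path_to_audio(position, sample_rate)
print(len(audio))   # one fewer sample than the input path
```

One consequence of differentiating is that higher-frequency components of the path come out louder, which is part of why surface detail at small scales matters so much.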
But there's a complication: if the split is too large, or if we're scanning a cylinder in multiple pieces and we weren't able to line up the pieces exactly enough, then our tracking algorithm can get lost across the grooves if they don't line up from one side of the break to the other. And we can still work with these materials. We just have to go in, and instead of letting an algorithm trace the path of the stylus point by point, an operator has to do that instead. So we can do it. It's just labor-intensive and time-consuming. But we do do it in the cases that demand it. And so the bare minimum that we can do is create a digital file with the audio that was recorded on a cylinder. And so I have a sample here that I'd like to play for you. It's a recording from our collection. It records a Native American named Ishi. And he was the last member of a group called the Yahi. He encountered UC anthropologists who did a series of around 200 recordings with him. And this clip is an excerpt of one of those recordings. And it's about 20 seconds or so. A full recording is going to be around two and a half minutes. And so the first thing that you're going to hear in this particular sample is a voice in English that introduces the recording. And we think that that's Alfred Kroeber's voice, because he was the one that was conducting this recording. And then after that, you'll be able to hear Ishi singing. And then also the animation is going to show you where the path of the original stylus would have been around the cylinder. I have a buffering wheel. "April 14, 1914, Ishi, Duster's Dawn, repetition of 1740." So there's a few things to notice when we're listening carefully. One of the things is that there's clicking and popping noises on top of the audio. And those are the result of every time the stylus runs over a piece of dust or a hair that's on top of the cylinder, it's going to pop up.
And that sudden, jerky motion in the stylus results in a click or pop that you can hear. And then toward the end, you can hear a more rhythmic crunching noise. That ends up being a once-per-revolution crunch that happens every time the stylus runs into a patch of mold decay, where mold has eaten away a section of the cylinder and the stylus has to bump through the damage. And it turns out that because we're working with images, we're in a position to help with these defects. We can look at the surface in such detail that we can visibly find things like mold decay and hairs and fibers, and we can isolate them from the intentionally recorded audio. Looking at the left, you see the raw data, and in it you can see a white smear. Again, white is higher, dark is lower, so white is something sitting on top of the surface. That's going to be our hair or other fiber. Up at the top, you can see these webbed structures; that's typical of mold decay, where the mold has eaten away the surface in this webbed pattern. If you look at the bottom, you can see a point-by-point plot along a vertical groove that contains audio. If you follow along, it has these slow, gradual undulations, and then you can see a big jump in the surface height. That's the change in height due to the hair. So the changes from these defects, which in the image processing world we call blobs, are very rapid and very sudden. The point-to-point displacement is very high at the boundary of these defects. And so what we do is run an algorithm that goes through point by point and says: if the distance between this point and the next is too big, then it can't be audio; it has to be a blob. So we flag that as the boundary of a blob. And in the middle image, you can see that all those defects get marked in orange. We've identified them.
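The point-by-point flagging rule just described, "if the jump to the next point is too big, it's a blob, not audio," can be sketched as a simple threshold on consecutive differences. This is a hypothetical illustration of the idea, not the project's algorithm; the threshold value and function name are made up for the example.

```python
import numpy as np

def flag_blobs(profile, max_step):
    """Flag samples that belong to surface defects ("blobs") rather than
    audio. Audio undulations are slow and gradual, while a hair or mold
    pit causes a sudden point-to-point jump in surface height.
    Returns a boolean mask, True at detected defect boundaries.
    Illustrative sketch only."""
    steps = np.abs(np.diff(profile))
    mask = np.zeros(len(profile), dtype=bool)
    jumps = np.where(steps > max_step)[0]
    mask[jumps] = True                                    # point before the jump
    mask[np.minimum(jumps + 1, len(profile) - 1)] = True  # point after the jump
    return mask

# Gentle audio-like undulation with a sharp 5-micron spike from a hair.
profile = np.sin(np.linspace(0, 4 * np.pi, 200))
profile[100:105] += 5.0
mask = flag_blobs(profile, max_step=1.0)
```

Only the boundaries of the spike get flagged, which matches the talk's description: the interior of a defect may vary slowly, but its edges always show a displacement far too large to be recorded audio.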
And then we can go through, smooth over them, and reduce the effect they have on our final audio. So I'm going to play you another couple of seconds from the end of the last clip, first without cleaning, and then again with cleaning, so you can hear the difference. So that's our raw data, the bare minimum. And after cleaning, it sounds like this. You can hear that it makes an audible improvement. On some cylinders it makes more of a difference than on others. But it's something else that we're able to do because we're taking very detailed measurements and working with images. Another thing that I hope you'll notice is that there's a high-frequency hissing noise in the background of this audio. That is actually a signature of the method we're using. Because we're digitizing with light, we see the surface in far greater detail than a playback stylus would have. The point size of the probe, the diameter of the light spot we shine on the surface, is on the order of microns, while the tip of a stylus is a couple hundred microns across. That difference in scale lets optical scans resolve far smaller and more closely spaced changes in surface height than a stylus can pick up. And those smaller, closer-together changes correspond to higher-frequency noises. In the graphic that you're seeing at the top, the different colored wavy lines correspond to signals of different wavelengths and frequencies. The signals at the top have the longest wavelength and the lowest frequency, and the frequency increases as you go down. The stylus can only physically conform to the largest wavelengths, the ones on the order of its own size. But the optical probe has no problem tracing and resolving those changes, even at the highest frequencies. So, practically, this means that we digitize higher-frequency audio.
It means that we can resolve, identify, and clean out sudden changes in height, the blobs that we saw earlier: the dust and dirt and damage. But it also means that we digitize higher-frequency noise and irrelevant information. Some of that is outside the range of audible hearing, and some of it is the high-frequency hiss that you're hearing on our audio. And it's a trivial matter to take our audio files and use commercial software to roll off and filter the high frequencies. So that's something we can also provide, alongside our raw data, for people using our sound files. And so at the end of our work, we have a two-minute-long digital audio file. The way we got there was by chasing our relevant information through all of these different formats. We started with sound waves, which were transferred into the height of a cylinder surface. Then we took that height, turned it into a depth image, analyzed it, and finally got out a digital audio file. From a physicist's perspective, that's kind of all we can do: we got out our signal, cleaned it up as much as we could, and put it in a digital format. But the final layer of information that we leave out of that equation is the information that really motivated all of our effort in the first place, which is the cultural and linguistic information that comes from listening to these recordings with linguists and with people in the communities for whom they have meaning. And so the last piece of our work is actually to turn the audio file over to the people who want access to it. For the Hearst Museum project, our final audio is handled by the California Language Archive, an online repository of indigenous language materials, which will facilitate restricted digital access to the content for community members and for scholars.
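The high-frequency roll-off mentioned above is standard digital filtering, the kind of thing any DSP toolchain can do. As a sketch, assuming a 48 kHz sample rate and an arbitrary 8 kHz cutoff (both illustrative choices, not the project's actual settings), a low-pass Butterworth filter looks like this:

```python
import numpy as np
from scipy.signal import butter, filtfilt

def roll_off_hiss(audio, sample_rate=48000, cutoff_hz=8000):
    """Low-pass filter to suppress the high-frequency hiss that optical
    scanning resolves but a physical stylus never would have traced.
    Filter order and cutoff are illustrative assumptions."""
    nyquist = sample_rate / 2
    b, a = butter(4, cutoff_hz / nyquist, btype="low")
    return filtfilt(b, a, audio)  # zero-phase filtering, no time shift

# A 440 Hz "voice" tone buried under 15 kHz hiss.
t = np.arange(48000) / 48000
noisy = np.sin(2 * np.pi * 440 * t) + 0.3 * np.sin(2 * np.pi * 15000 * t)
clean = roll_off_hiss(noisy)
```

Zero-phase filtering (`filtfilt`) is a natural choice for archival work, since it attenuates the hiss without shifting the surviving audio in time.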
So the overview of everything we can do as this technology stands right now: we can carefully acquire high-resolution surface maps that are accurate at the micron scale. We can analyze those maps to extract the relevant audio, and we can clean them to minimize the audible effect of decay and damage. We can then turn them into digital files that can be circulated, and into metadata that we can archive, which ensures that the maps will make sense to those using them in the future and allows them to be re-analyzed as new processing algorithms become available. And as of this current project, we can do this on a massive scale, ultimately for 3,000 recordings, for recordings that are in various states of disrepair and stability. But there are things that we would like to be able to do beyond that. To wrap up, I'd like to mention a few of the questions that we can potentially ask of this technology in the future, now that we're working with larger amounts of data. The first place we could use improvement is in our handling of broken cylinders. Currently, the strategy is to reassemble them and scan and analyze them as if they were whole. For the most part, this works fairly well. However, there are some cylinders where the grooves don't line up between pieces, and for those, it's difficult for us to run the tracking algorithm that finds the path of the stylus over the cylinder. The tracking algorithm has a tendency to jump to a different portion of the groove from one side of the break to the other, connecting pieces of the groove that aren't physically connected. So when we play it back, our virtual stylus is jumping over the surface of the cylinder, and we get audio that doesn't make any sense.
And so there are a few different strategies we can take to modify the analysis as it exists now, or even to create new analysis techniques, that would help us handle this better. The first strategy: you can imagine scanning the cylinder as we already are, loading in the image, and then going through and flagging the discontinuities that happen at the cracks. You can do this manually, by allowing an operator to go through and click with a cursor along the lines of the cracks and trace them out, or programmatically, by systematically sweeping through and finding these big discontinuities. Then we could apply a large horizontal shift above and below the cracks to line up our data and get it back in the correct order. You could also use user input to tune the amount we're shifting, or determine it automatically. And from that point, we could just apply the standard tracking algorithm that we've already been using. We're already doing some large-scale, constant shifts like this, so realigning our data in this way isn't totally outlandish. It's a solution that applies capabilities we already have in a new way to solve this problem. Another strategy is to leave the data unshifted, take it as we've been taking it, and create a more sophisticated tracking algorithm. We might be able to update the program to allow user input to shift a tracking path by one groove, or by however many grooves would be appropriate to line it up. Or we could programmatically test different possible paths across a boundary and then use an algorithm to select and recommend one of the results. This, again, takes the capabilities that we have within the program and applies them in a new way.
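The "flag the crack, shift the data" strategy above can be sketched in miniature on a 1-D groove track. Everything here is hypothetical, the function name, the interface, and the idea of estimating the shift from the jump itself; it's meant only to show that the realignment is a constant offset applied on one side of a known discontinuity.

```python
import numpy as np

def realign_at_crack(groove_positions, crack_row, shift=None):
    """Apply a constant horizontal shift below a known crack so the groove
    track lines up across the break. If no shift is given, estimate it from
    the jump at the crack. Hypothetical sketch of the realignment strategy,
    not the project's software."""
    pos = np.asarray(groove_positions, dtype=float)
    if shift is None:
        # The discontinuity at the crack is (roughly) the misalignment.
        shift = pos[crack_row - 1] - pos[crack_row]
    aligned = pos.copy()
    aligned[crack_row:] += shift
    return aligned

# A groove drifting steadily across the scan, with a 12-pixel
# discontinuity at row 50 where the crack falls.
rows = np.arange(100)
track = 0.1 * rows
track[50:] -= 12.0
fixed = realign_at_crack(track, crack_row=50)
```

After the shift, the track varies smoothly again, so the standard point-by-point tracking algorithm could run over it as if the cylinder were whole.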
There are many other schemes you could think of for improving the tracking and getting it to choose the correct path across a break. We can also take a step backward and address the problem at the acquisition stage. We might be able to scan cylinders in pieces. This would take the invention of a new scanning fixture that would allow us to hold fragments in place, but that's feasible. If we did this, we would get scans of fragments instead of full cylinders. We could then load those images in, and with the right interactive user interface, an operator could move the pieces around and reassemble them in digital space rather than physically on the mandrel. The limitation of this method, if we rely on user input, is that it isn't feasible for a cylinder that's broken into many pieces; last week there was a cylinder that we had to send back because it was in 200 pieces. A more complex data-analysis solution would be to load in the images of the pieces, define the edges, and then process the geometry of the edges to suggest likely matches. Or, depending on how many pieces we're dealing with, we might be able to test many different orientations and matches to see if we can analytically reassemble these broken cylinders. And if we can reassemble them correctly enough digitally, then none of the tracking problems from earlier apply. Finding a way to handle broken cylinders more efficiently would be very helpful for us. Right now, just collecting the data for a broken cylinder takes twice as long as for an intact one, and the analysis takes three or even four times as long. So if we found a more robust solution, we would cut down our working time by quite a lot.
And also, the pieces that seem to have the most curiosity attached to them, in the community outside the project, among the people who are actually using these recordings, are the broken cylinders, because those are the ones that have never been played back before and that people are most eager to hear for the first time. So it would be very good for us if we could figure out a way to handle these things more robustly. Another thing, as I've mentioned a couple of times before, is that we have to make this process applicable to many different cylinders. At many points we have to tune how we take our data, and how we analyze it, to the specific cylinder that we're processing. In the analysis code, that turns into several variable parameters that users change and input as they process the current cylinder. And a question moving forward is whether there's a more sophisticated way for us to deal with that. A signature of the data we're taking is that it's very regular and consistent. The data for one cylinder has a regular pattern of vertical stripes across the entire image, all at some set width and spacing. And even between data sets, the same sort of pattern applies. If I flip through a selection of seven data sets for cylinders that I pulled, you can see that they all look very similar. They all consist of vertical stripes; there's some overall curvature, some spacing, and some depth that's basically constant across the data set. And those similarities, those patterns that are easily recognizable across a data set, are the reason that pattern recognition algorithms work for tracking. They're the reason we can process these cylinders at all. But there is enough variation between cylinders, as you can see as we flip through them, that one hard-coded algorithm doesn't work for everything.
The grooves on some cylinders are closer together than on others. Some cylinders have more surface damage than others. And some cylinders have recording anomalies in them. So again, we cope with these variations between data sets by having these variable parameters. The big-picture message here is that these data sets are similar enough that we can flag specific, measurable key features that are relevant to every cylinder and key off of those features in our processing. But they're not so similar that one fixed algorithm applies to everything. And so the question is: is there another way to cope with this variability besides these variable parameters? Currently, the way we select parameter values is to tune our software to the current cylinder by having an operator guess a value, plug it in, process the cylinder, and check whether it works. And the way we choose those values is by considering the cylinders we've seen in the past and guessing what would be appropriate. If you've looked at enough of these things, and done that enough times, it might be tempting to ask: is it possible to do that same guessing, checking, and predicting with a computer algorithm rather than with an operator? Could we measure certain quantifiable key features of a data set? Could we track and store their values along with the parameters we plugged into the software to make it appropriate for that data set? And could we then use that pairing of data-set characteristics and appropriate values to make predictions about what values would be appropriate in the future? As we accumulated more and more samples, we might get enough data to trace out patterns and eventually create software that could predict which parameter values are appropriate for the current data set.
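The "store feature-parameter pairings, then predict from them" idea can be sketched with the simplest possible predictor: nearest-neighbor lookup over past scans. The feature names, parameter names, and data here are entirely hypothetical, invented to illustrate the shape of the approach rather than anything in the project's software.

```python
import numpy as np

def suggest_parameters(features, history):
    """Suggest processing parameters for a new cylinder scan by finding
    the most similar previously processed scan. `history` is a list of
    (feature_vector, parameter_dict) pairs accumulated as operators tune
    past cylinders. Hypothetical nearest-neighbor sketch."""
    target = np.asarray(features, dtype=float)
    best = min(history,
               key=lambda rec: np.linalg.norm(np.asarray(rec[0]) - target))
    return best[1]

# Toy features: (groove spacing in pixels, mean groove depth in microns).
history = [
    ((38.0, 4.5), {"track_width": 9,  "blob_threshold": 1.2}),
    ((52.0, 6.0), {"track_width": 13, "blob_threshold": 0.8}),
]
params = suggest_parameters((50.0, 5.8), history)
```

With only a handful of records this is just a lookup table, but as the database of scans grows, the same stored pairings could feed a proper regression or learning model, which is exactly the accumulation-of-samples argument made above.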
And now that we're dealing with cylinders in larger numbers, this task is both more necessary, as we see a broader spread in the types of materials, and more feasible, as we have more instances from which to detect these patterns. The bigger-picture task that I'm talking about is building a database of these images and keeping track of the relevant information in a way that allows us to follow patterns and implement predictions. And if we broaden our focus from detecting patterns in the appropriate parameters for different cylinders to detecting patterns in other relevant features, a question we can ask is: what other research opportunities might open up? So far, in working with these images, we've been very targeted in our approach and in what data we want to take from them. We specifically look for the position of the groove bottom, and along the way we also look at defects. But what this overlooks is that we're collecting images of an entire surface. We have far more data than just what's in the bottom of the grooves. When we look at the images as a whole, we see things like anomalies in the groove pattern, evidence of malfunctions in the machinery, the microscopic state of the materials, interesting dynamic interactions between the recording stylus and the surface of the cylinder, and many other curiosities. These are things you can't see, and can't learn about, by just playing back the cylinder with a stylus. We're seeing them, but currently we're not fully exploring them within our project. A strong data-analysis or image-analysis skill set could be well equipped to identify, quantify, and note these patterns and features.
This might allow us to answer questions about the recording apparatus that are relevant to historians of science, questions about the condition of these materials that are relevant to conservators, and questions about acoustics and sound transfer that are relevant to musicologists. We're just not doing it currently, but we could in the future. And the final thing that I'd like to point out is that the optical scanning method has been developed through sustained interaction with students. The larger method as a whole, not just the Hearst project, has been developed over the last decade, and over that time, more than 40 students have come to work with the technology and contributed significantly to it. I was a student when I started working on it. And so we hope there's opportunity for students to continue working with it. In listing out these questions that are still floating around about what we can do with the technology and what else we can learn from it, we hope that students will be able to contribute as well. So thank you for listening. This page has some links if you're still curious and would like to follow up. And if you have questions, I can do my best to answer.