I'd like to thank Nathan for inviting me to speak about my take on this subject. I was wondering how I would fill 20 minutes talking about data management, and as I put this together I realized that I could probably talk for two hours about it. Unfortunately, what always bothers me about my specialty is that every time I give a presentation I have to spend several minutes explaining what Pyu is, and you'll have to sit through that, because there's no real point in me talking about data if you don't even know what the data is of. So please bear with me.

Pyu is a modern Burmese exonym for a people who once lived in walled cities in what is now called Upper Burma. We don't know what the Pyu called themselves. They flourished during the first millennium CE, and they assimilated with Burmese speakers who arrived around the 9th century CE. The last Pyu text is from the late 13th century CE, and their language has been largely a mystery for a very long time. At this point we know very little about the Pyu language. We know that it is a Trans-Himalayan, aka Sino-Tibetan, language which is somehow related to the language that ultimately replaced it, Burmese, but apart from that we really don't know much.

What remains of the Pyu language are three kinds of data. First are Pyu-language epigraphic texts in an Indic script, as in this background image here. The second type of data are Tang dynasty Chinese transcriptions of the Pyu language in Chinese characters. There's the Chinese transcription of the ethnonym Pyu in red. These are extremely limited in number and difficult to interpret. The third remains of the Pyu language are some borrowings in Burmese. If you attended last month's workshop, one of the speakers talked about potential Pyu loanwords in Burmese. This is all that remains of Pyu. Pyu has left no descendants.
We just have two kinds of textual evidence, in the Pyu script and in Chinese script, and the third type are some words in Burmese that are thought to be of potential Pyu origin.

Most of my work involves the Pyu-language inscriptions, which are the largest body of evidence concerning Pyu. These inscriptions are documented in several formats, one being photographs. These photographs vary considerably in quality. We have photographs taken from the colonial period, when Burma was a British colony, all the way up to the present. Many of these photographs have been taken by the photographer James Miles of Archaeovision. One might think that new photographs, in color and high resolution and so forth, would be inherently superior, but often this is not true. Sometimes older photographs are better because the items were less damaged in the past. The photo here is of the most famous Pyu inscription of all, the so-called Pyu Rosetta Stone, which has four sides and four different languages, one of them being Pyu.

Photographs also include rubbings of inscriptions. The previous photograph I showed you was of an object, but we also have photos of rubbings, including unpublished rubbings in the collection of Professor Janice Stargardt of Cambridge. We also have so-called reflectance transformation imaging files, or RTI files, that were made by Arlo Griffiths and, shown here, James Miles of Archaeovision. These RTI files are made from multiple photographs of an inscription, which are put together into a computer file that, when viewed with appropriate software, can simulate the experience of viewing the same inscription under different kinds of light. In this example here we see these two vertical lines, which are Pyu punctuation marks, and you can see different aspects of them in different lighting. This is a single RTI file, but by manipulating the software I can change the lighting and see things in different ways.
James Miles of Archaeovision has also made for us 3D photogrammetry models of smaller Pyu objects. We are able to rotate these things, enlarge them and so on. So these are the kinds of files that are generated from objects: photographs, rubbings, photographs of rubbings, and 3D models.

Now the problem is how do I catalog and keep track of all these things. When I first started working on Pyu two and a half years ago, with the mission to try to decipher the language, I only knew of a few inscriptions, so I didn't feel the need to number them. I came to Pyu studies from Khitan studies. Khitan is an extinct Central Asian language, also primarily preserved in the form of inscriptions, and in Khitan studies, for whatever reason, no one has ever come up with any kind of cataloging system. So there are no universal numbers and no universal conventions; it's a complete mess. I unfortunately inherited this chaotic mindset, which turned out to be a disaster, because it turned out there were a lot more inscriptions than I had thought. Here is an inscription house in Burma where there are multiple Pyu inscriptions.

As a starting point to resolve this mess of cataloging: Charles Duroiselle in the early 20th century came up with an initial inventory of Pyu inscriptions. His first inventory had just five items. He eventually enlarged it to 15. More and more inscriptions were found over the following 80 years, but the problem with Pyu studies is that after the colonial period Burma went through some rough political times, and Pyu studies were basically frozen in time. Fortunately, a few years ago Arlo Griffiths and Julian K. Wheatley picked up where Duroiselle left off, adding 155 inscriptions to his list of 15 for a total of 170, and Arlo and Julian have collected these inscriptions in an Excel file containing these various fields. Now, I could go on and on about all these fields, but the field of interest to me is numbering the numberless items.
Now how is one supposed to number these things? Almost none of these inscriptions have dates, so for the most part we can't arrange them in any kind of chronological order. The few dates we have are termini a quo: they tell us a point that we know the inscription must postdate, but they don't tell us exactly when it was made, and a few dates are in a calendar we don't even understand. So we can't use chronology to help us organize these inscriptions. Geography is not all that helpful either, as most of the inscriptions come from three sites. So ultimately we ended up using Duroiselle's original 15 in their arbitrary order, and the remaining 155 inscriptions were grouped into thematic groupings and numbered consecutively within each grouping. So 16 through whatever is the Pyu inscriptions on stone, the inscriptions in the Pali language were another set, and so on. Unfortunately, new finds that fit into these earlier categories are just dumped at the end. This is unfortunate but unavoidable, and new finds are being made all the time. Here's an example that I found while I was visiting Burma exactly a year ago. We went to an archaeology office and, boom, we were told, hey, we just found this two days ago.

Now what I do with all of those photographs is try to transliterate the text. So this is the next stage: taking the raw materials, the photographs and so on, and trying to convert the texts into letters. Generally speaking, Pyu is written in an Indic script. There are conventions for how to convert Indic scripts into Roman script, and generally we follow them, but one huge problem is that we don't know exactly how Pyu was pronounced. This means we don't really know how meaningful this transliteration is; I'm just mindlessly copying letters without knowing exactly what they stood for.
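As an aside on the numbering scheme described above (a fixed original block, then thematic groups numbered one after another, with new finds appended at the end), here is a minimal sketch in Python. The function names, the group names and the item labels are all invented for illustration; this is not the project's actual inventory file or tooling.

```python
def build_inventory(original, thematic_groups):
    """Number the original block first, then each thematic group in turn."""
    inventory = []
    for label in original:
        inventory.append(
            {"number": len(inventory) + 1, "group": "original", "label": label}
        )
    for group, labels in thematic_groups.items():
        for label in labels:
            inventory.append(
                {"number": len(inventory) + 1, "group": group, "label": label}
            )
    return inventory


def add_new_find(inventory, group, label):
    """Append a new find at the end, whatever group it belongs to."""
    number = max((entry["number"] for entry in inventory), default=0) + 1
    inventory.append({"number": number, "group": group, "label": label})
    return number


def numbers_in_group(inventory, group):
    """List the inventory numbers that ended up in one thematic group."""
    return [entry["number"] for entry in inventory if entry["group"] == group]


# Toy data (hypothetical): two "original" items and two invented groups.
inventory = build_inventory(
    original=["item A", "item B"],
    thematic_groups={"stone": ["item C", "item D"], "metal": ["item E"]},
)
new_number = add_new_find(inventory, "stone", "item F")
print(new_number)                            # 6
print(numbers_in_group(inventory, "stone"))  # [3, 4, 6]
```

The toy output shows the cost mentioned in the talk: once a new find is appended, the numbers inside a thematic group are no longer contiguous, but no existing number ever has to change.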
Then I take these letters and I put them into XML files, one per inscription. These XML files are in the EpiDoc standard, and they contain all sorts of information, such as other people's readings in the past. As a phonologist, I then take these readings and chop them up into little pieces, like consonants and vowels, and these are put into columns in Excel files.

Now, earlier today the theme of version control came up, and why is that important? Well, here's a personal example. As I said before, we don't really know much about how Pyu was pronounced; we're just mindlessly copying letters and converting them into Roman script. There is this subscript dot in Pyu that is very, very common, and for a century people have had different arguments about what this dot stood for. When we first started transcribing Pyu, we just arbitrarily decided, okay, we'll make this dot an apostrophe and we'll stick it after the last consonant before a vowel. Later on we changed our minds and decided to mark this as an m with a dot: since m with a dot is a convention in Indology for writing superscript dots, we thought it would be really clever to use m with a subscript dot to write a subscript dot.

Now here's where version control came in. This isn't just for 22nd-century archaeologists. The problem was that, besides the Excel files and the XML I've mentioned, we also had RTF and text files. Migrating the data through all these different formats meant that when we changed our minds about how to transliterate Pyu, we of course had to change everything in the other formats as well. This is where version control became crucial, because search and replace with Pyu is a nightmare. I tried to use regular expressions to do search and replace, but Pyu structure is so complicated that a simple formula doesn't quite work.
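To make the search-and-replace problem concrete, here is a minimal sketch in Python of the kind of regular-expression migration described above. Everything in it is an assumption made for illustration: the vowel set is simplified, the tokens are invented rather than real Pyu readings, and the new mark is simply swapped in where the apostrophe stood, which may not be where the project actually places it. The point is only that a pattern can convert the well-placed cases while flagging the misplaced ones for review by hand.

```python
import re

VOWELS = "aeiou"          # assumed, simplified vowel set for this sketch
SUBSCRIPT_DOT = "\u1e43"  # m with dot below, standing in for the new convention

# Old convention: an apostrophe directly after a consonant and directly
# before a vowel. Anything else is treated as a possible typo.
WELL_PLACED = re.compile(rf"(?<=[^{VOWELS}'\s])'(?=[{VOWELS}])")


def migrate(token):
    """Convert well-placed apostrophes; report whether any apostrophe is left."""
    converted = WELL_PLACED.sub(SUBSCRIPT_DOT, token)
    needs_review = "'" in converted
    return converted, needs_review


# Invented tokens, not real Pyu readings.
for token in ["t'a", "ta'", "tda"]:
    print(token, migrate(token))
```

In this sketch the second token keeps its apostrophe and is flagged, because the apostrophe follows a vowel instead of a consonant. Cases like that are exactly where a blanket replace goes wrong, and where being able to roll back to an earlier version matters.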
Also, there are typos and such where the apostrophe was in the wrong place to begin with, so I made innumerable errors and had to convert things over and over and over again. Sometimes I would think, oh, I'm done. I also use GitHub, by the way, so I'd upload things to GitHub and then I'd discover, oh God, it's all wrong. This is where version control comes in handy, because you go back to your old version, you try to diagnose where your search and replace went wrong, you undo that, you do the search and replace again, and you re-upload to GitHub. So the value of GitHub for me personally is this type of error management: discovering where I went wrong in the past, restoring the old data, and so on.

I've mentioned GitHub and the XML there. We then move to the publicly accessible site, the Corpus of Pyu Inscriptions. My attitude toward Pyu decipherment is an open-source kind of attitude. I like the idea of the public being able to look at the same data I do and come to their own conclusions about it. This website contains the latest version of the XML for the Pyu inscriptions, it is synced with our GitHub, and it contains images of the inscriptions and a bibliography of Pyu studies. We've already mentioned Zenodo previously: the XML, the photographs and the RTI files are publicly archived at Zenodo.

The future steps with data include some that I forgot to mention here. I've been talking purely about Pyu script material, so let's start with the Pyu script. I'm working on an archive of images of Pyu akṣaras. Akṣaras are character combinations representing syllables. All this colored material here is a single akṣara, for the word for "king". I have taken screenshots of these akṣaras, and I am building an archive of them so I can try to break down Pyu script into its components and look at how different letters take different shapes in different environments and so on. All this
will eventually be publicly archived as well, but this is still in the compilation process. Using this analysis of Pyu script, I hope to come up with a proposal for encoding Pyu in Unicode and, at some very late stage, ultimately convert the Pyu texts back from this transliteration into the original Pyu script in Unicode. Lastly, other data that I plan to work with in the future are, for instance, the Chinese transcriptions of Pyu, which have never been systematically analyzed. That too will all be publicly archived. And so I am going to end now, 12 minutes early, and we'll see if we can have a record number of questions. Yes?

Are the inscriptions just about kings, battles and donations, or is there some more interesting material in the inscriptions?

Oh, well, the state of Pyu studies, as I said: considering that we hardly know how the language was even pronounced, and right now we only know about 200 words, quite frankly most of the inscriptions are incomprehensible at this point. When I transcribe these things I feel like some kind of mindless robot much of the time, because I have no idea what these things are saying. One saving grace of Pyu is that it is in an Indic script, so we recognize the letters, but for most people here, who would be familiar with European languages, the effect is that of looking at Hungarian or Finnish, where you recognize the letters but the text is just completely alien-looking. The language is very distantly related to Burmese, and I have studied Burmese, and frankly it really doesn't help a whole lot. So the point is that we just really don't know what most of these things are saying at all. Some are funerary texts, because we recognize things like "die", and we recognize dates, so we assume those are death dates, but we don't really know, and we assume that the king and the name on it
is the name of the deceased, but this is all assumption, really. No one has really come up with even something as simple as a really good word-by-word analysis of these alleged funerary texts. We just recognize these words and people jump to conclusions. A lot of Pyu studies is highly conjectural, and this is not really emphasized, because people like pretending they know what they don't.

Anyway, that brings us back to a more database-related theme. The fact that we understand so little just makes cataloging this material really, really difficult. I gave just enumerating the inscriptions as an example, but take just trying to figure out how we're going to convert this into letters. You can base it on Indic convention to some extent, but there are things like the subscript dot and other oddities that have no Indic basis, and I've argued with my colleagues back and forth about what we are going to do about this. Then it's like, okay, well, let's search and replace and try it this way; no; and then let's use version control and undo that; and back and forth. That's because so much of this is just so unsettled. I think what you may find of some interest in my talk is that I'm dealing with the problem of cataloging almost total terra incognita. That's a very different issue from what other people are working with, I think. Just trying to figure out what categories to put things in is a mess. You asked about the content of the inscriptions: I really can't use that as a method of cataloging them at all. I don't know what these things are saying, and that's why the categories for the enumeration are just so crude: is it on stone or is it on metal? Yes?

So you mentioned that the XML files, as a zip file, are all on Zenodo.

Yes.

And the primary sources, in terms of photographs and RTI files, are big, so I presume that you can't fit that all in one Zenodo record. So how are you keeping track of the relationship
between the different Zenodo submissions?

Each of those Zenodo submissions is cataloged by the Pyu inscription number, so I look things up in Zenodo by those numbers, and under each number there's a huge file with the RTI and photos. Frankly, what I don't like about Zenodo is that I can't just grab a single file out of these huge collections.

Yeah, but on the other hand, then we would have hundreds of files per inscription, and that would be a different sort of problem.

Yeah, because it takes many, many photographs to create a single RTI file that can be viewed with simulations of different kinds of lighting. Yes?

I know that you've made your data available in the EpiDoc XML format. I was wondering if you would comment on how much work that was and what the benefits of that are.

The benefits of EpiDoc for me are that it builds a bridge between Pyu studies and other kinds of epigraphy, I think. One weird thing about Pyu studies is that it has been basically an island, totally cut off from everything else. Even within Sino-Tibetan linguistics it is pretty much cut off from everything else. When I studied the EpiDoc standard, Daniel and I went to, of all places, Romania to take classes in EpiDoc, and we studied along with classicists using EpiDoc for Latin and Greek. The strength of EpiDoc, I think, is that it's a shared format that is usable for any kind of epigraphy, so Greek and Latin epigraphy experts have encountered many problems already and have found solutions for them. Pyu is still an infant field, barely explored, and so it is nice to have this body of expertise available that can be recycled for our own purposes without us having to reinvent the wheel, and I find that valuable.
I think Pyu is just so unstable and so mysterious that any kind of help I can get is very much appreciated, and using an existing standard like EpiDoc that has been around for a long time helps toward that end.