[Opening remarks inaudible.] This is the ERC-funded Hatha Yoga Project. It is a five-year project, and we have just passed two years into it. I hope that we are slightly over the top of the hill of data collection. We have collected a hell of a lot of data, and I will talk about that of course.

What I said at the beginning is that I am very much an end user. I am the sort of person who never looks under the bonnet of my car unless it goes wrong. I do not like to tinker around with things. I am not going to be creating new systems for looking at the data. I am an end user, so I was sort of half disappointed and half relieved to see that Mandana and Helen are here now. I know that some of our practices are probably not as they should be, and I am very much open to advice and correction. I am just going to talk you through what we have been doing.

To go back to the point I made about being an end user: I had a meeting yesterday with a PhD student who is just starting out. She is building a wonderful bibliographic database in Zotero, which looks great to me. I am 25 years down the line and I have a really unwieldy text file of bibliographic stuff. I am happy with that. I am not going to go back and re-enter everything into Zotero. That is the sort of point I am going to make. I want everything to be as efficient as possible without me having to do huge amounts of manual processing of data.

I will move on to the update. It is ready to install. It is very simple. The project is really quite simple, especially if you compare it to Beyond Boundaries. We have got primary outputs: 10 critical editions of Sanskrit texts on yoga, ranging from about the 11th century to the 19th century, four monographs and, as I say, umpteen articles I cannot remember.
There are probably about 20 journal articles we are going to be producing by the end of it.

For producing those primary outputs, the primary data categories are scans of manuscripts and published editions. One of the project team members has spent probably most of the last two years in India going round manuscript libraries, collecting scans where he can. Also, of course, some of the texts we are working on have been edited in some way. Some of those editions are obscure, but we have tracked them down and we use scans of those as well.

The two primary methods in the project proposal are philology, so the editing of these texts, and ethnography. We have got one full-time ethnographer on the project as well. She spent probably over a year in India searching out traditional yoga practitioners, interviewing them, taking photographs, filming them and so forth. She produces that ethnographic data. Then there is something we did not flag up quite so much in the project proposal, but which is a pretty key part of our work and a key source of information: historical materials that depict yoga practice. I will talk a bit more about all these categories in a minute.

At the moment, the way we store this data is fairly unsystematic. We are not tagging it with loads of metadata and so forth. I think Daniela is, a bit, with her ethnographic stuff, but probably not to the degree that would meet the usual requirements. At the moment we have got a shared Google Drive folder, then everything comes down to our personal laptops, and then we all back that up on external hard drives. That is it at the moment. I will talk about the plans for the future now.

Looking at the philological side of the project, the text data processing: the process is that we get all the witnesses we can of a text. We start before we have got all the witnesses, since finding them is an ongoing process, and we collate those witnesses, those manuscripts.
In the project proposal I did put some fancy stuff in about cladistics. Does anyone know about cladistic stemma analysis? To be honest, since I have looked into it, I have become more and more sceptical about cladistic analysis. I do not think we are going to use it at all. There are various problems with it. What it is, is a way of forming a stemma of a manuscript tradition modelled on genetic analysis. One of the problems is that you can only fork in two ways. You can only have a bifurcation, you cannot have a trifurcation. Of course, we all know that one manuscript might get copied by more than two scribes. It instantly breaks down there. It also cannot take into account contamination between branches of a stemma.

This is one example of something that is a bit of a fashion in Indology at the moment. Certain people are really promoting it, but I am extremely sceptical about it. I think what would actually be very beneficial would be a project looking at exactly that and seeing whether it is helpful or not. People do not seem to be questioning it. That seems to be a problem with some of the digital tools available. The thinking is: great, digital humanities, a new digital tool, let's go for it. I think we need to be a little bit sceptical.

Again, there are tools for collating large bodies of witnesses and so forth. In fact, when I started my PhD in 1995, I was urged to use this new programme, Collate, that was being developed in Oxford. Again, I think that actually cost me a lot of time. First of all, you have to read the manuscript, then you have to transcribe it, and then you have to break it up into little chunks that can then be compared. I honestly think that it probably wasted months of my time doing that, and I wouldn't do it again.

After you have collated your witnesses comes the editing process. Now, fortunately, especially since I have discounted stemmatic analysis, and again, with our texts we always have contamination.
We can't do rigid mechanical stemmatic analysis, so the editing really is something we need to do. I don't think any digital tools have yet been developed that would be able to perform the editing job on our texts.

Finally, once we've collated and edited, we want to produce the edition for publication. What we're going to do with our project is that every edition will finally be produced in XML, in the TEI encoding that's been developed especially for the SARIT project. Now, again, in the project proposal I got all excited and wrote about a workflow that seemed practicable at the time, whereby the SARIT XML file could then be processed in two ways. Just at the touch of a button, it could be converted into LaTeX and then a PDF, so book publishing form, or the other way, into HTML.

I think I've got some pictures for those of you who... Here's an example of a SARIT file. You can't see it, but it's a website that hasn't got a huge number of texts yet. That's one of the problems with these things: there's only a small body of texts. Actually, I don't think many Indologists are using it as a search tool, but it's very good for presenting a single text. We also know it's sustainable. It's funded, it has just got a new round of funding from Heidelberg and so forth, so those texts are going to be there for a long time.

There's an output file, a PDF produced by LaTeX for a critical edition, and then there are these fancy, very impressive HTML outputs you can get. This was developed by Charles Li, who's at Cambridge. This is a very nice front-end, as you can see, where you can alter all the parameters of which manuscripts you're looking at and so forth, and it will generate the edition's apparatus.
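The two-way conversion described above, one TEI XML source rendered either as LaTeX or as HTML, can be sketched roughly as follows. This is only a minimal illustration, not the SARIT toolchain or either viewer mentioned in the talk. It assumes the apparatus is encoded with the standard TEI `app`, `lem` and `rdg` elements, and the sample line, the witness sigla A, B and C, and the function names are all invented for the example.

```python
import xml.etree.ElementTree as ET
from html import escape

TEI = "{http://www.tei-c.org/ns/1.0}"

# Invented sample line: one lemma with two variant readings (hypothetical sigla).
SAMPLE = (
    '<l xmlns="http://www.tei-c.org/ns/1.0">atha '
    '<app><lem wit="#A">yogam</lem>'
    '<rdg wit="#B">yogah</rdg><rdg wit="#C">yoge</rdg></app>'
    ' vakshye</l>'
)

def variants(app):
    """Summarise the readings of one <app> as 'reading SIGLUM; ...'."""
    return "; ".join(
        f"{rdg.text or ''} {rdg.get('wit', '').lstrip('#')}"
        for rdg in app.findall(TEI + "rdg")
    )

def render(line, esc, wrap):
    """Walk one line, passing plain text through esc and each <app> through wrap."""
    parts = [esc(line.text or "")]
    for child in line:
        if child.tag == TEI + "app":
            lem = child.find(TEI + "lem")
            lem_text = lem.text or "" if lem is not None else ""
            parts.append(wrap(lem_text, variants(child)))
        else:
            parts.append(esc("".join(child.itertext())))
        parts.append(esc(child.tail or ""))
    return "".join(parts)

def to_html(line):
    # Variants appear as a tooltip when the mouse hovers over the lemma.
    return render(
        line, escape,
        lambda lem, note: f'<span class="lem" title="{escape(note)}">{escape(lem)}</span>',
    )

def to_latex(line):
    # Variants become a footnote; a real edition would use a dedicated apparatus package.
    return render(line, str, lambda lem, note: f"{lem}\\footnote{{{note}}}")

root = ET.fromstring(SAMPLE)
print(to_html(root))
print(to_latex(root))
```

The point of the design is that the edited text and its variants live in one file, and each output format is just a different `wrap` function applied to the same apparatus entries.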
This is another one. I was hoping to be able to go live on to the internet, but I don't think I will attempt that now. Andrew Ollett developed this one, whereby, again from an XML file, you can wave your mouse over these highlighted bits of the edited text and the variants will pop up by magic. This is what we want to do: we want people to read our critical editions, and this seems to be a very good format to encourage people to look at the variants.

One of the things we were talking about yesterday was documenting the process of our research, of working with this data, and whether we need to do that and make it available. I was reminded of this just a couple of days ago when a friend posted this on Facebook. It's one of Daniel Ingalls' notebooks. My friend has been going through some Harvard archives, trying to understand how a great scholar like Ingalls produced various of his works. Now, I think it would be somewhat hubristic to assume that people might want to do the same with our work.

This is also something we could maybe raise in the discussion. I'm not going to go too much further into it, but I don't really see how one would be able to record the process of editing, and I'm not sure it's particularly useful. As I say, it's hard enough to get people to read critical editions in the first place, let alone go through the mechanics of how they were made.
I will make one little aside. I was talking about this a few months ago with some colleagues. When one is editing a text, there's one moment I think it would be nice to record, though I haven't worked out a way of doing it. You've collated a few manuscripts and a bit of the text doesn't make sense, so you come up with a brilliant emendation and you record it in your critical edition. Then you collate another manuscript and there it is: your emendation is now attested in the manuscript, and you think, yes, I was right. But then there's that sinking feeling when you realise that in the edition you're just going to be recording that witness and not your brilliant emendation anymore. So that would be one interesting thing to go back for. There should be some sort of German word for that feeling, I think. Anyway.

OK, so that is... How am I going to move on? Now, with all our data, our primary source of data is manuscripts, and I know we're meant to make everything available, particularly with this ERC-funded project. But the thought of trying to get permission from all the Indian libraries, even once you've got the scans, to then be allowed to put them online and make them publicly available, just fills me with absolute horror.

I got this email yesterday from Jason, who's in India at the moment going around the libraries. Let me just give you an example. "We went to the Sanskrit College's library today, only to learn that the uncle of the librarian had died that morning and she had gone to a village some distance from Cochin to be with her family. We met the principal and she was very nice. She said the librarian is the custodian of the library and she's the only one with a key. What can I do? You must come back on Wednesday."
So just seeing the manuscripts is often a triumph, and then there is getting permission to scan them. But getting permission to make them public, I think, would be another five-year project, probably for each library in itself. So that's not something we're considering.

I was also relieved to see this. This is hot off the press, just two or three days ago. It's not only us, it's not only foreign scholars. Even Baba Ramdev, who's tight with the government, a sort of Ayurveda yoga guru stroke businessman entrepreneur, even he's having trouble getting hold of manuscripts from the government repository in Delhi. So yes, this is a problem in our work, and pretty insurmountable, I think. It's not getting any easier, to put it that way.

OK, so that's the textual data. Now, I've got until the hour, haven't I? So now I'm going to look at our historical data. I'll give you some examples of the stuff we work with and what we might do with the photographs and scans that we have.

This is probably the most exciting discovery we've made so far. This was last year. Daniela and I came upon this gate. I had just seen one passing note about it in a Hindi article, and there were some statues of some famous yogis at the bottom. It happened to be on our route. This is a rather obscure little town in Gujarat. Then, when we were looking at the famous yogis here in relief, we noticed, up in here, all these yogis doing complex yoga postures. This makes them the oldest such statues by about 300 years. So it's a very exciting discovery.

We got some rather bad photographs at the time. Of course, you can't really see that at all, can you? Anyway, I don't know if you can tell what's going on, but, well, actually you can from this one, because we then went back and got some better photographs. That was a whole saga in itself. I went with Mark Singleton, my colleague on the project, and an Indian photographer.
That, again, is worth a book in itself, the story of how we got these photographs, so I won't go into it now. But again, I'm wondering, do we need to preserve the story of the different levels? We've got about three or four sets of photographs of this gate, getting better and better. Presumably we just want to publish the best ones. Although, just to go back to this one, I don't know if you can make it out. It was quite handy having the pigeon on the top, because at least with that one I can prove I haven't just turned the photograph upside down. It's not someone sitting in lotus posture.

Now, the other sort of sources we use, and one in particular that I've worked with a lot, are Mughal miniatures of yogis. What we've found, particularly with the Mughal miniatures, is that they're incredibly, what's the word, realist, there's an art historical term, but they really seem to depict real characters. By analysing the insignia and the very fine details in these miniatures, I've used them to map the history of yoga practices and also of certain sects of yogis.

I'm sharing this one because I've spent much too long looking at the position of earrings in yogis' ears. You see here they are in the ear lobes, and there's an important shift a couple of hundred years later when the earrings move into the cartilages. Now this is a very specific, minor example, but the point is that this is a very useful tool for analysing the history. In fact, Nicety Doward at the back, it's actually his work, I think, that I reference when I'm going on about earrings. These images are very useful too, but I've got thousands of them on my computer, and they are completely disorganised.

They also work in tandem with ethnographic images. So here's a photograph of a yogi I took at the Kumbh Mela in 2013, and we can compare that with this. Actually, this is an image from the same album as the previous image I showed, so we know this is an ancient ascetic yoga practice.
Meanwhile, the same yogi today is doing things like this, which we have no precedents for and which in fact seem to be derived from western modern yoga traditions.

So what I want to do with all these images is not written into the project, but I've just put in a proposal to the Wellcome Institute and hopefully that might come through. Either way, I'm determined to make this happen. I've got a part-time PhD student who's all ready to use the other half of her time to put into creating a database of yoga images. We're not sure where we're going to host it. That all depends on whether we get the funding and so forth. But that is something I can really see the use of. If we tag all these images with metadata, both historical and modern images, we can say "lotus position in the 17th century by a female yogini" and bring up all the pictures of that, and that would be extremely useful for tracing the history of yoga. So that's what I plan. That's what, in a dream world, will happen with all the photographic, the image data that we've got as part of the project.

And that is it, I think. That's all I was going to say about our data. As I said, I'm open: throw tomatoes at me, rotten eggs. I'm worried that we might be doing something very wrong, but we're in the perfect position now. As I said, we're two years in, we've amassed a lot of data, and I think we're over the hill of manuscript scan acquisitions, so now is a good time to be planning what we're going to do with all that data, how we store it, how we present it and so forth. Thank you very much.

I'm going to abuse my prerogative here and ask a question and make a comment. Jeff will tell you that if you have filmed the manuscript and it's pre-modern, there are no legal issues with sharing it. It may be extremely rude to the owner of the document, but that's wrong.

What worries me is that that may then jeopardise future scholars going to get scans of other manuscripts.
Then I'll also point out that on Zenodo you can have an embargo period. So you could upload all of those and say, make this open access in 50 years. Now, whether or not you believe that will work, in terms of what the world will be like 50 years from now, is another question, but it might be worth thinking about, particularly in terms of compliance questions with the ERC.

So is it absolutely adamant that you have to make all data publicly available?

That's maybe a different discussion. But if you did want to make data publicly available without offending people in India, you could embargo it with a long embargo period.

My question is about your metadata. For instance, you have pictures from Google Books and pictures from YouTube. How are you keeping track of who took this, when and where, and what library holds this miniature, so that I can be sure of it?

Completely haphazardly, to be honest. People send me images all the time, and I haven't got time to input metadata for all of them. I suppose with my photographs... and Daniela does give good metadata. She keeps good metadata with her photographs, so that's probably a different thing. The ethnographic side is better. With the historical images, one can, with time and effort, track down where they've come from and create that metadata. I don't think it's going to be lost, though perhaps not where it was scanned and that kind of thing. That's why I think it would be great to have this project of an online yoga image archive.

OK, now other people are going to ask.

You've already asked my metadata question, but the second part of my question is about data sharing. You say you have everything in Google Drive with your team. Do people work with each other's data, and how do you know what is what? Especially with all this precious information that you haven't recorded anywhere.

Well, we do all use it. We can all access it.
There's a folder for every text that we're working on, and the scans of all its manuscripts are in subfolders within those, so we can all access them quite happily. Daniela's photographs, she's the ethnographer, are all available to us, and we have files of rare books, whatever. That's there. It's all... Someone mentioned something yesterday: Overleaf. That looks quite useful, because at the moment we've got enough work to keep everyone busy individually, and we're not working on the same text at the same time. We did try... What was that programme we used? SubEthaEdit. That was a way of working on text files at the same time, if everyone's online. We had lots of problems with that. It often didn't work. After that, we just gave up, and now we're back to the old way: you work on it, then send me an email to tell me when you're finished, and then I'll work on it. But I might have a look at Overleaf and see if that works.

I suppose I have two questions. One of them is... I don't wish to be sort of... pressing on this point, but it's wrong. I can understand why it's very difficult to go back and ask a whole bunch of Indian institutions for anything. But there seems to be a due diligence element. It doesn't seem like there's a lot of effort involved in saying, "Would you be okay with us uploading this for public use?" at the moment when you're in there making a scan. That kind of due diligence: if I'm going to do things with the thing you've given me access to, maybe I have a due diligence to actually tell you what I'm going to do and make sure you're okay with it.

Normally, you often have to sign something saying that you will give them any publication that you've used it in. Whether that also says you won't reproduce it... Imagining all the things you might do is not necessarily permission.
I wonder if you have a due diligence to say, "This is what we plan to do with it. Are there any of these things that you might have an issue with?" Right? I just wonder about that particular kind of...

Now you mention it, I was thinking you could also just ask permission after you've got the data, because they never respond to emails.

Yes! You could say, "If you don't respond to this, I'll take that as a yes."

Is that legally the case as well? But yes, you're right. We could just make that part of it. Maybe I should say to Jason to do it from now on at the point of receipt, because you don't want to jeopardise it until you've actually got hold of the things. If you ask beforehand, normally they make you wait for days. You tell them you've got to fly home that evening, and you'll get it at six o'clock just as you've got to get in your taxi, so it's a case of grabbing it and running. But yes, I will suggest that to Jason. It's a good idea.

I just wanted to mention that there's obviously a lot of variation between libraries. In particular, the big Jain libraries are well known for being extremely hospitable and helpful. They'll put you up for free and feed you and give you scans for free. I don't know what their policy is, but I expect they would be fine about uploading the data online as well.

May I?

Please, yes.

You and I have talked, and it's on your website, about the connection between the actual asanas and the descriptions in the texts. Given what you said about under-the-bonnet things, it's obviously not something for you to do, but it seems to me it would be wonderful to have a description in the text which somehow links to a visualisation of a particular posture, right? You're reading, "Now you put your left leg behind your head and raise yourself on your pinky," I don't know. And then there's a picture, either a historical picture or a contemporary picture or both, which says, look, this is awesome.

We've done that.
In fact, Mark's been much more involved in this than I was. We've just made a film of the most recent of the texts we're editing, which has 110 postures, is it? 112. We've had models in to do those poses, so that's all been filmed. I think some of them have proved impossible even for England's bendiest yogis, so we may have to go to Indian schoolchildren as the next port of call.

Part of the problem is that you often really don't know what's being described. The description can be quite terse or quite abstruse, so to put a picture next to it might be a bit misleading, and sometimes we just can't figure out what's going on.

This is just such an amazing project, I mean the content and the stuff you're bringing together. But I think one of the most important points, for funders and also for people who apply, is that the amount of work it takes to make this digitally available is totally underestimated. Fredericka is just finishing a huge project, the Crossroads project, and because she had, I'm just saying that she had worked in this field before, she knew exactly how to manage her 5 PhD students and postdocs and 12.
I think this is something we see in applications, and I have double hats here (see, I'm not a native speaker): the amount of work it takes to constantly input the metadata, to manage the materials, to name files systematically, to sort them and to order them is so much that in principle every project should have someone who does this work, if you're working on the scale that you are working on, where it's not one person.

We know this from the endangered languages documentation project. The PhD students go in, trained. They start, they collect a shitload of data, and at day 3 they forget what they were recording and what they were talking about, because they think they can keep it in their minds, and they can't. Then they come to the end of their three-year PhD, and they have to submit the documentary corpus to us, write some articles and get a job. I think it's really key to understand that we always think the digital is easy. No, it's not. It's so much work, and we underestimate this. In the end the researchers say, well, we can't manage this, so we're not going to do it. And then my question comes: what happens if you get hit by a bus when you walk out into the street now, with all this material on your Google Drive? Google has it, yeah, great, with no metadata there, or only some of it.

We have to move on. You were talking about how something was so difficult that it was faster to do it the old-fashioned way. When I started on my project with Mattis, he said writing an etymological dictionary the old-fashioned way would be much, much faster than doing it this way. So I saw other benefits in doing it the fancy computer way, in terms of methodological explicitness. There might also be things about whether or not you get hit by a bus. But I do think that you're right to ask, "What are these computers giving me?", and if someone tells you it will save you time, they're lying. Okay, so
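As a rough illustration of the metadata work discussed above, here is a minimal sketch of the kind of tagged image record and query the talk describes ("lotus position in the 17th century by a female yogini"). The field names, file names and holdings are hypothetical placeholders rather than the project's actual scheme, and a real archive would more likely use a proper database and a controlled vocabulary.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ImageRecord:
    """One image plus the minimum metadata needed to find it again (hypothetical fields)."""
    filename: str
    posture: str        # e.g. "lotus"
    century: int        # century CE of the scene depicted or photographed
    practitioner: str   # "female", "male" or "unknown"
    kind: str           # "historical" or "ethnographic"
    holding: str        # who holds the original or who took the photograph

# Invented example records, for illustration only.
RECORDS = [
    ImageRecord("img_0001.jpg", "lotus", 17, "male", "historical", "Library A"),
    ImageRecord("img_0002.jpg", "lotus", 17, "female", "historical", "Library B"),
    ImageRecord("img_0003.jpg", "lotus", 21, "female", "ethnographic", "Project photographer"),
]

def search(records, **criteria):
    """Return the records whose fields equal every given criterion."""
    return [
        r for r in records
        if all(getattr(r, field) == value for field, value in criteria.items())
    ]

hits = search(RECORDS, posture="lotus", century=17, practitioner="female")
print([r.filename for r in hits])
```

Recording even these few fields at the moment an image is received is what makes the later query possible. A misspelled field name in `search` raises an `AttributeError` rather than silently returning nothing.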