Okay, so I would like to welcome you all to this SIB Virtual Computational Biology seminar series. Today we have the pleasure of hosting several speakers: Claire Clivaz, Martial Sankar, Sara Schulthess and Simon Gabay. I will go briefly through their bios.

Claire Clivaz is head of Digital Humanities Plus at the SIB, the Swiss Institute of Bioinformatics. She is leading DH research projects such as MARK16, the eTalks and the H2020 DESIR project. She publishes research at the crossroads of the New Testament and digital humanities. She is a member of several scientific committees and editorial boards, and she co-leads a Brill series called Digital Biblical Studies.

Regarding Simon Gabay: after a master's degree in Romance philology in Paris and groups, he defended his PhD in Latin philology at the University of Amsterdam and is currently working as a postdoctoral fellow at the University of Neuchâtel, where he is carrying out research on 16th and 17th century French manuscripts and digital philology.

Sara Schulthess is part of the DH+ intergenerational team with Claire Clivaz here at the SIB. She has published several articles on the Arabic manuscripts of the New Testament and on digital humanities, and defended her PhD thesis on the Arabic manuscripts of the Letters of Paul in 2016. She is currently publishing research on a Greek, Latin and Arabic manuscript of the New Testament on the digital platform HumaReC, as well as working on the EU project DESIR.

And finally, Martial Sankar studied biology and computer science at the University of Paris 7. He obtained his master's in proteomics and bioinformatics at the University of Geneva in 2007 and then joined the company GeneBio, a software solutions company for proteomics-based biomedical applications.
He joined the IBM S in Strasbourg in 2009 as a data scientist in cancer genomics, and he then defended his PhD at the University of Lausanne, studying various kinds of morphogenetic phenomena through data analysis and mathematical modeling. In December 2014, he joined the Vital-IT group here at the SIB to work on the mega-cloths and, already at that time, on digital humanities.

So today, the four of them will tell us more about how the field of humanities is benefiting from bioinformatics practices and how these practices are contributing to the development of digitized humanities. So I want to welcome you again, and the stage is yours. Thank you.

Okay, today it's a special SIB Virtual Computational Biology seminar, because we are not going to talk about computational biology or bioinformatics. We are going to talk about digital humanities. And we are also going to introduce you to our new group, DH+, for Digital Humanities Plus. The group is led by Claire Clivaz. For the moment, it's a small group. There are two people: me and Sara. We will be joined in April by Mina Monier from the UK. And we hope that Simon Gabay, who is here today, will also join us this year, because he has just submitted an Ambizione project to work on his own project idea under the umbrella of DH+.

So I will start with a short introduction. Then each one of us here is going to present a product or an ongoing project in which he or she is involved. And finally, Simon is going to show us what he is working on in Neuchâtel and will maybe say a few words about his next project.

But first of all, I would like to start with a small story. It's the story of the blind monks and the elephant. The story starts with a group of monks who wanted to know the shape of an elephant based only on what they felt when they touched it. Why? Because they are blind. And so one of them is going to touch, for example, the tusk of the elephant. He's going to say, yeah, it's sharp.
Maybe it's a sword or a blade. Another one is going to touch the body of the elephant. And he's going to say, yeah, it's strong and solid. So maybe it's a rock or a stone wall. Another one is going to touch, for example, the leg of the elephant. And he's going to say, yeah, it's vertical, it's strong, maybe it's a tree trunk. For us, it's easy, actually, because we can see the shape of the animal. But for them, since they cannot see, it's more complicated. What they could have done, for example, is simply talk with their neighbors, exchanging ideas and hypotheses regarding the shape of the animal.

And you can imagine each of these monks to be a field of science, like mathematics, informatics, humanities, biology. And that's what we are doing here, actually. We are trying to enable the crosstalk between bioinformatics and humanities. And bioinformatics, by the way, is already a crosstalk between biology and computer science. And this is a biological network representation, which we are maybe more familiar with, of what a crosstalk is. So you have two components, the I for bioinformatics, the H for digital humanities. And we will come back to the question mark just after defining what humanities is.

Humanities is defined by Stanford University as the study of how people process and document the human experience. And for as long as humans have been able to, we have used tools like philosophy, musicology, theology, religion and arts in order to understand and record the world. It's a definition that I like a lot because I find it very complementary to what we are doing in life science, for example, where we are aiming to explain living systems and to be able to predict them using, for example, mathematical models. And more and more, humanities is going digital. So we are talking about, and going towards, digitized humanities. Why?
Because humanities is going through a massive digitization of its raw materials, leading to a huge amount of data and to needs for computational tools, statistics, methods and visualization, for example. And you can compare it to what happened to biologists 30 years ago with the advent of sequencing technologies, which gave rise to emblematic databases, such as the EMBL database or Swiss-Prot, and also to tools like BLAST. And in Vital-IT, that's what we did during these last four years, actually: trying to share our experience from bioinformatics with colleagues from humanities, in terms of methods and tools, mining, visualization and technological knowledge.

But we noticed that there is one fundamental difference between sciences like biology and bioinformatics and sciences like humanities: the nature of the data is different. We are more used to dealing with quantitative information, more and more so with next-generation sequencing and high-throughput technologies, whereas in humanities the data remains mostly qualitative. So, for example, it's manuscripts. The one on the left is from Codex Arabicus. The other one is a trilingual manuscript, Greek, Latin and Arabic, from the Marciana Library. And some people here can say: it's images, so it's a big matrix of pixel intensity values, right? And it's true, depending on the question. For example, if you are interested in character recognition or in automatically detecting the limits, the delimitation of the verses, that's the way you are going to use it, with numbers and matrices.

But the questions that humanities researchers are asking lie at the core of the manuscript. They are interested in the meanings of the verses and the meanings of the words, they are interested in the purposes, they are interested in the historical context, and in all these aspects that are really related to human experience.
And this is something that, in my opinion, is really present in the work of biocurators or, more on the biomedical side, in the work of experts like pathologists, maybe. But these scholarly skills, they are valuable. And big companies like Google, for example, have noticed that, and they are bypassing the traditional academic channels to create a new discipline that mixes humanities skills with data science. And data science is already a mix of skills. And why are they doing that? Because they noticed the importance and the role of intuition, purposes and meanings in decision making, to help humans make decisions. And I invite you to read this article. It's from Fast Company, a famous tech and business journal in the US.

More concretely, what we are doing in DH+, one of our products, is the eTalk. The concept of the eTalk was developed by Claire Clivaz and Frédéric Kaplan five years ago at EPFL. And it was deployed here in Vital-IT, before I arrived, actually, by Nicolas Labudin and Ioannis Xenarios. And it looks like that. So you have the images of the slides. You have the speech, which is written out. And you can hear the author talking, you can hear the author's speech. And this is an example from SID, from Frédéric Kaplan, on personalized medicine. And one of the features that we are proposing with the eTalk is the possibility to cite each part of the author's speech, with cross-links and references.

During the last four years in Vital-IT and now in DH+, we have produced five series of eTalks, with 22 distinct eTalks, for a total of 28 eTalks and almost 800 minutes of recorded speech. Nice achievements. And last year, in the context of a course, we used Docker and virtualization solutions in order to package the application and make it usable by students and people in the workshop. And we also released a complete training on the eTalk that is available on DARIAH Teach. Claire is going to talk a little more about DARIAH just later.
But this technology, Docker and this kind of virtualization, makes us think about a kind of new editorial model, maybe, where you can ship the eTalk application, encapsulated inside the Docker image, to any user: it can be students, it can be scholars. They can edit their own eTalk in front of their computer. They can do whatever they want, publish it on the Amazon cloud or wherever. Or they can simply come back to us for review or editorial validation. It's not just a concept or an idea. It's something that we are experimenting with and developing in several collaborations, for example, with sculpture. And we are still using it internally: actually you and Claire developed two different eTalks on MARK16, one in French and one in English. And she's going to tell us about this project now.

Thank you, Martial. So now, MARK16. MARK16 is an SNSF PRIMA grant project, so five years. And what is the classical enigma of Mark 16:8? Potentially, the Gospel according to Mark could finish with the sentence in 16:8: "So the women went out and fled from the tomb, for terror and amazement had seized them. And they said nothing to anyone, for they were afraid." If it is like that, it means that the second Gospel in the Bible could finish without the appearance of the resurrected Jesus. So it has puzzled researchers for centuries, I can say. All the more since we have different manuscripts for the end of this Gospel with, as you can see, a list of different endings. So it's a recurrent enigma in New Testament textual criticism.

Now, with digital humanities, we face a very important epistemological turn for humanities, and I can show it with this project. Notably, we can look at the manuscripts online, and it deeply changes the field of textual criticism. We are attentive to the material of the research. And so, for Mark 16, we have no papyrus evidence of it, no attestation before the fourth century.
And Mark 16 ends at 16:8 in only three manuscripts: the two oldest ones, 01 and 03, and the minuscule 304, from the 12th century. According to Keith Elliott, the scribes of 01 and 03 were aware that the ending of Mark was disputed. And thanks to the wonderful Codex Vaticanus, which we can now see online and whose images we can even download, we can check it quite directly, because, as you see here, there is a full empty column at the end of the Gospel according to Mark, normally used to show that this Gospel comes to an end here. And we can agree with Keith Elliott that it is not by chance, because all the other Gospels and other texts in the New Testament finish with a column that is not empty. You have the other Gospel written directly in the last, third column, for example.

So it's time to come back to this enigma, and I totally share the opinion of the French historian François Hartog. Is it possible to write history from the point of view of both the losers and the winners? Or, in his own words, which I translate from the French: whereas the history of the victors sees only one side, its own, the history of the vanquished must, in order to understand what happened, take both sides into account. Can a history of the witnesses or of the victims do justice to this requirement, which the very old word historia carries with it?

So what I want to do with MARK16 is to show the diversity of the manuscripts. That means the diversity of opinions in early Christianity about the end of this text. And so we are building a new research model in humanities, a virtual research environment, a VRE, that hopefully will be launched at the end of 2019 to start the platform. It could look like that. Thank you to Martial for the mock-up. It will be, in my mind, a reference portal on Mark 16 with as much as possible, depending on the copyrights, of the material necessary for research on it: a manuscript tool with several folios of Mark 16, and an interpretation tool. It's a new tool, I will come back to it.
And also data map visualizations; we have several collaborations for that. For example, for the data map, we collaborate with Pelagios, which is an important project in the UK, and we rely strongly on our scientific committee to get input and to have feedback from users, students and colleagues from other places. In this VRE, we will include multimodal scholarly productions, and it is something really new to also want to get multimodal material, because usually we quote only articles and books. It's very interesting, for example, here in this video of the Bible Odyssey project, to compare what the scholar is saying orally, with rhetoric, pathos, ethos and so on, with what is said in a written article. So it belongs to the research to try to integrate all the intellectual productions about Mark 16.

2013-2023 has been called by Candela, Castelli and Pagano the VREs' decade. So MARK16 is coming exactly at the five-year point of this decade, and they give the following definition of the VRE, which fits perfectly for MARK16. It's a web-based working environment, tailored to serve the needs of a community of practice and expected to provide services to this community. It's open and flexible with respect to its lifetime, and it promotes fine-grained controlled sharing of both intermediate and final research results by guaranteeing ownership, provenance and attribution.

So we are following all these important features of the VRE in MARK16 and going even beyond that, notably thanks to the Swiss National Science Foundation, which has established, as you know, the good practice of having a data management plan for newly submitted projects. And so we are in touch with Huma-Num for their service Nakala, and with HAL, and we are planning to deposit our data at the end of the project in their open repository. So it belongs, too, to the VRE spirit to go as far as data curation and data preservation from a long-term point of view.
So our interpretation tool will be something to compare efficiently the different opinions in the project. And of course, in the middle of the opinions, I will develop my own hypothesis about Mark 16, as will our postdoc, who arrives in April. But the main point is that we need to show the diversity and to allow the users to go into the material and to build their own hypothesis about Mark 16. So you have here the team, and we are looking forward to welcoming them in April. That's it for MARK16.

Now the H2020 project DESIR. This H2020 project is led by DARIAH, the ERIC in digital humanities, and the SIB has been involved in it since the beginning, notably to lead, as far as possible, the Swiss candidature to become a full DARIAH member under the umbrella of the Swiss Academy of Humanities and Social Sciences, which started a consortium at the end of October, DARIAH-CH. So we are working on a firm basis at the Swiss level in the direction of this integration. DESIR is carried by 17 partners, and we are very pleased to have welcomed the University of Neuchâtel, with Mathieu Néger and Simon Gabay, on board. Its purpose is to strengthen the sustainability of DARIAH with several initiatives and projects, notably by organizing dissemination events, and we are precisely on the road to continue that. And six countries like us are accession countries in this project: Israel, Finland, Spain, the UK and so on. And so we try to go in the direction of joining DARIAH. And the idea is also to disseminate all the useful tools that DARIAH is providing to researchers and for research infrastructure.

We have opened, thank you, Martial, a great DH+ DESIR blog where you can find all the news and what was happening in the past. For example, a very great... So Daria has become a partner in November, an important point, and in Neuchâtel the DARIAH-CH workshop took place, led by Mathieu Néger, Simon Gabay and the team.
And it was really an important event, with colleagues from all over Switzerland and from Europe, and also with political implications. And we are trying to be strongly in touch with European research in digital humanities. Why does it matter? Why does it matter for Switzerland to have a strong link with European research? I take the example of open access publications to answer this question. As you may have heard, cOAlition S at the European level has started a plan, Plan S, to hopefully make open access a reality for all fields in 2020. So it's a very ambitious program, and the Swiss National Science Foundation supports it and would like to try to get all the publications produced by SNSF-funded researchers in open access by that time. But of course, it is such a huge evolution that all the partners have to be involved in it. And notably at the governing board of the European Association in Humanities and Social Sciences, we are preparing a round table with publishers, those important actors in the question, researchers and different associations, and we are proposing a draft answer to Plan S. And of course, at the European Association of Digital Humanities, it's also a topic of discussion. So for Switzerland, it's absolutely crucial to be in a strong relationship with the European level and with European research, not only for digital humanities, and so we are happy to foster our relationship with DARIAH.

And now I come to the current DESIR project at the SIB. We will be giving a lecture with Simon in Paris in May. And now I pass the word to Sara, who will speak to you about the DiMPO project.

So a few more words about DARIAH and the SIB involvement. One of the working groups of DARIAH is DiMPO, namely the Digital Methods and Practices Observatory. This working group aims to develop and provide an evidence-based account of the emerging information practices, needs and attitudes of arts and humanities researchers.
A first survey was released in 2016, and a second is now in preparation. As an active partner, we propose, in collaboration with the University of Zurich, to extend the inquiry into digital practices in the humanities in Switzerland by focusing on one particular group in the humanities, namely the field of theology and religious studies, which is my field of study. It will be interesting and innovative to focus on one group: we will be able to compare the results with the European results and to ask specific questions regarding, for example, the digital use of religious scriptures, and in this way to underline specificities of the field of theology and religious studies.

Now let me present another project of our team, the project HumaReC. It is an SNSF project that ran over two years, and it is in some aspects still ongoing. The purpose of this project is to test a model of continuous publishing for the humanities, and for that the project focuses on one particular object that Martial already mentioned: a manuscript containing part of the Bible and written in three languages, Greek, Latin and Arabic. We study specifically the pages containing the Letters of Paul.

The idea of continuous publishing is that we wanted to make each step of our research available online, and for that we have developed several focus areas. First of all, we have worked on the best visualization option for the manuscript and also for its text. Secondly, the research work on the manuscript itself was published on our website little by little through regular blog posts. Also, the continuous publishing concept relies on the possibility to change and to correct the data and the results after interaction with the readers, and for that we have a forum on the website; we also communicate on Facebook and on Twitter, in addition to the usual talks and papers at meetings. And finally, we have created a format that is more book-like, the so-called web book.
This is where the research done during the project was, and still is, communicated in a more synthetic way. The benefit of the web book is that, in contrast to a static e-book, for example, it has been designed in relation to our project website and in relation to other online resources, and it is also integrated into the continuous process. It was started and made available already during our project, and I am still now updating and writing the web book. Also, we keep the different versions of the work, and the versioning is available to the readers, which is something quite interesting for a book. And finally, we are collaborating for the web book with an established academic publisher, Brill. They will organize a peer review at the end of the process and, in case of a positive review, they will become our publisher and host the web book on their website. This last aspect is very interesting because publishers, particularly in the humanities, are normally quite reluctant towards new models, and we are very happy that they consider our project a valuable model. So that's all from me and for HumaReC, and I give the floor to my colleague Simon Gabay, who will tell you more about manuscripts.

Thank you, Sara. So I'm going to talk about projects about manuscripts and Madame de Sévigné, and I start with something that is quite famous, which is the exception française, once again, and yet another one. French people tend, for some reason, to do things very differently from the rest of the planet, and for 17th century French texts this is particularly true. The first thing is that when they edit texts, they rarely look at manuscripts, which is a big problem. But they are also not helped, because there is no catalog for such documents. So from which source do I take the original text if I want to edit something?
The second thing is that they completely normalize the old spelling and make it look like contemporary French, and therefore they rarely study what the original text is. The third thing is that they usually forget what philology is and thus do not realize the two aforementioned points. So I would like to use digital philology as a way to go back to philology, which is the science of the text and, in part, really the science of studying manuscripts, and to try to reread 17th century French texts through digital philology, which is philology.

So I'm here to talk about the Néphanes project, which was funded between 2015 and 2018 and is now continuing with a new position at the University of Neuchâtel. It is co-directed by two professors, Marques Colant, a specialist of 17th century French, and Alain Corbellari, professor of medieval literature, and there are two postdocs, me and one whom I thank and who yesterday put the latest version of the website online. We decided to test this idea about philology and 17th century French manuscripts on Madame de Sévigné, who is one of the most famous writers of the Grand Siècle. To give you an idea, the manuscripts look like that.

The first step, which was the most complex, was to try to identify manuscripts, which were scattered everywhere on the globe. Luckily, I have been able to find new manuscripts, unknown manuscripts, and also manuscripts that had never been seen except in printed versions, which is the case of this one. This is very interesting because finding new manuscripts helped us read them differently.
For instance here, if we have a look, we see that in the upper part of the manuscript the letter is written by Madame de Sévigné, but in the lower part the manuscript is written by her daughter, Madame de Grignan. This is very interesting because until now the entire letter had been considered to be written by Madame de Sévigné, which means that we can re-attribute who wrote what in the letter and thus interpret more precisely what is in the text.

The other interesting thing is that going back to the manuscript helped us provide valuable graphic, linguistic and literary evidence that helped us, once again, read the document differently. If we have a quick look at this manuscript, we clearly see that in the upper part and in the lower part we have two different graphic systems, two different ways of writing French. In the upper part, we see the word "pauvre", which is written P-A-U-U-R-E, and "sujet", which is written with a long S, then U-I-E-T: both times she uses U rather than V and I rather than J, which is a pretty ancient way to write. If we look at the lower part, we see that "d'un", D, apostrophe, U-N, as we would write it in French, is written with the U at the beginning, when her mother would use a V at the beginning, because for her the V is always at the beginning and the U is always inside the words. In the same way, we see the apostrophe, which the mother would never use.

So, reading the manuscript, we can see that there are two parts, different in the way they are written: the form of the letters, the hand, as we call it, and also the spelling system, the letters that are chosen rather than others. And all this helps us understand the text better.
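The positional rule described above (V always at the beginning of a word, U always inside it) lends itself to a small check. What follows is only a sketch under stated assumptions, not the project's method: the function names are hypothetical, and the sample spellings ("pauure", "ſuiet", "d'un") are reconstructed from the spoken description rather than copied from the manuscript.

```python
import re


def tokens(text):
    """Split a transcription into lowercase alphabetic tokens.

    The apostrophe acts as a separator, so "d'un" yields "d" and "un".
    """
    return re.findall(r"[^\W\d_]+", text.lower())


def breaks_positional_rule(token):
    """True if the token cannot come from the older, positional system.

    Assumed rule (from the talk): 'v' is used only word-initially and
    'u' only inside words, whatever the sound. A token starting with 'u'
    or containing a non-initial 'v' therefore breaks the rule.
    """
    return token[0] == "u" or "v" in token[1:]


def share_breaking_rule(text):
    """Share of u/v-bearing tokens that break the positional rule."""
    relevant = [t for t in tokens(text) if "u" in t or "v" in t]
    if not relevant:
        return 0.0
    return sum(breaks_positional_rule(t) for t in relevant) / len(relevant)


# Hypothetical samples, reconstructed from the description in the talk.
upper_hand = "pauure ſuiet"   # mother's hand: u inside words
lower_hand = "d'un"           # daughter's hand: u at the start of "un"

print(share_breaking_rule(upper_hand))
print(share_breaking_rule(lower_hand))
```

A token such as "vous" is compatible with both systems, so a single word proves little; only the share over a longer passage gives a hint about which graphic system a hand follows.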
Because we are doing digital philology, not only philology, we try to transfer all this information into code, and the most famous way to do it for digital editions is called TEI, as we can see here. We have chosen to go for a pretty high granularity, where each word is encoded, tokenized and lemmatized, all stored with more information about people, who are identified with IDs, and also about hand shifts and paragraphs. This helps us carry out research at a level that was impossible before, at the same time on the old version and on the modernized version, because you can see in the code that sometimes you have a markup, choice, that helps you choose between the original version and the regularized version.

So the website is now online. It's a beta version; the last one was published yesterday. We hope to publish the final version at the beginning of February, and then we will slowly start to publish more and more documents. The edition looks like that. It's the exact same information that we have seen before, except that now we can read it with much more information: the manuscript, the facsimile, on the left, but also information on the right, with a color when there is a problem or a correction, purple when an abbreviation has been expanded, and green when text has been added. And we can say precisely whether the text was added in the margin, above or under the line, which once again helps us understand the document better.

But what is more interesting is that we have not only to find documents, we also have to find the history of the documents. So this letter was written from Paris to Grignan in 1690, where it first moved. Then we know that it was given in Monaco by the descendants of Madame de Sévigné to someone called Madame Rosenhagen. The letters start moving slightly more south, which is a logical way for a manuscript to move on the map, because rich people at the end of the 18th century are on the Riviera, and they start collecting autographs, and they meet other aristocrats, and they ask: do you have a manuscript of
this person or that person? They start collecting things. Then the manuscript keeps moving. It becomes the property of a British politician, Henry Grey Bennet, who died in 1836 in Italy, but it is most likely that the manuscript stayed in London, because it was sold in 1904 by Sotheby's. So the manuscripts start moving more, which is once again very interesting, because we can see that collections are now well established and that London becomes a hub for manuscripts, where all the manuscripts of France and Italy start being auctioned. And you can still find many, many documents in London and Oxford, which helps us find manuscripts: because we know how they move, we have an idea of where to look for them.

But the history of the document, once again, does not stop there. Sold in 1904, it was most probably bought by an American poet called Amy Lowell, who, when she died in 1925, bequeathed her manuscripts to the library of Harvard, which is why the manuscript has now moved overseas and is in the U.S. This is once again another way to help us find documents, because we know that they moved to the U.S., and actually we can find manuscripts of Madame de Sévigné in Washington, Cleveland, Harvard, Boston and many other places in the U.S.
and around Europe. This is very interesting again, because when you start looking for manuscripts, you try to document the history of the document. But when you have a sale catalog, you have many documents that have been sold, and sometimes you find another document of Madame de Sévigné. When I was in London looking at catalogs, I found this document, which is very interesting. This is an autograph catalog, art and literature; it's a sale catalog, and the item was sold in 1932. Of course, the problem is that catalogs are scattered around the world and not properly collected, so it was easier to access it in London. And we see that there is a "rare unpublished letter to her man of business concerning her land of Le Buron", Le Buron being the land that she owned in Brittany. And luckily enough, together with the catalog we had the facsimile of this letter, which is by Madame de Sévigné and which you will not find anywhere else. This shows us that the history of documents helps us to locate manuscripts, but also to find new manuscripts that help us understand things, and to read texts that we had never seen, and that editions can always be improved.

So such documents have now been published online on my blog, on which I try to publish documents that I find pretty much anywhere in the world. You will find new letters of Pierre Bayle, and you will find letters of Madame de Lafayette that have never been seen before. And this leads us to a circle that I think is very interesting: reconstructing the history of manuscripts helps us find new manuscripts, and improving the quality of editions forces us to find new manuscripts, which in turn helps us improve the quality of the editions. So there is a virtuous circle that is helped by digital philology and by locating documents.

So now, what are we doing and what do we plan to do? The first thing is connecting our sale catalog database to our edition. It should be done in June; we are working on it, and it is going to be published in the summer. And also to enlarge the
scope to various authors of the 17th century. As I should have said, Madame de Sévigné was only a test case. If it works for Madame de Sévigné, connecting the history of documents and publishing documents, it can be enlarged to many other authors like Bossuet, Racine, Boileau and many others. And the third thing, which is very important, is to provide tools for the digital analysis of non-normalised 17th century French. Because, as I have said, most of the texts are normalised, which means that we don't have digital tools like lemmatization, POS tagging and many other things, and that prevents us from studying the language of the 17th century from a digital point of view. I hope that I will achieve that in the following months or years, if I get the project that Martial mentioned at the beginning. And now I will give you the possibility to ask questions about the seminar. So thank you very much for listening to us. Questions?
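The encoding described in the talk, where each word is tokenized and lemmatized and a choice markup pairs the original spelling with a regularised one, can be illustrated with a short sketch. The TEI elements `w`, `choice`, `orig` and `reg` and the `lemma` attribute are part of TEI, but the fragment below is a hypothetical sample written for illustration, not an extract from the edition, and `read_words` is an assumed helper name.

```python
import xml.etree.ElementTree as ET

# TEI namespace in ElementTree's "{uri}tag" notation.
TEI = "{http://www.tei-c.org/ns/1.0}"

# Hypothetical fragment: two words, each with original and regularised forms.
SAMPLE = """<p xmlns="http://www.tei-c.org/ns/1.0">
  <w lemma="pauvre"><choice><orig>pauure</orig><reg>pauvre</reg></choice></w>
  <w lemma="sujet"><choice><orig>ſuiet</orig><reg>sujet</reg></choice></w>
</p>"""


def read_words(xml_text, version="reg"):
    """Return (form, lemma) pairs, reading either 'orig' or 'reg' forms.

    Words without a <choice> child are returned with their plain text.
    """
    root = ET.fromstring(xml_text)
    words = []
    for w in root.iter(TEI + "w"):
        choice = w.find(TEI + "choice")
        if choice is not None:
            form = choice.findtext(TEI + version) or ""
        else:
            form = "".join(w.itertext())
        words.append((form.strip(), w.get("lemma")))
    return words


print(read_words(SAMPLE, "orig"))  # original spelling of each word
print(read_words(SAMPLE, "reg"))   # regularised spelling of each word
```

Keeping both forms in the same file is what allows research on the old version and on the modernized version at the same time: the same tokens and lemmas are read, and only the selected child of the choice changes.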