Well, hello everybody. Apologies for the delayed start. I'm Julia Martin from the Australian Research Data Commons, and welcome to today's webinar on FAMES Mobile, a customisable electronic notebook. We are very fortunate to have presenting Associate Professor Sean Ross and Dr Brian Bolson-Stanton from Macquarie University. Sean is also the Director of Data Science and e-research at Macquarie. This webinar will provide an introduction to FAMES Mobile features, architecture and multi-disciplinary use cases. You will also hear about the future of FAMES, possible directions for development and opportunities for collaboration. Now that's enough from me, so over to you, Sean and Brian. Hi there. So, I'm going to keep this presentation as short as I can so that there's plenty of time for questions, and then I can go into more depth about whatever is of interest to the audience. But I thought it would be good to start with an overview. I'm not going to go into a lot of history of the project. That's something I can talk about if anybody's interested. What I will do is start with a little bit of the broader context for FAMES that helps to explain the approach that we take and maybe situate FAMES in the broader scheme of information infrastructure. So, FAMES Mobile is software for field data capture, especially human-mediated field data capture, and it's used across a lot of disciplines. That's something I'll talk about a little bit more later, the range of the disciplines. But it did start in archaeology, and we started with a big stocktaking workshop where we had about 40 archaeologists. It revealed some things that, now that I've done more research into information infrastructures, I know are pretty typical of small data research. One thing about small data: you know, big data has been the flavor of the month for the last few years, right? 
And somehow people think that if data is smaller, that it's easier. But small data is a term of art that's used by Christine Borgman and others to describe the kind of data that emerges from communities where, yes, the size of the data may not be in total as large as some of the big data disciplines, like astronomy or genomics or whatever. But it has other difficult aspects of it. And one of those that proved quite important for us was a lack of standards or agreement on what data should be collected, how it should be collected, these sorts of things. When we started the project, we thought that we would make a handful of static data loggers for different kinds of archaeological activities, a couple kinds of excavation, a couple of kinds of surface survey artifacts, a few other things. But we got reactions like this that essentially showed us that no one was going to agree on anything and so we changed to a far more generalized approach. So this challenge was probably the leading one, a relative lack of standards, but that goes along with a number of other aspects that are common of small data, small science, long tail research, a number of terms are used in the literature. So our communities are smaller, they're smaller scale. Diversity is one of the key features of it. The questions, approaches, methods, the data, heterogeneous data, a variety of content ranges of structure. Another thing, again, that Borgman has really found in her ethnographies of field data capture in ecology is that the data, the structures that are used, the infrastructure, they're emergent, they grow out of the field work that we do. They grow out of the research questions that we ask and so it requires a certain amount of flexibility. And I've already spoken about the lack of standards in many, if not all of our disciplines, there's a relative lack of, or limits to infrastructure and funding that some of our colleagues in bigger data disciplines seem to do better with. 
I'm thinking particularly of a couple of conversations I've had with structural geologists complaining about the seismologists and how they get all the money. But anyway, something that is coming for us is that we will soon have all of these problems of diversity, heterogeneity, limited resources, lack of standards, et cetera, from the small science, small data side of things, and on top of that we are also starting to get larger and larger data sets from our field work, from things like various kinds of geophysical remote sensing, photogrammetry and other approaches. So this combination made for a challenging context for the development of FAMES. The other piece of context that I would mention is that if we think about the research data lifecycle, from planning and designing through collection and capture, on through analysis, to eventually archiving and publication of data, then data capture, the area that we're working in, is probably the least mature for our disciplines. Many, not all, but many of our disciplines have had domain-specific repositories for some time. There are some very well established repositories in archaeology, in ecology and in other fieldwork disciplines. But as you move closer to the origin of the data, the creation of the data, things get less and less mature. When it comes to processing and analysis, a lot of that is still done at the level of the project, with some virtual labs or science gateways in some of our disciplines. So that's sort of a middle level of maturity. And then finally, at the collection phase, where we're actually capturing the data, this is where it's the most varied. Even within any of the individual disciplines in the field sciences there's quite a bit of variation, and we found that even in archaeology when we started. Also, the working conditions are often difficult: network-degraded or offline environments, and harsh environments for the equipment. 
So the other thing that has come up is that a lot of the commercial solutions that are available weren't really designed for what we do, and they may not be sufficient for what we do. The Bureau of Reclamation did a fairly extensive stocktaking a couple of years ago of existing commercial solutions, as background for going out and sponsoring a prize competition for the design of field data capture software, and their stocktaking came to the conclusion that the commercial solutions aren't entirely satisfactory. There's also another problem with commercial solutions, and we faced this in archaeology. About 10 or 12 years ago a lot of effort was put into the Esri ecosystem, which was the direction that field data capture in archaeology went in, and then they changed their business model, their business plan, somewhat, and it really made it sort of impossible for us to use their software. So in our discipline at least we've been burned once by this, and we are a little bit cautious about relying on commercial solutions where research will only ever be a small slice of what they do. So that's enough about context. I will go through where we're at with FAMES now, what FAMES Mobile, the current software that we have, does. It's FAMES Mobile version 2.6 that we're up to right now, software that we've been developing since 2012. And then I'll go through where we plan to go with that in the future. Again, I think I'm going to start with a bit of the broader approach that we take and then I'll drill down into some of the features. I know some of you may be most interested in, well, what does it do and what are the features, and I'm happy to go into more depth about that afterwards in the Q&A if I don't cover it enough now. 
So, a few of the principles of our approach. One is that we're research specific: we're designed for field data capture in a research environment, whether it's academic research or research done by consultancies (heritage, ecological or geotechnical consultancies) or by other research organizations. We developed a lot of specific tools that support some of the things that are necessary for us, and there are just a couple of examples here, with mobile GIS and some other things and, in this case, picture dictionaries to improve data quality. Another one is that we are generalized, in the sense that FAMES Mobile isn't a data logger that you can make a few extensions to; it's a fundamentally customizable system. Brian can talk more in the Q&A about the specific approach we took, but the short version is that we have, let's call it, core software that is an interpreter, and it does a lot of the heavy lifting: synchronization, mobile GIS, structured data management, version control, other hard things. But for you to use it, you have to make your own customization, and we call those modules. Essentially, what you do is create a customization file that the interpreter reads, and it then instantiates your customization when you run the software on your mobile device. So it's deeply customizable that way, and if you go to the FAMES project GitHub repository you will see, I think we must be up around 50 different customizations now that we've done with FAMES, and you can use those files if you want. An interesting side effect of this is that those files, which define your data schema and your user interface, which is your workflow essentially, are more or less human readable. They're XML files, and they really capture the essence of a project. They constitute some really important metadata about a project, in that if you look at them you can see the data structures and the workflows of that project. So I think that they've got an
important open research, open science aspect to them in and of themselves. Another thing that we did, in addition to being research specific and deeply customizable, is that we took a federated approach. FAMES Mobile doesn't try to do everything. We tried to be very good at one stage of the data life cycle, field data capture, and then we hand off to other software. I know other projects have sliced this pie differently and tried to provide end-to-end solutions within a narrower domain, and that's fine, that's a valid approach. But we found that the needs and requirements of projects across fieldwork disciplines, whether it's archaeology, geoscience, ecology, oral history or ethnography, are pretty similar. We solved generally enough for a discipline like archaeology, and we could have started in any of these disciplines, but if you solve broadly for one of them, the solution is transferable to the others, especially taking the generalized approach that we've taken. As well, our software is open source, and this is important for open science and open scholarship reasons. The core code is on GitHub under GPL v3, and the modules, which are the code for the customizations, are also open. I think they're mostly under Creative Commons licenses, and it's all on GitHub. I will admit that we have never had any external parties contribute to our core code base. We have had people independently take existing customizations from GitHub and tweak them for their own projects. Most projects came to us and hired us on a consulting basis, or wrote us into grants, or otherwise remunerated us to do their customizations for them, but we have had a handful of projects run with it themselves, and we did hire a technical writer at one point, funded by a small Nectar grant, to produce a user-to-developer guide to ease that transition. Something that I would say is one real advantage in our
disciplines, where, as is common in small data disciplines, we have limited resources: the cost of doing any given customization, if you hired us to do it for you, depends on its complexity, but I don't think we've ever had one that was more than about $15,000, unless the project needed us to add capacity to the core software. For that you get a piece of software that would probably cost you $150,000 to build from scratch if you went out and hired an Android or iOS developer to do it. We're working on a paper now that looks at the total cost of the whole project, including the ARC grant we won, the Nectar grant we won and other funding that we've won, adds that all up, and then divides it amongst all of the customizations that we did, and we still come out much less expensive per deployment for any given project than a bespoke mobile solution would cost. So that's a little bit about our general approach to things. What I'll do now, and you've already heard me mention some of these, is run through the list you can see here of some of the key features of FAMES Mobile. I think the combination really distinguishes us from any of the existing commercial or open source alternatives to FAMES. I've already talked through the fundamentally customizable approach that we have. The other thing that we do that I think not much other software can do is capture and tightly bind a range of different types of data, and this again speaks to the heterogeneity and diversity of the kind of research that we do. We can handle geospatial data, multimedia, free text, tabular data, and data captured from instruments or external devices. We can bind that range of data to one another and really leverage that binding, in the sense that we can do things like this: when you export your data, we can read your records and rename
all of your photographs based on the data that's in the record, so a file name could have the location where you're working, the date, the time, whatever you want, and we can bulk rename all the multimedia associated with a project that way. The next thing that I'd really point out, and it was quite challenging to develop at the time, is that everything works offline, and it works offline in a very robust way. We don't just cache data on your local device; we have a copy of the entire data store on your local device. This is combined with bidirectional synchronization: opportunistically, when the devices are online, when they do hit a network, they synchronize with the server, and then all the other devices connected to that project will synchronize as well. Because the entire data store is copied to every device, you can lose the server and lose 9 out of 10 devices, and if you've still got one device you've got everything, so it's quite robust that way. The application is also agnostic about where your server is: it can be local, sitting in your room or attached to a vehicle, or it can be an online server. We use an append-only data store, so nothing is ever actually deleted or changed, and that provides us with a complete record history, including versioning, rollback, et cetera. Again, that speaks to the way that we've really prioritized the preservation of all the data that you capture. We developed a pretty fully featured mobile GIS. I will say this is something where our researchers at our stocktaking said they were going to use it a lot, and it has been somewhat underutilized. I'm not sure I would spend quite as much money on our mobile GIS next time as I did last time. But we have layer management, vector mapping, rasters; we can handle all of the basic stuff that you'd expect in a GIS. We can connect to internal and external sensors via Bluetooth or USB; the difficulty of that just varies by what
type of sensor it is. We are multilingual. We use pretty standard localization and internationalization approaches. We've had deployments in Bulgaria, in China, in Malawi and in South America that all utilized having two or three languages that you could swap between in the application. One of the things that we do to really help improve data quality is that we deliver very granular help, and this can be text or images. It's HTML, essentially, so that when you're in a specific field you can hit an info button and get instructions about how you should record whatever you're recording. We had one large archaeological project at Lake Mungo, run by Nicky Stern at La Trobe University, that took about a 400-page field manual, chopped it up into little pieces, and attached each set of instructions, with pictures, to the specific field that it related to. Along with that, we can also capture quite granular metadata. There's a notes field attached to every single field in the application, and you can consider this the margins of your notebook, or the margins of your form, when you're out in the field. So you can get very granular metadata, or problems that you've run into, or anything like that, captured in the records. We also do something else: when you're making observations that are dependent on a person, sometimes you want to indicate uncertainty, and every field can have an uncertainty indicator. 
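To make the append-only data store and opportunistic bidirectional synchronization described earlier in the talk a little more concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the FAMES implementation: the names Revision, AppendOnlyStore and synchronise are hypothetical, and a plain integer timestamp stands in for whatever ordering the real system uses.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Revision:
    """One immutable entry in the log (hypothetical structure)."""
    record_id: str
    timestamp: int                        # assumed logical clock
    device: str
    values: Tuple[Tuple[str, str], ...]   # field/value pairs, stored immutably
    deleted: bool = False                 # a "delete" is just another revision


class AppendOnlyStore:
    """Every write appends a revision; nothing is overwritten or removed."""

    def __init__(self) -> None:
        self._log: List[Revision] = []

    def write(self, record_id: str, timestamp: int, device: str,
              values: Dict[str, str], deleted: bool = False) -> Revision:
        rev = Revision(record_id, timestamp, device,
                       tuple(sorted(values.items())), deleted)
        if rev not in self._log:
            self._log.append(rev)
        return rev

    def history(self, record_id: str) -> List[Revision]:
        """Full version history of one record, oldest first."""
        revs = [r for r in self._log if r.record_id == record_id]
        return sorted(revs, key=lambda r: (r.timestamp, r.device))

    def current(self, record_id: str) -> Optional[Dict[str, str]]:
        """Latest state of a record, or None if absent or marked deleted."""
        revs = self.history(record_id)
        if not revs or revs[-1].deleted:
            return None
        return dict(revs[-1].values)

    def rollback(self, record_id: str, to_timestamp: int,
                 now: int, device: str) -> Revision:
        """Roll back by appending a copy of an older revision."""
        old = next(r for r in self.history(record_id)
                   if r.timestamp == to_timestamp)
        return self.write(record_id, now, device, dict(old.values), old.deleted)

    def merge_from(self, other: "AppendOnlyStore") -> None:
        """Copy over any revisions this store has not seen yet."""
        for rev in other._log:
            if rev not in self._log:
                self._log.append(rev)


def synchronise(a: AppendOnlyStore, b: AppendOnlyStore) -> None:
    """Bidirectional sync: afterwards both stores hold the same revisions."""
    a.merge_from(b)
    b.merge_from(a)
```

Because a merge only ever adds revisions, two devices that worked offline can exchange logs in either order and end up with the same history, and any single surviving device still holds everything it has synchronized. That is the property the talk relies on when it says you can lose the server and most of the devices.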
The last couple of things, very quickly. Generalised export: if you give us a target (Brian may complain about this), Brian can probably write you an exporter to hit that target. Out of the box it comes with CSV export and shapefile export, and we've done a number of JSON exporters as well. We're designed to be part of a federated, loosely coupled system. And the last thing is that we have put throughout our system some hooks to help make data more interoperable. Any place where you've got a controlled vocabulary, checkboxes or a dropdown or anything like that, in the definition file you can specify a URI that will point that at an ontology or a shared vocabulary online. One of the more common uses for this is that if you've got plant or animal species, you could put in the URI for the Encyclopedia of Life or something like that. We have a number of data interoperability features to help make our data sets more compatible, and doing it this way you can do it with quite a light touch and do most of it in the background, in a way that allows researchers to do the research the way that they want and use their own local terminology, but then map that to shared vocabularies or ontologies. 
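As a rough picture of how local terminology can be mapped to a shared vocabulary at export time, here is a small Python sketch. It rests on several assumptions: in FAMES the URI is specified in the XML definition file, whereas this sketch uses a plain dictionary; the function export_csv and the field names are hypothetical; and the example.org URIs are placeholders, not real Encyclopedia of Life identifiers.

```python
import csv
import io
from typing import Dict, List

# Hypothetical controlled vocabulary: field name -> local term -> shared URI.
# The URIs are placeholders for illustration only.
VOCAB: Dict[str, Dict[str, str]] = {
    "species": {
        "roo": "https://example.org/vocab/roo",
        "emu": "https://example.org/vocab/emu",
    }
}


def export_csv(records: List[Dict[str, str]],
               vocab: Dict[str, Dict[str, str]]) -> str:
    """Write records as CSV, adding a '<field>_uri' column for each field
    that has a controlled vocabulary. Local terms are kept unchanged."""
    fields = sorted({key for record in records for key in record})
    columns: List[str] = []
    for field in fields:
        columns.append(field)
        if field in vocab:
            columns.append(field + "_uri")

    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for record in records:
        row = dict(record)
        for field, terms in vocab.items():
            if field in record:
                # Unmapped local terms get an empty URI rather than an error.
                row[field + "_uri"] = terms.get(record[field], "")
        writer.writerow(row)
    return buffer.getvalue()
```

Calling export_csv on two records, one with species "roo" and one with a term that is not in the vocabulary, yields a species_uri column that is filled for the first row and empty for the second. The researcher's own wording stays in the data, and the mapping to the shared vocabulary happens in the background.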
I have a number of testimonials here. I'll give you a minute to read a couple of them, but I've already gone a little longer talking than I planned, and I want to give you a quick preview of what we're planning for the future. So I'll let you read that one, maybe. The top one here is pretty good, from a large archaeological project in Malawi, and the CSIRO Mineral Resources unit in Western Australia has used FAMES extensively and seems to be pretty happy with us. One of the things that the team in Mineral Resources has let us know is just how much money they saved because of the increased efficiency of field data capture. So, just to quickly wrap things up so there's plenty of time for questions: where are we at now? We've been developing FAMES Mobile since 2012. We have mature, stable software, and we're very well aware of its limitations, but it's nearing the end of its useful life. More and more of our stack is no longer supported, and we really can't keep it going the way it is now. So two years ago we developed a high-level technical plan for a hypothetical version 3. We were one of seven plans selected, out of about 160 or 170 proposals, in the Challenge.gov and InnoCentive challenge competition a couple of years ago that was sponsored by the Bureau of Reclamation. The idea was that it was going to go to a second round, but then leadership changed at the Bureau and nothing ever happened with that. But we've got a plan. One of the things we've been struggling with is grant funding, as many infrastructure projects do, and we've come to the conclusion that anything that we do in the future has got to be commercializable, and it needs to be commercializable as a self-service platform. We ran an open source consultancy around the existing FAMES for several years, and that brought in a certain amount of revenue, but it's not really scalable, so we
want to move to more of a self-service model, and to that end we've gone through CSIRO's ON Prime program. I'm going to the ON Tribe event next week in Melbourne, and we're taking advantage of what they provide. I don't know how familiar everyone is with it, but the centerpiece of the program is doing interviews with existing and potential clients, and the most important thing that came out of it is that we did over 70 interviews to give us an idea of how responsive our software is to actual client requirements. I'll say something about that in a minute. We are considering applying for a services grant with the same team that put up our LIEF applications. We put up three LIEF applications with about half a million dollars of co-investment from universities and from CSIRO, from the Mineral Resources unit, and the last one with a heritage consultancy as well, and unfortunately none of those three were successful. We did have a LIEF in 2014, but we've given up on them since then. We are considering the ARDC platforms and services call that's out right now, but only if we can reconcile it with commercialization, and I have a phone call with them tomorrow to discuss whether that's possible or not. And we are running into a challenge: Brian and I and the rest of the team are very committed to open research and to open source software, but convincing anyone that any open source software business models are viable is proving very difficult. So, the things that we must do technically to improve uptake, and this is based on the 70 interviews that we did and is what's in our plans now. The first thing: for our current version we did native Android development, and we've got to go cross-platform. To our surprise, it was really a deal breaker for a lot of people that they couldn't run it on their iPhones. The other major thing: when we asked people why they went and used some of our commercial competitors instead of us,
or tried to use them at least, they said, well, some of your competitors gave us a graphical user interface, a web application where we could play with it ourselves without having to code any XML to do a customization. So we've made it a high priority to allow self-customization via a web application. We're also looking at improving the orchestration of deployment to infrastructure, whether it's AWS or other infrastructure, while keeping the option open to have a local offline server. Another is data round trip to external desktop or online software. This is basically the idea that you go out into the field, you collect a bunch of data, you get back to your base, you synchronize with the server one way or another, and then you want to open the data that you collected that day in ArcGIS or QGIS or whatever, do a bunch of editing to that data, and then have the edited, updated data available in the field the next day, where it can be edited again. Right now we can't do that. We can visualize legacy data, but we don't have that round trip, and that's something that our users and potential clients really want. We'd be moving to a more modern architecture with an API that would allow access to the data that way. We use ETLs, transformations on export to hit a target that we want, but we don't have an API. We've also got a number of technical improvements we want to do that improve scalability, in terms of performance on the devices, synchronization, and the ability to really scale a project up and have multiple servers that synchronize with one another, and then finally some user management and security. One of the trade-offs we found with having a generalized system is that it is difficult to keep the performance up, but we believe that with the improvements in technology over the last 7-8 years that is much more possible now than it was when we designed the existing system. So that's really what I had. I've provided for you a list of our existing publications, which include a
technical paper, a software publication of our existing system, which also includes some discussion of socio-technical obstacles to reuse, and we've got some case studies. If I can find it, "Measure Twice, Cut Once" is three case studies. So these publications are available, this presentation is available on GitHub, and here's the GitHub for FAMES as well. That's really what I had to present, and at least I've left a little bit of time for questions. Hi Brian, did you want to take over from here? So I think at this point we'll be better served by answering audience questions. I'm happy to provide technical answers to folks, but I'd rather not go into technical detail about things that people don't care about, so if our audience can put questions into chat, Sean and I will try to answer them. I've actually got a question from the outset, and it's probably more social than technical. You've got such a diverse range of workflows in a growing library in the GitHub files. Is there a community that you're developing around FAMES? I think we are trying to develop a community, and it does tend to break down more by discipline, in the sense that our archaeologists will share with one another and the oral historians and ethnographers will share with one another. We have had cases where unaffiliated projects will send us an email, either with a question or to say they've picked it up and run with it themselves. When we still had grant funding we would run workshops and bring users together and do things like that. We haven't had the luxury of doing that recently. We do have a question here: is there any comparison with REDCap? 
I don't think that's a direct comparison. There is an entire category, not just REDCap but other software, that was designed for social surveys. My familiarity with REDCap is not very deep. I've had to learn a little bit more about it recently because there's been some uptake and I've somehow become the administrator of the system here, but I'm relatively new to it. There are a number of pieces of software, REDCap is one of them, and another one is everything around ODK, Open Data Kit, from 2010 or 11 at the University of Washington in Seattle, and they're designed for social surveys: I'm going to go out and ask people questions. They've been extended to handle some geospatial data and to make some recordings, but at several points we took a really hard look at social survey software like that and found that it wasn't a really good fit for what field researchers do, in the sense of going out and making observations about the natural world or the archaeological record or whatever. I will say that solving for that problem also allows FAMES to handle social survey things; I would say that's sort of a subset of the bigger domain problem. But in particular we designed for use in remote environments, offline. Unless you went and installed a REDCap server, I don't see how you deploy it to a remote area where you need to have 10 people out, totally offline, using their own individual mobile devices, collecting things separately and then coming back in. I'm not sure how that would work in a system like REDCap. And even ODK, the last time I checked, they were promising things like bidirectional synchronization but didn't have it yet. So there are some pretty fundamental features that field researchers need that the social survey software doesn't have. The best way to think about it from a technical perspective is the number of spreadsheets you need. Most of the answers that I'm going to be providing right now are taken from my experience with Qualtrics rather than
REDCap, so take them with a grain of salt. But if your data collection fits well within a single spreadsheet with multi-valued attributes, where you don't have child entities and so on, the simplicity of survey software that expects "here's my question, here's my question" works very well, and Qualtrics has an offline write-only mode, so you collect data and it eventually goes up to its server. What these systems don't do that we focused on is the ability to edit, the ability to view, and the ability to have multiple tables or spreadsheets in a single data collection session. So we serve very different purposes, and FAMES has a complexity cost because we're so general. If you do have a survey of "what's your age, what's your occupation", then the survey-specific tools will fit that need better. But if you have something more nuanced, "okay, tell me about your household, now tell me about the folk in your household and give me their employment history", that's a different case: we have an oral history project that needs to use FAMES because of the complexity of their data collection. The only other thing I would add to that: I think the closest analogy, the software that we certainly look at when we are thinking about what we want to do with FAMES, and when we look at business models and other things, is LabArchives. I think we are probably closest to LabArchives, but where they do laboratory work, we do field work. I've had a couple of conversations with the LabArchives people, and they just will not touch offline work with a 10-foot pole, because they really are focused on ensuring the integrity of all their records, in the sense of being able to go back three years and see who discovered the gene target that led to the billion-dollar pharmaceutical, to resolve the lawsuit that's happening over that. They feel like they've got to have a full-time connection to their servers to allow that kind of auditability and to be able to really guarantee what was done when and by whom, which we also record, and I think we do it in a
reasonably robust way, but LabArchives, they're not interested in offline stuff. I'd say that's the closest software to what we do. A couple of other questions coming in through here. Is there any way to install on a laptop rather than an Android mobile device? And we have an Apple mobile device. So let me answer that one. We've explored a number of ways of emulating Android on a laptop, and the short version is, for FAMES 2.6, no. It's a native Android system, because we have so many interactions with the hardware of the Android device. One of our hopes for FAMES 3 is that, now that the state of the art has moved on from 2012, we can indeed generalize to arbitrary platforms but with the same sensor capabilities. Thank you. Next question: would it be possible for those listening in on this session to record their interest in your proposal for the upcoming ARDC platforms EOI, and if so, where do we register this interest? Yes, please contact me if you are interested. I've just been emailing our big existing users and the investigators and organizations that were on our last LIEF, which we put in in 2018, so yes, I would be really happy to have other organizations and other investigators join it. I think I've provided my email to you, and I'd be happy to have anyone who's interested contact me. And just to speak about the different levels of interaction: we plan to have this be both a service, a self-hosted service, and something that we can do as consulting. But if you're interested in specific features, especially integration with external hardware like a Bluetooth printer, or something that no other software does and that you're pretty sure we won't think of, the best time to get involved is when we're designing and planning the next generation. So there's a higher amount of time needed if you want technical input, but now is the time to provide that. 
On that note, I understand that you do have a version which has integrated IGSN, which is the persistent identifier for geosamples. Are you considering other PIDs as part of the design process? Yes. We've worked with Jens Klump and others at CSIRO in Western Australia to figure out different kinds of workflows for how you assign a persistent identifier when you're in the field, offline, for two weeks at a time. We've got a couple of workflows for that, and I think that they would be transferable to other persistent identifiers as well. Any feature that we implement, we always try to generalize as much as we can so that we solve as many use cases as we can, and I don't see any technical reason, if we can do IGSNs, why we can't do others. Right now in FAMES 2.6, all of that identifier minting, outside of the allocation of number ranges, is handled on export. Because of that, in FAMES 2.6 the response is that we can handle arbitrary identifiers on export, because all we need is a way to speak to the system where we're minting the identifiers. Thank you. We won't get bogged down in more detail than that. Happy to take other questions. Following up on the previous question, someone has asked: what is your email? Can you please just say your email address so that they can get in touch? Sure, you can reach me at www.ross. R-O-S-S at mq.edu.au, and hopefully we can ask our organizers to put that into chat. Also, I'm interested in the digital skills and data literacy of FAMES users. Has interaction with the tool impacted users' interest in and development of data skills? We could talk about this at some length. We have a paper about this. You start, Brian, and then I'll get mine. 
If you're interested in some of the socio-technical aspects, and in having students use it and develop it, take a look at Parker VanValkenburgh's paper and Adela Sobotkova's paper, on implementing it outside of FAMES (that's Parker's) and on the socio-technical barriers. Most of what we found is that people are not that interested in how the tool works; they want to use it in their own research. That has shaped many of our interactions and consulting models.
What I would add to that is: if you make a tool that is customizable, then it requires customization. What we found is that the biggest barrier to people going out and customizing it on their own isn't necessarily the difficulty of producing the XML code that it takes to set up your interface; it's a more fundamental lack of training and background in data modeling and workflow modeling, particularly the data modeling. Digital systems are far less forgiving than paper. The typical archaeologist may very well write up a paper form on the airplane when he's flying out to run a project somewhere. I'm being a little bit facetious, but only a little. If you get a paper form that is close enough, then fine, it'll work in the field, because people just flip the form over and write on the back. The problem with that is it leaves you with an enormous data cleaning job at the end. What we do is move that up to the front, where digital is much crunchier. It's much less forgiving: you really have to specify everything you collect and the details about it. It requires this data modeling and implementation at the beginning, but then your payoff is that at the end of the season you hit the export button and you've got clean, consistent data.
Going through that process of data modeling, and to a lesser extent the workflow modeling behind coming up with a user interface, is usually a real eye-opener for projects. It makes them really think about what approach they're using, what methods they're using, and how they conduct their field work. It often reveals other problems in their approach or methods. It can be quite transformative when you're forced to specify all the data that you're collecting at the beginning.
Conversely, what we found is that even if doing that at the beginning saves two years of data cleaning at the end (and those figures would not be out of line for a medium-sized archaeological project), it is very difficult to get busy researchers to invest the two months up front to save the two years at the end, because everyone is so time-pressed and discounts their future time too much. Those are a few of the things that have been among the most interesting side effects of our project.
Thank you. Do we have any last-minute questions before we wrap up, almost on time?
No, not at this point. If you do have any further questions, you have Sean's email address, or you can put them through to me. Quite a number of people are saying thank you very much and how interesting this session has been, and we very much look forward to seeing what FAMES does in the future.
So, one note for the folks interested in more details: Sean's PDF doesn't end where he showed you. We have about 20 or 30 more slides in it, so if you're interested in more details I encourage you to look at that PDF, and more resources are available therein.
All right, splendid. People are saying it was a fantastic backstory and overview of the features of FAMES, and that they look forward to reading more. And the PDF, I'm assuming, is on GitHub?
Yes. On screen.
And we do welcome collaborations, and the expansion of the collaborations that we have now. We designed this software to be usable across a range of disciplines anywhere in the world, and we would like to work on building our user community.
And thank you all for being an audience with fascinating questions.
Yes, thanks