Well, I guess we'll go ahead and get started. Good morning, everybody. My name is Stuart Barnard and I'm the Digital Scholarship Coordinator in the Woodruff Library at Emory University. And this is my speaking partner, Jay Barnard. He is the Digital Scholarship Solutions Analyst, also at Emory University. And there is some relation: we are brothers, have been our whole lives. It's been a hard life ever since. The big part of what we do at Emory is work in what's called the Digital Scholarship Commons, or DiSC. This is a Mellon-funded initiative to try to figure out how to build digital scholarship support inside a library that truly takes advantage of being in a library. That means a lot of things. We have a software development team in the library, so we partner with them. We have partnered with the metadata librarians when we're working with a scholar who's building out a project; we've found that it's good to ask questions about metadata and sustainability and copyright up front, before you spend thousands and thousands of dollars. So that's what we're trying to do in DiSC. As such, our projects tend to be really collaborative, and the goal is to produce excellent projects that are also technologically sustainable. Right now DiSC has about ten projects going. Half of them are in development and the other half are more or less complete; we're learning that it's sometimes a question of what it even means to be complete once you've launched a project. The projects we do really vary in scope. We have some small projects. For example, we have some graduate fellows who work with us. Let me back up a little. One of our software developers had the idea to write a little Python script that would listen in on Twitter and collect any tweet with the hashtag #OWS during the Occupy Wall Street protests. This ran for about a year, and at the end of the year we had ten million tweets. We had a new cohort of graduate fellows coming in, and I said, hey, graduate fellows, here are ten million tweets; the one-year anniversary of Occupy Wall Street is in two weeks, do something. So they did. They built a website (we had to stop them from calling it "Occupy Wall Tweet"), and the Tweeting #OWS site was put together in about two weeks and does some pretty basic visualizations on the data. Then, just a few weeks ago, we realized we had a digitized collection of sermons delivered on the occasion of Abraham Lincoln's assassination. Thinking there might be increased interest in this around the Oscars, we once again said to the graduate students, here's the thing, do something. They ran a bunch of text analytics on the corpus, had a good time, and built a website. We did let them call that one Lincoln Logarithms. So these are small-scale projects: they lasted a couple of weeks and were mainly done by the graduate fellows using open source, freely available tools they found online. We do have some bigger projects as well. For example, we worked with some faculty in the art history program to take a map of ancient Rome, where each panel is maybe two feet by a foot and a half, stitch the panels together, and use Microsoft Deep Zoom so you can zoom in and out of it.
Then we worked with one of the developers in the library to build some functionality on top of that so the map can be annotated by students. This becomes part of a classroom project: students highlight buildings and write a little something about them, so it becomes a growing resource. We're also working with some faculty to build a mobile app for the battle, to help people move through the different sites of the battle. I'm afraid that might be the sexiest picture we have right now, but it's still in development, and obviously a much longer process than two weeks. Let me back up and stay on this slide for a second. Even though some of these projects are a lot bigger, in the grand scheme of things they really aren't that big, particularly in the grand scheme of what a large research university like Emory does, or even what a large research library like the Woodruff Library does. So I think we had a problem that a lot of other projects have, which is: where can we play? Where can we just build this stuff? We had a few different alternatives. We could use the university web architecture. The university uses a system called Cascade to build all of its websites. It's a really stable, sustainable, well-supported thing; there's a whole office dedicated to sustaining websites built in Cascade. Would this work for all of our projects? No, not really. It was cost effective, but it wasn't very accommodating. It's pretty stable, but it's not very flexible. Moving on to the university virtual machines: we can pay for this service through central IT. This is also pretty cost effective. It's relatively accommodating. It's pretty stable. It's not very flexible, and "flexible" probably isn't exactly the right way to put it; what I mean is that if we have an idea, we can't just do something. We have to contact UTS, the central IT folks, and negotiate it. It's not a big deal, they're pretty accommodating, but it was just one more layer. We could have a box in the library: cost effective, depending on how you look at it, since we might have a box just lying around that we could repurpose. Accommodating? Not really, because it seems to create some anxiety when you start attaching boxes to networks and saying, we're going to do stuff, don't worry about us. That's not what anyone wants to hear. Is it stable? Relatively; it's as stable as a box. Flexible? Maybe, but not really. Particularly if we built something experimental that turned out to be kind of cool and we wanted to turn it into something public-facing, maybe that box isn't big enough anymore, maybe it's not robust enough. Shared hosting: we could just go get DreamHost accounts and that would be fine. Relatively cost effective, not really accommodating, probably stable, but not as flexible as we wanted to be. Virtual private servers were much the same. Then we got down to Amazon's cloud service, the Elastic Compute Cloud, and that seemed to hit all of our buttons pretty well. One column we left off of this table is how tried and true each option is. This was something the library hadn't done, so we didn't know that for EC2. We thought it seemed really promising, something we could try, but there was a little concern. What kind of broke the tie for us, I guess, was two things.
One, this seemed like an exciting direction in which to move. It would be difficult to take production, enterprise services and just move them into the cloud to see what happened; that's probably not what we wanted to do. But the DiSC projects are small enough, and in the grand scheme of things low-stakes enough, that we could just do this and see what happened. And the other thing is that Jay started working for us in the library. He had done a lot of work in the cloud, so he had some experience with how this works. So: this seems like an exciting direction, let's give it a shot. That's what we did, and that's how we came to the decision to start using Amazon's cloud service. And I think that's their logo. I know that's their logo. And the clicker goes to Jay.

Thanks. So as I said, I'm Jay, the Digital Scholarship Solutions Analyst. It's kind of more of a dev ops job for me: I do some programming and I do a lot of the sysadmin work. My main background is in systems administration, so I'm going to be a little more technical and talk about what we're doing and how we're doing it. What we're running in AWS: basically, we really have three servers, or instances in Amazon's terms. On our production server right now we have a few more things running. There's an auto-publishing script: some folks on campus run an online journal, and they have this whole process, so we wrote a script that automatically updates their site for them when they're done. And we have the Twitter harvest, which is a Django app, basically a front end on top of the harvested data. We have a development server, which is kind of our playground where we can do whatever we want. And all of our backups are done through Amazon; I'm going to go through how that's done and how we maintain our data.

Now, AWS terms. When I was preparing for this talk I was reading a lot about how you have to learn all this new terminology, and it's really not that hard. An instance: in a sense, it's a server. An EBS volume is Elastic Block Store; it's just attached storage, like plugging a USB drive into a computer. A security group: those are your firewall rules. An AMI is just the machine image that you create a new machine from. And a snapshot is a block-level copy of the data on an EBS volume. However, you shouldn't really think in those terms when you talk about Amazon, because the whole point is that it's not a fixed infrastructure. It's infrastructure as a service. An instance really isn't a server; it's all code, and you just have to think about your infrastructure as code. All of these are just building blocks, not physical things, and physical things, as we all know, break down over time. Here's kind of how we have our system architected. There's a security group that is attached to the instance. If you think about it as a programmer, all of these are objects; each little thing is an object, and you move them around and repurpose them, so you don't have to repeat yourself a lot. The instance, which is shaded out a little in the diagram, is backed by an AMI; the instance is created from the AMI. And in the same way, the EBS volume attached to the instance is backed by a snapshot, and again, a snapshot is just a block-level copy of the volume.
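Since we just walked through those building blocks, here is a minimal sketch of what "infrastructure as code" looks like in practice, written against the boto3 Python SDK. This is an illustration, not the code we actually run; the AMI ID, instance type, availability zone, and device name are all placeholder assumptions.

import boto3

# Every "building block" here is an API object, not a physical thing.
ec2 = boto3.client("ec2", region_name="us-east-1")

# Security group: the firewall rules that will sit in front of the instance.
sg = ec2.create_security_group(
    GroupName="disc-web",
    Description="HTTP and SSH for a DiSC project (example)")
ec2.authorize_security_group_ingress(
    GroupId=sg["GroupId"],
    IpPermissions=[
        {"IpProtocol": "tcp", "FromPort": 80, "ToPort": 80,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
    ])

# Instance: created from an AMI (placeholder ID) and wired to the group.
reservation = ec2.run_instances(
    ImageId="ami-xxxxxxxx",              # hypothetical machine image
    InstanceType="t2.micro",
    MinCount=1, MaxCount=1,
    Placement={"AvailabilityZone": "us-east-1a"},
    SecurityGroupIds=[sg["GroupId"]])
instance_id = reservation["Instances"][0]["InstanceId"]

# EBS volume: attached storage, like plugging a USB drive into the machine.
volume = ec2.create_volume(Size=20, AvailabilityZone="us-east-1a")
ec2.get_waiter("volume_available").wait(VolumeIds=[volume["VolumeId"]])
ec2.attach_volume(VolumeId=volume["VolumeId"],
                  InstanceId=instance_id, Device="/dev/sdf")

# Snapshot: a block-level copy of that volume, which is what backs it up.
ec2.create_snapshot(VolumeId=volume["VolumeId"],
                    Description="example backup of the data volume")

None of these objects is precious; each one can be recreated from a few lines like these, which is the mindset the rest of this talk leans on.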
Again, a snapshot is not a file-based backup; it's actual blocks. Now, here's how we break it down. The instance itself is just the bare system: the operating system and the service packages that go with it. All of your business stuff needs to be on the EBS volume, and by that I mean our whole stack: the database files and any configuration files all live there. The idea is that we all know servers die; it happens all the time. If you architect it this way, then all your Apache configs, for example, can be on that EBS volume, and you don't have to go and remake the server every time it fails. I'll get into that a little more later. A good example of how this worked was the #OWS site that Stuart talked about. That launched and it was fine, but then it got a lot of traffic: Brian, our colleague, tweeted it, and he has a lot of Twitter followers, so all of a sudden over a hundred people rushed to the site and crashed the database. Stuart let me know this had happened, and the problem was that I was at jury duty. I got the database back up once by just tethering my laptop to my phone, but it kept crashing. So on my lunch break I went to the coffee shop, because I needed some coffee, set up there, and while I did the work I stopped the server, increased its size (each instance has a different size, and the bigger one gave us more RAM and more CPU), started it back up, and it's been healthy ever since. That allowed me, from a booth at a coffee shop downtown, to fix a problem that, if we had been on a UTS server, a VM, would have meant asking, hey, can you make us a new server, and a day or two later they would have said, yeah, here's your new server, and then I would have had to copy all that stuff over and rebuild everything on the new one. Here, it took me five minutes, and I got to enjoy my coffee on my lunch break. So, going back to this model, this is how it works. The Internet goes through the Amazon EC2 security group to the instance. Say this instance dies, which, if you've been around IT, you know happens; it can happen when the server is a year old and it can happen when it's seven years old. All we do is take this AMI, create a new instance, move this data volume over to the new instance, and start it up. The database lives on the data volume, so I don't have to import the database every time; I don't have to go find the backup, dump the database, and import it. The database is already there, and hopefully it's not corrupt, and that'll work. If it is corrupt, then we go to a backup. But the fact of the matter is it's really quick. I don't have to copy over my Apache configs because they're already there; I don't have to install Apache because it's already installed in the AMI. You could also, and this is probably a better way of doing it, use a configuration management tool like Puppet or Chef to apply those packages instead of baking everything into the AMI.
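To make that recovery concrete, here is a rough sketch of the "instance dies, make a new one" workflow, again using the boto3 SDK with made-up resource IDs; our real AMI, volume, and instance identifiers are obviously different.

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

OLD_INSTANCE = "i-0123456789abcdef0"    # the instance that died (hypothetical ID)
DATA_VOLUME = "vol-0123456789abcdef0"   # EBS volume holding the stack, configs, database
BASE_AMI = "ami-xxxxxxxx"               # prebuilt image: OS plus Apache and friends

# Free the data volume from the dead instance.
ec2.detach_volume(VolumeId=DATA_VOLUME, InstanceId=OLD_INSTANCE, Force=True)
ec2.get_waiter("volume_available").wait(VolumeIds=[DATA_VOLUME])

# Launch a replacement from the AMI; the packages are already baked in.
# The availability zone has to match the volume's zone.
new = ec2.run_instances(ImageId=BASE_AMI, InstanceType="t2.micro",
                        MinCount=1, MaxCount=1,
                        Placement={"AvailabilityZone": "us-east-1a"})
new_id = new["Instances"][0]["InstanceId"]
ec2.get_waiter("instance_running").wait(InstanceIds=[new_id])

# Reattach the data volume; the database and configs are already on it.
ec2.attach_volume(VolumeId=DATA_VOLUME, InstanceId=new_id, Device="/dev/sdf")

The coffee-shop resize was the same idea with different calls: stop_instances, then modify_instance_attribute to change the instance type, then start_instances.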
Building an AMI and updating it is a very iterative process, whereas using something like Puppet or Chef is a little more declarative: you just describe the state you want. We can get into that in questions, and hopefully we'll have plenty of time for questions. All right, so cost is the thing. I was reading through various comparisons of cost. Rackspace is great; they offer a very similar elastic cloud, though I think Amazon is a little further ahead in some of their offerings. Basically everything came down to this: if someone was backing Rackspace, Rackspace was cheaper, and if someone was backing AWS, AWS was cheaper. So I thought, all right, let me just show you our bill, and you can actually see what they charge for. Some people get a little cranky about some of it. For S3, for example, they charge for GET requests whenever someone fetches something, but it's on the order of a penny per 10,000 requests; I don't think that's really going to hurt us too bad. I think they just dropped the price, actually; I saw the email when I was getting on the plane to come out here. So this is what they charge for, and here's the difference in instance sizes that I was talking about. They charge you by the hour, which is a great thing for us: we have a web server, the one I mentioned, that only runs from 7 a.m. to 7 p.m. on weekdays. Using the API that Amazon has, I have a script that runs as a cron job: every morning at 7 it starts the server up, every evening at 7 it shuts it down, and it takes the weekend off. So we only get charged for the hours we're actually using it, which is great, whereas at UTS you pay 30 bucks a month and that's your server. This is a little more flexible in that sense, because they're only charging us by the hour. We haven't quite hit two million I/O requests yet. These are all pretty minimal charges, but it is something to think about, because if you're doing a lot it can start to add up. I think it's still cheaper, but that's the important thing to remember. When I was putting this together I realized I didn't really need everything we had; I could kill some of those snapshots, because I didn't really need all of them. It's one of those things where you have to remember to turn the lights out when you leave the room. And that's what's great, again, about the API: you can automate that kind of housekeeping. The bill is also broken down by the different regions we've used; I was playing with something in the West region just to try it out, and some of the other regions are priced a little differently. I've kind of talked about pros throughout, but let me still run through a few. Scalability: like I said, just making the server bigger within five minutes was great. An even better way to do that is with the elastic load balancers they offer, where you can automate it: it sees the traffic coming in, creates new servers as needed, and then kills them off as the traffic dies down. The scalability part is the elastic part of the Elastic Compute Cloud. Low cost and overhead: like I said, we're saving money by shutting that server down all the time, but it's not like I have to remember to do it, because if that were the case, I wouldn't remember to do it.
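The start/stop automation really is that small. Here is a minimal version, sketched with boto3 and a hypothetical instance ID (the real script differs, but the shape is the same): a tiny Python script plus two cron entries.

#!/usr/bin/env python
"""Start or stop the classroom web server so we only pay for business hours."""
import sys
import boto3

INSTANCE_ID = "i-0123456789abcdef0"   # hypothetical ID of the 7-to-7 web server

ec2 = boto3.client("ec2", region_name="us-east-1")

if sys.argv[1] == "start":
    ec2.start_instances(InstanceIds=[INSTANCE_ID])
elif sys.argv[1] == "stop":
    ec2.stop_instances(InstanceIds=[INSTANCE_ID])

# crontab entries, weekdays only, so the server takes the weekend off:
# 0 7  * * 1-5  /usr/local/bin/workday_server.py start
# 0 19 * * 1-5  /usr/local/bin/workday_server.py stop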
So it's very scriptable; even creating the server and attaching the volume, you could script that pretty easily. With the API, basically anything you can do in the web interface, which is everything, you can do in code; they actually build it in the API first and then push it to their web interface. And it makes experimentation easy and cheap. A great example of this is that we're using DBpedia for one of our projects. You can actually run your own DBpedia instance if you want to, and we thought that might be fun to try, because the developer was finding that the response times from the public service were varying quite a bit. So I said, let's try to run it ourselves. And we did. It takes a really big server to run DBpedia; it's really RAM-intensive. So it wasn't really cost effective, but we spent 12 bucks one day trying it out, and then we knew. If instead you had to think, well, let's see if this box can run it, nope, that box can't run it, so now we have to rebuild this other box, and then finally figure out that we'd need something like a $4,000 box to run it, that would have taken a lot longer. We spent three hours and 12 bucks to find out. It's that kind of stuff that makes it easy. And again, breaking out the architecture the way I showed you in that model makes the server itself really disposable. You really don't have to care about the server itself, because if you screw it up, you just kill that one and make another. It really frees you up, particularly coming from a sysadmin background. Instead of, no, don't install that package, it's, I'm going to install it and see what happens. It really makes experimentation fun and awesome. I think you're going to talk about the cons.

So that probably made everything sound wonderful, but there are some downsides that we found. Some of these things I think we probably suspected going in, and some might have been more of a surprise. Jay made that sound easy, right? But there is a little bit of a learning curve to figuring this out, and this is just what I've heard from other sysadmins, because I'm not a sysadmin; I'm a digital scholarship coordinator. It's obviously a very similar kind of job to what someone who's been taking care of servers for a living has been doing, but the vocabulary is a little different and it's a slightly different mindset. As Jay said, you're not so much dealing with physical machines as with code and scripts. It's nothing people can't pick up, but there is a learning curve, and if someone's been a traditional sysadmin for years, there may be some resistance to doing it this way. Fear. There are a lot of different kinds of fear. One would just be the fear of learning something new, if you're the person who's going to be in charge of taking care of all of this. Another kind of fear, and I was thinking about this the other day as I was ripping all my CDs: they're all in Google Music now, I have a Spotify account, and it's really easier to look stuff up on YouTube anyway. And I've got these boxes of CDs that I cannot get rid of, even though they're all scratched up and most of them don't work; the stuff in Spotify doesn't skip. But I can't get rid of them, because what if? Just being able to look at your stuff is somehow satisfying. And that's a problem with cloud things.
They're somewhere out there, and a pretty big corporation is running it, and they have all my stuff. Now, their stuff is way more stable than my stuff, but they could decide this isn't worth it anymore and turn out the lights, and then where are we? That's one thing for my CD collection; it's another thing if you're a cultural heritage institution that needs to take care of things in perpetuity, and if you're trying to keep track of a scholarly record that's been cited and will need to continue to be cited for as long as we can foresee. I don't really know what to tell anyone about that. It's out there. I think the system has proved stable, and I think that if all of this goes down the tubes, at that point we'll probably have bigger problems; something bigger will have happened. But it's out there, and it's probably worth trying to figure out what you can risk and what you can't. And this stuff does go down. I think Omeka had a fairly famous, or at least famous within digital scholarship, incident where their Amazon stuff crashed one weekend and it just wasn't there. It goes out. Physical servers that you have in your own server room go out as well, and this happens, but it's not a perfect system that works all the time, and no one can guarantee that. This one is a little bureaucratic, but you get a variable bill. Depending on how much you use, your bill is going to be different each month, and whoever is keeping track of your books may not be happy about that. This is the mindset shift from thinking of server space as a capital investment in hardware to thinking of it more like a utility: some months your water bill is higher than others, and it's kind of like that. Actually, I don't know how we were able to talk our budget person into it. She was not thrilled, but we made the case that this was worth trying and that it probably wouldn't get too astronomical even if we messed up. I haven't heard any complaints since then, but check with your accounting people and see what they think about this. Domain names and DNS: actually, I'm going to pass that one to Jay. We sort of had some issues with the institution around getting an emory.edu domain to point to an IP that's not on Emory's network. In the end, the right person asked the other right person and it just magically happened, but we had a meeting the day before where one of the main guys at UTS was like, no, no, no, you cannot do that. So we had registered emorydisc.org, but then somehow, I don't know what Scott did, he just asked the right guy and the guy just did it. You can see where that would be an issue at some institutions. And again, when I put my sysadmin hat on, I'm like, yeah, I get why you wouldn't want to do that, but I'm glad that they did. Thank you so much.