Good morning and welcome to this week's edition of NCompass Live. I am your host, Christa Porter, here at the Nebraska Library Commission. NCompass Live is the Commission's weekly webinar series where we cover a variety of topics that may be of interest to libraries. We broadcast the show live every Wednesday morning at 10 a.m. Central Time. But if you're unable to join us on Wednesdays, that's fine; we record the show every week, so you can always watch those recordings at your convenience. And I'll show you at the end of today's show where you can access our recordings on our NCompass Live website. Both the live show and the archived recordings are free and open to anyone to watch. So please do share with your friends, family, neighbors, colleagues, anyone you think may be interested in any of the topics we have on our show. For those of you not from Nebraska, the Nebraska Library Commission is the state agency for libraries in Nebraska, and that is for all types of libraries. So you will find things on our show for publics, academics, K-12, corrections, museums, archives; it runs the gamut. Pretty much as long as it has something to do with a library or libraries, we could possibly have it on the show. Sometimes we have presentations from Nebraska Library Commission staff about services and products we offer here, or things we're doing through the Commission, and we also bring in guest speakers to talk about things. We have a mixture of types of presentations on the show as well: interviews, book reviews, demonstrations of services and products that we think might be interesting, cool things that libraries are doing out there. It could be all sorts of different things. Before we do get into today's topic, though, I want to talk about what's here for Nebraska on the Nebraska Library Commission website.
Across the country, we are still in the height of the COVID-19 pandemic, so I want to remind our Nebraska libraries that we have resources here on our Library Commission website for you related to that. This is a post we have pinned to the top of our webpage; it'll always be there for you to get to all the resources we have. We also have a list that we are maintaining. We started it back in March, when libraries started closing, and it notes the accommodations they're making: are we open with just Wi-Fi in the parking lot? When did we start doing curbside pickup? All of those kinds of things. We've now added when libraries are reopening, what the situation is then, and what the rules and regulations are for that. So if you are a Nebraska library, check this; if you're visiting a library, look up your library on here. If we need to update the information, contact our reference department here at the Library Commission and they can update your info on the list. On the pandemic resources page, we have a blog post that just talks about it, and then we have maps of our Nebraska libraries, depending on what their situation is, if you want to look at that. But this is our sub-page with all the specific information: if you're a business, if you're worrying about your kids, unemployment, financial help, lots of different types of resources. But I just want to highlight the page specific to libraries. We have a range here: closings, reopenings, information about how you can do that, policies, summer reading programs, how to hold a meeting for your library board. We are always updating this page, so if you need updated information, check here and see what we've added to the site. If you know of any resources or things that should be on here, let us know that as well. We are, of course, following very closely, and I hope everyone is, the REALM Project from OCLC and IMLS. This is a scientific study, which I think is a great thing.
Finally, something specific to libraries: how is it affecting libraries, and what do we need to be doing? One of the most recent big things they did, and they're doing more of it, is testing on actual materials that are returned. How long does COVID-19 last? According to their tests, the longest it was even noticeable on the materials is three days. So three days is the max you should even need to quarantine items. Just set an item aside, do nothing with it, the COVID will die off, and then you can send your items out again. The next thing they just put out was some other reopening plans. We have things here that we've put together from the Commission for Nebraska, but these are other reopening plans from across the country, if you're looking for something for your library to follow. So I just wanted to make sure all of our libraries here in Nebraska are aware of that. If you're not a Nebraska library, you're welcome to look at our resources; they're right there beside it. But check in with your state library association or your state library, and they may have the same kinds of resources out there for you as well. All right, so I am going to hand over presenter control now to our speaker this morning, Katherine. Get your screen up; you should see a little pop-up come up now about being able to show your screen. There we go. And you can do your presenter mode there to get it full screen for us. All right, so here we go. This morning with us, we have Katherine Frazier, who is from North Carolina State University. Good morning, Katherine. Good morning, everyone. I'm going to go ahead and mute my camera for the duration of the presentation, but I will have it on later during the question and answer session today. I just want to say thank you, Christa, for that introduction. Some of that information about the duration of COVID on materials was new to me; I hadn't heard the three-day information yet, so that's super exciting to hear.
Yeah, they're doing some really good testing there. If you just go to their page, they're testing lots of materials; they tested just five specific things in their first run-through, and they have other types of materials they're going to be testing as well. So yeah, take a look at that. We've got phase one and phase two; I'm just looking at it now. It was just the end of June when they released the results of the testing, so it is new information, yes. Yeah, in North Carolina we're really looking at restarting everything here, so that's going to be super useful. But hi, everyone. Welcome, and thank you for coming to my talk: What Python Does for Us, and What Can It Do for Your Library? Like Christa said, I'm Katherine Frazier, and I'm coming to you from the NC State University Libraries in Raleigh, North Carolina, where it is super hot, humid and cloudy today. A great summer day. We have the hot and humid here too, but it happens to have sun today. Oh no, that's even worse. It is. So today we'll be talking about a specific Python script I developed at NC State to streamline collections purchasing decisions. We'll also cover the process of learning new technology skills and developing a Python project from the idea phase to actual scripting. Just a really quick note before we get started: the script I'll be talking about today works with Gobi, which is a product of Yankee Book Peddler and its parent company EBSCO. I am sharing this information today with Gobi's and EBSCO's permission, since what I'm talking about covers things that exist behind Gobi's paywall and goes into how NC State uses Gobi. That means parts of what I'm talking about will only be relevant to current Gobi customers, since it does involve logging into that platform.
However, we'll also be covering a lot more information that will hopefully help anyone, even non-Gobi customers, feel more excited and confident about using Python and technology in general to solve their problems. And just to provide some context about NC State, since I'm imagining a lot of our viewers today are not local to NC State: we're a large public university with over 34,000 students. That's undergraduates and graduate students, since we're a doctoral-granting institution. The NC State University Libraries primarily serves a STEM-focused population, since the university is known for its agriculture, engineering, textiles and veterinary medicine programs. We do have some non-STEM areas on campus, such as our College of Design and College of Humanities and Social Sciences, but a lot of the work we end up doing in the libraries tends to focus on those STEM subject areas. In the Collections and Research Strategy Department, where I currently work, we're responsible for managing the library's collections. Our day-to-day work in this department is largely made up of analyzing usage data, investigating resources and responding to campus requests. So we work a lot with campus users in helping them figure out which resource will best fit their needs, as well as investigating new resources that users are suggesting to us. We also work a lot with vendors and publishers as we try to work out deals for NC State, and then consortial deals with other university libraries as well. So now that we're through with that context, let's talk about what we'll be covering today. We'll begin with how programming projects get started, such as how to identify project opportunities, as well as best practices for learning a new technology skill like Python. While I'll be using Python as my example today, some of what we'll talk about here is definitely relevant to learning other scripting languages like R or Java, and even non-scripting technology tools.
So no matter what type of technology you're interested in, I hope you can take something away today. Next, through the lens of the collections automation script I wrote, we'll discuss specific skills and tools in Python. This section will definitely be more specific to Python itself. We'll cover things like web automation, data extraction, and data manipulation, all of which are key components that make my automation script run. These are also super important things to be familiar with, because they're very relevant to library projects. So these are the things that you can really take away and hopefully apply to your own work, and we'll talk about a few examples of how to apply them to library tasks later on in the presentation. We'll also cover ways to share your code and programming projects with others in the library community, which is what I'm here doing today. And finally, we will turn it over to you in the last section of the presentation. We'll ask: how do you want to use Python or other technology systems to improve your workflows? In this section, we'll talk a little bit about the idea generation phase, as well as go over some inspiration spots that you can turn to when you're starting a project but don't quite know how to proceed. And I'll just remind people now, while you're mentioning that, Katherine: as you're thinking of things throughout the show, feel free to go ahead and type them into your question section. I don't want you to forget something you want to ask or mention, so go ahead and type it in there and I'll hold on to all of those. Then when we get to the end, I'll be able to read through all of them and pass them on to Katherine so that we can discuss them. Yeah, so if you have any ideas, or even just a little tiny seed of an idea, feel free to shout them out there and we'll discuss at the end, like Christa said. But let's go ahead and get started.
Here's my little egg cracking open, since I've chosen a snake theme for today's presentation. Appropriately enough, yes. So as you get started with a programming project, or any project really, the most important step is realizing that there's a problem. The problem can be something like an inefficient workflow or an outdated system, you name it. But for me, a project always comes from the desire to improve something or fill a gap. It doesn't necessarily have to be a big problem; just anything that makes you wonder if you could make a positive change. The problem that sparked my collections automation project was a pair of monthly reports generated by our ILS, which is called Sirsi. One is called the Lost, Checked Out and Missing report, which contains items that have been marked as lost, marked as checked out for more than one calendar year, or marked as missing within the last month. The other report is the Multiple Holds report, and it contains items that have been requested by multiple patrons at one time. So there are lots of people wanting access to the item, but none of them can have it because one person has it checked out. This screenshot right here is an example of what these reports look like. The two reports look basically identical. They include title, author, date of publication, call number and other holdings data like item location, home library and number of circulations. So these reports capture high-demand items that aren't currently available, and it's been my job to find replacement copies so our patrons can access the items again. We try to do this on as regular a basis as possible to make sure we're staying on top of the items that people really need access to. We use Gobi, which is a vendor platform for ordering print books and ebooks, to identify those potential replacement copies. Gobi doesn't offer an API, or application programming interface.
So this has been an extremely manual process that often takes upwards of 15 or so hours a month to complete, because the reports are often hundreds of items long. You can imagine that doing just 10 or so searches in a platform like Gobi wouldn't be so bad, but when you're doing it for hundreds of items, that time really adds up and becomes a significant chunk of your duties. One of the reasons that searching for replacement copies in Gobi took so long is that Gobi's search features aren't the most user-friendly, and if you're a Gobi customer, you'll probably be familiar with what I'm about to talk about. The most accessible search function in Gobi is the basic search bar that appears at the top of every Gobi page, and here's a picture of it right here. With this, though, you can only search by one term, so you can choose from keyword, title, author, ISBN or subject. Since many of the items in our reports are textbooks with super vague titles like Microbiology, there's no way that searching by the title alone would produce the exact item that we're looking for. Instead, it'll produce hundreds of result pages, and nobody has the time to go through those to find the item. We also tried to search by the item's ISBN, since this is a much more direct way of getting to that exact item, but we ran into issues there as well. An ISBN search only produces that exact item and doesn't link it to other available editions. So if you're searching the ISBN for the seventh edition of an item, Gobi has no way of showing you the eighth edition or the sixth edition. Since we like to buy the most recent copies of items, because our patrons often want those over older editions, ISBN searching proved to be problematic: we couldn't see a lot of the options that were available to us, and often we weren't finding the most recent copy. So Gobi has another type of search that's called the standard search.
This one is much better than the basic search because it does allow you to search by both title and author, so it's a lot more direct and gives much better results. However, this page is a little more of a hassle to get to, since it's not available on every page, meaning that you have to constantly navigate back to the standard search page. It's hidden under a little drop-down in the Gobi search menu. So it's a lot of time just clicking on that, clicking on the standard search, or clicking the back button in your browser. And while this doesn't seem like much, like I mentioned earlier, when you're doing this hundreds of times again and again, it really adds up. Another issue with manually searching for replacement copies in Gobi is the way that results are displayed, even when using the more accurate standard search. The results aren't in any particular order, so they're not ordered by date of publication or edition number, and the results often include duplicate items and items that are out of print. So you're seeing an item on the results page, but you can't buy it anymore. This makes it extremely difficult to quickly eyeball the page and identify the newest copy and, especially during a global pandemic, find the best ebook copy for our users. After searching for all of those in-demand titles in Gobi and gathering the prices for them, that purchase information is then added back to the original report produced by our ILS. It pretty much looks like this screenshot here: information about what it would cost to buy a new print copy or a new ebook copy, and information about whether or not we already have an additional print or electronic copy in our catalog available to users. This information is gathered for each item, and once all of it has been added to a report, it gets sent out to the library's collection managers. They then use this enhanced report to decide in which cases they want to buy another copy.
So a collection manager might look at this list right here and decide that they want to buy the 2013 ebook edition of an item for $89. And this is just really useful because it presents the information in a super streamlined manner, so they really don't have to do any background research and can just buy things quickly for our users. I identified the process for finding replacement copies as problematic because, while it's important information to have, gathering all of it and adding it to the report was extremely time-consuming. Like I said earlier, the two reports were taking me on average about 15 hours a month to complete, and that was just a ton of time sunk into these reports. So when I first started thinking about a way to speed this up, I didn't know where to start, but I did know a couple of things. I wanted to be able to search for items in Gobi. I wanted to select the same or newer editions. And then I wanted to add that information, such as the price, to the reports, and I wanted to do it much faster than I could do it by hand. My first instinct after realizing that I wanted to change things up was to look around for pre-made solutions. I looked around the Gobi website to see if they had a feature for searching batches of titles, but they didn't. I also checked journals and conferences for work that seemed to align with my needs, but again, I didn't turn anything up. So with no obvious solution, I knew I had to do some digging to find something that would work for my situation. What helped me narrow in on Python as the solution to my problem was overhearing a coworker describe why they were using Python. While my coworker was editing spreadsheets and not automating something like I was setting out to do, a few of his words stuck in my head. He said that Python is good for repetitive work and working with large sets of data. This stood out to me because it described pretty much exactly the issue I was dealing with.
Searching in Gobi is repetitive, and it involves dealing with long title lists, or large sets of data. So I thought surely there had to be a way that I could tackle my reports with Python. I began investigating Python's capabilities and found that you can pretty much do anything you want to with Python if you're willing to put in the work of learning. One of the best things about Python is that it has a diverse set of libraries that have been developed to expand its functionality. When I set out with my idea to automate the monthly Lost, Checked Out and Missing and Multiple Holds reports, I identified a few tasks that I would definitely need to do with Python. These were data manipulation, web automation, spreadsheet writing and string matching. To start structuring my Python approach, I found libraries that could handle those tasks, and these were pandas, Selenium, openpyxl and FuzzyWuzzy. And while I won't talk about them now, we'll discuss them later and find good use cases for what they can do in the library setting. So before we take a look at the actual program itself and see what it does, I want to talk a little bit about project-based learning. My automation script was the first Python script I ever wrote, so the learning process itself was almost as important as the actual coding. And now that I'm saying that, I'm actually thinking that the learning process was just as important as the actual coding, because the learning is what guided the development of this project for me. Prior to working on this automation project, I had no experience with coding whatsoever. I was definitely intimidated, because I'd taken one or two basic coding classes before, but I'd always come away feeling like I just didn't have the right type of brain for it.
These were the sorts of classes where you work on little example problems, where you're trying to solve a practice problem that doesn't really have any relevancy to the type of work that you do. So I definitely got stuck in the "hello world" phase, where all you can do is print the phrase "hello world," but you're stuck there and you can't actually do anything helpful or useful for your work. However, since I had a tangible project to work on this time, I found it easier to take the plunge and start learning in a more directed manner. I could use my project to guide my learning and know that with each skill I picked up, I was one step closer to my goal. The sense of progress toward automating the reports really kept me going when I ran into difficulties, because I knew that the things I was learning now, and feeling so frustrated about, would then directly apply to my development process in making these scripts a nicer automated thing. This isn't at all to say that project-based learning made the learning process super easy. I had to Google a lot, and constantly: things like how to add items to lists, what a certain error statement meant, how to read Excel files into Python, you name it, I probably (slash definitely) searched for it. This right here is just a tiny, tiny subset of all the things I looked up when working on my automation project. And here are some more of my searches. And here are some more; you can pretty much get the picture. Looking things up is a natural part of learning how to code, and so is searching for things multiple times because you forgot something five minutes after you looked it up, which I have definitely done while learning, and I definitely still do to this day. Basically, with programming languages, the learning never stops. To this day, well over a year after I started this project, I still Google super basic things, because I forget, or the language's syntax changes, or I realize there's just a better way to code something.
Python, and any programming language, has a very particular grammar and a huge vocabulary, and our brains aren't like computers; we can't remember it all. So it's well known that even full-time developers have to Google things and remind themselves how to write a particular function, or even just super basic things. So no matter if you're a total beginner or a seasoned programmer, stay curious and be ready to keep learning. And my advice is to have a project to work on, to give yourself that sense of progress, so you're not just spinning your wheels learning something that doesn't really apply to your work; then you'll be more invested in it and feel like you're actually doing something important. So now let's see a demo of what this program does. This is actually what the script looks like. This is what's called a Jupyter Notebook, so it's just a way of having your code in the cloud, essentially, and you can run it really easily. So I'm going to go ahead and start it and talk about it as it runs. There we go. We're going to see a nice browser window open up in just a moment. Here it is. What's happening right now is my code is opening up a browser window and navigating to Gobi's website. It is over here, knowing that this is the login area, and it's sending my information. And now it's searching for the items by pulling the titles and authors out of the report that our ILS produces, sending those to that standard search page, and then producing the result pages. What's really helpful about the script is that it looks at the result page and gathers data from it much more quickly than I could as just a regular human. The script does still catch little moments like this right here, where we're searching for a more generic title like Computer Networks; this causes it to be a little slower. But the important thing to remember is that, on the whole, the script can do all of this much faster than I can.
And so it saves me a lot of time, because another cool thing about this is it can pop open this window and be running while I'm doing something else, like looking at Excel or sending an email. I don't have to touch it while it's running; it's just going to do it on its own. So I'm not going to make you sit here. And that's very impressive, seeing that do it actually live. Especially knowing the frustration of what you had to deal with before, seeing this do it is just like, whoa. I love it. Yeah, it's super helpful. And you can also play a joke on people by having this run as they walk by, and say, look at how fast I'm working. So I'm not going to make you watch this whole thing, because it's going to search for a couple hundred items right now. I'm going to close this and we're going to hop back over to the presentation, where we'll talk a little bit about the output of the script. All right. So basically, this script creates the reports by automating three processes. First, it searches for the given list of titles in Gobi, like we just saw happening. Then, on the results page, it grabs the price, binding and publication dates for each of those search results. And finally, behind the scenes, it selects the item with the best fit from the search results and adds that item's price back into our original report. By best fit, I mean the copy in Gobi that's either the newest edition or the same edition as the original item. And just to remind ourselves, like I said earlier, our students and faculty really prefer the newest edition of items, since a lot of these are going to be textbooks, and they obviously want the newest, most recent information. And when there are both print copies and ebooks available in Gobi, the script selects a best fit for each format: the newest print book and the newest ebook copy. So now that we've talked about the learning process and seen the script running, let's dive into what makes it work.
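Before diving into the modules, here's a minimal sketch of what that "best fit per format" selection could look like in pandas. The column names and sample rows below are invented for illustration; they are not the actual fields the script scrapes from Gobi:

```python
import pandas as pd

# Hypothetical search results for one title, with made-up columns standing
# in for what the script gathers from a Gobi results page.
results = pd.DataFrame({
    "binding": ["print", "print", "ebook", "ebook"],
    "edition": [7, 8, 7, 8],
    "pub_year": [2016, 2019, 2016, 2019],
    "price": [120.00, 135.00, 99.00, 110.00],
})

# "Best fit" here means the newest copy available in each format: sort by
# publication year, then keep the last (newest) row per binding.
best_fit = (
    results.sort_values("pub_year")
           .groupby("binding", as_index=False)
           .last()
)
print(best_fit)  # one newest-edition row for print, one for ebook
```

In this toy data, the 2019 print and 2019 ebook rows would be kept, and their prices are the values that would get written back into the report.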
So the great thing about Python is the number of libraries that expand its capabilities. In the Python community, the words "libraries" and "modules" are generally used interchangeably to describe separate Python packages that you can download to introduce a specific function. Python straight out of the box doesn't have an automation capability, for instance, so I had to download a separate package that would allow me to automate browser interactions. I use quite a few of these separate libraries while scripting. Here's a screenshot from the script that shows all of the libraries I import to make this run. These are all super important, and we'll go over them and touch on examples for how you can use them as well. So pandas is a module that's extremely useful for library work and is one of my super favorites. It basically creates flexible little spreadsheet structures called data frames, and these are just like the rows and columns that you see in Excel and Google Sheets, but pandas makes it easy to add to them, iterate through them and merge them within just seconds, rather than you putting in the manual effort of adding rows or merging different workbooks together. Pandas is also really helpful because you can read an Excel or CSV file directly into a data frame, so you don't have to go in and copy a bunch of information to bring into Python; pandas can do that for you. You can also write a data frame to an Excel or CSV file, so you don't have to open Excel and prepare a spot for your data to land. A good example of a use case for pandas within the library world is using it to compare title lists, like when you're identifying books to weed or identifying high-use titles to purchase in electronic formats. When you read the lists into pandas, you can ask it to check for common titles between the lists or to find the differences between those two lists.
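As a sketch of that kind of comparison, assuming two made-up title lists (these are not real vendor or catalog data), pandas' isin() can find the overlap and the differences in a line each:

```python
import pandas as pd

# Hypothetical lists: titles a vendor is removing, and ebooks we hold.
vendor = pd.DataFrame({"title": ["Microbiology", "Data Structures", "Soil Science"]})
catalog = pd.DataFrame({"title": ["Microbiology", "Soil Science", "Organic Chemistry"]})

# Titles we hold that appear on the vendor's removal list:
affected = catalog[catalog["title"].isin(vendor["title"])]

# Titles we hold that are not affected:
safe = catalog[~catalog["title"].isin(vendor["title"])]

print(affected["title"].tolist())  # the common titles
print(safe["title"].tolist())      # the differences
```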
Instead of you doing it by hand, Python will do it for you in just a second or two when using pandas, and this is really helpful because I take forever to do list comparisons by hand. For instance, in the collections department, our vendors will sometimes send us a big, huge, long list of items that they are removing from a platform, so we won't have access to those ebooks anymore. I used to just do a catalog search or download a list of all of our ebooks and find them in that list, and it took me ages. And I realized that with pandas, it can take just a couple of seconds to do what was taking me a couple of hours. Selenium is one of my favorite Python tools as well, and it's what actually allows me to automate searching in Gobi. Selenium can open up a browser window and interact with it like a human does, so it can click on links, press buttons and find specific words on the page. Selenium also has some helpful functions built in, such as waiting. Sometimes Selenium can move faster than the web page itself does, which happens with Gobi a lot, because Gobi is not the fastest website ever. Without the wait function, what ends up happening is Gobi will be loading the results and Selenium will be trying to move on, because it's trying to be speedy, and then it'll result in an error, because Selenium is basically saying, I think the web page is broken because it's not moving fast enough. But you can tell Selenium to hold its horses, slow down and wait for the web page. There's also a keys function that lets you send text to the browser window; this is how the title and author information is sent to Gobi, like we saw when the script was running. Another useful feature is the NoSuchElementException. This basically lets you acknowledge that something you were looking for on the web page wasn't found, but you can still move on with the script. Otherwise, Selenium will tell you, hey, that wasn't on the web page, we need to stop, I'm going to crash.
You can tell it, it's OK, let's just keep going. So a library-world example of a Selenium use case is something like validating links, which we do a lot at the libraries. Rather than individually checking catalog links to make sure that they work, you could potentially send Selenium through a list of those links and just have it note which ones are broken, rather than you waiting around to find broken ones. Requests is a super simple but important little package that lets you direct Python to a web page. For instance, I can tell the browser to go to the NC State University Libraries homepage by just telling it browser.get("http://www.lib.ncsu.edu"), and Selenium will go right to the library website. FuzzyWuzzy is a module that allows you to do fuzzy, or inexact, string matching. An example of this is matching titles from our ILS to titles in Gobi, because while those titles are largely similar, there are slight differences between them, like punctuation, that mean they won't be exact matches. For instance, titles in our ILS often have spaces between the main title and the subtitle, so it'll be main title, space, colon, space, subtitle, and in Gobi it won't have those spaces there. So Python would look at that and say that's not a match. FuzzyWuzzy is what lets you identify almost-matches. It calculates the differences between two given strings and allows you to set a threshold for what you deem an acceptable match. So you could give it a bunch of strings and tell it, I only want to count things that are a 90 percent or higher match. Re is a module that brings regular expressions to the table. And if you're like me and didn't know what a regular expression was when you were first learning about them, a regular expression is a string of characters that makes up a search pattern, which you can then apply to a larger block of text. For example, say we have a text block that says pub date, colon, 2019.
We can use a regular expression to identify a four-digit string within this larger text string and just produce the year. The regular expression for this would be backslash d for digit, followed by the number four in curly brackets, so \d{4}. The output of this regular expression search would just be 2019. Regular expressions can be really hard to get right, and I definitely struggled to write them in my script just because the vocabulary is so specific; like, the backslash d meaning a digit is not immediately obvious. But they're a useful tool if you have some time to learn the syntax. An example of how you might use regular expressions in library work, for instance, is if you have some digitized files, like a PDF of an old manuscript that is digitized and made into an HTML text file, you can apply a regular expression to call out keywords that you're interested in or help a researcher find those keywords. And here's one last module that helps the script run. It's called XlsxWriter, and it's what lets you create a new Excel document from within Python, meaning you don't have to separately open Excel and create a new workbook for your output to land in. If you're like me, switching between windows and having a million Excel workbooks open at the same time can definitely get a little annoying, so being able to streamline that document creation process from within Python is definitely a perk. As we transition to looking at the actual code, I want to note that I'm not going to go into too much detail here today, but I'm always happy to follow up with individuals who want to know more about the nuts and bolts. So like Christa said earlier, just ask a question or reach out to me. I'll also link to the GitHub repository where this code lives at the end of the presentation, so anybody who's interested in using it or taking a look can find it there.
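That four-digit date pattern looks like this with Python's re module (the "Pub Date" text is the example from the talk):

```python
import re

text = "Pub Date: 2019"

# \d matches one digit and {4} means exactly four of them in a row.
match = re.search(r"\d{4}", text)
if match:
    print(match.group())  # 2019
```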
So the code starts by creating title, author and publication date lists that are populated by information from the original report output by our ILS, Sirsi. I'm using Pandas here to read in that report file and sending those title, author and date values to lists. Then we're pairing our titles and authors together in what's called a Python dictionary. This means that they won't get mixed up and an author name won't end up with the wrong title. This is an important step because our ILS doesn't have the author information for some titles, meaning that the lists don't line up exactly. So by creating a dictionary, the script knows that those authorless titles should be paired with an empty value, rather than the next author on the list, which would cause things to get out of whack. And here's the little bit of code where we actually activate our Selenium browser. Because I prefer Chrome over Firefox, I have a web driver, which is the thing that opens the browser, that's specifically for Chrome, but it's not necessary. You can use whatever browser you like when automating. The script directs the browser to go to Gobi's home page, and then we see it identifying the username and password boxes. But how does the script know where to look for those? How exactly did we come up with those GUser and GPassword IDs for username and password? Selenium uses what's called a web element, such as an HTML ID, class name or XPath, to find its way around web pages. These elements are what make up the behind-the-scenes guts of websites, the dirty stuff behind the nice front end. You can see the web elements by right-clicking in a browser window and clicking that inspect button that comes up. This works across multiple browsers, so no matter whether you're using Chrome, Firefox or something else, you should still be able to view those elements. So here's an analogy that may or may not work, but I really like it.
Working with Selenium is a little bit like playing with a cat and a laser pointer. The object the laser is pointing to is the specific web element you want, your code is the laser pointer, and then Selenium is the cat. So you're pointing your laser around and telling Selenium where to go and what to interact with. Inspecting pops open a new pane on a website that displays the web elements, so you can figure out which element you need to direct Selenium to. Where are we pointing our laser pointer? By right-clicking and inspecting the username box, such as in this example for the Gobi homepage, the pane will direct you right to that web element by highlighting it in the website's code. So we can see here that it's kind of grayed out in the background, and that's the browser telling us, oh, this is what you're looking for. So we can see that the ID for the Gobi username box is GUser, so we direct Selenium to look for GUser, and when it finds that element, it knows that's where it should send the username information. Through inspecting elements and telling Selenium where to go, we navigate Selenium through Gobi and eventually land on a results page after inputting our search terms. So here the script does something a little bit different from what we've done before. Rather than telling Selenium to look for a box or button, we're telling it to extract data from this page. What the script does is pull each result, one for each title, so we're looking at two separate result boxes here, and it pulls these into a list. The result boxes contain the title, price, binding and publication information for each title. However, these important pieces of information are clumped in with other pieces we don't need, like the publisher name, LC class and the prices in British pounds. What the script does next is apply a series of those regular expressions, which are designed to pick out the title, publication date, binding format and price for each of the results.
One important thing to consider is that while searching by title and author in Gobi is generally much more reliable than the title-only search we looked at earlier, some outliers can still sneak their way in. Plus, we're also scraping information for multiple results per title. So what we need to do is compare the results we harvest from Gobi to the original titles from our ILS and choose the most accurate results, making sure that we're actually considering a replacement option for the item that we lost or that is in high demand. So this is where FuzzyWuzzy comes in. By using fuzzy string matching, we can weed out the incorrect titles and retain only the accurate ones. I have my threshold for this code set at 70 percent. This might seem low, but it does a good job of getting rid of incorrect titles while still recognizing correct titles despite those punctuation and stylistic differences between our ILS and Gobi. When the script detects a successful match, it sends that title and its accompanying price, format and date of publication to lists. And like we did earlier with our title and author information from the ILS, we bind those pieces of information together into a dictionary, so the information for one title will stay together as one unit. Then, to begin preparing the data from Gobi to be paired with our original report, we send that information to a data frame. As a reminder, a data frame is that flexible data structure, very similar to an Excel spreadsheet, created by Pandas. The columns of our data frame are the title, price, binding format and date of publication, all grabbed from Gobi. To make things easier, the script also separates print copies from ebook copies by creating sub data frames. This is done by essentially filtering the original data frame by format and putting the resulting rows into separate data frames.
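The 70 percent threshold idea can be sketched with the standard library's SequenceMatcher, which FuzzyWuzzy's fuzz.ratio is built on; the titles here are invented, and a real run would use FuzzyWuzzy itself:

```python
from difflib import SequenceMatcher


def ratio(a, b):
    # A stdlib stand-in for fuzz.ratio: a 0-100 similarity score,
    # lowercased so capitalization differences don't count against us.
    return int(round(SequenceMatcher(None, a.lower(), b.lower()).ratio() * 100))


ils_title = "Data science : a primer"   # ILS style, spaced colon
gobi_title = "Data Science: A Primer"   # Gobi style
wrong_title = "Gardening for Beginners"

THRESHOLD = 70
print(ratio(ils_title, gobi_title) >= THRESHOLD)   # near match, keep it
print(ratio(ils_title, wrong_title) >= THRESHOLD)  # unrelated, weed it out
```

The threshold is the dial: high enough to reject wrong titles, low enough to forgive punctuation and styling quirks.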
So here we're saying that the ebook data frame is only going to include items where the binding is ebook, and the print data frame is only going to include items where the binding is cloth or paper. Next, we sort the ebook and print data frames by title and descending date, meaning that when there are multiple instances of a title with different publication dates, the newest publication date will always be on top. The script then drops all duplicate rows in the data frame, keeping only that top instance of each title, which means we always get the newest copy information, so our students and faculty are happy with the latest research. Finally, once we have sorted and deduplicated both our print and ebook data frames, the script merges them together. This forms one big data frame with both print and ebook purchase information. And because we deduplicated our other data frames before merging them, this final data frame will only have one print option and one ebook option per title. One exception to this is cases where there wasn't a print or an ebook copy of a certain title with a matching binding. So if there wasn't an ebook listed for a particular item, that column will just remain empty for that one item. The script finally finishes out by sending this completed data frame to an Excel file, which is then shared with collection managers. The best part of this whole thing is that instead of taking hours, like it did for me to complete this manually, the script can produce this polished report in less than half an hour. In general, it takes about five to ten minutes, depending on the number of super-general titles we have, for the script to run through about 100 titles. So when you've written a script that's really helpful to you, like this one is to me, the chances are that it will be helpful to others as well. In this next section, we'll briefly discuss ways to share your code and ideas with others in the library community.
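Before we move on, here's a small sketch of that sort, deduplicate and merge sequence in Pandas. The data and column names are invented for illustration, not taken from the actual script:

```python
import pandas as pd

# Hypothetical Gobi results: two print editions of one title.
print_df = pd.DataFrame({
    "Title": ["Book One", "Book One", "Book Two"],
    "Pub Date": [2015, 2019, 2018],
    "Print Price": [45.0, 55.0, 30.0],
})
ebook_df = pd.DataFrame({
    "Title": ["Book One"],
    "Pub Date": [2019],
    "Ebook Price": [80.0],
})

# Newest edition first, then keep only the top row per title.
print_df = (print_df
            .sort_values(["Title", "Pub Date"], ascending=[True, False])
            .drop_duplicates(subset="Title", keep="first"))

# Outer merge so titles missing a print or ebook option still appear,
# with the missing columns simply left empty.
report = print_df.merge(ebook_df, on="Title", how="outer",
                        suffixes=(" (print)", " (ebook)"))
print(report)
```

A final call along the lines of `report.to_excel("replacements.xlsx", engine="xlsxwriter")` is what would hand the finished data frame off to XlsxWriter for the Excel output.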
So luckily, there are lots of great technology-focused spaces within librarianship where you can share your work. Some of these include the Code4Lib listserv, which is an email list of programmers, developers, and people who just generally mess around with code in libraries. The Code4Lib listserv is a really collegial and welcoming place for newcomers, so it's also a great place to sit back and look at other people's work. I've definitely learned a lot from seeing what other people are talking about on there, even if I haven't necessarily been sharing as much on there. So this is a really valuable place, whether you're sharing or learning. Another way to share your work is through conferences. For instance, I was originally going to present this talk at the Library Technology Conference back in March, but that was before all of the pandemic closures. So some other conferences in the future that would be great candidates for coding presentations or similar topics are the Midwest Data Librarian Symposium, the Internet Librarian conference, and I think there's also a Southeast Data Librarian Symposium, things like that. And then there are also technology-focused subgroups of major organizations. A good example of this is the Library and Information Technology Association, or LITA, which is part of ALA. In North Carolina, we have the Technology and Trends section of our North Carolina Library Association, but I'm guessing that most of you are not North Carolina Library Association members, so other regional organizations will probably have similar subgroups. These groups sometimes offer the opportunity to host webinars or to attend events and share in person when possible, so they're a really great place to share and learn as well. And another important part of sharing your code projects is to think about licensing. Licensing really matters because it dictates how your code will be treated and reused by others as you let it out into the universe.
Generally, in the library development community, people tend towards licenses that encourage reuse, such as the MIT license, the General Public License, or GPL, and the Apache license. However, depending on your desires for how your code is treated by others, you might have a preference for one of these over another. Here's the very super important part where I say I am not a lawyer, but I can give some basic insight into how these three types of licenses differ. For instance, the MIT license allows others to adapt your code into another project and then license that separate project however they like. So this means that someone could potentially create an offshoot of your code and keep it behind a closed license that prohibits reuse, so somebody would have to pay to use that offshoot project. The GPL, meanwhile, binds anyone who uses your code to license their project with similar licenses that allow for reuse. So this license choice makes it so that projects downstream from your code also remain open. And then the Apache license is sort of a middle-of-the-road option, which allows any other projects that come from yours to use other licenses, but those other projects must very clearly state how their project differs from the original code borrowed from you. Let's look at the Choose a License website. Sorry, go ahead. No, so I have a question then, because for a lot of things that I've been involved in or done, libraries do use Creative Commons licensing for things like websites and things they've written. Would that work for this, or just because it's code do you then need to use these specific options? You can definitely use Creative Commons licenses, but in the coding and library development side of things, it's much more common to see licenses like MIT, GPL or Apache. Because it's code, that kind of thing, yeah.
Yeah, but you're welcome to use Creative Commons. It's just, I guess, not peer pressure, but it's definitely more common to see software-specific ones. Okay, cool. So if you're wondering which type of license you're interested in, this Choose a License website right here is a really helpful starting point for anybody interested in learning more. I would highly recommend this resource for anyone looking to license their code or just become more familiar with licensing, because this is a question that comes up in libraries a lot. So I've definitely found it useful to just sit and learn about it a little bit more. It makes me feel more confident that I can point people in the right direction. So now let's transition to talking about you. Here's the part where, if you have an idea, you're welcome to shout it out or chat it in, whatever you like. But here are some helpful things to keep in mind when pondering the question of what you want to do with programming, or Python specifically. Something really helpful to keep in mind is your project needs. So questions like, what is something that I need to improve, or what is something that would be nice to speed up? These questions definitely help me plan my projects out, and most of my Python projects have emerged from questions like this. I also find it helpful to think of what I call dream scenarios, where I imagine what the best version of something could be, and then I think about what activities in Python could help me get there. And what I've learned from my experience and that of others around me is that if you can think of it, you can probably do it with Python or another scripting language. You just need to learn code and also be willing to troubleshoot and feel a little bit frustrated sometimes. Of course. Yes, so yeah, please do type into the questions section of the GoToWebinar interface any questions you have.
I'd be curious to know from people who are logged in right now, why were you interested in today's show specifically? I mean, obviously this caught your attention because of the description: programming, Python, making things go quicker. What made you want to watch today? Is there a specific procedure or project that you have been doing that you're desperately trying to figure out how to make easier, excuse me, or were you just curious about using Python in general? So please let us know what you're thinking about, because we'd like to know if Catherine can give you any specific tips. Or maybe you're just generally putting feelers out to see, what is this thing, this Python thing? Yeah, that was definitely the case for me for a long time. I was interested, but scared, like I mentioned earlier. So I actually have some tips for generating ideas if you're in that space of wanting to use it, but you just have no idea how to apply it to your work. Like, what could it even be used for? Yeah. So something that I found extremely useful is to look for inspiration in other places, just to see what people are doing, because then it gives me kind of the idea, like, oh, I can do something similar, or, hey, if they can do that activity, I can apply that to my project. And I'll just briefly talk about these four resources. I mentioned Code4Lib earlier, but I'll bring it up in a different way. In addition to the listserv, they have a journal that's published a couple of times a year, I think two or three times a year. It's chock full of interesting ideas, projects and use cases where people apply different types of technology to solve problems. So it's not all about Python, but it might inspire you, because people go into really great detail about the projects they're working on and what they're using.
And I also find it really helpful because people don't just write about their successful cases. People also write in about their failures, which can often be equally inspirational, and they really learn from failure. Yeah, it's okay to fail. It's actually a good thing. Yep. Use it. Yeah. And then Library Carpentry is another really good library-specific resource. For anyone beginning their journey in coding, this website is really great because it offers workshops, training materials, things like that. Usually their workshops are held in person at libraries across the country, but given the circumstances that's on hold, and they might be doing some virtual events where you can learn with others who are also starting. But if you're more of an independent learner, they also have great materials that you can walk yourself through. And then the last two are not quite as library-specific, but really helpful for just getting an idea of what Python can do, which can help you see applications and crossovers between Python and your own work. PyPI.org is a website that lists all the different modules available to Python users, those little offshoot functionalities that you can download. It's really great for if you're thinking about a project goal but you don't know the specific things that you need to get there. This, for instance, is how I found out about FuzzyWuzzy. I was thinking, oh, I'm going to run into the problem where Python doesn't recognize a title in Gobi because it's not the same title from our ILS. And I was desperately searching for a solution and found it through PyPI. I forget the exact term I used when I was searching, but something like string matching, and that solved my problem. And then finally, Stack Overflow is a very active, but not library-specific, community. It's kind of like a question-and-answer website, sort of like Yahoo Answers, if you can imagine that.
Whenever you're working with a scripting language or programming, you're inevitably going to run into Stack Overflow, because it's a really great way to ask questions about issues that you're having, or to find other instances where people were running into a similar issue so you can try all of the answers that are suggested. So while it's much more tuned to people who have already started, it's a great place to turn to, because it's through poking around Stack Overflow that I came up with better, more efficient ways to handle my automation project. I had originally been using another automation module called Beautiful Soup in Python, and it just was not really working for me. It was hard to learn and it wasn't getting the data from Gobi like I wanted it to. And so while looking through Beautiful Soup questions and answers on Stack Overflow, I found out about Selenium. So it can be a great resource for learning, and also for realizing that there's probably a better way to do something. So those are just a few suggestions. And I think those are good recommendations you had there, for us to look outside your own box, your library box, to other areas that can help you do your library job in ways that you might not realize. You know, the first few there are obviously library-focused, but things that don't say library all over them definitely can be beneficial and help you do what you need to do. Yeah, definitely, because there's quite a bit of overlap between the library community and the software community. But it's through looking at the software community more specifically that you can really figure out the specific tools that you need for your individual project, because there are a lot of separate projects in the library community, the librarian community, but your project is unique. Nobody's going to have the exact answer for you. And so it's definitely good to broaden your scope when looking for solutions.
Absolutely. We do have one comment here, someone talking about what they're looking for today. Robin says, I was just looking for ideas for library-related projects that could be automated. I really like the overview of the libraries she used, because that gave me some ideas about how this stuff can be used in my workflows. Oh, well, I'm really glad to hear that. If you have any questions about any of the libraries in particular, I'd be happy to answer if I know the answer. And you can always search to find it, as you showed earlier, which also was something very good and important to see. And I think that's what we as librarians do: we find answers to the questions that we, or the people coming to our library, don't know. And it's OK to jump into something like programming with nothing, no previous knowledge of what you're doing, and just start the searching. Do the Googling, find the site that is the expert one on what you want to do, and just start throwing in searches. That's how I learned how to do so many things that I've done. You know, don't be afraid of all those screens you had of all the different searches, just to figure out what could work and what might not work until you find it. And I think that's important for people. I know programming can be intimidating. I know it is for me sometimes. I've not done this exact kind of thing, but with things like HTML or other behind-the-scenes programming, I've become more comfortable with it just by searching for what I want to do and finding the tip and the trick to the thing that I wanted to do. And I think that's good to show, that even you, as today's supposed expert on this, telling us how you did it: you had to start out just like anybody else, with, I need to do this thing and I know nothing, so let's just search it and hopefully find something. Yeah.
And definitely recognize that even if you feel like you're in a really good place with a specific language like HTML or Python, you're still going to be searching. And that does not mean that you are not good at it. That's definitely something that I've had to overcome, and other people I know have had to overcome. So a lot of getting into programming is just bravery and recognizing that, you know, you can be good at it even if you're still searching for stuff all the time. And like you mentioned, they change things. So. Oh, yeah. I'd know how to use something from last year, and then this year I try to do it and it doesn't do the same thing anymore. That's it. They are always updating and advancing things. So absolutely. And that's OK. Yeah. All right. So it's a little after 11 o'clock central time, but we did start a little after 10 as well. If anybody has any questions or comments, anything you want to share, any ideas for what you might be doing with Python, get them typed into the questions section before we wrap up. And there is Catherine's contact info. Feel free to go look at this code. It's public on GitHub, so you can actually take the code if you want to and adapt it to your own projects. And if you have any questions, just email me at that address. And I also meant to, well, I should have mentioned at the beginning, but those of you who are regulars know that the slides from Catherine's presentation will be available for you to look at afterwards as well, along with the archive recording. She's going to send me the sharing link for the Google Slides, and we'll have that added into the archive page when I get that up and going, too. Yeah. All right. So does anybody have any last-minute desperate questions that you do want to ask? Get them typed in now before we wrap things up here. It looks like everybody was very intently listening during the show, so there weren't a lot of questions or comments throughout, which is fine.
I think this is great. There's a lot of info for you to pass on to us and get through, but I think it was great. Like I said, I am not a big programmer myself. I've experimented with some things, but I actually understood what you were doing, so good, good. Zero knowledge is actually OK. If I ever did need to do this, I could probably muddle through it myself by looking at your things. That's what I do a lot, too, with my own coding or HTML or whatever. I just start looking at the behind-the-scenes code and see, you know, what made the thing do this thing, and how can I do what I want? Yeah, don't be afraid to dig into that. So much of how coding works is just, what made that person's thing work? I'm going to try and learn and emulate it. Absolutely. All right. We just have a thank you: thanks, ladies, for all the things. All right. So I think we will wrap it up then, since it doesn't look like anybody has anything desperate they want to type in right now. That's fine. Reach out to Catherine with any questions you do have, and check out her code there on GitHub. I'm going to pull presenter control back to my screen here. All right. There we go. So thanks to everyone for attending. The show has been recorded and will be, here we go, on our Encompass Live page. Here are our upcoming shows, but right underneath them is where you can access the archives. Our archived shows are here, most recent one on the top of the list. Today's show will be there, I'll say, as long as GoToWebinar and YouTube cooperate, by the end of the week. Hope they do. We'll have a link to, oh, here's one, last week's show with the recording and a link to the presentation slides, which Catherine will send to me. So everyone who attended this morning and registered will get an email from me letting you know that it's here. I'll just show you while we're here on the archive page: you can search the archives for any of our previous shows.
If there's a particular topic you're interested in, you can see if we did a show on it. You can search the full archive or just the most recent 12 months. This is our full archive of all the shows that we have done. Encompass Live premiered in January 2009, so the whole thing is a long, long list. So if you do a search in the full thing, just pay attention to the original broadcast date of a particular show. Everything has the date when it was originally broadcast, so you can see how old the information may be. Some of the information here may be outdated; it may no longer be correct. Of course, things have changed over the years. Some links or URLs might no longer work because products have shut down or changed so much. So just pay attention to when something was originally broadcast. Many of our shows do stand the test of time, like reading lists and things like that, of course, but some things will become outdated in the information. Just pay attention when you are searching the archive. But we are librarians, and this is what we do. We'll keep our archives here, keep the information out there for everyone who needs them, for as long as YouTube will exist. So that is where today's recording will go. Hope you join us next week, when our topic is the Taming of the Site, there we go: helping users find what they need, where they expect it. Another struggle for many libraries. We create this great website, we put this information out there, so why aren't they finding what we want them to? We'll find out. Jessica Gilbert Redmond will be with us. She's from the University of North Dakota, and she's also another one from, as Catherine mentioned, the Library Technology Conference, a great tech conference held at Macalester College up in St. Paul, Minnesota every year. This year, as she mentioned, it was canceled because it was going to be in March, right when the pandemic really started going strong.
And I've been able to bring many of those presenters who weren't able to present on to Encompass Live. This is another one from there who agreed to present here as well. So definitely register for that one, or for any of our other topics we have coming up in July and August. I've got a couple more being added to the schedule in August as soon as I get some things finalized, so keep an eye on our site there. But we also do have a Facebook page; I have links to that for Encompass Live. So if you do like to use Facebook, give us a like over there to get reminders, like a reminder to log in for today's show this morning, announcements about upcoming shows, information about our presenters, and when our recordings are available. So if you do want to use Facebook to keep up with things, give us a like there. We also use, if you saw that in the beginning slides, Encompass Live, here it is, as the hashtag when we post things onto Twitter or elsewhere. So definitely do that. So thank you everybody for attending. Thank you so much, Catherine, for being here with us and explaining Python to us. Thank you for having me. Great episode. Like I said, I actually understood what was going on, so I think that's awesome. And if anybody has any questions, or wants more help with how you might want to use it, reach out to her and she will definitely be a great resource for you. Happy to answer any questions. Absolutely. All right. So thank you everybody for attending, and hopefully we'll see you on a future Encompass Live. Bye bye.