I'll give you a little bit of background. This project was my PhD, which is technically in sociology, which is arguably more hilarious because there wasn't anyone in the sociology department who was really able to supervise my work that well. That's how I ended up with a physicist as a supervisor, and I'm the only sociologist he's ever supervised. It was his research group that was crucial for me to get through all of the technical stuff in a way that would actually produce results; if that hadn't happened, I probably wouldn't have finished my PhD. But there's finishing the PhD and then there's publishing it. Those are different tasks, and various other things had to take over. So my supervisor and I finally decided: let's go back and see if we can finally get a paper out of it. Presenting at this conference, and at a conference on social network analysis last week, was part of forcing myself to keep going with it. So this is the very technical side of it, whereas the earlier presentation was very much the analytical results. But I figured this might be interesting for other people in the Python community, particularly if you have old projects that you need to return to and get up and working again. I call it resurrecting because the project was very much left in stasis a few years ago, and of course many libraries have changed since. This is only looking at the Python side of it; a lot of the analysis ended up being in R as well, so there was quite a lot of back and forth between the database and Python using Django's ORM, specifically the GeoDjango ORM.
PostGIS through SQLAlchemy's GIS extension, the GeoAlchemy package I think it's called, would have been another option that I kind of wish I had gone for in some ways, but I found so much more documentation for GeoDjango, and that was one reason why I stuck with it. The possibility of maybe one day releasing the results in a way that other people could look at the data was part of that motivation too. So maybe one day, but at the time I went with Django, and when I started, gosh, part of it actually went back to my master's, which was back in 2008. That was fairly early days by Django standards, and the GeoDjango stuff had only recently come out. So that was back under Django 1, maybe 1.3 or 1.4, something like that. If we fast forward to 2014, which is when I finally handed in my thesis, Django had moved on a few versions, but I had gotten so worried about getting the results that the project was still on Django 1.7, which we'll get to in the initial-state bit. A lot of the libraries should have been updated, and my test structure, which I had done with nose, was not done very well. There were still tests that weren't passing, and they were taking hours to run. And my supervisor, not being a software developer, was mostly concerned about getting the results published, so things got really hacky towards the end. So I'll just give you some further context. It was on analyzing the geographic spread of this community, and this map at the back... let's see if I can zoom in on that. Maybe not. Yeah, so each of these dots, the circle, is the number of people running bulletin board systems, which were very early social network communities in which people would literally act as a server on their home computer, publish times when other people could connect to them, and then save messages that other people could read. And they tried to keep the list of phone numbers that you could call to connect updated every Friday.
So that's a time series of the change in this community going back to at least 1983: week-level changes in the community that I studied. I think I finally left the data at about 2012, so we're talking almost three decades of, for large sections, week-level changes in a community's structure. Each of these circles is at the coordinates of a U.S. telephone exchange, showing the number of servers being run on numbers that go through that exchange. It's a little bit misleading, because it looks like Washington state, up in the top left corner, I assume maybe you can see my mouse, is massive compared to, say, California. But that's partly because there are so many more telephone exchanges in California: there's actually a much higher density of servers there, they just aren't all going through the same telephone exchange, so they don't all show up in the same place. To process all that geographic information I used GeoDjango, which let me use PostGIS so I could run spatial queries, and so it ended up being a Django project. From that I constructed the dataset alongside U.S. census data, which is also very geographic, so I could get the demographics of the different regions and try to model how important demographic features were in the spread, and then the decline, of the social network. I'll give you another brief delve into the context. This is, I hope you can see this, FidoNet, which was the name of the community. There were a number of other bulletin board systems, or BBSs, but FidoNet was probably the most global: it actually covered all six inhabited continents, in varying degrees. I actually have plots of that if you're interested. And this was all sending text, so there were some fascinating parts of the project in which they were trying to deal with different character encodings, before Unicode. I won't go into all of that, but that was the project.
I think it's a really interesting topic. There was a really good documentary, and I've met the guy who made it; I've also met one of the people who wrote a lot of the core code that the system was founded on. And this little logo here, by John Madill, was the original logo of FidoNet. So that's the context of what I'm studying. In creating all of this, I used Django. There were two main datasets. One was the list of telephone numbers, and actually, you know what, maybe I can just quickly show you the list of telephone numbers in its file format. This is from a presentation I gave at a social network analysis conference called Sunbelt. I hope you can see this: this is an example of the data structure that I was processing. I could give you a whole talk on regular expressions if you like. This is a very hierarchical routing structure. There were zones for continents, then regions within continents, in this case Eastern Canada, then nodes that were geographically local to that particular region number, then another hierarchy of hosts within that, and underneath hosts another hierarchy of hubs. What you can see here is the easier, more standardized level of the data. The much earlier stuff is really, really complicated, and just used commas and spaces, but not in a clearly structured way. There are a lot of asterisks to account for more international numbers, because originally it was just US numbers. Anyway, there was a lot of processing that went into that, and that then led to the dataset. And then I want to show you this here.
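To give a flavour of that kind of processing, here is a minimal sketch, not the project's actual code, of parsing the later, more standardized comma-separated nodelist lines with a regular expression. The field layout shown (keyword, number, name, location, sysop, phone) is a simplified assumption about the format.

```python
import re

# Simplified sketch of a standardized nodelist line:
#   keyword,number,name,location,sysop,phone,...
# where keyword is Zone, Region, Host, or Hub, or blank for a plain node.
LINE_RE = re.compile(
    r"^(?P<keyword>Zone|Region|Host|Hub)?,"
    r"(?P<number>\d+),"
    r"(?P<name>[^,]*),"
    r"(?P<location>[^,]*),"
    r"(?P<sysop>[^,]*),"
    r"(?P<phone>[^,]*)"
)

def parse_line(line):
    """Return a dict of named fields, or None if the line doesn't match."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None

entry = parse_line("Region,12,Eastern_Canada,Toronto_ON,Some_Sysop,1-416-555-0100")
```

The earlier, less structured lines he describes wouldn't match a pattern this rigid, which is where most of the real processing effort would go.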
There's also US census data, which can give you demographic features down to areas of approximately 8,000 people, and they'll give you the shapes of those as a shapefile, which I could then process with GeoDjango. So each of the lines in this structure is a region that we could get demographic information on, and these dots are the coordinates of the US telephone exchanges. I combined the telephone exchanges with the areas that I could get demographic information on; of course there were some awkward edge cases around lakes. So I constructed my units of analysis with about 16,000 areas, and then I could look at when each area adopted, if it did, and, controlling for demographic factors, whether spatial proximity helped predict joining the FidoNet community or not. So that was the idea, and that was the sort of data that needed to be processed in a structured way. I can briefly show you. This is the hierarchy of database tables that I constructed. This was also one of my first big programming projects, so I'm guessing there are many people who think this should be done differently; people argue about the best way to do database structures, so feel free to criticize me on that. This was really my first big dive into programming, at least on this level of project. What I'm showing you here is the node time point: I was trying to look at the time points of each activity at a regional level, which I could then get from 1990 US Census data, and there are tracts and census blocks, trying to handle those different hierarchies. And then there was the NHGIS data, which covered larger areas where you can get more detailed demographic information, and which is related to the IPUMS project.
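The real project did this with PostGIS spatial queries through the GeoDjango ORM; as a toy illustration of the point-in-polygon test such a query performs when assigning telephone exchanges to census areas, here is a pure-Python ray-casting sketch (the polygon and coordinates are invented).

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: count crossings of a ray going right from (x, y).

    `polygon` is a list of (x, y) vertices; an odd number of edge
    crossings means the point is inside.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does edge (x1, y1)-(x2, y2) cross the horizontal line at y?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:
                inside = not inside
    return inside

# A made-up square "census area" and the test below checks two
# made-up "exchange" coordinates against it.
area = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
```

In practice the equivalent PostGIS `within`/`contains` lookup is far faster and handles real-world geometry edge cases, like those lakes, which a sketch like this does not.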
And so there were certain things, like age and occupation, which I think we could only get at the larger-area level, and other bits of data that we could get at a finer-grained level. So we had to do a weighted combination of all that information to characterize these regions and how much they might make a difference in modeling this community's spread, and then get the geographic information from area codes, the census tract identifiers, and the coordinates of US area code regions. Anyway, this is just a glimpse at the database structure that all of this was stored under, to effectively get a sense of the network as it was expanding, and then try to get the data right. And I guess that's another crucial thing to point out: what I've done is much more feasible in a research context, where it's not like a classic Django project where you have a bunch of users and you need something providing a service on a consistent basis. I can rerun the whole thing, I can drop the database and make sure my answers are right. I didn't have the classic situation of providing a Django site as a service to users, so that gave me more flexibility. And given how difficult I found this project, I shudder to imagine what it's like if you're running your own Django site without a whole team of other people to help with that transition process. So that's all the background, and I'm going to dive back into this. It's interesting finding this whole share thing... I'm hoping this is working. Okay, cool, I think we're back to my poster. With all of that context, I'm going to start with the preamble. I assume you've heard of PEP 8, which is about making your code as clear as possible. And clearly written code, documented, tested: if you haven't looked at a project in years and you come back to it, I cannot express how crucial that is.
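As a toy sketch of that kind of weighted combination, not the project's actual method, a coarse-area statistic might be apportioned to finer units in proportion to, say, each unit's share of the coarse area's population (all names and numbers here are invented).

```python
def apportion(coarse_value, fine_weights):
    """Split one coarse-area value across finer units.

    `fine_weights` maps unit id -> weight (e.g. that unit's share of
    the coarse area's population); weights are normalized so the
    apportioned parts sum back to the original total.
    """
    total = sum(fine_weights.values())
    return {unit: coarse_value * w / total for unit, w in fine_weights.items()}

# A made-up county-level count split across three made-up tracts.
parts = apportion(9000, {"tract_a": 2, "tract_b": 3, "tract_c": 4})
```

Real crosswalks between census geographies (e.g. NHGIS units to tracts) use published weight tables rather than invented shares, but the arithmetic is this shape.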
And even stylistic linter packages, I'm blanking on some of the names, just initially running one of those on my code base and having it ask, why did you name that variable one character?, when I couldn't even remember what that variable was doing, forced me to rename things in a clearer, more readable way. It's absolutely crucial. And I found it difficult to express that to my supervisor; I think to some extent research expectations aren't as stringent as in production-level projects where you're providing a service. But the more I tested and documented things in a structured way, and there were some downsides to the testing, which I'll get to in a bit, the better: I couldn't have gotten anywhere without that. A huge reason why I find Python to be a great language. So with all of that preamble, the initial state of the project was: Python 2.7.5; Django 1.7; django-nose, a test-runner framework built on nose, which is considerably less popular now, I think even the main nose branch hasn't been updated in at least a year. There was also South, a package for managing migrations, another very crucial aspect of a project like this; I think I had installed South in anticipation of needing to mess with it, but its functionality was brought into Django core as the built-in migrations framework. Then PostgreSQL 9.4; I remember how difficult it was getting from Postgres 8 to 9, and even some of the transitions between 9.x versions. Whew, that was really hard. In anticipation of that, and the PostGIS upgrade, there's a link in here to bostongis.com demonstrating how you can't just do a classic in-place upgrade: you have to do a dump and then process it to feed back in. It's very easy to make a mistake doing that, at least I found so; I'll come back to that in a bit.
And then, too, there were other libraries, but these were two of the more important ones for my data structure: django-treebeard, which is a really interesting, efficient way to build hierarchical tree structures, and is how I kept track of the networking information I was trying to show you in that diagram before; and django-taggit, which I think still supports everything from Django 1.11 all the way up to Django 3, a fascinating example of a project that's managed to move between different versions of Python and still maintain support that far back. Anyway, that was the initial state. So that's step zero, following Python indexing style. To prepare for the process, the first thing I did was update Python 2.7.5 to 2.7.18. That was the easy part. Originally it was a virtualenv, and I think I've now shifted to doing most of my stuff in pipenv; I could have put that information here. But yeah, that was the first shift, to the latest version of Python 2.7. Then a pg_dump, a PostgreSQL dump: again, not providing a live service, that was my way of having a safe space to return to. I haven't put this here, but I put timestamps in the file names of all the dumps, to be able to keep track, because I can't remember how many dumps I used in this process. That'll be a recurring theme. Then, returning to my test coverage: it was at least eighty-something tests, and they would take at least four hours to run. There were some things I was already trying to change back then, so some of those tests weren't passing. It's really hard to come back to a project in such a fragmented state: you're not even sure what's right and what's wrong, and going back through it after years can be hard. So I tried to reduce the set of tests to adhere to, so that it was more manageable, given just how long they took.
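A minimal sketch of that timestamped-dump habit, with made-up database and file names, might look like this; the exact pg_dump flags you'd want depend on your setup.

```python
import datetime
import subprocess

def dump_command(dbname, prefix="backup"):
    """Build a pg_dump command with a timestamped output file name."""
    stamp = datetime.datetime.now().strftime("%Y-%m-%d_%H%M%S")
    outfile = f"{prefix}_{dbname}_{stamp}.sql"
    # -f writes to a file; add format and credential flags as needed.
    return ["pg_dump", "-f", outfile, dbname], outfile

cmd, outfile = dump_command("fidonet_db")
# subprocess.run(cmd, check=True)  # uncomment to actually run the dump
```

The timestamp in the file name is what lets you line a dump up against your Git history later, when you can no longer remember which dump went with which state of the code.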
I mean, you change one thing, and it takes four hours to run the tests to see if it worked. Obviously you can drill down into a particular test, make sure that test works fine, do a commit, then try to run the whole thing. But even then, it would often be a case of: oh, I thought I fixed something here; oh, but now this other thing doesn't work; how do I make sure? So my strategy, and I think the right approach to this, was to narrow down the list of tests to cover and then just stick with that to keep going. And we'll see; I think that was a good strategy. I'm glad with how far I've gotten, but it could be that further down the road there are some other core mistakes that some of those dropped tests were covering which I haven't caught yet. We shall see. Then I just updated those two packages, which, again, thankfully are supported across so many versions of Python that I could basically do a big leap on them without having to worry about anything beyond that. So that was the easy part. Then the comparatively minor stuff. I added pytest, which mostly supports the same test syntax; I went to some lengths to make sure I put down the right versions, and pytest-django seemed like the better option. Oh, I just got a message; I'm not sure what that's referring to. But yeah, I used that because it's much better maintained than nose, and it was a much better way to manage the tests; it's mostly compatible with the way you write tests for nose. That was another round of pruning the set of tests to adhere to. And then I did the first Postgres and PostGIS upgrades. For those, again, I dumped and then did an import. I think I did PostgreSQL 9 to 10 first; I think that was the hardest one with PostGIS.
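For instance, plain nose-style tests, simple functions with bare asserts, generally run unchanged under pytest, which is what makes a switch like this manageable; this is an illustrative example, not one of the project's tests.

```python
# nose-style tests: plain test_* functions with bare asserts, no
# unittest boilerplate. pytest collects and runs these as-is.

def normalize_phone(raw):
    """Strip everything but digits from a phone-number string."""
    return "".join(ch for ch in raw if ch.isdigit())

def test_normalize_phone_strips_punctuation():
    assert normalize_phone("1-416-555-0100") == "14165550100"

def test_normalize_phone_empty():
    assert normalize_phone("") == ""
```

Running `pytest` in the project directory discovers these automatically; the main nose idiom that does need rewriting is its `yield`-based generator tests, which pytest no longer supports.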
And I think it's much easier to update PostGIS and PostgreSQL versions since 10; at least that's been my experience. There's a link there, which I can put in the chat if you like, to the Boston GIS examples for PostGIS. And I see I'm actually running out of time, which is kind of ironic, so I'll just keep going quickly. Within Python 2, the last supported version of Django is 1.11.29, I think. So that was the minor-upgrade section. I then got to the actual shift in Python, and this was a very long, complicated process. There's python-future and its futurize script, and futurize ended up being the tack I went with. There is another tool called modernize, which is nice if you decide to just go for Python 3 rather than trying to maintain compatibility across Python 2 and Python 3, but modernize isn't that well maintained anymore. And python-future as a package is trying to cover so many different options; it's worth looking through that documentation. Again, this is a significant summary, but having made all of those packages as compatible as I could across Python 2 and Python 3, that was when I did the shift. And it's crucial to say I didn't go all the way to Django 3; I went to Django 2, because there was more documentation on that, but I jumped to the latest version in that series, 2.2.13. There were some other crucial differences between the library versions, not just Python 2 versus 3 syntax: the ORM now requires behaviour like CASCADE to be stated explicitly, shifting closer to SQLAlchemy's style of being much more explicit about some of the SQL components. And South was completely gone; I didn't need it anymore, because I could use the migrations framework, originally based on South, that's now part of Django core. So I could upgrade to 2.2.13 having fixed the syntax.
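To illustrate the kind of edit futurize makes (an invented example, not the project's code): it inserts `__future__` imports at the top of each module so the same file behaves identically under Python 2.7 and Python 3, most visibly for printing and division.

```python
# futurize inserts lines like these so the module means the same thing
# under Python 2.7 and Python 3 (under Python 3 they are harmless no-ops).
from __future__ import absolute_import, division, print_function

def servers_per_exchange(n_servers, n_exchanges):
    """True division everywhere: 7 / 2 is 3.5 even on Python 2.7."""
    return n_servers / n_exchanges

print(servers_per_exchange(7, 2))
```

For names that changed rather than just behaviours, futurize's second stage rewrites the call sites and adds backported `builtins` imports, which is why the `future` package then has to be installed on the Python 2 side.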
And I found, at least in my case, again, I have zero front end to worry about, so I think a lot of the other changes I might have had to deal with, if this were a normal web app project, probably would have been more painful. But from 2.2 to 3.0 was comparatively trivial. There's a useful thing to look up about the save function on models; it wasn't that big a deal for me, but on the back end, in the slide there's a link to that documentation. And then I decided to just tick the rest of the boxes, shifting PostgreSQL up to 12.3 and PostGIS to 3.0. There were some other details; it's now all Dockerized, rather than me worrying about which server versions I'd be running. But yeah, so that was the project. I realize I went into a lot of detail at the beginning and then had to go through the other stuff, which might have been more useful, very quickly later. But I'm happy to take any questions, and I can also go to the Discourse chat if that's easier. Thank you very much for listening, and happy to answer any questions. Thanks a lot. Why so long? Yeah, this is why I had proposed this originally as a long talk, because even this is less than I had originally planned to cover. So I can see why; I wasn't sure how to squeeze it into a 30-minute option. But I'm glad; it's very nice to have a chance to present it anyway. And of course, if you have any questions, we can also go to Discord if you'd rather. It seems I don't have the option of a normal chat function within Zoom here, otherwise I would have suggested that. And obviously you're welcome to switch your microphone on if you want to just talk. No, sorry, I haven't really prepared a question. Oh, that's fine, that's fine. Well, I could go into a bit more detail about any of the sections, if there's any part that comes to mind. Probably the switch between Python 2 and Python 3. Yeah, that's the hardest part.
Yeah, so the crucial thing: there are the Python porting options; the 2to3 tool is pretty good, and there are reasons people have added stuff on top of it, but it's a nice element. What I also found helpful was that there's quite a lot of syntactic overlap, print being the obvious example, between 2.7 and 3.5. So I started with just the print statements as a kind of test case, to see if what I was doing made sense. That meant that for some of it I could still run the old tests under 2.7 as a stopgap while trying to make sure what I was doing was right. So that was the first big hurdle, and fixing print statements is the classic thing 2to3 is an example for. The xrange versus range stuff was a lot harder. That was where, because you arguably have, it's like maintaining two code bases: there's the base of your tests and then the base of what you're trying to execute. And so I had the problem of my tests breaking not necessarily because the actual main code was wrong; I may have broken the test but not the main body of the code. And that was really hard. So yeah, I recommend the futurize option; that's the one that got me most of the way. I think it's still aimed at helping you maintain something that runs across Python 2 and Python 3, whereas modernize, I think, was pushing for a stricter Python 3-only option. But futurize I found the most helpful wrapper around 2to3. And it's a complicated project, python-future. I'm sorry, I don't know if you can load this up, but the links I've got here go to different sections of the documentation. For some reason I couldn't include the anchor links; the export to PDF was getting them all fuzzed up, so they just jump to the top of those pages and you kind of need to scroll down to the more helpful sections.
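The usual hand-rolled compatibility shim for the xrange versus range problem, the kind of thing futurize automates with its `builtins` backports, can be sketched like this:

```python
# On Python 2, range() builds a full list and xrange() is the lazy
# version; on Python 3, xrange is gone and range() is lazy. A manual
# shim picks whichever lazy version exists.
try:
    lazy_range = xrange          # Python 2
except NameError:                # Python 3: xrange doesn't exist
    lazy_range = range

total = sum(lazy_range(5))  # 0 + 1 + 2 + 3 + 4
```

futurize instead rewrites `xrange(n)` call sites to `range(n)` and imports a backported `range` for the Python 2 side, so the shim never appears explicitly in your code.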
But yeah, that component, the futurize automation, and you've got some extra options; I didn't go with the extra options, I went with the vanilla flavour of futurize. And then I had to go through various sections and fix things, and I had to be quite, sorry, the opposite of precious, I had to be quite brutal: I can't remember what that does, I can't be sure if it's crucial; for the moment, I'm going to keep going and stick with the stuff that seems mostly covered by the tests. This is where stuff like mypy and coverage also ended up being really helpful, I don't know if those are familiar. mypy is a type checker, and it was also a way of keeping track of what a function was supposed to do: okay, I'll work out what type it's supposed to take, and try to enforce that. For the sake of readability in this presentation I haven't covered all of those details. But coupling that with the various Python extensions I have within Vim, which is how I write code, meant warnings were raised if I hadn't type-hinted sections, or if I had raised an exception without a specific exception type. Lots of those details would flare up every time I made an edit, as a way of forcing things to be as specific, type-checked, and correct as possible. Those were all elements of the process for me. And yeah, I think we end, in theory, in about six minutes, and I'm trying to think if there's another way I could go into more detail. But maybe the first question is: is that at least helping answer things for you? I could try to find some of the actual code sections; sorry, I didn't think that would come up now, and that might take a little bit of time, and I'm worried we won't get to something useful before I'm supposed to shut down. But yeah, does that in part help answer your question?
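A tiny sketch of the kind of cleanup being described, invented code rather than the project's: adding type hints that mypy can check, and raising a specific exception class instead of a bare `Exception`.

```python
class NodelistParseError(ValueError):
    """Raised when a nodelist field can't be interpreted."""

def exchange_code(phone: str) -> str:
    """Return the 3-digit exchange code from a NANP-style number.

    Expects digits like '14165550100' (country code 1, then area code,
    exchange, and line number); raises a specific exception otherwise,
    so callers can catch exactly this failure.
    """
    if len(phone) != 11 or not phone.isdigit():
        raise NodelistParseError(f"not an 11-digit number: {phone!r}")
    return phone[4:7]
```

With the annotations in place, `mypy` flags any caller passing the wrong type, and an editor plugin can surface the same warning on every edit, which is the feedback loop being described.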
Yeah, that was approximately what I had in mind. Another question: how long did those four phases take? So, how much time: the first few were like a couple of weeks. And you know, it's hard; I also had a bunch of other things going on alongside it, so it's hard to tally exactly. Then step three, and again, this was with other projects running, that was a month, a very intense month. And if this were a project maintaining a service for people throughout, I think you'd need a team to do something like that, personally. I mean, of course there are developers who can presumably do some of this very straightforwardly; they know their code that well, especially if they've been connected to it for a while and it's not something sprawling from years ago. But yeah, that was a month of not doing much else, and I found it very frustrating and very complicated, but it was quite a relief to actually get through. It's maybe helpful to say that about three weeks in I'd almost given up, and it took my supervisor's encouragement to keep me going. And yeah, I also often didn't trust something when it initially just seemed not to work. And because my more recent projects have all been Python 3, my brain was going back and forth between Python 3-ish syntax and Python 2-ish syntax; I would often get frustrated and think I'd made a really terrible mistake, and actually it was usually a really small syntax mistake, but it could be really hard to find. But once I found it, it was like: oh, it wasn't actually that big a problem, it was just a really annoying typo; okay, I can keep going, this is going to work out. I realize that's quite abstract. But yeah, so that was a month, and then the other stuff has been about a couple of weeks, and we're not really done with that yet.
So there's a sort of irony in having done arguably the hardest part, phase three, and now actually trying to check the results in phase four, but having a whole bunch of other projects to get done has put that on pause. It seems like stage four is fine, but we'll see, when I actually get down to the results, which I now have to export to R, whether we get numbers that we like. So it seems like four is the easy part, but I don't know; I feel I could answer that more accurately in a month, because it's not really done yet. But at least those components of it, fixing the testing structure, getting rid of nose, going up to PostGIS 3, all of that I'm quite proud of. And once it was just Python 3, rather than jumping between Python 2 and Python 3, it was much easier for me, partly just conceptually, because I didn't have to keep trying to remember which version of what. I guess the other thing I haven't mentioned is that I made a lot of Git branches in doing this: trying to retain a base that I knew worked, branching from that until I was happy with the branch, and then merging it back into main, or sometimes many, many branches merging back together to finally get to main. Yeah, and I'm also really sorry I can't show you the project itself. My supervisor is really keen to get this published, and his approach is not to release it until we actually get a draft of a paper to a journal. Otherwise I would have a repository to just point to, but yeah. Thanks, that has been very interesting. Sure, thank you for being here. And it's very helpful to speak to someone; and if you have any suggestions on how to present the poster, because I haven't done that very much before... Yeah, I hope this worked for you.
And yeah, I don't use Twitter that much, but I'm griff underscore Reese there, and my email address at Sheffield is griffith.rese at Sheffield.ac.uk. So if you've got some more questions, especially if you've got specific sections of code I can take a look at, I'm happy to help. And I think that means we're supposed to leave the room. I'm also scheduled to give a very quick lightning talk for a very different project, so maybe see you for that. And I will also shift to that channel in Discord, in case you or anyone else have things to chat about there. Thanks a lot, bye. Sure, thank you very much. Thank you.