So thank you very much to the organizers for having me and for sorting out all of our technical problems. As was just said, my name is Nicole. I'm a designer and a web developer. I am Australian, as mentioned, but I do live here in Scotland. I live in Perth, which is a beautiful part of the country, about an hour north of us here in Edinburgh. And just to make things slightly more complicated, I actually work for a French company: I lead the UX and UI team at PeopleDoc. We're actually a sponsor of EuroPython this year and we're hiring, so if you're looking for a position, come and talk to me.
But that's not why I'm here to speak to you today. I'm here instead to talk about my experience working on Warehouse, which is the project that currently powers PyPI. That's "pie-P-I", not "pie-pie"; we have to make that distinction for obvious reasons. I've been working on the user experience, user interface, and HTML and CSS code base of Warehouse for about three years now.
Via Warehouse, I am a member of the Python Packaging Authority, the PyPA. That's a group of developers who are generally focused on improving the state of the Python packaging world. A fun fact that I found out whilst doing some research for this presentation: one of the originally proposed names for the PyPA was the Ministry of Installation, which I really love. I'm quite disappointed that they didn't choose it in the end, because it so aptly describes what the PyPA is about: installing stuff on your computer, basically. Also via Warehouse, I'm a member of the Python Packaging Working Group.
So the working group is a sub-body of the Python Software Foundation, and our goal is to raise money to try and improve the state of Python packaging. The long-term vision is to be able to fund PyPA projects, so the official projects, things like PyPI, pip and virtualenv, but also to have funds available to the community for the different projects that are emerging in the packaging space.
So as described, I'm here today to tell you a little bit about the Python Package Index: its story, its history, and also to ask some questions about where it will go in the future.
Basics first, a quick introduction for newcomers: what is the Python Package Index? This is my definition; I didn't get it off Wikipedia or anything like that. It's the place for Python programmers to publicly share their code so that other people can use it. I've deliberately highlighted "the", because it's the place that's supported by the PSF and recommended by PyPA tools, and "publicly", because there are obviously lots of other ways that you can share your code, but this is the place where the community has chosen to invest. It's built for the community, by the community.
Even if you don't really know what it is, you've probably already used it, because when you type "pip install my-favourite-project", what's actually happening is that you're getting the file off the PyPI servers. And it's mostly via pip that the index last month served 11.2 billion HTTP requests. If we extrapolate that out over a 12-month period, it means that we're serving about 134.4 billion HTTP requests a year. That's probably a conservative estimate, because the number is going up every month.
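As a quick check on that extrapolation, here is a minimal sketch of the arithmetic in Python. It assumes every month matches last month's volume, which, as noted, probably understates the real figure; the names `MONTHLY_REQUESTS` and `annual_requests` are illustrative only and do not come from the talk.

```python
# Back-of-the-envelope extrapolation of the request figure quoted in the talk.
# Assumption: every month matches last month's volume (no growth).
MONTHLY_REQUESTS = 11_200_000_000  # 11.2 billion HTTP requests last month
MONTHS_PER_YEAR = 12


def annual_requests(monthly: int, months: int = MONTHS_PER_YEAR) -> int:
    """Project a yearly total from one month's volume, assuming no growth."""
    return monthly * months


if __name__ == "__main__":
    total = annual_requests(MONTHLY_REQUESTS)
    print(f"{total:,} requests per year")
```

Because the monthly figure keeps rising, a flat projection like this gives a lower bound rather than a forecast.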
So this is basically what we're handling. The other side, other than being able to access it via pip install, is that we have a web interface at pypi.org. That's the place where you can go to search for different packages to install and find out information about them. On pypi.org last month we had 1.3 million unique visitors from 228 different countries. If you start to break that down by region, we can see that the largest group of users is actually located in Asia, followed by the Americas (that's North and South America bundled together), followed by Europe, then Africa, which has a growing Python community, and Oceania. If we break it down by country, you can see that the US is just ahead of China, and China is rapidly catching up, followed by India. Then, in terms of the European community, in the top 10 we've got Germany, the UK, Japan and Russia. Sorry, not Japan: France and Russia. I just relocated Japan.
All of this costs a lot of money. This is a back-of-the-envelope calculation, but it costs about 118,000 US dollars a month to run PyPI in terms of servers, CDN, monitoring, paying for search, that kind of thing, and all of those services are currently donated by sponsors. Again, if we extrapolate that out over a 12-month period, it costs about 1.4 million dollars a year to run the index. All of that to say that the index is big and it's important.
I joined Python packaging about three years ago, so I'm quite a new face on the scene, but I found it really interesting to take a look into the history of PyPI to see how we actually got to this kind of mammoth project. So this is what I found.
I want you to cast your mind back to the early 2000s, if you can. I realize there will be some people in this room who were very young infants at the time, perhaps not even born. It was a time of Tamagotchi.
I remember these at my school. Amazing double denim. Hotmail was the most popular email service, because Gmail didn't actually exist at that time. Google did exist, but it looked like this, and you might have been using Ask Jeeves anyway, if anyone remembers Ask Jeeves. And despite Python having been out for 10 years, the Python Software Foundation wasn't actually founded until March the 6th, 2001.
At that time several developers had independently recognized the issue of distributing packages, and several independent indexes had popped up. This is the Vaults of Parnassus, which I think is the most amazing example of early web design, and it was probably the most popular independent index at the time.
In 2002, so about a year after the establishment of the PSF, Richard Jones proposed PEP 301 (a PEP is a Python Enhancement Proposal, for those of you who don't know), which basically proposed a central index server: something hosted on the python.org domain, the PSF's official property, that had an air of legitimacy. The idea was to combine all of those previously scattered approaches to establishing an index, and that's basically what happened. We know that PyPI was launched in very late 2002 because we have a record of four projects registered that year, so my bet is late December, someone hacking after Christmas or something. But 2003 is when the project really took off: we had 273 projects uploaded in 2003. At this stage there was no data actually hosted on the index, so basically it was just a list of projects. You couldn't really even search it. You could find things by Trove classifier.
So kind of like tags. The workflow for people was to go to the index, find the project they were looking for, click through to the README and download the files directly from there, which were hosted on somebody's server somewhere on the internet. Or you could go to the home page of the project and find the files that way. So a very manual process.
To give you some context around what was happening in the Python community at that time: in 2002 there was the first EuroPython, with 240 attendees, and in 2003 the first PyCon in the US was hosted, with 200 attendees. So really the Python community was in its infancy, starting to grow and starting to contribute to the index.
2004 comes along and we have easy_install. That project was kicked off in March 2004, and basically it tried to automate the process that people were doing manually. It would go to PyPI, crawl around and try to find links to things to download onto your computer, or it would follow links to find things to download on somebody's website somewhere on the internet. In 2005, file uploads were added to PyPI, so people didn't need to host their own files; they could actually be hosted on the index.
So this is what the index was by 2007. In 2007 we had 1,249 packages uploaded, so quite a substantial growth compared to a few years before, and more and more of those projects were actually hosted on the index's servers rather than on somebody's server in their cupboard. This is what the index looked like, and I think many of you will recognize it, because from a web UI perspective nothing really changed very significantly in the next 10 years.
It still looked pretty much like this by 2017. But meanwhile, in the background, whilst nothing was happening at the front, the popularity of Python as a language really skyrocketed, and as a result the popularity of PyPI also grew. This graph shows the growth in the number of packages added to the index year on year. That's not the cumulative number of packages on the index; it simply shows the growth. Cumulative would obviously be a much steeper curve. This put a lot of pressure on the index in terms of its infrastructure, so behind the scenes developers were working really hard to put out fires and scale the system for legitimate use of the index by the community. But as the index became more popular, there were also more malicious attacks, malicious packages and spam that needed to be dealt with as well.
So I have here a picture, or rather rare footage, of a core dev at that time trying to work on the code base. Basically there was a lot of firefighting, because by this stage they were working on a code base that was, let's say, growing in age and not designed for the kind of scale that it was handling.
In terms of scaling, this is a very brief overview of what happened. The original code base assumed that PyPI was hosted on a single server, and that's exactly what happened. It was hosted on a server called Dinsdale, which was located in the Netherlands with a company called XS4ALL, so just standard hosting. In late 2012 or 2013 (I couldn't quite get the date for this), the infrastructure was moved to the Oregon State University Open Source Lab, and DRBD was put in place, which basically created mirrors for disaster recovery. In 2013, Fastly, which is a CDN, was added, which is basically caching on steroids.
They can contact me for that catchphrase later. And in January 2014, probably the most significant change in terms of the architecture was that the code base was moved across to Rackspace, using GlusterFS. If you are like me, and you're not an infrastructure person and you don't know what GlusterFS is, the very, very high-level summary is that it's a clustered file system. It means that PyPI was running on many servers that our team could access as though they were a single unit, so it gave the team the opportunity to spin up and shut down resources as and when the project needed them. At the same time, I should say, several PEPs were also proposed to help improve the consistency of the Python Package Index.
So even though we'd sort of scaled the index, there were still problems that remained. First of all, PyPI predates almost everything on PyPI. I think I said 243 packages in the first year. It was built really before we knew how to build great web applications with Python. We didn't have any modern web frameworks to work off, so the code base used custom code, which means that it was really difficult to maintain. I never personally maintained it, but the horror stories that I've heard certainly indicate this. Because it was built in a technology that's not that well known, not a popular web framework, it was really difficult to attract new contributors to the project. Also, it was really difficult to set up.
So Donald, who I work with on PyPI, told me that he used to have to comment out certain parts of the code base just to get it to run locally on his computer. So kind of a real nightmare for attracting new contributors. And because of that there was really no, well, I think "no" is a bit harsh, so no significant new feature development, which means that in terms of features it kind of stagnated. Also, because it was difficult to attract new contributors, we had a poor bus factor, which is a rather brutal term for asking what would happen to your project if one of your core team members were to be hit by a bus. My preferred version is that they leave on a bus. But anyway, we had a very poor bus factor: really a handful of people who knew how to fix problems on PyPI when problems inevitably occurred.
So here enters our hero, or at least one of the heroes in the story of PyPI, and that's Donald Stufft, who I have a great deal of respect for. Donald saw PyPI, saw the problems with PyPI, and decided to have a go at doing something different. So in 2011 he created Crate, which was an alternative service to PyPI that used the data from the PyPI service, and I'm actually going to directly quote him here.
He said it was "a bit hacky, but it was popular". Shortly after, he started making commits directly to the real PyPI code base, and in 2014 he decided to shut down Crate because he was duplicating his effort across two projects. At the same time, he was constantly thinking about how he could rewrite the PyPI code base to make it easier to maintain and to contribute to. He did a number of proofs of concept during this time, and in 2015 something stuck: the version of Warehouse that I work on was established in early 2015 using the Pyramid web framework. And that's where I come in. In June 2015 Donald posts an issue basically saying, "I'm really bad at design, I need some help with this." Through a good friend of mine I find out about this, and that's when I start to become involved in Python packaging.
So Warehouse, what is it? As I said, it's using a modern tech stack. We're using Pyramid, we've got Elasticsearch for search and SQLAlchemy as our ORM. We've got really modern tooling as well: it runs on Docker, so it's really easy to set up with Docker Compose. We've also got continuous deployment, which is really fun now that it's been launched, because if you make a change to the code base you can see it live on this massive website within about 15 minutes. It's more stable and more secure. I hope it's got an improved user experience.
Otherwise, I haven't done my job very well. And it's easier to contribute to. As a project we work really hard to support new contributors to Warehouse. We've done a lot of work on our documentation, we try to support people through the pull request process, and one of the things that I'm really proud of is that we've had a number of people make their first open source pull request on the Warehouse project. Many people might think, "It's a big project, I couldn't possibly contribute to something like that," but actually you can, and we really want you to.
Throughout the development of Warehouse, I think everyone has always been really optimistic about when it would go live. This is rather crushed because of the VGA port, sorry. This is a picture of me presenting at PyCon France in October 2016, and you can see in the corner there that what it actually says is "Good news: we're almost ready to release". So that was optimistic, and I think that kind of optimism ran through the project for a long time. But in reality we had some problems trying to get the Warehouse code base live. First of all, the speed of development was slow, because we were relying on community contributions and everyone was kind of stretched, working on other things. There were many major features that had not yet been started. For example, the area you log into to administer a package hadn't even been begun, and I think I completely underestimated how much work that was going to be. We didn't have a release date in sight, we had no real project management around the project, and, to top it all off, people were still firefighting on the old code base, which was taking resources away from getting Warehouse live.
So in 2017 the Python Packaging Working Group, another hero of our story, applies for and receives a 170,000 US dollar grant from Mozilla.
So that's the Mozilla Open Source Support award, and it was awarded under the Foundational Technology track. Actually, I'd like to give Mozilla a round of applause for that, if we can. I think the whole team really appreciates the support that we've had from Mozilla to be able to work on the project. This award was specifically to fund the development team to bring Warehouse up to feature parity with the old code base, to release Warehouse and to shut down legacy PyPI. Through the grant we funded myself, two developers, a project manager and her assistant, as well as our PSF liaison, so we had a team of basically six, six and a half people working on the project.
And we achieved a lot with that money within the five months that we spent it. We worked on authentication workflows; account administration (you can reset your password, which is a good thing); and the management of projects, releases and files. We solved a lot of UI problems and fixed lots of bugs. We added a lot to the documentation. One of the things that I'm really, really proud of on the new Warehouse is that we have a great help section, and we link through to it and to a lot of documentation. We found that really important, because we know Python is being used a lot in teaching, and we want people who are new to the Python community to be able to understand what the Python Package Index is; it might be the first time they've actually used any kind of index. And we did an infrastructure overhaul, to support Kubernetes-based continuous deployment, end-to-end encryption and secrets management.
During this time we worked really hard to build the community. We merged 425 pull requests.
We closed 302 issues. We supported 26 new contributors within that five-month period, and of the 425 pull requests, 149 came from the community, which is about 35 percent, obviously excluding bots. Unusually for a software project, we came in both on time and within budget, which I think is really a credit. Thank you. I really have to credit Sumana from Changeset Consulting, who is our project manager and who did an amazing job of herding everyone towards that outcome.
On March 26 we released our beta. We didn't have too many problems, so by April 16th we went live with the new code base, and by April 30th we turned off the old code base, and there was much celebration. Donald had actually stockpiled a whole number of animated GIFs to post on Slack for the occasion, so it was just celebration everywhere. And I think the new code base and the new website are awesome. That's my humble opinion; I might be biased.
So this is what it looks like. That's my work, and for our users it's got a lot of really great new features. We've got Markdown support. If you want to enable Markdown on your project README, you can search "markdown description" on PyPI and it will come up with an example project that has all of the instructions. That has been a very heavily requested feature for a long time, and you can get it now on your READMEs. It's got vastly improved search from Elasticsearch. At the moment we're not doing anything too fancy, but because we're using Elasticsearch we can really extend the search capabilities of the index, which is really important. It's fully responsive.
So if you want to check out packages whilst you're using your mobile phone, you can, and you can also do all the administration tasks on your mobile as well. Lots of help resources: as I said, that was a really important part of the design for us. And we've also got a chronological release history, so it's really easy to see the history of an individual package. So that's what we've got for you.
But for us, we've got some pretty important things too, for our own sanity moving forward: it is scalable, extendable, learnable and maintainable. Those four things are really essential to the ongoing health of the project. And we've got great ideas about how we could extend the index. One thing that I would really like to do is to make the user interface more accessible. The Python community, as we all know, is really welcoming to different people, and my opinion is that we should not restrict people who have disabilities from being able to fully experience the Python Package Index. So we really want to make some accessibility improvements to the front-end code. We've had an audit on that, and there are a few points.
In fact, we can talk about those at the sprints. As you saw earlier, the largest group of our users is actually based in Asia, so it would make sense to localize and internationalize the index. I personally would like to do a lot more design research and UX improvements on the index. Right now we know it's not perfect, but it's hard to make decisions on how to improve it because it's difficult to know what people want. Recently I did some design research where I asked people to rate what's most important to them on the project detail page. That data was really, really interesting, and it's linked to in my slides. I'd like to continue that kind of work to really understand what people want and need from the index, so that we can improve the design in line with those needs.
We'd like to add two-factor authentication; there's currently a specification in progress for this. And we'd also like to improve our audit trails. Right now we have a project journal, basically, so we can say what happened on an individual project: someone was added as a collaborator, or a new release was made, or a release was deleted, for example. We collect things like IP address and date, and that's about it. What we'd really like to do is collect more information on sessions, et cetera, and on third parties that may be interacting with PyPI, so that if something goes wrong on the index we can see what happened, for our own sake, but also to expose that information to users, so you can keep an eye on your PyPI account to see if anything fishy is going on.
But my question is how we're going to get there, and I've got a few questions to put to the community about that. The first is: do we as a community value the service provided by PyPI, and how much? Would we be able to fill the 1.2 million dollar gap if our sponsors were no longer able to provide infrastructure donations? Do we care enough about the project to actually pay people to contribute to it? As Python continues to grow, is it sustainable to rely on a handful of people to maintain and improve PyPI? Now, I'm not underestimating the number of people who contribute to the project, but at its core there's a handful of people who are steering PyPI in its future direction. Now that the Mozilla Open Source Support grant is over, we sorely miss the time allocated to project management, community engagement and feature development. We have actually had 123 amazing contributors on the project, but it takes work to support all of those people.
How should PyPI evolve to meet the changing needs of our community? Obviously the Python community is growing: in its size, in its complexity, in the different ways that people are using Python. Computer science is also evolving, as is web development, and people expect different things from their services. So no doubt people are going to start to expect different things from the Python Package Index. Are we actually equipped to deal with this? And if not, are we prepared to allow commercial interests to fill that gap?
So how can you help? This is my call to action. First off, can you please verify your email address? I don't know if you know about this, but at the moment we're sending a lot of email to unverified email addresses, which means that our email can be classified as spam. That's not a great situation to be in, so if you've got an account on PyPI, please go and verify it. It takes two minutes. You can engage with us: distutils-sig is the mailing list, we've got an IRC channel, #pypa on Freenode, and you can engage with us on our issue trackers. It's not a really scary world. We really want to hear from you, and we really want you to engage with what we're doing. You can contribute to Warehouse on GitHub.
That's the address there. And you can sprint: this weekend I'm running a PyPI sprint at the EuroPython sprints, so I'd really, really love you to join me there. We've got a whole number of issues that have been specifically tagged as ready for the sprint, so I'd love to have your contribution. You can donate, or you can ask your company to donate, at donate.pypi.org. We've recently added support for recurring donations, and any recurring donation would be most welcome, because it would allow us to make long-term plans about how we allocate resources not only to PyPI but to other packaging projects. And you can thank our sponsors. Obviously we mentioned Mozilla earlier, who gave us 170,000 US dollars, which is great. We've also got all these infrastructure sponsors, so maybe if you're in charge of choosing who you're going to use in your organization, have a look at our sponsors page (it's linked to in the footer) and see who is actually supporting the Python community.
So that's it for me. Thank you.