Hello everybody, welcome back after lunch. We have Tushar and Philippe with us, and they'll be talking about FetchCode. Before we get into the session: hello guys, how are you doing? I hope you had a good lunch. I think you're muted.

Hello, yes.

Cool, nice. Could you give us a brief intro on how you got onto the project?

This was a Google Summer of Code project. AboutCode was a mentoring organization, and I had actually been involved since around May 2019. Then, around September or October, he proposed that we should do this project, so that's when we got onto it.

Interesting.

On my side, I'm Philippe, the lead maintainer for this project called AboutCode, which is an umbrella project for many tools. We do it all in Python and it's open source, of course. It's there to help analyze code. We've been part of Google Summer of Code for a long time as a mentoring organization, and Tushar's project was selected out of, I think, about 110 submissions at the time, together with other students who were part of the Summer of Code with us. Now you may be wondering: why would you care about having a tool to download things? It's super easy. Actually, in practice, it's a bit more complicated than that when you want to download any kind of thing, and that's what FetchCode is about.

Interesting. This will be really nice to know and a great talk to have, so let's get right to it. I'll just host the PPT as well, and you guys can start.

Okay, so hi everyone. My name is Tushar Goyal.

And hi, my name is Philippe Ombredanne.

We will talk about FetchCode, a smart code downloader. It's a project that was made during Google Summer of Code 2020. So let's start. We will discuss each item in the outline in detail.
All right. Before we talk about FetchCode, there are two or three things we want to talk about to set the stage. The first one is something called package URLs. Package URL is a project that has been adopted by many as a way to identify packages. It's a simple problem, but every package manager and platform uses a different way to identify packages, and we want to be able not only to identify but also to locate and eventually provision, meaning download, a software package. To give a stupid example, think about the package name file. It's a very common name. It could be a package on PyPI; there's one package called file. It could be file, which is libmagic and the command-line utility used on Linux, named file. It could be file as a Node package on npm, or many different incarnations of that thing. So it's hard to know which package manager it belongs to. Tushar, next.

If you think about a couple of examples, that would be an example. It's a super simple syntax. We have a pkg prefix that we use as a scheme, as if it were HTTP, then the type, which is the ecosystem, so PyPI or npm and so on, and then the name and a version. It's meant to be super simple and obvious, and it is obvious in most cases. If you go next: so the solution really is to have a package URL, which can be used to identify a package of any type. That's the formal definition. It has the pkg prefix.
It's kind of unique, and we can eventually register it as an official scheme for URLs at IANA, so that's part of the plan. It has a type, which is mandatory and tells what kind of package it is; pypi is an example of a type, npm is another one. It must have at least a name. Everything else is optional. If you go back to the previous slide for a second: in the case of a Django package, the type is pypi, the name is django and the version is 3.1. With this little bit of information you can infer many different things: where to fetch the downloads, what the version is. You can call the API from PyPI to get much more information, such as descriptions and newer and older versions. You could also use it to look up a database that has information about Django, in particular a security database that may have information about the vulnerabilities that may exist in Django. If you go next, and next.

So that's it for package URLs, and it's pretty important because it's a library that's used in ScanCode Toolkit, which is another tool, to identify all the packages that can be detected. ScanCode is a tool to find the origin of code and its license. It's really the industry standard for license detection in particular, but it also has the ability to detect different package manifests.
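The package URL syntax described above (the pkg prefix, a mandatory type, a mandatory name, and an optional version) can be sketched in a few lines of Python. This is only a simplified illustration using the standard library, not the parser from the actual Package URL library; the names `SimplePurl` and `parse_simple_purl` are made up for this example, and qualifiers and subpaths are simply dropped.

```python
from typing import NamedTuple, Optional
from urllib.parse import unquote


class SimplePurl(NamedTuple):
    """Hypothetical, simplified view of a package URL."""
    type: str
    namespace: Optional[str]
    name: str
    version: Optional[str]


def parse_simple_purl(purl: str) -> SimplePurl:
    """Parse strings such as 'pkg:pypi/django@3.1' (simplified sketch)."""
    scheme, sep, rest = purl.partition(":")
    if not sep or scheme.lower() != "pkg":
        raise ValueError(f"not a package URL: {purl!r}")

    # This sketch ignores the optional '#subpath' and '?qualifiers' parts.
    rest = rest.split("#", 1)[0].split("?", 1)[0].strip("/")

    # The version, when present, follows the last '@'.
    if "@" in rest:
        path, version = rest.rsplit("@", 1)
    else:
        path, version = rest, None

    segments = path.split("/")
    if len(segments) < 2 or not all(segments):
        raise ValueError(f"a type and a name are required: {purl!r}")

    pkg_type = segments[0].lower()
    name = unquote(segments[-1])
    # Anything between the type and the name is treated as a namespace.
    namespace = "/".join(unquote(s) for s in segments[1:-1]) or None
    return SimplePurl(pkg_type, namespace, name, unquote(version) if version else None)


django = parse_simple_purl("pkg:pypi/django@3.1")
print(django.type, django.name, django.version)
```

With the type, name and version separated like this, a tool can decide which registry to query for the package, which is the kind of inference mentioned for the Django example.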
It's able to parse a setup.py file using the AST; requirements.txt, which is a bit simpler but not entirely trivial; RubyGems, both Gemfile and gemspecs; and hundreds of different package formats. It's able to normalize that and assign a package URL to each of them. Tushar, next. That's done with the library within ScanCode called packagedcode. That's where you have effectively all these different utilities and code to correctly normalize all the different package metadata from many different sources in one place. Think, for instance, about a Debian package versus a Node package. Conceptually they're very similar: you have an archive, you have a name and a version, but technically they all use different minute variations on how this metadata exists. Maybe one package will talk about the description and another one will talk about the summary; both of these fields may be the description. So the whole purpose of packagedcode is to normalize that, so that we can talk about the same thing across all these different packages. Everything there is identified by a package URL. And eventually what FetchCode does, and that leads us to FetchCode, is deal both with package URLs and with URLs. Next, and that's up to you.

Yeah, so let's get on to FetchCode. FetchCode is a library. What does FetchCode do? You can give it a URL or any package URL and it will fetch it for you. We support a few types of URLs for now: HTTP, FTP and version control systems. To download, the HTTP and FTP parts are handled by requests and ftplib. For version control system URLs, we have used pip's code; we have vendored pip's code. pip provides an API, but it is not a public one and it's not easy to use, so we had to vendor the pip code.
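The three kinds of URL mentioned here (HTTP, FTP and version control) can be told apart by looking at the URL scheme. The sketch below is an illustration only and is not FetchCode's actual code: the function names and the list of VCS prefixes are assumptions, and the `git+https://...` form is the pip-style VCS URL syntax, where the tool name is joined to the transport with a plus sign.

```python
from urllib.parse import urlparse

# Assumed set of VCS prefixes for this sketch (pip-style "vcs+transport" URLs).
VCS_PREFIXES = ("git", "hg", "svn", "bzr")


def classify_url(url: str) -> str:
    """Return 'http', 'ftp' or 'vcs' depending on the URL scheme."""
    scheme = urlparse(url).scheme.lower()
    if scheme in ("http", "https"):
        return "http"
    if scheme == "ftp":
        return "ftp"
    vcs, plus, _transport = scheme.partition("+")
    if plus and vcs in VCS_PREFIXES:
        return "vcs"
    raise ValueError(f"unsupported URL scheme: {url!r}")


def split_vcs_url(url: str) -> tuple:
    """Split 'git+https://host/repo.git' into ('git', 'https://host/repo.git')."""
    if classify_url(url) != "vcs":
        raise ValueError(f"not a VCS URL: {url!r}")
    vcs, _, plain_url = url.partition("+")
    return vcs.lower(), plain_url


# The repository URL is only an example input.
print(classify_url("git+https://github.com/nexB/fetchcode.git"))
print(split_vcs_url("git+https://github.com/nexB/fetchcode.git"))
```

In a real downloader each branch would hand off to a different backend: an HTTP client, an FTP client, or the VCS-specific code.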
We will talk about it in detail shortly. FetchCode also does this: as Philippe said earlier, you can give any package URL or URL as input and it will get all the rich metadata for you. packagedcode was doing that by scanning the codebase; we get it by reading the URL and making calls to the respective package registry APIs.

So what were the problems we faced with version control systems? As we all know, most of us use Git for downloads; you directly do git clone and get the code. But there are still many other version control systems, like SVN, Mercurial and Bazaar. pip provides code for downloading from any kind of version-control-system-based URL. If you have tried it, you can directly run pip install, feed it the URL, and it will download it for you. pip is used by many people, so what we did here is vendor pip's code and make a wrapper function around it that can do that easily. pip provides all the functions; we took note of those functions, took the backend functions for all types of VCS-based URLs, and made a wrapper function around them.

The next step is URL to package. It's the thing I was talking about. If we look at this URL itself, it has github.com, slash nexB, slash scancode-toolkit, slash, and then the version, 3.2.0 rc1. What we can see from it is that the host is GitHub; the organization or the person who made this codebase is nexB; the codebase name is scancode-toolkit; and the current version is 3.2.0 rc1.
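Reading the host, owner, repository name and version out of a GitHub URL, as just described, could look roughly like the sketch below. It is a simplified illustration, not the real URL-to-package-URL code: the function name is hypothetical, the exact URL from the slide is not reproduced, and the `/tree/<ref>` and `/releases/tag/<ref>` path layouts handled here are assumptions about what such URLs look like.

```python
from typing import Optional
from urllib.parse import urlparse


def github_url_to_purl(url: str) -> Optional[str]:
    """Build a 'pkg:github/...' string from a GitHub URL, or return None."""
    parsed = urlparse(url)
    if parsed.netloc.lower() not in ("github.com", "www.github.com"):
        return None

    segments = [s for s in parsed.path.split("/") if s]
    if len(segments) < 2:
        return None  # no owner/repository pair in the path

    owner, repo = segments[0].lower(), segments[1].lower()
    purl = f"pkg:github/{owner}/{repo}"

    # Assumed layouts: .../tree/<ref> and .../releases/tag/<ref>
    version = None
    if len(segments) >= 4 and segments[2] == "tree":
        version = segments[3]
    elif len(segments) >= 5 and segments[2:4] == ["releases", "tag"]:
        version = segments[4]

    if version:
        purl += f"@{version}"
    return purl


# Illustrative URL only; the tag name is an assumption.
print(github_url_to_purl("https://github.com/nexB/scancode-toolkit/tree/v3.2.0rc1"))
```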
So package URL does all this stuff for you. Now, what are we doing in FetchCode? From the URL alone you are just able to get these few components. But there are many more components to a codebase, like its code view URL, its download URL and its licenses. So in FetchCode you can give it any URL or package URL, and it will get all the licenses, the code view URL, the download URL and the many types of URLs that packagedcode gives, by hitting the package registry APIs. For example, if you give it pkg:npm/foobar@2.3.0, it will take foobar, hit the npm registry and get all the data; then the data is parsed and provided as a packagedcode model.

Currently the library hasn't been released; we are working on the release part of it. We also don't support many APIs yet. We have support for six APIs so far, if I remember correctly: npm, RubyGems, PyPI, Cargo, GitHub and Bitbucket. But we want to add support for more APIs, since there are many APIs that handle packages, and we also want to include package indexes. We also want FetchCode to be able to download Docker images. Currently we have done HTTP, FTP and VCS, so in the future we also want to be able to download Docker images.

And the last point is a very good point: DependentCode. DependentCode is a new project for us right now. What will DependentCode be? You give it a codebase, and every codebase has many dependencies, for example in a requirements.txt or a setup.py. DependentCode is a dependency resolver.
We haven't built it yet; it's just an idea, but we'll start working on it, and it will be a major user of FetchCode as well. It will be like a universal package manager. For example, a codebase can have both a package.json and a requirements.txt. If you run npm install, it gets all the npm packages for you, but the Python packages are still remaining. So DependentCode will be a universal package manager.

Thank you. You can check out the FetchCode project. You can also check out the chat channel, the Gitter channel, where we talk about it and about AboutCode.

Yeah, so that's it. If you have questions, we've reserved a few minutes for them. The other thing is that this project, together with the other AboutCode projects, will also be part of Google Summer of Code 2021. There's a lot of interesting stuff there and a friendly, pretty active community, so you're welcome to join if you're interested, and you'll always be nicely welcomed.

Thank you, Tushar and Philippe, for the nice talk. I think we have a few questions; let me ask them here. One of the audience members is asking what kind of problems we could face. Are there any challenges in using it? From a stability perspective, what stage is this project at?

FetchCode is still under development. We just started it in GSoC 2020, so the project has just begun and still has a long way to go. It's not production-ready yet, but we are working on getting it ready soon.

One of the problems that Tushar highlighted is the difficulty of reusing pip as a library, because there's no API in pip.
It doesn't expose any documented API, so there's no guarantee that the function you use today will be there tomorrow. Because of that, we had to vendor it. That means that if there's a bug fix in pip, it's not going to be as simple as updating the requirements file to the latest version, or just doing a pip install --upgrade pip to get the latest version. That's not enough. Because our code vendors pip itself, we have to actually copy pip's latest version and redo the changes. So that's a small difficulty with the vendoring approach, but in the case of pip that's the only way we can guarantee that the code doesn't break tomorrow with a new version of pip, because it could otherwise break at any time. On the other hand, you could say: well, why reuse pip then? It's pretty unique in its ability to support Git, Subversion, Mercurial and other version control systems with the common, clean syntax that's used to express these URLs, and it's been tested on hundreds of millions of downloads, so it's pretty robust.

Thank you. The next question is: how do you plan to resolve dependencies for packages?

Well, I can talk about that. We now have this mechanism for naming and identifying dependencies with package URLs, so that's the base. We have a way to parse package manifests, so for each package you have its dependencies expressed in the same format, and you eventually have version constraints for each of them. To resolve dependencies, there are really a couple of strategies. The simplest one is to say: I have a version that's required, or version constraints.
I'm going to use FetchCode to go to the package type API, that is, the package repository, to get all the versions of that package. And I could take a stupid approach which says: always give me the latest version. That's going to be wrong sometimes, but it's not going to be completely bad all the time. Another approach is to use a dependency resolver. pip, for instance, is starting to use, any time now, a new library called resolvelib, which provides dependency resolution. We could use that library across all the package types, or we could use another library called libsolv, which is part of the RPM package management tools and has a Python binding too, and which provides a bit more complex resolution. Another tool called conda, which is popular in data science for package management, uses something called PicoSAT, which is a constraint solver that uses propositional logic and more complex math to actually do package resolution and satisfy many complex dependencies together. So I think we can do a bit of that, using any of these strategies. The last strategy would be to use the actual package management tool of every package type, but that can be a bit daunting, because you would need to have all the package management tools installed if you want to resolve something from Homebrew, and then maybe from RPM and PyPI and Ruby. It would be a bit messy to have all of that. So we're trying to do something which would be a bit more portable.

Thank you, Philippe.

Yeah, thanks for this. I briefly had some technical difficulties; okay, I'm back now. I heard a bit about why FetchCode would be better than using something like conda. But what about something like Pipfile?
So Pipfile is one of the many package manifests supported by ScanCode. And don't get me wrong, FetchCode and DependentCode are not intended to replace conda, or replace pip or any of that. That's not the goal. It's more this: say you have a project and you want to get an idea or a feel for what the dependencies would be, without having to run all the tools. Maybe you have system packages, so you need to have these and these RPMs installed if you're using CentOS, or you need these and these Debian packages if you're on Debian. If you're on Windows you need to install these three NuGets. And you have these packages, and you have a bunch of Node packages for the UI. That's very common; pretty much every project that's a bit fleshed out has system dependencies and two or three different package managers. So the whole idea is to ask how it is possible to put our arms around it, without running around all these different settings, and to try to find a common language. We're not trying to replace them at all. It's more about being able to collect information; we'll never be doing package installation, for instance.

Ah, yeah, that really makes sense, because a lot of the time it's not just about having all the packages, right? It's also about having the packages in the right environment at the right time.

So yeah, think about all the READMEs you have on any Python project, which say: okay, for setup, if you are on Windows you need to install this and this and that; if you're on Mac do that; if you're on Linux do that. Oh, but if you're on RPM-based Linux do this, on Debian do that; on Ubuntu it's almost like Debian, but you still need to do that. We all write this documentation all the time, right? It's a pain.
The idea there is to find a way to have something like a meta package manifest in the end, which says: run this, and it will start to install the things you need on whatever your environment is, trying to be smart about it.

There's also another question from the audience: how do you extensively test the software before releasing it to production? This is a little off topic, but I think this is a very important question.

Tushar, how much did I bug you on tests?

Philippe is a little bit picky about tests. If you look at FetchCode, there is a lot of mocking of data. We have to download URLs, and you cannot write test cases that use network endpoints, so you have to mock them. For the unit tests we took the mock class and mocked all the data. It was a huge amount of data to be mocked. We mock the data and store it in files, then use it in place of the function. Say I am fetching this URL: I give it the URL, it gets the data that has been mocked, and then we match it against the data obtained through the library. So we tested it that way. For VCS, we mocked the download function. In VCS what we were getting is which type of VCS the URL belongs to, like Git, Bazaar or SVN; which host it is coming from, like GitHub, Bitbucket or any other; the download size for that URL; and other things like that.

Yeah, so I think the whole point is that, eventually, you need to mock, you need to test the functionality.
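The mocking approach described here, replacing the network call with canned data so the test never touches the network, can be sketched with the standard library's unittest.mock. This is a minimal illustration and not FetchCode's actual test suite: `fetch_json`, the URL and the canned payload are all hypothetical, and the canned data is kept in memory here, whereas the talk describes storing it in files.

```python
import json
import urllib.request
from unittest import mock


def fetch_json(url):
    """Download a URL and decode the response body as JSON."""
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def check_with_mocked_network():
    """Call fetch_json() with urlopen replaced by a mock returning canned data."""
    url = "https://registry.example.invalid/foobar"  # hypothetical endpoint
    canned = json.dumps({"name": "foobar", "version": "2.3.0"}).encode("utf-8")

    with mock.patch("urllib.request.urlopen") as fake_urlopen:
        # urlopen() is used as a context manager, so configure __enter__.
        fake_response = fake_urlopen.return_value.__enter__.return_value
        fake_response.read.return_value = canned
        data = fetch_json(url)

    # The code under test asked for the expected URL exactly once.
    fake_urlopen.assert_called_once_with(url)
    return data


print(check_with_mocked_network())
```

Because the patch targets `urllib.request.urlopen` itself, the function under test does not need to change for the test to run offline.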
So maybe you could use mocks, maybe things like MagicMock or mock patches, and I think that is precisely what Tushar was getting at. Yeah, makes sense.

The other thing, on how you extensively test the software: there's actually more code for testing than there is actual code for doing the real work in the FetchCode library. But that's common in most of the libraries we have; there's more code for testing than there is actual code in the library itself.

Yeah, I think that's how it should be, because you've got to test a whole lot of edge cases and many more different cases, even do stress testing and all of that, before you can go to production. That covers the question part.

And now, as we're scaling things up, there's a new tool called ScanCode.io, which is able to run scans on code using ScanCode, and we eventually want to scan everything: everything from PyPI, everything from RubyGems, everything from Node and more. That will mean that FetchCode will have to download each and every thing there successfully. So when we start doing that, there'll be pretty extensive testing, and not mocked this time. It's really about doing live downloads, on potentially hundreds of millions of downloads.

Yeah, thanks a lot for walking us through all of this. It was certainly a really nice session on FetchCode and how it could be used. Thanks a lot again, Philippe and Tushar, for joining us.

Thank you very much for having us.