Hello. Oh, welcome. Oh, wow, even more people. I'm not used to so many people. Usually, I just talk about Haiku and there are like 20 folks. So yes, you can actually script the web with Weboob.
Just a quote I found from Tim Berners-Lee, who said the web is about transmitting information to everyone, regardless of the platform. When he says information, it doesn't really mean the look, or how many pixels to the left or the right. It's just information. Except CEOs just want their website to look the same everywhere, to the pixel. So basically, they want Flash, not the web. Wonky.
So when you want to find data on the web, you need to open your browser, go to the URL, and wait for megabytes of HTML, JS, CSS, sometimes Flash (well, that's not really part of the web, but anyway) to load. And then maybe, when you scroll down the page, you can find something interesting. Some people know about Ctrl+F, but you need to know what you are searching for. So basically, you iterate until you find it. You copy-paste. You strip the spaces and whatever. But then it's a bit late, so you go to sleep and you don't profit from it.
So what if we could just do something with a URL, or an ID, or a pattern, or whatever: pipe, grep, pipe, cut, pipe, something, and just profit from it? Of course, there's curl and wget, but you just get the HTML without the CSS, the JSON. So it's just not the data. Sometimes the data is not even in the HTML code itself. It's sometimes inside a JSON file fetched by an AJAX call or whatever.
So here comes Weboob. It means Web Outside of Browsers. No, I didn't come up with the name, so if you have complaints, take them to the others. Every time we try to get a release into Debian, there are a lot of trolls about it, but I'm not responsible for this. Basically, you have many modules that can be instantiated as backends, so you can have the same module for two or three websites. And they implement capabilities, like bank, video, and whatnot.
It's written in Python, and there is a Python framework for it. There are command-line tools. There are also GUI tools with Qt. The framework, as I said, separates information into different capabilities: video, banking, messaging. Messaging can be mail, but also tickets on a Bugzilla or whatever. You can pass it either a URL or an ID, a unique ID at the backend. And of course, you can search for things. It also emulates a web browser for the module to get the data. So it does both HTTP calls, using the existing Python frameworks, and also parsers for HTML, XML, JSON, and even PDF, because sometimes you want to extract information from bills, and the bills you get are PDF files. And the web browser engine has been rewritten, so it's now even easier to use.
Here you've got some of the modules. There's a whole list on the website. As I said, tickets are basically messages. They are handled like emails, so you can actually write a gateway so that when you email something, it ends up in the bug tracker of your choice. And there are also French banks, maybe some others, I'm not sure. I don't really check. They are used by real companies to do real stuff: Budgea, for example, and Cozy Cloud has a module that uses Weboob to get banking information. That's something Cozy can do, because they are very small, and that Google really wouldn't even want, or even dream about doing, because everyone would go at them and say, oh no, you won't get my bank account info as well as my mail and everything else. There are also shipping providers, so you can actually use Weboob to track your shipping. And video, of course, even the Libre Software Meeting videos. I looked at the FOSDEM site, but it didn't really have a search field, so it wouldn't really be very useful. And then you can even search for jobs.
So to write a module in Weboob, you just need to provide a few files. Well, you just need to subclass the base module class. And then there's module, browser, and pages.
And you, of course, need to write a test, because websites break. Most of the icons in Weboob are really crappy on purpose, because we didn't want to infringe trademarks or whatever.
So yeah, let's try to make a module in Weboob. There's a tool called boilerplate, which actually writes the skeleton for you. You just pass it the name of the module and the capability you want, and it just writes it for you. Then you update the module database, and you can check the info of the module you just created. And by the way, if you don't want to install Weboob just to try it, you can actually use localrun.sh and pass it the Weboob commands, and it will just work, after a git checkout or whatever. So let's just have a look at it and fill in the blanks. And you can also, of course, look at all the other modules to see how they do things.
But first, as I said, sometimes the data is not in the HTML code, and the HTML code tends to change a lot, because, well, people don't like stable things. And it just really needs a lot of escaping, and parsing, and whatever. And sometimes the declared encoding is wrong. So before you try to parse the HTML, just make sure the data isn't already available nicely formatted as JSON or XML or whatever. And make sure you locate only the data you want and not some other junk.
So the module we just got just exports the module class itself. We just subclass the base module. And for a job module, we have three functions, three methods, to provide. The advanced search with separate fields is optional. On the website I targeted, there's not really any advanced search, so it's just empty, just not provided. And basically it just passes the calls on to the browser class.
The browser module, sorry. So it maps the different URLs you can have on the website to pages to parse. You can have a search page, a job page, whatever, and you provide it with regexes to filter the different URLs. And that's the code. So it also just forwards the request to the correct page class.
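The URL-to-page mapping described here can be sketched in plain Python. This is only a minimal illustration of the idea, not Weboob's actual API: the `ROUTES` table, the `page_for` helper, the page classes, and the `jobs.example` URLs are all hypothetical names made up for the example.

```python
import re


class Page:
    """Hypothetical base class: keeps what the URL regex captured."""

    def __init__(self, params):
        self.params = params


class SearchPage(Page):
    """Stands in for the class that would parse a list of results."""


class AdvertPage(Page):
    """Stands in for the class that would parse one job advert."""


# Each entry maps a URL regex to the page class that handles it.
# The site and its URL layout are assumptions for illustration only.
ROUTES = [
    (re.compile(r"https://jobs\.example/search\?q=(?P<pattern>[^&]+)$"), SearchPage),
    (re.compile(r"https://jobs\.example/job/(?P<id>\d+)$"), AdvertPage),
]


def page_for(url):
    """Return an instance of the first page class whose regex matches the URL."""
    for regex, page_class in ROUTES:
        match = regex.match(url)
        if match:
            return page_class(match.groupdict())
    raise ValueError("no page class handles %s" % url)


page = page_for("https://jobs.example/job/42")
print(type(page).__name__, page.params)
```

Keeping the regexes in one table means that when the website changes its URL layout, only the table has to be updated, not the parsing code.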
And there we go. So that's where the real work is. I'm not really entirely familiar with some of the Python syntax, like the @method and whatever. I dug into it a bit, but I will not really be able to explain to you exactly how it works. But basically the interesting part is below. You can use filters to say, I want the clean text of the title tag for this field and whatever, and it will just iterate over the page. So that's just to get the job advert. And for the search page, you also just iterate over the results and just say, take the anchor tags which have this class. And then for each item, you just take the link of the item we are on, apply a regexp, and take the result. So it's really a few lines and you've got the data you need. That's not really the full text, because there are some more fields, but at least you get a title, the description, and the ID.
So now we can actually test it. We should be able to find at least one Linux-related job on a site called Linux jobs. So we just ask the backend to search for jobs matching Linux, and we just make sure there's at least one.
So let's do it. We update the module database and we can query for a job by specifying this backend only, or else it would just go to every configured backend. Since it hasn't been configured yet, it will ask you if you want to do it. You can enter some fields and just run the search with some more options to debug and whatever. Then you can query the info for this ID at Linux jobs and then run the tests. And before you upstream your patches, please make sure the tests actually run and the other Python syntax checks are okay.
Now we have command-line tools. The command line is nice. It has an interactive interface, but sometimes you just want the data and you don't want to fiddle with the formatting. So we provide a large list of formats for you: CSV, JSON, whatever. So we can do, for example: hey, let's search for Python jobs on this website.
Put this in a CSV file, and because LibreOffice doesn't really like stdin, we just make a temporary file and then launch LibreOffice Calc with the proper CSV filter parameters to make sure it will read the CSV file correctly, because there are many different flavors. And then you just get a spreadsheet of all the Python jobs.
But you might also want to use the framework. So for example, here's what you would do to see what's in your bank account. You just import Weboob, and you import the capability for the bank type. You get an instance of Weboob, you load the backends for this capability, and you can iterate over the accounts and see the balance. So hopefully you've got some profit.
We can also do some other stuff. I showed you a job module. Well, you can actually say things like: let's load the backends for the job modules, and for each of these words, we will use this tag cloud library that I found. There are some others as well. And we'll just make a tag cloud of all these words, with a size depending on how many jobs we found. So we just count how many items are in the list when we iterate over the result of the job search for each of these words. Or one. The actual code has some more checks because, yeah, that's not enough. Because, well, the code was crashing inside SDL, so you have to have at least 10 as a size, but well. And then you create the tag cloud and it looks like this. So it's an interesting way to show what's inside the website.
We also now have a continuous integration system because, well, people love breaking their own websites. Who knows why? But because we rely on them for important data like bank account info, especially for real companies, we actually have to fix it real soon. So now we know when a module broke a test, and we can fix it.
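The counting step behind the tag cloud (one search per word, "or one" when nothing is found, and a minimum size of 10) can be sketched as follows. This is a self-contained illustration under stated assumptions, not the code from the talk: `search_job` is a hypothetical stub standing in for a real backend search, `FAKE_ADVERTS` is made-up data, and the `scale` factor is an arbitrary choice.

```python
from collections import namedtuple

Job = namedtuple("Job", ["id", "title"])

# Made-up adverts standing in for what a job backend would return.
FAKE_ADVERTS = [
    Job("1", "Python developer"),
    Job("2", "Linux sysadmin with Python skills"),
    Job("3", "Linux kernel engineer"),
]


def search_job(pattern):
    """Hypothetical stub: yield adverts whose title contains the pattern."""
    for job in FAKE_ADVERTS:
        if pattern.lower() in job.title.lower():
            yield job


def tag_sizes(words, minimum=10, scale=10):
    """Map each word to a tag size based on how many adverts match it."""
    sizes = {}
    for word in words:
        # Count the results, falling back to 1 when the search is empty.
        count = len(list(search_job(word))) or 1
        # Never go below the minimum size the renderer can cope with.
        sizes[word] = max(minimum, count * scale)
    return sizes


sizes = tag_sizes(["python", "linux", "haiku"])
print(sizes)
```

The resulting word-to-size mapping is what would then be handed to whichever tag cloud library does the drawing.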
So if you want to contribute, say you have a website that provides job offers and you want people to be able to use Weboob to search your website, you can now send us pull requests on our own GitLab server. If you are publishing a website, please, please, please stop breaking it every time. We also welcome donations to the French nonprofit that supports the project, and you can become a member of the association. There's also professional support available. So if you want to use Weboob for a professional system, you can have real assistance without having to dig yourself. And whoa, I was quite fast. So thank you. I hope you understood everything. And well, I welcome both your questions and your patches.
Questions? Yeah.
They do. I'm curious how you handle updating: downloading a package with your distro, which updates every six months, is a bit more.
Yeah, so the question was: you are quite optimistic about websites not breaking every two weeks, so how do you handle module updates with regard to distro packaging, which tends to actually, yeah, be sometimes several years behind? You've seen the Weboob update command. Now you can actually use Weboob with the updates that come in. It actually fetches directly from Git, if possible. So you can install Weboob yourself and get the modules updated regularly without relying on the package management system, which is a bit sad, because I don't really like having to go around the proper package management system, but for this kind of thing, you don't really have the choice. So there's another Git repository with only the official modules, which are maintained there.
Well, so what are the advantages of Weboob against, like, Scrapy? I don't really know Scrapy much. I've used things like youtube-dl or whatever for videos, but not really anything else. So I can't really say, sorry.
A bit louder, please. So are you able to handle content generated by JavaScript, like AJAX requests?
I don't know if we already have a module that does it. I think sometimes websites want to generate hashes and stuff like that on the client side to validate the input of the AJAX requests or something like that. I'm not really sure, I would have to check the modules, but I know some modules actually emulate AJAX requests to get JSON data instead of parsing ugly HTML code. Sorry?
So Stack Overflow, can you search Stack Overflow? Oh, so can you search Stack Overflow? I can tell you. Dev Weboob, STA... I don't have any module that starts with STA, so you can send a patch, please.
So is there a reference implementation for banking modules outside of French banks? I'm not actually sure that there are only French banks, but I only know about them. I didn't really check all the modules because it's quite a large list. But there are so many. I think some banks, like AXA Bank, are maybe international enough that they also handle the non-French versions, I'm not sure. So you'd probably just want to ask on IRC, maybe. By the way, yes, we do have a Weboob channel on Freenode if you want to join and ask questions I don't have the answer for yet.
More questions? Okay, well, thank you very much. Thank you a lot. And you will have, like all speakers, Belgian chocolates.