Ok, Thomas is also live. Yeah. So now we're... ok. So the next session is about accessing web APIs using Python. And the first thing that I was going to ask is: what is a web server? Yeah, that's a good question. Do you have an answer? Yes, because I was thinking about it yesterday and today, and I think I have an answer. So a server is a piece of software that runs on a computer, and the mission of the software is to serve: we can make requests to the server, and it will respond according to the request. This usually means that we ask for some data and it provides us with the data. And, confusingly, the computer that the software runs on is also called a server. And what makes a server a web server is that it is connected to the web, so it has a web address. When we ask it for something, we send the request for data to this web address. Does this make sense to you? Yeah, definitely. If you think about it, we are constantly making these kinds of requests, because we are updating the HackMD constantly, for example, and somehow the information gets to us. Something in the background in the browser has to make constant requests to the server to get this information to us. So the HackMD software is kind of a server, and every time we go to a website, the website runs on a server, and we ask it for the content of the website and get the response back as the website. So if we can use a web browser to do this, why would we use Python? Why not just use a web browser? So the idea is similar to the earlier talk about scripts: why would we use Python to get data from a server? Can I actually... I will crop the screen. Yeah, sure. Does it show okay? Yes. I will make it a little bigger, yeah, like this. So here is a web address.
And I will copy it and put it in the address bar, and then we go there. We can always do this when we want to make a request: put it in the web browser address bar and it will give us the... we will look at what this means in a while. But if we want to collect a lot of data, from different addresses, or over a long time, then we want to do it automatically. We don't want to do it manually and copy-paste the web addresses into the browser address bar. I think that's the gist of it: we want to write programs and we want to automate things. Does this make sense? Yes. What kind of tools does Python provide for us to do this? So, when we want to use Python, we will use the requests library, which is the go-to library, and it will provide us with basically everything we need. There is one more term, and that is the API. So what is a web API? What does it mean in relation to the web server? I think when web APIs are talked about, they are a way for a machine to talk with the server, instead of a human who has to look at the web page. An API is usually provided by the server side, to allow machines efficient access to the data on the server. Yeah, I think that's the correct answer. The unfortunate thing about it is that it's really abstract: an API is a machine interface. So we will look at some APIs now. I was thinking that maybe we just take an API, start making requests to it, and see what we get in response. Okay, here we have "retrieve data from an API". Because we don't have any particular API in mind, there are some websites that gather collections of APIs that we can use for developing and testing. This was one of those websites, and on it there was a cat fact API.
So we can ask for facts about cats. If we go here — I checked it a second ago and this link did work for me — it should give us the information about the API. What this means is that this is the web address of the server, I would say, and after that you get the API specification, which in this case is the fact API endpoint. So does the endpoint basically mean where our request should go? Yes. When we add the API endpoint to the web server address, the web server knows where to route the request. So let's send a request to this API from Python. I'll quickly note that if you're doing this at the same time, maybe don't all hit shift-enter at once, because we don't want to accidentally run a distributed denial of service attack on anybody. If there are hundreds of people making the request at the same time, I hope the cat facts don't go down. Yes. Good point. So don't be mean to them and make a lot of requests; if you make one request, that's one fact, and that's well enough. So I would start by importing the requests library. Then let's take the address — the URL — of the API. And in order to tell the server "give me data", we use the get function. We want to have a response, so we use the get function to ask for the data. If I now run it, let's see what happens. So, there's something in the response. Yes, something happened. If we display the response — well, what does this mean? It means that the response status code is 200, which in general means success. Everything starting with four would be an error, like 404. So now we have the response object.
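The status-code convention mentioned here (200 means success, codes starting with four mean errors) can be sketched as a small helper. This is just an illustration of the HTTP status classes, not part of the requests library; with requests you read the code from `response.status_code`:

```python
def describe_status(code: int) -> str:
    """Rough mapping of HTTP status codes to their classes."""
    if 200 <= code < 300:
        return "success"        # e.g. 200 OK
    if 400 <= code < 500:
        return "client error"   # e.g. 404 Not Found
    if 500 <= code < 600:
        return "server error"   # e.g. 503 Service Unavailable
    return "other"              # informational, redirects, ...

print(describe_status(200))  # success
print(describe_status(404))  # client error
```

Strictly speaking, the 4xx codes are client errors and the 5xx codes are server errors; both mean the request did not succeed.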
And if we want to actually see what's inside it, we can use the response content. Okay, now it looks much more like a cat fact. So how would you interpret this? Well, based on what I know about Python, the b at the start means that it's a bytes object. So it's some sort of byte string, but we can still print it as a string. Yeah, so it's a string, and to me it looks very much like JSON. It's a hash, and it has two fields, fact and length. fact is a string and length is an integer. But you're right, the b means that it's binary. In order to get it into a nice non-binary format, we can use the built-in json method, which decodes the binary format for us. We can do it like this — let's see if everything goes well. Yes, and now it's very clean and nice. And here we notice, like Teemu said, that the length is an integer, even though it came in a byte string, because in JSON you can have these kinds of types already defined in the JSON itself. You don't need a type annotation after each value. So JSON itself, even in this binary format, can contain all kinds of data, and once you interpret it as JSON, you can turn the values into Python integers, Python strings, Python dictionaries, and so on. Yes. That's the very basic function of the requests library. What do you think, should we go to the exercises now, or should we make another example request, with parameters? I think one more example request would be nice. So there is another example here, where the server is universities.hipolabs.com, and it has a search endpoint, and we can give it parameters to specify the search.
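The byte string and its decoding can also be reproduced with the standard library. This sketch uses a made-up cat fact shaped like the API response; `response.json()` in requests does the same decoding for you:

```python
import json

# A byte string shaped like the API response (the fact itself is made up)
raw = b'{"fact": "Cats sleep for most of the day.", "length": 31}'

data = json.loads(raw)          # decode bytes -> Python dictionary
print(type(data))               # <class 'dict'>
print(data["fact"])             # a plain Python string
print(data["length"] + 1)       # a plain Python integer, so arithmetic works
```

After decoding, the values carry their JSON types: strings become `str`, numbers become `int` or `float`, objects become dictionaries.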
With the cat facts, we didn't specify anything; we just wanted a fact, so it gave us a random fact. But now we can specify the query further: we can give it a country parameter with the value Finland or Sweden or whatever. So let's see what happens. Should I scroll this? Maybe a little bit. Yeah. So let's get the response object again. Are there other types of requests besides get requests? Yes. Now we are using get because we are asking for data: send us data. But there is also, for example, post, which means that we have some data that we wish the server to receive. I think that's one way to explain it. There are a few of these verbs, and which one we use depends on what the API is meant to do. Yeah, it's good to remember that if you have something you want to store somewhere on a web server, or you're using databases or whatever, you might end up using requests even if you're not scraping or querying websites. If you want to store your data somewhere, you might use all kinds of APIs for that. So, about this URL: if we break it down, here we have the server part or portion, then we have the API endpoint, and then we have this question mark, which means that after the question mark we have parameters. So the parameter country has to be equal to Finland. And then when we send the request, we should get a response and we can display the content. Oh, we have a lot of stuff. Are there really so many universities in Finland? I guess so, because the country is Finland everywhere. Well, that's nice. So again, it looks like JSON. It's a list of hashes — dictionaries in Python — and each hash has the fields country, domains, web pages, country code, and the name of the university.
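The difference between the verbs can be seen without hitting any server by building a request and inspecting it before sending. This is only a sketch: the URL below is a placeholder, not a real API, and in real code you would simply call `requests.post(url, json=...)`:

```python
import requests

# Build (but do not send) a POST request carrying JSON data.
# https://example.org/api/items is a placeholder URL, not a real API.
req = requests.Request(
    "POST",
    "https://example.org/api/items",
    json={"name": "test item"},
)
prepared = req.prepare()

print(prepared.method)   # POST
print(prepared.url)      # https://example.org/api/items
print(prepared.body)     # the JSON payload the server would receive
```

A GET asks the server for data; a POST sends data along in the request body for the server to receive.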
Because it is JSON, we can actually... okay, now it looks much better. And we can see that it converted into a really complex Python object, where we have nesting: a list of dictionaries that contain strings, that contain lists, that contain lists that contain strings — a really nested structure, which is very common in these responses. This is where having a format such as JSON is really helpful, because it can represent these kinds of nested structures. And now, if we go back to the URL, or the web address: what is interesting about web APIs is that we can always give the parameters like this, writing them manually into the URL. That might be fine if we have one parameter, but if we have, let's say, dozens of parameters, it becomes really tedious to write the URLs, and it also becomes error prone, in my opinion. Yes, especially if we have special characters. How would we add a space there, or an ampersand, or some reserved character like a slash? How would we represent a slash in the request? I have no idea, and I don't want to learn it. I will clear the output for now. Luckily, the requests library gives us a nicer way to format this. Maybe I will write it below. We just take the server address, and then we give the parameters as a dictionary. I think we need to have the API endpoint there as well, the search. Yep, good — yeah, it wouldn't have worked otherwise. Then we say that we are interested in Finland. Now we make the request again, with the API endpoint and — I think it was — params. Yes. And then it should be the same response as we had here. Let's see, everything looks okay. Well, nothing will break if I... okay, yes, works fine.
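What requests does with the params dictionary under the hood is percent-encoding, which the standard library also exposes. This sketch (with made-up parameter values) shows why hand-writing URLs with spaces, ampersands, or slashes is error prone:

```python
from urllib.parse import urlencode

# Parameter values containing characters that are unsafe to type
# into a URL by hand (the values here are made up for illustration)
params = {"country": "Finland", "name": "Aalto & Helsinki / Espoo"}

query = urlencode(params)
print(query)
# country=Finland&name=Aalto+%26+Helsinki+%2F+Espoo
```

The space, ampersand, and slash are all escaped automatically; with `requests.get(url, params=params)` this happens for you.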
So we get all the universities in Finland. The idea here is that this is a cleaner, less error-prone manner, and the requests library takes care of, like you said, all the special characters and those kinds of issues, so we don't have to worry about them. Yeah, I was talking about this yesterday. I've mainly been using either — many APIs provide some sort of "click here to get the data set you want" — or you can hard-code the URL of what you want, or you can use something like curl to do it from the command line. But I was a bit mind-blown when I realized how easy this is to do with requests, because all of that stuff related to clicking through the APIs or writing things into the URL is so much extra work that is completely unnecessary. I felt a bit stupid that I hadn't used requests much more; it's so easy to fill that stuff in. Yes, and also, because it's in Python already, when you get the response you probably want to do some processing on it, and you're already in Python. Python is a very good tool for processing or analyzing the data, so it's nice that you can do the data collection all in Python and you don't have to switch environments or tools. So I think next would be some exercises. In the exercises, there is a little more making requests, and a different kind of parameter called headers, which are like meta parameters. And there is a little bit of web scraping as well: requesting HTML instead of getting JSON.
Yeah, about headers quickly: think about when you log into a Google account or something. You log into your account, you go to a web page, and they see that you're authenticated. That's because the request contained a cookie or something similar that carried the proof that you were you, and then the web server decided to serve the material that you were supposed to see. So these headers usually contain things like authentication tokens — the kind of stuff you don't want to put into the URL bar. You don't want your authentication token in the URL; instead, you send it alongside the overall request. Yeah. So let's take ten minutes for the exercises, and we'll be back at five to. So bye for now. ... Okay, we're live again. There was some really good discussion and some questions in the HackMD again. Where should we start? Well, we can probably start with the security question: is it safe to get data via web APIs and web requests? Yeah, I think you just said it a minute ago, that was really nice — how did you put it? Yeah, my first idea was that if you go to a random web page whose URL someone has given you, are you willing to click all of the links and execute whatever they say? Usually no. And download everything? Yeah, usually you don't want to download everything that the website serves. And I think it's similar with web APIs, because like we said previously, we had a b at the front of the JSON, so it was a binary object. It could contain whatever executable code and stuff inside it.
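Passing an authentication token as a header rather than in the URL might look like the sketch below. The URL and token are placeholders; with a real API you would pass the same `headers=` argument to `requests.get`:

```python
import requests

# Placeholder URL and token -- substitute your real API and credentials
url = "https://example.org/api/me"
headers = {"Authorization": "Bearer my-secret-token"}

# Build the request without sending it, to inspect where the header ends up
prepared = requests.Request("GET", url, headers=headers).prepare()

print(prepared.url)                       # the token is NOT in the URL
print(prepared.headers["Authorization"])  # it travels alongside the request
```

Keeping the token out of the URL means it won't end up in browser history, server access logs, or shared links.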
It could have been arbitrary code in there — Python code, say, because it's a string, and whatever interprets it could in principle treat it as "erase my hard drive" or something like that, I don't know. Yeah. In general, when we are using APIs, we have some kind of task at hand. For example, some social scientist needs data from a service, and the service has its own API, so they make requests to it: can I have some data? So you kind of trust the API source. Yes. And in that situation, it's also important to use HTTPS, similarly to any other webpage. If you don't have the s at the end of the URL, and you don't have the lock icon in your browser, it can mean that somebody can spoof the website. If somebody manages to get hold of the domain for a second and moves the API, the request goes to some attacker's API instead, and you wouldn't know it if you're not using HTTPS. With HTTPS, requests will give you an error if the certificate the server provides doesn't match what is expected — basically, if you're talking to the wrong party, someone you don't trust. So you should always use the secure layer if possible. Yeah, that's a very good point: always use HTTPS unless there is a very, very good reason not to. Another question in the HackMD, mentioned by Enrico, was about the ethics of web scraping. You have done a lot of web scraping yourself — have you encountered this? There are at least two important points about the ethics of scraping. The first is what you said at the beginning: don't make too many requests to the server in a short time span, because that is basically a denial of service attack. You are suffocating the server, and it may crash if you hit it with too many requests in a short period of time.
So usually, when you make requests, take a five- or ten-second break between requests, so the server can keep up. That's the kind of thing you need to think about when you're doing scraping. The other point is the actual data that you gather or collect: it might, and probably does, contain some sensitive or personal information. So you have to be very careful, and you have to have a data management plan: what to do with the data, how to store it, and whether you are allowed to distribute it at all. This is a topic that Enrico, for example, is very good at and knows a lot about, so maybe this should be another topic in this kind of course, or another course. Yes. Also, in many cases, if you're creating a data set using web scraping, you can end up including illegal material in the data set. Many machine learning data sets, for example, have been removed from the internet because the authors cannot be certain whether the data sets contain illegal material, or material protected by copyright, or just vile stuff that you can find on the internet. So you should be mindful of that as well: the internet is a big place, and you can find all kinds of stuff in there, so be careful when you're doing web scraping. We're almost — we're a bit out of time now. Are there any last words? This could be a whole other topic. Yes. So: the requests library, very useful, and the requests documentation is very good. And web APIs are meant to be used, so they always have documentation; you can always go there, and they probably even have some code examples for how to get started. And in general, whenever you make a request, remember that the data that comes back is basically text or binary data, and how you interpret it is up to you.
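The advice above to pause between requests can be wrapped in a small helper. This is a sketch: `polite_fetch` and its arguments are made up for illustration, and the delay (five to ten seconds, as suggested) and the fetch function are up to you:

```python
import time

def polite_fetch(urls, fetch, delay=5.0):
    """Call fetch(url) for each URL, sleeping `delay` seconds between
    requests so we don't flood the server."""
    results = []
    for i, url in enumerate(urls):
        if i > 0:               # no need to wait before the first request
            time.sleep(delay)
        results.append(fetch(url))
    return results

# Example with a stand-in fetch function (use requests.get in real code)
facts = polite_fetch(["url1", "url2"], fetch=str.upper, delay=0.1)
print(facts)  # ['URL1', 'URL2']
```

In real use you would pass something like `fetch=lambda u: requests.get(u).json()` and a delay of several seconds.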
So you can use tools like Beautiful Soup to interpret it as an HTML file, or you can read it as JSON. You can do all kinds of things with the response; it depends on what the API provides you with. Then you can convert it into whatever Python object you want and do your analysis on it. It might be a tar file; you might download a binary zip or something like that, and the response would be the binary file, and then you can open it and see what's inside. There are plenty of things that you might download over the internet via requests. Okay, sorry, we went over time again. Great. So we have a break for ten minutes, and then we resume with the parallel session, the last talk of today. We'll keep answering questions in the HackMD. So enjoy your break. Bye.