Welcome everybody to the latest ANDS, Nectar and RDS webinar. My name is Susanna Sabine from the Australian National Data Service, and I'm your host for today. It's my pleasure to introduce Martin Schweitzer, who, if you've seen any of our visualisation webinars, has helped us a few times. Martin is a data technologist with ANDS in the Melbourne office. He has a background in computer science and a particular interest in visualisation, data science and user interface design. That background includes working on large IT systems and lecturing, as well as running training courses and workshops. Martin is currently seconded to us from the Bureau of Meteorology, where he's responsible for managing the climate record of Australia. But today he's talking about something which I knew nothing about and hope to learn a lot more about: APIs without the jargon. So, Martin.

Thanks very much, Susanna. Good afternoon everyone. Today, among other things, we're going to be trying some things live, so we also have to keep our fingers crossed when we do that.

So, web APIs: what are they? Why are they useful? Why should we be interested in them? Several months ago now, one of the scientists at work came to me and said, "Can you explain to me, I've been hearing all this stuff about web APIs. What exactly is a web API?" I knew that she was very bright. I didn't have to dumb anything down, but she wasn't that interested in the IT part. She really wanted to know what's useful. So I had to think about how to describe what a web API is to somebody who may not be technical on the programming side. And the definition I came up with was this: it is a web page that has been optimised to be read by a computer rather than by a human. I think that really encompasses what web APIs are all about, and I've used this definition for quite a while. Maybe it's not 100% perfect, but it's 99% perfect.
And so if you want to know what a web API is, that's it. The rest of the talk will really be elaborating on this idea, giving some examples of how we use them, why we use them, and so on.

I'll start with a little use case. Every morning I like to wake up half an hour before the sun rises. And of course the sun rises at a different time every day. So if I want to know what time the sun rises, one of the ways I can find out is to visit this web page. I'll just open it. And here we go. This is the Norwegian weather service. It shows me the weather for Melbourne. But down here we see sunrise is 7:18. So this morning the sun rose at 7:18.

That's great for me. But being a software developer, the next thing I want to do is write a little program that's going to, let's say, send me a tweet or an SMS half an hour before the sun rises. In other words, wake me up or give me a heads-up that the sun is about to rise in half an hour. If I want to do that, one way is to say: OK, go to this website and look for what time the sun rises. We can actually view the source of this page, and if we do, there's a whole lot of stuff there. I'll just go back to the slides because it's a little bit clearer there.

What we see is that somewhere on that page there's the following text. I prepared the slide a while back, so that's when the sun was still rising at 6:44, not 7:18. But we see this text. Once we know that the web page contains that text, the next thing we can do is write a little program that grabs that page off the internet, looks for the text and figures out the time. So here's the program; I wrote it in Python. What we're going to do now is try and run this little program. I'll just copy it. I'm using something called Jupyter Notebook, which allows us to run programs in a web page. So here's my web page, here's my program, and I press Shift-Enter to actually run the program.
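The program itself is not shown in the transcript. A minimal sketch of the idea, assuming a made-up fragment of page markup (the real page's HTML and the speaker's actual code are not known), might look like this:

```python
import re

# Hypothetical fragment of the weather page's HTML. The real markup is not
# given in the talk, so this string is purely illustrative.
SAMPLE_HTML = '<li class="sun"><strong>Sunrise</strong> 07:18</li>'


def find_sunrise(html):
    """Look for the exact styled text that precedes the sunrise time."""
    match = re.search(r"<strong>Sunrise</strong>\s*(\d{1,2}:\d{2})", html)
    return match.group(1) if match else None


print(find_sunrise(SAMPLE_HTML))

# Against a live page, the HTML would first be downloaded, for example with
# urllib.request.urlopen(page_url).read().decode("utf-8").
```

Note that the pattern depends on the page's styling markup, not on anything that inherently means "sunrise".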
And if all goes well, we should see 7:18. I won't explain exactly how the program works because it's not important for this. What is important, though, is that if we've got a web page, it's often really easy to write a program that pulls something off that web page. However, as we saw before, we're actually looking for this text inside the web page, and that text has nothing to do with sunrise. It's simply how they've decided to style the web page. So if they decide today that it would be nice to put the word "sunrise" in bold, my program will suddenly stop working. If they decide to make it two words instead of the one word "sunrise", my program will suddenly stop working, because it's looking for this exact text.

And this is where we get onto web APIs. So let's decide we're not going to use this. We want to use a web API. I did a quick Google and I found... Sorry, it doesn't work, yes. So now I find a web API on the internet. It looks very similar. We've got a URL. The major difference is that in this case we're giving it a latitude and longitude, which happen to be the lat and long of Melbourne. So I'll open this URL. Sorry, a new page. Open link in new tab. And there we go.

What we see here is a similar result. We've got something called results. And we can see that all we've done is open a web page. But in this case, the web page doesn't look nice with colours and formatting, et cetera. It's just text. In other words, it's been optimised for a machine. I'll just show you what it looks like if we format it a bit. So this is the same result, just formatted slightly. What we see is that we've got these curly brackets, and inside them we have keys and values. One of the keys is results, and inside that we've again got keys and values. One of the keys inside that is sunrise, and there we've got a time, 8:44 p.m. So that's the first surprise. Why is it 8:44 p.m.?
Well, we know the sun doesn't rise at 8:44 p.m. The reason is, if we go back to this page, that we gave it a latitude and longitude. It knows nothing about time zones, so what it does is return the time in UTC, Coordinated Universal Time. And in this case, it was 9:19 p.m. If we add 10 hours to get Melbourne's time, we get 7:19 a.m., which is the correct result. Just remember, we've got this key results, and inside results we've got sunrise, and that tells us the time.

So now we write a Python program to find the sunrise using our API. Here's our program. This is the URL we saw before: Melbourne's latitude, minus 37.734, et cetera. And this last line is the important one. I've got this thing that has been returned, which we've called j. You'll remember there was something inside there called results, and inside results there was something called sunrise. So let's now run this program. I'll just copy it here, go back to our Jupyter Notebook, paste it in there and run it. There's probably a small rounding error there, because this one gives us 7:19 whereas that one gave us 7:18. But it's close enough that we're comfortable. There are also a couple of ways in which sunrise is defined.

So we now see how we can use an API and what use it is. And of course, now that we've got this API, we can substitute in any lat-long. We could go back to where we've got this API and go a few degrees south, let's say to 40 south. For Melbourne, it was 9:15. We'll just go to this web page, and sunrise is now 9:26. So as we go south in winter, the sunrise is later.

At this stage, and I've spoken about web APIs before, I often get the question: so is the web API the program that is behind this? Is it the software? Or is it the protocol? Is it the format? The format that we saw the data in is known as JSON.
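The API version of the program, as described above, reads the sunrise value from inside results and then shifts it from UTC to Melbourne time. A rough sketch follows; the sample response is modelled on what the talk describes, and its exact time format (including the seconds) is an assumption, since the talk does not name the API or show the raw text.

```python
import json
from datetime import datetime, timedelta

# Illustrative response body: a "results" key containing a "sunrise" key,
# with the time given in UTC. The exact format is an assumption.
SAMPLE_RESPONSE = '{"results": {"sunrise": "9:19:00 PM"}, "status": "OK"}'


def sunrise_local(body, utc_offset_hours=10):
    """Return the sunrise as HH:MM local time, given the API's UTC answer."""
    j = json.loads(body)
    utc_text = j["results"]["sunrise"]  # the line the talk calls the important one
    utc_time = datetime.strptime(utc_text, "%I:%M:%S %p")
    local_time = utc_time + timedelta(hours=utc_offset_hours)
    return local_time.strftime("%H:%M")


print(sunrise_local(SAMPLE_RESPONSE))
```

Unlike the page-scraping version, nothing here depends on how the answer is styled, only on the key names.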
So does web API mean that it comes in JSON? Or is it the data that we're getting back, along with the format? Which part of this is the web API? It's a tricky question, but what I say is that it's none of those specifically and, in a sense, it's all of them. What the web API is, is a contract between a supplier and a consumer that a particular URL will return particular data and, maybe I could add, in a particular format. In other words, it's that contract that defines the API. It's saying that if you go to this URL, we'll give you back the time of sunrise in this particular format, in UTC. So that's what the web API is. And of course, in order to do that, you need the software, you need the protocol, you need the data and the format, et cetera.

In my first example I gave the returned data as JSON, but people may be familiar with other formats. One other very common format is XML. Here we have exactly the same results, the same sunset under that tag results, but in a different format. So the format doesn't have to be JSON, and further on we'll see other examples where we return data that aren't JSON.

If you've ever sat in a meeting room with a bunch of developers and people have said, "Oh, we need an API to deliver this data, to present this data, to write this website," almost inevitably somebody says, "Great, we'll do a RESTful API." It's become very much a buzzword. What does it mean? Well, REST is an abbreviation of Representational State Transfer, and the concepts come from a dissertation written by Roy Fielding in 2000. Unfortunately, about 90% of people who talk about RESTful APIs are vaguely aware of this dissertation but probably haven't read it. They have heard the term RESTful, so they know that it's a good thing and that everything should be RESTful, but they don't really understand what it's all about.
The title of the dissertation was "Architectural Styles and the Design of Network-based Software Architectures", which doesn't sound like it's got much to do with APIs. I've got to admit I'm actually one of the people who has not read the full dissertation, although I have read several parts of it a few times. So I know maybe enough to talk a little bit about it.

In this dissertation Roy Fielding came up with a number of what he called architectural constraints. The first one is a client-server architecture. You may be familiar with the term client-server architecture but not with exactly what it means. A client, for this purpose, is the machine that's on your desktop, or your mobile phone, or wherever you've got your web browser running. It could be your iPad or whatever; that's the client. The server simply refers to somebody else's computer, which is often a very big computer somewhere. And a client-server architecture means that the client and the server, your browser and their machine, let's say Amazon.com, are linked together by some kind of network, in this case the internet. So basically our computer is talking to their computer and their computer is talking to our computer.

He also comes up with a number of other constraints. We're not going to go into them in this talk because they're all jargon and this is a talk without the jargon, but he talks about things like statelessness, cacheability, a layered system, code on demand and a uniform interface.

Knowing this, and simply talking about client-server architectures, we can actually say something sensible about RESTful APIs. This is what I believe is the most important part of a RESTful interface: everything that is needed by the server to fulfil a request is contained in the URL.
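To make that last statement concrete, here is a small sketch of the server's side of the sunrise example: a function that is handed nothing but the URL and recovers what it needs from it. The host name, parameter names and longitude value are illustrative assumptions, not the actual API from the talk.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical request URL; only the latitude figure comes from the talk.
EXAMPLE_URL = "https://api.example.org/sunrise?lat=-37.734&lng=144.9"


def handle_request(url):
    """Work out what is being asked using only what the URL itself contains."""
    query = parse_qs(urlparse(url).query)
    lat = float(query["lat"][0])
    lng = float(query["lng"][0])
    # A real server would now compute the sunrise for this location; the
    # point here is that no earlier request or stored session is consulted.
    return {"lat": lat, "lng": lng}


print(handle_request(EXAMPLE_URL))
```

Because the function keeps no memory between calls, the same URL can be sent at any time, by any client, and be understood in the same way.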
In other words, the thing that we type at the top, into the browser bar, should be enough to tell the server at the other end what information we need. We saw the example earlier: we want to know what time sunrise is, and inside that URL we gave the address of the server and we also gave the lat-long of the place that we're interested in.

Now, that isn't 100% accurate, so here is a slightly more accurate description of RESTful interfaces: everything that is needed by the server to fulfil a request is contained in the request. There are times when your browser will send a little bit more information to a server than you actually see in the URL, and that's fine. It can still be a RESTful interface as long as each request sends exactly what it needs. But the most important part is that first part: the URL ideally should contain everything that is needed.

So far what we've seen is just getting a single, let's call it a fact: the time of sunrise in Melbourne. However, a very common case, and the case that we're very interested in from the ANDS perspective, is getting data via an API. I've written a lot of APIs for work, and typically we provide lots and lots of data, and our preferred method of providing data is via an API.

So I've written a small toy API for the purpose of this talk. There's a lot of climate data for Australia. In particular there's a climate network called ACORN, which is the Australian Climate Observation Reference Network, and what that is, is 112 weather stations around Australia that have a very long climate record. So here's an example which goes to my server, and what I'm saying is I want the temperature and rainfall data for station number 9021, which happens to be Perth Airport.
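From the consumer's side, a request like this one is just a URL assembled from a base address and some parameters. The sketch below shows one way a program might build it; the host, path and the parameter name station are hypothetical, since the transcript does not give the toy API's actual address.

```python
from urllib.parse import urlencode

# Hypothetical base address for the toy climate API.
BASE_URL = "https://example.org/api/acorn"


def build_url(station, **params):
    """Build a request URL for one station, with any further parameters."""
    query = {"station": station}
    query.update(params)
    # urlencode joins the key=value pairs with ampersands.
    return BASE_URL + "?" + urlencode(query)


print(build_url(9021))  # Perth Airport, per the talk
```

Any additional parameters an API supports would be added to the same query string in the same way.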
So if I go to this API, this website or this URL, I will just call it, and what we see is that it's returned a whole lot of data. If we look at the first date, it goes back to 1910, and if we scrolled all the way down we would find values for every day up to 2010, so roughly a hundred years of data. If we formatted it nicely, it would look like this. There are different things we return. One of them may simply be that the query was successful or, if there was an error, some details about why we got an error. And then there's the data. For each data point, for each day, we've got a date and the temperatures: on the 1st of January 2010 the maximum temperature at Perth Airport was 31.3, the minimum temperature was 13.2, and precip, or precipitation, shows there was zero millimetres of rain.

Now, somebody wanting that data, and remember there are 112 stations, may not want a hundred years of daily data for a hundred stations. So the next thing we can do with our API is add parameters to it, for example a start and end date. We'll go back to this, and it's been designed so I can say start date equals the 1st of January 2010. I run it again and very quickly I get my new data back. Now we see that the first date is in 2010, and it will go on for the whole year. If I want just the one month, I can put in an end date, so my end date is the 31st of the 1st, 2010, one month of data. I run it again and now I've got one month of data.

However, maybe I don't like working with JSON. For whatever reason, I want my data as CSV. That's another option, and this option allows me to specify what format I want my data in. Once again we separate these parameters, by convention with the ampersand, and I write format equals CSV. I get my data back, and this time I get a CSV file: station number, date, minimum temperature, maximum temperature. So what we now see is that if I'm a researcher and I want some data, and somebody has developed an API and they've done it nicely, I've got a really good simple
way to get that data. I can slice and dice that data. For example, I can say I want Perth Airport, but I also want Melbourne, which happens to be 86071, and there's Melbourne's temperature and rainfall for January 2010. I see we've got a bit of rainfall there. So we can add the start and end date, we can change the format; it's a really nice way to be able to get data.

Now, I know all this, and the next question is: how would you know this? Well, there's a tool called Swagger which allows us to document our APIs, and I've written some very simple documentation for my API. Somebody wanting to use my API can go to this side of the page, the right-hand side, and they can say "I'm interested in temperature" or "I'm interested in rainfall", etc. So for temperature, sorry, I've just got two screens here so I can see the right-hand side, there we go, they can click there and they see there's an API called get ACORN data. There's documentation of what it does: temperature and rainfall for a given site. I click there and it tells me what's required. It requires a station number; the rest are optional: the start date, the end date and the format. And it says I can try it out. So I click there and it asks me for a station number. I type in 86071. For this one, let's put in a start date of the 1st of January 2010 and an end date in March 2010, let's say, and I'll say execute. So in this documentation we can actually execute a query, and it also shows us, for example, what URL we would need to use to get that data.

There's another API I've got, because one thing you may say to me is, "Well, this is great, but I don't know what these station numbers are." So I've written an API to get information about the stations, and that's over here. We've got one parameter, which is the station name. Try it out. If this works, and I type in PER because I want to find all stations with PER in the name, maybe Perth and others, we see we get this result, this response. We get Perth
station number 9021, and Esperance, because it's got PER in the name. We get the station number, and we also get a lot of metadata about the station: its elevation, its latitude, its longitude, the date it was opened, etc. So if I wanted to find out information about Laverton or wherever, I could type Laverton, execute, find out what the station number is for Laverton and then go and get the data.

The left-hand side is how I've specified what you see on the right-hand side, and I've specified it using a format or language called YAML, Yet Another Markup Language. Because it's a formal language, this specification is also machine readable. So the right-hand side is readable by humans and the left-hand side is readable by machines. A machine can go off, find out how to use the API and actually make queries against it. So that's our documentation.

Now, in the examples we've seen so far, we've just got text from the API, but we can get more than text. The first thing is to have a look at this URL, which happens to be quite a long URL. We'll just go to that URL and see what happens when we look at it. So there's the URL, and this has returned an image. If you look carefully at the image, it's the eastern side of Australia.

The next question may be: what's the point of a URL that returns an image? Well, here's an application, a little web app that I wrote a couple of years back. What this does is present a map of Australia, and if we look carefully we'll recognise that map from the previous one. What it actually does is allow us to click on a point on the map, and it gives us three words. For that point the three words are "frequent expense find". If we click somewhere near Brisbane we get three different words, "audience replace accept". So if I move somewhere else and then type in "audience replace accept", it should take me back to that previous point where I was, near Brisbane, which it does. But the important part of this is
we've got this map. We can zoom in, in fact we can zoom in quite a lot, I forget how far it lets you zoom in, actually down to the street level. Now, I used a JavaScript web tool called Leaflet to create this map on my web page. I also knew that there was something which supplied map tiles; what we've seen here are map tiles called Bright Earth eAtlas. All I had to tell Leaflet is that this is the URL for Bright Earth eAtlas. Well, I also told it that it's using this format called WMS, which is a standard API, and once it knew that it was using WMS, it knew how to call those tiles. In fact, if we go and inspect, what we can see is all these calls. Every time we move the map or change it, it makes about 12 or 16 calls, because each of those tiles is small, and it stitches them together. So there's one example of a tile, there's another example, another example. Basically, each time we zoom in or out or pan the map, it will go off and get a whole lot more tiles using this web API that's been provided by Bright Earth eAtlas.

Now the important thing is: how does Leaflet know the specification to read, and how does Bright Earth know the specification to write? There are standards, and these particular standards, WMS for Web Map Service, are controlled by a group called OGC, the Open Geospatial Consortium. So basically, if I want to write some software to display maps, all I have to do is follow that standard and I can use any WMS provider. That's one of the great strengths of using APIs and writing them to a common specification: other people can use something that's been written to that specification without knowing anything more about us.

OK, so the final part of my talk. ANDS is interested in FAIR, which is making data findable, accessible, interoperable and reusable. How does this fit in with APIs? What is it about APIs that can make data more findable, accessible, interoperable and reusable? So hopefully
now we've seen some of those things without even talking about it much further. With data that's available via an API, we can explore it and we can find the metadata to find out about that data. It's definitely accessible, in that it's available in formats like JSON, which is an open standard, it's available in CSV, etc. We'll look at two particular aspects of FAIR. One is principle F3, findable principle number 3, which is that metadata are registered or indexed in a searchable resource. The second is principle A1, accessible one, which is that metadata are retrievable by their identifier using a standard communications protocol.

Now, ANDS has a few service APIs. In other words, as well as being able to go to ANDS and do manual searches, we can also do machine searches, and I'm going to give a couple of examples of that. The first thing we're going to look at is this page, and that's the ANDS collection...

We'll make these slides available after the talk, so all these links are in the slides, and obviously you're welcome to try anything in the slides. One thing I would ask you, though: the API that I've provided was written very quickly, simply to demonstrate an API, so it's definitely not industrial strength. The data, for all intents and purposes, is just demonstration data. So just be forgiving. If you give it a valid URL it should always work, but there's not a lot of error checking, etc.

Anyway, getting back to the ANDS API. You'll see that there are a number of APIs that ANDS provides, and they're all documented on this page. As I say, there's the link to this page in the slides over here. So the first thing we do is have a look at one of these APIs, in particular this one, which is the get metadata API. What we're going to do is pass a query. We want to query the ANDS metadata, and we're going to say that the class is collections, we want to query the collections class, and we
want to look for the search term "Australian Ocean Data Network". There's probably a lot of it, so I'm simply interested in the first 30 rows. And of course this is all documented on the ANDS page. So let's just try going to that URL. Here we go.

What we see is status: success. We can tell from those brackets at the beginning and end that this is probably JSON format. There's a message, and a number, and this one is the documents. There are a number of documents, and we see that the first one has a slug called Coral Reef Health, Reef Check Australia. It's got a key, AODN; it's got a display title, Coral Reef Health Database, Reef Check Australia, etc. If we looked carefully, it's not very well formatted, but there are 30, if I make it just a little bit bigger. So it's returned the first 30 records.

OK, so we'll go back to this first one, and we noticed it had this key. Here's just a better formatted version of what we saw. The status was success; the message was that we found 11,530 matching entries; we're starting at entry number 0; and this is the first document, and it's got this key which ends in FF67. So the next thing we will do is use this metadata API again, and this time, instead of giving it a class, we give it this particular key, and we say that what we want to do is see all the fields for this key. We'll go to this one. Once again it's not easy to see, so what I'll do is go back to the slide, and if we had formatted it, we would have seen the following. So using these APIs allows us, either manually or by machine, to get a whole lot of rich data.

Another part of FAIR is to use standard vocabularies, and ANDS supports a number of vocabulary services. We're going to look at one example, the geology vocabulary, and we'll just quickly go to this page over there. If we look at the results, it looks like a human-readable form. Well, up top here
we also see there's JSON, RDF, text, etc. So we can click on JSON and we get exactly the same data back, but this time in JSON. What we see is that all it's done is put JSON there in the URL. So if I take the word JSON out and run that query again, I get the same result back in human-readable HTML. I could have also put .html. I think we also had XML, so we'll just try XML, and this time we get exactly the same data back but in a different format. That's one of the things I alluded to earlier: we can get the same data in different formats.

So that's pretty close to all we're going to say about APIs. I thought a nice note to end on would be to look at how APIs are regarded in the wider world. People may have heard of Amazon and its CEO... I missed one slide. No, I haven't. So the chief executive officer of Amazon is a guy called Jeff Bezos, and he wrote a memo to staff. In the memo he talks about service interfaces, but another word for service interfaces is APIs. This is a memo that was accidentally leaked, and if you go to this URL you'll find the post where the person writes about the memo.

So what was Jeff Bezos's view on APIs? Well, he said: one, all teams will henceforth expose their data and functionality through APIs, or in his words, service interfaces. Two, teams must communicate with each other through these interfaces. Three, there will be no other form of inter-process communication. Four, it doesn't matter what technology they use.

And I'll quickly say that this is one of the very important things about APIs. Let's say I've got an API for my climate data and I'm using, let's say, PostgreSQL, and you're using my API to get your data, and each day you run a query that queries my API. Next month I decide I'm going to change to Oracle, and I keep my API. You will have no idea that I've changed to Oracle, because the API insulates you from any changes in the underlying technology. That's one of the huge strengths of APIs, that they do insulate
you from what that underlying technology is.

Anyway, getting back to Jeff Bezos. Five, all APIs without exception must be designed from the ground up to be externalisable. So he's basically saying that we must be able to use the same API internally and externally. And six, and I think this gives an idea of Jeff Bezos and how highly he regarded APIs: anyone who doesn't do this will be fired. And finally seven: thank you, have a nice day. If you read the post, the author says, "Actually I just added number seven as a joke. Jeff Bezos doesn't care if you have a nice day or not."

I was going to mention one other thing. While you can access the ANDS API, you may also be interested in what data that's provided via API is available from ANDS. This is currently an ongoing project, and ANDS is busy collecting and collating all these data sets that are available via API. There's a page that talks about the current state and future direction of enhancing data service discovery, so watch this space. So that's it. Thanks very much.

Wonderful, thank you Martin. There is one question that came through earlier. In your first example your script could easily have been broken by new formatting, etc., on the website, whereas the API is designed to be computer readable and the format will not change. Is that right? So if they reformatted the website, your little script wouldn't work anymore?

At a high level that's absolutely correct. Anybody who's worked in computers for more than a few months knows that everything changes. The thing with APIs is that people will put a version number into the API. They'll say something like /api/v1/, and the whole idea of that v1 is that if we do make any changes, we'll publish a new API and it will be /v2/. The ones we saw were very simple ones, but almost any API that's being used in a production environment should always have a version number, and if it doesn't, beware, because
it's a sign that the person who wrote it probably isn't aware that APIs always change. I guess another thing about APIs is that, if it's possible, when they do change they should still support the old format. So I can say, OK, I'm going to add a new data format, let's say YAML. Now you can say format equals CSV in my data API, or format equals YAML, but it doesn't break the old API.

OK, all right. That's the only question that we've had come through at the moment, apart from... oops, this one came through. We had someone who said that they had to leave, but thank you, and they're looking forward to getting the slides to have a look at what you've done. The question here is: are the three words random, or algorithmically selected from some data source? So, from your little API that you wrote there.

Right. So what I did is I googled the thousand most frequent words. The idea was that I can say to somebody, "OK, meet me at jelly, tomato, staple." It's no good if I have to say, "Meet me at, oh, astrophysics something or other." So I found a list on the internet of the one thousand most frequent words and I used those words. I also took out words like "and", "the", "but", etc. Then I used an algorithm called a geohash, which gives you a number from a lat-long, and I converted that to the three words.

I thought, Martin, that you'd used the app what3words, which takes the whole world and divides it into three-metre squares and assigns three words to each of them. So I thought you'd looked at that.

That was definitely the inspiration. What happened is that what3words went a bit viral at work, and people said, "Wow, isn't this awesome?" I said it's a really easy thing to write if you know how; it shouldn't take more than a day or so. They didn't believe me, so I had to prove that one could write something. Now, what3words is a wonderful site, but it's closed source, and I thought it would be wonderful if there was an open source version of that. It
is the same thing.

OK, have we got any other questions from the people who are still on the line? If not, I'll have to say thank you very much to everyone who attended today, and thank you very much to Martin for giving your time again to give this webinar. We have a couple of people here saying thank you and that they can't wait to see the slides. I know there are a couple of people who want to go off and play with your APIs. All right, thank you everyone.
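As a footnote to the question about the three words, the approach described in the answer (a geohash-style number from a lat-long, converted to three words from a fixed list) can be sketched roughly as follows. This is not the speaker's code: the word list here is 1024 generated placeholders rather than his list of about a thousand frequent words, chosen so that each word carries exactly 10 bits, and the 30-bit resolution is likewise an assumption.

```python
# Hypothetical word list: 1024 placeholders, so each word encodes 10 bits.
WORDS = [f"word{i:04d}" for i in range(1024)]
BITS = 30  # three words x 10 bits each


def to_words(lat, lon, words=WORDS):
    """Turn a lat-long into three words by repeatedly halving the map."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    value = 0
    for i in range(BITS):
        # Even steps halve the longitude range, odd steps the latitude range.
        rng, coord = (lon_range, lon) if i % 2 == 0 else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        bit = 1 if coord >= mid else 0
        rng[1 - bit] = mid  # keep the half that contains the point
        value = (value << 1) | bit
    return [words[(value >> shift) & 0x3FF] for shift in (20, 10, 0)]


def from_words(three_words, words=WORDS):
    """Recover the centre of the grid cell that the three words name."""
    value = 0
    for word in three_words:
        value = (value << 10) | words.index(word)
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    for i in range(BITS):
        bit = (value >> (BITS - 1 - i)) & 1
        rng = lon_range if i % 2 == 0 else lat_range
        rng[1 - bit] = (rng[0] + rng[1]) / 2
    return sum(lat_range) / 2, sum(lon_range) / 2


print(to_words(-37.81, 144.96))
```

With 30 bits the grid cells are roughly a kilometre across at the equator, far coarser than three-metre squares; a finer grid would need more bits, meaning a longer word list or more words.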