This is our first time running a remote event, so thank you all for joining, and thank you for watching. Since we are in remote mode, you can see our community's details on screen: our website, email address, Facebook, and Slack. We are a community that equips women with the skills they need to advance in their professional careers: to build understanding, to provide networking and mentorship, and to create a global community that supports women throughout their careers. How do we do this? We run technical events like this one, and honestly there are some soft-skill events as well. So if there is anything you would like to see, any topic you would like to hear about, let us know. Or if you would like to speak at one of our events, let us know that too. Okay. We have presentations from local speakers as well as from our global network. And we have this thing called the #Applaud hashtag. Because we women tend to keep quiet about our achievements, we want to celebrate all the women who share theirs. If you have a new job, let us know, or if you know any friends who have a new job, tag them with the #Applaud hashtag. Or if you have given a talk or a presentation, use the #Applaud hashtag too. It keeps us connected. All of this is also announced in our newsletter. And do check out our job board; it is quite different from the usual job sites, in that you actually tell us what you want from a new job, and we match you with suitable positions.
I would like to introduce our sponsor, who is hosting us and supporting us today. So, I don't think it's Yoke... Yes? No. Hi, Yoke. So, give us a few words about the company. Sorry, would you like to give us a few words about the company? Yes. So, thank you, everyone. I have been at PayPal for a long time, and I have seen a lot of growth at PayPal. PayPal now supports, I think, around 24 currencies, and we are in about 200 markets, so it is a big platform. We work across the industry, in many different areas. I actually come from a different professional background myself, which I think helps bring fresh perspectives to a big company like this. I'm glad to be at PayPal. So, that's it. Yes. Thank you. Thank you very much. Thank you. I like pizza.

Okay. Sorry. My name is Yulin, let me introduce myself. I will be presenting the first part of tonight's session, and Olga will present the second. How's that, Olga? All right. I'll present the first part then. All right, let's get started.

Okay, so what is web scraping? Essentially, it is extracting data from websites. It doesn't have to be an HTML page. What can we get? We can scrape web pages, files, and documents: any data served over the web, and it can even be data from your internet-connected devices. It still counts as scraping data. So why scrape data? We gather data and turn it into information; from information we get insight, and eventually knowledge. Raw data by itself can be something overwhelming. So what is the process?
We get the data, we process the data into information, we extract what that information tells us, and we move it into other pipelines and act on it. For tonight, for this data scraping session, these are the topics I would like to cover; if time permits, we will go through all of them: urllib for retrieving web pages in the browser, some file downloads, web APIs, and also the BeautifulSoup module, plus the Selenium module. I will walk you through these. All right. After that, Olga will cover data wrangling. All right.

So, has everyone installed Anaconda? If you install Anaconda, the modules we need come bundled, ready to use. Anyone who still needs to download it? Yes? Sorry, anyone who needs help setting up, please raise your hand, and someone sitting nearby, please help them out. Okay, Hanna will help over there. All right, if you have any questions, raise your hand. Anyone with a question? Yes? You need help? Sorry. All right. Everyone, please open the notebook; make sure you are on Python 3. Okay. Okay. So, what have we done so far? We have just opened the notebook. Okay, everyone good? All right. Next, we open Jupyter Notebook, find the notebook file for my session, open it, and we have this. So, who has never used Jupyter Notebook before?
Okay. Jupyter Notebook is great because it lets you run code in Python interactively. Everything in the notebook is a cell: 1, 2, 3, these are all cells. To run a cell, you just select it and run it. It looks like static code on the page, but it is live code: if you run a cell, you execute that cell, and when you look below the code, the output appears right there, and we carry on from that point. Is this okay? Is it too small? Okay. All right.

So we will be doing this in Python 3. Why? Because Python 3 and Python 2 look very different here: in Python 2 you use urllib2, while in Python 3 it is all combined together in urllib. And urllib has a couple of packages. This is not exhaustive, but the core packages are urllib.request, urllib.error, urllib.parse, and robotparser. So we will mainly be using urllib.request tonight.

Okay, so this is what you do: all you do is import urllib.request, set the link, and that's how you retrieve a page. Just do a urlopen, and that's it: you just read the HTML. Okay, and if you want to read it as text instead of a binary value, this is what you do: decode it with the codec "utf-8". All right, so this is how we do web scraping: we send the request, we retrieve the HTML page, we read the HTML page, and we get the data we want.

Now, what happens with any web page is that we get errors: the server is down, you hit a broken link, anything like that. So it's very important, when you do data scraping for production, to anticipate all these errors that will happen at some point or another. So let's do it again: import urllib.request and urllib.error, and run that. Okay, you will encounter errors when you do data scraping, so this is what you do: using urllib.error, try to open the page, and if you get an error, do an
exception handler. So, if you're familiar with how web pages, how the internet works: when you send a request, you usually have to send a header as well. Most browsers will definitely send in the header which browser is being used, the user agent. So right here we're adding a header: we're adding the user agent and saying that the browser we're using is Mozilla 5.0. It doesn't matter that I'm not actually using Mozilla; I'm just telling, in my request, okay, just assume that I'm Mozilla. Right, so same thing, I'm requesting the same URL. What happens is that, in order to block robots, a simple thing sites do is check whether you have a header. So this request here doesn't have the header, so I just need to add the header, and then I can retrieve the page.

Okay, if you still have problems retrieving the page, one thing you can check is by putting robots.txt after the site address, the website itself. For example, this one here: this is the URL, we just grab the site name, add robots.txt, and we can see what the site allows, what sort of screening measures they use to check for robots. Right. Okay.

So this is just another way to request the URL: build an opener. The benefit of using an opener is that once you have set the header, you don't have to add the header again and again for each of your requests; all you need to do is change the URL, and that's it. With the earlier approach, you have to add the header for each new request. With this, you add the header once, and for all subsequent requests you can just use a different URL.

And yet another, easier method is using the requests library. This is a different library that has built in the basic functionality of fetching an HTML page. So, very simple: you just need to import requests and request the URL, and it'll take care of all the
headers for you. Okay, and like any response, this is just the binary content.

Okay, so very easily, right here, I'll just show you how to do a file download. Straight away, if you look at this page, data.gov.my, there's a download button here. Grab this link; it basically allows you to download the file for Malaysia's dengue cases. Okay, so this is the link, right here, this is the same link. So let's run this. What we are doing here is setting the output file, the file name of the output file, and putting in the URL for the file download that we got from this page right here. And a very quick, simple way to retrieve the file and save it to this output path: you just need to do a request.urlretrieve, and there you go. And if you look here, it'll be there, downloaded at 7.30pm. Okay, you should see this: once you have run this cell here, you should see the file in your folder. Everybody good? Everyone should download the file. Okay, congratulations, you just scraped your first file.

Okay, before I move on, any questions? Anybody? Okay, get some help over here, Edy, or Olga; Olga can help too, for the latecomers. Are you guys following? Okay. Right here, yes. Got it? Any problems? Anybody got problems? Questions? Save the file that we downloaded under this file directory.

Yes, this one here: it's actually right there. At this URL, when we click, when we go to the web page and click it, it auto-downloads. Yes, that is the link to the file itself.

Okay, so how do we get the text from a web page? Let's say we want the text of a page. How? This here prints the first 300 characters of that web page, so if you can get 300 characters, you can get the whole web page. Okay.
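The flow described above (open the URL, decode the bytes, peek at the first 300 characters) together with the earlier error handling can be sketched like this; the URL below is a stand-in for illustration, not one from the talk:

```python
import urllib.request
import urllib.error

# Stand-in URL; swap in any page you are allowed to scrape.
url = "http://example.invalid/page.html"
html = None

try:
    with urllib.request.urlopen(url, timeout=10) as response:
        html = response.read().decode("utf-8")  # bytes -> text
        print(html[:300])                       # first 300 characters
except urllib.error.HTTPError as e:
    # The server answered, but with an error status (404, 500, ...)
    print("HTTP error:", e.code)
except urllib.error.URLError as e:
    # No answer at all: DNS failure, refused connection, server down, ...
    print("URL error:", e.reason)
```

Note that HTTPError is caught before URLError: it is a subclass, so the order of the except clauses matters.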
So if you don't have the requests library installed, this is how you install it. urllib comes with Python; it is one of the standard libraries. But requests does not. If you installed Anaconda, I believe it comes pre-installed. Okay.

So this is the same thing; we have already done it. But I want to show you another way, because the particular function we used, urlretrieve, may go away: it is a legacy interface carried over from Python 2, so they may deprecate it, they may remove it at some point. So here is another way to download a file. Here we will download a file from data.gov.sg. You do the same thing: this is the link where we will download the file, same as before, and we will save it as a zip file, because this particular download comes as a zip file; it gives us a zip file. So when you run it, again I use an opener to include the headers; otherwise they will not serve the file. And here, we open the URL with opener.open as response, and open the output filename in write-binary mode as outfile. If that succeeds, you just read the response and you write the data: you write the output to your file as bytes, because you are dealing with a zip file. In fact, for any file you download, you don't know what format it will be: you don't know for sure whether it will be a text file or an Excel file or a binary file. So binary is the safe choice. All right. Now you have the zip. All right. And that's our file download. All right.

So, web APIs. We all know about web APIs. When we scrape data, many sites actually provide an API for you to get the information. They don't want you scraping their web pages and interfering with what they are doing.
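A sketch of the opener-based download just described. To keep it self-contained and runnable without a network, a local file:// URL stands in for the real data.gov.sg zip link; with a real site you would swap in the download link:

```python
import os
import tempfile
import urllib.request

# Stand-in for the remote file; in the talk this is the zip link from data.gov.sg.
src = os.path.join(tempfile.gettempdir(), "source.zip")
with open(src, "wb") as f:
    f.write(b"PK\x03\x04 pretend zip bytes")

url = "file://" + src
out_path = os.path.join(tempfile.gettempdir(), "download.zip")

# The opener carries the header on every request it makes, so this pattern
# replaces the legacy urlretrieve interface mentioned above.
opener = urllib.request.build_opener()
opener.addheaders = [("User-agent", "Mozilla/5.0")]

# Write in binary mode ("wb"): a download could be text, Excel, or a zip,
# so bytes are the safe choice.
with opener.open(url) as response, open(out_path, "wb") as outfile:
    outfile.write(response.read())

print(os.path.getsize(out_path), "bytes written")
```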
They provide you with an API so you can retrieve the data. All right. And usually the response they give is JSON or XML. JSON is a simpler format compared to XML. XML follows the same tag-based format as HTML: open tag, close tag. JSON uses curly brackets and colons; it's also in key-value pairs.

Alright, so as we scroll down, we'll be using the data.gov.sg API. If you go to that site, you can search for the APIs. So first thing you do: import json, import pandas; here we'll put the data into a data frame. Get the API link for this: if you scroll down here, there you go, that's the source. Where's the API link? There you go. So you need to get the API link from the website, and this here will give you the data by sending this, right there, for example. So the key thing is this one here: you just need to copy that, and then you have your API link, which is this one. The only difference is that I changed the limit.

So let's run this. We set our URL again, import the requests library, and here we're just requesting the URL, and straight away we get our result. So this is a JSON format: you have your curly brackets, and you have your key-value pair with a colon. This is your key; this is your value. In XML format, this would be an open tag epi_week, 2012 week 17, close tag epi_week. So with JSON we're just making it a lot shorter, saving a couple of bytes.

So how do we parse it? Because this is a key-value pair, the best way is to just use the json library, and it'll give you a dictionary. And in Jupyter Notebook, you can just add a cell anywhere, run it, and it'll show you what's in there. You just need to add a cell, anywhere, right? Click that button, then you have an empty cell; type whatever code you like, explore. Say you want to check what type this is: run it, and there you get it.
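The JSON-to-dictionary step can be tried offline with a small payload shaped like the API response; the field names here (epi_week, disease, no_of_cases) are illustrative, not necessarily the exact data.gov.sg schema:

```python
import json

# Illustrative response; the real one comes back from the API call.
raw = '{"result": {"records": [{"epi_week": "2012-W17", "disease": "Dengue", "no_of_cases": "42"}]}}'

data = json.loads(raw)          # JSON text -> Python dictionary
print(type(data))               # a plain dict

record = data["result"]["records"][0]
print(record["disease"], record["no_of_cases"])
```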
So this is the best part about Jupyter Notebook: it's interactive. So what do we have? We have our data, and it looks like this. We have piped it into json.loads and come out with a dictionary, and this is how our dictionary looks, right? So we have the whole result. And what we want is here: the different diseases, the week, and the number of cases for that week for that disease. And we want dengue and malaria.

So to retrieve only dengue and malaria, let's set up our data frame, which is a table format. This is the two diseases that we want to look at, dengue and malaria, and we want to retrieve the epidemiology week, the disease, and the number of cases. So here we set up the data frame, and then for each of the results and records, we retrieve the epidemiology week, the disease, and the number of cases. And there we go. Now, this here doesn't have the column names, so let's just quickly set the column names and save it. Okay, make sure you save this, because you'll be using this file later for Olga's workshop.

Okay, very straightforward: we just retrieve using the API, get a response in JSON format, parse the JSON object into a dictionary, read the dictionary, and grab the data that we want. Okay, any questions so far? You have to test it. Most likely these sites have an API guide, so to speak, a developer's guide for the APIs. They'll tell you what format it is, and from there you can retrieve your data.

Okay, so next, let's do a quick workshop. Same thing, exactly the same, except this one is going to be in XML. For this, you will need to get a key, so just follow all these steps. First, import the modules. Okay, get an API key from NEA; you should get this almost immediately in your email. Copy the key from your email. Just click this link, register for a key from NEA. NEA is the National Environment Agency, Singapore's National Environment Agency. Just quickly fill up this couple of boxes, click that, put in that.
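The filter-and-set-column-names steps can be sketched like this, again with illustrative records rather than the live API data:

```python
import os
import tempfile
import pandas as pd

# Records as they might come out of the parsed JSON (illustrative values).
records = [
    {"epi_week": "2012-W17", "disease": "Dengue",  "no_of_cases": "42"},
    {"epi_week": "2012-W17", "disease": "Malaria", "no_of_cases": "3"},
    {"epi_week": "2012-W17", "disease": "Cholera", "no_of_cases": "1"},
]

diseases = ["Dengue", "Malaria"]                     # the two we want to keep
rows = [[r["epi_week"], r["disease"], r["no_of_cases"]]
        for r in records if r["disease"] in diseases]

df = pd.DataFrame(rows)
df.columns = ["epi_week", "disease", "no_of_cases"]  # set the column names
df.to_csv(os.path.join(tempfile.gettempdir(), "diseases.csv"), index=False)
print(df)
```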
And you'll get a key for downloading from the APIs. We want the weather. You can tick all of them; the nowcast is the one we'll be using, the two-hour nowcast, for this workshop. But you can tick all of them, that's fine too. Okay, once you have done that, take a quick look at the developer's guide. For most sites that provide APIs, they usually have a developer's guide for the API, how to use it. Okay, and this would be your request format, and your key would go in there. Okay. And we'll be doing the nowcast: set up your URL link and create a request. You can refer to your previous two workshops, whether you use an opener or you use a request; sorry, previous three workshops. You don't need that part; that's just for you to check whether you got it correct.

Okay, if you got it right, you will get an XML response. The only difference is that, if you look at it, JSON is a simplification of XML. In JSON, this would have been a curly bracket, then title, colon, two-hour forecast, and that's it. So very similar: you'll get your key and your value. And this is how you parse an XML object.

Okay, so the next one is just a couple of lines of Python code to go through the data. You'll have your data; this is how you grab the data: you'll get the timestamp, you'll get the time value, this here, by doing that. So write a for loop, loop through this data, and get the items you want. If you check, this is a typical area forecast; you'll have the forecast codes. TL stands for... not sure. I know PC stands for partly cloudy. TL stands for... what? Thundery showers, thunder and lightning, if I'm not wrong. Thundery showers in Ang Mo Kio, and in Bedok, I think.

(Question: when you're creating requests, what's the r?) This r? It's urllib.request: we imported urllib.request as r, because urllib.request is quite long to type. That's okay; whatever you import it as, the r represents it. Which one? I think it's the same. Okay. This one here. Okay.
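Building the keyed request might look like the sketch below. The endpoint URL and the "api-key" header name are assumptions for illustration only; the real request format is whatever the NEA developer's guide specifies:

```python
import urllib.request

# Hypothetical request format: the real URL and key placement come from the
# NEA developer's guide; "api-key" as a header name is an assumption here.
url = "https://api.example.gov.sg/nowcast"   # stand-in endpoint
api_key = "your-key-from-the-email"

req = urllib.request.Request(url, headers={"api-key": api_key})
print(req.get_header("Api-key"))             # urllib normalizes header names
# urllib.request.urlopen(req) would then return the XML response.
```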
That's the same; you're just replacing the URL. It's exactly the same, for the weather. Okay. So whatever data you want, you can go ahead and explore. Say you only want to look at the forecast; this is your forecast for Ang Mo Kio. So I'll give you an example: if we do this, this is how it's parsed; it's parsed into an XML object. Right. This objectify here is lxml's objectify, a module called objectify. With objectify, we're just parsing the response from the request. You can check the type: it's an XML tree. So what it does is put everything into a tree, and what happens with a tree is that you first have to get the root of the tree and then navigate through the tree. So that's where we get the root and then navigate down: root.item, run it, there you go. root.item.time. root.item.forecastIssue, there you go, the forecast issue, and we're getting the time. Okay.

Similarly, for you to get to the forecast, the two-hour forecast, you need to navigate from root.area: we're going to iterate over the root areas and grab all of them, all the area objects. So this is one area object, and you will get all the areas, the forecast, and the coordinates of the areas. I think SH is showers; LS, I'm not sure, light showers. So hot. So hot. I don't think so. Just go up. Sorry. Yes. No. Who cares about the forecast? Okay.

So an XML format is like a tree. You have the top of the tree, your channel; this is the open channel tag, and the closing channel tag is right at the bottom. This here only shows you the first 500 characters, right? Okay. So if we do that, okay, that's the start channel tag and the close tag. There you go, that's the root of your tree. And then this is a branch: you have your title, that's one branch, right, and then your source, then your description, then your item. I think the item goes quite all the way to here; that's your item. Okay. So what these lines do is: you get root.item and grab the area. We want the area.
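The talk navigates the tree with lxml's objectify; the same root-then-branch navigation can be sketched with the standard library's ElementTree instead, on a small XML snippet shaped like the nowcast response (the tag and attribute names are assumptions, not the exact NEA schema):

```python
import xml.etree.ElementTree as ET

# Illustrative XML shaped like a two-hour nowcast response.
xml_text = """
<channel>
    <title>2 Hour Forecast</title>
    <item>
        <time>2017-07-20T19:30:00</time>
        <area name="Ang Mo Kio" forecast="TL"/>
        <area name="Bedok" forecast="SH"/>
    </item>
</channel>
"""

root = ET.fromstring(xml_text)      # the root of the tree: <channel>
item = root.find("item")            # navigate down one branch
print(item.find("time").text)       # the timestamp, like root.item.time

for area in item.iter("area"):      # iterate over every area element
    print(area.get("name"), area.get("forecast"))
```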
We read through each area: one area, two areas, and so on and so forth.

So, unfortunately, that's all the time we have for data scraping. Okay. Yeah, time flies, I know, when you're having fun. The notebooks are there, so you can go home and take a look at your own pace. There's the BS notebook, for BeautifulSoup, which is what we use in Python to scrape HTML data. We take all these HTML tags, almost like what you did with the XML, and put them all together into a BeautifulSoup object, and from the BeautifulSoup object you can get to each tag. And similarly you can get the attributes and so on and so forth. Okay, so the notebooks are there for you; just run through them.

And for Selenium: for HTML pages with JavaScript, we use Selenium. Again, just run through it. What happens with JavaScript is that the page is not static; with JavaScript you have actual scripting, and you have to run the script. So we need a driver; you need to download the driver. Here we're using PhantomJS, or you can use any of the drivers that are supported by Selenium, that's fine too. Right. And with Selenium, basically what you need to do is get it to perform some actions, basically clicking; you can do these clicking actions. Okay. All right. So that concludes the data scraping part, and I will invite Olga. Thanks.
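As a small taste of the BeautifulSoup notebook mentioned above, parsing a tiny HTML snippet into a soup object and reaching tags and attributes might look like this (the snippet itself is made up for illustration):

```python
from bs4 import BeautifulSoup

# A tiny HTML snippet standing in for a scraped page.
page = "<html><body><h1>Dengue Cases</h1><a href='/data.csv'>Download</a></body></html>"

soup = BeautifulSoup(page, "html.parser")  # all the tags go into one soup object
print(soup.h1.text)                        # reach a tag by name and read its text
print(soup.a["href"])                      # and read a tag's attributes
```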