Hi, and thank you for coming. My name is Eric Zudin; I hope you're all in the correct room. Honestly, I didn't expect a crowd or a hall this large, so thanks again for coming. First and foremost, I'd like to thank Danny and his team for making this happen, and for inviting me to speak here in Auckland.

So today's talk is a little bit about machine learning. Anyone here a Rage fan? Yeah, they're kind of my inspiration; I was born and raised on Rage's tunes. The targets for today: a little bit about myself, what on earth ML is, why we use it and why we're exploring it, a little on how to do it based on a pet project I've been working on and some work-based projects as well, and a little summary.

I come from Malaysia. Not far, but a ten-hour flight from here; I arrived on Thursday, I think. This is my second time in Auckland: the first was four years ago, when I spoke at the Open Source Developers' Conference here. My daily dose of work is mostly about data, streaming data or batch data kinds of things. I also do solution architecting here and there, and pretty much everything my boss asks me to do. Same as you, I guess. I've tried to cut that down a bit, but I don't think that's going to happen any time soon.

I started my speaking adventures back in 2009, at local open source conferences, talks, meetups and whatnot, then spoke in Manchester in 2010, mostly moving around open source, PHP technologies, and application security as well. So, 2013, the Open Source Developers' Conference in Auckland. Anyone been there, I mean, attended that conference back in 2013? Nope. There's a gap between 2013 and 2016 because my first baby arrived, so I put more time into her, building slides and learning new technologies a bit on the side. In 2016 I jumped back into speaking when I was duped by a friend at a local university back in Malaysia: initially he promised it was just a one-hour talk, and eventually it became a one-day workshop. So, bless him.

In August this year I attended my first Python conference, back in Kuala Lumpur. The Malaysian Python user group organised PyCon APAC, the Asia-Pacific Python conference, and we had about 200 people attending. We had Jessica McKellar from the US, one of the PSF directors; I'm not sure whether she's still with Dropbox or on her own now. We also had Luis, an author of many Python books, a data scientist, and so on. I'd like to share some key points that both of them gave during PyCon APAC. Luis shared a lot about how machine learning can make money and whatnot, and Jessica shared one cool story about her and her team bringing Python classes into a prison, teaching Python to the people inside. That was interesting for me. Unfortunately I'm not in the picture, because I was off, I don't know, somewhere.

So it's not that simple: you have data, you have an algorithm, and out comes some sort of ML.
Most people think ML can do magic, so it depends on what you think ML can do for you. Here is one of the simpler explanations, from data scientists in Africa: normal robots follow instructions, while the more intelligent machines learn from data, or from experience, you could call it. And this is the typical long, boring definition from one of the brilliant guys at Carnegie Mellon; you can read it yourselves. This one, I think, is one of the clearer and simpler pictures of where machine learning sits: starting off in the 80s, and then, with the growth of data, lots of data, and the compute resources we now have with the cloud, booming into the AI, deep learning, and machine learning industries all over the world. This one is from NVIDIA.

Can I have a show of hands: how many of you are into the machine learning area, building machine learning? Yeah, right, cool. Basically I'm not a machine learning expert, so I'm just sharing my experience building stuff with ML, not all of it, just part of the algorithms.

So why is everyone going into machine learning? When companies like Google and Facebook are buying startups, buying people into their projects, most tech geeks like us want to be part of them as well. And this is an insight from CB Insights, the market analytics experts: you can see that the top players, Google, Apple, Facebook, Microsoft, Intel, are really just buying up these startups to furnish their technology, their initiatives and projects. This is as of 2017, and I'd guess the next year or the next five years will see even more buying. Maybe you should start your own AI or ML startup soon. You can see a lot of money flowing out, and money is one of the main points: the tech giants are paying top salaries to AI talent as well. I guess this isn't AUT, but universities are losing their top people in AI, and compared to those big companies with lots of money, unless you have a very loyal person, you can't hold on to them for long.

These are some of the normal things you can do with ML as well, from web search up to space: NASA also uses ML, and stock fraud detection and credit checks at banks use ML and whatnot. I see this next news story quite frequently: the Washington County Sheriff's Office is using ML, built on AWS services, to help them identify criminals from their mug shots. They're really doing stuff that helps people, right? Interesting. Also, this is not actually my password, but before coming here I double- and triple-checked that my passwords are in a secure place for my kids, so I don't end up in this kind of situation either.

So basically, what we are trying to do is draw lines through a lot of data; we want to find interesting insights based on the data. Here is an interesting explanation from the Berkeley machine learning group on how to separate data using classification methods. This is one example, where you label your data, for example apples and oranges. Wherever the decision boundary is set, that's the line between your data, and when you have a more complicated algorithm, a more complicated decision boundary, you tend to get more interesting lines.

The other part is regression, for when you want to describe your data. This is an example from Berkeley where they want to predict house prices based on square footage: you come up with labelled data, probabilities, and a predictor. I'll put minimal sketches of both of these ideas below.
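First, the classification case. This is only a toy sketch of the apples-versus-oranges idea, using scikit-learn's logistic regression; the features and numbers are invented for illustration, not taken from the slides.

```python
# A toy classifier in the spirit of the apples-vs-oranges example.
# Feature values are invented for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row is one fruit: [weight in grams, redness score 0-1].
X = np.array([
    [140, 0.90], [150, 0.80], [130, 0.85],  # apples
    [170, 0.20], [180, 0.30], [160, 0.25],  # oranges
])
y = np.array([1, 1, 1, 0, 0, 0])  # 1 = apple, 0 = orange

clf = LogisticRegression().fit(X, y)

# The learned weights define the decision boundary, a straight line in this
# 2-D feature space; a more complex model would bend that line.
print(clf.coef_, clf.intercept_)
print(clf.predict([[155, 0.50]]))  # which side of the line is this fruit on?
```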
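And the regression case, a minimal sketch in the spirit of the Berkeley house-price example; the square footages and prices here are made-up numbers.

```python
# A minimal regression sketch: predict price from square footage.
# Square footages and prices are made-up numbers.
import numpy as np
from sklearn.linear_model import LinearRegression

sqft = np.array([[600], [850], [1100], [1500], [2000]])           # predictor
price = np.array([250_000, 320_000, 410_000, 520_000, 640_000])   # labels

model = LinearRegression().fit(sqft, price)

# The fitted line is the one we "draw through" the data; its slope is the
# learned price per square foot.
print(model.coef_[0], model.intercept_)
print(model.predict([[1300]]))  # predicted price for a 1300 sq ft house
```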
So this is why we need ML, I think: to define things, to draw those lines nicely, and to use them to explain the data to people. There's an interesting project on this by Brown University students; I think it's quite good if you want to understand what those lines on the graphs and charts actually mean.

As for my own journey with ML: honestly, I only started going deep into Python about two years ago, when we were building our own big data analytics platform back in Kuala Lumpur, Malaysia, with a solution based on Python from top to bottom. Just recently, about six months ago, we started building our own ML features into our backend, so what I'm sharing is based on our own findings.

These are the core steps you need to dive into if you want to move into ML: understand the problem, design the solution, bring in the data, pick the technology to use and master, and build your own model; then evaluate, fine-tune, recreate, and package it nicely.

The first one is simple: you need to describe the benefit of your solution, lay out your problem and flow step by step, and make sure everyone agrees on it. Data matters: you need to prepare the right data, pre-process it, and repeat the process until you have the most solid, best data you think you can get by the end of the project. And remember that even with better data in your data sets, you can still end up producing many outliers.

That's why we chose Python: it has everything for machine learning. There are also a bunch of cloud-based ML solutions you can try, such as Microsoft Azure Machine Learning or GCP, the Google Cloud platform.

Building the model is by far the most challenging part. You will crash your system, crash your servers, and spend a lot of money to build, train, and repeat those core processes. Then comes the fine-tuning: getting the best data, getting the best accuracy level, deciding how you measure performance and how you train and test on all that data. Machine Learning Mastery is one of my favourite references: Dr. Jason Brownlee, if you're wondering which algorithm to use, or why to use machine learning at all. Interesting stuff.

And since I work for somebody else, at the end of the day I need to impress the stakeholders, the bosses, right? So you need to package it nicely, in line with whatever was agreed in the earlier phase at project kickoff.
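On the train-and-test part, here is a minimal sketch of the usual loop, assuming scikit-learn and its bundled iris data as a stand-in for your own data set; the "fine-tuning" is just one hyperparameter swept by hand.

```python
# A minimal sketch of the train / test / fine-tune loop described above.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)

# Hold out a test set so the accuracy you report is measured on data the
# model never saw during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Fine-tuning in miniature: sweep one hyperparameter and keep the best.
# (In a real project you would tune on a separate validation split,
# not on the test set.)
for k in (1, 3, 5, 7):
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"k={k}: test accuracy {acc:.2%}")
```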
Right, have any of you experienced this? Full-blown, all processors, crazy CPU usage numbers. This is basically this machine: 16 gigs of RAM and an i7 processor. I'll come back to this at the end of the slides.

Right, I'll share basically two samples today: one is a case study applying what we discussed earlier, as best practice, and the other is from my own work.

These are two of the loveliest guys in this world, I think; I got this picture from a Twitter post, so I hope the NSA isn't listening in. Basically, the problem is that someone wanted to build a sort of sentiment analysis on whatever Mr. Trump says on Twitter, because when he mentions a company, for example Toyota, Nissan, Kia, whatever, something happens to its share price the next day, right? So this is an interesting problem to work on. Anyone here tried this? No? Right, good. This is when Trump mentioned Ford: you can see the difference between this point and this one.

So that's the problem, and this is the solution: Trump2Cash. I find it very interesting how they saw the problem and turned it into a project. Trump2Cash is basically a Python solution, a project by one guy; you can follow the GitHub link, I'll share it afterwards. It uses the Google Cloud Natural Language API for the sentiment part (there's a small sketch of that call after this section), and you can try that as well, plus the Wikidata query service, and the trading part uses the TradeKing API. I'm not sure whether it's still working, but we can try. This is basically the interface of the Google Natural Language API, and this is a presentation of how you could make, or lose, money from whatever Trump2Cash trades. Making money, yeah.

The second sample I want to show today is one of my inspirations back in Kuala Lumpur: how we built text summarisation and classification on whatever streaming data we have, for example news or social media. You want to classify what kinds of stories people are talking about. So the ML problem is text summarisation, or classification, and we use skip-gram and continuous bag of words; this is all related to text, whatever text you want to classify. We've used Python for some time, with NLTK for the corpus data, and Word2Vec with Gensim, plus fastText, for the libraries and wrappers. Basically Word2Vec is from Google and fastText is from Facebook. For the source data we use, for example, the Brown corpus, which comes from Brown University and is one of the most popular corpora used by researchers and engineers, and Wikipedia is one of the common data sources you can get out there.

Right, I wasn't planning to open my Jupyter notebook, but let's try it anyway. Can you guys see that? Isn't that good? Basically, this is inspired by Mr. Mohin. First you download whatever data you have, like I said, using Brown, then pre-process the Wikipedia text and train the model, using Gensim and fastText (a sketch of the training step follows below). When you run this one, you can see the samples there. This is when your resources go crazy, right? You can see the CPU bars, all right.

When you train on the limited Wikipedia text, around 100 megs of samples, this is the amount of time it takes to process and generate your model. And this is the larger data set, about 700 megs of Wikipedia text, which obviously takes more time.

Then this part compares the results against your model: this is where you check the accuracy level for Word2Vec or fastText. It basically shows the accuracy based on the semantic and syntactic word sets available, and this is measuring the accuracy on a larger corpus; here I used text9, the larger Wikipedia database. You can see that as well. And I used Matplotlib to easily visualise whatever was modelled and tested up front. This chart is the base data, the Brown corpus, with the semantic and syntactic sets: you can see that fastText performs a bit better compared to Gensim's Word2Vec. With the text8 data, the 100-meg Wikipedia data, you can improve the result further, and with a bigger model base it obviously takes more time, but the accuracy level improves.
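Here's the sentiment sketch I mentioned: a minimal example of the kind of call Trump2Cash makes to the Google Cloud Natural Language API, assuming you have the google-cloud-language client installed and GCP credentials configured; the tweet text is just an illustration.

```python
# A minimal sketch of the sentiment step in the spirit of Trump2Cash,
# using the Google Cloud Natural Language API (GCP credentials required).
from google.cloud import language_v1

client = language_v1.LanguageServiceClient()

tweet = "Toyota Motor said will build a new plant in Baja, Mexico. NO WAY!"
document = language_v1.Document(
    content=tweet, type_=language_v1.Document.Type.PLAIN_TEXT
)

# The score runs from -1.0 (negative) to +1.0 (positive); Trump2Cash maps
# the sentiment about a mentioned company to a buy or short decision.
response = client.analyze_sentiment(request={"document": document})
print(response.document_sentiment.score, response.document_sentiment.magnitude)
```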
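And the training step from the demo, as a hedged sketch assuming Gensim 4.x and the NLTK Brown corpus as the training text; the hyperparameters are illustrative, not the ones from my notebook.

```python
# A sketch of the training step, using the NLTK Brown corpus as the
# training text and Gensim 4.x. Hyperparameters are illustrative only.
import nltk
from nltk.corpus import brown
from gensim.models import Word2Vec

nltk.download("brown")  # one-time download of the corpus

# brown.sents() yields tokenised sentences, which is what Word2Vec expects.
model = Word2Vec(
    sentences=brown.sents(),
    vector_size=100,  # dimensionality of the word vectors
    window=5,         # context window around each target word
    sg=1,             # 1 = skip-gram, 0 = continuous bag of words
    min_count=5,      # drop rare words
    workers=4,        # this is the step that pins all your CPU cores
)

model.save("brown_w2v.model")
print(model.wv.most_similar("money", topn=5))
```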
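Finally, the accuracy check: a sketch of the semantic/syntactic analogy evaluation using Gensim's bundled copy of the standard questions-words set, plus a rough Matplotlib bar chart along the lines of the comparison charts on the slides.

```python
# A sketch of the semantic/syntactic accuracy check, using Gensim's
# bundled copy of the standard questions-words analogy set.
import matplotlib.pyplot as plt
from gensim.models import Word2Vec
from gensim.test.utils import datapath

model = Word2Vec.load("brown_w2v.model")  # model from the previous sketch

overall, sections = model.wv.evaluate_word_analogies(datapath("questions-words.txt"))
print(f"overall analogy accuracy: {overall:.2%}")

# One bar per analogy category, roughly the comparison shown on the slides.
names, scores = [], []
for sec in sections:
    answered = len(sec["correct"]) + len(sec["incorrect"])
    if answered:
        names.append(sec["section"])
        scores.append(len(sec["correct"]) / answered)

plt.barh(names, scores)
plt.xlabel("accuracy")
plt.tight_layout()
plt.show()
```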
That's basically what I wanted to share today. Going back to this slide: yep, this is why I'm talking about it now. When you train, you should predefine what you actually need, because it costs you a lot of money to build these kinds of things. Initially I wanted to put a long list into the summary of this talk, but when I browsed around, I think most of us already know this once we've had to build our own model. And this guy, one of the tech leads at Google, gave the nicest words on ML: it's a lot of work. That's the best summary I can give.

I'd like to end this talk with one random line: I'm a developer. With this, I thank you. If there are any questions, I'll try to answer.

Q: What was your second example? What were you trying to classify?

A: Text summarisation.

Q: What do you mean by that? What were you summarising?

A: It's about summarising, for example, a lot of text in one news story. You want to classify it as national, or economic, or sport, based on the elements in it, because a lot of the data may be talking about different subjects. So that's one of mine.

Q: Just to clarify that question: you were using the Wikipedia corpus, which is classified into different subjects, and using that to train the system, so it's already classified into sport or economics. How many classifications does it have?

A: In the Wikipedia data? There are a lot. For this pre-processing, the first phase of things, I normally work on simple sizes, like one meg, sorry, 100 megs, and then move up to 700 megs. And if you go to Facebook's fastText, and even Google, they have their own pre-trained data that you can download in multiple languages. Working with English is quite simple, quite easy, because there is so much pre-trained data you can use: Wikipedia, Facebook fastText, things like that. But back in Malaysia we have a lot of languages, like Malay and Chinese, and I believe other parts of the world, like New Zealand, also have their own specific local languages. So that's the kind of amount of work and data we're dealing with. Thank you.