Hello everybody. Today I'm going to give a presentation on a topic that might be interesting to some of you. All right, before we go ahead, let's start screen sharing. I'm going to share the Keynote slides.

So the topic I'm going to talk about, or maybe share with folks here, is open-source solutions for big data challenges. This is the theme statement for the presentation: in the modern business world, data management has become a critical challenge to overcome, especially for companies in a tough market. If you invest in data management, you might eventually take the company from zero to hero. The important point is that in this new business world data is, of course, everywhere, but what matters is how you make use of it, and the term we use for making use of it is data management. So we'll talk about the difference between data collection and data management, and about where you are now: are you somewhere in the management pipeline, or still in the collection pipeline?

These are the agenda items we're going to cover. We'll talk about the difference between data collection and data management, and, if you have data on hand, what you should or can do with it. We'll talk about the costs of maintaining data management, its pipeline, and its process. Then, once we have all of that, we need to present it, and that's where storytelling comes in.
There's also a demonstration, roughly 10 minutes long, which I've already recorded on video, so in case we run short of time you can still go through the presentation. The last item is the FAQ: question time and so on.

Okay, so who's talking? My name is Jason. I come from a company called Elastic, and my position is something called education architect. Pretty weird, right? So what do I do daily? I run trainings and I do curriculum development, that is, the trainings and curriculum for the Elastic Stack; I'm one of the contributors. I also do a lot of prototyping, which simply means coding sample projects. They're useful because I need to present these prototypes at meetups and in my technical blogs. That's why I still do a lot of coding, even though these days I spend most of my time running trainings.

As for my background, I used to be a Java developer writing Java back-end systems for a lot of big corporations, and also a solution architect, that kind of thing. (System architect, solution architect: pretty much the same thing.) In recent years I've transitioned from back-end developer to what we call DevOps, which simply means one person has to finish the back end, the front end, and the database or data level. That's why I shifted to Go and Nim as my back-end languages, because they're fast and compile to native code. For the front end I've switched to JavaScript in general, so I use a lot of React.js, Vue, Node, Angular, and so on. If you have these kinds of skill sets too and want to have further discussions with me, feel free to reach out.

For the demonstration and the related information, this is the GitHub link.
So this is the GitHub repository for the demonstration later on, and you can get all the scripts there as well. Take a look if you have time, and feel free to take photos. We've also written a blog post for this demo, so even though I'm going to do a demonstration, if you prefer reading text to watching videos, feel free to go to this blog on Medium. As for the rest of the social media information: if you want to connect with me, you can always find me on LinkedIn, and if you want the source code for various projects and related things, go to GitHub. My handle is on the slide. On Medium and WordPress I write technical blogs, and in the future I'm going to migrate all my blogs to Medium, so WordPress won't be used later on. Twitter I seldom visit, but I do have an account, which is why it's still listed here. So feel free to take a look at those resources.

Now the first agenda item: where are we now? Are we doing data collection, or are we actually doing data management? To be honest, data collection simply means getting data from various sources with nearly no processing. It's like getting the audit logs from Windows systems and throwing them into a repository. Of course, on top of it there should be some kind of system that lets us search for and retrieve this information, but remember, we didn't do any processing. It's just like a database: you search for a particular record or log, you get it back, and nothing else. So there's not much value in it.

When we talk about data management, we of course collect the data, but we also process it, and later on we do data vitalization, which means we bring the data to life so that we can present it in a more fancy, interesting way.
So data management is a process, a series of steps, a pipeline. It's very different from just collecting data.

Now some brain teasers. We've got three scenarios. What do you think each one represents: collection or management?

Number one: collecting logs from different internal systems and treating them as audit logs. This is collection, because no extra value is extracted from the data set collected.

Number two: generating a trend analysis of loans granted to SMEs in Singapore. This is data management. We're not just collecting the amounts of money granted to SMEs; we also have a trend analysis, so we can predict what might happen over the next three months or the next year. Value added.

Number three: discovering the relationship between savings and occupation. Again, this is not just collection but data management, because we do analysis. Banks or financial institutions can use this information to predict what might happen in the economy next quarter. Take COVID-19: lots of businesses and countries got locked down, so savings by occupation, and the trends around them, might not go well. They might be flat if we're fortunate; they might even slump. These are the things we might be able to discover if we have the data. Value added.

So what can we do with the data on hand? A lot of the service providers in my trainings have told me that all their clients just want them to store the logs, and there's nothing else they can do with them. So in general, what can we do?
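To make scenario two concrete, here is a minimal Python sketch of the step that turns collection into management. It assumes a hypothetical list of granted loans as (date, amount) pairs: storing that list is collection, while rolling it up into monthly totals and month-over-month changes is the first bit of added value. The data and function names are illustrative only.

```python
from collections import defaultdict
from datetime import date

# Hypothetical raw records: (grant date, amount in SGD).
# Keeping these as-is is "collection".
loans = [
    (date(2020, 1, 5), 50_000),
    (date(2020, 1, 20), 30_000),
    (date(2020, 2, 3), 60_000),
    (date(2020, 3, 14), 45_000),
    (date(2020, 3, 28), 25_000),
]


def monthly_totals(records):
    """Sum loan amounts per (year, month), sorted chronologically."""
    totals = defaultdict(int)
    for granted_on, amount in records:
        totals[(granted_on.year, granted_on.month)] += amount
    return dict(sorted(totals.items()))


def month_over_month(totals):
    """Percentage change between consecutive months present in the data.

    The first month has no baseline, so it is not included.
    """
    months = list(totals)
    return {
        months[i]: (totals[months[i]] - totals[months[i - 1]]) / totals[months[i - 1]] * 100
        for i in range(1, len(months))
    }


totals = monthly_totals(loans)
print(totals)  # {(2020, 1): 80000, (2020, 2): 60000, (2020, 3): 70000}
print(month_over_month(totals))
```

In the Elastic Stack the same roll-up would typically be a date histogram aggregation; this sketch sticks to the standard library so it runs anywhere.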
Usually we can do analytics, either manually, that is, we find an analyst to do the analytical work, or we get somebody to create a machine learning model so that these patterns are learned without human input. So there are two ways of doing analytics: one performed manually by humans, one performed by machine learning.

We can also do storytelling based on the data sets we've collected. This is important, because if we can tell a story, we can guide management in decision-making. For example, a trend analysis on certain things can give management a real idea of what is actually happening in the business world.

We can also discover behavior and stickiness using regression analysis. For example, spending related to salary, or to living area: these relationships can be studied with regression analysis. Once we've discovered the behavior, we can create tailor-made products or campaigns focused on that particular group of people. So it's useful that way.

Now, the cost of maintaining data management. The first thing is open-source versus commercial products. As the theme statement says, we're talking about how open-source projects and software can help solve big data challenges. And of course, open-source products have commercial rivals.
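Regression analysis, as described for spending versus salary, can be sketched as an ordinary least-squares line fit. The salary and spending figures below are made up for illustration, and `fit_line` is a hypothetical helper, not part of any Elastic product.

```python
from statistics import fmean


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Hypothetical sample: monthly salary vs. monthly card spending (SGD).
salaries = [3000, 4000, 5000, 6000, 7000]
spending = [1200, 1500, 1900, 2100, 2600]

slope, intercept = fit_line(salaries, spending)
print(f"Each extra 1,000 of salary goes with ~{slope * 1000:.0f} more spending")
print(f"Predicted spending at an 8,000 salary: {slope * 8000 + intercept:.0f}")
```

The same helper covers a simple trend analysis: pass the month index as `xs` and the monthly totals as `ys`, then extrapolate a few periods ahead. On Python 3.10+, `statistics.linear_regression` returns the same slope and intercept.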
You can see Elastic is somewhere in the middle: it's kind of free and not free. If you don't need support from Elastic, the company, you can use the Elastic Stack for free. But usually you'll buy support if you're running the Elastic Stack in a production environment, I guess. That's why I'd call it a "yes and no" scenario: it's an open-source product, but you can make it commercial by buying a subscription.

Similar things happen across the field. Splunk is a completely commercial product that works much like the Elastic Stack in general. Oracle has its own suites to handle data management processes and solutions. Some log platforms are very similar to Elastic as well, but they're SaaS, software as a service, or platform as a service, those kinds of things. Hadoop is another "yes and no" scenario: it's open source for sure, but for a production environment you'd probably buy a license and make it commercial. Same case. So these are the options you have for a data management system.

Now the myths, lots of myths. "Open source is crap." That's not true. We have a lot of very nice open-source projects: Hadoop is one, Elasticsearch is one, Kafka is one, and so on. There are lots of well-produced open-source products, and even though you might think open source isn't reliable, many banks, financial institutions, and even governments use open-source products in their back end. It's true.
I can't disclose names, but some of the banks in Singapore, the bigger ones, already use the Elastic Stack in their back end. So do some commercial companies, for example those similar to Uber; in Singapore we have Grab and so on. These kinds of companies use Elasticsearch for search and also for geolocation-related search.

So open source is not really crap. It depends on how the owner of the product or project runs it. It's just like commercial products: commercial products can be crap too, depending on how the company runs that product. That's why it's a myth. There's no rule that open source must be crap.

Next: "support for open-source products is limited." Again, a myth. Look at Elasticsearch: we have very good public documentation, which is also a kind of support. And if you don't want to buy support but still want free consultation, we maintain a forum for that. You can always post your questions there, and the developers, engineers, and support team will take a look and give you some directions. Of course, there's no 100% guarantee that you'll get what you want, but at least we point you toward solving the problem. Remember, it's free. Come on. So, no guarantees on some things, of course.

Next myth: a team of a hundred versus a collaboration of talents. People say commercial products are better because there are 100 smart brains working on that particular product. That's both true and false. Think about it this way: an open-source project like Elasticsearch has so many contributors from the public.
Thousands of contributors. If 50% of those contributors are smart brains just like that team of a hundred, then I can tell you the quality, the ideas, and the capability of those open-source projects are not worse than a team of a hundred. Actually, they should be much better. Think about it: if you're working in a company, your ideas and imagination are bound by the company structure and management, whereas open source is very open-spirited. If you have something good, something fancy, some good ideas, feel free to contribute, or discuss them with the people maintaining the project. If your idea is adopted, hey, you've just contributed a new feature. So this is a myth: there's no rule that a hundred brains must be better than the talent in the public, especially with so many talented people in the computing world.

The last point: "the company behind an open-source project is not stable, before or after the IPO." This is interesting, because Elastic actually went through this. We IPO'd about a year, maybe a year and a half, ago. Before we went to IPO, some people in the market felt the company was not stable, that maybe tomorrow it would collapse, and so on, and that's why they didn't want to buy licenses. That's actually a myth too, because it really depends on how the owner of the project, the company, runs it. If the company is run well, like ours, then before and after the IPO it's the same: we're still very stable.
We still give a lot of support to the users who have adopted our products. So that's why it's a myth. On the other hand, commercial companies that go through an IPO might also collapse; there's no guarantee that being a public company means you'll always succeed. So that's a myth too.

Now, the costs of your data management team. First, you need a team of people to handle five things. First, integrating data sources into the data management system, because the data may come from databases, from APIs, or just from CSV files. Second, processing and cleansing the data, because the data you get back might not be in a uniform format: say you've got CSV and XML data sets, and you need to convert them into a format your data management system accepts. Third, applying analytics, either manually or through machine learning; again, you need people to program these things beforehand. Fourth, generating visualizations for storytelling: if you don't have a tool for that, you need a programmer to code the visualizations, for example with JavaScript charting libraries such as Raphaël.js, or, if you need reports, somebody to create them. And last, housekeeping your data: archiving, backup, and so on.

PS: a good tool stack would reduce all of these costs.

Now the process pipeline. It's just the previous five steps: we do the integration, the cleansing, the analysis, we generate reports, and of course we housekeep. So how can we fulfill each of these steps in, for example, the Elastic Stack?
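The cleansing step, converting CSV and XML into one format the system accepts, might look like the following Python sketch. The three-field target schema and the `<txn>` XML layout are assumptions made for the example, not formats from the talk.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical target schema that the data management system accepts.
FIELDS = ("customer_id", "amount", "currency")


def normalize(raw):
    """Keep only the target fields, trim whitespace, and coerce types.

    float() raises ValueError on a missing or malformed amount,
    so bad rows surface early instead of being indexed silently.
    """
    record = {key: (raw.get(key) or "").strip() for key in FIELDS}
    record["amount"] = float(record["amount"])
    record["currency"] = record["currency"].upper()
    return record


def from_csv(text):
    """Parse CSV with a header row into uniform dicts."""
    return [normalize(row) for row in csv.DictReader(io.StringIO(text))]


def from_xml(text):
    """Parse <txn> elements (hypothetical layout) into uniform dicts."""
    root = ET.fromstring(text)
    return [normalize({child.tag: child.text for child in txn}) for txn in root.iter("txn")]


csv_text = "customer_id,amount,currency\nC001, 120.50 ,sgd\n"
xml_text = (
    "<txns><txn><customer_id>C002</customer_id>"
    "<amount>80</amount><currency>SGD</currency></txn></txns>"
)

records = from_csv(csv_text) + from_xml(xml_text)
print(records)
```

Both sources end up as the same dict shape, which is the point of the cleansing step: downstream analysis no longer cares where a record came from.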
The first step is what we call collecting and integrating data. Here we use Logstash, which is an ETL tool. It can take input from different sources, for example standard input; then we pump the data into a filter, a transformer, where we can transform it using grok regular expressions, add data from GeoIP lookups, extract browser data with the user agent plugin, and so on. Finally, we output the data somewhere, maybe Elasticsearch or just a flat file. So that's how we handle step one with the Elastic Stack; other products probably have something similar.

Step two, cleansing, has multiple approaches. You can use a program to do the cleansing beforehand. You can use a tool like Logstash, which we just mentioned. Or you can use server-side features: for example, Elasticsearch has ingest nodes, where we can create an ingest pipeline. The pipeline is kind of like a script: every document that comes in goes through it and has the cleansing operations applied.

Step three, the analysis. Remember, analysis can be done manually or with machine learning. In the Elastic Stack we provide Kibana and machine learning tools. For example, this is the machine learning app inside Kibana; we can use it to find trends and also abnormal activities. The red markers here mean that, at that moment in time, something was abnormal.

Step four is data vitalization through visualizations. Usually you need a BI tool for that, and you can pay for one or use an open-source one. This is the Kibana dashboard, which we provide totally free. Yes, totally free.
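Logstash itself is configured with `input`, `filter`, and `output` sections; the Python sketch below only mirrors that three-stage shape to show how an event flows through it. It is not Logstash code. The log layout, the regular expression standing in for a grok pattern, and the hard-coded table standing in for a GeoIP database are all hypothetical.

```python
import json
import re
import sys

# Toy stand-in for a grok pattern (hypothetical log layout):
# client IP, "METHOD path", status code, and a quoted user agent.
LOG_PATTERN = re.compile(
    r'(?P<client_ip>\S+) "(?P<method>[A-Z]+) (?P<path>\S+)" '
    r'(?P<status>\d{3}) "(?P<agent>[^"]*)"'
)

# Hypothetical lookup table standing in for a real GeoIP database.
GEO_TABLE = {"203.0.113.7": "SG"}


def input_stage(lines):
    """Input: yield non-empty raw lines (from stdin, a file, or a list)."""
    for line in lines:
        line = line.strip()
        if line:
            yield line


def filter_stage(events):
    """Filter: parse each line, convert types, and enrich it."""
    for raw in events:
        match = LOG_PATTERN.match(raw)
        if match is None:
            # Keep the raw line and tag it, the way grok tags unparsed events.
            yield {"message": raw, "tags": ["_grokparsefailure"]}
            continue
        event = match.groupdict()
        event["status"] = int(event["status"])
        event["country"] = GEO_TABLE.get(event["client_ip"], "unknown")
        # Crude browser detection; a real pipeline would use a user-agent parser.
        event["browser"] = "Firefox" if "Firefox/" in event["agent"] else "other"
        yield event


def output_stage(events, sink):
    """Output: one JSON document per line (a flat-file stand-in for Elasticsearch)."""
    for event in events:
        sink.write(json.dumps(event) + "\n")


sample = [
    '203.0.113.7 "GET /index.html" 200 "Mozilla/5.0 (X11; rv:78.0) Gecko/20100101 Firefox/78.0"',
    "not a log line",
]
output_stage(filter_stage(input_stage(sample)), sys.stdout)
```

Because each stage is a generator, events stream through one at a time instead of being loaded all at once, which is roughly how an ETL pipeline behaves.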
Of course, a free BI tool won't be a hundred percent useful all the time, but it's already feature-rich. There will be some limitations for sure, because we're not working a hundred percent on visualization. We're not like Grafana or Tableau, which focus a hundred percent on visualization, so in some ways we have limitations compared with them. Commercial products usually support more visualization types and can integrate with more data sources as well. But to be honest, if you don't want to pay, open-source products like Kibana and the Elastic Stack are quite good enough, and feature-rich.

Step five is housekeeping, and again there are lots of ways to do it. Housekeeping means archiving data, backing up data, restoring data, those kinds of things. The Elastic Stack provides index lifecycle management: we create a policy so that when an index is old enough, it gets archived, maybe backed up, and so on. Another way is to use Curator, a Python tool that does the same thing, but you need cron jobs to run those tasks regularly.

Storytelling. Why is storytelling important? Because you want to extract the hidden value in the data set. You've got so much data, but it's very boring to just present it in spreadsheets. Instead you need something more vivid: a graph, some interactivity, that kind of thing. The second big point: data collected from customers reveals the truth about their behavior.
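Housekeeping with index lifecycle management is driven by a policy document. The Python sketch below builds a plausible policy body: roll hot indices over after 7 days or 50 GB, force-merge them once they are 30 days old, and delete them at 90 days. The thresholds and the policy name `logs-housekeeping` are assumptions, and `phase_for_age` is a deliberately simplified, hypothetical model: real ILM measures `min_age` from index creation or rollover and accepts more time units than days.

```python
import json


def build_ilm_policy(rollover_age="7d", warm_after="30d", delete_after="90d"):
    """Build an ILM policy body: roll over hot indices, force-merge warm ones, delete old ones."""
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {"rollover": {"max_age": rollover_age, "max_size": "50gb"}}
                },
                "warm": {
                    "min_age": warm_after,
                    "actions": {"forcemerge": {"max_num_segments": 1}},
                },
                "delete": {"min_age": delete_after, "actions": {"delete": {}}},
            }
        }
    }


def phase_for_age(age_days, policy):
    """Simplified model: the latest phase whose min_age (whole days, 'd' suffix only) is reached.

    Relies on the phases being listed in ascending min_age order.
    """
    current = "hot"
    for name, phase in policy["policy"]["phases"].items():
        if age_days >= int(phase.get("min_age", "0d").rstrip("d")):
            current = name
    return current


policy = build_ilm_policy()
# Body for a request such as: PUT _ilm/policy/logs-housekeeping
print(json.dumps(policy, indent=2))
print(phase_for_age(45, policy))  # warm
```

The Curator-plus-cron alternative mentioned above expresses the same age thresholds as scheduled actions instead of a policy that Elasticsearch applies by itself.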
There's no way to lie. Think about it this way: if I give you a survey now asking, "Did you buy something for your wife this year, like perfume?", you scratch your head and try to recall: did I actually buy something for my wife? If you did, good; if you didn't, you're in big trouble. So you try to retrieve the memory, and memories aren't 100% accurate. You might say yes when actually you didn't, or the other way round.

Now suppose we have all the data on your buying behavior. For example, I work at a bank and I get all your credit card transactions for the year. I can filter them and find that you bought something aimed at women rather than men; then those purchases are probably related to your wife, maybe. So there's no way to lie: even if your memory says you didn't buy anything, we found out that you did. That's the point here. Based on a truthful data set, we can predict sales or run campaigns for a particular group much more accurately, because this is true data.

Next: is storytelling, or being a storyteller, a good job? Here are some interesting facts. If you go to Indeed or JobStreet.com and just type in "storyteller", you'll get back quite a number of jobs. So storytelling isn't a bad career at all. Also, a lot of non-technical users, like business analysts, product managers, or project managers, are storytellers too.
So how do you tell a story nicely? If the data is presented in the right way, of course it's easier to tell the story. If not, for example if you only have an Excel spreadsheet, come on, that's so boring. Even if you tell the story very vividly, it's still boring. But if you present the same thing as a graph like this, honestly, it's much better. It's not so boring, and you can even hover over it and get some interactivity. That's a plus. So even if you have the technique, you still need tools to help make the story better.

If you want to learn more about the Elastic Stack: today's demonstration uses this exact stack. We use Filebeat to ingest the data, send it to Elasticsearch, and finally use Kibana to visualize it. Take a look at training.elastic.co, where we have a lot of trainings on how to use all the tools in the Elastic Stack, including machine learning. You can also look at the Elastic Certified Engineer program: you take an exam and get certified. In the States, certified engineers do get better jobs, that's true. In Asia it's not a big trend yet, but if you've earned the certification, you should find a job more easily and have some bargaining power for a better salary.

Again, a recap of the resources for the demonstration: these are the links that support the demo later on, plus the blog version and the social media where you can ping me. Before we go to the FAQ, let's go to the demonstration. Okay.