Yeah, so tonight I'm just going to be talking a little bit about this small project that I did. There'll be a demo inside this presentation, but basically the question that I wanted to answer was: what jobs are in demand? I hit one year at my job and then I had a lot of angst. So instead of doing what normal people do, which is asking other people, my friends (maybe I don't have enough friends) or recruiters, I wanted to answer the question myself: what should I be doing?

So this is kind of an old job ad from a long time ago. What I did is basically extract some keywords from job ads on the internet (these are the keywords you'd be looking at in this ad), and then I used that to do data science.

The original goals for why I built this: to code more, because at the time my job was very, very project-management heavy; to learn new tools, which wasn't exactly successful because I got very lazy; and to quantify my options, which I think you know what I'm talking about.

I'll just skip to the conclusion, because I think some people may be very disappointed that I'm not going to present really heavy stats here; I'm just going to show you the demo later. What I learned from doing this over the past few months was: personal projects, people really love it when you put them on your resume, and they basically go, okay, this person can do the job. However, you need different kinds of projects for different types of jobs. This was just my personal experience over the past few months. One of the other things that I noticed was that maybe I'm in the wrong field, because there are about 10 times as many general software jobs as data analyst and data scientist jobs put together. And if you're looking at salary and compensation numbers, the accuracy of numbers posted online depends on the company. I've had people tell me, "Oh no, that must be a typo." Yeah, but
sometimes it's a very, very good negotiating tool for trying to push the number upwards. You go, "This is the range that I think I should be looking at," rather than telling them what your number is, especially after our manpower minister has said you don't need to tell them what your number is.

So if you wanted to learn anything from what I'm going to say, the next 10 minutes are probably not relevant; you can stop here.

Okay, so how I did this thing that I'm going to show you. Actually, I should show it first, but I will talk about it first, then you can look at it and judge me. The tools I used to scrape job descriptions off the internet were Selenium and Beautiful Soup; spaCy to parse the text; a Raspberry Pi, which I hosted on because I got very lazy, I wanted to control my costs, and I already had one; and a free copy of Tableau (thank you, GovTech, for sponsoring this), because there was an internal hackathon that I participated in, the Data Arcade challenge, so this turned into that project. And then, because it's also free, Google Cloud Storage to pull all the data in.

So this is the rough workflow, la. I'm not going to tell you which sites I'm using, because Michael mentioned something about legality and it's a bit sensitive. But anyway, the general idea is this: once you've identified a site that has information you want, you go and get links, you visit the sites, and you parse the raw data on the site into a structured format. Sometimes this is a problem; I'll tell you why later. After you parse it, you store the raw data in the cloud, in Google Cloud or whatever place you want, because Raspberry Pis are prone to breaking down. Then you periodically do batch cleaning and analysis of the data to make sense of it.

So the techniques that I used were mostly very, very
simple natural language processing in spaCy and a few other Python frameworks. Basically: part-of-speech tagging to find proper nouns such as ReactJS and Python; term frequency to identify the top five or so skills interesting to me personally; data cleaning at the character level, because people put emojis like fire and hearts into their job descriptions, as well as different languages, which was quite surprising to me; and then dimension reduction.

The issues I faced doing this project were quite a number of things relating to web scraping, data cleaning, and the data pipelines. One of these is changes to site structure, or inconsistent site structure. When you scrape a website and they change it a little bit, your script will break, because you're looking for a particular thing in the page that you cannot find anymore. Or certain websites change certain parts of their site every day. The other thing is cross-site duplicates: when you're scraping multiple sites and the same job description is posted on different sites on different dates, is it the same job or a different one? You don't really know.

Then, longer term, these were more my own deficiencies. I wanted to make the ETL more scalable and robust, but actually it's just a cron job and a class, because I got very lazy and I found something already hinted. Anyway, these are the things that I'm working on. Error handling: sometimes the page doesn't load and you get errors, so you ingest basically nothing but it's marked as done. I need to fix that. And storage as well: I need to put things into a proper database; right now everything is just sitting in flat files.

So this is something I uploaded onto Tableau Public. The link is there; if you want to see it, you can see it.
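The character-level cleaning and term-frequency steps mentioned in the talk can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the speaker's actual pipeline: the talk uses spaCy part-of-speech tagging to pick out proper nouns, whereas this sketch swaps in a small stop-word filter. The names `clean_text`, `top_terms`, and `STOP_WORDS`, and the sample ads, are all hypothetical.

```python
import re
import unicodedata
from collections import Counter

# Hypothetical stop-word list, kept tiny for illustration only.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "to", "with", "for", "we"}

# Tokens start with a letter and may contain digits, '+', '#', '.' (e.g. "react.js", "c++").
TOKEN_RE = re.compile(r"[a-z][a-z0-9+#.]*")


def clean_text(raw: str) -> str:
    """Character-level cleaning: strip accents, drop emoji and non-ASCII scripts."""
    decomposed = unicodedata.normalize("NFKD", raw)
    kept = []
    for ch in decomposed:
        if ord(ch) < 128:
            kept.append(ch)       # plain ASCII is kept as-is
        elif unicodedata.combining(ch):
            continue              # accent marks: drop so "café" becomes "cafe"
        else:
            kept.append(" ")      # emoji, other scripts, etc. become a space
    return re.sub(r"\s+", " ", "".join(kept)).strip()


def top_terms(descriptions, n=5):
    """Count term frequency across job descriptions and return the n most common."""
    counts = Counter()
    for text in descriptions:
        for token in TOKEN_RE.findall(clean_text(text).lower()):
            token = token.rstrip(".")  # trailing full stop at the end of a sentence
            if token not in STOP_WORDS:
                counts[token] += 1
    return counts.most_common(n)


if __name__ == "__main__":
    ads = [
        "We need Python and SQL 🔥🔥",
        "Python developer with React.js ❤️",
        "SQL and Python analyst 数据分析",
    ]
    print(top_terms(ads, n=2))  # [('python', 3), ('sql', 2)]
```

Dropping every non-ASCII character is a deliberately crude choice; a real pipeline that needs to keep non-English postings would filter by Unicode category instead.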
I've had some success when people ask me, "Can you do the job?" and I show them this, and they're like, "Yes, okay, you can." And one of the other things that I realized was that there are actually not that many data scientist jobs. Yeah, I'm in the wrong field.

Okay, so maybe I'll stop sharing the presentation and show you guys the Tableau demo, but also open it up to the floor for questions, if anybody has any. Michael, is it okay to do that?

Yeah, go ahead.

Okay, so anybody who wants to ask me questions can just ask. So basically this was submitted to the Data Arcade challenge that GovTech organized, I think a few months ago, so I had a few months to work on it. I scraped 90,000 jobs between June and September 2020, and this is basically taking the text of all of these jobs and squashing it into two dimensions. What you see are actually two clusters. One is general job descriptions, which don't really have a lot of hard skills, such as account assistant, and the other is software-type jobs, which might overlap. So you see things like system storage engineer, where you need AWS and Azure, and you see DevOps engineer, where you need JavaScript, engineering, development. There's no overlap, because DevOps may not need JavaScript sometimes.

Then this one is really a bit sensitive, but basically, over the past month, two months, three months, you can actually look at how many people you are competing against and how many jobs have certain keywords that you're looking at, here. So these are the two things that I wanted to highlight. Then obviously I had some address data, so I threw it into a dashboard. Actually, most of the jobs are
recruiters; that's why you see there's a huge concentration in the central area. And then the last one was something that was requested by my friend, sorry, my colleague, who I was working with on this thing. Okay, so does anybody have any questions about this? I can probably answer.

This is a great project, Sharon. There's a lot of information there. May I ask what the data sources are? Which places have you used for crawling the data?

I already said that I'm not going to mention that.

Oh, okay, sorry, maybe I missed that part.

No, no, it's absolutely fine. I mean, you can think about it logically: if you were to start doing this project by yourself, what would you do? A lot of these things are public; it's all accessible on the internet. It's just that, as Michael pointed out, it's a bit of a gray area to collect this data.

Okay. So for me, I'm a software developer, but data science and data analytics is definitely one of my areas of interest. Do you have any suggestions for how I can start, or how I can explore this area, coming from a software development background?

Okay, so I would say that you are very lucky to have a software development background. The number one thing that you should try to pick up is, first of all, Python, and then get involved in the Python community if you aren't already. If you need to find a few communities that can help you get your feet wet in data analytics, I would highly recommend fast.ai as a place to start looking for resources. I'm not affiliated with them in any way; I just google solutions a lot and I end up at fast.ai quite often. People there are generally very helpful; they're one of the communities that are very inclusive. And then
within Singapore itself, we have AI Singapore, which has a few platforms that you can look at. I think they have a Discord channel and a couple of other channels. So basically: Python, and surround yourself with people who are interested in the same sorts of things, if you want to get started.

Okay, thank you. Thank you so much, Sharon.

No problem.

Hi Sharon.

Who's this?

I'm YC.

Oh, hi.

Hi. Actually, I was just wondering, for the job posts, do you manage to obtain the breakdown between which are direct hires and which are postings by recruiters, and so on?

It depends on the source, but yes, it's possible to obtain that information. It really depends on where you get it from.

Oh, I see. So that kind of breakdown is tied to the job site that you are scraping from?

Yes, that's right. It really depends. Most of the time, job descriptions tend to be reused if the source is the same, so you can do a lot of analysis to understand whether it's a repost. Well, actually, you don't have to do a lot of analysis; you just have to calculate cosine similarity.

Thanks.

No problem.

Help me understand this graph a bit. This chart right now, what's on the vertical and horizontal axes?

Oh, actually, this one is very confusing, and we only put it in because GovTech insisted that we follow this format. Sorry, Mike. This is actually something that I never wanted to put in, but we had to include five things and I had only hit three, so this went in. The two axes don't really mean anything; they are basically x and y axes. Okay, so imagine all the words that are in job descriptions on the internet over the past three months. How would you actually try to model them as points in space?
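As a rough picture of what "words as points in space" and the cosine similarity check for reposts could look like, here is a minimal sketch. It assumes a plain word-count vector per job description and a hypothetical threshold of 0.9 for flagging a repost; the talk specifies neither, and the function names are illustrative only. It does not attempt the step of flattening the vectors into 2D.

```python
import math
import re
from collections import Counter


def to_vector(text: str) -> Counter:
    """Turn a job description into a sparse word-count vector."""
    return Counter(re.findall(r"[a-z0-9+#]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_probable_repost(text_a: str, text_b: str, threshold: float = 0.9) -> bool:
    """Flag two postings as a likely repost. The 0.9 threshold is an assumption."""
    return cosine_similarity(to_vector(text_a), to_vector(text_b)) >= threshold


if __name__ == "__main__":
    first = "Data analyst needed. Python and SQL required."
    second = "data analyst needed, python and sql required"
    other = "Embedded engineer with C++ and RTOS experience"
    print(is_probable_repost(first, second))  # same words, so flagged as a repost
    print(is_probable_repost(first, other))   # no shared words, so not flagged
```

A score near 1 means two postings use almost the same words in the same proportions, and a score near 0 means they share almost none.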
This 2D representation is what happens when you take the words, turn them into numbers, and then flatten them into a 2D space. That's basically what this graph is meant to represent. The closer two points are, the more similar the words used in the two job descriptions; sorry, the more similar the two job descriptions are. Sometimes this is a problem, because I know, for instance, that when Singtel posts a job, they will have this really long paragraph about Singtel as a company, then a really short paragraph about the job itself. So some of this is intra-company similarity as well. But yes, it's the distance between the two points.

Actually, one of the things that popped out that wasn't very well explained in this dashboard is that within the two clusters, the colorful cluster and the not-so-colorful cluster (sorry, admin assistants of this world), you tend to get DevOps and clusters of jobs on one side, and then the BAs of the world and data analysts of the world kind of here. So within the two big clusters you get sub-clusters where skills may be similar as well, but it's not one-to-one. There are some really weird jobs out there; for instance, this software developer needs to do DevOps, and then you have embedded guys who need to do JavaScript. This was something that we were sort of expecting, basically, for software jobs to have keywords that were similar, because of the way that the data was pre-processed.

Sorry, it's not a very good answer. I think it's probably easier to say: math. Assign a number to a word, then kind of splatter them on the screen and see what happens. So imagine if your whole job description is turned into a
three-dimensional cube of words, then you squish the cube into a point, and then, yeah, math.

One thing I'm interested to know is, could you also do something like, "I'd like to know how far the workplace is from my house"?

So for instance, this one, personally this is of interest to me. Apple is somewhere here, I think; I can't remember where they are. But you know roughly where your house is, right? So by right, what you can do is shortlist based on distance to your house. I didn't put it into the dashboard because it's a bit contentious and also very hard to calculate, but it can be done. At the data level (I'll show the data source; actually, I shouldn't show the data source), we have addresses, la, and then you just need to parse stuff.

Well, didn't you mention that some of the listings were actually from recruiters rather than...

Yeah, so the recruiters, generally either I will filter them out or I will use the recruiter's address. It's not one-to-one. You see there are a lot of postings here, but over here there are far fewer.

I'm curious, do you see more jobs that allow remote work?

Not in Singapore. And you would usually not be able to tease this out from a job description, as far as I know. A lot of job descriptions will just say, "These are the skills we want and this is the profile we're looking for," and then you have to go and talk to the prospective employers and ask them, "So what is your policy on remote work?" It could be that I'm not scraping international sites, well, sites geared to international jobs, la. All these jobs are specifically targeted at Singapore, because I'm looking in Singapore.

How do you figure out how many people apply to the jobs? I think most job sites don't
actually list this number, or do they?

Okay, so there are at least two that I'm scraping which do list this number, but yes, most don't. One of the limitations of this project is that you don't actually know who your competitors are, because most of the job sites will either not show that or it's unreliable.

Do you have something about how much the job is going to pay?

What do you mean, sorry?

I think you mentioned that the accuracy of salary numbers is related to the company, but do you have something to visualize the pay scales and things like that?

So I will share this thing again. I have some stuff that I did that I didn't show; it's a bit sensitive. Anyway, this is the graph for data scientist jobs for, I think, a subset of the period, only about one month. Basically you can see which companies are hiring, and that there are actually not that many jobs; there are only about a hundred plus. But yes, I have the data.

How do we read this? So your x-axis is salary and your y-axis is salary as well?

Okay, so basically, when most job sites give you a salary number, it will not be a single number but a range, so what I did was use the range. This is basically just meant to show that I have data that I cannot really put out there. In the public Tableau dashboard, only certain tabs are available to you. The data which I have, there's a lot more. If you want to ask me any questions about the data, I'm happy to answer if I can.

Hi. Does being a data scientist pay well?

Sorry, what?

Does being a data scientist pay well?

No.

That's surprising.

I would say that if you had to choose between being a data scientist and being a
software developer, you should probably play in a bigger market.

But 20k is quite a lot, actually. Software developers have access to that, really?

Yes, they do, they do. And sometimes the numbers are a bit off. From what I understand, there are two reasons why this can be off. First of all, on the sites that I'm scraping, somebody made a typo. It's quite often that they made a typo, and being off by tens of thousands of dollars is possible. So you have to cross-reference across a few sources to see if it's correct. The second thing is that, besides typos, sometimes I think companies may repost and the number becomes a bit funny. This is just what I observed from looking at the data that I have.

Oh yeah, there's actually one company that would put like 100k. I saw Nicholas's thing; basically the range was like 10 dollars to 100k or something.

There are some jobs that are like that.

Okay, that's interesting.

I think the thing is that the people who put in these things are also human, so sometimes there are mistakes. Most of the time the data is fairly okay, but there are a lot of edge cases you have to catch, la. You can't just look at the number and go, "Okay, this is the range." This is just a guide for negotiating with companies, la, basically. There have been cases where I've completely gotten the range wrong before; that's because the company put up too high a number and they got called out on it.

So Mikey, maybe this question would be for you. What kind of qualifications do employers require when they offer a five-digit salary for a software developer? Because from my personal experience, I have friends who get 5k, another friend who gets double
of it, and other people who get quite a lot. So what kind of skill set actually draws the five-digit salaries? What is going to be your response to that?

That's the trick question, isn't it?

Just curious.

I think, as it goes, if you are paid more, you are given more responsibilities, and you need to have the ability to take on those responsibilities. Those would be things like mentoring, guiding people, making technical decisions and architectural decisions, and being able to execute on those decisions. Those are the skill sets you will need to be a senior- to lead-level kind of person. If you're at that level, you can probably command that kind of salary scale. So that's my thinking around this. Does anyone else have a differing opinion?

And that also comes with the years of experience that you have, right? It's not that you just have two years of experience and then you will get...

In general it's about the years, but then again, in some companies that you work in, you don't have many opportunities to really learn a lot of things or to really stretch your knowledge. So instead of having 10 years of experience, you'll have 10 times one year of experience. Which means that even though you say, "I've had 10 years' worth of work experience at this job," your actual amount of technical expertise and knowledge is maybe borderline one year plus, because you haven't really grown or stretched yourself beyond a certain level. So I wouldn't say a 10-year person here and a 10-year person there are equivalent; you probably want to evaluate both. If I go for an interview, I would still have to go through the whole process that any fresh grad would go through, things
like the coding interview, take-home tests, coding challenges. These are things that I also had to do when I was interviewing. Even for me, when I'm interviewing someone, even though they say, "Oh, I'm at this level," I just want to verify, and I want to do a simple test just to figure out whether the person can really do what he says he can do. Beyond that, the things you can't tell from a coding challenge are things like: why do you make certain decisions? Say you have many years of experience with a framework; I'll ask you questions like, "Why is it better to do it this way versus that way?" If you can't give me a good answer about why certain technical decisions are better than another approach in a framework like Rails, for example, which of the different approaches is better, or what the tradeoffs are, then even though you have 10 years' experience doing Rails, it's like, okay...

Yeah, that definitely makes perfect sense. Thanks.

So I think it's about operational knowledge, how to get things done, and then you go one level deeper into why you want to use something, why we make certain decisions. Once you have that depth, it's easier to be able to command more salary, I guess. Sorry, I didn't mean to hijack Sharon's talk.

No, I think that's actually a very good summary of it.

Anyone else have questions for Sharon?

Yeah, I have one, more like a sub-question. Given that there's quite a lack in the number of data science jobs, do you regret it now, like, "I should have just done a boot camp and learned JavaScript"?

Regret, regret, regret. Actually, the
easiest, the biggest pool of jobs in Singapore where you actually do, I guess, good work... Okay, so just personally, this is... Michael, are you recording?

Recording, right, yes.

Maybe I'll share this after you stop recording.

Yeah, sure. We can do another round of... sorry, who is this?

Nick.

Nick, yeah, sure. We can do another round of Q&A after a wing is done, then dive into more sensitive things. I'm also mindful of the time, so I think we have a really...

No worries.

If you have no other questions, thanks, Sharon.

Thank you. Thank you, everybody.