Oh sorry, I have to leave now because my wife is sick and I want to be home early. It's a bit confusing that I'm moderating this and then suddenly disappear, so please excuse me. Jeremy is going to pick up the mic after I'm gone and will moderate when we have question sessions. Okay, thank you very much, and hopefully we see this big group at our next 5th NewsHour Group meet up again. It's the biggest group that we've had in months, so that's great. Thank you very much.
Is this working? That's for recording. Ah, okay, just for recording.
Okay, hello everyone. Before we introduce ourselves: I'm leading the data engineering group in DBS. We are a group of 50 engineers, about half based here in Singapore and half based in Hyderabad in India. We take care of the development of the whole software layer, all the software infrastructure that sits on top of our data platform, and we provide a kind of self-service platform to internal users.
Before we start, I want to do a little bit of advertising. We are hiring a lot. In 2018, DBS increased its headcount by 10%. We hired about 2,000 people, pretty much all technology-related, most of them engineers, both in Singapore and in Hyderabad. We do it with events as well. Next weekend, actually not the following one but the one after, the 7th, we have this event, Hack2Hire. It's a hackathon with prizes hosted by DBS. In addition to competing for the prize, it is also kind of like an interview: people get selected and at the end they can get a job offer from DBS. So if you're interested, I would encourage you to sign up. Just Google Hack2Hire Singapore and you can sign up.
It's a two-day event, Saturday and Sunday, where you can do all your hacking, and then we'll have winners. And if you're interested and you are selected, you can also get a full-time job offer from DBS. Cool, great.
Having said that, let's start. This is about something that we started in our data group, specifically related to scheduling: how we implemented human-readable scheduling grammars using ANTLR and Python for our data platform. As I said, a couple of introductions. I'm the head of the data engineering group here in DBS. And Dario, who is working in our group, is leading the integration part, which takes care of integration across the various components that we have. One of them is the integration between Apache Airflow, which is the tool that we use, and our clusters, our Spark integration, Cloudera, et cetera.
So, the problem. If you take any scheduling tool, most of them use a cron-based grammar. It's a very simple grammar. We realized that it's okay, but sometimes we need much more complex stuff, especially in a bank where you have to schedule jobs based on business rules and not calendar days. This is a typical example of what we have: run a job every Friday of the first month of the second and third quarter, if not a public holiday, otherwise the following working day. How do you express this using a cron expression? It's basically impossible. So we said, okay, how can we create a way for people to express these rules in an easy format?
I have to mention that all our jobs are metadata driven. We expect users, who are sometimes not technical, to define a job, build a job, and schedule a job. So we have to give them a very easy-to-use interface to express these rules. All this fits into Apache Airflow. We are extensively using Apache Airflow.
The limitation is that Apache Airflow is entirely based on cron expressions. However, it is very extensible, so we can write plugins. Luckily we have very good expertise in Apache Airflow. We are the number 10 contributor to Apache Airflow and have an official committer. For this I would like to thank Wery Xiaodong. He should be here in the meeting, yes. He is our Apache Airflow committer, and we are now adding a lot of features to the official version of Apache Airflow.
As I was saying, Apache Airflow only supports cron-based expressions, but we can easily write plugins. So we said, okay, let's write a plugin for Apache Airflow that allows us to analyze this type of expression and decide whether or not to execute a DAG based on the rules. We knew that we had to parse, and that we needed to express that complexity. So how could we do it?
The first option we considered was a structure that allows you to represent that information. Imagine a JSON-based structure where you say: I want to run on these days, in these months, in this quarter, but I want to exclude some days. You can build it, but it becomes very complex, especially for the user, to express such a rule with a hierarchical structure like this. So we discarded that option.
The second option we considered was using some kind of NLP. The problem with NLP is that it's not entirely deterministic. When you interact with a software system based on NLP, it might happen that the system doesn't understand your request. That is okay if you're using Google Assistant, because you can repeat your question and try to make it understand, and sometimes it still doesn't. But that is not okay in our case. If we don't understand the expression, that's a big problem. So we decided that the best way to approach the problem was building a custom grammar.
Of course, this forces the user to define the rule with a specific grammar, and for that reason we have built a small UI that helps the user make as few mistakes as possible. But we are sure that whatever we type is easily understandable from a human perspective, and we are sure that the system can also understand it. For this, we decided to use ANTLR, which is basically a lexer and parser library, and this is a very high-level overview. Then Dario will go into more detail on how this is implemented. So this is how we split up our parse tree: from the grammar, we derive our parse tree and use it to determine whether or not we need to run a job. Okay, so I'll leave it now to Dario. It's okay, we can just...
Just to give you an idea, I don't know how many of you are familiar with ANTLR, but this is how we express the grammar in ANTLR. It's basically a BNF representation, and you see here that you can compose your expression token by token. In this case, for example, a day can be any day, or a day of the week, or a day category, or a day range, and then we have the definition for each of these tokens. So starting from the smallest token, you build up all your complex expressions. This is just an overview. Dario is going to tell you more about how we go from the grammar itself to parsing the expression and building the schedule from the grammar. Okay, come on.
Okay, so the implementation is quite easy, so we will have just an overview. To go from the sentence to a list of schedule dates, we go through four stages using three steps. We start from a sentence like the one we saw in the beginning, every Friday of the first month and so on. It should be. No. Oh, this one. Oh, okay. Yeah, a sentence like that. Then with ANTLR we go to a parse tree.
Then, based on the parse tree, we wrote some code to convert the parse tree to a filtering pipeline. And at the end, we generate the full list of dates.
So, okay. This part is mostly managed by ANTLR. I don't know if you can see this part. It gives us tools to convert the grammar definition into Python classes: a lexer, a parser, and it also furnishes a visitor for the tree that we can use in our code. The tokens file is a format understood by ANTLR, and then we have these classes auto-generated by the tools. In this case, our grammar builds this tree, where we have some nodes like this that are the master nodes. Let's say this node is about days. This node is about months. Here we have quarters. We can also have weeks and years; in this case, we just use fewer words. And already in the grammar, we have our pipeline, as we said before, because we have a filtering pipeline based on days, a filter based on months, and one on quarters. On the left side of the node, we have the qualifier that specifies, for instance, Friday. For the month, it specifies the first month; for the quarter, second and third. But this part is completely managed by ANTLR.
Okay. So, this is how our pipeline appears. We simply use a visitor pattern. When we find a spec node like the day spec or month spec, we create a filter, and we populate the filter with a qualifier: for the day filter, the qualifier is Friday, and so on. It's not so complicated.
Okay. How does the filtering happen? Our filter pipeline is, let's say, a bit tied to our grammar, because the same structure we have in the grammar, we find inside the pipeline. We go through the pipeline down to the bottom. At the bottom, we generate the full list of days in our range. For instance, if we are working on 2019, we generate all the days from the 1st of January to the 31st of December. And then we go filtering, like in a SQL query.
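The generate-then-filter idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the speakers' implementation: the function names (`all_days`, `in_quarters`, `in_months_of_quarter`, `on_weekdays`) are hypothetical, and the filters are plain list comprehensions rather than objects built by an ANTLR visitor.

```python
from datetime import date, timedelta

FRIDAY = 4  # date.weekday() numbers Monday as 0, so Friday is 4


def all_days(year):
    """Bottom of the pipeline: yield every day of the given year."""
    day = date(year, 1, 1)
    while day.year == year:
        yield day
        day += timedelta(days=1)


def in_quarters(days, quarters):
    """Keep only days whose quarter (1..4) is in `quarters`."""
    return [d for d in days if (d.month - 1) // 3 + 1 in quarters]


def in_months_of_quarter(days, positions):
    """Keep only days whose month position inside its quarter (1..3) is in `positions`."""
    return [d for d in days if (d.month - 1) % 3 + 1 in positions]


def on_weekdays(days, weekdays):
    """Keep only days falling on one of the given weekdays."""
    return [d for d in days if d.weekday() in weekdays]


# "every Friday of the first month of the second and third quarter", for 2019
days = list(all_days(2019))
days = in_quarters(days, {2, 3})        # quarter filter
days = in_months_of_quarter(days, {1})  # month filter: April and July
fridays = on_weekdays(days, {FRIDAY})   # day filter

for d in fridays:
    print(d.isoformat())
```

Each filter narrows the result of the previous one, which is what makes the pipeline read like successive WHERE clauses in a SQL query.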
So, at the bottom, we generate all the days. Like here. Okay. Yes, there is also another approach: instead of generating every day, just create intervals. But at the moment, we are working like this. So, we generate the full list of dates for our grammar. Then we go step by step through our pipeline. Here, the year is a quarter container: it can contain quarters. So, we just have some methods that say, give me the second and third quarter. Then, once we have retrieved the quarters, we go ahead with the next step from the quarter. A quarter is a month container, so we have functionality to retrieve the first month of the quarter. We repeat the operation for every result of the previous filter, so we have the first month of the second and of the third quarter. And then the last step, where we retrieve the days. Here, the month is a week container, but it's also a day container. And this is the same for every container, because, for instance, we can say, give me every Friday of the second and third quarter. In that case, we don't have just these Fridays and these Fridays, but the Fridays for April, May, June, July, and so on. Okay. So, in this part, we have the generation of the list.
And, well, I didn't say this at the beginning: other than selecting which days we want to schedule, we can also specify a condition, such as if it's not a public holiday or if it's not a working day, and an alternative: if not a public holiday, otherwise the following working day. You can see here in the tree that we split the sentence into three parts. The first part is the action: every Friday, blah, blah, blah. The second part is the condition: if not public holiday. And then we have the alternative. The condition works, wait. This allows us, for instance: here we have generated all the Fridays for the first months of the second and third quarter. But 19 April is a public holiday. What is it, Good Friday? Something like that. Okay.
So, the next working day after the 19th of April is the 22nd of April. How we do this is with some set operations. We get the main list of our working days. Then we generate the conditional list with the same algorithm. In this case, we retrieve the list of public holidays: that's Chinese New Year and Good Friday. We do an intersection between these two lists to retrieve this one, which is just the day we want to handle. In the intersection, we have the public holiday in this case. We apply a difference, so we retrieve just the public holidays we have to work on. And then the last part, the otherwise section, the alternative, is just a search backward or forward to find the correct result. Then we merge the results and we have the final list.
And, well, what's next? So, here it is a bit small. Okay. Just to have a quick overview, here is the simple way we use to run our grammar. I can go to the top. Here we set the current day, the day on which the job has to be run. So, today is today. And then, well, the call is down here with the schedule. This one is just a check. We have two functionalities. One is to check whether the current day is a day on which we have to run the job or not. The other is to generate all the dates based on this sentence. So, the schedule check. We give the sentence, every Friday of the first month of the second and so on. We pass the day on which we want to apply the grammar, then a list of public holidays. And then there is a parameter to define the working week, from Monday to Friday or, well, other kinds of weeks. Then if we run it, we see the pipeline with the action, every Friday, first month, second and third quarter, the condition, every public holiday, and the alternative. And here is the list of dates generated, as we saw before, the Fridays. The condition is simply a list of public holidays. Here we miss the first day because we start from the current day.
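The condition and alternative handling described above (intersection, difference, forward search, merge) can be sketched with Python sets. This is a minimal sketch under assumptions: the names `next_working_day`, `apply_condition`, and `should_run` are hypothetical and are not the API shown in the demo, and the holiday list is a small illustrative sample for 2019 rather than a complete calendar.

```python
from datetime import date, timedelta

# Assumption: the working week runs Monday (0) to Friday (4).
WORKING_WEEKDAYS = frozenset(range(5))


def next_working_day(day, holidays, working_weekdays=WORKING_WEEKDAYS):
    """Search forward from `day` for the first working day that is not a holiday."""
    candidate = day + timedelta(days=1)
    while candidate.weekday() not in working_weekdays or candidate in holidays:
        candidate += timedelta(days=1)
    return candidate


def apply_condition(action_days, holidays):
    """'if not public holiday, otherwise the following working day'."""
    action = set(action_days)
    holidays = set(holidays)
    blocked = action & holidays   # intersection: scheduled days that are holidays
    kept = action - blocked       # difference: scheduled days that can run as planned
    moved = {next_working_day(d, holidays) for d in blocked}  # the alternative
    return sorted(kept | moved)   # merge the two results


def should_run(today, schedule):
    """The 'check' functionality: is today one of the scheduled days?"""
    return today in schedule


# Fridays of April and July 2019 (the action part of the example sentence).
fridays = [date(2019, 4, d) for d in (5, 12, 19, 26)]
fridays += [date(2019, 7, d) for d in (5, 12, 19, 26)]

# Illustrative holiday sample: Chinese New Year and Good Friday 2019.
holidays = {date(2019, 2, 5), date(2019, 2, 6), date(2019, 4, 19)}

schedule = apply_condition(fridays, holidays)
print([d.isoformat() for d in schedule])
print(should_run(date(2019, 4, 22), schedule))
```

With this sample data, 19 April drops out of the schedule and 22 April, the following Monday, takes its place, matching the example in the talk.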
The alternative we found, and the final result. So, the answer to the question of whether we have to run this job today is false because, well, it's not Friday, unfortunately. And yeah, that's all.
Yes? There's time for questions. So, for anybody who has questions, if you can pick it up without the mic, that'd be good, but if you need the mic, that'd be no.
Does it consider a leap year, like a day before a leap day? I would run a job in the next leap year, a day before and a day after.
Yeah, yeah, yes. Can you repeat the question for the video? Sorry? Can you repeat the question for the recording?
Sorry, can you repeat? I want to run a job. Next leap year? A day before and a day after. A day before the leap day? Ah, okay, yes, it's okay.
Can I run the job in the next leap year, a day before or after the leap day? Yes, because we generate the full set of dates. Of course, only in the future; we cannot evaluate past dates. We generate the dates for the years for which we have public holidays available, because if we don't have the list of public holidays, we cannot evaluate whether a day is a public holiday or not. So we generate all the days for every year and we just filter. For the previous day, instead of searching forward, we search backward.
Does it reference any business calendar? For example, can I generate dates like: give me all business days valid in both New York and Singapore? Valid? It's a very common problem when you're booking trades: I want all business days which are valid in New York as well as in Singapore.
Oh, yeah. I mean, you can pass the business calendar to the tool. It supports that. If you create a business calendar that aggregates the holidays of different countries, then yes. Currently our use case is one business calendar, because we need business calendars country by country. But yes, that can be done.
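The aggregated business calendar mentioned in that answer can be sketched as a union of holiday sets. This is an assumption-laden illustration: `business_days` is a hypothetical helper, each calendar contains a single sample holiday rather than a full list, and the working week is assumed to be Monday to Friday in both places.

```python
from datetime import date, timedelta


def business_days(start, end, holidays):
    """Return every Monday-to-Friday day in [start, end] that is not in `holidays`."""
    result = []
    day = start
    while day <= end:
        if day.weekday() < 5 and day not in holidays:
            result.append(day)
        day += timedelta(days=1)
    return result


# Sample entries only: one holiday per calendar, for illustration.
singapore_holidays = {date(2019, 8, 9)}  # National Day
new_york_holidays = {date(2019, 7, 4)}   # Independence Day

# A day is a business day in both places only if it is a holiday in neither,
# so the combined calendar is the union of the two holiday sets.
combined_holidays = singapore_holidays | new_york_holidays

common = business_days(date(2019, 7, 1), date(2019, 7, 5), combined_holidays)
print([d.isoformat() for d in common])
```

Passing the combined set where a single-country holiday list would normally go is the only change needed, which is why a per-country calendar design can still answer the cross-market question.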
Can you give us a use case of this particular tool that you have built, from a user point of view? Because a programmer who understands cron will probably write it in cron. But this is closer to a natural-language kind of thing. So what is the business use case that you had in mind?
Well, the business use case is this. Take ingestion pipelines. We have a lot of ingestion and compute pipelines, and everything is metadata driven. That means, let's take an ingestion pipeline, you can define all the parameters of the ingestion, like where you're taking the file from and what the format of the file is. And in addition to that, you have to specify when you want to run your job. Now, all these parameters are not filled in by a programmer; they're filled in by what we call business analysts, SMEs, who don't know code but are fairly technical. They can write an expression. They can say, this is a CSV file that is comma separated or tab separated, to configure the ingestion job. So this is exactly the use case. We have a metadata management tool where BAs go and say, these are the parameters to run this ingestion: the file is here, this is the format, and this is when I want to run it. So this is our typical user.
We cannot really use cron to express this. Just to give you an example: I want to run my job on the last day of every month in the third quarter. How do you use a cron expression to represent this? Actually, take an even simpler one: I just want to run this on the last day of each month. This cannot be represented by a cron expression. That is why we have this tool to help us arrange our daily jobs. So in our world, if we still relied on cron expressions for this kind of complicated... Actually, we can say it's complicated. Cron does not really help.
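To make the "last day of every month in the third quarter" example concrete: the day-of-month field in a standard five-field cron expression takes fixed numbers, while the last day varies between 28 and 31, so it has to be computed per month. A small sketch using only the Python standard library follows; the function name `last_days_of_quarter_months` is hypothetical and not part of the tool described in the talk.

```python
import calendar
from datetime import date


def last_days_of_quarter_months(year, quarter):
    """Return the last calendar day of each month in the given quarter (1..4)."""
    first_month = 3 * (quarter - 1) + 1
    return [
        # calendar.monthrange returns (weekday of the 1st, number of days in month)
        date(year, month, calendar.monthrange(year, month)[1])
        for month in range(first_month, first_month + 3)
    ]


print(last_days_of_quarter_months(2019, 3))
```

For 2019 this yields 31 July, 31 August, and 30 September, three different day-of-month values that a single fixed cron field cannot cover.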
Yes, here the goal was not simply to replace cron with, let's say, a better grammar, but to cover use cases that cron cannot support. Obviously, yes, we can also represent cron expressions here in a more user-friendly grammar, but that was not the original purpose, which was to support these complex scheduling scenarios.
The question is very simple. You're very lucky that you're in Singapore, where natural disasters are not part of the game, whereas in countries like Japan, you never know when the next one will hit you. So my question to you is: using this method, in case an emergency strikes, like the typhoon the other day, which was quite unexpected, how do you deal with cases like this? Will you be able to use this without manual intervention, or is there somewhere in your parse tree where you cater for unexpected situations? Thank you.
Okay, that feature is not there now, but it would be possible to implement it. I mean, I can imagine you could have: if public holiday or no natural disaster happened. And the resolution of that token is dynamic, so we might have an API to invoke. Xiaodong, you can probably comment more on this. But yes, it would be possible. That's a scenario we didn't consider, but at the end of the day, to look up a public holiday, we look it up in a calendar that we pass along with the sentence. In the same way, this lookup could come from an API that tells us whether a natural disaster has happened on this day or, I don't know, a storm is forecast. Yeah, that would definitely be possible.
I mean, Xiaodong, I want to change it to a month if you're excited or possibly even whole week. In Israel, that day is changed. And let's say another thing is a public holiday. Actually, we use an external file to handle this. But, as Matteo said, we can use an API or whatever.
Say I just want to exclude anything resulting from a natural disaster or whatever other reason. I can just reuse that part to help us exclude specific days.
Okay. Do you want to show? The token that goes into the public holiday. How we handle public holidays. The code or the file? Are you most interested in the code itself or the... How does it? Okay. Or maybe we can do it offline. I don't know. You want to do it here? As you prefer. If we have time, we can do it here. Otherwise, we can do it offline later. Yeah, we still have time for that question. Okay.
Okay. It's quite simple. As I said before, we have this day container class. That is the class that generates and retrieves the final day list. So we have ordinal days, relative days, all the weekends. The ordinal day is to retrieve, let's say, the first day of the month, first day of the year, first day of the quarter. We can have business days, so also the first working day, second working day, and so on. For the holidays, we simply have get holiday days. And we have a utility that can say whether a day is a holiday or not. So here we just retrieve the holiday days; all these bodies are not implemented. Okay, ordinal holiday days, if we want to have the first public holiday, the second public holiday, and so on. The implementation is mostly here at the day level. So at the end of the day, it's just a utility. The parse tree is retrieved from the calendar parser, which is a class generated by ANTLR. So ANTLR takes care of the parse tree. Then with the visitor, we go through the parse tree node by node and put the information into our filtering pipeline.
Sorry? Yeah, more than filtering: we generate the public holidays and then we do a set operation. Or in ANTLR... We have a token for public holiday. Yeah. We have a token for public holiday which we basically associate with all the public holidays. You can show it to the camera.
Yeah, the condition is "not", for the condition, and we have a day specification that is the same as we have on this side. But in this case, the qualifier is public holiday. So we build two lists, the working days, the Fridays, and the public holidays, and then we...
It's a really simple one because I don't understand the domain. Why is public holiday not also on the left-hand side? Why does it only live on the condition side? What if I wanted to say every public holiday in the first quarter?
Because... No, you can. Basically we have a day specification, and that includes public holiday. So the general form is a day specification that can be an ordinal representation, I mean the first day of the month, or any public holiday, or any other form of specifying a day. Does that answer your question? Good? Alright.
You mentioned that you also have a GUI where you are enforcing that the business analyst or business user follows a certain grammar. How do you go about doing that? Because this is very fine, right? You can't just do it with a lot of controls and you can't use NLP. So how do you enforce that part?
The GUI is very simple. It's basically an autocomplete function using ANTLR. There is a JavaScript library that allows you to generate an IntelliSense-like autocomplete based on an ANTLR grammar. So we can suggest to the user the next token, the possible tokens that he can use, and then of course we can validate the expression on the backend using the parser.
Okay, so thank you Dario and Matteo. If you have any other questions, you can take them offline later. The video will be on engineers.sg and the slides will be on parks.sg. So, a round of applause for Dario and Matteo.