Thank you very much for the introduction. So yes, today I want to tell a bit of a story of how we built business dashboards for a product. I am indeed the creator of Bonobo, but I won't talk just about Bonobo; I will show different tools. So yes: using Bonobo, Airflow and Grafana all together to build some business dashboards really quickly.

Before we start: I'm Romain Dorgueil. I've been making computer-related stuff for quite a long time now, and this year I decided to change quite a bit what I was doing. I'm building a new company called Makersquad. My goal is to build cloud-native products, mostly SaaS products, using containers and things like this, for myself, living on subscriptions, and also for clients, helping them achieve this cloud-native stuff, which is not really natural for a lot of people.

There is a lot of stuff today, so I'm sorry if I go too fast sometimes, but I will give you a link afterwards with all the code and everything. The idea is: I will introduce the product so you know what we're talking about, a bit of background context. Then we'll see how to plan what we want to dashboard. Then we'll go into the technical part: implementing the data pipelines using Bonobo, visualizing them using Grafana, and running it seriously in production with Airflow, plus a bit of where we go from there after that.

As has been said, I created Bonobo, so I'm a bit biased here. I'm sorry about that; I'll try to be objective, but make your own opinion. If you're building a product, if you're building anything, make your own research and take a lot of time in planning; it's very important. I recommend nothing, but maybe it's an idea you could try to apply to your own products and business.

So, the product. It's a pretty simple thing: you give it a URL and a size, and it returns an image. It was in fact a service that a friend of mine developed a long time ago, like ten years ago, an infinite amount of time in internet age. It was really like this in January, if we go timedelta(years=9) back; yes, I added a years parameter to timedelta here. Then we come to February, when he decided it was not making money, it was wasting a lot of time, and it was time to shut the service down. At this point I called him and we agreed on me taking over the service as a brand new thing. So in March he started to send me his traffic, and I didn't think he had this much traffic, so I started serving gateway timeouts. But after a bit of work, a lot of work actually, I could serve the new version of the website, which was no PHP anymore, all Python, and you'll see after how it works. A few months later I started to measure everything technical; that's the background needed to be able to measure business things afterwards. And in July, also known as last week, literally, we launched the private beta and started to onboard all the old users.

So things did not go exactly as expected, but I'm pretty happy. Too much traffic was a nice problem to have: when I measured the thing, it was around one million images served per day, which is not a lot if you're Google, but which is a lot if you just have a few servers somewhere. So yeah, nice problem to have.

Just so you understand how it works, here is basically the whole service, at least the whole user-facing service. It's pretty standard:
There is a load balancer that sends traffic both to a Django website, which is the marketing website plus the user accounts (managing the API keys, the billing, all stuff like this), and to the API server if it's an API query, which is written using Tornado. Not really modern, not really sexy, but it's really fast and it works really well for us. This API server basically just has to serve images as fast as it can. Whenever an image is not there, which happens quite often, it sends a "miss" event to a message queue, a RabbitMQ message queue. Another service, called the janitor, picks it up, updates the status, and decides whether or not this request is legit: is this domain allowed, is this some kind of abuse, etc. If it decides it's legit, it sends a "crawl" message to another message queue; a spider picks it up, opens the page, makes a screenshot, resizes it, compresses it, blah blah, uploads it, and at the end it sends either a "created" or a "failed" message to the events message queue. That message will be picked up by the janitor once again, and the janitor will update the status to ready, so the API server can serve the request the next time a user asks for it. Basically, that's all the service is; there's a hedged sketch of this event loop after this section.

But to manage that, we also have a lot of data-producing services in the backend. I won't detail all of those services, but mostly we have Prometheus, which is a time-series oriented database: it scrapes metrics from services, stores them into a bundled time-series database (TSDB), and also provides a query language that, amongst others, Grafana can use to query and get metrics from, and build graphs from that. And as it's an HTTP API, REST-like, you can also just query it yourself, and we'll use it to get some data.

Very important too, we have external services, which are pretty much the services you know: Google Analytics, Stripe, Mailgun, etc. That's interesting because there is a lot of data produced there too. And mostly, the biggest insight we can get (even if it's not big data at all, it's small data) is when we cross the data from internal things with external things.

So at this point I already have dashboards of technical metrics, and I insist on this "technical" word because it's not what we want to build afterwards. But we already have those dashboards, made in Grafana. It's really easy to build: you just write a query and you have the graph; it's very quick to do. We have the standard CPU, memory and network monitoring, and we also have standard monitoring related to one service: for example, the amount of 200/300/400/500 responses per second returned by nginx, and the timing of requests, so we can know whether or not things are going nicely. And I insist once again on the fact that these are technical metrics, because there's a lot of data: every 15 seconds Prometheus scrapes everything and stores new data. But we don't want to store this data forever, because we can't make decisions based on it. It just helps us restore the service whenever an incident arises, or react to what's happening. It's good to know what's currently happening in the service.
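To make that event flow concrete, here is a hedged sketch of what the janitor's consuming loop might look like, using pika. Queue names, message shapes and the two helper functions are assumptions for illustration, not the real implementation:

```python
import json

import pika  # RabbitMQ client


def is_legit(url):
    """Hypothetical check: is this domain allowed, is this some kind of abuse?"""
    return True


def update_status(event):
    """Hypothetical: persist 'created'/'failed' so the API can serve it next time."""


def on_event(channel, method, properties, body):
    event = json.loads(body)
    if event['type'] == 'miss' and is_legit(event['url']):
        # Legit miss: ask a spider to crawl, screenshot, resize and upload.
        channel.basic_publish(exchange='', routing_key='crawl', body=body)
    elif event['type'] in ('created', 'failed'):
        update_status(event)
    channel.basic_ack(delivery_tag=method.delivery_tag)


connection = pika.BlockingConnection(pika.ConnectionParameters('rabbitmq'))
channel = connection.channel()
channel.queue_declare(queue='events', durable=True)
channel.queue_declare(queue='crawl', durable=True)
channel.basic_consume(queue='events', on_message_callback=on_event)
channel.start_consuming()
```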
It's it's good to know what's currently happening in the service But it's good to know we have that before we do more Also planning for how many water you drink in a conference is very important So the plan is Basically buzz on based on that if you can't measure something it's really hard to improve it You're just blind blindly running in a thing and as it's very negative I like better the the other version which says what gets measures get improved It's not as easy as that because there is metrics that you can look at after as much as you want and it's just Consequence and effects so You can try to improve it, but if you don't fix the cause it won't but if you choose your metrics wisely Choosing cause metrics really having a focus on yeah, that's that's our goal That's our team goal or that's my goal will really help you improve that and of course there's not one answer to this question The answer which is not an answer is that you should not focus on vanity metrics For example sooner. I said there's one million requests on the API per day. That's a perfect example of example of vanity metric It's good to say yeah, we have some traffic. It's good to say to your user. We're not We're not a tiny service But if you focus on that it will just waste your time because I could serve one billion requests It would be just cost if none of those requests are actually built. It it doesn't mean a thing So to plan the matrix we want to To measure It's important to Consider the business you are in like we are in since after as a service But there is different metrics for different kind of business I will give you some pointers to to find the State the state of the art and in this state in in this regard after that It's also important to not look at the same thing if you're like early-stage business pre-revenue pre-market Pre-product market fit or if you're in a late-stage business you won't look at the same kind of thing and Hopefully a lot of people are really smart people built Frameworks for that and that's I'm speaking about business frameworks and not technical frameworks But one you may have heard about is the well-known Framework also called the I don't know if I said that correctly Also called the pirate matrix framework which says okay users are complicated and we will segment the the user journey within our service into five kind of steps maps to the when we attract new users when we get them to Give give us a little something like when we activate them For example, it could be they leave the email that could be they create an account them But that could also be they give you a phone call or maybe they download your application if you're on a mobile app Then you have the retention phase where you say, okay, that's good You I know who you are but now I need to have you really come and come again need me every day or every Period of time you you defined I reverse the the two and think I didn't do it on the slide, but the then you have the revenue How do you get money from the from this user and And that's not necessarily He pays directly it can he can generate money and directly also and you have the referral phase that I put really at the end because In fact for a user to recommend your service he must be like deeply in love with your service and He must love your service so much that is willing to actually risk something of his social existence to Recommend to his friend. 
For example, to explain that: people recommend Slack because they think Slack makes them look cool. But if you're Apercite, or some strange service somewhere, people don't think (for now) that you're cool. So you can try to make referrals happen, but that won't work at this stage.

Another way to present the same thing is the Ash Maurya version; you'll have all that in the slides afterwards. And if this is not enough, there are people like Alistair Croll and Benjamin Yoskovitz, who work, in their book Lean Analytics, on a much more detailed version of the same thing. Actually it's the same thing; you can look for the different AARRR phases in it, they are all there, but it's much more finely grained. That's fine-tuning, it's optimization, so it's for later. For today, the plan is to focus on the activation phase, let's say acquisition to activation, and we'll just use AARRR; that's enough.

So to summarize: what business? Software as a service. What stage? The stage defined in Lean Analytics as very early: we have users, but we need to convert them to members. Do we trigger enough emotion to make them want to become members? Do we answer enough of a need that they actually stick to the service, come back, use it, and use the API? So for now, because I don't have a lot of time, we will focus on the rate from acquisition to activation, and on quality of service, both because we want to measure it in order to improve it, and because it's the main testimonial we can have: if we are transparent with users about our quality of service, they may think we are actually serious about how we're doing things.

The first of the three technical steps is using Bonobo to implement some data pipelines, to integrate data into some metrics database. Very simple: we get some data, we aggregate it, eventually normalize it, and we put that in a metric-to-metric-value database, most probably timestamped. So I thought about the simplest model I could find for that, and it looks like this: metrics, hourly values, daily values. That's very quick to write, probably not the best, but it works for this stage. And it's important to note that while the technical metrics yield a lot of data, these business metrics will not: if we have one data point per metric per hour, that's 24 data points per metric per day, which is really not a lot. So that will be enough. If you need to build a much bigger system, of course, there are star and snowflake schemas; you can find a lot of literature on the internet.

Now, a quick introduction, because I guess you may not know Bonobo, as it's a really young project, less than two years old now. Bonobo is an extract-transform-load framework. The goal is that you build assembly lines of data transformations with independent stages, and rows of data are passed from one stage to the next. Less metaphorically, you use different callables, iterators and class instances to get some data. Here, for example, we are selecting some data from the database, we're qualifying it, we're joining it with another database and sending emails. And the point is: the send-email function will run while the select function is still yielding results. The first report will be sent while the select maybe still has one million rows to go. Each stage runs in an independent thread, and rows are passed first in, first out.
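As an illustration of that assembly-line model, here is roughly the canonical minimal Bonobo job. This is a sketch with toy stages; any callables or iterators would do:

```python
import bonobo


def extract():
    """Source node: yields rows into the pipeline."""
    yield from ('alpha', 'beta', 'gamma')


def transform(row):
    """Runs in its own thread, processing rows as soon as they arrive."""
    return row.upper()


def load(row):
    """Sink node: here, just print."""
    print(row)


graph = bonobo.Graph(extract, transform, load)

if __name__ == '__main__':
    bonobo.run(graph)
```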
Of course, my example was linear, but Bonobo supports any kind of directed acyclic graph, and we'll see that in a moment. It's standard Python, so you can use the same code inside Bonobo or outside Bonobo; that's pretty neat. And getting started is three lines: if I had read a long sentence out loud, you could have done it before I finished.

So let's write our jobs. Once again, apologies if I'm going too fast for the code; that's why I will upload the code, and I'll notify you as soon as it's done, at the end.

The first extractor looks like this: an object count reader. It builds a SQL query, which is rather simple (it counts objects in a table), and formats the result as a tuple of a dictionary of dimensions and a dictionary of metrics. Of course, the query is parametrized, so the first placeholder needs to be filled, and for that we'll simply use a dictionary mapping table names to metric names, and we'll create an iterator using the dict.items built-in. That will pass data, first in, first out, to the object count reader, which will then send the request to the database, and so on. We'll see more when we compose the whole thing.

The normalizer is not a real normalization at all; there is no validation here, but it will do the job for the proof of concept. We just say: whatever comes here has two fields, the first one is called "dims" and the second one is called "metrics". We don't validate anything, but it's really easy later to replace this simple, stupid thing with real schema validation of whatever is inside, if you need to be more solid.

Then we need something that writes to the metrics database, so I wrote one that can write both to the hourly-values and the daily-values tables. It filters out the rows it doesn't need. The details are not really important, but simply, it will be called on each (dimensions, metrics) tuple and it will put it into the database.

And of course, we need to compose all that, which is probably the most important part. We instantiate our normalizer at the top; we create a Graph instance that gets the readers we had before, so the dict items and the object count reader; then we add the normalizer, the instance from above; and we add two chains with two instances of the analytics writer. One will filter out non-hourly metrics and insert the hourly metrics into the database, and the other one will filter out hourly metrics and insert the daily metrics. It's presented this way because it's easier to visualize, and it keeps everything on one slide. Data will be passed first in, first out between all the nodes, and after the set-fields step, each row will go both to the daily analytics writer and to the hourly analytics writer; it's their role to filter out the rows they don't want. Just before running, we pass the database connection implementation, an SQLAlchemy engine, in the services dictionary; of course, you need to connect. You can run it: you will see a linear status display, but it's not linear. There's a hedged sketch of this whole pipeline after this section.

OK, let's add a lot of readers. Again, I'm running quickly, but you'll get all of this. We connect to Google Analytics, so we define a service we call google_analytics (of course we will provide a client implementation in the services dictionary), but here it's the same idea: we send a query to an API and we yield a tuple of two dictionaries, dimensions and metrics.
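Here is the promised sketch of the object-count pipeline, end to end. Table names, metric names and the SQL are illustrative assumptions, and where the real job injects the engine through Bonobo's services dictionary, this sketch closes over it directly to stay short:

```python
import bonobo
from sqlalchemy import create_engine

# Table-to-metric mapping; names here are illustrative assumptions.
OBJECT_COUNTS = {
    'auth_user': 'website.users',
    'api_key': 'website.api_keys',
}


def make_object_count_reader(engine):
    def object_count_reader(table, metric):
        """Count rows in one table; yield ({dimensions}, {metrics})."""
        count = engine.execute('SELECT COUNT(*) FROM {}'.format(table)).scalar()
        yield {'name': metric, 'period': 'hourly'}, {'count': count}
    return object_count_reader


def set_fields(dims, metrics):
    # The "normalizer": no validation yet, it just names the two fields.
    return {'dims': dims, 'metrics': metrics}


class AnalyticsWriter:
    """Writes rows of one period (hourly or daily), filters out the rest."""

    def __init__(self, engine, period):
        self.engine, self.period = engine, period

    def __call__(self, row):
        if row['dims']['period'] != self.period:
            return  # not for this writer (the real filtering rule is subtler)
        self.engine.execute(
            'INSERT INTO {}_values (name, value) VALUES (%s, %s)'.format(self.period),
            (row['dims']['name'], row['metrics']['count']),
        )


def get_graph(engine):
    graph = bonobo.Graph()
    graph.add_chain(OBJECT_COUNTS.items(), make_object_count_reader(engine), set_fields)
    # Fork: after set_fields, every row goes to both writers, first in, first out.
    graph.add_chain(AnalyticsWriter(engine, 'hourly'), _input=set_fields)
    graph.add_chain(AnalyticsWriter(engine, 'daily'), _input=set_fields)
    return graph


if __name__ == '__main__':
    engine = create_engine('postgresql://localhost/apercite')  # assumption
    bonobo.run(get_graph(engine))
```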
We can also call Prometheus. You don't have to know Prometheus' API, but there is a query_range endpoint: we query it and ask for the metrics we need. Often it's an aggregate query, so we just take hourly averages or things like this, and again we yield two dictionaries, dimensions and metrics.

We have the spider counts: we have a lot of spiders running, some inactive, some active. So we get all those statuses, and we use a concept similar to functools.reduce in Python, but adapted to stream processing. We use a reducer that takes elements two by two and makes one: we take the spider statuses and we build a dictionary of counts of active and inactive spiders, which looks like this. You need to give the reducer an initializer and a function to reduce. So everything that comes out of the spiders reader (the statuses of the spiders) goes into this reducer, two by two, and as soon as a result is available, it's yielded; and we use a lambda just to format it into the dictionary-of-dimensions, dictionary-of-metrics shape. I hope I didn't lose you completely, that's my biggest fear here, but the concept is spelled out in the sketch after this section.

The result, graphically, looks like this if we put all the readers at the same time in the same graph. And this graph can be executed (you don't have to execute everything at the same time, but it can be), and we get some kind of bigger status display. By the way, if someone knows how to draw ASCII trees in a console, I'd really love to discuss the algorithm, because for now I can't.

Again, I'm biased, but what I like most is that it's really easy to replace parts. For example, the normalizer is still this set-fields thing; tomorrow I may use something like Cerberus, or any other schema validation library, to be sure that my data is in the right format.

Enough talking about Bonobo. We need to visualize things, and we already have a Grafana instance installed, so it will be quite easy to just take some of those metrics and make graphs with them. So, I guess: who doesn't know Grafana at all? OK. Grafana is quite old software; I think it was a fork of another similar tool. The goal is to connect data sources and to graphically configure graphs by making queries. The interface to configure a graph looks like this: you have a query editor, you put the query in, and you get the graph. It's pretty much as simple as that. Of course, if you don't have data, you can't visualize anything, but all the configuration is done in the web interface. For example, this graph is one of those we use for the QoS thing. It shows how many events we got every hour: how many "miss" events, how many "created" events, how many "failed" events, so we can compute ratios between created and failed, for example. We also have the one that went through the reducer, the number of spiders: when we launch more, we see the number of inactive and active spiders. We could work on a better display here, because what does "inactive" mean? Maybe the spider is just resting. Anyway.
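To spell out that reducer concept, here is a plain-Python illustration using functools.reduce. The status rows are hypothetical, and this is the concept only; the exact API of the stream-adapted reducer used in the real job may differ:

```python
import functools

# Hypothetical status rows, as the spiders reader might yield them.
statuses = [
    {'name': 'spider-1', 'status': 'active'},
    {'name': 'spider-2', 'status': 'inactive'},
    {'name': 'spider-3', 'status': 'active'},
]


def count_statuses(acc, row):
    """The reduce function: fold one status row into the running counts."""
    acc[row['status']] += 1
    return acc


# {'active': 0, 'inactive': 0} is the initializer mentioned in the talk.
counts = functools.reduce(count_statuses, statuses, {'active': 0, 'inactive': 0})
# counts == {'active': 2, 'inactive': 1}

# The final lambda, formatting to the (dimensions, metrics) shape:
as_row = lambda counts: ({'name': 'spiders'}, counts)
print(as_row(counts))
```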
Much more readable for users, we have this graph of how much time it takes to get a new picture. At the time I took this screenshot, it was 16 seconds on average. If you don't have all these advertising things, JavaScript things, that make a browser take very long to load, maybe you'd be in the nine-second range, which is the minimum we got here.

We use the same kind of sources to build our public dashboards too. Here the frontend is something based on C3; that part is not very important, but that's how we show users the actual status. What's also interesting is that, as you can write SQL queries directly in Grafana, you can actually make computations directly in your database engine. So here we add user counts and new sessions from Analytics: user counts coming from the count of objects that we computed every hour before, new sessions coming from Google Analytics API calls. And you can just divide one by the other: subtract the minimum number of users on a day from the maximum on that day, then make a simple division, and you get the conversion rate between acquisition and activation. The arithmetic is spelled out in the sketch a bit further down. Here it's a bit strange: this is actual real data, and we have something like a 25% conversion rate at the beginning. You have to cross this with events, because that's the day we mailed all the people who had said "hey, I want an account". We sent the mails that day, so of course this kind of acquisition-to-activation rate is not normally possible, unless all your traffic already said they wanted an account. That's very early; as you saw in the timeline, the traction at the beginning is the result of the last months and weeks of work. But that's our basis for seeing a lot more things, and we really want to base our decisions on data. So that's the basis for making the service evolve.

Until then, that's pretty good: it runs on my computer, it runs on the production servers. But this iteration zero, let's say, just needed to work really quickly. A cron job was running everything every 30 minutes. When something failed... actually, we could know if something failed, because they were Kubernetes CronJobs, but you need to go and have a look, and if you have a lot of failing jobs, you don't know which one failed. Some expensive tasks were running every 30 minutes. It was hard to run things manually, because you have to export the CronJob into a Job, schedule that into the scheduler, and then eventually delete the Job afterwards. So, a lot of friction, and we wanted to do better on this side. My cat gently proposed to handle all that; I declined the proposal, and we installed Airflow to manage it.

As you may know, Airflow is "a platform to programmatically author, schedule and monitor workflows". That's the official docs, and I think it's pretty precise; we mostly use it to schedule and monitor here. Once again: who doesn't know Airflow at all? OK. It's a project that was created a few years ago by Airbnb; it's now a project under the Apache incubation program, which is really great news for its future. Its role is to schedule and monitor jobs, either locally (if I run it on my computer, it will just schedule jobs on my computer directly) or on a cluster, for example Kubernetes, though other kinds of clusters can work too.
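Before going further with Airflow, here is the conversion-rate computation from the Grafana panel above, spelled out with made-up numbers. This is a sketch of the arithmetic, not the actual SQL:

```python
# Made-up numbers; the real values come from the hourly object counts
# and from the Google Analytics API.
users_min = 1180    # minimum user count on the day (start of day)
users_max = 1240    # maximum user count on the day (end of day)
new_sessions = 240  # new sessions that day, from Google Analytics

new_members = users_max - users_min
conversion_rate = new_members / new_sessions
print('{:.0%}'.format(conversion_rate))  # -> 25%
```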
You can distribute workloads using Celery or Dask, and they are even working on a Kubernetes executor, so soon it could submit Kubernetes workloads directly. It can run pretty much anything a computer can run, Python being a special case of "anything a computer can run", and it's written in Python, so the configuration is actually Python code.

Without too many details, the architecture of Airflow. This is important: if you want to run Airflow, you need to understand the architecture a bit, because otherwise it will be hell to operate. There is a web interface, which I will show just after, that helps you update the metadata database; and there is a scheduler service that reads the metadata database and says "oh, I have a task to schedule, let me find a worker that can run it". It sends the workload to a worker, gets the result, gets the log file, and updates the metadata with that.

For us it looks like this. All the tasks we configured before, we defined as DAGs, or directed acyclic graphs, which is the name of a job in Airflow. (My timer shut down. OK.) Here we could separate things: for example, the cleanup task at the very top only runs daily, because it deletes a bunch of rows in the database, so there's no need to run it every hour; all the rest runs hourly. What is great is that you can get the log files of one individual task, you can see all the runs, you can run one manually, you can know the timings of different individual tasks. As we plan to add much more, that's great to have. We also plan to have dependencies between tasks, because the main goal of Airflow is also to say: OK, I need to run these five tasks, and once they are all done, run this other thing, and then send a report. What Bonobo does within one dataset, first in, first out at the row level, Airflow does at the job level. That's something you find in legacy ETLs like Talend and Pentaho, where you can often manage the workflow between jobs and between tasks; here, we use two different tools.

Real quick, the configuration. You build DAG objects: here we build a simple DAG object with just one operator, which runs Python code in another virtual environment. We build a lot of DAGs dynamically using a function: for each metric, for each data source, we build one DAG, and at the end we build the "clean all" one, which just deletes things from the database. One thing Airflow manages too is connections: here we created the apercite_events and apercite_website connections in Airflow, directly in the web interface, and we use a probably suboptimal trick to pass them through the system environment. So our application, which already respects pretty much the twelve-factor principles, can just use the database connection from the environment, and we can configure it in Airflow. There's a hedged sketch of this kind of DAG file after this section. There were a few questions we had to solve: we needed to know where to store the DAGs, so that we can run locally the same way as in production. We decided to put them in a directory of our only code base; we have one code base that runs all the different services with different entry points, and this is one more entry point. So we built an image, but the details are not important.
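A hedged sketch of what such a DAG file might look like, written against the Airflow 1.x API. DAG ids, paths, module names and connection ids here are illustrative assumptions, and the real code may use different operators:

```python
from datetime import datetime

from airflow import DAG
from airflow.hooks.base_hook import BaseHook
from airflow.operators.bash_operator import BashOperator

# The "probably suboptimal trick": read an Airflow connection and pass it
# to the job through the environment, twelve-factor style.
DATABASE_URL = BaseHook.get_connection('apercite_events').get_uri()


def create_dag(name, schedule='@hourly'):
    """Build one DAG per metric / data source, dynamically."""
    dag = DAG(
        dag_id='analytics_{}'.format(name),
        start_date=datetime(2018, 7, 1),
        schedule_interval=schedule,
        catchup=False,
    )
    BashOperator(
        task_id=name,
        # Run the Bonobo job with the interpreter of the bundled virtualenv.
        bash_command='/opt/venv/bin/python -m apercite.analytics {}'.format(name),
        env={'DATABASE_URL': DATABASE_URL},
        dag=dag,
    )
    return dag


# Airflow picks up module-level DAG objects, so register them in globals().
for name in ('object_counts', 'google_analytics', 'prometheus'):
    globals()['dag_' + name] = create_dag(name)

globals()['dag_clean_all'] = create_dag('clean_all', schedule='@daily')
```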
It was probably a bit complex to set up at first, probably also because we didn't have a lot of experience with it. We found that the Helm charts in the community are not really good, but we found a company called Astronomer; I think they provide Airflow as a service, and they build really good quality images and Helm charts (Helm is a packaging thing for Kubernetes). The last thing is that we had to read a lot of the Airflow source, because a lot of things are lacking in the documentation. I will probably try to contribute a bit to the documentation, but for now, prepare to read code. That's also true with Bonobo, and not with Grafana, which is really well packaged.

From there (I have about five minutes left), plan n+1 is pretty much the same as plan n, except that it's really easy to lose a lot of time when running this kind of experiment. So what I'd suggest is that every experiment you run on business data should be time-boxed: just as you time-box sprints in a development process, you should time-box experiments on data, and decide beforehand which numeric results you define as success, and which you define as failure, or not-success. Saying "I will measure that in two weeks" is important, because otherwise you could say "oh, we're not at the objective yet, let's wait a bit, let's wait a bit", and maybe two years later you're still there. Again, Ash Maurya built some simple canvases (apparently canvases are easy to sell), and you should use something like this to write down on paper, beforehand, why you're trying an experiment, what hypothesis you're trying to prove wrong, and what kind of line in the sand you're drawing. Then one week, two weeks, one month later, you can just say: OK, did I validate something? Yes, no, and what's the next action?

And to move on a bit on the tech side: we want to show month-on-month and year-on-year data in Grafana, which are very important to see the movement. We have a lot of ideas; we will probably use a very complex process.

Just to finish, to go back to the assembly line analogy: Airflow is really a very good factory manager. You're building a lot of different things, and it doesn't care about the jobs' contents.
It can help you manage the dependencies between the small teams, and some of your assembly lines can be built with Bonobo. Once again, make your own research and your own opinion; I don't want to advise something I can't vouch for.

A lot of the work here is based on very good books, so I need to advertise things I have no stake in: Site Reliability Engineering, which is the best book you can ever read about managing systems in production; Lean Analytics, where you can find all the schemas for different kinds of businesses, like the one I showed at the beginning; and Scaling Lean, a book by Ash Maurya, where the schemas of the different canvases come from. But if there is one you should read, it's Site Reliability Engineering, and it's free: you can get it from Google directly on the web; of course, the paper version is paid.

I really like feedback. As I said before, I was frightened of packing too much information in here, so I'd really like to know what you thought about this presentation. For that, I will provide a link, which is the same link where you'll find all the code examples as soon as I can upload them, so probably this afternoon or this evening; I will announce it on Twitter too. And this weekend, Saturday and Sunday (I will announce it again in the sprint announcement session), there will be a sprint on Bonobo. If you want to come, feel free; it's very open, you don't need to be an expert in anything: beginners, experts, Pythonistas, non-Pythonistas, everybody is welcome. And even if you don't want to come to the Bonobo sprint, you should really consider sprints, which are a really nice way to learn things. I don't know which other sprints there are this year, but they're a really good thing. All the resources will be available at this link (I will announce it on Twitter as soon as it's updated with the code), and there is a link for feedback. It would be really nice of you to just send me a line: "that was stupid", "that was good", "I would have added this", "this was hard to understand", "I really liked this". And basically, that's it. So thank you very much, and I don't know if we have time for questions, but...

Thank you very much for this very interesting talk. We have time for a couple of questions, but I also want to remind you of one thing: if you downloaded the app, there is also a way to rate all the talks. Please do that; give five, four, three, whatever stars, for all the talks. Any questions?

Q: Hi, thank you for the talk. I'm wondering about the Airflow deployment: what type of executor have you been using, the local or the Celery executor? And can you talk a bit about the deployment?

A: OK, so we used both. Let's start with my computer. On my computer, I just went with the local executor. It's pretty easy to set up; it's hard to get the logs, because there is a bit of a communication problem, but, so, exactly: I built a Docker image with the Airflow code base, based on the Astronomer distribution of Airflow, with the Apercite code bundled in a different virtual environment, and I just run the Docker image on my computer. To deploy to production, the same Docker image is deployed to a Kubernetes cluster, but the configuration uses the Celery executor. Are you familiar with Celery a bit? OK, so Celery has a manager thing (I don't really know the name) that runs in the Airflow scheduler and handles sending messages to, and receiving results from, the workers.
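As an illustration of that answer: Airflow can be configured through AIRFLOW__{SECTION}__{KEY} environment variables, so the same image can switch executors between a laptop and the cluster. The broker URL below is an assumption:

```python
import os

# On the laptop: everything runs in one process tree.
os.environ.setdefault('AIRFLOW__CORE__EXECUTOR', 'LocalExecutor')

# In the Kubernetes deployment, the same image runs with, for example:
#   AIRFLOW__CORE__EXECUTOR=CeleryExecutor
#   AIRFLOW__CELERY__BROKER_URL=redis://redis:6379/0   (assumption)
```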
So The deployment is On our side is made using the Kubernetes Recipes that we got from astronomer. We changed it a bit. I can share that also if you if you need Mostly the struggle was to understand the architecture. It's it's why I did put the airflow architecture diagram here because as soon as we really took the time to Understand what was happening? Everything came in pretty logical But as it's a distributed system just trying to put up thing could Create some WTF moments about that Does that answer your question? Okay, hello sec next question that is maybe a little bit related How do you actually combine airflow and bonobo because I guess both have the individual graphs defined Okay, so actually it was Done Here we actually run another Python process from the workers in in Well airflow workers will run Python process that just runs graphs Again, I'll share this so probably you You can just take the text version, but It's in fact we really use airflow to manage the execution life cycle of the job And we really use bonobo to do the actual work That's not the only way you can use airflow There is also ways you can pass data from one job to another but the dependency Manage management of airflow is more Is is more gearing towards Waiting for things to complete because we know we need a completion before we learn something else while bonobo is trying to do As soon as available, let's continue the pipeline. So it's more the stream line Bonobo is more doing data streaming. So both are are combinable Thanks for the talk Just a curiosity question. How many machines do you run this on? Exactly three today. Okay, and do you deploy with what? It's actually deployed using Kubernetes So we have a Kubernetes cluster if you know don't know Kubernetes. It's a bit out of topic But basically you define manifest of your services You just put the manifest on Kubernetes and the scheduler in Kubernetes runs the docker image you define in your manifest No, no, we use the managed version of Kubernetes on Big and or whatever Okay, thank you very much again