OK. Thank you very much for attending this session. My name is Shohei Matsura. I am a senior software engineer at L.Y. Corporation. The company was known as Yahoo Japan Corporation until this September, but it changed its name following a merger with LINE Corporation, a popular messaging service company in East Asia.

Today I'm going to talk about database parameter auto-tuning through CI/CD pipelines. But before doing so, please allow me to introduce myself a little more. Again, my name is Shohei, and I also have another name, Dominic. That sounds French. This is because I was given this name by the Canadian immigration office when I was studying abroad in Montreal, Canada, where people speak French. So I do have a name that sounds French. It's unique, right? You can call me Shohei or Dominic; either way works.

Going back to the talk: at L.Y. Corporation, I'm in charge of R&D for database systems, where I develop an in-house storage engine for MySQL and do research on and implementation of new technology trends for RDBs. Prior to joining L.Y.
and Yahoo Japan Corporation, I was a developer of a commercial RDBMS, where I developed the query optimizer, concurrency control, disaster recovery features, and so on. So I do love database technology a lot.

Now let me illustrate what we are going to talk about today. Database parameter tuning is very hard, right? Because we need to consider many factors at the same time. For example, we need to consider the application workload and the hardware resources we have. We also need to decide which parameters to tune, and to what values, to meet the SLA. So, to make things easier, we use machine learning. We integrate machine learning models into a CI/CD system to automate the tuning process. That is, when application developers push their code into the CI/CD system, we let the CI/CD system analyze the application workload, infer the optimal parameter setting with machine learning models, and apply the inference result to the target database automatically. That's the topic we are talking about today.

So here is the agenda of the talk. First, let me illustrate the data infrastructure at L.Y. Corporation and explain what motivated this research. Second, let me further explain why we need to combine database parameter tuning with machine learning and CI/CD. Third, let me demonstrate our system, CICD Tune, which executes automatic tuning with machine learning in the CI/CD pipeline. Finally, I would like to conclude the session by sharing with you the future direction of our research.

Now let's begin. Let me explain L.Y. Corporation and its data infrastructure. L.Y.
Corporation has a vision to create an amazing life platform for our users, and the company has more than 28,000 employees in total. One of the unique characteristics of the company is that it has a diverse service portfolio to fulfill people's everyday needs. We deliver more than 100 services, ranging from media services to commerce, entertainment, lifestyle, and online services.

To deliver our diverse internet services to our users, we build and operate our own data centers, which are located in multiple regions across Japan to make sure they always keep running, even in the case of a natural disaster or a power outage in a certain region. In addition, we virtualize our infrastructure by using an open source software stack, such as OpenStack and Kubernetes, to use our resources efficiently.

Here is a brief diagram of our data infrastructure. It is made redundant and reliable with an OSS software stack. In our data infrastructure, MySQL plays an important role, and its nodes are spread across multiple regions and availability zones to make sure at least one node is always running, even in the case of a natural disaster. Data are always synchronized across regions and availability zones by data replication.

Another characteristic of our data infrastructure is that it is provided as a service, that is, as DBaaS. It is made easy for application developers to use: an application developer can launch a MySQL cluster by specifying a flavor from our web UI in a self-service manner.

Now let me explain why we need to combine DB parameter tuning with machine learning first, before combining it with CI/CD. First, as I said before, DB parameter tuning is hard because we need to consider many factors at the same time. But it is even harder in DBaaS. This is because in DBaaS, DB instances are launched by application developers on their own, without the involvement of the DBAs who maintain the DBaaS environment. So the DBAs have no chance to know what the application is doing, and they have no idea how to tune the DB
parameters, because they don't know the application workload. In contrast, in the old days before DBaaS appeared, DBAs were often asked to help application developers with schema design, query design, and hardware capacity planning. They knew what the application was doing, so database tuning was a little easier than it is in DBaaS.

So, to make DB parameter tuning easier in DBaaS, DB researchers came up with the idea of autonomously tuning DB parameters according to the application workload with the aid of machine learning. A lot of academic research is going on, and many results have been published.

Here is a brief description of DBaaS parameter tuning with machine learning and its flow. In DBaaS parameter tuning with machine learning, we deploy agents on the DBaaS servers in the production environment to collect metrics from there. The collected metrics are then sent to the auto-tuning system, which is separate from the database servers. In the auto-tuning system, there are two machine learning models running. The first model is called feature extraction, and it is used to extract workload features from the collected input metrics. These extracted workload features are then fed into the second model, called the automatic tuning machine learning model, to infer the optimal database parameter setting with respect to the application workload features. The automatic tuning system then automatically deploys the inference result to the production environment. This whole process keeps running continuously to catch any workload change on the database servers in the production environment.

This process automates database parameter tuning, but there is an issue with this approach: the accuracy is not 100%. Inference is just inference, so the inference result could be wrong. If a wrong inference result is deployed to the production environment, it could cause unexpected performance degradation there.

On the right side of this slide, we evaluated a machine-learning-based
automatic tuning model against our e-commerce workload. The x-axis is just time. The y-axis is query latency in milliseconds. The dotted line is the query latency with the default parameter setting, and the solid black line is the query latency with the auto-tuned parameters applied. If you look at this graph, you can see that the query latency has worsened by more than 10 times because a wrong auto-tuning result was applied. This is a serious issue, right? Because we have performance degradation in the production environment.

So, to cope with this issue, we came up with the idea of auto-tuning DB parameters with machine learning, but performing a risk assessment in the CI/CD pipeline before deploying them to the production environment. We integrate machine learning models into the CI/CD system and collect metrics from the DB server in the development environment while tests and the application build are being run in the CI/CD pipeline. At the end of the CI/CD run, application developers can check whether or not the SLA or performance requirement is met with the auto-tuning result. They can then decide whether to deploy the inference result to the production environment. With this approach, we can avoid unexpected performance degradation caused by a wrong automatic tuning result being applied to the production environment.

Now let me demonstrate our proposed system, CICD Tune, which executes ML-based automatic database parameter tuning in the CI/CD pipeline. It's a very busy slide, and it's probably hard to see, but here is the overall architecture of the system. As the CI/CD system, we use an open source CI/CD system called Screwdriver, and it is connected to a machine learning model repository, which hosts two machine learning models: the feature extraction and automatic tuning models. These two models are trained offline, in an environment separate from CI/CD, by using database parameters and metrics collected from many database servers.

When application developers push database application code to the CI/CD system, Screwdriver, it pulls the two machine learning models from the machine learning model repository and starts
running the application build and tests. While the application build and tests are executing, metrics are collected from the database server in the development environment, and these are fed into the machine learning models in the CI/CD system to infer the optimal database parameter setting with respect to the application. Finally, the auto-tuned DB parameter setting is output and returned to the application developers, who can now decide whether to deploy the automatic tuning result to the corresponding DB server in the production environment. This is the overall architecture of our proposed system.

Now let's see how CICD Tune actually works. I hope the video works well. In the demonstration, we build a simple web application that executes a sysbench workload against MySQL. We write the definition of a CI/CD job to build this web application, and we also specify which API to test in the CI/CD pipeline. The application developer pushes this definition together with the application code to kick off the CI/CD job, and you can see the whole process that is now running. If you click the DB parameter inference node, you can see how the database auto-tuning inference is actually being done by using machine learning, and also the auto-tuned parameter setting in CSV format. If you click the DB risk assessment node and go to the bottom, you can see the query latencies with the auto-tuned parameters and with the default setting. In this case, the query latency with the auto-tuned parameters is lower than with the default setting, and the application developer can rely on this message to deploy the auto-tuned result to the production environment.

Let me just briefly describe the two machine learning models used in the demo. The automatic tuning consists of two layers. Model number one is called Workload Embedder. This model is used to extract workload features, and its output is used as the input of the second model. The second model is called TPS Estimator. Together, these models infer DB parameters to either maximize TPS or minimize query latency for the given workload.

Finally, I'd like to share with you the future direction of our research. One direction is to do more PoCs with
more diverse workloads. In the demo, we just tried to auto-tune a very simple web application workload, namely sysbench. But at LY, we have more than 100 services, and we would like to validate the usefulness of our proposed system against these diverse workloads.

Another challenge is more fundamental to our proposed system: absorbing the differences between the development environment and the production environment. In CI/CD, we usually use the development environment, right? However, there are always differences between the development environment and the production environment, such as hardware specs, the size of the data, and many others. So we need some way to transfer the DB parameter inference result obtained in the development environment to the production environment when deploying it, because the production environment has larger data or larger machines. To tackle this problem, we are now collaborating with academia to predict the DB performance of a production environment under a given parameter setting from the DB performance in the development environment. By using this prediction model, we can determine whether an auto-tuning result can meet the performance requirement in the production environment. This research is going on right now.

Before closing my presentation, I'd like to express my gratitude to the following people. First, Mr. Nakamori and Dr. Kawashima from Keio University, for their promotion of and collaboration on this research. I'd also like to express my gratitude to Mr. Hoshino from Cyborg Lab for his support and comments on this research. Also, I'd like to express my special thanks to Mr. Kinoshita from Keio University for his effort in building the machine learning models to auto-tune database parameters. Finally, I'd like to express my gratitude to all our colleagues at L.Y.
Corporation for their support of this research. Without their support, this research could not have been done. So, thank you very much. I'd like to finish my presentation.

I think a few minutes are left, so I can take a few questions. Thank you very much for attending this session. But as you know, I'm not fluent in English, so please speak slowly, or you can ask me questions in French.

Okay, I don't speak French, so we'll have to stick to English. How well do the results from the development environment translate to a production environment?

Well, yeah, I know some, but I cannot tell you right now because I am going to publish the results in academia soon.

Okay, so we'll wait for that. Thank you.

I have one more question. Hello, Miguel Martinez. My understanding is that the scope of this research is database engine parameters, right? Have you considered the impact of schema structure on those parameters in your research?

Yeah, that's a very good question. I have not yet thought about that kind of question. Thank you. We need to catch schema changes as well.

So, the same thing: when you have a multi-tenant shared environment, are there different results when you run the same models, and how do you translate them to specific tenants? Have you thought about that?

Yeah, in the database environment, we usually have multi-tenancy, right? So that's a very important question, but I don't know the answer to your question, so I'm sorry.

Similar to other people, have you considered moving this to things such as consuming query logs to provide app developers with suggestions on how to improve their queries, or giving them feedback on common deadlocks in ways that they could improve the application on that side?

Yes. Similarly, we have tested consuming query logs, slow query logs, and other operating system logs to improve the machine learning models, but we are still working on that, so yeah, sorry. I will keep you updated, probably at the next opportunity. Thank you very much.
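
As a rough illustration of the risk-assessment step described in the talk, the following Python sketch compares query latencies measured during a CI/CD run under the default and the auto-tuned parameter settings, and recommends deployment only when the tuned setting is no slower than the default and also meets the SLA. The names `RiskAssessment` and `assess_risk`, the use of mean latency as the comparison metric, and the sample numbers are all assumptions made for illustration; the talk does not describe how CICD Tune implements this check.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RiskAssessment:
    """Hypothetical summary of one risk-assessment run (not CICD Tune's format)."""
    default_latency_ms: float
    tuned_latency_ms: float
    sla_latency_ms: float

    @property
    def safe_to_deploy(self) -> bool:
        # Recommend deployment only if the auto-tuned setting is no slower
        # than the default setting and also satisfies the SLA.
        return (self.tuned_latency_ms <= self.default_latency_ms
                and self.tuned_latency_ms <= self.sla_latency_ms)


def assess_risk(default_samples_ms, tuned_samples_ms, sla_latency_ms):
    """Compare mean query latency under the default and auto-tuned settings.

    Both sample lists are assumed to come from the same test workload run
    against the development database during the CI/CD job.
    """
    if not default_samples_ms or not tuned_samples_ms:
        raise ValueError("need at least one latency sample per setting")
    return RiskAssessment(
        default_latency_ms=mean(default_samples_ms),
        tuned_latency_ms=mean(tuned_samples_ms),
        sla_latency_ms=sla_latency_ms,
    )


if __name__ == "__main__":
    # Made-up latencies: the tuned setting is faster and within the SLA.
    report = assess_risk([12.0, 11.5, 12.5], [9.0, 9.5, 8.5], sla_latency_ms=10.0)
    print(report.safe_to_deploy)
```

In a setup like this, the final decision stays with the application developer, as in the talk: the check only produces a recommendation, and a case like the tenfold latency regression shown earlier would be flagged as unsafe before it reaches production.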