Hello everyone, and apologies for the technical problems just now. I'm an engineer from Google; my name is Liu Shuo. It's a great honor to come to Shanghai to give this talk. This is my co-speaker, Johnu George, from Cisco. "I'm Johnu, I'm from Cisco, and I work on Kubeflow-related projects." OK, so I'll switch to English now.

So here is today's agenda. We're going to introduce hyperparameter tuning, briefly covering what it is and why it's hard. Then we'll go into our main topic, which is Kubeflow and Katib, and we'll see how these systems help us do hyperparameter tuning. We'll give a brief overview of the system architecture and the workflows, and we'll show a quick demo of our system. Then we'll go into neural architecture search, which is a new area of research and implementation that Katib has taken on. Finally, we'll cover some future work.

So what is hyperparameter tuning? Let's start with an example. Suppose we have a very basic machine learning program for recognizing digits. On the right-hand side you can see some training code, and right at the bottom you can see where we start to fit a model. Hyperparameters are the parameters that you set before the training process begins. If you look at this list, we have the network, the batch size, the number of epochs, the learning rate and so on. These are the parameters that govern the training process; they're external to the model. This is in contrast to model parameters, which are learned during training. So hyperparameters are configuration variables that are external to the model and are set before training begins. Setting the right values can significantly improve your model's performance, but only if you do it correctly, which can be very difficult. Hyperparameter tuning is the process of finding optimal values for the hyperparameters so as to optimize your objective function; in the previous case, where the task is predicting handwritten digits, you want to maximize the prediction accuracy.

So why is that hard? First of all, as you increase the number of hyperparameters and the ranges of their values, the search space grows exponentially. In the previous slide we saw about six or seven parameters; imagine a more complicated model, and the space just keeps growing. Tuning by hand is inefficient and very error-prone. To really compare performance from one configuration to another, you need to track metrics across multiple jobs, which means you need some kind of data storage and some kind of visualization interface to see what your experiments have done. Maybe you want to anchor on a particular value for one parameter, such as the batch size, and see what changes when you adjust the learning rate, so you need to be able to track metrics. You also need to manage resources and infrastructure: these jobs often require additional hardware and computing resources, so you want to be able to allocate, provision and clean up those resources. And finally, there are a lot of different frameworks and algorithms to support. For example, the training example we just saw is for MXNet, but you could have a TensorFlow or PyTorch model, and the algorithms could be grid search, random search, Bayesian optimization and so on. So there is a large variety of similar problems to deal with.
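To give a sense of why this gets painful, here is a rough, hypothetical sketch in plain Python (not the MXNet code from the slide) of what tuning by hand amounts to: even a naive random search makes you write your own bookkeeping and stopping logic, and the full grid is already too large to cover exhaustively.

```python
import random

# Hypothetical search space: even a handful of hyperparameters
# multiplies out to a large grid very quickly.
search_space = {
    "batch_size": [32, 64, 128],
    "learning_rate": [0.001, 0.01, 0.1],
    "num_epochs": [5, 10, 20],
    "optimizer": ["sgd", "adam"],
}

grid_size = 1
for values in search_space.values():
    grid_size *= len(values)
print(f"an exhaustive grid search would need {grid_size} training runs")

def train_and_evaluate(params):
    """Stand-in for a real training job; returns a fake accuracy."""
    return random.random()

# Naive random search: you track results, pick the best configuration,
# and decide when to stop, all by hand.
best_params, best_accuracy = None, 0.0
for _ in range(10):
    params = {name: random.choice(values) for name, values in search_space.items()}
    accuracy = train_and_evaluate(params)
    if accuracy > best_accuracy:
        best_params, best_accuracy = params, accuracy
print("best so far:", best_params, best_accuracy)
```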
So how does Kubernetes help us? First of all, Kubernetes makes it really easy to build microservices. Microservices are self-contained, lightweight services that do one thing really well, and Kubernetes also makes containerization easy with Docker. The good thing is that this maps nicely onto our model of parallel training jobs: we can containerize the jobs, which makes them very easy to scale. It also helps our system build up resilience, because Kubernetes provides auto-healing and recovery options, meaning that the failure of one particular training container won't affect the whole system, and the system can automatically bring it back up. This also helps with our problem of supporting multiple frameworks and different algorithms, because we can create a generic interface that interacts with the algorithms as microservices; we'll go into more detail about that in a bit.

Kubernetes is also very good at describing desired state; we call these declarative APIs, as opposed to imperative ones. Remember, we said we want to manage resources. This makes managing resources very easy, because all you need to do is describe the desired state, instead of writing complicated instructions about what you want to provision. Kubernetes also has a very flexible API, meaning that we can extend the core API with custom resource definitions. This allows us to interact with our objects using the standard REST APIs and tools like kubectl.
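To make the custom-resource point concrete, here is a hedged sketch of reading Katib experiments with the standard Kubernetes Python client, just as you would for any built-in resource; the API group, version, and plural name are assumptions based on the public Katib examples and may differ between releases.

```python
from kubernetes import client, config

# Load credentials from the local kubeconfig, the same way kubectl does.
config.load_kube_config()
api = client.CustomObjectsApi()

# Because the experiment is defined as a CustomResourceDefinition, the
# generic custom-objects API is enough; group/version/plural are assumed.
experiments = api.list_namespaced_custom_object(
    group="kubeflow.org",
    version="v1alpha2",
    namespace="kubeflow",
    plural="experiments",
)
for item in experiments.get("items", []):
    name = item["metadata"]["name"]
    conditions = item.get("status", {}).get("conditions", [])
    print(name, conditions)
```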
Finally, Kubernetes is very portable. In production we often have to deal with different environments, such as local development versus on-premise hosting, or even moving between clouds. Kubernetes has options for all of these, so if you build on Kubernetes, your applications should run anywhere your Kubernetes cluster is deployed.

A little bit about Kubeflow. Kubeflow is a Kubernetes-native machine learning platform for developing, orchestrating, deploying and running scalable, end-to-end machine learning workflows. That's quite a mouthful, right? If you look at this diagram, it's not to scale, but a lot of the problems with building machine learning in production deal with the things in the blue boxes: monitoring, serving infrastructure, analysis, processing, management tools and so on. Your core machine learning logic might be only a small part of it. Kubeflow wants to make building all the other stuff easier, so you can concentrate on the core machine learning logic. And now I'll turn it over to Johnu, who will introduce us to Katib.

Hi everyone. Let's talk about Katib. Katib is the hyperparameter tuning component in Kubeflow. It is completely Kubernetes native, which means you can install it even without Kubeflow, but if you want to use features like the distributed training operators, it is advisable to use it within the Kubeflow ecosystem. It is inspired by the Google Vizier project, which is a black-box tuner; you may have read the paper, and the link is given below. Katib is fully open source, and it is framework agnostic, which means you can use training programs written in the framework of your choice, be it TensorFlow, PyTorch or MXNet; you just provide the images and it should start running. It also has a customizable algorithm backend. By default it provides random search, grid search, Bayesian optimization and Hyperband, and users can plug in any algorithm suited to their environment. You don't even need to restart the components: the algorithm is started as a separate microservice and should begin running without restarting anything.

So let's talk about the concepts. The first concept is the experiment. An experiment is an end-to-end process for hyperparameter optimization. For example, take the digit recognition model: you have to find the best hyperparameters for it, and that whole process is one experiment. The user provides an experiment config and submits it to Katib. The config has different sections, and the important ones are these. The first is the objective, which is what we are trying to optimize; for example, for your model you want to optimize accuracy, and the value you want to achieve is 99%, so the objective metric name is accuracy and the goal is 99%. The second is the search space, which is the set of constraints on the configurations: you can set limits for your parameters, say a maximum and a minimum, or if a parameter is categorical you can set the list of options that are valid for it. The last one is the search algorithm, which is how you find the optimal configurations; as we said on the last slide, you can use the default ones or ones added by the user.

Like any other resource in Kubernetes, an experiment is also a resource, to be specific a custom resource. So you can use all the standard APIs to get, create and delete experiment resources, you can use kubectl just like with any other Kubernetes resource, and the state is stored in the Kubernetes database. The lifecycle is managed using the Kubernetes controller pattern.

The second concept is the suggestion. Once the user submits the experiment config through the Kubernetes API, the experiment controller picks it up and contacts the right suggestion service. A suggestion is one proposed solution to the optimization problem: based on the search space and the settings given by the user, the suggestion service proposes one candidate set of parameter values. Each suggestion algorithm is a standalone microservice, which means you can add one at runtime, and users can create customized suggestion algorithms.

The third concept is the trial. Once suggestions are provided by the suggestion service, trials are created by the experiment controller. A trial is one iteration of the optimization process: given the parameter assignments from a suggestion, a worker process actually runs and emits the observation metrics. Like the experiment, a trial is also a custom resource, and the experiment controller is the one spawning and managing the trials. Depending on the trial kind, you can have a distributed job for it.

This is the basic workflow of hyperparameter tuning. As I said earlier, this optimization loop continues until one of two conditions is met: either the objective value is reached, for example once your accuracy hits 99%, or the budget is exhausted. You can also set the maximum number of trials to be run by an experiment, say X; once X is reached, or when your objective is met, the optimization loop ends. Until then, suggestions are created, trials are run for each suggestion, metrics are emitted and collected, and new suggestions are created.

This is the system architecture. The user submits an experiment, and the experiment controller, which is a Kubernetes custom controller, picks it up and talks to the right suggestion service through the Katib manager. The algorithm-related settings are given in the experiment, and once the experiment controller gets suggestions, it creates trials. The trial controller in turn picks those up and starts executing the trials, which begin emitting metrics. The metrics collector is a separate process which, during the lifetime of a trial, collects the metrics and writes them to the data store. These metrics are then used by the suggestion service to generate future suggestions, and that loop continues.
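The loop that the experiment controller, suggestion service and metrics collector implement between them can be summarized in a few lines; this is only a conceptual sketch of the workflow just described, not Katib's actual controller code.

```python
def tuning_loop(suggestion_service, run_trial, goal, max_trials):
    """Conceptual sketch of the optimization loop.

    suggestion_service: proposes parameter assignments from past observations
    run_trial:          trains one model and returns the objective metric
    goal:               target objective value (e.g. 0.99 accuracy)
    max_trials:         budget on the total number of trials
    """
    observations = []  # (parameters, metric) pairs collected so far
    while len(observations) < max_trials:
        params = suggestion_service(observations)   # one proposed solution
        metric = run_trial(params)                  # one trial = one iteration
        observations.append((params, metric))       # metrics feed future suggestions
        if metric >= goal:                          # objective reached: stop early
            break
    return max(observations, key=lambda obs: obs[1])
```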
Due to the network issue we are not showing it live, but we also have a Katib UI, which exposes many of the features through a user interface. For more customizable options it is better to use the command line at this point, but we are continuously refining the UI.

As I said, from the user's point of view, all you have to do is set up the experiment config, and the rest is taken care of by the Katib controller. The important sections to fill in are the following. One is the name, and the namespace where the experiment has to run. The next section is the common set of parameters. One is the parallel trial count, which is how many concurrent trials have to be run; based on your resources you can set a higher number of trials to run concurrently. Another is the max trial count, which is a budget saying this is the maximum number of trials that should be run during the experiment's lifecycle. The third is the max failed trial count, which is the maximum number of failed trials that I can tolerate during the entire lifecycle. Next is the objective, which says what my objective type is, whether I have to maximize or minimize; the objective metric name is the metric that I have to optimize, and the goal is the value that I have to reach. So this config actually means: maximize the validation accuracy up to 99%. I can also provide additional metric names if I want to collect extra metrics during the process. The next section is the algorithm section, where I specify the algorithm; some are already provided, and you can add your own during development. I can also add extra algorithm settings, which will be picked up by the algorithm. The next section is the parameters section, where, as I mentioned, you specify the search space: for each parameter I basically give the max and min values, or if it is categorical I give the list of options that are valid for it. And the last section is the trial spec, which is the most important one: it is where you specify the actual job that has to be run.
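Putting those sections together, an experiment spec has roughly the shape below. This is a hedged sketch assembled from the public Katib examples; the exact field names, the API version and especially the trial template vary between releases, so treat it as illustrative rather than as the authoritative schema.

```python
# Rough shape of a Katib Experiment, expressed as the Python dict you could
# pass to CustomObjectsApi.create_namespaced_custom_object().
experiment = {
    "apiVersion": "kubeflow.org/v1alpha2",       # assumed API version
    "kind": "Experiment",
    "metadata": {"name": "mnist-tuning", "namespace": "kubeflow"},
    "spec": {
        "parallelTrialCount": 3,                  # concurrent trials
        "maxTrialCount": 100,                     # total trial budget
        "maxFailedTrialCount": 3,                 # tolerated failures
        "objective": {
            "type": "maximize",
            "goal": 0.99,
            "objectiveMetricName": "Validation-accuracy",
            "additionalMetricNames": ["accuracy"],
        },
        "algorithm": {"algorithmName": "bayesianoptimization"},
        "parameters": [
            {"name": "--lr", "parameterType": "double",
             "feasibleSpace": {"min": "0.01", "max": "0.03"}},
            {"name": "--optimizer", "parameterType": "categorical",
             "feasibleSpace": {"list": ["sgd", "adam"]}},
        ],
        # The trial spec (the actual containerized training job to run)
        # is omitted here; in practice it wraps your training command.
    },
}
```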
So this is one run that we did using Bayesian optimization. You can see that the final results converged to about 99%, and for each trial we can see the parameter values that produced that result. You can see which one did best, and you can even go backwards and figure out the best values to use. A little more about this experiment: it was conducted with about a hundred different trials, using Bayesian optimization, which is one of the most common algorithms for hyperparameter tuning. The idea is that the previous trial results are used to optimize the future suggestions: given that we have found some parameter values that led to a good result, we want to improve on the current status. It's a bit hard to see from this picture, but if you look at the corner there, we started at maybe about 96% accuracy and we were able to improve to 98, 99%. And if you click on each of these trials, you can see how the trial progressed over time. The blue line there is the accuracy improving, and there is also the validation accuracy; in this case we used the validation accuracy as the objective metric, so we used that to determine how good our current set of parameters is. And, as Johnu mentioned before, there are also ways to collect other metrics, so we can also see how it correlates with, for example, the plain accuracy.

I want to bring us back to a broader landscape. This is the contrast between classical and automated machine learning. In classical machine learning, a human expert is involved in a lot of steps: choosing features, choosing algorithms, configuring hyperparameter values, evaluating performance, training models. A general trend we've seen in computing is that more and more tasks become automated and are done by machines, rather than having a human involved in every step. This is why we have the research field of automated machine learning, which is about having a program generate the model without human intervention, to reduce all these time-consuming steps. What we've seen so far is hyperparameter optimization, but in the general landscape there are also other areas, like feature engineering and architecture search. I've included a link to a list of AutoML papers; it's available on GitHub and it's very interesting. There's something in common across all of this research. In every case we are automatically generating configurations from a configuration space, which can include features, hyperparameters or architectures; in every case we generate some kind of metrics to judge their performance; and based on that, an optimizer algorithm generates a new configuration. The output of all this is a model, which we can use to serve predictions.

In recent development, we have expanded Katib to handle neural architecture search. This is in contrast to the hyperparameter optimization problem that we just discussed. The difference is that in hyperparameter search we generate values for a list of parameters, so you can think of it as a vector search, whereas neural architecture search is a graph search, so the space is exponentially larger, one dimension higher. There is a lot of research in neural architecture search: you can search for an entire network, or search for a cell from which the rest of the graph is generated, and there are various evolution strategies, such as by generation or by modification. The workflow for architecture search is similar to hyperparameter tuning, in that we still have our stopping conditions, like whether we have reached our objective or exceeded our search budget. But we've also added a step for constructing the model, because in NAS the human expert is no longer providing the model. So what happens is that the suggestion generates a list of evolve operations, and from that we generate the next version of the model. Then, similar to hyperparameter tuning, we run trials based on the model we've constructed, and using the reported metrics we judge their performance and come up with the next version of the architecture. Currently, we support EnvelopeNet and reinforcement learning for NAS.
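To make the contrast concrete, an architecture-search loop differs from the tuning loop sketched earlier mainly in what a suggestion produces: instead of a vector of parameter values, it proposes operations that grow or mutate a graph. The following is a purely conceptual sketch under that framing, not Katib's NAS implementation.

```python
def nas_loop(propose_operations, apply_operations, train_and_score,
             base_architecture, goal, max_trials):
    """Conceptual sketch: the suggestion emits graph-editing operations
    (e.g. "add a convolution", "add a skip connection") rather than
    hyperparameter values, and the controller builds the model from them."""
    best_arch, best_score = base_architecture, float("-inf")
    history = []
    for _ in range(max_trials):
        ops = propose_operations(history)             # list of evolve operations
        candidate = apply_operations(best_arch, ops)  # construct the next model version
        score = train_and_score(candidate)            # one trial on the generated model
        history.append((candidate, score))            # reported metrics guide the next step
        if score > best_score:
            best_arch, best_score = candidate, score
        if best_score >= goal:                        # objective reached: stop early
            break
    return best_arch, best_score
```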
So we have seen what Katib can do today, but what's coming in the future? First of all, we want to provide better productionization support. In the architecture diagram, you may remember there is a component for data storage. This is useful because sometimes you want to store your metrics for the next run, or to resume experiments, or just as a backup. Currently Katib requires an instance of MySQL to run as the data store backend, so one thing we want to do is support a customizable database. Another item is metadata integration. This is broader in scope and is part of the wider Kubeflow team effort. The idea is that, suppose you've run some hyperparameter tuning experiments and produced some pretty good results, you then want to take the models you trained and serve them. That requires some kind of generic store where you can keep your metadata so it can be used by other parts of your machine learning pipeline. Finally, we want to support other features like long-running experiments. Suppose you have tried some combination of parameters, but you want to change your algorithm a little, or change your budget, and then continue the experiment from before; this will give better support for a hyperparameter tuning service in production. Another aspect is that we want to add more features relevant to automated machine learning, such as model compression and automated feature engineering. We've seen how Katib can deal with hyperparameters and neural architecture search, but that is all about the training process; feature engineering is the step before all of that begins. You're selecting your list of features, maybe doing some feature selection, some feature processing, or some transformation of your training data. These are things we're constantly researching.

So how do you contribute? You can find us on GitHub; feel free to try the project, submit feedback and file feature requests. There are lots of help-wanted issues, and a great way to contribute is to help with infrastructure and testing improvements, or to add new algorithms. Currently we support maybe five or six different algorithms, and our intention is for Katib to evolve into an open source platform for general automated machine learning, so we would really welcome contributions. There's also an invitation to our Slack channel; you can't open it from here, but please join if you're interested. Finally, we want to thank our many contributors, some of whom are in the audience today: NTT, Caicloud, IBM and Cisco have always been very, very helpful. OK, so that is it for our presentation, and we will take some questions.

Yes? "Can you change the DB backend from the current implementation to something else?" Yeah, so currently in v1alpha2, which is the version coming out in about a month, that's not possible, but in the next version, 0.3, it will have a pluggable design where you can add any database by just implementing a common interface.

The next question is about the consumption of resources. Resources you can configure at the per-job level, and there are a few ways to control how much resource you use. One way is through your budget: if you want a smaller parallel footprint, you can adjust how many trials run in parallel. Another way is to change the specifications on the training job itself: some training jobs can use multiple GPUs, and for others you can reduce the number of GPUs.

OK, so the question is: do we still need machine learning experts now that we have neural architecture search? I think a lot of research is going into how to generate architectures efficiently, and a lot of this is still very early. We're still researching how to do it not only within a limited budget but also in a way that is applicable to various kinds of problems. Currently, neural architecture search is pretty good at image recognition, but there is limited application elsewhere, so this is something that still requires research.
"Thanks for the presentation. I have a question about this slide. Do you have a simple use case for Katib, that is, at which stage can a machine learning programmer use it? For example, I wrote a simple training program and I got some model, but I'm not satisfied with it. What are the use cases that this Katib project can address?"

I'm not quite clear on the question, so maybe I'll give a typical use case. As we described earlier, the wrong settings can give you very bad model performance. For example, in the demo slide shown earlier, you can see that for certain parameter values the final accuracy came out at, what's the value here, just 11%. So it really depends on how you figure out the best hyperparameter values for your experiment. The clear advantage you get here is that you don't need to worry about that: you basically get the best hyperparameters suited to your model. As we showed earlier, the model performance continuously improves, and in this case it finally gets to 98%.

"I have a very general question for Richard. From Google's perspective, is Kubeflow driven by Google AI or Google Cloud?"

The question is about which Google organization is driving the Kubeflow project, Google Cloud or Google AI. The Kubeflow organization is under the Cloud AI organization, so the team driving it is Google Cloud.

"I kind of have a feeling that, on the Google AI side, TensorFlow for example is very hot, but it looks like Kubeflow is not that popular at this moment. Maybe I'm wrong, so maybe you can give me more insight about that."

Our goal is to democratize AI, not just to target TensorFlow. We want to make it easy for people to build end-to-end machine learning workflows on Kubernetes, and not just on the cloud. That's why Katib and the other components are framework agnostic. "Okay, I see. Okay, thank you."

"I have a small question about automated machine learning. With traditional feature engineering, we usually define features based on our own understanding of the object being recognized, so the features generally correspond to some physical property. For features found through neural network search, can they be mapped back to a physical property of the object, so that I can understand what they mean?"

I'm not sure we can answer that clearly here; perhaps we can discuss it privately afterwards. So I think our time may be up. Do we have one more quick question? Okay. Thank you.