Thank you very much for your interest. To warm up: how many of you work with GPUs day to day? Quite a few. And Alibaba Cloud released a GPU sharing and scheduling solution in February this year — how many of you have heard of it or tried it? OK, also quite a few. So in February this year, the Alibaba Cloud Container Service team open-sourced this GPU sharing and scheduling solution, and we know that many of you have used it and that it has helped you save cost. So today I want to share this solution and the thinking behind it, and maybe we can have a further discussion about how to refine it.

To start with, let me introduce myself. My name is Xue Yang, from the Alibaba Cloud Container Service team. I have been working on using container technology to improve heterogeneous computing on Alibaba Cloud. This is my co-worker Zhang Kai. He is responsible for cloud-native AI in the Alibaba Cloud Container Service team, and he has rich experience in cloud computing, distributed systems, and SOA. He will join the Q&A session later.

A bit of background first. In 2016 there was the famous Go match between AlphaGo and the world champion. Three years have passed, and AI has been deployed in many scenarios: for example, smart robots, smart customer service, image recognition, and intelligent transportation. So what drives AI development so fast? One of the reasons is that, in specific scenarios, the GPU can greatly improve computing efficiency. In the past a calculation might take several months, or dozens of days; now we can finish it within several days, or even a dozen or so hours. Why is the GPU faster than the CPU? Actually, it is related to the workload as well as the architecture. What is the main difference between the CPU and the GPU? I think it lies in the number of cores. A CPU has several, or at most dozens of, cores. A GPU is different: even a really old generation of GPU has at least hundreds of cores, and a current one has thousands.
The NVIDIA V100, for example, has 5,120 cores, so the GPU trades space for time. And this physical advantage is really useful for the large-scale matrix calculations in deep learning. Once the number of GPUs reaches a certain level, you need cluster management and scheduling technology such as Kubernetes to manage them.

But how does Kubernetes manage GPUs? Since version 1.8, Kubernetes has defined the device plugin framework, and it supports not only GPUs but also FPGA, RDMA, and other heterogeneous resources. The framework can be divided into two parts. The first is the extended resource: to support GPU, FPGA, or RDMA, you give the resource a name, and it is reported through an API. The second is the device plugin itself, which vendors such as AMD and NVIDIA implement for their devices. A device plugin is actually quite simple; it has only two responsibilities. One is resource reporting: how many devices do you have, how many GPUs, and what are their IDs? The other is allocation: when a job or task is scheduled, the container that is about to run must be bound to a concrete device on the machine. So no matter whether you support GPU or FPGA, you can use this framework.

But it has a problem. You can see that the allocation of the device happens only on the node; the scheduler itself only adds and subtracts a resource count. In complicated scenarios, this information is not enough. For example, suppose I would like to schedule a workload onto two GPUs connected by NVLink: if we don't extend the scheduler, the node on its own cannot meet this topology requirement. These are the disadvantages, or limitations, of the Kubernetes scheduler for GPUs, so I don't think it gives enough support for heterogeneous devices yet.

Now, we provide the container service on Alibaba Cloud, and a lot of our customers have asked whether we can provide a solution for sharing and scheduling GPUs.
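The two device-plugin responsibilities just described can be sketched as follows. This is a minimal illustration only: the real framework is a gRPC service (`ListAndWatch` / `Allocate`) that the plugin exposes to the kubelet, and the class and method names here are simplified stand-ins, not the real API.

```python
# Minimal sketch of the two device-plugin responsibilities described above.
# The real contract is a gRPC service (ListAndWatch / Allocate) defined by the
# Kubernetes device-plugin framework; names here are illustrative only.

class FakeGPUDevicePlugin:
    def __init__(self, device_ids):
        self.device_ids = list(device_ids)  # e.g. GPU UUIDs reported to kubelet

    def list_and_watch(self):
        """Responsibility 1: report how many devices exist and their IDs."""
        return [{"id": d, "health": "Healthy"} for d in self.device_ids]

    def allocate(self, requested_ids):
        """Responsibility 2: bind the scheduled container to concrete devices,
        typically by returning env vars / device files for the container."""
        unknown = set(requested_ids) - set(self.device_ids)
        if unknown:
            raise ValueError(f"unknown devices: {unknown}")
        return {"envs": {"NVIDIA_VISIBLE_DEVICES": ",".join(requested_ids)}}

plugin = FakeGPUDevicePlugin(["GPU-0", "GPU-1"])
print(len(plugin.list_and_watch()))   # 2 devices reported
print(plugin.allocate(["GPU-1"])["envs"])
```

Note how the plugin never sees the cluster: it only reports devices and reacts to an allocation, which is exactly the limitation discussed above.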
The main purpose is to reduce cost. For example, can we use peak-load shifting, moving applications between devices at different times of day? On Alibaba Cloud, some customers also expose their own APIs to other clients through an approval process, so they would like to use resource over-commitment to reduce cost. All of these challenges come down to one question: can we use Kubernetes to schedule and share GPUs? Currently the scheduling mechanism keeps it simple: GPUs must be allocated one whole card at a time.

The second question is isolation. When we run different applications on the same GPU, isolation is really important: how do we ensure that one application does not interfere with another? NVIDIA has two methods. One is vGPU, but it must be implemented at the virtualization level, so it is quite difficult for us to adopt. A second problem with vGPU is that, by default, you need a license from NVIDIA. And third, it is not so flexible: our customers would like to reduce cost by peak-load shifting, so I don't think this isolation method is suitable for us. The other method is MPS, the Multi-Process Service. It works within a single machine, but we have already tested it, and it is not really ready for the production environment: with some test models, an MPS client crashed, and the crash affected the other clients connected to the same MPS server. We are now working with NVIDIA to solve these problems.

So let's go back to our problem. Our customers would like to share GPUs and to have shared-GPU scheduling, and we hope this scheduling can be simple. At this stage we need to focus on scheduling; for isolation, we accept a trade-off. And third, as a cloud service provider, we cannot change the core code of Kubernetes. We know Kubernetes develops really fast: it has already reached version 1.15, and minor versions and security patches are released even more frequently.
If we changed the core code, we would need to invest a lot to maintain our fork. Another problem is that once we change the core code, the approach cannot be reused by other users; everyone would need to maintain their own modified version. That's why we don't want to change the core code of Kubernetes to solve these problems, and why isolation is no longer our focus for now.

So let's go back to our challenge. Currently there is no solution available in the community, so we need to tackle the challenge from scratch. We need to do two things. The first is to define the API: how will customers use the system, and how do they apply for the resource? The second is to actually support that kind of request.

Let's look at the first question: how can we design the interaction interface? When we talked to different kinds of customers, we found that when they need a shared GPU, it is usually in a model inference scenario. In that scenario they can make an assumption: the GPU computing capacity they need is in a positive relationship with the GPU memory they need. So we transform GPU sharing into a problem of requesting and allocating GPU memory, and this definition is easy for users to accept.

The next step is to write a device plugin that reports GPU memory instead of whole GPUs. But is that enough? No, I don't think so. The traditional Kubernetes scheduler is not capable here, and the result is that, after scheduling, the GPU memory would become fragmented. To put it simply: after frequent scheduling, the total free GPU memory on a node may still meet a user's request while no single card does, so when a user makes a really large request, the scheduling decision would be invalid. So we need to add a scheduler extender that supports filtering and binding.
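The fragmentation problem just described can be shown in a few lines. A minimal sketch, with illustrative numbers: the native scheduler sees only the node-level sum of a resource, while a real allocation needs a single card with enough free memory.

```python
# Why counting GPU memory per node is not enough: after frequent scheduling
# the free memory gets fragmented across cards. Values in GiB, illustrative.

def node_total_free(cards):
    """What the native scheduler effectively sees: one number per node."""
    return sum(cards)

def max_single_card_free(cards):
    """What a real allocation needs: one card with enough free memory."""
    return max(cards)

free_per_card = [3, 2]   # two cards on one node, with 3GiB and 2GiB free
request = 4

# The node-level sum (5GiB) would accept the 4GiB pod:
print(node_total_free(free_per_card) >= request)       # True
# But no single card can actually hold a 4GiB allocation:
print(max_single_card_free(free_per_card) >= request)  # False
```

This gap between the two checks is exactly what the scheduler extender closes.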
In our design there are two core modules. One is the GPU Share Scheduler Extender, which is responsible for scheduling; the main difference from the native scheduler is that the extender is aware of the GPU memory on each individual card. The other is the GPU Share Device Plugin, which executes the allocation decided by the scheduler extender.

So let's see what the scheduler extender does. Imagine a user applies for 4GiB of GPU memory. In the traditional way, the scheduler finds that nodes N1, N2, and N3 all meet the requirement, because the total free GPU memory on each of them is over 4GiB. This is where we introduce the scheduler extender to do a second round of filtering. Take N1 as an example: even though its total free memory is over 4GiB, the maximum free memory on any single card is 3GiB; the same for N2, where the maximum is only 2GiB. So only N3 can actually meet the requirement. This is the role the scheduler extender plays: it can do the more fine-grained scheduling work.

The next step is binding to the node, and here it is quite different from the traditional way. In the past, the scheduler only selected and bound a node; what we do is that, inside the extender, we also select the card, in a binpack way. Let's see this example. When the user requests 4GiB of GPU memory, we have four choices, from GPU0 to GPU3. We find that GPU1 and GPU2 do not have 4GiB free; only GPU0 and GPU3 meet the requirement. In this case we apply binpack, so the new application will be allocated to GPU0, and the allocation result will be recorded on the pod as well, to be used as a reference by the device plugin. In that step, the device plugin reads the decision made earlier by the scheduler extender. All of this information — for example, the total GPU memory of the device and the GPU memory occupied by the container — will be recorded in environment variables and exposed to the application, so that the application can respect the allocation decision.
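The two extender decisions described above — per-card filtering and binpack card selection — can be sketched as follows. The node and card numbers are illustrative (chosen to match the example in the talk), and this is a simplified model of the logic, not the project's actual code.

```python
# Sketch of the two decisions the scheduler extender makes. All values in GiB;
# the node/card layout is illustrative, matching the example in the talk.

def filter_nodes(nodes, request):
    """Keep only nodes where at least one single card has enough free memory."""
    return [name for name, cards in nodes.items() if max(cards) >= request]

def bind_card_binpack(cards, request):
    """Pick the fitting card with the LEAST free memory (binpack), so large
    contiguous chunks on the other cards are preserved for big requests."""
    fitting = [(free, idx) for idx, free in enumerate(cards) if free >= request]
    return min(fitting)[1] if fitting else None

nodes = {
    "N1": [3, 2],          # total 5GiB free, but max single card is 3GiB
    "N2": [2, 2],          # total 4GiB free, but max single card is 2GiB
    "N3": [4, 2, 1, 6],    # GPU0 and GPU3 can each fit a 4GiB request
}
print(filter_nodes(nodes, 4))            # ['N3']: only N3 survives filtering
print(bind_card_binpack(nodes["N3"], 4)) # 0: binpack prefers GPU0 (4GiB free)
```

The binpack choice of GPU0 over GPU3 leaves GPU3's 6GiB block intact for a future large request, which is the rationale given in the talk.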
In fact, it is easy to deploy this solution. We use Helm to deploy GPU sharing on a GPU cluster, and you can decide which components you want to deploy into which namespace; you can find the charts and the supported versions in the installation documentation. On top of that, you need to decide which nodes will offer GPU sharing: you just add a label to those nodes, and that is done. Also, in order to simplify management and to check the resource distribution of GPU sharing, we provide a kubectl extension command. It helps you to monitor the shared GPU memory allocated and available on each node, as well as the cluster total.

When you want to apply for a shared GPU, there is no big difference from the general approach. You just change the resource limit from nvidia.com/gpu to aliyun.com/gpu-mem in the container specification and give the number — the amount of GPU memory, in GiB — and that will be done. And when you actually run the workload, you can see the environment variables in the container: one is the total amount of GPU memory on the current device, and the other is the GPU memory granted to this container. Through these, via the TensorFlow session configuration, you can control your GPU memory fraction.

So now let's run the demo. Let me see whether the demo works — can you see it clearly? [Demo] This is a simple deployment, and it requests 12GiB of GPU memory. Now we deploy it with kubectl and check whether the pod gets running — there it is. We can look at the application, and with the inspection command we can also see more detailed information: which node and which GPU card the application has been scheduled to. Now let's launch our second application. It is also a small test application, and this one requests 3GiB of GPU memory. After the deployment succeeds and the pod is running, we can check again, and we can see that both applications have been scheduled onto the same GPU card.
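The environment variables mentioned above can be turned into a TensorFlow memory fraction with a few lines. A minimal sketch: the variable names `ALIYUN_COM_GPU_MEM_DEV` and `ALIYUN_COM_GPU_MEM_CONTAINER` follow the open-source project's convention, but treat them as an assumption and verify them against your deployment; the demo values are stand-ins for what the device plugin would inject.

```python
import os

# The device plugin injects the card total and the per-container grant as
# environment variables. Names and demo values below are assumptions for
# illustration; in a real shared-GPU container they are set for you.
os.environ.setdefault("ALIYUN_COM_GPU_MEM_DEV", "16")        # demo value, GiB
os.environ.setdefault("ALIYUN_COM_GPU_MEM_CONTAINER", "4")   # demo value, GiB

total = float(os.environ["ALIYUN_COM_GPU_MEM_DEV"])
granted = float(os.environ["ALIYUN_COM_GPU_MEM_CONTAINER"])
fraction = granted / total
print(fraction)   # 0.25

# In TensorFlow 1.x, the application then caps itself at its share:
#   config = tf.ConfigProto()
#   config.gpu_options.per_process_gpu_memory_fraction = fraction
#   session = tf.Session(config=config)
```

This is the trade-off mentioned earlier: isolation is cooperative, with the application voluntarily limiting itself to the granted fraction.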
If you want more details, you can inspect the application information. Now let's actually log in to the machine we just used — this is the node where the two applications are really running — and you can see the two processes there. But when you have many nodes, it is not convenient to log in one by one, so we also provide monitoring: across all the different nodes, you can see the GPU memory usage as well as the temperature.

That is basically my introduction; the rest is our plan. We are looking at bringing MPS into the solution, and we will also push this approach out as a more general scheme: for example, it will support more scenarios, such as RDMA. This is our open-source project, one of the open-source projects from Alibaba Cloud. If you are interested, you can try this solution, and we look forward to your feedback. Thank you. Any questions?

Q: [The first question and its answer are inaudible.]

A: …Some users load different models into one application instance, so they don't need to add another instance; but if you do want to add a model and a new instance is necessary, you can simply add one, and that will work.

Q: The third question: you mentioned that MPS may have a performance loss of 10% or 20%. Do you have specific numbers?

A: Well, we tested it internally, and we also talked with colleagues; I didn't run a formal benchmark myself. They already had many applications sharing the same GPU, so there are some limits, and the exact loss depends on the model, so it is not really clear. What customers want is that these GPUs can be distributed in a reasonable manner by Kubernetes, so they hope to get systematic scheduling. Regarding isolation, I think it concerns everyone. For MPS, we can set limits on the MPS clients; for GPU memory, if you use TensorFlow, you have to set the limit yourself when you launch the session.

Q: So actually the gentleman asked the question I wanted to ask as well. Just now you mentioned that you can group GPUs together while running.

A: Well, first of all, I talked about Kubernetes scheduling, but my focus here is GPU sharing, so your question is not within today's topic; if you are interested, we can talk in private later.

Q: Good afternoon. I want to confirm: you use MPS, right?

A: Well, we are testing MPS, but we are really cautious about it, because the stability of MPS is the main problem, especially for online serving.

Q: Yes, but from other channels I learned that if you don't use MPS, even though you share the card, the workloads won't really run in parallel.

A: Yes, that's right.
Q: So the utilization of the GPU cannot reach 100% at some point, right?

A: Well, actually we still need to cooperate with NVIDIA on that as well.

Q: I have a question about the memory allocation. GPU cards come with different memory sizes. If the memory of a card is quite small, then after sharing, the fragments are small as well; but if a large-memory card is used for sharing, it can end up with a lot of large fragments. Have you considered that at all?

A: Yes, I think that is quite a good consideration. In our current architecture there are two hooks: one is the filter, and the other is the prioritize (scoring) step. The prioritization of nodes hasn't been done yet, and we can add that to our task list; I think it is not a problem. Thank you.

Q: Good afternoon. I also have a question on this point. If a batch of tasks has already been scheduled, it will cause a fragmentation problem. So do you support a rescheduling process? I mean, you could reschedule tasks onto a GPU which still meets their requirements, and thereby release a larger contiguous memory on a specific GPU for another task.

A: Well, what you describe is rescheduling, right?

Q: Yes, right.

A: If we wanted to do rescheduling, we would need to improve the scheduler beyond providing scoring hints. The community descheduler already does something like what you just described, but it is not available for extension.

Q: Two more questions, not related to the GPU itself. One: you do the filtering through the scheduler extender, so what is the protocol of the filter — HTTP or what?

A: HTTP.

Q: OK. Have you encountered a performance bottleneck there?

A: Well, actually, we have already run into the performance question, but first, consider what Kubernetes itself provides.
Kubernetes' support for the extender has already been improved: it supports a resource filter, which means that only pods requesting the specific extended resource will be sent to the extender, and pods that don't request it will not go through it at all. The default resources can be handled entirely by the default scheduler. If we apply this filter, only some of the requests will be received by the extender. Essentially, what you need to do is segment the traffic. As for the other question — well, actually the time is up, so let's talk about it privately later. Thank you.
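The traffic segmentation mentioned in this answer is done with the `managedResources` field of the scheduler extender configuration: only pods that request the listed resource are routed to the extender. A minimal sketch of such a scheduler policy follows; the extender URL is a placeholder, and the verbs and resource name should be checked against your actual deployment.

```json
{
  "kind": "Policy",
  "apiVersion": "v1",
  "extenders": [
    {
      "urlPrefix": "http://127.0.0.1:32766/gpushare-scheduler",
      "filterVerb": "filter",
      "bindVerb": "bind",
      "enableHttps": false,
      "nodeCacheCapable": true,
      "managedResources": [
        { "name": "aliyun.com/gpu-mem", "ignoredByScheduler": false }
      ]
    }
  ]
}
```

With this policy, a pod requesting only CPU and memory never touches the extender, which keeps the HTTP round trip off the common scheduling path.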