Today I'll give this talk in English, for the benefit of the rest of the audience. OK. So, hello everyone, my name is Feng Yuan. I'm a software engineer from VMware. I've been working on a project called PKS since its beginning, for almost two years now. So today I'm going to talk about load balancing, especially in the context of cloud native.

OK, so here's what I'm going to do. Firstly, I'll give you a brief introduction of the concepts related to load balancers. Then I'll walk you through the lifecycle of a simple packet and point out where and how load balancing is happening, along with a list of the techniques that can be used to implement some concrete solutions. I do think that sometimes it really helps to get your hands dirty to gain a better understanding of all the concepts. That's why I prepared a demo for the end, to show you how to build a reliable and scalable load balancer by running a bunch of very simple bash scripts.

OK, let's get started. I'm pretty sure everyone here is already familiar with the term load balancing, but what does it mean in a cloud-native context?

Right. So around five years ago — I mean, even now — some big companies are still depending on very expensive boxes to fulfill this load balancing functionality. They are great, but they do come with a bunch of disadvantages. First of all, they are very expensive: upgrades are very costly, and it takes lots of training and money to understand the internal workings of these very expensive boxes. Also, as our application scales up, it's very hard to meet the HA requirement, because the redundancy model they provide is one-plus-one: you have one active load balancer running to serve all the requests, and the other one is standing by, syncing state with the active one and bringing itself up once it detects a failure of the active one. But that's just not good enough. Also, if we're using hardware, or software baked into hardware, to implement this, it lacks the programmability and flexibility for quick iteration, the same way we iterate on our cloud-native applications. So we would like to have
our load balancer defined in software, and to scale the load balancer itself the same way we scale our applications. We want to keep this software as stateless as possible; then we have a scale-out model, which provides n-plus-one redundancy. Also, deployment itself is extremely simplified, because we can deploy a load balancer instance the same way we deploy our application — we can just add more load balancer instances to provide more load balancing capacity. With that, we can also shard the load balancer instances to provide performance isolation, to accommodate different SLAs, for example.

Container native: think about a typical Kubernetes deployment, where you have at least three worker VMs, and you deploy one container that is serving one service. Now we expose this service — say through a ClusterIP service or ingress — and before the traffic hits the target container, if you really think about it, it's very likely that it'll hit another VM first, where the routing decision is made again and the request is routed to the VM where this container is actually running. So it just adds a bunch of extra latency and overhead, and we don't want that. By being container native, from the load balancer's perspective, all the containers are endpoints just the same as VMs are; then we can cut all that latency and overhead.

It's also very important for the load balancer to be able to interoperate with other cloud services, by being cloud native — for example, security-related features provided by a cloud provider, like DDoS mitigation, or caching, like a cloud CDN, or identity-based authentication. It's also very important not to ask our clients or customers to do any manual provisioning of platform-related resources — in this case, the load balancers themselves. As a customer's application scales up, we want the load balancers to be automatically deployed and scaled, consistent with the application. Also, in the cloud-native world, as we know, our backend containers come and go, so it's a highly dynamic
environment. To be able to route requests to the current set of endpoints, we need our load balancer to be highly configurable, to adapt to all these changes spontaneously.

So with that being said, let's take a look at the basic traffic flow, or the lifecycle of a packet. This picture demonstrates a very classic model: we have clients generating packets, and before a packet hits the target backend server, it reaches a middleware called the load balancer. In the cloud-native world, we tend to model independent modules as separate microservices, and depending on how they interact with each other it can be very complex. But if you really think about it, a service that is making a request is actually acting as a client, so we still have this classic model, where there are three places load balancing can happen: client, load balancer, and service. So now let's take a look at them one by one.

On the client side, if you have control, or partial control, over the clients, you can start load balancing already. In this case you can choose to use gRPC, by making calls to the gRPC library in your application code, and gRPC itself is smart enough to route requests directly to different backend servers. If you feel like that's too much — that gRPC shouldn't be responsible for this load balancing decision — that's fine: they also provide something called a look-aside load balancer. That means the gRPC client can talk to a third-party entity that implements the gRPC-LB protocol, and that entity makes the routing decisions, so this responsibility can be fully delegated to the third party. It sounds all good, but the problem is, not everyone understands this gRPC-LB protocol. Envoy, on the other hand, exposes its data plane through something called the universal data plane API, which abstracts out the data plane in all aspects. gRPC is also considering implementing this universal data plane API, and in that case the gRPC client could talk directly to, say, Istio for the routing decision, and after that the request will be automatically routed to different backend servers.
If you don't have any control over the clients at all — let's say you have an application, an internal service that talks to other services, so it's a client itself, but you just don't have control over its code (it's a jar package or whatever; it runs just fine and you don't want to touch it) — in this case we can adopt a model called the sidecar proxy. I believe everyone is very familiar with this; it's actually the way Istio leverages the Envoy proxy. You deploy a proxy together with the client, they talk to each other through local networking or sockets, and then the proxy makes the routing decisions.

But most of the time we just don't have any control over the clients, and most of the time load balancing actually happens in the middle. Of course, our old friend DNS can be used here. You can define an A record to map your service's name to different IPs, or even to just one IP — in that case anycast can be used: that one IP can be backed by different servers deployed across the world, and depending on where the servers are deployed, a request can be routed to the closest location. I'm going to show you later how to do this in terms of VIP and ECMP.

VIP here means virtual IP. Instead of thinking of it as something we assign to a physical or virtualized NIC on a specific VM, it's more of an abstraction. In L4 networking it's like a name, like a DNS name: we can map this IP address to a bunch of clusters, or a bunch of servers. And ECMP, equal-cost multi-path, is a way to scale one IP, or one virtual IP. The idea is quite simple: the cluster sits behind a router that is serving this VIP, and we have a bunch of routers serving the same VIP. We ask these routers to advertise this VIP to the same upstream router; then, from that upstream router's perspective, this VIP is associated with a bunch of routers — a number of load balancer instances. And if the number of backend servers changes, we want to keep all the packets from the same stream, or same connection, going to the same backend server, and consistent hashing is the technique we use here.
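Consistent hashing can be sketched with a toy bash script — a simplified ring with virtual nodes, not what a production load balancer uses (real implementations use stronger schemes such as Maglev hashing, and `h`, `owner`, and the server names here are all made up for illustration):

```shell
#!/bin/sh
# Toy consistent-hashing ring with virtual nodes.

# h STRING -> CRC32 of STRING (POSIX cksum), used as a ring position.
h() { printf '%s' "$1" | cksum | cut -d' ' -f1; }

# owner KEY SERVER... -> the server owning KEY on the ring: the server
# whose point is the smallest one >= h(KEY), wrapping to the global minimum.
owner() {
  key_h=$(h "$1"); shift
  best=; best_h=; min=; min_h=
  for s in "$@"; do
    for v in 0 1 2 3; do                  # 4 virtual nodes per server
      p=$(h "$s#$v")
      if [ -z "$min_h" ] || [ "$p" -lt "$min_h" ]; then min=$s; min_h=$p; fi
      if [ "$p" -ge "$key_h" ]; then
        if [ -z "$best_h" ] || [ "$p" -lt "$best_h" ]; then best=$s; best_h=$p; fi
      fi
    done
  done
  echo "${best:-$min}"
}

# Each connection maps to one server, and keeps that server unless
# that exact server is removed from the ring.
for k in conn-1 conn-2 conn-3; do
  echo "$k -> $(owner "$k" lb-a lb-b lb-c)"
done
```

The property that matters for load balancing: removing one server only remaps the connections that were on it, because all the other servers' ring points stay where they were.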
It gives you this guarantee even when the number of servers, n, changes, and sometimes that's good enough. For example, in Google Cloud — there's actually a paper published by Google, and this is the thing supporting Google Cloud networking, so load balancing is very important to them — they don't rely that much on connection tracking; it's used together with consistent hashing to minimize the number of disrupted connections.

With all these techniques it's all good, but sometimes we still have a limitation: the load balancer instance itself has to be deployed in the same L2 network as our backend servers. I'll give you more details later on how to scale beyond this L2 network — we don't want all our backend servers to have to sit in the same L2 network, or on the same switch. The way to do that is actually very simple.

In L4, as we know, besides the IP we also have the port number, so the port number is part of the abstraction too. It's very important sometimes to be able to expose the same port number across a cluster — the routing mesh in Docker Swarm does this, and it's mostly implemented with something called IPVS, which I'm going to demonstrate how to use later in detail. L7 load balancing can be layered on top of this; it provides fancier things, like routing to different backend servers based on application-level information.

So now, with these concepts said — and these concepts are just very dry here, and really, talk is cheap — let's just get into the demo. As we can see from the previous list, of all the techniques listed here, L4 load balancing is the most fundamental building block, and that's what I'm going to show in this demo: how to extend one IP to make it a name, or an abstraction, in terms of L4 networking. I'm going to do this using some widely available open-source software: IPVS, the dummy interface, BGP, and ECMP.

Before I start the demo, I'd like to give you an introduction to IPVS, the IP Virtual Server. It's actually something that has existed in the Linux kernel for over a decade, surprisingly, and it's widely used now in Kubernetes and Docker to implement the things I just mentioned, like ClusterIP, the routing mesh, NodePort, things like that. What it provides is essentially L4 load balancing.
You can also sync connection state across many IPVS instances, so that if one of the IPVS instances goes down, we can still keep the states. It does that through multicast, but it only supports IPv4 for now.

Now let's start the demo. In this demo I'm going to start from a very simple deployment — can everyone see the graph here? The top box here, this guy, is the router; it's actually my laptop. And I deploy two other VMs: the first one has IPVS running on it and serves as the load balancer, and this guy here is the server — it runs a very simple Golang HTTP server. The client here is also my laptop. I'm going to use NAT, network address translation. The idea is very simple: when a request comes to the IPVS, the destination IP is translated to this real server's IP, and the default route for this server is set to this guy here, so the response gets routed back through it.

So here I have the first VM, with IPVS running on it, and in the same box I have the second VM. Then I'm going to start this server. What this script really does — as you can see here, step one, step two, sorry it's too small — is it removes the default route. Then I start IPVS on the first VM and add this server into the IPVS. As you can see here, the virtual IP is configured on this IPVS, and we also add the server into it. Now requests go to this IPVS: the client just starts making curl requests to the endpoint, with the virtual IP, and it sleeps one second every time, for one thousand times. Let's go back to the slides and see what's wrong with that.

So that is very plain and simple, but it comes with a bunch of disadvantages. For one thing, all the packets for this connection have to go through the same IPVS, the load balancer — and traffic isn't symmetric for every workload. Let's say you are downloading a huge file: in that case the request itself is way lighter than the response, and it doesn't make any sense for the response to go through this load balancer. That's one thing. The other thing is, we also add extra overhead on the critical path, because we need to modify each packet — we are doing DNAT here, for each packet.
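The NAT-mode setup from the demo can be sketched roughly like this — the addresses are placeholders, the commands need root and the `ip_vs` kernel module, and the demo's actual scripts are the ones on GitHub:

```shell
# On the load balancer VM.
VIP=10.10.33.100      # placeholder virtual IP
RS=10.10.33.4         # placeholder real-server IP
LB=10.10.33.2         # placeholder load-balancer IP

ipvsadm -A -t "$VIP:80" -s rr           # create virtual service, round-robin
ipvsadm -a -t "$VIP:80" -r "$RS:80" -m  # add real server in NAT (masquerade) mode

# On the real server: route replies back through the load balancer,
# so the DNAT can be reversed (this is the "remove the default route" step).
ip route replace default via "$LB"

# On the client: curl the VIP once a second, a thousand times.
for i in $(seq 1 1000); do curl -s "http://$VIP/"; sleep 1; done
```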
So we would like to let the server send the response directly. The idea is that this IPVS here sends the packet, at L2, directly to this server. It doesn't make any IP-level changes; it sends the packet at L2 — at the switch level — directly to this guy here, and we have a dummy interface configured on the server, so the response can be sent back to the client directly. This technique is called direct server return, with the assumption that the server has to be sitting in the same L2 network as the load balancer. Let's see how it works.

Now, on my load balancer — OK, as you can see here, this server is added into the load balancer, right, .33.4. If we go to this page here, we can see it starts serving requests, as expected. Now let me do a watch on the load balancer stats and see what happens. As you can see here, the active connections for the first one is 64, and the second one is 55, so the second one is still catching up. Now we are fine to download some huge files in this case. But we still have the assumption that the load balancer and the server are deployed in the same L2 network, and we want to scale beyond L2.

The way to do that is a third mode, the tunnel mode. Here the VM is actually deployed in another subnet — if you can see, it's deployed in the .34.2 subnet, while these three are deployed in the .33 subnet, so it's only remotely routable; it's not in the same L2 network. IPIP is the protocol used by IPVS here for encapsulation, which means wrapping one IP packet inside another IP packet. Let's set this up. Now I start the server, and the client here — oh my god. Well, I don't know what's happening here; it looks like there's some failure, but that's fine, because we're going to add a health check on the load balancer later in the demo. So now the requests are being served by these servers, even with some temporary failures — probably because of some networking issue in this building, I don't know.

If you look at this picture, now everything's good: we are able to scale beyond L2. What's the problem then? If you look at the deployment here, the load balancer instance itself becomes a single point of failure. It's a bottleneck: if this guy goes down, it takes down all the servers behind it. So we want to scale beyond this single load balancer serving the virtual IP. The technique used here is ECMP: we advertise this virtual IP to the upstream router.
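The direct-server-return and tunnel setups shown above differ from NAT mode only in the real-server forwarding flag, plus some configuration on the real server itself. A rough sketch, with placeholder addresses, commands assuming root:

```shell
VIP=10.10.33.100      # placeholder virtual IP

# Direct server return: -g (gatewaying) forwards the frame at L2,
# so this real server must be on the same L2 network as the director.
ipvsadm -a -t "$VIP:80" -r 10.10.33.4:80 -g

# On that real server: hold the VIP on a dummy interface so it accepts
# the packets, but never answer ARP for it (the director owns the VIP).
ip link add dummy0 type dummy
ip addr add "$VIP/32" dev dummy0
sysctl -w net.ipv4.conf.all.arp_ignore=1
sysctl -w net.ipv4.conf.all.arp_announce=2

# Tunnel mode: -i wraps each packet in an outer IP packet (IPIP),
# so the real server only needs to be routable, not on the same L2.
ipvsadm -a -t "$VIP:80" -r 10.10.34.2:80 -i

# On the remote real server: terminate the IPIP tunnel and hold the VIP.
ip link set tunl0 up
ip addr add "$VIP/32" dev tunl0
sysctl -w net.ipv4.conf.tunl0.rp_filter=0
```

In both modes the response bypasses the load balancer entirely, which is what makes the download-a-huge-file case cheap.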
We're going to replicate this deployment to have a second load balancer instance with its own servers, and the second load balancer instance is going to advertise the same VIP to the upstream router, so that from the upstream router's perspective this VIP is backed by multiple next hops. Oh, nice. I'm going to deploy two more VMs here, in different subnets, and then set up the second load balancer instance. As you can see here, the second load balancer is here — IP .3.5 — with two other servers, and ideally requests will be routed to those two servers. OK, nice. As you can see, these two guys start serving requests.

And now I want to scale beyond this virtual IP with these two load balancer instances: ideally each should advertise the virtual IP. Let me run this manually. OK, so we just started GoBGP on both load balancers, and they advertise the same VIP to their peer, which is the upstream router. And on this host you can see the routing table: it actually has this VIP associated with two next hops, and this guy will fan the traffic out. To make this work, we need to change the default route to point to the upstream router, which is .33.6. And now, if you go here — let's see — OK, cool, all the requests are served. We just used GoBGP to back this virtual IP with two routes.

OK, let's then mimic a failure here, by shutting down a VM. Before that, I'll add a health check. A health check is very important for availability, as we mentioned before, so let me add one on the load balancers. The health check here is also very simple — just some bash functions. What it really does is make a request to an endpoint, and if that fails, it runs a command to remove that server. Now we've started the health check, and we can see the health-check requests showing up on these three servers. Let me mimic a failure by shutting this guy down. Cool — as you can see here, the third one is removed automatically, and from the client's perspective there's no disruption at all: all the requests are served by the remaining two. The third one, even though it failed, is removed automatically by the health-check function. Now let me remove the other two, and see if the traffic switches to the second load balancer.
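The health check just described is a few bash functions along these lines — a self-contained sketch, where `probe` is stubbed out (in the demo it would be a `curl` against the real server, and `evict` would run `ipvsadm -d` to remove it from the virtual service):

```shell
#!/bin/sh
# Minimal health-check pass: probe each real server, evict on failure.

# Stub probe: pretend only "good-*" servers answer. The demo's version
# would be something like: curl -s -m 1 "http://$1/" >/dev/null
probe() { case "$1" in good-*) return 0 ;; *) return 1 ;; esac; }

# Eviction stub. The demo would run: ipvsadm -d -t "$VIP:80" -r "$1:80"
evict() { echo "evicted $1"; }

check_once() {
  for rs in "$@"; do
    if probe "$rs"; then
      echo "healthy $rs"
    else
      evict "$rs"
    fi
  done
}

check_once good-33.4 good-33.5 bad-34.2
# prints:
#   healthy good-33.4
#   healthy good-33.5
#   evicted bad-34.2
```

In the demo this runs in a loop on each load balancer, which is why a failed server disappears from the IPVS table without the client noticing.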
OK, let me shut this guy down. Now it's down, and you can see it's also removed from the IPVS here. Let me also shut this guy down. Oh, nice — so there's no disruption: as you can see, when I shut this guy down, all the requests are automatically routed to the second load balancer, and they are served by that cluster's servers. And if you go back here, you can see the first load balancer is automatically removed from the upstream router.

OK, so in this way we just built a very scalable and reliable L4 load balancer. Since we are short on time — the setup is also available on GitHub, and if you have any questions you can just ask me or send me an email. Thanks. Thanks, everyone, for joining and bearing with me through this very long demo. Thank you.

[Q] What's the difference between IPVS and LVS?
[A] Sorry, can you repeat the second one?
[Q] LVS — another open-source load balancer, LVS.
[A] I am not aware of that, sorry. OK — yeah, LVS, I think it's just Linux Virtual Server or something. Maybe they are the same thing.
[Q] Right, yeah, they are the same thing. It's just that the names are different.
[A] It's the same thing, IPVS or LVS — sometimes it's called LVS. OK, thank you. Any more questions? OK, thank you, guys.