First of all, thank you all very much for coming. It looks like I could actually give this talk in Chinese, but I prepared it in English, and I trust the experts here can follow my English, so I will give the rest of the talk in English.

OK, first things first. Thank you so much for being here. It's my great honor to give this presentation today. I am an algorithm engineer from the observability team at Alibaba Cloud, and I am going to show you our exploration in using AIGC technologies in observability.

OK, please allow me to introduce myself and the service we provide. I work in the observability team, and the service we provide is called ARMS, which stands for Application Real-Time Monitoring Service; it can be accessed on Alibaba Cloud. As for me, I specialize in AIOps. Here AIOps means we use algorithms, of course including machine learning algorithms, to make monitoring systems easier, safer, and faster. And by the way, I am a doctor, and I work in AIOps.

We have four parts in today's presentation. The first part is about why we need a natural-language-to-PromQL chatbot. The second part is the road we have taken. I am not only going to show you the technologies that we tried and that turned out to be effective, but also the technologies on which we ran many experiments and that turned out not to be as helpful as we expected. The third part is "where are we now?", where I am going to show you a demo of our natural-language-to-PromQL chatbot. And the last part is about future development.

So what is PromQL? PromQL is the query language of Prometheus, a time series database, and Prometheus is called the de facto standard for cloud-native applications. Here, de facto standard means that almost all of the metrics that we collect from cloud-native applications will be stored and queried in a Prometheus database. We can see an example here: we can use this PromQL to get the top ten applications with the highest response time.

So why do we need a chatbot? The answer lies in three Cs. The first C stands for complex, because the syntax and grammar of PromQL are very complex. It is very different from the conventional SQL
language, because PromQL deals with metrics stored as vectors, not text stored in tables. So you can see that the syntax of PromQL is very different from what we use in SQL.

The second C stands for confusing. A PromQL query consists of functions and metric names, and the metric names can sometimes be very confusing. This is because the metrics stored in Prometheus are collected by different agents provided by different companies, like SkyWalking, OpenTelemetry, and of course ARMS and Datadog. The metric names can be different, and sometimes they are very confusing.

The third C stands for commonly, because PromQL is a language in very common use. Not only developers and SREs can use PromQL, but also team leaders and PDs, because we can get SLAs, SLOs, outcomes, PVs, and UVs by using PromQL. So that's why we need a natural-language-to-PromQL chatbot.

OK, now the road we have taken. I'm going to tell you about some technology here. So what is the first thing to do when we are going to build a natural-language-to-PromQL chatbot? Maybe someone will say we need to buy some GPUs, or maybe we need to collect some materials, like natural-language-to-PromQL Q&As. They are important, but they are not the first thing we need to do. The first thing we need to do is to test whether ChatGPT is good enough in this scenario. If the answer is yes, we don't need to do anything. All we need to do is wait for Tongyi Qianwen, and then we can use the API they provide.

Luckily, we ran some experiments, and it turned out that ChatGPT cannot handle this task. We can see an example here. I asked ChatGPT to write a PromQL query showing me the top 10 applications with the highest average response time, and it provided one, but it is not the one I need. The metric name here is different from what we use in our system, so it doesn't work. And the user intent recognition is wrong, because the average response time is the average per request, not the average per minute. That doesn't mean anything, at least in an APM system.

OK, so a general-purpose LLM has
some knowledge of PromQL syntax, because the grammar here is correct, but it lacks knowledge about the ARMS system, our system, and it has no idea about the metric names or the user's intention. It seems the LLM needs more knowledge, and we need to put that knowledge into the LLM.

Now the second step for us is to make a decision. For anyone who wants to build an LLM-based service, the second step is to decide whether you are going to build a fine-tuning project or a prompt engineering one. Here, fine-tuning means that you are going to build your own LLM with enough corpus, or we can say language materials. Prompt engineering, on the other hand, means that you are going to use an existing LLM, for example ChatGPT or Tongyi Qianwen, without modifying any parameters, and instead just provide some prompts.

So what is a prompt? A prompt is the additional knowledge that accompanies your question when it is input into an LLM. We can see an example here. Say a user is asking, "Could you please give me a PromQL? I want to see the response time of each of my services." The knowledge is the information you provide from your system, I mean the content in green and the content in yellow. Both the knowledge and the query are input to an LLM like ChatGPT, and the LLM will give you a better answer because it has more knowledge about your system and your intention.

OK, we also made a decision tree for beginners in LLMs, for anyone who is interested in LLM-based services. The first question you need to ask yourself is: are you an experienced NLPer? If you don't know what an NLPer is, just choose prompt engineering. But if your answer is yes, the second question for you is: do you have enough GPUs? I mean, like dozens. If you do have that many GPUs, fantastic, you can try a fine-tuning project. Maybe you can build your own ChatGPT in your field; you can build a company. But if your answer is no, we recommend prompt engineering. And don't be sad, because it has been shown many times that prompt engineering can achieve performance just as good
as fine-tuning in many fields. Let's turn to the other branch. If you are not an experienced NLPer, the second question for you is: do you have good knowledge of some field, for example quantitative analysis or observability? If your answer is yes, then prompt engineering is the best path for you. And if you don't have good knowledge of any field, just find someone to help you. As for our team, we don't have experienced NLPers, and neither do we have enough GPUs, so we tried prompt engineering.

So what kind of prompt works? This slide is the most important one, at least for me. I wish I could have seen it several months ago. It would have saved me a lot of time and effort, and of course money for my team, which is very important.

OK, our first try. Here the prompt is information about natural language to PromQL. In the first step we used the documents about PromQL from the official Prometheus website, and it turned out the accuracy rate was below 5%. It was a disaster. We made an analysis, and it turned out that, first, the metric names are different from those we use in our system, and we need the right metric names in the PromQL. Secondly, the documents are too long, and the LLM just cannot get the point about how to write a correct PromQL.

For the second try, we collected Q&As about Prometheus from a community platform called Stack Overflow. The performance increased a little, but it still did not do very well. Similarly, the metric names are different from those used in our system. Secondly, many of the answers are incorrect. And thirdly, some of the functions mentioned are outdated and are no longer valid.

So after these experiments, we saw that we need to provide some information about our own metric names, and we had the third trial. We used Q&As collected from our users, ignored the reasoning steps, and kept only the PromQL. It turned out that this doesn't work, because
PromQL has a compact syntax, and answers without reasoning steps are insufficient for the LLM. We need some reasoning steps to unpack the compact syntax. So we used the Q&As from our users plus explanations of the PromQL generated by ChatGPT, and the accuracy rate was around 20%. That is still not good enough. The explanations are not always correct, because they are generated by AI, not by humans. So it doesn't work well, and such prompts are insufficient for the LLM to generate accurate PromQL.

After a very long journey, we found a kind of prompt that really works: chain-of-thought prompting, which was proposed by Google, I mean the Google Brain lab, if I remember correctly. At that point the accuracy increased from 20% to 70%. At that time we were using an open-source LLM. Later we used Qwen, I mean the one provided by Tongyi Qianwen, and we got another 10% increase.

OK, so what is chain of thought? Chain of thought is a prompt engineering technique that Google used to improve the performance of LLMs such as ChatGPT on logical tasks, like the mathematical tasks we did in primary school. It is quite interesting. We can say the chain-of-thought prompt is the knowledge you give to the LLM, and specifically it is knowledge with reasoning steps.

We can see an example here. In standard prompting, the question raised by the user is a very simple logical or mathematical task, like: the cafeteria has 23 apples; if they used 20 to make lunch and bought six more, how many apples do they have? This question, together with a Q&A whose question is similar to the one raised by your user, is put into the LLM, but ChatGPT cannot give us the right answer. That is because the answer in the prompt, which is a math problem about Roger, just says "the answer is 11" without explaining why the answer is 11. But in chain-of-thought prompting, in the answer we not only state that the
answer is 11, but also show how to get 11. Here the content in blue is the knowledge with reasoning steps. This time ChatGPT can get the right answer. You can try this example in your own ChatGPT; it really works.

OK, if you are familiar with PromQL, you can get the idea, because writing a PromQL query is very similar to what we did in primary school. It is just like a mathematical task. We can see an example here of a chain-of-thought prompt for natural language to PromQL. The question is: write a PromQL and show me the top 10 applications with the highest average response time in the past five minutes.

To get the average response time, the first step is to calculate the total latency of each application in the past five minutes with the following PromQL: a sum by service over sum_over_time of the HTTP request seconds metric with a five-minute window. Then you need to calculate how many requests, or how many calls, there were in the past five minutes, and you use this PromQL, where the only thing that changes is the metric name: we use the ARMS HTTP request count, not the seconds. That gives us the second PromQL. The PromQL in the third step divides the PromQL from the first step by the one from the second step. If you are familiar with PromQL, you will get this. Now we have the average response time for every service with the PromQL from the third step. But what the user asks for is the top 10 applications with the highest response time, so we need another topk function, and then we get the final PromQL.

OK, let's turn to the architecture of our chatbot. We have both an offline system and an online system. In our offline system we have many natural-language-to-PromQL Q&A examples, of course in the chain-of-thought style, just as I showed you on the last slide. All of these Q&As are put into a text file, and these documents are cut into several Q&As, and
then all of these Q&As are turned into vectors that consist of numbers. Yes, this is called embedding. After the embedding we have many numeric vectors, and all of these vectors are stored in the database here. That is the offline system.

Now we turn to the online system. When a user asks, "Show me the top 10 applications with the highest response time," similarly, this text is turned into a numeric vector, and we get the user input vector. The next step is to search the database for the top k vectors with the highest similarity; that is easy to understand. Now we have k plus one vectors, and all of these numeric vectors are converted back into text. Then we have the question raised by our user and the k Q&As that are similar to the scenario, and all of this information is put into the LLM. Then we get the final PromQL we want.

OK, I'm going to show you a demo. You can see the chatbot here. I need to put this... oh, it doesn't work. It's OK. The chatbot here is provided by ARMS. We created a bot called PromQL, and you can see the scenarios we support. You can just ask it a question about PromQL, like: "Write a PromQL; I want the top 10 applications with the highest response time." The LLM is thinking, and then you get the PromQL along with the reasoning steps. We then put the PromQL into our Grafana service. This is Grafana, though we cannot see the whole service here, and you can see the metrics there. It looks like the answer is right.

Next I will make up a random service. I will write, "Write a PromQL and show me the response time of application blah blah blah," and just wait for the answer. Here is the answer provided by the LLM. We can put it into our Grafana service and run the query. It returns no data, because we don't really have
a service called that; I just typed it randomly. Remove this, and you can see each application and service in ARMS.

We can also ask the question in English. Just say, "Write a PromQL and show me the error count of my service." Then we get the PromQL here, put it into our Grafana service, and run the query. No data, because we don't really have a service called this. Remove it and you get the results. You can also see the scenarios we have covered, and we also have some examples of PromQL. That's the demo, and we can turn to the next slide.

OK, so where are we now? Currently we have covered 14 scenarios, including response time, error rate, QPS, and HTTP status codes, and the accuracy is about 76.69%. Also, for scenarios we didn't cover, or for wrong cases, even if the final PromQL is not correct, we can still provide very helpful information, such as the correct metric name and the proper syntax. The most important thing is that, because we use prompt engineering rather than fine-tuning, it is very easy for us to cover a new scenario. The only thing we have to do is add around five natural-language-to-PromQL examples.

We also made a comparison between our query assistant and ChatGPT. We tested ChatGPT without any prompt, and it turned out the PromQL it provides is wrong. The grammar, the syntax, is correct, but the metric name is not, because it has no idea what we use in our system. And the intent recognition is wrong, because I am asking for the average response time and it gave me the average per minute, which doesn't make any sense in our scenario. And here you can see the ARMS query assistant. I use Qwen-Turbo, provided by Tongyi Qianwen, also on Alibaba Cloud, with three to five prompts, and the PromQL the query assistant shows me is correct: correct syntax, correct metric names, and correct user intent recognition.
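
As a rough illustration of the online flow described earlier (embed the user question, retrieve the most similar chain-of-thought Q&As, and assemble them with the question into one prompt), here is a minimal Python sketch. It is not the ARMS implementation. The bag-of-words `embed` function stands in for a real embedding model, the in-memory `INDEX` list stands in for a vector database, and the example Q&As and the metric names `arms_http_requests_seconds`, `arms_http_requests_count`, and `arms_http_requests_error_count` are assumptions made for illustration. The call that sends the prompt to an LLM is omitted.

```python
import math
import re
from collections import Counter

# Hypothetical chain-of-thought Q&A examples; metric names are assumptions.
EXAMPLES = [
    {
        "question": "Show me the top 10 applications with the highest "
                    "average response time in the past five minutes",
        "answer": (
            "Step 1: total latency per service: "
            "sum by (service) (sum_over_time(arms_http_requests_seconds[5m])). "
            "Step 2: total requests per service: "
            "sum by (service) (sum_over_time(arms_http_requests_count[5m])). "
            "Step 3: divide step 1 by step 2 to get the average response time. "
            "Step 4: keep the 10 largest values with topk. "
            "Final PromQL: topk(10, "
            "sum by (service) (sum_over_time(arms_http_requests_seconds[5m])) / "
            "sum by (service) (sum_over_time(arms_http_requests_count[5m])))"
        ),
    },
    {
        "question": "Show me the error count of my service",
        "answer": (
            "Step 1: sum the error counter per service over five minutes. "
            "Final PromQL: "
            "sum by (service) (sum_over_time(arms_http_requests_error_count[5m]))"
        ),
    },
    {
        "question": "Show me the QPS of each service",
        "answer": (
            "Step 1: total requests per service over one minute: "
            "sum by (service) (sum_over_time(arms_http_requests_count[1m])). "
            "Step 2: divide by 60 seconds. "
            "Final PromQL: "
            "sum by (service) (sum_over_time(arms_http_requests_count[1m])) / 60"
        ),
    },
]


def embed(text):
    """Toy bag-of-words embedding; a real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(question, index, k):
    """Return the k stored examples whose questions are most similar."""
    query_vector = embed(question)
    ranked = sorted(index, key=lambda item: cosine(query_vector, item[0]),
                    reverse=True)
    return [example for _, example in ranked[:k]]


def build_prompt(question, examples):
    """Few-shot prompt: retrieved Q&As with reasoning steps, then the new question."""
    parts = [f"Q: {ex['question']}\nA: {ex['answer']}" for ex in examples]
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)


# Offline: embed every example question once and keep (vector, example) pairs.
INDEX = [(embed(ex["question"]), ex) for ex in EXAMPLES]

# Online: embed the user question, retrieve similar examples, build the prompt.
question = "Show me the top 10 applications with the highest response time"
prompt = build_prompt(question, top_k(question, INDEX, k=2))
print(prompt)
```

Covering a new scenario in this sketch only means appending a few entries to `EXAMPLES` and rebuilding `INDEX`, which mirrors the point made in the talk that new scenarios need examples rather than retraining.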

We turn to the road ahead. Natural language to query language is a fast-emerging topic in the BI field, in the database field, and elsewhere. Leading companies like Google, Datadog, Dynatrace, and New Relic have released their natural-language-to-query-language services. Google also provides a service that can translate natural language to PromQL, but they are only inviting users to test it out; it cannot be accessed by the public yet. And even Google, or ChatGPT, cannot guarantee a 100% accuracy rate.

So we have a very long way to go. The first milestone should be natural language to chart, a text-to-chart service. Once we can trust that the PromQL or SQL provided by AI is 100% correct, we can get the chart. The slogan at that point would be "Chat with me and get the chart you want." That is the first milestone. The second milestone is the chatbot for monitoring. It is just like an experienced engineer, and the slogan would be "Chat with me and get the alert and the fix suggestion you want." But we still need a human being to decide whether to apply the suggestion; I need someone to confirm these things. The third milestone is ChatOps. At that time the slogan would be "Chat with me and I will take care of your system and all of your applications." It is just like a human being, like a talented, experienced, and of course hardworking engineer, just like your colleague. At that point it is the real chatbot.

OK, thank you so much for your time and your attention. We have discussed a lot today, and I have only two messages that I really want to share with you. The first one is that chain of thought really works for natural language to PromQL. The second one is that Q&A examples with step-by-step PromQL generation are effective in
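our scenario. Thank you so much for your time, and enjoy your National Day holiday. If you don't have a holiday, just have a nice day. Thank you so much. You can ask me in English or Chinese.

Audience: My English is probably not great, and there were parts I couldn't follow clearly. I'd like to ask: what is the relationship between this and Tongyi Qianwen? Is it built on Tongyi Qianwen?

Speaker: Yes. We give the LLM some prompts, and in the end we call the Tongyi Qianwen API, which gives us the answer.

Audience: OK. And how was that 76% measured? That is the accuracy, right?

Speaker: Yes. We have 51 cases, and I score each one. If the answer is completely right, meaning it matches exactly what I asked and I can run it directly, I give it one point. If it is slightly off, for example it added an extra underscore somewhere, I deduct a little. Applying this scoring scheme across the 51 cases gives roughly this accuracy.

Audience: Could the test set be made larger?

Speaker: Yes, and we are planning exactly that. Ideally there would be a public, standard dataset, but very few people work on natural language to PromQL. As far as we know, only we at Alibaba Cloud and Google have done it. Others work on their own query languages, and there is a lot of work on SQL. So we do plan to enlarge it, but it will take some time. Right now it is admittedly a bit small.

Audience: I'd like to ask about Prometheus. There are many versions, and several may coexist within one company. If the query language differs between them, can Tongyi Qianwen recognize that automatically, or do we need to supply some extra information?

Speaker: There are two aspects. First, PromQL does not depend much on the Prometheus version. A new release may add a few operators, but the old ones, like topk, sum by, and sum_over_time, are classics that rarely change, so the issue you describe probably has little effect on our performance. Second, even if a version change really does affect some operators, that is fine too. As I mentioned, we use prompt engineering, so as long as you give it corresponding examples, it gets the point. If your prompt includes, say, a new function from a new Prometheus version, and I give it examples, it will return that to me.

Audience: So you assemble queries on top of the basic Prometheus functions, right?

Speaker: Right. The PromQL we write is all standard PromQL. Only the metric names are our own, the ARMS metrics, which we plug in.

Audience: Thank you. What I'm curious about is this: Prometheus keeps being upgraded. If some of the language becomes more concise, or it wraps things in another layer, can your system recognize that automatically, or does it need to be trained again?

Speaker: It probably cannot recognize that automatically. But from the operator perspective, updates are not that fast. Across version updates, the basic operators generally change little.

Audience: And since Prometheus collects metrics at the container level and the data level, is all of that supported?

Speaker: Yes. As long as you give me examples, I can write it.

Audience: OK, thank you.

Audience: I'm also running some experiments on expression generation. I saw that you used GPT-3.5 Turbo to generate PromQL. Have you tested the difference between GPT-3.5 and GPT-4 in generation efficiency or correctness? In my experience, GPT-4 is far more accurate than 3.5. Did you do any tuning for 3.5, or any special work on the prompts?

Speaker: Thank you very much. Let me repeat the question, since the rule is to repeat it first and I forgot earlier. You are asking about GPT-3.5 versus GPT-4 on natural-language-to-PromQL performance. First, we call Tongyi Qianwen, not GPT-3.5. Tongyi Qianwen, GPT-3.5, and GPT-4 are all very good models, but as we analyzed, however smart a model is, it cannot know the metric names inside our system, probably because there is little such data and it was never trained on it. That is one point. So I think no matter how well the model is tuned, we still need to feed in extra information about our own system, and intent recognition also needs our own information. That said, if the model itself is smarter, it generalizes better: I might give it one or two examples for containers, and it quickly gets the other cases. But since we are Alibaba Cloud, we can only use Tongyi Qianwen.

Audience: So you got fairly good results with Tongyi Qianwen?

Speaker: Yes, quite good. On our cases it is between seventy-something and 80 percent.

Audience: OK.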
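I have no more questions, thank you.

Audience: You mentioned a vector database earlier. Did you use any open-source projects, ones that combine a vector database with the LLM?

Speaker: When we first ran experiments, we did use some open-source vector databases to store our vectors. But later, once we move to the next stage and really commercialize, we will probably switch to Alibaba Cloud's own vector database, which was released not long ago.

Audience: Did you use something like ChatGLM to connect with the vector database?

Speaker: ChatGLM, yes. We started exploring early, when Tongyi Qianwen was not yet publicly available, and we did use it for a while. Later we switched to Tongyi Qianwen and found that it really does perform better, and it has more parameters, so we replaced everything. Now our bot runs on Tongyi Qianwen.

Audience: OK, thank you.

Audience: One question about embedding. During your experiments, did you make any trade-offs in choosing the embedding model?

Speaker: We did try a few things. I'm responsible for this LLM application together with another colleague on our team, and for embedding I simply picked one that works; we did not specifically tune it. Embedding can have a very large effect on how well you match knowledge. But because our prompts are written in chain-of-thought style, they are quite long, so the matching tends to find the most suitable examples anyway. We picked a usable model and reached 70% to 80%. This is also a direction for further tuning.

Audience: Thank you. One more question. In prompt engineering you have to prepare examples. Did you run into cases where the examples were not representative enough, or too few, and that affected the results?

Speaker: Yes. But in our case, as you saw, we cover only 14 scenarios, which is not many, and for each scenario three to five matched examples are enough. Chain of thought really works. In the original paper it solves primary-school word problems, like the cafeteria one, and there too three to five examples give very good results.

Audience: So it depends on the use case, right, given how specific PromQL is?

Speaker: Yes. The key for us is matching the operator the user wants and the metric name they want. Once you give it those two, the LLM's own ability is basically enough to give you a good answer. That is our experience.

Audience: Great, thank you.

Speaker: I think the most painful part is this slide: finding a suitable prompt. You can see the first turning point took us straight from ten-odd percent to seventy-odd percent. Prompt engineering sounds simple, just giving prompts, but when you actually do it, you find it is the most critical point of the whole project. The right form of prompt is the most critical and the hardest part, and we spent a lot of time on it.

Audience: So your work was on tuning the prompts. Did the Alibaba LLM team do any corresponding optimization for ARMS?

Speaker: No. It is the commercial one.

Audience: None at all?

Speaker: Right. It is the commercial Qwen-Turbo, the one anyone can call directly.

Audience: I have some experience I'd like to share, and I wonder whether you feel the same. In this kind of work, a very good model lets your prompt work achieve twice the result with half the effort, but if the underlying LLM is weak, you spend a great deal of energy on prompt engineering and the whole process is painful. What do you think? Thanks.

Speaker: We ran into this too. The first turning point, from ten-odd percent to 70 percent, comes as soon as you find CoT. But when we tried to go above 70 percent, we put in a lot of effort. I kept tuning my prompts, wondering why it stayed so low, and found that our performance was stuck at the LLM. So we let it be, and the project stalled for more than a month while I worked on other things. When I came back, I heard that Tongyi Qianwen had become publicly available on September 13. I switched over and it really helped: it gave me another 10 percent. So yes, that is indeed the case.

Audience: Thank you. So the project did no fine-tuning of Tongyi Qianwen at all?

Speaker: None.

Audience: It is purely prompt engineering, plus some RAG that calls your vector store?

Speaker: Yes, exactly.

Audience: Thank you.

Host: Let's welcome the next speaker. Thank you, everyone.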
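
The scoring scheme described in the Q&A (full credit for a query that matches the intent and runs as is, a small deduction for slips such as a stray underscore in a metric name, averaged over the test cases) could be sketched as follows. The talk does not give the exact deductions, so the three labels and the partial-credit value of 0.5 are assumptions for illustration, and the four sample labels are made up rather than taken from the 51 real cases.

```python
# Assumed credit per judgement; the real deductions were not stated in the talk.
CREDIT = {
    "correct": 1.0,     # matches the intent and runs without changes
    "minor_slip": 0.5,  # e.g. an extra underscore in a metric name
    "wrong": 0.0,
}


def accuracy_percent(judgements):
    """Average the per-case credit and express it as a percentage."""
    if not judgements:
        raise ValueError("at least one judged case is required")
    total = sum(CREDIT[label] for label in judgements)
    return 100.0 * total / len(judgements)


# Made-up labels for four cases, only to show how the average is formed.
sample = ["correct", "correct", "minor_slip", "wrong"]
print(accuracy_percent(sample))
```

Because partial credit is allowed, the resulting accuracy need not be a multiple of one case's share, which is consistent with a figure like 76.69% over 51 cases.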