Hello everyone, I'm Dream, and I'm very glad to join the Open Source Summit Europe. My topic today is how to accelerate approximate nearest neighbor search for large-scale data sets. This is a 101 session for our project, Milvus. By creating this new infrastructure software for AI, we believe we can help make AI technologies more accessible to everyone, so that everyone can utilize AI technologies to empower themselves.

First, let me introduce myself. I'm a partner technical evangelist at Zilliz. I'm also a voting member of the Technical Advisory Council at the LF AI Foundation. Before joining Zilliz, I worked for ICBC, IBM, Morgan Stanley, and Huawei. I was a database engineer for 14 years; in the last two years I turned to act as a database product manager, and I'm also the technology evangelist of the Milvus project.

Who is Zilliz? As you can tell from the name, we are engaged in data-related technology. We are a technology startup focused on developing data science software based on heterogeneous computing, and we drive our software business model through open source. Our vision is to reinvent data science, which means we aim to provide data-related technologies for new domains, new scenarios, and new requirements; we want to help people better discover the value contained in data. Our open source project Milvus joined the LF AI Foundation as an incubation project early this year, so Zilliz is now a major contributor to the Milvus project.

Okay, let's get into the topic. We generally divide data into three categories.
The first is structured data, including numbers, dates, strings, and so on. The second is semi-structured data, which mainly includes text information with a certain format, such as various system logs. The third is so-called unstructured data, like pictures, video, voice, and natural language text; these are not easy for a computer to understand. Relational databases and traditional big data technologies were built to solve the problems of structured data, and for semi-structured data people have text-based search engines. But for unstructured data, which accounts for 80% of the total data sphere, there has been a lack of effective and elastic methods in the past, until the rise of AI deep learning technology in recent years accelerated unstructured data processing. The strength of a deep learning model is that it can convert unstructured information, which the computer originally finds difficult to understand, into feature information: the embedding vectors. So the analysis of unstructured data is transforming into vector computation.

How do people usually use AI technologies to analyze unstructured data? As shown here, in a so-called flow-based AI application; this is a typical example. Assuming we are going to analyze a video, we can create some operation streams, usually called pipelines. The leftmost pipeline captures the video frames and then extracts features from the captured images; here, for example, we can use the VGG model, a model with excellent generalization capability, and at the end we get the image feature vectors. The middle pipeline handles sound, and eventually it generates audio vectors converted from the sound. Then the rightmost pipeline automatically labels some attributes for the video.
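To make the idea of embedding vectors concrete, here is a purely illustrative toy in plain Python. It is not a deep learning model, just a hashed character-trigram "embedding"; it only shows the contract: unstructured input in, fixed-length float vector out, with similar inputs landing closer together.

```python
import hashlib
import math

DIM = 64  # toy embedding dimension; a real model outputs e.g. 512 or 4096 floats

def toy_embed(text):
    """Map a string to a fixed-length vector by hashing character trigrams.
    NOT a neural model; it just illustrates what 'embedding' means."""
    vec = [0.0] * DIM
    for i in range(len(text) - 2):
        h = int(hashlib.md5(text[i:i + 3].encode()).hexdigest(), 16)
        vec[h % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]  # L2-normalize so dot product = cosine

def cosine(a, b):
    return sum(x * y for x, y in zip(a, b))

v1 = toy_embed("approximate nearest neighbor search")
v2 = toy_embed("approximate nearest neighbour search")  # near-duplicate text
v3 = toy_embed("quarterly financial report 2020")       # unrelated text
print(cosine(v1, v2) > cosine(v1, v3))  # similar inputs score higher
```

The point is only that once everything is a fixed-length float vector, "analyzing unstructured data" reduces to computing with vectors.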
If you have other special requirements, you can build a new pipeline to do the related processing. This is why flow-based AI applications are so popular: they are flexible, and the developer doesn't even have to write code. There are web-based interfaces to help users compose new processes, and if you have no idea how to start, you can even find some useful samples. But this approach also brings a new data challenge: the data becomes very fragmented. It was originally only one video, but through the operation of the pipelines it is gradually transformed into different data spread across different corners. So what shall we do?

Let's turn to a more traditional hierarchical view. This is an application that processes unstructured data. The top layer and the bottom layer both deal with unstructured data; AI technologies mainly work in the two middle layers: the green layer, which is called the inference layer, and the blue layer, which I call the data service layer. The task of model serving is to transform unstructured data into feature vectors with trained models, and serving these models efficiently is still not easy. The good news is that there are already some mature projects in the industry, such as NVIDIA's TensorRT, Intel's OpenVINO, Microsoft's ONNX Runtime, and Google's TensorFlow TFRT. But there is no comprehensive solution for the data service layer. Some people put vectors in a structured database; others might put the vectors in HDFS and then analyze them through Spark; you can also use some ANN libraries. In this area, everyone makes their own attempts, and the challenge is how to manage and analyze the vectors efficiently. Although a large number of pretrained models are now available, AI technology is still difficult to take to production, because at the data service layer the cost is too high. To address this challenge, our answer is to build up the unstructured data service powered by the Milvus project.
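To see why the data service layer is the hard part, here is what the naive do-it-yourself baseline looks like: an exact brute-force nearest neighbor scan in plain Python over made-up data. It works, but every query touches every stored vector, which is exactly what does not scale to large collections.

```python
import math
import random

random.seed(0)
DIM, N = 16, 1000
# Made-up dataset standing in for a collection of embedding vectors.
dataset = [[random.random() for _ in range(DIM)] for _ in range(N)]

def l2(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def brute_force_search(query, k=3):
    """Exact kNN: compare the query against every stored vector, O(N * DIM)."""
    ranked = sorted(range(N), key=lambda i: l2(query, dataset[i]))
    return ranked[:k]

query = dataset[42]
top = brute_force_search(query)
print(top[0])  # the nearest neighbor of a stored vector is itself: 42
```

ANN indexes exist precisely to avoid this full scan, trading a little recall for orders of magnitude less work per query.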
It contains four parts. The first part is embedding similarity search, to serve the vectors; it covers the high-dimensional vectors of deep learning scenarios and also supports the sparse vectors of traditional machine learning scenarios. The second part is the attributes, that is, all the scalar data, for example a label described by structured data like strings. By combining the attributes and the vectors, we can provide the capability of hybrid search, or collaborative search, which means you can apply attribute filtering when you do the vector search. The third part is support for multiple modalities. As in the previous example, a video has vectors of different dimensions: there are image vectors and there are audio vectors. In the real world, multimodal search is a common requirement, so we need to introduce the concept of an entity for unstructured data; an entity can contain multiple vectors of different dimensions. The fourth part is the scoring component: in some scenarios, like multimodal search, because we introduce different models, the fully connected layers of the different models need to be fused to form a new scoring mechanism for the analysis of unstructured data.

At present, Milvus has already built up the vector analysis capability, and we are constantly improving and enhancing it. Actually, by the time you see this video, the attribute filtering function should be available already. Multimodal search and scoring search are on the roadmap; in future releases we will give these functionalities a higher priority. So eventually, Milvus is not just positioned to be a high-performance vector search engine: we want to build comprehensive infrastructure software for unstructured data service based on Milvus.

Okay, so maybe you have been convinced that we need an unstructured data service. But why not build it with a relational database or big data technology? A vector also looks like a number.
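Here is a minimal sketch, in plain Python with made-up data (not Milvus's actual API), of what the hybrid search described above means: filter on a scalar attribute first, then rank the surviving entities by vector distance.

```python
import math

# Each "entity" pairs an embedding vector with scalar attributes.
entities = [
    {"id": 0, "label": "cat", "vec": [0.9, 0.1]},
    {"id": 1, "label": "dog", "vec": [0.8, 0.2]},
    {"id": 2, "label": "cat", "vec": [0.1, 0.9]},
    {"id": 3, "label": "dog", "vec": [0.2, 0.8]},
]

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def hybrid_search(query_vec, label, k=1):
    """Apply the attribute filter first, then rank by vector similarity."""
    candidates = [e for e in entities if e["label"] == label]
    candidates.sort(key=lambda e: l2(e["vec"], query_vec))
    return [e["id"] for e in candidates[:k]]

print(hybrid_search([1.0, 0.0], label="cat"))  # -> [0]
print(hybrid_search([0.0, 1.0], label="dog"))  # -> [3]
```

Doing the filter and the similarity ranking in one engine is the point: splitting them across a relational database and a separate vector library forces an expensive join between the two worlds.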
What's the difference between a vector and a number? To be precise, a vector consists of a set of numbers. The difference between vectors and numbers, I think, lies in two major aspects. First, the common operations are different. For numbers, addition, subtraction, multiplication, and division are the most common operations; for vectors, the most common operation is to calculate similarity. Here I'm giving the formula for computing the Euclidean distance, and you can see that this vector computation is much more expensive than a normal numeric calculation. Secondly, the index organization of the data is different. Two numbers can be compared with each other by value, so we can create a number index based on an algorithm like the B-tree, for example. But between two vectors we cannot perform such a comparison; we can only calculate the similarity between them. So a vector index is usually based on an approximate nearest neighbor (ANN) algorithm; here I show two approaches, a clustering index and a graph index. Because of this significant difference, traditional databases and big data technologies have difficulty meeting the requirements of vector analysis: the algorithms they support and the scenarios they target are all different. That's why we built the Milvus project to serve the unstructured data, the embedding vectors.

Here is the big picture of Milvus; I want to highlight four parts. The first major part is the support of heterogeneous computing. As I mentioned earlier, our company is developing data science software based on heterogeneous computing, so we have experience in this area. The team has
The team has Experts in the CUDA programming and the SIM programming So during Milvus design we sought about how to support different computing resources So that we could accelerate such computation in cancer scenario The heterogeneous computing resources supporting Milvus including for example the SSC instruction site for x86 many AVX 2 AVX 512 and also NVIDIA GPU ARM Processor, but it requires 64-bit processor and also we are working with our partner to port Milvus into risk 5 Processor, but in your very early stage The second part is the data management function We want to provide the unstructured data service. So the function of data management is critical Milvus supports data partition data sharding deletion of vectors and also stream injection And then the third is the adoption and improvement of the AI and algorithm libraries the capability of vector search is the fundamental function in unstructured data service. So Milvus can provide a good vector search performance that By adopting and improving the well-known AI and algorithm libraries like Flay, Sonoy the first part is support for application development environment to enable AI developers to build the application some Milvus we provide several application development environments like a Python, C++, Java, Algo, REST API, etc Also Milvus is a server more complicated than the algorithm library But people are very curious about the performance comparison since performance impacts the hardware cost and they want to set an expectation when they start to use Milvus So we have run through the AI and benchmark It is a set of well-known AI and benchmark tests Thanks to Martin, Eric, Alec for developing the benchmarks We focus on GitHub and run through the test on Milvus in several different public cloud environments like AWS, Azure, Adirin, and we also run the test on our local machines Here I only captured one chart of Milvus performance The benchmark tests generate a bunch of reports, so you can find the 
details on our website, Milvus.io. Since the original ANN-Benchmarks only use one CPU core during the test, we made some modifications so that Milvus could use all 16 CPU cores in this example. So here you can see that with one million vectors, Milvus can reach almost 2,000 QPS on a single machine, a single cloud machine with 16 CPU cores.

In order to boost the performance of ANN search, we have spent effort tuning the algorithms, finding the best parameters, and utilizing modern hardware. Here I will show one example of how we utilize the AVX-512 instruction set to accelerate ANN search. The paradox in ANN search is search time versus memory consumption. Usually, faster ANN algorithms, like graph-type indexes, consume more memory, which means they face challenges when we have a large data set to deal with. To shrink the memory footprint, we typically have to use certain kinds of compression and encoding techniques, but that introduces more computing workload, which means the search will be slower. In this example, we compare the IVF_FLAT index and IVF_SQ8; SQ stands for Scalar Quantization, which is a kind of compression. You can see that when using the AVX2 instruction set, the index with compression is slower than the index without compression, so you have to make a trade-off: do you want faster speed, or do you want a smaller memory footprint? Usually the IVF_SQ8 index only takes around one third of the memory consumption of the IVF_FLAT index.

Now we have added AVX-512 support, and you can see here that with the AVX-512 instruction set we get a very obvious performance improvement on both the index with compression and the one without. But especially for the index with compression, like IVF_SQ8, the AVX-512 improvement is much bigger than for the index without compression, because the index with compression incurs more CPU workload.
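To make scalar quantization concrete, here is a minimal toy sketch in plain Python of the idea behind IVF_SQ8; it illustrates the general technique, not Milvus's actual implementation. Each dimension's float value is mapped to an 8-bit code over the observed per-dimension range, shrinking 4-byte floats to 1 byte each, at the cost of a small reconstruction error and extra decode work.

```python
def sq8_train(vectors):
    """Learn per-dimension min/max over the data set."""
    dims = list(zip(*vectors))
    return [min(d) for d in dims], [max(d) for d in dims]

def sq8_encode(vec, lo, hi):
    """float32 (4 bytes per dim) -> uint8 code (1 byte per dim)."""
    return [round(255 * (v - l) / (h - l)) if h > l else 0
            for v, l, h in zip(vec, lo, hi)]

def sq8_decode(codes, lo, hi):
    """Approximate reconstruction used at search time (the extra CPU work)."""
    return [l + c / 255 * (h - l) for c, l, h in zip(codes, lo, hi)]

data = [[0.0, 10.0], [1.0, 20.0], [0.5, 15.0]]
lo, hi = sq8_train(data)
codes = sq8_encode(data[2], lo, hi)
approx = sq8_decode(codes, lo, hi)
print(codes)   # small integers, one byte each
print(approx)  # close to [0.5, 15.0], with bounded quantization error
```

The decode step inside every distance computation is exactly the added CPU workload mentioned above, which is why the compressed index benefits the most from wider SIMD instructions like AVX-512.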
So it is significantly improved, and now you can see that with a much smaller memory footprint, the IVF_SQ8 index also gets very good performance compared to the index without compression, especially if you want to do batch queries: if you send 100 query requests together, you get a very significant improvement in this scenario. So now Milvus can help users achieve faster search with lower memory consumption. In this test we are using Milvus 0.10.3, running on Ubuntu on a two-way Intel Xeon Platinum 8163 server. The data set is 100 million vectors extracted from SIFT 1B. Okay, so that's why heterogeneous computing is so powerful in AI applications.

Now let's take a look at how Milvus manages embedding data. The current release of Milvus we are working on is 0.11, and in this release we will change the way Milvus loads data into memory. Here is the behavior before 0.11. Once the size of a data file exceeds the threshold of the index_file_size parameter, Milvus triggers the index building process for this new file slice. The index file shown here is an IVF index; it consists of two parts. The first part is the collection of centroids; the number of centroids is defined by the index parameter nlist. Every centroid has two pointers, which point to the beginning and the end of its inverted list. The second part is the index entries, which form the inverted lists.
We increase the data locality here: vectors in the same inverted list are stored closely in this block, which improves the I/O performance, and Milvus loads the index file into memory with the same structure it has on disk.

Now, this is how it will work in 0.11 for an IVF index without compression, for example the IVF_FLAT index. Since we will add attribute support in 0.11, for each embedding there can now be multiple files: the file for the vector data, and the files for the attributes. Each attribute has its own data file, and a data file holds only one attribute; we can define up to 64 attributes for an embedding. These files are aligned through the offset. Once the size of the data file exceeds the index_file_size threshold, Milvus triggers the index building process for this new file. An index file still consists of two parts: first the centroid information, and second the offsets of the vectors, instead of the vector data as before; you can see here that we only store the offsets. When Milvus loads the data file into memory now, the in-memory data structure is split into an index part and a data part. The index part contains the centroids from the index file; the data part holds the actual vector information. You can see here that Milvus reorders the data in memory to improve the I/O performance: originally the sequence of vectors in the data file is the insertion order, like one, two, three, four, but when they are loaded into memory, Milvus reorders the vector sequence based on the offsets, so it has better data locality in memory. Since the original vector data are now cached in memory, when we invoke the interface to retrieve the original vector data, it will be much faster than before, because before, we didn't cache the vector data in memory.
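The 0.11 layout described above can be mimicked in a few lines of plain Python (toy data and hand-picked "centroids"; real Milvus trains centroids with k-means and the list count comes from nlist): the index keeps centroids plus per-list vector offsets, and at load time the vectors are reordered so each inverted list is contiguous in memory.

```python
import math

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Data file: vectors in insertion order (offsets 0..5).
data_file = [[0.1, 0.1], [0.9, 0.9], [0.2, 0.0],
             [1.0, 0.8], [0.0, 0.2], [0.8, 1.0]]
centroids = [[0.1, 0.1], [0.9, 0.9]]  # toy stand-ins for trained centroids

# "Index file": for each centroid, the OFFSETS of its vectors, not the vectors.
inverted = [[], []]
for off, v in enumerate(data_file):
    nearest = min(range(len(centroids)), key=lambda c: l2(v, centroids[c]))
    inverted[nearest].append(off)

# Load time: reorder vectors so each inverted list is contiguous (locality).
order = [off for lst in inverted for off in lst]
memory_data = [data_file[off] for off in order]

def search(q):
    """Probe the nearest centroid, then scan only its contiguous block."""
    c = min(range(len(centroids)), key=lambda i: l2(q, centroids[i]))
    start = sum(len(lst) for lst in inverted[:c])
    block = memory_data[start:start + len(inverted[c])]
    best = min(range(len(block)), key=lambda i: l2(q, block[i]))
    return inverted[c][best]  # map back to the original offset

print(inverted)           # offsets grouped by centroid
print(search([0.05, 0.15]))
```

Because the index stores offsets rather than vector copies, the original vectors live once in the reordered data part, and attribute files aligned by the same offsets can be joined to them for free.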
So, previously, every time you wanted to get the original value of the vector data, you had to go through the file on the disk; in 0.11 it will be pretty fast. Also, the attribute files are loaded into memory the same way as the vector data files.

And this is how it will work in 0.11 for an IVF index with compression, for example the IVF_SQ index and the IVF_PQ index. The difference is that for a compressed index, we store the encoded vector data into the index file, and in the runtime memory data structure we won't cache the original vector data in memory. So the encoded part is loaded into memory as the index part, and the data part is loaded into memory and then reordered. You can see here the difference in the in-memory data structure: for an IVF index with compression, Milvus will not cache the original vector values. So if you want to get the original vector data, it still has to go through the disk file, and it will be slower than the IVF index without compression in this case. You have to decide which one is more suitable for your scenario.

Okay, so this is our project journey; this is how Milvus was born. The initial idea for this project came in October 2018. At that time we were involved in a project where we needed to deliver a vector search function. We tried to do it in a structured database, but it didn't fit well; that's why we started to think about this challenge seriously. In April 2019 we released Milvus 0.1 and tested it with our first seed users, and their feedback improved Milvus a lot. Milvus 0.5 was the release when we were ready to open source it. Now Milvus is an incubation project in the LF AI Foundation, and today Milvus is the most active project in the LF AI Foundation from the software development perspective. Here is the current status of the Milvus project: we have made nearly six thousand commits and 16 releases, and we have hundreds of community users, some of them
have already put Milvus in production. Milvus is a young project: it has been open source for almost one year. So why did people start to build their AI applications upon Milvus? I think the most attractive benefits of Milvus are that it's easy to use and it's fast, which means lower hardware cost, so a developer can build a minimum viable product at a pretty low cost with Milvus.

Now I will show some real-world use cases. The first one is an intelligent writing assistant. This application is supposed to help people compose certain kinds of essays, like a year-end work summary, a cover letter, or a reference letter. The software vendor first collected a bunch of corpus data; after cleansing this data, they processed it to extract paragraphs and summaries, and then further encoded the corpus with the InferSent model. At the end, we get the embedding vectors of the natural-language paragraphs, and they are stored in Milvus. When a user submits a writing request, it goes through the InferSent encoder, and a vector search is performed in Milvus. The search results are further transformed into a draft and sent back to the user, so they can make the necessary modifications and get an essay pretty quickly. This is a very useful small tool; I tried it for my year-end work summary.

The second example is a big data technology company. They have collected a lot of corporate credit data, including 55 million trademark images, and they want to provide their members with the ability to search for a company through a trademark image. So they built the image search function upon a fine-tuned VGG model and Milvus. Since nobody knows how many people will become new members just for this new AI function, the development and hosting cost is very sensitive, and they're very happy with the performance Milvus provided.
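The writing assistant's flow (encode the corpus offline, then encode each request with the same model and retrieve the nearest paragraphs) can be sketched like this; the `embed` function below is a made-up stand-in for the InferSent encoder, and a linear scan stands in for the Milvus search.

```python
import math

def embed(text):
    """Hypothetical stand-in for the InferSent encoder: a bag-of-words
    count over a tiny fixed vocabulary, L2-normalized."""
    vocab = ["summary", "work", "cover", "letter", "year", "goal"]
    vec = [float(text.lower().split().count(w)) for w in vocab]
    n = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / n for v in vec]

# Offline: encode the cleaned corpus paragraphs and store the vectors.
corpus = [
    "This year my work focused on the database engine",
    "I am writing this cover letter to apply for the role",
    "My goal for next year is to improve query latency",
]
index = [(p, embed(p)) for p in corpus]  # stand-in for a Milvus collection

# Online: encode the user's request with the SAME model, then search.
def assist(request, k=1):
    q = embed(request)
    ranked = sorted(index,
                    key=lambda pv: -sum(a * b for a, b in zip(q, pv[1])))
    return [p for p, _ in ranked[:k]]  # nearest paragraphs seed the draft

print(assist("help me write a cover letter"))
```

The key design point is that the request must pass through the same encoder as the corpus; vectors from different models live in different spaces and cannot be meaningfully compared.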
You can see here that with 55 million images, served on a cloud GPU server, the average query response time is about 20 milliseconds.

In the previous two examples, AI technology is not creating the core value of the whole application; it is more like a value-add. But in this example, efficient vector search is creating the core value. This is a pharmaceutical user. They first translate the molecule expression into a 1,024-bit string, a binary string of only ones and zeros, and store it in a Milvus server. Then the user can perform molecule similarity analysis, including Tanimoto similarity, to compare two molecules and see whether they are similar; substructure search, where they provide just one piece of a molecular formula to see which molecules contain this substructure; and superstructure search, which works the other way around. Previously, with a Spark cluster, this kind of analysis took around 14 seconds for 30 million molecules. With Milvus, we can now analyze over a hundred million molecules within one second, actually 500 milliseconds, on a single server; this is almost 1,000 times faster than before.

Okay, now we come to the end of my presentation. Here are the useful links for the Milvus project. If you want to explore the possibility of introducing AI technology into your applications, please think about Milvus; it will be very helpful. You can find our documentation and the benchmark reports on our project website, Milvus.io, and this is our GitHub repo. You can also follow us on Twitter, and we have a Medium publication where we post our technology blogs. We have also enabled the Discussions function on GitHub, so if you have any question you want to discuss, you can post it on GitHub Discussions. We welcome people to join the Milvus community. Thank you for listening to this session.
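As a closing illustration of the molecule use case above, here is Tanimoto similarity over toy fingerprints in plain Python; the 1,024-bit fingerprints are randomly generated stand-ins, since real ones come from a cheminformatics toolkit. Tanimoto similarity over bit vectors is the count of shared set bits divided by the count of bits set in either, and a common substructure screening step checks that every bit of the fragment's fingerprint is also set in the molecule's.

```python
import random

random.seed(1)
NBITS = 1024

def random_fp(density=0.05):
    """Toy stand-in for a 1,024-bit molecular fingerprint (set of set-bit positions)."""
    return {i for i in range(NBITS) if random.random() < density}

def tanimoto(a, b):
    """Tanimoto similarity = |a & b| / |a | b| over the set bits."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0

mol = random_fp()
similar = set(mol)      # identical fingerprint
distinct = random_fp()  # unrelated fingerprint

print(tanimoto(mol, similar))         # identical fingerprints -> 1.0
print(tanimoto(mol, distinct) < 0.5)  # unrelated -> low similarity

# Substructure screening: a fragment's bits must all appear in the molecule.
fragment = set(list(mol)[:10])  # pretend these bits encode a fragment
print(fragment <= mol)          # True: candidate passes the screen
```

Because both operations reduce to cheap bitwise arithmetic over fixed-width strings, they are exactly the kind of workload a vector search engine can batch and parallelize across millions of molecules.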