So my name is Joan Fontanals, and here is my colleague Johannes, and we are here to present how you can use Jina to bring your neural search and multimodal machine learning applications to the cloud, and how we can help you with that. First, let me share a few words about Jina and who we are. We are an open source company that was founded in 2020, in the middle of the pandemic, and that influenced the way we work a lot: we are distributed around the globe. We have more than 50 team members right now, and we have four main offices: two in China, one in Berlin, and a new one in the US. We have raised 38 million dollars in funding, and we are considered a top-tier AI company by some publications. What we do is develop an ecosystem of open source projects and products around this neural search and multimodal world. Don't worry, we're not going to look at all of these; we're going to focus on DocArray and Jina, which together form our MLOps framework for developing these applications. We have been talking about neural search and multimodal ML, and first I would like to make sure we all understand what neural search is. Put simply, neural search is deep-learning-powered information retrieval. In contrast to traditional search, where searching is done by comparing keywords and tokens, neural search is about transforming documents, which can be text or anything else, into an embedding vector that we expect to carry semantic content, and then doing nearest-neighbor search on those vectors. The expectation is that documents that are relevant to each other end up close together in this embedding space.
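The idea just described can be sketched in a few lines of plain Python. This is a toy example with hypothetical two-dimensional embeddings; in a real system the vectors would come from a neural model:

```python
import math

def cosine_similarity(a, b):
    # Similarity of two embedding vectors: dot product over the product of norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def nearest(query_embedding, corpus):
    # Return the (text, embedding) pair whose embedding is closest to the query.
    return max(corpus, key=lambda item: cosine_similarity(query_embedding, item[1]))

# Toy "embeddings": in practice these come from a deep model.
corpus = [("a photo of a cat", [0.9, 0.1]),
          ("a picture of a kitten", [0.8, 0.2]),
          ("quarterly sales report", [0.1, 0.9])]

print(nearest([0.85, 0.15], corpus)[0])  # the semantically closest text
```

Texts that are "relevant to each other" get nearby vectors, so nearest-neighbor search over embeddings is all the retrieval machinery you need.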
This is the simplest way to explain it. With this you can find sentences in a book, for instance; and since we are transforming everything into vectors, we can do the same with images, and you can build e-commerce shop applications on this principle. You can go beyond that, to audio, video, or whatever, but the quality of the results you get is limited, or provided, by the capacity of these models to capture the world, and that comes from the community and from research. The capacity to learn these semantic concepts is increasing right now because the community is working with multimodal data: the capacity to understand the world increases if we use, for instance, the captions that come with images, the audio linked to videos, and so on. At Jina we believe that multimodal is the future of machine learning, so instead of staying within one modality, one type of data, we are now trying to build understandable relationships between these types of data. And it's not only us who believe that: all the big players in the tech industry and the research labs have been developing models and applications in this direction in recent years.
Okay, so when we look at this world of building relationships within data, we can see two patterns emerging. On one hand we have neural search, where data is found with other data: you can do text-to-image search, audio-to-video search, whatever you like, as long as you have a model that can extract this information. On the other hand, in recent months and years we have seen a rise of creative AI, for instance producing a painting from a textual prompt. At Jina we have DALL·E Flow, a product built on our tech stack that enables exactly this: given a prompt, you can create images such as these. You can play with it, it's a lot of fun, and you can get really good results. So we have more or less defined the context and field we work in. But why are we here? Jina is here to help you solve some problems, so now I want to describe the problems we have seen at Jina and how we try to solve them with our tech stack.
First let's look at a typical workflow for a machine learning engineer developing an application. Maybe it's an oversimplification, but we can agree on these steps. You train the model and create a PoC; it works locally; you share it with your colleagues; it won't scale or anything, but you prove that you can provide value to some users. Then you need to wrap this model in an API and make it accessible from the outside, you have to do data validation, and you need to containerize it and deploy it to the cloud. All of this is a long journey, and it gets even harder if you are working with multimodal data, because you have to deal with much more. The challenges pile up: the tech stack blows up, you have to deal with images and with text, with different libraries, different applications, different things to do; you have to turn it into an API, and these pieces might collide and cause problems with each other. This is something we will see how to ease with Jina. Then, bringing all these applications to production has a lot of problems of its own; it's already a nightmare by itself, so all combined it's really problematic. You have to provision the hardware, serve the model, enable monitoring to understand how the application is used, handle the networking, scale it, and make it all robust. Since this journey is so long, it involves many people with different skills, and what we want at Jina is to empower machine learning engineers to focus on what they are good at: their business logic and their machine learning knowledge. Now my colleague Johannes is going to show you how we try to ease and solve these problems using Jina.

[Johannes] Okay, it's working now. Thank you, Joan. Now that we know what the problem is and what we're trying to solve, let's take a look at how we solve it using Jina AI and the tech stack that we have developed. Specifically, we will be looking at three parts of our ecosystem. The first part is DocArray, an open source Python library which will help you unify your tech stack and handle data of all kinds of different modalities, such as images, video, audio, text, whatever you want; you can handle all of this with DocArray. The next piece of the puzzle is Jina, our MLOps framework for multimodal data and neural search. Once you have developed your proof of concept, this will help you lift it into the cloud, make it production ready, and serve it to your users. The last step, which we will touch upon only briefly today, is JCloud. This is our cloud service which you can use to actually run your Jina app: you just give us your Jina app, we run it for you and expose it to the outside world. With this in place, the new workflow of a machine learning engineer is quite a bit simpler. You first create a model and develop your proof of concept; you will still use your PyTorch, your TensorFlow, or whatever you want, but to handle your data you will use DocArray. That's the first step. In the next step you do a simple refactoring of your code using Jina, and you get microservices, scaling, and deployability all for free. And lastly you can deploy it, as I said, with just one line on JCloud. As you can already see, it's a much simpler process to get from proof of concept to deployment. So the first thing we'll take a look at is DocArray, our data structure for unstructured data of all kinds of varieties. It's fully open source, you can check out the GitHub. What it does is get you from your starting point, where you just have an idea and want to play around with things, through the data science phase, let's call it, where you maybe train your model or you have a
pre-trained model, and it gets you from your idea to your proof of concept. One important thing we do to make this very easy when you deal with multimodal data is provide a unified interface for all kinds of data: no matter the data modality you have, you only have to worry about one interface, and this interface is DocArray. One API we have is the dataclass API, and as you can see here, you can very easily create a dataclass where you define the structure of your data. In your domain, in your business problem, your data has a particular structure: you have images that have some relation to text, some relation to audio, and so on. You can very easily model this here, and once you're done, you can convert your dataclass to a document. Another thing you can do, as Joan mentioned in the beginning, is very easily build a search solution, a search proof of concept, and here we can see how this can be done in just 18 lines of code; at the end we'll also showcase some other features with this code snippet. Let's step through the snippet to understand how simple it can be to build a search application with DocArray. The first thing we need to do is load our data. In this case it's just text, some corpus that we get from the web, and as you can see, we can simply pass a URI to a Document and then call the load_uri_to_text helper method; DocArray will go to this web address, grab the text, and load it into the text attribute of the document. Next we can split our data apart, sentencize it, so that every sentence becomes one discrete part in our data handling, and then we wrap all of this up in a DocumentArray. So first we had a single Document which held all the text; now we have a DocumentArray, an array of Documents, and each Document holds one sentence of our text.
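The whole snippet being walked through here looks roughly like this, based on DocArray's documented search example. The Gutenberg URL is illustrative, and feature hashing is a cheap stand-in for a neural embedding model:

```python
from docarray import Document, DocumentArray

# Load the full text of a book from the web into a single Document.
d = Document(uri='https://www.gutenberg.org/files/1342/1342-0.txt').load_uri_to_text()

# Split it apart and wrap each non-empty line/sentence in its own Document.
da = DocumentArray(Document(text=s.strip()) for s in d.text.split('\n') if s.strip())

# Embed every sentence; feature hashing stands in for a real neural model here.
da.apply(lambda doc: doc.embed_feature_hashing())

# Embed the query the same way and match it against the whole corpus.
q = (Document(text='she smiled too much')
     .embed_feature_hashing()
     .match(da, limit=5, exclude_self=True, metric='jaccard', use_scipy=True))

# Print the best-matching sentences and their similarity scores.
print(q.matches[:, ('text', 'scores__jaccard__value')])
```

The interesting part is that loading, sentencizing, embedding, and matching each take one line, regardless of which modality the documents hold.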
Next we can do the embedding, which is the process Joan explained in the beginning: we take each sentence, pass it through a model, and create a vector representation of that sentence. These vectors can then be compared to do the actual search. Next we create a query, because we want to search for something. What do we want to search for? We can, for example, search for the phrase "she smiled too much"; in the end we will search through our entire text corpus for the sentence or sentences that are semantically similar to it. And here is where the actual searching happens: DocArray provides this method .match, where you take your query, take the corpus that you sentencized in the manner shown before, and then do the searching in just one line. Some other features that DocArray ships with out of the box: storing to disk, where we have multiple different options. We have deep integrations with different vector database backends, so once you have your embedding vector representations, you can store them on disk and do very efficient approximate nearest neighbor search between those vectors on disk, and you have a range of options to choose from: Redis, Elasticsearch, SQLite, Weaviate, Qdrant, and also our in-house solution called ANNLite. Another very important aspect of DocArray is that it's ready for the wire; it's made for data in transit. This will obviously become very important later, when we talk about microservices, because we need to send all of this data from one machine to another over the network. So we also offer a range of serialization options, like protobuf, JSON, pandas DataFrame, plain binary, Base64, pydantic models, and so on, so that you can very easily send your data around and don't need to worry about that either. If we run all of this, this is what we get: our query was "she smiled too much", and we can see the output, the
results that we get: "but she smiled too much", "little she might have fancied", and so on, and the numbers at the end are the similarity scores. Of course you can customize and personalize every aspect of this, use different similarity metrics, use your own machine learning models to perform the search, but this is the basic workflow you would use with DocArray to create such a search. We have now seen how you can use DocArray to build, locally on your laptop, a very simple but impressive proof of concept, a data-science kind of solution. The next step, and this will be the bulk of this talk, is about Jina: how do you take what we have just created and lift it to the cloud? As I said, Jina is our MLOps framework for this kind of thing. It's also fully open source; you can check this one out as well. And this is what it does: it takes you from your laptop to the cloud. We've seen the first step, going from your idea to a proof of concept; now you go vertical, through the scaling, and we basically incorporate all of the technologies you see here, the ones you would usually have to worry about; we do it all for you. And when I say production ready, what do I actually mean, what will it give you? It gives you a bunch of stuff. It gives you replicas, sharding, and scalability: if you have a certain component and you want to make it robust, then if it goes down, replicas can kick in and keep serving the requests, or you can scale it up to handle more requests at the same time. We have streaming, we have async non-blocking execution, and we have a range of different protocols you can use to communicate your data: Jina will spawn a server, and you can connect to that server using gRPC, WebSocket, HTTP, GraphQL, whatever you want. Under the hood we basically launch an entire microservice architecture for you, containerized in Docker. We have observability support baked in, so you can use Prometheus and Grafana to see what's going on. We have a hub plugin ecosystem, so you can take prebuilt building blocks that the community has built and use them in your application. And lastly we have seamless integration with Kubernetes, so if you really want to be serious about your deployment, and you want the scalability and all the automatic stuff that comes with Kubernetes, you can very easily do that with Jina. To make it this simple and take all this hard work out of your hands, we have to deal with a number of different layers of abstraction; we even have a very fancy animation here, which is super cool. These are the different layers of abstraction that you usually have to deal with when you build such an application, but with Jina you really only have to worry about, and interact with, three concepts: Flow, Executor, and Document. Let me explain what these are. The Document is inherited from the DocArray package; it is the basic data structure, and as we will see, everything in Jina is based on it. It is a sort of contract that needs to be followed, which makes everything else much easier and will ultimately make your life easier as a user of this framework. The next thing is the Executor. An Executor is a group of functions, one computation that you define; you can define it however you want, but you can think of it as one unit of operation, one computation, and, very importantly, it takes Documents as input and returns Documents at the end. Later we will see that one Executor becomes one microservice in your microservice architecture. And lastly we have the Flow. The Flow simply takes Executors and ties them together into a sort of pipeline: you define how your Executors should be connected, what information goes into which Executor and then into which other Executor next, and you create this entire Flow that combines them. This can basically be any directed
acyclic graph; you're really free to choose how your information should flow through this Flow. With these concepts in mind, we can now look at another basic example that shows how, with Jina, you can bring the example you've seen before, this easy text search, to the cloud with a simple refactoring. This is all the code you need to create an embedding service for text, fully in the cloud, as a microservice. The first thing we do is create our first Executor. This is done here: we take the Executor base class that the Jina package provides and inherit from it, and we create this Sentencizer. The Sentencizer does the first step we have seen before: it takes all the text and splits it into the different sentences that we can then search through. It has one function which, as you can see, takes docs, a DocumentArray, and we operate on it; we do the same operations we did before, splitting the text into different sentences, and then we return a DocumentArray again. So this is the first microservice, the first Executor we built, our first unit of computation. Then we can build another unit of computation, our second Executor: this will be our encoder, which takes the sentences from the first Executor and encodes them, creating the vector embeddings for each sentence that can then be used to search for semantically similar sentences in our database. Then we have the Flow, which, as we said, ties the Executors together. We instantiate the Flow object, and here it is very simple, just a chain: we first add the Tokenizer, oh, this should be Sentencizer, small typo, and then we add our encoder, so we just take one and then the other, and the data will flow through them. But as I said before, you can have all kinds of branchings and parallel branches and then merge things together; this is really up to you to choose, whatever is necessary in your domain for your app.
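The refactoring being described looks roughly like this. A minimal sketch: the executor names and the splitting and encoding logic are illustrative, and a real encoder would call an actual model rather than feature hashing:

```python
from docarray import Document, DocumentArray
from jina import Executor, Flow, requests


class Sentencizer(Executor):
    @requests
    def sentencize(self, docs: DocumentArray, **kwargs) -> DocumentArray:
        # Split every incoming document's text into one Document per sentence.
        return DocumentArray(
            Document(text=s.strip())
            for doc in docs
            for s in doc.text.split('.')
            if s.strip()
        )


class Encoder(Executor):
    @requests
    def encode(self, docs: DocumentArray, **kwargs):
        # Create a vector embedding for each sentence in place;
        # feature hashing stands in for a real neural model here.
        for doc in docs:
            doc.embed_feature_hashing()


# The Flow ties the Executors together into a pipeline: Sentencizer -> Encoder.
f = Flow().add(uses=Sentencizer).add(uses=Encoder)
```

Each Executor is plain Python operating on DocumentArrays; the Flow is the only place where the topology is declared.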
Then we can again define a query and send it to this Flow, and now this is different from before. Before, this was just Python code running; now, when we launch the Flow with this "with" statement, it creates a whole microservice architecture, a server that you can connect to with one of the protocols I've mentioned: gRPC, HTTP, and so on. Then we can post a message to the server, and it will do the computation we've defined and return the results. We can look at this in action: we launch our script, and as you can see, it takes a little bit to spin up the server; now it's ready, we send the data, and lastly we get back the results with the complete embeddings. Look at it again: all of these Executors spinning up, the server spinning up, you can connect to it, send data, get the results, and all it took to achieve this was the simple refactoring you've just seen. But now let's take a look at what actually happens under the hood when you execute these lines of code. What you see here is a graphical depiction of the architecture that we spawn for you when you do this kind of thing. We have two different deployments, as we call them, one for each Executor. At the top you can see that we have three replicas of a certain Executor: for each Executor you can decide how many replicas you want, so you have robustness and reliability; if one of them goes down, another one can take over and keep serving, or if you have an increase in incoming requests, you can add more replicas and make everything more reliable and robust. The other thing is this gateway in front. The gateway handles the connections between the Executors: it takes in requests and then decides where to send them, so the logic you've defined in the Flow will be handled by this gateway.
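Serving and querying the Flow, including the replication just shown in the diagram, can be sketched like this. The port, the replica count, and the trivial encoder are illustrative:

```python
from docarray import Document, DocumentArray
from jina import Client, Executor, Flow, requests


class Encoder(Executor):
    @requests
    def encode(self, docs: DocumentArray, **kwargs):
        for doc in docs:
            doc.embed_feature_hashing()  # stand-in for a real model


# replicas=3 gives the encoder three copies behind the gateway,
# for robustness and throughput, as in the architecture diagram.
f = Flow(protocol='grpc', port=12345).add(uses=Encoder, replicas=3)

with f:  # entering the context spins up the gateway and the executor replicas
    c = Client(port=12345)
    results = c.post('/', DocumentArray([Document(text='she smiled too much')]))
```

The same Python code, wrapped in the "with" block, is now a running microservice architecture that any gRPC/HTTP/WebSocket client can talk to.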
This gateway coordinates between the Executors, and the first thing I would like to look at in more detail is the gateway and how it does this connecting of the Executors. The gateway can be thought of as split into two halves. One half is the one that you as a user connect to, with your clients and your data; the other half is the Flow side, the side that communicates with the microservices, with the Executors. On the user side you have a set of different options to connect to the gateway: either gRPC, or HTTP powered by FastAPI, GraphQL powered by Strawberry, or a WebSocket, also powered by FastAPI. On this side we don't reinvent the wheel; we just take these great open source projects and use them in our gateway. On the other side, however, the one we will look at more deeply, we have our own logic that handles the communication between the Executors. We have a streamer, a topology graph, and a connection pool; these are the three concepts that, under the hood, do a lot of the networking for us. If we think about what needs to happen whenever a request comes into the system, it's basically three things. First is the topology graph: as we said, in the Flow a user can define any directed acyclic graph that describes how the information flows through the architecture, so the topology graph in the gateway is there to dictate the routing between the different Executors, between the different nodes in this graph. But it operates only on the logical level, so to speak: it really only has a graph representation, and it does the routing between the nodes of that graph. The next thing is the connection pool, and the connection pool maps every logical node in this graph to an actual physical networking address, so that we can really send things around. Lastly, the streamer is the thing that does the actual sending over the network, based on the information it gets from the topology graph and the connection pool. To dig a little deeper into how this works, let's look at the topology graph. Here we have a very simple example of a simple Flow, a graph of just three nodes connected one after the other: if you create a Flow and add an Executor, then another, then another, without any special requirements, you will just have three nodes on a string, essentially. As we said, each node, each Executor, represents one unit of computation, and in Python we have a very nice way of modeling this, which is through asyncio tasks. What happens is that each node in this graph is associated with one task, and the way we route between these nodes is that we recursively traverse the graph structure: we ask a node "what is your task?", and internally it will ask the next node in the graph "what is your task?", and so forth, until we reach the end of our graph representation. So we have this recursive chain of tasks that sort of contain each other: the calls go one way, and the task chain goes the other way. The last task in this graph, the end, the leaf, is what we actually execute, what we await in the streamer; this is the computation that will be triggered, and internally it holds all the other computations, all the other tasks, which will then be recursively called in turn. This is how the topology graph figures out the routing. Okay, so now we know how the information is supposed to logically flow through the graph, but the next question is: how do we actually find these units of computation, these microservices, in our network? This is what the connection pool does: it maps from a logical node to a networking address. But it's not so simple, because as we have seen, we can have different replicas for each Executor, so every logical node is actually associated with multiple Executors in hardware, multiple networking addresses.
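The recursive task chain just described can be modeled in plain asyncio. This is a simplified sketch of the idea, not Jina's actual implementation: each node's task internally awaits its predecessor's task, so awaiting only the leaf task drives the whole chain:

```python
import asyncio


class Node:
    """One executor node in a linear flow; runs after its predecessor finishes."""

    def __init__(self, name, fn, predecessor=None):
        self.name = name
        self.fn = fn
        self.predecessor = predecessor

    def get_task(self, request):
        # Recursively build the task chain: my task internally awaits my
        # predecessor's task, mirroring the topology-graph traversal.
        async def run():
            data = request
            if self.predecessor is not None:
                data = await self.predecessor.get_task(request)
            return self.fn(data)
        return asyncio.ensure_future(run())


async def main():
    # A chain of three nodes, like the three-executor flow on the slide.
    a = Node('sentencize', lambda x: x + ['sentencized'])
    b = Node('encode', lambda x: x + ['encoded'], predecessor=a)
    c = Node('index', lambda x: x + ['indexed'], predecessor=b)
    # Awaiting only the leaf task triggers the whole recursive chain.
    return await c.get_task([])


print(asyncio.run(main()))  # ['sentencized', 'encoded', 'indexed']
```

The calls go one way (leaf asks its predecessor for a task) and the data flows back the other way, exactly as described above.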
So we need to do load balancing between those replicas: the connection pool needs to decide which address to actually choose from the different options, the different replicas. There is also infrastructure that needs to be handled for gRPC, the networking protocol, to work, stubs and channels, and we need the ability to add and remove connections dynamically, on the fly. All of this is done by the connection pool. Lastly, the request streamer takes the information it gets from the other two components and does the actual sending over the wire, and here we can see how this works. As we've said, the last task is the one that we actually await, or execute, first; this would be task four here. So when a request comes in, we send it to the last Executor, the one associated with the last task, then send and receive through the streamer, and then recursively we go through the entire chain, so we can execute all of the units of computation along our graph as we figured it out. Here we can also see why this needs to be a very asynchronous system. Why does it need to be asynchronous, with these asyncio tasks? We can see it here: if you have a document in transit from the request streamer to one Executor, and maybe another document also in transit, and then a new request, a new document, comes in, the request streamer should obviously not be blocked; it should be able to handle the new incoming request in an asynchronous manner. This is where asyncio is basically perfect: we don't need to spawn new processes or threads, that is entirely unnecessary, because the latency actually comes from the network, from IO. This is why we have everything as asyncio tasks. Okay, so this is how the gateway works; that's how it's done under the hood. The next thing we can look at is the Executor side and how things are done there.
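The load-balancing role of the connection pool described above can be sketched like this. It's a toy model, not the real implementation: one logical node maps to several replica addresses, the pool rotates through them round-robin, and connections can be added and removed on the fly:

```python
from itertools import cycle


class ConnectionPool:
    """Maps a logical executor node to its replica addresses and round-robins."""

    def __init__(self):
        self._addresses = {}   # logical node -> list of replica addresses
        self._iterators = {}   # logical node -> round-robin iterator

    def add_connection(self, node, address):
        self._addresses.setdefault(node, []).append(address)
        self._iterators[node] = cycle(self._addresses[node])

    def remove_connection(self, node, address):
        self._addresses[node].remove(address)
        self._iterators[node] = cycle(self._addresses[node])

    def next_address(self, node):
        # Pick the next replica for this logical node (simple round-robin).
        return next(self._iterators[node])


pool = ConnectionPool()
pool.add_connection('encoder', 'encoder-0:8081')
pool.add_connection('encoder', 'encoder-1:8081')
pool.add_connection('encoder', 'encoder-2:8081')
print([pool.next_address('encoder') for _ in range(4)])
# ['encoder-0:8081', 'encoder-1:8081', 'encoder-2:8081', 'encoder-0:8081']
```

The real pool additionally manages gRPC stubs and channels for each address, but the mapping-plus-balancing responsibility is the same.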
We can zoom into this deployment and see what it does. As we've said, each deployment can have multiple replicas of the same Executor, and what a deployment actually is depends on where you run your system. We highly advise people to run it on Kubernetes if they are serious about production; in that case our deployment will just be a Kubernetes Deployment, and Kubernetes will take over the management of the lifecycle of these replicas and do all of that for us. But if you run your Jina Flow for prototyping on your laptop, then it will be a Jina deployment, and it will just do very simple load balancing between the different replicas, nothing fancy. Then we can zoom in one last time to understand how a single Executor actually works as a microservice, because as we've seen, as a user you really only write Python code, but for this to work as a microservice it can't just be Python code; it needs networking and so on. How does this work? Essentially we build a runtime around these Executors that handles a bunch of the IO for us. When a request comes in, it's just a network request that comes over the wire; it goes into the runtime, and the job of the runtime is to unpack this request and extract a DocumentArray, because, as we've said, DocumentArray is our fundamental data structure that everything runs on. Then we pass this DocumentArray into the Executor, which does some computation that the user defined, whatever it may be; the only important thing is that it returns a DocumentArray again. Then we pipe this into the runtime again, which does the inverse conversion: it wraps it up into a response, and we can send it off via gRPC again. All of this, as I said, is handled for you; all you need to do as a user is write clean Python code, and the rest is abstracted away. So now the response has been created as a networking response, ready to be sent back.
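The runtime's job, unpacking a request into documents, calling the user's Executor, and wrapping the result back into a response, can be sketched like this. It's a toy model using JSON over bytes in place of gRPC and protobuf:

```python
import json


def executor_logic(docs):
    # The user-defined computation: takes a list of docs, returns a list of docs.
    return [{**doc, 'text': doc['text'].upper()} for doc in docs]


def runtime_handle(raw_request: bytes) -> bytes:
    # 1. Unpack the network request and extract the "DocumentArray".
    docs = json.loads(raw_request.decode())['docs']
    # 2. Hand it to the executor, which only ever sees plain documents.
    result = executor_logic(docs)
    # 3. Wrap the result back up into a response to send over the wire.
    return json.dumps({'docs': result}).encode()


response = runtime_handle(b'{"docs": [{"text": "hello"}]}')
print(response)  # b'{"docs": [{"text": "HELLO"}]}'
```

The user-facing surface is just the middle function; everything on either side of it is the runtime's responsibility.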
From the Executor, the response goes back to the gateway, and the gateway then does its management with the streamer and so on and sends it on to the next Executor. Okay, now we've seen how Jina helps you build a microservice architecture for scalability, robustness, and cloud readiness with a simple refactor in Python. The last thing I would like to mention briefly is JCloud, our hosted solution for these Flows once you've created them. We've seen that Jina brings you from local to the cloud, and instead of going to just any cloud, you can go specifically to JCloud. At this point let me emphasize again that you don't need to use JCloud: Jina and DocArray are fully open source, and you're free to deploy on premise or on any cloud provider you want; you can do whatever you want with it. But you will have to do some additional work to get it really ready. You will have to provision your own resources, of course; we cannot give you machines if you run it on your own infrastructure. You will probably want to set up a Kubernetes instance; you will want to enable monitoring, so you have to spawn Grafana and Prometheus; you will probably want to put a proper API gateway like Kong in front, issue certificates, all that sort of stuff; and finally, actually deploy the Flow, see that it's running, check that it's up, and so on. Or, alternatively, you can do it in one line: jc deploy flow.yml. You define your Flow in a YAML file, give it to us, and we basically manage everything that you see on the left here, all of these parts, for you, and your Flow will be deployed on JCloud. Okay, I think we've made it; I hope it was interesting. To recap, what we've seen is how you can go from your idea and proof of concept to it being production ready, being a microservice, and actually deploying it to the cloud. That's all we wanted to share with you today.
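The YAML definition just mentioned can look roughly like this. A hedged sketch: the executor references are illustrative placeholders, and the exact schema and CLI flags are described in the Jina and JCloud documentation:

```yaml
# flow.yml - the Flow from the demo, defined declaratively
jtype: Flow
executors:
  - name: sentencizer
    uses: jinahub+docker://Sentencizer   # illustrative Jina Hub reference
  - name: encoder
    uses: jinahub+docker://Encoder       # illustrative Jina Hub reference
    replicas: 3
```

With a file like this, the one-line deployment is the command mentioned above, jc deploy flow.yml, and the provisioning, monitoring, gateway, and certificates on the left side of the slide are managed for you.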
I hope you enjoyed it, and if you have any questions or remarks, we're happy to answer. Thank you.

Okay, no questions; they were either very good or very bad, I'm not sure. There's one question there, I think. Yeah, so the question was what we offer to make the development of the model, the proof-of-concept stage, faster and easier. I think the main thing offered through DocArray is native handling of all kinds of different data, because that's really our focus, this multimodal data you've seen. We have convenience functions for text data, to load it from the web into the text attribute, and we have similar things for video, for audio, for images, and so on. I think that's our main value proposition here; we don't try to replace PyTorch or TensorFlow, so the actual building and training of your model is not really our scope. Anything to add? [Joan] Well, it was not in the scope of this presentation, but we also have another open source project, called Finetuner, that helps you fine-tune your models on your own data. We didn't cover it here, but it's also an open source project of Jina that you can check out on our GitHub profile.

The question was whether there is versioning for Executors and whether you can roll back to an old one if it starts failing. There is versioning in Jina Hub, the hub of Executors we mentioned: you can version and tag them. As for rolling back, that belongs to the deployment phase; it's not covered in Jina open source, but in the cloud, if you have a deployment that is not working, you can always roll back using this versioning.

Okay, so is it possible? It is possible, because everything is containerized, but we don't have a helper method to make it easy. We have helper methods to map Jina applications to Kubernetes and to Docker Compose, but Docker Swarm we didn't cover; it would take some work, but it could be done, since in the end it's just containerized Executors connected to each other. And if anyone is willing to contribute that to the project, we would be more than happy to have it.

The question was about our experience onboarding data scientists to the project. At the beginning we had a big problem onboarding people from a pure data science background; that's where our split came from, and we separated out DocArray so that it was an easy first point of contact. I think since then we have seen more people with no backend engineering background deploying and working with Jina and building nice applications. I think the key moment was when DocArray was split out of Jina and made an independent project that you can install on its own, play with locally, and get started with more easily; that was the best entry point for them to get a first feeling of the power we can offer. [Johannes] If I may add: if you don't want to lean into DocArray too much, in theory you can just stick to your workflow, do everything in NumPy or PyTorch or whatever you want, and then take your tensor at the end, put it in a DocumentArray, and send it off to Jina. Of course we don't encourage that, but if a data scientist really doesn't want to change their workflow, there's nothing standing in the way of that way of working.

The question was about the benefits over a plain tensor and how we manage MIME types behind the scenes. For the second part of the question, I'm not sure; I don't know exactly how it's managed. I recall the question was more or less referred to in the previous talk; right now I don't have the details in mind, I don't know every line of code, and I'm not sure whether we have a priority order for which kind of URI we access. It should be in the documentation, and if it's not found in the documentation, that is our fault and we will try to improve. So feel free to go and ask in an issue. If that's it, thank you very much for staying. Thank you.