Thanks for coming. I will present PNDA, which is a platform for network data analytics — keep in mind it is both data analytics and a platform. Just a few words about me: what I really like is innovation — doing the right stuff at the right time — and also DevOps, because at the moment the tech is very good. There is no issue with the tech; the issue is with people. That's why I like DevOps.

To give you some context about the project: virtualization, automation and orchestration have made real-time service provisioning possible. There are a lot of open-source big data technologies at the moment — I will speak later about what they are — and there is a lot of change happening in the network space. That's why PNDA is mainly network-oriented, but I will also show you a use case which is not fully network.

What we've seen so far is siloed analytics pipelines, and the issue is that once you implement a data pipeline using a dedicated technology, you are putting constraints on your pipeline. So it is not really optimal, and you are duplicating effort. That's why we defined PNDA, and why we want to achieve a really good big data platform. We did not invent a lot of stuff; I will just present some key aspects of the platform. Basically, this is a Linux Foundation open-source project. So, why PNDA?
We really want to make the whole big data space easy for you to use, and I will give you a live demo where I deploy a Spark Streaming application. What we want to achieve is to remove the complexity of working with big data technologies. If you take a quick look, there are a lot of technologies in the big data space. You've got open-source Hadoop distributions like Hortonworks, which is fully open source, but there is also the Cloudera distribution and MapR, and you can also go to the other side with Splunk, which is not open source at all. We are right in the middle.

Just to illustrate one thing — if you keep only one takeaway from this presentation: PNDA is about the platform. Do not spend any time building a platform for doing big data stuff; stay focused on your use case and on your application, because otherwise you will waste a lot of time fine-tuning your Hadoop cluster, doing analysis, etc. So really, do not care about the platform.

Here is a short view of what PNDA is. The platform sits in the middle. On the left side you've got how to integrate with the platform. We use Kafka — a well-known messaging technology, the best one at the moment — to push data into PNDA. It's quite easy to use: we provide as part of the project all the producers, the sample code and the plug-ins to connect to the platform. You can use Logstash — we've got a plug-in for Logstash — we've got a plug-in for OpenDaylight, and we've got sample code in Python and Java. It is just about using the Kafka API with Avro encapsulation. Then, once you've got your data in the platform, you can either consume it in real time directly from Kafka, or you can build near-real-time applications based on Spark Streaming, Spark batch processing, etc. So, what do we provide at the moment?
We are on release 3.4 as of this week. We provide all the installation tooling to build your platform on top of OpenStack or on bare metal — this is based on Heat templates, so in less than one hour you can have your platform ready to use. It's also possible to stand the platform up on Amazon. As I said, we've got plug-ins for Logstash, OpenDaylight and a lot more.

As mentioned, you should stay focused on the right-hand side, because this is where the value is. Once you've got all the data pushed to the cluster, you need to build the applications that really extract the value from the data, and we already provide some sample applications such as fault analytics.

Maybe I should spend more time on the other tech that is inside PNDA. We bring it all together using either Heat templates or CloudFormation templates for Amazon, and we use SaltStack to orchestrate the whole stack. One benefit of PNDA is that every piece of data you push into the platform is automatically stored in HDFS, so do not worry about that — the platform manages the data. We use Gobblin to store it, and after that you can deploy any application on top of PNDA. What we've built so far, with a simple packaging process, is some simple tooling to help you build a streaming application, for example, and deploying it to PNDA is quite easy: you build a component, you post your application, and it is automatically deployed on the platform. You do not have to care about how to configure it all yourself — everything is managed by the platform, as I said.

Now just a few words about the console, and after that I will do the demo.
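As a side note on the Gobblin-to-HDFS storage just described: data lands in HDFS partitioned so that it stays easy to find later. The exact directory scheme is PNDA's own; the sketch below is only illustrative, assuming a hypothetical `source=/year=/month=/day=` layout and a hypothetical base path, to show the idea of source- and date-partitioned datasets.

```python
from datetime import datetime, timezone

def dataset_path(base: str, source: str, ts_ms: int) -> str:
    # Build a source/date-partitioned dataset path, similar in spirit to
    # how Gobblin lands Kafka data into HDFS. The layout here is
    # illustrative, not PNDA's exact scheme.
    dt = datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc)
    return (f"{base}/source={source}"
            f"/year={dt.year}/month={dt.month:02d}/day={dt.day:02d}")

# A record from a "collectd" source with a millisecond timestamp:
print(dataset_path("/user/pnda/datasets", "collectd", 1455292800000))
```

Partitioning by source and date is what lets a batch job (or an interactive Jupyter session) read only the slice of data it needs instead of scanning everything.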
We provide a console to make available all the relevant information and everything you need for using PNDA. It is a kind of dashboard that shows you that all the components are fully working, and it gives you all the statistics and all the links you need to know.

So you've got a set of software available, but we also provide all the operational UIs to manage everything within the system. For example, for Kafka we use Kafka Manager, so you can create topics, set the replication factor, set the partitions, etc. — everything is available through the UI. For the other parts, you can access Cloudera Manager. If you want to start an investigation — because you are putting a lot of data into PNDA — we include Jupyter. Jupyter is a notebook for working in an interactive mode with the data, so you can play with all the datasets and start building something. And then you've got the links for Grafana, OpenTSDB, Impala — all the stuff.

Rather than spending more time on the slides, I will show you a live demo. Everything is on GitHub: if you go to github.com/pndaproject you will find 25 repositories, with everything from the Heat templates to the sample applications. For this example I used the Heat templates — yesterday I bootstrapped a platform. You can see that you've got a simple CLI with some parameters to bootstrap a cluster. There are a lot of steps, but in 40 minutes I've got a ready-to-go big data platform.

Just to show you what it looks like: when we build the platform on top of OpenStack, we build a dedicated environment for PNDA. It consists of a bastion, a dedicated network and a set of VMs — everything is created automatically.
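To make the Kafka Manager settings mentioned above concrete: the partition count you set for a topic determines how messages are spread across brokers. Kafka's real default partitioner hashes the message key with murmur2 modulo the partition count; the sketch below uses stdlib `crc32` purely as a stand-in to show the idea.

```python
import zlib

def pick_partition(key: bytes, num_partitions: int) -> int:
    # Illustrative only: Kafka's default partitioner uses murmur2,
    # not crc32, but the shape is the same — hash(key) mod partitions.
    return zlib.crc32(key) % num_partitions

# Messages sharing a key always land on the same partition,
# which is what preserves per-key ordering:
p1 = pick_partition(b"host-42", 8)
p2 = pick_partition(b"host-42", 8)
print(p1 == p2)
```

This is why the replication factor and partition count set through Kafka Manager matter: partitions bound your consumer parallelism, and replication bounds your tolerance to broker failures.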
As I said, you do not have to care about the platform — we manage that. This is the real PNDA console. To give you some insight: on the metrics side, we built a platform-testing component which grabs the metrics for every component, in order to ensure that they are working, and exposes all the information you need. So you've got all the information for HBase, for example, all the information for Impala, Oozie, etc., and you've also got the stats for all the Kafka brokers available in the cluster, which is quite useful.

For the demo I've created a topic, and you will see that there is some data coming over it. And as I mentioned, here is the Hue service, which is part of CDH — this is a way to browse HDFS. Everything is stored as datasets: any data pushed to Kafka goes to HDFS through Gobblin and is sorted by source. We do not put any constraints on the data pipeline, as I said; we just put in place one Avro encapsulation in order to specify a little platform schema, which is just about defining a source (to know where the data is coming from), a timestamp, a host, and then the raw data. The raw data can contain whatever you want — this is where you put all your event data and metrics. Here is an example: I've got a collectd source, and it is also sorted by date.

To give you some information about the demo: I have a little Python script that will push random data to a Kafka topic — the Avro metrics topic. Just to show you that with just ten lines of Python you can push data to Kafka, and so to PNDA: you just need to use the Avro schema and then put your raw data within the message.
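A minimal sketch of what such a producer script builds, following the talk's description of the platform schema (timestamp, source, host IP, raw data). The field names and the commented-out Kafka client call are assumptions for illustration — check the PNDA producer samples for the exact schema and topic names.

```python
import time

def make_pnda_record(source: str, host_ip: str, rawdata: bytes) -> dict:
    # Record shape per the platform schema described in the talk:
    # where the data came from, when, from which host, plus an opaque
    # payload. Field names here are illustrative.
    return {
        "timestamp": int(time.time() * 1000),  # epoch milliseconds
        "src": source,
        "host_ip": host_ip,
        "rawdata": rawdata,  # whatever your events/metrics look like
    }

record = make_pnda_record("collectd", "10.0.0.1", b'{"cpu": 0.42}')
print(sorted(record))

# In the real script the record is Avro-encoded with the platform schema
# and sent with a Kafka client, e.g. (requires kafka-python; not run here):
#   from kafka import KafkaProducer
#   producer = KafkaProducer(bootstrap_servers="kafka:9092")
#   producer.send("avro.metrics", avro_encode(record))
```

The key point is the one the talk makes: the platform imposes only this thin envelope, and `rawdata` stays opaque, so no constraints are put on what your pipeline actually carries.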
Just to show you, the Avro schema is quite simple: a timestamp, a source, a host IP and the raw data. I will push data just after — let me show you.

At the moment I've got my working cluster. What I will do is deploy a streaming application. I already compiled the sample code available on GitHub: the Spark Streaming application will just read information from Kafka and push it to OpenTSDB, which is a time-series database. I just use a package — there are a few configuration values to set — and I give the application a good name for the demo. It's creating... OK, so the application is available on top of the cluster. I can run it, and now I can produce data to the cluster. If I go to OpenTSDB — OK, I've got my data coming over; you can see that I've got some points. This is a real-time application: it was easy to produce data to PNDA and to deploy the application, which just gets the data from Kafka and pushes it to OpenTSDB. OpenTSDB is a well-known time-series database, and most of the time all our data is keyed on timestamps, especially on the network side — that's why we use it for this kind of demo. Just to show you that it's quite easy to use, and simple.

Here is the backup slide, in case we had lost the network. There are bare-metal and OpenStack deployments based on Heat templates, and you can also deploy on AWS if you want. We've seen the console and the platform testing. The data ingestion is quite easy to do with simple Python code, or we've also got the OpenDaylight and Logstash plug-ins — Logstash is just a matter of configuration. On the data storage part, I showed you that with Gobblin everything is stored in HDFS, but you can also use HBase as part of the platform. The application management is made simple.
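The per-record transformation that demo pipeline performs — read from Kafka, extract a metric point, write to OpenTSDB — can be sketched as a single formatting step. OpenTSDB's telnet-style protocol takes `put <metric> <timestamp> <value> <tag=value>...` lines; the sample app is Spark Streaming in Java, so this stdlib Python version is only a sketch of the logic, with hypothetical metric and tag names.

```python
def to_opentsdb_put(metric: str, timestamp_ms: int,
                    value: float, tags: dict) -> str:
    # Format one data point as an OpenTSDB telnet-protocol 'put' line.
    # The Spark Streaming sample does the equivalent per micro-batch.
    tag_str = " ".join(f"{k}={v}" for k, v in sorted(tags.items()))
    return f"put {metric} {timestamp_ms} {value} {tag_str}"

print(to_opentsdb_put("net.bytes_in", 1455292800000, 123.0,
                      {"host": "10.0.0.1"}))
# -> put net.bytes_in 1455292800000 123.0 host=10.0.0.1
```

Because each input record already carries a timestamp and a source host in the platform schema, the mapping to a time-series point is nearly one-to-one, which is why the whole application stays so small.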
You just deploy a package and it is automatically deployed within the Hadoop cluster, and then you've got all the time-series and Impala interfaces available. We also use Impala because you can easily connect it to Tableau, which is a well-known BI tool. Behind that, at the moment, we've got the Cloudera distribution of Hadoop, and you've got all the management UIs behind it.

So now I've just run through the platform — you see that everything is available. We applied it first to NFV, in order to get more analytics on top of the network. The first use cases built on top of it were service analysis and service impact, and we've also worked with Moogsoft: we integrated Moogsoft in order to provide some fault tolerance on top of an infrastructure.

We also applied PNDA to IoT and smart cities. For example, this is what we've done so far in Paris. This was good because it was a data-driven approach. We put a lot of sensors in the Place de la Nation, which is a big square with a lot of traffic, and the city wants insight because, for the works that will happen there, they will close all the roads that come into the square, in order to ensure there is less pollution, etc. So we put in air-quality sensors; we got data from the devices present in the square through the Wi-Fi wireless connections; and we were also able to integrate a video analytics solution in order to identify the flows. Once you've got the infrastructure, you can build applications on top of it. We were able to identify the different vehicle flows for each road, so for a closed one we identified the impact. We were also able to correlate the noise with the vehicles, and to identify the density of pedestrians per section. And also — we are in France.
We can do more analytics regarding the noise and correlate it with strikes, because we are used to strikes in France. As you see, there are some spikes, and we can identify that these were when there were strikes in France.

We also talked a lot about OpenDaylight and BGP, and we've done some work on top of that. Maybe what I can do is show you a short video about that work. What we've done so far is put all the BGP prefixes into PNDA — as was demonstrated earlier, extracting everything from OpenDaylight and pushing it into PNDA. This is currently, I would say, in development, but we will soon demonstrate it at Cisco Live Berlin. You will see that all the BGP events are automatically stored in HDFS, and after that you can run BGP applications on top of PNDA. Here is the video — which is good, because it works faster than a live demo. We can launch multiple instances of an application, and then we've done some data exploration using Jupyter. This shows you how Jupyter works: you can iterate over the whole dataset and start doing some visualization, and here you can get a network topology graph based on all those BGP data. After that you are free to spend more time on the application and the visualization, which is the key part of the whole big data space. You can look at all the prefixes, because you can sort them and so on, and after that — just for fun — you are free to do a nice visualization on top of all this big data. But keep in mind that PNDA is about big data.
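The topology graph shown in that Jupyter exploration boils down to a simple reduction over BGP data: every AS_PATH attribute yields a chain of adjacent autonomous systems, and the union of those adjacencies is the graph. A hedged stdlib sketch of that reduction, with made-up private AS numbers (the actual notebook code may differ):

```python
def as_path_edges(as_paths):
    # Derive undirected AS-adjacency edges from BGP AS_PATH attributes:
    # consecutive ASes in a path are neighbors. Repeated ASes
    # (path prepending) are skipped so they don't create self-loops.
    edges = set()
    for path in as_paths:
        for a, b in zip(path, path[1:]):
            if a != b:
                edges.add(tuple(sorted((a, b))))
    return edges

paths = [[65001, 65002, 65003],
         [65002, 65002, 65003]]  # second path prepends 65002
print(sorted(as_path_edges(paths)))
# -> [(65001, 65002), (65002, 65003)]
```

Once the edge set exists, any graph library or notebook plotting tool can render it — which is exactly the "spend your time on the application and visualization" point the talk is making.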
You need to have a huge dataset, a big stream of data coming into the platform; otherwise this is not about big data. So that was just a nice example.

OK, so we are building an ecosystem around PNDA. We are a small team at the moment on the core development, but we are working with a lot of service provider companies in order to build this ecosystem, and also with third parties. Just to mention a few: we are working with the Linux Foundation; we are working with Moogsoft, especially for fault analysis in the network; with [inaudible]; and also with OpenDataSoft around the whole smart cities topic, opening up all the data, etc.

A few words about what's coming in the next release. We plan to do some big things, especially data security, which is always good to consider when working with data. We will also do — well, not maybe, we will do — VMware support for enterprise applications. We will do open-source Hadoop: at the moment we are leveraging the Cloudera distribution, but we plan to work on upstream Hadoop in order to have a fully open-source platform. And also on the application side, which is key: we will have BGP deep analytics, which will be showcased at Cisco Live Berlin, and we also have black-hole detection, etc.

So, if you still wonder why PNDA: it is a real open-source platform, and you've got everything you need for playing with big data. And if you still wonder what PNDA is, go to the website at pnda.io.
We put up a guide — we say "read the fine manual", but this really is it. We put effort into the documentation. Now you have a complete guide which explains all the technologies, and we try to give you as much insight as possible regarding all of them: how to work with PNDA, how to integrate, how to produce, how to work on the data, or how to use all the sample code in order to build your application on top of PNDA, etc. And now I've got five minutes for questions.

[Question inaudible]

OK, so the question is: what about VPP? We will integrate VPP, but this is on the to-do list; we can discuss it afterwards, but I'm not focused on that part at the moment.

[Question inaudible]

OK, so the question is about using StreamSets instead of Kafka. If you go to their website, you will see that they are using StreamSets, and we are also putting PNDA into that architecture — I've got a diagram, we can discuss it. StreamSets does not replace Kafka, but we can be compatible with that kind of solution.

Any other questions? And I do not have stickers, so — OK, thanks for your time. We are focused on the core development.