Today we'll be talking about inferencing big data with AI and machine learning models in the metaverse, and thank you all for showing up here in person and to those listening virtually on camera. Why are we talking about inferencing, specifically focused on AI and machine learning models? Because in the metaverse this is very important: you're dealing with a large volume of data, you're dealing with a variety of data coming in at a very high velocity, and you're dealing with a lot of users in real time, all on top of the network speed, storage speed, and the rest of the underlying infrastructure, so you have to plan ahead. Our paper touches on different techniques through which you can plan toward migrating to the metaverse, adopting the metaverse, or launching your applications in the metaverse. We talk about increased efficiency and accuracy, we talk about real-time insights, and we talk about a better, more secure platform. Security is one of the major concerns for the metaverse; I have seen instances where data that governments have banned in some countries ends up on a metaverse platform, because the metaverse is something where you cannot put boundaries today, so having protocols and standards is very important, and we'll be discussing that. We'll also be talking about the ability to effectively analyze data so that you can gain a competitive advantage over your competition. A little bit about myself: if you Google me, that's how I come up; the Google knowledge panel has my Instagram, Facebook, Twitter, and LinkedIn, so please feel free to connect. I am a data scientist by profession and a technology specialist. What those two titles mean is that I have the knowledge to build data science models and also to match them with the underlying infrastructure, so when you're talking about metaverse workloads I can help folks understand how they can leverage the underlying infrastructure
or compute power to build those models, train them, and infer from them. Another thing about me: I also host a podcast on iHeart, Spotify, and other platforms called What's Up With Technology, and I have research papers at IEEE and on other platforms; I also published at Ryerson University in Toronto, so I have an IEEE publication in Canada as well. At IBM we have something called the Academy of Technology, which was started by Nobel laureates, and I am a member of the IBM Academy of Technology. What the Academy does is identify forward-looking initiatives and bring in communities, academia, and other groups to work on them, so there are communities people can get involved in that IBM promotes, and if you have any questions about that I can definitely answer them. Now, coming to why visualize big data in the metaverse now: I have highlighted these two columns here. On why now, we have five points. First, the rapid growth of data: IDC predicts around 180 zettabytes of global data creation by 2025, so data is growing rapidly. As data grows there is an increase in threats and cyberattacks; to give a number, Cybersecurity Ventures expects cybercrime costs to reach 10.5 trillion dollars annually by 2025, which is a big amount. Then there's the demand for AI-powered applications: as you're seeing large language models like ChatGPT coming into play, AI is going to take over a lot of mundane tasks, and we are all at the era where we're going to automate a lot of mundane jobs using AI. On the machine learning side, there are three basic concepts we're going to discuss today: supervised, unsupervised, and reinforcement learning. So that is one of the points for why now, along with the need for real-time insights
As per a survey by Harvard Business Review Analytic Services, 82 percent of companies say they have the need for real-time insights when it comes to data analysis. On competitive advantage, Gartner says that 80 percent of organizations fail to develop a data-centric culture, and data analytics always helps a company make better decisions and improve its products and services. So these five points basically cover why now. I also highlighted platforms, the underlying architecture: because we're talking about big data, there are big data frameworks like Hadoop, NoSQL databases, and Spark. We also mention GPUs and TPUs: GPUs are graphics processing units, and TPUs are tensor processing units, which are a bit faster than GPUs because they are designed specifically for machine learning libraries and workloads. Then we have in-memory databases and stream processing, and again GPUs and TPUs. These platforms are important because when you're dealing with big data sets you need platforms that can handle those workloads, those big volumes and different varieties of data. Now, how do you visualize big data sets? There are multiple ways: statistical analysis with machine learning for big data in real time, which is what we focused on in this paper, along with the importance of pervasive computing. Pervasive computing, or pervasive encryption, helps you secure your platform while doing this computation. Some of the statistical methods include clustering, such as k-means, and PCA for reducing the dimensionality of your data, and other statistical techniques can be employed too, like regression analysis and time series. Now, talking about the differences between augmented reality and virtual reality: augmented reality is when you're using a wearable, like your Apple Watch, and you're basically taking the help of a wearable or augmented
reality gear to guide you in the real world by superimposing information, and it could be any IoT device, not just a smartwatch. Virtual reality is when you're entering a fully virtual world. Say you go to a dealership to buy a car and you use an online platform to customize different things in your car: that way you're using VR to design your own car. Whereas if you're in IKEA, where there are kiosks, and you're walking toward one particular room while looking at it through a device, that's more of an AR, or augmented reality, application. So think of VR as totally immersive, and AR as gadgets superimposed on the real world that help guide you. Now, coming to virtual fixtures — and you'll hear new terms here; someone once told me that when you're giving a talk, always give your audience new words and terms they can Google later. Virtual fixtures are part of augmented reality technology. These fixtures can take the form of lights, thermostats, and smart switches, and they're not limited to sensors and IoT devices: they can also be surgical equipment that doctors use while performing operations and in surgical training; guides in assembly lines, where you're placing parts together to build a complete product; aids in sports training, when you're trying to analyze an opponent's behavior and improve your technique; and tools in architecture and construction to visualize and manipulate 3D models of buildings and other structures. The surgical training case is actually interesting, because my thesis is on collaborative robots as well, and the use of AR in surgical training is very helpful in healthcare: think of a surgeon performing an operation who is getting a superimposed view of what's inside your body, on top of the body, and is able to
identify the precise locations of nerves and other structures using augmented reality. It's a very interesting use case, because a lot of operations and surgeries are happening using cobots now; the number in the US is huge. Now, on to immersive ecosystems and data sources. We talk about these data sources in our paper, where we have different user groups, and in this slide we have a chart where user groups and ecosystems are aligned to the data sources: specialized units, hardware, software, and services, which developers, 3D artists, and AR engineers use. I also put in the file types; this is basically the data, and it's part of unstructured data. That's what big data is — three points: volume, velocity, and variety. The file types are things like OBJ, STL, FBX, CSV, JSON, and XML; not everything is structured. You have different pictures in the metaverse, you have different videos, and these are all part of unstructured data, and that's why we call it inferencing big data in the metaverse. This is basically a mapping of the different data sources, the different file types, and the users using them. This small diagram here shows the user groups and where the data is flowing: the ecosystem mapping of software, services, augmented displays, and other user gear, all part of the storage and user ecosystem. Then we have visualization types — once you get this data, how do you visualize it — and you do that through different techniques. One technique I laid out in this paper is the rattle tree: that's part of the rattle library in R, through which you can go in depth and see all the dimensions in the decision tree you build out of the data sets you get from these data sources. And for the different user groups, the people who would be using these different data sets, in the paper we also outline the security
aspects: the network administrators, the security experts, and the points where we should consider security. Security, as we talk to our clients, is one of the major concerns coming out of using the metaverse and putting users on it. We have worked on some patterns around the metaverse as well, and you will see a lot of concerns around security, so security is one of the most important topics, and there is a need to secure the data sets at the hardware level itself rather than doing it only through software. Now let's talk about machine learning concepts. These three concepts are the basics. First, supervised learning, where you're taking labeled data and producing an output. One simple example: you want to know whether a particular email is spam or not, so you collect examples — these emails are spam, and these are not. Whenever you're using a service like Gmail and you mark a message as spam, it goes to your spam folder, and the service learns to identify future messages as spam based on that pattern, on your earlier behavior of flagging what is or isn't spam. That's categorizing labeled data — supervised learning. What is unsupervised learning? Think of a time when you get an itemized bill for your cell phone, with the phone calls you made, and you combine all those calls into one large sheet or Excel file. Now you have the problem of figuring out where the most phone calls were made, who the person is that you talk to a lot, and at what times you talk to that person. To solve that problem you can use an unsupervised learning technique. In unsupervised learning the data is unlabeled: you train the model on it and let it find the structure on its own.
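The spam example above is the canonical supervised task. Here is a minimal sketch of it as a tiny Naive Bayes classifier — one common choice, not something the talk prescribes, and all the emails and words below are invented:

```python
from collections import Counter
import math

def train(labeled):
    """labeled: list of (text, is_spam) pairs; count words per class."""
    counts = {True: Counter(), False: Counter()}
    totals = {True: 0, False: 0}
    for text, label in labeled:
        for w in text.lower().split():
            counts[label][w] += 1
            totals[label] += 1
    return counts, totals

def is_spam(text, counts, totals):
    """Naive Bayes with Laplace smoothing; assumes equal class priors."""
    vocab = set(counts[True]) | set(counts[False])
    score = {}
    for label in (True, False):
        s = 0.0
        for w in text.lower().split():
            # Smoothed probability of the word given the class
            p = (counts[label][w] + 1) / (totals[label] + len(vocab))
            s += math.log(p)
        score[label] = s
    return score[True] > score[False]

# Hypothetical labeled data — the "these are spam, these are not" step
emails = [
    ("win a free prize now", True),
    ("free money click now", True),
    ("meeting agenda for monday", False),
    ("lunch with the project team", False),
]
counts, totals = train(emails)
print(is_spam("free prize money", counts, totals))       # True
print(is_spam("project meeting monday", counts, totals)) # False
```

Marking a message as spam in a real mail service amounts to appending one more labeled pair and retraining, which is exactly the feedback loop described above.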
Training, testing, and inferencing are all part of model building. For most models you build, you train them on a certain data set, you test them on a held-out test set, and you infer from them — you basically figure out, okay, this was the prediction I was trying to make, and this is the result. Clustering — putting all those phone numbers into one group or cluster — is part of unsupervised learning. Reinforcement learning, on the other hand: say you have a robot and you want to teach it to get out of a maze. It makes turns and makes decisions, and learning from those is reinforcement learning. In this paper we took these three concepts and defined problems around them. Problem number one was removing duplicates, and this comes up whenever users are logging in to the metaverse and you're doing master data management: there are multiple users, and users logging in from multiple devices, so you want to figure out whether a user trying to purchase an item from one device is also logged in on a second device or using other credentials — figuring out which particular session is duplicated. What we do here is deduplicate the rows to figure out how many users are on the platform right now and calculate the number of top customers: load the data set, put everything in a vector, sort it while storing the index, and get the result out by removing the duplicates. The whole code and program is part of the paper, which is available in the IEEE Xplore library, and the result was that 18 customers were identified based on this algorithm, and we were able to identify all of them. The second problem is around predictive functions, and that was one of the basic ones; it's more toward just explaining what predictive functions, or models, are.
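The deduplication steps just described — load, sort, strip duplicates, count — can be sketched like this, with hypothetical session records standing in for the paper's actual data set:

```python
# Hypothetical session records: (customer_id, device). A customer logged in
# from several devices produces near-duplicate rows for the same identity.
sessions = [
    ("alice", "headset"), ("alice", "phone"),
    ("bob", "laptop"), ("bob", "laptop"),
    ("carol", "headset"),
]

# The approach from the paper's first snippet, in miniature: collect the
# customer ids, drop duplicates, and sort before counting.
unique_customers = sorted({customer for customer, _ in sessions})
print(len(unique_customers))  # 3 distinct customers on the platform
```

The paper's version does this over the full metaverse data set and arrives at 18 distinct customers; the mechanics are the same.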
To understand classification, regression, clustering, and association rule mining, I'll tell a small story. Say you go to a particular store with a friend, and when you walk in, that friend gets offers — in mailers or in email — saying this particular t-shirt is 20% off for them, and so on, but you don't get the same offer. That's because their marketing team is using a classification predictive function to figure out which category each person falls into. You might be a person who went to that store previously and asked for a discount; they know that already and have classified you accordingly, while the person going with you does not fall under the same classification. That's why understanding predictive functions and getting the classification right is important for the marketing department: it tells them whether you're a suitable candidate to spend their marketing budget on. Then, regression: once you have identified that a person falls under a certain category, you want to figure out how much that person is willing to spend. You're sending a mailer to that person, but you also want to know how much they will actually spend on your product, and that's when you apply regression techniques; there are different models you can build through them. And the thing about this story of you walking into the store and the marketing department classifying you into one particular customer segment is that it's very hard, even for a data scientist, to pick which predictive function to apply to a problem, so most times you'll see data scientists struggling to figure out the best algorithm for a given use case.
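The classification step in the story — deciding which customer segment someone falls into — can be sketched as a nearest-centroid classifier. That's one simple choice among many (the talk doesn't name a specific algorithm), and the segments and numbers below are invented:

```python
import math

def centroid(points):
    """Mean point of a list of (visits, avg_spend) feature vectors."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(2))

def classify(customer, segments):
    """Assign a customer to the segment whose centroid is nearest."""
    return min(segments, key=lambda s: math.dist(customer, segments[s]))

# Hypothetical labeled history: (store visits per year, average spend)
segments = {
    "bargain-hunter": centroid([(2, 15), (3, 10), (1, 12)]),
    "regular":        centroid([(20, 60), (25, 55), (18, 70)]),
}

print(classify((22, 65), segments))  # "regular" -> gets the 20%-off mailer
print(classify((2, 11), segments))   # "bargain-hunter"
```

The marketing team's real pipeline would use far richer features, but the decision structure — new customer in, segment label out — is the same.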
In classic regression, you basically build a function that tells you, okay, this is the amount of money this person is going to spend the next time they come to the store. Now clustering: a customer came into the store, you classified that customer, you've done the regression analysis and got a number for how much that person is going to spend, and now you want to find out who the other people are that fall under the same category. You make a cluster of those people — of the habits and patterns you noticed in this person — and you promote your products to them; that's clustering. Association rule mining is when you associate similar behavior with other folks in that circle: you match the patterns in the sales data against the patterns from the customers, apply the association rule mining predictive function, and you get a successful campaign. So that describes what predictive functions are, plus different use cases. In regulated industries you can do fraud detection, credit risk assessment, and loan approval with classification in banking; to segment customers you can do forecasting and prediction using regression; you can do segmentation and detection using clustering; and with association rule mining you can cross-sell — so don't be surprised when you see a spouse getting a certain type of credit card recommendation based on what their partner has. That's association rule mining at work.
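The regression step above — predicting how much a classified customer will spend — can be sketched with ordinary least squares on one feature. The visit and spend numbers are hypothetical:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (a single feature)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Slope: covariance of x and y over variance of x
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx  # intercept
    return a, b

# Hypothetical history for one segment: visit number vs. dollars spent
visits = [1, 2, 3, 4, 5]
spend  = [20, 40, 60, 80, 100]
a, b = fit_line(visits, spend)
print(round(a + b * 6))  # predicted spend on the sixth visit -> 120
```

In practice you'd fit on many features, but the output is the same kind of number the marketing department wants: expected spend for this customer.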
Now we talk about using clustering, like k-means, as an unsupervised learning method. K-means works on unlabeled data, where you're basically building a model without labels, so k-means clustering is a modeling technique, and why I talk about it here is that it's one of the common ones. In the paper we have outlined a program where we apply k-means and plot the results. Our code tries different values of k, ranging from 2 to 20, and for each value of k we run the k-means algorithm 100 times, which improves the results and the accuracy. We find the total average, plot the different values of k onto a chart, and get the optimal value of k to use for clustering the data set. That is the theory; when you apply it to a use case, you get a data set with different numbers — most sit within a range, but some fall out of range, and those are called outliers. What you do is extract, transform, load: you either take the outliers out of the equation or keep them in, transform your data, and then apply techniques like k-means. Next is the tree — the rattle tree I talked about. In the paper this is the third technique we apply, code snippet three. We have 178 observations and 14 variables, and we split them 80/20. Here you see the node labels — 1, 2, 5, 14, 4, 6, 15 — going down to a depth of 15; these are the different levels, the layers in the data set, and there are 15 of them, identifiable at the top. Then you see values like 33, 36, and 21 for proline and phenols, which are chemical constituents of wine. This is a wine data set: we are trying to figure out different categories of wines and cluster them together based on the different properties the wines have, and we use this rattle chart to basically dig down into the different properties and the different layers.
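The k-means procedure just described — sweep k over a range, repeat each run many times, keep the best error, and read the optimal k off the chart — can be sketched as below. It is scaled down from the paper's k = 2 to 20 and 100 runs (to k = 2 to 7 and 20 runs) so it executes quickly, and the data is synthetic, not the paper's:

```python
import random

def kmeans_error(data, k, iters=20, rng=random):
    """One k-means run on 1-D data; returns within-cluster squared error."""
    centers = rng.sample(data, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda i: (x - centers[i]) ** 2)
            clusters[nearest].append(x)
        # Move each center to the mean of its cluster (keep it if empty)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sum(min((x - c) ** 2 for c in centers) for x in data)

random.seed(0)
# Synthetic data: three well-separated groups, so the error curve
# should flatten (the "elbow") after k = 3.
data = ([random.gauss(0, 0.3) for _ in range(30)]
        + [random.gauss(5, 0.3) for _ in range(30)]
        + [random.gauss(10, 0.3) for _ in range(30)])

# As in the paper: for each k, repeat the run and keep the best result,
# then plot errors against k and look for the elbow.
errors = {k: min(kmeans_error(data, k) for _ in range(20))
          for k in range(2, 8)}
```

Repeating each run guards against bad random initializations, which is exactly why the paper runs each k 100 times before averaging and plotting.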
From there we make a decision — yes, no — on the different clusters we can form for these wines, so we are able to put them on a chart and build a model for it; we tested the accuracy and displayed it in our paper. Now, toward the conclusion: we have explored different AI and ML models on different data sets, and these can be used in the metaverse. They are the standard techniques we just talked about, and we plotted and showed those small use cases independently for the three techniques. We determined that there were 18 customers in the data set we have, we were able to remove the duplicates in a real-time data refinement, and we have illustrated an AR application data flow for the different user groups, the different devices they're using, and the data sources. We have also identified the use of high-end machines for better security and enhanced views. What that means: there are CPUs, central processing units, which we traditionally use, and there are GPUs, which we use for gaming, but GPUs are also used for things like putting blockchain ledgers together and for specialized machine learning workloads that give us results faster. When it comes to model building — training a model, testing a model — you can use GPUs. Take Tesla: the autonomous car is trained on a particular model, leveraging those GPUs, and it is tested on data sets as well, but the inferencing happens in real time, on the objects you see in real time while you're driving. So inferencing is very important, and we laid that out in our paper, giving response times, talking about inferencing, and doing a literature survey in the paper as well. One of the key things
from the IBM side — and this ends the presentation — is a question: when you're inferencing a model, what's faster than a GPU? When it comes to regulated industries, workloads rely heavily on compute power. I'm not sure if you have ever gone into quant computing, statistical computing for figuring out or predicting a stock price; that's where you can use accelerated computing to get an edge over models running on traditional devices. There's something called the Bloomberg keyboard — special hardware basically for traders — and there is accelerated infrastructure that a lot of banks use in order to perform trading, and that's where inferencing becomes important for them: a fast inferencing mechanism. It's also important in healthcare: whenever you're performing a surgery remotely, it's important to have high inferencing compute power. So LinuxONE is one platform that's faster than a GPU today when it comes to inferencing, and we have demos at IBM that can show that as well. IBM also has a quantum computer, and there is a platform to interact with it — that platform is Qiskit, which is also supported on LinuxONE and available for the open source community to get hands-on with; I believe Qiskit is open source software available to the general public as well. That sort of ends the presentation for today; if there are any questions, we can take them. Thank you. Okay — so, you mentioned that IBM provides the specialized LinuxONE, I think it goes to the cloud; is that a specialized piece of equipment, essentially dedicated equipment used specifically for machine learning, or is it something more general purpose? So these are specialized for machine learning, and you can also use them for different workloads. Like this particular one: when it comes to machine learning, if
you're running a Snap ML algorithm, you can run it around seven to eight times faster than on any other platform, and the demos that we have showcase what percentage of speedup you get using this platform versus others. When it comes to general-purpose computing you can also use it, but these are enterprise-ready servers used for large workloads or specific workloads; for example, if you're running hundreds of x86 servers today, those can be consolidated into just four or five LinuxONE machines. Also, at the end you mentioned quantum computing — do you think quantum computing in the short, near future has any practical applications in machine learning, or are we talking about something a hundred years into the future that might be useful? So, on quantum computing: we have access to the quantum lab, and what we do is basically change different states — you know how a machine interprets code into zero-and-one format. There's a subject called switching theory and logic design, where you design circuits using different gates, like NAND and AND operations, and you design a particular state of a microprocessor. With quantum computers you can design different states using a quantum composer, and you can write your own algorithm using those gates, using the specific circuit you have built for your computing algorithm. What you'll see in the future, say 10 or 20 years from now, is an individual person coming up with their own code, building their own quantum circuit, and solving a use case that is not solved today through traditional computing, using their own logic on a quantum machine. Awesome, thank you very much. Thank you. So I'm kind of curious on that question as well, just to follow up on the quantum computers: you said LinuxONE has that tool, or that
library, or whatever, available to interact with quantum computers — yes — so is there currently an ML use case, like speeding something up, or some specific kind of model architecture that doesn't work on regular computers but does on quantum, that you're aware of? So there are many libraries and many algorithms that we have built — not a huge number, but there are some specific algorithms that leverage the power of the quantum computer. The platform is Qiskit, which is basically a Python-based runtime environment, so you can use Qiskit to interact with the quantum computer: you can build your own algorithm or leverage the existing ones, and you can tap the power of quantum computing through this interface. Those are available, and just to get hands-on with Qiskit, it's something you can download — it's open source and you can start on it — but in order to leverage actual quantum computing power, your code would have to speak to a quantum computer through this interface. The Qiskit interface runs very well on the LinuxONE platform, and why we put it there is that we wanted to make quantum-safe encryption available for clients. Quantum-safe encryption is important because if you're doing a brute-force attack on one particular place to figure out a password, or using things like rainbow tables to predict a password based on the data available today, you could only do that as fast as traditional computing allows — and traditional computing is all anyone accounted for in the past. With quantum computing, the chances of cracking that increase exponentially, so we wanted to provide a platform that can do quantum-safe encryption, and LinuxONE provides that today; and with a quantum computer in the picture, you would want to use quantum-safe encryption as well. Any
more questions? Or we can call it a day. Thank you, everyone, for coming.