Hello everyone, my name is Gilesh Kaurikey and I work for BMC Software, in a group called Innovator. We carry out a lot of different kinds of projects, and recently we have been doing some work in text analytics, which is what I am going to discuss today. I will try to give you some real examples as well of how we have applied it to the problem of IT incident management.

Before I begin, I would just like to know: how many in the audience are students? I am curious, do you have any kind of machine learning as a subject as part of your course? Okay, that's good. And from the working professionals, I wanted to get a feel for how many of you are actually working with machine learning technologies in your own domain. Okay, good to know.

So let's begin. Today I am going to talk about what text analytics is and introduce you to some of the challenges in it. I will also go over some common text analytics tasks, what we need to do if we want to get some of these things going. And then I will take up some real examples of how we have used it, a couple of use cases we have developed where text analytics helped us.

So what is text analytics? This is the Wikipedia definition: a set of linguistic, statistical and machine learning techniques that model and structure the information content of textual sources for business intelligence, exploratory data analysis, research or investigation. There are a lot of keywords packed in there, and you might find it difficult to parse. A simple way to look at it is: you are deriving some kind of information from large amounts of text data. And just by looking at the amount of data that is out there, text probably occupies most of it: beyond the transactional data sitting in databases, which is mostly structured, there is a huge amount of text information out there. That is why this area is gaining a lot of traction, a lot of interesting use cases are coming up in this field, and that is why it is becoming more and more popular.

I would also like to quickly point out some differences from related terms that are also in circulation. First, let me briefly describe how this is different from, say, search. In search, you are specifically looking for something, and your main task is to find the things that are relevant to you and discard those that are not. But in text analytics, or text mining, you may not be looking for anything in particular; often we do not know ahead of time what we are looking for. So that is different from a search problem, where we know what we are looking for. The other differentiation I want to make is with data mining, which we have known and practised in the past, and which has been more around structured data. We have data in databases and data warehouses: we collect a large amount of data, put it into a defined form, and then do all kinds of analytics and reporting on that.
But the prime difference there is that it is structured: everything is defined before you pull the data in, and then you run all your analytics on top of it. That is the key difference in text analytics: we are primarily looking at text as an unstructured source of data. And that is the key challenge, because it brings about a lot of interesting problems.

So, moving on, let's look at the challenges in text analytics. The first one I already mentioned: the unstructured nature of text. We are collecting data from web pages, from documents, from knowledge management systems. People have simply written text in free form, and there is no real structure around it; yet these are the data sources we collect from and have to make something useful out of. So a big part of text analytics is first putting some amount of structure onto this whole unstructured mass of data.

Another common problem, somebody mentioned it in the previous presentation as well, is that in the case of text data you often have the same word meaning totally different things. "I went fishing for some sea bass": bass is a kind of fish, whereas the bass line of a song tells me something about the frequency of the notes in the music. Conversely, there are many ways to represent similar concepts: several different words can mean the same thing, so the same idea can be phrased in many different ways. That is another problem. Then there is visualization: we have lots of tools for classical data mining, the pie charts, the bar charts, the usual models, but they need not fit the problems you are looking at here; you may need different, new visualizations to draw inferences out of text. And then there is high dimensionality, with hundreds of thousands of features. This will come up when we discuss representations as well: when we try to impose some structure on the unstructured data, we can end up with very high-dimensional data, and processing that is itself a challenge.

So let's say we have unstructured data and we want to work on it, run our algorithms on it and so on. We need some kind of basic structure. How do we structure text so that it gets into a form we can process? That is the idea behind asking what different representations of text there are that will help us do the analysis better. Roughly, there are three levels at which we can think of representing text: the lexical level, the syntactic level and the semantic level. What do we mean by this? Let's quickly look at each.

First, text representation at the lexical level. At the most basic level, what is text? It is just a sequence of characters, right? If we treat text as nothing but a sequence of characters, we can already use that representation to learn something. For example, in the English language, "e" is probably the most frequent character; if we count the characters in a large amount of English text, that is the kind of insight we get. What could be an interesting way of using that information? If you are in the business of doing compression, exactly this kind of frequency information is useful: a compression scheme can assign shorter codes to the most frequent characters. So that is one way of viewing text: as a sequence of characters.
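To make the character-level idea concrete, here is a minimal sketch (in Python, which the talk itself does not use; the sample sentence is made up) that counts character frequencies the way a compression scheme might:

    from collections import Counter

    # Hypothetical sample text; any large English corpus would do.
    text = "Text analytics turns unstructured text into something we can analyse."
    # Count individual letters, ignoring case, spaces and punctuation.
    freq = Counter(ch for ch in text.lower() if ch.isalpha())
    # Over a large enough corpus, 'e' usually tops this list,
    # which is exactly the kind of fact a compressor exploits.
    print(freq.most_common(5))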
But the most common unit we are used to is words. When we think of text, the first thing we break it down into is words. Just by glancing at a large document you cannot infer anything; what you try to do is quickly pick out the key words being used in the document and work out what they mean there. Immediately you say: this document is talking about such-and-such, because I could find these key words in it. So this is the most common representation, and it is what most of the tools use as well. There are a lot of tokenizers available which will take your text and break it down into words, giving you a form you can use for your processing. And there are properties around words, such as how frequently they occur, which also play a big part in the analytics. So breaking text down into, or rather viewing text as, a set of words is the most natural thing that comes to mind, and it is what most approaches do.

The next step is, instead of looking at just one word at a time, to think about phrases. What are the most commonly occurring sequences of words? A single word might occur frequently, but two words together, or three words together, can bring out something more than one word at a time. These are called n-grams: a bigram is a combination of two words, a trigram is a combination of three words, and you can choose how far you want to go. So those are the kinds of phrases we can make use of when we are representing text for processing.

And then there is something called parts-of-speech tagging. You might have broken your text down into words, but now you add some extra meaning to each one as well: whether this word represents a noun, a pronoun or a verb, because that kind of information, attached to the word, can be used. For example, suppose your task is to find all the key persons a piece of text talks about. If I have the information that a word is tagged as a noun, I can use that to infer that these are my candidate words for person names when I am looking for persons in a piece of text. So those were some representations at the lexical level.
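As a rough illustration of those lexical-level representations, here is a short sketch using Python with NLTK (my choice of library for the example; the talk does not prescribe one, and the sentence is invented):

    import nltk

    # One-time model downloads for the tokenizer and the POS tagger.
    nltk.download("punkt")
    nltk.download("averaged_perceptron_tagger")

    text = "Chris Jones reported a problem with the SharePoint server."
    tokens = nltk.word_tokenize(text)        # words: the most common representation
    print(list(nltk.bigrams(tokens))[:3])    # phrases: n-grams with n = 2
    tagged = nltk.pos_tag(tokens)            # parts-of-speech tags per word

    # Proper nouns (tag NNP) are the candidate person/asset names.
    print([word for word, tag in tagged if tag == "NNP"])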
The next level is the syntactic level. I think of it as: what are the data structures we can use to do our text processing? In ordinary programming we are used to vectors, maps, those kinds of things, and when we want to process a large amount of data, we need the help of data structures. Similarly, in text analytics, the following can be thought of as the data structures you need.

First, the vector space model. This is, I think, a very commonly used model, and the underlying idea is a bag of words. Basically, you think of text as a collection of words. Not a sequence, actually, because a sequence implies order; a bag of words has no order, it is just a collection. So a document is nothing but a collection of the key terms that appear in it, and you represent the document as a vector with some number of components. That is the vector space model, or bag-of-words model, which is used in a lot of clustering, classification and similar algorithms: you first take the text, convert it into a bag of words, a vector, and then feed that as input to those algorithms. So that is one kind of representation for text processing.

The other is language models. Say you have the task of identifying which language a piece of text is written in. To do that, you would rely on some knowledge of which pieces of text are typical of each language: you talk about the probabilities of pieces of text appearing together, use that to build a model, and then use the model to identify the language.

The third is parsing. Take SQL, for example: SQL is a language, a structured one, with its own grammar, and you can parse it and build a parse tree out of it. Once you have that representation, you can use it to generate your query plan and things like that. In the same way, if the text you are dealing with has some grammar and you can parse it into a tree or model, you know you can do something interesting with it. So that is another kind of structure you can work with.

And then the next level is the semantic level, where you tag the text to give it more meaning. Instead of just having a raw document, you add meaning by tagging it, adding metadata to it, so that it can be used in further processing. Or you can apply techniques for commonly occurring things: think of a template as a kind of regular expression that you try to find in the text. If you can match that template, you can infer that this text is talking about a certain kind of thing, because there is a template that was built precisely to identify that piece of information.
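For instance, a template for one commonly occurring kind of sentence could be written as a regular expression. A minimal sketch (Python; the incident phrasing and fields here are entirely made up for illustration):

    import re

    text = "Incident INC-20419 was raised by Chris Jones on 2013-07-12."
    # The template is a pattern with slots; a match tells us what the
    # text is about and hands us the pieces it was built to identify.
    template = re.compile(
        r"Incident (INC-\d+) was raised by ([A-Z]\w+ [A-Z]\w+) on (\d{4}-\d{2}-\d{2})"
    )
    match = template.search(text)
    if match:
        incident_id, reporter, date = match.groups()
        print(incident_id, reporter, date)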
So a template is one way of quickly identifying what a piece of text is talking about. And then there are other things, such as building your own representations. For example, if you are working in a machine learning kind of domain, you might want to say: machine learning consists of supervised and unsupervised learning; supervised learning has these kinds of problems, unsupervised has those. So for your domain, you build some kind of hierarchical representation of terms, and then use that in your algorithms.

Someone in the audience observed that understanding the domain would be very important, because that is what defines the context. Exactly; that information comes from the domain, not from the raw, unstructured text itself. The algorithms, as such, are not very different from domain to domain. But if I have to make use of the fact that these terms are alternatives for one another, or that a word used one way in my domain is used differently in yours, the domain model is what helps, because in my domain I will have defined that this word is part of this bigger concept, or something like that. So that domain-specific aspect gets brought in by having such a hierarchy.

Someone also asked how this relates to natural language processing. Text analytics, as I said, is broadly about deriving information from text. Natural language processing is one of the areas, one family of techniques, mainly concerned with how words are related to one another: where in the sentence they occur, what their meanings are, the semantics associated with natural language. If your problem requires that, say you are asking me a question and I am building an automated answering system, then I need to interpret your question, understand whether you are talking about A or B, make sense of your query; those kinds of tasks will require natural language processing techniques. A clustering task will not. So NLP is one class of techniques you need for one class of problems.

What we saw until now was about bringing some structure to the text, which will help us carry out tasks. So now, the common tasks in text analytics, which we will also look at in the case studies as the techniques we used to solve our problems. By doing the things I talked about before, converting text into some data structure, adding some structure to it, we create the input that goes into these kinds of tasks.

First is information extraction. I have a slide explaining each of these, but in short, information extraction revolves around finding the important information contained within text. It could be identifying persons, locations, organizations, products, that kind of thing. If you can get that information out, you can easily tag the document with it; and if it was search you were doing, you can then return results that match those extracted terms. The picture shows the simple process of how it is done. It is not complete, but these are the main steps: take the text, tokenize it to get individual words, then apply part-of-speech tagging. If I am looking for persons or locations, those will most likely be nouns or proper nouns, so if I have POS tags associated with the words, I can take out the entities. That is the main flow. We have used it in one of the use cases, and I will explain that later.
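A minimal sketch of that tokenize, tag, extract flow, using NLTK's off-the-shelf chunker (again my choice of library for illustration, not necessarily what the speaker's team used; the sentence is invented):

    import nltk

    for pkg in ("punkt", "averaged_perceptron_tagger", "maxent_ne_chunker", "words"):
        nltk.download(pkg)

    text = "Chris Jones from BMC reported an outage on the SharePoint server in Pune."
    tokens = nltk.word_tokenize(text)   # step 1: tokenize
    tagged = nltk.pos_tag(tokens)       # step 2: part-of-speech tagging
    tree = nltk.ne_chunk(tagged)        # step 3: pull out named entities

    for subtree in tree:
        # Entity chunks are subtrees labelled PERSON, ORGANIZATION, GPE, ...
        if hasattr(subtree, "label"):
            print(subtree.label(), " ".join(word for word, tag in subtree.leaves()))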
Another common task is document similarity, and this is actually the basis of a lot of the problems you need to solve. The basic question is: how do we know whether two documents are similar or not? Remember representing a document as a vector of words; that concept is really useful in this kind of problem. First of all, if you want to find the similarity between A and B, represent A and B as two points, say in a two-dimensional space. The distance between the two points is then a measure of similarity: if two points are close enough, they are similar; if they are far apart, they are not. That is the logic. A point in 2D is just a vector of two attributes, x and y, and the cosine of the angle between the two vectors is the similarity; the distance function used here is cosine similarity. Think of it as: I have taken each document and created a vector out of it, so the documents become points in an n-dimensional space. The example on the slide is three-dimensional: every document is a vector with three terms, and I am trying to find out how close each one is to the others using that distance function. So that is the idea behind document similarity.
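A from-scratch sketch of cosine similarity over bag-of-words vectors (Python; the two "documents" are invented one-liners, and real use would add the preprocessing described next):

    import math
    from collections import Counter

    def cosine_similarity(doc_a: str, doc_b: str) -> float:
        """Cosine of the angle between two bag-of-words count vectors."""
        a, b = Counter(doc_a.lower().split()), Counter(doc_b.lower().split())
        dot = sum(a[w] * b[w] for w in set(a) & set(b))
        norm_a = math.sqrt(sum(c * c for c in a.values()))
        norm_b = math.sqrt(sum(c * c for c in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    print(cosine_similarity("cannot reset my password",
                            "password reset is not working"))  # close together
    print(cosine_similarity("cannot reset my password",
                            "laptop screen is flickering"))    # far apart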
But as I said, converting the text into that bag-of-words representation itself involves a lot of preprocessing. For example, I tokenize the document and remove all the stop words. In some cases there is cleanup even before tokenization: if you have a lot of email data, the To, From and CC lines, "regards", "thank you", disclaimers, appear in almost every message, along with standard form fields you do not need, so you strip those out first.

Stop-word removal is one of the most common steps: common words that do not give you any information, a lot of the glue of the English language, those are stop words. And depending on what you are doing, you can define your own. For example, in our domain there are words that appear in almost every ticket and add no value, so depending on your domain you might add such words to your stop-word list. So there is stop-word removal, and then there is stemming. Stemming reduces words to their base form, so that documents where one says "write", one says "writes" and one says "writing" are all recognized as talking about the same thing.

And then there is TF-IDF. Let's say that after tokenizing a document I end up with a hundred keywords, and I only want to consider the top ten. TF-IDF is one technique you can use to select the most important words in your document, the ones you will keep in your bag of words. TF-IDF stands for term frequency, inverse document frequency. Any term that occurs very frequently within a document is of value, but any term that occurs frequently across all documents is of lesser value. That is the inverse-document-frequency part; the term-frequency part is simply how often the word appears within the document. Note that computing the inverse document frequency means you need access to your whole collection of documents.

So that is document similarity, and it forms the underlying technique used in clustering. Clustering is a key task: you have a collection of documents and you want to quickly group them into topics, into some kind of groups. To carry out clustering you again rely on document similarity, because you want to form groups of documents that are similar. We looked at how two documents are compared; clustering extends that to say that documents similar to one another belong to one cluster, and those far apart end up in different clusters.

Someone in the audience brought up a document management problem: when you store a huge number of documents, retrieval becomes hard, and they asked whether machine learning could manage documents on a plain file system more efficiently than document management software. One way to think about it: you extract the key terms that represent each document, so you can say this document is really about these three things, and you index the document by those terms. Then you put the whole document into some cold storage, and only when a search matches the indexed terms do you go and fetch it. That is essentially what search engines do: they index documents so that, for your searches, you retrieve the full document only when it is needed.
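Putting TF-IDF and clustering together, a minimal sketch with scikit-learn (one reasonable toolkit choice for the example, not the batch tooling named later in the talk; the six incident texts are invented):

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.cluster import KMeans

    incidents = [
        "cannot reset my password",
        "forgot password after vacation",
        "laptop will not boot",
        "laptop battery drains quickly",
        "cannot access the shared drive",
        "no access to the VPN from home",
    ]
    # TF-IDF turns each incident into a weighted bag-of-words vector;
    # common English stop words are dropped during vectorisation.
    vectors = TfidfVectorizer(stop_words="english").fit_transform(incidents)
    # The number of clusters is a starting guess, refined iteratively.
    km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(vectors)
    print(km.labels_)  # e.g. a password group, a laptop group, an access group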
Someone then asked: how do you define the number of clusters? Defining the number of clusters is itself a challenging problem, and I do not think there is a single right answer. One of our earlier speakers mentioned a heuristic they use for picking it. If you do not have a good starting point, one way is to begin from a rough guess: say you have a thousand documents and you would roughly like clusters of about a hundred documents each, so you start from there. Do the first run, and let the results drive you: if I find I have got lots of clusters with only one document each, maybe I need to reduce the number of clusters. So iteratively you can arrive at a number that matches your data. Yes, that can be human interaction, or it can be rule-based: you can say, I do not want clusters that have only one document; at least give me clusters of five to ten documents. The loop itself can then run iteratively until it finds a number of clusters that fits your rules. That is what we were saying: automate what the user would have done by repeatedly applying some rules.

So clustering was the technique where we group similar documents together. The next one is categorization. Clustering, as we saw, is the unsupervised class of machine learning: we did not have anything historically built up, learned over time, that we made use of. Categorization is of the supervised learning type, where we make use of documents that have been tagged before by humans. The classic example is spam: users mark messages as spam, and the mail provider sees that so many users have marked these messages, so it builds up a whole training set of documents that users have labelled as spam. The next time any new message comes in, it classifies it using a model built from that history of user-tagged documents, to decide whether this new message is spam or not. So the supervised class of algorithms makes use of learning that has happened over history to build some kind of model, and then uses that model, when a new document comes in, to predict its class. That is what the diagram shows: we take the training dataset of text, use it to create the model, and when new documents come in, use the model to make a prediction.
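A minimal sketch of that supervised flow for the spam example (scikit-learn's Naive Bayes; the four training mails and labels are obviously made up):

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB

    # The training set: mails that users have already marked.
    mails = ["win a free prize now", "claim your lottery prize today",
             "meeting moved to 3 pm", "please review the attached report"]
    labels = ["spam", "spam", "ham", "ham"]

    vectorizer = CountVectorizer()
    model = MultinomialNB().fit(vectorizer.fit_transform(mails), labels)

    # A new mail comes in; the model predicts its class from history.
    print(model.predict(vectorizer.transform(["free prize waiting for you"])))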
Spam is one example; we have used this in different settings, which I will explain in a bit. Some of the algorithms that do this kind of classification are Naive Bayes and support vector machines. And some of these also rely on converting your text into the bag-of-words representation; the algorithm takes that as input and then does the rest of the processing.

Another common task is text summarization. The aim is: you have this huge document and you do not want to read it completely, so give me a short summary of it. There are two main approaches. One is what is called selection-based, where you select only parts of the document; typical examples are the title, or the first sentences of the paragraphs. Those can be quick ways to pick out the important things in a document and present them. The other way is to use more involved techniques, some semantic analysis; I have not gone deep into those here. But summarization is also an area where text analytics can help, giving you a short summary instead of your having to go through a long document.

With that, I will move on to the use cases. Before that, someone asked where the problem of creating an abstract of a document fits. I would put it under the summarization category: information extraction is when you are looking for specific key terms in a document, whereas an abstract gives you the gist of its main content, so it falls under summarization.

Okay, so on to the first use case. ITIL is an IT service management framework which IT organizations follow, and BMC is a practitioner of it. Under ITIL there are a lot of sub-areas covering different spaces, and incident management happens to be one of them. Incident management is about raising incidents, tickets, queries, support calls, those kinds of things, and the whole lifecycle of how those incidents move from creation to resolution. BMC has products around incident management which people use a lot. Now here is the problem: if you look at an incident, it can have hundreds of attributes associated with it. There are your categorization tiers one, two and three, your product category tiers one, two and three, and then lots of other information, like who it is assigned to and when it was created. So to create an incident in the system, say for a service desk agent who is taking a phone call from a customer reporting a problem, you might have to fill in a hundred-odd fields. That in itself is a challenging task, and many of those are category fields where you have to select from many different categories.
And every time, this person is not going to remember: okay, this problem should go to this category. Many times they do not even fill in the information; they just record the problem and let somebody else fill in the category information later. That actually delays the solution of the problem, because unless it is categorized it cannot be assigned to somebody, and only once it is assigned will it get resolved. So all kinds of problems occur, just because the incident was not entered correctly.

So we are trying to come up with an option where you do away with filling out these long forms, and also give the service desk agent more power to try to resolve the incident then and there, instead of even having to create an incident. That is the value proposition we are looking at. And in this case, what we are making use of is the information extraction I talked about. As the customer is describing a problem, I am picking out key information: who is the person or customer reporting the problem; what asset are they talking about: is it a service that I am supporting, some kind of laptop or device, or a server that I know is an asset, which I can tag on the incident at creation itself to drive a faster resolution. Those are the key entities I am looking for. If I can extract those and create the incident, it will get resolved faster, because these are the key things that decide who this incident should be routed to and what category it should be assigned.

So, the solution. As I said, the problem is one of entity recognition: if we find the correct entity, we can show you additional information about it, and that might enable faster resolution by the support call agent themselves; at the very least, the incident gets created with much richer information. I will give you a quick demo. We have both the exact match and the fuzzy match shown on this slide. Imagine you are the service desk operator receiving the call, and you type in what the customer is telling you. Say Chris Jones is calling. As soon as I type "Chris J", the fuzzy match kicks in and tells me: we have two people that could be, Chris Jones and Chris Johnson. I can quickly confirm which one it is. That in itself is a real value-add: you have recognized the right person. Had I just typed "Chris" and left it, somebody later working on the problem would need to find out who we are talking about. Someone asked what the type-ahead uses; the matches are served from an index with caching, so the suggestions come back immediately. So as I am typing the problem description, I am getting this kind of help: it found one person as an exact match, typed as a person entity. Here all I am showing is that it is a person, but in reality you might show much more: he is a developer on this team, what his organization is, everything that you have from your identity management system; you could extract that information and show it here.
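The type-ahead behaviour can be approximated with a simple fuzzy string match; a toy sketch using only the Python standard library (the names are fictitious, and a real service desk would match against an indexed directory rather than a list):

    import difflib

    people = ["Chris Jones", "Chris Johnson", "Christine Jacob", "Maya Rao"]

    typed = "Chris J"  # what the agent has typed so far
    # Suggest the closest known names; both Chris Jones and Chris Johnson
    # rank ahead of the weaker matches here.
    print(difflib.get_close_matches(typed, people, n=3, cutoff=0.5))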
Next I type SharePoint. SharePoint is probably a service that I am supporting, but maybe it is SharePoint 2007, 2010 or 2013: there are three versions of SharePoint I support. So that comes up as a fuzzy match: which one exactly are you talking about? Because had I left it ambiguous, the person solving it would have had to come back and ask this question: which version is he using, SharePoint 2007, 2010 or 2013? So the agent sees the fuzzy match and clarifies it on the spot.

Someone asked about Mahout here. Mahout gives you clustering, classification and those kinds of algorithms. The typical use case for Mahout is when you have a large collection and a batch kind of process, not real time, because it needs the corpus as a whole; the input typically goes in as text files, with the documents accessible to it. So if your use case is batch clustering or classification over a large collection, you would use Mahout; beyond that it depends on the use case.

So this is one part, where we have extracted the entities; we are also quickly telling you what the entities are and giving you more information around them. Right at problem entry, if there are known issues, we surface them, and we also give suggestions. This is the document similarity I was talking about: it looks like the problem you are describing matches a known issue, so here I can present at least a possible solution right away. If not, I can just say: apply the known-issue template. If I apply that template, the categories, the assignment and all those fields are taken care of for you; you just hit the create button, and that is it. So that is how some of these incidents get created: document similarity and entity extraction are being used here to do auto-filling of fields and to suggest resolutions online.

As for the tools: UIMA, the Unstructured Information Management Architecture, though people just call it UIMA, is the framework here, and OpenNLP gives you named entity recognizers; we have used both of them. Now for the next case study. The one I just talked about was incident management, where we looked at how incidents are created and the whole lifecycle of those incidents.
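To sketch the known-issue suggestion, that is, document similarity against past incidents, here is a small illustration with scikit-learn; every incident text, knowledge-base ID and resolution below is invented:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    past_incidents = ["outlook crashes when opening the calendar",
                      "cannot connect to the sharepoint 2010 site",
                      "vpn drops every few minutes"]
    resolutions = ["KB-101: repair the office profile",
                   "KB-205: re-add the site to the trusted zone",
                   "KB-317: update the vpn client"]

    vectorizer = TfidfVectorizer(stop_words="english")
    past_vectors = vectorizer.fit_transform(past_incidents)

    new_incident = "sharepoint site will not open"
    scores = cosine_similarity(vectorizer.transform([new_incident]), past_vectors)[0]
    best = scores.argmax()  # the most similar past incident
    print(past_incidents[best], "->", resolutions[best])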
Problem management is the area where you look at what your incidents are telling you, what you can learn from them. A typical setup in problem management has a role called the problem manager. What the problem manager has to do is keep a constant watch on what incidents are being reported, what people are complaining about, and whether there are any patterns to act on. He takes those patterns and figures out: these are the problem areas, these are the hotspots, because this is what people keep reporting. So you are doing a kind of after-the-fact analysis of all the incidents you have been getting, to say: I see a lot of people talking about, say, video issues, so we need to do something about it. Then you go to your IT team and say, I am getting a lot of incidents around this, please fix it. Or you go to your product team and say, this feature is not usable, people keep telling me this is a problem. So there is a feedback loop that runs through the different teams, but the problem manager has to constantly keep watch over the complaints.

And usually this is manual processing: only through his experience can he say, last week this problem was reported and this one too; this week the first has not recurred, but this new one has surfaced. He has to keep track of all of that; only then can he identify those problems and pass them on. So today this is a very manual task, and it depends a lot on the skill of that problem manager. He needs help: he cannot simply go over stacks of tickets and come out saying, from these incidents reported, these are the problems.

So for him, the solution is: use text clustering. Take the incidents. As I said, many incidents do not even have the category fields filled in, so I cannot really just say, give me all the incidents by category. That would be the easiest way to determine problems, but in practice that information might not be available. So the next best thing is to use just the text of the incident, what the incident description says, and find incidents that are similar. Create groups of similar incidents, and the larger groups are your immediate problems: if I get a cluster with a thousand incidents in it, that is where I need to go and look. So your problem manager now gets a quick view of what he needs to report: what the hotspots are, what problems are coming in at his end.

On the top, what you see is also a difference in visualization requirements: just looking at this, the size of a bubble is telling you, go look at this one first. In our own IT system, people report issues, so we ran this on that data, and it turned out a lot of people were having problems with their laptops, a lot of people were having trouble accessing something, and there were a lot of password issues.
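One plausible way to produce that bubble view is to label each cluster with its heaviest terms and size it by ticket count; a sketch along those lines (scikit-learn again, with invented tickets):

    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.cluster import KMeans

    tickets = ["forgot my password", "password reset link broken",
               "laptop fan is very loud", "laptop will not start",
               "need access to the payroll share", "access denied on shared folder"]

    vectorizer = TfidfVectorizer(stop_words="english")
    X = vectorizer.fit_transform(tickets)
    km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

    terms = vectorizer.get_feature_names_out()
    for i, centre in enumerate(km.cluster_centers_):
        top = np.argsort(centre)[::-1][:2]      # heaviest terms in the centroid
        size = int(np.sum(km.labels_ == i))     # cluster size drives the bubble size
        print(f"cluster {i} ({size} tickets):", ", ".join(terms[j] for j in top))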
So this view gives you an indication of what problems people are talking about and where you need to invest. Say you have two people available to solve these problems: put both on this one, because it is the biggest cluster, and solving it has the most impact. Or it gives you ideas: can I automate this password problem, since most people are just saying "I forgot my password"? So you build a self-service password reset. You take action on the problems you are seeing most commonly. That is the service we built for the problem manager, and the text clustering technique is what is used here to quickly highlight the problems.

And why stop there? I also ran this clustering on all the talks proposed for Fifth Elephant: I went to the Fifth Elephant page, took the titles of all the submitted topics, and ran the clustering on those. So now, say I am attending this conference and I am interested in real-time stream processing: I can quickly see there are four or five real-time talks grouped together. Or somebody interested in Solr and Elasticsearch can quickly come to that group and decide, maybe I will go and attend these talks. So you can quickly see what is interesting to you; or, if you are arranging the tracks, you get some insight: if you want a real-time track, take these two groups and you have a pretty good track. That was one example. For the clustering itself there are toolkits available, open source as well as commercial, and the visualization you saw was a force-directed visualization.

Then we come to the next case study, the last one, which is again around incident management. The problem we are looking at is what happens when you raise a ticket. Again, somebody reports a problem either over the phone to an analyst, or people just fire off an email saying something is not working. When such problems come in, how do they get assigned? As I said, categories are the commonly used mechanism: a given category maps to a given resolution group. But categories are not reliably filled in, or there are hundreds of categories, and then there is this whole discussion about whether this really is the right category; those kinds of problems are there. So what we looked at is: can we use text classification as a technique to solve this problem? The idea here is: in my system, I have lots of incidents that have been reported in the past; they have been assigned to different groups and resolved by them. So that becomes my training data. I use this training data to create a classification model: for all the past incidents, I know the text, and I know the group they were assigned to. So when a new ticket comes in, I apply it against the model my classifier has built, and it tells me this ticket should be assigned to this group.
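A compact sketch of that routing classifier (a scikit-learn pipeline with a linear SVM, one of the algorithm families mentioned earlier; the tickets and group names are fabricated):

    from sklearn.pipeline import make_pipeline
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.svm import LinearSVC

    # Historical tickets and the groups that resolved them: the training data.
    texts = ["mail server not responding", "cannot send email",
             "payroll numbers look wrong", "salary slip missing",
             "wifi keeps disconnecting", "network drive unreachable"]
    groups = ["messaging", "messaging", "finance-apps", "finance-apps",
              "network", "network"]

    model = make_pipeline(TfidfVectorizer(stop_words="english"), LinearSVC())
    model.fit(texts, groups)

    # A new ticket arrives; the model suggests the group to assign it to.
    print(model.predict(["unable to send or receive mail"]))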
So this assignment problem, which as I said can take days in some cases, could actually be automated, or you could at least help the person doing the assignment to make that decision quickly. That is the idea: apply text classification as a technique for the problem of assigning incidents to the right people. In the demo, a new incident comes in, and this component runs it against the model built from the historical tickets; it is essentially the classification problem, and the component says: this is the suggested group this ticket should be assigned to. And if the prediction is not correct, the person can correct it; there are classification algorithms that support that kind of incremental learning, so the correction feeds back into the model. That is available for some of the classification algorithms we have used. But the idea is: we are basically using text classification as a technique to do the ticket assignment.

So that is what I had to cover. If there are any questions, we can take them now; I know it is the last talk of the day and people want to head home. Someone asked about the UI: the UI for the prototype is a JavaScript thing, based on D3. Someone else asked: when you store the data, do you also store some kind of structure around it, like a graph? Is that for a specific problem out of the three, or in general? In general, it depends on the use case. If you have a search kind of problem at hand, indexing is your main usage, so you would put a search index in front of it. If lookup by key and caching is your main requirement, you would go to Redis or those kinds of key-value stores. Document-oriented databases are built for scale, but not for transactions. So depending on the use case, within the NoSQL world you have different kinds of stores, key-value, document-oriented, column-oriented, and you would choose the appropriate store for what you want to retain.

Then there was a question: for a particular use case, is there a known pattern, a stack of technologies you would use to solve it? I would say, once you understand the basic techniques, you look at the use case. In this use case, I am trying to identify the key problem areas; that means creating groups of documents that deal with the same problem, so a technique like clustering maps to it naturally, because that is exactly what clustering does. So I have a set of techniques available to me at hand.
Depending on my problem, if I can map it to one of those techniques, that is how I would pick one; or at least I can try it and see whether it works. Someone asked whether we combine multiple of these techniques, for example across different input channels. Yes, and that is the direction we are moving in: whether the problem comes in as free text or over a phone call, you want to understand what somebody's problem is without a person having to triage it. These are not shipping features yet; what I showed are prototype solutions for those scenarios. So, as I told you, these techniques are quite basic, and we have applied them to our domain; but you can take these same techniques and apply them in your own organization. That is the core idea: learn the techniques so that you can take them back and use them in your own way.