Hello, can everyone hear me? All right, great. Thanks to the organizers, Sumod and Zainab and everyone, for inviting me; it's a real honor to be speaking here. I'll be talking about knowledge graphs, why we should care about them, and their relevance for AI. As was mentioned, I'm from the Indian Institute of Science (IISc), Bangalore. Earlier this year I also founded a company called Kenome, and time permitting I'll say a little about that towards the end.

Before we get there: if I had to summarize the research thesis of my group at IISc, it would be that background knowledge is key to intelligent decision making. With advances in AI and machine learning, we are asking machines and intelligent agents to do many non-trivial tasks on our behalf that humans have traditionally done, from simple things like how to lift this particular phone from the table, all the way to how to invest in the stock market. To make those decisions, we humans draw on our knowledge and context about the world, and that knowledge is critical to doing these tasks well. For example, to decide how much force to use to lift this phone, I already have some sense of how much it weighs, so I apply an appropriate amount of force. We accumulate that kind of knowledge by interacting with the world from birth, maybe even starting in the womb, and we keep accumulating it until we die. Intelligent agents, however, don't have access to that kind of knowledge and context. That seems a little unfair, because we are asking them to do the same tasks without the shared context and knowledge that are critical to doing those tasks well. So our research goal is to bridge that gap and make broad-coverage world knowledge and contextual knowledge available to AI and machine learning agents at scale.

To drive the point home and make it concrete: say I don't have time to go through ten different news sites in the morning, and I want a system that summarizes the news for me. Suppose the system comes across a sentence like "State Farm stocks tumble along with Berkshire Hathaway." To make sense of this sentence, we need to know that there is a concept called company, two of whose instances are mentioned here: one is State Farm, the other is Berkshire Hathaway. There is something called a stock market where company stocks are traded; there are prices associated with those stocks, and those prices vary, even on a per-second basis. All of that background knowledge is important for understanding the sentence, yet none of it is actually mentioned in the sentence itself. And that's by design of language: we use language for communication, so we only convey things we think the recipient doesn't already know; otherwise it would be very redundant and very boring. But even with all of that background knowledge, why there might be a correlation between the stock prices of these two companies is not immediately clear. Sorry, let me just change this pointer.
The laser is not working here; just give me one second. Oh, I guess the laser works, but the screen doesn't. Okay, anyway. So if we want to understand why there might be a correlation, that's not immediately clear from the sentence alone. But suppose we have some structured knowledge which says that Berkshire Hathaway, a company mentioned in this sentence, has a subsidiary called Heinz, which you may have seen on supermarket shelves (those tomato ketchup bottles), and that Heinz is insured by State Farm, which is also mentioned in this sentence. Then one plausible explanation for the correlation between the prices of these two stocks could be that Berkshire's stock price is down because something bad has happened to its subsidiary, and the subsidiary's losses will get pushed onto the insurance company, so the insurer's prospects also don't look great and its stock price is marked down too. Our interpretation and analysis of this sentence is probably a bit deeper now that we have access to this kind of structured knowledge than it was thirty seconds ago, when we didn't. Are you with me on that?

This multi-relational graph you see is actually a fragment of a much bigger graph, and that is what we call a knowledge graph: the nodes are entities and objects of interest, and the typed edges are the relationships that connect those entities. Our thesis is that if we can build these knowledge graphs at scale and make them available to AI and machine learning agents, their performance will improve significantly. Our research focuses on constructing, maintaining, and applying this kind of knowledge. Are you on board with the motivation? Okay, great.

Knowledge graphs are not some abstract academic pursuit; in fact, all of us are beneficiaries of knowledge graphs already. When you search for something on Google, you may have seen this kind of panel on the right-hand side; hopefully everyone has. Here I searched for Accenture, and it quickly tells me many things: their CEO, where they are headquartered, their revenue, and so on. Now I don't have to go and read their Wikipedia page and five other pages to get the same information, so it improves my web search experience. These panels are called knowledge panels, and they are served out of a knowledge graph, of the form I showed you before, that Google has built internally. This is one of many, many use cases of how knowledge graphs can play a role in improving our lives.

The NELL project at CMU, where NELL stands for Never-Ending Language Learning, is an ambitious project that tries to read web documents on a continuous basis. Those documents are written in natural languages like English, Hindi, or Spanish; NELL currently focuses on English. The goal is to read this entire corpus of English web documents and build a structured knowledge graph, a fragment of which is what you see here, built by NELL running for more than eight years now.
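To make the data structure concrete, here is a minimal sketch in Python. This is my own toy illustration, not NELL's or Google's code, and the relation names (isA, hasSubsidiary, insuredBy) are invented for the example. It represents the sentence's fragment as typed triples and walks two hops to surface exactly the Berkshire-Heinz-State Farm connection that the sentence itself never states:

```python
# A knowledge graph fragment as a set of (subject, relation, object) triples.
# Entity and relation names are illustrative, not from any real KG schema.
kg = [
    ("Berkshire Hathaway", "isA", "Company"),
    ("State Farm", "isA", "Company"),
    ("Berkshire Hathaway", "hasSubsidiary", "Heinz"),
    ("Heinz", "insuredBy", "State Farm"),
]

def neighbors(entity):
    """All (relation, object) edges leaving an entity."""
    return [(r, o) for s, r, o in kg if s == entity]

# Two hops from Berkshire Hathaway reach State Farm via Heinz --
# the background knowledge that explains the price correlation.
for rel1, mid in neighbors("Berkshire Hathaway"):
    for rel2, end in neighbors(mid):
        print(f"Berkshire Hathaway -{rel1}-> {mid} -{rel2}-> {end}")
```

The point is that a two-hop traversal over typed edges recovers an explanation that no amount of staring at the raw sentence could.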
NELL has been running since January 12, 2010. What you see here are the nodes, the objects: the Toronto Maple Leafs, whose hometown is Toronto, who play hockey, and who have won the Stanley Cup. I was part of that project before joining IISc, and we continue this line of research in my group here. I won't go into the details of how NELL works internally, because that's a whole other talk on its own; instead I'll talk about some of the more recent work in my group at IISc that follows this line of research. But if you're interested, you should go and visit rtw.ml.cmu.edu, where RTW stands for Read the Web, which is about as ambitious as it sounds, but that's what NELL is tasked with. Those of you on Twitter can also follow NELL at the handle @cmunell. There are a few thousand followers; we hope most of them are humans, but on Twitter you never know. NELL tweets things it thinks are interesting from time to time, facts and edges and beliefs from the graph, and those followers, people or agents, whatever you want to call them, give feedback, both positive and negative. I was just having a conversation with someone before the talk, and I think there's a very interesting open question here: how do you exploit this communication between an AI agent and a few thousand human followers to improve the AI agent's own performance? I think that's a largely open question.

Okay, so there are two views of knowledge. One is the view I have been motivating all along, the knowledge graph view, which is symbolic: it's easy for us to consume and highly interpretable. On the other side, with all the advances in deep learning and representation learning, we have dense representations, or embeddings, of the same objects, the entities and relationships, not as nodes and edges in a graph but as points in a high-dimensional space. In this particular case, GM, Toyota, and competes-with are two entities and one relation, and we embed them using what are called graph embedding techniques, which I'll mention later in the talk. These spaces might be 100- or 300-dimensional, projected down to two dimensions for readability. We see this as a spectrum: on one side you have symbolic, interpretable representations; on the other side, embeddings may be good for inference, but they are not very interpretable, since you don't know what's really happening in each individual dimension. Rather than choosing one side or the other, we're interested in how we can go back and forth between these two types of representation, because each has its own strengths and weaknesses, and in this talk you will see some instances of going back and forth.

Now, looking at the knowledge graph side itself, I'd like to point out that there are two types of knowledge graphs. One is what's called an ontological knowledge graph. By the way, if you take a sentence like "Obama was the president of USA," the joke is that some people wish this were not in the past tense. I guess this is getting recorded, so I should not be making those kinds of jokes here.
Okay, so one type is the ontological knowledge graph, where we have some schema, some ontology, and we want to extract and organize knowledge with respect to that schema. For example, we may have a presidentOf relation and we are interested in finding instances of that particular relation, such as (Barack Obama, USA). These graphs tend to be high precision; the NELL knowledge graph I mentioned is an instance of an ontological knowledge graph. They are also canonicalized and normalized: all surface forms of Barack Obama get collapsed into one node for Barack Obama. The problem is that to build one of these you need supervision. Systems like NELL try to cut down on the supervision and training needed, but some initial investment is still required.

On the other side, you have what are called open knowledge graphs, or ontology-free knowledge graphs. This space was pioneered by Oren Etzioni and his group at the University of Washington, along with Mausam, who was also in that group and is now at IIT Delhi. Here the knowledge is represented purely in terms of surface forms, which is why I'm putting them in quotes: "Obama" is a noun phrase, "was president of" is a relation phrase, and "USA" is another noun phrase. The good thing is that these are easy to build: they are not tied so much to semantics but are based primarily on the syntax of the language, from which you can get these subject-predicate-object triples. They also have high recall. The problem is that they give you a fragmented view of the world: "Obama" and "Barack Obama" will be two different nodes in the open knowledge graph, which is not a very desirable situation to be in.

So one of the approaches we take in building these graphs is: while the ontological graph is a good end goal, the open graph is an easier starting point. We start with an open knowledge graph and then slowly move towards the ontological knowledge graph, and you will see some evidence of that in this talk.

I have organized the talk in three parts: first the problem of relation schema induction, second the canonicalization problem, and third other related work; then I'll close by talking about what we are doing at Kenome. The first problem is domain-specific relation schema induction. Many of these needs came up when I started at IISc and was talking with people from industry. I would tell them about NELL, and they would say: that's great, but what we are interested in is building a domain-specific knowledge graph. I have these documents from the biomedical domain, or the insurance domain, or the automotive domain; can you dive deep and build a specific knowledge model for each of those sources? One of the problems there is that, given a corpus of documents from a particular domain, what types of knowledge should we even be capturing? What are the categories and relationships we should extract from that domain? This work is focused on answering that.
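As a toy illustration of the open-versus-ontological distinction above, here is the same fact in both forms. This is my own sketch, and the canonical identifiers (the wiki: prefixes) are hypothetical, not from any particular knowledge base:

```python
# Open KG: raw surface forms straight from text; "Obama" and
# "Barack Obama" would end up as two different nodes.
open_triples = [
    ("Obama", "was president of", "USA"),
    ("Barack Obama", "served as president of", "the United States"),
]

# Ontological KG: canonical entity IDs plus a typed relation from a schema,
# so both sentences above collapse into a single belief.
ontological_triples = [
    ("wiki:Barack_Obama", "presidentOf", "wiki:United_States"),
]
```

Canonicalization, the second part of this talk, is exactly the step that collapses the first representation into the second.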
This is joint work with my PhD student Madhav, former project assistant Uday, who is now doing his PhD in the US, and our collaborator Manish from Microsoft in Hyderabad; these are the two references for this work. As I mentioned, the relation schema induction problem is: given a bunch of sentences from a target domain, in this case biomedical or clinical sentences like "John underwent angioplasty last Tuesday," "Sam will undergo tonsillectomy," and "cells that undergo meiosis," we would like to infer that there is an undergo relationship between patients and surgeries, and a separate relation, still using that same relation phrase "undergo," between cells and different types of cell division. This becomes the first step towards building a domain-specific knowledge graph: discovering the categories and the relations for that particular domain.

Right now, the standard way to approach this is to go and ask experts in that domain. But there are at least two problems with that: experts tend to be expensive, and in the limited time you have access to them, they will only give you a partial listing, not an exhaustive listing, of what you need to capture from that domain. So we are open to taking expert inputs, but we would also like an automated method to complete them. By the way, is my terminology okay? Categories, relations, predicates, schemas, is that all fine? Okay.

We pose this discovery of relations and schemas as a tensor factorization problem. But before going there, here is what we do. We take the sentences from the domain corpus our collaborator has given us, and we run the open information extraction tools I mentioned earlier. Given a sentence, these tools produce triples: from "Sam will undergo surgery next Tuesday" we get the triple (Sam, undergo, surgery). That's a good starting point. We run this over the entire corpus and collect counts of how many times each particular triple occurred; in this case, (John, undergo, surgery) occurred 10 times. We extract and aggregate these counts and represent the information in the form of a tensor. If you're not familiar with tensors, you can think of them as higher-order generalizations of matrices: a matrix has two modes, while a tensor has more than two. Here we have a three-mode tensor: one mode corresponding to subject noun phrases like John, cells, and Sam; another mode corresponding to relation phrases, which in this example is just "undergo"; and a third mode for object noun phrases like surgery and division. The (i, j, k) cell, indexing one position along each mode, represents the normalized count aggregated from the corpus. That's how we construct the tensor. Now we pose the relation schema discovery problem as a tensor factorization problem: given these noun phrases, we would like to discover what categories exist.
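Here is a minimal sketch of that tensor construction; the triples follow the examples above, but the counts are invented:

```python
import numpy as np

# (subject, relation, object) -> corpus count, as produced by an open IE
# pass over the domain corpus; the numbers here are made up.
triple_counts = {
    ("John",  "undergo", "surgery"):  10,
    ("Sam",   "undergo", "surgery"):   7,
    ("cells", "undergo", "meiosis"):   5,
    ("cells", "undergo", "division"):  4,
}

subjects  = sorted({s for s, _, _ in triple_counts})
relations = sorted({r for _, r, _ in triple_counts})
objects   = sorted({o for _, _, o in triple_counts})

# Three-mode tensor: X[i, j, k] = normalized count of
# (subject_i, relation_j, object_k).
X = np.zeros((len(subjects), len(relations), len(objects)))
for (s, r, o), c in triple_counts.items():
    X[subjects.index(s), relations.index(r), objects.index(o)] = c
X /= X.sum()  # one simple normalization choice among several possible
```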
So maybe we induce a category A2, which turns out to be something like the patient category, and a surgery category A4. That's the category induction part; then we would like to discover how these categories are related to each other, which finally forms the schema we are interested in building. We do the category induction and the schema induction jointly, as one optimization: the tensor factorization. We experimented with different types of factorization and found RESCAL to be the most effective. RESCAL has been used for many years now, but this is the first application of RESCAL-type factorization to the schema induction problem.

Here we factorize the tensor as a product of a matrix A, a core tensor R, and the transpose of that same matrix, A^T. Without going into details, the way to think about it is this: the A matrix gives the mapping from noun phrases to the induced categories. If I have N noun phrases and specify that I want K categories, those are the K columns of A. So A maps the subject noun phrases to the induced categories, and on the other side its transpose maps the object noun phrases to their respective categories. Now the core tensor has an interesting structure: for each unique relation phrase, like "undergo" or "receive," there is a separate slice in the core tensor R, and each such slice has size K-by-K, where K is the number of latent components, the number of columns of A. Each cell in a core tensor slice, the point marked there, tells me the strength of association between two categories with respect to that relation phrase; the slice shown corresponds to the undergo relation. If you think about it, that's exactly the schema I want to induce: the categories are induced on the two sides, and the core tensor slice stitches them together. It models the problem perfectly. If you're seeing this for the first time, it may feel a little rushed, but you can refer to the papers for details; hopefully you get the intuition of what's going on. We optimize by minimizing a reconstruction error: we have the input tensor X, the product gives another tensor X', and we want X and X' to be close to each other. If you're familiar with matrix factorization, this is quite a natural extension.

With all this excitement: it didn't work at all. We applied this on two datasets, MEDLINE and Stack Overflow, with KB-LDA as the state of the art at that point in time; higher is better here. On both datasets we tried RESCAL, which is what I described, and PARAFAC, another type of tensor factorization, and both failed miserably. For the faint-hearted, this would be the point of departure, the point where you conclude that tensor factorization methods don't work for this problem.
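Before getting to the fix, here is a minimal gradient-descent sketch of the plain RESCAL-style factorization just described. It is my own simplification (plain gradient steps, no constraints or side information yet), not the authors' implementation; it assumes the subject and object modes index the same set of noun phrases, so each relation slice is square:

```python
import numpy as np

def rescal_gd(X, k, steps=2000, lr=0.01, seed=0):
    """Minimal RESCAL-style factorization by gradient descent.

    X: array of shape (n_rel, n, n) -- one n x n slice per relation phrase.
    Returns A (n, k): noun phrase -> latent category loadings,
            R (n_rel, k, k): per-relation category-association slices.
    """
    rng = np.random.default_rng(seed)
    n_rel, n, _ = X.shape
    A = rng.random((n, k)) * 0.1
    R = rng.random((n_rel, k, k)) * 0.1
    for _ in range(steps):
        grad_A = np.zeros_like(A)
        grad_R = np.zeros_like(R)
        for m in range(n_rel):
            E = X[m] - A @ R[m] @ A.T            # reconstruction error, slice m
            grad_R[m] = A.T @ E @ A              # descent direction for core slice
            grad_A += E @ A @ R[m].T + E.T @ A @ R[m]
        R += lr * grad_R
        A += lr * grad_A
    return A, R
```

Reading the output: row A[i] is the soft membership of noun phrase i in each induced category, and slice R[m] is the candidate schema for relation phrase m.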
But we were not willing to give up at that point. Based on our prior work with matrix factorization methods, we realized that most of these factorization-style objectives are highly non-convex, and it is very easy for them to get stuck in local minima. If you can impose the right kind of constraints, you can make progress towards the right kind of solution. So that's what we started doing: we incorporated three types of side information, and I'll describe one now.

The first is relation side information. Suppose from domain knowledge, or by utilizing word embeddings, we discover that "undergo" and "receive" are very similar relations in this particular domain. Since these two relations behave similarly, their schemas should also be similar to each other. A schema for a relation corresponds to a core tensor slice, so we regularize the factorization so that the respective slices of similar relations stay close to each other. We then do a joint factorization: the same latent components must also explain a side information matrix for the relations. Similarly, we add noun phrase side information, and we impose non-negativity, again based on past experience, because we knew that usually works well. This is the overall architecture: we create the tensor and couple it with these two types of side information matrices in a joint factorization.

After you do all of these things, there is success. I wouldn't be talking to you about all this, and the papers would not have been published, if there were no results. On both MEDLINE and Stack Overflow, the purple bar is our final model, called SICTF, which stands for Schema Induction using Coupled Tensor Factorization, and on both datasets it outperforms the baselines. We also get about a 10x to 11x speedup over the KB-LDA baseline in runtime.

Everything I've talked about so far is binary relation schema induction: each relation has only two arguments. Subsequently, we have also extended this to n-ary relations, with three or four arguments, in what we call Tensor Factorization with Back-off and Aggregation, presented at ACL this year, just last week. That again follows the tensor factorization scheme, but additional complexity arises from sparsity when you go to higher numbers of modes, and we have back-off mechanisms to handle that. In the interest of time, I won't go into those details, but I want to show you some example schemas learned by this TFBA method, which extends SICTF beyond binary schema induction. On this shootings dataset, it comes up with a victim-identification relationship with four arguments: the first is the police or the authorities, the second is the victim being recognized, at a particular time and at a particular location. These are given by the different induced vectors.
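Before moving to the evaluation, here is roughly what the coupled objective looks like, reconstructed from the description above; the lambda weights and the exact form of the coupling terms are schematic, so see the SICTF paper for the precise formulation:

$$
\min_{A \ge 0,\; R \ge 0,\; V} \;
\sum_{m} \big\lVert X_m - A\, R_m\, A^{\top} \big\rVert_F^2
\;+\; \lambda_{np} \big\lVert W - A V \big\rVert_F^2
\;+\; \lambda_{rel} \!\!\sum_{(m,n)\ \text{similar}}\!\! \big\lVert R_m - R_n \big\rVert_F^2
$$

Here $X_m$ is the count-tensor slice for relation phrase $m$, $W$ is the noun phrase side information matrix explained by the same latent factors $A$, and the last term pulls the schema slices of similar relation phrases (such as undergo and receive) towards each other.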
We show these schemas to human annotators, who give us judgments; those are the valid and invalid labels you see. We aggregate all of those scores, and that is what was plotted in the previous slide. So if you're interested in building domain-specific knowledge graphs and you don't have good access to domain experts, then discovering the schemas automatically is important, and here we have presented a tensor factorization based method to do it. If you're interested, take a look at the papers; the source code is also available on GitHub.

Moving on to the canonicalization problem I mentioned earlier: how do we figure out that "Barack Obama" and "Mr. Obama" are the same entity? This is joint work with my PhD student Shikhar and former master's student Prince; it was presented at the WWW conference earlier this year, and the source code is also available. You are all familiar with another instance of the canonicalization problem: how do I figure out that Mumbai and Bombay are actually the same entity?

For this, we present a method called CESI, which stands for Canonicalization using Embeddings and Side Information. Given the source documents, we again run open IE style methods to get subject-predicate-object triples, and putting them together gives us the open knowledge graph, based purely on the surface extractions, as we saw earlier. Now, over the last couple of years a tremendous number of methods have been developed for embedding these multi-relational graphs into vector spaces: each node gets an embedding, and each relation gets an embedding too. We can apply such a method and then do clustering in the learned space, with the hope that the clusters we discover give us the canonicalized clusters: "Mr. Obama" and "Barack Obama" co-located in one cluster, "Bill Gates" and "Mr. Gates" somewhere else.

But rather than this straightforward approach alone, we also bring in side information. If we already have some partial knowledge that two noun phrases are actually different surface forms of the same entity, then we can use it to influence the representation learning itself. This follows our general approach to these problems: supervision is very expensive and we don't have the luxury of getting it at scale, but we can get partial, noisy inputs from users and experts, so we try to come up with frameworks where the main task is influenced by incorporating this kind of side information in a principled manner. CESI is one instance of that philosophy; SICTF and TFBA earlier were others.

As I mentioned, there is a lot of work on knowledge graph embeddings. We use a method called HolE as a representative of the state of the art. The details are not super important; what you should know is that it has a scoring mechanism for the edges, and after learning it gives you the embeddings.
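For the curious, HolE scores a triple roughly as follows (this is the published HolE formulation, included here only as background; CESI's contribution sits on top of it):

$$
\text{score}(s, p, o) \;=\; \sigma\!\left( \mathbf{r}_p^{\top} \left( \mathbf{e}_s \star \mathbf{e}_o \right) \right),
\qquad
[\mathbf{a} \star \mathbf{b}]_k \;=\; \sum_{i=0}^{d-1} a_i \, b_{(k+i) \bmod d}
$$

where $\star$ denotes circular correlation and $\sigma$ is the logistic function; triples observed in the open knowledge graph are trained to score higher than corrupted ones.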
We then utilize various types of side information. If you have entity linkers, which, given noun phrases, can connect those phrases to entities in existing knowledge graphs, that is one source; there are also resources like paraphrase databases, which will say that "management" and "administration" are very similar to each other. We impose regularizations utilizing these kinds of side information.

This is the objective. I don't expect you to parse and understand all the math here, but I want to highlight that different blocks do different jobs. The first block does the KG embedding; that is purely the HolE objective, the knowledge graph embedding method we're using. In the noun phrase side information term, notice that theta iterates over the different types of side information, entity linking, paraphrase databases, and so on, and it says: if a particular type of side information claims that noun phrases v and v' are very similar, then we want their embeddings e_v and e_v' to be close to each other. We are not asking them to be exactly the same; we impose this as a regularization term, pulling them as close as possible. There is a similar term for relation phrases, and, to make sure we don't overfit, some additional regularization terms.

The previous state of the art was the IDF-token-overlap approach, work by Kevin Murphy from Google and Fabian Suchanek and his students in France. That was from the pre-representation-learning, pre-embedding world; everything was done over surface forms, and that's the performance you see. Utilizing each type of side information on its own gives the respective performances shown. But by learning representations of the noun phrases and relation phrases in a controlled way, we learn better representations, which ultimately give us better canonicalized clusters; that is reflected in about a 15% absolute boost in performance.

Here are some qualitative examples. From the high-dimensional representations we learn, we project down to a lower-dimensional space, and noun phrases plotted close together are the model's way of saying it thinks they belong to the same canonical cluster. It learns some interesting patterns: GlaxoSmithKline and GSK occur very close to each other, so the model thinks they are different forms of the same entity. If you had tried to do this by string matching, those two would look very different; by edit distance they would be very far apart. But by learning representations in a controlled way, utilizing the structure of how each phrase is related to other phrases, we are able to bring the two together. Do I have five minutes or ten? Ten, okay.

Similarly, we also get relation phrase embeddings. One thing I didn't know before this work is that Shakyamuni is another name for the Buddha. How many of you knew that before? Okay, maybe twenty of you. So CESI is already teaching you something new. And of course, not all problems are solved.
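Before looking at the failure cases, here is a minimal sketch of the embed-then-cluster step. The embeddings below are faked with random vectors nudged into two groups; in CESI they would come from the HolE-plus-side-information objective above, and the clustering threshold is an assumption of this sketch:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Pretend these are learned noun phrase embeddings; here we fake two
# tight groups so the clustering has something to find.
phrases = ["Barack Obama", "Mr. Obama", "Bill Gates", "Mr. Gates"]
rng = np.random.default_rng(0)
base = {"obama": rng.normal(size=8), "gates": rng.normal(size=8)}
emb = np.stack([
    base["obama"] + 0.05 * rng.normal(size=8),
    base["obama"] + 0.05 * rng.normal(size=8),
    base["gates"] + 0.05 * rng.normal(size=8),
    base["gates"] + 0.05 * rng.normal(size=8),
])

# Hierarchical agglomerative clustering over the embedding space; each
# resulting cluster is proposed as one canonical entity.
Z = linkage(emb, method="complete", metric="cosine")
labels = fcluster(Z, t=0.5, criterion="distance")
for phrase, label in zip(phrases, labels):
    print(label, phrase)
```

The two Obama surface forms land in one cluster and the two Gates forms in another, which is exactly the canonicalization output we want.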
For instance, Toyota and Nissan are also proposed as canonicalized versions of each other, which of course is not true. That could be because those entities are sparsely represented in the graph, so we were not able to learn good representations for them. In any case, canonicalization is a very important problem. It shows up everywhere you have data generated from different sources with no prior agreement on terminology; this kind of fragmentation problem occurs, and you can apply CESI-style models to it.

Let me briefly mention some other work we have been doing. First, as I said, there has been a lot of work on learning knowledge graph embeddings over the last couple of years, mostly motivated by getting improvements in prediction performance on various tasks. But with the growing importance of explainability and interpretability, no one had looked at what kind of vectors these methods actually learn. In the first work of this kind, we looked at the geometry of the learned embeddings, how the vectors are laid out in space, and we have some interesting observations documented in an ACL 2018 paper that you can go and take a look at.

Another interesting problem we looked at is time-stamping documents. This is motivated by the fact that time is important for knowledge: most facts are not universally true, they have certain life spans. The problem is: given a document, figure out in which year it was written. In this particular case, the document says some form of taxation happened in 1995, four years after the IOC approved it. The true publication date is 1999, and the model has to somehow figure out that 1995 plus those four years is the inference required. Other state-of-the-art baselines get confused by the 1995 mention, put most of their mass on it, and predict 1996, a year right after 1995; our model, NeuralDater, puts most of its mass on the right answer. To do this, we use graph convolutional networks, which, if you're not familiar with them, are deep learning methods applied over graphs. I guess nowadays you don't exist if you don't do deep learning, so this is our way of ticking that box, though we do a lot of work in representation learning and deep learning in the group. What is remarkable is that we get about an 18-19% absolute boost in performance on this time-stamping problem. BurstySimDater was the previous state of the art, and NeuralDater is our proposed method; in subsequent work using attention, we get even further improvements. This also shows that the predictions we make, even the errors, are not as bad as those of the baseline methods. Again, if you're interested, take a look at the ACL paper. So, as you can see, my group does a lot of work in and around knowledge graphs.
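Since graph convolutional networks came up a moment ago: the basic building block is a neighborhood-averaged linear transform. This is a minimal sketch of the standard GCN propagation rule, in a simple row-normalized variant, not NeuralDater's exact architecture:

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph convolution layer: H' = ReLU(D^-1 (A + I) H W).

    A: (n, n) adjacency matrix of the graph (e.g., a document's syntactic
       or temporal graph); H: (n, d_in) node features; W: (d_in, d_out)
       learned weights. Row normalization is used here for simplicity;
       the original GCN paper uses symmetric normalization.
    """
    A_hat = A + np.eye(A.shape[0])             # add self-loops
    D_inv = np.diag(1.0 / A_hat.sum(axis=1))   # normalize by node degree
    return np.maximum(D_inv @ A_hat @ H @ W, 0.0)
```

Stacking a few such layers lets each node's representation absorb information from progressively larger graph neighborhoods, which is what lets a dating model combine "1995" with "four years after" across the document graph.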
Beyond what I have talked about here, we are also doing work on neural distant supervision. Deep learning methods have been very successful when you have tremendous amounts of training data, but in our setting, where we are interested in so many different types of knowledge, we cannot provide a million labeled instances for each and every one of them. So we are interested in how we can adapt deep methods to work with what might be called silver training data, which is noisy and of lower quality, rather than gold training data that is highly curated by humans. More recently, we have also worked on data augmentation using paraphrasing-style models, utilizing generative adversarial networks, which have been very popular and successful for images but not so much for text, where the success cases are yet to come; I think we have some interesting directions there.

It's not all work; we also have fun. This is the current composition of the group, and even though we have been in existence for only about three years, people have graduated and gone on to do interesting work in academia and industry worldwide.

To wrap up this part: we are interested in going from strings to things, from unstructured data to structured repositories, knowledge graphs. This is not a one-step process. New knowledge is constantly becoming available, so maintenance is a very important component. Construction of the graph itself is not the end goal either, because we want to improve the decision-making capabilities of intelligent agents by providing them with this kind of world knowledge and contextual knowledge. And it need not be a one-way street: the applications can also dictate how the knowledge graph evolves over time. So in our group we are interested in this entire spectrum of construction, maintenance, and application of knowledge, with knowledge graphs forming the central data structure where everything connects together. To do this, our work spans machine learning, natural language processing, and large-scale data analysis, and through the combination of all of those we want to build these structured repositories and, through them, improve decision making. We collaborate quite a bit with industry and also with other academic groups, and if you're interested in learning more, you should go and visit this URL. So, to wrap up: if you're interested in AI, then I think you should also think about knowledge, and knowledge graphs are emerging as a prominent way to overcome the knowledge bottleneck problem in artificial intelligence.

Do we have a couple of minutes for questions? I'll just take two minutes, okay. So, to help enterprises with this need for knowledge and knowledge graphs, I founded this company, Kenome, earlier this year, or actually late last year, though we really started doing the work earlier this year. It is an enterprise knowledge graph company. We are focused on enterprises, where we see two big problems. One is big data, which we are probably all very familiar with.
Most enterprise applications run over structured data, and there can be huge volumes of it. But there is also the problem of dark data, which enterprises collect as part of their day-to-day operations but do not derive much value from: emails, chat communications, presentations, research manuals, maintenance reports, customer interactions. Those are all inside the company, but outside it, people are also talking about your products and your services. How do you read through all of that and create some sort of unified knowledge model, so that much of your business decision making can run on top of it? Kenome is focused on helping enterprises make sense of this heterogeneous big, dark data.

To do that, we are building the Kenome Insights Platform, or KIP for short, whose goal is to churn through all this data of many different varieties and volumes, some inside the organization and some outside, read through all of it, and build a knowledge model, leveraging our extensive prior experience with knowledge graphs, natural language understanding, machine learning, deep learning, and continuous never-ending learning, because we want the AI to improve over time as more data and interactions become available. We are currently working with leading enterprises in the finance, retail, and consumer electronics space.

To give you one instantiation of this framework: in collaboration with a company based in Singapore, we have built a tool called TrackCrypto, which tries to estimate cryptocurrency prices. As opposed to the standard stock market, where you have fundamentals and company structure, cryptocurrency prices are heavily influenced by the chatter that goes on on the web, and all of that comes over unstructured channels: Twitter, Telegram channels, and so on. So it's a perfect fit for what we want to do. We have built this instantiation of KIP that extracts social sensing signals from these diverse data sources and makes price predictions based not only on the historical prices, which are structured, but also on useful semantic signals from the unstructured data; through various kinds of back-testing we see that this is helpful. We have a private beta launch coming up.

Anyway, if all of this, or any of this, is of interest to you, please come and talk to me; here is my contact information. Thanks for your time. Do we have some time for questions? Sorry, we don't have time. Okay, but I'll be around; please feel free to come and talk to me. Thanks again for your time.

Thank you, Professor. Thanks for the wonderful talk. Sorry, we really...