I'll briefly introduce my team, and I beg your understanding if I read from my script during the presentation. Let me get straight to the topic. I want to introduce and propose a research topic at the same time: introducing, because my team is already working on this analysis here in Korea, and proposing, because we are all evangelists of open science, so more resources and ideas are always welcome. If you're interested, come talk to me afterwards. I'll focus on explaining why this study is important and the rationale behind it, then briefly go through some prior studies, the methodology we are going to use, the dataset we are analyzing, how we are going to evaluate the analysis, and lastly the implications of this study. To quote some prior studies, clustering and classification are fundamental to any kind of analysis on groups of data, and they are especially important because the volume of data being generated is enormous. To say a bit more about academic knowledge graphs, as many of you might already know, the body of human knowledge is doubling every decade, and every year more than a million papers are added to the corpus. In this era of information overload, we need decent analytic techniques to get a grasp of what we have in the knowledge graph. On the other hand, and this is the important part, our project as a whole needs this study. To explain that, I need to briefly describe what we are trying to do with Pluto. We are often mentioned together with Scienceroot in the press, and we are trying to do similar things.
We want to build a publishing platform that is genuinely different from the legacy system. We don't want editors, no screening except for outright rejection, and no grouping of papers into journals; as I mentioned, grouping can be substituted by labeling or tagging. One thing we did want to keep from the legacy system is peer review, because we still need a measure of evaluation for every piece of knowledge. So that was the part we kept. As a new element, we want reviewers to score the papers they review quantitatively. This amounts to reaching consensus on the quality of papers on a public blockchain. The major challenge is that this is a very subjective domain. Public blockchains reach consensus on much more objective matters, as in a ledger of who holds how many tokens of a currency. What we want consensus on is a subjective measure: how valuable a piece of knowledge, a paper, is. And here is where the problem comes in. To reach consensus on the quality of a paper, reviewers give scores, and then we have to ask whether the reviewers are as good as the scores they are giving. So there is a second tier of consensus. At first we approached this as multi-tier review, a.k.a. review of reviews, which already exists in parts of science. But this doesn't work. The second tier is needed because we have reviews and cannot ensure those reviews are good, so we add a second tier of review.
But then how can we ensure that the second tier is good? And what about the third, or the fourth? We came to the conclusion that a multi-tier design doesn't work; the game never ends with that design. Our solution was to utilize the reviewers' records of past activity. That is, suppose I am reviewing paper D, and I have submitted three papers and reviewed two others in the past. We can then apply a similarity function between those five papers and paper D. If the set of five papers and the paper D to be reviewed are very similar, one could reasonably say it is okay for me to review that paper. That is what we want to do with the clustering analysis on papers. So that's the long story of why we needed this study. Now, some prior studies in this field have analyzed co-citation and bibliographic coupling. When one paper cites two papers in the same document, those two papers are co-cited; when two papers each cite the same paper, they are bibliographically coupled. There have also been studies on citances, the sentences in the full text where a citation actually occurs. And there has been recent work analyzing the network we have in the knowledge graph. I personally worked on a study in my undergraduate program doing simple analyses like centrality and betweenness, which are among the most elementary techniques in network analysis, but that didn't go well: we found no meaningful results. These are relatively more recent works from the network analysis field.
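The reviewer-qualification idea above can be sketched as a simple similarity check between a reviewer's past papers and the paper to be reviewed. Everything here is illustrative: the toy embedding vectors, the `reviewer_fit` helper, and the 0.5 threshold are my own assumptions, not Pluto's actual implementation.

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def reviewer_fit(past_embeddings, candidate_embedding, threshold=0.5):
    """Average cosine similarity between a reviewer's past papers
    (submitted or reviewed) and the candidate paper to be reviewed."""
    sims = [cosine(e, candidate_embedding) for e in past_embeddings]
    score = float(np.mean(sims))
    return score, score >= threshold

# Toy 3-d embeddings: five past papers and the candidate paper D.
past = [np.array([1.0, 0.1, 0.0]), np.array([0.9, 0.2, 0.1]),
        np.array([1.0, 0.0, 0.2]), np.array([0.8, 0.1, 0.0]),
        np.array([0.9, 0.3, 0.1])]
paper_d = np.array([1.0, 0.2, 0.1])

score, ok = reviewer_fit(past, paper_d)
print(round(score, 2), ok)  # high similarity -> reviewer qualifies
```

In practice the embeddings would come from the network embedding described later, and the threshold would have to be calibrated.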
LINE is an abbreviation for large-scale information network embedding, and PTE, DeepWalk, GCN, and SDNE are all recent works on network embedding, which is highly relevant to our study. Our proposed methodology is the node2vec algorithm, devised by two Stanford researchers about three years ago. The main contribution of node2vec is its flexible notion of neighborhood, neighborhood meaning the adjacency of nodes in the network. It also needs no seed in the first stage: ordinary network analyses often require a seed input to get a good model, but node2vec requires none. It uses skip-gram training over random walks to explore the network, scales to large networks, and is suited to general use on any kind of network. This is the methodology we are going to use to analyze our data. The dataset we are using is offered by Microsoft Research. It holds metadata for more than 170 million papers, covering more than 48,000 journals, and it is what our service currently runs on. Prior studies have often, though not always, analyzed DBLP or the CiteSeer database, or sometimes Web of Science. Those have clear limitations: DBLP and CiteSeer are focused on the computer science field and have much smaller coverage of papers, and as for Web of Science, we are not an institutional research organization, so we have no access to it. In any case, the dataset we are using is this one; I believe it is one of the biggest databases when we speak of knowledge graphs. We are going to evaluate our results by one of the following methods. The first is checking journal correspondence.
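The biased walks at the heart of node2vec can be sketched in a few lines; the walks would then be fed to a skip-gram model (e.g. word2vec) to yield one embedding per paper. The toy citation graph and the `node2vec_walk` helper are my own minimal illustration, not the authors' reference implementation; with p = q = 1 it reduces to a uniform, DeepWalk-style walk.

```python
import random
from collections import defaultdict

def node2vec_walk(adj, start, length, p=1.0, q=1.0, rng=random):
    """One biased random walk in the style of node2vec
    (Grover & Leskovec, 2016). Weight 1/p for returning to the
    previous node, 1 for staying near it, 1/q for moving outward."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = adj[cur]
        if not nbrs:
            break
        if len(walk) == 1:
            walk.append(rng.choice(nbrs))
            continue
        prev = walk[-2]
        weights = []
        for n in nbrs:
            if n == prev:            # step back to the previous node
                weights.append(1.0 / p)
            elif n in adj[prev]:     # stay close: distance 1 from prev
                weights.append(1.0)
            else:                    # explore outward: distance 2
                weights.append(1.0 / q)
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk

# Toy citation graph as an undirected adjacency list.
adj = defaultdict(list)
for a, b in [("A", "B"), ("B", "C"), ("C", "D"), ("A", "C")]:
    adj[a].append(b)
    adj[b].append(a)

random.seed(0)
walks = [node2vec_walk(adj, n, length=5) for n in adj for _ in range(10)]
print(len(walks), walks[0])
```

Tuning p and q is what gives node2vec its flexible notion of neighborhood: low q biases walks outward (structural exploration), low p keeps them local.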
If our network analysis gives us a link between papers A and B, we can check whether A and B are actually published in the same journal; that can serve as an evaluation metric. Or we can use link prediction: predicting links in the network is a well-established evaluation. Another option we are considering was proposed in a paper last year: a measure of clustering quality based on the time span of the paths linking two papers. If two papers fall in one cluster while spanning a long time range, they call it a good clustering. I don't fully understand why yet, so we have to look more into the evaluation part; if you have any suggestions, please share them. As for the impact of this study: I've explained a little about our system design for peer review, and if we get good results from this study, we can realize that design. We can also build a good recommendation algorithm for papers. Pluto Network currently provides a service called Scinapse, at scinapse.io. It's a search engine like Google Scholar: you can look for academic papers, without full text, but it is a search engine, and for an academic search engine it's great to have a recommendation system. As for limitations, we are doing this study under some explicit assumptions. First, we assume that academic papers can be measured on a quantitative basis. I don't want to argue about this; we simply assume we can put quantitative scores on papers. Another assumption is that a citation is clear evidence of relevance between two papers. I don't want to argue about this either.
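The journal-correspondence check can be sketched as a simple pairwise purity metric over the discovered clusters. The `same_journal_rate` helper and the toy journal labels below are illustrative assumptions, not an agreed-upon metric from the talk.

```python
from itertools import combinations

def same_journal_rate(clusters, journal_of):
    """Fraction of within-cluster paper pairs published in the same
    journal -- a simple proxy for clustering quality."""
    same = total = 0
    for members in clusters:
        for a, b in combinations(members, 2):
            total += 1
            same += journal_of[a] == journal_of[b]
    return same / total if total else 0.0

# Toy example: two clusters with hypothetical journal labels.
journal_of = {"p1": "Nature", "p2": "Nature", "p3": "Cell",
              "p4": "Cell", "p5": "Nature"}
clusters = [["p1", "p2", "p5"], ["p3", "p4"]]
print(same_journal_rate(clusters, journal_of))  # every pair matches -> 1.0
```

A link-prediction evaluation would work analogously: hold out a sample of edges and measure how highly the model ranks them among candidate links.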
There has been a lot of debate on that, but it's just an assumption for our study. Also, this node2vec analysis is a modeling exercise: whenever there is a new entry, that is, a freshly submitted paper in the network, we have to model the network again, because a new entry changes the structure of the network. To put it in machine-learning terms, it is not so much learning by the machine as repeated modeling by the machine. Lastly, citation practices might change in the future. That is also an ongoing debate, and many of my teammates think they should change, so it's not a hard limitation. These are the references for my proposal; I should have more, but I forgot to include them, sorry. I guess that's all I have to say.

Q: How does this kind of graph analysis compare with AI or machine-learning approaches to the same domain?
A: Network embedding is basically one of the AI techniques, so there is strong overlap.
Q: But you're not actually studying that in terms of how Pluto works; the research just provides an algorithm that could be used?
A: Yes, we are just practically using the algorithm.
Q: You said that citation practices might change. How do you think they will change?
A: I personally have no view on that, but my team lead, Jun Sun, thinks there are currently too many references. He thinks most papers should have fewer than ten references, but that's just his personal opinion.