I'm super thrilled to welcome our first day two speaker. I'm sure many of you know him. He's the dean of computer science at Khan Academy and the creator of the jQuery JavaScript library, which I'm pretty sure changed the way we program things on the web. He's also the author of several amazing JavaScript books, an instrumental member of the whole JavaScript community, and a personal inspiration to me for many years. All of us are very excited to hear about some of his recent adventures. So please give a big, big warm welcome to John Resig.
So I'm really excited to have the opportunity to talk about some of the work that I've been doing. It's interesting because it's very data-heavy, but in my case I'm doing lots of work with art, which is one of my passions, especially Japanese art. This is something that started for me personally maybe five years ago or so, and it's been growing and growing. As it's been growing, I've been trying to reconcile my interest in this with my interest in, obviously, technology and programming, and trying to find ways to make them work together. I think I've been pretty successful. So I wanted to talk a little bit about how this all fits together.
I wanted to start with a super, super quick primer on the type of Japanese art that I'm really into. Just a quick intro. I assume everyone recognizes this image. It's an incredibly famous Japanese print, The Great Wave by Hokusai. This print came from a period in Japan from the 1600s to the mid-1800s. This was actually a time of great peace and great prosperity in Japan. Everyone was making lots of money, and as a result people wanted to buy art and acquire it for themselves. This is a map here of Tokyo, then called Edo, from 1850 or so. I love these maps. One of the things I'll point out, given the crowd here: you see the little crest there with a couple of leaves in it.
And there are also all these tiny little red circles. Those are actually the family crests of the daimyo, the feudal lords. That giant one there in the center, that's the shogun. That's his palace, and it's still there in Tokyo. All around it are his favorite lords. He physically positioned the people he liked best closest to him. So you can actually plot the distance from the shogun's main castle to everyone else and get a plot of how in favor they were with the shogun. You can see some circles way out there at the edge, and he obviously did not like them at all. Totally anecdotal. I could go on about this stuff forever.
So this art form is called ukiyo-e, pictures of the floating world. It's woodblock printing. You have a piece of wood. You carve an image into it. You put some paper onto it with ink, and you create all these images. What's interesting is that the artists are completely decoupled from this process. Take the Great Wave I showed you: Hokusai was the designer. He drew up the picture, but it was commissioned by a publisher. He did not carve it or print it or any of that himself. The publisher paid him a small amount of money, and the design went to the publisher. The publisher then contracted out to woodblock carvers and woodblock printers and handled all of that themselves. This is actually kind of how copyright was managed during this time: whoever owned the physical woodblock that the image was carved into owned the print.
There were a couple of types of imagery that were very, very common during this time. One of the most common was pictures of beautiful women and courtesans. One of the reasons for this goes back to the picture of Tokyo I showed earlier.
What happened in Tokyo is that when the shogun came into power in the early 1600s, he demanded that every single feudal lord, no matter where they were in Japan, had to come and live in the capital every other year. As a result, you have all these lords bringing all their samurai and all their retainers and literally marching to Tokyo and building up their residences, which you can see here. And Tokyo goes from a tiny little fishing village to a massive million-person city in the course of just decades. One of the things that happens as a result is that there are lots and lots of bored samurai who have nothing better to do. And so you have, for example, lots of courtesans and lots of beautiful ladies, and people loved this imagery. I just want to show a couple of examples. This is a courtesan here in a procession with other courtesans.
Another kind of imagery that was super popular was depictions of Kabuki theater. In the day, there were three main Kabuki theaters in Japan, and going there was a major, major event. You would go early in the morning. You would have a box that you would sit in with your friends. You would eat food. It was much more raucous than your typical theater performance. People would be shouting and talking, and whenever their favorite actors came on, they would shout out that person's name. All the actors had fan clubs as well. People would be waving banners, and it was a super intense event, sort of like a sporting event meets theater. As a result, these actors were super, super popular. I don't know what the best analogy is, but I'd say they're sort of like the Brad Pitt of the day, where everyone unquestionably knows who these people are.
And so people would buy the prints depicting them, because then you get to see your favorite actors and you have this memento to remember them by. So you end up with all sorts of really, really interesting dynamics there. And I love some of the costumes as well. They're absolutely fantastic.
Another common source of imagery was depictions of warriors and myth. This one comes from a Chinese tale, and here you have a warrior. I love the tattoo on his chest and stomach. This is by an artist named Kuniyoshi. If you've ever seen the yakuza, the mob, with those crazy full-body tattoos, that comes from the tattoos Kuniyoshi designed in his prints. They're cribbing off of that, which he invented. And then you also have sumo wrestlers; sumo wrestling was still a thing.
And of course you have depictions of nature. So we saw the Great Wave. One thing that people sometimes don't notice is that you have this giant wave and the little boats inside it, and you can actually see people hanging onto the boats, being tossed around and hanging on for dear life. And tucked way in the back is Mount Fuji. This is actually from the series Thirty-six Views of Mount Fuji, and every single print in the series has a different depiction of Mount Fuji. Now granted, in this one it's incredibly understated. It looks like a wave, obviously intentionally. And then you have all these other depictions of nature and fish and all sorts of stuff.
So I just want to bring up a couple of, I think, really crazy prints, because I really like them, just to get you interested. This one here: everyone can see it's cats. You have all these cats. And I'll totally say Japan was weird before it was currently weird. These are all cats, they're dressed up in kimonos, and they also have incredibly distinctive faces.
They look more human-like than cat-like. This was created during a very particular time in Japan, the early 1840s. The reason this print exists is that whoever was in power at that time actually made it illegal to depict kabuki actors in prints. So one of the ways around it was to have a whole bunch of cool cats who happen to dress in kimonos and who just so happen to have faces that look a lot like kabuki actors. People who knew the kabuki actors from the plays and from other prints could look at these and be like, oh, that's Ichikawa Danjuro. You can tell by the way the face looks, but it's actually a cat, and this is great. So this was a way to subvert censorship during this time, and people really liked it. What's interesting is that even after the censorship was lifted, these sorts of prints were still made, because people liked the puzzle of figuring out who was being depicted.
This is another one. You have a face there. I just want to zoom in a little bit. If you look at the face, you can tell it's actually made up of bodies. The nose is a body, the ear is a body, and the head is made up of a couple of bodies. And to zoom in a little more, you have here a cloaked figure; it's actually a catfish shooting, like, lasers. Just to unravel what's going on here: this is one of the things I really like about this art form, that there is so much particular imagery and so much to learn. I feel like I will never, ever learn it all. But this particular one is a depiction of an earthquake. There was a major earthquake in Tokyo, and the face is made up of the bodies of the people who died. And on the cloak that he's wearing, you see there are patterns: wood and tools and all sorts of repair supplies, from when they were rebuilding after the earthquake.
And the catfish is actually the depiction of the earthquake itself. In Japan, catfish were commonly associated with earthquakes. Now, the reason for this very weird imagery is that it was actually illegal to depict current events during this time. So you would end up with this very particular imagery that is talking about the thing that just happened in Tokyo without openly talking about it. So again, there are all these ways to subvert the government and get around these restrictions.
Okay, so I wanted to bring those up just to give you a quick taste. To dive into my interest and my problem: here we have a whole bunch of woodblock prints. These are much, much later, 1890s or so. Sometimes prints like this come up at auction, up for sale. In this case the listing reads: lot 55, 20 Japanese woodblock prints, each depicting a female/geisha figure, with coloring throughout each print. What's interesting is that many auction houses don't have the staff or the ability to figure out what they have. I think a lot of people respect auction houses as a sort of authority figure. Some of them are; most of them are not. So you end up with this, where they have a giant lot of prints and literally no idea what it is. They've correctly guessed that it's Japanese, good for them, and that it's depicting female figures. Other than that, they have no idea what it is. They estimate it at $400 to $600.
So the problem then becomes: if they have no idea what it is, how do you, as a layperson, figure out what it is? In my case, one of the things I've done is acquire and read lots and lots of expensive books. Almost none of these books are cheap. I think the most I've ever paid for a book was a dollar per page, and that's actually pretty common.
It's absolutely ridiculous. This is just some samples from my bookshelf of all the different books I've bought and read and read and read to try to understand this art form and the time period. And of course, a big part of this is that you have to learn how to read Japanese. Now, the asterisk here is that you need to learn to read Japanese from the 17th to 19th centuries. This is very, very different from current Japanese. You can't just hop on Duolingo and be like, oh, let's learn some Japanese. What people speak now, what they refer to now, and how they write it now is completely different. So yeah, you're not going to learn this from Rosetta Stone. On top of this, you have to learn to read Japanese calligraphy, and this is the sort of thing that really only experts can do, people at the postdoctoral level. It's interesting: I sometimes go to Japanese lectures, and even in a lecture full of these experts, there are only one or two people in the audience who can actually read and translate things like Japanese poems. So it's an incredibly hard skill. I've kind of chalked this up as: I am never going to do this, ever. It is not worth my time, whereas I can use technology to work around a lot of it.
And so I built this website called ukiyo-e.org. I also have woodblock.org and a bunch of other domain names. What this website is, is an aggregation of Japanese woodblock prints from many different museums, universities, auction houses, dealers, anyone. I pull all these images of prints into a single unified database, put it up online, and make it really, really fast. I also completely index all the text, so you can search for cat and get all these awesome prints that depict cats. And I also internationalize everything.
So you can actually search the website in both English and Japanese and get back listings in English and Japanese. I take Japanese databases and translate them to English, and take English databases and translate them to Japanese, which is really, really hard. I'll get into that. So here's the Japanese version of the website. I can't currently read or speak Japanese, so I do a lot of cheating.
One quick tip if you want to internationalize certain aspects of your UI. For example, here there's a button that essentially says search. You can put that in Google Translate and get back many different ways to translate it, but not all of them mean search in the sense of this affordance, search from this text field. So I was like, well, who has a search button? I bet Google has a search button. So you go to the Google Japanese site, see what's on their search button, copy the text, and bring it over. You can cheat pretty far with that. One thing I'll note is that the Japanese version of the site is actually more popular. Right now the site is most popular in Japan, then Europe, then the US, which again is an interesting dynamic.
As I go through the talk here, I'm going to talk about different modules and pieces of code that I've written. Virtually everything I've written is open source and up on my GitHub account. For example, I've written a Node module for doing easy string substitution so that you can easily translate strings. You end up with a map between the English version of a string and, in my case, the Japanese version. You can port this to multiple languages, and it's really easy to pass to a translator. You can be like, here's this JSON file, just type in all the strings, and you're done.
The architecture of the site.
So this is how it currently is. I'm hosting the main site on DigitalOcean, and I'm using Node for a lot of this. I'm storing most of the data in MongoDB, and the search stuff in Elasticsearch. One of the things that was really important to me was making sure that the site was going to load very quickly for those who weren't in the US. Part of that was putting all the images up onto Amazon CloudFront, which is a CDN. If you're not familiar, a CDN is a content distribution network: servers that are positioned all over the globe. In the case of Amazon CloudFront, they have servers in Japan, in Europe, and in the US. You can put your images onto there and they will load very, very quickly. So again, I wanted to make sure the site was going to be very, very fast, no matter where you were.
Okay, so one of the things I've also built: you see these little images down here. The images at the bottom are representative images of a particular artist. One of the features I added is that you can hover over an artist and scrub through prints that they've created. I feel like it's really hard to get a sense of an artist's total work, because it oftentimes spans many, many decades and many different kinds of subject matter. So I just wanted to show that again. I made this plugin so you can move your mouse over and quickly scrub through and see. And you're like, oh, okay, I've got a pretty good idea of what this person makes. There's some nature stuff, there are some birds, there are a lot of landscapes, I've got it. That's something you can tell in a matter of seconds, as opposed to having to click through to the page, look through the results, all that sort of stuff. So I made that as a jQuery plugin, and that's available online.
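The hover-to-scrub behavior comes down to mapping the pointer's horizontal position over a thumbnail to an index into that artist's list of prints. Here is a minimal sketch of that arithmetic in Python, not the jQuery plugin itself; the function name `scrub_index` and its parameters are made up for illustration.

```python
def scrub_index(mouse_x, element_left, element_width, image_count):
    """Map a pointer x-coordinate to the index of the image to display.

    All names here are hypothetical; this only illustrates the arithmetic.
    """
    if image_count <= 0 or element_width <= 0:
        raise ValueError("need at least one image and a positive width")
    # Fraction of the way across the thumbnail, clamped to [0, 1].
    fraction = (mouse_x - element_left) / element_width
    fraction = min(max(fraction, 0.0), 1.0)
    # Split the width into equal slices, one per image; the far right
    # edge (fraction == 1.0) still maps to the last image.
    return min(int(fraction * image_count), image_count - 1)


# A 100-pixel-wide thumbnail that scrubs through 5 prints.
middle = scrub_index(50, 0, 100, 5)
```

Dividing the width evenly means a thumbnail with many prints gets very thin slices, so a real plugin might cap how many images it scrubs through.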
So one of the big problems with this is that I'm aggregating from all these institutions, and I currently have about a quarter million images of prints. So I'm by far the largest public database of Japanese woodblock prints online. One of the challenges with collecting all this information is that it's really, really hard to do. Obviously, you're getting into scraping. One of the problems here is that many museums, institutions, and universities don't have the best websites. This is incredibly common. At this point, I'm perfectly content scraping websites that are nothing but tables. I'm so used to it that I'm like, oh, tables and font tags, here we go. But at least their website works, and that seems to be a pretty large hurdle.
So I wrote this scraping framework, and it works in what I think is a unique way: it's designed to easily navigate things like search result pages. You go to a page of search results and it goes through and behaves like a human. It's actually using PhantomJS, which is a headless WebKit. So you have a copy of WebKit, which is used in Safari and Chrome, running in the background. And it's actually doing things like clicking a link, navigating to that page, looking through the page, hitting the back button, and going back. It goes to the search result page, clicks, downloads that page, goes back, and keeps doing this over and over again. So it's literally behaving like a human would. This is important because a lot of these websites only really work this way. And it also works completely even if the website is rendered with JavaScript or what have you.
So it ends up with this giant pipeline. I just want to show this because it ends up being kind of crazy. You end up going to the website, and it's going through WebKit and PhantomJS.
I use these other libraries to interact with it from Node. I take the data, save it, dump it out to XML files and into a MongoDB log, process the data, and put it back into MongoDB. The end result is that I end up with all these chunks of data representing things like images and artists.
Just to show you an example of a module here: this is actually a little scraper script that I wrote. One of the things I do in the tests is scrape my own website, just as a sanity check to make sure it's working correctly. This is the entire script, and it's capable of scraping all the data off my website with just these seven lines or so. And to give an example of the kind of content that comes out, this is an example chunk of data that would come out of a web page: nicely formatted JSON that's very easily managed and dumped into a database. So the scraper framework is online if you're interested. I'm very bad at naming things, so I unoriginally call this stack scraper. You can probably tell, since I name things jQuery and stupid stuff. So you can find that online, and you can also find all the scrapers that I've written. If you're interested in stack scraper specifically, let me know and I can try to document it better.
One of the big, big things I've been doing, though, is computer vision work and image similarity analysis. Just to show an example, one of the services I'm using is called MatchEngine, by a company called TinEye. MatchEngine allows you to do stuff like this, where given an image it'll find all the similar-looking images within the corpus. This is super, super useful. In the case of woodblock printing, they're prints, so there are going to be multiples of them. There wasn't just a single copy made. So, for example, this is a print by Hokusai that's in the Metropolitan Museum of Art.
And this is the same print in many other institutions around the world. You have it here in the Museum of Fine Arts in Boston, the British Museum, San Francisco, Honolulu, Tokyo, Kyoto, all over. Because I'm using image analysis on this, I can completely ignore all the metadata, or lack of metadata, associated with these images. Frequently institutions will say, oh, this is by this certain artist and this is the title, but they get a lot of that wrong. That's part of what I'm trying to fix. The nice thing is that you end up with cases like this, where you see down at the bottom there are even prints with color bars inside the image and black-and-white photos of the prints, and it's capable of matching even in those cases.
I'll show another example here. It's a print in the Art Institute of Chicago. What's interesting is that this print is a diptych, a two-panel print, and they actually put it together backwards, so each of the panels should be on the other side. If you look, it still matched correctly against the Metropolitan Museum of Art down below, where they did put it in the correct order. I mean, you can't really blame them; this sort of stuff just happens. You're taking photos, you're like, all right, there we go, photo done. But the nice thing is that this technology is capable of ignoring that. Even though this is a two-panel print, it's able to find the individual panels that are in other institutions. This really, really helps scholarship, and again it's able to handle variations in color and black and white.
One of the cool features I want to show is that I have the ability to take images of a print and quickly toggle between them.
So this is an example. You know those I Spy things where you have two pictures side by side and you try to spot the differences? This is what print scholars do on a professional level. I kid you not. Because if you can spot differences in a print, you can tell which edition each one is, roughly speaking: this came before this. So just to flip through them: these are two copies of the same print from two different institutions. The nice part about the computer vision work is that you can do the analysis and get back a matrix essentially saying these two portions are the same. You skew this one to match that one and you can line them up perfectly. I'm just doing this in the browser in a canvas element. One thing I want to point out: look at the very top of the print. See how one's missing a giant chunk and one isn't? That's kind of important to know, that one is actually missing about an inch, whereas you can see that this other print still has it. But again, that's the sort of thing you can really only tell by doing this direct comparison of one print with another. And this is why it's so useful to scholars.
One thing I'll mention is that when I started the website, I started by scraping institutions just myself. I didn't ask for permission first. Which is probably a good thing, because I'm sure if I'd asked for permission I never would have gotten it. It's sort of like, hey, I want to do a couple tens of thousands of requests and download these massive gigs and gigs of images off your server. Is that okay? And yeah, no one's ever going to say okay to that. But what's happened is that the website is out there and people have been using it, and a couple of interesting things have happened as a result.
One is that the institutions have actually been contacting me to say how much they appreciate it. They're absolutely thrilled that I've built this. The other is that I now have museums coming to me and asking, how can we be a part of this? How can we be included? Because the interesting thing that's happened is that if you search for ukiyo-e or Japanese woodblock printing online, I am number two behind Wikipedia. So now everyone's kind of going to my site as the source for this information. It's one of those things where if you aren't part of the database, no one knows that you exist, which is a weird thing. So anyway, I'm trying to be accommodating, but it's a lot of work.
But one of the best features, and the reason why I made this site, is image similarity search. You might be familiar with this from Google image search or something. There's a form on the site that says search by image. You can take a photo of a print with your phone or what have you, and it'll find that print in all the different institutions. As I said, this is why I made this. If you're looking at this print, there's tons of text and there are details about the artist. But if I can't read them, it becomes really, really hard to figure out who it's by, when it was made, and what the print is depicting. There are books, for example, of just Japanese names, like signatures. There are books of nothing but publisher seals. I own those books, and you can sit there and page through them and try to figure it out. But wouldn't it be a lot easier to just take a photo and find the print in an institution that has already done the cataloging and research? So that's what I did. This has been monumentally useful both to myself and to other people, because it's cut the research time from literally hours and days down to minutes.
You can just take a picture, look at what the Museum of Fine Arts says, and be done. So this is something I'm super excited about and super pleased with.
There's one gotcha with the image analysis that I want to point out. This is a case of an image with giant color bars on the side, a gradient bar and a color bar. It did correctly match the same print in other institutions. However, it then matched a different print at the same institution, because what it was doing was matching the color bars themselves, not the image. So I've built this software for cropping and annotating images. It's actually a mobile application, and I'm its only user at the moment. I built it for two reasons. One is to be able to crop the prints out of an image and remove the color bars and stuff like that. The other is that I live in New York City, so I spend a lot of time taking the subway to different places, and that's a lot of time I'm spending not being productive. So I created this cropping application so that while I'm riding the subway I can be doing something, in this case cropping Japanese woodblock print images and removing color bars. And it's super useful. I can go through a couple hundred very, very quickly.
So I've been working on this tool, and I'm starting to use it now for image annotations as well. One of the things I think would be really cool, and again this is my definition of really cool, is going through every print and drawing a box around every single head in every print, so every depiction of a person. I want to do essentially face recognition on these depictions of people. Many of them were real people, and the artists were using certain stylistic affordances to let you know that this was this person. So you can do analysis on that to figure it out.
But there's also all sorts of symbolism in these prints to let you know who this person is. At least from my end, looking at this, this is absolutely a courtesan: the way the hair is done, the way there are pins in the hair, all these sorts of things. But if I have thousands and thousands of these, what I want to be able to do is take all these little heads and have people do crowdsourcing and start cataloging them. So making little games and stuff, so you can be like, oh, there are pins in the hair, and all these sorts of things. As a result you can start to figure out the subject matter of all these prints in reverse, essentially. So yeah, it's going to end up kind of being like a Facebook for Japan in the 1820s. But we'll see.
So I've been working on this, and as it turns out this particular application is really useful for other people as well. I'm working with the New York Public Library, who you'll be hearing from, working with Mauricio. We want to be able to have this offline mobile experience for doing crowdsourcing. So hopefully we'll be coming out with a nice, cool application for this sort of stuff soon.
I gave a version of this talk recently, and a couple of days afterwards someone emailed me. He had done some work to make a script that went through and actually detected color bars in images and automatically cropped them off. I've dropped in a link there, and if you want the details I can email you. This was done by David Chester at Shutterstock in New York. Obviously this is super useful. The problem we've been hitting, as we've been working on this, is that it works great for some cases, but there are lots of really, really weird cases, and those are cases where you might just have to fall back to doing it manually. But yeah, there are advances being made on this now.
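To make the easy case concrete, here is one simple heuristic in Python. It is only an assumption about how automatic cropping could work, not the script mentioned above: treat the scan as a grid of grayscale values, assume the color bar is separated from the print by a strip of near-uniform background, and keep the widest block of non-background columns. The function names and the `background` and `tolerance` parameters are hypothetical.

```python
def content_runs(is_background):
    """Return (start, end) column ranges, end exclusive, that are not background."""
    runs, start = [], None
    for x, background in enumerate(is_background):
        if not background and start is None:
            start = x
        elif background and start is not None:
            runs.append((start, x))
            start = None
    if start is not None:
        runs.append((start, len(is_background)))
    return runs


def crop_widest_region(image, background=255, tolerance=10):
    """Keep only the widest block of non-background columns.

    `image` is a list of rows of grayscale values (0-255). Assumes the
    print is wider than any color bar and a background gap separates them.
    """
    width = len(image[0])
    is_background = [
        all(abs(row[x] - background) <= tolerance for row in image)
        for x in range(width)
    ]
    runs = content_runs(is_background)
    if not runs:
        return image  # nothing but background; leave the scan alone
    left, right = max(runs, key=lambda run: run[1] - run[0])
    return [row[left:right] for row in image]


# Toy scan: a 2-column strip, a 2-column white gap, then a 6-column print.
scan = [[0, 0, 255, 255, 100, 100, 100, 100, 100, 100] for _ in range(3)]
cropped = crop_widest_region(scan)
```

A heuristic like this breaks as soon as the bar touches the print, the background is not uniform, or the bar is wider than the print, which is the kind of weird case where manual cropping is still the fallback.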
So one of the things I think is really cool: obviously I built these tools for myself, to improve my own research, but I'm really interested in moving the study of Japanese prints forward on the whole, just helping researchers discover interesting things. So I want to show an example here. This is one print. I'm going to toggle back and forth. It's another I Spy, right? Can anyone tell me what you see that's different between these two prints? Colors. The face. Insignia. Okay. Patterns, I heard. So all those things are true. You can tell, though, that a large portion of the image is the same. The leaves are the same. The general flow and the clothing are the same. So they're obviously very, very similar.
This gets into the dynamics of woodblock printing on the whole. With woodblock printing, you had a piece of wood, and with this particular woodblock they printed a whole bunch of copies. In this case, this is a Kabuki actor. In Kabuki, males played both male and female roles. I don't have time to get into it, but you can tell it's a male depicting a female: there's the purple patch. All male Kabuki actors, by law, had to shave the top of their head to make them less beautiful. It's a long story. As a result, they covered up the bald patch with a little purple patch that made them look more like a woman anyway. The thing here, though, is that this is an actor in a Kabuki play. What happened, we can assume, is that the actor left the role, or the role changed, or something like that. These physical woodblocks were then sold to another publisher. And when the other publisher got them, they physically chopped out the head, because it's wood, put in a new chunk of wood, and carved in the new face of a different actor playing a different role.
But the interesting thing about this is that stuff like this can be kind of hard to find manually. But for image analysis, this is super easy. Because 90% of the image is completely the same. The only difference is the tiny head. Yeah, it's obviously the same. I want to show another case here. So let's go back and forth. So again, in this case, you can see, everything's virtually the same except for the head. However, I also want to show this. So in the bottom right here, you have a signature. This is the signature of the artist. That was carved into the woodblock as well. And if you toggle back and forth, the signature is actually different. So this is a really interesting case. And as far as I can tell, this is unique. I haven't found any other cases like this, nor has any other researcher I've talked to. And this is a case where the subject matter changed: it's a Kabuki actor, but it's a different Kabuki actor. But additionally, the publisher or someone removed the signature of the artist and put in the signature of a different artist. I don't know why, and neither does anyone I've talked with so far. But this is really, really interesting because one of these prints is in the Metropolitan Museum of Art. The other one is in the Art Institute of Chicago. So they're by different artists depicting different actors in different plays. So there is zero metadata linking these two things together. It is physically impossible for any traditional means of finding these things to match them up. So this is a case where I built this tool, and the reason why this was discovered is because of this tool. So I've been working with the Metropolitan Museum of Art in New York on finding ways to improve the attributions on their prints. So this is just an example here. So these prints on the left are all ones in the Metropolitan Museum of Art that were labeled as unknown. They did not know the artist who created them.
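The "everything is the same except the head" comparison described above can be sketched in a few lines. This is only a minimal illustration, not the tool used in the talk: it assumes the two images are already aligned, the same size, and reduced to small grayscale grids, and the names `compare_grids` and `tolerance` are made up for this example.

```python
from typing import List, Optional, Tuple

Grid = List[List[int]]  # grayscale values 0-255, one inner list per row
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def compare_grids(a: Grid, b: Grid, tolerance: int = 16
                  ) -> Tuple[float, Optional[Box]]:
    """Return (fraction of matching cells, bounding box of the differences).

    The bounding box is None when every cell matches within the tolerance.
    Assumes both grids are aligned and have identical dimensions.
    """
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("grids must have the same dimensions")

    # Collect the coordinates of every cell that differs noticeably.
    differing = [
        (r, c)
        for r, (row_a, row_b) in enumerate(zip(a, b))
        for c, (va, vb) in enumerate(zip(row_a, row_b))
        if abs(va - vb) > tolerance
    ]
    if not differing:
        return 1.0, None

    total = sum(len(row) for row in a)
    rows = [r for r, _ in differing]
    cols = [c for _, c in differing]
    box = (min(rows), min(cols), max(rows), max(cols))
    return 1 - len(differing) / total, box


# Toy 10x10 "prints": identical except for a 3x3 "head" near the top.
print_a = [[200] * 10 for _ in range(10)]
print_b = [row[:] for row in print_a]
for r in range(1, 4):
    for c in range(4, 7):
        print_b[r][c] = 40

similarity, changed_region = compare_grids(print_a, print_b)
print(similarity, changed_region)
```

In the toy data, 91 of the 100 cells match and the differing cells fall inside one small box, which is the kind of signal that would flag a re-carved head. Real photographs of prints differ in lighting, cropping, and scale, so a production system would need far more robust matching than this cell-by-cell check.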
All the ones on the right are in different institutions, where that other institution knew who the artist was. So again, this is like image analysis coming into play, where you can be like, okay, these two things are obviously incredibly visually similar. However, the Met here says that they don't know who this artist is, whereas the other institution says it's Kuniyoshi. And in fact, you have three other institutions who all agree and say it's Kuniyoshi. But this is a case now where you can start to go back through and be like, hey, Met, let's update the database. Let's go through and actually fix this and improve the cataloging now. And it was through this that I actually discovered this, because these two disagreed on the name of the artist. And when I looked at them, I'm like, wait, they're actually different prints. And so it's interesting. But again, once you start developing this stuff, it starts to come out. So a big, big part of what I'm doing here is I'm aggregating this data from all these institutions. But a big part of it is correcting all the print data. And there are lots of mistakes. I think one of the most challenging things straight out is the fact that there's no consistency in how Japanese names are written. Artists change names incredibly frequently. And then obviously in English institutions, they write it one way. In Japanese institutions, they write it a different way. All these are valid, alternate names for the same artist. In fact, I could fill multiple pages with this. And it gets really, really tricky, especially when you want to start translating artist names from English into Japanese and from Japanese back to English. So you start to think, well, okay, if I had a good mapping, if I always knew that Ando was this in Japanese and that in Japanese is always Ando, then I could build a mapping database to go back and forth. Well, the problem is that Japanese and English are not a one-to-one mapping.
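To make the "not a one-to-one mapping" point concrete, here is a small sketch. The name pairs are an illustrative sample only (a few kanji spellings that can be read "Andō" and a few common romanizations of that reading), not data from the talk, and `build_indexes` is a hypothetical helper name.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Illustrative sample only, not an exhaustive or authoritative list.
NAME_PAIRS: Tuple[Tuple[str, str], ...] = (
    ("Ando", "安藤"),
    ("Ando", "安東"),
    ("Ando", "安堂"),
    ("Andō", "安藤"),
    ("Andou", "安藤"),
    ("Andoh", "安藤"),
)


def build_indexes(pairs: Iterable[Tuple[str, str]]
                  ) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    """Build lookups in both directions; each key maps to a SET of values."""
    to_japanese: Dict[str, Set[str]] = defaultdict(set)
    to_english: Dict[str, Set[str]] = defaultdict(set)
    for english, japanese in pairs:
        to_japanese[english].add(japanese)
        to_english[japanese].add(english)
    return dict(to_japanese), dict(to_english)


to_japanese, to_english = build_indexes(NAME_PAIRS)

# One English spelling fans out to several Japanese spellings...
print(sorted(to_japanese["Ando"]))
# ...and one Japanese spelling fans out to several English spellings.
print(sorted(to_english["安藤"]))
```

Because both lookups return sets rather than single values, a plain dictionary from one spelling to "the" other spelling cannot work; some extra evidence (dates, known aliases, an authority file) is needed to pick the right candidate.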
Okay? I should also mention this is how I'm learning Japanese: by building these tools and figuring out this weirdness. So for example, here, we have Ando, and all of these are valid ways to write Ando in Japanese. All right, so you can't go English to Japanese. I'm like, okay, well, that's not great. Well, can I go backwards? If I have the Japanese, can I go back to English? No, you can't. This is, again, a valid version of Ando, and those are all different ways of writing the same thing in English. So it is a complete mess. It's not easy to port these things any which way. So I've written a whole bunch of different Node modules to fix this. These are going to be of interest, I'm sure, to very few people, but I just want to go through them. So one of the modules, this one already existed, and I worked on improving it. It's called Hepburn. So there's a form of what's called romanization: you're taking a Japanese phonetic alphabet and converting it into English and vice versa. So you can do something like this, where you convert the English form of a name into the phonetic Japanese version of the name. ENAMDICT is this Japanese database of Japanese names. So you end up with a Japanese name, whether it's a surname or a given name, whether it's male or female, and how to read it in both Japanese and English. Which is super useful. And there's also this other database called the National Diet Library Name Authority. The National Diet Library is essentially the Library of Congress for Japan. And what they have is a massive database of all these different people and artists inside Japan, and you can go through and start to do mapping. So one of the things that happens as a result here, when I dump it all into this one tool I wrote, romaji-name, is you can take a name like this. So you can see the original there on the top. And so they wrote Sharaku Toshusai. And the problem, though, is that they wrote it wrong.
One, they wrote the name backwards. So in traditional Japanese, you always write the family name first and then the given name. Whereas in the West, we typically do the given name first and then the family name. And so one, that was flipped. So I had to go in and, using the different databases, figure out which is the given name and which is the surname. And then once I have that, start to correct the phonetic issues with it. So you have the correct stress marks and all this sort of stuff. So this stuff really, really bothers me, because everyone messes it up, and when I was trying to learn this stuff, every single website on the internet disagreed on this. Like, no one is consistent at all. Wikipedia is a mess. There's no one place you can go for this information. So I'm actually in the process right now of developing a database of Japanese artists. And one of the things I did as well was fixing the parsing of dates. So museums have this very particular way of giving a date. They'll be like, 18th century, 1820s. It's always fuzzy dates. It's date ranges. So I wrote this library for parsing these fuzzy dates and turning them into a real date range. So that way if someone says 1820s, it goes from 1820 to 1829. And then I put that in the database and you can query against it. So as I mentioned, I've been working on building a database of Japanese artists. I think currently the largest Western database that I know of has something like 70 artists in it. I think mine's going to have about 4,000. And all with the names in English and Japanese, all the different artist names, when they were active, when they were alive. And one of the hard parts about this is that when I'm merging all these databases together, you have to do what's called rectification. So figuring out, if you have two different records, which one is the right record and if they are the same.
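The fuzzy-date idea above (turning "1820s" into 1820 to 1829 so it can be queried) can be sketched as follows. This is a minimal illustration and not the library mentioned in the talk: the function names are hypothetical, only a handful of patterns are handled, and the choice to treat "18th century" as 1700 to 1799 is an assumption, since some catalogs count centuries from 1701.

```python
import re
from typing import Optional, Tuple

DateRange = Tuple[int, int]  # (first year, last year), inclusive


def parse_fuzzy_date(text: str) -> Optional[DateRange]:
    """Turn a museum-style date string into an inclusive year range."""
    s = text.strip().lower()

    # Decade: "1820s" -> 1820..1829
    m = re.fullmatch(r"(\d{3})0s", s)
    if m:
        start = int(m.group(1)) * 10
        return start, start + 9

    # Century: "18th century" -> 1700..1799 (assumed convention)
    m = re.fullmatch(r"(\d{1,2})(?:st|nd|rd|th) century", s)
    if m:
        start = (int(m.group(1)) - 1) * 100
        return start, start + 99

    # Explicit range: "1830-1844" or "1830 to 1844"
    m = re.fullmatch(r"(\d{4})\s*(?:-|to)\s*(\d{4})", s)
    if m:
        start, end = int(m.group(1)), int(m.group(2))
        return (start, end) if start <= end else None

    # Single year: "1897" -> 1897..1897
    if re.fullmatch(r"\d{4}", s):
        return int(s), int(s)

    return None  # unrecognized; leave for manual review


def overlaps(a: DateRange, b: DateRange) -> bool:
    """True when two inclusive year ranges share at least one year."""
    return a[0] <= b[1] and b[0] <= a[1]


print(parse_fuzzy_date("1820s"))
print(overlaps(parse_fuzzy_date("1820s"), parse_fuzzy_date("19th century")))
```

Storing the start and end years as two numeric columns makes "find everything that could date from the 1820s" a simple range-overlap query.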
So I actually built a tool for this, so I can go through on the command line and look at an artist and be able to make these determinations, like, oh, are these the same person? Could they be merged together? All this particular stuff. So in this case, you can see that these two are probably the same. It's just that they disagree on the dates a little bit. So to go back really quickly to the case I showed initially here. So there were all these prints that came up in the auction here. So I was curious, and I went through and I took one of the pictures and I dumped it into my website. And sure enough, there were the same particular prints in different institutions. So as a result, I was able to figure out the name of the artist and when it was printed. In this case, 1897. It sold for $550. So $550 for 21 prints. But the interesting thing I've discovered is that these prints individually sell for $100 to $400. So that means the true estimate for this is about $2,100 to $8,400. I should say I did get these prints. So yeah, exactly. This is an arbitrage case where auction houses don't know what they have and other people don't know what they have. And so you can go through and find these edge cases. So one of the things I actually do is I sometimes go to auctions and take my phone and take pictures of the prints that are for sale. And you can just see, and it gives me like, oh, that's that. Oh, this is a good impression. Yeah, that's worth 20 bucks. And it's like, because that can be worth a lot more. And now the one major caveat here is that it is theoretically worth $2,100 to $8,400. However, you have to find someone who's willing to buy them. Which means I would essentially have to turn into, like, a woodblock dealer, which is a whole other thing. But for now, I just think they're fantastic prints and I just like owning them. So, very, very, very quickly here.
I want to go through, and I want to talk a little bit about how I've been extending this to other art forms. I've been collaborating recently with an institution in New York, the Frick Art Reference Library. It's part of the museum, the Frick Collection. And the Frick Art Reference Library is one of the best art history reference libraries in the world. And one of the things they have is a massive photo archive of 1.2 million photos of art. So it's not the art itself, it's photos of art. And so I've been doing image analysis on their collection, particularly on a collection of anonymous Italian art. So this is all Italian, mostly Renaissance, and these are cases where they're all unattributed; they don't know who created it. And so I'm able to find all sorts of cases where, for example, you have this case, so this is an archive, and these are photos that are literally being catalogued in separate places. So there's one photo, and one person went, oh, this looks, you know, 14th century Milan, boink. And then, sometime decades later, someone's like, oh, this looks 15th century Roman, boink. And the thing is that since they're unattributed, it's nearly impossible to merge these back together again, unless you have just a human going through. So this is a great thing that image analysis can do. We can go through and be like, okay, these two things are way over here, but they're obviously the same thing. So I was able to do a lot of this, going through and finding cases, even when the lighting is different; finding cases where you have one image as part of a larger image or part of a mural; cases of, you know, color versus black and white; a segment of a much larger fresco; and really cool cases of, as far as I'm able to determine, before and after conservation.
So cases where, you know, paint had been repaired or things had been removed, and really dramatic cases like this, where large chunks of this fresco have been repaired. I mean, this is a case where scholars can go in and figure this out. Either this is after repair, or this is after it was damaged, you know, because it could have been in World War II or something like that. And it's also really good at spotting copies. So for example, these are two images that look very, very similar, but when you look at them, the faces are slightly different, and the little globe that person's holding is different. And that's because both of these paintings are actually copies of a Da Vinci painting. And you also have cases like these, where they look very, very similar. But again, I guess it's like frolicking babies always love crazy stuff. But again, you look at it, it looks very similar, but then you look at it like, okay, there are differences here, but they're obviously inspired by each other. So again, that's a really interesting thing for scholars to be able to dig into. Another thing that I've been starting to look into: if you have two photo archives, how do you bring them together? So I've been working with the Frick in New York and the Zeri Foundation in Italy, which both have massive photo archives, on finding ways to merge these two photo archives together using image analysis. And one of the things I've been doing is image analysis and graph analysis. So I've been using this tool called Neo4j. It's a graph database. And I just want to show an example really quickly of the kind of queries you can do, because I think it's a really cool environment.
So I should say a graph database, it's different from a relational database or, you know, other types of databases, and it's purely designed to have nodes and connections between nodes. And so you're querying these graphs. And you write these really interesting queries. So like the top one here is actually kind of like SQL. So in this case you're finding all the nodes that are an artwork from the Frick and returning the count of how many there are. However, you can do this really crazy stuff like at the bottom. So this gives all the artworks that have a connection to an image that matches another image of another artwork, where that artwork is not the same as the original artwork. So this is able to find cases where artworks are matching new artworks. And this is like a single-line query. And the SQL for this would be like four pages long, with like 40 joins. I would not want to write that. So that is way, way simpler. I kind of love it. So what's interesting to think about is that you can make all these really interesting discoveries. So these are cases like, for example, the Frick has one record with one image and another record with two images. However, there's no match between them. And additionally, there's the Zeri Foundation, which has three images. Now all the images are matching each other, except the Frick records aren't matching each other. However, because you have this sort of structure, you can then deduce that Frick 420 and Frick 417 are actually the same thing. They're just showing different parts of it. And when you dig into it, you see that that's actually the case. They're just different photos of the same fresco, showing different parts of it. And then you end up with really interesting cases like this, where all these records are pointing to each other, but none of them point to one in the same institution.
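The queries shown on the slides are not captured in this transcript, so rather than guess at them, here is a plain-Python sketch of the same two ideas: finding artworks whose images match images of other artworks, and deducing that two records are the same work because both match a third. The image IDs and the Zeri record ID are invented for illustration, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical data. Each artwork record owns one or more images.
ARTWORK_IMAGES: Dict[str, List[str]] = {
    "frick:417": ["f417-a"],
    "frick:420": ["f420-a", "f420-b"],
    "zeri:9001": ["z1", "z2", "z3"],
}

# Undirected "these two images match" edges, as image analysis might report.
# Note there is no direct match between the two Frick records.
IMAGE_MATCHES: List[Tuple[str, str]] = [
    ("f417-a", "z1"),
    ("f420-a", "z2"),
    ("f420-b", "z3"),
]


def matching_artworks(artwork_images: Dict[str, List[str]],
                      image_matches: List[Tuple[str, str]]
                      ) -> Dict[str, Set[str]]:
    """Map each artwork to the OTHER artworks it shares a matching image with."""
    owner = {img: art for art, imgs in artwork_images.items() for img in imgs}
    related: Dict[str, Set[str]] = defaultdict(set)
    for img_a, img_b in image_matches:
        art_a, art_b = owner[img_a], owner[img_b]
        if art_a != art_b:  # ignore matches within a single record
            related[art_a].add(art_b)
            related[art_b].add(art_a)
    return dict(related)


def same_work_groups(artwork_images: Dict[str, List[str]],
                     image_matches: List[Tuple[str, str]]
                     ) -> List[Set[str]]:
    """Group records into connected components of the match graph."""
    related = matching_artworks(artwork_images, image_matches)
    seen: Set[str] = set()
    groups: List[Set[str]] = []
    for start in artwork_images:
        if start in seen:
            continue
        group: Set[str] = set()
        stack = [start]
        while stack:  # depth-first walk over matching records
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(related.get(node, ()))
        seen |= group
        groups.append(group)
    return groups


related = matching_artworks(ARTWORK_IMAGES, IMAGE_MATCHES)
groups = same_work_groups(ARTWORK_IMAGES, IMAGE_MATCHES)
print(related["frick:417"])
print(groups)
```

Neither Frick record matches the other directly, but both land in the same group through the Zeri record. A grouping like this is a lead for a scholar to check, not proof that the records depict the same work.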
And again, I've written queries to detect this sort of thing in the graph database, where you're looking for these edge cases where they don't point back. And if you look, the reason why is because one record has these two images and the other has the other two. In the other database, they just categorized them separately. But as a result, you're able to figure out that, oh wait, all four of these images are actually the same. So one thing I want to wrap up with here really quickly. So there are auctions happening all the time all around the world, especially in New York. And one of the great hacks for learning about art is to go to art auction previews. So you can go to a preview as just a dude on the street. You can just stroll in and they don't card you or anything. And you can go in and look at art. Additionally, you can also touch the art, okay? So the interesting thing about this, so this is me handling a quarter-million-dollar print. And this is the only way you can really learn this stuff. In my case, I wanted to learn about prints. And unless you own them, the only way to learn about them is to see them and handle them. And the thing is that even in a museum where they have something like this, the curators themselves aren't allowed to do this. They're all tucked away and only special handlers can handle them. So it's an amazing opportunity to be able to go to these previews. It's like a private exhibition. You just go in and you're like, I can just learn and touch, and it's fantastic. So I just want to recommend that. And I want to close up by saying, I wrote a blog post about this recently, but this is all my side projects here. And I've been working on this for a while, but I made a decision last fall to start working on this much more aggressively. And so since late last November, I've been working on this every single day. And this has been a big change for me.
It's actually really helped me to get a lot more work done. So I work less in general, but more consistently. So I maybe only code 30 minutes a day or so. And as a result, since November, I've gotten just so many projects done. I've done a rewrite of a website. I built that cropping tool. I've been building all these other tools. And it's been really, really awesome. I definitely recommend trying this out as a strategy for improving productivity for side projects. Because at least for me, it's been a struggle trying to balance side projects with real life, with family, with friends, with work, and all these sorts of things. So I'll end with links. So on my website, I've actually been publishing papers as well about my research, and especially how it can aid art historians. There's the Ukiyo-e site. And all my code is up on GitHub. And if any of this data sounds interesting to you, I will give you whatever you want to play around with. I would love to see cool visualizations or whatever. Yeah, just let me know. And I can send you just massive data dumps. That's not an issue. So yeah, I'll close with that. I think I have a couple of minutes for questions. But thank you.