Let's start with an image, one that fascinated me for quite a while. This is the edge of the Rocky Mountains, taken by a satellite that's looking almost horizontally rather than straight down. It's a beautiful photo, but there's something weirder happening too. If we zoom in on the airport, notice that the runways at the top are perfectly parallel to each other. They're parallel to the highways and the roads that we see as well. And they're parallel because there's no perspective in this image. If we take this perspective-less image and stretch it out vertically a little bit, it starts to look like something that's familiar and quite important to me. This is SimCity 2000, arguably the best city simulator ever made. And it uses a very similar perspective, one that is in many ways the same as Q*bert, or the same as traditional painting in many cultures.

So why don't most aerial photographs look like SimCity screenshots? Can we fix that? And what does it mean that we don't see the world like this all the time? This is a talk about how light becomes pixels, why we see things the way we do, and how we can use computers to knowingly manipulate pixels so that we see the world from new and unexpected angles.

To begin, what even is a camera? Let's say a camera is anything that captures and projects light: a digital camera, our eyes, or a hole in the wall, a camera obscura. This pinhole model of a camera is simple but still very useful. Just from the geometry here, we can see that moving objects closer and farther away gives us images that are larger and smaller. And that's really all perspective is. The cows further down the road aren't actually smaller cows. They just look that way.

So what would it take to not have perspective? It would have to be a situation where all of our projection rays are parallel to each other. This is called an orthographic projection, and it's used in drafting and in SimCity. Of course, to do this with a real camera, you'd need an image sensor the same size as whatever you're trying to photograph, which is pretty tricky. But in the computational world, we're less limited.

So how can we do this? Well, let's notice something: as objects get farther away, the angular difference between their projection rays gets smaller. So if an object is very far away and we restrict our field of view to be very small, the rays are basically parallel, and we just magnify that narrow center image. We can do this in most 3D rendering software pretty easily. In Mapbox, for instance, we only have to change one line, the one that calculates the matrix turning a 3D coordinate into a 2D coordinate. If we make that field of view variable very small, we get an orthographic render. Notice how all of the avenues of Manhattan are now rendered perfectly parallel to each other. (This is also a great way to crash Chrome.) You can do this in Google Earth as well, which starts to give some very SimCity-like results.
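To make that concrete, here's roughly what the trick looks like in code. This is a minimal sketch in Python with numpy, not Mapbox's actual code, and every name in it is illustrative: a standard perspective matrix, applied with a tiny field of view from far away, maps near and far points to almost the same screen position, just as an orthographic projection would.

```python
import numpy as np

def perspective_matrix(fov_y, aspect, near, far):
    """Standard OpenGL-style perspective projection matrix."""
    f = 1.0 / np.tan(fov_y / 2.0)
    return np.array([
        [f / aspect, 0.0,  0.0,                          0.0],
        [0.0,        f,    0.0,                          0.0],
        [0.0,        0.0,  (far + near) / (near - far),  2 * far * near / (near - far)],
        [0.0,        0.0, -1.0,                          0.0],
    ])

def project(m, point):
    """Project a 3D point to 2D normalized device coordinates."""
    x, y, z, w = m @ np.array([*point, 1.0])
    return x / w, y / w

# Two points offset 1 unit sideways, 10 units apart in depth.
near_pt, far_pt = (1.0, 0.0, -10.0), (1.0, 0.0, -20.0)

# Wide field of view: the farther point lands much closer to the
# image center. That's perspective.
wide = perspective_matrix(np.radians(60), 1.0, 0.1, 1e3)
print(project(wide, near_pt)[0], project(wide, far_pt)[0])  # ~0.173 vs ~0.087

# Tiny field of view from very far away: both points land at almost
# the same screen position, as if the projection rays were parallel.
d = 10_000.0  # pull the camera way back
narrow = perspective_matrix(2 * np.arctan(10.0 / d), 1.0, 0.1, 1e6)
print(project(narrow, (1.0, 0.0, -10.0 - d))[0],            # ~0.0999
      project(narrow, (1.0, 0.0, -20.0 - d))[0])            # ~0.0998
```

With the wide field of view, the two points land at clearly different screen positions; with the tiny field of view from ten thousand units away, they land within about a tenth of a percent of each other, which is effectively orthographic.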
But these are all just simulations. What can we do with images captured by conventional cameras? Well, a kind of cool thing about cameras is that every pixel is looking along a ray in a slightly different direction, depending on where it sits in the image. So if we keep the same viewpoint looking straight down but move the camera around a little bit, the angle at which we see things changes. Take this building, for example: we can see it from the south or from the north, depending on whether it's at the top or bottom of the image.

So you can imagine a camera moving, looking down at a plane; as it moves, what it sees changes, but it images each part of the plane multiple times from different directions. Now imagine we take all of those images and pull just one row of rays from each of them, then combine all those rows into one new image. Now we're imaging the scene with rays that are parallel to each other, in one axis at least. And by taking different rows, we can change the angle of those parallel rays.

So what does that look like? We can take this input video and do exactly that. Here we take the top row of every frame. What I want you to notice is that we're looking at the south side of every building, whether the building is at the top of the image or the bottom. When we take the middle row, we're looking straight down at the street between the buildings. But notice that something else is happening from left to right: on the left we see the sunny side of each building, and on the right we see a shaded side. So we have an orthographic, parallel projection vertically, but we still have our perspective projection horizontally, because the rays are only parallel in one axis. This is a kind of impossible perspective, something we can only get by synthesizing multiple viewpoints. These images are a lot like light fields, so talk to me later about fun light field tricks like refocusing; sadly, there's not enough time for light fields today.

When we use real camera motion that's less perfect, we get fun distortions with this technique. And when we use camera movement along other axes, like the vertical motion of a drone takeoff, we get different effects, but it's the same technique.

And it is a simple technique. Here's the pseudocode for all of those visualizations: loop through every frame in a video, grab a particular row from each frame, and put it into an output image. That's it. Let's think about what we're not doing. I love OpenCV, but to make these we don't need to do any image analysis or find correlations. We just take pixels that were in one place, put them somewhere else, and let the geometry of image formation do all of the hard work for us.
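In runnable form, that pseudocode might look something like this: a minimal Python sketch, where OpenCV appears only to read frames off disk, and the file names are made up.

```python
import cv2
import numpy as np

def slice_video(path, row):
    """Take a single row of pixels from every frame of a video and
    stack those rows, in time order, into one output image.
    No image analysis here: we just move pixels around and let the
    geometry of image formation do the work."""
    cap = cv2.VideoCapture(path)
    rows = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        rows.append(frame[row])  # one row of pixels: shape (width, 3)
    cap.release()
    return np.stack(rows)        # shape (frame_count, width, 3)

# Hypothetical usage on a downward-looking flyover video:
#   slice_video("flyover.mp4", 0)    -> top row: the "south side" image
#   slice_video("flyover.mp4", 540)  -> middle row of 1080p: straight down
# cv2.imwrite("sliced.png", slice_video("flyover.mp4", 0))
```

And letting the row index vary with the frame number, instead of holding it constant, tilts the slice, which is a preview of the generalization coming next.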
So we've made some images more SimCity-like, but how can we generalize this? One way is to think of a video as a kind of flip book, where we stack all of the frames on top of each other and treat the stack as a volume in x, y, and time. So far we've been taking slices of this volume in the x-t plane, but we could take slices at arbitrary angles and directions. I recently wrote a program for exploring videos like this. Here you can see it loading a video as a volume, then displaying a plane sliced through it and the image that slice produces. We can tilt the plane around, move it, and explore what we can find in these videos.

And I found a lot of fun stuff. For example, the height of buildings, their three-dimensionality, breaks our projection and makes them curve outward, producing a kind of vertigo. Movement along the z-axis seems to curve distant scenery toward us, bending the earth. And still camera angles produce abstract patterns of color that are broken by anything that moves, like cars, or like eddies flowing in a river.

So these are pretty pictures, but is there some bigger reason to care? I think there is, and I'll start with some technical motivation. Remember this image? Why didn't it have perspective? Well, it turns out that this hybrid row-by-row imaging technique I've been describing is exactly how satellites image the world: a line scanner, a pushbroom sensor, sweeps across the scene, capturing one row at a time.

Next, some more philosophical considerations. A lot of what I've shown you today was developed at Signal Culture, a media art residency in upstate New York that evolved from the Experimental Television Center, which dates back to 1969. Working there, I discovered the work of Steina Vasulka, who used mirrors, distortion, and technology to reconfigure the geographies of the American West, to free them from the tyranny of a single point of view, and to explore new ways of seeing herself in the world. I also discovered Paul Ryan, an artist who saw accessible video as a tool for evolutionary change in how we interact with the world. These artists were working during a time of great excitement and optimism about the potential of new technologies to expand perception and democratize culture. Video and cinema were quickly moving from resemblance to representation, just as painting and photography had before them. And today, I think artistic uses of machine learning and virtual reality may be reaching a parallel transition.

The tradition of single-viewpoint perspective is just that: a tradition. What would a world look like where photography didn't have to emphasize nearness in the same way? What would it mean if the foreground didn't overshadow the background? What if photographs were cuts across time that simultaneously showed the spatial and temporal evolution of natural processes? What if we could see how the land next to us was being used as easily as we can see clouds in the sky?

Images of the world affect the way we think about it, and it's as important as ever to remember that these images are not reality itself; they merely represent it. Today's world is complicated, and it needs all of the perspectives it can get. I hope that machine learning, satellite photography, video manipulation, social media analysis, or whatever else it is that you're into continues to help us find new and critical ways of perceiving the world. And I hope you leave today, and !!Con this weekend, with some new ideas and inspiration about how to do so. Thank you.