It's my honor today to have this opportunity to present some of the research in my group. The title of this talk is 3D computer vision: we're going to talk about some recent advances in biomedicine. Let me give you some background in the beginning. As we know, we have five senses. We gather information from the environment through what we see, smell, hear, touch, and taste. It turns out that about 80% of the information we gather from the environment comes from what we see, or in other words, through our vision system. The original motivation of computer vision was to use computers, including the hardware and software, to mimic the brain to capture, process, and understand information, especially visual information. Computer vision started in 2D, because what we see with one eye is basically a 2D image. This is also true for most imaging devices, which give you 2D images. Of course, 2D images carry a lot of information that is important in many applications, for example, identification of a person through her face images or her fingerprints. But the lack of depth information in 2D images makes them insufficient for many applications like 3D quantification, 3D visualization, or 3D animation. In order to have the depth information, we can use two eyes, or at least two views, or two lenses in the camera system, to capture the depth information, or to perceive the third dimension of the object. With multiple views, it is possible to reconstruct 3D models from different views of the objects. So we have two problems in our group to explore. One is to reconstruct 3D models from multiple 2D images. The other question we are interested in is how to analyze the 3D models we reconstruct from 2D data sets. 
So in our group, we have a number of projects going on. The name of our lab is the Biomedical Model and Visualization Lab. We basically take 2D images as input. The input can be 2D raw data, or 3D volume images reconstructed from 2D images. Then what we do is generate 3D models that can be used for quantification, simulation, and so on and so forth. Related to our work are three subjects, or three courses, we are teaching in our department, the Computer Science Department. One is image processing. The second one is scientific computing. The third is computer graphics. So if you're interested in those topics, you're free to talk to me. Now I'm going to start with the first question we mentioned earlier. How do we reconstruct 3D models from 2D images? We have two sub-problems here, depending on what we want to do in the problem domain. We can do surface reconstruction, or we can do volumetric reconstruction. With surface reconstruction, we only see the surface detail, either the geometry or the texture on the surface. We cannot see the internal structures through surface reconstruction techniques. The other way is volumetric reconstruction: we can see through the surface, and see the internal structures of the object. This, of course, depends on what we really want in different applications. And it also depends on what imaging rays we are using for those different problems. Basically, imaging rays can be reflective, for surface reconstruction, or transmissive, for internal structures or volumetric reconstruction. With reflective rays, we can see the surface details or the surface shape, because the rays are reflected on the surface of the object. For example, in this picture, we have about 45 images of a flower from different views. We can reconstruct the 3D surface shape of the flower from those different views. This is only surface reconstruction, because the visible light is reflected on the surface of the flower before it reaches our eyes. 
So the rays cannot really penetrate the surface of the object. That's the reflective-ray approach for surface reconstruction. Now, in order to see the internal structures, we have to consider something different, which is transmissive rays. Transmissive rays, for example X-rays or gamma rays, can penetrate the skin of our body, so we can see the internal structures, like the bone structures or some other tissues. Using transmissive rays, we can see through the surface and then see internal structures with different intensities or with different shapes or geometries. That is the other problem, based on transmissive rays, for 3D volumetric reconstruction. I will start from the first problem: how we can compute the surface geometry by using multiple views of 2D images. I will start with this simple example. Basically, what we do is detect the feature points in two of those images, because we have to have at least two images to do the 3D reconstruction. First we detect the feature points in both images. Then we find the correspondence between the feature points in those images. Then we can do the back projection, because each point in one view gives you one ray going from the camera into space. The same point in the other view gives you another ray. Those two rays will meet somewhere in 3D space, and that intersecting point gives you the 3D coordinates of the feature point. If you have a sufficient number of corresponding points in the two images, you can come up with a point cloud, a number of points in 3D space. Then we can reconstruct a 3D surface model from that point cloud. That's the basic idea of multi-view reconstruction from multiple images. We have been using this technique for scanning electron microscope (SEM) reconstruction. This is a very simple picture of the structure of an SEM. We have the electron source, and the beam goes from top to bottom. 
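The back-projection step described above can be sketched in a few lines: each matched 2D feature gives one ray per camera, and since two noisy rays rarely intersect exactly, a common choice is the midpoint of the shortest segment between them. This is a minimal NumPy sketch of that idea, not the exact implementation used in the talk.

```python
import numpy as np

def triangulate_midpoint(o1, d1, o2, d2):
    """Closest point between two back-projected rays.

    Each ray starts at a camera center o and points along a direction d.
    We solve for the parameters t1, t2 that minimize
    |o1 + t1*d1 - (o2 + t2*d2)| and return the midpoint of the
    shortest connecting segment.
    """
    d1 = d1 / np.linalg.norm(d1)
    d2 = d2 / np.linalg.norm(d2)
    b = o2 - o1
    a11, a12, a22 = d1 @ d1, d1 @ d2, d2 @ d2
    b1, b2 = d1 @ b, d2 @ b
    denom = a11 * a22 - a12 * a12        # zero only for parallel rays
    t1 = (a22 * b1 - a12 * b2) / denom
    t2 = (a12 * b1 - a11 * b2) / denom
    p1 = o1 + t1 * d1                    # closest point on ray 1
    p2 = o2 + t2 * d2                    # closest point on ray 2
    return (p1 + p2) / 2.0
```

For rays that do intersect exactly, the midpoint coincides with the intersection point; with noisy correspondences it is a reasonable compromise.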
We have the stage that holds the biological sample. The stage can rotate by some degrees from one view to the next. We have two types of detectors. One is the BSE detector; BSE means back-scattered electron detector. The other is the SE detector, which is the secondary electron detector. Normally, we use the SE detector for 3D geometry, because this one is better suited for the geometry information on the surface of the biological samples. Those images give you some examples of SEM with two views. For example, here we have two views from different angles. Normally, the angle between two views is from 7 to 10 degrees. We cannot have too large a gap between views, nor too small a gap. Otherwise, we have accuracy issues with the 3D reconstruction. We have a couple of other examples. This one is an ash particle in the air. It's a very tiny size, but we're going to see some 3D models for these examples later. So the problem here is to take a number of 2D images as input. Again, those 2D images are taken from different views, with an angle between 7 and 10 degrees between each two of them. Then our goal is to reconstruct 3D models from those different views. Here's one example. This is a 3D model computed from only two views. Those two images are the input. Then we can compute the 3D model, or the 3D details of the object, from those two views. This is what we call multi-view reconstruction of the surface. Again, this is only surface reconstruction. The approach we are taking is called a sparse-to-dense approach. Because of the time, I won't give you too much detail about the algorithm, but I will quickly go over the steps of this approach. The first step is to detect some sparse feature points in the two images. We use SIFT, the scale-invariant feature transform, which is a popular technique for feature detection in imaging data. 
So we use SIFT to detect the features in each of the two images. Then we use RANSAC to build up the correspondence between the feature points in the two images. Next, we estimate the matrix that transforms one image to the other, so that the two images are rectified. When you take the images, the images may be rotated arbitrarily. So in order to have better accuracy, we have to rectify the two images so that they have the same orientation in 3D space. That's the rectification step of this algorithm. Then we estimate the disparity. Disparity is basically the distance between two corresponding points in the two images. The disparity tells you some information about the depth, as we're going to see in the next few slides, so this is a very important step in this algorithm. What we do here is minimize the difference of the intensities between two corresponding points in the two images. By minimizing the intensity difference, we can find the best disparity for each pair of feature points in the images. In order to have better robustness to noise, we have introduced a weighting window into this algorithm. Instead of simply computing the difference between two images, we added an exponential function into the formula. So we have a much more robust algorithm against the noise in the image. The next step, in addition to the weight we added into the formula, is to use a patch match technique to fix a problem. The problem is what we call the staircase problem: when we take the picture of the 3D object, we simply get some slices at different depths of the object relative to the camera. Those different slices give you the staircase artifact. This artifact can be fixed by using the patch match technique. We're going to see some examples later. 
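To make the disparity step concrete, here is a toy version of a weighted-window matching cost in the spirit of what was just described: the exponential term down-weights window pixels whose intensity differs from the center pixel, which makes the cost more robust to noise and depth discontinuities. The window radius and the gamma constant are illustrative choices, not the talk's actual parameters.

```python
import numpy as np

def weighted_cost(left, right, row, col, d, radius=3, gamma=10.0):
    """Weighted-window matching cost for disparity d at pixel (row, col).

    w(q) = exp(-|I_L(q) - I_L(p)| / gamma) gives exponentially larger
    weight to window pixels similar to the centre pixel p, so outliers
    and noise contribute less to the cost.
    """
    win_l = left[row-radius:row+radius+1, col-radius:col+radius+1].astype(float)
    win_r = right[row-radius:row+radius+1, col-d-radius:col-d+radius+1].astype(float)
    centre = float(left[row, col])
    w = np.exp(-np.abs(win_l - centre) / gamma)   # exponential weighting window
    return float(np.sum(w * (win_l - win_r) ** 2) / np.sum(w))

def best_disparity(left, right, row, col, max_d):
    """Pick the disparity with minimal weighted cost (winner-takes-all)."""
    costs = [weighted_cost(left, right, row, col, d) for d in range(max_d + 1)]
    return int(np.argmin(costs))
```

A real implementation would of course aggregate this cost over the whole image and add the consistency and patch-match steps discussed next.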
Now, once you have the disparity, we first check the consistency, because when you compute a disparity, we do the disparity calculation for the left image by using the right image. In the meantime, we also do the disparity estimation for the right image by using the left image. Those two disparity maps have to be consistent with each other. That's what we do in this step: the consistency checking between the two disparity maps. Once we do that, we compute the depth, or the height information, of the object, the third dimension of the object, by using the disparity map, where D is what we computed in the previous steps, the disparity. P is the pixel size, the physical size of each pixel in the image. Theta is the angle between the two views; it could be 7 degrees, 10 degrees, and so on. So you can see the conversion from D to H, from disparity to the height. That gives you the third dimension, or the depth information, of the 3D object. This one gives you a 3D animation of the example we saw earlier. We have two images as input. This is the SEM image of the ash particle. We can see the 3D models. In the first approach we took, we can see some artifacts, the staircase artifacts, in the depth information. You can see those artifacts. Then we switch to the patch match technique, and we get much smoother 3D models on the surface of this ash particle. Using just two views of the object, we can create very nice 3D models. We have been working with a colleague in our school, Pratip, trying to use this technique to analyze the surface roughness of aluminum materials. This is one of the applications we have tried with the technique we have developed so far. Now, the technique we have seen so far is what we call a passive reconstruction technique. 
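As a sketch of the disparity-to-height conversion just described, one commonly used form for a stereo pair produced by tilting the stage by theta is h = d · p / (2 · sin(theta/2)). The exact formula depends on the imaging geometry, so treat this as an illustration of how D, P, and theta combine rather than the talk's exact equation.

```python
import math

def height_from_disparity(d_pixels, pixel_size, theta_deg):
    """Convert disparity (in pixels) to surface height for a tilt-stereo
    pair: h = d * p / (2 * sin(theta / 2)).

    d_pixels   -- disparity D from the matching step
    pixel_size -- physical size P of one pixel (e.g. in microns)
    theta_deg  -- tilt angle between the two views (e.g. 7-10 degrees)
    """
    theta = math.radians(theta_deg)
    return d_pixels * pixel_size / (2.0 * math.sin(theta / 2.0))
```

Note how a small tilt angle puts a small number in the denominator, so the same disparity maps to a larger height, which is one reason the tilt cannot be made arbitrarily small without hurting accuracy.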
It's passive because we simply take whatever images we have; we cannot really add any additional information we don't have. This technique heavily depends on the feature detection, and heavily depends on the correspondence between the features of the two images. If you take an image in a very dark environment, you don't have much information about the features. Or if you take an image in a very bright environment, again, the features will disappear, or you don't see many features in the images. Those cases make this passive technique a problem. In order to fix that problem, we have another technique called the active reconstruction technique. By active, we mean we use not just a camera; we also use a projector to assist the process. The camera takes pictures of the light reflected on the surface. The projector casts some specially designed patterns onto the surface. You can see the patterns we have here are just a number of colored line structures. Ideally, if you have a flat surface, those line structures will stay parallel to each other. But if you have a curved surface, those patterns will become curved. According to how the curves are displayed in the image, we can estimate the depth. That's the idea of this active approach. With this active approach, we can achieve better accuracy. We can also reduce the dependency of the approach on the image quality of the input. We have seen some examples of applications in the dental industry. People have been using active 3D scanning techniques for model scanning and also for intraoral scanning. The first one is to scan a model of the teeth. The 3D scanner is like a 3D printer of this size. You put the model in this equipment. There's a visible light source on the top, and you rotate the platform here to take pictures of the object from different views. 
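The "curved stripes encode depth" idea can be reduced to a toy formula: with an orthographic camera and a projector at angle alpha from the viewing axis, a height change h shifts a stripe in the image by h · tan(alpha). This is only a sketch of the triangulation geometry, not the coded-pattern math a real scanner solves; the angle and the single-stripe setup are illustrative assumptions.

```python
import math

def height_from_stripe_shift(shift, proj_angle_deg):
    """Structured-light sketch: a projector casts a stripe onto the
    surface at angle alpha from the camera axis. A bump of height h
    shifts the stripe in the image by shift = h * tan(alpha), so we
    invert that to recover the height.
    (Toy geometry: orthographic camera, one stripe; real scanners
    decode many colored stripes at once.)"""
    return shift / math.tan(math.radians(proj_angle_deg))
```

Because the projector supplies the pattern, this works even on textureless surfaces where the passive feature-matching approach fails.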
Then you stitch the depth information from the different views together to form the final complete 3D model. That's the model scanning. Intraoral scanning is more challenging, because you have a very different environment in the mouth, and much worse image quality. You also have to do this at a much higher speed, because you have so many frames to process; this is like a video. We have been working with some of our collaborators to develop algorithms for intraoral scanning. This video shows our algorithm stitching different views together into one 3D model. You can see one patch in green: that is just one frame we take from the intraoral scanner. The frames are given as 3D point clouds, so we have lots of points in each frame, and we are trying to stitch them together into one 3D model. In this process, it is very critical to have the correct correspondence between each pair of adjacent frames. In this video, or in this example, we have about 500 frames. Each frame contains about 30,000 points. In order to reduce the number of points for speed, accuracy, and so on, we reduce the number of points in each frame to 3,000, or 3K. Still, we have about 1.5 million points in the final 3D model. You can see we have very decent accuracy in the final 3D reconstructed model. This slide shows some analysis of the accuracy. We have a color map: the red color means higher error, and the green color means lower error. We use the model scanner as the ground truth for the intraoral scanner. So we do the comparison: we first register the two models, and then we calculate the distance between the two models. That's how we do the error analysis in this study. You can see the average error distance is about 8 microns. 8 microns is a very high accuracy for dental applications. The positive average distance is about 30 microns, and the negative average distance is about 25 microns. 
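The error analysis described on that slide (register first, then measure distances) can be sketched with a k-d tree. This toy version reports unsigned nearest-neighbor distances; the positive/negative averages mentioned in the talk additionally use the sign of the distance relative to the reference surface, which needs surface normals and is omitted here.

```python
import numpy as np
from scipy.spatial import cKDTree

def cloud_to_cloud_error(test_pts, ref_pts):
    """After the two models are registered, measure each test point's
    distance to its nearest reference point and summarize.

    test_pts, ref_pts -- (N, 3) arrays of 3D points.
    Returns mean and max nearest-neighbor distance (unsigned).
    """
    tree = cKDTree(ref_pts)              # spatial index over the reference cloud
    dists, _ = tree.query(test_pts)      # nearest-neighbor distance per point
    return {"mean": float(dists.mean()), "max": float(dists.max())}
```

The same per-point distances can be mapped to a red-green color scale to produce the kind of error heat map shown on the slide.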
So all those numbers are good enough for most dental applications. With the 3D models reconstructed from the active 3D scanning technique, we can do lots of different applications in dentistry. For example, you can do implant design based on the 3D model. Of course, in order to do implant design, we also have to incorporate the CT of the patient to design the implant. So that's the first one. We can also do dental restoration for the tooth that needs to be restored. We can also do teeth alignment for the patient based on the 3D models we have. So you can do different applications based on the 3D models. But a 3D surface model is not sufficient. As I said, if you want to do implant design, we cannot just rely on the surface models; we have to consider the internal structures from CT. That's what I want to talk about in the next few slides. We have surface reconstruction; we also have volumetric reconstruction, which is critical in many areas. The internal structure determination has been on the market for many years, because CT, MRI, and lots of other medical imaging devices are based on this concept. The basic idea of CT is to take the projection data from different views and use mathematical techniques to reconstruct the 3D volumes from the 2D projection data. The theory has become very mature in the past decades. We have been using similar techniques in a number of different areas. For example, we have medical imaging at the tissue or organ scale. We have been working with our collaborators on light microscope imaging at the micrometer scale. We also work with EM, which means electron microscope imaging, or electron tomography, at the nanometer scale. And we have been using cryo-EM, which means electron microscope imaging at a very low temperature, for the sub-nanometer scale. 
Now, if you go further, you can go to crystal structures using X-ray crystallography, which can give you atomic resolution. Because of the time, I will only give you a very brief introduction to cryo-EM, and how we can use 3D reconstruction for cryo-EM volumetric reconstruction. Again, the concept is very similar to medical CT: we use multiple views to reconstruct the 3D volumes. But in the cryo-EM community, the signal-to-noise ratio is very, very low compared to medical imaging devices. If you look at those particles in the background, they have a very low quality compared to the brain data or the heart data we take from CT machines. So in order to achieve sub-nanometer or even atomic resolution, we have to consider some kind of averaging technique, because averaging can reduce the noise. If you have lots of noise, the only thing you can do is use a huge amount of data, and reduce the noise by averaging the data together. We are using this technique, averaging thousands or tens of thousands of particles together, to reduce the noise, or to increase the signal-to-noise ratio. That's why we call it a single particle technique. This has been a very powerful technique for molecular structure determination. This pipeline shows how we start from the very beginning. Of course, we don't do the experiments in our group; we only do the calculation, or the software part, of this pipeline. It starts from the biochemistry preparation and cryo-EM sample preparation, then 2D imaging and data collection, then 3D image processing to increase the quality of the image, then 3D reconstruction, and then structure analysis. Our work is basically from the data collection onward: how do we detect the particles in the imaging data, how do we do the 3D reconstruction, and how do we do the structure analysis in this process. This technique has become very powerful in recent years, so it is now possible to achieve almost atomic resolution. 
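The statistical reason averaging works can be shown in a few lines: averaging N independent noisy copies of the same signal cuts the noise variance by roughly N, so the signal-to-noise ratio grows about linearly in N. This 1D toy (a sine wave standing in for a particle image, with illustrative noise level and particle count) is only a demonstration of the principle behind single-particle averaging.

```python
import numpy as np

rng = np.random.default_rng(42)
signal = np.sin(np.linspace(0, 2 * np.pi, 256))   # stand-in for one "particle"
sigma = 2.0                                       # heavy noise, cryo-EM-style

def snr(est):
    """Signal-to-noise ratio of an estimate against the true signal."""
    noise = est - signal
    return float(signal.var() / noise.var())

one = signal + rng.normal(0, sigma, signal.shape)       # a single noisy copy
many = np.mean([signal + rng.normal(0, sigma, signal.shape)
                for _ in range(10000)], axis=0)         # average of 10k copies

# Averaging N independent copies divides the noise variance by N,
# so the SNR of `many` is roughly 10,000x that of `one`.
```

In real single-particle analysis the particles must also be aligned and classified before averaging, which is what makes the computational problem hard.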
You can achieve almost two angstroms or three angstroms, or even beyond that, by using this cryo-EM technique. And this is why the Nobel Prize in Chemistry was awarded to three scientists last year for developing cryo-EM for the high-resolution structure determination of biomolecules in solution. That's a very important milestone for 3D structure determination in biology and biochemistry. So that was the first question we raised earlier: how do we do 3D reconstruction from 2D imaging data? The next question is: how do we analyze the 3D models or 3D imaging volumes? OK, we have two types of techniques. One is what we call knowledge-based, and the other is data-driven. Data-driven is probably something you have heard about. This is like machine learning or deep learning: you simply use a huge amount of data to train the system, and then the trained system does the work for you. That's data-driven. The knowledge-based approach is more like a traditional method. You have some understanding of the system and the problem, and then you design an algorithm to do the work for you. Both are powerful, depending on the problem and on your understanding of the problem. In the following slides, I want to give you a quick summary of those two approaches: how we can use the knowledge-based method and the data-driven method to do 3D analysis, especially 3D segmentation of the models or 3D segmentation of the imaging volumes. The first thing to say is that segmentation is traditionally a very hard problem. Given the image, can you quickly and accurately find the features, or the region of the features, or the boundary of the features? This was a very hard problem before we actually started to use deep learning. We have been using the level-set method. This one was considered the best approach for segmentation before deep learning was used in the community. So we use the level-set method to detect the heart. 
We have the left ventricle and the right ventricle in this heart of a mouse. We can segment the two regions, the left and right ventricles, in the different slices of the 3D volumes. Then you can construct the 3D models from the segmented contours on each slice. This is the 3D model at different time steps; you can see how the shape or size of the heart changes over time. Again, this is possible because of the segmentation of the 3D volumes. I don't think I can play this video because it's not available on this computer. So let me skip it. In this video, I was trying to show the segmentation of some important structures, like the blood vessels, the spine structures, the kidney, and so on. This has been based on the level-set method: you can segment some important features from the imaging volumes. But this is still a knowledge-based method. When you segment, for example, the heart of the mouse, you assume that the feature has higher intensity values than the background or some other features. That's the knowledge you are using for this segmentation. You have a certain understanding of the problem you have; in this case, it's simply the intensities. Now, in some other cases, you may use, for example, the symmetry. Regarding cryo-EM, which I mentioned earlier: we can use cryo-EM to reconstruct very high resolution maps of viruses or some other molecular structures. This is the 3D reconstructed structure map of the Roystorf virus. We have a number of proteins on the capsid of this virus. The first stage is to segment the capsid containing a number of proteins in this 3D map. In this stage, we are using the radius information, because we know the capsid is at a certain distance from the center of the virus. That's the information, or the knowledge, we are using. Still a knowledge-based method. 
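The "the feature is brighter than the background" kind of knowledge can be illustrated with a much simpler method than level sets: threshold the volume and keep the largest connected component. This is a deliberately minimal sketch of knowledge-based segmentation, not the level-set algorithm the talk actually uses; the threshold is an assumed parameter.

```python
import numpy as np
from scipy import ndimage

def segment_brightest(volume, threshold):
    """Knowledge-based segmentation sketch: we *know* the feature of
    interest (e.g. a ventricle) is brighter than the background, so
    threshold the 3D volume and keep the largest connected component.
    """
    mask = volume > threshold
    labels, n = ndimage.label(mask)          # label connected components
    if n == 0:
        return np.zeros_like(mask)
    # size of each component (labels run from 1 to n)
    sizes = ndimage.sum(mask, labels, range(1, n + 1))
    return labels == (int(np.argmax(sizes)) + 1)
```

A level-set method refines this idea by evolving a smooth boundary driven by the same intensity knowledge, which handles noise and weak edges far better than a hard threshold.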
Then we detect the symmetry of the virus, because symmetry is another piece of information, or another piece of knowledge, we are using for structure analysis. Once we detect the symmetry, we can segment the individual trimer. A trimer is a structure with three proteins together. So we do trimer segmentation for the whole map of the virus. Then we can segment individual trimers, and we can average those trimers together to form a 3D map with less noise in the density map. Then we can further segment the individual protein, which we call the monomer protein, again using the symmetry information. You can see this whole process is based on some knowledge, some understanding of the map. Without the knowledge, it's very hard to accurately segment the features. So we call this a knowledge-based method. In order to do this, you have to have a deep understanding of the problem. That's why when you consider a knowledge-based method, you have to do lots of studies of the problem; otherwise, your algorithm may not work properly for the problem. I have another video here, but I don't think it will play. So now I'll go to the next type of algorithm. The knowledge-based method depends on how well you understand the problem. If you don't have a good understanding of the problem, but, fortunately, you have a huge amount of data available, then you can switch to the data-driven approach. For the data-driven approach, you normally have to construct a neural network. For example, this one here gives you a number of layers from the input to the output. You design a structure, but the structures are normally regular; you can use structures that other people are using. Then you use your own data to train the network. This process purely depends on how much data you have, and how much understanding you have of the data, because you have to label the data before you can train the system. 
So in this example here, we have trained a neural network to detect where each individual tooth is in the image. We can detect the upper teeth and lower teeth, and also locate each one of the teeth in the image. Then we can segment the accurate boundaries of each tooth. This is what we call instance segmentation of the object, using a CNN-based method. CNN is a convolutional neural network, which is a very popular network used for image processing. We use a similar neural network to segment the canals in the CT of the head. Under the lower teeth, we have this canal; we call this mandibular canal extraction. So we try to extract those canals from the CT data. What we do here is convert the 3D problem into a 2D problem. This is 3D data, and we convert the 3D problem into a 2D problem by cutting the 3D data into a number of 2D slices. The cutting direction is based on the arch of the teeth: we cut in the direction perpendicular to the arch. Then we generate a number of 2D images; we have about 1.8K images for one whole data set. Then we do the segmentation, or feature detection, individually on each image. Once you have the results from each image, or slice, you can reconstruct the 3D canal from all those detections on the individual slices. This is another area where we use the data-driven approach, to segment the intervertebral disks. You can see we have two models here. One is the ground truth model, or the labeled model: those are the models manually segmented by an expert. We also have the predicted models in mesh, like this one in white color. That gives you the predicted results, the results predicted by our own trained network. In most cases, they match each other very well. You can even see the details. If you zoom in on those three disks here, you can see some more details between the two models. Or you can see the correspondence between the two results on a number of slices in 2D. 
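The 3D-to-2D slicing trick can be sketched with SciPy's resampling: walk along a 2D curve (the dental arch in the axial plane), and at each curve point cut a slice perpendicular to the local tangent, through all z levels, so each slice becomes an ordinary 2D image for the CNN. The curve representation, sampling width, and central-difference tangents here are illustrative simplifications.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def slices_along_curve(volume, curve_xy, normal_len=8, n_samples=17):
    """Cut 2D slices of a (z, y, x) volume perpendicular to a curve.

    curve_xy -- (N, 2) array of (x, y) points tracing the arch.
    Returns one (n_z, n_samples) image per interior curve point.
    """
    slices = []
    for i in range(1, len(curve_xy) - 1):
        t = curve_xy[i + 1] - curve_xy[i - 1]            # tangent (central diff.)
        n = np.array([-t[1], t[0]]) / np.linalg.norm(t)  # in-plane normal
        s = np.linspace(-normal_len, normal_len, n_samples)
        xy = curve_xy[i] + s[:, None] * n                # line across the arch
        zs = np.arange(volume.shape[0])
        # build (z, y, x) coordinates for every (z, sample) pair
        coords = np.stack([np.repeat(zs, n_samples),
                           np.tile(xy[:, 1], len(zs)),
                           np.tile(xy[:, 0], len(zs))])
        img = map_coordinates(volume, coords, order=1).reshape(len(zs), n_samples)
        slices.append(img)
    return slices
```

After per-slice segmentation, the detections can be placed back at their known curve positions to rebuild the 3D canal, which is the reconstruction step described above.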
So you can see the red color is our prediction, and the yellow color is the labeled data. They match very well. Also, remember that the CNN, or deep neural network, has been used extensively for imaging data, because images are regular: you have X, Y, or X, Y, Z pixels or voxels regularly defined along the axes. That's very suitable for a neural network, especially a CNN. But what if you have a 3D model? A 3D model is given by a number of triangles. We call 3D models in triangle format unstructured models. Unstructured means we don't have a regular mesh, we don't have a regular grid; the triangles are arbitrary in 3D space. In this case, it's harder, it's more challenging. What we do here is convert the 3D model segmentation into a 2D image segmentation problem. We can simply use a graphics technique to take some virtual pictures of the 3D model from different views. We place a virtual camera somewhere around the 3D model, and then we take pictures. Those pictures give you 2D images. Then we run the CNN technique we mentioned earlier for 2D image segmentation, and then we can reconstruct the 3D models by going back to 3D space. That's what we do in this scenario. The first work we did was to segment the teeth from the rest of the model. That's the first one. Then we go further, to segment the individual teeth, shown with different colors. This is again using the idea we mentioned earlier: we convert the 3D model into a number of 2D images, do the 2D image segmentation, and then go back to the 3D model to stitch the results together. All right, that's basically what we have for today's talk. Here's a quick summary. We have discussed two problems. The first one is 3D reconstruction. We have two techniques, or two different scenarios. One is surface reconstruction: we can get the 3D surface geometry by using reflective rays. 
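A hypothetical miniature of the virtual-camera trick: project the mesh vertices with a simple pinhole camera, run 2D segmentation on the rendered image, then map pixel labels back to the vertices that landed there. Everything below (the camera fixed on the +z axis, the image-size convention, the absence of rotation, rasterization, and visibility tests) is a simplification for illustration, not the talk's rendering setup.

```python
import numpy as np

def project_vertices(verts, cam_pos, focal=1.0):
    """Project (N, 3) mesh vertices with a toy pinhole camera that sits
    at cam_pos on the +z axis and looks along -z toward the origin."""
    rel = verts - cam_pos
    z = -rel[:, 2]                        # depth in front of the camera
    uv = focal * rel[:, :2] / z[:, None]  # perspective divide
    return uv, z                          # z kept for depth/visibility tests

def labels_to_vertices(uv, image_labels, image_size):
    """Map each projected vertex to the 2D segmentation label of the
    pixel it lands in (image spans [-image_size/2, image_size/2])."""
    h, w = image_labels.shape
    px = ((uv + image_size / 2) / image_size * np.array([w, h])).astype(int)
    px = np.clip(px, 0, [w - 1, h - 1])
    return image_labels[px[:, 1], px[:, 0]]
```

Repeating this from many camera positions and voting over the per-view labels is one way to stitch the 2D results back into a consistent labeling of the unstructured 3D model.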
Because the rays are reflected on the surface, you cannot see through the surface to see the internal structures. We have the active method; this is actually the method that most 3D scanners on the market are using. We also have the passive method. The passive method gives you less accuracy, but it is more accessible, because you can simply take pictures of the object and then do the 3D reconstruction; you don't have to have 3D scanner equipment. So that's more accessible, but gives you less accuracy. We call this the multi-view technique. Then we also talked about volumetric, or volume, reconstruction, based on the CT concept, or CT theory. We have computed tomography used in medical CT, and in electron microscopy, or electron tomography. We also gave you a quick summary of the single particle method, which is based on the cryo-EM technique. So that's the first problem we have. The second problem is 3D analysis. In particular, we talked about 3D segmentation of the models or 3D segmentation of the volumes. We have the knowledge-based method, and we have the data-driven method. But how do you choose between the two? As I said, if you have a deep understanding of the problem, but you don't have much data to use, then try the knowledge-based method. If you have lots of data to use, and you have the labels for the data, but you don't have much knowledge about the problem, then try the data-driven method. If you have both data and knowledge, then try combining the two, because if you combine knowledge with data, you should have a better model for the data analysis. Now, what if you don't have data and you don't have knowledge? We always have the robots. I believe if AI is today, then robotics should be tomorrow. But tomorrow is built upon today, because AI is the brain of robots. So we have to first make AI work well before we can really have working robots. That's all for this talk. Thank you very much. Yes? 
When you're reconstructing a surface using two images taken from different angles, is all the information that you need in the images themselves, or do you also need to know what the relative angles were? If you know the angle, it's going to be easier for the computation. But the reconstruction itself doesn't require the angle information; it's all in the two images. Because once you know the features and the correspondence between the features, you can estimate the angle; you can estimate the transformation from one view to the other. So that's actually given. But you have to make sure the correspondence is accurate. Otherwise, the transformation may give you some problems. But yes, you don't have to use any other information. Another one. When you showed the reconstruction of the teeth, there was some green area that gradually covered different parts of the entire set of teeth. So how do you track that? I mean, every time you take a picture and then add it up together, right? Yes. So how does it know where it is looking? I mean, is there any correspondence between the two frames? Yes. OK. So this depends on the features you have in each frame. If you have just two flat patches, it's impossible to match them accurately. So you are using some curvature information. That means each frame must give you some curvature, and you use the curvature information to match the two together. Otherwise, you really cannot have accurate registration between two frames if you don't have that information, the features. Could the user move the camera in any direction they want? No, you have to make sure two frames overlap by at least about 70%. If you only have, let's say, 30% overlap between two frames, the accuracy may be very low. You may have some problems with that. 
So that's why when you take the videos using the intraoral scanner, you have to make sure you move slowly. If you move too fast, then the overlap will be very minimal, and you have some problems with the stitching process. Just for grins, have you guys tried using backscattered electron imaging to get subsurface information? I don't think we have tried that. Those images come from the EM lab in the bioengineering department; we are working with a professor there. What we have been using so far is only the SE detector. Yeah. Thanks. OK. All right. OK. I have one question, actually. I personally have a dental implant, so when I saw your examples, I wondered: have any of those algorithms generated any real-world application or commercially available products? This project is a collaboration with a group in China, and they are doing something in that direction. But in our case, we only focus on the algorithms, like the stitching of two frames or different frames. That is our focus. Regarding the segmentation of the teeth, how do people get the ground truth information? By drawing on the 3D model. We ask the dentist to draw the curves using some software. We use that as the labeling, or as the ground truth, for the training. I see, I see. For labeling, I think that's probably the common case. You always ask experts to provide the labeled data for you, because the accuracy of the network depends on how accurate the labeled data is. If you have some problem with the labeling, then you may have some problems with the network too. So we ask the dentist to draw this for us. Excellent. Thank you very much.