Should we commence? Good afternoon everyone, and welcome to my talk. Just let me know if I'm too loud; I'm usually loud. So welcome to my talk on high-performance image uploads, the web edition. A little bit of introduction about myself: I'm Avinash, a UI engineer at Flipkart. I have been predominantly working on the Flipkart Lite PWA, and I recently moved on to the ads platform there. I am a gamer, a sneakerhead and a biker, and you can find me at Pistedfork88 on Twitter.

So let's jump right in. What's the problem at hand? Allow users to upload images while they are writing reviews for their purchased products. Okay. The first version of Flipkart Lite that was launched allowed users to write reviews for purchased products, but it did not allow them to upload images as well, and the requirement that came to us was to implement this particular feature. This talk is basically my team's journey of how we tackled this problem at scale.

So let's first understand the requirements. We needed to validate the image before we upload it. Sounds pretty simple. We needed to handle orientation, because when we are uploading an image we want to show a thumbnail; we will look into what orientation means in detail. We needed to resize the image before uploading; this is a standard feature provided by any file upload service, and nothing new is expected here. And finally, the insatiable demand of product and engineering: can you still try something better than this?

Okay, so let's do this. We have the requirements, we are all the smart minds in this room, so let's try to brainstorm how we could potentially solve this problem, which might seem simple initially, at scale. We'll go through the requirements one by one.
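As a sketch of the first requirement, validating the image before upload, here is what such a check could look like. The limits, allowed types and function names here are illustrative, not the actual production values:

```javascript
// Illustrative validation: check extension, MIME type and size against
// some constraints before doing any heavy work on the file.
const ALLOWED_TYPES = ['image/jpeg', 'image/png'];
const ALLOWED_EXTENSIONS = ['.jpg', '.jpeg', '.png'];
const MAX_BYTES = 10 * 1024 * 1024; // 10 MB, made-up limit

function validateImage(file) {
  // `file` is a File/Blob-like object with name, type and size.
  const name = file.name.toLowerCase();
  if (!ALLOWED_EXTENSIONS.some((ext) => name.endsWith(ext)))
    return { ok: false, reason: 'extension' };
  if (!ALLOWED_TYPES.includes(file.type))
    return { ok: false, reason: 'mime-type' };
  if (file.size > MAX_BYTES)
    return { ok: false, reason: 'size' };
  return { ok: true };
}
```

Based on the result, the flow either continues or bails out with the failing reason.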
So the first thing was that we needed to validate the image. Sounds pretty simple: we take the image, we check the extension, the MIME type, the size, and based on our constraints we either let it go through or we bail out. Pretty simple.

The next thing, however, was that we needed to handle orientation. So why orientation, and why do we need to handle this? When we upload a file or an image, we wanted to show a thumbnail for that image, so the requirement crops up from there. Okay, so before we understand how to handle orientation, let's understand what orientation we are actually talking about. Digital cameras and phone cameras these days have a lot of sensors, and they can collect a lot of metadata about the camera or the device with which the photograph is clicked, and they can inject this metadata into the image. The metadata that we are most interested in is called the orientation property. Most of us here are already aware of two standard values for the orientation property, which are landscape and portrait, but technically there are eight possible values. You can think of it like this: an image has a top, bottom, left and right, plus the mirror of those four values, and that makes eight possible values. We are not going to go through the details of all eight possible values, but the ones which actually impact us are the landscape and portrait ones. So: how is this metadata put into the image, how much of it can be put in, and where in the image can we potentially find it?
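Since there are only eight possible orientation values, correcting a thumbnail later boils down to a small lookup. As a sketch (the exact transforms depend on your rotation convention and on CSS transform composition order, so treat these values as illustrative):

```javascript
// Illustrative map from the eight EXIF orientation values to a CSS
// transform that puts the thumbnail upright.
const ORIENTATION_TRANSFORMS = {
  1: 'none',                      // already upright
  2: 'scaleX(-1)',                // mirrored horizontally
  3: 'rotate(180deg)',            // upside down
  4: 'scaleY(-1)',                // mirrored vertically
  5: 'rotate(90deg) scaleX(-1)',  // mirrored, needs a 90° rotation
  6: 'rotate(90deg)',             // needs a 90° clockwise rotation
  7: 'rotate(-90deg) scaleX(-1)', // mirrored, needs a -90° rotation
  8: 'rotate(-90deg)',            // needs a 90° counterclockwise rotation
};

function transformFor(orientation) {
  // Unknown or missing values fall back to no correction.
  return ORIENTATION_TRANSFORMS[orientation] || 'none';
}
```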
We have to understand a little bit about the standard that defines this metadata, and that standard is called EXIF, which stands for Exchangeable Image File Format. I'll just read out the Wikipedia definition: EXIF is a standard that specifies the formats for images, sound, and ancillary tags used by digital cameras, scanners and other systems handling image and sound files recorded by digital cameras. Okay, too much. But in a nutshell, what we want is to somehow take this information about the device, and maybe things like location, and put it into the image file. That is what the standard defines. Now, this information could be things like your shutter speed, your ISO, your focal length, even thumbnails as a matter of fact. For example, this is a standard picture; it is tilted purposefully, and on the right you see a table listing a few of these properties: you have the make, the model, the orientation (we'll get into this), the ISO, the shutter speed, the focal length and all of that. So this is a subset of what can potentially be put into the image as the metadata, or the EXIF data. Just to add: EXIF data is only defined for JPEGs; we do not have such mechanisms for other file formats.

Now let's understand why we are even doing all of this. We understand what orientation means; let's understand why we need to handle orientation in the first place. To understand that, we have to dig a bit deeper into how cameras actually work. There are three columns here: the first column shows what the scene looked like when you clicked the photograph, the second column shows how the camera saved that image, and the third column shows what the saved image looked like when you viewed it.
Now let's focus on the second column here. It says that when you clicked the photograph, the camera was upright and the top of the image was pointing to the top of the camera. This is exactly how the camera is going to save it, and this is exactly how you would view it, right?

Now let's change the orientation of our camera. I tilted my camera and I clicked the photograph. But before saving the image, the camera is going to first ensure that the top of the image is actually pointing to the top of the camera, so what it's going to do is rotate the image 90 degrees anticlockwise and then save it. Now, if you view the image right away, you would actually see it tilted. But what is expected is that you should be viewing the photograph as it was when it was clicked, which means that somehow someone needs to tell you that this image has been rotated by 90 degrees counterclockwise, and that you need to rotate it by 90 degrees clockwise so that the orientation is correct. Okay, and this is the reason why we had to handle orientation in the first place.

Now, once we understood the problem, we thought: well, do we have any existing solutions at hand? Does the browser actually give us something? The answer is, fortunately, yes. We have a very simple CSS property called image-orientation, and if you give that property the value from-image, it will actually read the EXIF information and do all the heavy lifting for you. But, as a picture speaks a thousand words, the not-so-encouraging browser support for this particular feature made this information quite irrelevant for us.

So we quickly realized that we had to take matters into our own hands and do all this heavy lifting for our users. We had to extract this EXIF information somehow. What we did was take the file that the user selected and read it asynchronously as an ArrayBuffer, thanks to the FileReader API. By the way, the FileReader API is a dedicated web API for reading files; it allows us to read files asynchronously, and potentially as ArrayBuffers. Once we have this ArrayBuffer, as we discussed, we first need to find whether EXIF exists in it or not: is it even a JPEG, and if it is a JPEG, does it actually carry EXIF information? We have so many cases to handle, which means we have to run a computationally intensive loop over this ArrayBuffer to find out whether such data exists, and if it does, where it exists. Depending on the situation, we very quickly understood that running all of this code on the main thread is not going to work.

This is just sample code for how you would use the FileReader API: you have a FileReader constructor, you instantiate it, at the bottom you pass the file and call readAsArrayBuffer, and you have an onload handler which gives you the ArrayBuffer in the result property.

Okay, so let's now look at how this code executes when the EXIF extraction is run on the main thread. Let it run through one cycle, actually. So we selected a few images here.
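To make that loop concrete, here is a sketch of scanning a JPEG ArrayBuffer for the EXIF orientation tag (tag 0x0112 in the first TIFF IFD). The function name and return codes are ours, and real-world parsers handle many more edge cases (multiple APP1 segments, truncated files, and so on):

```javascript
// Scan a JPEG ArrayBuffer for the EXIF orientation tag.
// Returns 1..8 if found, -1 if no EXIF/orientation, -2 if not a JPEG.
function getExifOrientation(arrayBuffer) {
  const view = new DataView(arrayBuffer);
  // A JPEG always starts with the SOI marker 0xFFD8.
  if (view.byteLength < 2 || view.getUint16(0) !== 0xffd8) return -2;
  let offset = 2;
  while (offset + 4 <= view.byteLength) {
    const marker = view.getUint16(offset);
    if (marker === 0xffe1) {
      // APP1 segment: 2-byte length, then "Exif\0\0", then a TIFF header.
      const exifStart = offset + 4;
      if (view.getUint32(exifStart) !== 0x45786966) return -1; // not "Exif"
      const tiff = exifStart + 6;
      // Byte order: 0x4949 ("II") is little-endian, 0x4D4D ("MM") big-endian.
      const little = view.getUint16(tiff) === 0x4949;
      const ifdOffset = view.getUint32(tiff + 4, little);
      const dir = tiff + ifdOffset;
      const entries = view.getUint16(dir, little);
      // Walk the 12-byte IFD entries looking for the orientation tag.
      for (let i = 0; i < entries; i++) {
        const entry = dir + 2 + i * 12;
        if (view.getUint16(entry, little) === 0x0112) {
          return view.getUint16(entry + 8, little); // value 1..8
        }
      }
      return -1; // APP1 present but no orientation tag
    }
    // Skip any other segment using its declared length.
    offset += 2 + view.getUint16(offset + 2);
  }
  return -1; // no EXIF segment found
}
```

Even this simplified version is a tight loop over potentially megabytes of data, which is exactly why it hurts on the main thread.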
Now, imagine that when you click on Add Image, you see that the UI is not responsive: it takes a lot of time for that bottom sheet to actually appear. And it's intuitive, right? Because you are taking a big ArrayBuffer, running a loop over it trying to find certain information, and it is blocking the main thread. Now, the most intuitive solution to counter this problem is: somehow, could I delegate this heavy lifting off of the main thread? Which is to say, drop in a worker. So let's drop in a worker. You can see that if I now click Add Image, it is quite responsive, because you delegated this huge loop that you were running on the main thread to your worker thread. Everything else remains the same: you get the file, you read the ArrayBuffer, all of that happens on the main thread, but the EXIF extraction runs on a parallel thread.

But nothing comes for free. When we took the call that we wanted to keep our main thread free for user gestures, the application now had to pay a different cost, because we added a new construct, the web worker, into our ecosystem. So what could these potential costs be? The first thing is that there is a creation cost for a worker: there is an instantiation cost involved when you ask the browser to create a worker thread, which could potentially spawn a thread behind the scenes and hand it back to you. On a decently sized device it is around 40 milliseconds. But this is not a really big problem. Why? Because technically, as a developer, I could build a worker pool, where I have a set of N workers, so that I do not create workers again and again but rather reuse them. So this is a solvable problem to some extent. But let's discuss the bigger problem.
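A worker pool can be sketched in a few lines. This is a minimal illustration, not our production code: the pool takes a `createWorker` factory (in the browser that would be something like `() => new Worker('worker.js')`), creates its workers once up front, and hands them out on demand:

```javascript
// Minimal worker-pool sketch: pay the instantiation cost once, then reuse.
class WorkerPool {
  constructor(size, createWorker) {
    // Eagerly create `size` workers so the 40 ms cost is paid up front.
    this.idle = Array.from({ length: size }, () => createWorker());
    this.waiting = []; // resolvers queued until a worker frees up
  }

  // Resolves with a worker as soon as one is available.
  acquire() {
    return new Promise((resolve) => {
      if (this.idle.length > 0) resolve(this.idle.pop());
      else this.waiting.push(resolve);
    });
  }

  // Hand a worker back: give it to a waiter, or return it to the idle set.
  release(worker) {
    const next = this.waiting.shift();
    if (next) next(worker);
    else this.idle.push(worker);
  }
}
```

The key property is that the pool never grows past its size: callers wait for a released worker instead of spawning a new one.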
So imagine that you have two contexts, or two threads: you have the main thread and you have a worker thread. They have to communicate with each other, correct? We created them for a reason, and they have to communicate with each other. And this is exactly where the problem is: there is a latency cost involved the moment you start communicating between two threads. Okay, for the web developers here: what is the most ubiquitous way for two threads to communicate with each other? Anyone? Message passing, of course, and the API would be postMessage. Perfect. Is there any other API we could use to communicate between two threads? Well, BroadcastChannel is one particular API that you can potentially use for communicating between two contexts or two threads, but it makes sense when you have a lot of contexts and you want to broadcast (it does have the word "broadcast" in it). We just had two contexts, the worker and the main thread, so we just stuck with postMessage.

Now, the most common way that postMessage works is through a copying mechanism for communicating data. That means that if you have two threads, T1 and T2, and T1 wants to send over some data to T2, then T1 first has to serialize the data, deep-clone it using the structured clone algorithm, and then send it over to T2, where T2 will deserialize and use it. Now, this works fine, but the problem is the deep-copying step. It goes without saying that if the data you are trying to send over is large, this deep clone is going to take a lot of time, which means your postMessage calls start lagging. Now, given the status quo of the web, do we have a solution? Can we get around this? Do we have something that will help us move away from this particular step and yet achieve our intent?
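The copy semantics can be observed directly with the `structuredClone` global, which applies the same structured clone algorithm that postMessage uses under the hood (available in modern browsers and Node 17+):

```javascript
// structuredClone performs the same deep copy that postMessage does when
// sending data between threads: the clone is fully independent.
const original = { pixels: new Uint8Array([10, 20, 30]) };
const copy = structuredClone(original);

original.pixels[0] = 99;      // mutate the source after cloning
console.log(copy.pixels[0]);  // 10: the clone did not change
console.log(copy !== original && copy.pixels !== original.pixels); // true
```

For a small object this copy is negligible; for a multi-megabyte ArrayBuffer it is exactly the deep-copying step that makes postMessage lag.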
The answer is yes: that is implemented using something called transferables. To understand transferables, think about it like this: right now we were copying the data and sending it over, right? With transferables, instead of copying the data, you transfer the ownership of that data. Imagine the data is the block in white, and T1 and T2 want to communicate that data. Instead of copying this white block and sending it over to the other context, you shift the ownership of that memory block from T1 to T2, which means that once the transfer is complete, T1 will not have access to that memory block anymore; only T2 will. This actually helps, because you are getting rid of the entire copying mechanism and your postMessage calls are going to be faster. And thanks to the fact that transferables is an interface and postMessage implements it, there is no new API for making this work: if transferables are supported in the browser, you can use them right away by passing the last argument as an array of transferable objects.

So we realized that yes, if transferables are available, this is the way to go. We actually ran a small experiment on a measly 1.7 to 2 MB image, three times, to measure how long the postMessage takes with cloning versus with transferables, and as you can see, transferables give much more promising results over cloning. Hence we concluded: if transferables are supported, that's the way to go.

So let's have a quick recap. What did we do till now? We let the user select an image. We read the ArrayBuffer out of it. We created a worker.
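The ownership hand-off is easy to see. In a real worker you would write `worker.postMessage({ buf }, [buf])`; the `transfer` option of `structuredClone` (Node 17+ and modern browsers) shows the same semantics without needing a worker:

```javascript
// Transferring an ArrayBuffer instead of copying it: the sender's copy is
// detached (byteLength drops to 0) and only the receiver can use the data.
const buf = new Uint8Array([1, 2, 3, 4]).buffer;
const received = structuredClone(buf, { transfer: [buf] });

console.log(received.byteLength); // 4: the data arrived intact
console.log(buf.byteLength);      // 0: the sender lost access
```

This is also the trade-off to keep in mind: after the transfer, the sending side can no longer read or reuse that buffer.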
We asked the worker to compute the EXIF orientation property and hand it back, and in the meantime we used transferables, if supported, so that the process is fast. And this is what it looks like after all of this heavy lifting is done: at the top you see what the image would have looked like if we had done none of this, and at the bottom is the UI from Flipkart Lite, where Flipkart Lite actually understands the orientation and adjusts it so that the image appears in the correct orientation. Problem solved.

Okay, so the next problem was that we needed to resize our images before uploading them. You can imagine that you have an image of W by H and you want to resize it to some other size, A by B, which is potentially smaller. So this is exactly what the exercise was: you take the file, but you do not want to lose quality over it; you are not degrading the image while the resize is going on, you just want to reduce its physical dimensions. This actually gave us quite a big boost: around a 30 to 35 percent reduction in size, just from resizing.

On top of that, we took it to the next level and started experimenting with resampling. Resampling is upsampling or downsampling; you can say it is an interpolation process. Say we are talking about downsampling: what you would do is take a set of pixels and replace them with a single pixel which is an interpolation of all those pixels. It goes without saying that we have a loss of data here, and that we are introducing noise into the system. Based on our knowledge of this domain, we wanted a faster algorithm which would still give us good outputs.
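The interpolation idea can be illustrated with the simplest possible filter, a box filter that averages each block of input pixels into one output pixel. This is a deliberately simplified sketch (grayscale, integer scale factor), not the Hermite filter used in production:

```javascript
// Box-filter downsample: every `factor` x `factor` block of input pixels
// becomes one output pixel holding their average. Grayscale for brevity.
function downsample(pixels, width, height, factor) {
  const outW = Math.floor(width / factor);
  const outH = Math.floor(height / factor);
  const out = new Uint8ClampedArray(outW * outH);
  for (let y = 0; y < outH; y++) {
    for (let x = 0; x < outW; x++) {
      let sum = 0;
      for (let dy = 0; dy < factor; dy++)
        for (let dx = 0; dx < factor; dx++)
          sum += pixels[(y * factor + dy) * width + (x * factor + dx)];
      out[y * outW + x] = sum / (factor * factor); // interpolated value
    }
  }
  return { pixels: out, width: outW, height: outH };
}
```

Note the nested loops over every pixel: this is the quadratic, CPU-bound work that, like the EXIF scan, has no business running on the main thread.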
We decided to use Hermite resampling for it. The flow is pretty simple: we take the ArrayBuffer and pass it to our algorithm, which both resizes and resamples, and then hands it back. From that ArrayBuffer we finally want to create an image so that we can send it back to our servers. We use the canvas APIs for that; there are two APIs, toDataURL and toBlob, to extract the blob or the file out of it, which the UI can then use. And this is what it actually looks like in the end: on the left is the uploaded original, and on the right is the version after resizing and resampling. You can see a little bit of damage around the eyes, the whiskers, and potentially around the edges of the leaves.

Okay, so since the beginning we have been talking about making this truly multi-threaded, where the main thread does nothing apart from handling user gestures. But over the last few slides our reliance on canvas has grown a lot. So can these two ideas work in tandem? Can they work together? The answer is yes, they can, with the introduction of something called OffscreenCanvas. You can imagine that previously you had a DOM canvas element fused together with the abilities of the canvas, like the drawing context and everything. OffscreenCanvas just decouples them: now you have the DOM element separate and the abilities separate. Using that, you can exercise those abilities on an object which is very similar to the DOM element but which can work or run in a different context, like a worker thread. So you have an OffscreenCanvas constructor: you can just call it and then run whatever operations you want, which offloads all of your heavy operations from the main thread into a worker thread. Or you can have an actual DOM element in your UI and create an OffscreenCanvas out of it using the transferControlToOffscreen API, which can then run in a worker thread. The beauty of this API is that these two things remain linked: any operation that you do on the OffscreenCanvas object relays those changes to the actual DOM element.

So now let's compare the resizing and resampling running on the main thread versus using OffscreenCanvas. I want everyone to focus on the thumbnails here. When I selected a few images, you can see that as I select a few more, the thumbnails haven't come up yet, because it is running a lot of code in the background; it is actually running the resizing of these files, but on the main thread. Now let's see how this looks with an OffscreenCanvas. I selected those images; now let it run through one more cycle, please. Yeah, so let's select a few images, and now we select a few more. You can see that the thumbnails have arrived much faster than in the version which wasn't using OffscreenCanvas. And it makes sense, because it is running all of this code in a different worker context.

So, summing it up.
This is what our architecture actually looks like right now: the user selects an image, after which we extract the ArrayBuffer out of it and pass it to a worker pool. If you remember, we talked about the instantiation cost of workers, and to minimize that we created a worker pool, where this ArrayBuffer is sent. For each image we actually use two workers: one to run the EXIF extraction and one to run the resizing. The reason is that these are synchronous processes, very heavy, quadratic, order n-squared loops, and they are not related to each other, so there was no reason for us to run them sequentially; rather, we chose to run them in parallel. Once the EXIF extraction is done, it communicates to the thumbnail React component that we have: "hey, this is the orientation, you can adjust yourself." And when the resize is over, it communicates to the network utility: "hey, I have resized this file, can you please go ahead and upload it?" So yeah, this is it from my side.
I actually have a few more slides about a little bit of experimentation that we did with WebAssembly, so maybe I can take two more minutes. I would like to attribute the motivation for this to our native teams, particularly my colleague Naresh. When I spoke with him, he said: by the way, this is exactly how the native flow also works; similarly, after they do all of this sampling, they encode the final image into WebP to further reduce the footprint, and then upload it. We had already turned a file upload into a research problem, so why not try this from the web as well? But the problem is that the browser does not expose a native WebP encoder or decoder API to JavaScript, and hence it was pretty clear that whatever we had to do, we had to do outside the realm of JavaScript. That is exactly where the motivation for using WebAssembly came into the picture.

No amount of time can do justice to explaining what WebAssembly is, but I'm going to try my best to explain it in one single slide, which has three steps. You choose a low-level language, which for now is C, C++ or Rust, and you write the low-level code for what you want to do. Then there is a set of build tools that you use to build and compile the low-level code you have written. That build tool is going to spit out two files for you: one is a JavaScript file, and one is your actual module that is going to execute. Once you have this, the module runs in the same JavaScript runtime in the browser which is executing your application code as well. It's very important to understand that WebAssembly did not push another runtime or virtual machine into the browser; it relies on the existing virtual machine inside the browser, which previously executed only JavaScript but is now extended to run WASM, or WebAssembly, code as well.

So yeah, this is what our experimentation looks like, and I think we are seeing good results. Our only potential callout is the fact that these JavaScript files and module files are pretty large for now, and hence, on a mobile network and on a low-end device, it has to provide a really good ROI for the fact that you are making the user download a much bigger module. I think in the near future, when WebAssembly improves upon the size of these modules, it might actually help and bring in a lot of goodness that plain JavaScript wasn't able to offer in the first place.

So yeah, thank you. Thank you for your patient hearing. Questions? Oh, okay. Okay, sure. Okay, we can have a discussion offline, maybe. Okay.
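To make the three-step WebAssembly flow from the talk concrete, here is a toy precompiled module being instantiated from JavaScript. It exports a single `add` function, nothing like a real WebP encoder (which is produced by the build tools from C/C++ source and is exactly the kind of module whose size the talk calls out):

```javascript
// A toy WebAssembly module, hand-assembled as bytes: it exports one
// function `add(a, b)` returning a + b. It runs in the same JS runtime
// that executes the application code; no separate VM is introduced.
const wasmBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,       // "\0asm" magic + version 1
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00,                               // function section
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export "add"
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b, // body: i32.add
]);

const module = new WebAssembly.Module(wasmBytes);
const instance = new WebAssembly.Instance(module);
console.log(instance.exports.add(2, 3)); // 5
```

In a browser you would typically fetch the compiled `.wasm` file and use `WebAssembly.instantiateStreaming` instead of inlining the bytes.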