All right, hi everyone. My name is Yuan, and I'm a software engineer at Open Government Products. Maybe you've heard of us — we work on the sixth floor of this building. Can I have a quick show of hands? How many people here have used or heard of FormSG? FormSG is our department's product that lets you transact with the Singapore government — I work for the Singapore government as a software engineer. Recently we've been building a really cool feature that's practically invisible to most users: a zero-knowledge system. We want to transition FormSG, which is essentially Google Forms for the government, into a zero-knowledge system. We want to make sure that when citizens interact with the government, all their data is encrypted, including on the cloud. The only people allowed to see the data are the form creators: public officers, civil servants. This is roughly what the product looks like; it's mobile responsive. The key question, I suppose, is: what does zero-knowledge mean? What happens is that when a public officer creates a form, they receive a private key that is never seen by our servers — it is generated in the browser of the public officer. The officer safeguards the private key, while the public key is provided with the form. When a submission comes in, the public key — invisible to the form filler, i.e. the citizen — is used to encrypt the data before it is sent to the cloud. The cloud has zero knowledge of the data it's storing: even if, say, the NSA had a backdoor into the cloud provider, the data would remain unreadable. We minimize the information exposed over the network and make sure that only the browsers at either end can decrypt the data. You might be submitting something really sensitive, like your NRIC, or filling in a form on a deeply personal topic, so it's really important that this data is encrypted at rest.
When the public officer wants to retrieve the data, they have to supply this private key to decrypt the responses. So what does the UI look like? When you create a new form, you type in a form name and you have to download the private key; you build the form as usual as a public officer; and then you decrypt the responses by supplying the private key, which you can upload. All of this sounds really cool, and encrypting and decrypting is, I think, familiar to many of you here. One feature we really struggled with for a while is the export-CSV feature, because as a public officer, when I want to download a large volume of responses, I need to be able to collate all the data I have in the system. This button in the top right here is deceptively simple looking. However, it's actually really tricky to implement, because decrypting every submission takes a lot of heavy CPU cycles. It pushes the bounds of browser performance, in terms of both memory and CPU — in fact, when you try to do it on mobile, the device actually heats up, and we really don't want that. It also pushes the bounds of UX: with a very large volume of submissions there's a long waiting time, and you run into memory pressure and battery drain. So this talk will mainly focus on how we optimized this, using some of the technologies the first talk was sharing — which was really interesting. So what happens under the hood? When we first built this feature, we went with the standard solution: RSA with AES. This is the flow I was talking about: you have a public and private key pair, and the public key goes to the server. The citizen accesses the form, and the public key is used to encrypt. When you need to get the data out, you supply the private key.
Now, the problem was that when we tried to launch this last year for the National Day ticket applications, we realized that 100,000 form submissions took two hours and forty minutes to decrypt. That made the solution completely unusable — it works fine in theory, but in practice it was not usable. So what did we do? We did some research, and we realized that RSA dates back to the 1970s, and there's a newer class of algorithms called elliptic curve cryptography. It's really cool because, for the same level of security, you can have much smaller key sizes. Smaller key sizes translate directly into fewer CPU cycles, which greatly reduces the computational requirements — great for IoT and mobile browsers. And if you look at NIST — the National Institute of Standards and Technology, which sets these standards for the United States — elliptic curve cryptography meets the US government's criteria for protecting classified information. So when we saw this, we immediately jumped on it and built it into our product. That changed the flow a bit. Now, when you visit a form that's end-to-end encrypted — you're at the top left there as the form submitter — we generate a public-private key pair in your browser. Then we perform a mathematical operation called an elliptic curve Diffie-Hellman key exchange: it takes your private key and the form's public key, and derives an in-memory symmetric key. This key in the middle only exists in memory — it can always be regenerated from the key material, it's never stored anywhere except in memory, and it's transient. We then encrypt the form response with that symmetric key. Essentially, deriving the symmetric key is the slow, compute-intensive part, but the actual encryption itself is really fast.
The first time we did it, we wanted to make it really efficient in terms of downloading and decrypting. So we used a streaming implementation: the server would step through each record in the database and write it out to a stream to the client, and the client would run the decryption flow on each record and collate the results into a CSV. Once we switched to elliptic curve cryptography, two hours forty minutes was reduced to eight minutes — a roughly twenty-times improvement. But there was still a problem: if you have 100,000 submissions, you have to execute this decryption 100,000 times, which means your browser freezes as the main thread gets choked up. So the next step was streaming with a web worker. We offloaded the compute-intensive part of the work into a web worker: the main JavaScript thread posts the data to the worker so it can run on a separate thread, and you don't freeze browser rendering, you don't block user interactions, and so on. With this change, eight minutes was reduced to two minutes thirty seconds. And once we were using one web worker, we thought: why not more? If you can use one extra thread, why not four, or eight? So we switched to distributing the work round-robin across parallel web workers, doing the work in parallel, and this reduced the compute time from two minutes thirty seconds to forty-five seconds. This was all great, and we launched it for NDP last year, in June 2019. We thought, okay, this solution is really usable — let's get it out there for the next big event, the Bicentennial Notes. Just for comparison: we had more than half a million submissions for the National Day applications, while for the Bicentennial Notes we had about 200,000.
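Round-robin distribution across workers can be sketched in a few lines. This is a hypothetical dispatcher, not FormSG's code: tasks are assigned to a worker queue the moment they arrive (in the browser the `push` would be a `worker.postMessage(task)` call), and the counter advances immediately on receipt — a detail that matters for a bug discussed later.

```javascript
// Minimal round-robin dispatcher sketch (hypothetical names).
function createRoundRobinPool(numWorkers) {
  const queues = Array.from({ length: numWorkers }, () => []);
  let next = 0;
  return {
    dispatch(task) {
      queues[next].push(task);        // browser: workers[next].postMessage(task)
      next = (next + 1) % numWorkers; // advance immediately, not on completion
    },
    queues,
  };
}

const pool = createRoundRobinPool(4);
for (let i = 0; i < 10; i++) pool.dispatch(i);
const sizes = pool.queues.map((q) => q.length); // [3, 3, 2, 2]
```

With ten tasks and four workers, the work spreads evenly as 3/3/2/2, so all workers stay busy in parallel rather than waiting on one another.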
We thought the event would go fine — it's a big event, surely we can handle the load in the browser — because the form for the Bicentennial Note application looked really simple, with only six questions. If we could deal with half a million submissions from the previous event, why couldn't we deal with 200,000 for this simple form? The answer — or rather, the problem — arrived on the second day of the event launch: our users called us and said, hey, we can't decrypt, we can't get our data out of the system, what's wrong? Essentially, the browser had crashed, and we had to find out what happened. What was going on was that the form used logic to determine which drop-down options to show as you were filling it in. Imagine your bank was DBS and you lived in the east of Singapore: the form would only show you the options relevant to you, to avoid surfacing, say, a hundred different banks to choose from. And the way the form was able to do that was that it wasn't actually a six-question form — it had another 51 hidden questions. To provide a good user experience, it stepped through complicated logic to determine which form field would show up when the user selected which drop-down option, and that blew up the response size heavily. So what did we do to improve decryption performance? We did a performance profile with Chrome, and we saw two main problems. The first was memory: every time we tried to decrypt, there would be heavy garbage collection — the GC choking up the thread — and memory growth that wasn't coming down fast enough. The second was that the web workers appeared to be idle for very long periods of time. So what was going on here?
First, we tried to relieve the garbage collection pressure. When we took a look at the memory heap allocations, we found that the objects were traced to a streaming parser called Oboe.js. What Oboe.js does is parse, out of an HTTP stream, whatever objects it can find before the download has completed. Normally in an application you would wait for the download to complete before processing the data; we thought, why don't we process it as the data comes in, so we can make it faster? The downside was that a lot of objects were being created in the heap space as the stream was being parsed. Our workaround was to have our server return newline-delimited JSON. On the left here, you can see that before, the server would send standard JSON and the client would parse for nested JSON objects, which is a tricky operation. If you send newline-delimited JSON — technically it's invalid JSON — your client now only has to scan for the newline character. This simple change reduced our memory footprint by 10 to 20 times. Sometimes it's the really simple changes that work wonders. That dealt with the memory spikes. Before, we would actually crash when we hit the string length limit while concatenating the lines of the CSV. But once you take a look at the Blob constructor, you realize that you can simply create a Blob, pass it an array, and have the Blob reference the existing data to trigger the file download — you don't have to join all the records of your CSV into one giant string. Next, looking at the thread profiling, we saw that even though we were parallelizing the work across web workers, the individual workers would do some work and then sit through large gaps in between. What was going on? We realized there was actually a bug in our round-robin scheduling.
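The newline-delimited JSON change described above can be sketched as follows — a minimal illustration with a hypothetical function name and dummy records, showing why the client-side parsing becomes trivial: each line is a small, self-contained JSON object, so the client never builds one large nested document in memory.

```javascript
// Parse a chunk of newline-delimited JSON: scan for '\n', then
// JSON.parse each small line independently.
function parseNdjson(chunk) {
  return chunk
    .split('\n')
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line));
}

// What the server now sends: one submission per line.
const body = '{"id":1,"answer":"DBS"}\n{"id":2,"answer":"East"}\n';
const records = parseNdjson(body);
// records → two plain objects, parsed one line at a time
```

Each parsed object can be handed to a worker and released immediately, so the heap never has to hold the whole response set at once.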
Essentially, the bug was that we were not distributing the work when we first received it. Instead, we were waiting for a worker to complete before incrementing the counter — a really simple round-robin bug. After the fix, the profiling gave us proof that the web workers were actually doing work in parallel. That was great, but there was still a problem: if you look at the performance profile, there were still large gaps between the cycles the web workers were performing. So what was the main thread doing? When we looked at main thread execution, we realized that for every 15 milliseconds we spent decrypting, 260 milliseconds were spent in the byline library, whose job was finding the newline characters. A lot of time was being spent just consuming the stream as the data came in. So how did we optimize that? We took the library, copied the code into our own repository, and changed it so that it would work only for our use case — instead of catering for the general case, we catered for our specific case. We realized that our encrypted data was always Base64, so it could be represented in ASCII, and we removed the UTF-8 handling. Then, looking at the line-splitting code, we saw it was using a regex, which is kind of expensive, and catering for carriage returns, which we never send from our server, and for Unicode line boundaries. These were all features we did not need — our server only sends newline-delimited data and no other separator characters — so we pruned all of that out. With that, we went from 260 milliseconds between decryption cycles down to 35 milliseconds, and the total time for 10,000 Bicentennial submissions went down from about 40 seconds to about 30 seconds.
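The specialised splitter can be sketched like this — not the actual FormSG fork of byline, just an illustration of the simplification: since the payload is ASCII-only Base64 and the server emits only `'\n'`, a plain `indexOf` scan replaces the regex, and only a partial trailing line needs to be carried across chunk boundaries.

```javascript
// Specialised line splitter sketch: '\n' separators only, no regex,
// no carriage returns, no Unicode line boundaries.
function createLineSplitter(onLine) {
  let tail = ''; // partial line left over from the previous chunk
  return function push(chunk) {
    const data = tail + chunk;
    let start = 0;
    let nl;
    while ((nl = data.indexOf('\n', start)) !== -1) {
      onLine(data.slice(start, nl)); // emit one complete line
      start = nl + 1;
    }
    tail = data.slice(start); // keep the unterminated remainder
  };
}

const lines = [];
const push = createLineSplitter((l) => lines.push(l));
push('YWJj\nZGVm'); // emits 'YWJj', holds 'ZGVm'
push('Z2hp\n');     // emits 'ZGVmZ2hp'
```

The general-purpose library has to handle `\r\n`, multi-byte UTF-8, and Unicode separators on every chunk; dropping all of that is where the 260 ms → 35 ms gap closed.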
After we released the improvements to production, our users showed up at our office and said, hey, show us your fix. What we hadn't realized was that they had a really old tablet. This picture is for demo purposes, but in government we have these cycles called tech refresh, where we swap out old technology for new, and not everyone gets the new hardware at the same time. With all these improvements, though, we actually made it work on a 4 GB Windows 7 tablet running Internet Explorer. And this is really cool: if you can get performance down to such an efficient level and push it out, it means all citizens' data can be protected, even though not all of the hardware in government runs on top-notch Apple machines like we probably have. The user was really happy — the data decrypted successfully in 15 minutes — and we celebrated the win. Every change, every speed gain, every improvement we made had a multiplicative effect. The algorithm change from RSA to elliptic curve alone resulted in a speed gain of 20 times; using a web worker gained another three times, and so on. You can see the multiplicative effect: on that benchmark, elliptic curve alone was a 33-times speedup, but compared with the original RSA solution, the final version is about 600 times faster. So this technology can be highly performant, and we got there without any very difficult optimizations — simply by profiling. So, some learning points. Number one: avoid premature optimization. Look at this hypothetical graph: the easiest change was just swapping the encryption library, and that alone got us a 20x improvement, but every subsequent improvement required more and more expertise and more and more effort.
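For a rough sense of how the stage gains compound, here is the arithmetic on just the timings quoted earlier in the talk (the ~600x figure presumably also folds in the later streaming and memory optimizations, which these three stages alone don't capture):

```javascript
// Stage timings quoted earlier, in seconds, for the same dataset.
const stages = [
  ['RSA baseline', 2 * 3600 + 40 * 60], // 2 h 40 min
  ['elliptic curve', 8 * 60],           // 8 min
  ['one web worker', 150],              // 2 min 30 s
  ['parallel workers', 45],             // 45 s
];

// Per-stage speedup relative to the previous stage.
const stepGains = stages.slice(1).map(([, t], i) => stages[i][1] / t);
// 20x, then 3.2x, then ~3.3x

// Cumulative speedup: the product of the per-stage gains.
const totalGain = stages[0][1] / stages[stages.length - 1][1]; // ~213x
```

Each individual gain looks modest after the first, but because they multiply rather than add, the combined effect over these three stages is already over two hundred times.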
And the number one rule is: avoid premature optimization. Every time you copy code from a library and specialize it for your own use case, you take on the maintenance burden, and it becomes easier to introduce bugs. So that's one point. The second point is to avoid speculation and performance-profile instead, because the profile is the only evidence you have — everything else is conjecture. Under time pressure, we had a lot of team discussions about what it would take to get this to work in the production environment, and these are some famous lines we said in the office. "We need WebAssembly for this" — actually, we didn't; the benchmarks showed it was really comparable. "We need to replace our string appends with array joins", or "we need to pre-allocate our arrays by filling them with empty strings" — that's not actually how JavaScript engines work under the hood. Or "we have to move the embarrassingly parallel tasks into our web workers" — we tried that, and it yielded only marginal performance gains. So, thanks for listening — that was essentially what we did to roll out this feature. It's currently in closed beta, and we're hoping to roll it out at the end of this quarter. I hope all of you will get to use it, whether you know it or not. I'm happy to take any questions, and the slides are available at this URL. Thanks very much.