Hello, thank you all for staying, and I'm very glad to have this opportunity to give a talk at ApacheCon North America. Today I will talk about Apache Commons Crypto, another view of Apache Commons. My name is Dapeng. I'm a committer of Apache Commons Crypto and a PMC member of Apache Sentry, and I focus on big data and security. Shenna is my colleague. He contributed many improvements to Apache Commons Crypto and Apache Pig, and he also contributed some improvements to OpenJDK. It's a pity that Shenna couldn't come, so I will give this talk by myself.

Here's the agenda. First, before we discuss Commons Crypto, it's better to take a look at cryptography. Then we will talk about what Commons Crypto is and why we created such a library. After that I will go a bit deeper and talk about the performance secrets. Lastly, I will show you how Commons Crypto is securing big data, along with some API samples.

I have a question: how many of you are familiar with JCE and cryptography? Okay, not many, so I will give a brief introduction to cryptography. Cryptography is a big topic, so I'm not going to cover everything; let's only cover some crypto primitives and low-level algorithms. They can be classified into the following types. Symmetric encryption is used for confidentiality; DES, Triple DES, RC4, and AES are popular symmetric algorithms. Asymmetric encryption is used for authentication and key exchange, for example RSA, Diffie-Hellman, ECC, and so on. Hash algorithms are used to provide a digital fingerprint for message integrity. Authenticated encryption is a fundamental crypto primitive.
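The "digital fingerprint for message integrity" idea can be tried with the JDK's own JCE classes. This is a JDK-only illustration, not part of Commons Crypto (which, per the talk, currently provides only AES). It uses HMAC-SHA256, a keyed hash, so that a message modified in transit no longer matches its tag; the hard-coded key is hypothetical and for illustration only.

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

class HmacIntegrity {
    // Computes an HMAC-SHA256 tag ("fingerprint") over a message using a shared secret key.
    static byte[] tag(byte[] key, String message) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(key, "HmacSHA256"));
        return mac.doFinal(message.getBytes(StandardCharsets.UTF_8));
    }

    // The receiver recomputes the tag and compares it with the one it received.
    // MessageDigest.isEqual avoids a short-circuiting byte-by-byte comparison.
    static boolean verify(byte[] key, String message, byte[] receivedTag) throws Exception {
        return MessageDigest.isEqual(tag(key, message), receivedTag);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical hard-coded key for illustration only; real keys come from a secure RNG.
        byte[] key = "demo-shared-secret".getBytes(StandardCharsets.UTF_8);
        byte[] t = tag(key, "transfer 100 to alice");
        System.out.println(t.length);                                // 32 (SHA-256 output size)
        System.out.println(verify(key, "transfer 100 to alice", t)); // true
        System.out.println(verify(key, "transfer 900 to alice", t)); // false
    }
}
```

A plain hash such as SHA-256 alone would not help here, because an attacker who changes the message can simply recompute it; the secret key is what ties the fingerprint to the sender.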
Authenticated encryption is a combination of confidentiality and integrity. Then there is the random number generator, which is used to generate the key and the IV for cryptography; there are pseudo-random number generators and true random number generators. Before we talk about low-level algorithms, there is a concept called the mode of operation. It keeps the cipher output from repeating when the input data contains identical blocks, and there are modes like CTR, CBC, and GCM.

Okay, let's get back to what Apache Commons Crypto is. It is a high-performance Java cryptographic library optimized with Intel AES-NI technology. It wraps the OpenSSL engine and uses JNI between Java and native code, and we also support JCE. Here is the history of Commons Crypto. The Intel big data team worked with the Apache Hadoop community to improve the performance of HDFS transparent encryption. At first it was called the HDFS crypto codec, which is based on Intel AES-NI technology and the OpenSSL codec, and it improved performance by about 17 times. This is a significant performance improvement, and later I will also talk about Intel AES-NI technology. We found that other projects, such as Spark, also have encryption requirements, so we created a library, the Chimera project. We wanted the library to let other big data projects easily benefit from Intel AES-NI technology, and the project handles the compilation of the native library on Linux.
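Returning to modes of operation: the claim that a mode keeps identical input blocks from producing identical output can be checked with the JDK's JCE provider. This sketch uses JDK classes only, not Commons Crypto. It encrypts two identical 16-byte blocks of zeros with AES under ECB, which applies the block cipher to each block independently, and under CBC and CTR. ECB is not mentioned in the talk; it is included only as the baseline that shows the repetition the other modes avoid.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.security.SecureRandom;
import java.util.Arrays;

class ModeComparison {
    // Two identical 16-byte plaintext blocks: all zeros, so block 0 == block 1.
    static final byte[] PLAINTEXT = new byte[32];

    // Encrypts PLAINTEXT with the given transformation; ECB takes no IV.
    static byte[] encrypt(String transformation, byte[] key, byte[] ivOrNull) throws Exception {
        Cipher c = Cipher.getInstance(transformation);
        SecretKeySpec k = new SecretKeySpec(key, "AES");
        if (ivOrNull == null) {
            c.init(Cipher.ENCRYPT_MODE, k);
        } else {
            c.init(Cipher.ENCRYPT_MODE, k, new IvParameterSpec(ivOrNull));
        }
        return c.doFinal(PLAINTEXT);
    }

    // True if the first two 16-byte ciphertext blocks are identical.
    static boolean firstTwoBlocksEqual(byte[] ct) {
        return Arrays.equals(Arrays.copyOfRange(ct, 0, 16), Arrays.copyOfRange(ct, 16, 32));
    }

    public static void main(String[] args) throws Exception {
        SecureRandom rnd = new SecureRandom();
        byte[] key = new byte[16];
        byte[] iv = new byte[16];
        rnd.nextBytes(key);
        rnd.nextBytes(iv);
        System.out.println(firstTwoBlocksEqual(encrypt("AES/ECB/NoPadding", key, null))); // true
        System.out.println(firstTwoBlocksEqual(encrypt("AES/CBC/NoPadding", key, iv)));   // false
        System.out.println(firstTwoBlocksEqual(encrypt("AES/CTR/NoPadding", key, iv)));   // false
    }
}
```

CBC hides the repetition by chaining each block to the previous ciphertext, while CTR does it by encrypting a different counter value for every block.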
Eventually we found that the Apache Commons community would be the best place to host this project, and many PMC members and committers from the Apache Commons community have improved it: code refactoring, documentation, code style, the release, and many other things. Let's look at the features of Apache Commons Crypto. It provides low-level algorithms, currently only AES, which you can use to encrypt your data and implement your own cryptographic protocols. It also provides a streaming API: you can use this high-level API to encrypt and decrypt data from streams and channels. We also provide a random API, with a true random number generator accelerated by hardware.

You may ask: if we already have JCE, why did we create another library? JCE supports a lot of algorithms and is very powerful, so let me explain the background and motivation of our project. Currently some Java cryptographic protocols, such as SASL, use inefficient and weak low-level algorithms: 3DES, RC4, and so on. Hadoop and Spark users work on very big data, and if the low-level algorithms are weak, the data will be vulnerable to attack. What's more, the performance of JCE is a bottleneck. We need to take both security and performance into account, and JCE falls short for big data users.

Then which algorithm should we choose? DES is insecure and broken, and Triple DES is too slow. RC4 is well known for its speed, but it is now considered insecure: RFC 7465 prohibits it in TLS, and both Microsoft and Mozilla recommend that developers disable RC4. So when it comes to security, the winner is undoubtedly AES. AES is secure and is considered an industry standard. The problem, however, is that the performance of AES in JCE is not good enough for big data users, and performance does matter here. Let's look at the benchmark numbers.
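To get a feel for this kind of comparison on your own machine, here is a crude, JDK-only sketch that times AES in CTR mode against Triple DES (DESede) in CBC mode. It is an assumption-laden micro-benchmark, not the methodology behind the talk's charts. It leaves out Commons Crypto (which needs its jar and a native OpenSSL), RC4, and proper JMH warm-up, and its numbers depend heavily on the hardware and JDK version.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.security.SecureRandom;

class CipherThroughput {
    // Crude single-shot throughput estimate in MiB/s using the JDK's default provider.
    // Not a rigorous benchmark: no JMH, one warm-up pass, so treat results as rough indications.
    static double throughputMiBps(String transformation, String keyAlg,
                                  int keyBytes, int ivBytes, int dataBytes) throws Exception {
        SecureRandom rnd = new SecureRandom();
        byte[] key = new byte[keyBytes];
        byte[] iv = new byte[ivBytes];
        byte[] data = new byte[dataBytes];
        rnd.nextBytes(key);
        rnd.nextBytes(iv);
        rnd.nextBytes(data);

        Cipher cipher = Cipher.getInstance(transformation);
        cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, keyAlg), new IvParameterSpec(iv));
        cipher.doFinal(data); // warm-up; doFinal resets the cipher to its initialized state

        long start = System.nanoTime();
        cipher.doFinal(data);
        long elapsedNanos = Math.max(1L, System.nanoTime() - start);
        return (dataBytes / (1024.0 * 1024.0)) / (elapsedNanos / 1e9);
    }

    public static void main(String[] args) throws Exception {
        int size = 16 * 1024 * 1024; // 16 MiB of random plaintext per run
        System.out.printf("AES/CTR    : %.1f MiB/s%n",
                throughputMiBps("AES/CTR/NoPadding", "AES", 16, 16, size));
        System.out.printf("DESede/CBC : %.1f MiB/s%n",
                throughputMiBps("DESede/CBC/PKCS5Padding", "DESede", 24, 8, size));
    }
}
```

For numbers you intend to publish, a harness such as JMH with multiple forks and iterations is the safer choice; this sketch only shows the shape of the measurement.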
Here is the throughput of the symmetric encryption algorithms. From this chart we can see that Triple DES is very slow, and RC4, which is famous for its speed, is a bit faster than AES in JDK 8. We can clearly see that the performance of AES in Commons Crypto is outstanding: it reaches more than a gigabyte per second, about four times faster than RC4.

Now let's look at authenticated encryption. The ecosystem mostly uses HMAC-SHA1 and HMAC-MD5. AES-GCM mode is secure and was designed for high performance, but nobody wants to use it: it's not supported in JDK 7, its performance in JDK 8 is awful, and although it improves in JDK 9, it's still not good enough. We can clearly see that AES-GCM in Commons Crypto is very fast, about six times faster than RC4 with HMAC-SHA1 and close to one gigabyte per second.

Next, random numbers. Say an application generates a key with a pseudo-random number generator and sends it to a client; an attacker can guess the key by observing the sequence of outputs. Therefore pseudo-random number generators are considered insecure for this purpose, and the Random class in the JDK is such a generator. SecureRandom in the JDK is strong, and CryptoRandom in Commons Crypto is a true random number generator, so both of them are strong. On performance, Commons Crypto is the winner: it is about 13 times faster than Java's SecureRandom.
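The authenticated-encryption point can be sketched with the AES-GCM support that the JDK's JCE provider has offered since JDK 8. This example is JDK-only and illustrative: the 128-bit tag and 96-bit nonce are common choices assumed here, not parameters from the talk. It shows the integrity half of GCM, where flipping a single ciphertext bit makes decryption fail with `AEADBadTagException`.

```java
import javax.crypto.AEADBadTagException;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;

class GcmDemo {
    // Encrypts and authenticates; the 16-byte tag is appended to the returned ciphertext.
    static byte[] encrypt(SecretKey key, byte[] iv, byte[] plaintext) throws Exception {
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv)); // 128-bit tag
        return c.doFinal(plaintext);
    }

    // Verifies the tag and decrypts; throws AEADBadTagException if anything was modified.
    static byte[] decrypt(SecretKey key, byte[] iv, byte[] ciphertext) throws Exception {
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, iv));
        return c.doFinal(ciphertext);
    }

    public static void main(String[] args) throws Exception {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);
        SecretKey key = kg.generateKey();
        byte[] iv = new byte[12]; // 96-bit nonce; must never repeat under the same key
        new SecureRandom().nextBytes(iv);

        byte[] ct = encrypt(key, iv, "big data record".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(decrypt(key, iv, ct), StandardCharsets.UTF_8)); // big data record

        ct[0] ^= 1; // simulate one flipped bit in transit
        try {
            decrypt(key, iv, ct);
            System.out.println("tampering not detected");
        } catch (AEADBadTagException e) {
            System.out.println("tampering detected");
        }
    }
}
```

Because confidentiality and integrity come from one primitive, there is no separate HMAC pass over the data, which is part of why GCM was designed to be fast.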
So here comes the conclusion: Commons Crypto is secure and fast, and it's a better alternative. Use Commons Crypto to secure your big data, with no performance bottleneck anymore. You may ask what makes Commons Crypto so fast, so here are the performance secrets.

Secret one is AES-NI technology; NI means New Instructions. There are a few instructions, so let's take one of them, AESENC, as an example. The AES specification defines 10, 12, or 14 rounds depending on the key size, and each round consists of several processing steps that are complex and time-consuming. With AES-NI, a whole round is a single instruction and the CPU does all of this work for you, which improves performance remarkably. AES-NI is also enabled in the JVM since JDK 8 update 45, so why is Commons Crypto still five to seven times faster than the JDK?

Secret two is pipelining and instruction-level parallelism. One instruction takes about seven CPU clock cycles. In the upper figure, the destination registers of the two instructions are the same, so a data dependency exists, and the second instruction has to wait about seven clock cycles until the first one has finished. If we break the data dependency, as the lower figure shows, the destination registers are different, and all the instructions can be issued in parallel.

There are other reasons why Commons Crypto is faster than the JDK: in JDK 8, AES is not well optimized, while OpenSSL has a lot of hand-tuned optimizations. You may ask why not optimize the JVM instead. The JVM is becoming faster in JDK 9, and we can see that hardware acceleration is enabled there. We also contributed a parallelization patch, written by my colleague Shenna, to OpenJDK 9, and the performance of AES in CBC and CTR mode is now close to OpenSSL. But the bad news, as we discussed in the last section, is that we need to
wait before we can adopt a new JVM in production. So if you want this optimization today, it's better to use Commons Crypto, which is already optimized.

Here is our work with Commons Crypto and Intel AES-NI technology. Big data users care about both security and performance, and Hadoop and Spark users can benefit from Commons Crypto and AES-NI. This is all related work. The first and second items are about Spark: shuffle and RPC in Spark use Commons Crypto to protect the data against attacks, and it also brings a lot of performance gain. HDFS transparent encryption also uses Intel AES-NI technology, which brings about 17 times the performance. In HBase, we improved transparent table and column family encryption with Commons Crypto, and it also shows a great performance gain in the benchmark. We also improved HBase RPC encryption, and the performance is also very good. Currently we still have two JIRAs in development: one optimizes Hadoop RPC encryption performance, similar to our work in HBase, and the other replaces the Hadoop crypto codec with Apache Commons Crypto.

Okay, let's look at some API samples. The upper sample is the random generator. It is similar to the JDK: use a factory to get an instance, and then just fill the IV and the key. The cipher sample is the same: use the factory to get a cipher instance, pass the mode, initialize the cipher, call doFinal, and close the cipher. Here is the example for decryption. Decryption is almost the same; you just set a different mode. The method for the cipher is the same, but the
mode is different: the cipher mode is the decrypt mode, and it also needs the key and the IV. We must close the cipher when we finish.

Here is the stream sample. The stream API can secure stream and channel data; the stream is built on the cipher, and the cipher is built on the mode. Currently Hadoop, sorry, Spark is using this.

Here is the status of our project and the future work. The first release is out, and CBC and CTR modes are currently supported; GCM mode will come in the next release. We are also planning to cover more, such as supporting asymmetric key algorithms like RSA. If you are interested in contributing to Commons Crypto, you can create a JIRA and open a pull request on GitHub, and you can also send an email to the Apache Commons mailing list. Okay, it's a little short. Thank you, thank you.
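The API samples described above (fill a key and IV from a strong random source, get a cipher, encrypt with one mode and decrypt with the other, then wrap it in streams) follow the same shape as the JDK's JCE. Commons Crypto exposes analogous classes such as `CryptoRandomFactory`, `CryptoCipherFactory`, `CryptoOutputStream`, and `CryptoInputStream`, but this sketch deliberately uses only JDK classes so it runs without extra dependencies; it is not the code from the slides. The AES/CTR transformation and 128-bit key are assumptions chosen for illustration.

```java
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;

class StreamSample {
    static final String TRANSFORMATION = "AES/CTR/NoPadding";

    // Builds a cipher for ENCRYPT_MODE or DECRYPT_MODE; only the mode differs between the two.
    static Cipher newCipher(int mode, byte[] key, byte[] iv) throws Exception {
        Cipher cipher = Cipher.getInstance(TRANSFORMATION);
        cipher.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
        return cipher;
    }

    static byte[] encrypt(byte[] key, byte[] iv, byte[] plaintext) throws Exception {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // Closing the CipherOutputStream runs doFinal and flushes the last bytes into the sink.
        try (OutputStream out = new CipherOutputStream(sink, newCipher(Cipher.ENCRYPT_MODE, key, iv))) {
            out.write(plaintext);
        }
        return sink.toByteArray();
    }

    static byte[] decrypt(byte[] key, byte[] iv, byte[] ciphertext) throws Exception {
        try (InputStream in = new CipherInputStream(new ByteArrayInputStream(ciphertext),
                newCipher(Cipher.DECRYPT_MODE, key, iv))) {
            return in.readAllBytes(); // requires Java 9+
        }
    }

    public static void main(String[] args) throws Exception {
        // Key and IV filled from a cryptographically strong RNG, as in the random sample.
        SecureRandom random = new SecureRandom();
        byte[] key = new byte[16]; // AES-128
        byte[] iv = new byte[16];
        random.nextBytes(key);
        random.nextBytes(iv);

        byte[] plaintext = "hello, commons crypto".getBytes(StandardCharsets.UTF_8);
        byte[] ciphertext = encrypt(key, iv, plaintext);
        System.out.println(new String(decrypt(key, iv, ciphertext), StandardCharsets.UTF_8)); // hello, commons crypto
    }
}
```

The JDK `Cipher` has no `close()` method, so here try-with-resources only closes the streams. Commons Crypto's `CryptoCipher` and `CryptoRandom` are `Closeable` because they can hold native OpenSSL resources, which is why the talk stresses closing the cipher.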
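Returning to the second performance secret, breaking data dependencies: in CBC encryption each block's input depends on the previous ciphertext block, so successive AES operations are serialized, while in CTR mode every keystream block depends only on the key and its own counter. The Java sketch below illustrates that independence, not the CPU pipeline itself. It rebuilds AES/CTR by hand, using the JDK's AES/ECB as a raw single-block primitive and deliberately processing blocks in reverse order, and compares the result with the JDK's `AES/CTR/NoPadding`. It is not how Commons Crypto or OpenSSL is implemented; a native implementation exploits the same independence by keeping several AESENC instructions in flight at once.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.math.BigInteger;
import java.security.SecureRandom;
import java.util.Arrays;

class CtrIndependence {
    private static final BigInteger TWO_POW_128 = BigInteger.ONE.shiftLeft(128);

    // Counter block i = (IV + i) mod 2^128, as a 16-byte big-endian value.
    static byte[] counterBlock(byte[] iv, int i) {
        byte[] raw = new BigInteger(1, iv).add(BigInteger.valueOf(i)).mod(TWO_POW_128).toByteArray();
        byte[] block = new byte[16];
        // toByteArray() may prepend a sign byte or omit leading zeros, so right-align into 16 bytes.
        int n = Math.min(raw.length, 16);
        System.arraycopy(raw, raw.length - n, block, 16 - n, n);
        return block;
    }

    // C_i = P_i XOR AES_k(counter_i). Each block needs only the key and its own counter,
    // so blocks are processed in reverse order to show there is no inter-block dependency.
    // Sketch assumption: plaintext length is a multiple of 16 bytes.
    static byte[] ctrByHand(byte[] key, byte[] iv, byte[] plaintext) throws Exception {
        Cipher aes = Cipher.getInstance("AES/ECB/NoPadding"); // raw single-block AES
        aes.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"));
        byte[] out = new byte[plaintext.length];
        for (int b = plaintext.length / 16 - 1; b >= 0; b--) {
            byte[] keystream = aes.doFinal(counterBlock(iv, b));
            for (int j = 0; j < 16; j++) {
                out[b * 16 + j] = (byte) (plaintext[b * 16 + j] ^ keystream[j]);
            }
        }
        return out;
    }

    // Reference result from the JDK's own CTR implementation.
    static byte[] jdkCtr(byte[] key, byte[] iv, byte[] plaintext) throws Exception {
        Cipher ctr = Cipher.getInstance("AES/CTR/NoPadding");
        ctr.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
        return ctr.doFinal(plaintext);
    }

    public static void main(String[] args) throws Exception {
        SecureRandom rnd = new SecureRandom();
        byte[] key = new byte[16], iv = new byte[16], plaintext = new byte[64]; // 4 AES blocks
        rnd.nextBytes(key);
        rnd.nextBytes(iv);
        rnd.nextBytes(plaintext);
        System.out.println(Arrays.equals(jdkCtr(key, iv, plaintext), ctrByHand(key, iv, plaintext))); // true
    }
}
```

The same reasoning explains why CBC decryption can also be parallelized (each plaintext block needs only two ciphertext blocks that are already known), while CBC encryption cannot.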