We'll start with Ruo Ando.

Hi everyone, I'm really excited to talk here, so thank you for listening to me. In this presentation, I'm going to talk about a weird application that uses multithreading for analyzing huge pcap files. It is a tool which takes full advantage of multi-core processors and achieves a large performance improvement. Actually, is there anyone in this room who thinks that Wireshark is a little bit slow? The design goal of this tool is a kind of multithreaded Wireshark with automated detection. So I guess multithreading can be one of the new frontiers for packet inspection. Okay, sorry. My name is Ruo Ando. I work at a research organization, so I'm not that weird.

My talk is divided into four parts. First, I would like to talk about the current catastrophic situation of traffic analysis. The funny thing here is that we have too many packets to be inspected; however, the solution to that problem turns out to be even more packets. This is a kind of really helpless situation. I'll tell you why later. The second part is the main part: when you build a tool for analyzing huge pcap files using massive threads, you have some choices to make about how to convert code into a concurrent version, namely the selection of features, containers, and synchronization mechanisms such as mutexes, lock-free structures, and so on. The third part is a demo and experimental results. Simply stated, speedup is the ratio of serial execution time to parallel execution time, so I'll show you the comparison. Then let me conclude this talk.

This slide shows the catastrophic situation. As everyone in the audience already knows, internet traffic is increasing at an exponential rate. However, there are two big problems here. (I'm so excited. What's going on? Thank you, thank you.) Huge traffic imposes a great burden on security researchers and analysts. But a traffic explosion is not like hacking or an exploit, because hacking and exploits are impulsive and finished within several minutes. That is not the case for a traffic explosion: unfortunately, a traffic explosion keeps exploding, like a nuclear power plant accident. In my case, in my laboratory, I have 200 to 300 log files stored on the server waiting to be inspected. During this 20-minute presentation alone, 325 gigabytes of files will be stored, waiting to be inspected. This is a really helpless situation for me. So automation is really important for me and for everyone in the audience. But in my experience, open-source data mining tools don't work in many cases, because in the world of advertising and marketing a commercial tool is not built to find people trying to hide their activities. And to make things worse, open-source data mining tools simply ignore people trying to hide and assume that everyone's behavior is in line with everyone else's. So I would like to emphasize that the packet dump is the last resort. A pcap file is a rare and hard-to-find source that can be trusted.

And this one is 1 million versus 1 trillion. Machine learning has a curious property, I tell you. If machine learning doesn't work on a dataset comprising 1 million training samples, what is needed? What is needed is a much bigger set. This is unexpected: when machine learning fails on a dataset of 1 million training samples, the intuitive conclusion is that it doesn't work at all. But according to this paper, all we need is many more packets. So the situation is very curious, isn't it? So, Asura has four features.
First, Asura should run on commodity workstations and laptops; it can run with reasonable computing resources, because GPUs and cluster systems such as Spark are still expensive, high-cost, and sometimes sparky. More importantly, Asura uses plain POSIX threads, which is a really old programming style. When writing a program, choosing the appropriate level of abstraction is really important. Usually hardly anyone misses this old programming method, except hackers. What we are coping with here is a real-world packet stream, which is huge, not nice, not organized in a regular pattern, and unfortunately unpredictable. So flexibility is important, just like you use assembly language for analyzing malware binaries. Raw threads and MPI expose control of parallel computing at the lowest level, but at the lowest level we don't have libraries, containers, and schedulers, so you have to implement these utilities from scratch by yourself, like in the era of the 1980s or 1990s. I guess this field can be one of the new frontiers for packet inspection. As a result, Asura is compact but powerful: Asura has about 2,000 lines of code and can process about 75 million packets in 200 to 400 minutes.

The two stages are intuitively simple. Asura takes two steps: reduction using task decomposition, and clustering using data decomposition. As you know, reduction takes a collection of data and reduces it to a single scalar value, and clustering is the task of grouping data in such a way that items in the same group are more similar to each other than to those in other groups. The important thing here is that reduction passes a container to clustering; the container is a really important key in this two-stage processing. A container is a C++ class template, and the selected features are basically for anomaly detection of packets based on those features. There are many research efforts and the features could be many, but the important thing here is to find a proper representation for reducing a massive pcap file. We use the key-value representation shown in the middle of this slide, and we use two structures. This is a little bit complicated, so please see the source code for the details.

Let me talk more about containers. Containers are a really important point for multithreading. You have three options. The first one is the STL. The STL is an old, basic, regular programming style, but the STL is not concurrency-friendly, so it is standard practice to wrap a lock around STL containers to make them safe for concurrent access. The second one is Intel TBB. It is an excellent library that provides highly concurrent containers, but a highly concurrent container sometimes comes at a high cost: it takes a longer time. And I guess it is mainly for scientific computation; unfortunately, what everyone here is doing is not like scientific computation, because the data is not well organized and is pretty unpredictable. So in this case, TBB is not suitable, I guess. The third one is the emerging technology of Thrust. Thrust is a C++ template library for GPUs; by using Thrust, you can write code that performs reduce, scan, and so on, accelerated by the GPU. But unfortunately, as far as I know, there is no plan to implement a hash table, an associative container, on the GPU. So I guess it will take time for this to become common for packet inspection; this could be future work.

This slide shows the main architecture of Asura. If you have a case where the computation time on individual pcap files is variable and unpredictable, you are better served by task decomposition. Specifically, if the amount of computation time will vary, a dynamic scheduler is best. Here, as with any dynamic scheduler for task decomposition, load balancing is important to take into consideration, and you have to implement the scheduler by yourself. Please see the upper side of this slide: this is a shared container, which is a queue. A dynamic scheduler involves setting up a shared container which holds the data and allows threads to pull out a task as soon as the previous task is completed. So you should protect the shared container so that threads are assigned tasks correctly and tasks are not lost through corruption of the shared container.
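To make those two ideas a little more concrete (the lock wrapped around an STL container, and the shared queue that worker threads pull pcap files from), here is a minimal sketch in C++ with POSIX threads. This is not Asura's actual code: the file names, the feature key, the thread count, and the reduce_one_file stub are made up for illustration, and a real implementation would parse packets with something like libpcap.

```cpp
// Minimal sketch: a dynamic scheduler built from a mutex-protected work queue
// of pcap file names, plus a lock-wrapped STL map that the reduction step
// fills with per-key packet counts. Illustrative only, not Asura's code.
#include <pthread.h>
#include <cstdio>
#include <queue>
#include <string>
#include <unordered_map>
#include <vector>

// Shared work queue: an idle thread pulls the next pcap file when it finishes one.
static std::queue<std::string> g_tasks;
static pthread_mutex_t g_task_lock = PTHREAD_MUTEX_INITIALIZER;

// Shared reduction container: key = a packet feature (e.g. "src_ip|dst_port"),
// value = packet count. Wrapped with a lock because the STL is not thread-safe.
static std::unordered_map<std::string, long> g_counts;
static pthread_mutex_t g_count_lock = PTHREAD_MUTEX_INITIALIZER;

// Stub for the real per-file reduction; here it just emits one fake key.
static void reduce_one_file(const std::string& path,
                            std::unordered_map<std::string, long>& local) {
    local["file:" + path] += 1;  // real code would read packets from the pcap file
}

static void* worker(void*) {
    for (;;) {
        // Pull the next task under the lock; stop when the queue is empty.
        pthread_mutex_lock(&g_task_lock);
        if (g_tasks.empty()) { pthread_mutex_unlock(&g_task_lock); break; }
        std::string path = g_tasks.front();
        g_tasks.pop();
        pthread_mutex_unlock(&g_task_lock);

        // Reduce into a thread-local map first, then merge once per file,
        // so the shared lock is taken rarely instead of once per packet.
        std::unordered_map<std::string, long> local;
        reduce_one_file(path, local);

        pthread_mutex_lock(&g_count_lock);
        for (const auto& kv : local) g_counts[kv.first] += kv.second;
        pthread_mutex_unlock(&g_count_lock);
    }
    return nullptr;
}

int main() {
    // Hypothetical file list and thread count, just for the sketch.
    for (int i = 0; i < 8; ++i)
        g_tasks.push("capture_" + std::to_string(i) + ".pcap");
    const int kThreads = 4;

    std::vector<pthread_t> threads(kThreads);
    for (auto& t : threads) pthread_create(&t, nullptr, worker, nullptr);
    for (auto& t : threads) pthread_join(t, nullptr);

    for (const auto& kv : g_counts)
        std::printf("%s -> %ld\n", kv.first.c_str(), kv.second);
    return 0;
}
```

Build with something like g++ -O2 -pthread sketch.cpp. Merging a thread-local map into the shared one once per file, instead of locking on every packet, is one simple way to keep down the kind of lock contention mentioned in the results below.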
Okay, let me explain our results. To put it simply, the speedup here is the ratio of sequential computing time to parallel computing time, plus scalability. Why scalability? Because scalability is a measure of how much speedup the program gets as you add more and more cores and threads. With proper tuning of the Linux kernel, Asura can process 75 million packets with 500 threads in about 287 minutes. To tell the truth, there is room for improvement, because the size of the shared container is not right: lock contention and context switching occur too much. But I think this is reasonable for something that can process more than 70 million packets in several hours. I would like to skip the details of the attacks that were detected, because of some issues with the public dataset. So instead, let me show you a demo.

Let me show you a demo. First of all, the binary is compiled with the number of threads as a configuration parameter. Reduction step one: this demo is too fast, I guess. Reduction step two: building the binary data for clustering. The clustering has 5 to 7 dimensions, and the data is truncated for this demo, so it is too short. Do you know what's going on? Sorry, I don't know what's going on either, because machine learning is too fast. Machine learning relies on such huge datasets and its processing speed is so fast that machine learning might solve the problem, but we cannot expect it to solve it in a way that we understand. So the demo is too fast, and I cannot talk about it.

So let me conclude this talk. I have talked about a slightly weird application called Asura, which uses multithreading. For coping with real-world packet streams, which are huge, not nice, and sometimes evil, flexibility is needed, just like you should use assembly language for analyzing malware binaries. Using raw threads and MPI takes advantage of the full performance of multi-core processors, and pthreads expose the control of parallel computing at the lowest level. But, unfortunately or not, we have to implement everything ourselves: libraries, containers, and schedulers, which is really exciting for me, and as a result they offer maximum flexibility. So, as a result, Asura is compact but powerful: Asura has about 2,000 lines of code and can process more than 70 million packets in 200 to 400 minutes. For future work, Asura must be sped up, because there is room to improve the size of the containers and to apply TBB and GPUs. And I really recommend applying massive multithreading for packet inspection; it's really exciting, and it can be one of the new frontiers for packet inspection. So thank you, everyone. That's all. Thank you for listening.