Hi everyone, my name is Tung Chai, I'm an iOS engineer at Rakuten Viki. Today I'll be talking about writing high-performance Swift code. So let's get started. For the agenda today, I'll be covering three topics: the first one is the dimensions of performance, the second is struct versus class, and the last is compiler optimizations. So, the first topic: dimensions of performance. When you hear the word "dimensions", you might think there are many, many things to consider, and that it's too complicated to understand at first glance. But for performance, there are really only three key dimensions to consider: allocation, reference counting, and method dispatch. Let's look at the first one, allocation. On one side we have the stack, and on the opposite side we have the heap. So what is stack allocation? The stack is a last-in-first-out data structure, which means you can only push or pop at the end of it. Allocating memory is just decrementing the stack pointer, and deallocating is incrementing it back. The good thing about stack allocation is that it is very fast, because it is literally the cost of assigning an integer. The drawback is that the size and lifetime are somewhat fixed: because stack memory is allocated on top of whatever came before, it is hard to change the size after the memory has been allocated, and stack memory is deallocated when you go out of the function scope, so you can't control its lifetime much. Examples of types that can live on the stack are structs and enums. On the other side, we have heap allocation. Heap allocation requires the use of a more advanced data structure.
The steps are a bit more involved: to allocate, you have to search for an unused block of memory of the appropriate size, and to deallocate, you have to reinsert that block back into the appropriate position. So clearly there's more work here. The good thing is that the lifetime and the size can be dynamic: you can control when heap memory is deallocated. Because of this more advanced data structure, it is slower than stack allocation. And there is thread-safety overhead, because multiple threads can allocate memory on the heap at the same time, so the allocator has to use locking or other synchronization mechanisms to make it safe. An example of a heap-allocated type is a class. Here is an example of stack allocation, where we have a struct Point with x and y. You can see that we create point1 and assign it to point2; in stack memory, we now have two points side by side, so not much going on here. When we assign to point2.x, we only modify the value of point2, and point1 stays the same. Compare that with a class: same data structure, but now we change struct to class. Here the stack is still involved, but we also have a heap allocation; the stack holds pointers into the heap, and point1 and point2 point to the same heap memory. So when you edit point2, point1 gets modified too. Next, reference counting. You might think reference counting is only about incrementing and decrementing a number, but there's actually more to it. When you compile code that uses reference counting, the compiler has to insert retain and release function calls into your code. And as I mentioned with heap allocation, there's thread-safety overhead here too, because these retain and release calls have to be atomic.
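The Point example can be sketched in code like this (a minimal sketch; the type names are mine, but it shows the value-versus-reference behavior described above):

```swift
// Struct version: each assignment copies the whole value.
struct PointStruct {
    var x: Double
    var y: Double
}

// Class version: each assignment copies only a reference.
final class PointClass {
    var x: Double
    var y: Double
    init(x: Double, y: Double) { self.x = x; self.y = y }
}

// Struct: point1 and point2 are independent copies on the stack.
var p1 = PointStruct(x: 0, y: 0)
var p2 = p1
p2.x = 5
assert(p1.x == 0 && p2.x == 5)   // p1 is unchanged

// Class: both names point at the same heap object.
let c1 = PointClass(x: 0, y: 0)
let c2 = c1
c2.x = 5
assert(c1.x == 5)                // c1 changed too
assert(c1 === c2)                // same heap instance
```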
As an example, for the same Point class, when you compile the code on the left, you can see on the right that the compiler actually generates more code than you wrote: there is a reference count inside the object, and there are multiple retain and release calls generated. Compared to a struct, where no retain and release calls are generated at all, so less is involved, and it is faster. Next, method dispatch, the third dimension. Static method dispatch is just a normal method call, nothing fancy here. But I want to focus on dynamic dispatch. This usually happens when you have inheritance or protocols, where at runtime you may not know the actual type that the protocol or superclass variable is representing. So the program has to look at a table at runtime to find the actual implementation. This table lookup is not a big overhead by itself; it's just one more level of indirection. But the important point is that dynamic dispatch prevents inlining and other optimizations that would otherwise be visible to the compiler, and I'll be demonstrating inlining later. As an example of dynamic dispatch: here we have a Drawable superclass, Point and Line subclasses, and an array of Drawable. At runtime, because the array can contain either a Point or a Line, the program has to consult the table to see whether the actual element is a Line or a Point, and when you call draw on that object, it jumps to the correct implementation by looking at that table. Okay, to wrap up the first part, here's what I want to leave you with: what you should think about when writing Swift code. Ask yourself these questions. Should the instance be created on the stack or on the heap? When passing instances around, how much reference counting are you incurring? And when you call a method, is it going to be statically or dynamically dispatched?
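The Drawable example can be sketched like this (the method bodies are mine; returning strings just makes the dispatch observable):

```swift
// Base class with a method that subclasses override.
class Drawable {
    func draw() -> String { "drawable" }
}

final class Point: Drawable {
    override func draw() -> String { "point" }
}

final class Line: Drawable {
    override func draw() -> String { "line" }
}

// The static type of each element is Drawable, so every draw() call
// goes through the class's method table at runtime to find the
// correct override: this is dynamic dispatch.
let shapes: [Drawable] = [Point(), Line()]
let drawn = shapes.map { $0.draw() }
assert(drawn == ["point", "line"])
```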
And as a general rule of thumb, you don't want to pay for dynamism you don't need. Okay, the second topic: struct versus class. I think you might run into this situation a lot, where you're not sure whether you want to use a struct or a class for your data. When I heard about Swift for the first time, with the introduction of value types and structs, I had this thought in my mind: structs are very fast, I should be using them everywhere. Can I ask who in this room has also had this idea? Okay, a few people. Actually, this is not always true. So let's look at a simple example. Things get quite a bit more complicated when you have a struct with reference types inside. I have a quiz here. The question is: when I pass this struct around, what will happen to the text and font variables inside? A: they get copied each time I pass it around. B: each of them will be reference counted. Or C: nothing will happen. Who thinks it's A, everything gets copied? And who thinks it's B? In normal circumstances, it is actually B. As an example, I create one label with a string and a font, and I assign it to label2. When I compile this, it is actually retaining and releasing each of the reference-type properties inside the label. You can imagine: if you were using a class instead of a struct here, then when you pass it around, the compiler would only have to increment or decrement the reference count of the class itself. But here it actually increments or decrements the reference count of each property inside, so it's two times more expensive than what a class would cost. And there's another case, when you are using large value types inside a protocol-typed collection. When you have a protocol-typed collection, because the protocol doesn't know what the actual type is, Swift uses a technique called the existential container, which can store a pointer to the actual memory on the heap.
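The label quiz can be sketched like this (the Text and Font classes are simple stand-ins of mine for the reference-type properties in the talk):

```swift
// Stand-in reference types for the label's text and font.
final class Text { var value: String; init(_ v: String) { value = v } }
final class Font { var name: String; init(_ n: String) { name = n } }

// A struct whose stored properties are reference types.
struct Label {
    var text: Text
    var font: Font
}

let label1 = Label(text: Text("hi"), font: Font("Helvetica"))
let label2 = label1   // copies the struct: the two references inside are
                      // copied, and each one is retained (answer B).

// The struct was copied, but both copies share the same class instances,
// so every copy costs two reference-count operations, not one.
assert(label2.text === label1.text)
assert(label2.font === label1.font)
```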
Here we have a Pair struct with two protocol-typed properties, both Drawable. When you copy the pair, assigning it to the copy, it actually copies the whole struct. If you used a class, it would only be one reference-count increment, right? But with a struct, in this case, it copies the whole thing, and as you can see, that can be very expensive with something like this. Okay, so with those two examples I want to highlight that when you use structs, you have to be careful: if there's a class inside, you can be incurring excessive reference-counting overhead, and a protocol-typed collection with large structs inside can cost you excessive copying. I forgot to mention one thing from the previous slide: the existential container actually has an inline value buffer of three words. So if your struct fits in three words, it can be stored right inside the existential container without allocating on the heap. But here, the Line is four words, so it has to allocate heap memory outside. Okay, with all this information you might be thinking: fine, I have lots of data and protocols everywhere, so structs will always be too slow for me, I'll just use classes everywhere. But you don't have to lose hope yet. We have this thing called the copy-on-write technique, which can basically get around all of these pitfalls. So what is copy-on-write? Copy-on-write is basically a struct with internal class storage, which we manually copy only when we mutate the struct. As an example, we have this Line struct here, and inside we declare a storage property of type LineStorage, which is actually a class. When we call the move function, which changes the data inside, we check whether we are the only one holding a reference to this storage; if we are not the only one, then we create a copy.
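You can see the sizes behind the three-word-buffer discussion with MemoryLayout (the struct layouts are mine; a word is 8 bytes on a 64-bit platform, so the inline buffer is 24 bytes):

```swift
struct Point2D {          // two words: small enough for the buffer
    var x, y: Double
}

struct Line2D {           // four words: spills out to a heap allocation
    var x1, y1, x2, y2: Double
}

// Three words of inline storage in the existential container.
let inlineBuffer = 3 * MemoryLayout<Int>.size

assert(MemoryLayout<Point2D>.size == 16)          // 2 words
assert(MemoryLayout<Line2D>.size == 32)           // 4 words
assert(MemoryLayout<Point2D>.size <= inlineBuffer) // fits inline
assert(MemoryLayout<Line2D>.size > inlineBuffer)   // needs the heap
```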
The check is done by calling this function called isUniquelyReferencedNonObjC (in newer Swift, isKnownUniquelyReferenced), which you might think looks ugly. But luckily we don't always have to do this ourselves: the standard library's Array and Dictionary already implement this copy-on-write functionality for you, so you can use them directly. Okay, to summarize when you should use a struct and when you should use a class. Use a struct when comparing the data makes sense, like comparing the x and y values of two points, when you want each copy to have independent state, or when you use it across multiple threads. Examples are what you've seen before: points, sizes, coordinates. Use a class when comparing the identity of each instance makes sense, for example when you compare two UIViews: you wouldn't compare the frame or background color of each view, right? You'd just compare the two pointers. And if you are working with the Cocoa frameworks, most of the APIs expect you to pass an NSObject, which is a class, so you can't get around it most of the time. Personally, I find that I'm using classes more than structs, because structs have relatively few use cases; most of the time, we just have to use classes. Okay, the last topic: compiler optimizations. Why do I want to talk about this? Once upon a time on Stack Overflow, I was looking up a function to calculate the distance between two coordinates, and I saw this macro here, which divides pi by 180. So I suggested: why don't you just replace this pi divided by 180 with a constant? Because pi is a constant, right? A constant divided by a constant is just another constant, so you can use a plain number to save some computation time and go a little faster. I made this suggestion in 2011, and five years later, somebody told me that you don't need to do this, because the compiler actually computes the constant for you.
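The copy-on-write Line can be sketched like this (property and method names are mine, using the modern spelling isKnownUniquelyReferenced for the uniqueness check he mentions):

```swift
// Heap storage that the struct wraps.
final class LineStorage {
    var x1, y1, x2, y2: Double
    init(x1: Double, y1: Double, x2: Double, y2: Double) {
        (self.x1, self.y1, self.x2, self.y2) = (x1, y1, x2, y2)
    }
    func copy() -> LineStorage {
        LineStorage(x1: x1, y1: y1, x2: x2, y2: y2)
    }
}

// Value-semantics facade: copies of Line share one LineStorage until
// one of them mutates.
struct Line {
    private var storage = LineStorage(x1: 0, y1: 0, x2: 0, y2: 0)

    var x1: Double { storage.x1 }

    mutating func move(toX x: Double) {
        // Copy only if someone else also holds this storage.
        if !isKnownUniquelyReferenced(&storage) {
            storage = storage.copy()
        }
        storage.x1 = x
    }
}

var a = Line()
let b = a          // cheap: both values share one LineStorage
a.move(toX: 5)     // storage is shared, so this triggers the copy
assert(a.x1 == 5)
assert(b.x1 == 0)  // b keeps the original data
```

Array and Dictionary do exactly this internally, which is why passing them around is cheap until you mutate a shared copy.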
So yeah, at the time I got confused. Do I really not need to do this? The only way to check is to test it, so I'll test it and see whether it's true or not. I created a Swift file for the demo, and inside this file I just have two variables, a and b. a is the same expression as before, and b is the constant, meant to be the result of computing pi over 180. What we can do is compile it and see what the compiler actually produces. You can do this using xcrun: you compile with `xcrun swiftc`, pass in the name of the file, and specify the optimization level. Here `-Onone` tells the compiler not to do any optimization, and `-o` is just the output. So let's do this and see what happens. We have our output here, and it's an executable file, so we won't be able to read it directly, but we can read it using the Hopper disassembler. Let's fire it up, and you can just drag and drop the executable into Hopper, and boom, as you can see, we have lots of assembly code which we can't really read. All my functions are here. But you can actually read the code: Hopper provides a pseudo-code mode, so you can just highlight the main function here and click that button, and you'll see what the compiled code roughly looks like. So here, for variable a, the first two lines are just the program reading the main arguments, so let's skip those. The third line is the variable itself, and we can see it is actually doing a divide: `divsd`, the divide-scalar-double instruction. And b is just a number. I don't know exactly what xmm1 holds, but I think it's just the constant. So without optimization, you can see that there's a division happening.
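The demo file boils down to two declarations like these (the literal is the shortest decimal form of the computed double, so the two are the same value and folding the division away is safe):

```swift
// a computes the constant; b is the precomputed literal.
let a = Double.pi / 180
let b = 0.017453292519943295

// Same Double value, bit for bit.
assert(a == b)
```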
But this is not actually the code that you ship to production; in release mode, you're doing some optimization. So let's change the optimization level to `-O` and change the name of the output. Now we have another output, which is optimized, and let's see if it's different from earlier. Same steps as before. Now you see that a and b are both just a number after the optimization is applied; the difference in the last digit is probably just my rounding error. Okay. Actually, I also played with this a bit further: I wanted to see whether the compiler would be smart enough to collapse a calculation done in a for loop. We can do the same thing here. Let's see. Yeah, c is also just a number after we apply optimization. But let's test with a bigger loop and see what happens. Same steps again. Oh, so it's not just a number anymore: it's actually going through a loop, because you can see some overflow checks, and there's a lot of jumping here. You can see this 1580 here is actually the address at the top, so it's actually going around the loop, and there are lots of overflow checks. So it turns out that if your for loop is large, the compiler doesn't collapse it. But I saw the overflow checks there, and I know that Swift has operators that skip overflow checking, so I tried using one to see if it produces a different result. Go back here: instead of plus, I'll use `&+`, which is plus with the overflow check skipped. And let's see. Oh yeah, now it's just a number again. So that's a trick: if you ever need to do some heavy math and you want crazy fast performance, you can skip the overflow checking. Okay. So I have one last example I want to show you here, which is inlining.
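The difference between the two operators can be shown directly (`&+` is the real Swift operator; the UInt8 values are just a small example of mine):

```swift
// The checked `+` traps at runtime on overflow; the masking `&+`
// wraps around instead, which also lets the compiler drop the check.
let x: UInt8 = 250
let wrapped = x &+ 10
assert(wrapped == 4)               // 260 mod 256

// For comparison: `x + 10` would trap here. The reporting API shows
// the same wrapped result plus an overflow flag.
let (sum, didOverflow) = x.addingReportingOverflow(10)
assert(didOverflow && sum == 4)
```

The trade-off is that a silent wraparound is usually a bug, so this is only worth doing in hot numeric code where you know overflow can't happen or wrapping is intended.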
So I create three functions that call each other: function one calls function two, two calls three, and three just returns a random number. Let's compile this as well, calling function one from main, first without optimization and then with. Without optimization, you see that inside main, it actually calls function one; inside function one, it calls function two, just as we wrote it; two calls three; and three actually calls the random function. But in the optimized version, you can see that main skips all the function calls we made and goes straight to the random call. So that's what the compiler is doing on your behalf when you specify the optimization flag. Let's go back to the slides. So we have seen some examples of compiler optimization in action: constant folding and inlining. I haven't demoed dead code elimination, but basically it just removes unused code: if the compiler can detect that some function isn't used anywhere, it just removes it. And constant folding is just one of many, many compiler optimizations happening behind the scenes; you can check them out on the Swift compiler's wiki page if you are interested. Those are all the references, if you want to look at them later. And yeah, that's all. Thank you. If you have any questions, feel free to ask. By the way, we are hiring; if you are interested, please contact this email. The slides will be at this URL, and you can follow me on Twitter or GitHub. Thank you. Any questions?

[Audience] I was just wondering, did you do any benchmarking, for example, between the structs and classes? I know in theory it's supposed to be faster, as you say, but in practice, what is the speed difference?
Yeah, it's quite hard to tell; I haven't done that myself. But I think you can use the Time Profiler, switch between the two implementations, and see what the timing is. But yeah, I haven't done the actual testing.

[Audience] The reason I ask is that it's good to know about this in theory, but in practice, is this actually the bottleneck in your code?

I think, yeah, it depends on how intensively your code uses that struct or class. Because if you compare them side by side, the function-call overhead from retain and release is going to be quite a lot, and if you access it on multiple threads, then you have thread-safety overhead on top. So yeah, it depends on the circumstances in which you use the data.

[Audience] How about enums versus structs?

I think they are very similar, but their use cases are different. Both are value types, so performance-wise I think it's about the same. I don't know whether the different variations of enums, like associated values and all of that, impose more overhead, but if you use them traditionally, it should be roughly the same performance. Does that answer your question?

[Audience] Maybe another thought: shouldn't there be an Xcode plug-in or something? For example, for plus, it could suggest using that overflow operator. That's what I'm thinking of: a plug-in or something that can suggest to the developer that there's an alternative way of doing this for higher performance.

Oh, yeah. So you mean, are there libraries or tools that do this? I'm not sure whether something like that exists; I don't know.
Maybe some graphics-intensive libraries use those kinds of things underneath for you, so you don't actually have to use them yourself. But yeah, maybe, maybe not. Okay, any other questions? Okay, no? Thank you. Yeah, thank you.