Welcome, everybody. Here I'm presenting Arno Schödl, head of development at think-cell, and he's going to give a presentation titled "From Iterators to Ranges".

All right, thank you very much, and thank you for coming. So I'm going to talk about "From Iterators to Ranges": what is currently going on in the standard and in the standard library.

The first thing is: who is familiar with ranges? Who knows the Boost.Range library? Okay, I guess very few. Now, who knows Eric Niebler's range-v3 library? Okay, no one. And I guess it's kind of superfluous to ask: no one is using ranges in everyday programming. I think you should. What I want you to take away from today's talk is that you can do your C++ programming in a much more productive, much more efficient way if you make good use of ranges.

Well, what are ranges? When you're using C++ in the old-fashioned way, you write code like this. You have a std::vector, and when you want to sort the vector you pull the begin and end iterators out of that vector and call std::sort. Then, if you want to remove the duplicate elements, you call std::unique, again with begin and end, and then you call erase on the vector, again passing iterators. That's a lot of mentioning of the vector. These pairs of iterators really belong together; they should be one object. So it would be much nicer if you could write something like unique_inplace(sort(vec)), where sort does the sorting of the vector and unique_inplace throws away the duplicate elements.
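The slide is not part of this transcript, so here is a minimal sketch of the two styles just described. The helper names sort and unique_inplace in namespace sketch are hypothetical stand-ins written for illustration; they are not the actual interface of the think-cell library or of any standard header.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

namespace sketch {

// Hypothetical helper: sorts a container in place and returns a reference
// to it, so that calls can be nested.
template <typename Container>
Container& sort(Container& c) {
    std::sort(c.begin(), c.end());
    return c;
}

// Hypothetical helper: removes adjacent duplicates in place. The
// unique/erase pair is packaged here so callers never handle iterators.
template <typename Container>
Container& unique_inplace(Container& c) {
    c.erase(std::unique(c.begin(), c.end()), c.end());
    return c;
}

}  // namespace sketch

// Old-fashioned style: the vector is mentioned five times.
std::vector<int> old_style(std::vector<int> vec) {
    std::sort(vec.begin(), vec.end());
    vec.erase(std::unique(vec.begin(), vec.end()), vec.end());
    return vec;
}

// Range style: the vector is mentioned once.
std::vector<int> range_style(std::vector<int> vec) {
    sketch::unique_inplace(sketch::sort(vec));
    return vec;
}
```

Both functions do the same work; the second only hides the iterator pairs behind functions that take the whole container.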
Now, before we go on: there is a bug in this piece of code, or at least it's not entirely generic, and I want you to tell me what that bug is. Which operator does std::sort use to sort the range? Operator less. What does std::unique use to throw away elements? The equality operator. Okay, and if these two are not consistently implemented on T, the whole code is not going to work. This is also something that a good range library will give you, and our range library certainly does: you use these operators more consistently, because you have functions that package everything together into range functions.

Okay, why do I think that we in particular know something about ranges? think-cell has its own range library. It grew out of Boost.Range, and we have about one million lines of production code that use that library. We have the freedom to change both the library and our code as we like, and the library grew out of this collaboration between our code and the library. I think that's the only way to come up with a good library design: you need a heavy user of that library, you need a large code base to try out your ideas. And that's what we did. We iterated on our design until we came up with something that was really practical, and I want to share with you today some of the insights that we got.

Well, there are ranges, kind of, in C++11 already. There is the range-based for loop. Who is familiar with the range-based for loop? Okay, most of you, all right. Then there is universal access with std::begin and std::end, so you don't have to call the member function; std::begin and std::end will also work on something like arrays, so you can always use them universally in generic code. And, well, that's the end; that's all you have in terms of ranges in standard C++11. C++14 didn't change much in that regard.

The future is going to be the current pet project of Eric Niebler. He has a Ranges
Technical Specification in the standards committee, and the first thing this Technical Specification does is lay some groundwork for ranges and basically make the algorithm header support ranges. So you have, for example, a function find that doesn't work on two iterators anymore; instead it takes a range, and internally it will just dispatch to std::find, pulling the iterators out of the range that you passed in. And then he has the somewhat more advanced range-v3 code base on the web, and this is basically what Eric envisions to standardize eventually. It contains more functionality than just packaging two iterators together into a range object, and we'll see what this functionality is.

Now, first of all, what are ranges? First of all, ranges are the familiar containers. You can call begin and end on them, and that is what makes them ranges: they have a begin and an end. They own their elements: when you copy them, you copy the elements along with the object. And you have deep constness: if you have a const vector, you can't mutate its elements.

In addition to containers there is a second kind of range, which the standards committee calls views. Ranges are really containers and views taken together. So what's a view? A view is basically your old iterator pair packaged into one object. Instead of owning elements, it references elements, just like iterators do. You have shallow copying, again just like iterators: you can copy the reference in O(1). And you have shallow constness: when your object is const, that doesn't necessarily mean you can't mutate the elements. It's just like with an iterator, where what matters is const_iterator versus iterator, and the constness of the iterator object itself doesn't really matter.

Now, what I showed you was just this packaged pair of iterators, and that's pretty boring. There are more interesting ranges that make ranges an actual practical
concept, and that is really what drives the productivity gain you will get from them. Say you have a vector of ints and you call find to look for a 4. That's a simple call; you get an iterator back. Now let's say you want to do the same thing on a structure: the ints are no longer plain ints in that vector, but are packaged as an id in a structure A, and you have a vector of A's. You would like this code to be somewhat similar to that code, because, I mean, you're looking for 4s in the end, right? So how different can it be? Well, in fact, if you write it like you write it today, it's quite different, because you are using a different function. You're using find_if, and the function that you plug into find_if does the extraction of the int, a.id, and the equality comparison together in one predicate. So these two things are similar in semantics, but they're not really similar in syntax.

Well, how can we change that? There's the transform adapter. Down here, what we do is take the vector and wrap it into a transform to which you pass an extractor function. So what does tc::transform do? First of all, it doesn't do much: it just generates an object that references the vector and holds on to the function to execute. It's a range, and when you iterate over that range from begin to end, the function is executed on the fly on every one of the elements. So you're effectively extracting the ids from the vector, and on these extracted ids you then run your regular range find function with 4, and you get the iterator back.

Now, to further your understanding of the whole thing: what is that iterator pointing to when you're running this? Anyone? It's pointing to an int, right? So it's not pointing
to an A, because when you ran tc::transform you turned your vector of A's into a range of ints, and so the iterator is now pointing to ints. If you want the iterator to the A's back, there is a function base that you can call on the iterator, and it will unwrap the transform that you did and give you the original iterator.

Maybe to remove some of the magic, here is the implementation of such a transform adapter. The transform function gives you back an object of type transform_range, and it has iterators. These iterators contain the functor, and they also contain the underlying iterator that is iterating over your original vector of A's. Whenever you call the dereference operator, it executes the function on the dereferenced value and returns the result. That really does the unpacking of the id. And the base function is easy enough: you just return your underlying iterator.

Okay, let's talk about a different adapter. The transform and the filter adapter are really the bread and butter of using ranges. If you nest them, they will already get you quite far in your code; they are by far the most used adapters for ranges. So here's a filter adapter. What does it do? It takes a vector again and filters with that predicate: it picks all the elements of that vector where the id equals 4, and it returns only these. Again it's lazy, so when you create this thing it doesn't do anything, and when you start iterating it looks at the individual elements and decides whether to pass them on to you or whether to skip them and go on to give you the next one.

Here's the implementation. You have a function, that's the filter function, which decides whether an element passes or not. You have an iterator, and you have the end iterator. Why do you have the end iterator? Let's look at the implementation
of the increment operator. What does it do? It increments, it goes one further, and then, as long as you don't hit the end (and that's where you need the end iterator), it keeps incrementing until it finds something that actually passes your filter, and that's where the iterator is going to stop. So it just skips the elements that you don't want. But in order to ensure that you don't run past the end, you need that end iterator.

Okay, pretty good. So now we have two pretty useful components of our range library. Let's see what a tc::filter of a filter of a filter looks like. It looks like this: you get a huge iterator when you're using this in your code. And this is actually what Boost.Range does. You get iterators that are one kilobyte in size, and if you copy them around, things get terrible. Why is that? Well, you have a function, that's the function here, okay, fine. Then you have an iterator. Well, the inner iterator is itself a filter iterator, so it contains a function and an iterator, and that iterator is again a filter iterator, so it contains a function and an iterator. And then there is the end iterator, which again looks like this, and so on. So you get exponential growth in your iterator size. Obviously that's not what we want.

So the idea is that you have to keep your iterators small. Iterators are copied frequently; they are moved around in your code. You can't have them one kilobyte in size. Okay, so one idea is to say: we have this adapter object, and you noticed that the filter and the transform adapter objects were all empty; I just defined the iterator inside them. But they can carry data. So why don't we put the data that is common to all the iterators, the function and the end, into this adapter object? Then our iterators just reference the range, the adapter object, and they carry around the inner iterator. This
is now for the filter range, and as you can see, your iterators get quite a bit smaller. There is a limitation: since the iterator is now pointing at the adapter object, the iterators must not outlive their range object, their adapter object. And that's actually a requirement that is in the Ranges TS. So the thing that you will get in the C++ standard tells you: if you pull an iterator out of a range object, you must not use this iterator once your range object is dead. This requirement is essential to get reasonable performance, in particular to reduce the size of the iterator, and that's something that they actually put into the standard once we noticed this problem.

Okay, so here is now the iterator of a filter of a filter of a filter. You have a pointer to the range and an iterator. That iterator again contains a pointer to its range and an iterator, and so on all the way up. Well, that's not great, right? It's actually what range-v3 does. This is the state of the art of the standards committee, but it is not insanely great. We want to do better.

How can we do better? Let's introduce a new concept. We call it an index. It's kind of like an iterator, except that the operations require that you pass in the range object that this index belongs to. The operations on the index are then, by convention, defined on the range object. So when you have an index range, which is a range implemented via this index concept, you have begin_index and end_index, which are kind of like begin and end. You have increment_index, which takes the index and increments it, you have decrement_index, which decrements the index, and you have dereference_index. So every time you implement one of these operations, you have the index object as well as the range.

Well, that's a great idea, but you know, the world is running on iterators. So if someone comes along and says, well, we're going to throw
away the iterators, we now need this new index concept, and you're going to rewrite all your code with indices, that's unlikely to fly, right? Luckily, if you have implemented an index range, you can easily implement a generic iterator_for_index wrapper that turns the index concept back into an iterator concept. So whenever you have an index range, it's kind of trivial to implement iterators for it. It's boilerplate; you can do it automatically. iterator_for_index just has a pointer to the range and stores the index type of that range, and if you have something like an increment operator, it just calls increment_index on the range with that index. Easy enough. So that's not a problem: whenever we are writing index ranges, you can use them with iterators, no problem.

Okay, so how does that help us solve our problem? Here's an example of a filter range implemented using this concept. You have the filter range with a function, the filter function, and a reference to the base range. You don't directly store begin and end iterators anymore; instead you store a reference to the base range. And you say: my index type is actually the base index type. There's no difference in types. The filter needs the underlying vector iterator to increment, but it doesn't need much else; that's the essential piece of information. Now, if you increment the index, you can ask the base, because you stored it, to increment your index, and you do this until you hit the end or until something passes your function. Every time, for increment, for end, for dereference, you can always refer to the base, which we stored here.

Okay, so together with this iterator_for_index you can now build stacks. Remember, the filter index, no matter how large your stack is, is always going to be the underlying iterator. So if you have a filter of a filter of a filter of a vector, the outermost filter will still
have as its index the iterator type of the vector. Then you use our little wrapper iterator_for_index, and now all the iterators you build on these huge stacks are just two pointers, basically: the iterator of the vector and a pointer to the range object, the outermost range object. Okay, that's quite nice.

Let's go to another difference between what's proposed for standardization and what we are doing. Who's familiar with lvalue versus rvalue? Okay. If the adapter input is an lvalue container, so basically a vec that has been declared elsewhere, and here you are just generating a filter object on that vector with some predicate, then this will obviously work. But what happens if you turn this into an rvalue? You're creating the vector right inside your filter call and passing it directly into the filter. This is something you quite frequently want to do in code; I'm talking from experience. Well, the thing is, it doesn't compile. Why doesn't it compile? The view is referencing its base range or container, whatever it is, and in this case the vector goes out of scope as soon as you go past the semicolon here. So when this is an rvalue, you suddenly have a dangling reference. Certainly not a good idea. And so what the range-v3 library says is: it doesn't compile, we're not going to deal with this.

Well, the thing is, there's an alternative in that library: call action::filter instead of view::filter. That will actually compile, but instead of generating a lazy range, it directly executes the filter operation on the underlying vector: it takes the vector, filters it eagerly right away, and passes out the vector. And I'm afraid programmers are going to say: oh, view::filter doesn't work, well, then we use action::filter. Oh, this works, that's great, so let's use that. Well, you
suddenly pessimized your code, because it will eagerly filter that whole vector, and even if you only need the first element and then throw away everything else, you just forced yourself to spend the time to filter the whole vector. It's not lazy anymore.

So what do we do instead? In our library, if the adapter input is an lvalue container, there's just no problem: filter creates a view. It references, it copies in O(1), it has shallow constness. Now, if the adapter input is an rvalue container, filter creates a container. It aggregates the rvalue: it moves the rvalue into an internal member. Then copying is a deep copy and it has deep constness, just like a container, but it's still lazy. In our mind the laziness and this container-ness are really orthogonal: you can be a container but still be lazy. We hope that this avoids the kind of traps that you would get into when you follow the previous approach, and at the same time it allows you to write compact code, because you can nest all these things, the things that create the vector and the filtering and so on.

Okay, there's one more thing that we have. It's a bit stolen from the Boost.Range library; they have something similar: more flexible algorithm returns. When you have a find, usually you return an iterator to the element that you found, or end if you didn't find anything. Sometimes it may be nice to do something like this. This is an extension where find gets an extra type, Pack, that lets you customize what happens when you found something and when you did not find something. What pack gets is an iterator and a range. And there is this special thing, pack_singleton, which says: I didn't find anything, what shall I do? That is only passed the range. And the standard implementation would look like this: the
regular pack just returns the iterator; pack_singleton returns the end of the range.

Okay, so what you can now do, for example, is document in your code that you don't expect the find to ever fail. If you know that what you are looking for is in the range, then you can actually document: I'm not expecting end to ever be returned. That usually requires an extra assert, and it's kind of lengthy to write, but here you can simply declare that this should actually return something. You can do more. You can say: return me everything that is before the thing that I was looking for, return the head. This uses take; take generates a range that goes from begin to the iterator that you passed in. So up to the point where you found something, it gives you a range of values. And we still have the assert(false) here: we still expect to find something.

Okay, let me get to a generalization of ranges. So far our ranges were always using iterators. In our mind that doesn't need to be the case. Sometimes you have code like this, where you are traversing widgets and you pass in a function, essentially using the good old visitor pattern, where every widget that you have gets passed in to this function. Here you may have a nested traversal where you pass on the function by reference, and otherwise the function gets called with every single element that you have. It's a bit like a range, except that you iterate in a different way; there are no iterators. But it might still make sense to write something like this. You say: okay, did my mouse hit anything? The first argument is kind of a range: you pass in a function and it will just traverse the widgets. And this second thing is like the test function of a normal any_of, where you say: okay, give me a widget and I'm just going to check
whether that widget got hit by the mouse. So this is very range-like, although what you're iterating over is not an iterator-based range. So how can we fold these concepts together?

What we're really doing here is integrating two concepts of iteration, and I want to point them out to you. There is something called external iteration; that's basic computer science. The consumer that's consuming the data calls the producer to get a new element. This is what we are used to with iterators. So you are the consumer, sitting down here, and whenever I want another element, I say *it, and it produces an element and returns it to the consumer. Then I may say ++it, and then again *it, and I go into the producer, it produces an element, and it goes back to the consumer. So the consumer is at the bottom of the stack (this is how the stack grows) and the producer is at the top of the stack. Whenever you want an element, you call up into a function and the function comes back with a new element. You can turn the whole thing around, and then you have internal iteration.

But first, the advantages and disadvantages of external iteration. The consumer, as I said, is at the bottom of the stack, and you can write contiguous code for this consumer. You can do any kind of logic, you have one contiguous code path, and whenever you want an element you say: give me an element, give me an element, give me an element. That's easier to write, obviously. It also has better performance for the consumer, because the state the consumer is currently in is encoded in your program flow. You don't have to restore the state. You don't get called at some point and have to ask: where am I, which state am I in? You are in the state that you are in while you're executing your program. And you have no limit on stack memory: whenever you
need memory, you can just get it, as much stack memory as you want. You can recurse, do whatever you like; there's no limit.

Now, the producer is in a more difficult situation. It doesn't have a contiguous code path for the whole sequence; it only has a contiguous code path to generate one single item. Every time it gets called again for the next item, it has to restore its state and decide: where am I, for example, in the tree that I am iterating over? That makes it a bit harder to write, and it also makes the performance a little bit worse. You only have a single entry point, so you kind of have a dynamic dispatch at that single entry point to where you are going to continue in your code flow. And when you want to carry state from producing one element to producing the next element, so from one call to the next call, the only place you can really do this is inside your iterator. Inside your iterator you only have limited space; you can't use arbitrary amounts of stack space that get carried between iterator calls. Or you have to go to the heap and allocate the memory there, but that's certainly not very efficient.

Contrast that with internal iteration. Here the producer calls the consumer with a new element. This is what we saw with the traverse-widgets example: the widgets are being traversed, and whenever the producer has a new element, it calls the consumer. So it really turns things upside down. The producer is now at the bottom of the stack, and being at the bottom of the stack is great; it has all the advantages that I already talked about. The consumer is now at the top, and it has all the problems of being at the top of the stack.

Now, how can we integrate these two? Well, it would be nice to have both, right, that
both are actually at the bottom of the stack. And yes, you can do this, with coroutines. This is really what coroutines are all about. Whenever you hit something that you want to pass on to your consumer, you say yield, and the other guy, who was waiting in its control flow, picks up the item. Quite a few programming languages already have that. C++ doesn't have it yet, but there are actually proposals to add it.

There are basically two ways to do it. One is stackful: both sides have an arbitrary amount of stack. These are usually implemented as operating system fibers, and that's very expensive. So if you want to write a tight loop, this is not going to be feasible, this is not going to be efficient, because every time you yield to your other fiber, you have to restore its state, you have to switch stacks, and that's expensive. Every fiber for such a coroutine usually takes one megabyte of virtual memory, on Win32, I think. So it's nothing that you're going to do to iterate over a bunch of integers.

The second alternative is stackless coroutines, and because of the performance benefits there is actually a proposal for stackless coroutines. But there you can only yield in the topmost function. They get around the problem of having limited stack space by saying: I can analyze the amount of stack that I have to allocate for that coroutine, because it yields only in the topmost function. But that limits quite a bit what you can do, because one reason you want this kind of thing is that you are recursing, and then you naturally want to yield inside these recursive functions. And it's also still a bit expensive, because every time you yield to the other
fiber, you have to find your resume point. So you're going to have something like a dynamic jump, kind of like a virtual function call, every time you're doing it. Again, if you are in a very tight loop and you want to aggressively inline your loops and aggressively optimize the code, that's not something you want to use.

Okay, but it turns out that internal iteration is quite often good enough for many algorithms. Not for something like find or binary search: with find you want an iterator out, so that's going to be difficult, because with internal iteration you don't have any iterators. But for_each just works, and most of the time what you're doing with ranges is for_each. You can do accumulate, you can do all_of, any_of, none_of. All these things work with internal iteration, and there's basically no reason why your range library shouldn't support with internal iteration these algorithms that the regular library supports with iterators. Our library does that.

The adapters I talked about, filter and transform, are also implementable with internal iteration. When you're filtering, you are being called with a new element, and if you don't like that new element you simply return and wait for the next one to come. With a transform, you just pick up the element, transform it, and pass it on. So that's easy to implement with internal iteration as well.

So we allow ranges that support only internal iteration, and then the any_of implementation, for example, looks like this. You get a range, and again, this range may not have any iterators. You have this helper function enumerate that takes the range and hides whether that range has iterators or not. If it has iterators, it iterates using iterators; if it doesn't have any iterators, it calls the for_each function inside the range, which will pass out all the elements. And then you have this
function here that you pass into enumerate, and it actually does the work of the any_of, just OR-ing the results.

Now, is that all good? Well, not quite. There's something missing: the regular any_of is lazy, right? It stops as soon as it has decided that the result is true. So how do we do that? The first attempt was: ah, let's throw an exception. Whenever you are true, you throw an exception. Well, that's not a good idea; it's too slow. Throwing exceptions is expensive and is not meant for regular control flow. So the second idea was a simple enum. What we say in our library is: when your function here returns an enum of that special type, break_or_continue, it wants to tell you something. It wants to tell you: hey, either continue or break. The nice thing about returning this special type is that you can eliminate the break check at compile time. If the functor returns break_or_continue, then you have to do the check, and you have to break when break is returned. But if the functor returns anything else, there's nothing to check. You know it's going to continue; it doesn't try to tell you anything. So you can just loop like before, and you don't have the check.

Now, it's interesting that this concept is practical for things other than places where you need internal iteration, for example when you want to concatenate two heterogeneous containers. You have a list and you have a vector, and you want to append the vector to the list. The way you write it in our code is like this: concat(list, vec), and here you are now getting the contents of both. Usually, when you want to write this with indices, you would have to write something like a concat range that incorporates these two ranges, and it also has to store an index that is either of the type of one range or of the type of the other range. So you first have an iterator that iterates over the first range, and once it hits the end, it
will switch over to the second range and iterate through that one. So that index has to be a variant, either of one kind of iterator or of the other: in our case, either an iterator of a vector or an iterator of a list. The thing is that increment_index is then quite complicated. You have to switch on that variant. If the variant is of the first type, you have to increment it for the first range and check whether you hit the end, and if you hit the end, you have to reassign the index with the begin of the second range. So you have to do the skip-over that concatenates the two, and you have to do this every time you increment your index. You have quite a few checks to do, and it would be nice to avoid them. Dereference is similar: here is also the switch on the index, and again you have a branch every time you dereference your iterator.

Well, with generator ranges you can do this very efficiently; that's pretty obvious. In our case, this call operator here is the operator that enumerates the elements. So the concat range, on top of the iterator functionality, which it also has if you want to use it, this time also has that generator interface for efficiency, this internal iteration interface that just feeds the elements into the function object. It does that by calling the internal iteration of the two ranges it incorporates, one and then the other, and there you don't need any checks for end. It's very efficient. So algorithms that are capable of dealing with internal iteration, like this any_of, would use this form of iterating over ranges, because it's more efficient.

All right, that was it. Thank you very much. We do have that range library public on GitHub if you want to play with it. This is the URL; you will also find it on our website. And to wrap up, I want to say this:
I hate the range-based for loop. It's forbidden at think-cell. Why? Because people then write this, the good old for loop, instead of using algorithms and writing it nice and compactly like this. So get away from the range-based for loop. I don't know why they put this thing in there. Thank you very much.

Okay, thank you, Arno. So we have some time for questions. Raise your hands and I will run to you.

Hi. Since your library seems to be quite a bit more powerful and better than the Boost version or the standardization version, is your company involved in standardization efforts to push that upstream into std?

Well, we are actually the sponsors of the C++ committee in Germany, because there was no one else. I mean, Siemens didn't want to do it, so we did it. So we are sponsoring the participation of the Germans in the C++ standards committee. We didn't push this as a standardization effort. Why? Well, if you have ever been to a standardization committee meeting, it is a very political and very long-winded process, and so far we have rather spent our time on improving the code than on pushing it into standardization. So it's simply a matter of spending the time, making the effort. And certainly, what's going to come out of the current standardization effort for ranges is not bad, and in particular it doesn't do anything that would preclude later improvement. That's basically what we did as a gatekeeper. There was this one thing with the lifetime of iterators and ranges. If that had been voted the other way around, if they had said, okay, you can keep using your iterators even if your range is gone, then the door would have been slammed shut on ever making something that actually works. You would have been in a dead end, and that we actually prevented. So that was the scope of our effort. Other than that, I'd rather write code than
do political work, I'm sorry.

Any more questions? None? Okay, then big thanks to Arno again.

Thank you very much. And if people haven't noticed, there's ice cream outside, and I'm going to be near our booth for the next half hour. And yes, we are hiring. Thank you very much.
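
As a note on the internal-iteration part of the talk: the break-or-continue idea and the generator-style concat can be sketched roughly as follows, assuming C++14. All names in namespace sketch (break_or_continue, continue_if_not_break, concat_ints, any_of) are made up for this illustration and are not taken from the think-cell library's actual interface. The range here has no iterators at all; it only pushes its elements into a sink.

```cpp
#include <cassert>
#include <list>
#include <type_traits>
#include <utility>
#include <vector>

namespace sketch {

enum class break_or_continue { break_, continue_ };

namespace detail {
// Sink returns break_or_continue: forward its decision.
template <typename F, typename T>
break_or_continue call(F& f, T&& elem, std::true_type) {
    return f(std::forward<T>(elem));
}
// Sink returns anything else (e.g. void): always continue. The result is a
// constant, so after inlining the caller's check can be optimized away.
template <typename F, typename T>
break_or_continue call(F& f, T&& elem, std::false_type) {
    f(std::forward<T>(elem));
    return break_or_continue::continue_;
}
}  // namespace detail

// Chooses between the two overloads at compile time from the sink's return type.
template <typename F, typename T>
break_or_continue continue_if_not_break(F& f, T&& elem) {
    using result = decltype(f(std::forward<T>(elem)));
    return detail::call(f, std::forward<T>(elem),
                        std::is_same<result, break_or_continue>{});
}

// A range with internal iteration only: a vector followed by a list.
// No variant index and no per-element "which range am I in" check is needed.
struct concat_ints {
    const std::vector<int>& first;
    const std::list<int>& second;

    template <typename Sink>
    break_or_continue operator()(Sink sink) const {
        for (const int& x : first)
            if (continue_if_not_break(sink, x) == break_or_continue::break_)
                return break_or_continue::break_;
        for (const int& x : second)
            if (continue_if_not_break(sink, x) == break_or_continue::break_)
                return break_or_continue::break_;
        return break_or_continue::continue_;
    }
};

// any_of on top of internal iteration; stops early by returning break_.
template <typename Range, typename Pred>
bool any_of(const Range& rng, Pred pred) {
    bool found = false;
    rng([&](const auto& elem) {
        if (pred(elem)) {
            found = true;
            return break_or_continue::break_;
        }
        return break_or_continue::continue_;
    });
    return found;
}

}  // namespace sketch
```

A sink that returns void, such as a lambda that only accumulates a sum, goes through the second overload and visits every element; the any_of sink goes through the first and stops at the first match.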