Hello and welcome, everybody, to this talk about the end of time. My name is Arnd Bergmann. This is one of my side projects that I've been working on for the past six years, and ever since I started I've been hoping to get it done by the end of the year. We'll see which year that will be.

If necessary, I can talk a bit more about why we're doing all this work. I don't know how many of you saw the same talk when I was here a couple of years ago. Can you just raise your hands? That's a few of you. I'll go through the background before we dive into the code and what we've done since then.

The basic problem is that there's a type called time_t, which is fundamental to all Unix systems. It counts the seconds since 1970, which means that at the beginning of 2038 it will overflow, and since it's a signed type it will become a negative number and we go back to December 1901, which causes all kinds of problems. The only fix that really works, as we found out a few years ago, is to make it a 64-bit type, as a couple of other operating systems already do, and that gives us basically until the end of the universe.

Why do we actually want to do this? There are an awful lot of 32-bit machines out in the world, and people are still deploying them at a very high rate. To this day we are adding roughly twice as many 32-bit machines to the ARM kernel as 64-bit machines, and that's not likely to stop anytime soon. Can I have another show of hands? Who's working on 32-bit products at the moment? That's more than half of everybody in this room. That's amazing.

Some of these things have awfully long service lives and run awfully old kernels. Here are some examples that we support in the upstream kernel. We have some toys: my kids are playing with LEGO Mindstorms EV3 sets, and these have been deployed for a couple of years now, starting out with a kernel from 2010, I think, which has since been updated.
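To make the overflow described above concrete, the following C sketch (plain user-space code, not taken from the kernel) converts a count of seconds since 1970 to a UTC date without depending on the host's own time_t, and shows what a 32-bit signed counter holds one second after its maximum. The date conversion is Howard Hinnant's well-known days-to-civil algorithm; the struct and function names are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Broken-down UTC time, filled in by utc_from_seconds(). */
struct utc_time {
    int64_t year;
    unsigned month, day, hour, minute, second;
};

/* Convert days since 1970-01-01 to a proleptic Gregorian date
 * (Howard Hinnant's civil_from_days algorithm). */
static void civil_from_days(int64_t z, int64_t *y, unsigned *m, unsigned *d)
{
    z += 719468;                                   /* shift epoch to 0000-03-01 */
    const int64_t era = (z >= 0 ? z : z - 146096) / 146097;
    const unsigned doe = (unsigned)(z - era * 146097);   /* day of 400-year era */
    assert(doe <= 146096);
    const unsigned yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;
    const unsigned doy = doe - (365 * yoe + yoe / 4 - yoe / 100); /* from March 1 */
    const unsigned mp = (5 * doy + 2) / 153;       /* March = 0 ... February = 11 */
    *d = doy - (153 * mp + 2) / 5 + 1;
    *m = mp < 10 ? mp + 3 : mp - 9;
    *y = (int64_t)yoe + era * 400 + (*m <= 2);
}

/* Split seconds since the epoch (possibly negative) into a UTC date. */
static struct utc_time utc_from_seconds(int64_t t)
{
    int64_t days = t / 86400, rem = t % 86400;
    if (rem < 0) {            /* C truncates toward zero; we need floor for t < 0 */
        rem += 86400;
        days -= 1;
    }
    struct utc_time u;
    civil_from_days(days, &u.year, &u.month, &u.day);
    u.hour = (unsigned)(rem / 3600);
    u.minute = (unsigned)(rem % 3600 / 60);
    u.second = (unsigned)(rem % 60);
    return u;
}

/* Truncate a 64-bit second count to what a signed 32-bit counter would hold.
 * The unsigned-to-signed conversion is implementation-defined before C23;
 * GCC and Clang wrap modulo 2^32, which this sketch relies on. */
static int32_t wrap32(int64_t seconds)
{
    return (int32_t)(uint32_t)seconds;
}
```

Calling utc_from_seconds(INT32_MAX) gives 2038-01-19 03:14:07, and utc_from_seconds(wrap32((int64_t)INT32_MAX + 1)) gives 1901-12-13 20:45:52, which is where a signed 32-bit time_t lands one second after the overflow.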
We have some very heavy industrial equipment with tiny embedded controllers in it, sometimes an ARM9 or even older, and in the automotive industry a lot of products are still being deployed with 32-bit processors. Obviously these will all fail unless we do something about them, and some products have even longer service lives than those.

Here is one example, and there are many similar ones that most of you are familiar with. You work on a product, and you start out with a BSP from an SoC manufacturer. It already has a kernel that is a few years old. It takes a couple of years before something makes it to the market, then it's sold for, hopefully, a good number of years before it has to be replaced for one reason or another, and on top of that the customer expects to use the product for a good number of years. So what we are worried about today is kernels that people will put into production possibly 10 years from now and then run for another 10 or 20 years on top of that.

Another, unrelated issue is that even if you're using 64-bit hardware, you may be using 32-bit user space. One example is the Raspbian distribution, which a lot of hobbyists use: even on a Raspberry Pi 3, which has a 64-bit CPU, you would be running 32-bit user space and have exactly the same problems. For industrial users there are still good reasons to have 32-bit user space: it may be memory constraints, or it may be that you lost the source code for something, you don't have a 64-bit compiler, or it's not portable code. But it still works, so there's no real reason to update it unless you want it to run beyond 2038.

And then we have some more fundamental problems. We have network protocols that embed a 32-bit number for a time. The same goes for file systems, and applications may save timestamps in arbitrary formats, many of which use 32-bit timestamps. I listed two examples here.
The utmp file is used by glibc to store the login times of users, and everybody has that. And cpio is used by the RPM tool, for example, but also by the kernel itself for the initramfs. Then we have hardware interfaces, which are almost impossible to fix. A lot of real-time clocks use 32-bit second counters, and we have some code in place to work around those. And lots of other things need to know the time in hardware or in firmware: you exchange timestamps between firmware and the kernel, between firmware and user space, or between hardware and the kernel, and in any of those combinations, where an interface comes with a product that you communicate with, you have an inherent problem. If you're lucky, you can use unsigned seconds and don't have to change anything else, but then at least both sides have to agree that the number is unsigned. That gives you time until the year 2106, which is 136 years after the start of the Unix epoch in 1970.

So what have we been doing in the kernel? This is the first patch I could find that addresses this topic. One of my colleagues, John Stultz, sent this patch to create a time64_t type inside the kernel. This type is used for the internal timekeeping, so his approach was to start with all the timekeeping code inside the kernel and work his way out from there. The timekeeping code was basically done within a year or two, and then we worked on the device drivers and then the user interfaces. I actually started from the other side, coming up from the system calls, which didn't work all that well; we'll get to that. Once we had the timekeeping interfaces fixed, we started converting every single device driver that interacted with the timekeeping core. So we did not change the type throughout the kernel; we addressed every single driver individually. At this point there have been, I think, over a thousand patches to address this just in the kernel.
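Whether a 32-bit seconds field is read as signed or as unsigned decides whether it runs out in 2038 or in 2106. A minimal sketch, assuming the field has already been read from disk, the network or an RTC register into a uint32_t; demo_time64_t is a hypothetical stand-in for the kernel's internal time64_t, and the helper names are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit seconds since 1970, in the spirit of the kernel's time64_t
 * (hypothetical name, for illustration only). */
typedef int64_t demo_time64_t;

/* Traditional reading of a 32-bit field: values >= 2^31 become negative,
 * i.e. dates before 1970. The int32_t conversion is implementation-defined
 * before C23; GCC and Clang wrap modulo 2^32. */
static demo_time64_t from_signed32(uint32_t raw)
{
    return (demo_time64_t)(int32_t)raw;
}

/* Both sides agree the field is unsigned: no dates before 1970 can be
 * represented, but the counter lasts until 2106 instead of 2038. */
static demo_time64_t from_unsigned32(uint32_t raw)
{
    return (demo_time64_t)raw;
}

/* Approximate span of a counter with 'bits' bits (bits < 64), in
 * Gregorian years of 365.2425 days. */
static double span_years(unsigned bits)
{
    assert(bits < 64);
    return (double)((uint64_t)1 << bits) / (365.2425 * 86400.0);
}
```

For instance, from_unsigned32(0xFFFFFFFF) is 4294967295, which corresponds to 2106-02-07 06:28:15 UTC, and span_years(32) comes out at just over 136 years. The signed reading is what a traditional 32-bit time_t does; the unsigned reading only works if every party to the interface agrees on it.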
We have a type named ktime_t, which is only used inside the kernel. It is now a 64-bit nanosecond counter. It's an opaque type that you can convert to other timestamp formats, you can check how much time has elapsed between two ktime_t values, and some other interfaces use it. That makes the code work the same way on 64-bit and 32-bit, and in many cases it works much more efficiently and more accurately than using a time_t. Using jiffies is a very old way to address this; it simplifies a lot of code, it's well understood, and it doesn't suffer from any of the other problems that time_t has. In some cases we have to use time64_t, simply converting the timestamps to a wider type, but I try to avoid that, and also the timespec and timeval types.

For those of you who don't know it, timeval is the type that has a seconds and a microseconds value. This was traditionally used on Unix, but inside the kernel we work with nanoseconds instead of microseconds, so you always have to multiply and divide by a thousand. So we got rid of all the timeval usage inside the kernel and used timespec64, which in turn is a type using 64-bit seconds plus nanoseconds.

Another change that we made at the same time was to convert a lot of the users of CLOCK_REALTIME to CLOCK_MONOTONIC. So what's CLOCK_MONOTONIC? The main difference is that CLOCK_REALTIME counts in the UTC time domain, starting in 1970. It does not handle leap seconds, which can often become a problem, and CLOCK_REALTIME is also the time that user space changes when you talk to an NTP server, when you just update the time using the settimeofday system call, or when you use the real-time clock interfaces to copy the time from the RTC into the kernel. All of that changes the time, and in a lot of cases inside the kernel you want to know how much time has elapsed between two events, so CLOCK_REALTIME is a really bad idea for that.
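A rough user-space model of the kernel-internal approach described above: time is kept as a single signed 64-bit nanosecond count, converted to and from seconds-plus-nanoseconds and seconds-plus-microseconds pairs, and elapsed time is measured against CLOCK_MONOTONIC. The demo_* types and functions are hypothetical stand-ins for ktime_t, timespec64 and timeval, not the kernel's actual helpers; clock_gettime() and CLOCK_MONOTONIC are the standard POSIX interfaces.

```c
#define _POSIX_C_SOURCE 200809L /* for clock_gettime() and CLOCK_MONOTONIC */
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define DEMO_NSEC_PER_SEC  1000000000LL
#define DEMO_NSEC_PER_USEC 1000LL

/* Like the kernel's ktime_t: a signed 64-bit count of nanoseconds. */
typedef int64_t demo_ktime_t;

/* Like timespec64: 64-bit seconds plus nanoseconds. */
struct demo_timespec64 { int64_t tv_sec; long tv_nsec; };

/* Like a 64-bit timeval: seconds plus microseconds. */
struct demo_timeval64 { int64_t tv_sec; long tv_usec; };

static demo_ktime_t ts_to_ktime(struct demo_timespec64 ts)
{
    return ts.tv_sec * DEMO_NSEC_PER_SEC + ts.tv_nsec;
}

static demo_ktime_t tv_to_ktime(struct demo_timeval64 tv)
{
    /* The extra multiply by 1000 that timeval forces on nanosecond code. */
    return tv.tv_sec * DEMO_NSEC_PER_SEC + tv.tv_usec * DEMO_NSEC_PER_USEC;
}

static struct demo_timespec64 ktime_to_ts(demo_ktime_t kt)
{
    struct demo_timespec64 ts = { kt / DEMO_NSEC_PER_SEC,
                                  (long)(kt % DEMO_NSEC_PER_SEC) };
    if (ts.tv_nsec < 0) {       /* keep tv_nsec in [0, 1e9) for negative kt */
        ts.tv_nsec += DEMO_NSEC_PER_SEC;
        ts.tv_sec -= 1;
    }
    return ts;
}

/* Read CLOCK_MONOTONIC as a nanosecond count. Unlike CLOCK_REALTIME it is
 * not stepped by settimeofday(), NTP time steps or leap seconds (NTP may
 * still slew its rate), so the difference of two readings is a safe
 * elapsed time. */
static demo_ktime_t monotonic_now(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (demo_ktime_t)ts.tv_sec * DEMO_NSEC_PER_SEC + ts.tv_nsec;
}
```

A signed 64-bit nanosecond count covers roughly 292 years on either side of 1970, i.e. until about the year 2262, which is why it is a comfortable internal representation. Subtracting two monotonic_now() readings gives an elapsed time that is not disturbed when someone sets the wall-clock time.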
In the case of leap seconds it can sometimes go backwards, and when you set the time it can be all over the place. CLOCK_MONOTONIC doesn't have this problem; it just starts ticking when you boot the system.

One of the harder problems is always the user space interfaces. When we have a user space interface, we know that we have to change both the kernel and user space at the same time and in the same way, and ideally we want to do it in a way that users never see, so you just recompile your program and it keeps working with your kernel.

ioctl is the most common user interface that you have in a device driver, and here we got a little bit lucky. We have these fancy macros that are used to define ioctl command codes, and they take the size of the argument type: you pass a pointer to a structure, and the size of that structure is encoded in the command number that the device driver uses internally. The parallel port driver was one of the cases where we pass a struct timeval. User space will have to redefine struct timeval to be 64-bit based, so it will grow from two 32-bit integers to a 64-bit integer, a 32-bit integer and some padding, making it 16 bytes long instead of 8 bytes. That means the macro evaluates to a different number in user space; we pass that number as the command code into the ioctl system call, and the device driver just sees a new command. The same goes for this PPP command, which takes a structure that embeds a time type. The implementation looks like this: we have a big switch statement that evaluates all the ioctl commands that a driver supports, and the code that handles the old command gets replaced with two case statements, one for the version
that the old 32-bit user space sees, and one for the version that any 64-bit user space or modern 32-bit user space sees. Then the driver just understands both commands, we can use them on both 32-bit and 64-bit kernels, and we don't even have to worry about compat mode anymore, where you run 32-bit user space on a 64-bit kernel, which also has a related problem.

Unfortunately, that's not always possible. We have some very common ioctl commands that date back to much older versions of Linux or even old Unix versions. If you want a socket timestamp, because you received a packet and want to know when it arrived, the command code is actually defined as a hexadecimal number that someone came up with. It does not change when you change the type of timeval, even though it passes a timeval, and there we have to jump through a lot of extra hoops to make it work. So we change the header file so that on 64-bit architectures we still get the same old command code, and on everything else we define it as an expression with a ternary operator that uses either the old or the new ioctl command. The new command is defined the proper way, with an argument of two 64-bit integers. There's a special case for the x86 x32 ABI, which basically has no users but always gets in the way, because they decided to address this problem a number of years ago by making time_t 64-bit for their architecture. Now they don't have any users, but they still make everything more complicated, unfortunately, so we always need this one special case for them, and I hope that we can eventually get rid of it.

It gets worse: the input event interface does not use an ioctl command at all. If you want to know the timestamp of a click, or how fast someone is moving over the trackpad, you get the same kind of timestamp in a data structure, but you get it through the read system call, and with the read system call we have no way of finding out whether the user space has been
built with an old glibc or a new glibc, so we don't know which type of time information it expects. The only way we can make this work at all is by keeping a 32-bit time type and making sure that users don't notice it. So in the header file we redefined the input_event structure so that it no longer contains a timeval, but instead a similar pair of fields with 32-bit seconds and 32-bit microseconds. Do I have the... yes, there it is. On 32-bit we use this type, which again needs a special hack for x32. Then we have to make sure that the timestamps passed through this interface are always in terms of CLOCK_MONOTONIC and not CLOCK_REALTIME, and this is something we have to add some checking for. Right now you can choose between the types of timestamps: if you use CLOCK_REALTIME, it will keep working, but it may or may not cause problems in 2038 depending on how you interpret those timestamps; if you're using CLOCK_MONOTONIC, it works just fine. And every application that uses this structure has to include the new header file. If you copied the header file into an application, which some people do, it will break and will simply not work once you upgrade glibc.

A similar problem exists in some drivers. I know of two examples where we have time information in a data structure that is exported from the kernel to user space through a memory-mapped interface. The most commonly used one is the PCM interface in ALSA, the sound subsystem, and we just had a discussion about it on Sunday. We've had patches for at least a year and have talked about it for longer, but we still have to make the final decision on which way to go. We can change the kernel to detect which interface user space expects and then export one ABI or the other, or we can keep the ABI like we did for the input event and change all of user space to not expect 64-bit timestamps, or change all the types that are visible in user space headers, and also make sure that the
user space uses CLOCK_MONOTONIC, which you normally want anyway for audio, but we're not currently forcing users to use CLOCK_MONOTONIC.

The biggest issue so far has been the virtual file system. The main reason is that we have a large number of file system implementations, I think over 40 or 50 of them in the kernel, and the fundamental structure they all use is the inode. The inode contains a couple of timestamps: atime, mtime and ctime, and in the meantime we usually also have btime, the birth or creation time of a file. I worked on this as the very first thing when I started looking at the problem, I think in 2012, and posted patches in 2014. Deepa then took over and did another version of the same patches from scratch; she posted it in 2016 as part of her Outreachy internship, which seemed like a good idea at the time. It turns out she's still working with me on this, but she has managed to get it done after five more rewrites of the patch set.

Another piece of the puzzle was the statx system call. We have over a dozen different implementations of the stat system call in the kernel. We have an architecture-specific structure for stat, and on some architectures there are three or four different binary layouts of stat, which is crazy. Then we also have lstat, fstat and fstatat and all these combinations. The statx system call just replaces all of them: new architectures that we add in the future will only implement statx, and glibc can implement all the other ones on top of it. For utimes, I think we've just done that, but I'm not sure; I have to check, it will be on a later slide.

Then there are the file systems themselves. The work that Deepa did was for the core file system code that sits between the system calls and the file system implementations, so we made that 64-bit. But some file systems are still using 32-bit in the code, or, on disk or on the network, whatever format they use, they store the timestamps
permanently as 32 bits. For XFS, for example, we know how to fix it, but we haven't done it yet. For NFS the code is still wrong and we have to fix that; NFSv3 is, I think, fundamentally 32-bit, while NFSv4 already uses 64-bit timestamps, so we just need to get the API in between right. And then there are a couple more. ext3, for example, is similar to NFSv3: that will never be fixed. So if you're using ext3 at the moment, stop doing that and use ext4, because that has the fix.

The system calls are the most obvious part, and they have the advantage that they only sit between the kernel and user space. We found around 50 system calls that pass time information in some form. In 4.18 we had 50% done, in 4.19 I think we didn't get any, and for 4.20 we now have another set, so we get to maybe 75% of the system calls having a correct implementation in the kernel. But no architecture is using those yet. The idea is that we change all the architectures at the same time: every single 32-bit architecture will start using those entry points once we have done them all, and at that point you can start building glibc, or whichever libc implementation you use, to call those system calls.

There are still a couple of them left. We basically just agreed on how to do clock_adjtime and adjtimex, and then there are these four system calls that use special data structures. I'm still discussing those with a number of people. Unfortunately, the main problem here is that nobody really has an opinion: we know there are at least four different ways to address them, all slightly different, each with its own downsides and upsides, but nobody has really put their foot down and said "don't use that, use this", and I'm sort of on the fence between two of them.

Let me get to how we do it. It turns out that for each of the system calls we need to address, we already have two implementations in the kernel: we have one that is used natively
and we have one that is used for 32-bit processes on a 64-bit architecture, because they all have the problem that they pass a 32-bit time_t. Take the futex system call: a 32-bit system has a native 32-bit futex that is passed a timespec. A 64-bit system has almost the same system call, implemented by the same source code, but struct timespec is defined differently there, so it looks slightly different at the ABI level even though the source code is the same. And we added a compat version: compat_sys_futex implements the same ABI that sys_futex implements on 32-bit architectures, so we can use it to run a 32-bit user space application that calls futex.

What our patches did was to simply enable compat_sys_futex on 32-bit architectures as well, so there we now have two system calls that implement the exact same ABI. This is what the patch looked like: instead of guarding it with CONFIG_COMPAT, which is only enabled on 64-bit architectures, we changed it to CONFIG_COMPAT_32BIT_TIME, and this symbol can be enabled either if you have CONFIG_COMPAT set or if you are running on a 32-bit architecture. At that point this function becomes available, and then we have to change the type: instead of compat_timespec we now use old_timespec32, which is the name for the structure that 32-bit tasks use for timespec. The next step, based on feedback from Christoph Hellwig just in the past few months, was to rename that system call as well while we are at it, so that we don't have the compat name on 32-bit architectures. The code is still the same, we use sys_futex_time32 on both 32-bit and 64-bit architectures, and it still implements the same ABI as before. The final step is to change the native sys_futex system call on 32-bit architectures to look exactly like sys_futex on 64-bit architectures, and at that point we have two implementations of the system call that no longer differ
between types of architecture. You keep using the old system call number to jump into the time32 function, and we assign a new system call number that jumps into the original sys_futex, which now implements the new ABI. We also get a new type for this: __kernel_timespec is now the structure used on the interface between the kernel and user space and nowhere else. We don't use it inside the kernel, and we don't use it in user space; we use it only at the boundary, and there is a reason for that. For the changeover, we already have this patch in the kernel, so __kernel_timespec is now used on 32-bit architectures, but since we haven't changed the system call tables yet, it gets redefined to the old timespec until we do. When we change the system call tables, this code will just go away, we will use the new definition of the structure, and then life will be good: we'll have everything the way it should have been done 20 years ago.

This is the list of all the system calls that already have a replacement in the kernel for one reason or another. The time system call tells you the time in seconds, gettimeofday tells you the time in seconds and microseconds, and we have another system call called clock_gettime that tells you the time in seconds and nanoseconds. With clock_gettime64 we then have four different versions of this system call, but we don't have to replace the older ones, because glibc can just implement time and gettimeofday by calling clock_gettime64. The same goes for all the other ones here. These are the ones that are fixed in 4.18. These are the ones that we have patches for but that unfortunately did not quite make it into what will become 4.20, so I'm fairly sure they will all be good to go in 4.21; the patches are reviewed, it was just bad timing that kept me from submitting them in time.

These are all the system calls that do need a replacement: for each of them there is not already a
replacement system call in the kernel, and they pass a 32-bit time_t. Again, about half of these are already done in 4.18 and 4.19, and we have a couple more that have patches ready. That gets us to getitimer, setitimer, getrusage, waitid, and, oh, sysinfo we actually don't do.

Take getrusage as the example. This structure is defined the same way on every Unix system. We have a timeval in here that tells us how much time has elapsed while running this process, both for user time and for system time, and the same structure is used for wait4, waitid and getrusage. There are multiple ways of doing this. One way is to basically say: if we are in the kernel, or we are running a 32-bit... where am I... we just define it as __kernel_old_timeval, so that when you include the kernel header you see a structure definition that matches the old binary interface. That's important because struct timeval is getting redefined by glibc: if you use a new glibc, include the kernel header that defines the structure and then call the old system call, it doesn't work, because all the fields after this one are in the wrong place and these fields don't give you the right time. glibc can then implement the new structure on top of that and copy between the two. We don't have to do anything in the kernel; we just have to change the kernel header file.

Another option would be to get rid of the timeval and put a timespec in, because we really hate timeval in the kernel, so timespec would be the correct thing to do here. That still keeps all the other members the same, and these would stay long; we could also make them 64-bit, which is another variant of this. glibc then still has to convert, because it has to divide by 1000 to get from timespec to timeval, but it's a nice interface.

Those are the main options we have for addressing this, and I'm sort of leaning towards not doing anything in the kernel and letting glibc take care of
it, because 20 years from now everybody would wonder what this strange structure is doing here and why we pass 32-bit seconds where everything else uses 64-bit seconds.

There's a problem with the timespec definition which is really interesting. The C standard (C11) actually defines what timespec looks like, and it says there's a member called tv_sec that is a time_t. In the kernel we use __kernel_time64_t, with underscores so it doesn't conflict with any user space definition, and the internal glibc definition also has a 64-bit tv_sec member, so that part is fine. The kernel uses 64-bit nanoseconds at the interface, but C11 and POSIX both say that the nanoseconds member has to be a long, which is always 32 bits wide in 32-bit user space. That's sort of okay on x86, because it's little-endian and you just get more padding at the end.

The reason we want a 64-bit nanoseconds field on the kernel interface is to make sure that this padding is always zero-filled. If we had implied padding, with just a 64-bit member and a 32-bit member, the ABI would add four more bytes; if you fill in the first two members and then do a memcpy from kernel to user space, you copy four bytes of kernel stack data, which could potentially be used in an exploit to find out information that user space should not have but that is available in the kernel. This is a real problem.

Then user space needs to make sure it matches the layout. On a little-endian architecture... did I get this the wrong way round? So, depending on the byte order, we either have to add the padding at the end or before the nanoseconds member. If we add it at the end, we can actually call it padding. If we add it before, as big-endian architectures need, we have another problem: with a timespec that you pass to the kernel by using C code that just initializes the first and the second member, you really want the second member to be the nanoseconds, not the padding. So we have to use a bit field, which is another way of doing
this. So this is really ugly already, but as always it gets worse. Oh yes, and there was this timeval, and I only found this out a month or two ago: there's one architecture in the kernel that defines timeval differently from everybody else, and that's sparc64. On sparc64 we have a 32-bit microseconds value on a 64-bit architecture, whereas all other 64-bit architectures have a 64-bit microseconds value. That means it's already broken today, because we again pass kernel stack data to user space, as I just explained. And if we copy it in and out as a 64-bit member, then, since SPARC is big-endian, we copy the wrong bytes. I already introduced a bug like that in some code that I really have to fix now, and I didn't get around to sending some other patches where I would have broken a lot more important code. Fortunately the parallel port driver is not used much on sparc64 in practice, but I have to go through all the interfaces again that use a timeval towards user space to make sure that I'm not breaking either of the two sparc64 users.

So what's coming up? This was all about the kernel so far. The next step obviously is glibc, or any other libc implementation. We have two libc implementations that have some sort of code working. Albert Aribaud has spent a long time designing the interfaces for glibc, and they've made some decisions in there that I found surprising. One is that they actually want to run user space with a 64-bit time_t on older kernels that did not have the 64-bit time_t system calls. They also want to make sure that you can basically indefinitely build user space with either a 32-bit time_t or a 64-bit time_t, depending on a macro that you set, just like we do for large file offsets. So you pass a macro definition to the compiler, either in a header file or on the command line; if you do this, you get the 64-bit time_t once those patches have been merged, and if you don't, you get the old interfaces. And then you have to build everything else
on top of glibc accordingly: if one library passes a time structure to another, you have to make sure the two match, or you have to use symbol versioning and make sure that you build it the right way.

I did a prototype for another libc called musl, which is used by a number of people; if anybody is interested in seeing the code, I have it available. But the version I did will not actually make it into musl itself, because I made it configurable: the way I did it, you choose either a 32-bit or a 64-bit time_t at the time you build musl, and user space then has to match that. This is not what's going to happen in musl, because they decided to rework their user space ABI from the ground up and fix everything that has bothered them for a while. So there will be a musl 2 at some point in the future, which will have a 64-bit time_t only and also fix a few other things, and then we have to see how we can deploy that, because it will only work on newer kernels.

After the libc, we have to worry about the distros. The easiest case is the embedded distros: if you are working with OpenEmbedded or any of the other ones that build everything from source, you are lucky, because you can just rebuild everything with the 64-bit time_t flag set. Then you have to worry about how to deploy all the user space binaries in the field at the same time without breaking anything, which is also interesting, but it's much easier than having to worry about a gradual upgrade strategy.

Android is probably in the worst situation, because they have ABI levels and they support old 32-bit apps on all of their Android versions, including 64-bit Android. So they are definitely in a position where this matters a lot to them. I don't know how they will solve it; they might just drop 32-bit app support. But right now almost all the apps you see, at least the most used ones, rely on the 32-bit ARM binary interface, and that will all change. So with the 32-bit apps, if you deploy them in the
field, they are almost certainly going to be broken, and you will have to recompile them for a new ABI.

Then there is the desktop. On 64-bit desktops we basically only have to worry about bugs: if you are running XFS, you will have to upgrade to a new kernel that supports future timestamps in XFS, and that's it. And if you have applications with bugs, you will have to update those applications, and that's all easy.

For a lot of the 32-bit distros, the plan is that they will go away. So if you have an embedded system that happens to rely on Debian instead of OpenEmbedded, or on openSUSE, Fedora or any of the others, by 2038 they will probably be long gone. But you might still be using them, maybe because your product happens not to have any communication interfaces and you think it doesn't matter; but if you have a real-time clock, it does matter. These are the distros that I could easily find that support 32-bit user space today. Almost all of them are on their way out or will go away in the near future. These are the ones that probably matter for a lot longer: everything that has an ARMv7 port at the moment has a good chance of someone deploying that distro in some embedded device or some low-end desktop machine somewhere, so those are the ones we worry about, all the Raspberry Pis and all the industrial embedded stuff. Debian is probably going to be the only one with an x86 port that still matters in 2038; even if they drop x86 by then, there will still be users somewhere.

And this is roughly my progress. The driver code, as I said, was hundreds and hundreds of patches, and that is mostly done; we basically have patches for all the drivers. We are still lacking patches for a couple of subsystems, namely Video4Linux, ALSA and sockets. Those are the big ones where we are still working on fixing the user space interfaces, but the individual device drivers are usually fine. For the core timer
handling, the only thing missing is removing the interfaces that are still used by one or two remaining drivers whose maintainers have not picked up my patches yet. For system calls, as I said, we have a couple with patches that we hope to get into 4.21, and for some of them we still have to decide how to do it. For file systems, there is still some cleanup work and some development work for a couple of them; most of them are already perfect. And then there is the architecture-specific code. One of my colleagues is working on fundamentally reworking the way we handle system call tables. At the moment about half the architectures use the same generic system call table, which is something I worked on at least 10 years ago, probably longer, so all the new architectures are fine: they only have one table. But then we have 10 to 12 different architectures that each have a completely separate system call table. Not only do the contents differ, but also the way the table looks in the source code, and it is very hard to find out which architecture actually implements which system call. So we are fixing that first. Once that is done, we just add all the missing system calls, making sure that every architecture gets all the support in the same kernel version. That makes it easier for the libc people, who can then set a minimum kernel version and say, "if you have a Linux 4.22 kernel, it will work". So there is still something going on there.

For the C library, it depends a lot on which library you use. As for the distros, for Debian we have had some long conversations about migration plans; other distros were simply convinced to drop their 32-bit support sooner, but it is still very much open what is going to happen there. And that is it for me.

Yes, you have a microphone.

I have a question: how do you test such changes? And a more generic question: is there any test suite that can say this or that part of your
operation system behaves incorrectly so you have to look at it so for the most part we try to make it so that the changes we do are automatically part of what people use so that the general testing we haven't actually started testing much so in the kernel I try to make sure not to change things that don't see testing but if you have a system the only real thing that you can do is set the time to 2039 and see what happens the other thing that we did is try to make it possible to disable all the 32 bit interfaces if you don't have the old system calls anymore they are completely removed you run your user space today and it tries to call one of the system calls it fails immediately which is much better than failing in the future have you considered using kernel personality to implement the choice of whether doing 32 bit or 64 bit timestamps we thought about this right at the beginning and decided not to do it because it would basically end up meaning that we introduce a completely new binary interface which is sort of the opposite of what we are trying to do because all of this is just for compatibility with old systems and if we break in compatibility then that's this ah we failed, yes what about user space applications if any application copies a time theme into a long it also dies yes that's definitely broke there will be a lot of applications that are broken and there's not much we can do we can fix a lot of the source applications when we get there, when we rebuild Debian we will find those problems if you have an in-house application of course if you have a bug that's no different from porting from big and into little and from porting from 32 to 64 bit you will have bugs you say you will find those problems you will find them in 2038 yes if it's your application and your bug you will have to fix it you will have to find it we thought about having some way to make it easier but we haven't actually done any of that one more question do we need to really care about 
32 bit platform distributions except embedded stuff I don't know I'm just trying to fix the kernel because for example in your slide I guess Chrome OS never was certain to be right sorry Chrome OS yeah Chrome OS from Google the last 32 bit Chrome OS devices are going out of service very soon and they're all based on EMMC which dies long before that so I'm not worried about those do you know from anyone who is trying to fix user space only so a new libc which could handle an old kernel that would not work because the kernel will just still crash thank you ok thank you