Okay, so I think we can get started now. Welcome to my talk. Today I'm going to talk to you about PTP and how to verify it in the real world. I will show some failure modes you may encounter if you deploy it in the real world, and we will have a look at them. I'm Johannes; on my social networks I'm known as Stickenhubelix. I'm a former systems engineer. I worked in the field for 10 to 15 years, depending on how you define working in the field. By now I'm working at Pengutronix, a Linux consulting company in Hildesheim, Northern Germany, and I'm working there as a senior kernel developer.
Small disclaimer: this talk may contain some profanity, so if you're offended by that, you may leave now. Rocky Horror is of course about interaction, so I want you to get interactive with me. There will be a recurring theme: I will ask you to always check your assumptions, and when we get to that, I want you to repeat after me that you also check your assumptions. We will try that out when we get there. I hope that goes well. This talk is about my personal experience when setting up PTP for our customers, so your mileage may vary, especially if you work on hardware other than ARM boards or embedded SoCs. And the last disclaimer: this talk has been prepared with Linux PTP version 3.1. By now version 4 has been released and I have not had the chance to port the talk to Linux PTP version 4, but I have been told that stability has improved a lot since then, so you may want to try that out yourself.
Now, first of all, I'll give you a brief introduction to what PTP is and what PTP does. We will have a short look at which kernel components and user space components are involved. We will talk about different measurement methods you can use to verify that your setup actually works, and works as intended.
Then I have brought a lot of examples with me, and we will look into common pitfalls and best practices, and I hope that we still have time for lots of Q&A afterwards.
So what do we do with PTP? The basic idea is that we have multiple clocks in different SoCs, different computers, and we want to synchronize them. We don't want to synchronize them with a line we pull or a pin we toggle; we want to synchronize them over the network. We want to auto-select the best possible clock reference in the network, and we want to compensate for path delays, because as you see in that picture down there, we have queuing in different bridges in the network and we have path delays between the different nodes, and we need to compensate for that. The way we do this is with a mechanism called two-step sync. Basically we send out a sync packet and we timestamp that packet; that's where this T1 timestamp is recorded, in hardware if at all possible. We send that timestamp to our follower clock, that's the secondary clock that is to be synced to the leader clock. The follower answers with a delay request, and of course the receive timestamp of that delay request is recorded and sent back in the delay response. In that blue box I tried to list which timestamps are known to the follower clock at a certain point in time. If you follow through, there are those nice formulas, that's the only math we're going to encounter in this talk, and you can actually calculate the delay and the offset of the two clocks. You can use that to compensate for the path delay and for the clock offset, and to retune your follower clock to the leader. That's the basic idea of how you do PTP, and that's the basic idea of how basically any time synchronization protocol in the world works.
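The offset and delay calculation from the four timestamps can be sketched in a few lines of Python. This is a minimal sketch, not code from the talk: the function name and the example numbers are made up for illustration, and it assumes the path delay is the same in both directions, which is what the basic exchange relies on.

```python
def ptp_offset_and_delay(t1, t2, t3, t4):
    """Return (offset, delay) of the follower relative to the leader.

    t1: sync sent, read on the leader clock
    t2: sync received, read on the follower clock
    t3: delay request sent, read on the follower clock
    t4: delay request received, read on the leader clock
    All values in nanoseconds; a symmetric path delay is assumed.
    """
    leader_to_follower = t2 - t1  # path delay + clock offset
    follower_to_leader = t4 - t3  # path delay - clock offset
    delay = (leader_to_follower + follower_to_leader) / 2
    offset = (leader_to_follower - follower_to_leader) / 2
    return offset, delay


# Hypothetical numbers: follower runs 1500 ns ahead, one-way delay is 300 ns.
t1 = 1_000_000
t2 = t1 + 300 + 1500         # arrival, read on the follower clock
t3 = t2 + 10_000             # follower answers 10 us later
t4 = (t3 - 1500) + 300       # arrival, read on the leader clock
offset, delay = ptp_offset_and_delay(t1, t2, t3, t4)
print(f"offset={offset} ns, delay={delay} ns")
```

Any asymmetry between the two directions ends up in the offset as an error of half the asymmetry, which is one reason the later measurements against an independent reference matter.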
For the selection of the best leader clock, the basic idea is to announce capabilities. That's things like clock quality: whether you're running from a crystal, or from an atomic clock or a rubidium reference. There is also some user-configurable stuff, because you may want to use a central entity in your network as a reference clock that won't go away, and maybe skip a clock that is in the periphery of your network, because it may be unplugged and then you have to re-elect the best master clock, or leader clock. There are also some tie breakers if you have different clocks of the same class. The idea is basically: if your system thinks it's a good leader clock, it will announce that to the network, and then this algorithm runs that selects the best leader clock.
That's in a nutshell how PTP works, but there's a catch, because there are a lot of different variants of how stuff works. First of all there's the question of how your bridge, in most cases that's a network switch, works. There's the boundary clock, which runs one of those two-step syncs we saw earlier to each of its neighbors. There's the transparent clock, where you don't run a separate sync to your neighbor clock but instead compensate for queuing delays within the bridge. So if the bridge knows how long it takes for the packet to travel through the bridge, it can use a field in the message to compensate for that. Or you can have an ordinary clock, which is basically the worst implementation, where you would just ignore queuing delays. Some of the PTP profiles only allow for a certain type of bridge. For example, if you look at 802.1AS, that's the TSN version of PTP, gPTP, it only allows for boundary clocks, so each switch actually has to run a synchronization to each of its neighbors.
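The leader selection described above can be sketched as a lexicographic comparison of the announced attributes. This is a simplified illustration in Python, not the full algorithm from the standard: it ignores topology (steps removed, port states), and the class names, clock identities and attribute values below are hypothetical. The comparison order (priority1, clock class, accuracy, variance, priority2, identity as tie breaker, lower wins) follows the data set comparison in IEEE 1588.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Announce:
    """Attributes a clock advertises about itself (simplified)."""
    priority1: int       # user-configurable, compared first
    clock_class: int     # e.g. 6 = locked to a primary reference, 248 = default
    clock_accuracy: int  # encoded accuracy, lower is better
    variance: int        # encoded stability, lower is better
    priority2: int       # user-configurable tie breaker
    identity: str        # unique clock identity, final tie breaker

    def rank(self):
        # Tuples compare element by element, which gives the ordering we want.
        return (self.priority1, self.clock_class, self.clock_accuracy,
                self.variance, self.priority2, self.identity)


def best_leader(candidates):
    """Pick the candidate with the lowest rank tuple."""
    return min(candidates, key=Announce.rank)


# Hypothetical network: a GNSS-disciplined core clock and two crystal clocks.
core = Announce(128, 6, 0x21, 0x4E5D, 128, "00:11:22:ff:fe:00:00:01")
edge_a = Announce(128, 248, 0xFE, 0xFFFF, 128, "00:11:22:ff:fe:00:00:02")
edge_b = Announce(128, 248, 0xFE, 0xFFFF, 128, "00:11:22:ff:fe:00:00:03")

print(best_leader([edge_b, core, edge_a]).identity)  # the core clock wins on class
print(best_leader([edge_b, edge_a]).identity)        # identity breaks the tie
```

Because priority1 is compared first, a misconfigured low priority1 on a peripheral device overrides clock quality entirely, which is worth checking when the wrong device ends up as leader.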
So that may differ a lot between different versions, and it strongly depends on your application profile and on the requirements for that profile. The same is true for the sync type. We saw the two-step sync. It's called two-step sync because we send two messages, you see those sync and follow-up messages there. You could theoretically include the timestamp T1 in the sync packet itself if you have special hardware that can inject the timestamp into the packet at the point in time where it's sent. It's basically a mess. It never really worked, especially for anything faster than 100 megabit, so for gigabit Ethernet or even faster line rates, and it's usually not recommended to use one-step sync, but in theory it's possible for some of the profiles.
The same is true for the transport layer. You have layer 3 implementations, so basically UDP packets, and you have layer 2. That's the way TSN does stuff, because it's a layer 2 protocol suite, so it does this in raw Ethernet packets with a special EtherType. You can have different ways of delay measurement. You can measure between neighboring points in the network, point to point, or you can measure your delay end to end, so from one leaf node of your network graph to the entirely different end, and compensate over the entire path. Or, as I said, you can compensate for path delays within certain sub-paths of the network. And of course, as with every protocol, there are lots of extensions: for redundancy, for special applications, for security, for whatever you can think of.
We will ignore that part for this talk, but be aware that different profiles can change the behavior of the system a lot, and they're usually incompatible with each other.
As for how PTP is implemented on Linux: there's a general abstraction, the PTP hardware clock, that basically just describes an abstract clock that can be tuned. This is usually used with hardware-offloaded packet timestamping. Most of the time you use the hardware-offloaded version because it gives much better performance, and Ethernet MACs nowadays usually have support for telling you when a packet is sent or received, and they usually use that PTP hardware clock abstraction. I put the documentation links in the presentation, so you can read the kernel docs if you're interested in how that is implemented and what the abstraction looks like exactly. The general rule of thumb is: the closer to the actual wire you do your timestamping, the more precision you can achieve in your time synchronization.
The more interesting part from my point of view, because there's more variation to it, is the user space. The most common project used there is Linux PTP. It's quite well established and quite well maintained, it's an active project, and it supports lots of different profiles as long as they don't interfere with the baseline implementation. It's quite tricky to configure: it uses config files, you can also pass lots of command line parameters, and sadly it doesn't always check for consistency across all parameters. It's quite hard to get right, and sometimes it won't tell you if you have an incompatible configuration; it will just fail to synchronize.
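Before touching any daemon configuration, it helps to check whether an interface exposes hardware timestamping and a PTP hardware clock at all. `ethtool -T <interface>` prints this. The Python sketch below parses such output; the function name is made up, the sample text is only illustrative of the usual layout, and the exact formatting differs between ethtool versions and drivers, so treat the parsing as an assumption to verify on your own system.

```python
import re


def parse_timestamping_caps(ethtool_output):
    """Extract hardware timestamping capabilities from `ethtool -T` text."""
    # Capability lines start with e.g. "hardware-transmit" or "software-receive".
    caps = set(re.findall(r"^\s*((?:hardware|software)-[a-z-]+)",
                          ethtool_output, re.MULTILINE))
    phc = re.search(r"^PTP Hardware Clock:\s*(\S+)", ethtool_output, re.MULTILINE)
    phc_index = int(phc.group(1)) if phc and phc.group(1).isdigit() else None
    return {
        "hw_tx": "hardware-transmit" in caps,
        "hw_rx": "hardware-receive" in caps,
        "hw_raw_clock": "hardware-raw-clock" in caps,
        "phc_index": phc_index,  # None means no PHC is attached
    }


# Illustrative sample, not captured from the speaker's hardware.
SAMPLE = """Time stamping parameters for eth0:
Capabilities:
    hardware-transmit     (SOF_TIMESTAMPING_TX_HARDWARE)
    software-transmit     (SOF_TIMESTAMPING_TX_SOFTWARE)
    hardware-receive      (SOF_TIMESTAMPING_RX_HARDWARE)
    software-receive      (SOF_TIMESTAMPING_RX_SOFTWARE)
    hardware-raw-clock    (SOF_TIMESTAMPING_RAW_HARDWARE)
PTP Hardware Clock: 0
Hardware Transmit Timestamp Modes:
    off                   (HWTSTAMP_TX_OFF)
    on                    (HWTSTAMP_TX_ON)
"""

caps = parse_timestamping_caps(SAMPLE)
print(caps)
if not (caps["hw_tx"] and caps["hw_rx"] and caps["phc_index"] is not None):
    print("no usable hardware timestamping on this interface")
```

A positive result here only says the driver claims support. As the talk stresses later, the claim still has to be verified by measurement.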
By now they release quarterly. Until version 4 they had a once-in-a-blue-moon release scheme; I think version 3-something was released about three years ago. I'd usually recommend going to a newer version, because they tend to fix quite a lot of issues and also support newer versions of the synchronization protocols, like the latest PTP versions. So by now I'd just recommend going with the quarterly releases, or just picking master if you will. There are also some other projects, like PTPd or the Excelfore stack. I'd recommend staying away from these, because they're usually not well maintained or only cover small subsets of profiles, and often that's what we call "industry code quality" in our company; it's quite bad.
Well, those are the parts we need to put together, and now we come to the measurement part. How can we measure if two systems are actually synchronized? The most obvious way of doing that is to generate pulse outputs, for example on each second. Usually they are called pulse per second outputs, or PPS for short. You hook them up to an oscilloscope or time domain analyzer or whatever measurement equipment you've got, and compare whether the edges match up. You really want to observe that output over a longer period of time. That's quite important, because you want to make sure that you don't have jitter or wander over time, or systems that drift apart.
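If your measurement equipment can export edge timestamps, the comparison of the two PPS outputs boils down to simple statistics over a long capture. The following Python sketch assumes two lists of rising-edge times in nanoseconds, one per clock, already paired pulse by pulse; the function name and the data are hypothetical.

```python
import statistics


def pps_edge_stats(ref_edges_ns, dut_edges_ns):
    """Summarize how far the device-under-test edges are from the reference."""
    diffs = [dut - ref for ref, dut in zip(ref_edges_ns, dut_edges_ns)]
    return {
        "mean_ns": statistics.fmean(diffs),            # constant offset
        "stdev_ns": statistics.pstdev(diffs),          # jitter
        "peak_to_peak_ns": max(diffs) - min(diffs),    # worst excursion
        # Crude wander estimate from first to last pulse, in ns per pulse.
        "drift_ns_per_pulse": (diffs[-1] - diffs[0]) / (len(diffs) - 1),
    }


# Hypothetical capture: five pulses, device edges 35 to 55 ns after the reference.
ref = [i * 1_000_000_000 for i in range(5)]
dut = [r + o for r, o in zip(ref, [40, 55, 35, 50, 45])]
stats = pps_edge_stats(ref, dut)
print(stats)
```

A steadily growing `drift_ns_per_pulse` over hours is the signature of two clocks that merely happen to run at similar rates rather than being synchronized.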
There are also other measurement methods, like reverse sync, where the follower clock sends a sync message back to the leader. That needs support in the leader clock's stack. It usually tends to work quite well, but you should really verify with one of the pulse per second outputs that what you measure is actually what you think you measure. And that's where the interactive part comes in. I want you to shout as loud as you can, "always check your assumptions", and we'll try that on the count of three. So: one, two, three, always check your assumptions! You're great, you're the best. That's the first thing I learned when I worked with the reverse sync method, because I thought, well, that looks really nice. Well, it didn't, and you really want to check that your measurement system is not fooling you.
Basically the same is true for the ingress and egress measurement methods, where you observe your local clock. Every system has a quartz crystal or something that you derive your clocks from, and you can observe the incoming sync packets and watch if they drift apart and how fast they drift apart. If you have a high-quality local clock, you can compare it to the incoming sync messages and check, for example with a linear regression, how the different systems behave. And the same holds true here, so: one, two, three, always check your assumptions! That's good, that's great.
So Murphy's Law is very strong when you set up PTP, and it becomes even worse because you have so many permutations in the different sub-settings. I will show you some of the mistakes I made in the past; that's not at all exhaustive. You really want to make sure that your measurement setup is set up well and that you can actually measure sync, and I'd also strongly recommend you add plausibility checks. What do I mean by plausibility checks? You're sending times over your network, and you can make use of that and not only check
that every sync pulse you measure is the right one, or that they align on your oscilloscope, but you could also check if the absolute time of day you send over the network matches what you expect to be sent. On the next slide I will show you my measurement setup, which will make it quite obvious what I mean by that. And of course: always check your assumptions.
So we have a GNSS receiver, that's a u-blox GPS receiver, and we feed the pulse per second and the absolute time of day into the PPS capture, and we use that to tune the PTP hardware clock. We send that time over to a device under test, we generate a pulse per second output from that, and we compare the two on an oscilloscope. I said the GPS has a time of day. You really want to make sure that your device under test has its time of day synced to the time of day you received in your reference system, because in my case it didn't always. You may have some strange issues there, where for example your PPS capture may be wrong, or your GPS may not work and you just send over your local clock. Because we measure the PPS output, you really want to make sure that you actually measure a PPS output that is derived from what has been captured by your leader clock, right? So what does it look like?
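The time-of-day plausibility check can be partly automated: a difference of a whole number of seconds that matches a known time scale offset is a strong hint at a time scale mix-up rather than a broken link. A minimal Python sketch follows; the function name and message texts are made up. The constants are facts about the time scales: TAI minus GPS is fixed at 19 s, and TAI minus UTC has been 37 s since 2017-01-01 (it changes whenever a leap second is inserted, so the value needs maintenance).

```python
TAI_MINUS_UTC_S = 37  # valid since 2017-01-01, changes with each leap second
TAI_MINUS_GPS_S = 19  # fixed by definition of GPS time
GPS_MINUS_UTC_S = TAI_MINUS_UTC_S - TAI_MINUS_GPS_S  # 18 s


def explain_time_of_day_offset(offset_s, tolerance_s=0.5):
    """Classify the time-of-day difference between reference and device."""
    known = {
        TAI_MINUS_UTC_S: "TAI vs UTC (PTP time scale read as UTC, or vice versa)",
        TAI_MINUS_GPS_S: "TAI vs GPS",
        GPS_MINUS_UTC_S: "GPS vs UTC (leap seconds not applied)",
    }
    magnitude = abs(offset_s)
    if magnitude <= tolerance_s:
        return "plausible: same time of day"
    for seconds, reason in known.items():
        if abs(magnitude - seconds) <= tolerance_s:
            return f"suspicious: about {seconds} s apart, looks like {reason}"
    return "implausible: clocks disagree, check the reference capture"


print(explain_time_of_day_offset(0.002))
print(explain_time_of_day_offset(-37.0))
print(explain_time_of_day_offset(1.7e9))  # one clock at the epoch, one at today
```

The sub-second alignment still has to come from the PPS comparison; this check only covers the whole-second part that an oscilloscope cannot see.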
We have an example measurement here. I hooked up the oscilloscope as I showed you; it's a cheapish scope, but it'll do for this example. Please check the time scale, I circled it in red. That's what you would expect for a system that isn't tuned too well but is syncing okayish. I ran the test for over three hours, I set the persistence of my scope to infinity, and I observed normally distributed excursions. Of course there's a control loop running that synchronizes your hardware clock, so that's what you'd expect for a not too well tuned, more or less out-of-the-box system. The blue trace is the reference clock and the yellow one is the device under test.
Now we will look into some failure modes I encountered. You really want to make sure to run your measurement over a longer period of time, because you may lose sync within that period, and you want to make sure you capture those failure events and that you're actually synchronized and stable. That's what we did in this case. You also want to run your device under test through several temperature cycles, to make the clocks drift if they weren't synchronized for some reason. That could fool you, because quartz crystals are quite good at keeping a kind of semi-sync, since they have quite small tolerances, so you can easily get fooled.
Now, that's an unsynchronized system. Basically we just have a random clock running through. In this case my reference signal wasn't captured; I had misconfigured the leader clock. The same would hold if you have a link issue, so the systems can't talk to each other, or if you have incompatible settings. There are a lot of reasons why a system could look like that. In this case my reference signal capture failed.
What we see here is a quite large time offset. We have 10 milliseconds per division, so we are off
by about 40 milliseconds here. The issue with an offset that large is that you can't zoom in, and even if your system has deviations and is not synchronized, the change from one measurement to the next is so small compared to the offset that you probably won't observe it. So basically you don't know if that's synchronized. The signals do not overlap, but they could have the same frequency; you just don't know. In this case the device under test is running ahead of our reference time. Actually, they didn't drift apart; it was an issue in the driver, which failed to set the absolute point in time where the rising edge was generated. It could also be an issue with the time scale. You have to keep in mind that GPS uses a different time scale than UTC, and other GNSS systems have completely different time scales; the Russian one uses Moscow wall clock time for whatever reason. Depending on your reference time source, that may introduce quite some issues in your system. You also have some systems where the delay that is measured in hardware is overcompensated; there are some Intel drivers that are famous for doing that. Or your hardware clock may be broken. There are quite a lot of reasons, but in this case we see that we are actually ahead of our reference signal, and that's very, very unlikely for a transferred signal.
In this case we have an asymmetric distribution of our error. I didn't measure for a very long time. It was caused by a bad Energy Efficient Ethernet setting. I'm not entirely sure why, but EEE influenced the timestamping in some weird way. I'm not a hardware engineer, I don't know why exactly it is wrong, but it influenced my measurement. If it's constant, and if you qualify that it's constant, you could compensate for that, but I'm not the one to ask about why exactly it is wrong. Maybe some clocks are power-gated differently, whatever.
In this case you see that we
completely lost sync for a moment. We had the clock drift away for some time and then gain sync again, so if you only did a short measurement it would look quite okay. In this case our leader clock missed the transmit interrupt, which is a problem especially with some Intel cards, and there's actually a Linux PTP option that makes your system a bit more resilient against that. I didn't use it in this case, so we missed the timestamp and the system got out of sync. The clock was speeding up a bit, so the edges were traveling left. After some back-off time in the fault state we regained synchronization and the clock tuned back. You only see this failure mode if you record over a longer period of time.
If you permanently lose your sync, of course you will drift apart over a longer period of time. In this case it didn't drift much, only several hundred nanoseconds, because I had a thermally stable system and the clocks were quite well synced before we lost synchronization. So that's basically what this looks like: you drift apart, and you might also drift back a bit depending on your power supplies and thermal effects.
Now, what does it look like in the log files? Because that's what we usually see when we look at software. At least for layer 2 transport it looks pretty much like this. We have a start where a negative delay is indicated; that's basically because the clocks are out of sync and some weird startup things are going on. We can safely ignore that if it's only a startup transient. Then we just run, and that's more or less how it will look: we have a delay of about 300 nanoseconds, and we see that our frequencies are a bit apart. That's what we'd expect; that's why we do a synchronization, because our clocks are running at different speeds. Note that the output will look different for layer 3 transports. And now let's look at failure modes, because that's what we're here for.
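Watching the log over a long run is easier with a small scanner. The Python sketch below assumes the classic `ptp4l` per-sync line ("master offset ... s2 freq ... path delay ...") and the message `ptp4l` prints when it fails to read a transmit timestamp ("timed out while polling for tx timestamp"). The sample lines, the function name and the 1000 ns threshold are illustrative, and the exact log format depends on the Linux PTP version and configuration, so check it against your own logs first.

```python
import re

OFFSET_RE = re.compile(
    r"master offset\s+(-?\d+)\s+s(\d)\s+freq\s+([+-]?\d+)\s+path delay\s+(-?\d+)")
TX_TIMEOUT = "timed out while polling for tx timestamp"


def scan_ptp4l_log(lines, max_offset_ns=1000):
    """Count locked samples, excursions beyond a limit and missed TX timestamps."""
    report = {"samples": 0, "worst_offset_ns": 0,
              "out_of_bounds": 0, "tx_timestamp_timeouts": 0}
    for line in lines:
        if TX_TIMEOUT in line:
            report["tx_timestamp_timeouts"] += 1
            continue
        match = OFFSET_RE.search(line)
        if not match:
            continue
        offset_ns, servo_state = int(match.group(1)), int(match.group(2))
        if servo_state != 2:
            continue  # s0/s1 are startup states; only judge the locked servo
        report["samples"] += 1
        report["worst_offset_ns"] = max(report["worst_offset_ns"], abs(offset_ns))
        if abs(offset_ns) > max_offset_ns:
            report["out_of_bounds"] += 1
    return report


# Illustrative log excerpt, not taken from the speaker's systems.
LOG = [
    "ptp4l[100.000]: master offset   -1234567 s0 freq      +0 path delay       300",
    "ptp4l[101.000]: master offset         12 s2 freq   -4321 path delay       305",
    "ptp4l[102.000]: master offset       -850 s2 freq   -4400 path delay       310",
    "ptp4l[103.000]: timed out while polling for tx timestamp",
    "ptp4l[110.000]: master offset       2400 s2 freq   -3900 path delay       308",
]
report = scan_ptp4l_log(LOG)
print(report)
```

The offsets in the log are what the servo believes, so a clean report does not replace the PPS measurement. It mainly helps to find the moments worth looking at on the scope.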
If we lose sync, that's usually quite a unique pattern that we can scan for. We see master sync timeouts, so the master clock doesn't send us any synchronization, and I configured my device under test to never become master, so I just lose the synchronization and it backs off. That's basically what that looks like.
Then I did a slightly different setup: I used some bridges and changed one of the links to half duplex while running. If we run a half duplex link, the hardware timestamping won't work anymore, because of course we have collisions, and we can't actually tell if our timestamp belongs to a successfully sent frame or to a frame that had a collision. So the standard forbids running on that, and it will look something like this. But it's very hard to tell apart from a complete loss of sync because we cut a cable or something like that, and that is a pattern you will encounter a lot: it's quite hard to tell the different failure modes apart from logs only. If we start on a half duplex link, it will look a bit different. We just have a fault detected, and it backs off and won't start at all.
In this case we see that we measure a peer delay of about 1700 nanoseconds, and there's a part in the standard for gPTP, that's the TSN version, that forbids links with a delay larger than 800 nanoseconds. That's basically how we detect that our peer isn't capable of doing that special version of PTP, and it just bails out and loses its membership in the synchronization domain, and we have an entire loss of sync. In this case it was caused by an incomplete driver and a hardware bug. Basically the driver was well tested for gigabit Ethernet, but it was not tested at all for 100 megabit. I will probably fix that next week or so, but that's how that looks, and it's quite hard to spot the correct line if you don't know where to look.
I told you about the leader clock that missed its TX timestamp interrupt. That's a problem mainly
caused by the hardware design of some network interfaces and by how stuff is communicated in there. For the receive part it's quite easy, because you receive a packet, you take the timestamp out of the corresponding register, clear the interrupt flag, and hand it over via an ancillary message with your SKB. For TX it's much harder, because you have to communicate back through the error queue, and you have a small window of time in which you can do that before the SKBs are discarded. If for some reason the NIC fails to raise the interrupt at the right point in time, you may miss your interrupt, or it may be delayed because some other interrupts are going on. If Linux PTP fails to capture the timestamp, or can't read it from the error queue, that's the way it is communicated back, it will go into the fault state and back off, and it will not start resynchronization for several seconds. That's the line you want to scan for, and there is an option you can set to make that more resilient.
So, some common pitfalls. I promised you in the title that we do a time warp. If you have different sources for synchronization in your system, like NTP for example, you may encounter different time scales and different reference points in time, where system time jumps. If you synchronize your hardware clock to your system time, that is quite a bad thing and will basically break your PTP. We talked a bit about PTP profiles. You may have missing, incomplete or defective timestamping support in hardware or drivers. Never rely on the data sheets. That's really, really bad, because vendors will always tell you that they support every available mode. They usually don't, and they usually haven't qualified all of them, especially not the less common ones. You should do your measurements yourself. You may encounter differences depending on whether you timestamp in the MAC or in the PHY, and yeah, as I told you,
the hardware usually only supports a subset of one-step and two-step sync, layer 2 and layer 3, point to point and end to end, and you will have to choose your hardware depending on what profiles you need in your application. I talked about time scales: you have different offsets, leap seconds, whatever. You sometimes have false positive debug outputs. You may encounter issues with daemon stability: your daemon may die, and you may not notice if you are not using something like systemd to check on and restart stuff.
For the measurement method you have to choose whether you run one pulse per second or, say, 100 pulses per second. With more pulses you usually get a better measurement, because you take more samples over time, but you can only detect errors up to a hundredth of a second, because then you roll over, and if you have a larger delay you won't notice. So you will probably want to check both. You may have sporadic dropouts; that's the TX timestamp timeout option you want to set, especially if you're working with Intel i210-class cards. You want to check that your leader clock is actually on the device you expect it to be on, the device that actually has your reference time input. And never rely on data sheets, measure yourself.
So, for the best practices: choose the correct profile. Often you need to select one depending on the application. If you do TSN, that's the gPTP one. If you do power, substation automation, crazy stuff, whatever, they have their own profiles, so check if your hardware actually supports that. Never copy-paste commands from the internet. They may use a different profile, or a combination of settings that's incompatible with a newer version by now. Read the fine man pages. Check hardware clock availability, check if it's stable, and read your logs. Especially logs on bridges are quite valuable for debugging, because you can check what exactly goes wrong. If your offsets, if your
peer delays drift apart, that usually gives a good hint at what goes wrong. Again, the man pages, they're quite good. And thoroughly test over a longer period of time. And of course, last but not least: always check your assumptions! You're great. So PTP can work great if done right. It has lots of parameters, and your mileage may vary. That's basically everything from my side. If you want to work more with those fine things and want to be paid for doing that, we're hiring. Do you have any questions?
I can see with the scope that the clocks are synchronized, but how do you tell that they both have the same time of day?
Sorry, could you speak up a bit?
Yeah, how do you tell that they both have the same time of day?
Basically you can just read out the absolute value of the clocks and compare whether they are the same within a certain reasonable margin. If they're less than a second apart, you can say they are probably reasonably synced. You just run timedatectl or whatever on both machines and check if they have reasonably close values. That's usually the way I do it. You will catch it if, say, one is at the epoch and the other is at today.
I'm wondering, for PTP, does it work over a complex network layout with many different switches, routers, these kinds of things?
I don't know, to be honest.
So your test is basically a point to point connection?
I did some testing with switches, though I only tested some profiles.
Okay, so you can say it's actually dependent on the hardware vendor as well?
Yes, because the switches need to take part, especially for boundary clocks, where the switch takes part in the interaction, and they often only support a certain type of clock. For the switches I use, that's usually the TSN version; they probably won't do the power substation profiles. That will vary a lot depending on your vendor and the switch you use.
Okay, thank you.
Did you ever work with EtherCAT from Beckhoff?
Yes.
Would that be a nice option for testing against a reference clock, so you have your clock source running out on the EtherCAT device and giving timestamps for PPS or whatever for testing?
Probably not so much, because the way EtherCAT works is that it basically bypasses Ethernet and just uses the physical layer, and the way EtherCAT timing works is to offload precision from the master: they reclock the packets in the first slave. So from the first slave on, it would probably be okay to reuse the regenerated clock from the slaves, but if you compare the master system to the first follower, for example, you may encounter some offsets, because they actually regenerate the clock from the packet flow. So depending on your hardware setup, it might work.
Thank you for the talk. In recent Wi-Fi standards there is also some time synchronization, mostly intended for multi-room speakers. Do you know something about how PTP or time synchronization could benefit from these?
To be honest, I haven't looked too deep into that, but you have a different situation. First of all you have links with a variable path delay, because propagation is more or less at the speed of light, and you have an entirely different situation because you have a shared medium, so you have to make sure that you don't have interfering synchronization pulses. But I haven't looked too deep into that. If you want to fund such work, I'd be happy to look deeper into that. Insert money here.
Can you tell us some more about the hardware you used for your measurements, and whether you take care of adjusting the PHY delay compensation?
Yeah, sure. Basically, as a reference clock I used a u-blox GPS receiver that generates a PPS output and, via a serial link, a time of day as an NMEA string. That's okay within a certain reasonable precision. I captured that on a basically standard Intel PC box with an i210, and with a separate i210 I generated the master clock reference. I connected that to a follower device that used an i.MX 8MP, which has support for regenerating PPS from its timestamping clock. That's basically the measurement setup. I had different switches in the middle to also check with switches. One of the more hackable ones, which is reasonably priced and has quite good software support, is from Kontron. It's an industrial switch, D10 I think it's called. It's not too expensive and you can access quite a lot of measurements, which is always nice as a debugging tool. Then I hooked up my oscilloscope and checked that. I will actually look into the delay measurement of my PHY and the MAC in the next few days. The PHY should actually report in its extended registers what its delay is, but I still have to verify that.
We're actually at time, thank you.
Thank you.