The topic of this presentation is understanding JESD204B in less than 30 minutes. So it definitely won't be a complete guide to JESD204B, but I will try to cover the basics and the most important subjects. If you're still interested afterwards, there are some links at the end of the slides that you can follow to dive in further. My name is Lars-Peter Clausen and I work for Analog Devices.

So first, an introduction: what is JESD204B? It is a high-speed serial link between data converters and logic devices. It has been designed for up to 32 lanes per link, and each lane can transport up to 12.5 gigabits per second of raw data. The actual useful payload is a little less, because there's some protocol overhead, but overall you can get something like 400 gigabits per second over one link, which is quite a bit.

The JESD204 standard not only describes how to get bits from A to B, it also assigns meaning to the bits. It describes how to map a sample onto those high-speed lanes. This is all defined in the standard itself. By contrast, if you are using converters with parallel LVDS, there is no common standard for how to map the data onto the LVDS lines.

Another thing the JESD204 standard offers is multi-chip synchronization. If you have multiple chips, multiple ADCs or multiple DACs, and want to capture or transmit synchronized data, the standard defines how to do this, so it is no longer chip-specific. The other very important thing is deterministic latency. We will talk about this in more detail in a moment, but it basically allows you to make assumptions about how much time will elapse between when the conversion was done and when the data arrives at your logic processing device.

A quick timeline. JESD204 has gone through three versions. The first one was in 2006; it only had one lane, running at 3.125 gigabits per second. People quickly figured out that this was not enough.
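As a rough check on the numbers quoted so far, the following sketch compares the raw and payload throughput of the single-lane 2006 version with a fully populated JESD204B link. It assumes that the only protocol overhead is the 8B/10B line code (8 payload bits per 10 transmitted bits) and ignores control characters and padding; the function name is made up for this illustration.

```python
def link_throughput_gbps(lanes, lane_rate_gbps, payload_bits=8, encoded_bits=10):
    """Return (raw, payload) throughput of a link in Gbit/s.

    Assumes 8B/10B coding is the only overhead, which is a simplification.
    """
    raw = lanes * lane_rate_gbps
    payload = raw * payload_bits / encoded_bits
    return raw, payload


for name, lanes, rate in [
    ("JESD204 (2006), 1 lane", 1, 3.125),
    ("JESD204B, 32 lanes", 32, 12.5),
]:
    raw, payload = link_throughput_gbps(lanes, rate)
    print(f"{name}: raw {raw} Gbit/s, payload about {payload} Gbit/s")
```

Under these assumptions the 400 gigabits per second figure is the raw rate of a 32-lane link, and the usable payload is somewhat lower.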
So in 2008 there was the first revision, JESD204A, which added support for multiple lanes. But it pretty quickly became apparent that this was still not enough. So there was the second revision, or third version, JESD204B, which raised the limit for each lane to 12.5 gigabits per second. If you follow this timeline, right about now should be time for the next standard. There are actually devices shipping that go beyond what the standard has to offer and run at 15 gigabits per second per lane, and I think there will also be a revision C at some point in the near future.

The other important thing that JESD204B introduced was deterministic latency. There are three different subclasses. Subclass 0 doesn't have deterministic latency, and subclasses 1 and 2 use different methods for achieving it.

It also introduced a more flexible clocking scheme. In the previous iterations you had to supply the same clock at the same frequency to all the devices in the system. With JESD204B it's possible to run some of the clocks at harmonic or sub-harmonic frequencies. This means the clock going to your converter might be twice as fast or four times as fast as the clock going to your FPGA. Let's say you have a converter running at 1 gigasample per second. You won't be able to run the logic in your FPGA at 1 gigahertz, so you may want to scale down and run it at 250 megahertz. This was introduced by the standard.

So, motivation: why do we actually need this? Especially for software-defined radio, why do we care about JESD204B? What we're seeing is increasing data demands. A lot of the new mobile communication standards have rather wide channels. The latest version of Wi-Fi, ac, has channels which can be up to 160 megahertz wide. LTE supports channel bundling, or channel aggregation, of up to five 20 megahertz channels into one logical 100 megahertz channel.
And at the higher bands, like what's in discussion for 5G and also Wi-Fi ad, you've got channels which are more than a gigahertz wide. For Wi-Fi ad, for example, I believe it's something like 2.16 gigahertz per channel. So you have to capture all this data and get it somewhere.

Another trend we're seeing is the adoption of diversity transmitters and receivers. In its simplest form this is a kind of MIMO where you're only using it for diversity gain, which means you have multiple antennas placed in a specific pattern so that you receive the same signal multiple times. If one of the signals gets some kind of distortion, you set things up so that the other one doesn't, so you can still recover the signal that you're sending. But more and more wireless standards are designing support for diversity into the standard itself, like Wi-Fi ac. There are provisions for measuring the channel and actually using the different multi-path propagations of your signal as separate channels. For this, of course, you need diversity transmitters and receivers, and for each transmitter and receiver you add to your system you increase your data rate.

The last trend, which is becoming very important at the moment, is direct RF. This means you no longer have analog mixers down-converting your signal of interest. Instead you capture the whole spectrum up to your signal of interest, say 2 gigahertz, 4 gigahertz, maybe even 10 gigahertz at the same time, and then do digital processing on it to extract your signal of interest. For certain applications this gets you better results than doing it in the analog part of your design.

But why does this mean that we actually need something like JESD204B? Why can't we just continue with what we had before, like parallel buses?
And the issue with parallel buses is that there are two ways to increase your data throughput: either increase the number of pins or increase the clock rate. If you double the number of pins you can send twice as much data. If you double the clock rate you can send twice as much data.

More pins has the issue of routing. If you have 40 pins or even 60 pins, it gets really complicated to route all of this, and you really have to hope that your receiving device has the same pin mapping as your transmitting device, otherwise you need a lot of layers to be able to route it. The other issue is that the more lines you have, the more power you use, so power also becomes an issue at some point.

If you increase the clock rate, you run into the issue that for a parallel bus you need to capture all your data at the same time. You have your clock and then you have your data. The clock has jitter, the data has jitter, and there is skew between the clock and the data. All of these depend on different operating parameters. They will change with the process, with the voltage and with the temperature. Voltage here doesn't mean that at 1.8 and 3.3 volts you will get different propagation delays. It means that your power supply, which has an uncertainty of something like plus or minus 10%, will introduce different skews and delays even though it's set up for the same nominal voltage. The higher you go with your clock frequency, the smaller the capture window gets, and eventually it simply becomes impossible to hit it.

This is where JESD204 comes into play. Here is a quick overview of what such a system looks like. You basically have four components. There's the clock chip. The clock chip is connected to some kind of reference clock, and it will typically contain a PLL and a couple of clock dividers.
And the clock chip is responsible for generating all the clocks that are used inside your system. It's also responsible for creating the so-called SYSREF signal, which is used for synchronization between multiple devices.

Then up here you have your transmitter, on the other side you have your receiver, and in between there's a high-speed serial link. As we discussed already, there can be up to 32 lanes. There's one additional signal, the so-called sync signal, the synchronization signal. To establish a link, the receiver first pulls the sync signal low, and then the transmitter sends a synchronization sequence. Once synchronization has completed, meaning the receiver has locked onto the signal that's being sent, it de-asserts the sync signal and the transmitter starts sending data.

In addition to this, the sync signal can also be used for error reporting once the link has been established. If something goes wrong, if the data is no longer good and there are lots of errors, the receiver can assert the sync signal for one clock cycle to tell the transmitter that something is wrong and that maybe it's time to re-initialize the link.

There are two different classes of transmitters and receivers: the so-called converter devices and the logic devices. The converter devices are your ADCs, DACs and so on, and the logic device does the processing.

As I said, JESD204 not only defines the physical link layer, it also does a lot of other things. The standard defines four different layers. First of all there's the application layer, where all the application-specific processing happens. Since this is application-specific, there's obviously nothing in the standard that says what needs to be done here, but what the standard does define is the interface between the application layer and the underlying layer, which is the transport layer.
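The link bring-up handshake over the sync signal described above can be sketched as a small simulation. This is a deliberately reduced model and not the state machine from the standard: the symbol names, the class and function names and the `lock_after` threshold of four symbols are assumptions made for this illustration, and the initial lane alignment sequence is left out.

```python
def transmitter(sync_asserted):
    """Send the synchronization sequence while sync is asserted, data otherwise."""
    return "SYNC" if sync_asserted else "DATA"


class Receiver:
    def __init__(self, lock_after=4):
        # lock_after: assumed number of consecutive good sync symbols before lock.
        self.lock_after = lock_after
        self.sync_asserted = True  # the receiver starts by requesting synchronization
        self.seen = 0

    def receive(self, symbol):
        if self.sync_asserted:
            # Count consecutive synchronization symbols; anything else restarts the count.
            self.seen = self.seen + 1 if symbol == "SYNC" else 0
            if self.seen >= self.lock_after:
                self.sync_asserted = False  # locked: release sync, data may flow

    def request_resync(self):
        """Error reporting path: assert sync again to ask for re-initialization."""
        self.sync_asserted = True
        self.seen = 0


def simulate(cycles, lock_after=4):
    rx = Receiver(lock_after)
    trace = []
    for _ in range(cycles):
        symbol = transmitter(rx.sync_asserted)
        trace.append(symbol)
        rx.receive(symbol)
    return trace


print(simulate(6))
```

In this model the transmitter keeps sending the synchronization sequence until the receiver releases sync, after which every symbol is data; calling `request_resync()` sends the link back through the same sequence.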
The transport layer is responsible for the so-called sample framing and also for lane mapping. This means it takes the raw sample data, packs it in a way that all JESD204B devices agree on, and then distributes this data onto the different lanes.

The next layer is the link layer, and the link layer is per lane. The transport layer passes so-called octets, which is eight bits of data, to the link layer, and the link layer does some processing on them. It scrambles the data, it does the so-called character replacement, which we'll talk about in a moment, and it also does the 8B/10B encoding.

Then at the physical layer we really have the high-speed serial interface. This typically involves the conversion from parallel data into serial data, and on the receiving side you also do clock recovery. Often you also include signal shaping, because at 12.5 gigabits per second your transmission line really is a transmission line; it no longer behaves like a normal data line.
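To make the transport and link layer steps more concrete, here is a simplified sketch of sample framing, lane mapping and the resulting lane rate. It is not the normative mapping from the standard: it handles a single frame, assumes samples that are a whole number of octets wide, and ignores control bits, tail bits and scrambling. The lane-rate relation (converters times bits per sample times sample rate, times 10/8 for the 8B/10B code, divided by the number of lanes) is the commonly quoted one; all function names are made up for this illustration.

```python
def lane_rate_gbps(converters, bits_per_sample, sample_rate_gsps, lanes):
    """Serial rate per lane in Gbit/s, including the 10/8 overhead of 8B/10B."""
    return converters * bits_per_sample * sample_rate_gsps * 10 / 8 / lanes


def map_frame(samples, lanes, bits_per_sample=16):
    """Split one sample per converter into octets and hand each lane its share.

    Simplified: octets are taken most significant first and assigned to
    lanes in consecutive groups of equal size.
    """
    if bits_per_sample % 8:
        raise ValueError("this sketch only handles whole-octet samples")
    octets = []
    for sample in samples:
        for shift in range(bits_per_sample - 8, -1, -8):
            octets.append((sample >> shift) & 0xFF)
    if len(octets) % lanes:
        raise ValueError("octets do not divide evenly across the lanes")
    per_lane = len(octets) // lanes
    return [octets[i * per_lane:(i + 1) * per_lane] for i in range(lanes)]


# Two 16-bit converters at 1 GSPS spread over four lanes.
print(lane_rate_gbps(converters=2, bits_per_sample=16, sample_rate_gsps=1.0, lanes=4))
print(map_frame([0x1234, 0xABCD], lanes=4))
```

With these example numbers each lane carries one octet per frame and runs at 10 gigabits per second, which stays below the 12.5 gigabits per second limit mentioned earlier.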
So you need to do some signal shaping.

Let's talk about the converter device. There are two different kinds of converter device: the ADC device and the DAC device. First of all, a converter device can contain multiple converters, and in modern converter devices there is typically some kind of processing in addition to the data conversion itself. Here is basically the split where the JESD204 layers start. First we've got the framer, which works on all the data and then distributes it onto the lanes. Typically you also have a PLL in there to generate the clock for the high-speed serial link. On the receiver side, the DAC side, it's basically the same: you have your lane-specific processing, then the data goes into the deframer, and the deframer distributes it to all the converters, to all the DACs. What's important is that all the ADCs and DACs inside one device run synchronously.

A logic device looks basically the same, except that instead of converters you have a huge block of custom processing. The one special thing about the logic device is that one logic device can interface to multiple converter devices, the so-called multipoint link. For example, you need four ADCs but your converter device only has two, so you can take two of those devices and combine them into one logical converter device which has four ADCs.

Then the link. As I already said, the link consists of multiple independent lanes. On the physical level it uses differential current mode logic. It's a bit like LVDS but uses a little more power to be able to handle the high speeds. It has an embedded clock rather than a separate clock, so you no longer have to deal with matching clock and data. As I said, it does data scrambling. Data scrambling is optional but highly recommended, because if you turn off data scrambling, your data might contain certain patterns, and this
will result in certain spurs, which will then show up in your actual data. That's why data scrambling should pretty much always be enabled, and the CDR also more or less expects data scrambling to be enabled.

A link, or a lane, has a couple of parameters. There are lots of them, even more than what's shown on the slide. I don't want to go into the details of all of them, but you can see there are lots of things that can be configured, which tell the sender and the receiver how the data is sent over the link.

Let's talk about deterministic latency quickly. Propagating data over a link takes time. Your IC might have pipeline delay, and propagating the signal from A to B over your transmission line also takes time. Part of this time is fixed, since you know how many pipeline cycles you have, but part of it is variable and depends on manufacturing differences and environmental conditions: again process, voltage and temperature. There are certain systems and algorithms that are very latency-sensitive. One example is a closed-loop control system, where you transmit something, measure it and then adapt your transmission based on the measurement, like DPD. Radar, where you want to measure the run-time difference between two signals, is also very dependent on latency. So ideally you always want to have the same latency.

The way JESD204 does this is that it does not remove the latency from the system but compensates for it. There's the so-called local multiframe clock, which is a slower version of the frame clock, a slow clock generated locally inside each device. All events that deal with synchronization are aligned to this local multiframe clock. How it works is that first the receiver de-asserts the sync pin, aligned to its local multiframe clock, which means it's ready to receive data. Then on the next local multiframe clock the transmitter starts sending data. It will take a little bit of time until the data reaches the receiver, and there's
also a certain amount of variance in how long it takes. The way JESD204 makes sure that your latency is always the same is that it uses a kind of FIFO and delays the data until the release opportunity. So it doesn't matter whether the data arrives here or here; the first sample will be released to the application layer at this release opportunity. For this to work, of course, your variance needs to be less than one local multiframe clock period.

Let's quickly talk about data integrity. 8B/10B allows the detection of a few simple errors, but not that many. What the standard defines is that if an error is detected, the receiver should replace the bad data with the data of the previous frame. What many implementations do instead is assert some kind of error signal or error flag, because from a processing point of view it's better to know that you got an error than to have your data replaced with some other data. There is no additional data protection, no CRC or forward error correction, on the link itself.

If you look at what kind of data JESD204 is actually transporting, it's not exactly high fidelity to begin with. You have your DAC, which does digital-to-analog conversion, which is a noisy process. On the receiver side you have an ADC, which does analog-to-digital conversion, which is also a noisy process. There's always background noise which affects your data. And then, as you all know, there's the RF channel, and the RF channel is the worst of all: you have lots of interference that will flip bits, destroy bits, whatever. So all the JESD204 link needs to offer is to be better than all of this. The bit flips that are introduced at this level need to be below the noise floor of the rest of the system, because the upper layers already know how to deal with certain kinds of errors.

Let's skip this.

So, software support. Since JESD204 is a standard, you might expect there to be a really great software infrastructure built on top of it, but the
current situation is not so great. There's no common infrastructure. Typically the system integrator, the person who puts the converters onto a PCB and maybe writes some software, needs to research the constraints of all the different system components. Different converters and different logic devices have different constraints for the parameters that I showed before. So you have to go into the data sheets, figure out how all of this works and find a configuration that works for all of them. Then you have to look up the magic register values that map to those settings and program them. And typically the application developer, the software-defined radio person, has to work with what's provided by the system integrator, because changing it is a big hassle.

We're trying to change this with the development of libjesd204. The way it works is that it has a built-in database of all the converter devices, logic devices and so on, and of all the constraints that are imposed by those devices. This is not a database of fixed entries saying that a device supports one lane, two lanes and so on; it's rather a database of programmatic rules where you specify relationships. The system integrator then only needs to specify the constraints of the board, in addition to the constraints that are already in the database. For example, a converter might support four lanes, but not all of them are wired up on a particular board. The application developer can then dynamically change the configuration at runtime, for example change the sample rate.

I think this ties in with the talk we've seen before, because people want to get stuff done. They don't want to care about all this low-level stuff; they want to build software-defined radio applications, and hopefully libjesd204 will make this possible. The other thing the library will do is automatically map a configuration to register settings.

And that's it. Questions? Yeah.
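The idea of a database of programmatic rules, combined with board-specific constraints, can be illustrated with a minimal sketch. This is not the libjesd204 API: every name, rule and limit below is hypothetical and only shows how device rules and board rules could be evaluated together to find the lane counts that work for a requested sample rate.

```python
from dataclasses import dataclass


@dataclass
class Config:
    lanes: int
    sample_rate_gsps: float
    converters: int = 2
    bits_per_sample: int = 16

    @property
    def lane_rate_gbps(self):
        # Payload rate per lane plus the 10/8 overhead of 8B/10B coding.
        return (self.converters * self.bits_per_sample
                * self.sample_rate_gsps * 10 / 8 / self.lanes)


# Hypothetical rules; each one is a predicate over a candidate configuration.
converter_rules = [
    lambda c: c.lanes in (1, 2, 4),        # lane counts the converter offers
    lambda c: c.lane_rate_gbps <= 12.5,    # converter serializer limit
]
logic_device_rules = [
    lambda c: c.lane_rate_gbps <= 10.0,    # transceiver limit of the logic device
]
board_rules = [
    lambda c: c.lanes <= 2,                # only two lanes are wired up on this board
]
all_rules = converter_rules + logic_device_rules + board_rules


def valid_configs(sample_rate_gsps, rules, lane_options=(1, 2, 4, 8)):
    """Return every candidate configuration that satisfies all rules."""
    candidates = [Config(lanes, sample_rate_gsps) for lanes in lane_options]
    return [c for c in candidates if all(rule(c) for rule in rules)]


for rate in (0.25, 0.5, 1.0):
    print(rate, [c.lanes for c in valid_configs(rate, all_rules)])
```

In this toy setup an application could move from 0.25 to 0.5 gigasamples per second at runtime because a two-lane configuration still satisfies every rule, while 1 gigasample per second has no valid configuration on this hypothetical board.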
Speaker: So there's the M-Labs open source implementation of the JESD204B core. It's written in Migen, which is a kind of special language, maybe a little bit like MyHDL, but different, and they provide an open source core. And, let's see, this one will also be available soon. It's something we've been working on: the complete transmit and receive cores, implemented in an FPGA-vendor-independent way. It's not quite ready yet, but it will appear on our GitHub repository in the next two weeks. So yeah, there's one question over here.

Audience: Have you tested the M-Labs core with your chips?

Speaker: The what core?

Audience: You just mentioned an M-Labs core, which I wasn't familiar with. Have you tested it with your chips?

Speaker: I know that they're using our chips, yes. The question is whether the M-Labs core has been tested with the ADI converters, and yes, it has been; they're using the ADI converters.

Audience: Is it an open source core?

Speaker: This one is open source, yeah.

Audience: You mentioned that the receiver asserts the sync line and the transmitter sends some synchronization information. Does it also send information about the configuration parameters of the samples, or do the receiving side and the transmitting side both have to be told manually what is being sent?

Speaker: So there's a so-called initial lane alignment sequence, and in this lane alignment sequence you send the full configuration parameters over the link. Typically you also program them on both sides, but if you had hardware which supported this, you could do it this way. And you can use a separate channel for configuration, using SPI or I2C. One question over here. Sorry, could you repeat? Yes, it could be, but the standard explicitly forbids this. Sorry, the question is whether you can use the recovered clock to run the device itself, rather than distributing a separate clock to the device. Yes, you can, but then you don't
get the deterministic latency, because your recovered clock will have a more or less random phase. You recover the high-rate clock and then create a divided-down clock for the parallel section of your processing, and that parallel clock will have an arbitrary phase relative to your transmit parallel clock. No, no, we can talk about this later.

I think there's time for one more question, if you have a short one. I think there was one over there. So the question is whether deterministic latency is mandatory in the standard, and the answer is no. I mentioned this quickly in the beginning: there are three different subclasses. There is subclass 0, and subclass 0 basically says that instead of waiting for this release opportunity, you just release the data when the last lane arrives.

OK, that's it. Thank you very much.
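The release-opportunity mechanism from the talk, and the subclass 0 behaviour from the last answer, can be sketched as follows. The sketch assumes that time is counted in clock cycles from the local multiframe clock edge on which the transmitter started sending, and that release opportunities fall on multiples of the local multiframe clock period; real receivers usually let you shift the release point within the period. The function name is made up for this illustration.

```python
import math


def release_time(lane_arrivals, lmfc_period, subclass=1):
    """Return the cycle at which buffered data is handed to the application layer.

    lane_arrivals: arrival time of the first data on each lane, in cycles.
    lmfc_period:   length of one local multiframe clock period, in cycles.
    """
    latest = max(lane_arrivals)
    if subclass == 0:
        # No deterministic latency: release as soon as the last lane has arrived.
        return latest
    # Subclass 1 or 2: hold the data until the next release opportunity.
    return math.ceil(latest / lmfc_period) * lmfc_period


# Two power-ups of the same link with different lane delays.
for arrivals in ([13, 17], [21, 25]):
    print(arrivals,
          "subclass 0:", release_time(arrivals, 32, subclass=0),
          "subclass 1:", release_time(arrivals, 32, subclass=1))
```

As long as the variation in arrival time stays within one local multiframe clock period, the subclass 1 release time is the same on every power-up, while the subclass 0 release time follows whatever the lane delays happen to be.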