Hello. We'll talk about the Ethernet link, from the MAC to the link partner. We will see what we call an Ethernet link, what you can find in it, and various related topics. But first, a quick introduction about ourselves. My name is Antoine, I work at Bootlin, a French company specialized in embedded Linux topics; we do consulting and training. As part of that, I had the chance to work on many networking topics such as MAC drivers, PHY drivers and switches as well. And today I'm with my colleague.

Hi, I'm Maxime Chevallier, I work on similar topics at Bootlin. Today we're going to introduce you to the first two layers of the network stack. We're going to introduce the technologies and protocols that are used to connect the MAC to a PHY and the PHY to the link partner, and see how this interfaces with Linux. Ethernet has been here for more than 30 years now, so it has evolved and is now a very complex set of specifications. We will take some shortcuts at times to make this a bit more understandable.

Yes, we could speak for a day about it, so we'll try to keep it short. We'll begin with an introduction to the Ethernet link layer, to first get a definition of what we call the Ethernet link layer, what problem it solves and what it looks like. To begin with we need a quick reminder about the OSI model. Do you know the OSI model? Yes, probably lots of you. If we have a look at the OSI model, which describes the networking stack, you can see many independent levels. The first one is the physical layer. The physical layer is responsible for converting digital data into a signal and transmitting this signal over a medium, so that two devices can be connected and talk together.
The physical layer really depends on a given medium, because many different media can be used as a physical medium: electrical ones, radio ones, optical ones. We'll see a few of them. On top of it you have the data link layer, such as Ethernet. We will speak about the data link layer in the case of Ethernet during this talk. The data link layer is responsible for connecting two devices and transmitting data between them. It transmits data using what we call a frame, and this is important: it means the data is structured, with the raw data plus the information needed to understand what is being transmitted.

Then on top of it you have the network layer, such as IP. This one is really important, because with only the data link layer you can only connect two directly connected devices. If you want to connect more than two devices, and be able to send data from a given device to a very remote device, hopping through routers, you need this network layer. So the main idea of the network layer is to route the data through many machines up to your destination. At the network layer we use packets. A packet is the chunk of data you want to send from one device to another.

On top of it you've got another layer, L4, the transport layer. Depending on the one you use (the two main examples are TCP and UDP), you add extra capabilities such as reliability, ordering or flow control. This sits on top of the network layer. Today, while speaking about Ethernet, we will focus on the first two layers, the physical layer and the data link layer, that is, the two lowest layers of this model. We will not speak about any other layer of the OSI model.
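To make the idea of a frame concrete, here is a minimal Python sketch that packs an Ethernet II header (destination address, source address, EtherType) in front of a payload and parses it back. The helper names, the example MAC addresses and the payload are assumptions chosen for illustration; the preamble and the frame check sequence added by the hardware are left out.

```python
import struct

ETH_P_IPV4 = 0x0800  # EtherType value announcing an IPv4 packet as payload
HEADER_LEN = 14      # 6 bytes destination + 6 bytes source + 2 bytes EtherType


def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into its 6 raw bytes."""
    parts = mac.split(":")
    if len(parts) != 6:
        raise ValueError(f"not a MAC address: {mac!r}")
    return bytes(int(part, 16) for part in parts)


def build_frame(dst: str, src: str, ethertype: int, payload: bytes) -> bytes:
    """Prepend the header so the receiver can interpret the raw data."""
    header = struct.pack("!6s6sH", mac_to_bytes(dst), mac_to_bytes(src), ethertype)
    return header + payload


def parse_frame(frame: bytes):
    """Split a frame back into (dst, src, ethertype, payload)."""
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:HEADER_LEN])
    return dst, src, ethertype, frame[HEADER_LEN:]


# Hypothetical example: a broadcast frame carrying a (fake) L3 packet.
frame = build_frame("ff:ff:ff:ff:ff:ff", "02:00:00:00:00:01", ETH_P_IPV4, b"packet")
dst, src, ethertype, payload = parse_frame(frame)
```

The payload here stands in for the L3 packet: the frame is what travels between two directly connected devices, and the packet is what the network layer routes further.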
If we look at what an Ethernet link can look like, the idea is that you want to send data from a CPU to a link partner, which is a remote device. To do this you have a few elements on this link which handle different things. The first one is the MAC, for media access control. It is responsible for handling everything linked to level 2 of the OSI model we just saw, the data link layer, so Ethernet. Then you have the network PHY, which converts this digital signal into an electrical signal. The PHY is responsible for handling L1, the first level of the OSI model. So you have a clear separation between L2 and L1 in the design of the Ethernet link.

One thing about the PHY is that it is usually controlled through a bus called the MDIO bus. On this diagram it's directly connected to the MAC, because this is what you see in many cases, but it can be connected to something else. It does not have to be connected to the MAC itself.

One comment: this is the main diagram of the Ethernet link. It is the simplest one, and the one you need to learn to really understand what's going to happen. But you can have other constructions of the Ethernet link, with a few modifications to handle more advanced or specific cases, and we will see a few of them. We will see a few examples using one real device, the MACCHIATObin board. This is an ARM64 board using an SoC from Marvell. What's really interesting about it is that it has many networking ports: four network ports with three different link designs, so we will be able to see three examples of what an Ethernet link can be. And you have SFP cages, so it's quite interesting.
This is the full diagram, but we will look at each port individually. The first one is represented as eth2 in Linux and can handle links of up to 1 gigabit. As you can see, this is exactly what we saw previously, the simplest design of an Ethernet link. You've got the CPU, which is connected to the MAC. The MAC handles L2, then you've got a PHY which handles L1, and finally a connector, an RJ45 connector. So it's quite common.

Then you have two ports, eth0 and eth1 in Linux, which can handle up to 10 gigabit connections. What's really interesting is that you have two connectors on the same port, so you can use either the RJ45 connector or the SFP connector. You can only use one of them at a time, because you only have one MAC connected to those ports, and depending on the one you use you need to reconfigure the link. You cannot keep the same link configuration for both. And this begins to be interesting: you need to be able to do dynamic reconfiguration of the link, to reconfigure the MAC, the SerDes lanes and the PHY, to switch between the two usages.

And the fourth port, eth3 in Linux, can handle up to 2.5 gigabits and is only connected to an SFP cage. As you can see, there is no PHY in it, so the MAC is directly connected to the SFP cage. This means that, without a PHY, you can have a direct MAC-to-MAC connection. You can also have a PHY plugged in at runtime within the SFP connector. But we will see that later.

Within Linux you have different kinds of drivers depending on which hardware you need to drive. The Ethernet MAC is driven by an Ethernet driver, which can be found inside drivers/net/ethernet, and the Ethernet MAC controller is represented by a net_device. So that is the MAC.
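As a recap of the port designs described for the board, they can be written down as plain data, which makes it easy to see which ports may need run-time reconfiguration. This is only a sketch: the Port class and its field names are hypothetical, the values are the ones quoted in the talk, and the rule "an SFP cage implies dynamic reconfiguration" is a simplification of what the speakers describe.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Port:
    name: str                    # interface name as seen in Linux
    max_speed_mbps: int          # highest link speed quoted in the talk
    onboard_phy: bool            # is there a PHY soldered on the board?
    connectors: Tuple[str, ...]  # physical connectors wired to this MAC

    def needs_dynamic_reconfiguration(self) -> bool:
        # Simplifying assumption: a port with an SFP cage can receive modules
        # with or without a PHY at run time, so its link layout is not fixed.
        return "SFP" in self.connectors


PORTS = [
    Port("eth0", 10_000, onboard_phy=True, connectors=("RJ45", "SFP")),
    Port("eth1", 10_000, onboard_phy=True, connectors=("RJ45", "SFP")),
    Port("eth2", 1_000, onboard_phy=True, connectors=("RJ45",)),
    Port("eth3", 2_500, onboard_phy=False, connectors=("SFP",)),
]

for port in PORTS:
    kind = "dynamic" if port.needs_dynamic_reconfiguration() else "fixed"
    print(f"{port.name}: up to {port.max_speed_mbps} Mbps, {kind} link")
```

Only eth2 keeps the fixed MAC, PHY, connector layout of the basic diagram; the other three ports are the ones that motivate the later part of the talk.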
We still need a driver for the PHY, the second element inside the Ethernet link. To drive the PHY you have a driver within drivers/net/phy, and it is represented by a phy_device. In our MACCHIATObin example, the Ethernet driver is in drivers/net/ethernet/marvell/mvpp2. We have two kinds of PHY on the board, so we have a driver for each one of them, one of which is marvell.c.

In some cases you can have a package with the MAC and the PHY inside the same package. With this kind of configuration you cannot connect whatever PHY you want to this MAC, but you only need a single driver, and this driver is the Ethernet driver.

At runtime you can ask those two drivers to report a few statistics, and you can also control what the driver does. The main tool is ethtool, which is used to select and modify options within the MAC driver, the Ethernet driver. So everything that is reported is what the MAC sees, not the PHY. If you have a package containing both the MAC and the PHY, then the view of that package is reported by ethtool. So it really depends on the hardware: with a specific driver for the PHY and a specific one for the MAC, ethtool gives you information about the MAC; with the same driver for both, it reports what the package sees.

The second tool is mii-tool, and this one is deprecated. It was replaced for the most part by ethtool, but it can still be used to dump the PHY status. It does not work with every PHY, but it can be useful in a few cases, so it's good to know it exists.

So we just saw what's inside our Ethernet link, how it is represented in Linux and how you can construct an Ethernet link. But
then the next thing to do is to understand how the PHY communicates with the link partner and how the MAC communicates with the PHY. For this we have standards, standardized by the IEEE, and there are two kinds. The first is the media independent interface, MII, which handles the MAC-to-PHY connection. The idea is to connect different kinds of MAC to different kinds of PHY. You can find a few standards here, such as SGMII, which you have probably heard of at least once. The second is the media dependent interface, MDI. This one depends on the physical medium you use, and it is the standard the PHY uses to communicate with the link partner. It connects your physical layer to the physical medium, so depending on the medium, either a copper cable or a fiber cable, you use different specifications. One example is 1000BASE-T, which is quite widely used.

It's really simple, there are only a few of them, it's easy to remember. We will have a look at each and every one of them of course, so 10BASE2, 10BASE5... I'm kidding. The important thing is to understand that those standards are directly linked to the kind of medium you use.

Then we have a slide explaining the names of the standards, which is quite useful to know. A name has the form speed, band, medium, encoding and lanes, giving as a result a standard such as the one we just saw, 1000BASE-T, and you need to know the meaning of each of the letters or keywords within this name. The first one is the speed: the bandwidth at which you can send data to or receive data from the link partner. Then the band: you can have baseband, broadband or passband. The most common one is baseband. I'm not an expert in signal processing so I cannot explain everything, but it depends on how the device will interpret the
frequencies sent to it. With baseband they are close to zero; with passband they sit between two frequencies. Then comes the medium, which is important as well: it is the physical medium used by this protocol. If you use twisted-pair copper cable, the classic RJ45, it is T. So in my example, 1000BASE-T is the protocol used if you want a 1 Gbps link over RJ45 cables. You have many more examples, such as BASE-C, which is a copper link, or H, plastic fiber. As you can see, each of them is specific to the medium you use. Then the encoding, which is the encoding used by the PCS. Maxime will speak about the PCS a bit later, so you will see what it is responsible for. And finally the number of lanes per link; for BASE-T, so for RJ45, it is the number of twisted pairs used. Using this you can construct the names of the different specifications.

The other thing about the link is its parameters. You have the speed, which we just saw: the speed at which you can send data through the link. The common ones are 1 Gbps, 10 Gbps, 40 Gbps. The second characteristic is the duplex, and this one is really important. It can be either half duplex or full duplex. Half duplex means only one of the two communicating devices can send data at a time. Full duplex means you can send and receive data at the same time. You need to make sure that the two devices use the same duplex, otherwise it cannot work. And then auto-negotiation: is your link able to perform auto-negotiation? Auto-negotiation is used to exchange information about what each side of the link is capable of. Based on this information, the two sides select common parameters so they can talk together. So different specifications will be able to operate at the same speed and duplex. But you will have a working
link only if you use compatible MII and MDI protocols. So you need to make sure that the MII and the MDI, the protocols which connect the MAC to the PHY and the PHY to the link partner, are compatible. And that's it. We just saw the introduction to what an Ethernet link is, and we will now see a few things about media interfaces.

Thank you. So, to dig a bit deeper, let's see what's inside the PHY. This is a MAC-to-PHY connection, and inside the PHY you typically find three main components. The first one is the PCS, which interfaces with the MAC and is in charge of encoding and decoding the link between the MAC and the PHY, so that the data can be passed to the other parts inside the PHY. Next you have the PMA, which is a kind of glue logic between the PCS, which interfaces with the MAC, and the PMD, which interfaces with the link partner. The PMA is also responsible for collision detection, for example. And next you have the PMD, which modulates the signal to send it on the physical medium. The PCS is the important part here because, as we will see later, what was at the beginning only an internal part of the PHY is starting to migrate to the MAC.

Let's first focus on the MDIO link. The MDIO link is the link used to configure the PHY. It's also sometimes called SMI. It's basically like an I2C bus: you have two lines, clock and data. It allows multiple PHYs to be connected to the same MAC using the same bus. It is an addressable bus, and you can connect up to 32 PHYs to the same bus. Through this bus you can access the PHY registers. Inside each PHY some registers are standardized, which allows for generic drivers. The MDIO bus controller is sometimes part of the MAC, and sometimes it's a separate device within your SoC.

There are mainly two flavors of MDIO. You have Clause 22, the historic one, which has only 5-bit register addresses. So most of the time you have to implement some kind of indirect access to reach the whole register set of your PHY.
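The Clause 22 addressing just described (a 5-bit PHY address, hence up to 32 PHYs per bus, and a 5-bit register address) can be illustrated by building the 32 bits of a write frame. This is a sketch under stated assumptions: the function name is made up for this example, the 32-bit preamble of ones that precedes the frame on the wire is omitted, and the PHY address used below is arbitrary.

```python
ST_CLAUSE_22 = 0b01  # start-of-frame pattern used by Clause 22
OP_WRITE = 0b01      # opcode for a register write
TA_WRITE = 0b10      # turnaround bits driven by the controller on a write


def c22_write_frame(phy_addr: int, reg_addr: int, value: int) -> int:
    """Return the 32 bits sent after the preamble for a Clause 22 write."""
    if not 0 <= phy_addr < 32:
        raise ValueError("PHY address must fit in 5 bits (0..31)")
    if not 0 <= reg_addr < 32:
        raise ValueError("register address must fit in 5 bits (0..31)")
    if not 0 <= value <= 0xFFFF:
        raise ValueError("register value must fit in 16 bits")
    return (
        (ST_CLAUSE_22 << 30)   # bits 31..30: start of frame
        | (OP_WRITE << 28)     # bits 29..28: opcode
        | (phy_addr << 23)     # bits 27..23: which PHY on the bus
        | (reg_addr << 18)     # bits 22..18: which register in that PHY
        | (TA_WRITE << 16)     # bits 17..16: turnaround
        | value                # bits 15..0: data
    )


# Hypothetical example: PHY at address 1, register 0 (basic control),
# writing 0x8000, which is the software reset bit of that register.
frame = c22_write_frame(phy_addr=1, reg_addr=0, value=0x8000)
print(f"{frame:#010x}")  # prints 0x50828000
```

With only 32 register addresses available, anything beyond the standardized set has to go through vendor pages or indirect access, which is the limitation the speaker refers to.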
And you now have Clause 45, which allows 16-bit register addresses. It also provides a way to sort the register addresses between different devices of the PHY. As I said, you have a PCS, a PMA and a PMD; using Clause 45 you can address the PCS inside the PHY, for example. That's the notion of devices inside the PHY.

Here is how it is handled in Linux. As I said, there are some generic register sets, so you can find generic helpers in phy_device.c for Clause 22 and phy-c45.c for Clause 45. Each PHY has a unique identifier based on its model number and its vendor, so that you can select the correct driver to handle this particular PHY. And each PHY is described as a node of the MDIO bus. In the device tree, here is a binding example: all you specify is basically whether it speaks C45 or C22, and its address on the MDIO bus.

Now let's talk about the high-speed link, the MII link, which is used to transmit the packets, the frames, from the MAC to the PHY. Contrary to MDIO, this link is replicated each time you connect a PHY to a MAC. For example, the first kind of link, which is simply called MII, is made up of 16 pins to connect the MAC to the PHY. So each time you connect a new PHY you have to route 16 more pins. There is a variant which uses a reduced number of pins: RMII stands for reduced MII. Then you have the gigabit version, since MII only does 10 and 100 Mbps. GMII is used to transmit a gigabit link. It has 24 pins, so of course there is a reduced version, RGMII, which uses only 12 pins, and sometimes you can find a version called RGMII-ID, which has some timing tweaks. And finally you have the XGMII link for transmitting 10 gigabit per second data. It has 74 pins, so it's really not something that you can use to connect a MAC to a PHY on a PCB. It's mostly used for on-package MAC-to-PHY connections.

So obviously there is a problem: how do I connect a 10 gigabit PHY to a 10 gigabit MAC if I can't use this interface? What has been done is that some parts of the PHY,
namely the PCS, which is in charge of encoding and decoding this link, are used inside the MAC to serialize the connection. It's not something that was originally specified in the IEEE standard, but since we use already-defined bricks from the standard, it's pretty easy to implement in new devices. Basically you serialize the link inside the MAC, send the serialized version, and deserialize it in the PHY, or simply handle it directly in the PHY as is. There is the Reconciliation Sublayer, which is some glue logic to implement that, but what's important here is the notion of SerDes lanes. You transmit your serialized link over, most of the time, differential pairs, which allows much higher clock rates and better signal integrity.

But the encoding also needs to be done in such a way that your signal doesn't stay at the same level for too long. So most of the time you use something like 8b/10b encoding, which is actually defined in the BASE-X specification, or 64b/66b encoding. Basically it means that when you want to transmit 8 bits of real data, you do that in 10 bits on the MII link. Take for example SGMII, which is a serialized GMII: when you want to transmit a 1 gigabit per second connection, you actually clock your SerDes lane at 1.25 gigabits per second to deal with this encoding, which enlarges your data. Sometimes the clock is inside the same lane as the data, so you have clock recovery inside the PHY, and sometimes it is simply transmitted in parallel. And the SerDes lanes sometimes also have a specific driver, which is handled in the generic PHY subsystem.

Some examples of serialized connections. You have SGMII. SGMII is actually a de facto standard that was, I think, designed by Cisco. It uses four differential pairs and a basic PCS. There are some new flavors of this link, some that can transmit 2.5 gigabit per second data, and also QSGMII, which is basically aggregating four links together to
create one 5 gigabit per second link. Then you have XAUI. It's pronounced "zowie" but written XAUI, and it is a standard defined in the IEEE specifications to serialize the XGMII connection. You transmit it over four SerDes lanes, so you have 16 pins to route. You also have a reduced version called RXAUI, with only two SerDes lanes this time. And finally there is a kind of family that I grouped together: XFI, SFI and also 10GBASE-KR. These transmit the link over one single SerDes lane that runs at 10 Gbps.

So how do I represent my MAC-to-PHY connection, and specifically this mode, inside Linux? At first you only had an enumeration that would say: OK, I'm using SGMII, or GMII. That's phy_interface_t. Nowadays you have the phylink framework, which is designed to really deal with the representation of this link, and Antoine will talk about it a bit later. And that's what the binding looks like: basically, you only say that your Ethernet port is connected to this PHY using this mode.

Let's see what is inside a PHY driver. A PHY driver is a really simple thing. As I said, most of the heavy lifting is done by the PHY framework, and the rest is the register sets. All the PHY driver does is manage the auto-negotiation parameters and report the link status: is the link up or down, is there something plugged into the port. Nowadays you are starting to have more complex features implemented inside the PHYs: complex statistics reporting, MACsec offloading and Wake-on-LAN configuration.

I mentioned auto-negotiation. I'm going to talk about the auto-negotiation that happens when you connect two devices using BASE-T, because BASE-T basically uses a Cat5 or Cat6 cable with an RJ45 connector at each end. This connector can be used to transmit from 10 megabits per second up to 10 gigabits per second, and both devices have to agree on which speed to use. In this example I have two devices, one using 10 gigabits per
second, or rather supporting up to 10 gigabits per second, and the other one up to 2.5 gigabits per second. Obviously, since both support 2.5 gigabits, they will agree to use this speed. The main issue is that it's not that simple to build the list of supported speeds, because you have to take into account what the PHY supports, but also what speeds the MAC supports and how the MAC is connected to the PHY. All of this building of link parameter lists is done in software by the phylib framework. Basically, you do a logical AND between what the MAC supports and what the PHY supports. On the MACCHIATObin, for example, on the 1 gigabit per second link, eth2, you can see what your device supports, what it advertises to the other PHY and what your link partner advertises. In that case everything supports 1 gigabit over BASE-T, so they agreed to use a 1 gigabit link. Now Antoine will talk about the new stuff that we need to have.

New stuff. OK, so we just saw the Ethernet link, how it's configured and what kinds of protocol are used within this link, but this interface evolved over time to support new needs. We will see the current evolution of this link, and we will start with a slide on what an SFP module is, because it is really important for understanding why this link evolves. An SFP module is a small form-factor pluggable transceiver. It's basically a module that you plug into an SFP cage, and it is defined by a specification, which means it's widely used in network devices. An SFP interfaces with various media, various kinds of cables, so you can have copper cables or fiber. What's also interesting is that it's hot-pluggable and it can embed a PHY. So within this SFP connector you can have a PHY, which means you can add a new PHY to the Ethernet link at runtime. One more thing: it can be passive, meaning that you won't have a PHY to give you the link status. In this case the SerDes
driver will give you the link status, or the SFP module itself will. So with these SFP modules the Ethernet link is no longer fixed: you no longer have only one MAC connected to a single PHY connected to a connector, but you can have a hot-pluggable PHY within the SFP module, which means you need to be able to reconfigure the link dynamically. The second thing is that, as Maxime said about the PCS, some parts of a PHY can be embedded inside the MAC itself, which allows a link with no PHY in it. If you recall, eth3 does not have a PHY by default, it only has a MAC and an SFP cage, so that you can have a direct MAC-to-MAC connection.

Let's take a look at eth0 and eth1 on the MACCHIATObin. The first diagram is the one we saw previously. This is, say, eth0: the MAC, a PHY, and this PHY can be connected either to an RJ45 cable or to an SFP connector. Depending on what you connect to this link, you have different configurations of the Ethernet link itself. This means that at runtime you need to reconfigure it so that it looks like one of these three examples. The first one is the case where you connect something to the RJ45 connector. This is the simple Ethernet link we already saw many times in these slides: you have the MAC, the PHY, which is acting as a PHY, and then the link partner. But you can also connect an SFP transceiver to the SFP connector. In this case the PHY acts as a pass-through, not as a PHY, and you have two possible options. Either you have a passive SFP transceiver with no PHY in it, and you have what we call a direct MAC-to-MAC connection, or you connect an SFP transceiver which embeds a PHY, and in this case a PHY is dynamically added to the Ethernet link. So depending on what is plugged in, you have to reconfigure the link.

This was an issue in Linux because you had no way to do it. As a solution, a framework called phylink was added by Russell
King, and this phylink infrastructure aims at solving this issue. phylink represents the link itself; it does not represent the MAC or the PHY, but the link itself, so that you can have a hot-pluggable PHY within an SFP transceiver. Thanks to phylink you can reconfigure the PCS within the MAC so that you can handle different kinds of connections. phylink acts as a synchronization layer that makes sure the link is configured in a way that works because, as we just saw in those examples, we need to make sure that the MAC is configured to work with the PHY, that the PHY is configured to work with the link partner, and that every protocol used between the MAC, the PHY and the link partner is compatible. This is what phylink does: thanks to a state machine, it makes sure everything is configured the right way so that it works in a given configuration. There is an example of what phylink does, and Maxime will show it to you.

Yes, to quickly explain what's happening: this is the init sequence. Your MAC is initialized at boot, all the ports are down, the phylink instance is created and you create your network devices on a per-port basis. When you do an ip link set to bring your port up, you start your port inside the MAC, and phylink tries to connect to the PHY that is described in the device tree. What the device tree binding I explained previously describes is basically a default configuration. The PHY is powered on using this default configuration, and an internal state machine is started. The MII interface is configured to its default value. In the device tree we set it to 10GBASE-KR which, if you remember, uses one single SerDes lane to transmit the link over some traces on the PCB. The problem with that is that the PHY itself can only use 10GBASE-KR towards the MAC for its 10 gigabit BASE-T connection, and what happens when we plug a simple 1 gigabit per second device
on the other side is that auto-negotiation happens between your PHY and the other PHY. They both agree to use 1 gigabit over BASE-T, and at this speed the PHY expects the MAC to be configured for SGMII and not 10GBASE-KR. This is something that we couldn't do before we had phylink, since the MAC-to-PHY link was fixed. In that case the PHY notifies that it has changed its speed and its link parameters, phylink notices that and asks the MAC to reconfigure itself through the phylink configuration callback, so that the MAC-to-PHY connection is compatible with the PHY-to-link-partner connection. That's why phylink is useful: in some cases you do have to change the way you connect the MAC to your PHY. On your board the MAC and the PHY are connected through SerDes lanes, and the SerDes driver is reconfigured as well. Obviously the board has to be routed correctly so that it can support all of these modes.

So that's all. As you saw, the PHY-to-link-partner connection is very complex nowadays: you need something very dynamic instead of what was previously a fixed link. The PHY can be hot-plugged with an SFP, which was also hard to handle with the previous representations that we had. I hope that you learned something, and now, if you have any questions... Thank you. One thing: we would like to thank Stéphane, Andrew and Quentin for reviewing the slides, it was appreciated.

There are some questions over there.

Hi, I was just curious. You've got things like direct-attach SFP, where I presume maybe that cable would have a PHY in it, but maybe it wouldn't. Is it correct that that's kind of implementation-defined?

Yes, you can have direct attach.

OK. So we can also potentially now have an SFP module with a PHY in it, but also a PHY behind the cage on the board, is that correct as well? And we now end up in a situation where the module is actually not necessarily compatible with the cage, which you can see from the outside?

You are
referring to the case where the PHY is connected to the SFP cage and you plug in an SFP transceiver with a PHY in it?

Correct, yeah.

In that case the PHY on the board simply acts as a pass-through for the SerDes lanes, so your MAC is effectively connected directly to the SFP cage.

OK. Sorry, a couple of questions. If we flip it around, then, and there's no PHY on the board, it's just a MAC to the cage: do we now need an SFP module with a MAC in it, or some kind of MAC-to-MAC link?

Not necessarily, you simply transmit the SerDes lanes as is to the other MAC.

Great, thank you very much.

You had an SFP and an RJ45 link, and you had two Ethernet interfaces. Why do I need two Ethernet interfaces? I can only plug in one link, so why do I need two IP interfaces?

Well, you don't necessarily need two. You're referring to the case where you have two connectors on the same link. You have two different kinds of connectors, one is RJ45, the other one is SFP, but of course you can only use one at a time. If you plug two cables into those two connectors at the same time, you will probably get undefined behavior, and what actually happens in our case is that the last one is taken into account.

But why do I need two IP interfaces?

No, you don't need to, you only need one, depending on how you want to design the board.

Yeah, I get the question. The answer is that there are eth0 and eth1, but it's two times...

Ah, OK, OK.

...two times the RJ45 and the SFP cage. You really have two different interfaces on the board. You have two MACs and they each have one RJ45 connector. You can have a look here: you have this connection twice, SFP, RJ45, SFP, RJ45.

OK.

So when we are showing this, it's only one of the two.

OK, but you don't have eth0, which would be this one?

This is eth0.

OK.

Can you say a few words about jumbo frames and how they relate to the MAC?

It's not really the topic of this talk. I only worked once with jumbo frames, so I don't really know
what to tell you about it.

Alex?

Hello, thank you for the presentation. I would like to know if there is an interface for inspecting the traffic on the media independent interface, or whether there is only a control path. As far as I understand, in Linux the PHY exposes a control path through the registers, and that's it, right? But for the MII you don't really have anything to monitor all this.

I don't think there is something to monitor it.

For example, link failures, collisions or something like that?

Yeah, you will probably have access to debugging registers which can show you this kind of information, but within Linux I don't know of any standard tool to do it.

I see, OK, thank you.

No more questions? So it's lunchtime then.