Hello everyone, my name is Robert Varga, I'm a BNPM Fellow at BNPM Technologies. I'm here to talk briefly about the ten years that have been OpenDaylight, and essentially where we've come with the Linux Foundation, open source, and networking in the past ten years. It's been an interesting ride, to say the least.

Okay, so the agenda has some introductions and some preludes, because all of this software-defined networking started a long time ago, and it took so long to actually snowball and gather momentum that it's important to understand that OpenDaylight didn't start things, it just gathered some critical mass. So we'll go into some of the details of where the individual industries were, say, 15 years ago. Then I'll jump into the early days, 2013 to 2015, where we essentially organized all this and had all the attention of the media. Then we transitioned to being run-of-the-mill, where we had mostly figured things out. And then obviously came the downturn, where a lot of SDN companies went out of business; there were a ton of things that occurred, and they had a direct impact on OpenDaylight and other projects as well. And then we'll get to where we are now, how we stabilized the community, and how we plan to grow again.

So OpenDaylight was, and this is a direct quote from the initial announcement, created as an open source community and a meritocracy, which is to say that contributions matter. It's not about platinum membership levels; it's really about open source values, and contributors govern. It was launched, I think, as the first collaborative project under the Linux Foundation umbrella. Well, not the first, it was the second, or the second major one; RP just stepped out, he could have corrected me there. And the scope was essentially everything: everything that is SDN or has SDN implications was in scope for OpenDaylight. Whoever wants to contribute, we need to work with them.
We need to figure out how to work with them, bring them in, embrace them in an open source community, and build it up. Which is quite a thing to say when nobody really knew what SDN was. In certain ways that's still true to this day, because SDN is many things to many people. As for priorities, our core concept was that focus follows interest, and interest is measured in terms of contributors, so competing projects are okay. It is okay to abandon projects if they don't have contributors, and there's no one to tell anybody "you have to work on this". There's no armchair architecture or anything. There is a TSC which takes care of the run-of-the-mill operations of the entire project, and it was initially seeded with platform members, because essentially we needed to start up the community from scratch, from different competing code bases, from different competing use cases and all that. That was the only way we could make it work. But once we had bootstrapped it, if you said "I want to build an OpenFlow plugin", you could go wild, and as soon as the project got created and approved, you were sovereign. Nobody could tell you how to build it, when, what your deliverables are, or what your timeline is. In this regard, it was very much a disaggregated project. Contributors had to commit: yes, you're going to do this. And it wasn't like somebody could tell them "you have to do that in half a year, and you have to deliver this documentation on this project, deliver this container" or whatever. So that was really the idea of open source and how communities should work: built from the grassroots. There is some oversight, but it's completely open and the accountability is there.

So, software-defined networking. It's really that point when OpenFlow kind of took over the term and said, well, we have to disaggregate the control plane, because that's really what the cost of switches is: the control plane.
So let's split off the forwarding hardware, with a defined protocol where a centralized point can decide who's doing what and completely control the data plane. Well, yes, that works for a campus or a data center, if you buy OpenFlow-based switches. But we have other networks, right? We have service providers who have MPLS cores, who have access, edge and services layers, and all these things run in multiple data centers. There's high availability, there are compliance issues, there's 50-millisecond path restoration, which is a must, because if you can't restore that quickly, your voice services are not working. So it rapidly became obvious that nobody really had the answer to what SDN is and how you do SDN. But there actually were many, many different partial solutions, pieces of the puzzle, and individual use cases where you took the SDN approach, you programmed your network to do something, and it fulfilled your need. One example I have here is from 2004, actually. I used to work at a telco, in business support systems, something like that, and we had three engineers who were responsible for configuring our internal network. One of them figured out that there was this cool thing, KDE Konsole with SSH integration, and he spent two months just building up scripts, and in 2004 his configuration of switches was, yeah, SDN: I will just run this script, it's going to log on there, it's going to do a transformation of the configuration, and it's done in one minute. It was 30,000 lines of configuration, so it took some time, but still, it worked. And that's one way of doing things. So then the other part is obviously: how do you do this?
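That script-driven, 2004-style configuration workflow can be sketched roughly as follows. Everything here is hypothetical: the transformation rule (renumbering a VLAN), the device name, and the config format are made up for illustration, and the actual SSH push step is only stubbed out.

```python
# Rough sketch of the "SDN via script" workflow described in the anecdote:
# take a whole switch configuration, apply one mechanical transformation,
# and (in the real workflow) replay it onto the device over SSH.

def transform_config(config_text: str, old_vlan: int, new_vlan: int) -> str:
    """Apply one mechanical transformation across a whole switch config."""
    lines = []
    for line in config_text.splitlines():
        # Only rewrite exact "vlan <old>" statements, not substrings.
        if line.strip() == f"vlan {old_vlan}":
            line = line.replace(f"vlan {old_vlan}", f"vlan {new_vlan}")
        lines.append(line)
    return "\n".join(lines)

def push_config(device: str, config_text: str) -> None:
    """Placeholder for the SSH push step; in the anecdote this logged into
    each switch and replayed the transformed configuration line by line."""
    print(f"would push {len(config_text.splitlines())} lines to {device}")

old = "interface ethernet0\n vlan 100\n!\ninterface ethernet1\n vlan 100\n!"
new = transform_config(old, 100, 200)
push_config("switch-01.example.net", new)
```

The point is not the tooling but the shape of it: a deterministic transformation plus an automated push already gives you a primitive form of network programmability.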
Is it an NMS, where engineering an end-to-end path and committing all the resources takes three months and an SP-level project, and then you forget about why it was ever turned up? Or is it really a real-time optimization problem, where you are essentially delivering services and something is computing things and getting you to solutions? It was very, very fluid, and we didn't really know which design point we were aiming for.

So, preludes. This had a prelude a very, very long time ago. Back in 2001, if I remember correctly, there was a call between service providers and the IETF, and the service providers came in and said: well, you define everything that is the Internet, and for management we have SNMP. You cannot use SNMP to configure things. You cannot use it to really poll things, because it doesn't scale. It's binary, nobody understands it. It's stateless. Please fix this. It took six years for the IETF to come up with the first proposal, which was NETCONF, which provided a migration path away from SNMP. It was XML-based, human-readable, ran over SSH or TLS or whatever, and essentially fixed all the perceived flaws. That was one track. On the other track, at Stanford, OpenFlow was defined to solve a completely different problem, one that a campus can actually have and that they perceived OpenFlow as solving. And then in 2010, the IETF actually defined a competing solution to the OpenFlow problem, again based on XML, and that's ForCES, which stands for Forwarding and Control Element Separation. It was a long time ago. And then, obviously, YANG was defined as the data definition language, because, as we just discussed, when you have just XML for your configuration data, it doesn't tell you anything. YANG solves the data definition problem: you have a clear definition of what the configuration means and what its intent is.
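To make the "XML-based and human-readable" point concrete, here is a minimal sketch of what a NETCONF `<get-config>` request looks like on the wire, built with nothing but the Python standard library. The namespace is the real NETCONF base namespace from RFC 6241; in practice this XML would travel over an SSH session (typically port 830) rather than being printed.

```python
# Build a NETCONF <get-config> RPC for a given datastore and show it as text.
import xml.etree.ElementTree as ET

NC = "urn:ietf:params:xml:ns:netconf:base:1.0"  # RFC 6241 base namespace

def build_get_config(message_id: str, datastore: str = "running") -> str:
    """Return the XML text of an <rpc><get-config> request."""
    rpc = ET.Element(f"{{{NC}}}rpc", {"message-id": message_id})
    get_config = ET.SubElement(rpc, f"{{{NC}}}get-config")
    source = ET.SubElement(get_config, f"{{{NC}}}source")
    ET.SubElement(source, f"{{{NC}}}{datastore}")  # e.g. <running/>
    ET.register_namespace("nc", NC)
    return ET.tostring(rpc, encoding="unicode")

print(build_get_config("101"))
```

Compare that with an SNMP PDU, which is BER-encoded binary: the NETCONF message can be read, diffed, and debugged by a human with no special tooling, which is exactly the complaint the operators raised.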
And then all that came together, and in 2012 two major things really happened. One was OpenFlow 1.3, which finally meant that OpenFlow could be used in production, because with 1.0, if you tried to do anything with it, you'd quickly find out that you couldn't really do anything. And then VMware acquired Nicira for over a billion dollars. That meant everybody took notice, and everybody wanted to be the next startup that gets acquired, and SDN had to be the thing you get acquired on for another cool billion dollars. Then, at the end of 2012, IBM started pushing the idea that an OpenFlow controller, and really any SDN controller, should be something that is off the shelf. Just like for an IDE you have Eclipse where you can design your code, you should have a standard platform for SDN controllers. And that's where OpenDaylight essentially got started.

So I got engaged with OpenDaylight in, I think, something like the first week of January 2013, when it was clear that this was happening. There were actually two competing code bases which would be used as the seed code: one coming from Cisco, one coming from Big Switch. And there were all these discussions trying to figure out: how do we start with this? How are we going to do the code drop? How will the infrastructure work, who is going to take care of it, and all that. Then on March 22nd, the initial drop of the Cisco code landed in a Linux Foundation private repository, and two weeks later there was the first announcement of OpenDaylight as a community with the target of doing everything SDN. From there it was quite an interesting ride, because everything happened essentially on the go: we needed to figure out our governance, get to know all our peers, and figure out what it was we were going to deliver in the first release.
How are we going to structure documentation, testing, all the setup that you don't really encounter unless you're on a project from the get-go, because usually you come in as a later contributor and these things have mostly been figured out and work well enough. Based on that, there was a first mini-summit in New Orleans in September 2013, where the basics of the first release were defined. The idea was that we really had three targets. The first was a Base Edition, which is just OpenFlow, and it's really not even a learning switch; it's just something to get you started, the baseline controller. Plus, since the initial contribution actually included a NETCONF server, a complete configuration management layer, and something else I always forget... oh, DMD, obviously. Those were packaged inside the Base Edition. Then we had the Service Provider Edition, which was all this plus BGP and PCEP and the things you really need when you want to integrate with an SP network: BGP-LS to ingest the state of the core and all the forwarding information, plus PCEP to actually control your layer 3 MPLS tunnels and get those computations going. And then there was the Virtualization Edition, which was really about integrating with network slicing and OpenStack, primarily. There were, I think, three competing applications which did network slicing and scheduled flows on top of an OpenFlow network.

Based on that, on the second try actually, we got the first release, Hydrogen, out in February 2014, and there was a huge amount of flux. The first release took a 12-hour IRC session to get everything out, because 15 minutes before the actual release process started, Gerrit mis-merged one of our late-coming bug fixes. That created a regression which we didn't find at first; we found it only after we tried to integrate the release, so it took something like two hours to figure out and fix.
And then obviously everybody was on IRC, for, I think, 15 projects, because everybody is responsible for their own release, each doing their release steps, which were highly manual and very error-prone. It was a nightmare. But we got it done, and we said, okay, so many things need to change. Things need to be automated. We have to change the packaging, figure out how to have distributed documentation, because we actually had none, and all those things. So it was really a year of a lot of changes, which at the end of the day defined our scope for the next three years or so, until we had executed each and every one of those changes, while we also built up our use cases; I'll show you more about that in the next slides.

Another thing was that Red Hat came in and started taking over the network virtualization use cases and the entire stack, which was very important, because they gave it structure, defined the CSIT, drove essentially the entire use case stack, and solved a lot of the issues that really were integration issues the low-level plugins didn't care about.

Then there was an IETF shootout, where the IETF suddenly realized that yes, there is SDN, yes, it is happening, and we have a couple of protocols which are contenders, so what do we do about them? There was a huge shootout between essentially ForCES, which was the control plane separation thing, and NETCONF, which was really driving management only. But it was shown by major contributors that NETCONF could also be used as a control plane protocol; it's just less efficient, and we can always fix that, but the core basics are there. The entire shootout ended with NETCONF/YANG winning, and the big reason behind that was that it was human-readable and there was an open source implementation you could download any day, and it was called OpenDaylight.
ForCES, by contrast, had only one implementation, which was proprietary; you couldn't get even a shareware or trial version of it anywhere. And that's where essentially all the standardization in the IETF converged and said: well, okay, if we are going to define management plane and control plane operations, we will use YANG to model them and use NETCONF to drive them into the network. That push continues to this day; there's a huge uptake of new models.

Sometime after that, in October, we had our second release, which was essentially the first non-alpha release, where we kind of said, okay, everything works as far as we can tell and we are reasonably feature-complete. And then the deployment feedback came in and said: well, it can't really work as soon as you try to scale it up or do anything real with it. So we came back and said, okay, we need to adjust; we need to spend a full release cycle where we do not care about features, but we care about testing, documentation, and making sure those features are as solid as can be. By this time we had kind of figured out, and this overlaps into 2015, that we really wanted to be doing something like two releases each year, with the first release somewhere in spring and the second somewhere in autumn. As things go, if you're stabilizing things, it takes longer than you think. So in June, finally... we originally shot for March, I think, or April; it took two months longer to actually integrate all the changes and sign off on all the new tests we had created, so that we could ship it and not feel bad that we knew about bugs and shipped anyway.
We also figured out our governance and how all the representation stuff would work, and prepared the transition from having appointed members of the TSC, essentially having companies dictate, to elections: who would be eligible to run and vote, how those votes would work, for what term, and so on. We said, okay, this is what we want it to look like; there's a framework, and there are parameters we plug into it, like the number of TSC seats, how many are representatives of projects, how many are committers-at-large, and similar things.

The other part was that our day-to-day operations finally started to work. We had all the verification; it was no longer the case that we have 15 projects building and anybody can break anybody just by merging something in the wrong order. Most of that verification we got down. It was very costly, it took us two years to build those pipelines, but we were finally there. And we started to get real deployment feedback, and some use cases actually started to work, so it was a very, very happy time to see all this come together. We also got used to Developer Design Forums, which were huge events with many breakout sessions and hackathons. Those design sessions always ran long, because it took like two, three, four hours just to get together and figure out what the next steps were, but it was critical in getting the use cases and those distributed communities to actually deliver the value.

By 2016 we were essentially run-of-the-mill: we knew our release cadence, we knew our priorities, we had most of the projects in, and, I think it was in October 2016, we held our elections, out of which we got a TSC which did not have any appointed members, so it was fully community-driven. But there were things that were not quite okay, because individual member companies started to get acquired, and suddenly
some people stopped communicating and participating, and we didn't know what was going on. While all this build-up was still happening and there were new use cases coming on board, there were some use cases which suddenly had contributors who were not communicating, and there were cutbacks even on things that had been committed to previously and suddenly didn't happen. So we had this weird sense that, okay, not everything is going as planned, not everything is rosy.

And I forgot to mention that in 2015 we switched to Karaf as our OSGi container, because before then we had something completely homegrown which happened to work, but nobody could tell anybody why it was built the way it was; it just worked, and you shouldn't really touch it or upgrade components or anything. So when Karaf 4 came out and we needed to migrate, we suddenly realized we had a huge amount of technical debt that needed to be taken care of.

If you take a look at what it was: this was Boron, I think, released in September 2016, and there are a lot of plugins. The dark items are the new things that appeared in that release. In the middle, that's essentially the core application components that were part of the platform. Then there was, and actually still is, the service abstraction layer, which essentially bound all those individual plugins together. And all those small rectangles at the bottom are protocol plugins; we had something like 15 of those. So overall, a big picture, a lot of things coming in.

Then in 2017 a ton of things changed. First of all, ONAP got announced in February, and suddenly there was a cool new kid in town doing orchestration. And if anybody can tell me what the difference is, if you have a controller and an orchestrator, who's doing what, where's the clear delineation? Nobody could tell me then, and nobody can tell me now, because they overlap; in different deployments they will have
different functions. But generally the orchestrator sits on top of a controller, because mostly the controller should be a single-site thing which reacts to immediate needs, whereas the orchestrator is concerned with reconfiguration, planning ahead over multiple days, across data centers, and all that. Well, that separation didn't exist in 2013; that's something that came about with the founding of ONAP. And obviously we saw fallout from there, because a lot of member companies which were contributing to OpenDaylight suddenly switched: the ONAP architecture has different modeling, it's more aligned with what we are doing. And suddenly a large chunk of our contributors moved, not somewhere else within LFN, but to a different project. So we kind of felt that pain, but it was still all good, because we were aligned and it was still the same house.

We also got a new executive director: Neela Jacques stepped down and Phil Robb started to fill that role as an interim for essentially a year, because by now LF networking projects had started to crop up left and right, things like open governance became a thing, and the member companies started to push back. I remember one conversation which went like: well, why should I be paying $1 million each to sit on the TSC of four projects? If I'm also contributing, I would much rather spend three of those $4 million to pay for developers and get stuff done, instead of just paying for a seat. That actually led to a recharter at the end of 2017, where all those individual projects got merged into LFN and all that lawyer magic happened. Also, Cisco pulled out: it announced that it would no longer ship its commercial OpenDaylight product, Cisco Daylight or whatever they called it, CD-something, I don't remember. And we actually had to cull some projects from our simultaneous release, because the contributors were not there; the code bases needed to be updated, and given the distributed nature, we couldn't do that for them. And then the 2018 crunch continued, because at the beginning
of 2018, Red Hat came in and said: well, we will be going away. Luckily not like Cisco, which pulled out quite quickly, scaling down and exiting in something like three months. Red Hat said: we will be disengaging after we do the TOIs and training and whatnot, but we are exiting. So we as a community picked up some of the slack, but most notably the network virtualization bits were not handed over; they lost their structured leadership and the tribal knowledge of how things work, how CSIT works, for example, what migrations were ongoing and what needed to be done. So that kind of threw a wrench in there.

After that we actually had the first mass archival of projects, when we archived something like 15 projects, because we said: well, nobody is actually contributing to them, they are not well documented, and there don't seem to be any users, so we can't maintain them. Either we push back our releases and try to make it work outside of our usual framework, or we just drop them. So we chose to drop them. It turns out it took something like four years for some users to appear and ask about those projects, at which point all we could say was: yeah, we can bring it back if you are willing to pay for that or contribute the resources, but it's going to be a lot of work, because a lot of things have changed. But we kind of said: it is what it is; if you don't have users and contributors, you cannot keep delivering something and spending resources on it.

In that year Inocybe also went out of business quite quickly, which was interesting because they were sponsoring quite a bit of our infrastructure work, especially around the datastore and documentation, around AAA and whatnot. So that hurt quite a bit. But in both of those cases Lumina, which was essentially the leftover from Brocade, picked up most of the slack, and we charged on. We suddenly recognized that we had a lot of technical debt which had not been addressed, and
things seemed to work, but we could not really evolve the system unless we made drastic cuts. Our releases went just fine, so at the end of the day the end deliverables were delivered on time and at quality. The problem was that on the mailing lists some things stopped working: there were users who asked questions and suddenly nobody answered, or answered three months later. That became quite problematic, and the trend continued.

So now we were in our third year of this, and in 2019 we actually started slipping our internal deadlines for the simultaneous release. I think the testing deadline for Neon was meant to complete in December and actually completed in February, so there was trouble brewing. Fortunately, at that time we finished flushing out the technical debt from our kernel projects, and we kind of stabilized the overall picture of what the features were going to be. Unfortunately, new features now were mostly improvements to existing use cases here and there, not something you can really market, because it's not a cool new feature, it's just an improvement: things start to work better and more reliably. This also meant that the pressure on our end-user applications, like NetVirt and Nemo and all those, started to mount, because essentially they were the only ones that still needed to do their migrations and fix their technical debt, and they didn't have the developers. We didn't realize it then, but we kind of knew the set of problems that needed to be solved, and we could hedge it: not complete all those migrations, wait a year for those things to get fixed, and then move on. With that we shipped Sodium in September and said: okay, we've got something like a year.

Contributor-wise, going into 2020, we knew the basic feature set: it was essentially NetVirt, plus some of the SP use cases, plus TransportPCE, which is really about SP, and NETCONF/YANG as the core feature. And we realized that
in 2020 most of the features we delivered were centered around NETCONF/YANG. There were something like two or three improvements to BGP/PCEP, and zero major improvements to NetVirt, for example, or OpenFlow, because it was what it was: it worked, and there were few contributors, so it was mostly little tweaks, maybe a bug fix or two, just to keep the ball rolling. And it seemed that this was going to work quite well, up until August 25th or thereabouts, when suddenly Lumina went out of business, essentially over a weekend or something like that, which removed something like four major contributors from our community and again created pressure on who was going to maintain things. This was testing, this was the clustered datastore, and the NETCONF PTL at the time. So it was problematic. We still were not quite on time with internal milestones; again, the testing milestone for Aluminium was three months late, because we delivered it in August instead of sometime in May or thereabouts. But again, the releases were on time, because even though it was late, it was sufficiently good to pass all the gate criteria and we could release.

Then, just as we started to plan our 2021 releases, we decided that using chemical elements to name your releases, relying on people to know the periodic table and what the order is, was not going to be nice. So the next release would be 2021.03, then 2021.09; we would just use the dates. Just as we started opening that release cycle, Ericsson came back and said: well, we are pulling all the resources; we will still use this in our product, but we will not be contributing to open source anymore. That essentially killed a project right there and then, because they were the last contributors to it, and it started unraveling the entire stack of four projects that contributed to that use case. But fortunately that was the end of it, and we arrived in a safe place, where we had been culled
sufficiently, down to a very small and manageable set of use cases, with a sufficient number of contributors to deliver them, maintain them going forward, and actually evolve them. So essentially what we boiled down to is: NETCONF/YANG, so the RESTCONF-to-NETCONF translation plus all the tooling that goes into dealing with YANG-modeled data; TransportPCE, which is maintained by service providers, most notably Orange, but there are others, I always forget who they are, sorry; then BGP and PCEP, obviously, but that's mostly in maintenance, because, well, it works; there have been a few issues over the past two years, and we fix them if we can, if not we try to fix them later. Then we have OpenFlow and OVSDB; there is actually one member company who is interested and still has a deployment, so they kind of maintain it, but again nothing major going on, just maintenance. The same goes for LISP Flow Mapping, which is a complete solution from plugin to application, and JSON-RPC, which is a JSON-RPC integration; those are again just in maintenance, they just work, they have all the use cases well defined, and there is no technical debt there that we know of.

With that, we kind of say: okay, these are the use cases we provide. We are not a product. We never were, although an OpenDaylight distribution was something a number of companies tried to sell and, as we have seen, faltered around it. We are a set of services, libraries, maybe use cases or components of use cases, and we are open source, and we maintain that. If it happens to solve a problem for you, that's great, please use the components. If it doesn't, please look somewhere else; hopefully somebody else is going to fulfill that need. If you feel like you can contribute and bring back some of those use cases, that's very welcome, but we are not going to do it ourselves, because we don't have the resources to commit to it. And there are things like Nephio; there are other ways of achieving SDN, in languages which are not Java,
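The RESTCONF-to-NETCONF translation mentioned above can be illustrated with a much-simplified sketch: turning a RESTCONF data path into a NETCONF subtree filter. This is not how OpenDaylight actually implements it; a real implementation resolves module names to XML namespaces from the loaded YANG models and knows each list's key leaves, whereas here the namespace map and the `name` key are hard-coded assumptions.

```python
# Simplified RESTCONF path -> NETCONF subtree-filter translation.
import xml.etree.ElementTree as ET

# Assumption: a tiny hard-coded module-to-namespace map; a real controller
# derives this from its YANG model repository.
MODULE_NS = {"ietf-interfaces": "urn:ietf:params:xml:ns:yang:ietf-interfaces"}

def restconf_path_to_filter(path: str) -> str:
    """e.g. 'ietf-interfaces:interfaces/interface=eth0' becomes a subtree
    filter selecting that one interface by its 'name' key leaf."""
    root = parent = None
    ns = ""
    for segment in path.split("/"):
        if ":" in segment:  # segment carries a module prefix
            module, segment = segment.split(":", 1)
            ns = MODULE_NS[module]
        name, _, key = segment.partition("=")
        elem = ET.Element(f"{{{ns}}}{name}")
        if parent is None:
            root = elem
        else:
            parent.append(elem)
        if key:  # assumption: the list key leaf is always called "name"
            ET.SubElement(elem, f"{{{ns}}}name").text = key
        parent = elem
    return ET.tostring(root, encoding="unicode")

print(restconf_path_to_filter("ietf-interfaces:interfaces/interface=eth0"))
```

The hard part in practice is everything this sketch skips: schema-aware key handling, multiple namespaces in one path, and mapping HTTP verbs onto `<get-config>` versus `<edit-config>` operations.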
architected differently, which may be point solutions. We are not trying to take over the world and be the be-all and end-all for everything SDN anymore. We have a set of technologies; if they fit your use case, fine; if they don't, somebody else is going to be there. That was a very striking realization, and it took many years to come to. With that, we were able to stabilize our committer and contributor base, and we were able to get our releases under control again. Well, not quite, because we also started flushing out more of the technical debt; at this point we had delivered something like a 10x optimization to the YANG compiler and all of that, and it was incompatible and took two years to integrate. But we don't have an outflow of contributors, we have a known set of use cases, we have the people to actually support them, and we have the ecosystem to make sure those contributors do not suddenly go away, because there's some amount of business actually backing those contributions. And we came to understand what we need to do to get more contributors, or hopefully create some incentives that will bring them back, which revolves around more Kubernetes integration and essentially re-architecting the way we package and deliver our components and use cases to end users.

Last year we actually saw some uptick in contributors. We had some newcomers, and we shuffled the contributions around so that the big core things were covered and the small things were taken care of by novices and reviewed, and a pipeline finally started to work. We almost made our releases on time. There were big improvements, as I said, in scalability, which we had been pushing out, some of them actually since 2015, and we realized there's still more technical debt to get through. But if we can make that work, we can improve our deployment scalability 10x, integrate with Kubernetes, and have autoscaling and all that. That wasn't common in 2013, when you deployed on
a big iron server and ran your deployment there for 5 years; in Kubernetes it works kind of differently, and we need to change our plugins slightly to make that work.

Now, coming back to this year: we actually got a new committer this year. That's something that hadn't happened in 2 years, and before that there was just one, so in the past 4 years we had 2 new committers. Luckily there's one this year, and there are probably going to be a couple more, I think something like 2 or 3 if my crystal ball is okay. Individual contributions are now structured and there's a flow of them; it's not one contributor or one random contribution every 3 months, it's every week, and there's a steady stream of patches which build up to something reasonable. We also flushed out essentially all the technical debt we had. There's one more, probably painful, revision coming, which is going to remove all the compatibility layers, but we are okay to hold on to that for a couple of years if we have to; it doesn't affect anything really, it's just dead code that needs to be flushed out. We shipped Argon in March, roughly on time; I think we were 2 weeks late, and that was just because our infrastructure was broken for 3 weeks and we didn't realize what the problem was. Once we fixed it, we got it working. And for Potassium, obviously, we want to make those scalability pieces work, and the idea is that by the end of the year we want to actually demo an end-to-end integration of the NETCONF/RESTCONF control use case at full scale, hitting tens of thousands of southbound devices, which is something that is obviously interesting for O-RAN and all those edge-driven cases. So yeah, that's it, and it took way longer than I expected. Any questions?

Yeah, so the documentation. First of all, our site, OpenDaylight.org, needs to be completely revamped. Unfortunately, we have few marketing resources that can
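Scaling a controller out to tens of thousands of southbound devices, as described above, usually means sharding device ownership across controller replicas. Here is a minimal sketch of one common approach under stated assumptions: the device IDs and replica count are made up, and a plain stable hash is used, whereas production systems typically use consistent hashing or per-device leader election so that scaling events move as few device sessions as possible.

```python
# Shard southbound (e.g. NETCONF) devices across controller replicas,
# such as Kubernetes pods, using a deterministic hash.
import hashlib
from collections import Counter

def owner_replica(device_id: str, replica_count: int) -> int:
    """Deterministically map a device to one of N controller replicas."""
    digest = hashlib.sha256(device_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % replica_count

# Hypothetical fleet: 10,000 devices spread over 8 replicas.
devices = [f"netconf-device-{i}" for i in range(10000)]
assignment = {d: owner_replica(d, 8) for d in devices}

# Each replica ends up with roughly 10000 / 8 = 1250 devices.
print(Counter(assignment.values()))
```

The design choice worth noting is determinism: any replica can compute who owns a device without coordination, which is what makes this style of sharding cheap to reason about when pods come and go under autoscaling.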
actually do that. I didn't mention it, but we wanted to do that last year. Unfortunately the guy I had marked for it moved on. He did a prototype, and I still have the prototype on GitHub, but I know nothing about it; I'm looking for the person who is going to pick it up.

As for the documentation itself: the use case documentation is still mostly accurate, because when we removed a project we also removed all of its documentation. There are bits and pieces that are stale; where they are stale, we'll fix them. The thing is, the internal deployment teams and everybody who has been using this for years know everything by heart and already have their own collection of what they want to do. So the people who actually read the documentation are mostly newcomers, and it's hard to get them engaged on mailing lists, because everybody wants to be on Stack Overflow and we don't have the resources to staff Stack Overflow. So it's problematic, but the documentation should be mostly accurate, and if it's not, we're obviously going to fix it. You're welcome.

[Audience question] You've talked about the challenge of having few contributors, and this question follows from that: have you thought about how hard it is to onboard? Do you spend any time on it? How do you make it easier to be a first-time contributor to your project?

Yeah, that's a tough one, because even as we scaled down, the community is still 12 Git repos that are connected somehow, and there is ancient documentation lying around somewhere. In the meantime we migrated the wiki once or twice, and we migrated the docs from AsciiDoc to RST once, so there are bits and pieces that got left behind. The most critical thing we lost is the tribal knowledge, and that makes it very, very hard to onboard people: when a question comes in, there is a pool of, say, 10 to 15 people who cover the entire stack, and they usually don't work on that particular area, so they have to go back and check. So yeah, it's kind of hard to create
those incentives to make that work. That having been said, our grand plan is to redo the use case integration so that it's just turnkey Kubernetes plus an operator or whatnot, and end-user engagement becomes easy. As it turns out, to do that we need to clean up the technical layout, the source code layout, so that it makes much more sense. We completed that for NETCONF, the last participating project, just last week. Out of that we have some README RSTs for now, which are going to become Markdown files pointing out what is what. So rather than the southbound plugin being called sal-netconf-connector, which tells you nothing because the northbound is md-sal-netconf-connector, one is now called the southbound and the other the northbound, they don't have weird cross-dependencies anymore, and we can actually reason about it and, well, containerize it. Because MD-SAL actually is microservices, except they run in the same JVM, plus Akka clustering and all of that, which you don't want. We just need to create the API services, which, it turns out, are already there; they just need to be remodeled so we can reintegrate this on Kubernetes or anything that is really microservices. Once we do that, it's going to be much easier to contribute, especially fixes, because most of the time you just need a bug fix somewhere, and it's Java, so it's searchable: you know where to look if the names are right. Thanks.
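For readers unfamiliar with the NETCONF/RESTCONF control use case mentioned in the talk, it ultimately boils down to mounting southbound devices through the controller's RESTCONF northbound. The following is a minimal sketch, not taken from the talk itself: the host, port, credentials, node id, and controller URL are placeholders, and the exact RESTCONF path and leaf names vary by OpenDaylight release (this follows the post-Neon `/rests/data` layout documented in the NETCONF user guide).

```python
import json

# Hypothetical example: the payload that mounts a southbound NETCONF device.
# All values below (device-1, 192.0.2.1, admin/admin) are placeholders.
node_id = "device-1"
payload = {
    "node": [
        {
            "node-id": node_id,
            "netconf-node-topology:host": "192.0.2.1",
            "netconf-node-topology:port": 830,
            "netconf-node-topology:username": "admin",
            "netconf-node-topology:password": "admin",
            "netconf-node-topology:tcp-only": False,
        }
    ]
}

# PUT-ing this payload into the topology-netconf topology triggers the mount:
url = (
    "http://localhost:8181/rests/data/"
    "network-topology:network-topology/"
    f"topology=topology-netconf/node={node_id}"
)
print("PUT", url)
print(json.dumps(payload, indent=2))
```

Once the device is mounted, its own YANG-modeled data becomes reachable under the same node's mount point, so both the controller and the tens of thousands of devices behind it are driven through one RESTCONF API; that is the end-to-end control loop the scalability work targets.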