All right, thanks everybody for joining. My name is Jason Hunt; I'm a software architect with IBM, focused on things cloud and NFV.

My name is Thomas Spatzier; I work in the IBM Netcool team, especially looking at things like service assurance, and I'll cover that today.

And I'm Toby Ford; I work at AT&T and I'm responsible for the architecture of our Domain 2.0 platform.

All right, so here's what we're going to do today. If you've been to a number of the other sessions on NFV, a lot of them have probably talked about core aspects: things that need to be done with service chaining, or Neutron, or hypervisor enhancements. That's not what we're going to cover today. We're going to step up a level and go end-to-end, across the service lifecycle: not just the VM lifecycle (instantiate a VM, scale it, tear it down), but what it takes to deliver a service end-to-end, from the first thought about what that service should be, to assembling it together, to assuring it, and so on. Our flow here is that Toby is going to tee this up for us, tell us what the business challenge is and what some of the approaches to solve it might be, and then Thomas and I are going to tag-team on that lifecycle: service fulfillment, service design and creation, and service assurance, and the techniques we can take from both the cloud world and the telco world and bring together to help deliver on NFV.

Perfect, thank you. All right, so this is our challenge for today. The telcos, starting with AT&T, have decided to take on the mission of what we call NFV.
This is our transformation from bespoke, vertically integrated hardware to a world that is disaggregated: disaggregated hardware and software, disaggregated control plane and data plane, running on commodity hardware, with a drive toward virtualization. For us, this is all about helping us move faster, build new function more quickly, and get to a competitive cost structure, both in terms of our capital spend and our operational expense. So this is what we're trying to make happen, and make happen faster. But at the same time, we have to live within a set of expectations. So there's this dynamic of wanting to move quickly while meeting customer needs with regard to resiliency, availability, efficiency, performance, and their expectations of cost. And wrapped around that are all the rules we have to live within. That's our basic challenge: dealing with these two vectors that are often pushing against each other. This concept is essentially a universal problem. When we rewind 25 years, to when I first started coding and to how agile development came up, this is really how you characterize agile development: how do I
iteratively evolve quickly, and at the same time, over time, refactor code to make it stable and simpler? As we've seen, this is a very effective way of moving quickly, meeting people's expectations, and creating very large, very capable e-commerce sites, video content management and distribution sites, those types of things. So this is a proven mechanism to solve for this, using agile forces, and it's very much a part of OpenStack as well. You see this with our work lately in the Foundation promoting the Big Tent: we want to make it possible for people to innovate, and innovate quickly, and have their ideas front and center as they try things and make things happen. But at the same time, we have to find a way to make it more manageable, resilient, upgradeable, and so on. How do I make OpenStack into something that has five to six nines of resiliency, sixteen nines of durability? How do I make it so I have less than a microsecond of jitter and latency? How can I push 10, 20, 30 million packets per second through an x86 box? How do I actually upgrade sites, and get out of the conundrum we're in today, where at AT&T we still have sites running Essex? How do we bring them all up to a current version, and even go beyond that to CI/CD? And then, how do we get to something that actually is secure? So that's the problem we have to solve, both for the telcos and for OpenStack. And I think this is very much about three things. How do we solve it? In my view, we solve it with
agile, as I've described already, and also this new concept that's come to the forefront: making things cloud native. If you've heard me talk before, I've talked about the pets versus cattle dynamic: trying to convince applications that were built in the legacy world to be something less scale-up and vertically scaled, and more scale-out. So those are two concepts, and then there's a third one that somehow got missed on this slide (where did my policy go?). The third thing is very much a telco concept, called policy. Policy-driven is a way of thinking that the telcos have used for many, many years: taking the expectations people have, formally identifying what they represent, making them manifest in a system, and driving toward those expectations. So those are the three general approaches I think it's going to take to solve our problem. Now, diving into each one a little more deeply. On the cloud native part: it's more than just making something scale out; it's making something that is simpler, more modular, and easier to manage. You see this with containers. It's more than just being more efficient than virtual machines by using less overhead; it also represents a move toward simplicity. You're using fewer packages, less overhead in the operating system; you're using something that's far more transparent, where it's easy to see what's going on. I think that's an essential move: as we add complexity, the components need to be easier to understand, easy to grok. And then also, nothing we do in any of these contexts can be done manually.
There's no time for manual. It has to be dynamic and autonomous. So we go beyond the simple automation we've done in the past with scripting, and more recently with configuration management tools, to the next level, which in our parlance is closing the loop: making the system autonomous, so it can be left to its own devices to solve and resolve problems. And last but not least is a real commitment to API-driven interop, integration, and modularity, so that I can allow innovation to happen. It's what I call "eat our own baby": make something that's solid and works well, but then realize that maybe somebody could come up with a more efficient, more optimal way of doing it, and even though I've spent my life's work making this thing, be able to let it go and let it be replaced by something more capable. So cloud native has a number of different aspects, and you see this with the Cloud Native Computing Foundation group trying to promote it. For me, I'm trying to promote these concepts with the VNF vendors, who have just basically gotten over the hurdle of becoming virtualized; now we're trying to get them over another hurdle, to go the next step. The last piece is the policy-driven part, and this is one I've been struggling with, because it represents bringing a lot of the cruft of the old OSS/BSS systems of the telcos forward into the modern world. Policy, in my past experience, has been very overwrought and complicated; how do I make it simple and incorporate it into a more cloud-native, microservices way of thinking? Hopefully Jason and Thomas can solve that problem for us here. Essentially, policy is about writing down all of our expectations and making them very clear and logical: I want to maintain this level of availability.
I want to say that this person can only access this resource, that this company can only do these things, that I have to maintain this level of latency and jitter in the system. How do I specify that in rules, put those rules into the autonomous system, and let it maintain them and manage any conflicts? So that's the third part of what I think is essential to solving our problem.

So, as Toby mentioned, we've got this conflict here, with his two arrows coming together. It's the same thing: service agility on the one side. We've got to roll these services out faster; if you remember, the keynote yesterday talked about taking service design time from 18 months to six months. So you've got that on the one side. On the other side you have operational trust. Remember that big picture of the network operations center? That's serious stuff. Those folks are very serious about making sure their services are up and running, because if they're not, people can die, or, if you're in the United States, your American Idol votes might not go through, which is a very bad thing. So how do you bring those two together? This is the same problem you've had in the software world with DevOps: dev on the left side, ops on the right side, and you bring them together. This is the same thing, on steroids a little bit, particularly on the operations side. If we overlay on top of this (and this will set up the rest of the presentation), you've got the lifecycle I talked about, starting with service design and creation. How do you come up with that service? How do you take the functions you get from the various vendors, in whatever form they might be, and put them together into a service?
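To make the policy idea concrete before we walk through the lifecycle: the expectations Toby listed (availability, access, latency and jitter) can be written down as declarative rules that an autonomous system evaluates continuously. A minimal sketch in Python; the rule schema, metric names, and thresholds are all invented for illustration and are not from any particular policy engine:

```python
# Illustrative sketch: expectations written down as declarative rules that
# an autonomous system can evaluate continuously against observed metrics.
# The rule schema, metric names, and thresholds are invented for this example.
POLICIES = [
    {"name": "availability", "metric": "availability_pct", "op": ">=", "value": 99.999},
    {"name": "latency", "metric": "p99_latency_ms", "op": "<=", "value": 5.0},
    {"name": "jitter", "metric": "jitter_us", "op": "<=", "value": 100.0},
]

OPS = {">=": lambda a, b: a >= b, "<=": lambda a, b: a <= b}

def evaluate(policies, observed):
    """Return the names of policies violated by the observed metrics.

    A missing metric counts as a violation, since the expectation
    cannot be shown to hold.
    """
    violations = []
    for p in policies:
        actual = observed.get(p["metric"])
        if actual is None or not OPS[p["op"]](actual, p["value"]):
            violations.append(p["name"])
    return violations
```

The interesting part is not the evaluator but the separation: the expectations live as data, so the same autonomous loop can enforce new rules without being rewritten.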
The fulfillment piece is how we make that real in the cloud when a customer orders a service, and then there's assuring that it's going to perform as expected within all those constraints. I'm actually going to start with service fulfillment, because I think that's the piece that relates most to what folks might be familiar with on the OpenStack side. The first thing to bring up is that we're going to be in what we call a hybrid world for a while. Physical network functions aren't going anywhere; the business case has to be there to go virtual. So what you'll see initially is new services on virtual network functions; you'll see growing services cap their physical piece and grow the new parts in virtual; but for quite a while you'll have both physical and virtual in a hybrid environment. What that means is that your order systems and your BSS/OSS are going to have to talk to both sides: to the existing provisioning and activation systems, and then, on the right side, to the virtual aspects. Now, in this stack we used the NFV terminology of NFV orchestrators, VNF managers, and VIMs, which those of you who have been on the ETSI side are familiar with. I do want to make the point, though, that these network services are going to be not just network functions; they're going to have IT characteristics, or applications with them. There'll be mobile apps, there'll be portals, there'll be APIs they have to call out to. So in this stack, if you're more of a cloud software person, don't get scared off by the terminology: you need application management.
You need orchestration on top of that. Now, off to the side here are some of the artifacts you need to make that thing become real. At the very base level you're going to need images, and containers if that's the route you go down: the software components of it. You'll have to assemble those together into templates or patterns as the next layer up, so that you can take the different network functions and piece them together. Then you've got to chain them together, so one can talk to the next; that's the next layer up again. And when you bring that into a network service, there's an end-to-end flow, or workflow, that might be needed to make the fulfillment happen. Now, the other challenge around service fulfillment is not just getting through that stack, but placing those network functions in the right place, whether within a region or across regions, and I know Toby is particularly passionate about this topic, so I'll let him lay out the challenges.

Yeah, so placement is one of the things I think OpenStack has been pretty good at: setting up a scheduler that looks at some of the resources and decides where to put things. The challenge for us has been taking OpenStack and making it work across a distributed setup; that's one dimension we want to see evolve. The other dimension is that placement isn't just a one-time, first-time thing. Even though we're trying to get to a disposable microservices architecture, where I could spin up new things and use the spinning up of new things to solve for placement, I do want to actually optimize things that are running, and I don't want to, say, end your call so that I can do bin packing.
So in the end, I want to be able to change the placement over the lifecycle too, and move things around behind the scenes without people knowing about it.

All right, so that's the service fulfillment side. Now let's step back and ask what it takes to make this whole thing real, and to create it in the first place. When I talked to somebody in a telco in the early days of NFV, he joked that when a network vendor would come in and pitch their latest product to him, he'd say, well, does it come on a USB key? That was the whole challenge, right? You're moving from hardware to software, so if your network function doesn't come on a USB key, I don't want to talk to you. And I thought, well, what's on the USB key? You expect, OK, I'm going to get their core software; maybe I'll get some scripts that help automate its installation and configuration. If you're lucky, maybe you'll get something like this. This is a Heat template; I've got a couple of screenshots here from a demonstration we do around an end-to-end service lifecycle. This is a virtual Evolved Packet Core, and the Heat template for it. So maybe they'll provide a Heat template that says these are all the VMs, and the types of networks, that have to come together to make up my network function. But as I mentioned before, you'll need a workflow layer above that, both to talk to the physical things and to the virtual side. And within the telco area there are always a number of other systems you have to talk to, beyond just getting your infrastructure spun up. It's great that we now have IP address management in Neutron, so that might take one of these steps out of the flow, but you've still got inventory systems, or security systems, or the subscriber management systems, whatever, to talk to, and having some sort of workflow capability on top of all of this is going to be necessary. In this example, we had things like virtual probes that also got spun up, and virtual testers along with them, and all of this ends up being something an end user can order along the way, so you might need some assets for that. In the end, what I was covering there were the core capabilities, these core assets, from the image up to the pattern up to the workflow. But in order to deliver on a lot of those requirements Toby was talking about, the five nines and the policy-driven pieces and the low latency, you're going to need a lot of other kinds of assets around the edges, and that's what we show around the edges here. I want to highlight a couple of them. On the testing side, telcos are very diligent about testing their functions before they go out into the market, and for those of us from the software or cloud side this is a little challenging, because these aren't just REST API calls, right? You've got very specialized protocols, you've got specialized testing components, and you have to treat those testers, and the simulators on the back end, the same way you treat the network function: they need to be virtualized as well, and spun up dynamically in your test environments. Even when you roll things out into production, you might spin up a tester and test the thing in production at the same time. And when you're testing, you need to capture the metrics on how the thing performs. You need to know, maybe initially, how does this function perform on a VM by itself? What happens if I put other network functions on those VMs and we start getting contention, or whatever? You want to capture that set of metrics, so that when we get to service assurance, which Thomas is going to talk about, the service assurance systems know what's expected, or not expected, for that particular function. So you've got all these artifacts.
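As a rough illustration of the kind of Heat template mentioned for the vEPC demo: this is a stripped-down sketch, not the actual demo template; all resource names, image names, and addresses are invented.

```yaml
heat_template_version: 2015-04-30
description: Illustrative sketch of a single-VM VNF deployment (not the actual vEPC template)
parameters:
  flavor:
    type: string
    default: m1.large
resources:
  mgmt_net:
    type: OS::Neutron::Net
  mgmt_subnet:
    type: OS::Neutron::Subnet
    properties:
      network: { get_resource: mgmt_net }
      cidr: 192.168.10.0/24
  vnf_vm:
    type: OS::Nova::Server
    properties:
      image: vnf-component-image   # illustrative image name
      flavor: { get_param: flavor }
      networks:
        - network: { get_resource: mgmt_net }
outputs:
  vnf_ip:
    value: { get_attr: [vnf_vm, first_address] }
```

A real vEPC template would define many such servers and networks; the point is that the template captures the VMs and network types that make up the function, while the workflow layer above handles everything else (inventory, security, subscriber systems).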
These are all the things that have to come together to make a network service. So how do you manage the creation of all of them? You've got all these people, maybe using different tools, creating these artifacts. I'm not going to go through this whole slide, but the general gist is: if you have a software development lifecycle today, and you do continuous delivery of your software components, you can use that to do NFV as well. A couple of things would be unique. I talked about different test tools, so your test management is going to need to call out to different test tools. You're going to have a few more artifacts along the way, and you're going to have a degree of rigor around the certification environments and the pre-production labs to get that done. And then at the end, when all of this goes out to production, you've got a couple of challenges, because we're dealing with services, right? It's not just an application that you go and continuously update; you might have a number of instances of that service for different tenants or different customers. So you might need to do blue-green, zero-downtime deployments, if it's designed in a cloud-native fashion as Toby talked about. And for new customers that come in, you've got to take all the things you've learned in this design and test process and put them out into your systems, so your service assurance systems have the right thresholds set, and your orchestration systems have all of these assets ready to be provisioned, or fulfilled, when the orders come in.

All right, you've heard enough from me. Let's hear from Thomas a little bit on how we're going to make these things perform in the real world.

Yeah, last but not least, service assurance. In my opinion, this is an aspect.
It is sometimes a little bit neglected: everyone thinks about how to stand up those virtual network functions and how to scale them, but in order to decide when you have to scale, or to see when something's wrong, you have to have service assurance in place. When we come to this new world of virtual network functions, or cloud in general, we talk about the problem of two-speed IT. We have the classical world, the hardware you put in place, and that's a pretty static environment; that's what's called the heavy goods lane here. We have few changes in that layer, and the volume is not that big. In terms of variety, we see events, SNMP traps, EMS bulk imports; in terms of velocity, we typically do periodic batch discovery, which can sometimes run for a couple of hours. So it's pretty heavy, but pretty static. The higher we get in this stack, the more dynamic the system becomes; on that picture we call it the fast lane. We have high change rates, and things that worked in the old world don't work in the new world. For example, we can't do long-running discoveries, because by the time the discovery ends, the system might look completely different. So instead of discovery, what we do is capture all kinds of observations from this dynamic environment, and out of those observations we construct our view of the world, so that we know how the system looked at any point in time. In terms of volume, we're talking about much higher volumes, gigabytes to terabytes over time, because this is a constantly running process, and here we have to apply big data technologies. Talking about the service assurance architecture: what we had so far, I've tried to summarize in this picture. From a tools perspective, a data perspective, and also an organizational perspective, we had silos. We had a team managing the network.
We had a team managing the servers, a team managing applications; we had event management in different tools. And not only did we have a separation between the disciplines, but also between layers: the lower-level network layer, then your infrastructure, then the applications. Another thing is that the old service assurance architectures were built for mostly static systems, so we didn't have things in place to cope with the high change rates of virtual network functions or cloud. What we are working on now: in my opinion, the main component of this is bringing the data together. Instead of having separate data stores for network, IT, applications, or for events and metrics, we're applying big data technologies to get an end-to-end view across all components of the system. So we have application information, the underlying virtual infrastructure, and also the hardware, because I think when you want to drill down into a problem you have at the virtual layer, you have to understand where the thing is running. Just provisioning a new VM or a new container might not be the solution to your problem if you have a problem really low down in your hardware infrastructure, and hardware will be around for quite some time, I think probably forever. What we're also doing is converting from the silos to a much more integrated stack, where all the disciplines have access to the very same data and can also put new information into this shared data. In terms of getting data from the environment, we still have traditional discovery. For your pretty static environment, you can still do discovery; you don't have anyone running into your data center and plugging in or removing servers every hour, so discovery is still good there. But what's more important for the higher layers are those observations. An observation can be anything: it can be data from agents.
It can be flow data, via the flow adapters we're using here, or it can be the virtual probe and virtual tester information that Jason talked about. All this information is just posted to a REST API and then, basically in real time, correlated with the data you already have, and then all the components have access to that overall data. Another thing is that we have those managers: the VIMs, the VNF managers, the NFV orchestrators, and they also provide useful information, because we know what we're deploying. We do deployment based on templates, so we don't have to discover everything; we basically know what we deployed, and we have a good seed for correlating data and finding out what belongs to a virtual network function, or what belongs to the end-to-end service. So what we do is either use plugins, for example for knowing when Heat has deployed a stack (it has a plug point where you can get information on lifecycle events of the stack), or apply adapters that use the REST APIs of the managers to pull out instance information or information about changes. Finally, now that we have this new architecture, and we have the service fulfillment piece with the managers, we have to close the loop, for what we call closed-loop control. I think an important point is that you need end-to-end service orchestration that talks to all the disciplines. As part of deploying your network functions, you're not only pushing down instructions for deploying a Heat stack or creating virtual resources, but you also push down hooks for your service assurance system, so that agents, for example, can post information to that system for each instance. In the same way, for each instance you deploy, you push down assurance rules and policies for correlating the events of that specific instance. And at the end you also put policies in the middle, to tie insights that you gain in service assurance back to the service fulfillment piece, to reach this autonomic behavior that Toby was talking about. If you look at OpenStack, you have auto-scaling capabilities in Heat, and there's a new project called Senlin that does auto-scaling, but what you get out of the box is scaling based on infrastructure metrics. Is it really a problem when you have high CPU load? Is it really a problem when you have high memory consumption? In many cases, yes. But what you also have to do is pull in the information from your virtual probes to see: is this video streaming really bad, or do I have bad response times on my web applications? So in some cases the decision cannot be made by the virtual infrastructure managers or the VNF managers. What you can do is gain this insight on the right-hand side, and then basically not use the out-of-the-box auto-scaling metrics, but instead invoke webhooks based on the information you have from your assurance system.

So from my side, I think we're just about ready for questions. They did ask that you line up at the microphone if you have any. But while people are thinking of questions, here's some food for thought.
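As a toy illustration of that last point, scaling on service-level insight from probes rather than on raw CPU or memory, and then invoking a webhook the way a Heat or Senlin alarm action would. Everything here (KPI names, thresholds, the webhook URL) is invented for the sketch:

```python
# Illustrative closed-loop sketch: decide to scale from service-level KPIs
# reported by virtual probes (e.g. response time, video quality) rather
# than from infrastructure metrics alone. KPI names, thresholds, and the
# webhook URL are invented for this example.
import urllib.request

SCALE_OUT_WEBHOOK = "http://orchestrator.example.com/v1/signal/scale_out"

def needs_scale_out(kpis, max_response_ms=200.0, min_video_mos=3.5):
    """Decide from probe KPIs whether the *service* (not just a VM) is degraded."""
    return (kpis.get("response_time_ms", 0.0) > max_response_ms
            or kpis.get("video_mos", 5.0) < min_video_mos)

def close_the_loop(kpis, post=None):
    """If assurance says the service is degraded, invoke the scaling webhook."""
    if not needs_scale_out(kpis):
        return False
    # Allow injecting the poster for testing; the default does an HTTP POST,
    # the same mechanism a Heat/Senlin alarm action would use.
    poster = post or (lambda url: urllib.request.urlopen(url, data=b""))
    poster(SCALE_OUT_WEBHOOK)
    return True
```

The design point is that the assurance system owns the "is this a problem?" decision, and fulfillment only exposes a webhook to act on it.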
I want to go back to this diagram, because as I've been here this week trying to understand which OpenStack projects could help with some of these components that are necessary, I'll posit something, and you can agree or disagree. In the core projects, we're sort of agreed on what should do what, and you're seeing a lot of the NFV requirements starting to be addressed there: a lot of the enhancements in Neutron, for example, and probably Heat. As Thomas said, there are some things that need to be adapted, but probably everybody agrees that's a good start for the patterns and templates. But if you look at all these other things, that's where you start to get into projects that maybe have less adoption at this point, and some overlap. For example, workflow: Mistral is the workflow engine, right? Murano has some workflow in it as well, and then there's Tacker, which is going to be an NFV orchestrator; I imagine it's going to need workflow pieces as well. So as you go around here, you can probably find OpenStack projects that have pieces of what's required, and overlaps. Those are some things I think we need to work out. Any questions? Yes. (Audience question about the division of responsibility between Heat and the orchestrator.) Do you want to take that one, as the Heat reviewer?
So we have different levels of orchestration. I think for the VNF manager, for example, Heat is a perfect fit, because on that layer you can apply a lot of pattern-based orchestration, and Jason showed this screenshot of a Heat template for a virtual EPC. So that's the place where Heat plays a role. But there's also a layer on top of Heat: for chaining several network functions, or for talking to your service assurance system. At that layer you can apply workflow technology, and in our solution we currently use our business process management system for that.

Yeah, and do you want to talk about TOSCA? Because I know Thomas is involved in TOSCA as well.

I'm also involved in the TOSCA standardization, and we have a working group that deals especially with virtual network functions, with a VNF profile for TOSCA. TOSCA includes more than what Heat understands. So what you can do is take a complete TOSCA package, extract the things that can be understood and processed by Heat and push those on to Heat, but push other elements, like policies, to another component in OpenStack. That is a possible way to go.

And I think the other thing about TOSCA is that you've got to be able to bundle all these things up together and deliver them. TOSCA has this concept of a cloud service archive (CSAR) for a network service, so maybe that's a good way to bundle these artifacts up. But to be honest, I think there are multiple alternatives. As you correctly said, Heat does some level of orchestration, but do you need something on top?
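A tiny sketch of the splitting idea just described: take a parsed TOSCA-style template, hand the node templates to Heat, and route the policies to another component. The template structure here is heavily simplified and the type names are illustrative, not the actual TOSCA NFV profile:

```python
# Illustrative sketch of splitting a parsed TOSCA-style template: the node
# templates go to Heat, the policies go elsewhere. The dict structure and
# type names are simplified/invented, not the real TOSCA NFV profile.

def split_template(template):
    """Split a parsed template into a Heat-deployable part and its policies."""
    heat_part = {"resources": dict(template.get("node_templates", {}))}
    policies = list(template.get("policies", []))
    return heat_part, policies

template = {
    "node_templates": {
        "vEPC_server": {"type": "tosca.nodes.Compute"},
        "mgmt_net": {"type": "tosca.nodes.network.Network"},
    },
    "policies": [
        {"scaling_policy": {"type": "tosca.policies.Scaling", "max_instances": 10}},
    ],
}

heat_part, policies = split_template(template)
```

In a real translator, the resource types would also be mapped (e.g. compute nodes to `OS::Nova::Server`); the sketch only shows the routing decision.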
You can also use Heat template composition: have a high-level template that invokes lower-level templates; that's one way to do it. You can use workflow orchestration, or you can do this kind of translation of some higher-level construct into pieces that you put into different components.

All right, thank you for coming up to the microphone.

Sure. So my question is: a lot of this stuff is still being standardized in places like ETSI, IETF, etc. The separation of responsibilities has not been well defined. So trying to implement this in OpenStack, separating it into different projects, deciding which project is responsible for what, especially in the orchestration area: orchestration versus SDN controller versus VNF manager, who does what and talks to whom, we don't know yet. So is it too early for OpenStack to start addressing it? Or should we just create a de facto standard, just because it's already implemented?

It's never too early for OpenStack to take on a new project; that's my Big Tent joke for this week. I think you bring up a valid point with this issue of separation. For me, I'll take it in a little different direction: authorization, keeping things bounded for who does what. The policy question I have is, can that be done as an overlay, with the Heats and the Tackers of the world, or are we going to have to go back and relook at some of the integral assumptions within a Nova, a Keystone, or a Glance, and maybe rethink how that happens? So in general I think you're going to see projects like Tacker tactically deal with the problem you're talking about, or Congress, and we want to see that happen. But eventually we're going to have to, as I mentioned earlier, go back and maybe refactor something to make it actually work.

Yeah, I also think it's not too early to start something in OpenStack.
I mean, we have the Tacker project; they've started implementing something. It probably won't be the best solution at the beginning, and it will have to change, but I think waiting for a standard to be finalized is not the right thing. I think you just have to try things out. That's an experience I've made since I joined the OpenStack activities about two and a half years ago: learn, and feed back to the standards world; I've already done that. So I think it's not the final solution that we'll get today, or in the next cycle, but we just have to get started and implement something.

Thank you.

Hey guys, kind of to elaborate on that: right now the tools aren't picked, so every vendor is kind of doing it themselves. Like, before we standardized on Heat, people were using different things like Chef and Puppet, and then we all moved to Heat. But is there a list of recommended tools, where you're saying: if we were to look at it right now, these are the tools you should go for, but let us know why they won't work? So we don't have one vendor off trying to figure it out one way and another vendor trying to figure it out another way, and ultimately only some of them succeed.

This is the balance we have to find, and it also goes back to the Big Tent. Do we open up the aperture and have a VNF manager per VNF? That's probably not the ideal setup.
You want to have some commonality and eventually get to one thing; that's one part of it. But at the same time you want to open the field up for more innovation, and I think in many ways that means allowing the Chefs, the Puppets, and the Ansibles to evolve; let the best tool win in the end. And it also goes back to what we said about eating your own babies: realize that there may be somebody who shows up who threads the needle and solves this problem in a better way, and then you need to be ready to replace whatever it was that you spent your last five years working on.

I mean, I kind of worry that many people are going to solve it in a fairly decent way, but it's going to be different, and then you're going to align on one that the other guys don't like or something. So maybe it's OPNFV pushing to say, you know, these are the things that we would recommend, but if you find something better, let us know and we'll put it in.

Well, in OPNFV we've struggled with this, to find the balance between picking tools and leaving it open. And I think the end answer, back to relying on those agile methods, is that if we set up the right testing framework, the same as with Tempest, a testing framework around the thing you're trying to solve for, then you can rely on that to tell you whether it works or not.
And then you can have a lot of variation inside. You know, within OPNFV we're allowing for multiple deployment tools, multiple SDN controllers, multiple combinations, because we realize it's too hard to pick. What is way more important is that the requirements and the specs are clear, that the tests are complete, and that somebody rigorously keeps things flowing.

Thank you so much to the panel for this really insightful conversation. My question is about the vision that you're driving of a hundred gigabits per second on commodity hardware, say by bypassing Neutron or with a custom Neutron or something like that. If you were to take a page from the playbook of high-frequency trading: they shaved off a lot of the kernel and customized it, they made it really lean, and they achieved that performance. Do you expect the OpenStack community to work on making Neutron so lean that we can achieve this goal while it still remains relevant for the mainstream OpenStack community? I can hear the laughing on the subject already.

Yes, I mean, this is a big challenge for us right now, because the expectation from the people inside telcos and from the VNF vendors is that an x86 box is going to scale infinitely, to be able to do hundreds of thousands, hundreds of millions of packets per second and such. This is really the integral problem we have to solve for. We just have to live with the fact that right now it's going to be pretty hard to make an x86 box and a Linux kernel go from 10 gigs to 25 gigs to 100 gigs; there's a limit. So how have we solved this before? We've solved it with scaling methods, scale-out methods, right?
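The scale-out answer above can be sketched in miniature: rather than pushing one box from 10 gigs to 100 gigs, hash each flow to one of N smaller instances, ECMP-style. All the numbers and names below are illustrative assumptions, not figures from the talk.

```python
import hashlib

# Illustrative assumption: each x86 box handles ~10 Gb/s, and we want an
# aggregate of 100 Gb/s by scaling out rather than scaling up one box.
PER_BOX_GBPS = 10
TARGET_GBPS = 100

def instances_needed(target_gbps, per_box_gbps):
    """Number of scale-out instances needed to meet the aggregate target."""
    return -(-target_gbps // per_box_gbps)  # ceiling division

def pick_instance(flow_5tuple, n_instances):
    """ECMP-style choice: hash a flow's 5-tuple to a stable instance index,
    so all packets of one flow land on the same instance."""
    digest = hashlib.sha256(repr(flow_5tuple).encode()).hexdigest()
    return int(digest, 16) % n_instances

if __name__ == "__main__":
    n = instances_needed(TARGET_GBPS, PER_BOX_GBPS)
    flow = ("10.0.0.1", "10.0.0.2", 40000, 80, "tcp")
    print(f"{n} instances; flow pinned to instance {pick_instance(flow, n)}")
```

The point of the hash is stability: the same flow always maps to the same instance, which is what lets N modest boxes substitute for one impossibly fast one.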
I mean, that's what I always lean back on. And I think it also goes somewhat to what Thomas was talking about, to bring it back: visibility, having end-to-end visibility into how performance is achieved in the environments that we work on. If you look at just a KVM environment, where you have virtio and vhost-net, and you have all the components of Neutron working to orchestrate it, it's getting more and more complicated. The people that were managing networks are saying, I've never touched a Linux box before, what is this mess? So giving people visibility into what is going on, and where the blockages or bottlenecks are, in a nice, easy way: that is the challenge I have for the community, making something that allows us to see it, and then it's easier to solve.

Thank you. I think we're up on time. Thank you guys for taking part of your morning with us. If you liked it, fill out the feedback in the app; if you didn't, you can come talk to us.
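As a closing sketch of the visibility point above: one minimal way to see where traffic is flowing on a Linux host is to sample per-interface byte counters twice and report the rates, so a saturated virtio path stands out. This assumes the standard /proc/net/dev layout and is an illustration of the idea, not a recommendation of a specific tool.

```python
import time

def parse_proc_net_dev(text):
    """Return {interface: (rx_bytes, tx_bytes)} from /proc/net/dev contents."""
    counters = {}
    for line in text.splitlines()[2:]:          # first two lines are headers
        name, fields = line.split(":", 1)
        cols = fields.split()
        # Column 0 is received bytes, column 8 is transmitted bytes.
        counters[name.strip()] = (int(cols[0]), int(cols[8]))
    return counters

def throughput_bps(before, after, seconds):
    """Per-interface (rx, tx) throughput in bits per second between samples."""
    rates = {}
    for iface in before:
        rx = (after[iface][0] - before[iface][0]) * 8 / seconds
        tx = (after[iface][1] - before[iface][1]) * 8 / seconds
        rates[iface] = (rx, tx)
    return rates

if __name__ == "__main__":
    with open("/proc/net/dev") as f:
        first = parse_proc_net_dev(f.read())
    time.sleep(1)
    with open("/proc/net/dev") as f:
        second = parse_proc_net_dev(f.read())
    for iface, (rx, tx) in throughput_bps(first, second, 1).items():
        print(f"{iface}: rx {rx/1e9:.3f} Gb/s, tx {tx/1e9:.3f} Gb/s")
```

Real deployments would pull the same counters from virsh, ethtool, or the Neutron agents; the point is simply making the numbers visible end to end.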