Welcome to the session entitled "Field Report: Setting Up a Software Product Line Architecture Based on Zephyr." This is part of the OS dependability track of the Open Source Summit. I am Gregory Shue from Legrand, and I will be sharing what I learned from using Zephyr as the basis for a secure IoT device software product line framework and architecture, composed primarily from open source projects. Don't worry if you're not already familiar with the Software Engineering Institute's research into software product lines; I will be reviewing that too. By request, the presentation given at the 2021 Zephyr Developer Summit is also being presented here. The ZDS 2021 session was given only 30 minutes for presentation and working group discussion. Here, in the 50 minutes allotted, we have time to get into deeper explanations and context for each of the points, their relevance beyond Zephyr-based projects, and the envisioned path forward. On to the presentation. In the late 1990s, Hewlett-Packard management made a strategic decision to restructure desktop inkjet printer development to reduce expenses and improve efficiency. The decision was to change the development paradigm from "leverage and potential reuse" to "reuse and extend." My printer project was severely impacted by this: it was required to deliver a hybrid implementation of functionality from the old and new firmware architectures. Internally, we jokingly called this the platypus architecture because the implementation was rather ugly: solid, but unmaintainable. That was the beginning of Hewlett-Packard's Owen firmware cooperative, a Hall of Fame example of a software product line that was wildly more successful than anyone involved ever expected. With the onset of IoT security regulations, and with many years of experience working in that software product line, I attempted to set up a new software product line from the ground up on the Zephyr ecosystem, and it proved largely promising.
But a paradigm shift is still needed. What is a software product line? It's worth looking at carefully. Here's the definition provided by the Software Engineering Institute: a software product line is a set of software-intensive systems sharing a common, managed set of features that satisfy the specific needs of a particular market segment or mission and that are developed from a common set of core assets in a prescribed way. This is a new application of a proven concept, not a new concept. In 2008, it was considered an innovative and growing concept in software engineering, but the early pioneers were already doing it in the late 1990s and early 2000s. Software product lines involve strategic, planned reuse that yields predictable results. It's really important to understand what a software product line is not. We are not talking about clone-and-own: single-system development with reuse, where the modified code serves that single system only. We are not talking about fortuitous, small-grained reuse: just grabbing reuse libraries containing algorithms, modules, objects, or components. We're not talking about just component-based or service-based development: selecting components or services from an in-house library, the marketplace, or the web, with no architectural focus. We're not talking about just versions of a single product; rather, it's the simultaneous release and support of multiple products. I got to see an entire generation of HP desktop inkjet printers get delivered out of that code base. We're not talking about just a configurable architecture; that's a good start, but it's only part of the reuse potential. And we're not talking about just a set of technical standards; those constrain choices without an architecture-based reuse strategy. Where have software product lines been used? As you can read on the slide, in a variety of places.
Successful software product lines have been built for families of, among other things, command and control ship systems, pagers, medical devices, consumer electronics, and billing systems. If you look in the linked slide deck, you will find a number of companies that, even at that time, had successfully delivered software product lines. Some of them are very well-known names, including Bosch, NASA, Rockwell Collins (if you're familiar with automation), Siemens, and the US Army. As I mentioned a moment ago, innovators and early adopters had already demonstrated the feasibility and benefits of software product lines by 2008. The short list of early adopters includes Hewlett-Packard. By that point, the Software Engineering Institute and others had tried to lower the adoption barrier by codifying practices, writing case studies, perfecting methods useful in product line approaches, and engendering a software product line community. Even back then, many organizations were handsomely achieving their business goals using a software product line approach. I'd like to share a little more about my experience from that early-adopter base. The Hewlett-Packard Owen firmware cooperative was voted into the Software Product Line Hall of Fame in the first session of that conference. It is still going after, what is it, 23 or 24 years; I got a status update a couple of months ago that yes, it is still alive and well. It started out targeting IoT products required to run seven-plus years without a power cycle. So this is not your everyday product. Well, actually it is: it was largely driven by fax machines. The first seven generations, delivered largely on a yearly cycle, were designed around using only an RTOS on a custom SoC with 32-bit ARM microprocessors. Successive generations extended to a multiprocessor system running Linux and an RTOS. Some of this might sound familiar from other embedded environments.
It eventually evolved into a continuously integrated repository, live-updated by developers across the world. At one point, that I know of, we had 400 firmware developers working live on changes that were being integrated into the main branch, and it was active all but about two hours a day. This grew beyond anything we had ever imagined and expanded beyond printers and scanners. How successful was it? Well, that's just the breadth. Let me share a story. I was directly involved for the first seven years of the software product line and created derivative products for the industrial space over the next nine years. How effective was it? About four years into the software product line, late in the product development cycle, HP management told me: we are creating a new model printer for a specific market. It requires one new feature in firmware and support for a new combination of existing hardware. The first prototype unit would be on my desk in an hour. I'm the product integrator and the new-feature developer lead, and along with everything else, I need to deliver this in the same product generation. So I started pulling things together and configuring the project. I ended up being very disappointed by what came next. How long do you think it took me to get a functionally complete, fully integrated firmware image, apart from the new feature? Three weeks? One week? One day? Would you believe 30 minutes? It was ready, with the correct model numbers, brands, and feature configurations, before the first mechanical prototype arrived on my desk. So much for job security. I knew it could be that easy to support a new product, but I hadn't expected that we'd already arrived. Why was I disappointed? I had to ask myself: now what? The new feature took a few months to develop, but the experience of everything working together because we designed it to work together left a lasting impression. So I changed companies.
I'm now working in commercial building control systems: lighting control, distributed systems deployed in public and private areas. And we now have to confront security. Security demands a new rigor in firmware development, beyond what we've been used to in the past. I had the opportunity to evaluate this: if we're going to convert this code to secure code, what else can we take advantage of at the same time? So I took some time to evaluate, and attempt to use, Zephyr as the basis for a software product line. How did I do that? You'll find the Zephyr-specific details listed here. First, I started with reuse and extend as the paradigm. So we're reusing the Zephyr ecosystem and choosing supported options within it: What's the topology of the repositories? How do we tie into the build system? How do we configure and adjust code for the different variation points we need? What else? We don't copy the code; we want to keep the history. Did I create internal forks? Yes, because I anticipated there would be a point where, in each repository, I would need to tweak something or isolate particular changes. So I forked all the open source projects, created my own branch in each, migrated and merged changes over as appropriate, and tried to absolutely minimize what I put in my forks. Repository organization: what did we end up doing? We followed ownership and licensing: what is proprietary, what is cross-licensed, and what is open source licensed. I ended up with more than 50 repositories that we build together. One of them contains all the proprietary content, along with the manifest specifying which repositories, and which SHAs (branches and commits) within them, are seen by the build system and configured for building the product.
So we have one of those for all of our proprietary content. The open source and commercial repositories have been extended to be modules recognized by the Zephyr build system. Each of these extensions contains glue code, APIs, tests, samples, documentation, and, if needed, a manifest for isolated development and verification of the extension. Zephyr and the Zephyr forks, with their existing sets of module repositories, also get brought in. So we're really focused on reuse and extension. The configurable subsystems and drivers contain all the functionality. Because we're going to build multiple products from this, each product has its own product-specific source tree, and the goal is that this project-unique source tree contains no functionality. In Zephyr, the main function gets called after the entire OS initializes, and its contents are left up to the top-level application. In this exercise, main is intentionally left empty because I don't want to copy any code between projects; I want live reuse of the same code. What else did we do? I checked that the setup can support multiple manifests and figured out how to do that, because there are multiple forks of Zephyr and we need to be able to build some of our products off one and some off another. The reason is that these forks seem to be provided by hardware chip vendors, and when we change chips we may need to change forks, but a lot of our application-layer proprietary protocols and similar code needs to stay the same. The workspace is that collection of repositories, and we have our own documentation, which we need to capture in our overarching repository, the manifest repository. We chose to parallel the Zephyr documentation structure and tools, and that documentation structure is used for generating web pages and PDFs.
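To make the "no functionality in the product tree" goal concrete, here is a minimal C sketch, assuming hypothetical Kconfig-style symbols (`CONFIG_PRODUCT_MODEL_NAME`, `CONFIG_PRODUCT_MAX_CHANNELS`, `CONFIG_PRODUCT_HAS_DIMMING`) and hypothetical shared functions `product_describe()` and `lighting_set_level()`. The shared code reads the variation points, and a product contributes only configuration, which is why its `main` can stay empty. The `#ifndef` defaults exist only so the sketch compiles outside a Zephyr build.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical variation points. In a Zephyr build, Kconfig would generate
 * CONFIG_* macros from the product's configuration; these defaults only let
 * the sketch compile on its own. */
#ifndef CONFIG_PRODUCT_MODEL_NAME
#define CONFIG_PRODUCT_MODEL_NAME "LC-100"
#endif
#ifndef CONFIG_PRODUCT_MAX_CHANNELS
#define CONFIG_PRODUCT_MAX_CHANNELS 4
#endif
#ifndef CONFIG_PRODUCT_HAS_DIMMING
#define CONFIG_PRODUCT_HAS_DIMMING 1
#endif

/* Shared feature code: the same source is built into every product. */
int product_describe(char *buf, size_t len)
{
    return snprintf(buf, len, "%s: %d channels, dimming %s",
                    CONFIG_PRODUCT_MODEL_NAME, CONFIG_PRODUCT_MAX_CHANNELS,
                    CONFIG_PRODUCT_HAS_DIMMING ? "on" : "off");
}

/* Returns the level actually applied, or -1 for an invalid request. */
int lighting_set_level(int channel, int percent)
{
    if (channel < 0 || channel >= CONFIG_PRODUCT_MAX_CHANNELS ||
        percent < 0 || percent > 100) {
        return -1;
    }
#if CONFIG_PRODUCT_HAS_DIMMING
    return percent;
#else
    /* On/off-only products map any nonzero request to full on. */
    return (percent > 0) ? 100 : 0;
#endif
}
```

Building an on/off-only variant would mean changing only configuration, for example compiling with `-DCONFIG_PRODUCT_HAS_DIMMING=0`, with no edits to the shared source.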
It is also used, or will be used, in Zephyr for generating requirements traceability tables. So the goal was to do all of these things and really build our software product line as an extension, and an application, if you will, of the Zephyr ecosystem. Now, that was the goal. How do we measure it? The Software Engineering Institute already gives us a table of the costs of a software product line. On the left are the identified core assets, along with the costs they incur beyond single-product-focused development. Creating and maintaining a software product line is not free, but it is frequently more efficient overall than maintaining separate implementations. This includes the necessary business support and alignment, and the governance structure, that are essential parts of a software product line. Those aspects are out of scope for this presentation and are already covered in the SEI course material on software product lines. What we will do is look at the state of the Zephyr project ecosystem relative to each of these core assets. Let's start with the first one, architecture. The cost is that it must support the variation inherent in the product line. So you don't just have an architecture that solves one problem; the scope of the product line, with all its variation, is encompassed in a single architecture. Zephyr already has multiple mechanisms in place to help support this. You can see them on the left, and I'm not going to read through them all. From previous experience and from attempting to use Zephyr, I also noticed that some things provided by the Hall of Fame project I worked on are lacking, or seem to be lacking, at this point. That's the list on the right. Several of these are worth going through. First, giving guidance on when to design to use your own thread stack versus a caller's thread stack.
That really ends up being a significant question, both when I pulled functionality together from multiple repositories back on printers and now that I've had to do the same thing here: What stack is code running on? What priority is it running at? How do I allocate my resources? That impacts design. Another one is providing a clean, controlled shutdown; right now Zephyr has none. In my experience, this is needed to close logs out appropriately before a reset, especially before a firmware update, and to have a clean, controlled handoff from boot code to application code. A few other cases came up where a clean shutdown was really best practice because it modeled what happens with object-oriented resources: there's cleanup that needs to be done. Does everything require cleanup? No, but you do need a mechanism for it. Let's go a little farther. Software components. A software product line is composed of components; you need to be able to assemble them. There are already some places where Zephyr meets this. As you can see, the cost is that components must be designed to be general without a loss of performance and must build in support for the variation points. There's some support for this already in Zephyr: boards, devicetree, driver subsystems, libraries, and multiple manifest support. There's opportunity beyond this. The directory tree already has a place for test applications, where you run a particular test case, and for samples, which demonstrate how to call an API without having to deal with all the error handling. What the directory tree doesn't have is a prescribed place for actual applications: non-test, non-sample applications. And predictability is a very good thing. When we look at how this gets used beyond a particular target environment, we have to set ourselves up to avoid collisions.
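Since the talk notes that Zephyr offered no clean, controlled shutdown at the time, here is a hedged sketch of one possible mechanism: a hypothetical registry (`shutdown_register()`, `shutdown_run()`) of cleanup hooks that run in reverse registration order, the way destructors unwind, before a reset or a firmware-update handoff. The names, the fixed hook limit, and the ordering rule are assumptions for illustration, not a Zephyr API.

```c
#include <stddef.h>

#define SHUTDOWN_MAX_HOOKS 8

typedef void (*shutdown_hook_t)(void *ctx);

struct shutdown_entry {
    shutdown_hook_t fn;
    void *ctx;
};

static struct shutdown_entry shutdown_hooks[SHUTDOWN_MAX_HOOKS];
static size_t shutdown_count;

/* Register cleanup work (e.g. flushing and closing a log) to run before reset. */
int shutdown_register(shutdown_hook_t fn, void *ctx)
{
    if (fn == NULL || shutdown_count >= SHUTDOWN_MAX_HOOKS) {
        return -1;
    }
    shutdown_hooks[shutdown_count].fn = fn;
    shutdown_hooks[shutdown_count].ctx = ctx;
    shutdown_count++;
    return 0;
}

/* Run hooks last-registered-first, so a component registered later (and
 * possibly depending on an earlier one) cleans up first. Each hook runs once;
 * the caller then performs the reset or the boot-code handoff. */
void shutdown_run(void)
{
    while (shutdown_count > 0) {
        shutdown_count--;
        shutdown_hooks[shutdown_count].fn(shutdown_hooks[shutdown_count].ctx);
    }
}

/* Example hook for demonstration: records the order in which hooks ran. */
static char shutdown_trace[SHUTDOWN_MAX_HOOKS + 1];
static size_t shutdown_trace_len;

void example_hook(void *ctx)
{
    if (shutdown_trace_len < SHUTDOWN_MAX_HOOKS) {
        shutdown_trace[shutdown_trace_len++] = *(const char *)ctx;
        shutdown_trace[shutdown_trace_len] = '\0';
    }
}
```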
Being able to say that everybody aggregating these open source repositories together should put their application code in this subtree, following this particular organization and naming pattern, avoids a lot of conflict. What else? Things related to how documentation gets designed: specifying the latency tolerated for each ISR, thread, and preemption level. When you do fine-grained reuse, the person developing the reusable components generally doesn't have to worry about how the component gives up the processor or shares the bandwidth of the resources. But as an integrator, pulling together things that have already been designed, I need to know how much I can push the execution environment around, and what the design assumptions are, without breaking the system. Let's go a little farther. Test plans, cases, and data must consider variation points and multiple instances of the product line. This largely means that the test cases need to be parametrized: the results can vary from one product to another, but the algorithm is probably the same. There is some support for this already. Zephyr has the Ztest library, tests and samples folders, a test runner named Twister, and requirements tags as an extension in Doxygen comments. Requirements are coming, but we need to capture them better. As an integrator, I need to be able to find how to hook in whatever else I need for code coverage and MC/DC; this may already be there. I need to know the test plans for each component I integrate. I need to know the test cases and data points, because they have to reflect all the variation points. And when I integrate a component, I also need to know whether I have integrated it correctly or whether it doesn't work.
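A minimal sketch of a parametrized test, assuming two hypothetical product variants and a hypothetical shared function `apply_level()`: one table of inputs, each with a per-variant expected result, is run against every variant. In Zephyr this loop would more naturally live in a Ztest suite run by Twister; plain C is used here so the sketch stands on its own.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical product variants: the algorithm is shared, the limits differ. */
struct product_variant {
    const char *name;
    int max_channels;
    int has_dimming;
};

#define NUM_VARIANTS 2

static const struct product_variant variants[NUM_VARIANTS] = {
    { "LC-100", 4, 1 }, /* four dimmable channels */
    { "LC-10S", 1, 0 }, /* one on/off channel */
};

/* Shared algorithm under test: identical source for every product. */
int apply_level(const struct product_variant *p, int channel, int percent)
{
    if (channel < 0 || channel >= p->max_channels || percent < 0 || percent > 100) {
        return -1;
    }
    return p->has_dimming ? percent : (percent > 0 ? 100 : 0);
}

/* One parametrized case: the same inputs, with a per-variant expected result. */
struct level_case {
    int channel;
    int percent;
    int expected[NUM_VARIANTS]; /* indexed like variants[] */
};

static const struct level_case cases[] = {
    { 0, 50,  { 50, 100 } },
    { 0, 0,   { 0, 0 } },
    { 2, 50,  { 50, -1 } }, /* channel 2 exists only on LC-100 */
    { 0, 150, { -1, -1 } }, /* out of range on every variant */
};

/* Run every case against every variant; print failures and return their count. */
int run_level_cases(void)
{
    int failures = 0;
    for (size_t v = 0; v < NUM_VARIANTS; v++) {
        for (size_t c = 0; c < sizeof cases / sizeof cases[0]; c++) {
            int got = apply_level(&variants[v], cases[c].channel, cases[c].percent);
            if (got != cases[c].expected[v]) {
                printf("FAIL %s case %zu: got %d, want %d\n",
                       variants[v].name, c, got, cases[c].expected[v]);
                failures++;
            }
        }
    }
    return failures;
}
```

Adding a product then means adding one row to `variants[]` and one expected column per case, rather than writing a new test.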
Tools and processes must be more robust than required for single-product development. This is largely supported by Zephyr, but it needs to go a bit further. One thing I discovered is that many of the concepts for supporting these things are present in Zephyr, but they aren't yet continuously verified as staying functional. For instance, the module.yaml extension mechanism that the build system looks for has settings in it, but I have not noticed any tests in the continuous integration system that verify those settings work properly. I expect things have matured somewhat since I last looked, but I also expect there are other issues very similar to that one. Let's go a little farther. Business case and market analysis must address a family of software products, not just one product. Zephyr is actually pretty good at this because it is set up with lots of examples and samples. But it needs to go to another level: there's opportunity to update the directory documentation, test cases, and samples, so it's not fully there yet. Project plans must be generic or made extensible to accommodate product variations. The Zephyr build system finds out about repositories, and the module instructions in them, through a couple of environment variables that tell it where they are mounted in the tree. This came about before any support was added for Git submodules. And there's actually a lot of benefit in not using Git submodules: I don't have to change the parent when I need to tweak the contents of a submodule. It actually opens up the opportunity to reuse the parent repository, because the link isn't in the repository itself, at least not directly. High-level support is still needed for documentation structure and modularity. Let's go a little farther.
People, skills, and training must involve training and expertise centered on the assets and procedures associated with the product line. In many senses this is beyond the scope of the Zephyr project, because the assets will be whatever the customer-identified product line needs. But the Zephyr project is setting up procedures that could be, and ideally would be, appropriate for product line support. Some things I would like to see: clearly define terms and apply them consistently, terms like "code base" and "module"; there's opportunity for clarity there. Consistently follow the existing processes; I have noticed some inconsistencies, so it will benefit those of us using Zephyr if things get a little more formalized. Recognize that requirements tracing and documentation generation in general need to be just as important and extensible as the source code. I'm trying to use the Zephyr documentation structure, patterns, and tools to generate all the content I need for my software product line, and I have talked with others who are trying to use them to generate documentation for the customers that have contracted them. This is a really valuable tool if we can get it into shape. Let's go a little farther. What's the overall assessment? In my opinion, the Zephyr project is about halfway to where the Hall of Fame winners have already led us. With direction, alignment, and investment, it's possible to match their architectural capability within three years. The Zephyr project will inevitably mature as a software product line, because its commitments to adopting and applying software development best practices have launched it in a promising direction. The governance, the commitment to security, and the audited safety releases are providing the framework and infrastructure for a reusable, composable system appropriate for a broad community. And finally, if it doesn't fly, it will fall out of the sky.
For those who are not familiar, the Zephyr logo is a kite. So what do I think is needed for SPL best practice? Three things, primarily. First, embrace extensibility. How is each aspect of the ecosystem impacted by module extensibility? It has long fingers that reach in many directions, and it's really powerful when it's in place, but it takes commitment and ongoing maintenance and verification. Second, embrace live reuse. What does the ecosystem need to look like when the Zephyr repository is not the focus of code and documentation, that is, when it's not the manifest repository? What do the modules need to look like when client and application repositories have to configure them without injecting changes into them? Third, embrace composability. What ought to be common so that arbitrary combinations of subsystems and drivers have their requirements met? How do the different elements that can be brought in and out work together by design to coexist and cooperate, or at least not infringe on each other if they're independent? I think those are the three paradigm changes that need to happen now. I'm sure other changes will be needed on top of that, but that's the best summary I could make. Let's go a little farther. I gave the Zephyr project some proposals for next steps to begin that paradigm shift. First, secure firmware imposes requirements on the entire executable: all code built into the image is affected. So the initial proposal is to define "code base" in the glossary as all the source visible to the Zephyr build system. That will change the dialogue and help clarify what we are talking about. Second, when secure firmware artifacts are needed (and they're not needed everywhere), the artifacts also need to cover the entire code base.
So the proposal is that all tools needed for firmware configuration, build, debugging, testing, and document generation recognize and operate on the entire code base, found through the environment variables I mentioned earlier. These same concepts apply whether it's Zephyr or another ecosystem. Let's go a little farther. At this point, Zephyr has adopted, as a recommendation, many of the MISRA coding guidelines; it is not strictly compliant yet. One guideline is really notable because of how much it affects things, and I've listed it here: MISRA Directive 3.1, which is required, states, "All code shall be traceable to documented requirements." So generated requirements traceability artifacts will need to include the entire code base as well. Supporting a star topology, with Zephyr not as the hub, implies that the Zephyr documentation tools can generate documentation, including requirements traceability, from the entire code base. So the proposal is that the generated Zephyr documentation be adjusted to include documentation found in the entire code base. How the documentation is reorganized needs to be managed by the documentation maintainers and collaborators. Fairly straightforward. Let's go a little farther. Embracing live reuse ("live," not "life"). Live reuse includes topologies where the Zephyr repository is just another module in the workspace. It happens to be the one that provides the build system and the kernel, but everything else could be pulled in from separate repositories. So the proposal is that the Zephyr documentation generation tools support workspace-level content coming from the manifest repository, the one at the hub. I've identified a list of things that really belong at the workspace level, and another set of content that really belongs at the module level.
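To illustrate what tracing "the entire code base" implies for MISRA Directive 3.1, here is a small sketch of the cross-check a traceability tool would perform, assuming hypothetical requirement IDs and a hypothetical list of code units, each paired with the requirement tag found for it. Real tooling would extract both lists from the documentation and source comments of every module the build system sees; only the cross-check itself is shown.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical documented requirement IDs (in practice, extracted from docs). */
static const char *const documented_reqs[] = {
    "REQ-LIGHT-001", "REQ-LIGHT-002", "REQ-SEC-010",
};

/* Hypothetical code units gathered from the entire code base (every module the
 * build system sees, not only the Zephyr repository), each with the requirement
 * tag found in its comments; NULL means no tag was found. */
struct code_unit {
    const char *symbol;
    const char *req;
};

static const struct code_unit units[] = {
    { "lighting_set_level", "REQ-LIGHT-001" },
    { "lighting_fade",      "REQ-LIGHT-002" },
    { "debug_dump_state",   NULL },          /* untraced */
    { "auth_check_token",   "REQ-SEC-099" }, /* cites an undocumented requirement */
};

#define COUNT_OF(a) (sizeof(a) / sizeof((a)[0]))

static int is_documented(const char *req)
{
    for (size_t i = 0; i < COUNT_OF(documented_reqs); i++) {
        if (strcmp(documented_reqs[i], req) == 0) {
            return 1;
        }
    }
    return 0;
}

/* Find code units that do not trace to a documented requirement. Returns the
 * number found and stores up to max of their symbol names in out. */
size_t find_untraced(const char **out, size_t max)
{
    size_t found = 0;
    for (size_t i = 0; i < COUNT_OF(units); i++) {
        if (units[i].req == NULL || !is_documented(units[i].req)) {
            if (found < max) {
                out[found] = units[i].symbol;
            }
            found++;
        }
    }
    return found;
}
```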
But the idea is that we're pulling together workspace-level documentation with Zephyr not at the center. The second proposal: the Zephyr continuous integration system verifies support for all user-configurable settings in the module.yaml extension mechanism. Let's go to the last one, embracing composability. Composability means managing boards, drivers, and subsystems so that they can be assembled into a working device without requiring custom crafting or tuning. The proposal (and, like I said, this is just a start) is to specify a location in the Zephyr directory tree for the configuration overlays used to enable particular functionality, and to show how they are referenced in the project's build file. Now, this is not the only set of proposals necessary for a mature system; these are just the most obvious ones that will start to drive a change in paradigm. Let's go a little farther. That's what I presented at the Zephyr Developer Summit. There wasn't an opportunity there to engage in discussion and take action on those proposals, but there is opportunity coming up. Here we're addressing a larger audience. The previous content, as you can see, is specific to a Zephyr release, which happens to be 2.6.0, but most of it is not unique to Zephyr; much of it applies to embedded software systems in general. So we're now going to look at issues that come up when integrating multiple open source repositories into a software product line architecture. I ran across these when I was working at HP, and I ran across them again when pulling open source projects into Zephyr and extending them for the build system. So I know that what I've run into so far is not unique to Zephyr. Let's see what we found. Continuous integration of multiple open source projects into an SPL: let's review what has to come together for reuse. This was pulled from the costs and the definition we saw earlier.
We must satisfy the specific needs of a particular market segment or mission. Components must be designed to be general without a loss of performance. We must build in support for variation points. Plans must be generic or made extensible to accommodate product variations. Tools and processes must be more robust than for a single-function product. And training must involve expertise centered on the assets and procedures associated with the product line. All of this is review, but I wanted a refresher before we moved on. Documentation needs to include design assumptions and constraints, such as latency tolerances and required performance. Keep in mind that some integrators of these reused components will put them in a system that behaves differently than the project designers assumed. The documentation needs to be entirely editable content, because we're reusing it and extending it; we have to be able to edit it. Remember, the integrator is not necessarily the last consumer. And it must include the processes and whatever else is needed for re-verification of the integrated functionality; we have to know how to verify it once it's integrated. Let's go farther. Verification needs processes and tests for both on-target and off-target verification. Why do we have to verify on target? Because the toolchain differs from the off-target one, and the on-target build is what we ship. Why do we want to verify off target? Because it's generally a whole lot faster and cheaper; target systems may be extremely costly and available only in very limited numbers. Results have to be reported in a format suitable for automated analysis and aggregation: we have to pull together the integrated results just as we compose an integrated system. Verification needs to be complete enough to identify unreachable and dead code. For some of you these may be new terms, but they also come from the MISRA requirements.
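One common way to make the same logic verifiable both on and off target is a hardware seam: the logic reaches hardware only through a function-pointer table, so a real driver can be plugged in on target and a fake off target. The sketch assumes hypothetical names (`struct relay_hal`, `relay_all()`, `fake_relay_write()`) and errno-style negative return codes; it is one possible technique, not an approach prescribed by the talk.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define FAKE_RELAY_CHANNELS 8

/* Hypothetical hardware seam: the logic calls hardware only through this
 * table, so a real driver can be used on target and a fake off target. */
struct relay_hal {
    int (*write)(void *ctx, int channel, bool on);
    void *ctx;
};

/* Logic under test: switch every channel, stopping at the first failure. */
int relay_all(const struct relay_hal *hal, int channels, bool on)
{
    if (hal == NULL || hal->write == NULL) {
        return -EINVAL;
    }
    for (int ch = 0; ch < channels; ch++) {
        int rc = hal->write(hal->ctx, ch, on);
        if (rc != 0) {
            return rc;
        }
    }
    return 0;
}

/* Off-target fake: records relay state and can inject a failure. */
struct fake_relays {
    bool state[FAKE_RELAY_CHANNELS];
    int fail_channel; /* -1 means never fail */
};

int fake_relay_write(void *ctx, int channel, bool on)
{
    struct fake_relays *f = ctx;
    if (channel < 0 || channel >= FAKE_RELAY_CHANNELS) {
        return -EINVAL;
    }
    if (channel == f->fail_channel) {
        return -EIO; /* simulated hardware fault */
    }
    f->state[channel] = on;
    return 0;
}
```

The fake also makes error paths reachable in tests, which helps when verification has to demonstrate that no code is unreachable.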
MISRA C defines unreachable code as code that cannot be executed: nothing calls it. It defines dead code as code that can be executed but has no effect on the functional behavior of the program. These definitions differ from traditional terminology, which calls the first category dead code and the second useless code. MISRA requires that neither exist in the code or in the final executable. So there's a lot that integrators must provide if it's not provided by the open source maintainers. And given that the maintainers are the ones fixing bugs and rolling out fixes, it's just more work for everybody if they don't provide this level of verification. Let's go a little farther. Implementation needs client-controllable global namespace management. When you're working entirely in private repositories with private code, you have the luxury of knowing that you own the entire namespace, and you're responsible for maintaining it. When you start integrating open source content, suddenly you don't. And when you combine multiple open source repositories into the same executable, either they make sure they don't conflict with each other or we can no longer reuse the code. The strategy, then, is that you can't just use whatever names you want; you have to carve out your own part of the namespace and stay within it. What else? There needs to be alignment on the level of static analysis coverage, which is becoming critically important for security. There needs to be alignment on the level of automated unit test coverage. This is needed for verification of the integration and verification in the build environment, especially as we start building these components with different tools.
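A minimal sketch of client-controllable namespace management, assuming a hypothetical ring-buffer component: the internal helper has `static` linkage so it never enters the global namespace, and every exported name is built from a prefix macro (`RINGBUF_NS`, hypothetical) that the integrator can override at build time. This is one common C technique, shown for illustration only.

```c
#include <stddef.h>

/* The integrator may override the prefix, e.g. -DRINGBUF_NS=vendor_rb_ */
#ifndef RINGBUF_NS
#define RINGBUF_NS rb_
#endif
/* Two-level paste so RINGBUF_NS is expanded before concatenation. */
#define RB_CAT_(a, b) a##b
#define RB_CAT(a, b) RB_CAT_(a, b)
#define RB_NAME(n) RB_CAT(RINGBUF_NS, n)

#define RB_CAPACITY 4

/* Exported type: becomes struct rb_buf with the default prefix. */
struct RB_NAME(buf) {
    int data[RB_CAPACITY];
    size_t head;
    size_t count;
};

/* Internal helper: static, so it never collides with another component. */
static size_t next_index(size_t i)
{
    return (i + 1) % RB_CAPACITY;
}

/* Exported: rb_push() with the default prefix. Returns 0, or -1 if full. */
int RB_NAME(push)(struct RB_NAME(buf) *rb, int v)
{
    if (rb == NULL || rb->count == RB_CAPACITY) {
        return -1;
    }
    size_t tail = (rb->head + rb->count) % RB_CAPACITY;
    rb->data[tail] = v;
    rb->count++;
    return 0;
}

/* Exported: rb_pop() with the default prefix. Returns 0, or -1 if empty. */
int RB_NAME(pop)(struct RB_NAME(buf) *rb, int *out)
{
    if (rb == NULL || out == NULL || rb->count == 0) {
        return -1;
    }
    *out = rb->data[rb->head];
    rb->head = next_index(rb->head);
    rb->count--;
    return 0;
}
```

If two integrated repositories both shipped a component like this, the integrator could compile one of them with a different `RINGBUF_NS` to rename its exported symbols without editing its source.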
Now, one consequence of alignment is that if you want your code to be reusable in a particular target system, you almost always have to meet the demands of that target system. This one's pretty general: meet the restrictions for secure coding, such as MISRA C:2012. The question then is which rules, advisories, guidance, and directives you follow. The answer: the ones chosen by the integrators. It's a hard challenge, but that's what we face when pulling things together. Design: in our particular case with Zephyr, whatever we integrate needs to be designed to work under preemptive scheduling. When code comes in that assumes cooperative scheduling, we have to redesign it. When code comes in that assumes a private address space and it's now running in the kernel address space, we may need to redesign it. These really are inhibitors to reuse. What else? It has to use the OS constructs appropriately. Some of the things we have to deal with are: when is it a mutex, and when is it an IRQ lock, for providing mutual exclusion? What preemption priority is the code expected to run at? What is involved with polling? Does it expect to hang onto the stack or give it up? Lots of things like that are concerns, especially in a resource-constrained RTOS environment. What other design issues? Providing configurability and extensibility? Absolutely. Defense in depth, a security practice: are we always checking that a pointer is not NULL before we dereference it, and that the parameters passed into a function are valid? For every call to a function that returns a value, are we checking the return value and acting on it appropriately? Aligning on error reporting and handling.
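The defense-in-depth checks described above can be sketched as follows, assuming a hypothetical lower-layer call `storage_read()` and a hypothetical wrapper `config_load()` that use errno-style negative return codes: every pointer and size is validated before use, and every return value is checked and propagated rather than ignored.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical lower-layer call that can fail; fills dst with stand-in data. */
static int storage_read(int slot, unsigned char *dst, size_t len)
{
    if (slot < 0 || slot > 3) {
        return -EINVAL;
    }
    memset(dst, slot, len);
    return (int)len;
}

/* Defensive wrapper: validate every input, check every return value. */
int config_load(int slot, unsigned char *dst, size_t dst_len, size_t want)
{
    if (dst == NULL) {
        return -EINVAL; /* never dereference a NULL pointer */
    }
    if (want == 0 || want > dst_len) {
        return -EINVAL; /* reject sizes that would overrun the buffer */
    }
    int rc = storage_read(slot, dst, want);
    if (rc < 0) {
        return rc; /* propagate the failure instead of ignoring it */
    }
    if ((size_t)rc != want) {
        return -EIO; /* a short read is treated as an error too */
    }
    return 0;
}
```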
So is it okay, on any error, to just reset the device or kill the application? The answer is: not always. In some target systems, you may have to keep running code and be able to put the mechanics into a safe state, such as when you're controlling a train. Do you fail safe? Do you fail secure? Do you fail gracefully? All of those things matter, and if the code doesn't fit the target's answer, you either have to modify it or use something else. Let's go a little further. Requirements need to be explicit, thorough, and complete. This really got driven home to me when I recognized that an empty list of formal requirement statements means "do nothing." If you started with an empty list of formal requirements and said this is what I need the product to be, a solar-powered calculator doesn't meet the requirements, because when you touch it or press on it, it does something. But a rock does, or a brick does. This is also a paradigm shift that we're all going through: what are the requirements? How do we capture them? How do we describe them? What terminology do we use? Do we all agree on what they are? Requirements will need to be traced to specific higher-layer requirements. Why is this in there? Is it dead code? Is it unreachable code? Requirements need to be under tracked change control; you have to know why they changed. And the management of the requirements needs to include explicit definitions of the terminology used. That reminds me of a story that probably doesn't matter here, but we have to know exactly what's being talked about, especially with open source. Let's go a little further: the path forward. How do we engineer the path forward? After some reflection, I came up with two statements that seem to really distill all of this.
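One way to let integrators choose between resetting, failing safe, failing secure, or failing gracefully without touching the reused component is to route unrecoverable errors through a single handler whose policy is selected at build time. This is a minimal sketch under that assumption; the `hypo_` names, the policy set, and the `HYPO_FAULT_POLICY` macro are hypothetical, and the handler only records the chosen action in a state struct rather than driving real hardware or calling a platform reset API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical integrator-selectable fault policies. */
enum hypo_fault_policy {
    HYPO_FAULT_RESET,        /* restart the device */
    HYPO_FAULT_FAIL_SAFE,    /* drive outputs to a known safe state */
    HYPO_FAULT_FAIL_SECURE,  /* deny further access */
    HYPO_FAULT_DEGRADE       /* disable only the failed feature */
};

/* Build-time default; an integrator may override it with -D. */
#ifndef HYPO_FAULT_POLICY
#define HYPO_FAULT_POLICY HYPO_FAULT_FAIL_SAFE
#endif

/* Hypothetical device state; real code would act on hardware instead. */
struct hypo_device_state {
    bool outputs_safe;
    bool access_locked;
    bool feature_enabled;
    bool reset_requested;
};

/* Applies the given policy by recording the action it would take. */
void hypo_fault_handle(enum hypo_fault_policy policy,
                       struct hypo_device_state *st)
{
    if (st == NULL) {
        return;
    }
    switch (policy) {
    case HYPO_FAULT_RESET:
        st->reset_requested = true;
        break;
    case HYPO_FAULT_FAIL_SAFE:
        st->outputs_safe = true;
        break;
    case HYPO_FAULT_FAIL_SECURE:
        st->access_locked = true;
        break;
    case HYPO_FAULT_DEGRADE:
        st->feature_enabled = false;
        break;
    default:
        st->reset_requested = true;  /* unknown policy: most conservative */
        break;
    }
}

/* Entry point the reused component calls on an unrecoverable error. */
void hypo_fault_report(struct hypo_device_state *st)
{
    hypo_fault_handle(HYPO_FAULT_POLICY, st);
}
```

An integrator would override the default with a compiler flag such as `-DHYPO_FAULT_POLICY=HYPO_FAULT_FAIL_SECURE`, leaving the component source unchanged.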
A software product line is the current best-practice model for what we're trying to achieve with reuse of software in embedded systems, open source or not. Engineering the software is the best-practice approach for perpetually meeting the reusability requirements of a software product line. What do we mean by engineering the software? Take a look at the Software Engineering Body of Knowledge (SWEBOK) at the link provided. We are really talking about formal software engineering. It is a process already required for many safety-critical systems. Embedded software for internet-connected devices is becoming increasingly regulated and will need to be developed no less rigorously, in both private and open source environments. It's a mouthful, and I've wrestled with this for a while, but this is the conclusion I keep coming back to. So let's sum up where we've been. We've reviewed what a software product line is and is not, and some of the costs incurred in creating and maintaining one. We've evaluated the suitability of the Zephyr Project ecosystem as the basis for a software product line for resource-constrained secure devices, and identified three core paradigm shifts needed for maturity: embrace extensibility, embrace reuse, and embrace composability. Beyond Zephyr, we identified key issues that need to be broadly addressed for continuous integration of multiple open source projects into a software product line. Looking back on all that, we've concluded that the best-practice process for perpetually meeting the reusability requirements of a software product line is to engineer the software projects for specific target domains. Thank you.