and to be alive. So hi everybody, this is the first meeting of the Cloud Native Special Interest Group. As you probably know, the Jenkins project has introduced a new initiative in order to have smaller teams working on particular areas, and this team is working on everything related to Cloud Native and especially to Pluggable Storage. So today we will have short introductions and then we will talk about the external task logging story. It's one of the foundation stories we defined in 2016 for Pluggable Storage. So we will start from that, and I hope that then we will have regular meetings to talk about various aspects of Cloud Native Jenkins. So yeah, I guess that's the introduction. So maybe we could start with short introductions of all the people who are on the call. I'll share my screen. Okay, do you see my screen now? I bet so. So yeah. Yes, we're in the matrix. Well, I even have slides. Yeah, so if you want to join the meeting, there is a link to the chat, jenkinsci/cloud-native-sig on Gitter. If you join this chat, you will see the link, and feel free to use this chat to post any questions. We will be monitoring this chat, so you can just ask questions there. And we also have a meeting notes document. It's here, and in these meeting notes we will track the entire discussion. So my proposal is to just start with introductions, as I said before. So yeah, probably I should just try myself first. Yeah, so my name is Oleg Nenashev. I'm a Jenkins core maintainer, so I do a lot of stuff on the core side, and previously I used to maintain large-scale instances. So all stories related to Jenkins scalability and reliability are really important to me. That's why I decided to join this special interest group and that's why I wanted to discuss some projects like external logging in this group. So yeah, that's me. Okay, Jesse. Sure, I'm Jesse Glick. I'm a longtime Jenkins contributor and work at CloudBees.
And let's see, I've done a lot of work on Jenkins core in different aspects and on the sort of basic infrastructure for the Jenkins Pipeline build system. And for a while now I've been working with Oleg and Carlos on the scalability of Jenkins storage, so, for example, the external artifact storage. Thanks, Jesse. So let's start from the left. Alex? Hi, I'm Alex Nordland. I work in DevOps at Sinober and I'm interested in scaling Jenkins and getting things externalized to SQL. Mm-hmm. We had a discussion with Alex about configuration storage, so it's also one of the stories which we will be discussing at the next meetings. Okay, I lost the chat again. So, Liam? Yeah, hi. Just turn over here for a minute. Hello, I'm Liam Newman. I work at CloudBees. I'm a Jenkins technical evangelist. Well, not DevOps, a CI/CD advocate. And I'm interested in scaling Jenkins. I've mostly seen the sort of limitations of what one Jenkins master can do on its own, and I'm interested to see where we can go if we remove some of those limitations. Okay. Rick? Yeah, by the way, this is editable for everybody, so you can just edit your self-description. Okay, sorry for the interruption. Hello, I'm Rick and I'm working on a DevOps project at a company in China. And we hope to find a way to get at logs in a nicer way. Yes, thank you. Yeah, just please feel free to edit this chat. So, who else do we have on the call? Carlos? Yeah, this is Carlos Sanchez. I'm working on the externalization of data in Jenkins. I've been contributing to Jenkins for some years now, and I guess as a chair of this SIG you'll see more of me in the future. Yeah, right. Oh, we've got so many people on the call. Ewelina? Hey, I'm a continuous integration, continuous delivery consultant at Praqma, a Scandinavian-based company. And currently I'm working on the Jenkins Configuration as Code plugin.
We were actually considering having a separate group for that one, but we've decided to check if it maybe makes more sense to be part of the Cloud Native group. So let's see how that goes. Yeah, it's still my extra item to raise this discussion. Thank you. Yeah, Tracy. Hi, this is Tracy, Director of Open Source Community at CloudBees. And yeah, just generally looking forward to Jenkins going more cloud native. Vic? I'm back. I'm Vic Iglesias. I work at Google Cloud. I've been using Jenkins for a long time and I've been evangelizing best practices for Jenkins on GCP for a few years now. I do a lot of the publishing of articles and tutorials on our site and externalizing that through conferences and things like that. So I'm just here to make sure that Google can plug into all these cloud native things that are happening. Yeah, thank you. Yeah, I'm really making notes, so please feel free to edit. And who else do we have on the call? Jeff? Hi, my name is Jeff Pierce. I'm a development manager at GoDaddy and we're in the process of moving our stuff from OpenStack to AWS, so we're interested in moving our Jenkins cluster as well. I've been a plugin developer for about a year, so I'm also interested in contributing. Thanks everybody for the introductions. Have I missed anybody? Just a second, I'll paste the link into the chat. Okay, the link is pasted. Okay, so let's go back to the SIG introduction. Actually, I already said some bits, but generally if you want to know more about the SIG, you can find information using the links on the slides. So as you may have seen, we have a Gitter chat. We also have a mailing list where you can find all the ongoing discussions. It's jenkins-cloud-native-sig, and there are some threads already, and it's just the first meeting, so I hope that we will get more and more discussions there. Then what else do we have?
Yeah, actually, we have some discussions in the developer mailing list and I recently also posted my vision about what the Cloud Native SIG would do. So today we've heard some inputs and interests from SIG participants, and if you want to contribute your vision, and if you want to weigh in on what you would like to see being delivered by the Cloud Native SIG, just respond to this thread, Cloud Native SIG vision and priorities. So let's have a discussion there. So what do we do today? Today we have a pluggable storage discussion in general and an external logging overview, and then we will be able to discuss all the other stories we may want to cover at the next meetings. So, as you've seen, we have submitted several major things, and you may see that there is a number of existing JEPs. Several ones are related to external logging. So firstly, it's UTF-8 support for Pipeline build logs. Then there is external build logging in Jenkins core, one pending JEP for the external logging API, and then external log storage for Pipeline. All these things are linked from the mailing list, and I've already got some feedback from SIG participants, so I assume that you know the context of these JEPs, but I'm going to make a really, really short introduction before we deep dive into the discussion. If you have any questions, again, please use the chat. Okay, pluggable storage. This story may mean different things to everybody, but generally it's well known that Jenkins stores pretty much everything in Jenkins home. It means that it stores logs, configurations, all kinds of metadata there. This Jenkins home is generally a location on the machine hosting the Jenkins master. Of course you can connect NFS and externalize it in such a way, but it's still a well-known scalability and performance bottleneck, because big storage sometimes means big loading time. Sometimes this data just cannot be externalized. Sometimes it's not trivial to do the backup.
So yeah, there are many concerns about these storages and there are many proposals. Actually, there are also many plugins. For example, for artifact management there are plugins like the Artifactory plugin and the S3 plugin which allowed externalizing artifacts using custom steps. Same for logging: there is the Logstash plugin which also allows externalizing logs. But the problem was that all these features were implemented in plugins. Sometimes they were implemented using various hacks and workarounds, and it was pretty difficult to configure the instance and to use that. So even if some storage can be addressed, it's still a big problem, because from the Jenkins architecture standpoint we cannot easily implement these things. Plugin developers spent a lot of time, and sometimes they hit issues with missing extensibility. For example, Alex had this XML storage story when he wanted to externalize it to SQL storage. He probably discovered that the XML file is just hard-coded as a file within the Jenkins APIs, so it would be hard to externalize that. So all kinds of these problems are in scope for pluggable storage, and pluggable storage may actually include multiple areas: artifacts, build logs, configurations, et cetera, et cetera. Two years ago, we had a contributor summit where we spent maybe three hours, and a lot of time after the discussion, talking about these stories and trying to prioritize them somehow. It was two years ago, so I don't think those priorities are still relevant; in this SIG, I would rather expect the team to define new priorities. Okay, the story we will talk about today is build logging. Just to check, did everybody take a look at the designs and the proposals? If you haven't taken a look, just say so, and I will probably spend more time describing how it works. Hello? I think that means everyone's willing to continue on. Unless people are muted and just...
Yeah, I think it would be cool to just see an overall overview of what the design is. Maybe not a deep dive, but just the integration points maybe. Okay, let's do that. So yeah, for build logging at scale there are two major problems. The first one is that all logs go through the master. What does that mean? If your agent produces a build log, for example one megabyte or so, it firstly streams the entire log to the master, and then the master saves it to disk. If it's one agent, it's fine. If you have hundreds of agents, it means that at any time you may have pretty high network traffic and pretty high traffic to the disk. Moreover, you can have a pretty big RAM and CPU overhead, because you need to buffer these logs and process them, and there is the Remoting protocol involved which sends all this data. So effectively it hits all the resources, and one of the biggest problems is that Jenkins itself doesn't prioritize traffic. It means that if you send gigabytes of logs from agents, some system traffic may get lost, and it may lead to instability of the entire agent ecosystem. We hit that multiple times before. Another problem is browsing of logs, because when users browse the log, each time they go to the console the Jenkins master has to load this log in order to display it. And probably you've hit the situation when your users forgot about closing browser pages. Then there were hundreds of browser pages browsing logs, and each of these browser pages consumes some master resources. Eventually even such browsing may hit the stability of the system. So for example, we had a case when we had a network disconnect in our office. People had open browser pages, and these pages had automatic reload. When the network was restored, we had something like a few dozen log browsing sessions pulling in all logs, and it basically caused a DoS on the master side.
So these issues are well known. Actually, we were discussing them in 2016; yeah, there were lots of discussions. And the idea was that instead of sending all logs to the disk, we would rather send them to external storage and then display them from the external storage. So you may see some arrows there. The main idea is that nothing goes to the disk; the master and agents send logs directly to the storage, so there is no extra traffic between master and agent. And then users will be able to browse the logs directly from the console without utilizing the master. For example, we could implement some viewer which would use JavaScript in order to retrieve the information on the client side and just display it. So that was the idea in 2016, and we prototyped it. You may see that there is just a screenshot from my presentation's demos; I will show this demo later if you're interested. But the idea is trivial. So in 2016, we created some prototypes for Elasticsearch. There were several contributors. And in 2018, this winter and then in spring, we revised the stories. We refactored some pipeline storage; there are a lot of pull requests created by Jesse. And finally, we created a new demo, so it's something you could try if you're interested. This demo is shipped as a docker-compose setup; you can just run it. And this demo allows doing external logging using Elasticsearch. So yeah, I'll probably show it later; I just run make run. Maybe I should just show it. Okay. Yeah, I'll show it later then. I was switching between the demos and I have similar container names. And by the way, Oleg, I have a demo of the pipeline side of the stuff with CloudWatch. Any time, if you will. Okay, then we can demo CloudWatch and then I can demo Elasticsearch if somebody is interested. Okay, so regarding the JEPs, the current vision is that we have a logging system. This logging system will be generally based on the external logging API plugin.
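The direct-to-storage flow described above can be sketched in a few lines. This is an illustrative model only, not the actual Jenkins plugin APIs (the class and method names here are made up): agents append lines straight to an external store, and a viewer queries that same store, so the master never relays or buffers the log data.

```python
class ExternalLogStore:
    """Stands in for an external backend such as Elasticsearch or CloudWatch."""
    def __init__(self):
        self._events = []

    def append(self, job, build, line):
        self._events.append({"job": job, "build": build, "message": line})

    def query(self, job, build):
        # The log viewer hits this directly; the master is not involved.
        return [e["message"] for e in self._events
                if e["job"] == job and e["build"] == build]


class AgentLogSender:
    """Runs on the agent; streams lines directly to the store."""
    def __init__(self, store, job, build):
        self.store, self.job, self.build = store, job, build

    def write_line(self, line):
        self.store.append(self.job, self.build, line)


store = ExternalLogStore()
sender = AgentLogSender(store, "my-pipeline", 42)
sender.write_line("Started build")
sender.write_line("Running tests...")
```

A browser-side viewer would then call something like `store.query("my-pipeline", 42)` to render the console, which is the point of the design: both writing and reading bypass the master entirely.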
So it's just a top-level plugin which offers high-level interfaces, and plugin implementations can extend the API and extension points in order to send logs and to browse them. We expect most of the complexity to be based around here. And yeah, we still need to do some bits on the core side, because currently you cannot easily do external logging for freestyle projects. In 2016, we created an implementation based on the Logstash plugin, and we were able to work around most of the issues in Jenkins core, but it was a kind of weird hack. Instead of that, it would be rather preferable to have a pure implementation in Jenkins core with extra APIs added. So, for example, it would be possible to define the logging method and log browsers for runs, and then all APIs would be handling that. In addition, there are pipeline patches; most likely these pipeline patches will be much more complex than the Jenkins core patches themselves. The idea is that a durable task has a different implementation, and in order to utilize external logging in pipeline durable tasks, you need something different. Jesse will talk briefly about how it's implemented later. But generally, for pipeline it's possible to create an external logging engine without utilizing anything in Jenkins core at all. So it's not clear how these stories will interact with each other, because currently we have an independent implementation for JEP 210 which doesn't utilize the Jenkins core patches, and we have Jenkins core patches which also provide some overlapping API. But the good thing is that this complexity is hidden by the external logging API plugin. So we just offer some rich methods for pipeline, and then final implementations don't need to care about that. So yeah, this is the current vision of how it would work, and this is how we implemented the JEPs. And let me just jump in here.
The links to these JEPs are posted in the chat and also in the notes for the meeting, if anyone's interested. Yeah. If you're looking for those links, they're in the docs and also in the chat. Yeah, thanks, Liam. So in addition to that, we have some related JEPs. As I mentioned, the UTF-8 logging for Pipeline, and there is an open discussion about whether we want to have UTF-8 in Jenkins core. Let's talk about it later. Okay, regarding the implementations, currently we plan three base implementations. One is file-system-based storage for Jenkins core. This implementation is just what we have now, with exactly the same behavior: logs are streamed through the master, and logs are stored on the file system. This compatibility thing is required so that nobody notices changes when they update Jenkins to the version with the new APIs. And then we have two reference implementations in prototypes. One is based on Elasticsearch. This implementation is a generic one: you can run it in the cloud or in a classic setup, it doesn't require a specific environment. And it's fully available now; it's posted on the Jenkins site. The other implementation has been done by Jesse for Fluentd and CloudWatch. It's also available now. It's AWS native; it's actually AWS CloudWatch. The idea is that we show a cloud native implementation which is highly efficient for a particular service, and some AWS-focused projects, for example Jenkins Essentials, could use this implementation and get benefits from that. So these are the current reference implementations we plan, and if you are interested in implementing something else, let's discuss that on the call. I have a question on the Fluentd and CloudWatch plugin. Is it possible to build off of the Fluentd setup there to do other providers with just Fluentd?
I know that most things can run through Fluentd; even Elasticsearch and Stackdriver and a bunch of other things can be configured. Yeah, that's possible. Actually, I guess I could take that. At least in the current prototype, the Fluentd-specific code is so short, it's pretty much irrelevant. Actually, most of the complexity is on the CloudWatch side, plus the underlying changes to pipeline logging and so on. But yeah, somehow or another, you need to have some way for Jenkins to send data and some way for it to receive data if and when required. And the sending side in this case is pretty simple. Yeah, right. Cool, and then a question on the viewing side. How is authentication handled? Is the idea already to have kind of browser-specific JS to pull the data, and how would authentication work for each user? That's currently handled in the Jenkins master. That might be clearer if we do a demo of it, so maybe we might want to hold off on that sort of question until after we have a quick... Maybe I should just... So we have JEP 207, and the story is explicitly covered there: client-side-only log browsing. We had some demo implementations for Kibana in the case of Elasticsearch, and Jesse also has an implementation for CloudWatch. But in the scope of the current JEPs, we decided to put this question aside. So it's possible, but we do not add specific APIs for that. But we can talk about how it's implemented, and it's pretty trivial. In the current reference implementations, for both Elasticsearch and CloudWatch, we route log browsing through the master for now. Does it have an option to ingest and in a... Well, we looked at options for Fluentd to ingest the data from the backend, but then you need to query the backend in a specific way to say, whenever you're browsing a job, you want to see the logs for just that job, things like that that Fluentd doesn't do.
Fluentd is more like just streaming everything, so there's no easy way to use it for consuming data; you have to go to the backend. Yeah, right. So speaking of the design: what I did in the design is define separate abstraction layers for the logging method and the log browsing method. It means that, for example, if you know that you send the log to Logstash but browse from Elasticsearch or something like that, you can explicitly define different logging and browsing methods and it would work. So technically we can create more than one log browsing implementation depending on what backend storage you use. It's included in the design now, and that's how it would work. Okay, so since I've already presented this slide, this is how the current demo for Elasticsearch is implemented. We have the Elasticsearch external logging plugin. It has one implementation for logging and one implementation for log browsing. Previously, as I said, I had an implementation for Kibana, but for now I removed it. Maybe I will restore it if time allows. Originally this implementation was based on the Logstash plugin, but in the current version I removed this dependency because the Logstash plugin has some issues. For example, JEP-200; there are also some missing public APIs, so the plugin would need a patch, and for the reference implementation I decided to have a clean engine. This plugin depends on the external build logging API, which implements all the stuff. So these implementations are really small: they just query the data and send it back using common interfaces, and the external logging API does the main job. So everything can be there, and I can show you the code. It's pretty simple. Regarding CloudWatch and Fluentd, the approach is pretty similar. The current draft doesn't use the external logging API because it was created as a reference implementation for pipeline log storage.
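The two abstraction layers just described (a logging method chosen independently from a log browsing method) can be sketched as follows. These are hypothetical names for illustration, not the real extension points: the point is only that the writer and the reader are separate objects that happen to share a backend, so either side can be swapped.

```python
class LoggingMethod:
    """Abstract: how a run's log lines are written out."""
    def write(self, record):
        raise NotImplementedError


class LogBrowser:
    """Abstract: how a run's log lines are read back for display."""
    def read_all(self):
        raise NotImplementedError


class ListBackend:
    """Shared storage standing in for an Elasticsearch index."""
    def __init__(self):
        self.records = []


class BackendWriter(LoggingMethod):
    def __init__(self, backend):
        self.backend = backend

    def write(self, record):
        self.backend.records.append(record)


class BackendBrowser(LogBrowser):
    def __init__(self, backend):
        self.backend = backend

    def read_all(self):
        return list(self.backend.records)


backend = ListBackend()
# A run is configured with a writer and a browser independently;
# e.g. you could pair a Logstash-style writer with an Elasticsearch browser.
run_logging, run_browsing = BackendWriter(backend), BackendBrowser(backend)
run_logging.write("step 1 done")
```

Because the two sides only agree on the record format, multiple browsing implementations can coexist for the same stored logs, which is the property the design above is after.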
But yeah, I think that for the final implementation we would rather start using the external build logging API as well, especially if we want to support both Pipeline and freestyle projects. Okay, so these are the two implementations. Current state: all JEPs are submitted. Three JEPs have been already accepted as draft; one JEP is approved. So I guess the only thing left here is to just integrate it. Yeah, I'm actually working on that right as we speak; it's getting its number. Okay, thank you. So yeah, all these JEPs are ready to go. We have started some discussions in the mailing list. I think we will need more discussions because there are some design edge cases. Regarding reference implementations, the core API and file system storage are ready for review. So it's here, in jenkinsci. So yeah, here's this thing. I hope that we could integrate it as a beta API shortly so that we can start evaluating the plugins. But yeah, it's something to be discussed. Regarding the rest, all the other things are also published and ready for review. For example, the external logging API is here. The build is not passing; I need to fix something. But it's here, so you can take a look at how it works. It has some integrated mock tests. And finally, we have the Logstash plugin. It's also ready; it's here. You may see that the code is pretty big now, but just to clarify, this code includes all kinds of integration tests, demos, et cetera. And some bits still need to be moved to the external logging API. For example, we define an extra JSON convenience layer, because many common external logging implementations use JSON, so it would be reasonable to have it in the external logging API. These things still need to be moved. But this demo basically started working, and I'm pretty confident about that. Regarding pipeline storage, we also have everything in place and pending reviews.
I forgot to paste the link here, but as Jesse said, everything is linked from the JEP. So here's JEP 210; you go here, and here you can see a number of implementations. You may see the durable-task patches, a number of pipeline plugins, and here's a link to the reference implementation. Okay. So just regarding the design in general, I think it makes sense to discuss these questions once we finish with the demos. One of the questions submitted by Alex was about multi-destination logging, where we can send logs to multiple destinations at once. Then we have a pending question about switching abstract project builds to UTF-8: whether it makes sense, whether we are ready to invest in that. Then we have an open concern from Jesse about the events API. The concern stated is that Jenkins mostly uses data streaming APIs in all its implementations, and introducing an events entity may increase the complexity of the implementation. My current design introduces events at the level of the external logging API, so they do not exist at the Jenkins core level, but for all external logging implementations I expect these events to be implemented. So it may be something complicated. Then we need to understand whether we need different logging method and log storage entities, because it's one of the differences between the reference implementations. My implementation in the core introduces different abstraction layers; Jesse's pipeline implementation has a single log storage extension point which does most of it. And it depends how we share the complexity, et cetera. And the final question I noted is about depending on external plugins, because what I needed to do is depend on the unique-id plugin. The story behind that is that in Jenkins you can rename jobs, et cetera, and when you rename jobs, you need to ensure that the logs are still somehow accessible. So in Jenkins, there is a plugin which allows assigning unique IDs to jobs, et cetera.
And I used this plugin in order to inject unique identifiers for jobs and builds. But yeah, it's still an external dependency, and it's a design concern which we could discuss. So these are the questions from me which I proposed discussing, but I think we could take a look at them after the short demos. What do you think? Yeah, let's see the demos first. Okay, this one will take a little bit longer I think, so... Yeah, right. So then I'll let Jesse present his Fluentd CloudWatch implementation, and then regarding the Elasticsearch one, it's a pretty boring demo because it just works, but it is elastic. Okay, so let's start with Jesse's demo. Nope, nope, nope, okay. Let's get to this one. You should be able to see this. Nope. Yeah, that looks good. Great, okay, so this is an implementation of external storage of pipeline build logs in CloudWatch. From the code perspective, it's a single plugin as you can see, but it has a number of dependencies on various patches to parts of pipeline. So here's the workflow-api plugin, the workflow-job plugin, durable-task-step and the support plugin. So there are several pipeline core plugins that have been patched for this. If you're not familiar with this version number notation, these are called incremental versions. Basically, the idea here is that we're able to put together prototypes like this, where every time anybody pushes a commit to a pull request that passes CI, you automatically get a publicly usable quasi-release of that plugin that you can depend on. So that makes it easy to play around with experimental APIs like we've put together, even before they've gotten merged by plugin maintainers. And basically it just adds one extension point, which in a nutshell says that whenever you have some build log messages coming out of a pipeline build, then they should go to this Fluentd logger, which as I was mentioning before is pretty dumb and simple.
It's basically just creating this output stream that says whenever you get a complete line of text, then put together this little JSON record with the information about the build number, which pipeline step it's running in, and a couple of other bits of metadata. And then this is just going through a library called Fluency that sends to Fluentd. And that's basically all it does. So as you can see, it's quite short. And then if and when somebody tries to browse the logs from Jenkins, which might never happen, because they might just be getting a green check mark in GitHub or something and never pay attention, but if they do, then we make some API calls directly using the CloudWatch Logs API. So this is coming from the AWS SDK. And this part is a little bit more complicated. The meat of it is that we need to create a filter query in CloudWatch Logs that looks up a given log stream, and then it does some query patterns to pull out particular fields in the JSON. Then it takes all of these log messages and packs them together into basically a plain text stream in a format that Jenkins core is comfortable parsing and rendering as HTML. So if we go to an example completed build here: I've got a pipeline build that's run in a couple of branches. Part of JEP 210 is some minor enhancements to the classic UI log view, which is especially helpful for showing demos: you can see which parts of log output are coming from different branches, or even individual steps and things like that. So we have some messages coming from this branch that's running locally on my laptop, and it's running some shell scripts and stuff like that. And then we also have another branch that's running on a make-believe agent. So it's still physically on the same machine, but I'm using the Remoting protocol, so it's running in a different Java virtual machine.
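The sending side described at the start of this explanation (buffer bytes until a complete line, then emit one JSON record carrying the line plus build metadata) can be sketched like this. The field names are illustrative only; the real plugin emits via the Fluency library in Java.

```python
import json


class LineToJsonStream:
    """Buffers written text and emits one JSON record per complete line."""

    def __init__(self, sink, build, node):
        self.sink = sink          # stands in for the Fluentd forwarder
        self.build = build        # build number metadata
        self.node = node          # which pipeline step produced the output
        self._buf = ""

    def write(self, text):
        self._buf += text
        # Only complete lines become records; partial lines stay buffered.
        while "\n" in self._buf:
            line, self._buf = self._buf.split("\n", 1)
            self.sink.append(json.dumps(
                {"message": line, "build": self.build, "node": self.node}))


records = []
stream = LineToJsonStream(records, build=50, node="8")
stream.write("compiling...\npartial ")
stream.write("line finished\n")
```

The important behavior is the line buffering: a record is only shipped once a full line exists, so the receiving side always sees whole messages with their metadata attached.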
And then it's again running a shell step, and I'm doing things like using the withCredentials step to do some stuff with passwords, and you can see that the passwords are getting masked, and running some other stuff that takes a little time. So if you want, we can run a fresh build. This is a little bit slow because I'm running Jenkins in development mode, so everything is artificially slowed down here. But if you go to console output, you can see that we're getting some live console output. So there's a shell process that's running, and it's continually streaming output directly from that agent to Fluentd. Then Fluentd is collecting log messages and sending them on to CloudWatch Logs. And then the Jenkins master is polling CloudWatch Logs and asking for new events that it can display, and so then it's showing an incremental log display. And then if you want to, yeah. Maybe you're about to say this, but I was going to ask: for this demo, did you write these both as one plugin, or are they two separate things? This is one plugin, yeah, it handles both sides of it. As I said, the Fluentd side of it was pretty minimal. And a lot of the details are actually about communicating: the exact format that Fluentd is sending has to exactly match the format that will appear in CloudWatch Logs. I'm just figuring, though, in a non-demo state, sort of what Nick was asking earlier: having the thing that sends the logs as one plugin, and having whatever storage backend that you want would probably be two separate plugins. If we have the external logging API, it would be possible, because as Jesse said, sending logs and reading logs are interdependent: you need to define how to store fields, et cetera. But in the external logging API we have the JSON layer, and this JSON layer unifies that. So for thin implementations, you can just make them completely separate, and then they will be working using the same API.
Right, so somehow or another, you need to make sure that the particular choice of fields and their format and so on is lined up between the sender and the receiver. In this case, it's all just within one plugin, but in Oleg's prototype, that common format is defined in an API plugin instead. Got it, got it, okay. And this is also partially for demo purposes, possibly useful: it also adds a sidebar link that jumps directly to the AWS console, so let's see if this opens up. Yep, okay, so this was my build number 50, which is now completed. So it's constructed a query in CloudWatch that's going to a particular log stream, which is named after the job. If you have some multibranch project within a folder or whatever, you'll get a corresponding log group name here that's just based on the full path in Jenkins. And then it picks out all records with that build number, so that it filters it to only this particular build of the job. As you can see, let's get rid of the time, because it's just taking up some space that we don't really need to have there; that's a little bit better. So as you can see, the plain text of each line is sent in a message field in the JSON. Currently, we're recording a timestamp separately to work around a bug in one of the Fluentd plugins. This would probably go away, and we would instead use the source timestamp that's defined by Fluentd and also honored by CloudWatch to keep track of the order of events. You may have noticed that there are classic Jenkins UI annotations in here. So you see certain things like stuff that's in gray, because it's part of the metadata of the build. And you also see some things like hyperlinks, which work like they would in any Jenkins installation. So those are all separated from the message, so that we could build log viewers that just show the plain text. And then the particular, I'll expand this, you can see a little bit more detail.
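The query shape just described (a log group named after the job's full path, then a filter that narrows events to one build) can be sketched like this. The filter-pattern string mimics the CloudWatch Logs JSON filter syntax, e.g. `{ $.build = 50 }`; the prefix and the local `filter_events` stand-in are assumptions for illustration, since the real plugin issues the query through the AWS SDK.

```python
def log_group_name(prefix, job_full_name):
    # A multibranch job inside a folder yields e.g. "folder/repo/master",
    # so the group name follows the full path in Jenkins.
    return f"{prefix}/{job_full_name}"


def filter_pattern(build_number):
    # CloudWatch Logs JSON filter syntax: match records whose build field
    # equals the requested build number.
    return "{ $.build = %d }" % build_number


def filter_events(events, build_number):
    """Local stand-in for what a FilterLogEvents call would return."""
    return [e for e in events if e.get("build") == build_number]


events = [
    {"build": 49, "message": "old build"},
    {"build": 50, "message": "Started"},
    {"build": 50, "message": "Finished: SUCCESS"},
]
selected = filter_events(events, 50)
```

Rendering the console is then just concatenating the `message` fields of the selected records in timestamp order, which matches the "plain text stream" handed back to Jenkins core described earlier.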
The actual data that's parsable essentially only by Jenkins is kept in a separate field in the JSON. You can't really do anything with this outside of Jenkins very easily, because it's a base64-encoded, cryptographically signed version of a serialized Java object that tells it what HTML to render there. So that's — I would think that if you browse the logs outside of Jenkins, you can get the raw text at least. Right. So the raw text is available as a separate JSON field. So if we do build an external JavaScript client that somehow gets an API token suitable for accessing CloudWatch directly, then we could just pull out this message field and ignore the annotations. It does record the build number, as I said before. Node is a reference to which particular pipeline step this came from. So if you go to the pipeline steps view in Jenkins — so that was the line that, as a node block, said running on remote or something like that. So branch node, yeah. So for this log message here, if you can see the URL here, this eight is the representation in the pipeline of which step it was running. And this single line of text is basically produced by doing a CloudWatch query that says shell.node equals eight. I think if you do — yeah, there it is. So basically when you display this page — and the same would be true of Blue Ocean; I haven't loaded that up in this demo, but when Blue Ocean is showing you per-step log output, it's making essentially the same query and only loading those messages from Fluentd. All of this — if you go to the build record in Jenkins, you see that there is no log file in Jenkins. Other kinds of Jenkins-specific metadata are still being stored, because we haven't worked on those things yet, but there is nothing stored in Jenkins home that contains any of this log data. It's all in CloudWatch. And another thing you can see is that we're doing some password masking here.
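The per-step query being described ("shell.node equals eight", scoped to one build) amounts to filtering the stored records on two fields. Here is a small sketch of that filtering over plain maps, with field names chosen to mirror the demo; the real implementation runs this as a CloudWatch Logs query rather than in-process.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Toy version of the per-step query from the demo: select the messages of one
// pipeline step by filtering on build number and the step's node id.
public class StepLogQuery {
    static List<String> messagesFor(List<Map<String, String>> records, int build, String nodeId) {
        return records.stream()
                .filter(r -> String.valueOf(build).equals(r.get("build")))
                .filter(r -> nodeId.equals(r.get("node")))   // like "node = 8" in the demo query
                .map(r -> r.get("message"))                  // plain text; annotations are a separate field
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Map<String, String>> records = List.of(
                Map.of("build", "50", "node", "8", "message", "+ sleep 5"),
                Map.of("build", "50", "node", "3", "message", "Running on remote"),
                Map.of("build", "49", "node", "8", "message", "old build line"));
        System.out.println(messagesFor(records, 50, "8")); // only the shell step's line from build 50
    }
}
```

Blue Ocean's per-step log view would issue essentially the same two-field query, which is why those fields need to be indexed or at least efficiently filterable in whatever backend stores them.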
So if you go take a brief look at the pipeline that it ran — or you can use the replay link — I'm using the withCredentials step to do some password masking, and this is inside an agent node block, so this is supposed to be running remotely, not on the Jenkins master. That password masking is still effective. So even though the shell script actually printed "receiving" and then some secret text, it got replaced by asterisks. And if you go back to the overall query, you can see that this is in fact masked out on the CloudWatch side. So before the message was sent to Fluentd, the password masking was done, and that was actually done on the agent side. Something else that's done in this reference implementation is we're defining the sender field. And this is pretty important, because this goes to one of the key performance aspects of the whole system: this writeReplace method — this is Java serialization — is called when we're sending a handle to the Fluentd log output over the remoting channel to the agent side. And when we do that, we're keeping track of the fact that this is a different instance in a different Java virtual machine, and every time we send a message we're recording that in the log record. And so you can see some things that are running, say, from a shell step on the master, or say that they're sent from the master node; but when we got this one, we saw that it was sent from the remote node, and that meant that this log message and all of the work — all of the use of the Fluentd logging library — was done entirely on the agent side. So we never transferred this text or anything related to it back to the Jenkins master. The master will only ever see it if we display the UI for it. Yeah, and technically it works for almost every console log filter. It's a standard extension point in Jenkins core, so if the log filter is serializable, it can be executed on the agent side.
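The writeReplace trick described here is plain Java serialization: when the logger handle crosses the remoting channel, serialization swaps in a copy whose sender field marks that subsequent records come from a different JVM. A minimal, self-contained sketch (class and field names are illustrative, not the plugin's real types):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// When this handle is serialized (i.e. sent to an agent), writeReplace substitutes
// a copy tagged with a different sender, so later log records can say where they came from.
public class SenderTrackingLogger implements Serializable {
    private static final long serialVersionUID = 1L;
    final String sender;

    SenderTrackingLogger(String sender) { this.sender = sender; }

    // Invoked automatically by Java serialization; the agent deserializes the replacement.
    private Object writeReplace() { return new SenderTrackingLogger("agent"); }

    // Simulate sending the handle over a channel: serialize, then deserialize.
    static SenderTrackingLogger roundTrip(SenderTrackingLogger l) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) { oos.writeObject(l); }
            try (ObjectInputStream ois =
                         new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
                return (SenderTrackingLogger) ois.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SenderTrackingLogger onMaster = new SenderTrackingLogger("master");
        SenderTrackingLogger onAgent = roundTrip(onMaster); // what the agent ends up holding
        System.out.println(onMaster.sender + " -> " + onAgent.sender); // prints: master -> agent
    }
}
```

In the real plugin the replacement would carry actual channel identity rather than a fixed string, but the mechanism — serialization-time substitution — is the same.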
Right, so essentially this is part of one of the patches that was done in pipeline code: we have basically a fresh set of code that's implemented for sending log messages from pipeline steps. And it produces something that can be safely transferred over a remoting channel. And among other things, it assumes that the task listener is something that can be safely transferred. So in the case of the Fluentd implementation, this Fluentd logger object is serializable, so it's safe to transfer. And we also check that any console log filters, like the one that masks passwords, are also serialized and sent over to the agent side. So basically the agent becomes autonomous: it can just sit there and handle 300 megabytes of output and send it right on to Fluentd, without any communication with the master after the step starts. One of the things we should be aware of here: we check only that it's marked serializable, we don't check that it actually can be serialized, because Jenkins has its own stuff like the JEP-200 deserialization protection. And you probably should ensure that whatever we send, if it's rejected by JEP-200, doesn't break the rest of the logging. Well, JEP-200 won't apply here, because we're sending something from the master to the agent. There could be cases where something is marked serializable but it's really only expecting to be serialized in a different context — that's another story. Yeah, we probably will at some point need further APIs, or at least some functional tests or something like that, to check if there are problems with existing console log filters. But at least for the withCredentials step it worked without modifications to that plugin. Also for mask passwords and some other bits, but there are many things which implement this functionality. And in the JEP-207 design we presumed that some compatibility may be broken, and it's explicitly written in the design.
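The caveat raised here — "implements Serializable" is only a marker and doesn't guarantee the object actually serializes — can be guarded against by attempting a trial serialization before shipping a filter to the agent. A hedged sketch (the fallback policy and names are assumptions, not what the plugin currently does):

```java
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;

// "instanceof Serializable" is just a marker check; actually writing the object
// catches filters that drag in non-serializable fields.
public class SerializabilityCheck {
    static boolean reallySerializable(Object o) {
        if (!(o instanceof Serializable)) return false;
        try (ObjectOutputStream oos = new ObjectOutputStream(OutputStream.nullOutputStream())) {
            oos.writeObject(o);   // throws NotSerializableException on a bad nested field
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // An honestly serializable "filter"...
        Serializable good = "mask-passwords-filter";
        // ...and one that claims Serializable but holds a non-serializable field.
        class Bad implements Serializable {
            final Thread notSerializable = new Thread();
        }
        System.out.println(reallySerializable(good));      // true
        System.out.println(reallySerializable(new Bad())); // false
    }
}
```

A check like this could back the functional tests mentioned above: a console log filter that fails the trial write would stay on the master side instead of silently breaking agent-side logging.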
So we will be tracking these issues and documenting them, like we did for JEP-200 before. Yep. I think that's all of the important parts of the demo. If anybody wants to ask questions about what I just showed — if you haven't already written anything up, I don't see anything in the chat. I don't see any questions in the chat. Anyone who hasn't asked questions yet? Okay, good. Just checking. Okay. So thank you, Jesse. I'm really eager to see where that goes. Yeah, right. So we have 30 minutes left. So what we could do: we could briefly discuss the design concerns, or I could show the External Logging API part. From the behavior side, everything is expected to work pretty much as Jesse has presented, but instead of doing it on the custom implementation side, you just get a pretty simple API which you implement, and everything starts magically working, because the rest is handled by the External Logging API plugin and Jenkins core. So instead of having to create your own message format, the API will specify that. Yeah, right. So I can briefly show it to you. Sure. Okay, so do you see my screen? Yes. I should point out that creating a message format is literally a few minutes of work. Oh yeah, but if we have a consistent one across all of them, then that's sort of... Yeah, right. So when I say a message format, it's not only about the event. You may see that the event is pretty simple — it's just a message, a timestamp, and whatever ID — but we still need to send additional metadata, and this metadata depends on the log type. So although in the design we do only build logging, we keep in mind that eventually we will need logging for other types of events: for example, agent connections, or multibranch repository scanning, or whatever. So the implementation in JEP-207 is a bit more generic. So in addition to that, we need console annotations, which should be injected, and yeah, even this data is being generated automatically.
So what we expect from the final implementation: it actually should implement only one method. It should produce an external logging writer. So yeah, here. And you may see that the publicly accessible method, when you create a writer, also injects some metadata: for example, the build number and the job ID, which we need to retrieve the data. And if an implementation provides this method, everything starts working out of the box. And this method in the current implementation is actually just writeEvent. So whatever you need to do, you just write this event, and it's up to you how you handle it. There will be an additional abstraction layer for JSON events, so finally it will be simple. And of course this logic will be doing all the things like processing streams, because one of the complex parts is that you need to process incoming streams. So for example, currently it's in the Elasticsearch plugin, but it will be moved. What you do is that you sit on the output stream — you receive the stream, but you need to parse it, and then you need to extract console notes and put them somewhere else. So these implementations will need additional logic, which I'm trying to put externally so that you get better API layers. And finally, yeah, so how it looks. So currently this is the writeEvent for Elasticsearch. So effectively I get an event and convert it to JSON, which I send to Elasticsearch. But this code will also be moved to the External Logging API. So effectively you will just need to implement a push method which puts the data externally. Obviously for Elasticsearch it may finally be a bit more complex, because you need to handle indexes, especially if you want things like index rotation, but the event-sending part should still be simple, and the same goes for browsing. So this is the whole idea of the External Logging API stuff. And yeah, if we switch back, you may see that there is a bridge method for pipeline. So the External Logging API implements pipeline log storage.
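The division of labor described here — the API layer builds the event and attaches metadata, and the storage implementation supplies only a single writeEvent/push method — can be sketched roughly as below. All names are illustrative assumptions, not the actual External Logging API types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch of the proposed split: the API layer assembles the event with its metadata,
// so an implementation only provides one method that pushes it somewhere.
public class ExternalLoggingSketch {
    interface LoggingMethod {
        void writeEvent(Map<String, Object> event);  // the one method an implementation supplies
    }

    // What the API layer would do for you: attach the metadata needed to retrieve
    // the data later (job, build number, timestamp), then delegate.
    static void log(LoggingMethod method, String job, int build, String message) {
        method.writeEvent(Map.of(
                "jobId", job,
                "buildNumber", build,
                "timestamp", System.currentTimeMillis(),
                "message", message));
    }

    public static void main(String[] args) {
        // A trivial "push" implementation standing in for Elasticsearch or CloudWatch.
        List<Map<String, Object>> fakeBackend = new ArrayList<>();
        LoggingMethod method = fakeBackend::add;
        log(method, "folder/job", 50, "hello");
        System.out.println(fakeBackend.get(0).get("message")); // hello
    }
}
```

The point of the design is visible even in this toy: two completely separate implementations of `LoggingMethod` receive identically shaped events, so sending and browsing can be developed independently against the same format.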
So it means that the current implementation can work with both freestyle projects and pipeline, and this bridge method should ensure that it's transparent to the user. So at some point, when for example pipeline implements new Jenkins core APIs, this implementation can just be replaced and everything will continue working transparently. Okay, so yeah, that's the idea. So this is a big class which integrates almost everything, which adds all these convenience layers. It also introduces extension points so that it becomes configurable from the user standpoint. So for example, there is an external log browser factory, and the same for the logging method. So finally, you would be configuring both of them either from the UI or from the Configuration as Code plugin. So how it looks: actually, I haven't converted my demo to Configuration as Code yet, but here are a number of Groovy hook initialization scripts. So here what I do is set up what would be the logging method and the browsing method for my instance, and it's applied globally. If you need something more complex, you can create another factory implementation which would take into account, for example, folders, or which would merge several logging methods, as Alex proposed. So all of these things are possible, but it would require a bit more complex configuration than what I've created now in the reference implementation. Okay. Yeah, I mean, I wanted to point out that, at least in the JEP-210 reference implementation, the actual sources of complexity and bugs — including one I just fixed today — mostly surround the particular mechanics of the log browsing.
I'm not sure if I want to go into a huge amount of detail, but basically, if you look at the current implementation, it's fairly complicated to figure out exactly how to get exactly the right events from CloudWatch efficiently, and to pick the particular point when you're doing incremental log browsing, because all of the Jenkins APIs are oriented entirely around plain-text output streams, with sort of an assumption that things are being stored in a binary file. And so you have to do a fair amount of work to basically fake that out and produce equivalent behavior without it, and we'll probably need to make some fundamental changes in core to the code that displays logs, and in the Blue Ocean plugin as well, to allow plugins like this to do something more efficiently. For example, currently we're just taking all of — let me have this. Yeah, like right here. This unfortunate loop basically just has to get all of the log records for a given build, and we basically can't pay any attention to the byte offset that Jenkins passes us, because that concept doesn't line up at all with what CloudWatch provides. So CloudWatch provides pagination and a token system so that you can retrieve log messages by page, but we have no way of hooking that into what Jenkins actually does. All right, I have pretty much the same problem: you may see that I have a passed offset, but at some point this offset magically disappears, because we cannot do much with it in Elasticsearch either. And another problem which hits us hard is eventual consistency. Since we report logs in parallel from agents and from the master, especially in pipeline, we do not really know what the order is. So API things like "since" just do not make sense, because when we pull in logs once, it doesn't guarantee that there will be no events posted for earlier time frames after we query the data, because in Elasticsearch it may take a few seconds before everything is synchronized across the cluster.
In Fluentd and CloudWatch, it's pretty much the same, because we push the data to Fluentd and then it takes a while until it flushes the buffer to CloudWatch. So you get eventual consistency present there as well. Yeah, so for CloudWatch, we are able to handle that, but it requires special code: we need to go back and double-check with CloudWatch whether it has actually received a given event that we had sent to Fluentd earlier, and if not, we need to tell Jenkins that we don't yet have a complete build log and that it should keep polling. Yeah, right. I started implementing this API as well. So most likely the eventual consistency engine will go into the External Logging API plugin too, because I found no good way other than doing pretty much the same thing Jesse does. And to some extent, this is implementation-specific. So in the case of Fluentd and CloudWatch, currently it's using a JSON field, but we would actually want to use the native timestamp field for the record once that has millisecond precision. Yeah, there is a question whether millisecond precision is actually enough for logging, because milliseconds in Java are not really milliseconds, especially on a distributed instance. So... There can be clock skew between master and agent. Yeah, so it's something we will need to consider eventually, but there is no good solution. The only thing is that this happens even now — we just put everything into a single stream, so we never notice it. Right, yeah. So for now, for CloudWatch, I'm simply sorting everything by source timestamp, under the assumption that the clocks are reasonably close. And so you get a build log that shows an interleaved view according to the source clock, basically. Yeah, right. So as you may see, there are some issues behind this that we are trying to solve. That's why I originally decided to do the External Logging API.
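The completeness check described here — go back and verify the backend has actually received everything we handed to the collector before declaring the build log final — reduces to comparing a locally known sent-count against what the store currently shows. A minimal sketch under that assumption (the real code works with CloudWatch queries, not an in-memory list):

```java
import java.util.List;

// Sketch of the eventual-consistency check: we know locally how many events we sent,
// so we can tell Jenkins the log is incomplete until the backend shows at least that many.
public class CompletenessCheck {
    static boolean logComplete(long eventsSent, List<String> eventsVisibleInBackend) {
        // The buffer between us and the store (Fluentd, ES cluster sync) may not have
        // flushed yet; fewer visible events than sent means: keep polling.
        return eventsVisibleInBackend.size() >= eventsSent;
    }

    public static void main(String[] args) {
        System.out.println(logComplete(3, List.of("a", "b")));      // false: flush still pending
        System.out.println(logComplete(3, List.of("a", "b", "c"))); // true: safe to finalize
    }
}
```

A count comparison like this is only sound because the sender side knows exactly how many events it emitted for the build; it says nothing about ordering, which is why the source-timestamp sorting discussed above is still needed on top.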
So we don't bring all this stuff into the Jenkins core, and that way we keep it as simple as possible. Okay. So what do you think about such an approach? I think it sounds pretty cool and very useful. Thank you. I had a question earlier about code coverage and test results — was that you? Yeah, it was, definitely. So that's actually being tracked as separate efforts. Jenkins has a very different system for managing things like test results and code coverage. That's also certainly something we're interested in. I've already started some research and bits of experimentation with test results; that's not going to look anything like this, and we would likely not be using the same backend at all. Well, we may be using the same backend eventually, but whatever we do will be pluggable. So it means that if somebody wants to implement, for example, SQL-based storage, it will be possible. Right. And yeah, speaking of code coverage, together with Jeff and Shengyu and several other mentors, we are working on a Code Coverage API plugin, which introduces a kind of unified data model for code coverage plugins. So it would be the first foundation step to externalize this data, because our current problem is that each plugin implements the storage on its own. So for example, it's not even always XML — the JaCoCo plugin uses a non-XML format. So we wouldn't be able to use the XML storage abstraction layer proposed by Alex, and it's a major issue on this front. But once we have the Code Coverage API plugin, it's something which would be plausible — but it's not even designed yet. So it's just somewhere on the list since 2016. Okay, just a second, I'll show the entire list of stories we had on the table. Okay, so back to my slides before I lose that. Yeah, so we have several data types, and there are test results, but obviously test coverage would be somewhere around here as well, because coverage is also a part of the test data, usually.
Okay, any other questions? Not in the chat history. Actually, I have a question, not really technical, and I'm sorry if you already mentioned this, I just did not get it. So what's the maturity of the things you're presenting? Is this released? Will it be released in the predictable future? Okay, these are just drafts at this point. They're just starting at that stage, right? So for pipeline storage, everything is available through incremental releases. So if you want to just try it out, you can use existing tooling like Docker, et cetera, to install these plugins and see how it works. But generally, all of this is proofs of concept for now. We don't have a fixed timeline for making it available in the update centers. So yeah, we agree that we need to do that, and we will be working towards it, but there is no ETA. Okay, very good, thanks. Yeah, so the idea is that once we have something stable, we publish it, and we go from there. And for the Jenkins core changes, I hope we merge them earlier, so that we will be able to use better APIs, like we do for other stories like Artifact Manager. Okay, I'm asking because those ideas seem to be very cool, and the moment I present them to my colleagues at Praqma, I'm pretty sure that that would be the first question they ask. So I just wanted to clarify what's shipped. But yeah, I get it, I understand the answer, so thank you. Yeah, so we have demos even now. These demos are just in flux because we change APIs; if you want, you can try the links in the chat and in the meeting document, but we will continue improving them later. I have a question for the people on the call: are people interested in these Elasticsearch and CloudWatch backends, or in a different one? We have meeting notes somewhere in our Google Doc, so let's start filling these things in. Maybe Jeff — you asked the questions — are you thinking of using one of these, creating a new one, or would you be interested in a different backend?
In terms of storage? Yeah. No, this is fine. You mean the ELK stack? Yeah, ELK stack or CloudWatch or something else. Yeah, we store our logs in ELK right now in AWS, so this seems like it fits perfectly. Okay. Anyone else? Yes. Alex said that he's looking towards moving everything to SQL. So Alex, if you could clarify your vision, it would be great. Yeah, he's muted, I think. Okay. Yeah, I think in chat he said he's looking at Postgres. SQL is not going to be a great option for storing logs. Yeah, so our main problem is that everything is pluggable. So in the current event API, at least we have this map of String to Serializable, but actually what we could do is convert it to String to String, so that at least you would be able to implement a kind of key-value storage in SQL. So it would become more flexible. But with String to Serializable, I don't believe that SQL would be easier. Well, I think you would just want a custom table definition with proper columns for all of that. I mean, I don't think in general that a storage provider would want to be given an opaque map of data. I think it would be expecting a particular set of fields, such as timestamp, message, build number, step number. You would store the event object with this data as its attributes. Yeah, I mean, for the CloudWatch Logs implementation, we are sending a JSON string to Fluentd, and then CloudWatch is storing that and allows you to do JSON field-based queries. But there are specific fields that are called out directly in the plugin code as being meaningful. And then it's significant what those fields are on the CloudWatch side as well, because, again, you could have an implementation of client-side log browsing that's expecting certain fields to be in certain places. And you have expectations about what kinds of filter searches you can do efficiently, and things like that.
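The argument being made here — give a storage backend well-known, typed fields rather than an opaque `Map<String, Serializable>` — can be illustrated with a small typed event that maps directly onto table columns. Field and column names below are illustrative assumptions, not a proposed schema.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A typed log event with the fields called out in the discussion: timestamp,
// message, build number, step. A SQL backend can map these straight to columns.
public class TypedLogEvent {
    final long timestamp;
    final String message;
    final int buildNumber;
    final String stepId;

    TypedLogEvent(long timestamp, String message, int buildNumber, String stepId) {
        this.timestamp = timestamp;
        this.message = message;
        this.buildNumber = buildNumber;
        this.stepId = stepId;
    }

    // What a SQL logging method might do with it: one column per known field,
    // ready for an index on (build_number, step_id) to make per-step queries cheap.
    Map<String, Object> toColumns() {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("ts", timestamp);
        row.put("message", message);
        row.put("build_number", buildNumber);
        row.put("step_id", stepId);
        return row;
    }

    public static void main(String[] args) {
        TypedLogEvent e = new TypedLogEvent(1234L, "hello", 50, "8");
        System.out.println(e.toColumns().keySet()); // [ts, message, build_number, step_id]
    }
}
```

With an opaque serializable map, none of these columns could be indexed, which is the efficiency concern raised for both the CloudWatch and the hypothetical Postgres backends.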
So I think the same would be true of a SQL backend like Postgres: you would expect to have a table with predefined columns, properly indexed, so that you could efficiently run a query, say, on build and step, like we're doing in the CloudWatch implementation. Yeah, right. So it would be easy as long as the only loggable object is a Run, but actually we could have other implementations. So for example, we could create a log browser or whatever for any kind of task in Jenkins. In such a case, you may have to create separate tables for each kind of loggable, which is probably doable, but it will require some work. I'm not sure that you do. I mean, all of Jenkins logging basically uses the same TaskListener-type interfaces. It's all basically sending text streams with console notes. Right, but you will need to categorize the data. So for runs, we can easily say that there is a build number or so. What would we say about a periodic task, for example, or about an agent connection log? For all of these types, we will need different categorization data, and this data will need to be indexed if you want to get at least some feasible performance. So I expect there will still be some differences in the SQL structures. Well, the only difference between build logs of runs in Jenkins and the other things that you mentioned is that the other things have no notion of history. Whereas builds have a numeric history sequence and you want to record the build history, for things like agent connection logs, branch re-indexing and so on, we could just omit that field, or we could set it to the timestamp of the start of the build, or of the start of the log, or something like that. But in general, for those things, Jenkins doesn't provide any notion of a historical log; it's just an ongoing log.
Fair enough, but from the administrative standpoint, it may still be valuable to have access to this history. Yeah, so you could, right. But that can also be provided pretty simply by something like a starting timestamp. So we're already storing a timestamp in the backend whenever we log a line of text. We could simply say that if you're displaying something like an agent connection log, the logic would just be: apply a filter with timestamp greater than whatever you recorded as the start time of the agent connection. In the backend, you would see the aggregated log from all reconnections of a given agent by name. Yeah. In the UI, you would see only those from the current connection. All right, but it still seems that going after SQL will require a lot of design. However, even for document-based storage — maybe I'm wrong. I don't think it intrinsically requires any special design. I mean, what we're doing for CloudWatch is just a set of JSON fields containing flat strings. It would map directly to a table structure in SQL, but we would need to have a stable common API that we can prototype these things with as well. I'm not sure how widespread the desire for a SQL backend is. I know at least one person who also wants to get that. Yeah. So does anybody else have opinions about the backend? Anybody? Sounds like no, right now. Okay, so if you have any opinions, just put them there. Yeah, so... I'm sure more will come along. Yeah, right. So what are our consumers? As I said before in the presentation, we have several projects like Jenkins X and Jenkins Essentials. So they seem to be natural consumers of all this stuff. And likely we will be working with these teams in order to define how it would look for them, and maybe we will invite them to our next sessions. So any other feedback regarding the designs? Yeah, that's it from me. Yeah, right. So as you may see, there is still a lot of work ahead.
We started synchronizing these stories, and when we have a kind of stable implementation with all the bits in the External Logging API, I will do another demo so that we can see how it would look in a real plugin implementation. Okay, yeah. We don't have much time left. There was another item on the agenda: what would we like to discuss during the next meetings? So in addition to the stories we announced in the Cloud Native special interest group, would you like to put something on the agenda for the next meetings? Just a second, I'll screen-share again. Okay, so what do we have now if we go to the Cloud Native SIG page? Here you may see that we have three stories in the priority list: log storage; artifact storage, which has already been released; and configuration storage, which we'll likely start soon — I have an action item to publish some foundation designs for that. So these are the main three stories we have on the table now. And things mentioned by Jeff, like test result storage, may also get onto the list. I think we should add test storage for sure. Yeah, so that's likely an action item for Carlos. So if anybody has opinions on what needs to be done by the Cloud Native SIG, just comment on the mailing list. So here is the Cloud Native SIG vision and priorities. If you want to add something to the agenda, just put it here and we will try to discuss it. I tried to provide my feedback in this blog post. There's lots of text, but what I've tried to point out here, in addition to status updates, is that there are some other stories I would like to consider. And Jeff, you may see that the Code Coverage API is actually also there, so I'm interested. And it also may make sense to start looking at other architecture changes, because the end goal mentioned by Carlos when he created the SIG is to move towards things like high availability, rolling upgrades, et cetera.
So it's not just about externalizing the storage: by moving storage outside alone, we won't make Jenkins cloud native. So there are more stories to follow up on, and we could start building a backlog of these stories and somehow prioritizing them. However, it's hard to say when we get there. Okay, so does anybody want to discuss any additional stories? Okay. It seems like we've probably covered enough ground in this meeting — we're at an hour and a half. There's a lot that's gone on here for us to sort of grok. Maybe that's good enough for the first meeting, and we should schedule the next one and have an agenda for that. I think this is all us kind of getting up to speed. Yeah, right. Something I wanted to ask next is how we organize the meetings. Because as a SIG, I expected us to have at least one meeting per month or so. But at the current stage, we could do the meetings more often, because we already have some history we could talk about, like artifact management, and we could continue deep-diving into external logging, because it's hard to discuss everything in a single meeting. So one of the options would be to just have weekly or maybe biweekly meetings. So how would we approach that? What do you think? Carlos, if you're talking, you're muted. Yeah. Weekly seems too frequent to me, given the scope of stuff being discussed. I got a call at the same time. You are talking about the frequency of the meetings? Yeah, weekly seems too much. We can either do it every month or so, or whenever we have new things to work on or people want an update. I can see there may be other people interested in knowing when this goes live, or, once the JEP is approved, how do I implement my own backend driver, or something like that. Well, it seems that there are also a number of open issues to be discussed. So it might be good to start with at least one meeting every week or two, to kind of move those discussions along. Yeah, I agree. So what we could do — let's schedule.
So for external logging, I would be interested to have a meeting in two weeks, with a status update and with some progress on the core side. But if needed, we can also do an intermediate meeting — we can sync up in the chat on that. So let's schedule something in two weeks for external logging. Sounds great. I'll send a Doodle then. Or maybe just pick this time again — it seems reasonable. I would say the first option. Yeah, this time isn't exactly comfortable for me. Oh, okay. Yeah, I would rather prefer another time. Okay, so we'll discuss it. I would rather propose to go through another scheduling process and define a regular time slot. Because this time I just sent a Doodle for one day, so I didn't get so many slots where we have overlap; but if we schedule recurring meetings, let's just have another vote with the entire week as options. Okay, I will do that. Cool, all right, so are we done with this meeting? Yeah, I guess so. All right. Then I will stop the broadcast. Thanks to everybody who participated in the call. If you have any follow-ups, let's discuss them in the chat or on the mailing list. And see you in two weeks. Cool, great, thanks, Oleg. Thanks. Thank you. Thanks.