 So let me now formally welcome you and introduce the session. This is the session on security coordination and policy of the EOSC-hub week. So first, the question: why do we do security? Why are we talking about security? I think people are well aware in this day and age that the internet is a hostile environment. The fact that we're doing federated access, with distributed infrastructures, with open scientific communities and computer networks, leads to a somewhat special threat landscape in which we live, and the security risks of that need to be well assessed and well managed. The aims, of course, are to maintain the confidentiality, integrity and availability of the services and all of the data, and everything we do is aimed in that sense. There's an information security management team in the EOSC-hub project, in WP4, sub-task four, and we are charged with coordinating security issues across the infrastructures that participate in EOSC-hub. Our primary aim is to avoid security incidents, but when security incidents inevitably still happen, we are charged with coordinating the handling of those between the various security teams across the infrastructures. And it's a pleasure that we have a fairly large number of members of the team who you're going to hear from today, talking about various aspects of that. So in terms of the agenda: we'll start with Vincent, who will talk, as an opening example, about a recent security incident. Then there'll be a number of talks looking at what we do to avoid security incidents. What lessons do we learn from incidents, and assessing risks, from Urpo. Linda will talk about handling software vulnerabilities. I will talk about trust and policies. And then there's a group of people, I think it's Romain and David talking, who are going to talk about threats and sharing intelligence amongst the communities. 
Then Sven will go into handling security incidents, and Romain will end by talking about how we can improve coordination amongst the security teams. So we're going to start with a Slido poll, just to see who we've got in the room. I can see the numbers are going up; we've got 50 participants now. So if you can go to slido.com and select "19th of May, security coordination". This is for everybody participating, please. And then I guess I want to show this; we could show this as well. So if I stop sharing that and now share this... now you should see the question: "What is your current role? Select all that apply." You've got six possible choices, and we can watch the answers: two out of 55 so far. It's fascinating to watch this grow as time goes on, right? But please, please do all of you participate, because it's good for us to know who we've actually got in the audience. We're still only getting a small fraction of the people, but we'll leave this poll running while we go on. "Sorry, could you share the link for the Slido in the chat so everybody can access it?" OK, thank you. It's just slido.com; it's on the slide: join at slido.com, enter the EOSC-hub week event code, and then you have to select the room, "19th of May, security coordination". OK, thank you. OK, so about half the people have answered so far. What's interesting is that we have a really nice spread across all the categories. It's at least good that the "other" category is the maximum, in that I've forgotten important roles that you have. Anyway, I will leave that poll running, so please carry on answering. And now I will hand over to Vincent, who's going to start us off and talk about a recent security incident. And I'll stop sharing. So Vincent, over to you. 
Right, so I will be presenting a recent security incident that happened in our communities, and why I think cooperation is important. Let's start with a disclaimer first. The incident occurred within the last year. It's a real incident, and I will be presenting it from EGI CSIRT's point of view. Unfortunately, as this forum is quite public, I will not be able to share many details, and you will see in particular that I will not be sharing the names of the affected places or victims, or the precise dates. To be clear also, to the best of our knowledge there were no EGI or EOSC-hub resources directly affected by this incident. But some of the affected resources were close to us, and some of our users were among those affected. And last but not least, this presentation is not about the incidents affecting HPC right now; it's an older incident. So let's see how this started, or rather how we learned about the incident. First of all, one site, let's call it A, discovered some compromised systems. What was bad for us was that part of what was compromised was what we call login or user interaction nodes, which are what users use to connect to and access all the systems, which means that user credentials were stolen. Thankfully, this site decided to share these details with the collaboration and other sites, and we are really thankful to them for that. In particular, they directly contacted another academic place, which we'll call B, to tell them that the malicious connections they had initially seen came from B's own systems, and they recommended looking at connections from the affected systems to find further malicious connections. We learned about it at EGI through cross-team membership, because between the different security groups in Europe we do have cross-membership to try to collaborate as much as we can, but we were not directly affected, so initially we were not directly involved. 
However, we were later contacted by B, which was at the time the apparent source of the attack, and which had been trying to analyse the malware they found and to manage the security response. They were asking for help to notify the victims and coordinate the incident, because we have quite some good contacts there. So we started helping as much as we could to coordinate the response, which meant building up a central view of the incident with all the victims, the IoCs, and the details we could collect; broadcasting those IoCs within our community; trying to identify the victims; and last but definitely not least, every time we had some detailed suspicious activity linked to someone, sharing it directly with the victims, or through the national initiatives and the incident response groups. However, let's first take a small interlude on what we were actually seeing. By the end of the incident, we had three pieces of malware that had been analysed and fully reverse engineered, thanks to a collaboration between different people, in particular security teams that have nothing to do with us, and we are really thankful for their help. We had a malicious sshd, a backdoor providing hidden root access, so there was nothing to see when you were looking at it from the compromised host itself. We also had a malicious SSH client, which was stealing credentials and storing them into a local file: the user, the password, and the remote host name for any outgoing admin connection. And there was a malicious bot, which was connecting to some command and control servers, waiting for incoming commands, in particular commands to harvest the credentials of the local system. In terms of malicious activity, we were able to identify some privilege escalation mechanisms. For example, there was at least one zero-day, which has been communicated to the vendor and is now patched. 
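A trojaned sshd like the one described is, by design, invisible from the compromised host itself, so one common counter-measure is to verify system binaries from a known-clean environment against trusted checksums. Below is a minimal illustrative sketch of that idea (the file paths and the "known good" digest set are purely hypothetical; real deployments would compare against package-manager metadata or vendor-published checksums, and run from trusted media):

```python
import hashlib


def file_sha256(path):
    """Stream a file and return its SHA-256 hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def find_tampered(paths, known_good):
    """Return (path, digest) pairs whose digest is not in the trusted set."""
    suspicious = []
    for p in paths:
        digest = file_sha256(p)
        if digest not in known_good:
            suspicious.append((p, digest))
    return suspicious
```

A responder would run something like `find_tampered(["/usr/sbin/sshd"], trusted_digests)` from a rescue environment; anything flagged is a candidate for the kind of backdoored binary seen in this incident.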
And there was at least another case where an existing misconfiguration was exploited. But what was interesting at the time was that we found no payload functionality, no real malicious functionality: the malware was solely used for discovery, spreading and persistence. So, as I said, they were stealing more credentials to access new nodes, they were installing backdoors to keep access to systems, and they also had some tunnelling and proxy functionality so that they could hide themselves behind stolen systems. But there was no obvious end-goal activity. There was no cryptocurrency mining. There was no network abuse. And still to this day, we don't know why they came in. All right, back to the incident. What we tried to do, after identifying everyone we could, was to trace the attack back to its origin. We had quite some clear network activity: we had the connections from the bots to the C&C servers, and we had incoming management connections. Unfortunately, all of these came from systems which were outside the academic community, so it was a bit harder for us to contact the owners of those systems. Still, our security colleagues managed to convince the owners of those systems to give them access, and that in fact happens more often than you might think. These systems were then monitored to try to gather more evidence on what was going on, which meant not removing or kicking out the attackers, but keeping them in to collect more: collecting the IPs that were used to connect to the C&C, to try to trace back the next level again and again, and also collecting the IPs of the victims, either from the bots connecting to the C&C or from outgoing SSH connections from these compromised servers. And indeed we identified more victim places, reaching not only our sites but usually a bit around our sites, we would guess because people have more than just the one site they do security administration for. 
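The iterative tracing described here, collect C&C addresses, search everyone's connection logs, find new victims, repeat, boils down at each site to matching local logs against a shared indicator list. A toy sketch of that matching step (the log format and the IP addresses are invented for illustration; real sites would use their SIEM or flow data rather than raw text search):

```python
import re

# Naive dotted-quad matcher; good enough for a sketch, not for validation.
IP_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def ioc_hits(log_lines, ioc_ips):
    """Yield (ip, line) pairs where a log line mentions a known-bad address."""
    bad = set(ioc_ips)
    for line in log_lines:
        for ip in IP_RE.findall(line):
            if ip in bad:
                yield ip, line
```

Each hit is a lead: either a newly discovered victim to notify, or a fresh hop to trace back one more level.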
And also thanks to the communication, when we had specific IPs to communicate. Those victims were at different stages of infection. Some of them were fully compromised; some of them had been exploited, for example, and that case was quite bad for us, because it meant that more credentials had been stolen and we had to follow up again with more victims and get those fixed. Some of them were only compromised at user level, which means that with the stolen user credentials the attackers were able to connect to the box, but they were not yet able to exploit it further. So the incident started to be contained, and we were trying to wrap things up. However, sometimes we were not fast enough. Late in the incident, we received a list of stolen credentials. To be clear, these were shared in the interest of the users, so that we could prevent their privileges and access being abused and more damage being caused on their behalf. So we shared this information with the security contacts of the affected hosts, because the malware was storing the user, the password (which we at EGI never saw), and the remote host name. We were told that the affected users had already been contacted and asked to change their password, but we wanted to make sure that the service providers were also contacted. And in fact, in at least one case, the administrators of one of these destinations were not contacted, or at least did not receive the right message. And worse, the stolen credentials were the very ones later identified as being used in an attack. And even worse, that attack took place quite a while after the credentials were found, but before they were shared with us and with the victim. So if this had been shared earlier, the attack on that victim could basically have been prevented, and we would have had one less victim. Sorry. 
On the other side, to do all this containment we took quite a few actions. We broadcast all the IoCs in our communities, to try to get as much as possible under control and identify as many victims as possible. Every single academic system that we found affected was taken down, reinstalled, and so on. And every stolen credential that we identified was changed, certificates were revoked, and so on and so forth. So quite quickly the attackers realised that we were onto them and decided to cut their losses: they just cleaned up all the C&C systems we had access to and basically disappeared. So we lost track of them. In the end, even though we contained the incident, we will never figure out who they were and what they were after. So, going forward after the incident: if you think about it, sharing information was the key to containing it. We shared the IoCs: the malicious IPs we found, the malicious files we had, some TTPs and whatnot. But more importantly, I think, we shared information about the compromised systems and the compromised accounts, in particular with the victims themselves, and with the administrators and the actual users. And our community is small, so within EGI we have the right rules and so on, but the community, and the diversity of the community, is growing. There are already different initiatives in place: EGI, EUDAT, PRACE, and many more. And we do try to have cooperation between us, but you know, it's not as close. More importantly, with EOSC-hub growing, and EOSC growing itself, there will be many more services and many more providers coming. And because our users are largely the same, we will still have to handle all this together. So I think we need to overcome the sharing barriers between us, and there are many: fear of abuse and damage, trust issues between us, or legal issues like the GDPR. 
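Sharing IoCs in a form that others can ingest automatically is what makes this scale; dedicated platforms such as MISP exist for exactly this, but even a minimal structured record beats prose email. A hypothetical sketch of such a record (the field names and the TLP default are illustrative choices, not any community's actual schema):

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class IoC:
    """One shareable indicator of compromise (field values illustrative)."""
    ioc_type: str       # e.g. "ip", "sha256", "hostname"
    value: str          # the indicator itself
    context: str        # what the indicator means for responders
    tlp: str = "AMBER"  # Traffic Light Protocol sharing restriction


def to_feed(iocs):
    """Serialize a batch of indicators to a JSON feed for distribution."""
    return json.dumps([asdict(i) for i in iocs], indent=2)
```

Tagging each indicator with a sharing restriction (here TLP) is one way to address exactly the trust concerns mentioned above: recipients know what they may redistribute and what must stay inside the community.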
But it's very important to overcome them, so that we can handle incidents in the future. And that's all I had to present here. I'll stop there. Okay, thank you very much, Vincent. I hope people appreciate that that's a good example of the sort of problems we all jointly have to deal with in this distributed research and IT environment in which we work. And Vincent, you've also given us some ideas about what we need to do to improve things. I don't see any specific questions in the chat, although there's time for one quick question, Vincent, and other things will come up later, I'm sure. So: thanks, Vincent, great overview. How long did the response take? That's quite a complex question, I would say, because there were multiple steps. From the beginning to the end it probably took more than a month to get everything uncovered and contained. Initially, most of the response from each side was much faster than that, but in order to get the whole chain, to get access to the systems, and to get all the information provided, I think until the attackers figured it out and cleaned up the systems, in total it was maybe a month or two. But most affected systems were usually cleaned up much faster than that. Okay, thank you. At this point I think we should say thank you, Vincent, and we'll go on now to Urpo Kaila from CSC in Finland, who's going to tell us about incidents and assessing risks. So Urpo, can you share your slides, please? I will do that. Hello, everybody. It's a pleasure to meet you here, even virtually. Let's see if I can get this. Can you see my presentation now? So, I would like to complement Vincent's presentation from another angle. I was also heavily involved with the incident, on our behalf from CSC in Finland and EUDAT. But now I would like to look at security from a bird's-eye perspective. 
What are the big bits and pieces? Because I see constant confusion about this when we have complicated networks of infrastructures and communities. So I'll try to show you the big picture, and I'll start with a little task for you. I would like you all to open the chat window and reply to the questions I present here. Let's see. So we see a dispenser system here. Can you identify the security pieces? What is the management system? What is the security control for confidentiality, for availability? What are the risks? Oops, my screen is frozen here. Sorry, I have a technical problem. Can you still hear me? I need to... I guess we can still hear you, Urpo. Yeah, we have a problem when we have the VPN on; it sometimes freezes. So we see there's a kind of big picture about vulnerabilities, risks and controls, about confidentiality, availability, usability. And it's always the same thing: also when you have a computing infrastructure, it's still about risks, vulnerabilities and controls, and having a sensible setup of these. People normally get lost in the little details, because we have plenty of them. Let's see if I can open... Sorry, now we don't see your... Oh, okay. We see the... Let's see if I can reshare it. Can you see it again? I can see your screen. Okay. So we had the same thing here. And the problem is that there are a lot of different people and teams involved. It's easy when you are doing it yourself, or have a little team of five to six people, but when you have hundreds and thousands of people, you really need to organize the security so that everybody will do their own task. And you also need to be able to speak the same language and identify the roles of the people you are communicating with. Often in security incidents, like the incident Vincent told us about, it's not all public; we cannot publish everything at once, at least, so we need to know who we trust. 
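The "risks, vulnerabilities, controls" picture sketched here is often operationalised with a simple likelihood-times-impact matrix, so that many different teams can prioritise risks consistently. A toy sketch of that idea (the 1-5 scale and the thresholds below are illustrative assumptions, not taken from the talk or from any particular standard):

```python
def risk_level(likelihood, impact):
    """Classify a risk from 1-5 likelihood and impact scores (illustrative)."""
    if not (1 <= likelihood <= 5 and 1 <= impact <= 5):
        raise ValueError("scores must be between 1 and 5")
    score = likelihood * impact
    if score >= 15:
        return "high"
    if score >= 6:
        return "medium"
    return "low"
```

The value of even a toy matrix like this is shared vocabulary: when hundreds of people across sites say "high risk", they mean comparable things.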
Urpo, can you switch to presentation mode, or is that not possible? Let's try it. Is it better now? That's better, yeah, thank you. All right. So this is our new Puhti computer in northeastern Finland. My story is that it's the same picture all the time: whether it's a big system or a small system, or even an analog system, you need to have some kind of logic for how you take care of security. Incident management is where we have all started, and then you can also think about what your definition is. I know practices vary here. Somebody might like to keep it strict and focus on computer security incidents and system compromises, and that's fine. But at least in my context I prefer a broader definition: whatever goes wrong and results in unexpected anomalies of confidentiality, integrity or availability, and also in harm to our people and our service reputation, that's an incident for me. I have been tracking incidents in my organization for about 15 years, and we have seen all kinds of them. And I must say that intrusions of the kind we just heard about are actually a minority, and not even the worst kind. The most problematic are faults, unexpected failures or malfunctions which we cannot identify. Fortunately, they don't happen very often. But we also have physical incidents, data center incidents, and one special kind of event I'm also presenting to you, which I'm really afraid of, is something called data rot: something that breaks integrity slowly, without anybody noticing it. And now today we have this COVID-19 pandemic, and it's very good to have incident management in place when we meet this kind of situation, because it was very easy for our organization to follow the already well-established roles and ways of working to handle this very special situation. So my experience, my personal learning, is that incident management is a starting point, but it's mostly damage control: you cannot make your system secure with incident management alone. 
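"Data rot", the slow, silent integrity loss mentioned here, is typically countered with fixity checking: record checksums when data is written, re-verify them on a schedule, and alert on any drift. A minimal sketch of that idea (a real archive would store the manifest separately from the data and stream large files rather than reading them whole):

```python
import hashlib
import os


def build_manifest(root):
    """Map each file under root (by relative path) to its SHA-256 digest."""
    manifest = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            manifest[os.path.relpath(path, root)] = digest
    return manifest


def verify(root, manifest):
    """Return the relative paths whose content no longer matches the manifest."""
    current = build_manifest(root)
    return sorted(p for p, digest in manifest.items() if current.get(p) != digest)
```

Run `build_manifest` at ingest time, store the result somewhere safe, and a periodic `verify` will surface exactly the kind of creeping corruption that nobody would otherwise notice.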
You need, of course, to have a plan for that, covering both technology and roles, and also communication, between peers, internally, and to the media; that is over half of incident management. It's not a one-man job or a one-team job. Incident management is not only for security people; we need to involve all the roles in our communities, like admins, service managers, communications, senior management, our CSIRT teams, and our security contacts in our infrastructure. That's what just happened with this incident we heard about. Oops, sorry, I'm getting there. So what is the way to make the big picture? Well, I call it risk management, because you need to start with the risks and not with the tools. In the way we cope with risk, we are sometimes perhaps a little bit over-focused on technical controls like access controls and authentication, and we absolutely need those, but the biggest challenge, I think, is in organizing things and getting all the roles in your communities involved. You need to have some kind of comprehensive plan, and you cannot apply your security controls in random order; you need to start somewhere, and the plan should be based on risk management. So how do we do risk management? There are very established tools for that too, but in our context of research computing and networking it's a little bit of a foreign concept, compared with the commercial companies where I come from. But I feel there is increasing pressure that we should apply risk management to our services as well, and that's what we are currently doing in EOSC-hub, using a template we have developed jointly in one of our communities. It follows mostly the standard model, but it's a little bit difficult to get everybody involved. So at the end of the day, or of the project period, I would like to see that most of the core services in EOSC have undergone risk management with all the stakeholders involved. That was my little message to you. I hope I hear from you later on. If we have time, maybe we have 
time for one question still. Thank you, Urpo. I don't see any questions in the chat window at the moment, any burning issue right now; by all means carry on asking questions in the chat window during the session and we can answer them afterwards. Thank you, Urpo. Now it's my pleasure to ask Linda Cornwall to tell us about software vulnerabilities. Can you share your screen, Linda? It doesn't work as I expected; I'm trying to work out how to... ah, does that work? Yes, right. But it's not in presentation mode. Yes, right, it's a slide show. Now it's changed, but now we see your presenter view. I think if you have a double screen, you have to go to display settings and invert the two screens. I can't hear what was said. You've got two screens; you need display settings. Yes, if I unplug this... there's a problem; I can't see how to make that work. I can show the slides for you, shall I do them? OK. I can't work out how to stop it... stop share, there we go. I can't see the slides... I'm not sharing them, sorry. Now we see them, right. Yeah, well, thank you very much for inviting me to talk, and it's nice to see so many people interested in security and how to prevent incidents. One of the things that can cause incidents is software vulnerabilities that can be exploited, and then you get security incidents. So quite a long time ago we set up the Software Vulnerability Group, to minimise the risk to the EGI infrastructure arising from software vulnerabilities and to prevent security incidents; we've been handling software vulnerabilities in EGI and its predecessors for more than a decade. Next slide, please. We've got a procedure for handling vulnerabilities, which started around 2006 and has changed a little bit since. Basically, anyone can report a vulnerability to report-vulnerability@egi.eu. If it's not been announced as fixed by the software provider, we contact the software provider, and then we investigate the 
issue. If it's relevant to EGI, we assess the risk in the EGI environment and put it into one of four categories: critical, high, moderate or low. If it's not been fixed, a target date for resolution is set: for moderate it's four months, for low one year. Then an advisory is issued to sites by the SVG if it's high or critical to the EGI infrastructure, or if we're the main handler of the vulnerability, for example community software where we're the main people who look after these things and no one else is issuing vulnerability announcements, or if there's a good reason to send an advisory. Sometimes, when something's in the news and it's not relevant to us, we'll even just send information saying, well, you know, we don't need to worry about this. Critical vulnerabilities are handled with top priority; we have special procedures for those, in an approved document on how to do that. We're trying to evolve the SVG at the moment, and it's proving quite tricky. The EGI infrastructure is increasingly inhomogeneous, so you can't just say "oh, this is the effect of this vulnerability in our infrastructure", because there are so many different configurations and different bits of software in use. And there are services in the EOSC catalogue; currently we've counted 261 services in the catalogue, some of them traditional EGI services and some application services, all sorts. And we need to consider how to prevent and address vulnerabilities in this wider catalogue, to keep the whole thing secure. But we can't tell people what software to use; people are doing their own thing, doing what works for them. So for the software in use, it just doesn't work the way we've been doing things for more than a decade. So as a start, we've been saying that those who select, develop or configure software should at least consider software security. We made a 10-point checklist where you can look 
for things: common problems, common things to check, like who's the provider, and a few other things. And then, if you're selecting software or deciding how to deploy it, you could consider joining the Deployment Expert Group, where you can help look out for vulnerabilities and help tackle vulnerabilities when they do come up; it allows people to volunteer expertise. We've tried to get that going; actually we proposed it a few months ago, but we haven't really got it started yet. The idea is that services help one another stay secure and avoid vulnerabilities. Next slide, please. The main messages are very simple. If you find a relevant software vulnerability and it's not public, don't disclose it: don't put it on a public web page or discuss it in a public way; you're probably causing a lot of problems if you do. Report it to those responsible for the software without using a public forum; for example, a lot of software providers have somewhere you can report problems confidentially, most do. Sometimes we've found ones where we've had real difficulty finding whom to report something to, but that's rare. Also report it to report-vulnerability@egi.eu, if you think it's relevant to EGI; then we'll get to know about it. And if you don't feel you can report it to the people responsible for the software, we'll try to do that as well; we've done that a few times before. And if you wish to help us, and help the various services help one another, you can join the Deployment Expert Group; consider emailing me and think about that. Next slide. I think that's it, thank you. That's very simple. So thank you, Linda. I'm not looking at the chat window right now; are there any questions in the chat window? I don't see... now, one, yes. OK, there is one question that came in just now: known software vulnerabilities are often communicated via CVEs; would it make sense to become a CNA? So I didn't hear that; it's in the chat window: known 
software vulnerabilities are often communicated via CVEs; would it make sense to become a CNA? CNA, it's an acronym we don't know, Johannes. It's an acronym for an organisation that can actually issue CVEs. All right, well, we haven't in the past; we've gone for the very simple critical, high, moderate, low, and we also tend to look at the CVSS score to see whether they're high. If something's got a high CVSS score, obviously it's probably high, but sometimes in our environment, especially when we started doing the grid, some things that had a high score didn't really affect us much because of the way we used the software, and other things were probably more serious in a distributed environment than the scores suggested. But we haven't thought about that; we should look at it. So there was a question from Mark as well; let's let Mark speak. Please go ahead, Mark, you can speak. Sure, thanks. I'll just repeat what I wrote, and that is that I'm curious to see how the different infrastructures are dealing with security vulnerabilities inside containers that are brought by users, so missing patches (and this could be VMs as well), missing patches, poor design in general. I keep asking this question and no one seems to have a good solid answer, but maybe that's just the life we lead. I don't think we have a good solid answer, but for the cloud services we've tended to say you have to use approved images, and for containers in EGI we've tended to say you can only run certain images. But there is still a big area there if we are allowing people to run any software they like inside containers. Yes, of course we can't always control everything, but for the cloud we had approved images, that type of thing. The only other approach is just to assume that the containers are compromised and deal with them appropriately, the same way that many commercial cloud providers hold you responsible for what goes on: they sandbox you in such a way that you cannot do damage to anybody else. We should move on, we 
are running behind time. Thank you, Linda. Now it's my chance to tell you about trust and security policies. You can see those slides, I hope. Urpo already mentioned the WISE community; this is Wise Information Security for E-infrastructures, a collaborative activity between multiple infrastructures across the world, and we decided in EOSC-hub that it was a very good place to take our standards from, in terms of trust and policy templates as it were. As people know, security policies are one of the managerial controls you have for mitigating the security risks Urpo was talking about. Through our membership of the global WISE community, we base our policies and procedures on the WISE Security for Collaborating Infrastructures (SCI) version 2 trust framework, which was published in 2017, and on the EU Horizon 2020 AARC project, Authentication and Authorisation for Research Collaboration, which produced a very nice Policy Development Kit. We've taken our EOSC-hub policies from that Policy Development Kit, and these will be taken forward under the auspices of WISE. So there are three policies currently in the information security management process in the EOSC-hub service management system. There's a top-level security policy, which applies to everybody, setting the attitude of the infrastructure as a whole towards security and giving various people rights and responsibilities, but I won't say any more about that today. Then we've got the AUP, the acceptable use policy and conditions of use, which I'll show one slide on, and also a security policy relating to service operations within EOSC-hub. The AUP is based on the template that came out of the AARC Policy Development Kit and is now taken forward by WISE as the baseline AUP version 1. The idea of a common baseline AUP is to benefit the infrastructures and services, and indeed it makes it simpler for the end user: they don't have to read and sign a different AUP for every service they access. If there's a common baseline AUP containing all of the required 
policy statements they see that whenever they register either with their research community or their infrastructure and then new infrastructures and services are confident that the new users have already accepted this so gradually this is gaining more and more traction there are more and more infrastructures around the world who are using this and have decided to use it in the EOSCUB and so if you're looking around as a service operator for an AUP we encourage you to consider using this baseline AUP it's just 10 policy points and to which you can then add your own GDPR privacy notice and any other service conditions which are different from the add more detail to other issues to do with the AUP here are the 10 bullet points I'm not going to go through these we don't have time they're therefore complete this for you to look but they've been widely discussed by many infrastructures across the world and agreed within the wise community so that was the AUP then for service operators we have a security policy based on the again on the policy development kit this applies to all EOSCUB hub services obviously not to the other ones that onboarding into the EOSCUB catalogs but we encourage other services to to consider adhering to its requirements because it's a good statement of best practice it meets the requirements of the wise SCI version 2 trust framework and it has policy statements about the collaboration with the infrastructures and the security team and here again there are just eight bullet points if you sub bullet points addressing things like you'll see in bullet point 2 there provide and maintain accurate contact information including at least one security contact who shall support certify which is another published instant response trust framework compliant with SCI and lots of statements there which are very useful so if you are a hub service it already applies to you if you're not a hub service we encourage you to consider abiding by this so this will stop to 
So that was a quick run through where we are with policies in EOSC-hub. [inaudible] Okay, so I think at this point we move on. Now we're going to look at the security threat landscape, the sharing of intelligence and so on, and I believe, Romain, you're going to speak first, is that true? Yes, just a very brief word of introduction for David and Liviu, who are doing a lot of the work there, to explain a bit of the context. In particular, as you said earlier, in this day and age the internet is a fairly hostile environment, and as a research infrastructure we have to tackle a number of threats, stemming typically either from cybercrime, which involves huge amounts of money, so there is a high incentive for criminal organisations worldwide to attack our resources, or from nation states. As has been demonstrated multiple times, including by Verizon and other studies, around 20% of attacks last year were tied to malicious actors linked with nation states. So it means the adversaries we have to face as a community are really well funded and organised, while we of course have limited effort and resources. Especially for those sites with smaller or less experienced teams it can be very, very difficult, because most participants in our community cannot afford threat intelligence feeds or dedicated security appliances or the services of security companies, and so on. So it's pretty difficult, and even if you could afford all this, the information you really want has to be relevant and targeted for our particular community, and this is very hard. So really, at the end of the day, the only way to address this issue is to have full community-based incident response and threat intelligence. This is a really important aspect: the idea is that the first participant observing an attack shares the details and the specifics with others as quickly as possible, ideally via automated mechanisms, so
that everybody in the community can be protected. Then we also mutualise expertise to address the issue and respond to the particular attack at stake, and this is really the best means to fight sophisticated adversaries at an acceptable cost. So the message here is that threat intelligence, and sites being able to make use of it, is really the cornerstone of our operational security strategy and our security capabilities, so what you are about to hear from the next speaker is pretty important in this context. Okay, thank you Romain. David, we had a Slido poll that we could use, but I think we're running a bit behind time, so maybe we should skip that for now; if there's time at the end we could always ask the audience, and if not, we just drop it. No, I agree. Just to give people a flavour, as Romain was discussing the importance of threat intelligence, and as I try to talk and share slides at the same time, hopefully you can see that now. So as I talk to this, one of the things that we're interested in, and the poll questions we had were around this, is what people are currently doing, and even what their hopes and dreams are of what they would like to do in this context. So let me get started. Can I just ask you to double-check that you can hear me? Yes, we can hear you, and your slides are in presentation mode, so that's fine. Perfect, great. Okay, so I'm going to talk for the next five or so minutes about threat intelligence and security operations centres, really following on from what Romain was saying: this is cooperation through intelligence sharing, and as Romain mentioned, allowing WLCG sites to digest and make active use of threat intelligence is a cornerstone of the WLCG security strategy. In fact, the WLCG Security Operations Centre working group was established to enable the deployment of security tools to enable this, but although
this began as a WLCG activity, with a specific mandate there, it in fact has relevance to a much wider community, and so the working group now has members from across the academic research community, including institutional CSIRTs and NRENs as well. So it's a broader group, but with a specific focus. The mandate of the working group is to create reference designs that allow sites to ingest security monitoring data of different kinds, to enrich, store and visualise this data, and then to alert based on matches between the stored data and threat intelligence, using indicators of compromise, or IOCs, which is a very common abbreviation. These capabilities together form what you might call a security operations centre; you might also have seen this in the context of analytics for security. The working group has been working on an initial model, what you might call a minimum viable product of such a SOC, and it has four phases, which I'll show you here; there's a table with the technologies so you have that for reference. The first phase, data sources and threat intelligence, is really the core phase: it's the threat intelligence you receive from external parties and trusted partners, plus the intelligence you generate within your organisation, or want to share with different partners, and as you're developing it you make sure you have an active copy of it. Alongside that you need some kind of monitoring system, and we currently specify two of these in the model. One of them is the Zeek intrusion detection system, formerly called Bro, which offers deep packet inspection capabilities: you tap network traffic and the system inspects every packet, so it gives you very detailed, very granular logs, which are great for looking back to see exactly what happened and telling you everything that happened. The flip side is that it has a specific hardware requirement
to let you use that capability. The other source that we've currently specified is using NetFlow or sFlow or IPFIX to provide network flow metadata. This is at the other end of the scale: what we see now is metadata about particular connections, so endpoints, IP addresses, port numbers, timestamps, but not the details of exactly what's happening inside the packets. That data is very often available from a wide range of switch manufacturers, and there are host-based clients you can use, so the information is very readily available, but it offers you a different level of detail. Going back to what I was saying earlier about the range of teams in the working group: you might expect an intrusion detection system is something a larger site would run, for example CERN, which has a production security operations centre and uses Zeek, whereas at the NREN level they are not going to have that level of monitoring, but they will have really detailed and really broad-based NetFlow data. So it's really about matching your requirements and capabilities to what you need. In the next phase, now that we have our data, we're tapping the right network point or using our NetFlow, we want to ingest it. Based on its ubiquity and common use we're using Logstash, and therefore the ELK components, in the current initial model, and we specify two pipelines: one for Zeek, based on JSON logs from Zeek ingested with the Filebeat transporter; and for NetFlow or sFlow there's a really nice Logstash package called ElastiFlow, a set of pipelines and dashboards which are really effective for this purpose. Then we use Elasticsearch, fed directly from Logstash, for storage, and Kibana for visualisation.
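Both data sources ultimately yield per-connection records. As a rough illustration only (the working group's reference design does this step with Logstash and Filebeat, not Python), here is a sketch that reduces a Zeek conn.log JSON record to the kind of NetFlow-style metadata just described; the field names follow Zeek's JSON conn.log output, but the sample record itself is fabricated.

```python
import json

def flow_metadata(zeek_conn_line: str) -> dict:
    """Reduce one Zeek conn.log JSON record to NetFlow-style metadata."""
    rec = json.loads(zeek_conn_line)
    return {
        "ts": rec["ts"],
        "src": (rec["id.orig_h"], rec["id.orig_p"]),   # originator endpoint
        "dst": (rec["id.resp_h"], rec["id.resp_p"]),   # responder endpoint
        "proto": rec["proto"],
        # total bytes both ways; fields may be absent for some connections
        "bytes": rec.get("orig_bytes", 0) + rec.get("resp_bytes", 0),
    }

# Fabricated example record, shaped like Zeek's JSON output
sample = ('{"ts": 1589875200.0, "id.orig_h": "192.0.2.10", "id.orig_p": 55144,'
          ' "id.resp_h": "198.51.100.7", "id.resp_p": 443, "proto": "tcp",'
          ' "orig_bytes": 517, "resp_bytes": 4820}')
print(flow_metadata(sample))
```

This is exactly the trade-off described in the talk: the flow-level view keeps the endpoints, ports and volumes while discarding the packet payloads that Zeek's deeper logs would retain.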
The last phase, which is really important, is the alerting phase, and there are two options here. One is a set of enrichment, correlation and aggregation scripts that CERN has worked on for their own purposes but is making available generally, which allows specific alerting based on key information, packaging up the most important things you need to know. The other is the ElastAlert package, originally developed by Yelp, which lets you alert on lots of different filters inside Elasticsearch. So that gives you a very quick overview of the initial model. As you'll see, some of these elements, like Elasticsearch and Kibana, are essential, and so of course is MISP as the threat intelligence platform; some of them are optional, but you need at least one data source and at least one alerting system. Alerting is really important because you can't spend your whole life looking at dashboards, there are lots of them, so you need the system to be proactive in telling you when things are happening. Okay, so here is the technology stack. I'm not going to talk too much to this now, because I've already covered a lot of this ground, but it's there for reference, with links to the different components if you're interested, taking you through the different parts and why we're using them at each phase. Okay, let me move on. I mentioned MISP there as the threat intelligence platform, and if the active use of threat intelligence within a site is a cornerstone of the WLCG security efforts, MISP, the threat intelligence platform itself, is the cornerstone of the SOC, because this is what lets you share indicators of compromise securely with trusted partners, pull that information in, and then update it for the community. The model we're using is a hub-and-spoke model, based around a specific academic instance hosted at CERN, and this allows us to benefit from the trust relationships that CERN already has and the experience they have in this field.
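Stripped to its essence, the alerting phase just described is a join between stored events and a set of indicators. The real options are the CERN scripts and ElastAlert, as above; this toy Python sketch only illustrates the principle, and every indicator value in it is invented.

```python
# Toy indicator store keyed by IOC type; all values are made up.
iocs = {
    "ip": {"203.0.113.66"},
    "domain": {"bad.example.net"},
    "sha256": {"e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"},
}

def match_event(event: dict) -> list:
    """Return the IOC types that fire on a single stored log event."""
    hits = []
    for ioc_type, values in iocs.items():
        if event.get(ioc_type) in values:
            hits.append(ioc_type)
    return hits

# One enriched event from the store: its destination IP is a known indicator
event = {"ip": "203.0.113.66", "domain": "www.example.org"}
print(match_event(event))  # prints ['ip']
```

A production system runs this join continuously against the indexed data and raises an alert on any hit, which is what makes the system proactive instead of dashboard-driven.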
The event data contained within it is primarily TLP:GREEN and TLP:WHITE; as a reminder, that is information which should be shared within a community but is not publicly available, and information which is publicly available. In addition, CERN has some events classified as TLP:AMBER, which should only be shared with trusted contacts but give you more detailed information about a live incident, so they are particularly useful for finding issues and tracking malicious traffic at participating sites. With this service in place, there is a document detailing the rules of participation for the service, which is currently in preparation. So what's the current status of all this work? At the moment we have a number of prototype systems at different sites, including the RAL Tier 1 in the UK, in addition to a production, fully featured SOC in daily use at CERN. I haven't featured that here, but if you imagine the diagram I showed earlier with substantially more developed hardware and infrastructure, that effectively shows the aspiration for the SOC as a whole. In terms of deployment, our focus at the moment is supporting the Tier 1 sites in deploying these tools, not only as the sites likely to have more resource to do this, but also because they see the most traffic, so that's where sharing has the biggest impact. From an operational deployment perspective it also lets us gain experience with the Tier 1s which we can then filter down to the Tier 2 sites. We also ran an end-to-end test of the intelligence-sharing workflow at a workshop in October at Nikhef, where we generated a MISP event at CERN based on a demonstration of malicious traffic: we downloaded something, declared that traffic malicious, generated the event, shared it with an STFC prototype, and then demonstrated that we could cause an alert to be raised by triggering the same traffic.
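The TLP labels used in the MISP instance above encode exactly who an event may be shared with. As a minimal sketch, and with simplified audience categories of my own choosing rather than the official wording, those sharing rules can be written down like this:

```python
# Simplified audiences for the three TLP labels mentioned in the talk;
# "trusted" stands in for named trusted contacts, "community" for the
# research community, "public" for anyone.
TLP_AUDIENCE = {
    "white": {"public", "community", "trusted"},   # no sharing restriction
    "green": {"community", "trusted"},             # community-wide, not public
    "amber": {"trusted"},                          # trusted contacts only
}

def may_share(tlp: str, audience: str) -> bool:
    """Check whether an event with the given TLP label may reach an audience."""
    return audience in TLP_AUDIENCE[tlp.lower()]

print(may_share("GREEN", "community"))  # prints True
print(may_share("AMBER", "public"))     # prints False
```

The point of encoding this is that a hub-and-spoke sharing model can enforce the labels mechanically when events are pulled or pushed between instances.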
So we've seen that the system works end to end, and now we can work on growing our capabilities at different sites. Let me leave you with some contact details for more information: Liviu and I, who jointly lead the working group, are very happy to hear from anyone working on this or who wants to work on this, and you can also find more details about the working group itself, as well as publications and reporting we've done, at the web page there. So with that I'll say thank you very much and ask if there are any questions. Okay, thank you David. I don't see any questions right now in the chat window, so if not, let's now move on to Sven, who's going to tell us about incidents. So Sven, do you have some slides, can you share them? Yeah, let me try... okay, I will get there eventually; the thing is that I now probably have the wrong window on the screen. If you have them, David, that would be good. I'm afraid I don't have them. Over to you, Sven. Excellent. So welcome to the presentation on incident handling in EGI CSIRT. Since we have some new faces: my name is Sven Gabriel, I work for Nikhef, the Dutch National Institute for Subatomic Physics, and I am the EGI CSIRT security officer. Next slide please. If you come into the situation that you have to handle an incident, you will see that having a computer security incident response team comes in handy, so let's first start with a couple of brief notes on how to set up and further develop a CSIRT. Next slide please. This is actually a rather long process, and it never really stops; it is something every computing project somehow has to go through. At the beginning you will have a steep development curve, so things will go quickly. Ideally you would already have a policy framework in place before the CSIRT starts operations, because security operations in particular, by nature,
will get in the way of how users think they can use the infrastructure, and you will end up in disputes, and you don't want to have those during an incident. There you only want to present a policy: look, this is what we all agreed on, please do it like that, or otherwise have a way to escalate to the next level. When you have the basic setup of your CSIRT, you also want to be clearer about the services you want to provide and, more importantly, under which authority. If you only send advisories to your constituency, to your users, to your customers, you probably only need a very weak authority, since you can always send out information. But, as we will see in a couple of slides, when you want to do something like vulnerability management, where you require the sites to take action and want to enforce it, you already need some authority from the governing bodies you are operating under. This again depends on what services you want to provide; in EGI CSIRT we do quite a bit that also needs some authority. Further, you of course need to say something about the service level you want to provide. Do you want to do something like 24/7? Then you would need a very different budget from ours: we are doing 9-to-5, best effort, which actually covers quite a lot of the cases we have to deal with. When you have all this set up, and you also have some information compiled, for example on how to contact your CSIRT, you would then also reach out to other teams and build your trust network. Next slide please. This is a long process. If you look at the timeline of where we actually came from, this was well before the likes of Google or Amazon Web Services started: at the beginning we had the European DataGrid, then Facebook came along and meanwhile overtook us, then we had the EGEE-1 project, the series of EGEE projects, where
the goal was to set up a globally distributed compute infrastructure for research, with the customers being the particle physicists using the data generated at the Large Hadron Collider at CERN. What you also see here is that in 2003 the policy group already started its work, and then in 2005 the Operational Security Coordination Team started; this was the predecessor of EGI CSIRT and was led by Romain, who is talking a bit later. Finally, in 2010, we had EGI CSIRT, when there were some major changes in how the infrastructure, and in particular its organisation, looked. During all these years we did quite some development, starting first with indeed just a mailing list and then building up services for our constituency. Next slide please. Then there is actually maintaining a CSIRT and developing it further: you constantly have to update your services to new developments and new technologies. You want to automate this, particularly in a case like ours where you have more than 200 security teams at the resource centres; you really want to have this as automated as possible, and you also want procedures so that it more or less runs automatically, with minimal human interaction, people triggering a response for example, but for the majority of cases it should just work. An example of that is vulnerability management. Next slide please. This is the vulnerability management cycle, an example of how we actually make sure that our infrastructure is at least in reasonably good shape, so that you cannot easily exploit old vulnerabilities to escalate your privileges and abuse the infrastructure. So how does this actually work? A vulnerability is reported to us, which in our case means the Software Vulnerability Group, headed by Linda, who talked about that before, and here you have the usual steps: first you want to verify that this vulnerability is indeed
affecting our infrastructure, and assess the criticality, the level of impact it has. In our case we only distinguish between critical and high; moderate ones we do not follow up on. If we have a critical vulnerability, the information gets sent over to the monitoring team, who update their sensors and probes, and within one day we get an overview of which parts of the infrastructure are affected. At the same time the resource centres already get information about the vulnerability and start patching, and then after a couple of days it is handed to the Incident Response Task Force, led by Vincent, who gave the presentation on incidents earlier, and that team then also makes sure the vulnerability is actually patched everywhere. All right, each part of this could be a presentation in itself, but just to give an idea of the tools you want to have: the main thing in a distributed infrastructure is working communication channels, so we challenge them regularly, once a year, and here we use the same incident response tools we use in actual incident response. We have a contact database and a ticketing system; the ticketing system fetches the contact information for the resource centres, every site gets contacted via the accepted channel, so a mail, and in the mail we ask for a measurable reaction from the contact, for example clicking a link. This can then be evaluated on the web server, and we get a nice overview of the reaction times of our security teams; the ones who for whatever reason did not react are then followed up by us or by operations, so that we maintain quite a usable contact database and make sure that in case of an incident we actually reach our sites.
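The triage step just described can be sketched as a simple routing function. This is a hedged illustration rather than EGI CSIRT's actual tooling, and the action names are invented, but the policy follows the talk: critical and high are followed up, moderate is dropped, and the monitoring probes are only updated for critical issues.

```python
def route_vulnerability(severity: str) -> list:
    """Return the follow-up actions for a reported vulnerability.

    Action names are illustrative placeholders, not real EGI tooling.
    """
    severity = severity.lower()
    if severity == "critical":
        # monitoring team updates its probes; sites notified in parallel;
        # IRTF follows up after a few days to confirm patching
        return ["notify_monitoring", "notify_sites", "irtf_followup"]
    if severity == "high":
        return ["notify_sites", "irtf_followup"]
    return []  # moderate and below: no follow-up

print(route_vulnerability("critical"))
```

The value of writing the policy down this explicitly is the same as in the talk: with 200+ resource centres, the cycle has to run with minimal human interaction, so the routing must be mechanical.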
Next slide please. These are the results from this year's run; this is raw data, not cleaned up for out-of-office hours, and the run happened during the corona lockdown. We see that the majority of the sites react quite quickly, within 24 hours; everything after one day, out of the 260 sites we were challenging, is then contacted separately, and we make sure we get an explanation of why it did not go as smoothly as expected. Sven, I wonder how many more slides you have, because you're at 50. Yes, so let me simply skip over the security service challenges. What is more interesting is that incidents, also the ones we are not looking at now, tend to affect neighbouring infrastructures. What you see here is the picture as it is at the moment: many devices connected to the internet, accessing compute or storage resources. This is even more true right now, since people have to work from home; at this link we have some recommendations and advisories on how to do that securely. What I'm after here are the clusters that people are using. If you have the situation, as we had at the beginning, that these clusters are well separated, so the blueish ones are in EGI and the yellow and amber ones are in another project, this is not much of a problem. But what we see now, next slide please, is that these are really getting closer: they have resources shared between different projects, and so incidents affecting another infrastructure quickly get closer to us as well. We really have to make sure that we are at least a bit ahead of things and get this information as early as possible, or even put our defences a bit further outside. Yes, so here you have the link to some hints on how to work from home more securely, the contact info for EGI CSIRT, and at the end the link to the incidents we have which are now, well, in the press; the labelling is TLP:WHITE. And this is what we are currently busy with a bit.
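The evaluation behind these challenge numbers amounts to bucketing reaction times against a deadline. A toy sketch with invented data (the real run covered 260 sites):

```python
# Hypothetical reaction times in hours; None means the site never clicked
# the link and must be followed up separately.
reaction_hours = {"site-a": 2.5, "site-b": 30.0, "site-c": 11.0, "site-d": None}

def challenge_report(times: dict, deadline: float = 24.0):
    """Split sites into on-time, late and silent relative to the deadline."""
    within = [s for s, t in times.items() if t is not None and t <= deadline]
    late = [s for s, t in times.items() if t is not None and t > deadline]
    silent = [s for s, t in times.items() if t is None]
    return within, late, silent

within, late, silent = challenge_report(reaction_hours)
print(f"{len(within)} within 24h, {len(late)} late, {len(silent)} no reaction")
# prints "2 within 24h, 1 late, 1 no reaction"
```

In the real workflow the "late" and "silent" buckets are exactly the sites contacted separately afterwards, which is how the contact database stays usable for actual incidents.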
That's it, thank you very much. Are there any questions? I promised to record it and play it back faster, but I failed to do so. We got there, it's fine. So let's give the floor now to Romain, and then there will still be a few minutes for questions at the end. Okay, I'll be very quick, Dave. Really, this is just to conclude on the various talks on incident response and threat intelligence. Our goal here is to achieve successful incident response throughout the global community, and doing truly global, successful incident response is really, really hard. There are lots of technical challenges ahead, but mostly the difficulty is coordination and the human factor. As we've seen many times, individual security teams can be really good taken on their own, but collaboration is where they typically struggle, and this is the difficult part. For example, we conducted a security drill between EGI and other security teams in the US; the result was that they were really good taken apart, but together there were areas where they could do better and synergies that could be optimised a little. This is really what is at stake here: how do we improve the coordination, the communication and the different leaderships in global incident response? We are in a unique position in the academic and research community, where we are required to make it work, and I don't know of any other community who has had to deal with this challenge before. The key issue is that when you have distributed tasks and central tasks for different teams, they are really everybody's job, which effectively means nobody's job, and finding out who has to do what, and getting organised, is really, really difficult. So, to put it in a nutshell, what we try to do here is first identify the key stakeholders at peer infrastructures, really approach them and identify these contact points; then, what really does help is to define a joint strategy for incident response, a workflow, model, framework, what have you, where
the roles, the different leadership activities and the expectations of each other's teams are explicitly defined; that really, really helps. Also worth remembering is that there is no boundary for security incidents, right? They don't stop at the gate of the research infrastructure or the e-infrastructure, so it's very important to keep an open mind, make sure that we have good relationships with the campus and the scientific security teams to get the relevant information, and not limit ourselves to the exact scope of the mandate of our respective security teams. The best way to help ourselves is to coordinate and help others as well, and this is really the essence: it's about how to collaborate and cooperate at the global scale, and this is why it's so key for the whole project. So that's what I have; I hope you enjoyed the different presentations, and I think we still have a few more minutes for discussion and questions. Yeah, perfect, thank you very much Romain. So either type questions in the chat window or raise your hand and we can allow you to speak. Do you know how to raise your hand? Yeah, I see Mark. Go, Mark. Hi there, can I speak? Yes. Okay, in all of these presentations I haven't seen any discussion of human resources and training. I know that's a significant factor in, for example, ISO 27000 procedures and practices. I also know it's an extremely difficult problem when you're dealing with many, many different organisations and many, many different sets of HR policies, and even labour unions. Any thoughts, any good ideas, any common approaches that are being used? Maybe it's buried in some documents and we just haven't gotten to it, so at least point us to all your good solutions. Very good question. Does one of my colleagues wish to address this? So this was about training and resources, right? Well, if I can jump in: let's say you have 10 people at a given institution who are responsible for, you know, running an important resource in the broader federation.
There's an assumption that those 10 people have been properly trained in security processes as they apply to their jobs, and that in some cases they are, shall we say, certified, or have reached a certain level of capability and understand the requirements expected of them. So I'm curious how the federation, the joint operation, looks to each participant, each partner, to accomplish this objective. Yeah, okay. So, for example, for the Incident Response Task Force within EGI CSIRT, we have a requirement that if you want to be a member, you need some basic training in operational security; the monetary aspect of that training is actually a matter for the employer of the potential IRTF member. So that's the training part for the security people. What we do internally in the project is also provide training to the security teams at the sites: we have various people who professionally provide training or give courses at universities, and they also develop internal training for our security contacts. Does this help a bit? Certainly part of the equation, but thank you. May I also comment on this excellent question? Yeah. It's a little bit of a sore point, and I think the issue is that we always had to start somewhere: we made some attempts, with Sven and colleagues, at a kind of admin security certification a long time ago, as an experiment, but the scope is pretty broad, and, as you said, it is a mandatory requirement in all security governance that your staff in the different roles are properly trained. So maybe we should think about that again and focus a little: should we focus on admins, or security staff, or management? Yeah, I'd just like to support what you say. I think it's been well recognised that if we as the security coordination team can get sufficient resourcing from the projects, one of our vital roles is the support of others, and the training and
dissemination, and we also offer forensic analysis services to people who may not be so well trained to be able to do those things themselves. But it's always a challenge: there's always more that could be done than the funding we actually manage to get allows. Coordinating these things and sharing our experiences with a broad range of people is one of the important things we should do in the future, as EOSC moves forward. Any other questions? I see a comment from Matthew: sites participating in e-infrastructures will have a mature service management system, so they are expected to invest in their information security management. Yeah, but as always there are enormous challenges: we have small amounts of funding which are expected to cover a large number of things, and there are lots of important things we need to do, so we need to keep lobbying for more money for security-based activities. Anybody else with their hand up? I don't see anything there. Okay, if that's the case, we've perfectly reached the end of the session, so first of all I'd like to say a big thank you to all my colleagues for giving the presentations, and also to all of you for attending.