Good day everyone, and thank you for joining our presentation today. Today we will first tell you a little bit about ourselves, then give you some project context. I will then share some particular takeaways from this project from a project manager's perspective, and hand over to Alex, who will tell you about the technical challenges we faced and how we overcame them. And of course we'll leave some time for questions at the end as well.

So we'll get right into it. I'm Julie. I was born and raised in Germany, but somehow a journey around the world led me to move to Australia about 10 years ago, and I now call Melbourne home. I have been working in the IT industry for about 15 years, seven of which as a project manager, and for the last five years I've been managing Drupal projects, mostly in the government space, working for Salsa Digital. And over to Alex.

Hi everyone, Alex Skripnik, Drupal Solution Architect, working in IT for 20 years, 12 of which in Drupal.

All right, some project context. This project has some sensitivities, which is why this particular client has asked us to keep them anonymous. For the context of this presentation: this is a federal agency in Australia, and they were operating an authenticated user portal on an old version of Microsoft SharePoint. It was very costly for them, not very flexible, visually highly unappealing, and overall not fit for purpose. So they had gone out to market looking for a vendor to help them replace this authenticated user portal, migrate all of the existing data, and show that the platform could scale up for a lot more data to come in the future. They also wanted analytical reports to get business insights in the absence of Google Analytics as an option, and in the future they will integrate with an internal publishing system.
Naturally, we cannot demo the solution without revealing the agency. However, I hope that in the next 20 minutes we can still give you very specific takeaways and leave you feeling you got some good insights from this presentation. So I'll dive right in.

First one up. One of the key tools I used in the discovery phase was a content data model, which Alex actually initiated in that phase. It outlines all of the content types with their fields, all of the vocabularies with their fields, and the relationships between them. That was a very powerful tool for me, but also for showing those relationships to the stakeholders. Let me quickly bring it up — and I should say this is a highly anonymized, sanitized version of the real thing. It looked something like this. You have a legend where you outline the different field types, then your various content types with their fields referencing the vocabularies, and then — you can see it's a pretty large content data model — all of the vocabularies with all of their fields as well.

What that allows you to do is, whenever you look at any part of your project, you can cross-check and find whether you have gaps. One simple example: you look at a Figma file and see there is a tooltip. You can quickly check your content data model and identify whether that vocabulary actually has a description field that can supply the information for the tooltip. Little things like that just become really easy. It's basically like a jigsaw puzzle: when you're doing a puzzle, you're looking at the picture on the box to see how your pieces fit into it.
Trying to do that puzzle without the picture would be infinitely more difficult. That's what a content data model can do for you, so if you ever end up on a complex project, I highly recommend it.

Okay, next one: staggering work is a risk. I think we all know projects rarely have a lot of time; we somehow always end up on projects where time is tight, so it's very tempting to stagger work as much as you can. In our case, we went through design discovery, wireframes came out of that, and we started our back-end work based on the wireframes before the high-fidelity designs were done, and then we started the front-end work. Now, of course, what happened? The UI designer went in, looked at the wireframes, thought about it a bit more, and started to tweak things. Unfortunately, those little tweaks are not always so little to implement once you've already done the work. It puts you in a position where you're either very inflexible, because after all the work's done, or you need to refactor a lot of it. My takeaway here is not that you shouldn't do it, but prepare your client and prepare yourself for that to happen. If you are going to stagger work like that, make sure the client is aware of that risk and is not completely surprised by a few additional stories in your Jira backlog to make small corrections to work that was already done.

And then there is accessibility. There have already been some presentations on accessibility, and we had our own personal takeaway. We were asked by this client to implement a two-factor authentication method in which a phone call was made to a desk phone — for various reasons that I can't talk about, we couldn't send a message to a mobile phone like you usually would.
Now, we had pretty much implemented this, with all the various screens where the user was asked to do A, B and C. I went to our accessibility specialist and said, hey, can you please have a look at these screens and make sure people can tab through them and it's all accessible. She looked at the screens and said, sure, your screens are fine — but how is a hearing-impaired person going to enter this code? And I literally just stood there. I had nothing. It was one of those moments where I think we were all bothered by the fact that we had simply not considered that. So for various reasons we moved away from this solution towards others. I think Philippa mentioned it today as well in her presentation: make sure you consider accessibility all the way through. Sometimes it can be tempting to focus on a problem and how to solve it — just make sure you solve it for all users.

And this one was a fun one too. We had this incredibly tight timeline, and we had delivered all of these beautiful designs to the various stakeholders, and the stakeholders came back and said: this looks fantastic, this is awesome, go for it. There was literally no feedback. Great, that's fantastic, because we have no time, so let's just move on. And there was this little voice in my head that said: it was very convenient to have no feedback on a very tight timeline. And guess what happened? As soon as we built the thing, stakeholders came back with: well, obviously we need that banner to be smaller. Obviously, huh. I wish I had known that earlier, because of course at that point it was so much more expensive and so much more time-consuming to correct these things.
So my takeaway for you is: if you ever get no feedback, you probably just need to ask more questions. Go into another meeting with those stakeholders and talk them through it in more detail, because more likely than not they have just not spent enough time looking at those designs. The devil's in the detail — the Germans say the devil is a squirrel; don't ask me why.

Next: get stakeholders across it, or maybe just adjust the expectations. What do I mean? We had gone through months and months of project work, we get to go-live day, and we are so excited. And within a week of go-live, we get a list of enhancement requests — literally 30 or so. I remember talking to the tech lead: how did this happen? We demoed every two weeks. UAT got done within a week of every sprint closing. The content editors were some of the key stakeholders. We had given them training weeks and weeks in advance of the actual go-live so they could go in and start having a play with the platform. I really felt we had done everything. The product owner was highly engaged and really tried his best to get all of the stakeholder feedback as well. So what could we have done differently? Initially I thought, well, maybe we should have brought the stakeholders more across the Jira stories and shared all that detail with them. But looking at the feedback again, I realised no — it was across the board; they couldn't have read every Jira story. So my conclusion was: you actually have to expect it. If you run a very complex project of that size, people will probably only really test it when you go live.
So have a week, have two weeks, have a hypercare sprint where you can take that feedback on board, and mentally prepare yourself for it to happen. We were really quite disappointed, but if you expect it, the client expects it and everyone expects it, then you can say: you know what, we'll go live, we'll probably get some more feedback, and we'll have a sprint and a budget for that. So that was that takeaway.

And this gets me to search. It is not the first time this has happened to me on a Drupal project, which is why I've chosen to include it. Search filters are always difficult to implement, because people have very, very strong views on how they should behave. In our case, the client wanted faceted search combined with view filters in a toolbar above the search results — essentially wanting the results to refresh without the page refreshing — combined with checkboxes to either include or exclude certain results. Drupal provides about 90% of that out of the box with the Facets and Search API modules, but overriding and tweaking those last 10% turned out to be very, very tricky. So if you ever see faceted search in your designs, just add a bit more risk and a bit more time, because you will likely need it.

And lastly, transparency is key. We had bi-weekly detailed project reports delivered to the client, which very clearly outlined how we were travelling on budget, how we were going on risk, what challenges we were facing, and how we were burning down. Everything was in there in detail, and we also had a weekly check-in where we would go through it and basically put everything on the table. So there was literally a no-surprise environment for this client: they knew things when we knew things, and they could help us solve the problems we were facing.
We also had, of course, daily stand-ups and retrospectives. It all helped the client make informed decisions, really. It can sometimes be tempting to hide or under-represent mistakes or challenges, but on a large, complex project they are unavoidable, on both the supplier side and the client side. So I see it as my responsibility as an engagement manager to create an environment where everyone is comfortable sharing whatever needs to be shared, so we can make good decisions together. And that's it from me for now — I'll hand over to you, Alex.

Thank you, Julie. Very interesting insights. My side is more about solutions. I want to start with the fact that government projects are harder than normal projects, and secure government projects are ten times harder, I think. We have multiple items here, and I'm just going to fly through them.

Infrastructure. By the definition of this project, we had to come up with infrastructure that would adhere to government security controls — not just some, but around 750 of them. That's quite a lot of controls to adhere to. We also had to go through internal assessment and accreditation. We did spend a couple of months there; it was a bit challenging, but we got through it, I would say. On the solution side, we looked at multiple vendors and ended up with a dedicated AWS account, and an amazing team helped us set up a Lagoon cluster. As part of that, our internal team also had to go through baseline security clearance, which is another interesting exercise when you work on government projects. And another part of the infrastructure was building a secure internet gateway. Again, we looked into several options, and we're quite happy with the solution amazee.io provided, which was based on Fastly and also included a firewall.
And the good thing about that: the client was able to have tiered access to that solution, so they could configure and add rules and have certain logging available.

Then another part was the content migration. This was hard not only because it was a lot of content — around 100,000 content items and data assets — but also because there were about 25 data structures, and the data also had to be sanitized, which I'll touch on in a second. On the implementation side, for those of you who have done migrations: migrations are hard because you need a process to validate that whatever you're migrating has actually been migrated correctly. We came up with a process of validation and nightly builds. We would run the migration overnight against a fresh install, and that would show us what kind of defects we had; we would then have the next day to rectify those defects before another run. On the validation side, we would randomly select a set of data and share it with the customer, and the customer would collaborate with us to validate that the data was migrated correctly. We repeated that for two or three months, until everyone was very comfortable that everything had gone through smoothly and all the data was migrated — nothing fell off.

As part of that migration exercise, since we were running multiple non-production environments — your migration environment, development environment, pull-request environments — plus a production environment, we needed to minimise the security footprint, and we needed to log all access to the content assets. Even within our team we had tiered access: only the solution architect — just me — and the technical lead had access to production data. Everyone else on the team, including developers and QA, did not.
So we needed some way to sanitize the data, while still allowing us to validate that whatever we were migrating was actually there. The way we achieved it was by using separate AWS S3 buckets: we would have a bucket with sanitized data attached to our development environments. Then overnight we would use this amazing module — I highly recommend it to everyone — the GDPR Drupal module, where you provide configuration specifying which fields in your source database get sanitized, and the sanitization rules themselves: a random value, an empty value, and so on, for any field. And then there is another pretty cool package, the GDPR MySQL dump, that plugs into the Drush command. When you run your drush sql-dump, it streams the dump through that GDPR configuration and dumps the database with the data already sanitized. Normally you would need a second database: take the production database, import it into the second one, sanitize it there, then export it. This package does it on the fly, so you don't need a second database. In practice, we would run another job in production, dumping the database into a dedicated S3 bucket. If anyone is interested, I'm happy to talk about it more. We will probably implement this approach on all our other projects, because not only do you sanitize the data, you can also significantly reduce the size of the database: if your production database is three, five, ten gigabytes, you can quite easily get down to 100 megabytes. That database can then be moved to a CI environment or a local environment, which speeds up development time.
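In Drupal, this streaming sanitization is handled by the GDPR module's per-field configuration plugged into the Drush sql-dump command. As a rough illustration of the idea only — in Python, with made-up field names and rules, not the module's actual API — a sanitizer can rewrite each row as it passes through, which is why no intermediate database is needed:

```python
import random
import string

# Illustrative per-field rules, in the spirit of the GDPR module's
# configuration. Field names and rules here are invented for the example.
RULES = {
    "mail": lambda v: f"user{random.randint(1000, 9999)}@example.com",
    "phone": lambda v: "",  # the "empty value" style of rule
    "secret_note": lambda v: "".join(random.choices(string.ascii_letters, k=8)),
}

def sanitize_stream(rows, rules):
    """Apply field rules to each row as it streams through, the way a
    sanitizing sql-dump rewrites rows on the fly instead of post-processing
    a copied database."""
    for row in rows:
        yield {field: rules[field](value) if field in rules else value
               for field, value in row.items()}

# A couple of fake "production" rows.
prod_rows = [
    {"uid": 1, "mail": "alice@agency.example", "phone": "0400 000 000", "secret_note": "real"},
    {"uid": 2, "mail": "bob@agency.example", "phone": "0400 000 001", "secret_note": "real"},
]

clean = list(sanitize_stream(prod_rows, RULES))
print(clean[0]["uid"])    # structure and IDs survive
print(clean[0]["phone"])  # sensitive values do not
```

Because rows are transformed one at a time in a generator, the sanitized dump can be written straight to its destination bucket — the same single-pass property the Drush integration exploits.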
And a kind of funny one: dummies. We were dealing not only with content but also with files — lots of PDFs — and no one outside that tiered-access group could see what was inside them. So we needed to create PDFs that were named like the original files but contained dummy content. We needed that so that all the internal linking — the entity linking and file linking — would still work, and the QA team could validate that whatever was migrated worked together and was referenceable. So we created a couple of scripts to replace the real files with dummy files.

And this is the most interesting part. One of the phases of the project, after we had done the original migration of the 100,000 items, involved another data source, and we needed Drupal to work with 1.2 terabytes of data — terabytes, not gigabytes — with 500 gigabytes of data growth every year. That is quite a lot of data, and it also had to be searchable. We looked around and thought: hmm, can Drupal handle this? It was good that we were already on AWS, so we looked at their services and at OpenSearch. OpenSearch is a fork of Elasticsearch, available in AWS as a cloud-based service. What we did was use this module — again, a pretty cool module — called External Entities, together with an Elasticsearch connector. With it, your nodes or entities can sit in a third-party system like Elasticsearch — or OpenSearch in this case — and you create a mapping for the fields so that those entities appear to Drupal as if they were normal nodes. Then you can use Views, you can use whatever Drupal structure you need to work with them, and you can use Search API to actually build the search. So that is how the 1.2-terabyte problem was solved. And OpenSearch is fast.
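The External Entities idea — Drupal-side code sees normal-looking entities while the actual records live in the search backend — can be sketched roughly like this. This is a Python stand-in for the module's PHP plumbing; the class, field names, and sample documents are invented for the illustration:

```python
# Fake "OpenSearch" backend: documents keyed by id.
BACKEND = {
    "123": {"doc_title": "Annual report", "doc_body": "...", "doc_year": 2021},
    "124": {"doc_title": "Quarterly update", "doc_body": "...", "doc_year": 2022},
}

# Field mapping, analogous to the remote-field -> local-field mapping
# you configure in External Entities.
FIELD_MAP = {"doc_title": "title", "doc_body": "body", "doc_year": "year"}

class ExternalEntityStorage:
    """Load remote documents and present them under local field names,
    caching the frequently loaded ones so repeat loads skip the backend."""

    def __init__(self, backend, field_map):
        self.backend = backend
        self.field_map = field_map
        self.cache = {}

    def load(self, entity_id):
        if entity_id in self.cache:
            return self.cache[entity_id]
        raw = self.backend[entity_id]  # the remote fetch, e.g. an HTTP call
        entity = {local: raw[remote] for remote, local in self.field_map.items()}
        entity["id"] = entity_id
        self.cache[entity_id] = entity
        return entity

storage = ExternalEntityStorage(BACKEND, FIELD_MAP)
node = storage.load("123")
print(node["title"])  # → Annual report
```

In Drupal all of this plumbing lives in the modules and their configuration; the point is that the heavy lifting of storage and search stays in OpenSearch while Drupal only ever sees mapped, cacheable entities.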
And if we need more space, it's cloud-based, so we just increase the size. I believe one node there goes up to four terabytes, so for the next four or five years we're fine, and then we'll see if there's another option.

I'm just going to fly through this one. Another point was the visual uplift. Some of you have seen CivicTheme, which is on the screen behind me right now — an open-source Drupal theme and component library developed by Salsa Digital. We used an early version of it to visually uplift this project. The good thing about all of this was that we were able to validate CivicTheme as a design system quite early in the process, and produce a Drupal theme out of it. On the business side, that actually saved about 80% of the front-end budget, since 80% of the work was already covered by CivicTheme. But we did spend time on that search, which was not covered by CivicTheme — because search is hard.

One of the features of this platform was a subscription system. There are some subscription modules available in the Drupal contrib space, but they are all, let's say, limited — for good reasons — to certain node types and other things, and they didn't fit us. So we looked at a custom solution. Basically, there is content with fields, including taxonomy, and the members of the portal subscribe to changes based on that taxonomy, which produces a sort of daily digest. The number of content variants multiplied by the number of users was around 100,000. It's basically a mail merge — have you heard of mail merge? Every user would get their own customised subscription email. So we decided to go with batching.
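The mail-merge shape of the problem — match each changed piece of content against each user's taxonomy subscriptions, then queue the per-user digests for a downstream sender — can be sketched like this. This is an illustrative Python sketch with invented sample data, not the project's actual implementation:

```python
from collections import defaultdict
from queue import Queue

# Users subscribed to taxonomy terms (invented sample data).
subscriptions = {
    "alice": {"energy", "water"},
    "bob": {"water"},
}

# Content updates since the last digest: (title, taxonomy terms).
updates = [
    ("New water regulation", {"water"}),
    ("Energy report 2023", {"energy"}),
]

def build_digests(subscriptions, updates):
    """Mail-merge step: work out which updates each user should receive."""
    digests = defaultdict(list)
    for title, terms in updates:
        for user, subscribed in subscriptions.items():
            if terms & subscribed:  # any term overlap -> include in digest
                digests[user].append(title)
    return dict(digests)

# Queue the rendered digests for a separate sending system (e.g. an
# email service), so rendering and delivery are decoupled.
send_queue = Queue()
for user, titles in build_digests(subscriptions, updates).items():
    send_queue.put((user, "Your daily digest:\n" + "\n".join(titles)))

print(send_queue.qsize())  # → 2
```

Splitting "who gets what" from "render and hand over to the sender" is what makes the 100,000-permutation scale tractable: each stage can be batched and retried independently.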
So we would batch the production of these unique permutations of content updates, together with who they had to be sent to, through a queue. We would have a queue running, polling for updates — listening for what content had changed and who should receive it — and then, based on that, a second queue would pre-render the emails, or rather the email content, so we could pass it to the next system to actually send those emails. I think we're talking about 5,000 emails per day — so a lot, for a Drupal site.

Again, we looked around, and since we already had an AWS account: why not look at one of their services, Simple Email Service — SES? Surprisingly, this was the easy part, because we already had a system generating the emails; it was just a matter of passing them over to the system that would send them. One thing we had to be mindful of is sender reputation. Sender reputation is something you need to monitor: if you send email and many people mark it as spam, your sender reputation may go down and your account may be blocked by Amazon. It doesn't matter how much money you pay them — it's their spam-prevention mechanism. So you really have to monitor it, and if something happens, you need to respond to Amazon's emails and explain what kind of content you're sending, so your sender reputation doesn't go down.

The last part of the system we built was analytics. We needed to build a secure analytics system, and we needed to allow administrative users to customise existing reports or create new ones. Users generally knew how to use Google Analytics and were used to something that flexible and powerful.
But for reasons of data sovereignty and personal data storage, we couldn't use it. Again, we looked around at what was available in the open-source space, and found Matomo. It appeared to be quite a robust system, and very nice on the UI side as well, so our users liked it: they were able to build robust, complex reports with the data. Our Matomo instance actually sits within AWS too. Thanks to the Lagoon implementation we used and the whole Docker container-based setup, we just spun up another container — our internal team did it, so we didn't have to involve anyone external. So now we have Matomo running in a container in our AWS account — I think we even have a development instance of it if we need to play with it — with analytics feeding into an internal system, in a kind of closed loop. That's it from my technical side of things. Back to you, Julie.

Thank you. All right, I have a final conclusion: finding the right people for the job is critical to success. This was obviously, as you can tell, a highly complex project — and we've not even told you half of it. I really attribute a lot of the success to the fact that we had exactly the right team on this project. We had a very strong technical lead who was approachable and a very good communicator. We had a very strong solution architect in Alex, who made sure all these solutions were found and fit together. We had a migration engineer with years of experience in migrations, who could literally foresee challenges before we saw them, and who smashed through those 25 different migration groups. We had a front-end developer with years and years of front-end experience, particularly in implementing front-ends based on design systems. And we had a developer who was a bit of a jack of all trades.
He was good at front-end and good at back-end. And a very thorough QA, who also looked at the big picture — which I think is really important, because otherwise you find the problems when the thing goes live. A strong partner in amazee.io for the infrastructure side of things, and a very strong design partner in Oliver Grace, who made sure that visual uplift happened and that the users would now actually enjoy using the portal. And lastly, but not to be forgotten at all, we had a good product owner and a great project manager on the client side. They were able to make decisions, they really involved the stakeholders, and the stakeholders were knowledgeable. It all really came together well, and I guess this is my opportunity to acknowledge that. And that's it. Questions?

Sorry — I'll try to speak louder. It's a question around working with a federal agency, where the data is quite sensitive and all that sort of stuff. I'm just curious why there weren't any considerations given to unit testing or Selenium testing?

Oh, we have automated tests for everything. We even had tests for the validation of migrations. The visual validation was for the humans — the 10% you cannot automate. We needed someone to look at things from the human side. Does that make sense?

So you're saying there were unit tests but not Selenium tests?

We do have unit tests, functional tests, front-end tests, but those test generic things, not the actual content of the migrations. So we needed humans to check that the content was all coming through properly.

I actually just have a comment more than a question. You said that when you handed it over, you then got a list of 30 enhancements.

Yes.

When we did the CASA website last year, we had a beta version up publicly for three months — not one piece of beta feedback. We went live, and within two months, the storm of emails — oh my God.
And we had sent it to everybody — it had been up there; why hadn't you looked at it? So I completely sympathise. It hurt a little bit.

Thank you both for the talk. My question is on the OpenSearch data store: how are you managing the backups of that data store?

Can I answer that later? Or not at all? What I can say is that for every AWS S3 bucket, we have a second bucket which logs access to the first one — and we have about 30 buckets. Now, to answer your question about OpenSearch: we do have a backup system, but it's not in AWS; we have an offline backup system. Sorry, I just can't say more than that we do have it and we have managed the backups. With that amount of data, restoring is problematic — it takes a long time — and the way to restore would be to spin up another instance, import into it, and then do a switch. Does that make sense?

You mentioned there's a GDPR module for sanitizing databases, and you also said you can use it to squash the size of your data?

That's right. There are two parts. You have a Drupal module, and then you have a package that basically replaces the drush sql-dump command with a sanitizing sql-dump — you just change the command. The Drupal GDPR module is where you specify the configuration of the fields — exactly how you want to sanitize them — and the Drush command then just exports it. So the first module is the configuration; the second is the action that does it. How hard is it? It's an array of fields, and for each field you specify whether to empty it or put in a random value — you choose. And if you empty lots of fields, that will squash your database, because you will have no content. And what was it?
You were mostly using an external data system for your storage — and I guess I know the answer is yes, because you did it — but it all still works, even though you're no longer storing the data in a traditional local database?

That's right, yes. What happens is there is a module called External Entities. That module is configured with the field mapping and a URL to Elasticsearch. Every time there is an internal request in Drupal — programmatic, API or otherwise — to something like node/123, Drupal knows that entity type is external, goes to Elasticsearch, retrieves the record and pulls it back. There is also a caching mechanism, so you can set it up so that the entities you retrieve most often are cached inside Drupal. Imagine you have historical data where only a handful of records — hundreds, maybe a thousand — are regularly used: you can cache that thousand, while your other million records stay in Elasticsearch and are accessed a little more slowly. I'm happy to talk about that in more detail with you afterwards. Thanks everyone. Thank you.