Hello everyone, welcome to our session, and thank you so much for making it to the very first session of the day. We didn't know when we'd be scheduled — when they're planning things, they don't always tell you when things are occurring — and we thought this was actually awesome, because now we don't have to be worried about it for the rest of the day.

My name is Melissa Bent, and I'm a senior software engineer at Red Hat. This is my first time ever presenting at a conference, which is kind of fun. I did that just for the clap, so thank you for picking up that cue for me. I'm the lead engineer on our customer portal, which is at access.redhat.com — and I should say there are many engineers on that website; I'm on the Drupal portions of it. I live in Nampa, Idaho, so I got in very late last night and might be a little bit manic, which will make for more fun during this presentation. I have a small farm with goats, chickens, and many dogs, and I also love all things 3D printing, Lego, IoT, and anything to do with plants — so should we run out of topics related to Drupal, there are many others I'd be happy to talk about.

And I'm April Sides. I'm a senior software engineer at Red Hat as well — a back-end Drupal developer. I work on developers.redhat.com, cloud.redhat.com, and kubernetesbyexample.com. I too am into Lego, Harry Potter, and Lego Harry Potter. I work on the Drupal Community Working Group's community health team, the A11y Talks monthly virtual meetup on accessibility, and DrupalCamp Asheville — which, speaking of, is happening next month, July 7th through 9th, in Asheville, North Carolina: trainings, unconference sessions, social events, all the things. Check out the website and get your tickets today, or tomorrow, or whenever.

Back to our regularly scheduled session. Today we're going to talk about the problem we were trying to solve, go over a little bit of the discovery process, and talk about our solution. We'll walk through a couple of different implementations we're doing at Red Hat, and then talk a little bit about our future plans.

So, the problem. This part isn't a problem in itself, but: Red Hat is big. It's a global company with over 19,000 employees, built on open source principles originating from the Linux community. We use a lot of different technologies to serve our customers, including open source and homegrown solutions, and redhat.com is made up of multiple Drupal sites and single-page applications built with things like React and Vue. So our organizational data — product information, taxonomies — ends up duplicated in each site and app, managed by different product owners, content managers, and engineers. Changing the data is a very manual process that has to propagate across all the sites and apps through different people, and that leads to sad data. (Pictured here: Data from Star Trek, being sad.) We care about our data quality; we want it to be consistent and accurate. There has to be a better way, right? Some data should be managed centrally and made reusable across the ecosystem.

I also wanted to make sure I mentioned this: I came to Red Hat about two years ago, and discussions about this topic were already happening before that — we kind of took it and ran with it. There are many, many people involved in this; we're just the ones giving the presentation today. This is a group effort, and I want to make sure I call that out.
When I came to Red Hat and went through the whole onboarding experience — you know how it is at a new company, you're like, what is this, what is life, which team do I talk to? — I used to joke with my co-workers that every time I talked with someone here, it was like Mario Brothers when you get to the end of a level and it says, "I'm sorry, but the princess is in another castle." It would be: oh no, that's this team — and you'd talk to that team and they'd say, oh, we don't do that anymore, that's this other team. That's a scale issue, of course, but what it really comes down to is that because things were split across multiple teams, multiple time zones, and multiple countries, we ended up with a data question of who's going to maintain this and who's going to be our source of truth. Everyone agreed we needed it, but we didn't always agree on who should be the one doing it, because everyone has their own things they have to do.

I decided, because I work on the customer portal, that I would go ahead and try, because we needed it. On the customer portal we try as much as we can to make it easier for customers to get the most out of their subscriptions, through articles, solutions, documentation — all different kinds of things. We needed them to be able to get to those things quickly, and we also wanted to be able to say: hey, you're looking at this — did you know we have this as well? In a normal, small Drupal system, sure: you make an entity reference, you make taxonomies, you make connections that way. But when you have all these different systems and this much complexity, it becomes a problem.

We also had the greatest data needs on the portal, because we're a support site. We don't just support the newest and best thing; we support the twelve-year-old server running in someone's back office, too, and they still need documentation for it, or they hit some weird issue and need the article about it. So we had the greatest need for this information, specifically around product data: versions, end-of-life data, documentation, all these different things. On top of that, we had multiple teams working on this together, and at scale it was getting difficult to grow it in a maintainable way.

Getting specific, and airing a little dirty laundry: when I was onboarding onto this project, I realized that some of our most important pages were actually just Drupal nodes — the classic page content type, where you're like, I need to get this up today, so we're going to put it in the body field. The thing I say all the time is: what happens when "for now" becomes forever? You put something up "for now," and three years later you're still copying and pasting HTML into the field — which happens more than we'd like to admit. That's a bad developer experience and a bad customer experience, because things get out of date; a product update comes out and it doesn't trickle down the way it should. And because we weren't using the CMS the way it's supposed to be used, we kept recreating content in multiple places, which created fragmentation and maintenance issues.
It got in the way of putting our best foot forward for our customers, when our main goal was to get customers the information they needed as quickly as possible so they could be successful. And the biggest one, I think, is at the very end here: lost time — not just for customers, but for developers doing things they shouldn't have to be doing with their skill set. The thing I keep telling my developers is that I want to get them out of the content game; we shouldn't need a developer making these changes because they're too complex for a content editor. These are the kinds of things we were looking to fix.

So, specifically, here is what came out of the discovery process. We needed to share our data in a way that scaled well across all of our properties. We needed it to be flexible enough to work not just for our team but for other teams, because a lot of other teams work with data in their own specific ways. We also needed to support different tech stacks: we have Drupal, which is what we work with, but there are plenty of others — as April said, homegrown solutions, Salesforce data. We want people to work in the tools they're comfortable with, and we didn't want to say, oh, by the way, now you have to learn Drupal to work with us; that didn't make sense. They needed to be able to keep the tech stack they already used. And finally, we wanted it to be a single source of truth — something you can rely on for product data for the customer portal.

So, our solution — wait for it — a data lake. Looking back at our requirements: with a data lake we get a scalable, low-maintenance solution. We get a flexible structure, and there's a schema-on-read option with data lakes, where you apply a schema to the data when you pull it out rather than when you index it. There are lots of tools for connecting it to our current tech stack — Drupal modules, GraphQL, PHP drivers. And as far as setting it up as a single source of truth: you bring your own governance, so you can make it your single source of truth.

But with great flexibility comes great responsibility — isn't that right, Melissa? That's correct, April. (We didn't plan that at all. Thanks.) Whenever you remove some of these governance tasks from Drupal's purview, you have to handle them a different way. One thing we had to guard against is what we call data rot, which is when your data gets out of date — why is this in here, this is three years old, it should have been deleted a long time ago. These are things that are second nature in a CMS — I unpublish something and it's removed from the site — all the things we take for granted. We also didn't want to reinvent the wheel more than we had to, so we wanted to rely on what's already present in Drupal to manage the content itself. And we wanted to establish a governance plan, specifically around who's allowed to put things in, what kind of data goes in, and how it's used. That was the hardest part to pin down, because we had to get people to agree, and when lots and lots of people all have their own needs, it's harder to get them to agree on what they're actually going to put in and how.
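To make the schema-on-read option mentioned a moment ago concrete: nothing enforces a structure when a document is indexed; each consumer applies structure, types, and defaults when it reads. A minimal sketch, assuming the mongodb/mongodb PHP library — the database, collection, and field names here are invented for illustration:

```php
<?php
// Schema on read: the stored document is an arbitrary JSON object;
// structure is applied here, at query time, not at indexing time.
require 'vendor/autoload.php';

$client = new MongoDB\Client('mongodb://localhost:27017');
$uuid = 'example-uuid-1';
$raw = $client->datalake->products->findOne(['uuid' => $uuid]);

if ($raw !== null) {
  // Map the untyped document into the shape this consumer expects.
  $product = [
    'title'   => (string) ($raw['title'] ?? 'Untitled'),
    'updated' => isset($raw['updated']) ? (int) $raw['updated'] : null,
    // Source-specific (secondary) fields may simply be absent.
    'logoUrl' => (string) ($raw['secondary']['logoUrl'] ?? ''),
  ];
}
```

A different consumer could read the very same records with a different mapping, which is what makes the flexible structure workable across teams.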
We also wanted to restrict it in a way that protected customer data, if any was involved in what we were indexing, and to protect anything that was gated — so that if there was security information in there, related to, say, a release that had not yet been made public, it didn't go in. Not that people would use it in a bad way, but I wanted to make it harder to do the wrong thing. When you have a really big system like this, removed from the original source that created the data, it's easy for someone to think, well, the data is there, I'm going to publish it, without knowing it wasn't ready yet. So we wanted to guarantee that once something went in, it was free for anyone to use — that was a really important piece. And there are things like GDPR: we wanted to be able to accurately identify information that has to be managed to stay in compliance with privacy requirements.

It goes without saying that with a company as large as Red Hat, it has to scale and it has to be performant. Redhat.com gets a lot of hits — I don't know its traffic — but on the portal we get over three million hits a month, and when I did a traffic analysis before I left, we were getting 15,000 hits an hour, which is a lot. I've never worked on a site quite this large. We wanted to make sure that held up, or at least that whatever we did didn't make it worse; we always want to make things better.

The other thing we liked about the data lake concept is that the architecture itself is simple. Anyone who's worked with Drupal at scale knows you can do it, but there are a lot of layers and a lot of things you have to do to make it happen. We wanted something really simple, and this covers that. Additionally, it acts as a caching layer for Drupal. Everyone knows there's a lot of caching involved with Drupal — within Drupal itself, plus whatever layers you add on top — but the data lake, done the way we're doing it, effectively becomes Drupal's presentation layer instead of a traditional HTML page.

Our database backend for the data lake ended up being MongoDB. One of the things we liked about MongoDB is that it doesn't care what you put into it — it just says, oh, you have data in here, that's great. It's JSON documents; they go right in. That's in alignment with our schema being configurable and customizable: it's not opinionated about what you put in, and as long as it's a JSON object, you're good to go. So that's the direction we went.

And here's the nitty-gritty — this is the Drupal part. You're thinking, you're talking about all these different things, what about Drupal? This is where Drupal comes into play. To make this happen, we decided to go with Search API — everybody here knows about Search API, right? You've used Search API before? I see three people nodding their heads... more people, yay, okay, somebody back there too. There is a MongoDB module on Drupal.org, and we made a custom Search API backend that interacts with MongoDB, with Search API running on top of that custom backend.
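For orientation, here's a pared-down sketch of what a Search API backend plugin that writes to MongoDB can look like. The plugin ID, module name, and connection details are invented; the base class, annotation, and method signatures are standard Search API, but this is a sketch, not the sandbox module's actual code:

```php
<?php

namespace Drupal\mymodule_datalake\Plugin\search_api\backend;

use Drupal\search_api\Backend\BackendPluginBase;
use Drupal\search_api\IndexInterface;
use Drupal\search_api\Query\QueryInterface;

/**
 * Sketch of a Search API backend that indexes into MongoDB.
 *
 * @SearchApiBackend(
 *   id = "mongodb_datalake",
 *   label = @Translation("MongoDB data lake"),
 * )
 */
class MongoDbBackend extends BackendPluginBase {

  public function indexItems(IndexInterface $index, array $items) {
    $collection = $this->getCollection($index);
    $indexed = [];
    foreach ($items as $id => $item) {
      // Flatten the Search API item's fields into a plain document.
      $document = [];
      foreach ($item->getFields() as $field_id => $field) {
        $document[$field_id] = $field->getValues();
      }
      // Upsert so re-indexing replaces the existing record.
      $collection->replaceOne(['_id' => $id], $document, ['upsert' => TRUE]);
      $indexed[] = $id;
    }
    return $indexed;
  }

  public function deleteItems(IndexInterface $index, array $item_ids) {
    // This is what keeps unpublished/deleted content out of the lake.
    $this->getCollection($index)->deleteMany(['_id' => ['$in' => array_values($item_ids)]]);
  }

  public function deleteAllIndexItems(IndexInterface $index, $datasource_id = NULL) {
    $this->getCollection($index)->deleteMany([]);
  }

  public function search(QueryInterface $query) {
    // Reads go through GraphQL or the PHP driver in this architecture,
    // so the backend's search() can stay minimal.
  }

  private function getCollection(IndexInterface $index): \MongoDB\Collection {
    // Real code would inject a configured client rather than build one here.
    $client = new \MongoDB\Client('mongodb://localhost:27017');
    return $client->selectCollection('datalake', $index->id());
  }

}
```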
That custom backend allowed us to create an event that we could fire and then react to with a subscriber, using Drupal 8-plus concepts. And because of how Search API works, it helps prevent the data rot we were worried about: if something changes, it goes in instantaneously, and you automatically get all of Search API's processor plugins — so if something's unpublished, it gets removed, all those things. It simplifies a lot of the process; we can stand on the shoulders of the people before us and build on that. It also let us be flexible with our schema, which was an absolute requirement.

When we were designing our data, we decided there were two levels of data to worry about. There's what we call primary data — information shared across all records, so anyone consuming the lake can rely on it being present: a title, a created date, an updated date, an author, those kinds of things. And there's what we call secondary data, which is customizable per source; for our product data, that's things like links to logos, or a description specific to a product. That lets each source diverge from the primary list while keeping a unified schema that everyone can query consistently.

On top of that, you get everything Drupal has to offer: access control for the UI, permissions, security and privacy — all the good things Drupal provides around creating and managing content. You don't have to reinvent the wheel. It's just that in the end, when you hit save, instead of the content showing up on a standard Drupal web page, it gets indexed into the data lake.

As for retrieving that data: as I said before, we have single-page applications and Drupal sites. The single-page applications use a GraphQL layer on top of MongoDB, and for the Drupal solution, which I've been working on, we've been using the PHP MongoDB driver to query the database directly. We'll talk a little more about that in our implementations.
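As a flavor of what that direct querying looks like, here's a sketch with the mongodb/mongodb PHP library. The database, collection, and field names are illustrative of the primary/secondary split just described, not the production schema:

```php
<?php
// Direct read-side query with the PHP MongoDB driver.
require 'vendor/autoload.php';

$client = new MongoDB\Client('mongodb://localhost:27017');
$products = $client->datalake->products;

// Every record carries the primary fields, so any consumer can filter
// and sort on them without knowing which source produced the record.
$cursor = $products->find(
  ['updated' => ['$gte' => strtotime('-30 days')]],
  [
    'sort' => ['title' => 1],
    'projection' => ['title' => 1, 'secondary.logoUrl' => 1],
  ]
);

foreach ($cursor as $doc) {
  // Secondary fields are optional and source-specific.
  printf("%s (%s)\n", $doc['title'], $doc['secondary']['logoUrl'] ?? 'no logo');
}
```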
And hey — we're in our implementations. (Your transitions are so smooth.) So, back to our product experience. The reason we keep coming back to this is that it was our proof of concept for the data lake. We launched a page using this concept earlier this year, and it was the result of two years of discovery, conversations, and effort. It's our product index page, at access.redhat.com/products — you can go there today and see it rendering. It's a single-page application that pulls information from MongoDB through a GraphQL layer. We're not going to get into the GraphQL layer, because another team manages that part for us; we're talking specifically about the Drupal portion. The thing I loved about this is that our measure of success was that customers didn't notice anything had changed. We literally swapped it out from an HTML blob in a node to this statically built page, and nothing changed in their perception — other than that it loaded something like 200 or 300 milliseconds faster, because it didn't bootstrap Drupal at all. There was no caching layer on top of it; it was just a statically generated page that popped right up.

And the thing I loved most: when we launched it, I asked our front-end developer, "Chase, aren't you so excited that you never have to touch Drupal again if you don't want to?" And he said, "I don't know what to do with my hands" — that's specifically what he said — because now he doesn't have to touch the content anymore. He can focus on the build, on the code, on beautifying the actual presentation of the data, which is what he wants to be doing anyway.

We're generating this via a GitLab pipeline every 30 minutes, so the page automatically rebuilds on that schedule. We can change that cadence, and we're still tweaking it, but that's where we started, because our product information doesn't change often enough to need more — and it has worked really, really well so far.

The bottom point here is the one I think is super important: because this used to be a page content type, and there are a lot of people in this site, someone could technically have deleted that page, and we'd have had no product index page anymore. Now it's protected from that by being a SPA — a statically generated single-page application. It has two displays: an index page, and the category view you see here. On both views we're also pulling in translations from Drupal as part of the data lake. You can tell not all of this is translated yet, because we're still getting the translations in, but I wanted to show you that it's present on the page and supported natively from the data lake.

And just in April, we launched the product pages themselves on this same architecture. You look at this page and think, well, that could be a Drupal page — absolutely it could, no question. But because it's statically generated, because we're pulling from the data lake, because we're using GraphQL, we can do things like the section down there: that documentation is pulled from a completely separate team that uses a custom, homegrown documentation storage system that has nothing to do with Drupal and knows nothing about Drupal, and we can still pull that information in. There are tons of ways you could do this, absolutely, but for us, the foundational data layer we're building lets us keep building new things on top of it, and we're building that structure now.

For this particular page, the Product Lifecycle API we have for the customer portal handles end-of-life data and the like, and we're actually using it to set the titles of our products. Every so often the company tweaks a product name or rebrands something, and getting that information out to the public is a slow process. I remember when I was doing a proof of concept of the Drupal system and had built this API integration, a name change came through that very night. My manager asked, "Hey, does this support pulling in updates?" And I said, you mean this update that's already here right now, that came in last night? It was already in. And when a name changes, it saves the old name in an alternative-name field, so Solr search still works across all the websites. And he said, "That's so cool." And I said, it's amazing — because now we don't have to do it manually anymore.
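For a rough idea of the 30-minute rebuild loop mentioned above, here's a hypothetical `.gitlab-ci.yml` sketch. The job name and commands are invented, and the 30-minute cadence itself lives in a GitLab pipeline schedule (CI/CD > Schedules) rather than in this file:

```yaml
# Hypothetical sketch: a static-build job that only runs when triggered
# by a pipeline schedule (e.g. a schedule set to cron "*/30 * * * *").
rebuild-product-pages:
  rules:
    - if: '$CI_PIPELINE_SOURCE == "schedule"'
  script:
    - npm ci
    - npm run build   # statically generate pages from the GraphQL layer
    - ./deploy.sh     # publish the generated assets
```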
Something else I added for this particular instance: canonical links are managed in Drupal as well, so that's no longer something someone adds by hand, and there's a lot of validation around it.

Let's see — this is the new feature we added: product bundles. Some of our products have products within them. There's a bundled section in the middle there, where it says "included with Red Hat AMQ" — those are bundled products. In Drupal, that's just an entity reference, right? But the data lake has no opinions about what things are, so because we now have this unified ID — this internal source of truth — we can recreate that relationship in the data lake instead and make it available for other systems to query in the same way. We can pull in all the same information — documentation links, those kinds of things — and other systems could do the same: they could ask for all the products bundled with Red Hat AMQ, and there they are.

For the back-end UI — I mean, you'll see, this is basically Claro, the new, beautiful admin theme. I think it's the default now in Drupal 10, isn't it? Not yet? Okay. Anyway, this is Claro. I made a custom product entity using Drush, and the bundles entity is pretty standard — you see this kind of thing all the time, an entity reference field you use to pull the products in. One thing I did do — it's a small thing, but it's really nice — is that the order in which products are added to a bundle is respected in the data lake. Our product admin who manages all this data can just rearrange the fields, and the front end reflects it automatically. Every time she saves the product bundle, it triggers an update to the data lake that's available instantaneously, so the next time the page is built, it shows up.

So, to sum up: you manage your data in Drupal, you index it into the data lake using Search API, you query it with GraphQL, you build it with a GitLab pipeline refreshed every 30 minutes, and then you enjoy your newly automated life.
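One concrete piece of that flow is the bundle ordering described above. On an entity reference field, `referencedEntities()` returns items in field delta order — the order the editor arranged them in the form — so preserving it in the indexed document is just a loop. A sketch; the function, field name, and document shape are illustrative:

```php
<?php
// Illustrative document builder: preserve the editor's ordering of an
// entity reference field in the document sent to the data lake.

use Drupal\node\NodeInterface;

function mymodule_build_bundle_document(NodeInterface $bundle): array {
  $bundled = [];
  // referencedEntities() preserves field delta order, i.e. the order
  // the product admin arranged the items in the form.
  foreach ($bundle->get('field_bundled_products')->referencedEntities() as $product) {
    $bundled[] = $product->uuid();
  }
  return [
    'uuid'      => $bundle->uuid(),
    'title'     => $bundle->label(),
    'updated'   => (int) $bundle->getChangedTime(),
    'secondary' => ['bundledProducts' => $bundled],
  ];
}
```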
And I get to follow that act — wow. All right. The first site where my team integrated the learning paths data lake was developers.redhat.com, a site for learning about new products and technologies at Red Hat. So we have a common understanding: a learning path is a curated collection of content directing users to learn more about a particular topic or product.

The idea is that any content that can be part of a learning path — an article or a cheat sheet, for example — gets indexed into the learning paths data lake collection; it's a separate collection from the products collection in MongoDB. Once that content is indexed, it can be referenced and displayed in the context of learning paths.

The way we're doing this is with a learning path content type and a resource content type; learning paths reference resources in an entity reference field. In this screenshot we're using the Entity Browser contrib module for a nice UX, so you get card views of the existing resources within your site, and you can create a resource in the little tab at the top. The resource content is just a container for data lake content. We're using the External Data Source contrib module, which lets us create this autocomplete field: you search by title, and it shows you a little snippet — the origin site (that "rhdp" is the developer site), what type of content it is, what language it's in, and the title. When you click on it, it puts the data lake UUID into the text field, and that's what we use to pull the data. We're also using the Automatic Entity Label contrib module with a custom token, so that when you save the resource, it uses the data lake title as the title of the node — any time you update that node, it will match. And we're using the Allow Only One contrib module so we don't end up with tons of resources referencing the same data lake record; it should be a one-to-one ratio.

Looking back at the example I showed previously: we now have a hero and a sidebar driven by the learning path content type, and the data inside is coming from the data lake. To display data lake data in cards and contextual views, we use a preprocess hook: we take the UUID, query MongoDB, get the data back, and add it to the render array, so Twig can use it however our front-end developers want.

We pull all of this together in a consistent way with an internal contrib module. The main points: the module gives us the ability to set our schema, so anybody putting content into the learning paths data lake uses the same schema. It has a service used to query the data lake, so if we ever change how we query — say we stop querying directly and add GraphQL — we can just swap out that service. And it gives us reusable code: we have a controller, like you saw before with the hero and the sidebar, for viewing resources within the context of a learning path, which lets us reuse resources across different learning paths — you might have setup instructions that belong to any number of learning paths, and you can still see them in the context of each one. That was pretty cool to work on. There's also the event subscriber Melissa mentioned, which we use to preprocess the data and make sure it looks the way we want before it's indexed, plus a couple of constraints and validators to check, when you reference content, that the UUID actually corresponds to data lake data — those sorts of things.
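A minimal sketch of that preprocess step — the hook body, field name, and service ID here are stand-ins, not the internal module's real API:

```php
<?php
// Hypothetical preprocess hook: resolve the stored data lake UUID to a
// document and hand it to Twig via the render array.

/**
 * Implements hook_preprocess_node().
 */
function mymodule_preprocess_node(array &$variables): void {
  $node = $variables['node'];
  if ($node->bundle() !== 'resource' || $node->get('field_datalake_uuid')->isEmpty()) {
    return;
  }
  $uuid = $node->get('field_datalake_uuid')->value;

  // 'mymodule.datalake' stands in for the internal query service, which
  // wraps the PHP MongoDB driver (and could later wrap GraphQL instead).
  $doc = \Drupal::service('mymodule.datalake')->findByUuid($uuid);
  if ($doc !== NULL) {
    $variables['datalake'] = [
      'title'  => $doc['title'] ?? '',
      'url'    => $doc['url'] ?? '',
      'source' => $doc['source'] ?? '',
    ];
  }
}
```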
And yes, that's a lot of stuff just to show a learning path. But the idea is that last month we integrated the hybrid cloud site with the same module, so now you can create a learning path on the developer site and reference content on the cloud site — and vice versa: you can create a learning path on the cloud site and reference articles, cheat sheets, and e-books from the developer site. This was basically a first proof of concept for sharing this data across Drupal sites, and other teams at Red Hat are interested in continuing to expand these sharing capabilities.

All right — our future plans. I wanted to call out that the customer portal is looking to reuse some of this learning path work April has been building as well. When we say "internal contrib module," that's a term we coined because we have so many Drupal instances that we have kind of our own ecosystem internally — but we're always looking for ways to contribute back whenever possible, which we'll show you in a minute.

On the customer portal side, we're looking at standardizing our product taxonomy. We have a product taxonomy on most of our sites right now, but because they're maintained manually, they're all different — all of them — and it's really getting out of hand. One thing to keep in mind when I say that: the customer portal lives at one subdomain, access.redhat.com, but we're currently in the process of splitting it into seven different Drupal instances, and to make that feasible, we have to have a way to share this kind of information. That's one of the reasons — you look at this and think, this is a lot of work to share this stuff when a single Drupal instance could do it, and that's true, absolutely. What we've built is completely overbuilt for what we've done so far, but it's the foundation for what we're going to do later, one piece of which is product taxonomy. Once that's available, sites like developers.redhat.com could use the shared taxonomy module too and standardize their presentation as well.

Another thing we've talked about is integrating other systems at Red Hat, like our subscription information and, again, more from the Product Lifecycle API. Then we could do things like: when a product becomes end-of-life, or an end-of-life date is announced, use the notification system on the customer portal to say, hey, did you know this was just announced? Here's detail on it, and here are some related articles from our site. We can create a really great customer experience — again, getting people what they need, when they need it.

Oh, this is still me — sorry. The other thing we want to do is something I was calling content syndication, but that sounds like RSS feeds, which it is not. It's really patterns: you create content, store the rendered HTML in a data lake object, and then reuse it across multiple Drupal sites. Someone could write up, say, a new feature of a product, and you could show it on all the sub-product pages it belongs with. In a normal Drupal instance, this would just be a simple block with an entity reference, and you're good to go — but because we have multiple Drupal instances, this gives us a way to do it.
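Since this is a future plan, here's purely a sketch of the idea, not anything built: render an entity once with a dedicated view mode, then store the HTML in the lake for other sites to embed. The function name, view mode, and collection are all hypothetical:

```php
<?php
// Hypothetical content-syndication sketch: render a node to HTML once,
// store it in the data lake, reuse it on any site.

use Drupal\node\NodeInterface;

function mymodule_syndicate_node(NodeInterface $node): void {
  // Render with an assumed 'syndicated' view mode to a plain HTML string.
  $view_builder = \Drupal::entityTypeManager()->getViewBuilder('node');
  $build = $view_builder->view($node, 'syndicated');
  $html = (string) \Drupal::service('renderer')->renderPlain($build);

  $client = new \MongoDB\Client('mongodb://localhost:27017');
  $client->datalake->syndicated_content->replaceOne(
    ['uuid' => $node->uuid()],
    [
      'uuid'    => $node->uuid(),
      'title'   => $node->label(),
      'updated' => (int) $node->getChangedTime(),
      'html'    => $html,  // consuming sites embed this as-is
    ],
    ['upsert' => TRUE]
  );
}
```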
Then, as far as learning paths go, we're looking at learning path discovery, like April was describing: on the customer portal, if you subscribe to or purchase a product, it would be nice to surface where you can learn more about using that product — so, being able to share that data in that way. We're also looking at adding the GraphQL layer, or subgraph layer, similar to what's been done for the single-page apps, so this data can be consumed by SPAs if someone wants to use it that way. And then there's just enhancing the learning path experience — gamifying learning paths, so you can earn a badge or something when you complete one. That data may not strictly go into a data lake, but it's something we're looking at to enhance our learning path experiences.

We actually gave this presentation at Florida DrupalCamp — DrupalCamp, not DrupalCon; it's a camp, not a con — and one of the main questions people asked was, where can we see this code? And I had to say: you can't. So I did this: there is now a sandbox module on Drupal.org. You can absolutely use the short link I put up there — it's not going to ask you for anything or track you, it's really just to make the URL shorter — but I also put the long one at the bottom, which is the actual sandbox link to the module. One caveat: it is untested. I took all the Red Hat-isms out of it. It also means we're listed as maintainers, so you can connect with us that way if you'd like, and absolutely, in the spirit of Drupal, we are accepting contributions. Within the module, you'll find the event subscriber and the custom database backend, plus some example code for how to implement a schema and get data into the data lake the way you want it. There are some really good READMEs in there about setting up a MongoDB instance locally, so you can build a proof of concept for yourself if you want to pitch this to someone else as, hey, this is a great idea. And again, we're available for conversation and help whenever you need it.

These slides will be available, and we put our resource links on these last couple of pages. So: thank you to Red Hat for letting us work on cool stuff and for sending us here to speak, thank you to DrupalCon for selecting our session, and thank you all for being here this early in the morning. Does anyone have any questions? Shout them out, raise your hands, whatever. Have we covered it all? Okay — yep, okay.

So the question is: are we indexing to another search solution, like Solr, as well? The answer is yes — our search backend is also being indexed; they're just separate indexes in Search API.

That next question was: are there non-Drupal products putting data into the data lake? We're not doing that at this moment, but this is our proof of concept to say, hey, come join us — expansion; this is the foundation.

I think we had a hand up back here. The question is: why did we go with MongoDB? I will say it's because Jason Smith said it was a good idea, and he's sitting right there, if you want to ask him a question. No — it was available, and we already had architecture in place to support it.
That made it a really easy choice. I will say that with all the different backends available for Search API, you could totally do this with a different backend if you wanted to. But this was also a non-complex way to get set up, and we needed that — we needed to take some complexity out when everything was already so complex.

So the question is: are we looking to continue using single-page applications, or SPAs, as the front end of our website, keeping Drupal as the content management system? (You get to repeat the questions, and I get to answer them.) The answer is: sometimes. On the customer portal, we have a lot of single-page applications in place for various reasons, and I'd say it's a case-by-case basis — sometimes it makes sense, sometimes it doesn't, and I don't want to make a blanket statement that we go decoupled all the time no matter what, because sometimes it just doesn't make sense. In situations where we need to be highly performant and highly available, we tend to go toward SPAs. I've built highly performant, highly available Drupal sites before, so it's not that Drupal can't do it — but it's a lot more accessible for our front-end developers, because it's a lot easier to find a React developer than a front-end Drupal developer, and we have a lot of React developers. It makes it easier to cater to the skill sets of the people we have. So, back and forth, I'd say — and again, we're not thumbs-down on Drupal all the time; it's case by case.

The follow-up question: can we share a case where a SPA makes more sense? Yeah. We have listings of vulnerability information, which would normally be, probably, a View in Drupal, but it's really easy to build in a React application — that's one I can think of. There's also a case on the other side: we have what we've called vulnerability pages, which we're revamping into Security Bulletins — basically Red Hat's response to a CVE. Those are staying in Drupal. They have asynchronous elements, like a thumbs-up/thumbs-down feature and commenting, and those are handled asynchronously with JavaScript, but the page itself is rendered with Drupal, and it will stay that way for the foreseeable future, let's say. So those are two different situations.

More questions? In the back — okay. So the question is: why did we choose MongoDB instead of a relational database, like Oracle? One of the things we needed from the data lake is for it not to be opinionated about the data that goes in, because, like I said, we have the primary schema and then the secondary, and storing those objects as one piece of content in MongoDB is a lot simpler — you just say, here you go. There's no maintenance on the database side other than keeping it up and adding users as necessary. The schema itself is maintained by us, externally, and the Drupal instance enforces the schema we create. It takes out a layer: if we need to change something, we don't have to go to another team to do it — not that they couldn't, but it means we can pivot more quickly, and we needed that for this proof of concept. So that's one reason.
Another reason we chose MongoDB is scalability, and another is performance.

So the question was: if there had been a Search API backend module for MongoDB, would we have used that instead of our homegrown solution? Yeah — I don't think there was anything that provided a backend for indexing into MongoDB, which is why this was created. I don't know yet whether it could be added to the MongoDB module itself, or whether it makes more sense as a standalone project, going from a sandbox to a contrib module.

Really, there were two reasons for building it. One, we needed the backend; but the second was to give us an event we could subscribe to, to apply the schema in the first place. If your schema is flat, you can do this with plain Search API, because it lets you do standard field mapping through the UI just like you normally would. But I really didn't want Drupal to leak into my data at all, and we all know Drupal tends to leave bits and pieces of itself all over the place — I mean, Drupal 7, right? "Language none," zero-value situations. I wanted it so that if someone looked at our data, they'd have no idea where it came from; they'd just say, that data looks great, without having to know anything about Drupal to access it. So I was opinionated about how the information got in — that was one reason. But I also wanted full control, because I didn't know how people would use it. Like April's project: I didn't know how she was going to use the schema, so I tried to be as unopinionated about the schema itself as possible when I created the module.

Any other questions? Do you mean internal contrib modules? Yeah — so the question is: how do we manage our internal contrib modules? That's probably another whole session. We do have a Satis server — again, Jason Smith is here and probably knows more of the details — but basically we add a repository to our Composer configuration and pull those packages in, similar to a Drupal contrib module; Composer just knows where to find our modules. And we have a process where creating a tag for a module builds the package and adds the version information, just like Drupal does. Thank you to our Digital Acceleration team at Red Hat for those fun technologies that make things easier.

The next question was whether the sites are sharing the same module. The customer portal will be sharing too — we're using an internal indexing module, and the learning path module is built on top of it: this is how I'm indexing using this module, here's my schema, and here's how the learning path content uses that index process. And really, that index module is the MongoDB module we shared — it's this one, yeah. The examples in it — like I said, I took the Red Hat-isms out — but the example event subscriber in the module is pulled from the one I made, so if you look at it, you'll see the primary data and the secondary data, that kind of thing.

It also lets you do things our GraphQL team asked for. Because Search API is really geared toward Solr most of the time, it wants all of your object keys to be lowercase, and we wanted them to be camelCase — just because we wanted that. So when you're doing your schema mapping on the backend, you can tell it to use camelCase and do some preprocessing. That's actually how I get the locales information in for translations, too: Drupal stores translations on a single node, but if you index it with Search API, it indexes each translation as an individual record. We wanted one record, so we have one record with a locales key — a property — and anything that's translated goes under there.
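Pulling those pieces together, here's a skeletal subscriber in the spirit of what's described above. The event name and the `getDocument()`/`getTranslations()`/`setDocument()` methods are hypothetical stand-ins, not the sandbox module's actual API; only the overall shape (primary schema enforced, camelCase keys, translations folded under `locales`) follows the talk:

```php
<?php

namespace Drupal\mymodule_datalake\EventSubscriber;

use Symfony\Component\EventDispatcher\EventSubscriberInterface;

/**
 * Hypothetical sketch: shape each item before the backend writes it to
 * MongoDB, so nothing in the lake smells like Drupal.
 */
class DataLakeSchemaSubscriber implements EventSubscriberInterface {

  public static function getSubscribedEvents(): array {
    // Assumed event name; the real module's event will differ.
    return ['mymodule_datalake.pre_index' => 'onPreIndex'];
  }

  public function onPreIndex(object $event): void {
    $raw = $event->getDocument();

    $document = [];
    foreach ($raw as $key => $value) {
      // Drupal-ish snake_case keys become camelCase,
      // e.g. 'created_date' => 'createdDate'.
      $camel = lcfirst(str_replace(' ', '', ucwords(str_replace('_', ' ', $key))));
      $document[$camel] = $value;
    }

    // Guarantee the primary schema on every record.
    $document += ['uuid' => '', 'title' => '', 'created' => NULL, 'updated' => NULL];

    // Fold translations into this one record under 'locales', e.g.
    // $document['locales']['ja']['title'], instead of letting each
    // language become its own indexed record.
    foreach ($event->getTranslations() as $langcode => $fields) {
      $document['locales'][$langcode] = $fields;
    }

    $event->setDocument($document);
  }

}
```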
That's how we architected it — we needed it that way, so that's what we did. And it's actually in the sandbox module too: I left the locales mapping in, I'm pretty sure, so if people wanted to do translation, they could.

Yes — translation, for what we're doing here. The question was: have we thought about translating, to allow for other locales, other languages? This supports, out of the box, whatever you tell it to do. If you want your properties to be in French, your properties will be in French; basically, whatever the primary language of your site is becomes the primary language of your schema, and you have fully customizable control over how the data goes in. For us, the languages we support on the customer portal are English as primary, plus Korean, Japanese, and Chinese — and we're actually considering adding French. But that doesn't mean you can't flip that around and make French your primary and English your secondary; you absolutely could.

So the question was — I'll do it this time; you can take a rest — are we considering using this data lake for everything, or just for customer-facing data? For now we're starting with public information, because that makes it easier to handle the privacy concerns we talked about. We are discussing an access layer to control who's allowed to access what, but that's still in progress, and we didn't want to delay development to get it in place, so we started with public information. It's the classic thing: you can tell someone, hey, we're thinking of doing this, and they say, that sounds great — come back to me when you've done it. Which now we have. We're getting a lot of interest now that there's proof: April's learning paths implementation is completely separate and different from mine, but both use the same concept, and both of us have people saying, hey, I want to learn about that — and we say, hey, you should go listen to this presentation we did. So that's where we're starting; I have a feeling it will expand over time, but we wanted to get a proof of concept out the door and then build on it.

(I'm looking at Jason Smith.) The question was: does Red Hat have other data lakes? We have other types of data repositories, for different purposes, and I'd say they were probably considered closed, and should stay closed, for the use cases they were made for. This one is specifically engineered for public information, and specifically engineered to unify our public-facing sites — that's why we made it.
And we are at time, so I want to make sure I honor your time. We are absolutely here for more questions if you have them, but otherwise, enjoy the rest of the conference. Thank you for coming — thank you.