Hi, everyone. Today we'd like to talk to you about how we classify and sort through more than 200 documents in a single bulk upload. This will be presented by Queenie and myself. Queenie is a super-duper senior developer at Morpht with extensive experience in design implementation and custom module development. I am the digital experience lead, specializing in human-centered design methodologies: I find problems and find solutions for them. We at Morpht take Drupal to the next level. We deliver quality UX design, development, and site builds, and we add value by extending the core functionality to offer solutions for business and user needs.

This whole story revolves around a project that we delivered for the Attorney-General's Department in Canberra. They run and maintain a number of Royal Commission sites, and a regular request that comes to them is the update and management of batches of files that are tendered or presented at the Royal Commission hearings. These normally come in batches of 20, or 200, or more files, so they needed a solution that would help them through this.

Stating the business problem: they wanted to do batch processing of more than 200 files or documents; manage different file formats, since each document might come in multiple formats (PDF, Word docs, MP4s, and so on); sort through documents, classify them by type, and publish them in different places across the site depending on the type; and, of course, make all of that available through search.

What were their requirements? It needed to land on GovCMS SaaS. The bulk upload needed to be a very simple process, submitted through a publishing workflow. And if a document contained sensitive material, while they were editing it, it needed not to be visible to the world. So what's the solution? Queenie will tell us about that.

Well, thank you, Andre. So on GovCMS SaaS, as we know, no custom modules are allowed.
Therefore, we created a bulk upload content type to handle the metadata and files, and we use the admin theme to handle the node-save step so we can run batch processing for the large files.

As for content types, we created two. One is Publication and the other is Bulk upload. Publication displays documents to users, for example exhibits, submissions, et cetera; we apply a publication type for the classification, and lastly display them in a searchable document library. Bulk upload captures all the documents based on the CSV record, and then creates a set of publications, one per document set. That means if a publication has multiple different files, for example CSV, Excel, et cetera, we are able to group them. Documents are stored based on publication type for easy grouping: if a publication belongs to the exhibit type, all its files are kept in a private directory under the exhibit folder.

As mentioned, we use a CSV file to facilitate the bulk upload. The document ID is used for file grouping, columns A to E are used for the publication title, and the date column is used for the publication date. We use publications to display in an accordion, in a document library with facets, and in a table format.

So what's under the hood? Both the CSV and the physical files are uploaded to the bulk upload content type. Then we check whether the CSV format is valid. If it's valid, we go straight to grouping the CSV into an array; otherwise we throw an error for the editor so they can correct the CSV file. This is where we create the loop that groups each document by document name, so the file extensions are easy to manage. When we reach the end of the files, we start the batch process to create or update publications. If the publication exists, it appends to and updates the existing publication record; if the publication is new, it creates a brand-new publication node. Then we check: should we publish the content?
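The grouping step just described can be sketched in plain code. This is a language-agnostic illustration in Python, not the site's actual PHP, and the column names (`document_id`, `title`, `date`, `filename`) are assumptions for the sketch; the talk only describes a document ID, title columns, and a date column:

```python
import csv
import io
from collections import defaultdict

def group_documents(csv_text):
    """Group CSV rows into one publication set per document ID."""
    groups = defaultdict(lambda: {"title": None, "date": None, "files": []})
    for row in csv.DictReader(io.StringIO(csv_text)):
        entry = groups[row["document_id"]]
        entry["title"] = row["title"]
        entry["date"] = row["date"]
        # Each row contributes one physical file (PDF, Word, MP4, ...).
        entry["files"].append(row["filename"])
    return dict(groups)

def destination_directory(publication_type):
    # Files stay in private storage, grouped by publication type,
    # e.g. private://exhibit/ for exhibits.
    return f"private://{publication_type.lower()}/"

sample = """document_id,title,date,filename
EXH-001,Witness statement,2021-03-01,EXH-001.pdf
EXH-001,Witness statement,2021-03-01,EXH-001.docx
EXH-002,Hearing video,2021-03-02,EXH-002.mp4
"""
grouped = group_documents(sample)
```

Grouping by document ID is what lets a single publication node carry every format of the same document.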
If no, it will be for internal review only; if yes, it will be viewable by everyone.

The next two slides show the code we mainly use. The first is group documents, which translates the CSV into an array format for easy upload. The second is attach documents, which is where we create the files and move them into the right folder.

All right, let's have a play now. Let's have a look at how the CSV file is formatted again. The document ID is mainly used as a reference to the actual file; you can see the same file name with different file extensions. Now let's begin. Let's give the bulk upload a title, then choose the publication type, for example exhibit, and choose the hearing you want to attach to. The default document date and prepared-by values can be overridden in the CSV file. So let's attach the CSV file, and now we attach the physical files. As you can see, we can do a bulk upload straight away, but we have to keep in mind that it has to be less than 255 megabytes in total.

Over here you can see two checkboxes: one is Publish and one is Exclude from search. Publish means you want to put the publication up for public viewing; if you want that, check it, otherwise just leave it blank. And you can see Save as, which is the workflow. For that we have two dropdown options, Draft and Review. Draft means it's ready for internal review and approval; Review means it's approved and ready to publish. For the moment I'm choosing Draft and saving. This is where the magic begins: it creates and updates the publications and files.

This is the bulk upload content display, for internal view only. As you can see, each link goes to an individual publication page. If any file does not exist in the CSV, it will be created without the naming convention. So let's go to the front end and have a look at the exhibits.
It should not display what was previously created, because it's still in draft. Let's see. Here you go: the new ones that were created aren't shown. You can see the difference from here. And let's look at the back end for publications. As you can see, the new publications created by the bulk upload have a status of Unpublished. This is what we want.

Let's have a look again. Now I'll edit the same bulk upload and change it to Publish and Review: I check the Publish checkbox and change the workflow to Review, which means it's ready to go live, and click Save. Now you can see Publish has changed to Yes, which means it's already updated. Let's refresh the front-end page again. There you go: those are all the files that were marked to be published. Let's check the back end, and the statuses are Published. So this is how it works once a publication is published. Thank you, over to you, Andre.

So what was the outcome? The digital applications team (and Scott, right there) will tell you how well that project went for them, and that we've given them a tool they had really been waiting for, and it works nicely for them.

Every project has some limitations, and we hit a few of them along the way; it was a process of discovery for us. We discovered that when we first started the project, the limit for file sizes was at two gigabytes, and eventually we had to work with 256 megabytes. If there were many unusual characters in the document titles in the CSV, we hit some problems with the error handling, which wasn't the easiest to manage. Whenever we started a new bulk upload process on a new site, we needed to get the server memory increased to eight gigabytes to help with the processing. And documents cannot be overwritten; any temporary documents could also not be deleted, so you needed to send a request to GovCMS to get them removed.
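Those limitations lend themselves to a pre-flight check before the batch starts. This is a hypothetical sketch in Python (the real site handles these conditions in PHP inside the upload form); the 255 MB total ceiling comes from the demo, 255 characters is Drupal's node-title limit, and the function names are invented for illustration:

```python
import re

MAX_TOTAL_BYTES = 255 * 1024 * 1024  # upload ceiling cited in the demo
MAX_TITLE_CHARS = 255                # Drupal's node-title length limit

def clean_title(title):
    """Strip control characters that broke error handling, then truncate."""
    cleaned = re.sub(r"[\x00-\x1f\x7f]", "", title)
    return cleaned[:MAX_TITLE_CHARS]

def validate_batch(file_sizes, titles):
    """Collect problems before the batch process is allowed to start."""
    problems = []
    if sum(file_sizes) > MAX_TOTAL_BYTES:
        problems.append("total upload exceeds the platform limit")
    for title in titles:
        if title != clean_title(title):
            problems.append(f"title needs cleaning: {title[:30]!r}")
    return problems
```

Failing fast here is cheaper than discovering a bad title halfway through a 200-file batch, especially since files cannot be overwritten afterwards.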
Publication titles from the CSV were turned into node titles, and those are limited to 255 characters. You can imagine an exhibit or a piece of evidence presented at the public hearings of a Royal Commission can have a really long descriptive title. And that's it for us. Any questions?

Queenie, perhaps if you've still got a version of the site locally, you could walk through it a bit more? Is that possible? There were some comments stating that the video was a bit blurry. Yeah, unfortunately my local environment is not working, so I'm glad I managed to capture that video while I could. Sorry, guys.

Okay, so we've got quite a few questions in the live Q&A tab. Starting with Simon Hayes: is there a mechanism to apply any metadata or taxonomy tags to the whole group of docs being ingested? Sorry, come again? Can you apply metadata and taxonomy tags to the whole group of docs being ingested? Yes, you can. We just need to make some changes at the back end to map them across. Obviously, the taxonomy needs to exist, otherwise we won't be able to tag them.

Okay, next question, from Eddie Samuilu: does this throw the files into the public file schema in draft and then public, or are they always public? They're always in private; they'll never be in public. The reason behind this is that we want to keep the files away from anonymous users when the node is unpublished or archived. That's the way we're doing it.

Okay. Timothy Cosgrove: you mentioned the 255-megabyte total upload limit. How is that determined, and what are some of the considerations there? Well, that was determined by the GovCMS platform. I think it's ClamAV that was restricted to 250 MB of upload. Obviously, if they can up it to two gigabytes, then you can do that. It's all based on the total size of the files. Yeah.

So, next question, from Jeffrey Robbins. This is a good question.
Given that you couldn't use any contrib modules that GovCMS didn't provide, how did you make this work out of the box? Yep. So we did it using the theme, the admin theme layer. We call the node hooks, like the node-edit hook, and then when you save, the node-save hook, and that's where we do the magic and all that kind of stuff. So whereabouts is the code for those hooks? It's all in the theme, the admin theme. So it's not in the production-facing theme, but in the admin theme, where you can actually still use hooks.

Okay, next question, from Michael Williams: when you say files cannot be overwritten, is that only during the upload process, or does that apply once the files have landed? It's actually both. Once the file is already uploaded, you can't replace it; that's how it goes. And if Drupal finds the same file name, it tends to just rename the new file with an _0 or _1 suffix. That's the issue that we've got, and that's why, even though a file had been flagged as temporary, it did not purge when the cron job was run. So we had to raise a ticket with GovCMS support to help with deleting those files.

Okay, next question, from Anthony Malka: is this using migration processes in the background, and can you do a rollback? It's not using the migration process at all. It's purely PHP code together with Drupal code as well.

I think we've got one more question here, from Simon Kosek: can you clarify the content types you created? Are the files you uploaded a media type, and are they loaded into private storage? At the moment, we don't use media at all; we just use a direct file. The file field we have on the publication type is based on what the DRC wanted, for example exhibit and submission. What else, Andre? They're the main two.
So we use the Publication content type to attach the files to that node, and that's where we play with the versioning: it's really removing one attachment and adding another one instead for each of the publications.

Okay, I think that's the last question. If anyone has any more, get them in now. While we wait, do you have any closing things you want to say, Andre or Queenie? Well, that was a really fun project that changed as we were going through it, because of some of the changes to the limitations on GovCMS with the size of files allowed, so we had to adapt the code as we went along. And we did start by using media entities at the beginning, but as requirements evolved, we shifted to the two different content types, Publication and Bulk upload, to manage the process. Okay, thank you all for attending.

Thanks, we've got time for one more question. We've got 187 here. Were there any performance issues with using just private files? Yes, there were, and that's why we needed to up the memory size to 8 GB so that it helps with the processing. And I think the 250 MB file restriction helps as well, in the sense of that kind of file generation.

Okay, I think we're finished now; we've got about 48 seconds to go, and as we've seen from other presentations, it ends quite abruptly. So there's a question from David just to finish: if anyone else is interested in replicating this, how do they get more detail? My advice would be to reach out to Queenie and Andre in some of the meetup spaces in this software; that's the key part of these conversations, contacts you can discuss with later. So thanks, everyone, for coming to the session. Thank you all. Have a good day. Thank you very much. See you guys.