No worries. Hello everyone. I'd like to show you a recent project we worked on at Salsa. The client wanted a text-to-speech feature implemented on their site, and we opted for Amazon Polly. So let me share my presentation and talk about Amazon Polly and how we integrated it with the Drupal site. Let me start the presentation. Is the text clear?
Yep.
My session title is "Using Amazon Polly to convert text to speech in Drupal". For the agenda, we'll keep it short and sweet: an overview of Amazon Polly, then the solution architecture, then I'll demo how it looks on the real site, and then I'm happy to take any questions.
So what is Amazon Polly? It's a service offered by Amazon as part of the AWS package of services. It converts text to speech using deep learning. This was my first interaction with it, and it's really quite good in terms of features as well as ease of use. It provides a nice way of converting text to speech.
The main features are that it supports multiple languages and accents. It's a limited list, not every language or accent, but it's really extensive. I think it covers the major world languages, so it's good enough. They offer two options. One they call neural mode, which is more lifelike speech, so it sounds as natural as possible. Then there's a standard mode, which is more robotic, you could say. When we come to pricing, these two modes are priced differently; the neural one is a bit more expensive. The other good thing about Amazon Polly is that it integrates well with other AWS services, so you can integrate it with Amazon S3, or with Lambda if you want to create a serverless back end.
It also offers a lot of customization options. For example, you can customize the way certain words are pronounced, add pauses in certain places, or emphasize certain sentences. You can choose whether you want a male or a female voice. And, as I mentioned, you choose not only the language but also the accent: for English, for example, you have American, British and Australian English.
It's quite cost effective because they charge per character. A million characters, which is quite a lot, like narrating a 23-hour movie, you could say, costs about $16 in neural mode and about $4 in standard mode. And the good thing is they don't charge you for playback. Once you convert the text to speech, it's stored as an MP3 file, and you can keep that file in your own data center, for example, or in S3. Playback is not charged; they only charge for the number of characters you convert to speech. If you have a news article of, let's say, three pages, it costs about 16 US cents, or maybe three cents in standard mode. So it's quite cost effective, and that's one of the reasons the client went for this solution.
Now, an overview of the solution architecture. What we have here is the Drupal site, and we created a custom module. Basically, when you save the node, it takes the content and creates a queue item. Cron then processes the queue item, and when the item is processed we send the content to Amazon Polly. Amazon Polly then converts it to speech and stores the file as MP3.
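As an illustration of the hand-off to Amazon Polly described above, here is a minimal Python sketch. The project itself is a Drupal (PHP) module, so this is not the module's code. The parameter names follow Polly's `StartSpeechSynthesisTask` API as exposed by boto3; the helper names `build_task_request` and `submit_node_text`, the `tts/` key prefix, and the default voice and language are assumptions made for the example.

```python
def build_task_request(text, bucket, voice_id="Olivia", engine="neural",
                       language_code="en-AU", key_prefix="tts/", ssml=False):
    """Build keyword arguments for Polly's StartSpeechSynthesisTask call.

    Defaults are illustrative only; only the two engines mentioned in the
    talk are accepted here.
    """
    if engine not in ("neural", "standard"):
        raise ValueError(f"unsupported engine: {engine}")
    return {
        "Engine": engine,
        "LanguageCode": language_code,
        "OutputFormat": "mp3",
        "OutputS3BucketName": bucket,     # Polly writes the MP3 here
        "OutputS3KeyPrefix": key_prefix,  # hypothetical prefix
        "Text": text,
        "TextType": "ssml" if ssml else "text",
        "VoiceId": voice_id,
    }


def submit_node_text(polly_client, text, bucket, **options):
    """Start an asynchronous synthesis task and return (task_id, output_uri).

    `polly_client` is expected to behave like boto3.client("polly").
    """
    response = polly_client.start_speech_synthesis_task(
        **build_task_request(text, bucket, **options)
    )
    task = response["SynthesisTask"]
    return task["TaskId"], task["OutputUri"]
```

With real credentials this would be called as `submit_node_text(boto3.client("polly"), "Hello", "my-bucket")`. The returned URI is the S3 location where the MP3 will appear once the task completes, which is the kind of value a site could store against the node.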
You have some options: you can store it as MP3 or in different audio formats. Amazon Polly then stores the file in S3. In our case we used that approach, but in other cases you could store it locally if you want; we thought it was easier to just store it in S3. Once the file is converted, Amazon Polly returns the file URL to us, and we store that URL in a field on the node, so the file is attached to the node that way. When you load the node in the front end, we check whether that field has a value, basically the file URL, and then we play the audio. That's how it works.
We initially went with this queue approach because we thought the processing might be a bit intensive and might take time, and we didn't want a performance overhead there. But later on we found out it wasn't really a big concern, because Amazon Polly is quite performant. We tested articles of different lengths, and usually the file is ready within a second or less. We didn't have any issue with overhead because all the processing happens in Amazon Polly; we're not really processing anything in our Drupal site. So the queue option is still offered, but right now we've gone for immediate conversion: once you save the node, we send the content to Amazon Polly and get the result back. The option is there, though, and I'll show you later in the demo how you can configure it.
So the process, as I mentioned: once you save the node, we get the content fields. You can specify the content fields using a custom view mode we created for the node, because we don't want all fields to be converted to speech.
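The two processing modes described in this part of the talk (convert immediately on save, or queue the item and process it later on cron) can be sketched as follows. This is an illustrative Python sketch, not the Drupal module's code; `SpeechDispatcher`, `on_node_save`, `run_cron` and the in-memory queue are all hypothetical stand-ins for Drupal's node hooks and queue API.

```python
from collections import deque


class SpeechDispatcher:
    """Routes saved content either straight to conversion or into a queue."""

    def __init__(self, convert, immediate=True):
        self.convert = convert      # callable(node_id, text) -> audio URL
        self.immediate = immediate  # False means "queue and wait for cron"
        self.pending = deque()

    def on_node_save(self, node_id, text):
        """Called when content is saved; returns a URL only in immediate mode."""
        if self.immediate:
            return self.convert(node_id, text)
        self.pending.append((node_id, text))
        return None

    def run_cron(self, limit=10):
        """Process up to `limit` queued items and return their audio URLs."""
        results = []
        while self.pending and len(results) < limit:
            node_id, text = self.pending.popleft()
            results.append(self.convert(node_id, text))
        return results
```

In queued mode the save returns right away and the audio URL only becomes available after the next cron run, which is the trade-off the talk describes before settling on immediate conversion.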
For example, if you have an image field, there's no point. But you can specify which fields, say the body field or the description or any other text-based field, and you can also arrange the order of the fields. For example, if you want the body to be narrated before the description, you can do that by arranging the fields in the view mode.
We also strip certain HTML tags from the content, because not every HTML element is worth converting to speech, for example images or video. So before we send the content, we make sure we are sending mainly text content and nothing else.
Amazon Polly also supports a markup language called Speech Synthesis Markup Language, SSML, which allows you to control how the voice narrates the content. For example, we can say: after each list item, add a three-second pause. So at that point of the text, when it's narrated, there will be a pause for three seconds. You can basically control how the text is narrated using SSML.
As I mentioned, we have two options when we send data: we can send it to Amazon Polly immediately when we save the node, or we queue it and process it later via cron in the background. Once we get the S3 object URL, we save it to the field on the node.
In the front end, we have a custom block that lets you place the audio player anywhere on the page. We also use a really good JavaScript library called MediaElement.js. It's a full-fledged audio and media player in general; you can use it for playing video or audio files. We created our own audio player using MediaElement.js to fit the styling and branding of the site. So now let me quickly demo how it works.
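The tag stripping and the list-item pause described above could look roughly like this. It is a hedged Python sketch using only the standard library, not the module's implementation: the set of skipped elements, the `html_to_ssml` name and the default `3s` pause are assumptions, while `<speak>` and `<break time="..."/>` are standard SSML elements that Polly supports.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape


class _SsmlBuilder(HTMLParser):
    """Collects text from HTML and adds a pause after each list item."""

    # Elements whose content is dropped entirely (an assumed list).
    SKIPPED = {"script", "style", "video", "audio", "iframe", "figure"}

    def __init__(self, list_pause):
        super().__init__(convert_charrefs=True)
        self.list_pause = list_pause
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag == "li" and self.skip_depth == 0:
            self.parts.append(f'<break time="{self.list_pause}"/>')

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse whitespace
        if text and self.skip_depth == 0:
            self.parts.append(escape(text))  # keep the SSML well-formed


def html_to_ssml(html, list_pause="3s"):
    """Strip markup from `html` and wrap the remaining text in <speak>."""
    builder = _SsmlBuilder(list_pause)
    builder.feed(html)
    builder.close()
    return "<speak>" + " ".join(builder.parts) + "</speak>"
```

The result would be sent to Polly with the text type set to SSML rather than plain text. Escaping the extracted text matters because a stray `&` or `<` in the article would otherwise make the SSML invalid.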
Let me know if you're seeing my browser.
Yes.
So let me go back here. Can you see the site now?
Yep, it's a test page.
Yep, test page. This is a GovCMS test site I set up locally. And just to show you the interface for managing Amazon Polly in the AWS console: you can see it doesn't offer a lot of options here, because the idea is for it to be simple to set up. This page is basically just for testing the feature; you don't have to configure anything here. The only thing you need to do is create an IAM user and give it access to the Amazon Polly service, then create a custom S3 bucket and give the same user access to it, so Amazon Polly can actually store the file in S3. That's all you need to configure for Amazon Polly, really.
There's not much else. You can add some customizations, the ones I mentioned just now. For example, here you can add lexicons, for instance how to pronounce certain words, or if you have acronyms, you can specify how to pronounce them here. And this shows you the list of recent tasks, the conversion tasks basically. This shows you the engine; as I mentioned, there's neural and there's standard. It also shows you the voices, because you can specify different voices based on the available language. For example, here for Australian English you have the options of female and male, and here you can specify neural mode or not.
Just to give you an example of how it works. This is neural mode, if you listen to it.
"Hi there, my name is Olivia. I will read any text you type here."
You'll notice it's a bit more natural than the standard one. For the standard one, let's say Nicole here.
"Hi there. My name is Nicole. I will read any text you type here."
So that's the difference.
And here is the S3 bucket where we store the MP3 files. I'll show you how it looks in the front end. So basically, add content. Let's say I'll add test content about the Atlantic Ocean. Let's just copy some text from here. And that's it; we just publish it.
One thing to note is that, in the current implementation, we are not checking the publishing state. We could configure the module to generate the file every time you save the node, but our assumption is that we want to leave it up to the client and the editorial process to determine when to create the audio file. So here, as you can see, we have this "Text to speech" field. You can enable this field, and that will convert this text into the audio file.
Once I publish this, you'll notice we have a "Listen to article" button appearing above the content. We have to wait a second for Amazon Polly to create the file and place it here. If I check the task list now, this is scheduled, as you can see; it's in the queue and should be completed shortly. And it's already completed, so it's really quick. And once I play the file:
"Atlantic Ocean. The Atlantic Ocean occupies an elongated, S-shaped basin extend..."
That's how it works.
Now the settings. You can set the text-to-speech feature settings in two places. The first is the content type itself: we added a third-party configuration, which lets you enable text to speech for all content of this content type. You can also allow an override: if you enable this and uncheck this box, content editors will not have the option to disable the text-to-speech feature.
If you allow it, that means they can optionally enable it on certain content items if they want to. And then this is what I was talking about with the different processing modes: you can either convert the content to audio immediately, or uncheck this if you want the content to be queued and then processed in the background by cron.
The other place is the API settings, for example the AWS user credentials. I wasn't going to show that now, but it's a general settings page where we can specify the AWS user and the credentials. I think it's fine, I can just delete the user later on, so I'll show you quickly. You can specify the access keys and so on, then the S3 bucket, as well as which language and which voice generation mode, whether it's standard or neural. And here you can also select from the available voices.
I'll also show you quickly the custom view mode we created for this content type. If I go back here, we have this view mode, "Text to speech". That's where you specify which fields will be sent to Amazon Polly. For example, if I want the publish date to be included, I can just drag and drop it here and save the view mode. I can also place it above the body. So this gives you the option not to follow the actual layout of the front end; you can specify a different layout here for when the text is narrated.
There is also a custom widget we created for this. The reason is that we didn't want two different fields for the audio URL and the option to enable the text-to-speech feature. So we created one field, and that field uses the Drupal serialization API to save the three values, which include the enable/disable checkbox and the audio file URL. Another thing we check as well is that we compare the content.
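One way to implement that content comparison is to store a hash of the narrated text alongside the audio URL and regenerate only when the hash changes. The sketch below is illustrative Python, not the module's code: the choice of SHA-256, the whitespace normalization, and the names `content_fingerprint` and `needs_regeneration` are all assumptions.

```python
import hashlib


def content_fingerprint(text):
    """Return a SHA-256 hex digest of the whitespace-normalized text."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def needs_regeneration(new_text, stored_fingerprint):
    """True when no hash is stored yet or the content has changed."""
    if stored_fingerprint is None:
        return True
    return content_fingerprint(new_text) != stored_fingerprint
```

Since Polly charges per converted character, skipping unchanged content avoids paying again for the same audio whenever an editor re-saves a node without touching the text.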
If, let's say, an audio file has already been generated, when we save the node again we do a comparison to make sure we are not regenerating the file unnecessarily. We compare the content to see if there are any changes. If there are no changes, we're not going to regenerate the file, but if there are changes, we will. So we keep a hashed version of the content, and we compare it when we save the node to make sure we are not unnecessarily regenerating the MP3 file.
And that's about it. I'm just making sure I haven't forgotten anything. Yeah, that's about it. It was a really good experience. Even though there are some commercial alternatives to Amazon Polly that offer a lot of features, in terms of pricing and customization I think it was a really good choice. And it was really good to see that we now have tools that allow you to provide more accessibility, because the main use of text to speech is accessibility. It's also more user friendly: if you have a long article, you can just listen to it instead of reading the whole thing. You can also use it for podcasting. That's one of the use cases of Amazon Polly, where you create the article on your blog, convert it to an audio file, and then stream that audio file as a podcast. So it's a really powerful feature, and it was fun to work with. So, any questions?
I did have a few questions as we went along. I think you answered many of them, which is good. I was going to ask about use cases, and you mentioned the podcast. The other thing I was going to ask about was accessibility: what was the main intent of this for the client?
Yeah, for the client it was mainly because for some people, especially non-native English speakers, their reading skills might not be as good as their listening in terms of comprehension. They understand better when they listen to things, so that's why we have this feature. It became apparent during the COVID crisis, with publishing announcements, that not everyone can understand the content, because maybe their reading skill is not up to the level expected. So the audio file makes it easy for them to listen, and maybe that helps them understand the content better.
Yep, that's cool. I was going to ask about slang and other kinds of words, but I noticed that you had the lexicon there. So I suppose you could add whatever terms you wanted so that it would be there, right? Or does it just read through?
Yep, you can add different lexicons and, as I mentioned, also for acronyms, you can do it here. You can also do it in Drupal, for example, where you could configure your own. We didn't have that option here, but we could have an option where, when you send the content, you also use SSML around acronyms and things like that. So it's really a powerful tool.
Yeah, it's cool. I had just one more question, again around accessibility. Can this be used to generate audio transcripts? I suppose effectively that's what it is, right?
It actually supports creating transcripts for video files as well. Interestingly, they charge a bit differently for video than for audio, but the pricing is still really good, because they offer up to, I believe, 16 million characters within the free tier.
So you have the free usage, and after you finish the free usage they start charging you. I think the bill for this particular client will be less than $50 a month for the amount of usage they have right now.
One more thing we provided is a way to bulk generate. I believe that module is not enabled here, but we did offer it: you can bulk generate the audio files for multiple content items from here, using batch processing.
Yeah. The last thing I was going to say is that I think it's interesting, because one of the reasons it's very difficult to get a AAA rating for accessibility on sites normally comes down to the cost of getting audio transcripts, for instance, because someone actually has to write it and someone actually has to spend the time to talk it through, especially when you have media releases, which could be several pages, or a video. This would be a very powerful, cost-effective way to narrow that gap and get to AAA. I think it's cool.
Exactly. It's really cool. And again, the best thing about the pricing is that it's per character, and it's pay as you go. So it's really good: as I mentioned, a million characters equals about 23 hours of audio, and it's only $16 if you use the most advanced mode, the neural mode. If you use the standard one, it's only $4 US. So it's really cost effective. And the good thing is you have full customization: you can determine what you want to send and how you process it, and you can also use other Amazon services to complement the whole package. Right now we're doing all the processing in Drupal, but you could do the whole processing in Amazon, just get the end file and publish it on your site.
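To make the per-character pricing quoted in the talk concrete, here is a small Python sketch. The rates ($16 per million characters for neural, $4 for standard) are the figures the speaker gives and may not match current AWS pricing; the function name is hypothetical, and the free tier is deliberately ignored.

```python
# Rates as quoted in the talk, in US dollars per one million characters.
PRICE_PER_MILLION_USD = {"neural": 16.0, "standard": 4.0}


def estimate_cost_usd(character_count, engine="neural"):
    """Estimate the conversion cost for a number of characters."""
    if character_count < 0:
        raise ValueError("character_count must not be negative")
    return character_count / 1_000_000 * PRICE_PER_MILLION_USD[engine]
```

At these rates the "about 16 cents" quoted for a three-page article corresponds to roughly 10,000 characters in neural mode, and the same text would cost about 4 cents in standard mode.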
Because S3 is a public place, you can actually use this file on other sites or in your other services, because everyone can access the S3.
That's what I was going to ask; that was my question before. Is there anyone else who has any other questions?
Yeah, obligatory question. Since this is a Drupal module, is it available for anyone else to use, or is it specific to this client?
Our goal is definitely to contribute this module back to the community. When we worked on it, we really focused on having it self-contained, so everything, the audio player, everything, is within the module. It will require some polishing, but I believe we can definitely contribute this one back. And the client is a believer in open source as well.
Thanks. Anyone else? Okay. Well, thank you very much for that.
Yeah, that was excellent. That's actually quite interesting, and I can see so many uses for that. That's pretty cool.
Stop recording here. Thank you.