Hi, and welcome to the Open Voice OS presentation. Today I'm going to introduce you to what Open Voice OS is all about.

So what is Open Voice OS? Open Voice OS, or OVOS as we like to call it for short, is a full voice assistant stack that's completely free and open source. The OVOS ecosystem consists of Open Voice OS packages, services, and multiple frameworks. It also includes free software released by third parties, such as skills and plugins. The OVOS stack can be used either as a framework or as an entire operating system for embedded devices. But before we dive deeper into those two concepts, let's go through a bit of the history of how the project came into existence and what its internals look like, to get a better understanding.

So how did Open Voice OS actually start? Open Voice OS started as a reference image, based on a heavily modified and patched Mycroft stack on an embedded Buildroot system, for do-it-yourself Raspberry Pi based smart speakers. Mycroft community developers were interested in supporting more platforms, architectures, and environments than Mycroft AI as a company was able to support. Some of the top community contributors to the Mycroft project figured it would be best to soft-fork Mycroft and build a new ecosystem around it, one that would not be under the direct control of any single organization and would be a community-first project.

Now let's look into the internals of Open Voice OS and how this ecosystem works. Open Voice OS consists of a Python-based core framework, OVOS core. The framework is responsible for providing distinct services; some of the important services provided by this framework are the WebSocket message bus service, a skill service, an audio service, a speech service, and a graphical user interface service.

The first and most important backbone of this ecosystem is our WebSocket-based message bus service. The message bus provides an asynchronous communication layer between all our services. It is an event-based system where services can emit events, along with data, globally, and other services can choose to listen and react to these events, use the data, and respond back to the original service. Everything in the OVOS ecosystem uses the message bus to talk to everything else.
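To give a feel for the idea, here is a minimal sketch of talking to the bus from Python, using the ovos-bus-client package; the handler and the spoken utterance are just examples, and this assumes a local ovos-core instance is running with the bus on its default port:

```python
from ovos_bus_client import MessageBusClient, Message

# Connect to the message bus of a running ovos-core instance.
client = MessageBusClient()
client.run_in_thread()

# Any service can listen for an event type and react to its data...
def handle_speak(message):
    print("Something asked the assistant to say:", message.data["utterance"])

client.on("speak", handle_speak)

# ...and any service can emit an event globally; here we ask the
# audio service to speak a sentence.
client.emit(Message("speak", {"utterance": "Hello world"}))
```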
The second most important service provided by OVOS core is our skill service. The skill service is the home of our many voice interface applications, which we call skills. It is also home to our intent parsers, and it provides a skill manager interface to manage the life cycle of our skills, from initialization to shutdown.

Let's see how the skill service works. A user initiates a voice query, which is magically translated to text by the speech service and then picked up by the skill service. On receiving the text input, the skill service activates an intent parser, which is responsible for looking up the table of initialized skills inside the skill manager to find the closest intent match. If an intent match is found in an initialized skill, that skill will send its response to our text-to-speech service. Simply put, the skill service handles all communication and processing that takes place between the speech-to-text input and the text-to-speech output within the OVOS framework.
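To make that flow concrete, here is a minimal sketch of a skill as the skill service would load it, written against today's ovos-workshop API (at the time of this talk skills still used the mycroft imports); the skill name and the hello.intent and hello.dialog resource files are made-up examples:

```python
from ovos_workshop.skills import OVOSSkill
from ovos_workshop.decorators import intent_handler


class HelloWorldSkill(OVOSSkill):
    """Hypothetical skill: reacts to a 'hello' intent with a spoken reply."""

    @intent_handler("hello.intent")
    def handle_hello(self, message):
        # The intent parser matched the transcribed utterance to this
        # handler; speak_dialog() passes the reply on to text-to-speech.
        self.speak_dialog("hello")
```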
Now, let's look into the speech service and what it does. The speech service houses our wake word engine API and our speech-to-text engine API. It is responsible for everything within the speech input process. So how does it work? The speech service provides an always-listening wake word engine that is responsible for activating our STT API once it recognizes the registered wake word spoken by the user. On wake-up, the speech service starts recording user input and forwards it to the selected STT engine for transcription. Upon a successful transcription, the text is forwarded to our skill service, as we saw in the previous slide, and the skill service handles all operations from there.

Moving on, let's look at the audio service. The audio service handles our text-to-speech engine API and is also home to our multimedia framework. It is responsible for everything in the audio output process. Now, let's go through the audio output flow. A user initiates a query: "tell me a joke". The speech service forwards the transcribed text to our skill service where, if it finds a match in our joke skill, the joke skill reaches out to an external API to formulate a response dialog, which is then forwarded to our audio service. The audio service converts the text to an audio clip using the selected text-to-speech engine. The audio clip is then played back through the configured output, which could be a speaker hardware device.

Now let's look at the exciting GUI service. The OVOS core GUI service is an API for skills to be able to output information on screens. This API allows skills to show QML-based pages with data for any particular query. The GUI service, along with the skills displaying the data, is responsible for life-cycle management of the skill's display. Let's see how this service works. When skills have found a match for a user query and can output audio, they also have the ability to display that output using the GUI service. The GUI output can likewise be used to display a failure if no match is found by the intent parser.
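A minimal sketch of what this looks like from inside a skill, assuming the standard skill GUI API; the data keys and the weather.qml page are illustrative and would ship in the skill's own gui folder:

```python
from ovos_workshop.skills import OVOSSkill
from ovos_workshop.decorators import intent_handler


class WeatherDemoSkill(OVOSSkill):
    """Hypothetical skill that shows data on screen via the GUI service."""

    @intent_handler("weather.intent")
    def handle_weather(self, message):
        # Hand values to the GUI service; the QML page binds to these keys.
        self.gui["temperature"] = "12"
        self.gui["condition"] = "Clear sky"
        # Ask the GUI service to display a QML page bundled with the skill.
        self.gui.show_page("weather.qml")
        # Audio and visual output can happen side by side.
        self.speak_dialog("weather")
```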
We also have some important external frameworks and services that build up the OVOS stack. The first one is the Open Voice OS shell, and the second is the Open Voice OS Common Play media framework. Now, let's look at both of these in detail.

The Open Voice OS shell is a Kirigami-based application running on top of EGLFS for embedded devices. It is a graphical front-end that incorporates the Mycroft GUI protocol skill view inside the application, along with a few extra features such as drop-down menus, volume control, brightness control, and additional system settings. The shell utilizes a custom Kirigami platform theme plugin to provide our skills and user interfaces with a global theme. The shell additionally integrates with KDE Connect out of the box; it allows users to pair their smart speaker devices running the shell with KDE Connect and also control multimedia playback via MPRIS integration.

Let's see a quick demo of what the Open Voice OS shell offers. Now let's look at some of the features the OVOS shell provides. As you can see in the video, we have a drop-down menu that can be used to quickly access things like network and wireless settings. In the network and wireless settings page, you can easily change your current network settings or connect to a different network. As smart speaker displays are considered always-on displays, the OVOS shell provides quick access to a night mode. The night mode can be activated by tapping the pill button on the left edge of the screen. The OVOS shell also provides quick access to voice applications. Voice applications are a concept where skills provide a home screen; these skills let you interact with the skill's information directly, using only the touch screen, when you don't want to use your voice.

The OVOS shell provides another feature called the quick access dashboard. The quick access dashboard can be activated by pressing the pill button on the right edge of the screen. This dashboard provides a view of cards, and these cards record your daily activities. Double-tapping a card will activate the activity, and more activities can be added to the dashboard. The OVOS shell provides full rotation support for vertical, horizontal, flipped vertical, and flipped horizontal screen orientations. This setting can easily be accessed from our drop-down menu, and preferences are always preserved across reboots.

The OVOS shell provides full support for global color schemes. The custom theme engine within the shell is powered by Kirigami platform themes and a Qt Quick Controls style. One can easily change the color scheme from the advanced customization menu. The theme engine supports dark and light color schemes by default and also allows one to easily create a new color scheme on the fly. All skills and user interfaces developed under Open Voice OS follow Kirigami theming. This also helps us ensure our skills are usable on all Plasma platforms. The OVOS shell also provides a full shutdown menu. The shutdown menu lets you decide whether you want to restart only the OVOS services, the entire shell, or the entire system.

Now that we've seen what our OVOS shell can do, let's look at our multimedia framework. Open Voice OS Common Play is a fully fledged voice media player framework handling voice integration and playback functionality for multimedia-related skills. It is responsible for handling all media-related queries such as "play a song", "play a video", "stop the music", "play the next track", and more. The framework can be extended by media providers and special OCP framework subclass skills. This framework uses fuzzy matching to find the best match for your media playback request. Let's look at a short demo of the framework. "Hey Mycroft, play some Gorillaz." "Hey Mycroft, play the next song." "Hey Mycroft, stop."

How is Open Voice OS core different from Mycroft core? Mycroft core is based on a monolithic architecture. Due to that architecture, it supports a limited set of speech-to-text and text-to-speech engines, as they are hard-coded into the Mycroft core API. Mycroft is also deeply tied into Mycroft AI's online Selene backend, which means devices require pairing to be even remotely functional. OVOS core, on the other hand, is based on a microservice, plugin-based architecture. At Open Voice OS, all our speech-to-text and text-to-speech engines live outside of the core API. This allows us to be fully pluggable with any speech-to-text or text-to-speech engine available out there. The Open Voice OS core framework is local by default. This means it's backend-free and does not require an online backend to be fully functional. We optionally provide compatibility with the Mycroft Selene backend, or give you the ability to host your own personal backend service on your local network if you wish to run something off your device.

Let's see what makes this pluggable infrastructure possible. Open Voice OS incorporates our plugin manager, which can be used to search, install, load, and create plugins for the Open Voice OS ecosystem. Every external service or engine that lives outside OVOS core is a plugin, loaded by the plugin manager in the OVOS world. We support multiple types of plugins: wake word engine plugins, speech-to-text engine plugins, text-to-speech engine plugins, intent parser engine plugins, and hardware abstraction plugins.
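As a sketch of what this looks like in practice, assuming the helper functions exposed by the ovos-plugin-manager package; the plugin name used here is one example of an installable engine, not a default:

```python
from ovos_plugin_manager.tts import find_tts_plugins, load_tts_plugin

# Discover every text-to-speech engine plugin installed in this environment.
for name in find_tts_plugins():
    print("found TTS engine:", name)

# Load one engine class by its entry-point name; any installed
# ovos-tts-plugin-* package can be loaded the same way.
TTSEngine = load_tts_plugin("ovos-tts-plugin-mimic")
```

In a running system you normally don't call this yourself; you pick the engine in the user configuration by setting the plugin's module name, and the relevant service loads it through the plugin manager.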
Now that we have explored the internals of Open Voice OS, let's look at how we see Open Voice OS as a framework. Because Open Voice OS is structured as a pluggable infrastructure, we can incorporate OVOS into other projects by utilizing only the subset of services a project requires. As a framework, for example, it allows us to incorporate voice assistant technology into projects such as Plasma Bigscreen. Developers can also connect native Qt applications and serve them as voice interfaces using the Mycroft GUI Qt library. Let's see a short demo of our framework running on the Plasma Bigscreen image. "Hey Mycroft, what is the current weather?" "It's currently clear sky and 12 degrees Celsius. Today's forecast is for a high of 26 and a low of 10."

So now we've seen a demo of Bigscreen running Open Voice OS. Let's look at how we see Open Voice OS as the building blocks for an embedded system. We see Open Voice OS packages and the shell as building blocks for an embedded platform, where you can choose to run on bare metal without requiring things like a display server and window manager. This is especially useful for device environments like smart speakers, voice satellites, and magic mirrors. Open Voice OS as an operating system stack utilizes a plugin-based hardware abstraction layer called OVOS PHAL. It supports both user-land and direct access to the underlying hardware. Some of the hardware abstractions you can find in this pluggable layer are support for different audio boards and mic array hardware, screen and display management support for DSI and HDMI-based displays, and GPSD hardware support for location and positioning.

Now let's look at a quick demo of our OS stack running on the Mycroft Mark II dev kit. "Hey Mycroft, what's the weather like?" "It's currently clear sky and 17 degrees. Today's forecast is for a high of 20 and a low of 10." "Hey Mycroft, what date is it?" "It's September 4th, 2021." "Hey Mycroft, what's the time?" "11:30." "Hey Mycroft, tell me about Elon Musk." "I'm checking Wikipedia for Elon Musk. Elon Musk, born June 28th, 1971, is an entrepreneur and business magnate." "Hey Mycroft, tell me more." "He is the founder and chief engineer of SpaceX, an early-stage investor, CEO, and product architect of Tesla, founder of The Boring Company, and co-founder of Neuralink and OpenAI." "Hey Mycroft, set a timer for two minutes." "I've started a timer for two minutes." "Hey Mycroft, set another timer for five minutes." "I've started a timer named Timer 2 for five minutes." "Hey Mycroft, cancel all timers." "Two timers have been canceled." "Hey Mycroft, play Danny Vera, Roller Coaster, audio only please." "One moment while I look for that." "Hey Mycroft, set volume to 80%." "Volume updated to 80%." "Hey Mycroft, set volume to 30%." "Hey Mycroft, stop."

At Open Voice OS, we are targeting all platforms for integration, from embedded headless devices and single-board computers to do-it-yourself smart speakers. We are also targeting projects providing TV, desktop, and mobile interfaces. Open Voice OS is the open community playground for all platforms to come and experiment with. You can try us out in multiple places, either incorporated into an existing system or standalone. We have images available for Raspberry Pi 4 boards, covering everything from DIY speaker hardware to the Mycroft Mark II dev kit. You can also find us on the Manjaro Plasma Bigscreen image.

Finally, before ending: Open Voice OS is a community-powered project with contributors from various parts of the globe, working in various industries. We have a set of lead maintainers for certain parts of our framework, but as a new project, we are constantly looking to grow our member count and increase contributions. If there are any features, integrations, or platforms you would like to support voice interfaces with, come join us. We are very active on our Matrix channel; you can find links to us on the next slide. Thank you.

Do you find yourself using your own smart speaker as a daily driver on a regular basis? Yes. Do you want to elaborate on your daily workflow? Yes. Basically, we use it on speaker hardware built around Raspberry Pi units, where we use it for weather updates and for activities. Activities are a special platform where you can program a set of activities that you would want your smart speaker to do, things like turning off the lights, home control, and also giving you information about the next bus that's available locally. Some of these services require API keys, so we provide a proxy service for them, and we also allow users to add their own proxy service. Thank you.

Philip in the online chat asks: is the Mark 1 kit capable enough to run OVOS? Yes, it is. It won't run the graphical user interface, but you can use our stack on the Mark 1. We don't have ready-made images for it right now, but you can run our stack on the Raspberry Pi inside your Mark 1 device.

Just because I know someone is thinking it: when am I getting my Mark 2? I know it's not your problem. We've heard that the Mark 2 shipped out on the 19th of September for early backers, so we're hoping the Mark 2 reaches people in October, by the time everything ships.

Sweet. Any other questions? You can use OVOS and Open Voice OS interchangeably. I don't see anyone else having questions, from the in-person audience or the online one. Thank you very much. Thank you.