So now I'm going to show you the basic flow of creating a neural network. We'll be using a tool called Keras, in Python. We won't be capturing any data today; we'll just download data from the internet and feed it through a neural network classifier for training, and then Bartos will show you how to convert this to STM32 code. So we'll get the data, do some preparation and framing on it, pre-process it to produce the features that will be fed to the neural network training, then build a convolutional neural network classifier, train it using the data we've prepared, and evaluate it against data we've set aside.

We have a data set that will be split into three different subsets. The first part is used only for training, to improve the model. Then we have a test data set, which is like new information unknown to the model during training; it's used to evaluate how the model behaves. It's just like when you're at school and you're given new questions: an exam you haven't seen before.

The full pipeline for audio scene classification is: you have your input signal in the time domain, which is converted to the frequency domain, and then further to the mel domain using log-mel pre-processing. This is called feature extraction. Those features are then scaled and fed to the convolutional neural network, where we get an output confidence for each class. That last part is the softmax layer, typically found in classifiers.

As for the data set we'll be using: we're not going to do it live, I'm just going to reference the code for you, and I highly encourage you to go and look at the code and try it yourself. The data set is very large; it can take up tens of gigabytes on your hard drive, so you need a good machine. It was originally part of the TUT Acoustic Scenes 2016 data set.
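The three-way split described above can be sketched as follows. This is a minimal NumPy sketch, not the original notebook's code; the function name, split fractions, and seed are illustrative.

```python
import numpy as np

def split_dataset(features, labels, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle and split arrays into train / validation / test subsets.
    Split fractions here are illustrative, not from the original code."""
    n = len(features)
    order = np.random.default_rng(seed).permutation(n)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test_idx = order[:n_test]
    val_idx = order[n_test:n_test + n_val]
    train_idx = order[n_test + n_val:]
    return ((features[train_idx], labels[train_idx]),
            (features[val_idx], labels[val_idx]),
            (features[test_idx], labels[test_idx]))
```

Only the training subset drives the weight updates; the validation subset guides tuning during training, and the test subset is held back for the final evaluation — the "exam" in the analogy above.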
This was a competition, and attendees were given the task of classifying the sounds. It had 15 different classes; for our example we narrowed it down to only three, so classes such as bus, car and train were all merged into one class called "in vehicle". The signal was downsampled from 44 kHz to 16 kHz, which is the sampling frequency we'll be using with our microphones, and from stereo to mono. The Python code to download the data set just loads the data into a development data set, with the inputs and the ground truth, or expected outputs, for development and evaluation.

Then we want to frame this input signal. Originally the recordings came as 30-second-long inputs, and we want to cut them down to one-second-long clips, so we get even more data: for example, 1170 samples in one case.

For the feature extraction, we take this one-second-long signal and cut it further into overlapping frames, apply a Hann window to each, then an FFT and a mel filter bank, and lay each resulting column side by side. Finally we apply some log scaling to get our 30-by-32 representation of the input sound. The mel filter bank is just: we take the energy in each filter, sum it up, and apply log scaling at the end.

Here's the code that works on all the input features. What's important here is the function called feature extraction; it calls a sub-module, and the other code around it is just there to get the correct number of files and so on.

Next we want to prepare the output data, what we call the ground truth, and convert it to the one-hot encoding expected by Keras, so that each output will be equal to the expected output of the neural network.
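The feature-extraction steps just described (overlapping frames, Hann window, FFT, mel filter bank, log scaling) can be sketched in plain NumPy. The talk does not give the exact frame length or hop size; the values below are assumptions chosen so that one second of 16 kHz audio yields a 30-by-32 output, and the function names are mine, not the original sub-module's.

```python
import numpy as np

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular mel filters, shape (n_mels, n_fft//2 + 1)."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2), n_mels + 2))
    bins = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    fb = np.zeros((n_mels, len(bins)))
    for i in range(n_mels):
        left, center, right = edges[i], edges[i + 1], edges[i + 2]
        rising = (bins - left) / (center - left)
        falling = (right - bins) / (right - center)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def log_mel_spectrogram(signal, sr=16000, n_fft=512, hop=496, n_mels=30):
    """Frame -> Hann window -> |FFT|^2 -> mel filter bank -> log.
    n_fft/hop are illustrative: they make 16000 samples give 32 frames."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel_energy = power @ mel_filterbank(n_mels, n_fft, sr).T
    return np.log(mel_energy + 1e-10).T  # (n_mels, n_frames) = (30, 32)
```

Each column of the result is the log-mel energy of one windowed frame, laid side by side exactly as described above.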
So, looking at the confidence level of each class: in an ideal world we would expect a confidence level of 100% for "outdoor" here, and 100% for "in vehicle" matching input feature number three.

Before we feed what we call the spectrograms, the input features, to the neural network, we want to standardize them, to normalize them, using the StandardScaler from scikit-learn. The result will have zero mean and unit variance, so most of the values going into the neural network will lie roughly between minus one and one.

For the data set split, as I mentioned earlier, we want to split our data between training samples, validation samples and test samples. The training and validation samples are used during training: the training data set is what we feed to the neural network. So the training set is like the lessons you follow during class, the validation sample is, let's say, the exercises you do in class, and the test is the final exam. A simple analogy.

The neural network we'll be building has two convolutional layers and two dense, or fully connected, layers. You can see that the input shape matches the dimensions of our input features, a 30-by-32 matrix, and the output shape, a dimension of three, matches the number of classes we want to classify. We have intermediate ReLU activation functions; these transform the inputs into outputs for each neuron, clipping negative values to zero. Then the softmax activation function pulls the winner apart without really destroying the information in the other confidence levels.

The training happens in different steps.
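A Keras model matching that description (two convolutional layers, two dense layers, 30-by-32 input, three-class softmax output) might look like the sketch below. The filter counts, kernel sizes, pooling, and optimizer are not stated in the talk, so treat them as placeholder assumptions rather than the original topology.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_model(input_shape=(30, 32, 1), n_classes=3):
    """Two Conv2D + two Dense layers; softmax over the three scene
    classes. Filter counts and kernel sizes are illustrative."""
    model = keras.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(16, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

The categorical cross-entropy loss is what pairs with the one-hot ground truth prepared earlier, and the softmax output gives the per-class confidences discussed above.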
Initially the coefficients, the weights of the neural network, are initialized to random values, so we have a very poor accuracy. The training then adjusts those coefficients to become better and better: the network learns. This is the part you do on a powerful computer, like a GPU, or in the cloud, to create your model, and it could last from, let's say, 30 minutes to several days. At the end here, for example, we've reached an accuracy of 99 percent against the validation data set. Then if we give it the test data set, which is unknown data it hasn't seen before, we get an accuracy of 89 percent.

Another way of looking at how well our model is behaving is what we call the confusion matrix, where we can see the accuracy per class. What we can see here is that the indoor and outdoor classes are not as well recognized as the in-vehicle class, but we still have a very high accuracy.

Finally, all of this is usually done by your data scientist and then handed over to your firmware developer. The data scientist should provide you with a model, the .h5 file, and some test data. The model .h5 is the pre-trained model exported to a universal, standard file which contains the model topology — the number of layers, the number of filters, the kernel sizes — plus the weights and the biases. This is the file that will be used in X-CUBE-AI to map the model to your code, to your microcontroller. We can also export the test data set, for example to a simple CSV file, for further validation on the target.

Here's a link to the Python code, so you can go online and run this code yourself.
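The per-class view just mentioned can be computed with a few lines of NumPy. This is a generic sketch of a confusion matrix and per-class accuracy, not the original notebook's code; class indices 0/1/2 stand for the three scene classes.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes=3):
    """Rows = true class, columns = predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

def per_class_accuracy(cm):
    """Diagonal over row sums: fraction of each class recognized correctly."""
    return np.diag(cm) / cm.sum(axis=1)
```

A class that is often confused with another shows up as large off-diagonal counts in its row — which is how you would see that indoor and outdoor are recognized less reliably than in-vehicle.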