So, yeah, I'm Alexi and I work for Power Partner. I've been working on multimedia frameworks for quite a while: integrating codecs, performance optimization, power optimization, integrating devices and so on. That's my contact, so if there are questions I'm not able to answer here, we can talk later.

This is the flow I'm planning for the whole presentation. The initial few sections are more of an introduction. I've spent more time on the hardware acceleration and optimization side; the codec integration part is a bit lighter, and streaming is a bit lighter. How I'd like to do it is: if you have questions in between, I'm ready to take them up. I have a total of 45 minutes, so I can speak for 30 minutes and then take questions for 10 minutes. Otherwise, if you want to ask questions in between, I'm fine with that.

So, multimedia frameworks. What a multimedia framework is supposed to do is make sure you have the best experience for your multimedia playback, your camera recording, your 3D playback, 3D recording and so on. Why that's hard: when Android was launched, one of the big promises was that the application developer shouldn't have to care about what happens underneath. But it's really difficult to deliver that. There are enough video coding formats, enough file formats, umpteen audio formats, and different types of latency requirements; your video conferencing might have very hard latency requirements. So there are a lot of variations possible. How can one single framework take care of all that? Stagefright tries to do it, and it does an okay job, but there are lots of ways we can potentially improve it. That's why I'd like to talk about the tweaks, the performance improvements, we can do inside Stagefright.

Moving ahead: Stagefright is the primary multimedia framework on Android now. What was there earlier was OpenCore, and then they moved to Stagefright. There's also talk about an OpenMAX AL coming up. How many of you are familiar with OpenMAX? OpenMAX is a set of API standards from an industry consortium called Khronos. AL is something like an equivalent of your MediaPlayer and MediaRecorder APIs. Google is trying to see, that's what I've heard, whether they can have an AL path for performance-critical applications on Android going forward. It's still being discussed. But consider the problems we've seen with Stagefright for building very high-performance applications. Someone here was talking about voice synthesis; for voice synthesis, or song synthesis, one of the biggest requirements is an extremely low-latency audio pipeline. That's something the low-level audio layer can give you, but once you put the audio framework and then Stagefright on top of it, that's not the kind of latency you'll get. Or say you go for a 50 millisecond end-to-end video conferencing latency requirement: you can't do it with the kind of framework that Stagefright provides. That's where custom frameworks might come into the picture, or OpenMAX AL could be the right way to go; it depends on how that actually pans out.

For the feature set, there is support for multiple file formats.
It integrates OpenMAX codecs. Also, each of the silicon vendors is looking at integrating their codecs into Stagefright so that they can have accelerated multimedia playback. We work a lot with TI, integrating the codecs, optimizing the codecs and so on; a lot of the codecs come from us, and the OpenMAX components are also from us. We basically look at accelerating Stagefright, accelerating the media playback as such, and also recording with the Stagefright components.

The next thing is recording. In recording you have your regular camera recording, and then you have your 3D recording coming in, your multi-view codec; there's a whole lot of things coming in now. And there is streaming. For streaming, luckily, there's full support for RTSP playback; the record path is coming in. And then you have SDP streaming. Most of the radio stations you see will be either over RTSP or at times over SDP.

This is the block diagram a lot of people have seen. It's huge, it's scary, and we'll keep referring back to it. I'd like to talk about some of the blocks; these are the main components I've been interested in. How I'll do it is: I'll give you an overview of the blocks, and then I'll come back and say what we did for Skype. We have a Skype solution, and I'll talk about how we went in and customized these pieces to actually achieve it. If you have specific questions, we can take those.

The data source is the source of every flow in the pipe. Even though I refer to it again and again as a pipe, it's not exactly a pipe in this multimedia framework. If you've seen GStreamer or DirectShow, or dealt with any multimedia framework to date, everything was a proper pipe: you have a source node from which the data flows on to the next node and finally to the sink. But in Stagefright, it's more function calls, called from one level below. The graph builder, say AwesomePlayer or StagefrightRecorder, takes each of these modules and connects them together, and then the downstream component calls read() on the upstream one; that's how the data flows down.

So the data source: all of the source elements, like the MPEG-4 data source and the others, are derived from DataSource. And this is where you actually plug in your sniffers. Sniffers are file-format recognizers. There's support for MKV now, there's support for AVI, there's support for MP4. If you want to add support for, say, WMV, you'll be looking at the data source: you'll add a sniffer, and there is a RegisterSniffer function you can use. The sniffer looks at the file, sees this particular header, infers the MIME type, and says, okay, this is my type of file; that's how the rest of the framework recognizes that, yes, this is a particular file type. There's a sketch of such a sniffer below.

And you have media extractors, which actually take care of extracting and reading the data out, because each of the file formats has its own specifics about how the data is stored. Say, AVI is just a block of data that's there.
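To make the sniffer idea concrete, here's a minimal sketch of what an ASF/WMV sniffer could look like, assuming Gingerbread-era Stagefright headers; the exact SnifferFunc signature and the MIME string vary across Android releases, so treat the names here as illustrative rather than the shipped API:

```cpp
// Hypothetical ASF/WMV sniffer sketch, assuming AOSP Stagefright headers.
#include <string.h>
#include <media/stagefright/DataSource.h>

// The 16-byte GUID that opens every ASF (.asf/.wmv) file.
static const uint8_t kASFHeaderGUID[16] = {
    0x30, 0x26, 0xB2, 0x75, 0x8E, 0x66, 0xCF, 0x11,
    0xA6, 0xD9, 0x00, 0xAA, 0x00, 0x62, 0xCE, 0x6C
};

static bool SniffASF(const android::sp<android::DataSource> &source,
                     android::String8 *mimeType, float *confidence) {
    uint8_t header[16];
    // Read the first 16 bytes and compare against the ASF header GUID.
    if (source->readAt(0, header, sizeof(header)) < (ssize_t)sizeof(header)) {
        return false;
    }
    if (memcmp(header, kASFHeaderGUID, sizeof(header)) != 0) {
        return false;
    }
    *mimeType = "video/x-ms-asf";  // illustrative MIME string for this sketch
    *confidence = 0.8f;            // outrank the generic, low-confidence sniffers
    return true;
}

// Registered once at startup, e.g.:
// android::DataSource::RegisterSniffer(SniffASF);
```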
With MP4, you have more track-specific things: you have multiple tracks, audio and video tracks, and inside video you can even have multiple video tracks. The extractor is what understands the format.

Then you have the media source. A media source, again, is any component which acts as a source for any downstream component. Our OMXCodec, which actually integrates the codecs, your encoders and decoders, is also a media source, because with respect to the writer or the renderer it's the thing generating media and passing it on. So that's the media source, and then we have the media buffer, which is what actually carries the data around; that's what you'll be looking at for the data part. See the sketch of the pull model below.

Why am I going into media buffers, metadata and all that? We found that when you want to tweak the framework, you have to understand this structure very well. The metadata is generally used for your timestamps and your configuration information. Why we dealt with it was when the Skype work was being done: we had to write a node which would take data from Skype and pass it on to the decoder. The metadata has to be parsed from the stream first, because the codec-specific configuration comes first.

How many of you have a background with codecs and multimedia frameworks? A question from the audience: maybe I didn't read the description properly, but is this a lecture on writing a new codec, optimizing a codec, or using them? That's a very good question, and it's actually none of those, because writing a new codec is a huge job; that's my whole company's job. Optimizing one, again, is very platform specific. So codecs themselves are out of the picture here. Integrating a new component could be in scope, but it's mostly about giving an overview of Stagefright; if people had more background I could get into the details, but there are hardly five people here with that background. What I can do is move back and do a bit more of an intro rather than trying to get into the details. If you want to talk details, we can do that later. But I think I should stick to the basics. Is that okay?

Another question: is it a paid framework, or is it open for all? It's open for all; you can download it, it's part of AOSP. The modifications, though: every company has its own variant of the framework. If you look at the TI branch of Android, which is publicly available, there are changes out there; they're making a lot of changes for adding 3D support, and a lot of codec support is coming in. And adding support for new use cases.

So, do you want me to talk more about the components and what this picture is about? Do you think that will help? Yeah? Okay, let's look at the blocks; let's start with the source. How many of you are application developers? Okay. And how many of you use multimedia in your applications? Okay.
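Here's a minimal sketch of that pull model, assuming AOSP Stagefright headers; the loop below illustrates how a downstream node drives its upstream MediaSource, it is not a copy of AwesomePlayer's actual code:

```cpp
// Sketch of the Stagefright pull model: the downstream component drives
// the flow by calling read() on the upstream node.
#include <media/stagefright/MediaSource.h>
#include <media/stagefright/MediaBuffer.h>
#include <media/stagefright/MediaErrors.h>
#include <media/stagefright/MetaData.h>

using namespace android;

void drain(const sp<MediaSource> &source) {
    source->start();
    MediaBuffer *buffer;
    // Each read() pulls one demuxed/decoded unit from the upstream node.
    while (source->read(&buffer) == OK) {
        int64_t timeUs;
        // The timestamp travels with the buffer as metadata (kKeyTime).
        if (buffer->meta_data()->findInt64(kKeyTime, &timeUs)) {
            // ... hand buffer->data() + buffer->range_offset(),
            //     buffer->range_length() bytes to the next stage ...
        }
        buffer->release();  // return the buffer to its owner
    }
    source->stop();
}
```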
So let's talk about it from the usage perspective. One of the problems you see: there's a team in our company that was trying to write an internet radio application. The thing is, it's actually pretty straightforward to write an internet radio application. The UI is pretty straightforward. The idea is to give you, say, 10,000 radio stations. But then the issue is that about 9,500-odd stations don't play. Why? Because from the app side, it's just parsing code: you send a request, you get a response, you parse it, you build a list. Beautiful, you have 10,000 stations with you. But 9,500 of them don't play. That's when you have to understand, even as an application developer, what happens underneath. That's why this whole RTSP and HTTP business, the coding formats, the file containers, the file formats, all of it is relevant for you.

So let's talk from that perspective. Take a streaming format: you get an HTTP link, and you expect it will directly play, right? Because Stagefright really does have support for HTTP. Yes, but HTTP just gives you some data. If it's an MP3, fine, assuming the Stagefright variant on the phone you have supports that particular MP3. If it's VBR, variable bitrate, the codec might not support it. So you have to get into: what is the data type coming out of the data stream? How do you go about finding that? When you get the HTTP link, the easiest way is to paste it into your browser on the PC, get the data, and analyze it. There's a tool called Wireshark, W-I-R-E-S-H-A-R-K; it used to be called Ethereal. It's an extremely useful tool to analyze what type of data comes in. What we saw was that some of the stations were indeed over HTTP, but the content coming in was AAC, Advanced Audio Coding. The platforms they were trying it on did not have the AAC codec, so those stations weren't playing. Fixing that gave us some more stations, and then we moved on to the rest. A small sketch of this kind of stream inspection follows.

Then there's RTSP, the next type of streaming you might encounter. RTSP streaming is one of the things I'd like to get into, so I'll explain what it basically is. RTSP is like making a phone call: you dial a number, and the backend takes care of finding out where the other person is and ringing them. That part is the call signaling; people from a SIP background will easily relate to it. SIP takes care of initiating the call, and RTSP is something like that: RTSP takes care of initiating the media session. If you look at YouTube, you have a small tab that says 240p, 360p, 480p, 720p and so on; 720p being the HD option. The same stream is stored in different formats, in different resolutions. Depending on your capability or your interest, say if you have a lower-end phone, you might want to see the video, but maybe not at the fullest quality possible. If you're streaming over GPRS, you'd be happy if you can get any data at all, right?
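As a rough illustration of that kind of stream inspection, here's a small standalone routine that guesses whether captured bytes are MP3 or ADTS-wrapped AAC from their sync words; the checks are the textbook ones for these formats, not anything Stagefright-specific:

```cpp
// Guess the audio format of a byte stream captured off an HTTP radio link.
#include <cstdint>
#include <cstddef>

enum class Guess { Unknown, MP3, AAC_ADTS };

Guess guessAudioFormat(const uint8_t *data, size_t size) {
    if (size < 4) return Guess::Unknown;
    // An "ID3" tag almost certainly means an MP3 stream with metadata up front.
    if (data[0] == 'I' && data[1] == 'D' && data[2] == '3') return Guess::MP3;
    // Both formats start frames with an 0xFF sync byte; the layer bits differ.
    // ADTS/AAC: 12 sync bits set and layer bits == 00.
    if (data[0] == 0xFF && (data[1] & 0xF6) == 0xF0) return Guess::AAC_ADTS;
    // MPEG audio (MP3): 11 sync bits set.
    if (data[0] == 0xFF && (data[1] & 0xE0) == 0xE0) return Guess::MP3;
    return Guess::Unknown;
}
```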
But if you are on 3G, or at home on Wi-Fi, you would want a better quality video. So how does the server get to know the capability of the client? There's basically a negotiation session that happens, and that's what RTSP does. Coming back to the internet radio example: we get an RTSP link, we give it to the media player, and as an app you expect it to play, but it doesn't. What happens? The RTSP code in Stagefright sends a request to the server, and the server comes back with: these are the formats I support, X format in such-and-such resolutions, Y format in such-and-such resolutions. Then the player has to decide which is the best of them; the app doesn't even come to know, the media framework takes care of it. The media framework has to decide: based on my load right now, maybe I can do 1080p, or 720p, or a lower resolution. That part is taken care of by RTSP, and then the RTP part comes in, which is the actual data flow. The data flow is handled by an associated protocol of RTSP called RTP. An illustrative RTSP exchange is shown below.

Again, there are a lot of variants of this. Microsoft has gone and implemented their extensions; other people have too. That's why, even though the link might say rtsp:// and even though the Stagefright specification says RTSP is supported, you expect the links to play and they don't: the RTSP session might carry codecs which are not supported by the platform. So when you write an application that uses streaming or file playback, you have to understand what types of files there are, what the different streaming formats are, and what the difference is between a container format and a codec.

Let me give you one more bit about codecs versus file formats. If you look at an MP4 file, is it a codec or a file format? A file format; it's a container, right? It's a .mp4 file. But MP3, is it a file format or a codec? It's both, because MP3 can be a .mp3 file, so .mp3 is a file format, and MP3 is also a codec; and MP3 can also go inside an MP4 file, right? That's why it's both a codec and a file format. JPEG, is it a codec or a file format? Also both; you get the hint. There are things which can be either. WMV is actually a codec; there's a codec called WMV and there's also a file extension .wmv. Technically the file is ASF; some people just rename the ASF file as .wmv. If you download one, you'll see a .wmv file, but purely technically speaking it's an ASF file. People would still say, okay, this is a WMV file.

So let's move on. I'll talk about the pieces of the playback part. File playback, as opposed to the HTTP and RTSP streaming I've been talking about, is basically playing back from a local file on the device.
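For illustration, a typical RTSP DESCRIBE exchange looks roughly like this; the server name, payload types, and profile values are made up, but the shape is standard: the SDP answer is where the server advertises its codecs and formats, and the client picks what it can handle:

```
DESCRIBE rtsp://example.com/live RTSP/1.0
CSeq: 2
Accept: application/sdp

RTSP/1.0 200 OK
CSeq: 2
Content-Type: application/sdp

v=0
s=Example live stream
m=audio 0 RTP/AVP 97
a=rtpmap:97 MP4A-LATM/44100/2
m=video 0 RTP/AVP 96
a=rtpmap:96 H264/90000
a=fmtp:96 packetization-mode=1;profile-level-id=42C01E
```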
Why I mentioned this timed event queue on the AV sync part: AV sync is audio-video sync. How do you make sure that a recorded video has the audio playing together with it? What is the mechanism inside Stagefright that handles that? Basically, there's a queue. In the AwesomePlayer part, let's come back to the block diagram.

Am I going too fast, or is this of interest, or should we go into something else? A comment from the audience: from an application point of view, that would be more interesting; the level here is probably deeper than needed. Okay, I can take that up, but since this was supposed to be a Stagefright session, let's at least cover some of these points. There was also a question about a singing application; I'll definitely talk about that, because that's one of the applications my wife uses a lot, she sings to it. It's really interesting the way people come up with multimedia applications. Another request: can you walk through the use case of what you did with the Skype implementation? Yes, I'll get to that at the end, where I summarize the performance of the application, the latency part, and everything; that's exactly the plan. Does this cover codec integration for products? I want to keep that as a separate topic; I won't be able to talk about it here, but we can discuss it any time, we've done it. The problem is there are hardly five people here with codec backgrounds, so if I start talking about that it won't land.

One question on the HTTP you put in your slide: Google says HTTP live streaming is available only from Ice Cream Sandwich, right? So what do you mean here? Actually, there is some support, with some pieces of the code, in Gingerbread. But the phones don't support it? It could depend on the phone; I'm not sure which phones do. But in the code there is support for HTTP. Do you mean the Apple protocol, HTTP Live Streaming? No, I'm talking about plain HTTP: you give http:// and then a file name, with a file at the end of it, kind of like a progressive download. So not the live stream? No. Officially, HTTP live streaming only comes up in Ice Cream Sandwich, but this is usually how Android works: the first time around, when Stagefright was introduced, there was only a playback path, and the record path got added later. Even in the first version there was a kind of experimental variant. So HTTP live streaming in Gingerbread is only preliminary support; HTTP progressive download is done.

So, I'm going to talk a bit more about AV sync.
So, AV sync. I'll talk about what AV sync does. You have a file, and there are video samples and audio samples. Both are stored separately: video is stored separately from audio. So how can you correlate a particular video sample with an audio sample? The basic thing you have is a timestamp. But your audio pipe is very different from the video pipe. Your video decoding goes through a different piece of hardware, while your audio might be mostly software decoding; even if there is hardware decoding, it's a different pipe again. So finally, when rendering, someone has to make sure that the sample I'm showing on the screen is at, or somewhere near, the same position as the audio. It should not be too slow or too fast, right?

One part of this is rate limiting, because audio decoding might run faster than real time. You have to make sure the audio playback still happens at the same speed as what was recorded. If it was recorded at 8,000 samples a second, you should still be playing back at 8,000 samples a second. If you're playing music, you might be at 48 kilohertz, and you should still be playing at 48,000 samples a second. Similarly for video: when video comes at 30 frames per second, how is that correlated?

What happens is that the renderer, the final component which puts things onto the screen, has a clock. It's kind of like having a timer, and the timer is updated from the audio side: whenever audio is played, the time gets updated, and this is used as a clock for the video. The video looks at the time and sees, okay, this many samples have played, so I should be at this particular sample. If it's not time yet, it goes to sleep; it basically schedules itself again after that much time, and then it renders. That's the basic logic: the audio is the master. Based on how much audio has been played, you update the clock, and the video looks at the clock and decides what to do. If it's too late, it drops the sample and moves on to the next one. A sketch of this rule follows.

Why do we generally keep audio as the master? Because slight jerks in the audio you can actually perceive. A question: shouldn't it be the other way around then, if slight jerks in audio are acceptable? No, perceivable, not acceptable, that's the point. You keep audio playing no matter what, and you drop video, or sleep for a while, for the video to catch up; if video is too fast, you sleep for a while. A follow-up question: generally, the stream which gets processed faster should be the one that sleeps, right, because the other one is taking more time? Ideally, yes. But if you try to drop or sleep in audio, you can actually notice it. The problem is perception: with video, you can't perceive it even if I drop five frames.
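A framework-free sketch of that audio-master rule might look like this; the thresholds are illustrative, not Stagefright's actual constants:

```cpp
// Audio-master AV sync decision, as described above (numbers illustrative).
#include <cstdint>

enum class VideoAction { Render, DropFrame, Sleep };

// audioClockUs is advanced by the audio renderer as samples are played out;
// the video renderer consults it before presenting each frame.
VideoAction decide(int64_t audioClockUs, int64_t videoPtsUs, int64_t *sleepUs) {
    const int64_t kLateThresholdUs  = 40000;   // ~40 ms behind: give up on frame
    const int64_t kEarlyThresholdUs = 10000;   // ~10 ms ahead: wait a bit
    int64_t lateUs = audioClockUs - videoPtsUs;
    if (lateUs > kLateThresholdUs) {
        return VideoAction::DropFrame;         // too late: skip to next frame
    }
    if (lateUs < -kEarlyThresholdUs) {
        *sleepUs = -lateUs;                    // reschedule ourselves later
        return VideoAction::Sleep;
    }
    return VideoAction::Render;                // close enough: show it now
}
```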
Well, five dropped frames in one burst you might be able to perceive, but if I do it cleanly, dropping two here and two later, you might not be able to perceive it. Doesn't that mean the dropping gets magnified, then? You may be dropping a couple of video frames because the processing is slower, and a couple more to keep the video in sync with the audio. Exactly, it's the same thing. Say you're playing back Flash on your laptop; Flash is a bad boy, right? Everyone tries to play Flash and sees how slow Flash playback is. When you play a high-resolution Flash video on your laptop, you can actually see this: the video is terribly slow, painstakingly slow, like it's moving in slow motion. What's happening is that a lot of frames are being dropped. So it's all the same mechanism. And it's not that just because you drop video, you keep dropping more and more; that doesn't happen. You drop video and hope to catch up at some point in time.

Assume your processor speed is 600 megahertz, and you have 500 megahertz available for video. If, for the resolution you're playing, you actually need 1,000 megahertz for 30 frames per second, then with 500 megahertz you can do only 15 frames per second. Whatever you do, you can do 15 frames; you have to drop the rest. And you can't drop before the decoder, because of how video coding works. It would be ideal if you could drop before the decoder, because you'd save the processing and move on, but you can't: video frames generally have dependencies on the previous frames.

Why this slide? This is what you'd look into when we talk about integrating a codec. Stagefright essentially integrates OpenMAX codecs, and this is the flow. Why this is required: when you are integrating a custom codec, or trying to integrate a new file format or a new streaming format, and your decoding is not happening properly, you'll have to dig into the OpenMAX layer, because this is the flow that happens: EmptyBufferDone, EmptyThisBuffer, FillBufferDone, FillThisBuffer. Codecs are given buffers according to their role. Take an encoder: the encoder's input buffer is given through an EmptyThisBuffer call, and the encoder's output buffer is given through a FillThisBuffer call. When it's done encoding, the encoder gives back a FillBufferDone, and the input buffer is returned through an EmptyBufferDone. That's the sequence; a minimal sketch of these callbacks follows. And AwesomePlayer, through OMXCodec, which is the wrapper over the whole OpenMAX IL layer, takes the buffer up through the read() function. If it's video, after the read you get the data and then you render it.

Okay, so let's talk a bit about this.
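A minimal sketch of those two OpenMAX IL callbacks, assuming the standard Khronos OMX headers; the component name in the comment is a made-up example:

```cpp
// OpenMAX IL buffer handshake, as described above.
#include <OMX_Core.h>
#include <OMX_Component.h>

static OMX_ERRORTYPE onEmptyBufferDone(OMX_HANDLETYPE hComponent,
                                       OMX_PTR pAppData,
                                       OMX_BUFFERHEADERTYPE *pBuffer) {
    // The codec has consumed this input buffer; refill it with the next
    // bitstream chunk or frame and resubmit it via OMX_EmptyThisBuffer().
    return OMX_ErrorNone;
}

static OMX_ERRORTYPE onFillBufferDone(OMX_HANDLETYPE hComponent,
                                      OMX_PTR pAppData,
                                      OMX_BUFFERHEADERTYPE *pBuffer) {
    // The codec produced output: pBuffer->nFilledLen bytes starting at
    // pBuffer->pBuffer + pBuffer->nOffset. Render or mux it, then hand the
    // buffer back with OMX_FillThisBuffer().
    return OMX_ErrorNone;
}

// Wired up once when the component is instantiated, e.g.:
// OMX_CALLBACKTYPE callbacks = { onEvent, onEmptyBufferDone, onFillBufferDone };
// OMX_GetHandle(&handle, "OMX.vendor.video.decoder.avc", appData, &callbacks);
```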
Let's talk about the codec configuration, because one of the things you look at in a codec integration is that the way your platform codec expects a stream might be different from the way it comes in. That was one of the problems we had with Skype. The frames the PC sent were a bit different from what the platform codecs were producing: the PC was giving the first frame as a smaller frame, which is just the header, and then the data separately, while the platform was giving the whole thing, header plus frame, as one single frame. So for the codec config, you have to understand how the codec on the platform parses this codec configuration information. When you're integrating or using a codec and it's failing, understand the way it takes the header, because header parsing is one of the places where you might mess up: the way you're giving the header to the codec might be different from the way it's expecting it. Go through the codec specification or the documentation, and see how it's indicated. The MPEG-4 extractor is a good place to look, because the platform codecs will have been integrated against it; you can see there how the parsed codec configuration is used. A small header-splitting sketch follows.

Do we have any questions here? One from the audience: if there's no audio present, how do you do AV sync? If there's no audio present, it's easy, right? You just play along with the timestamps, assuming your timestamps are proper. If your timestamps are not proper, that's when you have a problem. If they are proper, you just look at the timestamps and see when each frame should be played; you have a presentation timestamp, which is the time at which your video should be shown. That's it. The use case I'm asking about is a surveillance application where you get only the video, but not the audio, and the driver also has to be written, or at least modified, along with the analysis and layout parts. So when would the timestamps not be proper, in a video-only case? The timestamps might be a problem if your capture side itself is timestamping wrongly; that's when the timestamps go wrong. Is there a way to ensure the timestamps don't go wrong? Not really at this level. You'll have other problems too; you'll have issues with clock drift and such. There are protocols for trying to synchronize that, but that's outside the purview of what's in Stagefright. When you implement an application like this, you'll have to worry about clock drift, because you might ask for a clock tick every 33 milliseconds...
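As an illustration of the header-handling point, here's a small standalone helper that finds where the SPS/PPS configuration ends and the actual slice data begins in an Annex-B H.264 buffer, so you could submit the two parts the way a given platform codec expects; it assumes 4-byte start codes for simplicity:

```cpp
// Split the codec-config prefix (SPS/PPS) from the slice data of an
// Annex-B H.264 access unit.
#include <cstdint>
#include <cstddef>

// Returns the offset of the first VCL NAL (slice data); everything before it
// is configuration that some codecs want delivered as a separate buffer.
size_t configLength(const uint8_t *data, size_t size) {
    for (size_t i = 0; i + 4 < size; ++i) {
        // Look for a 00 00 00 01 start code.
        if (data[i] == 0 && data[i+1] == 0 && data[i+2] == 0 && data[i+3] == 1) {
            uint8_t nalType = data[i + 4] & 0x1F;
            if (nalType >= 1 && nalType <= 5) {
                return i;  // first VCL NAL: the config prefix ends here
            }
        }
    }
    return size;  // no slice found: treat the whole buffer as configuration
}
```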
...and your system will tell you, yes, I'm giving you a tick every 33 milliseconds, but what could actually be happening is that the system gives it to you every 40 milliseconds. What can you do about that? These are the practical problems you'll face: you might be assuming your system runs fine, but it doesn't.

So, the record pipe. Do you want to get into this, or should we take more questions? How does recording work? Okay, recording is relatively simpler, because as long as you put the correct timestamps, it's okay. Let's talk about the record path. The camera source is the one that takes data from the camera HAL and gives it to the rest of the pipe. There's a data callback with a timestamp, so the camera source timestamps each frame: the media buffer carries a timestamp with it. That timestamp is what the encoder uses, and what is used while writing the file. So timing is taken care of by that. See the small stamping sketch below.

Do you drop samples if the video encoding takes a long time? If video encoding takes a long time, we don't have to take care of it explicitly; the buffering inside the pipe takes care of it automatically. Assume your camera is capturing 1080p at 30 frames per second, but your system is not able to encode that resolution at that rate, say only at 15 frames per second. Even though your camera captures at 30 frames, it will automatically drop down. Will it drop below the encoder rate? No, it'll settle somewhere close to the encoder rate, because you can't be faster than the slowest guy in the pipe. Why that happens is the internal buffering: the camera needs buffers into which it can write data, and since the encoder is still processing the earlier buffers, the camera after a while starts starving, and then it automatically drops down to 15 frames. After an initial slight jump, the camera comes down to 15 frames.

Are the player pipeline and the recorder pipeline separate? Yes, they're separate boxes. Does RTSP have controls for fast forward and rewind? It depends on how you'd like to implement it, because fast forward and rewind are slightly more complicated: you have to handle the dependent frames as well. If you look at QuickTime, you can actually step back frame by frame; to do that, you have to decode every single frame. There's a whole section coming up on RTSP, but I don't think we'll get there.

Another thing, with respect to the record pipe: how do you make sure you can actually have zero copy? Stagefright allows for quirks. Quirks are variations that are required by components like the camera and the encoder. Stagefright actually lets you say: if you don't want a memcpy, and you don't want a particular port to allocate memory, you can tell the encoder that there's not going to be any memory allocated on its input; your buffers are going to come from the camera.
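A tiny sketch of the stamping step, assuming AOSP Stagefright headers; CameraSource does the equivalent inside its data callback:

```cpp
// Attach the capture timestamp to an outgoing media buffer, as a
// camera-style source would do.
#include <media/stagefright/MediaBuffer.h>
#include <media/stagefright/MetaData.h>

using namespace android;

void stampFrame(MediaBuffer *buffer, int64_t captureTimeUs) {
    // The encoder and the file writer both key off this timestamp, so the
    // A/V relationship survives even if frames get dropped downstream.
    buffer->meta_data()->setInt64(kKeyTime, captureTimeUs);
}
```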
You have to make sure you use any extensions like this that are available on your platform, because doing a memcpy on camera data is huge: if you're doing 1080p and copying at 30 frames per second, you're looking at moving on the order of 40 megabytes a second. That is just going to kill your system: your power consumption goes up, your memory interface bandwidth goes up. You're basically doing the wrong thing if you're doing a memcpy.

The hardware codec integration, I think we can skip. We'll have to skip the RTSP streaming internals too; way too many components involved. What I'd like to talk about is this, my last slide. The copies are one thing: do see how you can exchange buffers instead of copying data. Latency is another thing we dealt with. I'll come back to the example of Skype, because it involves latency, involves performance, and it also involves integrating a new component.

Latency, because Skype actually has a hard requirement on it. Latency is how much time you take from the camera: if this camera is taking a picture of me, and it's encoded, decoded, and played back on a TV, the latency we talk about is the glass-to-glass delay. From this glass, which is the lens, to the other glass, which is the TV there. What latency can we achieve? Generally people don't notice around 300 milliseconds; that's supposedly okay. But we actually work on systems which need 50 milliseconds latency. To achieve 50 milliseconds, you have to go really deep into the framework.

What you have to do for that: your camera captures 30 frames a second, so one full frame takes about 33 milliseconds. The least you can go to at frame granularity is 33 milliseconds; you can't go below that. So what do you do? You split the frame into multiple slices. You don't wait for the complete frame; you take portions of it. And the encoder too, if the platform supports encoding portions of the frame rather than the whole frame, you use that. Similarly, the decoder should support it, and then your display should support it. That's how we go ahead and optimize it; a back-of-the-envelope latency calculation follows below.

Another possibility is the buffering. The default configuration might have a lot of buffering internally, or your codec configuration might. H.264 has a specific parameter called display delay: the display delay basically tells you that if there are B frames, you need so many buffers before the codec can give a buffer out. So you have to understand the codec internals; not how to implement or optimize a codec, but how the codec behaves. Basically, read the codec datasheet that comes along with it and see how it behaves. If you can configure the codec, say at baseline profile, so that the buffering is small, the latency is going to come down.

The other thing is signaling errors.
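Here's the back-of-the-envelope arithmetic behind the slicing argument, as a runnable snippet; the pipeline depth and slice count are illustrative:

```cpp
// Frame-granularity vs slice-granularity latency, roughly as argued above.
#include <cstdio>

int main() {
    const double frameMs = 1000.0 / 30.0;  // ~33 ms to capture one full frame
    const int pipelineDepth = 4;           // camera -> enc -> dec -> display buffers
    const int slicesPerFrame = 4;          // each stage forwards quarter-frames

    double frameGranularity = pipelineDepth * frameMs;                   // ~133 ms
    double sliceGranularity = pipelineDepth * frameMs / slicesPerFrame;  // ~33 ms

    printf("frame pipeline: %.0f ms, slice pipeline: %.0f ms\n",
           frameGranularity, sliceGranularity);
    return 0;
}
```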
One of the issues with the current Stagefright is the fact that there is a player pipeline and a recorder pipeline, with nothing connecting them. What if I want something like a feedback path? Say you're doing video conferencing: what you'd like is bandwidth adaptation, where the system adapts to the kind of bandwidth available. If I'm making a video call from a phone on 3G, the bandwidth I have is very different from what I'd have at the office. At the office, on a big screen, you want the best quality possible; on a phone you'd be happy just to see the video, since you're looking at a small screen anyway. So you have to identify the available bandwidth and then change the encoder bitrate, driven by the receive side, because the receiver pipe is the one that actually knows how many frames are getting dropped, right? Based on that, it has to somehow signal to the other pipe, the recorder pipe: yes, there are packet drops; there should be some mechanism to reduce your sending bitrate, because there seem to be a lot of packet drops in the network. And on your 3G radio there will also be errors. If there are errors, how do you recover? The decoder has options for recovering, but how can it signal the encoder to enable its error-resilience tools? It would be ideal if Stagefright had some kind of signaling framework for that; a toy version of such a feedback loop is sketched below.

As part of your Skype project, you would have run into this problem? Yes, but Skype is actually pretty easy for us, because Skype itself takes care of understanding what the bandwidth is and telling us. If you're doing something on your own, if you want to do scalable video conferencing using Stagefright, it's going to be tough.

Related to the slice mode: the latency per frame will still remain the same, right, going to slices? No, the latency actually comes down. Say your buffering is going to be four frames. If you're capturing a frame at a time, the lowest unit of delay is a frame, so you're looking at 4 × 33 milliseconds. If your slices are one-fourth of a frame, the buffering works out to a quarter of that. So you're saying the processing happens in a batch? No, the processing happens in terms of slices. And yes, if your encoder doesn't support slice mode, there's no point capturing in slice mode. Let's take that up offline; I can show you exactly how it works.

Another question: does the recording work in terms of a pipe or in terms of buffers? Because with a buffer, we fill the data and then pick the data: write at one end and read from the start. We're creating an application where there's network communication, for audio chat or video chat, so we need continuous data; if we create buffers and pick data, it may give us delay.
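A toy version of the kind of feedback loop described above could look like this; the loss threshold, the step sizes, and the idea of feeding in RTCP-style loss reports are all assumptions for illustration, not anything Stagefright provides:

```cpp
// Hypothetical sender-side bitrate controller driven by receiver feedback.
#include <algorithm>
#include <cstdint>

struct RateController {
    int64_t bitrate;   // current encoder target, bits per second
    int64_t minBps;
    int64_t maxBps;

    // lossFraction would come from the receive side, e.g. RTCP receiver
    // reports: back off multiplicatively on loss, probe upward otherwise.
    void onFeedback(double lossFraction) {
        if (lossFraction > 0.02) {
            bitrate = std::max(minBps, (int64_t)(bitrate * 0.8));  // back off
        } else {
            bitrate = std::min(maxBps, bitrate + maxBps / 20);     // probe up
        }
        // ... push 'bitrate' into the encoder, e.g. via a codec parameter ...
    }
};
```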
No, that's what we do; internally, that's what finally happens. There is a pipeline, but there are buffers being exchanged; that's how the recording finally happens. Then what about the latency between them? That's what we're supposed to tune, and that's why we get paid.

So, a very good question: Skype started supporting video calls on Android quite late, right? What took Skype such a long time? Partly the API support. Actually, we have a comparison which says our performance is something like 16 times better than the stock Skype performance, because we are basically using the hardware better. If you write something generic, you can't use the full capability of the hardware. At the top layer, if you're writing an application which you can install on anything from a Nexus One to a Nexus S, from single cores to dual cores to all the different types of hardware, you just can't use the hardware in the most optimal way possible. There are n number of silicon variants out there; it's just not possible to write code optimized for every one of them. Even ours doesn't work well on every silicon.

One more question: Skype is a pure app, but you're talking about framework changes and all that. Yes; ours is not downloadable onto just any phone, while Skype is downloadable onto any phone. That's the trade-off. The good thing is Skype installs anywhere; the bad thing is you won't get the best possible performance. An example: on an iPad you have FaceTime and you have Skype. You should see the quality difference between FaceTime and Skype on the same device; you'll get much better quality, and your battery is going to last much longer, with FaceTime. We have tried that. FaceTime uses the hardware much more closely, and Skype is not able to, because Apple is just not interested in opening that up.

This performance where you say a 16x improvement, what is that measuring? We're talking about decode, the resolution we can support at the same performance level. Compared with what stock Skype can do at 30 FPS, we support a much higher resolution, and the next step we're able to do is 1080p at 30. We have showcased it.

I think we should move on from Skype. So we can wrap up now; anybody who has questions, can you share your contact? It's on the first slide; it's a gmail.com address. I'm really glad that Alexi could take the time to have a session here. Let's give a round of applause for this session.