There is this talk called "Audio networks and the security implications", and you're probably going: well, why do we have security implications with audio networks? It's all XLR and analog and there are no security issues, right? Well, apparently the world moved on, and now it's all audio over IP, especially with commodity hardware. So, here to talk about audio networks and security implications, please welcome with a very warm round of applause: PC Wiz.

Hey. Yes, I'm PC Wiz. I work for the company that put the sound system in Abacus; if you were here, you definitely heard that on Friday night. But this is my own work outside of work, so I've been doing my own research. I work in that field, and that's why I'm looking at it. Hopefully I'm going to introduce you to audio networks, what they are and where they're deployed, get you a bit interested, and also show you just how soft they are as a target. And after showing you how soft they are as a target, hopefully people are interested enough to think about it, and maybe when people think about it we can change it.

So where are audio networks deployed? Well, Abacus. But also big broadcast situations, so think about the Olympic Games.
You've got lots of stadiums, lots of broadcasters, lots of different people. You've got to provide public address inside the stadium for people watching in the stands, and you've got to provide media streams that all of the various television and radio broadcasters can use to create their coverage.

And what tends to happen when you build an audio network is that it extends to places you don't expect it to go, especially in places like theaters. For example, it's increasingly common now that your bathroom also has an audio feed from the stage, so you don't miss anything when you go to the toilet, which is really convenient. But it now means that you've got access to the audio network inside your bathroom, which might be a problem, thinking about the things we're going to talk about later. And anybody who's got any experience doing penetration tests will know that bathrooms are quite a good place to have plenty of private time with network equipment.

Okay, so why are we deploying these networks? Because, well, no one deploys technology just for the sake of it, unless you're at a hacker camp, in which case maybe you do. It's mostly cost. Everything in AV is moving in this direction, towards IP-based networking. You've got Art-Net on the lighting side, you've got various different video networking standards, and obviously ancillary stuff like your show control. Everything is going over IP networks nowadays, so you can save a lot of money by building one IP-based infrastructure and dragging one set of cables through your building. That's pretty nifty.

(I shouldn't touch the microphone.) It's also more versatile, because you can share your audio feeds a lot more easily with multiple places. So, in the Olympics again, sharing between the public address system in the stadium and the multiple broadcasters outside.
It's really easy. You just let them attach to the network, and then they can ask for those audio feeds. It doesn't have to go through any special box; the network takes care of transporting the audio there. So it solves a lot of problems. It's really nice.

It also really helps with consistency. I don't need a box of lighting cables, a box of audio cables and a box of whatever else. I just buy some single-mode fiber, some Cat 6 cables and some fairly normal switches, and everything's good. If something breaks, I can just go to the local IT supplier and probably buy a replacement. It's nothing that is special to the AV industry anymore, which really improves your business continuity, especially if you are touring and you need replacements for things.

Okay, so why am I going to talk about the network and not the hundreds or maybe thousands of insecure Linux devices that are sat on the network? Probably because it's more interesting to me, and because a lot of people already talk about hacking embedded Linux devices. I could do a separate talk on that. But changing the network is going to be a longer-term goal, because that requires integration between multiple different players in the industry, hardware vendors and software vendors, and of course adoption by people. That takes a decade at least. Even once you've got a standard in place, to get that built into hardware, built into software updates and then actually deployed, there's a lot of work to be done. So I thought I would highlight the topic now, and hopefully we can start to agitate for some change.

So now I'm going to get into a section where I run through some attacks, and we're going to build up the knowledge we need to understand the system one attack at a time. I've got three different attacks. And what kind of attack objectives might we have?
We can damage people. The obvious way to damage people is to play something really, really loud and cause hearing loss. A less obvious way is to just disrupt the system during some kind of emergency announcement. You can create quite a lot of panic, and then you're really going to have problems: with panicking people, you get people crushed, and that's not great. And if you saw on the previous slides, we also deploy audio networks in transport hubs like train stations and airports, so definitely places where you want your public address to work.

You can also damage equipment. You can play square waves out of the speakers, and the transducers will pretty quickly burn out if you play them at a sufficient amplitude.

And then there's also reputational harm, which could go together with snooping. You've got a big star, and then you get a recording of their microphone while they're off stage doing something less than ideal or saying something less than ideal, and, well, you've just got some blackmail material, or created a very sticky situation.

And of course disinformation. If you can control the audio coming out of the system, especially in a broadcast situation, then it's like in the movies; you can do whatever you like. You would hope you couldn't, but there are some soft bits on either side where you probably could, if you've got the time and the motivation.

So let's start off with snooping. We've got an analog audio system here: a microphone, a stage box and a mixing console. Pretty simple. There's an XLR cable that goes into the stage box, and then there's an XLR snake, or maybe an Ethernet cable doing something like AES50, which is not IP-based but a special thing that also happens to use Ethernet cables, into the mixing desk. And it's pretty hard to intercept that.
It's going to be detected very, very easily, because if I unplug the cable, the audio stops, and the guy sitting at front of house is going to notice the audio stopped. It's hard to attack that situation.

But let's make this a network. Okay, we've got a switch in the middle. Great. Well, now I can plug and unplug things without the guy necessarily noticing. That's already a start. So how am I going to tap this? For that we're going to need to explain unicast and multicast. But here's an attacker who is clearly motivated to spy on what's coming out of this microphone.

So, just some quick background, because I also need to explain multicast. In most network situations multicast is something you ban, burn with fire and then pretend doesn't exist, but in media applications it's quite necessary, unfortunately.

To explain it, this is what unicast looks like: you send the same feed multiple times from your source. Which is great if you have a handful of devices that need to receive this audio. Send unicast feeds; it works. But what happens with unicast is that you start to consume all of the transmission bandwidth on your source, and then you cause congestion. And especially in audio networks we don't like congestion, because congestion basically increases our latency, increases our jitter and makes our audio networks less stable.
And we're trying to operate in a low-latency environment. For example, doing public address on a stage, we need low latency; somewhere in the under-two-millisecond range would be nice.

So with unicast we have to send a different stream for each device. And this is also somewhat vulnerable, because you can just ARP spoof, and as long as you forward the packets quickly enough, the person on the desk might not notice. But you have to be very careful about your forwarding, because the latency is quite sensitive. So expensive network cards are advisable if you're going to ARP spoof and intercept unicast in an audio network.

But luckily for attackers, we use multicast most of the time, so it's actually much simpler. We've solved our problem of the source transmission bandwidth, because the source just sends the stream into the network once, and then it's the network's job, so the switches in the network, to understand who is part of this multicast group and who wants this network traffic, in our case audio.

The basic way that this works is that someone, usually the router or a switch, is a querier, and it asks everybody in the network: what multicast groups would you like to be part of? Then everybody in the network replies with a list of the multicast groups that they'd like to be part of. This traverses up the tree of switches, and you basically end up receiving that multicast group.

So all you need to do to snoop on the audio is plug into that switch from earlier, find out which multicast group has the audio you want in it, and join it. Then you can packet capture, and Wireshark has a really handy tool which will take an RTP stream and just turn it into a WAV for you, which is pretty nifty. So that's snooping. It was very easy.

So what else can we do? Okay, we can disrupt the network. We can disrupt the audio coming out.
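The group membership mechanism behind the snooping attack can be sketched as a toy model. This is an illustration only, not real switch firmware or a real IGMP implementation: the `SnoopingSwitch` class, the port numbers and the group address `239.69.0.10` are all made up for the example. What it tries to show is that the switch forwards a group to any port that reports membership, without checking who is asking.

```python
from collections import defaultdict


class SnoopingSwitch:
    """Toy model of a switch that tracks multicast group membership per port."""

    def __init__(self):
        # group address -> set of port numbers that asked for it
        self.members = defaultdict(set)

    def membership_report(self, port, group):
        # The switch just records the request; nothing verifies the sender.
        self.members[group].add(port)

    def forward(self, ingress_port, group):
        # Deliver to every member port except the one the packet came in on.
        return sorted(self.members[group] - {ingress_port})


STAGE_BOX, MIXING_DESK, SPARE_PORT = 1, 2, 7  # hypothetical port numbers
GROUP = "239.69.0.10"  # hypothetical multicast group

switch = SnoopingSwitch()
switch.membership_report(MIXING_DESK, GROUP)
before = switch.forward(STAGE_BOX, GROUP)

switch.membership_report(SPARE_PORT, GROUP)  # an uninvited listener joins
after = switch.forward(STAGE_BOX, GROUP)

print(before, after)  # [2] [2, 7]
```

In this model the stage box never learns that a second receiver exists, which matches the point made in the talk: the source only sends once and the network decides who receives.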
We can create audio artifacts. We can make dodgy noises come out of the speakers. We can just make the system not work, which, depending on your objectives, could be interesting to you.

The thing to attack here is the clock on the network. Clocking is extremely important on media networks. To get phase alignment for a 48 kilohertz audio stream, so 48,000 samples of audio per second, and that's your media clock, you need your wall clock to be accurate to within about one microsecond. So it's a very high-accuracy clock that you need available in your network. Otherwise you get phase alignment issues, and then you get various interference patterns, especially in AV setups like this. So if you adjust the clock even minutely, you're going to cause problems for people.

So how are clocks distributed on these networks? They use the Precision Time Protocol. It's in the name, really. What it does is have a leader, which unfortunately in IEEE terminology is called a grandmaster, but I will not use that term from here on. This is basically the best clock in the network, theoretically, and in theory it should also be backed by GNSS, so it should be a satellite-derived clock. This is also going to help you when you come to stream audio between multiple sites, because the Precision Time Protocol only works within one LAN. So within one network, within one site; you don't want to run the Precision Time Protocol anywhere else. Running the Precision Time Protocol over the WAN will not go very well, because it basically calculates lots of delays and relies on a lot of clever assumptions about how local Ethernet networks work, and it doesn't extend so well over the internet. So that's where that works.
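To put the one-microsecond figure from the clocking discussion in perspective, here is a small back-of-the-envelope calculation. It is only arithmetic on the numbers mentioned in the talk; the 10 kHz test tone and the function names are assumptions chosen for illustration.

```python
SAMPLE_RATE_HZ = 48_000


def sample_period_us(rate_hz):
    """Duration of one audio sample in microseconds."""
    return 1_000_000 / rate_hz


def phase_error_degrees(tone_hz, clock_error_us):
    """Phase shift of a sine tone caused by a given timing error."""
    return 360.0 * tone_hz * clock_error_us / 1_000_000


period = sample_period_us(SAMPLE_RATE_HZ)
print(round(period, 2))                              # 20.83 (microseconds per sample)
print(round(phase_error_degrees(10_000, 1.0), 2))    # 3.6 degrees for a 1 us error
print(round(phase_error_degrees(10_000, period), 1)) # 75.0 degrees for a one-sample error
```

So a clock that is off by a single sample period already shifts a 10 kHz tone by about 75 degrees between two devices, while a one-microsecond error keeps the shift to a few degrees.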
Anyway, with the Precision Time Protocol you've got the main leader at the top of your tree, and then you might have boundary clocks or transparent clocks in your network, and those are typically switches. A boundary clock will listen to the leading clock on one interface and then lead clocks on its other interfaces, which helps take load off your main clock leader. A transparent clock is really just about providing increased accuracy, and you'll find it in newer switches. It basically measures the time that your packet spends inside the switch and then provides that information too, which is quite useful for understanding the delay and the jitter on your network.

Yes? Yeah. So if you're on one LAN, it's just PTP. If you've got multiple sites, then because they're both based off a satellite-based clock, they should be coordinated and pretty much aligned. So you can have a broadcast coming from South Africa, then stream it somewhere and mix it in Brazil, and it should work just fine, as long as you've got access to satellite time.

But of course we can disrupt this clock, and then things will go wrong. One option we have for disrupting this clock is rigging elections, because the lead clock is elected, naturally, based on its characteristics. But not just its characteristics.
It's also elected based on two user-controllable priority fields. And if you ever touch a media network, you will find out very quickly that you have to configure these in order for it to work correctly. You'll find a lot of media networks that basically denial-of-service themselves because they're not properly configured, and the thing that isn't properly configured is the priorities in the PTP announcements for the election of clocks.

But in reality, if you're an attacker, you can change all of these fields. Basically what you want to do is say you have the lowest priority number, because the lowest priority number gives you the highest score in the election. And you can fake your accuracy: instead of saying you've got an internal oscillator, maybe you say you've got an atomic clock or something, if you really, really want to be the clock. So if you want to be the clock, it's pretty easy to be the clock.

So you're the Time Lord now. Great. If you want to make the clock drift, you can make the clock drift. If you want to make the clock jump, you can make the clock jump. You can do whatever you like. And it's pretty much as simple as running ptp4l, PTP for Linux, on your laptop. Have fun with that.

There's another option, of course. You can do a denial of service. You can generate some traffic, but you don't have to generate that much traffic, because audio networks rely on prioritization happening within the switches to achieve different amounts of latency for different traffic classes. The clock is the most important, and this goes into the expedited forwarding class. The audio is the second most important thing, and this goes into assured forwarding. Everything else is best effort. What this relates to is that the switch typically has a couple of queues, like eight or so, and there's basically a load-balancing algorithm that prioritizes certain queues, like the expedited
forwarding queue over the assured forwarding queue, and then the assured forwarding queue over the default forwarding queue. This basically tries to make sure that, under certain conditions, you don't lose your clock packets and you don't very often lose your audio packets, and that the jitter applied to them is also lower. So all you actually need to do is send enough traffic labeled with the correct value in this field, and then you fill up the queue where the PTP should be going. When you've filled up that queue, the PTP will start to deteriorate quite quickly. So you don't need to saturate a lot of the network to cause problems, and if you're just aiming for chaos, then that will give you chaos.

Okay, now we move on to possibly the most interesting attack type of this presentation: actually hijacking audio streams. So how can we get the speakers to play our audio, or how can we get the broadcast station to play our audio?

We've got the first option, which is pretty trivial. All of your audio streams will be announced on the network somehow. This example is from a Ravenna implementation, so we're using the Session Announcement Protocol, but you'll also see Dante using mDNS. That really doesn't matter, because inside mDNS or the Session Announcement Protocol you will always find the Session Description Protocol, which is the same in both Dante and Ravenna, and basically any audio over IP solution. What you find in here is basically the multicast group where the audio is being sent to, you find the port number, and you find some information about the time synchronization. You'll notice the mediaclk:direct=0 line.
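The slide is not reproduced in this transcript, so the following is a hypothetical session description of the kind being discussed, together with a minimal parser for the fields mentioned. Every address, name and identifier in `EXAMPLE_SDP` is invented for illustration, and `parse_sdp` is a sketch that only understands the handful of lines shown here, not a complete SDP parser.

```python
# Hypothetical announcement; all values below are made up for the example.
EXAMPLE_SDP = """\
v=0
o=- 1 0 IN IP4 192.168.100.20
s=Stage Box Ch 1-2
c=IN IP4 239.69.0.10/32
t=0 0
m=audio 5004 RTP/AVP 98
a=rtpmap:98 L16/48000/2
a=ts-refclk:ptp=IEEE1588-2008:00-1D-C1-FF-FE-00-00-01:0
a=mediaclk:direct=0
"""


def parse_sdp(text):
    """Pull out the multicast group, port, audio format and RTP clock offset."""
    info = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition("=")
        if key == "c":
            # "IN IP4 <address>/<ttl>"
            info["group"] = value.split()[2].split("/")[0]
        elif key == "m":
            # "audio <port> RTP/AVP <payload type>"
            info["port"] = int(value.split()[1])
        elif key == "a" and value.startswith("rtpmap:"):
            encoding, rate, channels = value.split()[1].split("/")
            info["encoding"] = encoding
            info["rate_hz"] = int(rate)
            info["channels"] = int(channels)
        elif key == "a" and value.startswith("mediaclk:direct="):
            info["rtp_offset"] = int(value.split("=", 1)[1])
    return info


info = parse_sdp(EXAMPLE_SDP)
print(info["group"], info["port"], info["rtp_offset"])  # 239.69.0.10 5004 0
```

Nothing in the description is signed or authenticated, so a receiver has no way to tell from these fields alone who produced the announcement.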
The mediaclk:direct=0 line is interesting. In RTP, the Real-time Transport Protocol, you have a timestamp, and this is a timestamp from your media clock. According to the standard it should start at a random number, where the idea was to make known-plaintext attacks less feasible if you encrypted your RTP traffic. But in most implementations of audio networks this is zero; the random offset is pretty much always zero. So that's just an interesting thing to note: you don't need to take account of random offsets in practice.

The other information you've got here is what type of audio it is, and this is linear PCM, 16-bit encoded, two channels at 48 kilohertz. If you have done any digital techniques, you probably know what PCM audio is. You basically just measure the amplitude of the signal, you measure it into a 16-bit value each time, and you measure it 48,000 times a second. And this stream is a stereo stream, so we have two channels of audio. It also tells us what clock domain we're in, which is quite useful.

Okay, so what you can do here is just make your own announcement, with your own name, and wait for someone to click on it. That's possibly not so interesting, but it will probably work, because there's no verification here, so the client will present all of them equally. And depending on the implementation of your clients, there will be some caching of these messages, and you can probably abuse the caching of these messages to make yourself higher in the list and more likely to be clicked on.

So that's one way of doing it, which would work when the stream is being set up. But how do we do it when the stream is already running? How can we attack
something that's already running?

We've mentioned already that we have latency on our network. It takes time for a packet containing audio data to get from our stage box to our mixing console, and it takes a different amount of time each time. The variation in this amount of time is our jitter. So we can't rely on our network delivering something in constant time, which means that when we're receiving audio data in our mixing console we have to buffer it. We have to have a buffer with a fixed offset in time against the clock. We need a fixed offset to ensure real-time playback; otherwise it would be speeding up and slowing down each time packets arrive. And packets can also arrive out of order. So we definitely need to buffer.

So what we can do is abuse this buffer. This buffer will be biggest in broadcast applications, because in broadcast applications you've got higher latencies, since you're typically going longer distances over wider networks. But if you've got sufficient resources, you can probably also do this on a smaller network. You just need to be very precise about your timing.

And just to look at what packets you're sending here.
It's the Real-time Transport Protocol. There's your timestamp, which is based on your media clock, and the sequence number, which is there basically to detect packet loss. In some applications of RTP you would occasionally send replies back telling the sender how much packet loss you've encountered, but in most applications you don't, so that's not so important to us. And then your payload just contains your linear PCM encoded audio.

An interesting fun fact here is that you can list your contributing sources. So if you've got multiple audio streams being mixed into one audio stream, you can theoretically keep track of them. But I don't think anybody actually does that.

So how can we abuse this buffer? Well, we can basically play Pixelflut, but with a deadline. You basically just need your packet to be the last packet that arrives, because there's no way to verify the packets coming in, and the receiver is unlikely to spend the time checking whether there's anything in its buffer already. Because why would it? And even if it did, it would just change the game around. So we can monitor the network and see when our legitimate source is sending packets. The source sends a packet, and then we can offset our clock.
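The "last packet wins, but with a deadline" idea can be illustrated with a toy simulation. This is not how any particular mixing console is implemented; `PlayoutBuffer`, `simulate` and all of the timings are assumptions made up for the sketch, and no network traffic is involved.

```python
class PlayoutBuffer:
    """Toy receive buffer: one slot per RTP timestamp, last writer wins."""

    def __init__(self):
        self.slots = {}

    def receive(self, timestamp, payload):
        # No authentication and no "slot already filled" check.
        self.slots[timestamp] = payload

    def play(self, timestamp):
        # Called when the media clock reaches this timestamp.
        return self.slots.pop(timestamp, b"")


def simulate(arrivals, deadline_us, timestamp):
    """arrivals: list of (arrival_time_us, payload) for the same timestamp."""
    buf = PlayoutBuffer()
    for arrival_us, payload in sorted(arrivals):
        if arrival_us < deadline_us:  # packets after the deadline miss playback
            buf.receive(timestamp, payload)
    return buf.play(timestamp)


LEGIT = b"legit-audio"
INJECTED = b"other-audio"
DEADLINE_US = 1000  # hypothetical playout deadline

in_time = simulate([(300, LEGIT), (700, INJECTED)], DEADLINE_US, timestamp=48)
too_late = simulate([(300, LEGIT), (1200, INJECTED)], DEADLINE_US, timestamp=48)
print(in_time == INJECTED, too_late == LEGIT)  # True True
```

The window between the legitimate arrival and the playout deadline is what makes larger buffers, such as those in broadcast setups, easier to hit in this model.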
So we always send a packet so that it arrives at the destination just after the legitimate packet arrives, but just before the clock comes along and it gets played back. That's basically how you do it. I've almost got this working, so I hope at some point to have a proof of concept published, because it's not that difficult to achieve, especially on higher-latency setups.

So let's start to summarize this a bit. We've seen a couple of ways that you can attack network audio. There are definitely more of them. Let's just compare.

Why we are in this state is interesting. The industry hasn't quite adapted to the fact that they're running networks yet. They're running them, but they don't understand the implications of them yet, and this is a sort of table which should help us understand the differences and what they're expecting.

As we mentioned, if you've got an analog system, a cable goes from point A to point B. This helps you with traceability: you just follow the cable. It helps you with controlling ingress and egress, because if you unplug the cable, the audio stops; it's very detectable. And it creates a natural choke point at the mixing desk, because all of the audio is definitely going through the mixing desk; that's where the cables go.

Whereas when you're using audio over IP, the audio can basically be routed in a matrix from any device on the network to any other device on the network. So you've lost that choke point. And the only way to really observe that is to get your NetFlow data from all of your switches, which is not something that audio engineers are equipped to do or particularly want to do. So we need to find some solutions to that.

What can we do in practice today? Well, a lot of these are sort of bad solutions to problems
we really shouldn't have. But you can do network monitoring, and if you've got a fixed installation with a lot of resources, then you can probably afford to have a security team look at your network occasionally, collecting data and figuring out what normal looks like.

You can disable ports, which kind of works. In practice it's not going to happen, because someone's going to plug something in at the last minute, complain that it doesn't work, and then just enable all of the ports again, because that's basically the practical side of AV technology at the moment.

You could try to implement 802.1X on all of your devices, which has a bunch of management overhead, and your devices support it poorly. And you could try to do MAC address filtering, but it really wouldn't be that effective. Yeah, you increase the attack time by 30 seconds, or however long it takes the attacker to notice that MAC address filtering is in place and start spoofing their MAC address.

Then your other option is basically to segment your network. Make sure that your show control isn't running on the same network as your audio, make sure your lighting isn't running on the same network, and just try to minimize your footprint.

So how is it moving in the right direction? Well, PTP is getting better. The clock can now be authenticated, which is helpful, and the clocks can also be validated against more clock domains. So now a PTP client can listen to multiple PTP clock domains, and if one of those clock domains is wildly out of sync with the other ones, it can be ejected, which is an interesting concept. So it's kind of a quorum based on what's happening in your network. And there's also the option to do authentication, which I don't think any of the endpoints are really using yet, but it's at least a start.

And there's of course Secure RTP. The only problem with Secure RTP is that it hasn't really got a solution for multicast yet. It kind of goes:
"Oh, you could use it in multicast", but I didn't find anything about how you would use it for multicast. So that could be interesting. What we're missing, basically, is a usable way to distribute keys to multicast groups, and all of the software and hardware implementations that support it. Because when we can distribute key material, then we've solved a lot of this problem.

That's pretty much everything. Yeah, this is my own research, but my employer is also looking to hire a lot of people. So if you're looking for a job in making loud things, then you can join Holoplot. We've got lots and lots of jobs, really a lot of them. Also, if you just want to move to Berlin, it's quite a nice place to live. I think we might have some time for questions.

Thank you very much for the more than interesting talk. If you have questions, please line up at the microphones in the middle of the tent. And while you do that, I have one question: would a mitigation be to just switch back to XLR?

That is a mitigation, but I think the industry has changed enough that it's not going to happen anymore. I think one of the World Cups, over a decade ago, was already using audio over IP. So in the higher-end applications it's definitely already there, and it's not going back. That cat is out of the box. The only thing we can do now is try to create a secure standard to use and try to get it implemented.

Questions from the audience? You look like knowledgeable people, and it was a highly technical talk. Any questions? No, seriously. Going once... Yes, please come up to the mic.

So, analogous to the XLR situation, how expensive and practical would it be to just apply only physical security to the whole thing?

Well, that's an interesting idea.
You could apply only physical security to it. But then you put a switch in a ceiling tile above your bathroom, and I don't know whether you want to put a CCTV camera in your bathroom.

So, I mean, like separating all of the things, because in the XLR situation everything was physically separated anyway. So isn't that, well, not the proper solution, but a solution?

I mean, physical separation is a good start, and it's going to minimize your attack surface. But thinking about the size of the venues that exist nowadays, there's always an unprotected network access point somewhere, so you're always going to find something. There are switches at front of house, there are switches behind the stage, there are switches in the foyer outside. There are lots of different places where, unless you've got security walking around the whole time, or CCTV and everything else, you're going to have a hard time controlling your network perimeter. That's not to say you shouldn't try to control your network perimeter; you definitely should. But I don't think the network perimeter is going to keep people out.

And maybe it's also worth mentioning that in a lot of these situations you've got contract workers coming in with random laptops, introducing random machines to networks. So you could come in already with malware on the laptop that wants to exploit this specific thing.

And, if I may add, companies or installations are probably not keen to install twice the IP cabling just for all...

Well, that's an interesting point.
They actually do tend to build two networks, but not to separate the different things. They tend to build two networks because the way redundancy is done in these networks is that you build two completely independent networks. One's on, like, 192.168.100.0/24 and one's on 192.168.200.0/24, and they operate completely separately. Each device has a primary and a secondary interface on it, and you send the same packet over both. That's basically how redundancy works on these networks. So they tend to build two, but they don't build them for network segmentation. They build them for practical redundancy reasons.

Want to follow up? Please go ahead, if there's nobody else here.

So I think the question I was getting at was: did the security actually decrease with the digital... like, if you would put XLR cables and audio cables through all your buildings, et cetera, or you would just put these tiny digital cables in there, did the security actually decrease while moving to digital cables?

I would say that it did, because they generally added more access points to the network, and they generally put it in places where it wasn't before. So they used the versatility that was added, and by using the versatility that was added they increased their exposure. And also you've then got computers which just come into networks, and you can run a virtual sound card on your laptop and then put audio onto the network. So it's not that difficult to do. And again, these laptops come from, you know, a one-man AV contractor who turns up to do a gig, probably uses the laptop for other things, and then God knows what's on there.

Excellent. All questions answered? Anyone else? We've still got some time. Think about it. Now's the time. Going once... going twice... All right.
Well, if there are no further questions, I would like to ask you to give another very warm round of applause for this wonderful talk.