WEBVTT
Kind: captions
Language: en

00:00:00.070 --> 00:00:04.760
Have you ever noticed that video of falling
snow or confetti can look pretty terrible?

00:00:04.760 --> 00:00:07.370
As soon as there's stuff floating around in
the air,

00:00:07.370 --> 00:00:10.280
suddenly the quality of the video you're watching
collapses.

00:00:10.280 --> 00:00:13.639
You can see it on this incredible clip of
200 kilos of confetti

00:00:13.639 --> 00:00:16.580
being blasted at Ed Sheeran on the UK's X
Factor.

00:00:16.580 --> 00:00:21.810
Now, if you already understand compression,
you can pick another video.

00:00:21.810 --> 00:00:23.720
Everyone else: let's talk bitrate.

00:00:23.720 --> 00:00:27.359
I'm not actually in Norway, by the way, if
that wasn't obvious.

00:00:27.359 --> 00:00:30.640
I could have tried to find some actual snow
or bought a load of confetti,

00:00:30.640 --> 00:00:33.960
but this way I can test things with carefully controlled
digital effects.

00:00:33.960 --> 00:00:38.190
Which has the added bonus that I don't need
to clean up afterwards.

00:00:38.190 --> 00:00:40.600
So, to put the problem in one sentence:

00:00:40.600 --> 00:00:44.120
there are only so many ones and zeros to go
around.

00:00:44.120 --> 00:00:48.180
Back in the days of analogue television, video
was uncompressed.

00:00:48.180 --> 00:00:49.899
The TV camera scanned the signal,

00:00:49.899 --> 00:00:51.170
it was transmitted over the air,

00:00:51.170 --> 00:00:53.260
and your television played it back.

00:00:53.260 --> 00:00:54.969
And yes, it was only standard definition,

00:00:54.969 --> 00:00:59.039
but pretty much every bit of detail the camera
caught appeared on your screen.

00:00:59.039 --> 00:01:01.660
And that's fine when there are only a few
television channels

00:01:01.660 --> 00:01:04.570
and they're literally going over the air.

00:01:04.570 --> 00:01:06.920
But that's really wasteful.

00:01:06.920 --> 00:01:09.450
The reason that digital television can have
so many channels,

00:01:09.450 --> 00:01:11.780
and that web video works at all,

00:01:11.780 --> 00:01:13.830
is because of compression.

00:01:13.830 --> 00:01:18.000
If you tried to actually transmit every pixel
of an HD video, in perfect quality,

00:01:18.000 --> 00:01:22.850
you'd need somewhere around a gigabit a second
sent over the wire. As I record this,

00:01:22.850 --> 00:01:27.570
that would max out over 100 average American
broadband connections simultaneously,

00:01:27.570 --> 00:01:31.170
or over 50 average South Korean broadband
connections.

00:01:31.170 --> 00:01:35.810
So if you want YouTube to work: that amount
of data, that bitrate,

00:01:35.810 --> 00:01:38.110
is going to need to get cut down.

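NOTE
Editorial sketch, not part of the spoken captions. It works through the rough
arithmetic behind the "around a gigabit a second" figure. The resolution, bit
depth and frame rate are assumptions chosen for illustration (1920x1080, 24 bits
per pixel, 25 frames per second); the video does not state which values it used.
```python
# Rough estimate only: resolution, bit depth and frame rate are assumed values.
def uncompressed_bitrate(width: int, height: int, bits_per_pixel: int, fps: float) -> float:
    """Return the raw bitrate in bits per second for uncompressed video."""
    return width * height * bits_per_pixel * fps
# Hypothetical "HD" parameters: 1920x1080, 8 bits each for R, G, B, 25 fps.
hd_bps = uncompressed_bitrate(1920, 1080, 24, 25)
print(f"{hd_bps / 1e9:.2f} Gbit/s")  # prints 1.24 Gbit/s
```
Other plausible frame rates or bit depths shift the result, but it stays in the
region of a gigabit a second.
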
00:01:38.110 --> 00:01:40.970
Step 1 is regular, everyday image compression.

00:01:40.970 --> 00:01:43.420
Pretty much every photo on the internet is
compressed,

00:01:43.420 --> 00:01:47.430
mainly by throwing away small bits of detail
that the eye probably won't notice.

00:01:47.430 --> 00:01:50.210
At least until it gets screenshotted and reposted

00:01:50.210 --> 00:01:53.680
twenty different times by twenty different
Instagram accounts.

00:01:53.680 --> 00:01:56.799
You can take every individual frame of the
video

00:01:56.799 --> 00:01:58.950
and apply that compression to it.

00:01:58.950 --> 00:02:01.630
Step 2 is interframe compression.

00:02:01.630 --> 00:02:04.520
Until there's a big scene change, why bother
storing whole frames

00:02:04.520 --> 00:02:06.950
when you can store only the changes between
them?

00:02:06.950 --> 00:02:09.750
After all, if I'm just talking against a plain
background,

00:02:09.750 --> 00:02:12.710
you don't need to keep sending new data for
that background every time.

00:02:12.710 --> 00:02:15.740
Just tell the video player to repeat what
was there before.

00:02:15.740 --> 00:02:17.860
Or if I move my body a little as I talk,

00:02:17.860 --> 00:02:21.220
just tell the player to move that block of
pixels a bit to the right,

00:02:21.220 --> 00:02:24.240
and maybe tweak a bit of colour here and there.

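NOTE
Editorial sketch, not part of the spoken captions. It illustrates the idea of
storing only the changes between frames, using plain per-pixel differencing on a
tiny made-up greyscale frame. Real codecs use motion-compensated prediction,
which is far more sophisticated; every name and size here is hypothetical.
```python
WIDTH, HEIGHT = 64, 48  # tiny hypothetical greyscale frame, one byte per pixel
def make_frame(speaker_x: int) -> bytes:
    """Plain background (value 30) with a bright 8x8 'speaker' block at speaker_x."""
    rows = []
    for y in range(HEIGHT):
        row = bytearray([30] * WIDTH)
        if 20 <= y < 28:
            row[speaker_x:speaker_x + 8] = bytes([200] * 8)
        rows.append(bytes(row))
    return b"".join(rows)
def delta(prev: bytes, cur: bytes) -> bytes:
    """Per-pixel difference mod 256; unchanged pixels become zero."""
    return bytes((c - p) % 256 for p, c in zip(prev, cur))
def apply_delta(prev: bytes, d: bytes) -> bytes:
    """Rebuild the current frame from the previous frame plus the stored changes."""
    return bytes((p + x) % 256 for p, x in zip(prev, d))
frame1 = make_frame(speaker_x=10)
frame2 = make_frame(speaker_x=12)  # the speaker shifts two pixels to the right
changes = delta(frame1, frame2)
changed_pixels = sum(1 for b in changes if b != 0)
print(changed_pixels, "of", len(frame2), "pixels changed")  # 32 of 3072 pixels changed
# The player can reconstruct frame2 exactly from frame1 and the changes.
assert apply_delta(frame1, changes) == frame2
```
Only the edges of the moved block differ, so almost all of the second frame is
"repeat what was there before".
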
00:02:24.240 --> 00:02:26.890
That's how you cut down gigabits of video
per second

00:02:26.890 --> 00:02:28.870
to something you can load on your phone:

00:02:28.870 --> 00:02:30.500
Maths. Lots of maths.

00:02:30.500 --> 00:02:33.520
But I think a practical demonstration would
be better, so:

00:02:33.520 --> 00:02:35.660
I'm going to limit the bitrate of this video,

00:02:35.660 --> 00:02:39.010
the number of ones and zeros per second that
are being used to encode it.

00:02:39.010 --> 00:02:41.860
And yes, YouTube will mess about with this
after I upload it,

00:02:41.860 --> 00:02:44.010
but it can't magically put detail back in:

00:02:44.010 --> 00:02:47.340
so even if you're watching in the best quality
you can,

00:02:47.340 --> 00:02:49.870
what you're seeing now is still the limited
version.

00:02:49.870 --> 00:02:51.980
This is two hundred kilobits a second,

00:02:51.980 --> 00:02:54.810
two hundred thousand ones and zeros going
over the wire every second.

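NOTE
Editorial sketch, not part of the spoken captions. It shows how small a budget
two hundred kilobits a second is once it is shared out per frame and per pixel.
The frame rate and resolution are assumptions; only the bitrate is from the video.
```python
BITRATE_BPS = 200_000       # from the video: two hundred kilobits a second
FPS = 25                    # assumed frame rate
WIDTH, HEIGHT = 1920, 1080  # assumed resolution
bits_per_frame = BITRATE_BPS / FPS
bits_per_pixel = bits_per_frame / (WIDTH * HEIGHT)
print(f"{bits_per_frame:.0f} bits per frame")             # 8000 bits per frame
print(f"{bits_per_pixel:.4f} bits per pixel per frame")   # 0.0039 bits per pixel per frame
```
Under these assumptions the encoder has well under a hundredth of a bit per
pixel, so it can only work if most pixels are predicted from earlier frames.
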
00:02:54.810 --> 00:02:56.730
Doesn't look too bad with modern encoding,

00:02:56.730 --> 00:03:00.810
you might lose some fine detail on my face
or hair or hand gestures,

00:03:00.810 --> 00:03:04.080
but you can still see what's going on pretty
clearly.

00:03:04.080 --> 00:03:06.110
But now, let's add a bit of snow.

00:03:06.110 --> 00:03:10.660
And suddenly, those bits aren't all being
spent on rendering me.

00:03:10.660 --> 00:03:13.340
Instead, they're also being used to track
the stuff that's flying around.

00:03:13.340 --> 00:03:16.100
It's chaotic, it keeps changing direction,
it's complicated,

00:03:16.100 --> 00:03:18.930
so just saying "move these pixels here" won't
work either.

00:03:18.930 --> 00:03:21.480
Let's add some confetti, too, all colourful
this time.

00:03:21.480 --> 00:03:23.360
There we go, now it's all starting to fall
apart.

00:03:23.360 --> 00:03:25.500
The more stuff there is moving in the frame,

00:03:25.500 --> 00:03:26.510
more confetti, there we go,

00:03:26.510 --> 00:03:29.370
the more spread out those two hundred kilobits
have to be.

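NOTE
Editorial sketch, not part of the spoken captions. It gives a feel for why more
moving particles cost more bits: the change between two frames is compressed
with zlib as a stand-in for a real video encoder, and the compressed size is
measured as the number of randomly placed "flakes" grows. The frame size, flake
counts and the use of zlib are all assumptions made for illustration.
```python
import random
import zlib
WIDTH, HEIGHT = 64, 48  # hypothetical tiny greyscale frame, one byte per pixel
def frame_with_flakes(n_flakes: int, rng: random.Random) -> bytes:
    """Plain background with n_flakes bright pixels at random positions."""
    frame = bytearray([30] * (WIDTH * HEIGHT))
    for _ in range(n_flakes):
        frame[rng.randrange(WIDTH * HEIGHT)] = rng.randrange(128, 256)
    return bytes(frame)
def delta_size(n_flakes: int, seed: int = 0) -> int:
    """Compressed size in bytes of the change between two consecutive frames."""
    rng = random.Random(seed)
    prev = frame_with_flakes(n_flakes, rng)
    cur = frame_with_flakes(n_flakes, rng)  # flakes jump to new random places
    diff = bytes((c - p) % 256 for p, c in zip(prev, cur))
    return len(zlib.compress(diff))
sizes = {n: delta_size(n) for n in (0, 20, 200, 2000)}
for n, size in sizes.items():
    print(f"{n:>5} flakes: {size} bytes per frame of changes")
```
With no flakes the change is all zeros and compresses to almost nothing; as the
flake count rises, the same fixed budget has to cover far more unpredictable data.
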
00:03:29.370 --> 00:03:30.670
More confetti! Here we go.

00:03:30.670 --> 00:03:34.480
No matter how much the encoder tries to optimise
for faces and skin tones,

00:03:34.480 --> 00:03:38.600
it just doesn't have the bits spare. More
confetti! More snow!

00:03:38.600 --> 00:03:40.560
Now, even if I turn the bitrate back up,

00:03:40.560 --> 00:03:42.430
put this in the highest quality I can,

00:03:42.430 --> 00:03:44.350
it still won't look good right now.

00:03:44.350 --> 00:03:48.260
I don't know why I'm yelling, I'm adding the
wind noise in later.

00:03:48.260 --> 00:03:52.590
But it's not really about the confetti itself.
It's about the movement.

00:03:52.590 --> 00:03:54.370
If we freeze all this stuff in mid-air,

00:03:54.370 --> 00:03:57.010
and make it into a background:

00:03:57.010 --> 00:03:58.620
over the next couple of seconds,

00:03:58.620 --> 00:04:01.850
the quality of the video will come back.

00:04:01.850 --> 00:04:05.990
That's why the picture falls apart when your
sports team wins and the confetti drops.

00:04:05.990 --> 00:04:08.830
Video literally isn't what it used to be.


