The First Lossy Codec

(probably).

Nowadays we are used to the concept of the lossy codec that can reduce the bit rate of CD-quality audio by a factor of, say, 5 without much audible degradation. We are also accustomed to lossless compression which can halve the bit rate without any degradation at all.

But many people may not realise that they were listening to digital audio and a form of lossy compression in the 1970s and 80s!

Early BBC PCM

As described here, the BBC were experimenting with digital audio as early as the 1960s, and in the early 70s they wired up much of the UK FM transmitter network with PCM links in order to eliminate the hum, noise, distortion and frequency response errors that were inevitable with the previous analogue links.

So listeners were already hearing 13-bit audio at a sample rate of 32 kHz when they tuned into FM radio in the 1970s. I was completely unaware of this at the time, and it is ironic that many audiophiles still think that FM radio sounds good but wouldn’t touch digital audio with a bargepole.

13 bits was pretty high quality in terms of signal-to-noise ratio, and the 32 kHz sample rate gave something approaching 15 kHz audio bandwidth which, for many people’s hearing, would be more than adequate. The quality was, however, objectively inferior to that of the Compact Disc that came later.

Reducing to 10 bits

In the late 70s, in order to multiplex more stations into a lower bandwidth, the BBC wanted to compress higher quality 14-bit audio down to 10 bits.

As you may be aware, reducing the bit depth leads to a higher level of background noise due to the reduced resolution and the mandatory addition of dither noise. For 10 bits with triangular (TPDF) dither, the best that could be achieved would be a signal to noise ratio of about 57 dB for a full-scale sine wave (the usual 6.02 dB per bit plus 1.76 dB, less 4.77 dB for the dither), although the modern technique of noise shaping the dither can reduce the audibility of the quantisation noise.
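As a rough check on that kind of figure, here is a minimal sketch of the textbook calculation. It assumes a full-scale sine wave as the reference signal and plain triangular (TPDF) dither with no noise shaping; those assumptions, and the function name, are mine. Other dither types or reference levels shift the answer by a few dB.

```python
import math


def dithered_snr_db(bits: int) -> float:
    """SNR of a full-scale sine after quantising to `bits` with TPDF dither.

    Powers are expressed in units of Q^2, where Q is one quantisation step.
    """
    # A full-scale sine has a peak of 2^(bits-1) steps, so its power is peak^2 / 2.
    signal_power = (2 ** (bits - 1)) ** 2 / 2
    # Quantisation error contributes Q^2/12; TPDF dither (2 steps peak to peak)
    # adds a further Q^2/6.
    noise_power = 1 / 12 + 1 / 6
    return 10 * math.log10(signal_power / noise_power)


for bits in (8, 10, 13, 14, 16):
    print(f"{bits:2d} bits: {dithered_snr_db(bits):5.1f} dB")
```

Each extra bit buys about 6 dB, so dropping from 14 bits to 10 costs roughly 24 dB of signal to noise ratio under these assumptions.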

This would not have been acceptable audible quality for the BBC.

Companding Noise Reduction

Compression-expansion (‘companding’) is a noise reduction technique that was already used with analogue tape recorders, e.g. the dbx noise reduction system. Here, the signal’s dynamic range is squashed during recording, i.e. the quiet sections are boosted in level, following a specific ‘law’. Upon replay, the inverse ‘law’ is followed in order to restore the original dynamic range. In doing so, any noise which has been added during recording is pushed down in level along with the quiet sections, reducing its audibility.
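The arithmetic is easiest to see in decibels. The sketch below assumes a simple 2:1 law around a 0 dB reference and a hypothetical tape noise floor of -60 dB; both numbers and all the function names are chosen for illustration only, and it ignores attack and release times altogether.

```python
TAPE_NOISE_DB = -60.0  # hypothetical noise floor added by the recording medium


def compress_db(level_db: float, ratio: float = 2.0) -> float:
    """Record side: squash levels towards the 0 dB reference."""
    return level_db / ratio


def expand_db(level_db: float, ratio: float = 2.0) -> float:
    """Replay side: the inverse law, restoring the original level."""
    return level_db * ratio


def noise_after_expansion_db(source_db: float,
                             noise_db: float = TAPE_NOISE_DB,
                             ratio: float = 2.0) -> float:
    """Level of the tape noise once the expander has applied its gain."""
    on_tape = compress_db(source_db, ratio)
    # The expander's gain is whatever takes the recorded level back to the source level.
    expander_gain = expand_db(on_tape, ratio) - on_tape
    return noise_db + expander_gain


for source_db in (0.0, -20.0, -40.0):
    on_tape = compress_db(source_db)
    noise = noise_after_expansion_db(source_db)
    print(f"source {source_db:6.1f} dB -> tape {on_tape:6.1f} dB -> "
          f"noise after replay {noise:6.1f} dB")
```

Note how the replayed noise level follows the signal level: that is the noise modulation which can give rise to audible side effects.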

With such a system, the recorded signal itself carries the information necessary to control the expander, so compressor and expander need to track each other accurately in terms of the relationships between gain, level and time. Different time constants may be used for ‘attack’ and ‘release’ and these are a compromise between rapid noise reduction and audible side effects such as ‘pumping’ and ‘breathing’. The noise itself is being modulated in level, and this can be audible against certain signals more than others. Frequency selective pre- and de-emphasis can also help to tailor the audible quality of the result.

The BBC investigated conventional analogue companding before they turned to the pure digital equivalent.

N.I.C.A.M

The BBC called their digital equivalent of analogue companding ‘NICAM’ (Near Instantaneously Companded Audio Multiplex). It is much, much simpler, and more precise and effective than the analogue version.

It is as simple as this:

  • Sample the signal at full resolution (14 bits for the BBC);
  • Divide the digitised stream into time-based chunks (1 ms was the duration they decided upon);
  • For each chunk, find the maximum absolute level within it;
  • For all samples in that chunk, do a binary shift sufficient to bring all the samples down to within the target bit depth (e.g. 10 bits);
  • Transmit the shifted samples, plus a single value indicating by how much they have been shifted;
  • At the other end, restore the full range by shifting samples in the opposite direction by the appropriate number of bits for each chunk.
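The steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not the BBC's implementation or the program I used for the clip: the function names are made up, samples are assumed to be signed integers, and the default block length of 32 samples assumes 1 ms at a 32 kHz sample rate.

```python
def encode_block(samples, target_bits=10):
    """Return (shift, shifted_samples) for one block of signed integer samples."""
    limit = 1 << (target_bits - 1)  # signed range is -limit .. limit - 1
    shift = 0
    # Equivalent to finding the block's peak: keep increasing the shift
    # until every sample fits into the target word length.
    while any(not (-limit <= (s >> shift) < limit) for s in samples):
        shift += 1
    return shift, [s >> shift for s in samples]


def decode_block(shift, shifted):
    """Undo the shift; the discarded low bits come back as zeros."""
    return [s << shift for s in shifted]


def encode(samples, block_len=32, target_bits=10):
    """Split the stream into blocks and encode each one independently."""
    return [encode_block(samples[i:i + block_len], target_bits)
            for i in range(0, len(samples), block_len)]


def decode(blocks):
    """Rebuild the full-range stream from (shift, samples) pairs."""
    out = []
    for shift, shifted in blocks:
        out.extend(decode_block(shift, shifted))
    return out


quiet = [100, -200, 511, -512]    # already fits in 10 bits
loud = [8191, -8192, 1000, 3]     # uses the full 14-bit range
for block in (quiet, loud):
    shift, shifted = encode_block(block)
    print(shift, shifted, decode_block(shift, shifted))
```

The quiet block comes back exactly as it went in, while every sample in the loud block is off by less than 2 to the power of the shift.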

Using this system, all ‘quiet chunks’, i.e. those whose samples already fit within the 10-bit range, are sent unchanged. Chunks containing values that are higher in level than 10 bits lose their least significant bits, but this loss of resolution is masked by the louder signal level. Compared to modern lossy codecs, this method requires minimal DSP and could be performed without software using dedicated circuits based on logic gates, shift registers and memory chips.
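The single shift value per chunk costs very little. As a back-of-envelope sketch, assuming a 32 kHz sample rate for the 14-bit system (so 32 samples per 1 ms chunk) and ignoring any framing or error protection bits a real broadcast link would need, the raw bit rates per channel work out as follows. The constant names are mine.

```python
SAMPLE_RATE_HZ = 32_000   # assumed sample rate
BLOCKS_PER_SECOND = 1000  # one chunk per millisecond
SOURCE_BITS = 14
TARGET_BITS = 10

samples_per_block = SAMPLE_RATE_HZ // BLOCKS_PER_SECOND

# Shifts of 0 to (SOURCE_BITS - TARGET_BITS) are possible, i.e. 5 distinct values,
# which need 3 bits to transmit.
max_shift = SOURCE_BITS - TARGET_BITS
shift_bits = max_shift.bit_length()

plain_kbps = SAMPLE_RATE_HZ * SOURCE_BITS / 1000
companded_kbps = (samples_per_block * TARGET_BITS + shift_bits) * BLOCKS_PER_SECOND / 1000

print(f"plain {SOURCE_BITS}-bit PCM: {plain_kbps:.0f} kbit/s")
print(f"companded to {TARGET_BITS} bits: {companded_kbps:.0f} kbit/s")
```

Under these assumptions the side information adds less than 1% to the companded stream.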

You may be surprised at how effective it is. I have written a program to demonstrate it, and in order to really emphasise how good it is, I have compressed the original signal into 8 bits, not the 10 that the BBC used.

In the following clip, a CD-quality recording has been converted as follows:

  • 0-10s is the raw full-resolution data
  • 10-20s is the sound of the signal reduced to 8 bits with dither – notice the noise!
  • 20-40s is the signal compressed NICAM-style into 8 bits and restored at the other end.
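For anyone who wants to try a similar comparison, here is a minimal sketch. It is not the program used to make the clip: it uses synthetic sine waves instead of music, assumes 16-bit source samples at 44.1 kHz with 44-sample (roughly 1 ms) blocks, and all the function names are made up. It compares the RMS error of plain dithered 8-bit reduction with that of the block-shift method.

```python
import math
import random


def requantise_with_dither(samples, source_bits=16, target_bits=8, seed=0):
    """Reduce word length with TPDF dither, returning values on the original scale."""
    rng = random.Random(seed)
    step = 1 << (source_bits - target_bits)
    lo, hi = -(1 << (target_bits - 1)), (1 << (target_bits - 1)) - 1
    out = []
    for s in samples:
        dither = (rng.random() - rng.random()) * step  # triangular, +/- 1 step
        q = math.floor((s + dither) / step + 0.5)      # round to nearest step
        out.append(max(lo, min(hi, q)) * step)         # clip, then rescale
    return out


def compand(samples, block_len=44, target_bits=8):
    """Block-shift samples into target_bits and straight back out again."""
    limit = 1 << (target_bits - 1)
    out = []
    for i in range(0, len(samples), block_len):
        block = samples[i:i + block_len]
        shift = 0
        while any(not (-limit <= (s >> shift) < limit) for s in block):
            shift += 1
        out.extend((s >> shift) << shift for s in block)
    return out


def rms_error(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def sine(amplitude, n_samples=4410, freq=440.0, rate=44100):
    return [round(amplitude * math.sin(2 * math.pi * freq * n / rate))
            for n in range(n_samples)]


for name, signal in (("quiet", sine(100)), ("loud", sine(30000))):
    print(f"{name}: dithered 8-bit error {rms_error(signal, requantise_with_dither(signal)):.1f}, "
          f"companded error {rms_error(signal, compand(signal)):.1f}")
```

On the quiet passage the dithered version has an error larger than the signal itself, while the companded version is untouched. On the loud passage the two errors are of similar size, but there the signal is loud enough to mask them.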

I think it is much better than we might have expected…

(I wanted to start with high quality, so I got the music extract from here:

http://www.2l.no/hires/index.html

This is the web site of a label providing extracts of their own high quality recordings in various formats for evaluation purposes. I hope they don’t mind me using one of their excellent recorded extracts as the source for my experiment).


The Secret Science of Pop


In The Secret Science of Pop, evolutionary biologist Professor Armand Leroi tells us that he sees pop music as a direct analogy for natural selection. And he salivates at the prospect of a huge, complete, historical data set that can be analysed in order to test his theories.

He starts off by bringing in experts in data analysis from some prestigious universities, and has them crunch the numbers on the past 50 years of chart music, analysing the audio data for numerous characteristics including “rhythmic intensity” and “aggressiveness”. He plots a line on a giant computer monitor showing the rate of musical change based on an aggregate of these values. The line shows that the 60s were a time of revolution – although he claims that the Beatles were pretty average and “sat out” the revolution. Disco, and to a lesser extent punk, made the 70s a time of revolution but the 80s were not.

He is convinced that he is going to be able to use his findings to influence the production of new pop music. The results are not encouraging: no matter how he formulates his data he finds he cannot predict a song’s chart success with much better than random accuracy. The best correlation seems to be that a song’s closeness to a particular period’s “average” predicts high chart success. It is, he says, “statistically significant”.

Armed with this insight he takes on the role of producer and attempts to make a song (a ballad) being recorded at Trevor Horn’s studio as average as possible by, amongst other things, adjusting its tempo and adding some rap. It doesn’t really work, and when he measures the results with his computer, he finds that he has manoeuvred the song away from average with this manual intervention.

He then shifts his attention to trying to find the stars of tomorrow by picking out the most average song from 1200 tracks that have been sent into BBC Radio 1 Introducing. The computer picks out a particular band who seem to have a very danceable track, and in the world’s least scientific experiment ever, he demonstrates that a BBC Radio 1 producer thinks it’s OK, too.

His final conclusion: “We failed spectacularly this time, but I am sure the answer is somewhere in the data if we can just find it”.

My immediate thoughts on this programme:

-An entertaining, interesting programme.

-The rule still holds: science is not valid in the field of aesthetic judgement.

-If your system cannot predict the future stars of the past, it is very unlikely to be able to predict the stars of the future.

-The choice of which aspects of songs to measure is purely subjective, based on the scientist’s own assumptions about what humans like about music. The chances of the scientist not tweaking the algorithms in order to reflect their own intuitions are very remote. To claim that “The computer picked the song with no human intervention” is stretching it! (This applies to any ‘science’ whose main output is based on computer modelling).

-The lure of data is irresistible to scientists but, as anyone who has ever experimented with anything but the simplest, most controlled, pattern recognition will tell you, there is always too much, and at the same time never enough, training data. It slowly dawns on you that although theoretically there may be multidimensional functions that really could spot what you are looking for, you are never going to present the training data in such a way that you find a function with 100%, or at least ‘human’ levels of, reliability.

-Add to that the myriad paradoxes of human consciousness, and of humans modifying their tastes temporarily in response to novelty and fashion – even to the data itself (the charts) – and the reality is that it is a wild goose chase.

(very relevant to a post from a few months ago)

The first time I ever heard stereo

I can remember the exact moment. My dad had tried to explain to me the difference between true stereo and just wiring up a second speaker to a mono radio – and failed. I went with him to the hi-fi retailer Comet to pick up our new Tandberg receiver. Unfortunately they didn’t yet have the Tandberg speakers or Thorens record deck in stock, but we bought some Koss K6 LC headphones. That evening we attached some wires to make an FM aerial, and plugged in the headphones. My dad tuned in BBC Radio 2 – some big band programme I think – and handed me the headphones. Of course, within a fraction of a second I understood what stereo was. This would have been round about 1972-73.


Tandberg TR1010

This Tandberg receiver seems to be going for a reasonable price


Koss K6 headphones. Our version of these headphones had a slider volume control on each earcup. [www.etsy.com]