Audio Objects

Some audio pessimists are convinced that because a stereo recording and reproduction system can only sample a couple of infinitesimal points within the overall ‘sound field’, it is futile to imagine that the result can be anything but a pale imitation of the real thing.

Others are convinced that although the efforts of recording engineers mean that the recording itself is passable, the problem is that speakers playing in a real room are not conveying it to their ears accurately enough. They attempt to alter what comes out of the speakers in order to compensate for the room.

And stereo itself when reproduced over speakers is assumed to be so flawed due to crosstalk to the ‘wrong’ ear that it can’t possibly work, and we must be deluding ourselves if we think it does.

These are assumptions made by people who cannot allow themselves to enjoy their audio systems. I suggest they are fixated on the wrong things and the situation is much better than they imagine. A different way to view the problem of audio is this:

It is a mistake to think that the aim of the system is to recreate the precise waveform that would have reached the listener’s ear at the actual performance. That is not practically achievable, would not necessarily reproduce a realistic perception of the actual performance in the context of the listener’s own room anyway, and is not necessary. Most people couldn’t even tell you which of two plausible versions of the waveform is absolutely correct, and that is because they’re not hearing a waveform; they’re hearing musical and acoustic ‘objects’. It is the relationship between those objects that is paramount.

An ‘object’ could be:

  • A voice
  • A choir
  • Silence
  • A sad note
  • A happy chord
  • Song lyrics
  • A violin
  • A rhythm
  • An orchestra
  • A concert hall
  • Tension

The primary aim of a hi-fi system (as opposed to a kitchen radio, for example) is to maintain the integrity of single objects and the separation of different objects.

The secondary aim of the hi-fi system is to present the objects in a plausible way that allows for the normal behaviour of the listener: the sound should appear to emanate from in front of the listener, separable by distance and direction, without strange acoustic sensations if they turn to talk to their neighbour.

And that’s it. Everything flows from there.

  • Harmonic distortion (and the corresponding intermodulation distortion) smears objects together.
  • Bumps and dips in the frequency and phase response of a speaker smear the objects together and punch holes in the integrity of the objects.
  • Noise smears itself over all the objects, obscuring their separation.
  • Limited bass damages the integrity of certain objects and smears those objects together.
  • Timing errors smear objects together. Resonators in speakers (e.g. bass reflex) that take time to ‘get going’ and time ‘to stop’ damage and smear the objects together.
  • Stereo obviously aids in separating objects. Just a pair of speakers provides a continuous spread of individual, separate acoustic sources. And stereo over speakers isn’t flawed; the crosstalk to the ‘wrong ear’ is how it produces the image in the first place.
  • Realistic volume helps to elevate objects above the noise floor, with a more natural sound due to our hearing’s volume-dependent frequency sensitivity.

So some objects make it out of a kitchen radio OK: a rhythm, a melody or the words of a song. But other objects may be severely damaged or smeared together. On a hi-fi system you might hear two separate guitars but on the radio they’re just a wash over the whole sound. On the hi-fi you hear a startling, deep bass note, but on the radio there’s nothing.

And the hi-fi system does things ‘without trying’ – which is why some people can’t believe it’s doing them. The stereo system with speakers automatically creates a two-way interaction between the listeners and the performance because both are subject to the listening room’s acoustics. This also solves the problem of how to cram a concert hall into the listener’s room as well as the more intimate performances. Is the aim for the musicians and venue to come to the listener or for the listener to go to the performance? The stereo system with speakers creates a hybrid: regard it as the listener’s room being transported to the venue and its end wall being opened up.

The First Lossy Codec

(probably).

Nowadays we are used to the concept of the lossy codec that can reduce the bit rate of CD-quality audio by a factor of, say, 5 without much audible degradation. We are also accustomed to lossless compression which can halve the bit rate without any degradation at all.

But many people may not realise that they were listening to digital audio and a form of lossy compression in the 1970s and 80s!

Early BBC PCM

As described here, the BBC were experimenting with digital audio as early as the 1960s, and in the early 70s they wired up much of the UK FM transmitter network with PCM links in order to eliminate the hum, noise, distortion and frequency response errors that were inevitable with the previous analogue links.

So listeners were already hearing 13-bit audio at a sample rate of 32 kHz when they tuned into FM radio in the 1970s. I was completely unaware of this at the time, and it is ironic that many audiophiles still think that FM radio sounds good but wouldn’t touch digital audio with a bargepole.

13 bits was pretty high quality in terms of signal-to-noise ratio, and the 32 kHz sample rate gave something approaching 15 kHz audio bandwidth which, for many people’s hearing, would be more than adequate. The quality was, however, objectively inferior to that of the Compact Disc that came later.

Downsampling to 10 bits

In the later 70s, in order to multiplex more stations into a lower bandwidth, the BBC wanted to compress higher quality 14-bit audio down to 10 bits.

As you may be aware, reducing the bit depth (requantising) leads to a higher level of background noise due to the reduced resolution and the mandatory addition of dither noise. For 10 bits with triangular dither, the best that could be achieved for a full-scale sine wave would be a signal-to-noise ratio of about 57 dB (the textbook 6.02 × 10 + 1.76 dB, less roughly 4.8 dB for the dither), although the modern technique of noise shaping the dither can reduce the audibility of the quantisation noise.
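For anyone who wants to check the arithmetic, here is a back-of-envelope sketch in Python. It assumes a full-scale sine wave, an ideal quantiser with error power of LSB²/12, and either triangular (TPDF) or rectangular (RPDF) dither; those are textbook assumptions, not figures from the BBC, and the function name is made up for the example. Under these assumptions the 10-bit result lands in the mid-to-high 50s of dB.

```python
import math

def quantisation_snr_db(bits, dither="tpdf"):
    """Theoretical SNR (dB) of a full-scale sine quantised to `bits` bits.

    Assumptions: ideal uniform quantiser, error power LSB^2 / 12;
    TPDF dither adds LSB^2 / 6, RPDF dither adds LSB^2 / 12.
    """
    # A full-scale sine has amplitude 2^(bits-1) LSB; its power is amplitude^2 / 2.
    signal_power = (2 ** (bits - 1)) ** 2 / 2
    noise_power = 1 / 12  # quantisation error alone, in units of LSB^2
    if dither == "tpdf":
        noise_power += 1 / 6
    elif dither == "rpdf":
        noise_power += 1 / 12
    elif dither != "none":
        raise ValueError("dither must be 'none', 'rpdf' or 'tpdf'")
    return 10 * math.log10(signal_power / noise_power)

for bits in (10, 13, 14):
    print(bits, "bits:", round(quantisation_snr_db(bits, "tpdf"), 1), "dB with TPDF dither")
```

For 10 bits this gives roughly 62 dB with no dither and roughly 57 dB with TPDF dither. Real systems lose a little more to headroom and filtering, so treat these as upper bounds rather than measured figures.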

This would not have been acceptable audible quality for the BBC.

Companding Noise Reduction

Compression-expansion (companding) is a noise reduction technique that was already used with analogue tape recorders, e.g. the dbx noise reduction system. Here, the signal’s dynamic range is squashed during recording, i.e. the quiet sections are boosted in level, following a specific ‘law’. Upon replay, the inverse ‘law’ is followed in order to restore the original dynamic range. In doing so, any noise added by the recording medium is pushed down in level along with the quiet sections, reducing its audibility.

With such a system, the recorded signal itself carries the information necessary to control the expander, so compressor and expander need to track each other accurately in terms of the relationships between gain, level and time. Different time constants may be used for ‘attack’ and ‘release’ and these are a compromise between rapid noise reduction and audible side effects such as ‘pumping’ and ‘breathing’. The noise itself is being modulated in level, and this can be audible against certain signals more than others. Frequency selective pre- and de-emphasis can also help to tailor the audible quality of the result.
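The level arithmetic behind companding can be sketched in Python. This is only a static illustration under assumptions chosen for the example: a 2:1 compression law in decibels relative to peak level, an expander that responds to the signal level alone (the noise is assumed to be far below it), and no attack or release time constants at all. The function names are hypothetical.

```python
def compressor_gain_db(input_db, ratio=2.0):
    """Gain (dB) the compressor applies so that output level = input level / ratio."""
    return input_db / ratio - input_db

def expander_gain_db(recorded_db, ratio=2.0):
    """Gain (dB) the expander applies so that output level = recorded level * ratio."""
    return recorded_db * ratio - recorded_db

def replayed_levels(signal_db, tape_noise_db, ratio=2.0):
    """Return (signal level, noise level) in dB after record and replay."""
    recorded_db = signal_db + compressor_gain_db(signal_db, ratio)
    gain = expander_gain_db(recorded_db, ratio)
    # The expander cannot tell signal from noise: both receive the same gain.
    return recorded_db + gain, tape_noise_db + gain

print(replayed_levels(-60, -50))  # quiet passage -> (-60.0, -80.0)
print(replayed_levels(0, -50))    # loud passage  -> (0.0, -50.0)
```

In the quiet passage the signal comes back at its original level while the noise is pushed 30 dB further down. In the loud passage the noise is untouched, but the signal is loud enough to mask it. That level-dependent noise floor is exactly what can become audible as ‘pumping’ or ‘breathing’ once real time constants are involved.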

The BBC investigated conventional analogue companding before they turned to the pure digital equivalent.

N.I.C.A.M

The BBC called their digital equivalent of analogue companding ‘NICAM’ (Near Instantaneously Companded Audio Multiplex). It is much, much simpler, and more precise and effective than the analogue version.

It is as simple as this:

  • Sample the signal at full resolution (14 bits for the BBC);
  • Divide the digitised stream into time-based chunks (1 ms was the duration they decided upon);
  • For each chunk, find the maximum absolute level within it;
  • For all samples in that chunk, do a binary shift sufficient to bring all the samples down to within the target bit depth (e.g. 10 bits);
  • Transmit the shifted samples, plus a single value indicating by how much they have been shifted;
  • At the other end, restore the full range by shifting samples in the opposite direction by the appropriate number of bits for each chunk.

Using this system, all ‘quiet chunks’ i.e. those already below the 10 bit maximum value are sent unchanged. Chunks containing values that are higher in level than 10 bits lose their least significant bits, but this loss of resolution is masked by the louder signal level. Compared to modern lossy codecs, this method requires minimal DSP and could be performed without software using dedicated circuits based on logic gates, shift registers and memory chips.
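The steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the BBC’s implementation: the function names are made up, samples are assumed to be signed integers, the default block of 32 samples is simply 1 ms at the 32 kHz rate mentioned earlier, and the bits shifted out are truncated rather than rounded.

```python
def nicam_encode(samples, target_bits=10, block_len=32):
    """Split samples into blocks and shift each block down to fit target_bits.

    Returns a list of (shift, shifted_samples) pairs, one per block.
    """
    hi = (1 << (target_bits - 1)) - 1   # largest value that fits, e.g. 511
    lo = -(1 << (target_bits - 1))      # smallest value that fits, e.g. -512
    blocks = []
    for start in range(0, len(samples), block_len):
        block = samples[start:start + block_len]
        shift = 0
        # Increase the shift until the loudest sample in the block fits.
        while any(not lo <= (s >> shift) <= hi for s in block):
            shift += 1
        blocks.append((shift, [s >> shift for s in block]))
    return blocks

def nicam_decode(blocks):
    """Undo the per-block shift; the discarded low bits come back as zeros."""
    out = []
    for shift, shifted in blocks:
        out.extend(s << shift for s in shifted)
    return out

quiet = [5, -7, 100, -512, 511]      # already fits in 10 bits
loud = [8191, -8192, 1234, -1]       # needs the full 14 bits
print(nicam_encode(quiet))           # shift is 0, samples unchanged
print(nicam_encode(loud))            # shift is 4, low 4 bits dropped
print(nicam_decode(nicam_encode(loud)))
```

Quiet blocks survive the round trip bit for bit. In a loud block each sample is off by less than 2 to the power of the shift, which is the loss of resolution that the louder signal is relied upon to mask.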

You may be surprised at how effective it is. I have written a program to demonstrate it, and in order to really emphasise how good it is, I have compressed the original signal into 8 bits, not the 10 that the BBC used.

In the following clip, a CD-quality recording has been converted as follows:

  • 0-10s is the raw full-resolution data
  • 10-20s is the sound of the signal reduced to 8 bits with dither – notice the noise!
  • 20-40s is the signal compressed NICAM-style into 8 bits and restored at the other end.

I think it is much better than we might have expected…

(I was wanting to start with high quality, so I got the music extract from here:

http://www.2l.no/hires/index.html

This is the web site of a label providing extracts of their own high quality recordings in various formats for evaluation purposes. I hope they don’t mind me using one of their excellent recorded extracts as the source for my experiment).

The Secret Science of Pop

In The Secret Science of Pop, evolutionary biologist Professor Armand Leroi tells us that he sees pop music as a direct analogy for natural selection. And he salivates at the prospect of a huge, complete, historical data set that can be analysed in order to test his theories.

He starts off by bringing in experts in data analysis from some prestigious universities, and has them crunch the numbers on the past 50 years of chart music, analysing the audio data for numerous characteristics including “rhythmic intensity” and “aggressiveness”. He plots a line on a giant computer monitor showing the rate of musical change based on an aggregate of these values. The line shows that the 60s were a time of revolution – although he claims that the Beatles were pretty average and “sat out” the revolution. Disco, and to a lesser extent punk, made the 70s a time of revolution but the 80s were not.

He is convinced that he is going to be able to use his findings to influence the production of new pop music. The results are not encouraging: no matter how he formulates his data he finds he cannot predict a song’s chart success with much better than random accuracy. The best correlation seems to be that a song’s closeness to a particular period’s “average” predicts high chart success. It is, he says, “statistically significant”.

Armed with this insight he takes on the role of producer and attempts to make a song (a ballad) being recorded at Trevor Horn’s studio as average as possible by, amongst other things, adjusting its tempo and adding some rap. It doesn’t really work, and when he measures the results with his computer, he finds that he has manoeuvred the song away from average with this manual intervention.

He then shifts his attention to trying to find the stars of tomorrow by picking out the most average song from 1200 tracks that have been sent into BBC Radio 1 Introducing. The computer picks out a particular band who seem to have a very danceable track, and in the world’s least scientific experiment ever, he demonstrates that a BBC Radio 1 producer thinks it’s OK, too.

His final conclusion: “We failed spectacularly this time, but I am sure the answer is somewhere in the data if we can just find it”.

My immediate thoughts on this programme:

-An entertaining, interesting programme.

-The rule still holds: science is not valid in the field of aesthetic judgement.

-If your system cannot predict the future stars of the past, it is very unlikely to be able to predict the stars of the future.

-The choice of which aspects of songs to measure is purely subjective, based on the scientist’s own assumptions about what humans like about music. The chances of the scientist not tweaking the algorithms in order to reflect their own intuitions are very remote. To claim that “The computer picked the song with no human intervention” is stretching it! (This applies to any ‘science’ whose main output is based on computer modelling).

-The lure of data is irresistible to scientists but, as anyone who has ever experimented with anything but the simplest, most controlled, pattern recognition will tell you, there is always too much, and at the same time never enough, training data. It slowly dawns on you that although theoretically there may be multidimensional functions that really could spot what you are looking for, you are never going to present the training data in such a way that you find a function with 100%, or at least ‘human’ levels of, reliability.

-Add to that the myriad paradoxes of human consciousness, and of humans modifying their tastes temporarily in response to novelty and fashion – even to the data itself (the charts) – and the reality is that it is a wild goose chase.

(very relevant to a post from a few months ago)

The first time I ever heard stereo

I can remember the exact moment. My dad had tried to explain to me the difference between true stereo and just wiring up a second speaker to a mono radio – and failed. I went with him to the hi-fi retailer Comet to pick up our new Tandberg receiver. Unfortunately they didn’t yet have the Tandberg speakers or Thorens record deck in stock, but we bought some Koss K6 LC headphones. That evening we attached some wires to make an FM aerial, and plugged in the headphones. My dad tuned in BBC Radio 2 – some big band programme I think – and handed me the headphones. Of course, within a fraction of a second I understood what stereo was. This would have been round about 1972-73.

Tandberg TR1010

This Tandberg receiver seems to be going for a reasonable price

Koss K6 headphones. Our version of these headphones had a slider volume control on each earcup. [www.etsy.com]