Auditory Scene Analysis

There is a field of study called Auditory Scene Analysis (ASA) which postulates that humans interpret “scenes” using sound just as they do using vision. I am not sure that it has any particular bearing on the way audio hardware should be designed: the scene is simply clearer if the reproduction is clean in terms of noise, channel separation, distortion, frequency response and (seemingly controversial to hi-fi folk) the time domain.

However, the seminal work in this field, Albert Bregman’s book Auditory Scene Analysis, includes the following analogy for hearing:

Your friend digs two narrow channels up from the side of a lake. Each is a few feet long and a few inches wide and they are spaced a few feet apart. Halfway up each one, your friend stretches a handkerchief and fastens it to the sides of the channel. As the waves reach the side of the lake they travel up the channels and cause the two handkerchiefs to go into motion. You are allowed to look only at the handkerchiefs and from their motions to answer a series of questions: How many boats are there on the lake and where are they? Which is the most powerful one? Which one is closer? Is the wind blowing? Has any large object been dropped suddenly into the lake?

Of course, when we listen to reproduced music with an audio system we are, in effect, duplicating the motion of the handkerchiefs using two paddles in another lake (our listening room) and watching the motion of a new pair of handkerchiefs. Amazingly, it works! But the key to this is that the two lakes are well-defined linear systems. Our brains can ‘work back’ to the original sounds using a process akin to ‘blind deconvolution’.
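Here is a minimal Python sketch of the ‘linear systems’ point. It assumes each lake (the recording venue and the listening room) can be modelled as a linear time-invariant system with a short impulse response. The responses and the source signal are invented toy values, not measurements. The sketch says nothing about how the brain does its ‘working back’. It only shows that two lakes in series behave as one combined linear system, whichever order they come in.

```python
def convolve(x, h):
    """Direct discrete convolution of two finite sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def close(a, b, tol=1e-12):
    """True if two sequences have equal length and match to within tol."""
    return len(a) == len(b) and all(abs(p - q) < tol for p, q in zip(a, b))


# Hypothetical toy impulse responses: direct sound plus one weaker reflection.
recording_venue = [1.0, 0.0, 0.5]       # first lake
listening_room = [1.0, 0.0, 0.0, 0.3]   # second lake
source = [1.0, -0.5, 0.25, 0.0, 0.1]    # hypothetical source signal

# The sound passes through one lake, then the other.
at_ears = convolve(convolve(source, recording_venue), listening_room)

# Linear time-invariant systems in series collapse into one combined response...
combined = convolve(recording_venue, listening_room)
print(close(at_ears, convolve(source, combined)))  # True

# ...and the order of the two lakes makes no difference.
print(close(at_ears, convolve(convolve(source, listening_room), recording_venue)))  # True
```

This combined-system property is the precondition for any ‘working back’. If either lake were non-linear, there would be no single response to undo.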

If we want to, we can eliminate the ‘second lake’ by using headphones, or almost eliminate it by using an anechoic chamber. We could theoretically eliminate it at a single point in space by deconvolving the reproduced signal with the measured impulse response of the room at that point. Listening with headphones works OK, but listening to speakers in a dead acoustic sounds terrible. This is probably because of the head-related transfer function (HRTF): it tells us that we are listening in a ‘real’ acoustic, but the acoustic cues we expect when we move our heads are missing. By adding the ‘second lake’ we create enough ‘real acoustic’ to overcome that.
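The single-point idea can be sketched in Python under some loudly stated assumptions. The room at the listening seat is taken to be a short minimum-phase impulse response, meaning direct sound plus one weaker reflection, with values invented for illustration. Pre-filtering the signal by recursive deconvolution cancels that room exactly at the seat. The same pre-filtered signal, heard at a second hypothetical position where the reflection arrives one sample later, is left with errors. Real room responses are generally not minimum-phase, which is one reason practical inversion schemes use approximate, regularised inverses rather than the exact recursion shown here.

```python
def convolve(x, h):
    """Direct discrete convolution of two finite sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def inverse_prefilter(x, h):
    """Pre-filter x so that convolving the result with h gives back x.

    Solves h[0]*y[n] + sum(h[k]*y[n-k] for k >= 1) = x[n] recursively.
    Assumes h[0] != 0 and that h is minimum-phase, so the recursion is stable.
    """
    y = []
    for n in range(len(x)):
        acc = x[n]
        for k in range(1, min(len(h), n + 1)):
            acc -= h[k] * y[n - k]
        y.append(acc / h[0])
    return y


def max_error(a, b):
    """Largest absolute sample-by-sample difference."""
    return max(abs(p - q) for p, q in zip(a, b))


# Hypothetical room responses at two positions (invented values).
at_seat = [1.0, 0.0, 0.3]          # reflection arrives 2 samples late
elsewhere = [1.0, 0.0, 0.0, 0.3]   # same reflection, 3 samples late
signal = [1.0, 0.5, -0.25, 0.0, 0.0, 0.0, 0.0, 0.0]

pre = inverse_prefilter(signal, at_seat)
heard_at_seat = convolve(pre, at_seat)[:len(signal)]
heard_elsewhere = convolve(pre, elsewhere)[:len(signal)]

print(max_error(heard_at_seat, signal))    # essentially zero
print(max_error(heard_elsewhere, signal))  # clearly non-zero
```

Here the ‘correction’ only holds at the one position whose response was used to build it.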

But here is why ‘room correction’ is flawed. The logical conclusion of room correction is to simulate headphones, but this cannot be achieved, and it is not what most listeners want anyway, even if they don’t know it. Instead, an incomplete ‘correction’ is implemented, based on the idea of trying to make the motion of the two sets of ‘handkerchiefs’ closer to each other than they (in naive measurements) appear to be. If the idea of the brain ‘working back’ to the original sound is correct, it will instead ‘work back’ to a recording that has been modified in a seemingly arbitrary way. Modifying the physical acoustics of the room is valid, whereas modifying the signal is not.
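The ‘work back’ argument can be put in toy terms. This Python sketch assumes, purely for illustration, that the listener’s ‘working back’ behaves like an exact deconvolution of the listening room. It also assumes that a room-correction system has inserted a hypothetical filter, `correction`, into the signal path. All values are invented. Under those assumptions, what the listener reconstructs is the recording convolved with the correction, not the recording itself.

```python
def convolve(x, h):
    """Direct discrete convolution of two finite sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def deconvolve(y, h):
    """Recover x from y = x * h by recursion (assumes h[0] != 0, h minimum-phase)."""
    x = []
    for n in range(len(y)):
        acc = y[n]
        for k in range(1, min(len(h), n + 1)):
            acc -= h[k] * x[n - k]
        x.append(acc / h[0])
    return x


recording = [1.0, 0.5, -0.25, 0.0, 0.0, 0.0]
room = [1.0, 0.0, 0.3]       # hypothetical listening-room response
correction = [1.0, -0.2]     # hypothetical 'room correction' filter

# With no correction, working back through the room recovers the recording.
plain = deconvolve(convolve(recording, room), room)[:len(recording)]

# With correction in the signal path, working back recovers something else.
worked_back = deconvolve(convolve(convolve(recording, correction), room), room)[:len(recording)]
modified = convolve(recording, correction)[:len(recording)]

print(max(abs(a - b) for a, b in zip(plain, recording)))        # essentially zero
print(max(abs(a - b) for a, b in zip(worked_back, modified)))   # essentially zero
print(max(abs(a - b) for a, b in zip(worked_back, recording)))  # not zero
```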

I think the problem stems ultimately from an engineering tool (frequency domain measurement) proliferating due to cheap computing power. There is a huge difference in levels of understanding between the author of the ASA book and the audiophiles and manufacturers who think that the sound is improved by tweaking graphic equalisers in an attempt to compensate for delays that the brain has compensated for already.
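The following minimal Python sketch shows how a purely time-domain event appears in a frequency-domain measurement. It assumes a single reflection modelled as a delayed, attenuated copy of the direct sound. The delay, level and sample rate are invented for illustration. A magnitude plot of the result shows only a regular ripple, not the delay that caused it.

```python
import cmath
import math


def magnitude_db(h, freq_hz, fs_hz):
    """Magnitude (dB) of the FIR impulse response h at one frequency."""
    w = 2 * math.pi * freq_hz / fs_hz
    H = sum(hk * cmath.exp(-1j * w * k) for k, hk in enumerate(h))
    return 20 * math.log10(abs(H))


FS = 48_000      # assumed sample rate, Hz
DELAY = 48       # hypothetical reflection 1 ms after the direct sound
GAIN = 0.5       # hypothetical reflection level relative to the direct sound

# Direct sound at t = 0 plus a single delayed, attenuated copy.
h = [1.0] + [0.0] * (DELAY - 1) + [GAIN]

# A naive magnitude measurement sees a 'comb': peaks every FS/DELAY = 1 kHz,
# notches half-way between them, although nothing but a delay was added.
for f in (500, 1000, 1500, 2000):
    print(f"{f:>5} Hz: {magnitude_db(h, f, FS):+.2f} dB")
```

With these numbers the notches sit near -6 dB and the peaks near +3.5 dB. That is the kind of pattern a graphic equaliser might be set to ‘fix’, even though its cause is a delay.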



How often do you stumble across an album by an artist you were completely unaware of, and find that it’s as good as anything you’ve heard in your life? It’s nice when it happens!

Ever heard of Jobriath? I hadn’t. It seems he was going to be the next big thing in 1973, but the world wasn’t quite ready for him. Anyway, I just listened to the album Jobriath, which I had somehow managed to miss until now. Fantastic music with unusual arrangements and unexpected twists and turns: check out the piano part on Inside. A beautiful, fresh recording. Surely this is as good as David Bowie or Elton John. The highlight of the album for me is I’m a Man which, among its many virtues, uses a harpsichord to great effect.

The Man in the White Suit


There’s a brilliant film from the 1950s called The Man in the White Suit. It’s a satire on capitalism and the power of the unions, and it tells the story of how the two sides find themselves working together to oppose a new invention that threatens to make several industries redundant.

I wonder if there’s a tenuous resemblance between the film’s new wonder-fabric and the invention of digital audio? I hesitate to say that it’s exactly the same, because someone will point out that in the end, the wonder-fabric isn’t all it seems and falls apart, but I think they do have these similarities:

  1. The new invention is, for all practical purposes, ‘perfect’, and is immediately superior to everything that has gone before.
  2. It is cheap – very cheap – and can be mass-produced in large quantities.
  3. It has the properties of infinite lifespan, zero maintenance and non-obsolescence.
  4. It threatens the profits not only of the industry that invented it, but other related industries.

In the film it all turns a bit dark, with mobs on the streets and violence imminent. Only the invention’s catastrophic failure saves the day.

In the smaller worlds of audio and music, things are a little different. Digital audio shows no signs of failing, and it has taken quite a few years for someone to finally come up with a comprehensive, feasible strategy for monopolising the invention while also shutting the Pandora’s box that was opened when it was initially released without restrictions.

The new strategy is this:

  1. Spread rumours that the original invention was flawed.
  2. Re-package the invention as something brand new, with a vagueness that allows people to believe whatever they want about it.
  3. Deviate from the rigid mathematical conditions of the original invention, opening up possibilities for future innovations in filtering and “de-blurring”. The audiophile imagination is a potent force, so this may not be the last time you can persuade them to re-purchase their record collections, after all.
  4. Offer to protect the other, affected industries (for a fee).
  5. Appear to maintain compatibility with the original invention (for now) while substituting a more inconvenient version with inferior quality for unlicensed users.
  6. Through positive enticements, nudge users into voluntarily phasing out the original invention over several years.
  7. Introduce stronger protection once the window has been closed.

It’s a very clever strategy, I think. Point (2) is the master stroke.

Hi-Fi Sci-Fi


Last night I watched a BBC TV play from 1972 called The Stone Tape. An electronics company installs its R&D department in an old mansion, with the aim of developing “a new recording medium”. Tape is, apparently, “too delicate and it loses its memory”. They stumble upon a possible ready-made solution in a room in the oldest part of the house, which seems to have a ‘ghost’ – a Victorian maid frozen in time just before she fell to her death. What if it’s not a ghost, but a ‘recording’ of an event that has somehow become embedded in the stone itself? Maybe this could be “the big one” they have been looking for…

What I particularly liked about it was the idea that, hard as it is to believe, there was once a time before the world went digital, when everything was still up for grabs. Digital computers do play a role in the story, but only as a way of “correlating” the experimental results in order to spot possible connections that a human might miss.

It’s also a well-observed portrayal of life in a certain kind of company – some of it seemed very familiar.

The pickle that listening-based ‘science’ gets us into

Just expanding on an earlier post: some thoughts on ‘audio science’ and its observation that human perception of sound is often influenced by our imagination. Blind testing doesn’t eliminate our imagination; it merely prevents it from biasing the result of the test. We can still imagine anything we like when switching between A and B, and under such conditions the imagination is likely to flourish. Amid these high levels of imaginary ‘noise’, audio ‘scientists’ think that the magical powers of statistical formulae can enable them to discern audible differences that the test subjects didn’t even know they had heard. Or they can confidently state that no difference was heard. Such confidence in the validity of their statistics brings to mind a study of the brain of a dead fish that, with the dumb application of dumb formulae, could be interpreted as responding to images flashed up in front of its eyes.
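The dead-fish study is usually cited as a warning about uncorrected multiple comparisons: if you run enough tests, some will pass by chance. The minimal Python sketch below uses invented conditions. It simulates 1,000 hypothetical listeners, all guessing, in 16-trial ABX sessions. A score of 12 or more counts as ‘significant’ at p < 0.05, since the exact binomial p-value for 12/16 is about 0.038. On average roughly 38 of the guessers would pass, even though there is nothing to hear. The Bonferroni-corrected threshold is included only as one standard remedy, not as the way any particular study handled it.

```python
import math
import random


def binomial_p_value(correct, trials):
    """One-sided exact binomial p-value: the chance of scoring at least
    `correct` out of `trials` by pure guessing (p = 0.5 per trial)."""
    hits = sum(math.comb(trials, k) for k in range(correct, trials + 1))
    return hits / 2 ** trials


TRIALS = 16          # hypothetical ABX session length
LISTENERS = 1000     # hypothetical number of listeners / comparisons
ALPHA = 0.05

rng = random.Random(1)   # fixed seed so the sketch is repeatable
significant = 0          # passes at the uncorrected threshold
corrected = 0            # passes at the Bonferroni threshold ALPHA / LISTENERS
for _ in range(LISTENERS):
    # Every listener is guessing: no audible difference exists at all.
    score = sum(rng.random() < 0.5 for _ in range(TRIALS))
    p = binomial_p_value(score, TRIALS)
    significant += p < ALPHA
    corrected += p < ALPHA / LISTENERS

print(f"{significant} of {LISTENERS} pure guessers look 'significant'")
print(f"{corrected} survive a Bonferroni-corrected threshold")
```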

Outside the laboratory, there is an awkward shift when the people who espouse ‘audio science’ want to sell us their products, or even to buy something themselves. It is their implicit position that any demonstration of the product, in a showroom or in the customer’s own home, is a sham. Customers (including themselves) are malleable creatures who imagine what they are persuaded to hear. Even the ‘science’ that has been used in the equipment’s creation and is promoted in advertising (or, indeed, is advertising) feeds back into how people perceive the sound. The audio scientist/objectivist is in a completely paradoxical position where they cannot even know whether they actually like something! They must acknowledge that they only think they like something on that particular day in that particular showroom, or in their own workshop as they tweak their crossover design. They could conduct their own blind listening tests to establish their preference scientifically, but how many of these would they have to run in order to cover every permutation of the variables each time they change a setting? Far more than a lifetime’s worth. And, as discussed before, there is no way to tell whether taking part in a listening trial affects our ability to discern differences anyway.
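The ‘lifetime’s worth’ claim is easy to check with rough arithmetic. Every number in this Python sketch is a hypothetical assumption: four adjustable crossover parameters with eight positions each, every configuration compared pairwise against every other, 16 ABX trials per pair, and 30 seconds per trial with no breaks. Treating ‘every permutation’ as ‘every pairwise comparison’ is also an assumption.

```python
import math

# Every number here is a hypothetical assumption chosen for illustration.
PARAMETERS = 4          # e.g. crossover frequency, slope, level, delay
SETTINGS_EACH = 8       # positions tried per parameter
TRIALS_PER_PAIR = 16    # ABX trials per pairwise comparison
SECONDS_PER_TRIAL = 30  # listening plus switching time per trial

configurations = SETTINGS_EACH ** PARAMETERS        # every combination of settings
pairs = math.comb(configurations, 2)                # every A-versus-B pairing
seconds = pairs * TRIALS_PER_PAIR * SECONDS_PER_TRIAL
years = seconds / (365.25 * 24 * 3600)

print(f"{configurations} configurations, {pairs:,} pairs")
print(f"about {years:.0f} years of non-stop listening")
```

Even this modest parameter space comes to roughly 128 years of continuous listening.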

The only way out of this impasse while maintaining the listening trial dogma is to argue that statistics from blind listening tests carried out by others can tell us what to like on the basis of sheer scale, and the probability that our hearing preferences are the same as everyone else’s. But do we then allow just anyone off the street to tell us what is best, or do we use “trained” listeners? The former would seem just silly, but the latter leaves the whole scheme open to accusations of incestuousness and circularity. In a deft move, a claim is made that trained listeners still register the same average preferences that ordinary people would over thousands of tests, but that they do it more clearly and decisively. Attempting to determine whether this is in fact the case would be such a circular absurdity that people can only accept it on faith. This is one of audio science’s self-deluding sleights of hand: being as rigorous as anyone could like in the execution of the actual trials, but basing the premises of the experiment and the conclusions to be drawn from it on the flimsiest of hand-waving. Of course, as this is science, anyone can challenge those conclusions or conduct their own experiments, but this just replaces one flimsy assertion with another.

Why is this all so difficult and farcical? Well, I think it is because science has no meaning in the world of ‘art’, so our troubles start there. Technically, perhaps it can be argued that we are only attempting to use science to create hardware for reproducing art, which doesn’t sound too difficult. But when we say “reproducing” do we mean “most accurate”, or “most preferred”? The fact that anyone would go to the lengths of using listening trials is a giveaway that they are not sure (a) how to determine “most accurate” objectively, (b) whether listeners actually want accuracy when listening to music in their own homes, and (c) whether recordings created while monitoring on existing speakers should be reproduced accurately anyway.

At this point, the entire enterprise is doomed to circularity and farce. The human trial participant is subjected to reproduced ‘art’ (but not the original!) and, either directly by registering preferences or indirectly by registering differences, is assumed to be capable of determining the ‘best’ method of reproducing that art. ‘Art’ is the thing that no one can define: the thing that is supposed to affect us emotionally in ways that cannot be predicted. Using it as the stimulus to gauge human reaction to the hardware is not obviously compatible with science, is it?

In contrast, it is perfectly rational to admit that scientific experiments cannot tell us the best way to reproduce art. It is perfectly rational to simply work out on paper a likely way of doing it, then build it and listen to it. We will never know scientifically whether we actually like it, because this is beyond the remit of science. But this doesn’t stop us from enjoying it anyway. In a normal setting we are not entirely slaves to our imaginations; we can make a fair assessment of when something is obviously good or bad.

Rather than the (pseudo)scientific blind listening test, I think there is a much more fruitful test. It is the ultimate ‘sighted’ test, one that suppresses imaginary differences, and it is only possible because of DSP, which can be used to simulate the characteristics of real-world hardware in many ways. The test is this: while listening, you are allowed to change whatever parameter you like using DSP and hear the result instantaneously. Change one variable and flick backwards and forwards between two values while listening. Or change several variables simultaneously if you like. Close your eyes while pressing the supplied ‘random’ button and see if you were right. Such a test would condense a lifetime’s worth of exhaustive listening trials into a few minutes or hours of ‘fun’ that is much more representative of normal listening than the dreary alternative. (For example, with my own system I can make instantaneous radical changes to the crossovers that other people can only achieve in a much more limited way, with huge effort and long intervals of silent soldering in between.) It isn’t science. It won’t tell you definitively what you prefer, or what you are sensitive to in normal listening, but it will certainly put into context the scale of the changes you have to make in order to hear a ‘night and day’ difference. It allows an instantaneous comparison between various types of technology that could never be achieved otherwise. It could help lay to rest a few audio demons.
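The bookkeeping behind a ‘random’ button is simple to sketch. Below is a minimal Python outline that assumes a hypothetical `RandomButtonSession` class, with settings represented as plain dictionaries (the `crossover_hz` key is invented). It hides which setting is active, scores the listener’s guesses, and reports how often pure guessing would do at least as well. The DSP that would actually apply each setting to the audio is not modelled.

```python
import math
import random


class RandomButtonSession:
    """Hypothetical blind 'random button' for a two-setting DSP comparison.

    Each press secretly picks setting A or B; the listener then guesses which
    one is playing. Applying the chosen setting to real audio is left to the
    (unmodelled) DSP engine.
    """

    def __init__(self, setting_a, setting_b, seed=None):
        self.settings = {"A": setting_a, "B": setting_b}
        self._rng = random.Random(seed)
        self._current = None
        self.trials = 0
        self.correct = 0

    def press(self):
        """Secretly choose A or B and return the setting the DSP should apply."""
        self._current = self._rng.choice("AB")
        return self.settings[self._current]

    def guess(self, label):
        """Score the listener's guess ('A' or 'B') for the latest press."""
        if self._current is None:
            raise RuntimeError("press() must come before guess()")
        hit = label == self._current
        self.trials += 1
        self.correct += int(hit)
        self._current = None
        return hit

    def p_value(self):
        """Chance of scoring at least this well by pure guessing."""
        n, c = self.trials, self.correct
        return sum(math.comb(n, k) for k in range(c, n + 1)) / 2 ** n


# Usage: a listener who always answers 'A' (i.e. hears no difference).
session = RandomButtonSession({"crossover_hz": 2000}, {"crossover_hz": 2500}, seed=3)
for _ in range(20):
    session.press()          # in a real system: apply the returned setting
    session.guess("A")
print(session.correct, "of", session.trials, "p =", round(session.p_value(), 3))
```

Because the switching is instantaneous, a listener can run a few of these short sessions per setting in minutes rather than scheduling formal trials.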