Article about listening tests

I just saw this article mentioned on the AudioKarma web site. I very much applaud the approach of taking an ‘institution’ such as the ABX test, and rather than arguing head-on about whether it works or not (which cannot be done if you don’t think it is valid in the first place), coming in from the side and dismantling it forensically.

Among many other very perceptive points, the article argues that the ABX test is based on a completely flawed premise: it compares audio systems using not human hearing, but human memory.

I can’t vouch for whether the author and I would agree on most audio matters, but I certainly agree with him on the idea that listening tests are non-scientific on many levels.

Car Comparison

Someone generously gave me a car last week. My old one reached the end of the line at the age of 16 years and 200,000 miles, although it still looked like new and could easily have been kept going for a few hundred pounds, had the alternative not been so cheap. The replacement car, a mere 13 years old, had once been quite ‘high end’, but had been used for transporting two huge dogs, logs for firewood, and small children, and so was in dire need of cleaning inside and out. I started on the interior and quickly realised that it was going to be a long job – there is so much stuff in a big car that just giving it all a quick clean is a problem in itself. In the end I took what I thought was the wise decision of having it cleaned by professionals – although even they had missed a few bits and pieces when it came back.

Each item in a car is designed to a Safety Integrity Level (SIL) and the simplest thing, like an indicator stalk, has to meet various requirements for strength, reliability and what it would do in a crash situation: minimum radius on all edges (no sharp corners), controlled deformability. It has to work for ten years in temperatures from -30 to +105 degrees C and be resistant to constant UV exposure. There is an aesthetic element to its design. It is designed on computer, manufactured by CAM machinery, and extensively tested by robots. I think it is safe to say that more knowledge and expertise goes into designing an indicator stalk than into most domestic audio systems.

Faced with the job of cleaning a large car, you get a feel for just how many individual items there are, all of which have gone through this epic design process. Some are seemingly simple, passive pieces of plastic, but even they are the result of untold hours of educated professionals’ time. Other items are more complex, for example a full audio system and a comprehensive climate control system. And this is only the interior – the part that most people would regard as almost trivial. Beyond that we have the ‘real’ engineering.

It is not an original point, but the price of a car is, for me, a meaningful way of highlighting the absurdity of the audiophile world and the prices that are regarded as perfectly normal there, with no guarantee of quality. No doubt the scale of manufacture, and the sheer number of cars sold compared to high end audio systems, could be mentioned. But even hand-built cars that sell in tiny numbers and are not designed and built in faraway places – like Ferraris and Bentleys – sell for less than many audio systems!

In a brief glance around the cockpit of my ‘new’ car, I noticed the following items, and could only imagine the complexity that lay behind each one:

  • seats
    • super strong
    • adjustable
    • fold down (for more boot space)
    • leather upholstery
    • seat heaters
    • fold down headrests
  • seat belts
  • steering wheel
    • strong but collapsible
    • adjustable
    • horn button
    • buttons various (audio system, cruise control)
    • airbag
  • windscreen wipers
    • multi-speed etc.
    • rain sensor
  • light switches
    • headlights
    • full beam
    • fog etc.
    • interior lights various
    • headlight elevation control
  • audio system
    • FM radio
    • CD changer
    • surround options
    • untold numbers of speaker drivers all over the place
  • climate control
    • controls, various
    • vents, individually controllable
    • air conditioning
    • demister
  • security alarm
  • instrument cluster
    • analogue dials (stepper motor driven)
    • various LCD readouts
    • adjustable illumination
  • indicator stalk
  • doors
    • door handles
    • central locking
    • map pockets
    • child locks
  • windows
    • electrically operated
    • toughened or laminated
    • UV and/or IR blocking
    • demisters
  • pedals
  • gear shift
  • hand brake
  • cruise control
  • 12V power outlets, various
  • carpets, numerous
  • cup holder!

By comparison, at a similar level of detail, the whole list for a pair of £20,000 speakers would be:

  • speakers

and even if we broke it down to a lower level we would only have:

  • two wooden boxes (or maybe even some complex-looking metal bits designed on a computer similar to, say, part of a car door)
  • some wadding (similar to that in a car seat, probably)
  • four speaker drive units
  • a handful of passive components on a circuit board similar to, say, the 150 circuit boards that are found in a car (I made that number up)
  • some terminal posts
  • some cloth

Maybe the speaker is unique, using some hitherto undiscovered technology that required a huge R&D spend to develop it, and is very, very expensive to manufacture? Maybe the craftsmanship and materials are simply remarkable? More than an entire Jaguar, say? Not even 0.1%.

I am not even suggesting that a car-sized sum wouldn’t be worth spending to get great sound – assuming that it was unavoidable (just as building a Bentley obviously does cost a lot of money) and that only rich people, therefore, could have it. Nor am I suggesting that all manufacturers are involved in some sort of scam to fleece gullible audiophiles – for all I know, they really are managing to spend thousands of pounds by continually re-inventing the wheel using ancient technology and then assembling a few bits of metal, plastic and cardboard in a box really inefficiently. My bewilderment is reserved for the fact that the customers seem to collude with the magazines and manufacturers to keep audio products as primitive as possible, and the price as high as possible. Instead of embracing a manufacturer who might demonstrate moderate-cost, high-quality sound embodying tangible technological progress, they would write its products off as ‘mid fi’. No such problem for the car industry, which produces technological miracles for an amazingly low price.

That Floyd Toole lecture

I watched this because so many people are talking about it – and I even took notes.


The overall thrust of the lecture is, I would say, “objectivity works”. We can make measurements of the hardware, judge them against some fixed criteria, and then demonstrate that these correlate with real, human preferences in blind tests. Hurrah! The age-old debate is over, and we can improve our new speaker designs by building them to maximise their objective scores in the full knowledge that this would correlate with human preferences.

However, I am not totally convinced by the criteria that were specified in the lecture: it seemed to me that there might be gaps in the argument and some circularity.

What he showed:

  • Sighted listening tests can be flawed (thereby implying: all sighted listening tests are unreliable).
  • Some measurements can be carried out on speakers in an anechoic chamber (dubbed ‘Spin-o-rama’) and munged together to create a performance index related to flatness of frequency response and smoothness of off-axis response. Transient response is not a factor. At all.
  • In listening tests, speakers with the ‘best’ Spin-o-rama score are usually preferred by listeners over the opposite (implied: all else being equal).
  • Mono allows maximum discernment of difference, and does not contradict stereo listening results in the above tests (implied: therefore mono should be used for all listening tests).
  • Trained professional listeners give the ‘statistically healthiest’ range of scores, and do not contradict ordinary listeners in the above tests (implied: therefore trained professionals should be used for all listening tests).

What he didn’t show:

  • That it is valid to use the Spin-o-rama score in reverse i.e. as a tool for designing a speaker. He implies it is, but does not prove that a poor speaker could not be designed that achieves an exemplary Spin-o-rama score.
  • That transient response doesn’t matter – it is simply ignored. The speakers tested may have had good transient responses, or not, but as most of them were of conventional design they may all have been much of a muchness.
  • That various speaker technologies are inherently better or worse than others i.e. no view on whether sealed cabinets are better than bass reflex, or active crossovers better than passive – and his performance index is indifferent to this, assuming that flat steady-state frequency response is all that matters.
  • That mono speakers and trained professionals are the best choice for all listening tests.

It is possible to produce different colourations related to phase shifts while still producing a perfect frequency magnitude response (the drivers may have their phases matched perfectly throughout the crossover but the phase is shifted relative to other components in the signal). Similarly, bass reflex configurations distort the time domain response while maintaining a perfect steady state sinusoidal magnitude response. Dr. Toole’s tests don’t address these factors.
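The point about phase can be illustrated with a small numerical sketch. It is not taken from the lecture or from any real speaker: it assumes a first-order digital all-pass filter with a made-up coefficient (`A = 0.6`) as a stand-in for 'a system with perfect magnitude response but shifted phase'. The function names are hypothetical too. It shows that the steady-state magnitude response stays flat while an impulse is smeared out in time.

```python
import numpy as np

def allpass_impulse_response(a, n_samples):
    """Impulse response of the first-order all-pass H(z) = (a + z^-1) / (1 + a z^-1)."""
    x = np.zeros(n_samples)
    x[0] = 1.0                      # unit impulse: the ideal 'transient'
    y = np.zeros(n_samples)
    prev_x = 0.0
    prev_y = 0.0
    for n in range(n_samples):
        # Difference equation: y[n] = a*x[n] + x[n-1] - a*y[n-1]
        y[n] = a * x[n] + prev_x - a * prev_y
        prev_x, prev_y = x[n], y[n]
    return y

def magnitude_response(h, n_fft=4096):
    """Magnitude of the frequency response from DC to Nyquist."""
    return np.abs(np.fft.rfft(h, n_fft))

A = 0.6                             # assumed coefficient, chosen only for illustration
h = allpass_impulse_response(A, 512)

# Steady-state view: deviation of the magnitude response from perfectly flat (1.0).
magnitude_ripple = float(np.max(np.abs(magnitude_response(h)) - 1.0))

# Time-domain view: the input impulse had a peak of 1.0 in a single sample;
# the output has the same total energy but spread over many samples.
peak_after_allpass = float(np.max(np.abs(h)))
energy_after_allpass = float(np.sum(h ** 2))

print(f"max deviation from flat magnitude: {magnitude_ripple:.2e}")
print(f"peak of output impulse:            {peak_after_allpass:.3f}")
print(f"energy of output impulse:          {energy_after_allpass:.6f}")
```

A measurement that only looks at magnitude would score this filter as perfect, even though the waveform it emits is not the one that went in. Whether that difference is audible is exactly the question the tests leave open.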

I have no doubt that flat frequency response and smooth off-axis response are essential, as he says, but might there be more to it than just that? Any unexplained deviations between the listening tests and the measurements (it isn’t a perfect correlation) could be explained by a multitude of factors including the speaker’s transient response which, after all, is a straightforward difference between what was recorded and what the speaker emits – it is just that someone around 1936 declared that ‘phase doesn’t matter’. Until recently it has not been possible to verify this, because it was not possible to produce a high quality output with close to perfect phase. Comparing different speakers all of which have phase/time distortion and other problems, and finding that listeners cannot tell them apart (in mono using someone’s idea of ‘typical’ music), does not tell us that a speaker without those distortions would not sound better.

Correlation is not causation, but people are talking about the Harman method as if it were. So, if I were a speaker designer doing things by the Toole book, I would always use bass reflex without thinking, as this would have no effect on the Spin-o-rama score but would result in a smaller box. And I would be supremely relaxed about crossover design, ensuring only that it matched the phases of drivers through the crossover. Phase correction and sealed enclosures wouldn’t get a look-in because they offer nothing extra in terms of the Spin-o-rama score but cost more to manufacture.

My opinion is of no consequence, of course, but there are some serious people who do suggest that transient response matters, and it would have been nice if the guru of gurus could have mentioned it, if only to dismiss it with reasons.

Engineers at play

There’s an influence on ‘high end’ audio that is probably incomprehensible to most ordinary people, but which I think is obvious to people like me: as an engineer, I understand the appeal of the well-defined little project. My DIY system is a perfect example. The conditions for a perfect project are:

  1. It has clear boundaries
  2. From the outset it is clear that it is probably attainable
  3. Notwithstanding (2), it is not entirely obvious how to do it.
  4. It might exhibit excellence in some aspect – maybe with the fantasy of doing it better than has ever been achieved before.
  5. There is scope for making it unique, not just a re-hashing of someone else’s design (and this can lead to a ‘wilful ignorance’ i.e. deliberately not reading around the subject in order to avoid spoiling the fun. In the case of audio, this is probably essential to avoid being sucked into a vortex of misinformation!)
  6. It gives ample scope for ‘play’ – experimenting, refining, testing
  7. It can be ‘pimped up’ ad infinitum
  8. It has an audience: there are people who may be impressed by it and who may even applaud it.

Engineers dream of being assigned such projects when working for other people – but rarely get the opportunity. So, as a hobby, or for the ultimate fantasy of starting their own company, ‘high end’ audio is the perfect vehicle. There are other such hobbies, but audio is quite seductive in the way it lets the person who is adept at tapping holes in aluminium feel that they are in touch with high culture and not just other middle aged men with oil under their fingernails.

I believe this phenomenon is responsible for at least some of the trends we see at the self-appointed super-‘high end’ i.e. those massive creations of copper and milled aluminium that probably sell in minuscule quantities but look great in show reports and photos – if you like that sort of thing. These can be huge enclosures and power supplies around a $0.50 DAC IC, or the millionth ever amplifier design, or an elaborate device for rotating a disc. The audiophile world is full of people who are impressed by ‘big’, or ‘shiny’, or ‘curved’, or ‘angular’ and their praise is the nearest most engineers will ever get to feeling like the person they deserve to be. But the important point is this: just because an object exists and is big/shiny/curved/angular/milled from the solid, it does not follow that that product was ever needed: it’s probably just engineers at play.