Audio – Literature Analogy

An audio recording is a bit like a book: created through artistic or intellectual endeavour, then ‘fixed’ as a collection of pure information and distributed to customers for them to ‘consume’ in their own environments. In the case of digital audio, a recording is literally the same as a book, being stored as numbers in a file; you could store a book as a WAV, or an audio recording as an MS Word file if you wanted.

When a book is rendered for reading, there are many things that could detract from its content:

  • printed too big/too small
  • lighting too dim/too bright
  • inappropriate use of colour
  • blotchy printout
  • typeface varies with content, or randomly
  • corrupted: missing/duplicated/erroneous characters
  • peculiar paper
  • non-neutral typeface – difficult to read or inappropriate e.g. science fiction font for a Jane Austen novel
  • in the case of some ‘boutique’ printing, an appropriate analogy might be a book that spontaneously becomes too hot to touch, or occasionally ruins valuable furniture.

The emotional or intellectual force of the book would actually be reduced because of these problems. In other words, it is not true to say that the quality of reproduction doesn’t matter.

However, there is a finite envelope of neutral, even ‘mundane’, reproduction which achieves an optimal result for the reader – after reading the book they can’t tell you anything about the quality of the printing; all they remember is the content, and the content was thrilling.

Maybe the author specifies the typeface. Some books may include fine illustrations or intricate frontispieces which are intrinsic to the book. In these cases, the reproduction needs to be particularly accurate in order to do justice to what the author has created.

Beyond this, is there anything that the printer can do to enhance the appeal of the book? Well, they can create a fancy binding that the reader notices before they start reading; they can use particularly high quality paper; they can print the characters with micron precision. But only a book collector or printing technology enthusiast would care about these refinements – they have no effect on the actual experience of reading the content, and could easily detract from it.

The manufacturers of the ink and the mains cable that powers the printing press could read lots of books in their spare time, attend evening classes in English Literature, study the physiology of the eye, get diplomas in grammar, and tell us in interviews with speciality magazines about how it all informs their craft. But clearly the results would do nothing whatsoever to change the reading experience.

The printer might decide to dabble in science for the first time since they left printing college. They could do scientific trials in aspects of book reproduction where lucky participants get to read snippets of text or passages from ‘typical’ books, responding with their perceptions of differences, preferences, or even ‘emotional stimulation level’ in aspects such as:

  • typeface
  • ink
  • reading light
  • paper texture and weight
  • reading room shape/dimensions/finishes

But the results would be rather obvious and predictable, with anything slightly interesting being clearly the result of fashion, novelty and human fickleness rather than being a universal law.

The only way to actually enhance the book would be to change its content. An algorithm that replaces certain words? Re-writes sections to make them longer or shorter? Clearly in the case of literature, such a thing would be meaningless and idiotic. It is not so different in the case of audio. There is nothing but the recording: there is no technology, effect or algorithm that can meaningfully enhance it.


Domestic hi-fi is no more than the equivalent of rendering the printed content of a book: it can be done adequately or badly, and beyond that there is no meaningful way of improving on it. People become deluded by the idea that the rendering technology can enhance the content – which is obviously ridiculous in the case of books, but less obvious with audio.

But this is not to say that hi-fi is, in itself, boring: achieving ‘adequate’ is not trivial.

Many people are simply not used to hearing adequate reproduction regardless of how much money they spend, so they are not aware that the experience vs. quality graph has a horizontal flat top. And needless to say, the audiophile quality vs. cost graph is more-or-less random, which makes it even more confusing.

The audio enthusiast would be much happier and richer if they got a sense of proportion about what matters, then put all their creativity (and money, if they’ve got nothing else to spend it on) into building the equivalent of a pleasant reading room, comfy chair and attractive bookcases rather than a solid gold and diamond reading light.

[Last edited 30/05/17]

Reverberation of a point source, compared with a ‘distributed’ loudspeaker

Here’s a fascinating speaker:

[Image: CBT36 loudspeaker]

It uses many transducers arranged in a specific curve, driven in parallel and with ‘shading’ (graduated volume settings along the curve), to reduce vertical dispersion while maintaining wide horizontal dispersion. I can see how this might appear quite appealing for use in a non-ideal room with low ceilings or whatever.

It is a variation on the phased array concept, where the outputs of many transducers combine to produce a directional beam. It effectively relies on the differing path lengths from the individual transducers, which produce phase cancellation or reinforcement in the air at different angles as you move off axis. All the individual wavefronts sum correctly at the listener’s ear to reproduce the signal accurately.
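To make the path-length idea concrete, here is a minimal Python sketch that sums the far-field contributions of a straight, vertical line of identical point sources. It is a simplification, not a model of the CBT36: the real speaker’s curved baffle is ignored, the sources are idealised, and the driver count, spacing and frequency below are hypothetical. The optional `weights` list stands in, crudely, for the ‘shading’ described above. The angle is measured in the vertical plane containing the array; horizontally, each driver still behaves as a point source, so dispersion there stays wide.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, approx. air at 20 °C


def line_array_response(n_drivers, spacing_m, freq_hz, angle_deg, weights=None):
    """Normalised far-field magnitude of a straight line of point sources.

    Assumptions: ideal, identical point sources driven in phase, with optional
    per-driver gains ('shading'). angle_deg is measured from the broadside axis
    in the plane containing the array.
    """
    if weights is None:
        weights = [1.0] * n_drivers
    elif len(weights) != n_drivers:
        raise ValueError("need one weight per driver")
    k = 2 * math.pi * freq_hz / SPEED_OF_SOUND  # wavenumber
    theta = math.radians(angle_deg)
    total = 0j
    for n, w in enumerate(weights):
        # Extra far-field path length for driver n relative to driver 0
        path_diff = n * spacing_m * math.sin(theta)
        total += w * cmath.exp(-1j * k * path_diff)
    return abs(total) / sum(weights)  # on axis, everything adds: result = 1.0


# Hypothetical array: 16 drivers, 5 cm apart, at 2 kHz
on_axis = line_array_response(16, 0.05, 2000, 0)
off_axis = line_array_response(16, 0.05, 2000, 30)
```

With these numbers the response 30° off axis comes out at roughly an eighth of the on-axis level: the path-length differences have pulled the contributions out of phase, which is exactly the vertical narrowing the array relies on.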

At a smaller scale, a single transducer of finite size can be thought of as many small transducers being driven simultaneously. At high frequencies (as the wavelengths being reproduced become short compared to the diameter of the transducer) differing path lengths from various parts of the transducer combine in the air to cause phase cancellation as you move off axis. This is known as beaming and is usually controlled in speaker design by using drivers of the appropriate size for the frequencies they are reproducing. Changes in directivity with frequency are regarded as undesirable in speaker design, because although the on-axis measurements can be perfect, the ‘room sound’ (reverberation) has the ‘wrong’ frequency response.
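The ‘many small transducers’ view can be tried directly. The sketch below chops a cone into a grid of tiny in-phase sources and sums them at an off-axis angle. It assumes idealised behaviour throughout (a rigid circular piston, no cone break-up, no baffle or cabinet effects, a far-field listener), and the 0.2 m diameter is a hypothetical example, roughly an 8-inch driver.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, approx. air at 20 °C


def piston_response(diameter_m, freq_hz, angle_deg, grid=60):
    """Approximate far-field magnitude of a circular cone at angle_deg off axis,
    normalised so that on axis = 1.0.

    Assumption: the cone is a rigid piston, modelled as a grid of tiny
    in-phase sources; break-up, baffle and cabinet effects are ignored.
    """
    radius = diameter_m / 2
    k = 2 * math.pi * freq_hz / SPEED_OF_SOUND  # wavenumber
    s = math.sin(math.radians(angle_deg))
    step = diameter_m / grid
    total = 0j
    count = 0
    for i in range(grid):
        x = -radius + (i + 0.5) * step
        for j in range(grid):
            y = -radius + (j + 0.5) * step
            if x * x + y * y > radius * radius:
                continue  # grid cell lies outside the cone
            # Far-field path difference for an element at (x, y) when the
            # listener moves off axis in the x-z plane
            total += cmath.exp(-1j * k * x * s)
            count += 1
    return abs(total) / count


# Hypothetical 0.2 m cone, 45 degrees off axis
low = piston_response(0.2, 500, 45)
high = piston_response(0.2, 5000, 45)
```

At 500 Hz the 45° response is still close to the on-axis level, but at 5 kHz, where the wavelength (about 7 cm) is well under the cone diameter, it drops to a small fraction of it: beaming.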

A large panel speaker suffers from beaming in the extreme, but with Quad electrostatics Peter Walker introduced a clever trick, where phase is shifted selectively using concentric circular electrodes as you move outwards from the centre of the panel. At the listener’s position, this simulates the effect of a point source emanating from some distance behind the panel, increasing the size of the ‘sweet spot’ and effectively reducing the high frequency beaming.
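The geometry behind the trick can be sketched: to mimic a point source some distance behind the panel, each ring is delayed by the extra time sound would take to travel from that virtual source to the ring, compared with the centre. This is only the idealised geometry, not Quad’s actual delay-line design; the ring radii and the 0.3 m virtual source distance are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approx. air at 20 °C


def ring_delays_us(ring_radii_m, virtual_source_distance_m):
    """Delay for each concentric ring, relative to the panel centre, in microseconds.

    Each ring is delayed by the extra travel time from a virtual point source
    located virtual_source_distance_m behind the centre of the panel.
    """
    d = virtual_source_distance_m
    delays = []
    for r in ring_radii_m:
        extra_path_m = math.hypot(d, r) - d  # virtual source to ring, minus to centre
        delays.append(extra_path_m / SPEED_OF_SOUND * 1e6)
    return delays


# Hypothetical panel: centre plus rings at 10, 20 and 30 cm
delays = ring_delays_us([0.0, 0.1, 0.2, 0.3], virtual_source_distance_m=0.3)
```

The outermost ring ends up delayed by roughly 360 µs, so the wavefront leaving the panel is curved like that of a small source behind it rather than flat.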

There are other ways of harnessing the power of phase cancellation and summation. In a dipole speaker, the lower frequencies cancel out at the sides (and top and bottom) as the antiphase rear pressure waves meet those from the front. This is supposed to be useful acoustically, cutting down on unwanted reflections from floor, walls and ceiling. A dipole speaker may be realised by mounting a single driver on a panel of wood with a hole in it, but it behaves effectively as two transducers, one of which is in antiphase to the other. Some people say they prefer the sound of such speakers over conventional box speakers.
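The cancellation at the sides follows from treating the open baffle as two point sources in antiphase, the rear one’s sound travelling an extra distance set by the baffle. A minimal sketch, assuming ideal point sources, a far-field listener and a hypothetical 0.3 m front-to-back path difference:

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, approx. air at 20 °C


def dipole_response(path_difference_m, freq_hz, angle_deg):
    """Far-field magnitude of two antiphase point sources, normalised to on axis.

    Assumption: the open-baffle driver behaves like a front source plus an
    inverted rear source whose sound travels path_difference_m further along
    the axis. angle_deg is measured from the front axis.
    """
    k = 2 * math.pi * freq_hz / SPEED_OF_SOUND  # wavenumber

    def raw(theta_rad):
        # Front source minus the delayed, inverted rear source
        return abs(1 - cmath.exp(-1j * k * path_difference_m * math.cos(theta_rad)))

    on_axis = raw(0.0)
    if on_axis < 1e-12:
        raise ValueError("frequency falls on an on-axis null; choose another")
    return raw(math.radians(angle_deg)) / on_axis


side = dipole_response(0.3, 100, 90)    # to the side of the panel
at_60 = dipole_response(0.3, 100, 60)
rear = dipole_response(0.3, 100, 180)   # directly behind
```

At 100 Hz the output is essentially zero at 90°, about half the on-axis level at 60° and equal to it directly behind: the cos θ ‘figure of eight’ usually quoted for dipoles at low frequencies.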

This all works well in terms of the direct sound reaching the listener and, as in the CBT speaker above, may provide a very uniform dispersion with frequency compared to conventional speakers. But beyond the measurements of the direct sound, does the reverberation sound quite ‘right’? What if the overall level of reverberation doesn’t approximate the ‘liveness’ of the room that the listeners notice as they talk or shuffle their feet? If the vertical reflections are reduced but not the horizontal, does this sound unnatural?

Characterising a room from its sound

The interaction of a room and an acoustic source could be thought of as a collection of simultaneous equations. Acoustics can be modelled and simulated for computer games, and it is possible for a computer to do the reverse and work out the size and shape of the room from the sound. If the acoustic source is, in fact, multiple sources separated by certain distances, the computer can work that out, too.
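Working back from sound to full room geometry is a substantial problem, but its simplest ingredient can be sketched: finding when a single reflection arrives. The Python below assumes one reflection, a white-noise source and a hypothetical 48 kHz sample rate, and picks the strongest autocorrelation peak away from zero lag; the delay then converts directly into extra path length.

```python
import random

SPEED_OF_SOUND = 343.0  # m/s, approx. air at 20 °C
SAMPLE_RATE = 48_000    # Hz (hypothetical)


def find_reflection_delay(signal, min_lag, max_lag):
    """Lag (in samples) of the strongest autocorrelation in [min_lag, max_lag],
    a crude estimate of a single reflection's arrival delay."""
    best_lag, best_value = None, float("-inf")
    n = len(signal)
    for lag in range(min_lag, max_lag + 1):
        value = sum(signal[i] * signal[i - lag] for i in range(lag, n))
        if value > best_value:
            best_lag, best_value = lag, value
    return best_lag


# Hypothetical recording: white noise plus one reflection arriving
# 140 samples later at half the amplitude.
random.seed(1)
direct = [random.gauss(0.0, 1.0) for _ in range(4000)]
true_lag = 140
recorded = [direct[i] + (0.5 * direct[i - true_lag] if i >= true_lag else 0.0)
            for i in range(len(direct))]

lag = find_reflection_delay(recorded, min_lag=20, max_lag=400)
extra_path_m = lag / SAMPLE_RATE * SPEED_OF_SOUND  # reflected path minus direct path
```

Here the recovered delay of 140 samples corresponds to a reflection path about 1 m longer than the direct path. Real rooms produce many overlapping reflections, so practical approaches typically start from a measured impulse response rather than a single autocorrelation peak.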

Does the human hearing system do something similar? I would say “probably”. A human can work out quite a lot about a room from its sound alone – you would certainly know whether you were in an anechoic chamber, a normal room or a cathedral. Even in a strange environment, a human rarely mistakes the direction and distance from which sound is coming. Head movements may play a part.

And this is where listening to a ‘distributed speaker’ in a room becomes a bit strange.

Stereo speakers can be regarded as a ‘distributed speaker’ when playing a centrally-placed sound. This is unavoidable if stereo is our system. Beyond that, what is the effect of spreading each speaker itself out, or deliberately creating phased ‘beams’ of sound?

Even though the direct sounds combine at the listener’s position into the familiar sound, as though emanating from its original source, the reflections carry information telling the listener that the acoustic source is really a radically different shape. Reverberation levels and directions may be ‘asymmetric’ relative to the apparent direct sound.

In effect, the direct sound says we are listening to this:

[Image: Zoe Wanamaker, Cassandra]

but the reverberation says it is something different.

[Image: Zoe Wanamaker, Cassandra]

Might there be audible side effects from this? In the case of the dipole speaker, for example, the rear (antiphase) signal reflects off the back wall and some of it does make its way forwards to the listener. In my experience, this comes through as a certain ‘phasiness’ but it doesn’t seem to bother other people.

From a normal listening distance, most musical sources are small and appear close to being a ‘point source’. If we are going to add some more reverberation, should it not appear to be emanating as much as possible from a point source?

It is easy to say that reverberation is so complex that it is just a wash of ‘ambience’ and nothing more; all we need to do is give it the right ‘colour’ i.e. frequency response. And one of the reasons for using a ‘distributed speaker’ may be to reduce the amount of reverberation anyway. But I don’t think we should overdo it: we surely want to listen in real rooms because of the reverberation, not despite it. What is the way to introduce this reverberation with the fewest side effects?

Clearly, some rooms are not ideal and offer too much of the wrong sort of reverberation. Maybe a ‘distributed speaker’ offers a solution, but is it as good as a conventional speaker in a suitable room? And is it really necessary, anyway? I think some people may be misguidedly attempting to achieve ‘perfect’ measurements by, effectively, eliminating the room from the sound even though their room is perfectly fine. How many people are intrigued by the CBT speaker above simply because it offers ‘better’ conventional in-room measurements, regardless of whether it is necessary?


‘Distributed speakers’ that use large, or multiple, transducers may achieve what they set out to do superficially, but are they free of side-effects?

I don’t have scientific proof, but I remain convinced that the ‘Rolls Royce’ of listening is ‘point source’ monopole speakers in a large, carpeted, furnished room with a high ceiling. Box speakers with multiple drivers of different sizes are small and can be regarded as being very close to a single transducer, but are not so omnidirectional that they create too much reverberation. The acoustic ‘throw’ they produce is fairly ‘natural’. In other words, for stereo perfection, I think there is still a good chance that the types of rooms and speakers people were listening to in the 1970s remain optimal.

[Last edited 17.30 BST 09/05/17]

The Logic of Listening Tests

Casual readers may not believe this, but in the world of audiophilia there are people who enjoy organising scientific listening tests – or more aptly ‘trials’. These involve assembling panels of human ‘subjects’ to listen to snippets of music played through different setups in double-blind tests, pressing buttons or filling in forms to indicate audible differences and preferences. The motivation is often to use science to debunk the ideas of a rival group, who may be known as ‘subjectivists’ or ‘objectivists’, or to confirm the ideas of one’s own group.

There are many, many inherent reasons why such listening tests may not be valid, for example:

  • no one can demonstrate that the knowledge that you are taking part in an experiment doesn’t impede your ability to hear differences
  • a participant who has his own agenda may choose to ‘lie’ in order to pretend he is not hearing differences when he, in fact, is.
  • etc. etc.

The tests are difficult and tedious for the participants, and no one who holds the opposing viewpoint will be convinced by the results. At a logical level, they are dubious. So why bother to do the tests? I think it is an ‘appeal to a higher authority’ to arbitrate an argument that cannot be solved any other way. ‘Science’ is that higher authority.

But let’s look at just the logic.

We are told that there are two basic types of listening test:

  1. Determining or identifying audible difference
  2. Determining ‘preference’

Presumably the idea is that (1) suggests whether two or more devices or processes are equivalent, or whether their insertion into the audio chain is audibly transparent. If a difference is identified, then (2) can make the information useful and tell us which permutation sounds best to a human. Perhaps there is a notion that in the best-case scenario a £100 DAC is found to sound identical to a £100,000 DAC, or that if they do sound different, the £100 DAC is preferred by listeners. Or vice versa.

But would anything actually have been gained by a listening test over simple measurements? A DAC has a very specific, well-defined job to do – we are not talking about observing the natural world and trying to work out what is going on. With today’s technology, it is trivial to make a DAC that is accurate to very close objective tolerances for £100 – it is not necessary to listen to it to know whether it works.
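One way to check a DAC without listening is a null test: subtract a reference signal from the device’s captured output and express the residual’s level relative to the signal. The sketch below assumes the two captures are already time-aligned and level-matched (in practice the hard part); the test signal and the 0.1% third-harmonic ‘error’ are hypothetical.

```python
import math


def rms(samples):
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def residual_db(reference, device_output):
    """Level of (device_output - reference) relative to the reference, in dB.

    Assumes both captures are already time-aligned and level-matched.
    """
    if len(reference) != len(device_output):
        raise ValueError("captures must be the same length")
    residual = [d - r for r, d in zip(reference, device_output)]
    residual_rms = rms(residual)
    if residual_rms == 0.0:
        return float("-inf")  # perfect null
    return 20 * math.log10(residual_rms / rms(reference))


# Hypothetical test: a 1 kHz sine at 48 kHz, and a 'device' that adds a
# third harmonic at 0.1 % of the signal amplitude.
RATE = 48_000
reference = [math.sin(2 * math.pi * 1000 * n / RATE) for n in range(4800)]
device = [r + 0.001 * math.sin(2 * math.pi * 3000 * n / RATE)
          for n, r in enumerate(reference)]
level_db = residual_db(reference, device)
```

Here the residual sits 60 dB below the signal. The number says nothing by itself about audibility, but it is obtained without recruiting a single listener.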

For two DACs to actually sound different, they must be measurably quite far apart. At least one of them is not even close to being a DAC: it is, in fact, an effects box of some kind. And such are the fundamental uncertainties in any experiment that asks humans how they feel that it is entirely possible, in a preference-based listening test, for the listeners to prefer the sound of the effects box.

Or not. It depends on myriad unstable factors. An effects box that adds some harmonic distortion may make certain recordings sound ‘louder’ or ‘more exciting’ thus eliciting a preference for it today – with those specific recordings. But the experiment cannot show that the listeners wouldn’t be bored with the effect three hours, days or months down the line. Or that they wouldn’t hate it if it happened to be raining. Or if the walls were painted yellow, not blue. You get the idea: it is nothing but aesthetic judgement, the classic condition where science becomes pseudoscience no matter how ‘scientific’ the methodology.
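The ‘louder’ effect is easy to demonstrate. Below is a hypothetical effects box: a tanh soft-clipper, rescaled so that full-scale peaks stay at full scale. Because it pushes the waveform towards its peaks, the RMS level rises even though the peak level does not, and, tanh being an odd function, it adds only odd-order harmonics. The drive setting is arbitrary.

```python
import math


def soft_clip(samples, drive=2.0):
    """Hypothetical 'effects box': tanh waveshaper, rescaled so that a
    full-scale peak (1.0) still maps to 1.0. Adds odd-order harmonics."""
    norm = math.tanh(drive)
    return [math.tanh(drive * s) / norm for s in samples]


def rms(samples):
    return math.sqrt(sum(s * s for s in samples) / len(samples))


# Full-scale 1 kHz sine, 0.1 s at a hypothetical 48 kHz sample rate
sine = [math.sin(2 * math.pi * 1000 * n / 48_000) for n in range(4800)]
shaped = soft_clip(sine)
loudness_gain_db = 20 * math.log10(rms(shaped) / rms(sine))  # same peak, higher RMS
```

For this sine the shaped signal comes out about 1.4 dB higher in RMS level at an unchanged peak level; a level difference of that order may well register as ‘louder’ in a quick comparison.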

The results may be fed into statistical formulae and the handle cranked, allowing the experimenter to declare “statistical significance”, but this is just the usual misunderstanding of statistics, which are only valid under very specific mathematical conditions. If your experiment is built on invalid assumptions, the statistics mean nothing.
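For a forced-choice difference test (ABX is assumed here as an example; the text does not name a protocol), the usual ‘significance’ calculation is just a binomial tail. The sketch makes its assumptions explicit: every trial is independent, and a guessing listener is right with probability exactly 0.5. The score used is hypothetical.

```python
from math import comb


def abx_p_value(correct, trials):
    """One-sided chance of scoring at least `correct` out of `trials` by guessing.

    Assumes every trial is independent and a guessing listener is right with
    probability exactly 0.5.
    """
    if not 0 <= correct <= trials:
        raise ValueError("need 0 <= correct <= trials")
    hits = sum(comb(trials, k) for k in range(correct, trials + 1))
    return hits / 2 ** trials


p = abx_p_value(12, 16)  # hypothetical score: 12 right out of 16
```

Twelve correct out of sixteen gives p ≈ 0.038, under the conventional 0.05 threshold. The arithmetic is trivial; whether independence and the 50% guessing rate actually hold for a bored, biased or strategically ‘lying’ listener is exactly what the calculation cannot tell you.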

If we think it is acceptable for a ‘DAC’ to impose its own “effects” on the sound, where do we stop? Home theatre amps often have buttons labelled ‘Super Stereo’ or ‘Concert Hall’. Before we go declaring that the £100,000 DAC’s ‘effect’ is worth the money, shouldn’t we also verify that our experiment doesn’t show that ‘Super Stereo’ is even better? Or that a £10 DAC off Amazon isn’t even better than that? This is the open-ended illogicality of preference-based listening tests.

If the device is supposed to be a “DAC”, it can do no more than meet the objective definition of a DAC to a tolerably close degree. How do we know what “tolerably close” is? Well, if we were to simulate the known, objective, measured error, amplify it by a factor of a hundred, and still be unable to hear it at normal listening levels in a quiet room, I think we would have our answer. This is the one listening test that I think would be useful.
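Preparing this test digitally is straightforward: take the difference between the device’s output and the reference, multiply it by 100 (20·log10(100) = 40 dB), and add it back to the reference for playback. A minimal sketch, assuming time-aligned, level-matched captures as floating-point samples with full scale at 1.0; the function name and sample values are hypothetical.

```python
def exaggerate_error(reference, device_output, factor=100.0):
    """Return reference + factor * (device_output - reference).

    Assumes both captures are time-aligned and level-matched. The result is
    what the device would sound like if its error were `factor` times larger.
    """
    if len(reference) != len(device_output):
        raise ValueError("captures must be the same length")
    return [r + factor * (d - r) for r, d in zip(reference, device_output)]


# Hypothetical captures: the device errs by 0.001 on the first sample only.
reference = [0.0, 0.5, -0.5]
device = [0.001, 0.5, -0.5]
exaggerated = exaggerate_error(reference, device)

# The amplified error may push samples past full scale (1.0 assumed),
# so check for clipping before playback.
peak = max(abs(s) for s in exaggerated)
```

Only the error is scaled; wherever the device matches the reference exactly, the exaggerated signal is identical to the original.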