Early Glastonbury

We just had the Glastonbury festival here in the UK. A few years ago I would absolutely devour the BBC’s coverage of it (I was never foolhardy enough to actually go there…). In the Britpop years it was pretty good, I thought; obviously culminating in Radiohead’s fantastic set in 1997. People of a certain age may realise that it hasn’t been quite the same of late.

As a contrast to last weekend, here’s one of my favourite film clips in the world, ever. It’s from the second Glastonbury festival in 1971, featuring Terry Reid joined by Linda Lewis on – I think – the first incarnation of the now traditional Pyramid Stage.


Audiophile Demo Music


In shops that sell televisions, they often play some sort of ‘showreel’ of spectacular scenes; you know the type of thing: ultra-detailed night time cityscapes, ultra-saturated lizards, ultra-contrasty arctic wildlife, and so on. You realise that it is impossible to see any real difference between the televisions with these scenes. They are ‘impressive’, but only at the most superficial level of what a television can display. Basically, any modern television can display them, with the only differentiators being size and absolute brightness. It always seems to me that the only way I can tell if a TV is any good is to watch a local news programme or something like that – not zero ‘production values’, but something that is relatable to everyday life.

Does something similar happen with audio?

When writing this post, I vowed to myself to search for a report of an audio show demo track, and to use the first track I came across as my example – of course I would have quietly forgotten that vow if it hadn’t illustrated my point fairly well, but as it happens, I think it does. The track is by Malia, and is called I Feel It Like You.

Absolutely no criticism is implied of the track, nor of its production, which is exemplary for this kind of music. But as an audio demo track?

Listening to it on my laptop, it seems to me to be an ‘in-your-face’ studio recording, built from a fairly sparse assemblage of pristine layers, each of which has been processed, compressed and equalised. The vocals are crystal clear and close up, mixed with a carefully-balanced amount of ‘Large Hall’ reverberation. The backing features plenty of detail, with lots of staccato, sampled(?) percussion rhythms and bass.

I think that this track would sound superficially impressive on any system – it even sounds good on my laptop.

What it is missing, if you ask me, is any connection with the organic, natural acoustics we encounter every day. It is like those uber-detailed images used for TV demos; the sound is highly-detailed and everything is at peak contrast and saturation. Such tracks are very common in audio demonstrations.

An alternative staple of audiophile demonstrations is ‘jazz’… I’m not sure what the appeal of this is (as a demonstration). I suspect it is because it often seems like an antidote to over-production – although jazz can still be over-produced. But again, as the potential customer, I don’t think it is telling me very much about the system’s capabilities. Old recordings of jazz are like grainy monochrome pictures, and modern recordings are still showing a ‘scene’ that is ‘smokey’ and sepia-toned (which I am sure is the intention). The style of music and the instrumental line-up (e.g. continuous brushed snare..?) means that I am often not hearing clear delineation between the instruments nor much in the way of transients and dynamics. (Or maybe I just don’t like jazz particularly and cannot engage with it, in which case ignore my objections…).

Just looking through some of the tracks that I might ‘demo’ my system with, one thing strikes me: they usually feature a bit of ‘messiness’. They may, or may not, have been put together in a studio using overdubbing, but the individual layers are a bit raw, organic, and recorded from a bit of a distance, so the room’s natural acoustics are audible. This possibly masks a bit of the pristine detail, but there’s enough there to verify that the system can do detail, anyway. When a short sound stops, and the reverberation remains, the contrast between the two can be particularly revealing. In photographic terms, the image covers all shades of grey and there’s still detail in the shadows; it’s not pushed into excessive contrast, nor selected or processed to be super-detailed. I am not even advocating massive ‘dynamics’ most of the time, which some people cite as proof of a system’s chops. As I will mention later, there are some specific classical tracks that might be played in order to put the system’s dynamic capabilities beyond doubt, anyway!

My favoured demo tracks are not just a single mic recording of a school concert, of course! They have been put together with some high ‘production values’.

It is worth perhaps listing the aspects of the system we might want to show off, or listen to if we are thinking of buying it.

  • frequency response: it is good if the track covers a wide spectrum of frequencies with equal weighting – not just bass and treble. A problem with many systems is fixed bumps and dips in the frequency response. These are almost impossible to hear against a recording that also has fixed bumps and dips in its own ‘frequency response’: for example, a solo voice, a string quartet or a piano. All of these are generated by resonant systems characterised by a formant, or a group of similar formants. Some studio recordings are also augmented with fairly aggressive parametric equalisation of the individual layers in order to make them sound even more detailed. It is only when we hear many different natural musical sources playing in varying combinations that we assemble enough ‘simultaneous equations’ to work out whether the system is neutral or not.
  • bass: of course we want to demonstrate this! Deep organ notes, kick drums, symphony orchestras in natural acoustics are going to show this off well. The best bass does not have a ‘one note’ quality; it engages somewhat with ‘room gain’ in order to extend all the way down to below audibility; it starts and stops quickly, hitting you in the chest (the kick drum will show this). In other words, sealed, not bass reflex…
  • distortion: a sine wave would show up harmonic distortion, and several musical sources all playing at the same time would show up the resulting intermodulation distortion. A single voice will not really show it, nor percussive sounds. A choir would probably be a pretty good demonstration of low distortion, as would a symphony orchestra playing a varied selection. Less good would be girl-and-guitar, a string quartet, or a ‘world music’ drumming ensemble.
  • imaging: the really great demo, in my opinion, is when the stereo speakers produce a complete 3D audio ‘scene’. It may be an “illusion” as some people are very keen to point out, and not a perfect holographic reproduction especially if the recording was created with multiple mics and overdubs in a studio, but it is very compelling. Some classical recordings are made in purist fashion and do create a very convincing sense of 3D space – not just left-right imaging, but also a sense of distance. Imaging depends at least on low distortion and accurate correlation between left and right speakers, implying (I would say) a requirement for accurate reproduction of phase and timing. Some people would claim that absolute reproduction of phase isn’t important as long as both channels are well matched. I think this is special pleading based on the performance of traditional systems; I sometimes think that the people who are very keen to ‘dis’ imaging probably have very expensive systems based on valves, vinyl and passive crossovers…
  • power: achieving high volume isn’t usually a problem, but we want the system to behave uniformly well at all volumes. I suggest that the way this would be made obvious would be when a musical performer or ensemble plays continuously and naturally between quiet and loud – with minimal dynamic compression being applied. This is different from demonstrating a system playing a less dynamic recording with the volume control low and then high. As the Fletcher-Munson curves show, there is only one volume at which we perceive a sound with the correct frequency response: its natural volume. If the system does something peculiar as the volume increases, it will be much more obvious if we are listening at a fixed volume that is closer to the ‘real’ volume at which it was recorded.
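
The distortion point in the list above can be made concrete with a minimal Python sketch. It is only an illustration: the ‘slightly bent’ transfer curve and its coefficients are made up, not taken from any real amplifier or speaker, and the level at each frequency is measured by simple correlation with a sine and cosine.

```python
import math

SAMPLE_RATE = 48000   # Hz
N = 4800              # 0.1 s, so every multiple of 10 Hz fits a whole number of cycles

def tone(freq, amp=0.5):
    """A sine wave of `freq` Hz, N samples long."""
    return [amp * math.sin(2 * math.pi * freq * n / SAMPLE_RATE) for n in range(N)]

def nonlinear(x, k2=0.05, k3=0.02):
    """Hypothetical, slightly curved transfer function (coefficients are assumptions)."""
    return x + k2 * x * x + k3 * x ** 3

def level_at(signal, freq):
    """Amplitude of the `freq` Hz component, found by correlating with cosine and sine."""
    w = 2 * math.pi * freq / SAMPLE_RATE
    re = sum(s * math.cos(w * n) for n, s in enumerate(signal))
    im = sum(s * math.sin(w * n) for n, s in enumerate(signal))
    return 2 * math.hypot(re, im) / len(signal)

# One tone on its own, then two tones played together, through the same nonlinearity.
single = [nonlinear(s) for s in tone(1000)]
pair = [nonlinear(a + b) for a, b in zip(tone(1000), tone(1300))]

for f in (300, 700, 1000, 1300, 2000, 2300, 3000):
    print(f"{f:5d} Hz   single tone: {level_at(single, f):.6f}   two tones: {level_at(pair, f):.6f}")
```

With the single tone, the only added components are harmonics at 2 kHz and 3 kHz. With two tones, extra components appear at sum and difference frequencies such as 300 Hz and 700 Hz, which are not harmonically related to either tone; that is the intermodulation that a busy choir or orchestra would expose and a solo voice would not.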

Of course, recommending tracks is a bit pointless, because the track’s ‘demo’ qualities are combined with musical taste – and I think you need to like the music in order to engage fully with what you are hearing and to know how it’s going to sound with ‘your’ music. Nevertheless, here are a few tracks out of hundreds that I tentatively suggest would reveal a system’s attributes (no accounting for YouTube’s sound quality) and are the sort of thing I would want to listen to in order to get some idea of whether a system was any good.

Sufjan Stevens, Jacksonville – not a familiar act to me, but this track is ‘big’, has great bass and enough rawness to hear that the system sounds ‘natural’.

Elton John, Rocket Man – a beautiful, rounded studio recording with a great sense of space (so to speak).

Neil Young, Double E – very simple rock track that doesn’t sound over-produced.

Khachaturian Symphony Number 3 – a *massive* symphonic recording with huge pipe organ and 15 trumpets (apparently). If you play this loud, the end is very loud!

Arvo Pärt, Credo, for Piano Solo, Mixed Choir and Orchestra – possibly some of the most dynamic, contrasting classical music you will encounter.

(Maybe these classical performances are a bit too dynamic for everyday listening, but if you really want the demo to show what the system is capable of..!)

A less intense classical recording with some great imaging, space and some revealing bass is this one:

It’s An American in Paris by Gershwin, performed by the LA Philharmonic under Zubin Mehta – not sure if the YouTube version is the same as the CD version I listen to.

Sgt. Pepper’s Musical Revolution

Did you see Howard Goodall’s BBC programme about Sgt. Pepper? I thought it was a fine tribute, emphasising how fortunate we are for the existence of the Beatles.

Howard did his usual thing of analysing the finer points of the music and how it relates to classical and other forms, playing the piano and singing to illustrate his points. He showed that twelve of the tracks on Sgt. Pepper contain “modulations”, where the songs shift from one key to another – revealing very advanced compositional skills, needless to say. But I don’t think that the Beatles ever really knew or cared that music is ‘supposed’ to be composed in one key and one time signature – they were just instinctive and brilliant. To me, it suggested that formal training might have stifled their creativity, in fact.

He supplemented his survey of the tracks with Strawberry Fields and Penny Lane which, although not on the album, were the first tracks produced from the Sgt. Pepper recording sessions.

The technical stuff about studio trickery and how George Martin and his team worked around the limitations of four track tape was interesting (as always), and we listened in on some of the chat in the studio in-between takes.

Obviously, I checked out what versions of the album are available on Spotify, and found that there’s the 2009 remaster and, I think, the new 50th anniversary remixed version..! (Isn’t streaming great?)

Clearly the remixed version has moved some of the previously hard-panned left and right parts towards the middle, and the sound has more ‘body’ – but I am sure there is a lot more to it than that. The orchestral crescendos and final chord in A Day in the Life are particularly striking.

At the end of the day, however, I actually prefer a couple of more stripped back versions of tracks that appeared on the Beatles Anthology CDs from 1995. These, to me, sound even cleaner and fresher.

But what is this? Archimago has recently analysed some of the new remix and found that it has been processed into heavy clipping i.e. just like any typical modern recording that wants to sound ‘loud’. Archimago also shows that the 1987 CD version doesn’t have any such clipping in it; I won’t be throwing away my original Beatles CDs just yet…
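
For anyone wondering what ‘clipping’ looks like in the samples themselves, here is a minimal Python sketch that counts runs of consecutive samples stuck at full scale. To be clear, this is not Archimago’s method: the function names, the 16-bit assumption and the minimum run length of three samples are all my own choices for illustration.

```python
import math

def count_clipped_runs(samples, full_scale=32767, min_run=3):
    """Count runs of at least `min_run` consecutive samples at +/- full scale.

    `samples` is assumed to be a sequence of signed 16-bit integers.
    """
    runs = 0
    run_length = 0
    for s in samples:
        if abs(s) >= full_scale:
            run_length += 1
        else:
            if run_length >= min_run:
                runs += 1
            run_length = 0
    if run_length >= min_run:      # a run that reaches the end of the data
        runs += 1
    return runs

def sine_16bit(amplitude, cycles=10, samples_per_cycle=100):
    """A sine wave as 16-bit integers, hard-limited if `amplitude` is too large."""
    out = []
    for n in range(cycles * samples_per_cycle):
        v = round(amplitude * math.sin(2 * math.pi * n / samples_per_cycle))
        out.append(max(-32768, min(32767, v)))
    return out

clean = sine_16bit(20000)   # comfortably below full scale
hot = sine_16bit(40000)     # pushed past full scale, so the peaks are flattened

print("clean:", count_clipped_runs(clean), "clipped runs")
print("hot:  ", count_clipped_runs(hot), "clipped runs")
```

The ‘hot’ signal produces one flat-topped run on every half cycle, while the clean one produces none. A real analysis would also need to allow for clipping that happens below full scale, which this sketch ignores.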

Audio – Literature Analogy

An audio recording is a bit like a book: created through artistic or intellectual endeavour, then ‘fixed’ as a collection of pure information and distributed to customers for them to ‘consume’ in their own environments. In the case of digital audio, a recording is literally the same as a book, being stored as numbers in a file; you could store a book as a WAV, or an audio recording as an MS Word file if you wanted.
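
The ‘book as a WAV’ idea is easy to try. The following is a minimal Python sketch using the standard library’s `wave` module; the function names and the choice of 8-bit mono at 8000 Hz are assumptions made purely for illustration. Each byte of the text simply becomes one audio sample.

```python
import io
import wave

def text_to_wav_bytes(text, sample_rate=8000):
    """Store the UTF-8 bytes of `text` as 8-bit mono PCM samples in a WAV container."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(1)          # one byte per sample
        wav.setframerate(sample_rate)
        wav.writeframes(text.encode("utf-8"))
    return buffer.getvalue()

def wav_bytes_to_text(data):
    """Recover the text by reading every sample back out as a byte."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return wav.readframes(wav.getnframes()).decode("utf-8")

book = ("It is a truth universally acknowledged, that a single man in possession "
        "of a good fortune, must be in want of a wife.")
wav_data = text_to_wav_bytes(book)

print(wav_data[:4])                          # the container identifies itself as RIFF/WAVE
print(wav_bytes_to_text(wav_data) == book)   # the 'book' survives the round trip intact
```

The file is a perfectly valid WAV as far as the container is concerned; whether it is a book or a recording depends only on how we choose to render the numbers.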

In rendering the content to be read, there are things you could do to detract from the content of a book:

  • printed too big/too small
  • lighting too dim/too bright
  • inappropriate use of colour
  • blotchy printout
  • typeface varies with content, or randomly
  • corrupted: missing/duplicated/erroneous characters
  • peculiar paper
  • non-neutral typeface – difficult to read or inappropriate e.g. science fiction font for a Jane Austen novel
  • in the case of some ‘boutique’ printing, an appropriate analogy might be a book that spontaneously becomes too hot to touch, or occasionally ruins valuable furniture.

The emotional or intellectual force of the book would actually be reduced because of these problems. In other words, it is not true to say that the quality of reproduction doesn’t matter.

However, there is a finite envelope of neutral, even ‘mundane’, reproduction which achieves an optimal result for the reader – after reading the book they can’t tell you anything about the quality of the printing; all they remember is the content, and the content was thrilling.

Maybe the author specifies the typeface. Some books may include fine illustrations or intricate frontispieces which are intrinsic to the book. In these cases, the reproduction needs to be particularly accurate in order to do justice to what the author has created.

Beyond this, is there anything that the printer can do to enhance the appeal of the book? Well, they can create a fancy binding that the reader notices before they start reading; they can use particularly high quality paper; they can print the characters with micron precision. But only a book collector or printing technology enthusiast would care about these refinements – they have no effect on the actual experience of reading the content, and could easily detract from it.

The manufacturers of the ink and the mains cable that powers the printing press could read lots of books in their spare time, attend evening classes in English Literature, study the physiology of the eye, get diplomas in grammar, and tell us in interviews with speciality magazines about how it all informs their craft. But clearly the results would do nothing whatsoever to change the reading experience.

The printer might decide to dabble in science for the first time since they left printing college. They could do scientific trials in aspects of book reproduction where lucky participants get to read snippets of text or passages from ‘typical’ books, responding with their perceptions of differences, preferences, or even ‘emotional stimulation level’ in aspects such as:

  • typeface
  • ink
  • reading light
  • paper texture and weight
  • reading room shape/dimensions/finishes

But the results would be rather obvious and predictable, with anything slightly interesting being clearly the result of fashion, novelty and human fickleness rather than being a universal law.

The only way to actually enhance the book would be to change its content. An algorithm that replaces certain words? Re-writes sections to make them longer or shorter? Clearly in the case of literature, such a thing would be meaningless and idiotic. It is not so different in the case of audio. There is nothing but the recording: there is no technology, effect or algorithm that can meaningfully enhance it.


Domestic hi-fi is no more than the equivalent of rendering the printed content of a book: it can be done adequately or badly, and beyond that there is no meaningful way of improving on it. People become deluded by the idea that the rendering technology can enhance the content – which is obviously ridiculous in the case of books, but less obvious with audio.

But this is not to say that hi-fi is, in itself, boring: achieving ‘adequate’ is not trivial.

Many people are simply not used to hearing adequate reproduction regardless of how much money they spend, so they are not aware that the experience vs. quality graph has a horizontal flat top. And needless to say, the audiophile quality vs. cost graph is more-or-less random, which makes it even more confusing.

The audio enthusiast would be much happier and richer if they got a sense of proportion of what matters, then put all their creativity (and money if they’ve got nothing else to spend it on) into building the equivalent of a pleasant reading room, comfy chair and attractive bookcases rather than a solid gold and diamond reading light.

[Last edited  30/05/17]

Reverberation of a point source, compared with a ‘distributed’ loudspeaker

Here’s a fascinating speaker:

CBT36

It uses many transducers arranged in a specific curve, driven in parallel and with ‘shading’ i.e. graduated volume settings along the curve, to reduce vertical dispersion but maintain wide dispersion in the horizontal. I can see how this might appear quite appealing for use in a non-ideal room with low ceilings or whatever.

It is a variation on the phased array concept, where the outputs of many transducers combine to produce a directional beam. It is effectively relying on differing path lengths from the different transducers producing phase cancellation or reinforcement in the air at different angles as you move off axis. All the individual wavefronts sum correctly at the listener’s ear to reproduce the signal accurately.

At a smaller scale, a single transducer of finite size can be thought of as many small transducers being driven simultaneously. At high frequencies (as the wavelengths being reproduced become short compared to the diameter of the transducer) differing path lengths from various parts of the transducer combine in the air to cause phase cancellation as you move off axis. This is known as beaming and is usually controlled in speaker design by using drivers of the appropriate size for the frequencies they are reproducing. Changes in directivity with frequency are regarded as undesirable in speaker design, because although the on-axis measurements can be perfect, the ‘room sound’ (reverberation) has the ‘wrong’ frequency response.
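
The ‘many small transducers’ picture can be sketched in a few lines of Python. This is a deliberately crude model under stated assumptions: a straight line of equal point sources (not the curved, shaded CBT array, and not a true circular piston), a listener far enough away that only path differences matter, and a speed of sound of 343 m/s. The function name and parameters are my own.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0   # m/s, approximate value for air at room temperature

def off_axis_response_db(freq_hz, angle_deg, aperture_m, n_sources=64):
    """Far-field level at `angle_deg` off axis, relative to on axis, for a straight
    line of equal point sources spanning `aperture_m`."""
    k = 2 * math.pi * freq_hz / SPEED_OF_SOUND             # wavenumber
    theta = math.radians(angle_deg)
    total = 0j
    for i in range(n_sources):
        x = (i / (n_sources - 1) - 0.5) * aperture_m       # position along the line
        path_difference = x * math.sin(theta)              # extra distance to a far listener
        total += cmath.exp(1j * k * path_difference)       # add this source's phasor
    magnitude = abs(total) / n_sources                     # 1.0 when everything is in phase
    return 20 * math.log10(max(magnitude, 1e-12))

# A 20 cm wide radiator heard from 45 degrees off axis.
for freq in (200, 1000, 5000):
    print(f"{freq:5d} Hz: {off_axis_response_db(freq, 45, 0.2):6.1f} dB")
```

At 200 Hz the wavelength is much larger than the radiator and the off-axis level is almost the same as on axis; at 5 kHz the path differences are comparable to the wavelength, the phasors largely cancel, and the output has ‘beamed’.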

A large panel speaker suffers from beaming in the extreme, but with Quad electrostatics Peter Walker introduced a clever trick, where phase is shifted selectively using concentric circular electrodes as you move outwards from the centre of the panel. At the listener’s position, this simulates the effect of a point source emanating from some distance behind the panel, increasing the size of the ‘sweet spot’ and effectively reducing the high frequency beaming.

There are other ways of harnessing the power of phase cancellation and summation. Dipole speakers’ lower frequencies cancel out at the sides (and top and bottom) as the antiphase rear pressure waves meet those from the front. This is supposed to be useful acoustically, cutting down on unwanted reflections from floor, walls and ceiling. A dipole speaker may be realised by mounting a single driver on a panel of wood with a hole in it, but it behaves effectively as two transducers, one of which is in anti-phase to the other. Some people say they prefer the sound of such speakers over conventional box speakers.
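
The dipole’s cancellation at the sides falls out of the same kind of sum. Here is a minimal Python sketch, assuming an idealised dipole made of two point sources in antiphase; the 0.3 m front-to-back spacing, the 100 Hz test frequency and the function name are assumptions for illustration, and a real baffle is more complicated.

```python
import cmath
import math

def dipole_response(angle_deg, freq_hz=100.0, spacing_m=0.3, c=343.0):
    """Relative far-field magnitude of two antiphase point sources.

    `angle_deg` is measured from the front axis: 0 is straight ahead, 90 is to the side.
    """
    k = 2 * math.pi * freq_hz / c
    half_path = 0.5 * spacing_m * math.cos(math.radians(angle_deg))
    front = cmath.exp(1j * k * half_path)
    rear = -cmath.exp(-1j * k * half_path)    # the minus sign is the antiphase rear wave
    return abs(front + rear)

for angle in (0, 30, 60, 90):
    print(f"{angle:3d} degrees: {dipole_response(angle):.4f}")
```

To the side, the front and rear waves travel the same distance and cancel; in between, the output falls away roughly as the cosine of the angle, giving the familiar figure-of-eight pattern.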

This all works well in terms of the direct sound reaching the listener and, as in the CBT speaker above, may provide a very uniform dispersion with frequency compared to conventional speakers. But beyond the measurements of the direct sound, does the reverberation sound quite ‘right’? What if the overall level of reverberation doesn’t approximate the ‘liveness’ of the room that the listeners notice as they talk or shuffle their feet? If the vertical reflections are reduced but not the horizontal, does this sound unnatural?

Characterising a room from its sound

The interaction of a room and an acoustic source could be thought of as a collection of simultaneous equations – acoustics can be modelled and simulated for computer games, and it is possible for a computer to do the reverse and work out the size and shape of the room from the sound.  If the acoustic source is, in fact, multiple sources separated by certain distances, the computer can work that out, too.

Does the human hearing system do something similar? I would say “probably”. A human can work quite a lot out about a room from just its sound – you would certainly know whether you were in an anechoic chamber, a normal room or a cathedral. Even in a strange environment, a human rarely mistakes the direction and distance from which sound is coming. Head movements may play a part.

And this is where listening to a ‘distributed speaker’ in a room becomes a bit strange.

Stereo speakers can be regarded as a ‘distributed speaker’ when playing a centrally-placed sound. This is unavoidable – if we are using stereo as our system. Beyond that, what is the effect of spreading each speaker itself out, or deliberately creating phased ‘beams’ of sound?

Even though the combination of direct sounds adds up to the familiar sound at the listener’s position as though emanating from its original source, there is information within the reflections that is telling the listener that the acoustic source is really a radically different shape. Reverberation levels and directions may be ‘asymmetric’ with the apparent direct sound.

In effect, the direct sound says we are listening to this:

[Image: Zoe Wanamaker / Cassandra]

but the reverberation says it is something different.

[Image: Zoe Wanamaker / Cassandra]

Might there be audible side effects from this? In the case of the dipole speaker, for example, the rear (antiphase) signal reflects off the back wall and some of it does make its way forwards to the listener. In my experience, this comes through as a certain ‘phasiness’ but it doesn’t seem to bother other people.

From a normal listening distance, most musical sources are small and appear close to being a ‘point source’. If we are going to add some more reverberation, should it not appear to be emanating as much as possible from a point source?

It is easy to say that reverberation is so complex that it is just a wash of ‘ambience’ and nothing more; all we need to do is give it the right ‘colour’ i.e. frequency response. And one of the reasons for using a ‘distributed speaker’ may be to reduce the amount of reverberation anyway. But I don’t think we should overdo it: we surely want to listen in real rooms because of the reverberation, not despite it. What is the most side effect-free way to introduce this reverberation?

Clearly, some rooms are not ideal and offer too much of the wrong sort of reverberation. Maybe a ‘distributed speaker’ offers a solution, but is it as good as a conventional speaker in a suitable room? And is it really necessary, anyway? I think some people may be misguidedly attempting to achieve ‘perfect’ measurements by, effectively, eliminating the room from the sound even though their room is perfectly fine. How many people are intrigued by the CBT speaker above simply because it offers ‘better’ conventional in-room measurements, regardless of whether it is necessary?


‘Distributed speakers’ that use large, or multiple, transducers may achieve what they set out to do superficially, but are they free of side-effects?

I don’t have scientific proof, but I remain convinced that the ‘Rolls Royce’ of listening remains ‘point source’ monopole speakers in a large, carpeted, furnished room with a high ceiling. Box speakers with multiple drivers of different sizes are small and can be regarded as being very close to a single transducer, but are not so omnidirectional that they create too much reverberation. The acoustic ‘throw’ they produce is fairly ‘natural’. In other words, for stereo perfection, I think there is still a good chance that the types of rooms and speakers people were listening to in the 1970s remain optimal.

[Last edited 17.30 BST 09/05/17]

The Logic of Listening Tests

Casual readers may not believe this, but in the world of audiophilia there are people who enjoy organising scientific listening tests – or more aptly ‘trials’. These involve assembling panels of human ‘subjects’ to listen to snippets of music played through different setups in double blind tests, pressing buttons or filling in forms to indicate audible differences and preferences. The motivation is often to use science to debunk the ideas of a rival group, who may be known as ‘subjectivists’ or ‘objectivists’, or to confirm the ideas of one’s own group.

There are many, many inherent reasons why such listening tests may not be valid e.g.

  • no one can demonstrate that the knowledge that you are taking part in an experiment doesn’t impede your ability to hear differences
  • a participant who has his own agenda may choose to ‘lie’, pretending he is not hearing differences when, in fact, he is
  • etc. etc.

The tests are difficult and tedious for the participants, and no one who holds the opposing viewpoint will be convinced by the results. At a logical level, they are dubious. So why bother to do the tests? I think it is an ‘appeal to a higher authority’ to arbitrate an argument that cannot be solved any other way. ‘Science’ is that higher authority.

But let’s look at just the logic.

We are told that there are two basic types of listening test:

  1. Determining or identifying audible difference
  2. Determining ‘preference’

Presumably the idea is that (1) suggests whether two or more devices or processes are equivalent, or whether their insertion into the audio chain is audibly transparent. If a difference is identified, then (2) can make the information useful and tell us which permutation sounds best to a human. Perhaps there is a notion that in the best case scenario a £100 DAC is found to sound identical to a £100,000 DAC, or that if they do sound different, the £100 DAC is preferred by listeners. Or vice versa.

But would anything actually have been gained by a listening test over simple measurements? A DAC has a very specific, well-defined job to do – we are not talking about observing the natural world and trying to work out what is going on. With today’s technology, it is trivial to make a DAC that is accurate to very close objective tolerances for £100 – it is not necessary to listen to it to know whether it works.

For two DACs to actually sound different, they must be measurably quite far apart. At least one of them is not even close to being a DAC: it is, in fact, an effects box of some kind. And such are the fundamental uncertainties in all experiments that involve asking humans how they feel that it is entirely possible that, in a preference-based listening test, the listeners are found to prefer the sound of the effects box.

Or not. It depends on myriad unstable factors. An effects box that adds some harmonic distortion may make certain recordings sound ‘louder’ or ‘more exciting’ thus eliciting a preference for it today – with those specific recordings. But the experiment cannot show that the listeners wouldn’t be bored with the effect three hours, days or months down the line. Or that they wouldn’t hate it if it happened to be raining. Or if the walls were painted yellow, not blue. You get the idea: it is nothing but aesthetic judgement, the classic condition where science becomes pseudoscience no matter how ‘scientific’ the methodology.

The results may be fed into statistical formulae and the handle cranked, allowing the experimenter to declare “statistical significance”, but this is just the usual misunderstanding of statistics, which are only valid under very specific mathematical conditions. If your experiment is built on invalid assumptions, the statistics mean nothing.
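
For reference, the handle being cranked is often something like a one-sided binomial test on the number of correct identifications. The following is a minimal Python sketch of that calculation only; the function name is hypothetical, and it assumes that every trial is independent and that a listener who hears no difference guesses correctly with probability 0.5. Those are exactly the kind of conditions that have to hold for the resulting number to mean anything.

```python
from math import comb

def p_value_at_least(correct, trials, p_guess=0.5):
    """Probability of getting `correct` or more right out of `trials` by guessing alone."""
    return sum(
        comb(trials, k) * p_guess ** k * (1 - p_guess) ** (trials - k)
        for k in range(correct, trials + 1)
    )

# A listener who identifies the device correctly in 12 of 16 trials:
p_12_of_16 = p_value_at_least(12, 16)
print(f"12 of 16 by chance: p = {p_12_of_16:.4f}")

# The same listener scoring 8 of 16:
p_8_of_16 = p_value_at_least(8, 16)
print(f" 8 of 16 by chance: p = {p_8_of_16:.4f}")
```

The arithmetic is the easy part. It says nothing about whether the trials really were independent, whether the participants were trying, or whether the conditions of the test resemble normal listening.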

If we think it is acceptable for a ‘DAC’ to impose its own “effects” on the sound, where do we stop? Home theatre amps often have buttons labelled ‘Super Stereo’ or ‘Concert Hall’. Before we go declaring that the £100,000 DAC’s ‘effect’ is worth the money, shouldn’t we also verify that our experiment doesn’t show that ‘Super Stereo’ is even better? Or that a £10 DAC off Amazon isn’t even better than that? This is the open-ended illogicality of preference-based listening tests.

If the device is supposed to be a “DAC”, it can do no more than meet the objective definition of a DAC to a tolerably close degree. How do we know what “tolerably close” is? Well, if we were to simulate the known, objective, measured error, and amplify it by a factor of a hundred, and still fail to be able to hear it at normal listening levels in a quiet room, I think we would have our answer. This is the one listening test that I think would be useful.
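
The ‘amplify the error’ test can be sketched in Python as follows. It assumes that the reference signal and the measured output are already time-aligned and level-matched, which is the hard part in practice; the function name is my own, and the ‘measured’ signal here is a made-up stand-in (the reference rounded to 16-bit steps), not data from any real DAC.

```python
import math

def exaggerate_error(reference, measured, gain=100.0):
    """Return reference + gain * (measured - reference), sample by sample."""
    if len(reference) != len(measured):
        raise ValueError("signals must be the same length and time-aligned")
    return [r + gain * (m - r) for r, m in zip(reference, measured)]

# A 1 kHz reference tone, and a hypothetical 'measured' version with 16-bit rounding error.
reference = [0.5 * math.sin(2 * math.pi * 1000 * n / 48000) for n in range(4800)]
measured = [round(r * 32767) / 32767 for r in reference]

# The signal we would actually listen to: the original plus the error made 100 times larger.
test_signal = exaggerate_error(reference, measured)

worst = max(abs(t - r) for t, r in zip(test_signal, reference))
print(f"peak exaggerated error: {20 * math.log10(worst):.1f} dB relative to full scale")
```

A factor of a hundred in amplitude is 40 dB. If the exaggerated version is still indistinguishable from the reference at normal listening levels in a quiet room, the real error is 40 dB further down again.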

The Sound of a Symphony Orchestra

Last night I went to a symphony concert: Shostakovich’s 10th, preceded by Prokofiev’s Piano Concerto No. 2 at the West Road Concert Hall, Cambridge.

We were sitting in the second row from the front – so quite close to the piano. I wish I had taken a photograph, but I was so paranoid about my phone ringing mid performance that I left it turned off! The image above shows the empty venue.

We really enjoyed the concert. Chiyan Wong is an amazing piano soloist, and CCSO were spectacular. The sound was formidable from a large orchestra, and we got to hear the fairly new Steinway grand in great detail – the piano was removed during the interval, for the Shostakovich that followed.

Now, I do often listen to this sort of music with my system, but this was the first time I had been to a concert to hear this specific Russian ‘genre’. Of course I couldn’t help but make a mental comparison of the sound of the real thing versus the hi-fi facsimile that I am used to, as I was listening. And you know what? I have to say that a good hi-fi gives a pretty good rendition of the real sound.

The real thing was very loud, but also very rich – I have observed that ‘painfully loud’ is more a function of quality than volume; you need good bass to balance the rest of the spectrum. So this was very loud, but at no time painful. Bass from the orchestra was wonderful, but didn’t take me by surprise – I sometimes hear such bass from my system. (It did take me by surprise the first time I heard it from a hi-fi system, however!).

Some people cite piano as being the most difficult thing for a hi-fi system to reproduce. I don’t know where they get that from: I loved the sound of the piano, and I think a good system can reproduce it fairly easily.

I was struck by the homogeneity within the different sections of the orchestra. Listening to a recording of just a piano, or just the violins, would not tell you very much about an audio system. It is only when you hear a combination of the piano, the violins and the brass, say, that any ‘formant’ (i.e. fixed frequency response signature) within your system would show up.

As discussed previously, ‘imaging’ of the orchestra was not as pin sharp as you get in some recordings, but many purist recordings portray the true effect quite accurately. The width of the ‘soundstage’ of a stereo system is more-or-less right, and the room you are listening in enhances the recording’s ‘ambience’ around and behind you.

Of course the concert is a very special experience. The stereo version isn’t always as deep, open and spacious, nor is the envelopment as complete but, all in all, I think if you sit down in the right frame of mind to listen to a fine orchestral recording using a good hi-fi system, you are getting a very reasonable impression of the sound, excitement and visceral quality of the real thing. And that really is quite an amazing idea.

Room correction. What are we trying to achieve?

The short version…

The recent availability of DSP is leading some people to assume that speakers are, and have always been, ‘wrong’ unless EQ’ed to invert the room’s acoustics.

In fact, our audio ancestors didn’t get it wrong. Only a neutral speaker is ‘right’, and the acoustics of an average room are an enhancement to the sound. If we don’t like the sound of the room, we must change the room – not the sound from the speaker.

DSP gives us the tools to build a more neutral speaker than ever before.

There are endless discussions about room correction, and many different commercial products and methods. Some people seem to like certain results while others find them a little strange-sounding.

I am not actually sure what it is that people are trying to achieve. I can’t help but think that if someone feels the need for room correction, they have yet to hear a system that sounds so good that they wouldn’t dream of messing it up with another layer of their own ‘EQ’.

Another possibility is that they are making an unwarranted assumption based on the fact that there are large objective differences between the recorded waveform and what reaches the listener’s ears in a real room. That must mean that no matter how good it sounds, there’s an error. It could sound even better, right?


A reviewer of the Kii Three found that this particularly neutral speaker sounded perfect straight out of the box.

“…the traditional kind of subjective analysis we speaker reviewers default to — describing the tonal balance and making a judgement about the competence of a monitor’s basic frequency response — is somehow rendered a little pointless with the Kii Three. It sounds so transparent and creates such fundamentally believable audio that thoughts of ‘dull’ or ‘bright’ seem somehow superfluous.”

The Kii Three does, however, offer a number of preset “contour” EQ options. As I shall describe later, I think that a variation on this is all that is required to refine the sound of any well-designed neutral speaker in most rooms.

A distinction is often made between correction of the bass and higher frequencies. If the room is large, and furnished copiously, there may be no problem to solve in either case, and this is the ideal situation. But some bass manipulation may be needed in many rooms. At a minimum, the person with sealed woofers needs the roll-off at the bottom end to start at about the right frequency for the room. This, in itself, is a form of ‘room correction’.

The controversial aspect is the question of whether we need ‘correction’ higher up. Should it be applied routinely (some people think so), as sparingly as possible, or not at all? And if people do hear an improvement, is that because the system is inherently correcting less-than-ideal speakers rather than the room?

Here are some ways of looking at the issue.

  1. Single room reflections give us echoes, while multiple reflections (of reflections) give us reverberation. Performing a frequency response (FR) measurement with a neutral transducer and analysing the result may show a non-flat FR at the listening position even when smoothed fairly heavily. This is simply a consequence of statistics, and of the geometry and absorptivity of the various surfaces in the room. Some reflections will result in some frequencies summing in phase, to some extent, and others not.
  2. Experience tells us that we “hear through” the room to any acoustic source. Our hearing appears not to be just a frequency response analyser, but can separate direct sound from reflections. This is not a fanciful idea: adaptive software can learn to do the same thing.

The idea is also supported by some of the great and the good in audio.

Floyd Toole:

“…we humans manage to compensate for many of the temporal and timbral variations contributed by rooms and hear “through” them to appreciate certain essential qualities of sound sources within these spaces.”

Or Meridian’s Bob Stuart:

“Our brains are able to separate direct sound from the reverberation…”

  3. If we EQ the FR of the speaker to obtain a flat in-room measured response including the reflections in the measurement, it seems that we will subsequently “hear through” the reflections to a strangely-EQ’ed direct sound. It will, nevertheless, measure ‘perfectly’.
  4. Audio orthodoxy maintains that humans are supremely insensitive to phase distortion, and this is often compounded with the argument that room reflections completely swamp phase information so it is not worth worrying about. This denies the possibility that we “hear through” the room. Listening tests in the past that purportedly demonstrated our inability to hear the effects of phase have often been based on mono only, and didn’t compare distorted with undistorted phase examples – merely distorted versus differently distorted, played on the then available equipment.
  5. Contradicting (4), audiophiles traditionally fear crossovers because the phase shifts inherent in (non-DSP) crossovers are, they say, always audible. DSP, on the other hand, allows us to create crossovers without any phase shift i.e. they are ‘transparent’.
  6. At a minimum, speaker drivers on their baffles should not ‘fight’ each other through the crossover – their phases should be aligned. The appropriate delays then ensure that they are not ‘fighting’ at the listener’s position. The next level in performance is to ensure that their phases are flat at all frequencies i.e. linear phase. The result of this is the recorded waveform preserved in both frequency and time.
  7. Intuitively, genuine stereo imaging is likely to be a function of phase and timing. Preserving that phase and timing should probably be something we logically try to do. We could ‘second guess’ how it works using traditional rules of thumb, deciding not to preserve the phase and timing, but if it is effectively cost-free to do it, why not do it anyway?
  8. A ‘perfect’ response from many speaker/room combinations can be guaranteed using DSP (deconvolution with the impulse response at that point, not just playing with a graphic equaliser). Unfortunately, it will only be valid for a single point in space, and moving 1 mm from there will produce errors and unquantifiable sonic effects. Additionally, ‘perfect’ refers to the ‘anechoic chamber’ version of the recording, which may not be what most people are trying to achieve even if the measurements they think they seek mean precisely that.
  9. Room effects such as (moderate) reverberation are a major difference between listening with speakers versus headphones, and are actually desirable. ‘Room correction’ would be a bad thing if it literally removed the room from the sound. If that is the case, what exactly do we think ‘room correction’ is for?
  10. Even if the drivers are neutral (in an anechoic situation) and crossed over perfectly on axis, they are of finite size and mounted in a box or on a baffle that has a physical size and shape. This produces certain frequency-dependent dispersion characteristics which give different measured, and subjective, results in different rooms. Some questions are:
    • is this dispersion characteristic a ‘room effect’ or a ‘speaker effect’, or both?
    • is there a simple objective measurement that says one result is better than any other?
    • is there just one ‘right’ result and all others are ‘wrong’?
  11. Should room correction attempt to correct the speaker as well? Or should we, in fact, only correct the speaker? Or just the room? If so, how would we separate room from speaker in our measurements? Can they, in fact, be separated?
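
To put rough numbers on two of the points above (reflections summing in and out of phase, and a measured ‘perfect’ correction only holding at one position), here is a minimal sketch in Python. It is a deliberately crude model and not a description of any real product: one direct sound plus a single reflection at half amplitude, with the function names, the 343 m/s speed of sound, the 1 m extra path length and the 2 cm movement all being assumptions chosen purely for illustration.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def response(freq_hz, extra_path_m, reflection_gain):
    """Complex response of the direct sound plus one reflection.

    extra_path_m is how much further the reflection travels than the
    direct sound; reflection_gain is its amplitude relative to the direct sound.
    """
    delay = extra_path_m / SPEED_OF_SOUND
    return 1 + reflection_gain * cmath.exp(-2j * math.pi * freq_hz * delay)


def db(x):
    """Magnitude of a (complex) gain in decibels."""
    return 20 * math.log10(abs(x))


def corrected_db(freq_hz, design_path_m, actual_path_m, gain=0.5):
    """Level measured where the reflection's extra path is actual_path_m,
    after applying an EQ that exactly inverts the response measured where
    the extra path was design_path_m."""
    measured = response(freq_hz, actual_path_m, gain)
    eq = 1 / response(freq_hz, design_path_m, gain)
    return db(measured * eq)


if __name__ == "__main__":
    # Uncorrected: one reflection is already enough for a non-flat FR.
    print(round(db(response(171.5, 1.0, 0.5)), 2))   # notch, about -6 dB
    print(round(db(response(343.0, 1.0, 0.5)), 2))   # peak, about +3.5 dB
    # Corrected: flat where it was measured, wrong 2 cm of path length away.
    print(round(corrected_db(8575.0, 1.0, 1.00), 2))  # 0 dB
    print(round(corrected_db(8575.0, 1.0, 1.02), 2))  # about -9.5 dB
```

In this toy model the inverse EQ measures perfectly flat at the design position, yet at 8.575 kHz a 2 cm change in the reflection’s path length turns that into an error of roughly 9.5 dB, which is larger than the original 6 dB notch it was meant to fix. Either way, the EQ has been applied to the direct sound.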

I think there is a formula that gives good results. It says:

  • Don’t rely on feedback from in-room measurements, but do ‘neutralise’ the speaker at the most elemental levels first. At every stage, go for the most neutral (and locally correctable) option e.g. sealed woofers, DSP-based linear phase crossovers with time alignment delays.
  • Simply avoid configurations that are going to give inherently weird results: two-way speakers, bass reflex, many types of passive crossover etc. These may not even be partially correctable in any meaningful way.
  • Phase and time alignment are sacrosanct. This is the secret ingredient. You can play with minor changes to the ‘tone colour’ separately, but your direct sound must always maintain the recording’s phase and time alignment. This implies that FIR filters must be used, thus allowing frequency response to be modified independently of phase.
  • By all means do all the good stuff regarding speaker placement, room treatments (the room is always ‘valid’), and avoiding objects and asymmetry around the speakers themselves.
  • Notionally, I propose that we wish to correct the speaker not the room. However, we are faced with a room and non-neutral speaker that are intertwined due to the fact that the speaker has multiple drivers of finite size and a physical presence (as opposed to being a point source with uniform directivity at all frequencies). The artefacts resulting from this are room-dependent and can never really be ‘corrected’ unambiguously. Luckily, a smooth EQ curve can make the sound subjectively near enough to transparent. To obtain this curve, predict the baffle step correction for each driver using modelling or a standard formula, with some trial-and-error regarding the depth required (4, 5, 6 dB?); this is a very smooth EQ curve. Or, possibly (I haven’t done this myself), make many FR measurements around the listening area, smooth and average them together, and partially invert this, again without altering phase and time alignment.
  • You are hearing the direct sound, plus separately-perceived ‘room ambience’. If you don’t like the sound of the ambience, you must change the room, not the direct sound.
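
As an illustration of what ‘FIR filters allow frequency response to be modified independently of phase’ means in practice, here is a minimal Python sketch of a linear-phase baffle step correction. It is one possible approach (a plain frequency-sampling design), not necessarily how any particular system does it. The assumptions are mine: the commonly quoted rule of thumb that the step is centred near 115 divided by the baffle width in metres, a 0.25 m baffle, a 6 dB depth, a 48 kHz sample rate and 511 taps. All function names are hypothetical.

```python
import math


def shelf_gain(f, f_step, depth_db):
    """Target magnitude: +depth_db well below f_step, 0 dB well above it."""
    ratio = 10 ** (depth_db / 20)
    f_lo = f_step / math.sqrt(ratio)
    f_hi = f_step * math.sqrt(ratio)
    return math.sqrt((f * f + f_hi * f_hi) / (f * f + f_lo * f_lo))


def linear_phase_fir(n_taps, fs, f_step, depth_db):
    """Frequency-sampling design of a symmetric FIR (n_taps must be odd).

    The target magnitude is sampled at multiples of fs / n_taps and given a
    pure delay of (n_taps - 1) / 2 samples, so the taps come out symmetric
    and the phase is exactly linear.
    """
    centre = (n_taps - 1) // 2
    mags = [shelf_gain(k * fs / n_taps, f_step, depth_db)
            for k in range(centre + 1)]
    taps = []
    for n in range(n_taps):
        acc = mags[0]
        for k in range(1, centre + 1):
            acc += 2 * mags[k] * math.cos(2 * math.pi * k * (n - centre) / n_taps)
        taps.append(acc / n_taps)
    return taps


def gain_db(taps, f, fs):
    """Magnitude response of the FIR at frequency f, in dB."""
    re = sum(h * math.cos(2 * math.pi * f * n / fs) for n, h in enumerate(taps))
    im = sum(h * math.sin(2 * math.pi * f * n / fs) for n, h in enumerate(taps))
    return 20 * math.log10(math.hypot(re, im))


BAFFLE_WIDTH_M = 0.25              # assumed baffle width
F_STEP = 115.0 / BAFFLE_WIDTH_M    # rule-of-thumb step frequency, about 460 Hz
taps = linear_phase_fir(511, 48000, F_STEP, 6.0)
```

Because the taps are symmetric about the centre, every frequency is delayed by the same 255 samples (about 5.3 ms at 48 kHz), so the ‘tone colour’ changes while the shape of the waveform in time does not. Changing the depth from 6 dB to 4 or 5 dB only alters the magnitude curve; the delay stays the same.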

Is there any scientific evidence for these assertions? No more nor less than any other ‘room correction’ technique – just logical deduction based on subjective experience. Really, it is just a case of thinking about what we hear as we move around and between rooms, compared to what the simple in-room FR measurements show. Why do real musicians not need ‘correction’ when they play in different venues? Do we really want ‘headphone sound’ when listening in rooms? (If so, just wear headphones or sit closer to smaller speakers).

This does not say that neutral drivers alone are sufficient to guarantee good sound – I have observed that this is not the case. A simple baffle step correction applied to frequency response (but leaving phase and timing intact) can greatly improve the sound of a real loudspeaker in a room without affecting how sharply-imaged and dynamic it sounds. I surmise that frequency response can be regarded as ‘colour’ (or “chrominance” in old school video speak), independent of the ‘detail’ (or “luminance”) of phase and timing. We can work towards a frequency response that compensates for the combination of room and speaker dispersion effects to give the right subjective ‘colour’ as long as we maintain accurate phase and timing of the direct sound.

We are not (necessarily) trying to flatten the in-room FR as measured at the listener’s position – the EQ we apply is very smooth and shallow – but the result will still be perceived as a flat FR. Many (most?) existing speakers inherently have this EQ built in whether their creators applied it deliberately, or via the ‘voicing’ they did when setting the speaker up for use in an average room.
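
The alternative route mentioned earlier (measure at many positions, smooth, average, and only partially invert) could look something like the Python sketch below. I stress that this is a guess at one reasonable implementation, not something I have validated by ear: the power averaging, the one-octave smoothing, the 50% ‘strength’ and the ±3 dB limit are all assumptions, as are the function names. It only produces a magnitude curve in dB; that curve would still need to be realised with a filter that leaves phase and timing alone.

```python
import math


def average_db(curves):
    """Power-average several magnitude curves (in dB), point by point."""
    count = len(curves)
    return [10 * math.log10(sum(10 ** (c[i] / 10) for c in curves) / count)
            for i in range(len(curves[0]))]


def smooth_octave(freqs, mags_db, width_oct=1.0):
    """Replace each point by the mean of all points within +/- width_oct / 2 octaves."""
    half = 2 ** (width_oct / 2)
    smoothed = []
    for f in freqs:
        window = [m for g, m in zip(freqs, mags_db) if f / half <= g <= f * half]
        smoothed.append(sum(window) / len(window))
    return smoothed


def partial_inverse_eq(freqs, curves, strength=0.5, limit_db=3.0, width_oct=1.0):
    """Smooth, shallow EQ (in dB) that undoes only part of the averaged deviation."""
    avg = smooth_octave(freqs, average_db(curves), width_oct)
    reference = sum(avg) / len(avg)          # ignore the overall level
    eq = [-strength * (a - reference) for a in avg]
    return [max(-limit_db, min(limit_db, e)) for e in eq]


if __name__ == "__main__":
    # Third-octave grid from 20 Hz upwards and two made-up 'measurements'.
    freqs = [20 * 2 ** (i / 3) for i in range(30)]
    pos_a = [4.0 if f < 200 else 0.0 for f in freqs]
    pos_b = [3.0 if f < 200 else 1.0 for f in freqs]
    for f, e in zip(freqs, partial_inverse_eq(freqs, [pos_a, pos_b])):
        print(f"{f:8.0f} Hz  {e:+.2f} dB")
```

The heavy smoothing and the clamp are what keep the result ‘very smooth and shallow’: narrow peaks and dips, which differ from one position to the next, are averaged away rather than chased.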

In summary:

  • Humans “hear through” the room to the direct sound; the room is perceived as a separate ‘ambience’. Because of this, ‘no correction’ really is the correct strategy.
  • Simply flattening the FR at the listening position via EQ of the speaker output is likely to result in ‘peculiar’ perceived sound, even if the in-room measurements purport to say otherwise.
  • Speakers have to be as rigorously neutral as possible by design, rather than attempting to correct them by ‘global feedback’ in the room.
  • Final refinement is a speaker/room-dependent, smooth, shallow EQ curve that doesn’t touch phase and timing – only FIR filters can do this.

[Last updated 05/04/17]

Data Over Sound

Just saw this mentioned. It’s interesting how an idea that, years ago, was just a method of harnessing existing technology, can re-appear as something funky and brand new. It joins those other technologies that aim to get data into our devices via cost-free, non-contact interfaces, such as QR Codes.

What is Chirp?

A Chirp™ is a sonic barcode. With Chirp technology, data and content can be encoded into a unique audio stream. Any device with a speaker can transmit a chirp and most devices with a microphone can decode them.

People of a certain age will be familiar with the use of audio cassettes as storage for their microcomputer programs back in the 1980s – I think I used reel-to-reel for a time.

I also remember, round about 1980, sending a computer program over the phone to a friend’s house by holding the phone close to the speaker and picking the sound up at the other end with a microphone. As I recall, our version wasn’t really very reliable or practical, but I think we did succeed in sending a short program. Obviously we were inspired by the acoustic coupler modems that we might have seen in films and documentaries.


SMPTE and MIDI timecodes can be recorded as audio signals on analogue tape and can survive multiple transfers and, I dare say, would be robust enough to work over a speaker-microphone link.

In the 1990s we were all familiar with ‘the sound of data’ when we used dial-up modems.

Over the years we have also had DTMF dialling, audio watermarking, Shazam, Siri, Alexa etc. and phone-based automated systems using speech recognition, all of which have to deal with extracting ‘data’ from noisy audio. You would think that the new audio barcodes should be pretty simple to make work reliably.
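
To show how little is needed for the basic idea, here is a toy ‘data over sound’ scheme in Python. It has nothing to do with how Chirp actually works, which I don’t know. Every parameter here is an assumption of mine: each half-byte is sent as one of sixteen tones between 1000 Hz and 2500 Hz, each lasting 50 ms at an 8 kHz sample rate, and the receiver is assumed to know exactly where each tone starts.

```python
import math
import random

RATE = 8000                        # samples per second (assumed)
SYMBOL = 400                       # samples per tone, i.e. 50 ms each
BASE_HZ, STEP_HZ = 1000.0, 100.0   # nibble v is sent as BASE_HZ + v * STEP_HZ


def encode(data):
    """Turn bytes into audio samples: two tones per byte, high nibble first."""
    samples = []
    for byte in data:
        for nibble in (byte >> 4, byte & 0x0F):
            freq = BASE_HZ + nibble * STEP_HZ
            samples.extend(math.sin(2 * math.pi * freq * n / RATE)
                           for n in range(SYMBOL))
    return samples


def tone_power(block, freq):
    """Correlate a block with a cosine and a sine at freq; large if the tone is present."""
    i = sum(x * math.cos(2 * math.pi * freq * n / RATE) for n, x in enumerate(block))
    q = sum(x * math.sin(2 * math.pi * freq * n / RATE) for n, x in enumerate(block))
    return i * i + q * q


def decode(samples):
    """Pick the strongest of the sixteen tones in each block and rebuild the bytes."""
    nibbles = []
    for start in range(0, len(samples) - SYMBOL + 1, SYMBOL):
        block = samples[start:start + SYMBOL]
        nibbles.append(max(range(16),
                           key=lambda v: tone_power(block, BASE_HZ + v * STEP_HZ)))
    return bytes((hi << 4) | lo for hi, lo in zip(nibbles[0::2], nibbles[1::2]))


def add_noise(samples, level, seed=0):
    """Add Gaussian noise, standing in for a noisy speaker-to-microphone link."""
    rng = random.Random(seed)
    return [s + rng.gauss(0, level) for s in samples]


if __name__ == "__main__":
    print(decode(add_noise(encode(b"Hello"), 0.5)))
```

What this leaves out is most of what a practical system has to deal with: finding the start of the message, echoes, and clock drift between the two devices.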