Reverberation of a point source, compared with a ‘distributed’ loudspeaker

Here’s a fascinating speaker:

CBT36

It uses many transducers arranged in a specific curve, driven in parallel and with ‘shading’ i.e. graduated volume settings along the curve, to reduce vertical dispersion but maintain wide dispersion in the horizontal. I can see how this might appear quite appealing for use in a non-ideal room with low ceilings or whatever.

It is a variation on the phased array concept, where the outputs of many transducers combine to produce a directional beam. It is effectively relying on differing path lengths from the different transducers producing phase cancellation or reinforcement in the air at different angles as you move off axis. All the individual wavefronts sum correctly at the listener’s ear to reproduce the signal accurately.
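The path-length argument can be put into numbers with a rough sketch. This is not a model of the CBT itself: it assumes a straight, unshaded line of identical point sources in the far field, and the source count, spacing and frequency are made-up values chosen only for illustration.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def array_response_db(n_sources, spacing_m, freq_hz, angle_deg):
    """Level of n equal in-phase point sources in a straight line, in dB re on-axis."""
    k = 2 * math.pi * freq_hz / SPEED_OF_SOUND  # wavenumber
    theta = math.radians(angle_deg)
    total = 0j
    for i in range(n_sources):
        # Far field: source i is nearer or further by i * spacing * sin(theta)
        path_difference = i * spacing_m * math.sin(theta)
        total += cmath.exp(-1j * k * path_difference)
    magnitude = abs(total) / n_sources
    return 20 * math.log10(max(magnitude, 1e-12))


# Hypothetical array: 16 sources, 5 cm apart, at 2 kHz
for angle in (0, 10, 20, 40):
    print(angle, "deg:", round(array_response_db(16, 0.05, 2000.0, angle), 1), "dB")
```

On axis all the path lengths are equal and the contributions add fully; off axis they drift out of step and the sum drops. Curving the array and shading the levels, as the CBT does, is a refinement of this basic behaviour that the sketch does not attempt.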

At a smaller scale, a single transducer of finite size can be thought of as many small transducers being driven simultaneously. At high frequencies (as the wavelengths being reproduced become short compared to the diameter of the transducer) differing path lengths from various parts of the transducer combine in the air to cause phase cancellation as you move off axis. This is known as beaming and is usually controlled in speaker design by using drivers of the appropriate size for the frequencies they are reproducing. Changes in directivity with frequency are regarded as undesirable in speaker design, because although the on-axis measurements can be perfect, the ‘room sound’ (reverberation) has the ‘wrong’ frequency response.
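As a rough guide to where beaming sets in, it helps to know the frequency at which the wavelength has shrunk to the diameter of the driver. The sketch below does only that arithmetic; the driver sizes are hypothetical examples, and the "wavelength equals diameter" point is a convenient marker rather than a precise threshold.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def wavelength_m(freq_hz):
    """Wavelength in air at the given frequency."""
    return SPEED_OF_SOUND / freq_hz


def frequency_where_wavelength_equals(diameter_m):
    """Frequency at which one wavelength spans the driver diameter."""
    return SPEED_OF_SOUND / diameter_m


# Hypothetical driver diameters, for illustration only
for name, diameter in (("25 mm tweeter", 0.025),
                       ("165 mm midrange", 0.165),
                       ("300 mm woofer", 0.300)):
    print(name, "->", round(frequency_where_wavelength_equals(diameter)), "Hz")
```

Well below the printed frequency a driver radiates broadly; approaching and above it, the output narrows progressively into a beam, which is why large drivers are normally handed over to smaller ones as frequency rises.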

A large panel speaker suffers from beaming in the extreme, but with Quad electrostatics Peter Walker introduced a clever trick, where phase is shifted selectively using concentric circular electrodes as you move outwards from the centre of the panel. At the listener’s position, this simulates the effect of a point source emanating from some distance behind the panel, increasing the size of the ‘sweet spot’ and effectively reducing the high frequency beaming.
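One way to picture the trick is as simple geometry: each ring is fed late by the extra time sound from an imaginary point source behind the panel would take to reach it, compared with the centre. The sketch below works out those delays. The depth of the virtual source and the ring radii are invented for illustration and are not Quad's actual figures.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def ring_delay_s(ring_radius_m, virtual_source_depth_m):
    """Delay for a ring so the panel mimics a point source behind its centre."""
    # Distance from the virtual source to the ring, minus distance to the centre
    extra_path = (math.hypot(ring_radius_m, virtual_source_depth_m)
                  - virtual_source_depth_m)
    return extra_path / SPEED_OF_SOUND


# Hypothetical geometry: virtual source 0.3 m behind the panel
for radius in (0.0, 0.1, 0.2, 0.3, 0.4):
    print(f"ring at {radius:.1f} m: {ring_delay_s(radius, 0.3) * 1e6:.0f} microseconds")
```

The centre fires first and the outer rings progressively later, so the wavefront leaving the flat panel is curved as if it had spread out from a point.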

There are other ways of harnessing the power of phase cancellation and summation. Dipole speakers’ lower frequencies cancel out at the sides (and top and bottom) as the antiphase rear pressure waves meet those from the front. This is supposed to be useful acoustically, cutting down on unwanted reflections from floor, walls and ceiling. A dipole speaker may be realised by mounting a single driver on a panel of wood with a hole in it, but it behaves effectively as two transducers, one of which is in anti-phase to the other. Some people say they prefer the sound of such speakers over conventional box speakers.
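The dipole cancellation can be sketched by treating the speaker as two idealised point sources in antiphase, separated by the effective front-to-back path around the panel. That separation (0.3 m here) is a hypothetical value, and a real panel is more complicated than two points.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def dipole_response(freq_hz, angle_deg, separation_m):
    """Far-field magnitude of two antiphase point sources (1.0 = one source alone)."""
    k = 2 * math.pi * freq_hz / SPEED_OF_SOUND
    # 0 degrees is straight ahead; at 90 degrees both paths are the same length
    path_difference = separation_m * math.cos(math.radians(angle_deg))
    return abs(1 - cmath.exp(-1j * k * path_difference))


# Hypothetical 0.3 m front-to-back path, at 200 Hz
for angle in (0, 30, 60, 90):
    print(angle, "deg:", round(dipole_response(200.0, angle, 0.3), 3))
```

To the sides the two path lengths are equal, so the antiphase outputs cancel completely in this idealised model, giving the familiar figure-of-eight pattern. The same sum also shrinks on axis as frequency falls, which is why dipoles need help in the bass.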

This all works well in terms of the direct sound reaching the listener and, as in the CBT speaker above, may provide a very uniform dispersion with frequency compared to conventional speakers. But beyond the measurements of the direct sound, does the reverberation sound quite ‘right’? What if the overall level of reverberation doesn’t approximate the ‘liveness’ of the room that the listeners notice as they talk or shuffle their feet? If the vertical reflections are reduced but not the horizontal, does this sound unnatural?

Characterising a room from its sound

The interaction of a room and an acoustic source could be thought of as a collection of simultaneous equations – acoustics can be modelled and simulated for computer games, and it is possible for a computer to do the reverse and work out the size and shape of the room from the sound.  If the acoustic source is, in fact, multiple sources separated by certain distances, the computer can work that out, too.

Does the human hearing system do something similar? I would say “probably”. A human can work quite a lot out about a room from just its sound – you would certainly know whether you were in an anechoic chamber, a normal room or a cathedral. Even in a strange environment, a human rarely mistakes the direction and distance from which sound is coming. Head movements may play a part.

And this is where listening to a ‘distributed speaker’ in a room becomes a bit strange.

Stereo speakers can be regarded as a ‘distributed speaker’ when playing a centrally-placed sound. This is unavoidable – if we are using stereo as our system. Beyond that, what is the effect of spreading each speaker itself out, or deliberately creating phased ‘beams’ of sound?

Even though the combination of direct sounds adds up to the familiar sound at the listener’s position as though emanating from its original source, there is information within the reflections that is telling the listener that the acoustic source is really a radically different shape. Reverberation levels and directions may be ‘asymmetric’ with the apparent direct sound.

In effect, the direct sound says we are listening to this:

[Image: “zoe wanamaker cassandra”]

but the reverberation says it is something different.

[Image: “zoe wanamaker cassandra”]

Might there be audible side effects from this? In the case of the dipole speaker, for example, the rear (antiphase) signal reflects off the back wall and some of it does make its way forwards to the listener. In my experience, this comes through as a certain ‘phasiness’ but it doesn’t seem to bother other people.

From a normal listening distance, most musical sources are small and appear close to being a ‘point source’. If we are going to add some more reverberation, should it not appear to be emanating as much as possible from a point source?

It is easy to say that reverberation is so complex that it is just a wash of ‘ambience’ and nothing more; all we need to do is give it the right ‘colour’ i.e. frequency response. And one of the reasons for using a ‘distributed speaker’ may be to reduce the amount of reverberation anyway. But I don’t think we should overdo it: we surely want to listen in real rooms because of the reverberation, not despite it. What is the most side effect-free way to introduce this reverberation?

Clearly, some rooms are not ideal and offer too much of the wrong sort of reverberation. Maybe a ‘distributed speaker’ offers a solution, but is it as good as a conventional speaker in a suitable room? And is it really necessary, anyway? I think some people may be misguidedly attempting to achieve ‘perfect’ measurements by, effectively, eliminating the room from the sound even though their room is perfectly fine. How many people are intrigued by the CBT speaker above simply because it offers ‘better’ conventional in-room measurements, regardless of whether it is necessary?


‘Distributed speakers’ that use large, or multiple, transducers may achieve what they set out to do superficially, but are they free of side-effects?

I don’t have scientific proof, but I remain convinced that the ‘Rolls Royce’ of listening is ‘point source’ monopole speakers in a large, carpeted, furnished room with a high ceiling. Box speakers with multiple drivers of different sizes are small enough to be regarded as very close to a single transducer, yet are not so omnidirectional that they create too much reverberation. The acoustic ‘throw’ they produce is fairly ‘natural’. In other words, for stereo perfection, I think there is still a good chance that the types of rooms and speakers people were listening to in the 1970s remain optimal.

[Last edited 17.30 BST 09/05/17]

The Logic of Listening Tests

Casual readers may not believe this, but in the world of audiophilia there are people who enjoy organising scientific listening tests – or more aptly ‘trials’. These involve assembling panels of human ‘subjects’ to listen to snippets of music played through different setups in double blind tests, pressing buttons or filling in forms to indicate audible differences and preferences. The motivation is often to use science to debunk the ideas of a rival group, who may be known as ‘subjectivists’ or ‘objectivists’, or to confirm the ideas of one’s own group.

There are many, many inherent reasons why such listening tests may not be valid e.g.

  • no one can demonstrate that the knowledge that you are taking part in an experiment doesn’t impede your ability to hear differences
  • a participant who has his own agenda may choose to ‘lie’, pretending he is not hearing differences when he, in fact, is
  • etc. etc.

The tests are difficult and tedious for the participants, and no one who holds the opposing viewpoint will be convinced by the results. At a logical level, they are dubious. So why bother to do the tests? I think it is an ‘appeal to a higher authority’ to arbitrate an argument that cannot be solved any other way. ‘Science’ is that higher authority.

But let’s look at just the logic.

We are told that there are two basic types of listening test:

  1. Determining or identifying audible difference
  2. Determining ‘preference’

Presumably the idea is that (1) suggests whether two or more devices or processes are equivalent, or whether their insertion into the audio chain is audibly transparent. If a difference is identified, then (2) can make the information useful and tell us which permutation sounds best to a human. Perhaps there is a notion that in the best case scenario a £100 DAC is found to sound identical to a £100,000 DAC, or that if they do sound different, the £100 DAC is preferred by listeners. Or vice versa.

But would anything actually have been gained by a listening test over simple measurements? A DAC has a very specific, well-defined job to do – we are not talking about observing the natural world and trying to work out what is going on. With today’s technology, it is trivial to make a DAC that is accurate to very close objective tolerances for £100 – it is not necessary to listen to it to know whether it works.

For two DACs to actually sound different, they must be measurably quite far apart. At least one of them is not even close to being a DAC: it is, in fact, an effects box of some kind. And such are the fundamental uncertainties in all experiments that involve asking humans how they feel that it is entirely possible, in a preference-based listening test, for the listeners to prefer the sound of the effects box.

Or not. It depends on myriad unstable factors. An effects box that adds some harmonic distortion may make certain recordings sound ‘louder’ or ‘more exciting’ thus eliciting a preference for it today – with those specific recordings. But the experiment cannot show that the listeners wouldn’t be bored with the effect three hours, days or months down the line. Or that they wouldn’t hate it if it happened to be raining. Or if the walls were painted yellow, not blue. You get the idea: it is nothing but aesthetic judgement, the classic condition where science becomes pseudoscience no matter how ‘scientific’ the methodology.

The results may be fed into statistical formulae and the handle cranked, allowing the experimenter to declare “statistical significance”, but this is just the usual misunderstanding of statistics, which are only valid under very specific mathematical conditions. If your experiment is built on invalid assumptions, the statistics mean nothing.
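To make the “handle cranking” concrete, here is the kind of calculation that usually sits behind a claim of significance in a forced-choice test. This is an assumption about the typical case, not a description of any particular study: a one-sided binomial test, with made-up numbers of trials and correct answers.

```python
import math


def p_value_at_least(correct, trials, chance=0.5):
    """Probability of getting `correct` or more right by guessing alone.

    Only meaningful if every trial is independent and the chance of a
    correct guess is the same fixed value on every trial.
    """
    return sum(math.comb(trials, k) * chance ** k * (1 - chance) ** (trials - k)
               for k in range(correct, trials + 1))


# Hypothetical result: 12 correct out of 16 trials
print(round(p_value_at_least(12, 16), 4))
```

The number that comes out is only as good as the two conditions in the docstring. A listener who tires, adapts, learns, or has an agenda breaks them, and the formula has no way of knowing.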

If we think it is acceptable for a ‘DAC’ to impose its own “effects” on the sound, where do we stop? Home theatre amps often have buttons labelled ‘Super Stereo’ or ‘Concert Hall’. Before we go declaring that the £100,000 DAC’s ‘effect’ is worth the money, shouldn’t we also verify that our experiment doesn’t show that ‘Super Stereo’ is even better? Or that a £10 DAC off Amazon isn’t even better than that? This is the open-ended illogicality of preference-based listening tests.

If the device is supposed to be a “DAC”, it can do no more than meet the objective definition of a DAC to a tolerably close degree. How do we know what “tolerably close” is? Well, if we were to simulate the known, objective, measured error, and amplify it by a factor of a hundred, and still fail to be able to hear it at normal listening levels in a quiet room, I think we would have our answer. This is the one listening test that I think would be useful.
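Preparing such a test signal is straightforward in principle. The sketch below assumes we already have the ideal output and the measured output as time-aligned, level-matched sample lists (getting to that point is the hard part and is not shown). The “DAC error” here is invented: a second harmonic at −100 dB.

```python
import math


def error_signal(ideal, measured):
    """Sample-by-sample difference between measured and ideal output."""
    if len(ideal) != len(measured):
        raise ValueError("signals must be the same length")
    return [m - i for i, m in zip(ideal, measured)]


def with_amplified_error(ideal, measured, factor=100.0):
    """Ideal signal plus the error scaled up by `factor` (100x is 40 dB)."""
    return [i + factor * e for i, e in zip(ideal, error_signal(ideal, measured))]


# Hypothetical example: 1 kHz tone, with an invented second harmonic at -100 dB
sample_rate = 48000
ideal = [math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(480)]
measured = [s + 1e-5 * math.sin(2 * math.pi * 2000 * n / sample_rate)
            for n, s in enumerate(ideal)]

test_signal = with_amplified_error(ideal, measured)
peak_error = max(abs(t - i) for t, i in zip(test_signal, ideal))
print(f"amplified error peak: {20 * math.log10(peak_error):.1f} dB re full scale")
```

An error that started at −100 dB ends up at −60 dB in the test signal. If listeners still cannot pick it out at normal levels in a quiet room, the unamplified error has a 40 dB safety margin.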

Room correction. What are we trying to achieve?

The short version…

The recent availability of DSP is leading some people to assume that speakers are, and have always been, ‘wrong’ unless EQ’ed to invert the room’s acoustics.

In fact, our audio ancestors didn’t get it wrong. Only a neutral speaker is ‘right’, and the acoustics of an average room are an enhancement to the sound. If we don’t like the sound of the room, we must change the room – not the sound from the speaker.

DSP gives us the tools to build a more neutral speaker than ever before.

There are endless discussions about room correction, and many different commercial products and methods. Some people seem to like certain results while others find them a little strange-sounding.

I am not actually sure what it is that people are trying to achieve. I can’t help but think that if someone feels the need for room correction, they have yet to hear a system that sounds so good that they wouldn’t dream of messing it up with another layer of their own ‘EQ’.

Another possibility is that they are making an unwarranted assumption based on the fact that there are large objective differences between the recorded waveform and what reaches the listener’s ears in a real room. That must mean that no matter how good it sounds, there’s an error. It could sound even better, right?


A reviewer of the Kii Three found that this particularly neutral speaker sounded perfect straight out of the box.

“…the traditional kind of subjective analysis we speaker reviewers default to — describing the tonal balance and making a judgement about the competence of a monitor’s basic frequency response — is somehow rendered a little pointless with the Kii Three. It sounds so transparent and creates such fundamentally believable audio that thoughts of ‘dull’ or ‘bright’ seem somehow superfluous.”

The Kii Three does, however, offer a number of preset “contour” EQ options. As I shall describe later, I think that a variation on this is all that is required to refine the sound of any well-designed neutral speaker in most rooms.

A distinction is often made between correction of the bass and higher frequencies. If the room is large, and furnished copiously, there may be no problem to solve in either case, and this is the ideal situation. But some bass manipulation may be needed in many rooms. At a minimum, the person with sealed woofers needs the roll-off at the bottom end to start at about the right frequency for the room. This, in itself, is a form of ‘room correction’.

The controversial aspect is the question of whether we need ‘correction’ higher up. Should it be applied routinely (some people think so), as sparingly as possible, or not at all? And if people do hear an improvement, is that because the system is inherently correcting less-than-ideal speakers rather than the room?

Here are some ways of looking at the issue.

  1. Single room reflections give us echoes, while multiple reflections (of reflections) give us reverberation. Performing a frequency response measurement with a neutral transducer and analysing the result may show a non-flat FR at the listening position even when smoothed fairly heavily. This is just an aspect of statistics, and of the geometry and absorptivity of the various surfaces in the room. Some reflections will result in some frequencies summing in phase, to some extent, and others not.
  2. Experience tells us that we “hear through” the room to any acoustic source. Our hearing appears not to be just a frequency response analyser, but can separate direct sound from reflections. This is not a fanciful idea: adaptive software can learn to do the same thing.

The idea is also supported by some of the great and the good in audio.

Floyd Toole:

“…we humans manage to compensate for many of the temporal and timbral variations contributed by rooms and hear “through” them to appreciate certain essential qualities of sound sources within these spaces.”

Or Meridian’s Bob Stuart:

“Our brains are able to separate direct sound from the reverberation…”

  3. If we EQ the FR of the speaker to obtain a flat in-room measured response including the reflections in the measurement, it seems that we will subsequently “hear through” the reflections to a strangely-EQ’ed direct sound. It will, nevertheless, measure ‘perfectly’.
  4. Audio orthodoxy maintains that humans are supremely insensitive to phase distortion, and this is often compounded with the argument that room reflections completely swamp phase information so it is not worth worrying about. This denies the possibility that we “hear through” the room. Listening tests in the past that purportedly demonstrated our inability to hear the effects of phase have often been based on mono only, and didn’t compare distorted with undistorted phase examples – merely distorted versus differently distorted, played on the then available equipment.
  5. Contradicting (4), audiophiles traditionally fear crossovers because the phase shifts inherent in (non-DSP) crossovers are, they say, always audible. DSP, on the other hand, allows us to create crossovers without any phase shift i.e. they are ‘transparent’.
  6. At a minimum, speaker drivers on their baffles should not ‘fight’ each other through the crossover – their phases should be aligned. The appropriate delays then ensure that they are not ‘fighting’ at the listener’s position. The next level in performance is to ensure that their phases are flat at all frequencies i.e. linear phase. The result of this is the recorded waveform preserved in both frequency and time.
  7. Intuitively, genuine stereo imaging is likely to be a function of phase and timing. Preserving that phase and timing should probably be something we logically try to do. We could ‘second guess’ how it works using traditional rules of thumb, deciding not to preserve the phase and timing, but if it is effectively cost-free to do it, why not do it anyway?
  8. A ‘perfect’ response from many speaker/room combinations can be guaranteed using DSP (deconvolution with the impulse response at that point, not just playing with a graphic equaliser). Unfortunately, it will only be valid for a single point in space, and moving 1mm from there will produce errors and unquantifiable sonic effects. Additionally, ‘perfect’ refers to the ‘anechoic chamber’ version of the recording, which may not be what most people are trying to achieve even if the measurements they think they seek mean precisely that.
  9. Room effects such as (moderate) reverberation are a major difference between listening with speakers versus headphones, and are actually desirable. ‘Room correction’ would be a bad thing if it literally removed the room from the sound. If that is the case, what exactly do we think ‘room correction’ is for?
  10. Even if the drivers are neutral (in an anechoic situation) and crossed over perfectly on axis, they are of finite size and mounted in a box or on a baffle that has a physical size and shape. This produces certain frequency-dependent dispersion characteristics which give different measured, and subjective, results in different rooms. Some questions are:
    • is this dispersion characteristic a ‘room effect’ or a ‘speaker effect’, or both?
    • is there a simple objective measurement that says one result is better than any other?
    • is there just one ‘right’ result and all others are ‘wrong’?
  11. Should room correction attempt to correct the speaker as well? Or should we, in fact, only correct the speaker? Or just the room? If so, how would we separate room from speaker in our measurements? Can they, in fact, be separated?
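The delays mentioned in the point above about drivers not ‘fighting’ at the listener’s position are simple to calculate once the distances are known. A minimal sketch, assuming we know the distance from each driver’s acoustic centre to the listener (the figures below are hypothetical):

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def alignment_delays(distances_m, sample_rate_hz):
    """Delay in samples per driver so every arrival coincides with the furthest."""
    furthest = max(distances_m.values())
    return {name: (furthest - distance) / SPEED_OF_SOUND * sample_rate_hz
            for name, distance in distances_m.items()}


# Hypothetical distances from each driver's acoustic centre to the listener
drivers = {"tweeter": 3.000, "mid": 3.015, "woofer": 3.040}
for name, delay in alignment_delays(drivers, 48000).items():
    print(f"{name}: {delay:.2f} samples")
```

The nearest driver is delayed the most and the furthest not at all. The results are fractional numbers of samples, so a real implementation would need fractional-delay filtering or would fold the delay into the crossover filters themselves.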

I think there is a formula that gives good results. It says:

  • Don’t rely on feedback from in-room measurements, but do ‘neutralise’ the speaker at the most elemental levels first. At every stage, go for the most neutral (and locally correctable) option e.g. sealed woofers, DSP-based linear phase crossovers with time alignment delays.
  • Simply avoid configurations that are going to give inherently weird results: two-way speakers, bass reflex, many types of passive crossover etc. These may not even be partially correctable in any meaningful way.
  • Phase and time alignment are sacrosanct. This is the secret ingredient. You can play with minor changes to the ‘tone colour’ separately, but your direct sound must always maintain the recording’s phase and time alignment. This implies that FIR filters must be used, thus allowing frequency response to be modified independently of phase.
  • By all means do all the good stuff regarding speaker placement, room treatments (the room is always ‘valid’), and avoiding objects and asymmetry around the speakers themselves.
  • Notionally, I propose that we wish to correct the speaker not the room. However, we are faced with a room and non-neutral speaker that are intertwined due to the fact that the speaker has multiple drivers of finite size and a physical presence (as opposed to being a point source with uniform directivity at all frequencies). The artefacts resulting from this are room-dependent and can never really be ‘corrected’ unambiguously. Luckily, a smooth EQ curve can make the sound subjectively near enough to transparent. To obtain this curve, predict the baffle step correction for each driver using modelling or a standard formula, with some trial-and-error regarding the depth required (4, 5, 6 dB?); this is a very smooth EQ curve. Or, possibly (I haven’t done this myself), make many FR measurements around the listening area, smooth and average them together, and partially invert this, again without altering phase and time alignment.
  • You are hearing the direct sound, plus separately-perceived ‘room ambience’. If you don’t like the sound of the ambience, you must change the room, not the direct sound.
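The ‘measure, smooth, average and partially invert’ option in the list above could look something like the sketch below. It is untested on real rooms, and everything specific in it is an assumption: averaging in dB rather than power, a simple moving-average smoother, an inversion strength of 0.5, a ±6 dB limit, and the made-up measurement data.

```python
def mean_response_db(measurements_db):
    """Average several measurements band by band (in dB, for simplicity)."""
    return [sum(band) / len(band) for band in zip(*measurements_db)]


def smooth(values, half_width):
    """Moving average over neighbouring bands; half_width=0 leaves values as-is."""
    out = []
    for i in range(len(values)):
        lo = max(0, i - half_width)
        hi = min(len(values), i + half_width + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def partial_inverse_eq_db(measurements_db, strength=0.5, half_width=2, max_db=6.0):
    """Smooth, shallow EQ curve: a fraction of the inverted average response."""
    average = smooth(mean_response_db(measurements_db), half_width)
    reference = sum(average) / len(average)  # ignore overall level
    return [max(-max_db, min(max_db, -strength * (value - reference)))
            for value in average]


# Hypothetical measurements (dB) at three positions, eight log-spaced bands
positions = [
    [3.0, 2.0, 1.5, 0.0, -0.5, -1.0, -2.0, -3.0],
    [4.0, 1.0, 2.0, 0.5, -1.0, -0.5, -2.5, -2.5],
    [2.0, 3.0, 1.0, -0.5, 0.0, -1.5, -1.5, -3.5],
]
print([round(g, 2) for g in partial_inverse_eq_db(positions)])
```

The resulting gains only describe a target magnitude curve. Applying them without disturbing phase and time alignment is a separate job for a linear-phase filter.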

Is there any scientific evidence for these assertions? No more nor less than any other ‘room correction’ technique – just logical deduction based on subjective experience. Really, it is just a case of thinking about what we hear as we move around and between rooms, compared to what the simple in-room FR measurements show. Why do real musicians not need ‘correction’ when they play in different venues? Do we really want ‘headphone sound’ when listening in rooms? (If so, just wear headphones or sit closer to smaller speakers).

This does not say that neutral drivers alone are sufficient to guarantee good sound – I have observed that this is not the case. A simple baffle step correction applied to frequency response (but leaving phase and timing intact) can greatly improve the sound of a real loudspeaker in a room without affecting how sharply-imaged and dynamic it sounds. I surmise that frequency response can be regarded as ‘colour’ (or “chrominance” in old school video speak), independent of the ‘detail’ (or “luminance”) of phase and timing. We can work towards a frequency response that compensates for the combination of room and speaker dispersion effects to give the right subjective ‘colour’ as long as we maintain accurate phase and timing of the direct sound.
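To show what ‘changing colour without touching timing’ means in practice, here is a minimal sketch of a linear-phase FIR filter built from a smooth shelf. The design method (frequency sampling with a Hann window) is just one simple choice, and the shelf shape, its 5 dB depth, the 1 kHz centre and the tap count are all assumed values rather than a recommended correction.

```python
import math


def shelf_gain_db(freq_hz, depth_db, centre_hz):
    """Smooth low shelf: +depth_db well below centre_hz, about 0 dB well above."""
    return depth_db / (1.0 + (freq_hz / centre_hz) ** 2)


def linear_phase_fir(gain_db_fn, num_taps, sample_rate_hz, grid=1024):
    """Frequency-sampling design: a real, zero-phase target gives symmetric taps."""
    if num_taps % 2 == 0:
        raise ValueError("use an odd number of taps")
    half = num_taps // 2
    # Target magnitude on a grid from 0 Hz to Nyquist
    mags = [10 ** (gain_db_fn(i * sample_rate_hz / (2 * grid)) / 20)
            for i in range(grid + 1)]
    taps = []
    for n in range(-half, half + 1):
        # Inverse DFT of a real, even spectrum reduces to a sum of cosines
        acc = mags[0] + mags[grid] * math.cos(math.pi * n)
        for i in range(1, grid):
            acc += 2 * mags[i] * math.cos(math.pi * i * n / grid)
        window = 0.5 + 0.5 * math.cos(math.pi * n / (half + 1))  # Hann
        taps.append(window * acc / (2 * grid))
    return taps


taps = linear_phase_fir(lambda f: shelf_gain_db(f, 5.0, 1000.0), 255, 48000)
print("taps:", len(taps), " centre tap:", round(taps[len(taps) // 2], 4))
```

Because the taps are symmetric about the centre, every frequency is delayed by the same amount (127 samples here), so the filter changes the tonal balance while leaving the relative timing of the waveform alone. A shelf lower in frequency would need many more taps to resolve properly.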

We are not (necessarily) trying to flatten the in-room FR as measured at the listener’s position – the EQ we apply is very smooth and shallow – but the result will still be perceived as a flat FR. Many (most?) existing speakers inherently have this EQ built in whether their creators applied it deliberately, or via the ‘voicing’ they did when setting the speaker up for use in an average room.

In summary:

  • Humans “hear through” the room to the direct sound; the room is perceived as a separate ‘ambience’. Because of this, ‘no correction’ really is the correct strategy.
  • Simply flattening the FR at the listening position via EQ of the speaker output is likely to result in ‘peculiar’ perceived sound, even if the in-room measurements purport to say otherwise.
  • Speakers have to be as rigorously neutral as possible by design, rather than attempting to correct them by ‘global feedback’ in the room.
  • Final refinement is a speaker/room-dependent, smooth, shallow EQ curve that doesn’t touch phase and timing – only FIR filters can do this.

[Last updated 05/04/17]

The Secret Life of the Signal

Some people actually think of stereo imaging as a “parlour trick” that is very low on the list of desirable attributes that an audio system should have. They ‘rationalise’ this by saying that in the majority of recordings, any stereo image is an artificial illusion, created by the recording engineer either deliberately or by accident; it does not accurately represent the live event – because there may not even have been a single live event. So how can it matter if it is reproduced by the playback system or not? Perhaps it is even best to suppress it: muddle it up with some inter-channel crosstalk like vinyl does, or even listen in mono.

At the top of the list of desirable attributes for a hi-fi system, most audiophiles would put “timbre”, “tonality”, low distortion, clean reproduction at high volumes, dynamics, deep bass. All of these qualities can be experienced with a mono signal and a single speaker – in fact, in the Harman Corporation’s training for listening, monophonic reproduction is recommended when performing listening tests.

Because their effects are not so obvious in mono, phase and timing are regarded by many as supremely unimportant. I quote one industry luminary:

Time domain does not enter my vocabulary…

Sound is colour?

We know that our eyes respond to detail and colour in different ways. In the early days of colour TV (analogue) it was found that the signal could be broadcast within practical bandwidths because the colour (chrominance) information could be sent at lower resolution than the detail (luminance).

There is, perhaps, a parallel in hearing, too: that humans have separate mechanisms for responding to sound in the frequency and time domains. But the conventional hi-fi industry’s implicit view is that we only hear in the frequency domain: all the main measurements are in the frequency domain, and steady state signals are regarded as equivalent to real music. A speaker’s overall response to phase and timing is ignored almost totally or, at best, regarded as a secondary issue.

I think that this is symptomatic of an idea that pervades hi-fi: that the signal is ‘colour’. Sure, it varies as the music is playing, but the exact nature of that variation is almost incidental, secondary to the accurate reproduction of colour; and in testing, all that matters is whether a uniform colour is accurately reproduced.

There has, nevertheless, been some belated lip service paid to the importance of timing, with the hype around MQA (still usually being played over speakers with huge timing errors!), and a number of passive speakers with sloping front baffles for time alignment. Taken to its logical conclusion, we have these:


Their creator says, though:

It’s nice if you have phase coherence, but it is not necessary

So they still fall short of the “straight wire with gain” ideal. It still says that the signal is something we can take liberties with, not aspiring to absolute accuracy in the detail as long as we get a good neutral white and a deep black, and all uniform (‘steady state’) colours reproduced with the correct shading. It says that we understand the signal and it is trivial. Time alignment by moving the drivers backwards and forwards is an easy gimmick, so we can go that far, however.

Another Dimension

I think that with DSP-corrected drivers and crossovers, we are beginning to find that there is another dimension to the common or garden stereo signal; one that has been viewed as a secondary effect until now. Whether created accidentally or not, the majority of recordings contain ‘imaging’ that is so clear that it gives us access to the music in a way we were not aware of. It allows us to ‘walk around’ the scene in which the recording was made. If it is a composite, multitrack recording, it may not be a real scene that ever existed, but the individual elements are each small scenes in themselves, and they become clearly delineated. It is ‘compelling’.

I can do no better than quote a brand new review of the Kii Three written by a professional audio engineer, that echoes something I was saying a couple of weeks ago: imaging is not just a ‘trick’, but improves the separation of the acoustic sources in a way that goes beyond the traditional attributes of low distortion & colouration.

I think he also echoes something I said about believable imaging giving the speaker a ‘free pass’ in terms of measurements. As in my DIY post, he says that the speaker sounds so transparent and believable that there is no point in going any further in criticising the sound. A suggestion, perhaps, that conventional ‘in-room’ measurements and ‘room correction’, are shown up as the red herrings they are if a system sets out to be genuinely neutral by design, at source.

Firstly, the traditional kind of subjective analysis we speaker reviewers default to — describing the tonal balance and making a judgement about the competence of a monitor’s basic frequency response — is somehow rendered a little pointless with the Kii Three. It sounds so transparent and creates such fundamentally believable audio that thoughts of ‘dull’ or ‘bright’ seem somehow superfluous.

… it is dominated by such a sense of realistic clarity, imaging, dynamics and detail that you begin almost to forget that there’s a speaker between you and the music.

…I’ve never heard anything anywhere near as adept at separating the elements of a mix and revealing exactly what is going on. I found myself endlessly fascinated, in particular, by the way the Kii Three presents vocals within a mix and ruthlessly reveals how good the performance was and how the voice was subsequently treated (or mistreated). Performance idiosyncrasies, microphone character, room sound, compression effects, reverb and delay techniques and pitch-correction artifacts that I’d never noticed before became blindingly obvious — it was addictive.

…One of the joys of auditioning new audio gear, especially speakers, is that I occasionally get to rediscover CDs or mixes that I thought I knew intimately. I can honestly say that with the Kii Three, every time I played some old familiar material I heard something significant in the way it performs…

…Low-latency mode …switch[es] off the system phase correction. It makes for a fascinating listening experience. …the change of phase response is clearly audible. The monitor loses a little of its imaging ability and overall precision in low-latency mode so that things sound a little less ‘together’.

“The Kii Three is one of the finest speakers I’ve ever heard and undoubtedly the best I’ve ever had the privilege and pleasure of using in my own home.”

Neural Adaptation

Just an interesting snippet regarding a characteristic of human hearing (and all our senses). It is called neural adaptation.

Neural adaptation or sensory adaptation is a change over time in the responsiveness of the sensory system to a constant stimulus. It is usually experienced as a change in the stimulus. For example, if one rests one’s hand on a table, one immediately feels the table’s surface on one’s skin. Within a few seconds, however, one ceases to feel the table’s surface. The sensory neurons stimulated by the table’s surface respond immediately, but then respond less and less until they may not respond at all; this is an example of neural adaptation. Neural adaptation is also thought to happen at a more central level such as the cortex.

Fast and slow adaptation
One has to distinguish fast adaptation from slow adaptation. Fast adaptation occurs immediately after stimulus presentation i.e., within 100s of milliseconds. Slow adaptive processes take minutes, hours or even days. The two classes of neural adaptation may rely on very different physiological mechanisms.

Auditory adaptation, as perceptual adaptation with other senses, is the process by which individuals adapt to sounds and noises. As research has shown, as time progresses, individuals tend to adapt to sounds and tend to distinguish them less frequently after a while. Sensory adaptation tends to blend sounds into one, variable sound, rather than having several separate sounds as a series. Moreover, after repeated perception, individuals tend to adapt to sounds to the point where they no longer consciously perceive it, or rather, “block it out”.

What this says to me is that perceived sound characteristics are variable depending on how long the person has been listening, and to what sequence of ‘stimuli’. Our senses, to some extent, are change detectors, not ‘direct coupled’.

Something of a conundrum for listening-based audio equipment testing..? Our hearing begins to change the moment we start listening. It becomes desensitised by repeated exposure to a sound, yet repeated exposure is one of the cornerstones of many types of listening-based testing.

Auditory Scene Analysis

There is a field of study called Auditory Scene Analysis (ASA) that postulates that humans interpret “scenes” using sound just as they do using vision. I am not sure that it necessarily has any particular bearing on the way that audio hardware should be designed: basically the scene is all the clearer if the reproduction of the audio is clean in terms of noise, channel separation, distortion, frequency response and (seemingly controversial to hi-fi folk) the time domain.

However, the seminal work in this field includes the following analogy for hearing:

Your friend digs two narrow channels up from the side of a lake. Each is a few feet long and a few inches wide and they are spaced a few feet apart. Halfway up each one, your friend stretches a handkerchief and fastens it to the sides of the channel. As the waves reach the side of the lake they travel up the channels and cause the two handkerchiefs to go into motion. You are allowed to look only at the handkerchiefs and from their motions to answer a series of questions: How many boats are there on the lake and where are they? Which is the most powerful one? Which one is closer? Is the wind blowing? Has any large object been dropped suddenly into the lake?

Of course, when we listen to reproduced music with an audio system we are, in effect, duplicating the motion of the handkerchiefs using two paddles in another lake (our listening room) and watching the motion of a new pair of handkerchiefs. Amazingly, it works! But the key to this is that the two lakes are well-defined linear systems. Our brains can ‘work back’ to the original sounds using a process akin to ‘blind deconvolution’.

If we want to, we can eliminate the ‘second lake’ by using headphones, or we can almost eliminate it by using an anechoic chamber. We could theoretically eliminate it at a single point in space by deconvolving the reproduced signal with the measured impulse response of the room at that point. Listening with headphones works OK, but listening to speakers in a dead acoustic sounds terrible – probably to do with ‘head related transfer function’ (HRTF) telling us that we are listening to a ‘real’ acoustic but with an absence of the expected acoustic cues when we move our heads. By adding the ‘second lake’ we create enough ‘real acoustic’ to overcome that.
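To make the 'deconvolve at a single point' idea concrete, here is a toy sketch in Python. The function names and the two short 'room' responses are made up for illustration; a real room response runs to many thousands of samples and generally cannot be inverted this simply, so this is not a room-correction recipe. It only shows that undoing a known convolution recovers the original at the measured point, and that the same correction is wrong anywhere else.

```python
def convolve(signal, impulse_response):
    """Direct-form convolution: what the room does to the signal."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def deconvolve(observed, impulse_response, length):
    """Undo a known convolution sample by sample (needs a non-zero first tap)."""
    h = impulse_response
    if h[0] == 0:
        raise ValueError("first tap of the impulse response must be non-zero")
    recovered = []
    for n in range(length):
        acc = observed[n]
        # Subtract the contribution of earlier, already-recovered samples.
        for k in range(1, min(n, len(h) - 1) + 1):
            acc -= h[k] * recovered[n - k]
        recovered.append(acc / h[0])
    return recovered


# Toy data, purely illustrative: a short 'recording' and two invented room responses.
dry = [1.0, -0.5, 0.25, 0.0, 0.75]
room_at_seat = [1.0, 0.0, 0.4, 0.0, -0.2]
room_elsewhere = [1.0, 0.3, 0.0, -0.1, 0.0]

heard_at_seat = convolve(dry, room_at_seat)
heard_elsewhere = convolve(dry, room_elsewhere)

# Apply the correction derived from the measurement at the seat to both positions.
fixed_at_seat = deconvolve(heard_at_seat, room_at_seat, len(dry))
fixed_elsewhere = deconvolve(heard_elsewhere, room_at_seat, len(dry))

print("at the measured point:", fixed_at_seat)
print("somewhere else:       ", fixed_elsewhere)
```

At the measured point the original samples come back; at the other position they do not, which is the 'single point in space' limitation in miniature.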

But here is why ‘room correction’ is flawed. The logical conclusion of room correction is to simulate headphones, but this cannot be achieved – and is not what most listeners want anyway, even if they don’t know it. Instead, an incomplete ‘correction’ is implemented based on the idea of trying to make the motion of the two sets of ‘handkerchiefs’ closer to each other than they (in naive measurements) appear to be. If the idea of the brain ‘working back’ to the original sound is correct, it will ‘work back’ to a seemingly arbitrarily modified recording. Modifying the physical acoustics of the room is valid whereas modifying the signal is not.

I think the problem stems ultimately from an engineering tool (frequency domain measurement) proliferating due to cheap computing power. There is a huge difference in levels of understanding between the author of the ASA book and the audiophiles and manufacturers who think that the sound is improved by tweaking graphic equalisers in an attempt to compensate for delays that the brain has compensated for already.

The musical ‘observer effect’

In scientific audio circles, it is believed that if you are aware (or think you are aware) of what hardware you are listening to, then you are incapable of any sort of objective assessment of its quality. This leads to the blind listening test being held up as the Gold Standard for audio science.

But here’s an irony: almost everything of value that man creates comes into being through a process of ‘sighted’ creation and refinement – and it seems to work. Bridges are designed by architects who refine CAD models on a screen, but the finished products don’t fall down, and are admired by ordinary people for their appearance. Car bodies are designed by engineers and stylists in full sight, yet the holes line up with the rest of the car, and they achieve great measurements for aerodynamics and the cars look good as well. Pianos are tuned by people who know which way they are turning the lever as they listen.

So if ‘sighted-ness’ leads to a completely fictitious, imaginary perception, then presumably our pianos are not really in tune, but we imagine they are? Maybe everyone but the piano tuner would hear an out-of-tune cacophony when the piano is played? But no, it turns out that everyone, including the piano tuner, can tell consistently when a piano is in tune without resorting to blind tests, and this can be confirmed with measurements.

So how come ‘sightedness’ is so problematic for the creation or assessment of audio equipment? I think that the question is “not even wrong”. The faulty logic lies in the erroneous idea that audio equipment is being listened to, as opposed to through, and that the human brain when listening to music is similar to a microphone. There is no reason to believe this at all; to me, it is just as likely that the brain is acting as an acquirer and interpreter of symbols. The quality of the sound is part of the symbol’s meaning, but cannot be examined in isolation.

As a result, it may just be that there is no way for a human listener to reliably discern anything but the most obvious audio differences in A/B/X listening tests. Using real music, the listener may be perceiving sound quality differences as changes in the perceived meaning of the symbols, but repeated listenings (like reading a phrase over and over), or listening to extracts out of context, kills all meaning and therefore kills any discernment of sound quality. Consciously listening for differences as opposed to listening to the music, pressing buttons while listening, breaking the flow of the music in any way, all have a similar effect. Alternatively, using electronic bleeps, or randomised snippets as the ‘test signal’, the listener is effectively hearing a stream of noise without any context or meaning, so the brain has nothing to attach the sound quality to at all.

In effect, the act of listening for sound quality in scientific trials may kill our ability to discern sound quality. Can this be proved either way? No.

I don’t see this as a problem to be ‘solved’; it is simply the kind of paradox that pops up when you start thinking about consciousness. Music has no evolutionary survival value, but we enjoy listening to it anyway – so we are in Weirdsville already. The extreme ‘objectivists’ who hold up ABX testing as science are extremely unimaginative if they think their naïve experiments and dull statistical formulae are a match for human consciousness.

Within the limitations of their chosen technology, most hi-fi systems are created with the aim of being ‘transparent’ to levels that exceed the known limitations of the physiology of the ear, and people seem keen to buy them. Without referring to scientific listening test data, the customers know that, in normal use, proper hi-fi does sound better than an iPod dock with a 2″ speaker. But, as their own preference for the sound can’t be proved scientifically because of ‘the observer effect’, and because a human is bound to be influenced by factors other than the sound, at some level they have to buy their hi-fi equipment ‘on faith’; maybe being influenced by the look of it, or because they believe the meme that vinyl is superior to digital. So be it. But they may find that, later, the system fails to meet their expectations and they are on a ruinous treadmill of “tweaks” and “upgrades”.

On a strictly rational basis, bypassing all that anguish, the new generation of DSP-based speakers gets even closer to the ideal of transparency by virtue of superior design – no listening tests required. I am confident they will sound great when being used for their intended purpose.

[Last edited 06/08/16]

Thoughts on creating stuff


The mysterious driver at the bottom is the original tweeter left in place to avoid having to plug the hole

I just spent an enjoyable evening tuning my converted KEF Concord III speakers. Faced with three drivers in a box, I was able to do the following:

  • Make impulse response measurements of the drivers – near and far field as appropriate to the size and frequency ranges of the drivers (although it’s not a great room for making the far field measurements in)
  • Apply linear phase crossovers at 500 Hz and 3100 Hz with a 4th order slope. Much scope for changing these later.
  • Correct the drivers’ phase based on the measurements.
  • Apply baffle step compensation using a formula based on baffle width.
  • Trim the gain of each driver.
  • Adjust delays by ear to get the ‘fullest’ pink noise sound over several positions around the listening position.
  • ‘Overwrite’ the woofer’s natural response to obtain a new corner frequency at 40 Hz with a 12 dB per octave roll-off.
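
For anyone wondering what a 'linear phase crossover' looks like in practice, here is a minimal sketch of one split at 500 Hz. It is not the filter set used in these speakers: the 48 kHz sample rate, the 255 taps and the Hann-windowed sinc design are all assumptions picked for brevity, and a windowed sinc does not give a 4th order slope. What it does show is the property that matters: the taps are symmetric (so every frequency is delayed equally) and the low and high branches add back up to a pure delay.

```python
import math


def windowed_sinc_lowpass(cutoff_hz, sample_rate_hz, num_taps):
    """Linear-phase FIR low-pass: Hann-windowed sinc, normalised to unity gain at DC."""
    if num_taps % 2 == 0:
        raise ValueError("use an odd tap count so the delay is a whole number of samples")
    mid = (num_taps - 1) // 2
    fc = cutoff_hz / sample_rate_hz  # cutoff as a fraction of the sample rate
    taps = []
    for n in range(num_taps):
        k = n - mid
        ideal = 2.0 * fc if k == 0 else math.sin(2.0 * math.pi * fc * k) / (math.pi * k)
        window = 0.5 - 0.5 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]


def crossover_pair(cutoff_hz, sample_rate_hz, num_taps):
    """Return (low, high) FIR taps; high is a delayed impulse minus the low-pass."""
    low = windowed_sinc_lowpass(cutoff_hz, sample_rate_hz, num_taps)
    mid = (num_taps - 1) // 2
    high = [(1.0 if n == mid else 0.0) - t for n, t in enumerate(low)]
    return low, high


# Assumed values for illustration only.
low, high = crossover_pair(500.0, 48000.0, 255)
summed = [a + b for a, b in zip(low, high)]
print("delay in samples:", (len(low) - 1) // 2)
print("peak of summed response:", max(summed))
```

The price of this tidiness is latency: the whole system is delayed by half the filter length, which is why a longer, steeper filter costs more delay.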

The KEFs are now sounding beautiful although I didn’t do any room measurements as such – maybe later. Instead, I have been using more of a ‘feedforward’ technique i.e. trust the polypropylene drivers to behave over the narrow frequency ranges we’re using, and don’t mess about with them too much.

The benefits of good imaging

There is lovely deep bass, and the imaging is spectacular – even better than my bigger system. There really is no way to tell that a voice from the middle of the ‘soundstage’ is coming from anywhere but straight ahead and not from the two speakers at the sides. As a result, not only are the individual acoustic sources well separated, but the acoustic surroundings are also reproduced better. These aspects, I think, may be responsible for more than just the enjoyment of hearing voices and instruments coming from different places: I think that imaging, when done well, may trump other aspects of the system. Poorly implemented stereo is probably more confusing to the ear/brain than mono, leaving the listener in no doubt that they are listening to an artificial system. With good stereo, it becomes possible to simply listen to music without thinking about anything else.

Build a four way?

In conjunction with the standard expectation bias warning, I would say the overall sound of the KEFs (so far) is subtly different from my big system and I suspect the baffle widths will have something to do with this – as well as the obvious fact that the 8 inch woofers have got half the area of 12 inch drivers, and the enclosures are one third the volume.

A truly terrible thought is taking shape, however: what would it sound like if I combined these speakers with the 12 inch woofers and enclosures from my large system, to make a huge four way system..? No, I must put the thought out of my head…

The passive alternative

How could all this be done with passive crossovers? How many iterations of the settings did it take me to get to here? Fifty maybe? Surely it would be impossible to do anything like this with soldering irons and bits of wire and passive components. I suppose some people would say that with a comprehensive set of measurements, it would be possible to push a button on a computer and get it to calculate the optimum configuration of resistors, capacitors and inductors to match the target response. Possibly, but (a) it can never work as well as an active system (literally, it can’t – no point in pretending that the two systems are equivalent), and (b) you have to know what your target response is in the first place. It must surely always be a bit of an art, with multiple iterations needed to home in on a really good ‘envelope’ of settings – I am not saying that there is some unique golden combination that is best in every way.

In developing a passive system, every iteration would take between minutes and hours to complete and I don’t think you would get anywhere near the accuracy of matching of responses between adjacent drivers and so on. I wouldn’t even attempt such a thing without first building a computerised box of relays and passive components that could automatically implement the crossover from a SPICE model or whatever output my software produced – it would be quite a big box, I think. (A new product idea?)

Something real

With these KEFs, I feel that I have achieved something real which, I think, contrasts strongly with the preoccupations of many technically-oriented audio enthusiasts. In forums I see threads lasting tens or even hundreds of pages concerning the efficacy of USB “re-clockers” or similar. Theory says they don’t do anything; measurements show they don’t do anything (or even make things worse with added ground noise); enthusiasts claim they make a night and day improvement to the sound -> let’s have a listening test; it shows there is no improvement; there must have been something wrong with the test -> let’s do it again.

Or investigations of which lossless file format sounds best. Or which type of ethernet cable is the most musical.

Then there’s MQA and the idea that we must use higher sample rates and ‘de-blurring’ because timing is critical. Then the result is played through passive speakers with massive timing errors between the drivers.

All of these people have far more expertise than me in everything to do with audio, yet they spend their precious time on stuff that produces, literally, nothing.

New bass drivers for KEF Concords

Finally got round to ordering some better bass drivers for the KEF Concord III conversion at the very high end price of £19 each.

They’re Skytronic 902.208 8″ polypropylene drivers, and as you can see, they’re quite a bit beefier magnet-wise than the Peerless SKO200.

There seems to be some confusion about the Thiele Small parameters for this driver. As far as I can tell, the ones here are correct. It probably works out that the 30l KEF cabinets are too small, and we end up with a Q of 0.97. No matter.
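
For reference, the usual textbook relations for a driver in a sealed box are fc = fs·√(1 + Vas/Vb) and Qtc = Qts·√(1 + Vas/Vb). The sketch below just evaluates them. The driver figures are placeholders and NOT the 902.208's parameters, and the formulas ignore stuffing and box losses, so treat the numbers as illustrative only.

```python
import math


def sealed_box(fs_hz, qts, vas_litres, box_litres):
    """Return (fc, Qtc) of a driver in a sealed box using the standard lossless relations."""
    factor = math.sqrt(1.0 + vas_litres / box_litres)
    return fs_hz * factor, qts * factor


# Placeholder driver values, hypothetical and for illustration only.
FS_HZ, QTS, VAS_LITRES = 30.0, 0.45, 60.0

for box_litres in (30.0, 60.0, 120.0):
    fc, qtc = sealed_box(FS_HZ, QTS, VAS_LITRES, box_litres)
    print(f"{box_litres:5.0f} l box: fc = {fc:5.1f} Hz, Qtc = {qtc:4.2f}")
```

The trend is the point: the smaller the box relative to Vas, the higher both the corner frequency and the Q.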

I have measured the driver in the cabinet in the nearfield, and attempted to correct it for phase and amplitude, and then modified the filter to give me a driver with 38 Hz corner frequency and a roll-off at 12dB per octave. The cones move quite a lot sometimes, but the sound is good.
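
A rough way to see why the cones move so much is to compare two idealised second-order high-pass responses, the 'natural' one and the target, and look at the gain needed to turn one into the other. This is only a sketch and not the measured correction I actually applied: the 60 Hz natural corner and the target Q of 0.707 are assumptions, while the natural Q of 0.97 and the 38 Hz target come from the text above.

```python
import math


def highpass2_gain(freq_hz, corner_hz, q):
    """Magnitude of a second-order (12 dB/octave) high-pass at freq_hz."""
    r = freq_hz / corner_hz
    return r * r / math.sqrt((1.0 - r * r) ** 2 + (r / q) ** 2)


def to_db(gain):
    return 20.0 * math.log10(gain)


def correction_db(freq_hz, natural_hz, natural_q, target_hz, target_q):
    """Gain in dB that turns the natural response into the target response."""
    target = highpass2_gain(freq_hz, target_hz, target_q)
    natural = highpass2_gain(freq_hz, natural_hz, natural_q)
    return to_db(target) - to_db(natural)


NATURAL_HZ = 60.0            # assumed in-box corner frequency, hypothetical
NATURAL_Q = 0.97             # the in-box Q quoted above
TARGET_HZ = 38.0             # the new corner frequency
TARGET_Q = 1.0 / math.sqrt(2.0)  # assumed Butterworth alignment

for f in (20.0, 30.0, 38.0, 60.0, 100.0, 200.0):
    boost = correction_db(f, NATURAL_HZ, NATURAL_Q, TARGET_HZ, TARGET_Q)
    print(f"{f:6.1f} Hz: {boost:+5.1f} dB")
```

Well below both corners the required boost settles at 40·log10(natural corner / target corner), which for these assumed figures is nearly 8 dB, and that extra drive arrives exactly where cone excursion is already at its greatest.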


902.208 mounted in place on the KEF III. The diameter of this driver rim is smaller than both the originals and the previous Peerless replacements, hence the need to clamp the driver as there isn’t sufficient wood to screw into.

An audio breakthrough

It would appear that there is a particular audiophile DAC with a cult following that gets rave reviews and costs over $2000, and is based on a non-audio DAC chip.

Why would they do that? Well, I think it is so they can run it “NOS” (not New Old Stock, but “non-oversampled”) and add their own “proprietary” filtering – plus it’s different from what the hoi polloi uses so it must be better. But, it would appear that someone has found a glitch, literally.

I am no expert, but I think that because this chip is a non-audio DAC, the output comes directly from an R-2R ladder, or similar. Small capacitive charges are transferred whenever the ladder switches operate, and sometimes the switches don’t all operate at the same speed. This means there is a glitch at the output whenever the DAC value changes, and it is worst when all the switches operate simultaneously i.e. when the most significant bit changes – around the mid range in other words (hmm…). Presumably there are other significant glitches at multiples of 1/4 full scale and 1/8 full scale too.
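
The 'how many switches move at once' part of this is easy to check with a few lines of Python. The sketch assumes a plain binary-coded ladder with one switch per bit and a 16-bit word, both of which are assumptions (real parts often segment the upper bits, and I am not claiming this is how the chip in question is built). It counts switches only; actual glitch energy depends on the analogue behaviour.

```python
def switches_changed(code_a, code_b):
    """Number of ladder switches that change state between two input codes."""
    return bin(code_a ^ code_b).count("1")


def carry_transition(bits, numerator, denominator):
    """Switch changes when stepping up by one code onto a fraction of full scale."""
    code = (1 << bits) * numerator // denominator
    return switches_changed(code - 1, code)


BITS = 16  # hypothetical resolution, chosen only for illustration

for label, num, den in [("1/8", 1, 8), ("1/4", 1, 4), ("1/2", 1, 2), ("3/4", 3, 4)]:
    count = carry_transition(BITS, num, den)
    print(f"onto {label} full scale: {count} of {BITS} switches change")

print("ordinary single step:", switches_changed(0x8000, 0x8001), "switch changes")
```

Every switch changes at mid scale, all but one at the quarter points, all but two at the eighths, and so on, whereas most single-code steps move only one or two.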

Low pass filtering the output can reduce the amplitude of the glitch at the expense of increasing the settling time. There are better techniques using a further piece of circuitry (a sample-and-hold) but, apparently, the designers regarded this as unacceptable for some reason (why?), and at audio frequencies it still wouldn’t be as good as a typical $1 audio DAC in a mobile phone.

The evidence is all in the DAC chip’s data sheet:


I don’t know whether the glitch energy scales with the VREF (i.e. the full scale signal amplitude), but this glitch is huge compared to the smallest signals that we might generate with the DAC.

An owner of this product now thinks he is hearing a certain harshness in the sound, and seems to have found that when reproducing a sine wave at -90dBFS, the output of the $2000 DAC contains significant glitches at the zero crossings. It would be interesting to know if there are detectable glitches at 1/4 and 1/8 full scale, too. This could be the phenomenon shown in the data sheet, or a by-product of whatever mechanism is being used, unsuccessfully, to suppress the glitches – they are rumoured to be using a combination of two DAC chips. Scrutiny of other reviews and measurements of the device seems to reveal distortion and noise figures that suggest something strange is going on – apparently.
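
To put -90dBFS into perspective, it helps to express the peak of such a sine wave in least significant bits. The sketch below does the arithmetic for a few word lengths; these are hypothetical, since I am not stating the resolution of the DAC in question.

```python
def peak_in_lsb(level_dbfs, bits):
    """Peak amplitude of a sine at level_dbfs, in LSBs of a signed converter."""
    full_scale_peak = 2 ** (bits - 1)
    return full_scale_peak * 10 ** (level_dbfs / 20)


for bits in (16, 20, 24):  # hypothetical resolutions, for illustration only
    print(f"{bits} bits: about {peak_in_lsb(-90.0, bits):.2f} LSB peak")
```

At 16 bits the signal is barely one LSB tall, and even at longer word lengths it never strays far from zero, so every cycle crosses the mid-scale transition, which is where the glitch would be expected to be largest.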

An aspect of integrated circuit DACs is that because they are very small and constructed on a single chip, they have fantastic performance relative to themselves i.e. they remain monotonic and linear at all times. However, their absolute gain and offset may drift slightly with temperature. These temperature coefficients vary from chip to chip and can even be positive for one chip and negative for another (this appears to be the case for this particular DAC chip according to the data sheet). This means that any attempt to blend the outputs of two DAC chips externally using a combination of scaling, offsetting, inverting, mixing and interleaving would be most unlikely to succeed down at the lowest levels.

If these suppositions are correct, then this product is a great example of where the basic engineering of a basic product appears to have been sacrificed in the interests of just making something ‘different’ and supposedly ‘simpler’ – although as usual it ends up being more complicated.

[Last edited 04/05/16]