A difference doesn’t have to be audible to matter

A common view among scientifically-oriented audiophiles is that controlled, double blind listening tests are equivalent to objective measurements. Such people may be further subdivided into those who believe that ‘preference’ is a genuine indicator of what matters, and those who believe that only ‘difference’ can count as real science in listening tests.

I can think of many, many philosophical objections to the whole notion that listening tests are ‘scientific’, but concentrating just on that supposedly rigorous idea of ‘difference’ being scientific, we might suggest the following analogy:

Suppose there is a scene in a film that shows a thousand birds wheeling over a landscape. The emotional response is to see the scene as ‘magnificent’. In this case, the ‘magnificence’ stems from the complexity; the order emerging out of what looks like chaos; the amazing spectacle of so many similar creatures in one place. It would be reasonable, perhaps, to suggest that the ‘magnificence’ is more-or-less proportional to the number of birds.

Well, suppose we wish to stream that scene over the internet in high definition. The bandwidth required to do this would be prohibitive so we feed it into a lossy compression algorithm. One of the things it does is to remove noise and grain, and it finds the birds to be quite noise-like. So it removes a few of them, or fuses a few of them together into a single ‘blob’. Would the viewer identify the difference?

I suggest not. Within such complexity, they might only be able to see it if you pointed it out to them, and even after they knew where to look they might not see it the next time. But the ‘magnificence’ would have been diminished nevertheless. By turning up the compression ratio, we might remove more and more of the birds.

This sensation of ‘magnificence’ is not something you can put into words, and it is not something you are consciously aware of. But in this case, it would be reasonable to suggest that the ‘magnificence’ was being reduced progressively. The complexity is such that the viewer wouldn’t consciously see the difference when asked to spot it, yet the emotional impact would nevertheless be reduced or altered.

For all their pretensions to scientific rigour, double blind listening tests fundamentally fail in what they purport to do. They can only access the listener’s conscious perception, while the main aim of listening to music is to affect the subconscious. Defects in audio hardware (distortion, non-flat frequency response, phase shifts, etc.) all tend to blur the separation between individual sources, and in so doing reduce the complexity of what we are hearing – it becomes a flavoured paste rather than maintaining its original granularity and texture – but we cannot necessarily hear the difference consciously. Nevertheless, we can work out rationally that complexity is one of the things that we respond to emotionally. So even though we cannot consciously hear a difference, the emotional impact is being affected anyway.

I get to hear the Kii Threes

Thanks to a giant favour from a new friend, I finally get to hear the Kii Threes…


A couple of Sundays ago, a large van arrived at my house containing two Kii Threes and their monumentally heavy stands, plus a pair of Linkwitz LX Minis with subwoofers along with their knowledgeable owner, John. It was our intention to spend the day comparing speakers.

We first set up the Kiis to compare against my ‘Keph’ speakers. To do this, we had to ‘interleave’ the speaker pairs, with slightly less stereo separation and symmetry than might be ideal:


Setting up went remarkably smoothly, and we soon had the Kiis running off Tidal on a laptop while the Kephs were fed with Spotify Premium – most tracks seemed to be available from both services. The Kiis are elegant in the simplicity of cabling and the lack of extraneous boxes.

John had set up the Kiis with his preferred downward frequency response slope, starting at 3kHz and ending 4dB down (at 22kHz?). I can’t say what significance this might have had for our listening experiment.

The original idea was to match the SPLs using pink noise and a sound level meter. This we did, but we didn’t maintain such discipline for long. We were listening rather louder than I normally would, but this was inevitable because of the Kiis’ amazing volume capabilities.

The bottom line is that the Kiis are spectacular! The main differences for me were that the Kiis were ‘smoother’ and the bass went deeper, and they seemed to show up the ‘ambience’ in many recordings more than the Kephs – more about that later. An SPL meter revealed that what sounded like equal volume required, in fact, a measurably higher SPL from the Kephs. Could this be our hearing registering the direct sound, but the Kiis’ superior dispersion abilities resulting in less reverberant sound – ignored by our conscious hearing but no doubt obscuring detail? Or possibly an artefact of their different frequency responses? We didn’t really have time to investigate this any further.

When standing a long way from both sets of speakers at the back of the room, the Kephs appeared to be emphasising the midrange more, and at the moment of changeover between speakers that contrast didn’t sound good. With a certain classical piano track, at the changeover the Kephs seemed to render the sound* of the piano as kind of ‘plinky plonk’ or toy-like compared to the Kiis – but then after about 10 seconds I got used to it. Without the Kiis to compare against I would have said my Kephs sounded quite good..! But the Kiis were clearly doing something very special.

I did try some ad hoc modifications of the Keph driver gains, baffle step slopes and so on, and we maybe got a bit closer in that regard. But I forgot about the -4dB slope that had been applied to the Kiis, and if I had thought about it, I already had an option in the Kephs’ config file for doing just that. But really, I wish I had had the courage of my convictions and left the frequency response ‘as is’.

Ultimately, I think that we were running into the very reason why the Kiis are designed the way they are: to resemble a big speaker. As the blurb for the Kii says:

“The THREE’s ability to direct bass is comparable to, but much better controlled than that of a traditional speaker several meters wide.”

It’s about avoiding reflections that blur bass detail, but as R.E. Greene explains, it’s also about frequency response:

“What is true of the mini-monitor, that it cannot be EQed to sound right, is also true of narrow-front floor-standers. They sound too midrange-oriented because of the nature of the room sound. This is something about the geometry of the design. It cannot be substantially altered by crossover decisions and so on.”

A conventional small speaker (and the Kephs are relatively small) cannot be equalised to give a flat direct sound and a flat room sound. It has to be a compromise, and as I described before, I apply baffle step compensation to help bridge this discrepancy between the direct and ambient frequency balances. The results are, or so I thought, rather acceptable, but the compromise shows up against a speaker with more controlled dispersion.

This must always be a factor in the sound of conventional speakers unless you sit very close to them. I do believe Bruno Putzeys when he says that large speakers (or those that cleverly simulate largeness) will always sound different from small ones. It would also have been interesting to compare the Kiis against my bigger speakers, whose baffle step is almost an octave lower.

However, there was another difference that bothered me (with the usual sighted listening caveats) and this was ‘focus’. With the Kiis I heard lots of ‘ambience’ – almost ‘surround sound’ – but I didn’t hear a super-precise image. When the Kephs were substituted I heard a sudden snap into focus, and everything moved to mainly between and beyond the speakers. The sound was less ‘smooth’ but it was, to me, more focused.

And this is a question I still have about the Kiis and other speakers that utilise anti-phase. I see the animations on the Kii web site that show how the rear drivers cancel out the sound that would otherwise go behind the speaker. To do this, the rear drivers must deliver a measured quantity of accurately-timed anti-phase. This is a brilliant idea.
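
To make the principle concrete, here is a toy simulation (my own illustration, not Kii’s actual processing; the spacing, delay and distances are invented) of two ideal point sources in free space. The rear source is fed an inverted copy of the front signal, delayed by the spacing divided by the speed of sound: the output behind the pair largely collapses, while the forward output survives (and, as in real cardioid designs, needs equalisation at low frequencies).

```python
# Toy "cardioid by delay" pair of ideal point sources (illustrative only).
# A rear source, inverted and delayed by spacing/c, cancels sound behind the
# pair while the forward output survives.  With this spacing the forward
# output stays well-behaved up to a few hundred Hz; assumed values throughout.
import numpy as np

c = 343.0            # speed of sound, m/s
d = 0.20             # front-to-rear source spacing, m (assumed)
tau = d / c          # rear-source delay chosen to match the acoustic path

def pressure(freq, listener_x):
    """Complex pressure on the axis: front source at x=0, rear source at
    x=-d, listener at x=listener_x (positive = in front)."""
    k = 2 * np.pi * freq / c
    r_front = abs(listener_x)
    r_rear = abs(listener_x + d)
    p_front = np.exp(-1j * k * r_front) / r_front
    # rear source: inverted copy of the front signal, delayed by tau
    p_rear = -np.exp(-1j * 2 * np.pi * freq * tau) * np.exp(-1j * k * r_rear) / r_rear
    return p_front + p_rear

for f in [50, 100, 200, 400]:
    front = 20 * np.log10(abs(pressure(f, +3.0)))   # 3 m in front
    back = 20 * np.log10(abs(pressure(f, -3.0)))    # 3 m behind
    print(f"{f:4d} Hz   front {front:6.1f} dB   back {back:6.1f} dB   "
          f"front-to-back {front - back:5.1f} dB")
```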

My question is, though: how complete is this cancellation if you partially obscure one of the side drivers (with another speaker, in this case)? I do wonder if I was hearing the results of anti-phase escaping into the room and messing up the imaging because of the way we had arranged the speakers – along with a mildly (possibly imaginary!) uncomfortable sensation in my ears and head.

To a person oriented towards frequency response measurements, it doesn’t matter whether sound is anti-, or in-, phase; it is just ‘frequency response material’ that gets chucked into bins and totted up at the end of the measurement. If it is delayed and reflected, then in graphs its effects appear no different from the visually chaotic results of all room reflections; this is the usual argument against phase accuracy in forum discussions: “How can phase matter if it is shifted arbitrarily by reflections in the room, anyway?”

However, to the person who acknowledges that the time domain is also important, anti-phase is a problem. If human hearing has the ability to separate direct sound from room sound, it is dependent on being able to register the time-delayed similarity between direct and reflected sound. If the reflected sound is inverted relative to the direct, that similarity is not as strong (we are talking about transients more than steady state waveforms). In fact, the reflected sound may partially register as a different source of sound.
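
A toy illustration of that distinction (my own, with arbitrary numbers): considered purely as ‘frequency response material’, a reflection has exactly the same magnitude spectrum whether or not its polarity is inverted, but its similarity to the direct sound – the time-domain cue described above – flips sign.

```python
# Toy example: magnitude analysis cannot tell an inverted reflection from an
# uninverted one, but correlation against the direct sound can (illustrative
# only; in a real room the reflection would also be delayed, and the
# correlation peak would simply appear at that lag).
import numpy as np

rng = np.random.default_rng(0)
direct = rng.standard_normal(4800)   # 100 ms of noise at 48 kHz, standing in for a transient-rich signal

reflection = 0.5 * direct            # reflected copy at half amplitude
inverted = -reflection               # the same reflection, polarity-inverted

# Magnitude-only ("frequency response") view: identical.
print("magnitude spectra identical:",
      np.allclose(np.abs(np.fft.rfft(reflection)), np.abs(np.fft.rfft(inverted))))

# Time-domain similarity to the direct sound: flips sign.
def similarity(ref):
    return np.dot(direct, ref) / (np.linalg.norm(direct) * np.linalg.norm(ref))

print("similarity, in-phase reflection:  ", round(similarity(reflection), 3))   # ~ +1.0
print("similarity, anti-phase reflection:", round(similarity(inverted), 3))     # ~ -1.0
```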

Anti-phase is surely going to sound weird – and indeed it does, as anyone who has heard stereo speakers wired out of phase will attest. Where the listener registers in-phase stereo speakers as producing a precise image located at one point in space, out-of-phase speakers produce an image located nowhere and/or everywhere. The makers of pseudo-surround sound systems such as Q-Sound exploit this in order to create images that are not restricted to between the stereo speakers. This may be a factor in the open baffle sound that some people like (but I don’t!).

So I would suggest that allowing anti-phase to bounce around the room is going to produce unpredictable results. This is one reason why I am suspicious of any speaker that releases the backwave of the driver cone into the room. The more this can be attenuated (and its bandwidth restricted) the better.

With the Kiis, was I hearing the effect of less-than-perfect cancellation because of the obscuring of one of the side drivers? Or imagining it? Most people who have heard the Kiis remark on the precise imaging, so I fear that we managed to change something with our layout. Despite the Kiis’ very clever dispersion control system which supposedly makes them placement-independent, does it pay to be a little careful of placement and/or symmetry, anyway? For it not to matter would be miraculous, I would say.

In a recent review of the Kiis (not available online without a subscription), Martin Colloms says that with the Kiis he heard:

“…sometimes larger than life, out-of-the-box imaging”

I wonder if that could be a trace of what I was hearing..? Or maybe he means it as a pure compliment. In the same review he describes how the cardioid cancellation mechanism extends as far as 1kHz, so it is not just a bass phenomenon.


Next, John set up his DIY Linkwitz LX Mini speakers (which look very attractive, being based on vertical plastic tubes with small ‘pods’ on top), as well as their compact-but-heavy subwoofers. These were fed with analogue signals from a Raspberry Pi-based streamer and, again, sounded excellent. They also seek to control dispersion, in this case by purely acoustic means that I don’t yet fully understand. And they may also dabble a bit in backwave anti-phase.

If I had any criticism, it was that the very top end wasn’t quite as good as a conventional tweeter..? But it might be my imagination and expectation bias. Also, our ears and critical faculties were pretty far gone by that point…

Really, we had three systems all of which, to me, sounded good in isolation – but with the Kiis revealing their superior performance at the point of changeover. There were certainly moments of confusion when I didn’t know which system was operating and only the changeover gave the game away. I think all three systems were much better than what you often get at audio shows.

What we didn’t have were any problems with distortion, hum or noise. In these respects, all three systems just worked. The biggest source of any such problem was a laptop fan which kicked in sometimes when running Tidal.

There were lots of things we didn’t do. We didn’t try the speakers in different positions; we didn’t try different toe-in angles; we didn’t make frequency response measurements or do things in a particularly scientific way; we listened at pretty high volume and didn’t have the self-control to listen at lower volumes – which might have been more appropriate for some of the classical music. The room was ‘as it comes’: 6 x 3.4 x 2.4m, carpeted, plaster walls and ceiling, and floor-to-ceiling windows behind the speakers, with a few boxes and bits of furniture lying about.


So my conclusion is that I have heard the Kiis and am highly impressed, but there might possibly be an extra level of focus and integrity I have yet to experience. I never got to the point where I could listen through the speakers rather than to them, but I am sure that this will happen at some point.

In the meantime I am having to learn to love my Kephs again – which actually isn’t too hard without the Kiis in the same room showing them up!


Footnotes:

*Since writing that paragraph I have found a mention of what may be that very phenomenon:

“…even a brief comparison with a real piano, say, will reveal the midrange-orientation of the narrow-front wide radiators.”

The Logic of Listening Tests

Casual readers may not believe this, but in the world of audiophilia there are people who enjoy organising scientific listening tests – or more aptly ‘trials’. These involve assembling panels of human ‘subjects’ to listen to snippets of music played through different setups in double blind tests, pressing buttons or filling in forms to indicate audible differences and preferences. The motivation is often to use science to debunk the ideas of a rival group, who may be known as ‘subjectivists’ or ‘objectivists’, or to confirm the ideas of one’s own group.

There are many, many inherent reasons why such listening tests may not be valid, e.g.:

  • no one can demonstrate that the knowledge that you are taking part in an experiment doesn’t impede your ability to hear differences
  • a participant with his own agenda may choose to ‘lie’, pretending he is not hearing differences when, in fact, he is.
  • etc. etc.

The tests are difficult and tedious for the participants, and no one who holds the opposing viewpoint will be convinced by the results. At a logical level, they are dubious. So why bother to do the tests? I think it is an ‘appeal to a higher authority’ to arbitrate an argument that cannot be solved any other way. ‘Science’ is that higher authority.

But let’s look at just the logic.

We are told that there are two basic types of listening test:

  1. Determining or identifying audible difference
  2. Determining ‘preference’

Presumably the idea is that (1) suggests whether two or more devices or processes are equivalent, or whether their insertion into the audio chain is audibly transparent. If a difference is identified, then (2) can make the information useful and tell us which permutation sounds best to a human. Perhaps there is a notion that in the best case scenario a £100 DAC is found to sound identical to a £100,000 DAC, or that if they do sound different, the £100 DAC is preferred by listeners. Or vice versa.

But would anything actually have been gained by a listening test over simple measurements? A DAC has a very specific, well-defined job to do – we are not talking about observing the natural world and trying to work out what is going on. With today’s technology, it is trivial to make a DAC that is accurate to very close objective tolerances for £100 – it is not necessary to listen to it to know whether it works.

For two DACs to actually sound different, they must be measurably quite far apart. At least one of them is not even close to being a DAC: it is, in fact, an effects box of some kind. And such are the fundamental uncertainties in all experiments that involve asking humans how they feel that it is entirely possible that, in a preference-based listening test, the listeners will be found to prefer the sound of the effects box.

Or not. It depends on myriad unstable factors. An effects box that adds some harmonic distortion may make certain recordings sound ‘louder’ or ‘more exciting’ thus eliciting a preference for it today – with those specific recordings. But the experiment cannot show that the listeners wouldn’t be bored with the effect three hours, days or months down the line. Or that they wouldn’t hate it if it happened to be raining. Or if the walls were painted yellow, not blue. You get the idea: it is nothing but aesthetic judgement, the classic condition where science becomes pseudoscience no matter how ‘scientific’ the methodology.

The results may be fed into statistical formulae and the handle cranked, allowing the experimenter to declare “statistical significance”, but this is just the usual misunderstanding of statistics, which are only valid under very specific mathematical conditions. If your experiment is built on invalid assumptions, the statistics mean nothing.

If we think it is acceptable for a ‘DAC’ to impose its own “effects” on the sound, where do we stop? Home theatre amps often have buttons labelled ‘Super Stereo’ or ‘Concert Hall’. Before we go declaring that the £100,000 DAC’s ‘effect’ is worth the money, shouldn’t we also verify that our experiment doesn’t show that ‘Super Stereo’ is even better? Or that a £10 DAC off Amazon isn’t even better than that? This is the open-ended illogicality of preference-based listening tests.

If the device is supposed to be a “DAC”, it can do no more than meet the objective definition of a DAC to a tolerably close degree. How do we know what “tolerably close” is? Well, if we were to simulate the known, objective, measured error, and amplify it by a factor of a hundred, and still fail to be able to hear it at normal listening levels in a quiet room, I think we would have our answer. This is the one listening test that I think would be useful.
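
As a sketch of how such a test might be put together (my own outline; the ‘DAC’ here is simulated with a small amount of added noise and distortion, standing in for a real capture of a DAC’s output that has already been time-aligned with the source), the residual error is boosted by a factor of 100 (+40dB) and written out so it can simply be listened to:

```python
# Sketch of the "amplified error" test (assumptions: mono signal, and the
# captured DAC output is already time-aligned and the same length as the
# source; here a simulated, slightly imperfect DAC stands in for a real one).
import numpy as np
from scipy.io import wavfile

fs = 44100
t = np.arange(10 * fs) / fs
source = 0.5 * np.sin(2 * np.pi * 440 * t)     # stand-in for programme material

# Pretend this is the re-digitised output of the DAC under test:
captured = (source
            + 1e-4 * source ** 2                                        # a trace of distortion
            + 1e-5 * np.random.default_rng(0).standard_normal(len(t)))  # a trace of noise

# Level-match with a least-squares gain, then take the residual error.
gain = np.dot(captured, source) / np.dot(source, source)
error = captured - gain * source

residual_db = 20 * np.log10(np.linalg.norm(error) / np.linalg.norm(source))
print(f"residual error relative to source: {residual_db:.1f} dB")

# Amplify the residual by 100x (+40 dB) and write it out to audition.
amplified = np.clip(100.0 * error, -1.0, 1.0).astype(np.float32)
wavfile.write("dac_error_x100.wav", fs, amplified)
```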

Room correction. What are we trying to achieve?

The short version…

The recent availability of DSP is leading some people to assume that speakers are, and have always been, ‘wrong’ unless EQ’ed to invert the room’s acoustics.

In fact, our audio ancestors didn’t get it wrong. Only a neutral speaker is ‘right’, and the acoustics of an average room are an enhancement to the sound. If we don’t like the sound of the room, we must change the room – not the sound from the speaker.

DSP gives us the tools to build a more neutral speaker than ever before.


There are endless discussions about room correction, and many different commercial products and methods. Some people seem to like certain results while others find them a little strange-sounding.

I am not actually sure what it is that people are trying to achieve. I can’t help but think that if someone feels the need for room correction, they have yet to hear a system that sounds so good that they wouldn’t dream of messing it up with another layer of their own ‘EQ’.

Another possibility is that they are making an unwarranted assumption based on the fact that there are large objective differences between the recorded waveform and what reaches the listener’s ears in a real room. That must mean that no matter how good it sounds, there’s an error. It could sound even better, right?

No.

A reviewer of the Kii Three found that that particularly neutral speaker sounded perfect straight out of the box.

“…the traditional kind of subjective analysis we speaker reviewers default to — describing the tonal balance and making a judgement about the competence of a monitor’s basic frequency response — is somehow rendered a little pointless with the Kii Three. It sounds so transparent and creates such fundamentally believable audio that thoughts of ‘dull’ or ‘bright’ seem somehow superfluous.”

The Kii Three does, however, offer a number of preset “contour” EQ options. As I shall describe later, I think that a variation on this is all that is required to refine the sound of any well-designed neutral speaker in most rooms.

A distinction is often made between correction of the bass and higher frequencies. If the room is large, and furnished copiously, there may be no problem to solve in either case, and this is the ideal situation. But some bass manipulation may be needed in many rooms. At a minimum, the person with sealed woofers needs the roll-off at the bottom end to start at about the right frequency for the room. This, in itself, is a form of ‘room correction’.
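
As a sketch of what that bottom-end adjustment involves (my own illustration with invented box parameters – essentially the idea behind the Linkwitz transform): model the sealed box as a second-order high-pass, choose the corner frequency that suits the room, and the EQ required is simply the ratio of the target response to the natural one.

```python
# Sketch: retune a sealed woofer's low-frequency corner (assumed parameters).
# The required EQ is the ratio of the target second-order high-pass to the
# box's natural second-order high-pass.
import numpy as np
from scipy import signal

f = np.logspace(np.log10(10), np.log10(200), 200)    # 10 Hz .. 200 Hz
w = 2 * np.pi * f

def sealed_box(f0, q):
    """Second-order high-pass: s^2 / (s^2 + (w0/Q)s + w0^2)."""
    w0 = 2 * np.pi * f0
    return signal.TransferFunction([1, 0, 0], [1, w0 / q, w0 ** 2])

natural = sealed_box(f0=65.0, q=1.1)    # what the driver + box actually do (assumed)
target = sealed_box(f0=40.0, q=0.71)    # the roll-off we want for this room (assumed)

_, h_nat = signal.freqresp(natural, w)
_, h_tgt = signal.freqresp(target, w)
eq_db = 20 * np.log10(np.abs(h_tgt / h_nat))          # boost/cut to apply

for fi, g in zip(f[::40], eq_db[::40]):
    print(f"{fi:6.1f} Hz   EQ {g:+5.1f} dB")
```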

The controversial aspect is the question of whether we need ‘correction’ higher up. Should it be applied routinely (some people think so), as sparingly as possible, or not at all? And if people do hear an improvement, is that because the system is inherently correcting less-than-ideal speakers rather than the room?

Here are some ways of looking at the issue.

  1. Single room reflections give us echoes, while multiple reflections (of reflections) give us reverberation. Performing a frequency response measurement with a neutral transducer and analysing the result may show a non-flat FR at the listening position even when smoothed fairly heavily. This is just an aspect of statistics, and of the geometry and absorptivity of the various surfaces in the room. Some reflections will result in some frequencies summing in phase, to some extent, and others not.
  2. Experience tells us that we “hear through” the room to any acoustic source. Our hearing appears not to be just a frequency response analyser, but can separate direct sound from reflections. This is not a fanciful idea: adaptive software can learn to do the same thing.

The idea is also supported by some of the great and the good in audio.

Floyd Toole:

“…we humans manage to compensate for many of the temporal and timbral variations contributed by rooms and hear “through” them to appreciate certain essential qualities of sound sources within these spaces.”

Or Meridian’s Bob Stuart:

“Our brains are able to separate direct sound from the reverberation…”

  3. If we EQ the FR of the speaker to obtain a flat in-room measured response including the reflections in the measurement, it seems that we will subsequently “hear through” the reflections to a strangely-EQ’ed direct sound. It will, nevertheless, measure ‘perfectly’.
  4. Audio orthodoxy maintains that humans are supremely insensitive to phase distortion, and this is often compounded with the argument that room reflections completely swamp phase information so it is not worth worrying about. This denies the possibility that we “hear through” the room. Listening tests in the past that purportedly demonstrated our inability to hear the effects of phase have often been based on mono only, and didn’t compare distorted with undistorted phase examples – merely distorted versus differently distorted, played on the then available equipment.
  5. Contradicting (4), audiophiles traditionally fear crossovers because the phase shifts inherent in (non-DSP) crossovers are, they say, always audible. DSP, on the other hand, allows us to create crossovers without any phase shift, i.e. they are ‘transparent’.
  6. At a minimum, speaker drivers on their baffles should not ‘fight’ each other through the crossover – their phases should be aligned. The appropriate delays then ensure that they are not ‘fighting’ at the listener’s position. The next level in performance is to ensure that their phases are flat at all frequencies, i.e. linear phase. The result of this is the recorded waveform preserved in both frequency and time.
  7. Intuitively, genuine stereo imaging is likely to be a function of phase and timing. Preserving that phase and timing should probably be something we logically try to do. We could ‘second guess’ how it works using traditional rules of thumb, deciding not to preserve the phase and timing, but if it is effectively cost-free to do it, why not do it anyway?
  8. A ‘perfect’ response from many speaker/room combinations can be guaranteed using DSP (deconvolution with the impulse response at that point, not just playing with a graphic equaliser). Unfortunately, it will only be valid for a single point in space, and moving 1mm from there will produce errors and unquantifiable sonic effects (see the toy model just after this list). Additionally, ‘perfect’ refers to the ‘anechoic chamber’ version of the recording, which may not be what most people are trying to achieve even if the measurements they think they seek mean precisely that.
  9. Room effects such as (moderate) reverberation are a major difference between listening with speakers versus headphones, and are actually desirable. ‘Room correction’ would be a bad thing if it literally removed the room from the sound. If that is the case, what exactly do we think ‘room correction’ is for?
  10. Even if the drivers are neutral (in an anechoic situation) and crossed over perfectly on axis, they are of finite size and mounted in a box or on a baffle that has a physical size and shape. This produces certain frequency-dependent dispersion characteristics which give different measured, and subjective, results in different rooms. Some questions are:
    • is this dispersion characteristic a ‘room effect’ or a ‘speaker effect’? Or both?
    • is there a simple objective measurement that says one result is better than any other?
    • is there just one ‘right’ result and all others are ‘wrong’?
  11. Should room correction attempt to correct the speaker as well? Or should we, in fact, only correct the speaker? Or just the room? If so, how would we separate room from speaker in our measurements? Can they, in fact, be separated?
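
On point 8 above – inverse-filter (‘deconvolution’) correction being valid at only one position – here is a toy model of my own, with made-up delays and gains: the room is reduced to direct sound plus a single reflection, the response is inverted exactly at point A, and the ‘corrected’ result is then evaluated at point B, where the reflection path is about 5cm longer.

```python
# Toy model: exact inversion of the measured response is only exact at the
# measurement point.  Room = direct sound + one reflection (made-up values).
import numpy as np

f = np.linspace(20, 20000, 2000)

def room_response(reflection_delay_ms, reflection_gain=0.5):
    return 1.0 + reflection_gain * np.exp(-2j * np.pi * f * reflection_delay_ms / 1000.0)

h_a = room_response(3.00)     # measured at the listening position (3 ms reflection)
h_b = room_response(3.15)     # a slightly different position: ~5 cm longer reflection path

correction = 1.0 / h_a        # exact inverse at point A

ripple_a = 20 * np.log10(np.abs(h_a * correction))   # 0 dB everywhere
ripple_b = 20 * np.log10(np.abs(h_b * correction))   # the ripple comes straight back

print(f"corrected response at A: {ripple_a.min():+.2f} .. {ripple_a.max():+.2f} dB")
print(f"corrected response at B: {ripple_b.min():+.2f} .. {ripple_b.max():+.2f} dB")
```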

I think there is a formula that gives good results. It says:

  • Don’t rely on feedback from in-room measurements, but do ‘neutralise’ the speaker at the most elemental levels first. At every stage, go for the most neutral (and locally correctable) option e.g. sealed woofers, DSP-based linear phase crossovers with time alignment delays.
  • Simply avoid configurations that are going to give inherently weird results: two-way speakers, bass reflex, many types of passive crossover etc. These may not even be partially correctable in any meaningful way.
  • Phase and time alignment are sacrosanct. This is the secret ingredient. You can play with minor changes to the ‘tone colour’ separately, but your direct sound must always maintain the recording’s phase and time alignment. This implies that FIR filters must be used, thus allowing frequency response to be modified independently of phase.
  • By all means do all the good stuff regarding speaker placement, room treatments (the room is always ‘valid’), and avoiding objects and asymmetry around the speakers themselves.
  • Notionally, I propose that we wish to correct the speaker, not the room. However, we are faced with a room and non-neutral speaker that are intertwined, due to the fact that the speaker has multiple drivers of finite size and a physical presence (as opposed to being a point source with uniform directivity at all frequencies). The artefacts resulting from this are room-dependent and can never really be ‘corrected’ unambiguously. Luckily, a smooth EQ curve can make the sound subjectively near enough to transparent. To obtain this curve, predict the baffle step correction for each driver using modelling or a standard formula, with some trial-and-error regarding the depth required (4, 5, 6dB?); this is a very smooth EQ curve (a sketch of such a filter follows this list). Or, possibly (I haven’t done this myself), make many FR measurements around the listening area, smooth and average them together, and partially invert this, again without altering phase and time alignment.
  • You are hearing the direct sound, plus separately-perceived ‘room ambience’. If you don’t like the sound of the ambience, you must change the room, not the direct sound.
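
Here is the promised sketch of that kind of smooth, phase-preserving correction (my own; the baffle width, the rule-of-thumb step frequency and the boost depth are all assumed values): a gentle baffle-step shelf realised as a linear-phase FIR, so the frequency response changes while the phase remains a pure, constant delay.

```python
# Sketch: a gentle baffle-step shelf as a linear-phase FIR (assumed values).
# Magnitude changes; phase stays a constant delay, so time alignment is kept.
import numpy as np
from scipy import signal

fs = 48000
baffle_width = 0.30                        # metres (assumed)
f_step = 115.0 / baffle_width              # rule-of-thumb step frequency (~380 Hz here)
boost_db = 5.0                             # compensation depth to try (4-6 dB typical)

# Desired magnitude: +boost below the step, 0 dB above, smooth transition.
freqs = np.array([0.0, f_step / 2, f_step * 2, fs / 2])
gains = 10 ** (np.array([boost_db, boost_db, 0.0, 0.0]) / 20)
taps = signal.firwin2(4097, freqs, gains, fs=fs)       # odd length -> exactly linear phase

# Check: magnitude follows the shelf; group delay is flat (= (N-1)/2 samples).
w, h = signal.freqz(taps, worN=2048, fs=fs)
_, gd = signal.group_delay((taps, [1.0]), w=2048, fs=fs)
print(f"gain at 100 Hz: {20 * np.log10(abs(h[np.argmin(abs(w - 100))])):+.1f} dB")
print(f"gain at 2 kHz:  {20 * np.log10(abs(h[np.argmin(abs(w - 2000))])):+.1f} dB")
print(f"group delay spread: {gd.max() - gd.min():.3f} samples (flat = linear phase)")
```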

Is there any scientific evidence for these assertions? No more nor less than any other ‘room correction’ technique – just logical deduction based on subjective experience. Really, it is just a case of thinking about what we hear as we move around and between rooms, compared to what the simple in-room FR measurements show. Why do real musicians not need ‘correction’ when they play in different venues? Do we really want ‘headphone sound’ when listening in rooms? (If so, just wear headphones or sit closer to smaller speakers).

This does not say that neutral drivers alone are sufficient to guarantee good sound – I have observed that this is not the case. A simple baffle step correction applied to frequency response (but leaving phase and timing intact) can greatly improve the sound of a real loudspeaker in a room without affecting how sharply-imaged and dynamic it sounds. I surmise that frequency response can be regarded as ‘colour’ (or “chrominance” in old school video speak), independent of the ‘detail’ (or “luminance”) of phase and timing. We can work towards a frequency response that compensates for the combination of room and speaker dispersion effects to give the right subjective ‘colour’ as long as we maintain accurate phase and timing of the direct sound.

We are not (necessarily) trying to flatten the in-room FR as measured at the listener’s position – the EQ we apply is very smooth and shallow – but the result will still be perceived as a flat FR. Many (most?) existing speakers inherently have this EQ built in whether their creators applied it deliberately, or via the ‘voicing’ they did when setting the speaker up for use in an average room.

In conclusion:

  • Humans “hear through” the room to the direct sound; the room is perceived as a separate ‘ambience’. Because of this, ‘no correction’ really is the correct strategy.
  • Simply flattening the FR at the listening position via EQ of the speaker output is likely to result in ‘peculiar’ perceived sound, even if the in-room measurements purport to say otherwise.
  • Speakers have to be as rigorously neutral as possible by design, rather than attempting to correct them by ‘global feedback’ in the room.
  • Final refinement is a speaker/room-dependent, smooth, shallow EQ curve that doesn’t touch phase and timing – only FIR filters can do this.

[Last updated 05/04/17]

Neural Adaptation

Just an interesting snippet regarding a characteristic of human hearing (and all our senses). It is called neural adaptation.

Neural adaptation or sensory adaptation is a change over time in the responsiveness of the sensory system to a constant stimulus. It is usually experienced as a change in the stimulus. For example, if one rests one’s hand on a table, one immediately feels the table’s surface on one’s skin. Within a few seconds, however, one ceases to feel the table’s surface. The sensory neurons stimulated by the table’s surface respond immediately, but then respond less and less until they may not respond at all; this is an example of neural adaptation. Neural adaptation is also thought to happen at a more central level such as the cortex.

Fast and slow adaptation
One has to distinguish fast adaptation from slow adaptation. Fast adaptation occurs immediately after stimulus presentation, i.e. within 100s of milliseconds. Slow adaptive processes take minutes, hours or even days. The two classes of neural adaptation may rely on very different physiological mechanisms.

Auditory adaptation, as perceptual adaptation with other senses, is the process by which individuals adapt to sounds and noises. As research has shown, as time progresses, individuals tend to adapt to sounds and tend to distinguish them less frequently after a while. Sensory adaptation tends to blend sounds into one, variable sound, rather than having several separate sounds as a series. Moreover, after repeated perception, individuals tend to adapt to sounds to the point where they no longer consciously perceive it, or rather, “block it out”.

What this says to me is that perceived sound characteristics are variable depending on how long the person has been listening, and to what sequence of ‘stimuli’. Our senses, to some extent, are change detectors, not ‘direct coupled’.

Something of a conundrum for listening-based audio equipment testing..? Our hearing begins to change the moment we start listening. It becomes desensitised to repeated exposure to a sound – one of the cornerstones of many types of listening-based testing.

The Machine Learning delusion

This morning my personal biological computer detected a correlation between these two articles:

Sony’s SenseMe™ – A Superior Smart Shuffle

Machine learning: why we mustn’t be slaves to the algorithm

In the first article, the author is praising a “smart shuffle” algorithm that sequences tracks in your music collection with various themes such as “energetic, relax, upbeat”. It does this by analysing the music’s mood and tempo. It sounds amazing:

“I would never think of playing Steve Earl’s “Loretta” right after listening to the Boulder Philharmonic’s performance of “Olvidala,” or Ry Cooder’s “Crazy About an Automobile” followed by Doc and Merle Watson playing “Take Me Out to the Ballgame,” but I enjoyed not only the selections themselves but the way SensMe™ juxtaposes one after another, like a DJ who knows your collection better than you do…what will “he” play next? Surprise! It’s all good.”

And the algorithm’s effects go beyond mere music:

“SenseMe™ has brought domestic harmony – interesting selections for me and music with a similar mood for her. That’s better than marriage counseling! “

The author of the second article takes a more sceptical view. He notes the dumbness of Machine Learning™ algorithms, but says that

“…because these outputs are computer-generated, they are currently regarded with awe and amazement by bemused citizens …”

He quotes someone who is aware of the limitations:

“Machine learning is like a deep-fat fryer. If you’ve never deep-fried something before, you think to yourself: ‘This is amazing! I bet this would work on anything!’ And it kind of does. In our case, the deep fryer is a toolbox of statistical techniques. The names keep changing – it used to be unsupervised learning, now it’s called big data or deep learning or AI. Next year it will be called something else. But the core ideas don’t change. You train a computer on lots of data, and it learns to recognise structure.”

“But,” continues Cegłowski, “the fact that the same generic approach works across a wide range of domains should make you suspicious about how much insight it’s adding.”

I have been there. Machine learning is one of the most seductive branches of computer science, and in my experience is a very “easy sell” to people – I use it in my job in actual engineering applications where it can be eerily effective.

But if algorithms are so clever and know us so well, why are we using them only to shuffle the order of music? Why not cut out the middleman and get the computer to compose the music for us directly? The answer is obvious: it doesn’t work because we don’t know how the human brain works, and it is not predictable. By extension, the algorithms that purport to help us in matters of taste don’t actually work either. As the Guardian article says, all we are responding to is the novelty of the idea.

The musical ‘observer effect’

In scientific audio circles, it is believed that if you are aware (or think you are aware) of what hardware you are listening to, then you are incapable of any sort of objective assessment of its quality. This leads to the blind listening test being held up as the Gold Standard for audio science.

But here’s an irony: almost everything of value that man creates comes into being through a process of ‘sighted’ creation and refinement – and it seems to work. Bridges are designed by engineers and architects who refine CAD models on a screen, but the finished products don’t fall down, and are admired by ordinary people for their appearance. Car bodies are designed by engineers and stylists in full sight, yet the holes line up with the rest of the car, the bodies achieve great aerodynamic measurements, and the cars look good as well. Pianos are tuned by people who know which way they are turning the lever as they listen.

So if ‘sighted-ness’ leads to a completely fictitious, imaginary perception, then presumably our pianos are not really in tune, but we imagine they are? Maybe everyone but the piano tuner would hear an out-of-tune cacophony when the piano is played? But no, it turns out that everyone, including the piano tuner, can tell consistently when a piano is in tune without resorting to blind tests, and this can be confirmed with measurements.

So how come ‘sightedness’ is so problematic for the creation or assessment of audio equipment? I think that the question is “not even wrong”. The faulty logic lies in the erroneous idea that audio equipment is being listened to, as opposed to through, and that the human brain when listening to music is similar to a microphone. There is no reason to believe this at all; to me, it is just as likely that the brain is acting as an acquirer and interpreter of symbols. The quality of the sound is part of the symbol’s meaning, but cannot be examined in isolation.

As a result, it may just be that there is no way for a human listener to reliably discern anything but the most obvious audio differences in A/B/X listening tests. Using real music, the listener may be perceiving sound quality differences as changes in the perceived meaning of the symbols, but repeated listenings (like reading a phrase over and over), or listening to extracts out of context, kills all meaning and therefore kills any discernment of sound quality. Consciously listening for differences as opposed to listening to the music, pressing buttons while listening, breaking the flow of the music in any way, all have a similar effect. Alternatively, using electronic bleeps, or randomised snippets as the ‘test signal’, the listener is effectively hearing a stream of noise without any context or meaning, so the brain has nothing to attach the sound quality to at all.

In effect, the act of listening for sound quality in scientific trials may kill our ability to discern sound quality. Can this be proved either way? No.

I don’t see this as a problem to be ‘solved’; it is simply the kind of paradox that pops up when you start thinking about consciousness. Music has no evolutionary survival value, but we enjoy listening to it anyway – so we are in Weirdsville already. The extreme ‘objectivists’ who hold up ABX testing as science are extremely unimaginative if they think their naïve experiments and dull statistical formulae are a match for human consciousness.

Within the limitations of their chosen technology, most hi-fi systems are created with the aim of being ‘transparent’ to levels that exceed the known limitations of the physiology of the ear, and people seem keen to buy them. Without referring to scientific listening test data, the customers know that, in normal use, proper hi-fi does sound better than an iPod dock with 2″ speaker. But, as their own preference for the sound can’t be proved scientifically because of ‘the observer effect’, and because a human is bound to be influenced by factors other than the sound, then at some level they have to buy their hi-fi equipment ‘on faith’; maybe being influenced by the look of it, or because they believe the meme that vinyl is superior to digital. So be it. But they may find that, later, the system fails to meet their expectations and they are on a ruinous treadmill of “tweaks” and “upgrades”.

On a strictly rational basis, bypassing all that anguish, the new generation of DSP-based speakers gets even closer to the ideal of transparency by virtue of superior design – no listening tests required. I am confident they will sound great when being used for their intended purpose.

[Last edited 06/08/16]

Thoughts on creating stuff


The mysterious driver at the bottom is the original tweeter left in place to avoid having to plug the hole

I just spent an enjoyable evening tuning my converted KEF Concord III speakers. Faced with three drivers in a box, I was able to do the following:

  • Make impulse response measurements of the drivers – near and far field as appropriate to the size and frequency ranges of the drivers (although it’s not a great room for making the far field measurements in)
  • Apply linear phase crossovers at 500Hz/3100Hz with a 4th order slope (a sketch of this kind of crossover follows this list). Much scope for changing these later.
  • Correct the drivers’ phase based on the measurements.
  • Apply baffle step compensation using a formula based on baffle width.
  • Trim the gain of each driver.
  • Adjust delays by ear to get the ‘fullest’ pink noise sound over several positions around the listening position.
  • ‘Overwrite’ the woofer’s natural response to obtain a new corner frequency at 40 Hz with 12dB per octave roll off.
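
For anyone curious what the crossover step looks like, here is a minimal sketch (my own construction for illustration, not necessarily the exact filters I ended up with): each band is given the magnitude of a 4th-order Linkwitz-Riley section, realised as a linear-phase FIR. Because those magnitudes sum to exactly one, the three bands recombine into a pure delay – flat frequency response and linear phase – provided the drivers themselves are neutral and time-aligned.

```python
# Sketch: 3-way linear-phase FIR crossover at 500 Hz / 3100 Hz with 4th-order
# (24 dB/oct) Linkwitz-Riley magnitudes (illustrative values and lengths).
import numpy as np
from scipy import signal

fs = 48000
n_taps = 8191                            # long FIR for good LF resolution
f = np.linspace(0, fs / 2, 2048)         # design grid

def lr4_pair(fc):
    """Magnitudes of 4th-order Linkwitz-Riley low/high-pass at fc.
    They sum to exactly 1 at every frequency."""
    x = (f / fc) ** 4
    lp = 1.0 / (1.0 + x)
    return lp, 1.0 - lp

lp500, hp500 = lr4_pair(500.0)
lp3100, hp3100 = lr4_pair(3100.0)

bands = {
    "woofer": lp500,
    "mid": hp500 * lp3100,
    "tweeter": hp500 * hp3100,
}
firs = {name: signal.firwin2(n_taps, f, mag, fs=fs) for name, mag in bands.items()}

# Acoustic sum (equal gains, drivers assumed neutral and time-aligned) is just
# the sum of the FIRs: magnitude should be flat, phase linear.
total = sum(firs.values())
w, h = signal.freqz(total, worN=4096, fs=fs)
ripple = 20 * np.log10(np.abs(h[1:]))    # skip the DC bin
print(f"summed response ripple: {ripple.min():+.3f} .. {ripple.max():+.3f} dB")
```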

The KEFs are now sounding beautiful although I didn’t do any room measurements as such – maybe later. Instead, I have been using more of a ‘feedforward’ technique i.e. trust the polypropylene drivers to behave over the narrow frequency ranges we’re using, and don’t mess about with them too much.

The benefits of good imaging

There is lovely deep bass, and the imaging is spectacular – even better than my bigger system. There really is no way to tell that a voice from the middle of the ‘soundstage’ is coming from anywhere but straight ahead and not from the two speakers at the sides. As a result, not only are the individual acoustic sources well separated, but the acoustic surroundings are also reproduced better. These aspects, I think, may be responsible for more than just the enjoyment of hearing voices and instruments coming from different places: I think that imaging, when done well, may trump other aspects of the system. Poorly implemented stereo is probably more confusing to the ear/brain than mono, leaving the listener in no doubt that they are listening to an artificial system. With good stereo, it becomes possible to simply listen to music without thinking about anything else.

Build a four way?

In conjunction with the standard expectation bias warning, I would say the overall sound of the KEFs (so far) is subtly different from my big system and I suspect the baffle widths will have something to do with this – as well as the obvious fact that the 8 inch woofers have got half the area of 12 inch drivers, and the enclosures are one third the volume.

A truly terrible thought is taking shape, however: what would it sound like if I combined these speakers with the 12 inch woofers and enclosures from my large system, to make a huge four way system..? No, I must put the thought out of my head…

The passive alternative

How could all this be done with passive crossovers? How many iterations of the settings did it take me to get to here? Fifty maybe? Surely it would be impossible to do anything like this with soldering irons and bits of wire and passive components. I suppose some people would say that with a comprehensive set of measurements, it would be possible to push a button on a computer and get it to calculate the optimum configuration of resistors, capacitors and inductors to match the target response. Possibly, but (a) it can never work as well as an active system (literally, it can’t – no point in pretending that the two systems are equivalent), and (b) you have to know what your target response is in the first place. It must surely always be a bit of an art, with multiple iterations needed to home in on a really good ‘envelope’ of settings – I am not saying that there is some unique golden combination that is best in every way.

In developing a passive system, every iteration would take between minutes and hours to complete, and I don’t think you would get anywhere near the accuracy of matching of responses between adjacent drivers and so on. I wouldn’t even attempt such a thing without first building a computerised box of relays and passive components that could automatically implement the crossover from a SPICE model or whatever output my software produced – it would be quite a big box, I think. (A new product idea?)

Something real

With these KEFs, I feel that I have achieved something real which, I think, contrasts strongly with the preoccupations of many technically-oriented audio enthusiasts. In forums I see threads lasting tens or even hundreds of pages concerning the efficacy of USB “re-clockers” or similar. Theory says they don’t do anything; measurements show they don’t do anything (or even make things worse with added ground noise); enthusiasts claim they make a night and day improvement to the sound -> let’s have a listening test; it shows there is no improvement; there must have been something wrong with the test -> let’s do it again.

Or investigations of which lossless file format sounds best. Or which type of ethernet cable is the most musical.

Then there’s MQA and the idea that we must use higher sample rates and ‘de-blurring’ because timing is critical. Then the result is played through passive speakers with massive timing errors between the drivers.

All of these people have far more expertise than me in everything to do with audio, yet they spend their precious time on stuff that produces, literally, nothing.

The Subjectivist/Objectivist Synthesis (Audiostream’s latest article)

Audiostream has posted a new article on the Objectivist/”Subjectivist” debate.

(I have to be careful to put inverted commas around the word “subjectivist” because, as an earlier commenter pointed out quite rightly, “subjectivists” rarely justify their name; in reality they are usually claiming that their subjective experiences are a superior form of objective measurement).

In summary, the new article says “Hey, everyone’s views are valid. Let’s not get too dogmatic, and just have a beer”. I think this misses the point (or several points). For me, the biggest ones are these:

  • Measurements are always incomplete, and are just part of the picture; this seems to be misunderstood by many people, who think that there is a supernatural mystery as to why supposedly good measurements don’t always correlate with good sound. I don’t think there is a mystery.
  • Subjective experiences are always affected by spurious factors; if the measurements of what comes out of the speakers show substantial deviation from the signal (i.e. are ‘bad’), this, for me, trumps the subjective opinion. But as mentioned above, if the measurements are ‘good’ they may still be incomplete.
  • Let’s only start worrying about the above when we’ve designed the system to avoid the obvious errors as best we can. This is neither objectivism, nor “subjectivism”, but rationalism. For reasons of tradition and ‘folk intuition’, this may be rare in the world of hi-fi.
  • I would be only too keen to hear people’s subjective experiences if they are referring to something measurably out of the ordinary and arguably good; less so if they are describing differences between digital cables that, rationally, cannot be affecting the sound. Some people do it all within the same article!

Light entertainment

Here’s a little controversy from the archives of Stereophile magazine.

Stereophile has an interesting policy whereby an equipment reviewer writes up his subjective experience of testing a device, and only then is it measured for distortion, frequency response and so on. It seems that the magazine has the integrity to publish the two reports whatever the outcome.

Have you ever seen a more polarised review than this one from 2005?

The reviewer says:

The CyberLights represent one of the greatest technological breakthroughs in high-performance audio that I have experienced in my audiophile lifetime….

…for the first time in your life you’ll hear no cables whatsoever. When you switch back to any brand of metal conductors, you’ll know you’re hearing cables—because what’s transmitted via CyberLight will be the most gloriously open, coherent, delicate, extended, transparent, pristine sound you’ve ever heard from your system…

The measurements person says:

If this review were of a conventional product, I would dismiss it as being broken. …I really don’t see how the CyberLight P2A and Wave cables can be recommended. I am puzzled that Harmonic Technology, which makes good-sounding, reasonably priced conventional cables, would risk their reputation with something as technically flawed as the CyberLight.

You’ll have to read the full review for yourself, because the contrast between the two opinions is almost comical. The measurements are quite something to behold.

You see, I sometimes worry that perhaps I just don’t ‘get’ this hi-fi business. £80,000 analogue systems don’t sound anything special to me. Vinyl doesn’t sound as good as digital to my ears, but everyone else says it is much better. Designing and building my own system was really quite straightforward, yet the internet is full of intense discussion about how difficult it is; people spend their entire lives building their own speakers and are never happy with them, yet almost three years in and counting, I haven’t felt motivated to modify mine. Are the experts hearing something I am not? Perhaps this review sheds some light on the answer.

Analogue enthusiasts often claim that the signal-modifying effects of whatever product they are listening to actually improve the sound. The usual line is that the indefinable magic of valves and vinyl is down to what those devices add: they are serendipitously restoring something that is supposedly missing from the recording. ‘Poor’ measurements are simply an indication of an harmonious combination of factors that enable the leap from clinical, neutral signal to real music. There is no argument possible against this assertion.

However, in the above review, the writer cannot make that claim. Clearly he has confused high levels of distortion and noise plus extreme frequency response variations as an absence of colouration. For him, replacing metal cables with “light” was all about removing “grunge” and other “well-known problems”. Because of his extreme analogo-philia, I don’t think he actually knew what ‘neutral’ sounded like. When he heard something that was different from anything he had heard before, he automatically assumed that it must be because cables really are the sonic quagmire he thought they were and that the product was doing what he assumed it was designed to do. For once, it actually was a “night and day” difference but his understanding of what he was hearing was 180 degrees wrong. In the scheme of things, it doesn’t really matter, but it reassures me that 99% of the ‘expert’ opinion based on listening is very dubious indeed – I do think there are people out there who would find much to like in a pair of yoghurt pots linked with string as long as they cost enough.

Stereophile, it appears, doesn’t normally measure cables when they are reviewed. I think we can guess why: there is nothing to measure. Each and every review would feature the same distortion and noise measurements at the very lowest depths of the test equipment’s range, plus a ruler-flat frequency response when using the cable in normal circumstances. It wouldn’t matter if the cable cost £1 or £10,000 – which, absurdly, they sometimes do. To arrange anything different would actually be quite difficult. It is this complete, boring neutrality that Michael Fremer and other cable mythologisers are convinced is plagued with “grunge” and other problems. The justification for the Cyberlight product, so appealing to Fremer, is that it replaces a short section of metal with light and fibre optics, and is analogue – you still connect to the input and output with those awful grungy wires. It is no different from becoming excited about the audio quality of headphones that use an analogue wireless link rather than a cable. Just as with those headphones, there is a little “background hiss” but this is a small price to pay, apparently. And just like those headphones, the signal goes through a link of dubious quality. Very dubious. At least there is a valid justification for wireless headphones, though.

If you gave me about £20 to buy a few parts, I could build you this device in an afternoon, probably. But if I did, I would try to make it work properly. I would certainly try to convince you that the whole product was unnecessary and was corrupting the signal, and that if we really had to use fibre optics we should digitise the signal and send it as pulses. I might also point out that the commercial product is a mess: various “wall warts”, $400 battery packs and “pigtails” that could, depending on what equipment you’re using, destroy your speakers.

And don’t ever unplug or plug in the power to the cables with the amplifier turned on or you’ll send a horrendous THUMP through your system.

For people who might dismiss active speakers and DSP as too complex, there are no limits to the Heath Robinson-esqueness that they can tolerate in the name of ‘analogue’.