I’m glad I didn’t sell my CDs


Since finding out that the latest Beatles remasters have been deliberately spoiled with the application of ‘gentle limiting’, I have been thinking that I am glad I didn’t sell my CDs when I went over to streaming.

In the early days of streaming, it seemed as though every different version of every album was available, which made the whole thing seem even better value. It seemed a no-brainer to ditch the CDs and enjoy access to the contents of all the obscure record shops in the world without having to leave your armchair.

Now, I think the streaming people are ‘consolidating’ their catalogues by losing the original albums and generally leaving only the ‘Remastered’ version – the one that has been optimised for listening in cars and on earbuds. And then, I suspect, they will stop mentioning that the album is remastered at all, and all that will be left will be compressed or ‘limited’ music – even the old stuff.

So I am glad I wasn’t organised enough to sell my several hundred CDs that have been sitting in cardboard boxes for the last 15 years. I don’t care whether they’re files or physical CDs – it’s not the format I’m bothered about – but at least I know that the recordings date from a time when recording engineers actually understood what digital audio was for.


A small side project


Not completely unrelated to audio and DSP, I have started a blog about video. It was prompted by the results of processing some vintage direct-from-the-negative footage just to see what would happen.

The answer is: it’s amazing. I have certainly never seen footage of the past like that before. And I think I have worked out how and why a faster frame rate does what it does.

I was in a quandary about sharing the results because I don’t own the footage, but I have hit upon the idea of simply giving instructions on how to process publicly-accessible examples (e.g. on YouTube) using open source applications.

I have also created sound tracks to go with the processed footage – which was fun.

Dynamic Range Compression versus ‘Limiting’

Archimago’s image showing the waveform envelope of a track from The Cranberries’ first album compared to their last

In his latest article, Archimago rightly criticises the recording industry for vandalising its own product, exemplified by a comparison between one of the Cranberries’ first recordings and their last.

Highly compressed albums consciously or unconsciously are heard as unnatural, and uncomfortable to listen to long term. Listeners lose interest quickly due to fatigue and move on or just play the music in the background since they don’t capture our attention due to the lack of dynamics.

This is quite right, except I don’t think this is just compression; it’s far worse than that.

Dynamic range compression means that quiet content is proportionately boosted in volume, and loud content is attenuated. It is done with a time constant so that locally, the waveform’s shape at any point is more-or-less intact. The ‘envelope’ is squashed, but dynamics still remain. To a first approximation, with simple dynamic range compression no information is lost, because the process can be reversed.
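To make the distinction concrete, here is a minimal Python sketch of a conventional feed-forward compressor (the threshold, ratio and time constants are illustrative values of my own choosing, not anything from a real product):

```python
import numpy as np

def compress(x, fs, threshold_db=-20.0, ratio=4.0,
             attack_ms=5.0, release_ms=100.0):
    """Feed-forward dynamic range compressor.

    The gain rides a smoothed envelope, so locally the waveform's
    shape survives; only the envelope is squashed.
    """
    attack = np.exp(-1.0 / (fs * attack_ms / 1000.0))
    release = np.exp(-1.0 / (fs * release_ms / 1000.0))
    env = 0.0
    y = np.empty_like(x)
    for n, s in enumerate(x):
        level = abs(s)
        coeff = attack if level > env else release
        env = coeff * env + (1.0 - coeff) * level      # envelope follower
        level_db = 20.0 * np.log10(max(env, 1e-9))
        over_db = max(level_db - threshold_db, 0.0)    # amount above threshold
        gain_db = -over_db * (1.0 - 1.0 / ratio)       # e.g. 4:1 above threshold
        y[n] = s * 10.0 ** (gain_db / 20.0)
    return y
```

Because the gain changes over milliseconds rather than per sample, the local wave shape is largely preserved and the process is approximately reversible by a complementary expander.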

Old-fashioned dbx-style tape noise reduction was practical proof that dynamic range compression can be reversed: it relied upon dynamic compression when recording, and complementary expansion on playback. In the 1980s I built my own companding noise reduction systems (to other people’s designs) and even used analogue companding to make my own 8-bit delay line effects system sound as though it was 16-bit (that was the idea, anyway).
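dbx itself worked on an RMS-derived envelope with attack and release time constants, but the reversibility idea can be shown with a static 2:1 compander – a deliberate simplification of mine, not dbx’s actual law:

```python
import numpy as np

def compand(x):
    """Static 2:1 compression: every level in dB is halved."""
    return np.sign(x) * np.sqrt(np.abs(x))

def expand(y):
    """The exact complement: dB levels are doubled again."""
    return np.sign(y) * y ** 2

x = np.random.uniform(-1.0, 1.0, 1000)
print(np.allclose(expand(compand(x)), x))   # True: nothing was lost
```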

Dynamic range compression is a necessary evil when it comes to making recordings suitable for playback in a wide variety of circumstances but it can also be used creatively in the production of music recordings. It is common to apply compression to individual tracks and sources within the multitrack recording as well as in the mastering of the whole track. More complex effects stray beyond simple compression where the characteristic is not just a straight line but constructed from different segments to follow a complex ‘law’ with adjustable characteristics. Even more flexible systems allow different compression characteristics to be used on different frequency ranges.

But what Archimago shows is not ‘compression’; it is more like ‘limiting’. This is an extreme form of the same process that the paragons of virtue at Abbey Road casually smeared over the digital remasters of Sgt. Pepper’s recently. But phew! In that case it was only “gentle” “digital limiting”, applied totally unnecessarily to an already-50-year-old release that was compatible with the limited dynamic range of analogue equipment, never mind the incredibly wide dynamic range of modern digital audio. Talk about vandalism!

Calling it dynamic range compression and expressing it in ‘DR’ ratings hides the true horror. It isn’t just the dynamic range being squashed; it is the alteration of the shape of the waveform itself. Information is being lost and garbage added in the form of distortion of the waveform. Whether that is dressed up as ‘loudness optimisation’, ‘soft clipping’ or just straightforward flat-top clipping (which, depending on where it is applied in the chain, could also produce ‘illegal’ digital samples that yield unpredictable values from DAC reconstruction filters) is a minor quibble.
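For contrast with the compressor sketch above, here are the two flavours of waveform mangling just mentioned – flat-top clipping and ‘soft clipping’ – in illustrative form (not any mastering tool’s actual algorithm):

```python
import numpy as np

def hard_clip(x, ceiling=0.5):
    """Flat-top clipping: everything above the ceiling is discarded."""
    return np.clip(x, -ceiling, ceiling)

def soft_clip(x, ceiling=0.5):
    """'Soft' clipping rounds the corners with tanh, but the shape of
    the wave above the knee is still destroyed, not merely attenuated."""
    return ceiling * np.tanh(x / ceiling)
```

Both act sample-by-sample on the waveform itself; once the tops are gone, no complementary process can restore them.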

This ‘process’ is applied without a second thought and without reference to the content at any particular moment. It is clearly applied indiscriminately to the whole composite recording, suggesting that the recording engineers do their stuff and then “limiting” is applied over the top of it. Maybe this is even dignified with terms like “mastering” and “engineering”, but as Archimago says, it is simple vandalism.

The Definition of ‘High Fidelity’

How would various people define the term ‘high fidelity’?

Average person in the street

“Recreates the sound of being at the performance”

I think an imaginary typical person would probably say something like this, especially after being told that audiophiles pay as much as they earn in a year for a piece of wire.

Unfortunately, high fidelity audio doesn’t reproduce the actual sound of the performance unless, perhaps, through binaural recording and playback over headphones. This technique doesn’t pretend to maintain the illusion as you turn your head and move around, though.

And, of course, for a studio creation rather than live performance, there is no performance as such to recreate.

Average slightly technical person

“The speaker reproduces the recorded signal precisely”.

As I imagine it, the technically-literate layman’s definition of high fidelity would be more realistic and in fact correct, but incomplete because it does not specify how the speaker should interact with the acoustic environment.

Traditional audio enthusiast

“Low distortion, low noise, flat frequency response from the speaker”.

The typical audio enthusiast would translate the goal into audio-centric terms that aspire to nothing more than reproducing the signal with the right frequency content on average – which ignores the unavoidable timing and phase distortion that occurs in traditional passive speakers. It also admits horrors such as bass reflex resonators, which further smear transients (as opposed to the perfect results they may give on steady-state sinusoids).

Computer-literate audio enthusiast

“Low distortion, low noise, and a target frequency response at the listener’s ear”

The modern audio enthusiast who has discovered laptops, microphones and FFTs thinks that the smoothed, simplified frequency response measurement displayed on their screen is the way a human hears sound. It has to be, because the alternatives – the complex frequency domain representation and its equivalent, the time domain waveform – are visually incomprehensible.

My definition

“The recorded signal is reproduced precisely, from a small acoustic source with equal, controlled directivity at all frequencies”.

This definition is based on logical deductions.

The perceptive audio enthusiast would observe that they can always recognise voices, instruments, musical sounds regardless of acoustics, and turn their heads towards those sounds. Therefore, they would deduce that humans have the ability to ‘focus’ on audio sources regardless of acoustics. Clearly, therefore, we don’t just hear the composite frequency response of source combined with the room but have other interesting hearing abilities, probably related to binaural hearing, head movements, and phase and timing.

If we can focus on the source of a sound i.e. hear through the room, the room is not a problem to be solved but simply something normal and natural that exists. It is puzzling to think that we can improve the sound of one thing (the room) by changing something we perceive as separate from it (the sound of the source).

If the frequency response of the source is modified because of some characteristic of the room (tantamount to changing the frequency response of a musical performer in a live venue), we will hear the source as sounding unnatural. Thus ‘room correction’ based on EQ is illogical. Thus the idea of the ‘target frequency response’ is simply wrong.

If we use phase and timing in our hearing, and/or have unknown hearing abilities, there is no excuse for modifying the source’s phase and timing, arbitrarily or otherwise. Thus, if it is possible, the speaker should not modify the recording’s phase or timing. DSP makes this possible. But because of the laws of physics, this requires the speaker to look into the future, which is only possible if we introduce a delay in the output, i.e. latency. For listening to recordings (as opposed to live monitoring), latency is acceptable.
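This is exactly the trade a symmetric (linear-phase) FIR filter makes: untouched phase and timing in exchange for a fixed delay of half the filter length. A quick SciPy sketch, with arbitrary example numbers:

```python
from scipy.signal import firwin

fs = 48_000
taps = firwin(2049, [200, 2000], pass_zero=False, fs=fs)  # linear-phase band-pass

# A symmetric FIR delays all frequencies equally: (N - 1) / 2 samples.
latency_s = (len(taps) - 1) / 2 / fs
print(f"latency: {latency_s * 1000:.1f} ms")   # about 21 ms
```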

The final part of the puzzle is how the ideal speaker should interact with the room. The speaker is not intended to recreate the exact acoustic characteristics of a literal musical instrument, but to reproduce the audio field that was picked up by a microphone – possibly a composite of many musical sources plus acoustics. There is only one logical ideal in terms of dispersion (i.e. the angle through which the sound emerges from the front of the speaker) and that is: uniform at all frequencies.

What the size of that constant dispersion angle should be is open to debate and the taste of the listener – as discussed in the Grimm LS1 design paper. Most people seem to prefer something that is a compromise between omni-directional and a super-directional beam.

This is exemplified by modern cardioid speakers such as the Kii Three or the D&D 8C. To quote the designer of the 8C:

No voicing required. Other loudspeakers usually require voicing. Based on listening to a lot of recordings, the tonal balance of the loudspeaker is changed so that most recordings sound good. Voicing is required to balance differences between direct and off-axis sound. The 8c has very even dispersion. It is the first loudspeaker I ever designed that did not benefit from voicing. The tonal balance is purely based on anechoic measurements.

Where some confusion may appear is when a real world speaker (almost all existing types until now) does not possess this ideal uniform dispersion characteristic. In this situation, the reverberant sound fails to point back to the source. Effectively the frequency response of the reverberant sound does not correspond with that of the source, and the listener perceives this discrepancy due to their ability to ‘read’ the acoustic environment in terms of phase, timing and frequency response.

Of course a single musical instrument might have any dispersion characteristic, but if the recording is a composite of several musical sources in their acoustic environment, and a single, unvarying non-neutral dispersion characteristic is applied to all of them, it sounds false. Only neutral dispersion will do.

Some EQ can help here, but it is not true ‘correction’. All that can be done is to steer a middle course between a neutral frequency response for the direct sound and the same for the reverberant sound. A commonly known version of this is baffle step compensation, which is often applied as a frequency response ‘shelf’ whose corner frequency is defined by the speaker’s baffle width, and whose depth depends on speaker positioning and the room.
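As an illustration, here is a low-shelf biquad built from the well-known RBJ ‘Audio EQ Cookbook’ formulas, with its corner frequency set by the rule of thumb f3 ≈ 115 / baffle width (in metres). The numbers are examples only, and the same magnitude curve could equally be realised as a linear-phase FIR to leave phase and timing untouched:

```python
import numpy as np
from scipy.signal import lfilter

def baffle_step_shelf(baffle_width_m, boost_db, fs=48_000):
    """Low-shelf biquad (RBJ cookbook, shelf slope S=1) boosting the
    bass below the baffle-step frequency of a given baffle width."""
    f0 = 115.0 / baffle_width_m            # rule-of-thumb corner frequency
    A = 10.0 ** (boost_db / 40.0)
    w0 = 2.0 * np.pi * f0 / fs
    alpha = np.sin(w0) / 2.0 * np.sqrt(2.0)
    cosw, sqA = np.cos(w0), np.sqrt(A)
    b = np.array([A * ((A + 1) - (A - 1) * cosw + 2 * sqA * alpha),
                  2 * A * ((A - 1) - (A + 1) * cosw),
                  A * ((A + 1) - (A - 1) * cosw - 2 * sqA * alpha)])
    a = np.array([(A + 1) + (A - 1) * cosw + 2 * sqA * alpha,
                  -2 * ((A - 1) + (A + 1) * cosw),
                  (A + 1) + (A - 1) * cosw - 2 * sqA * alpha])
    return b / a[0], a / a[0]

# e.g. a 0.2 m wide baffle with a 4 dB shelf: corner at about 575 Hz
b, a = baffle_step_shelf(0.2, 4.0)
# compensated = lfilter(b, a, samples)
```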

The required compensation cannot be deduced from an in-room measurement of the speaker, because that measurement inextricably shows a combination of the room and the speaker’s unknown dispersion characteristics interacting with it. Only some a priori knowledge of the speaker can help to formulate the optimum correction curve.

N.B. the goal is not a flat, or any other ‘target’, in-room response; the goal is minimal deviation from flat direct sound while achieving the most natural in-room sound possible. DSP allows this EQ curve to be applied without distorting the speaker’s phase or timing.

Stereo

It seems reasonable to extend the logic of accurate playback of the signal and uniform dispersion, from mono (one speaker), to stereo (two speakers).

But stereo is where obvious logic gives way to an element of “It has to be heard to be believed”. The operation of stereo is not obvious. Despite all the talk of the human ability to interpret the acoustic environment, stereo relies on fooling human hearing into believing that a sound reproduced from two locations simultaneously is, in fact, coming from a phantom location. This simultaneous reproduction is something that does not occur in nature, hence the potential for this to work.

Aspects that might be potential ‘show stoppers’ include:

  1. Crosstalk from each speaker to ‘the wrong ear’
  2. Nausea-inducing collapse of the stereo image as the listener turns their head or moves off-centre
  3. Room reverberation from individual speakers not pointing back to the phantom stereo source and so sounding unnatural

It turns out that (1) is a fundamental part of the way stereo works over speakers – as opposed to headphones.

And this leads to a very benign situation regarding (2), where the stereo image remains stable and plausible with listener movement.

Because (1) and (2) lead to a counter-intuitively good result where the listener is simply unaware of the location of the speakers, a listening room with reasonable symmetry extends this effect to give a good result for (3) – effectively phantom reverberation. If one speaker were sitting next to a marble wall, floor and ceiling, and the other surrounded by cushions, maybe the result wouldn’t be so good. As it is, a reasonable listening setup does not give rise to any noticeably unnatural reverberation for stereo phantom images.

What High Fidelity Over Speakers Gives Us

The result of high fidelity stereo is remarkable, and could even be the ultimate way to listen to recorded music, being even better than the notion of perfect ‘holographic’ recreation of the listening venue.

The issue is one of compatibility with domestic life and the cognitive dissonance aspect of recreating a large space in your living room. Donning special apparatus in order to listen is a bit of a mood killer; having to sit in a restrictive central location likewise.

Not hearing one’s own voice or that of a companion while listening would seem weird and artificial. Hearing no acoustic link between one’s own voice and the musical performance would also seem peculiar: imagine listening over headphones to an organ playing in a cathedral and speaking to your companion over a lip mic.

Listening to stereo in a living room gets around all these issues naturally and elegantly. It’s good enough for really serious, critical listening, but is effortlessly compatible with more social, casual listening. The addition of the ambient reverberation of the listening room acts as a two-way bridge between the performance and the listeners.

Predictable audio

Various audiophiles and forums are very measurements-oriented. The implication is that any audio system or device can be approached as a blank sheet of paper, open to testing to reveal its performance, and that ‘anything is possible’: until you measure it, you just don’t know what it’s capable of, but after a set of basic measurements you will know all about it.

The truth, surely, is going to be that two brands of similar-sized two-way speakers with 8″ and 1″ drivers are going to behave very similarly to each other in terms of distortion, dispersion and so on. The exact choice of crossover frequencies and slopes may make differences, but they will be entirely predictable trade-offs. Making the speakers slim-fronted floor standers with 6″ and 1″ drivers will change the characteristics in entirely predictable ways.

The same, surely, will be true of DACs that use the same chipsets. Only a bad mistake will change the core performance – there is no ‘trade secret’ that will magically improve what goes on in a chip. Sure, measurements will be a check on the absence of that mistake, but we shouldn’t expect any amazing differences.

Amplifiers of the same type will all behave similarly to each other. If someone claims to have drastically improved the performance of an existing type of amplifier, scepticism is the order of the day. Do these innovations only work with a resistive load, are they stable with temperature, do they progressively shut the amplifier down on long organ notes? Basic measurements won’t necessarily show these limitations up.

So I would say that the most interesting aspects of audio are going to be the motivations of the designers, their obsessions and what they dismiss as unimportant. For me, a speaker that derives from the quote “I realised our product line-up had a gap for a simple passive two-way, ported compact monitor” is not going to be worth measuring. It simply won’t provide any surprises – unless it’s been really messed up badly.

This is something I didn’t know ten years ago. Back then, I somehow assumed that audio was so mysterious that it was, to use an American expression, a crapshoot. A two-way ported speaker might somehow hit just the right balance if designed by a genius, and might measure and sound fundamentally better than a larger, more sophisticated speaker with DSP. The reason I was so mistaken was the astonishing range of prices across audio of various types, and the work of reviewers who simply weren’t capable of discriminating good sound from bad – reviewers who would use the same glowing vocabulary to describe the sound of both a small two-way monitor and a large three-way speaker when in reality they were miles apart.

There are no unpredictable miracles unless the design does something fundamental that has not been done before. The world’s best DAC and amplifier are not going to transform your audio experience; the world’s most expensive two-way passive direct radiator speaker is not going to sound as good as a competent three-way active DSP speaker. On the other hand, innovations like motion feedback and driver power compression elimination might sound different from anything you’ve heard before; as might phase and timing alignment, active cardioid response, BACCH etc.

Such radical innovations defy measurement unless the design fundamentals and implementation are known; and if you know the design fundamentals and implementation, the measurements are going to be predictable and yet will still not tell you how they will sound.

What has happened to pop music?


The BBC revived the tradition of the Christmas Day Top of the Pops again this year, and I witnessed it out of curiosity. Apart from watching Jools Holland’s programme when it’s on (mainly on fast forward), I have little contact with what the young folk might be into these days. What is the state of popular music in 2018?

Well, I can tell you, it’s appalling.

The first thing you notice is the vocal style that many of today’s artists are affecting to varying degrees. The missus and I are quite good at doing an impression: it involves moving your tongue back in your mouth and squeezing the vocals out through the smaller gap thus created. You also need to warble with your throat exaggeratedly. The kids seem to think it’s cool.

And the other thing you notice is the ‘tune’. On TOTP the other day, I would say that more than half the songs featured the same anodyne chord progression (if you can really dignify it with that term), just performed with slightly different production styles, rhythms and degrees of bombast – indistinguishable from the stuff you get in the Eurovision Song Contest. Truly, today’s music creators have lost the ability to dazzle the ears with a brilliant melody or even ‘riff’.

It really is just an ‘industry’ going through the motions, and kids indiscriminately lapping up whatever they are given. It isn’t just me getting old; it’s an absolute, objective decline in the ambition of pop music, musicians and listeners.

I get to hear the Kii Threes

Thanks to a giant favour from a new friend, I finally get to hear the Kii Threes…


A couple of Sundays ago, a large van arrived at my house containing two Kii Threes and their monumentally heavy stands, plus a pair of Linkwitz LX Minis with subwoofers, along with their knowledgeable owner, John. It was our intention to spend the day comparing speakers.

We first set up the Kiis to compare against my ‘Keph’ speakers, and to do this we had to ‘interleave’ the speaker pairs, with perhaps slightly less stereo separation and symmetry than ideal.


Setting up went remarkably smoothly, and we soon had the Kiis running off Tidal on a laptop while the Kephs were fed with Spotify Premium – most tracks seemed to be available from both services. The Kiis are elegant in the simplicity of cabling and the lack of extraneous boxes.

John had set up the Kiis with his preferred downward frequency response slope, which starts at 3 kHz and ends 4 dB down (at 22 kHz?). I can’t say what significance this might have had for our listening experiment.

The original idea was to match the SPLs using pink noise and a sound level meter. This we did, but didn’t maintain such discipline for long. We were listening rather louder than I would normally, but this was inevitable because of the Kii’s amazing volume capabilities.

The bottom line is that the Kiis are spectacular! The main differences for me were that the Kiis were ‘smoother’ and the bass went deeper, and they seemed to show up the ‘ambience’ in many recordings more than the Kephs – more about that later. An SPL meter revealed that what sounded like equal volume required, in fact, a measurably higher SPL from the Kephs. Could this be our hearing registering the direct sound, but the Kiis’ superior dispersion abilities resulting in less reverberant sound – ignored by our conscious hearing but no doubt obscuring detail? Or possibly an artefact of their different frequency responses? We didn’t really have time to investigate this any further.

When standing a long way from both sets of speakers at the back of the room, the Kephs appeared to be emphasising the midrange more, and at the moment of changeover between speakers that contrast didn’t sound good; with a certain classical piano track, at the moment of changeover the Kephs seemed to render the sound* of the piano kind of ‘plinky plonk’ or toy-like compared to the Kiis – but then after about 10 seconds I got used to it. Without the Kiis to compare against, I would have said my Kephs sounded quite good! But the Kiis were clearly doing something very special.

I did try some ad hoc modifications of the Keph driver gains, baffle step slopes and so on, and we maybe got a bit closer in that regard. But I forgot about the -4dB slope that had been applied to the Kiis, and if I had thought about it, I already had an option in the Kephs’ config file for doing just that. But really, I wish I had had the courage of my convictions and left the frequency response ‘as is’.

Ultimately, I think that we were running into the very reason why the Kiis are designed the way they are: to resemble a big speaker. As the blurb for the Kii says:

“The THREE’s ability to direct bass is comparable to, but much better controlled than that of a traditional speaker several meters wide.”

It’s about avoiding reflections that blur bass detail, but as R.E. Greene explains, it’s also about frequency response:

“What is true of the mini-monitor, that it cannot be EQed to sound right, is also true of narrow-front floor-standers. They sound too midrange-oriented because of the nature of the room sound. This is something about the geometry of the design. It cannot be substantially altered by crossover decisions and so on.”

A conventional small speaker (and the Kephs are relatively small) cannot be equalised to give a flat direct sound and flat room sound. It has to be a compromise, and as I described before, I apply baffle step compensation to help bridge this discrepancy between the direct and ambient frequency balances. The results are, so I thought, rather acceptable, but the compromise shows up against a speaker with more controlled dispersion.

This must always be a factor in the sound of conventional speakers unless sitting very close to them. I do believe Bruno Putzeys when he says that large speakers (or those that cleverly simulate largeness) will always sound different from small ones. It would be interesting also to have compared the Kiis against my bigger speakers whose baffle step is almost an octave lower.

However, there was another difference that bothered me (with the usual sighted listening caveats) and this was ‘focus’. With the Kiis I heard lots of ‘ambience’ – almost ‘surround sound’ – but I didn’t hear a super-precise image. When the Kephs were substituted I heard a sudden snap into focus, and everything moved to mainly between and beyond the speakers. The sound was less ‘smooth’ but it was, to me, more focused.

And this is a question I still have about the Kiis and other speakers that utilise anti-phase. I see the animations on the Kii web site that show how the rear drivers cancel out the sound that would otherwise go behind the speaker. To do this, the rear drivers must deliver a measured quantity of accurately-timed anti-phase. This is a brilliant idea.

My question is, though: how complete is this cancellation if you partially obscure one of the side drivers (with another speaker, in this case)? I do wonder if I was hearing the results of anti-phase escaping into the room and messing up the imaging because of the way we had arranged the speakers – along with a mildly (possibly imaginary!) uncomfortable sensation in my ears and head.

To someone oriented towards frequency response measurements, it doesn’t matter whether sound is anti- or in-phase; it is just ‘frequency response material’ that gets chucked into bins and totted up at the end of the measurement. If it is delayed and reflected, then in graphs its effects appear no different from the visually-chaotic results of all the other room reflections; this is the usual argument against phase accuracy in forum discussions: “How can phase matter if it is shifted arbitrarily by reflections in the room, anyway?”
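The blind spot is easy to demonstrate: invert a waveform’s polarity and the magnitude spectrum – the only thing such plots show – does not change at all. A tiny Python check, using a decaying tone as a stand-in for a transient:

```python
import numpy as np

fs = 48_000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 500 * t) * np.exp(-5 * t)   # a decaying 'transient'
y = -x                                             # polarity-inverted copy

same = np.allclose(np.abs(np.fft.rfft(x)), np.abs(np.fft.rfft(y)))
print(same)   # True: identical 'frequency response material',
              # yet the waveforms are opposites
```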

However, to the person who acknowledges that the time domain is also important, anti-phase is a problem. If human hearing has the ability to separate direct sound from room sound, it is dependent on being able to register the time-delayed similarity between direct and reflected sound. If the reflected sound is inverted relative to the direct, that similarity is not as strong (we are talking about transients more than steady state waveforms). In fact, the reflected sound may partially register as a different source of sound.

Anti-phase is surely going to sound weird – and indeed it does, as anyone who has heard stereo speakers wired out of phase will attest. Where the listener registers in-phase stereo speakers as producing a precise image located at one point in space, out-of-phase speakers produce an image located nowhere and/or everywhere. The makers of pseudo-surround sound systems such as Q-Sound exploit this in order to create images that are not restricted to between the stereo speakers. This may be a factor in the open baffle sound that some people like (but I don’t!).

So I would suggest that allowing anti-phase to bounce around the room is going to produce unpredictable results. This is one reason why I am suspicious of any speaker that releases the backwave of the driver cone into the room. The more this can be attenuated (and its bandwidth restricted) the better.

With the Kiis, was I hearing the effect of less-than-perfect cancellation because of the obscuring of one of the side drivers? Or imagining it? Most people who have heard the Kiis remark on the precise imaging, so I fear that we managed to change something with our layout. Despite the Kiis’ very clever dispersion control system which supposedly makes them placement-independent, does it pay to be a little careful of placement and/or symmetry, anyway? For it not to matter would be miraculous, I would say.

In a recent review of the Kiis (not available online without a subscription), Martin Colloms says that with the Kiis he heard:

“…sometimes larger than life, out-of-the-box imaging”

I wonder if that could be a trace of what I was hearing..? Or maybe he means it as a pure compliment. In the same review he describes how the cardioid cancellation mechanism extends as far as 1kHz, so it is not just a bass phenomenon.


Next, John set up his DIY Linkwitz LX Mini speakers (which look very attractive, being based on vertical plastic tubes with small ‘pods’ on top), as well as their compact-but-heavy subwoofers. These were fed with analogue signals from a Raspberry Pi-based streamer and, again, sounded excellent. They also seek to control dispersion, in this case by purely acoustic means – which I don’t yet understand. And they may also dabble a bit in backwave anti-phase.

If I had any criticism, it was that the very top end wasn’t quite as good as a conventional tweeter – but that might be my imagination and expectation bias. Also, our ears and critical faculties were pretty far gone by that point…

Really, we had three systems all of which, to me, sounded good in isolation – but with the Kiis revealing their superior performance at the point of changeover. There were certainly moments of confusion when I didn’t know which system was operating and only the changeover gave the game away. I think all three systems were much better than what you often get at audio shows.

What we didn’t have were any problems with distortion, hum or noise. In these respects, all three systems just worked. The biggest source of any such problem was a laptop fan which kicked in sometimes when running Tidal.

There were lots of things we didn’t do. We didn’t try the speakers in different positions; we didn’t try different toe-in angles; we didn’t make frequency response measurements and do things in a particularly scientific way; we listened at pretty high volume and didn’t have the self-control to listen at lower volumes – which might have been more appropriate for some of the classical music. The room was ‘as it comes’: 6 x 3.4 x 2.4m, carpeted, plaster walls and ceiling, and floor-to-ceiling windows behind the speakers with a few boxes and bits of furniture lying about.


So my conclusion is that I have heard the Kiis and am highly impressed, but there might possibly be an extra level of focus and integrity I have yet to experience. I never got to the point where I could listen through the speakers rather than to them, but I am sure that this will happen at some point.

In the meantime I am having to learn to love my Kephs again – which actually isn’t too hard without the Kiis in the same room showing them up!


Footnotes:

*Since writing that paragraph I have found a mention of possibly that very phenomenon:

“…even a brief comparison with a real piano, say, will reveal the midrange-orientation of the narrow-front wide radiators.”

Still listening…

I’m still here, and still listening to my KEF-derived speakers most days. I honestly think that they are built to the right formula – although I keenly await the time when I get to hear the Kii Threes.

It all now seems so obvious, but it took me quite a while to disentangle myself from the frequency-domain-centric view that most audio design people are committed to – and into which their minds (and possibly ears) have been warped.

It is clear that human hearing does perform frequency domain analysis, but that it also uses other methods and ‘hardware’ in parallel to characterise what it is hearing. This means that an audio system needs to reproduce the signal without changing it in either the time or frequency domains.

The alternative is to second guess how human hearing works and to assume that arbitrary distortion of phase and timing has no effect. In fact, I would say it is not even as rational as that: what seems to have happened is that while carpentry-and-coil-based technology doesn’t explicitly control phase and timing, conventional 1970s speakers still sounded pretty good. The results have been retrospectively analysed and justified, and a model of human hearing developed to fit the speakers rather than the other way round.

This faulty model leads to ideas like bass reflex and ‘room correction’ that, viewed through the prism of not trying to second guess human hearing, seem as confused and deluded as they sound.

The result is the weird variability in audio systems that all ‘measure well’ – using the subset of measurements that satisfy the model – but sound disappointing even while costing the price of a car. It might even be worse than that: maybe recordings are being made while being monitored through ‘room correction’ resulting in the demise of high fidelity recordings as we know them.

And there’s another delusional idea that stems from the faulty model and the occasionally serendipitous characteristics of old technology: the notion that we listen to a signal rather than through a channel.

The conventional view is that we must change the signal to give the best sound – whether by equalisation or, bizarrely, by deliberately adding distortion, e.g. with valves or vinyl. If you do this, you are really changing the characteristics of the channel. In real music and acoustics there is no such thing as ‘a signal’, and whatever automatic processing you do to it is, in the general case, arbitrary and meaningless. For sure, you may find that distortion is a pleasing artistic effect on a particular (probably very simple) recording. But are you an artist? If so, you might be much better served by playing with a freeware PC recording studio app rather than churning through equipment that represents several years of the retirement you may never get to enjoy.

The only coherent strategy is to reproduce the signal without touching it. In my experience, if you get anywhere near to this, it sounds magnificent. Not ‘neutral’; not ‘clinical’ but deep, open, rich, colourful – like real music.

Audiophile listening rooms: the hard floor phenomenon


If you do a Google image search for ‘audiophile listening room’, you bring up a selection of images that often bear a certain resemblance to each other. A large number of rooms feature exposed wood or stone-tiled floors with relatively small rugs in the middle. They also don’t have much furniture in them. Some people choose to sit a long way away from the speakers.

These don’t immediately look like my idea of a good listening room. To me they look echoey – how my room sounded before the carpet was fitted.

This leads me to wonder whether some discussions over room treatments, room measurements and room correction may be at cross purposes. I often don’t understand why people become so interested in these things when my system seems to sound perfectly OK without any of that stuff. This may be the explanation.

The way I look at it, a wall-to-wall carpet may be the best room treatment you can fit, covering a very large area with minimal effort. And some other furniture may also be a good thing. Kind of like a 1970s living room – the sound you’ve been trying to recreate for the last few decades and had convinced yourself was just a false memory…

How Stereo Works

(Updated 03/06/18 to include results for Blumlein Pair microphone arrangement.)


A computer simulation of stereo speakers plus listener, showing the listener’s perception of the directions of three sources that have previously been ‘recorded’. The original source positions are shown overlaid with the loudspeakers.

Ever since building DSP-based active speakers and hearing real stereo imaging effectively for the first time, it has seemed to me that ordinary stereo produces a much better effect than we might expect. In fact, it has intrigued me, and it has been hard to find a truly satisfactory explanation of how and why it works so well.

My experience of stereo audio is this:

  • When sitting somewhere near the middle between two speakers and listening to a ‘purist’ stereo recording, I perceive a stable, compelling 3D space populated by the instruments and voices in different positions.
  • The scene can occasionally extend beyond the speakers (and this is certainly the case with recordings made using Q-Sound and other such processes).
  • Turning my head, the image stays plausible.
  • If I move position, the image falls apart somewhat, but when I stop moving it stabilises again into a plausible image – although not necessarily resembling what I might have expected it to be prior to moving.
  • If I move left or right, the image shifts in the direction of the speaker I am moving towards.

An article in Sound On Sound magazine may contain the most perceptive explanation I have seen:

The interaction of the signals from both speakers arriving at each ear results in the creation of a new composite signal, which is identical in wave shape but shifted in time. The time‑shift is towards the louder sound and creates a ‘fake’ time‑of‑arrival difference between the ears, so the listener interprets the information as coming from a sound source at a specific bearing somewhere within a 60‑degree angle in front.

This explanation is more elegant than the one that simply says that if the sound from one speaker is louder we will tend to hear it as if coming from that direction – I have always found it hard to believe that such a ‘blunt’ mechanism could give rise to a precise, sharp 3D image. Similarly, it is hard to believe that time-of-arrival differences on their own could somehow be relayed satisfactorily from two speakers unless the user’s head was locked into a fixed central position.

The Sound On Sound explanation says that by reproducing the sound from two spaced transducers that can reach both ears, the relative amplitude also controls the relative timing of what reaches the ears, thus giving a timing-based stereo image that, it appears, is reasonably stable with position and head rotation. This is not a psychoacoustic effect where volume difference is interpreted as a timing difference, but the literal creation of a physical timing difference from a volume difference.

There must be timbral distortion because of the mixing of the two separately-delayed renditions of the same impulse at each ear, but experience seems to suggest that this is either not significant or that the brain handles it transparently, perhaps because of the way it affects both ears.
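The claim can be checked with a little phasor arithmetic. For a single tone, the sum at one ear of the near speaker’s direct sound and the far speaker’s quieter, later crosstalk is another sinusoid whose arrival time lies between the two, pulled towards the louder; the level and delay below are invented for illustration:

```python
import numpy as np

f = 1000.0                  # tone frequency, Hz
tau = 0.25e-3               # extra path delay of the far speaker, s
a_near, a_far = 1.0, 0.5    # the near speaker is louder (panned that way)

w = 2.0 * np.pi * f
z = a_near + a_far * np.exp(-1j * w * tau)   # phasor sum at the ear
shift = -np.angle(z) / w                     # effective arrival-time shift
print(f"composite arrives {shift * 1e6:.0f} us late")   # ~74 us: between
# 0 and 250 us, pulled towards the louder, earlier speaker
```

Done for both ears, the two composites end up shifted by different amounts: a pure level difference between the speakers has become a physical time-of-arrival difference between the ears.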

Blumlein’s Patent

Blumlein’s original 1933 patent is reproduced here. The patent discusses how time-of-arrival may take precedence over volume-based cues depending on frequency content.

It is not immediately apparent to me that what is proposed in the patent is exactly what goes on in most stereo recordings. As far as I am aware, most ‘purist’ stereo recordings don’t exaggerate the level differences between channels, but simply record the straight signal from a pair of microphones. However, the patent goes on to make a distinction between “pressure” and “velocity” microphones which, I think, corresponds to omni-directional and directional microphones. It is stated that in the case of velocity microphones no amplitude manipulation may be needed. The microphones should be placed close together but facing in different directions (often called the ‘Blumlein Pair‘) as opposed to being spaced as “artificial ears”.


Blumlein Pair microphone arrangement

The Blumlein microphones are bi-directional i.e. they also respond to sound from the back.

Going by the SoS description, this type of arrangement would record no timing-based information (from the direct sound of the sources at any rate), just like ‘panpot stereo’, but the speaker arrangement would convert orientation-induced volume variations into a timing-based image derived from the acoustic summation of different volume levels via acoustic delays to each ear. This may be the brilliant step that turns a rather mundane invention (voices come from different sides of the cinema screen) into a seemingly holographic rendering of 3D space when played over loudspeakers.

Thus the explanation becomes one of geometry plus some guesswork regarding the way the ears and brain correlate what they are hearing, presumably utilising both time-of-arrival and the more prosaic volume-based mechanism which says that sounds closer to one ear than the other will be louder – enhanced by the shadowing effect of the listener’s head in the way. Is this sufficient to plausibly explain the brilliance of stereo audio? Does a stereo recording in any way resemble the space in which it was recorded?

A Computer Simulation

In order to help me understand what is going on I have created a computer simulation which works as follows (please skip this section unless you are interested in very technical details):

  • It is a floor plan view of a 2D slice through the system. Objects can be placed at any XY location, measured in metres from an origin.
  • There are no reflections; only direct sound.
  • The system comprises
    • a recording system:
      • Three acoustic sources, each of which generates an identical musical transient (loaded from a mono WAV file at CD quality). Each source is considered in isolation from the others.

      • Two microphones that can be spaced and positioned as desired. They can be omni-directional or have a directional response. In the former case, volume is attenuated with distance from the source while in the latter it is attenuated by both distance and orientation to the source.
    • a playback system:

      • Two omni-directional speakers

      • A listener with two ears and the ability to move around and turn his head.

  • The directions and distances from sources to microphones are calculated based on their relative positions, and from these the delays and attenuations of the signals at the microphones are derived. These signals are ‘recorded’.

  • During ‘playback’, the positions of the listener’s ears are calculated based on XY position of the head and its rotation.

  • The distances from the speakers to each ear are calculated, and from these, the corresponding delays and attenuations.

  • The composite signal from each source that reaches each ear via both speakers is calculated and from this is found:

    • relative amplitude ratio at the ears
    • relative time-of-arrival difference at the ears. This is currently obtained by correlating one ear’s summed signal for that source (from both speakers) against the other’s and looking for the delay at which the correlation peaks – a minimal sketch of this appears after this list. (There may be methods more representative of the way human hearing ascertains time-of-arrival, and this might be part of a future experiment.)

  • There is currently no attempt to simulate HRTF or the attenuating effect of ‘head shadow’. Attenuation is purely based on distance to each ear.

  • The system then simulates the signals that would arrive at each ear from a virtual acoustic source were the listener hearing it live rather than via the speakers.

    • This virtual source is swept through the XY space in fine increments and at each position the ‘real’ relative timings and volume ratio that would be experienced by the listener are calculated.

    • The results are compared to the results previously found for each of the three sources as recorded and played back over the speakers, and plotted as colour and brightness in order to indicate the position the listener might perceive the recorded sources as emanating from, and the strength of the similarity.

  • The listener’s location and rotation can be incremented and decremented in order to animate the display, showing how the system changes dynamically with head rotation or position.
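For anyone curious about the mechanics, here is a minimal sketch of two of the kernels such a simulation needs – geometric delay/attenuation and the correlation-based time-of-arrival estimate mentioned above. It follows the simplifications stated in the list (1/r attenuation, no head shadow) but is not the actual program:

```python
import numpy as np

C = 343.0   # speed of sound in air, m/s

def delay_and_gain(src_xy, mic_xy, fs):
    """Propagation delay (in samples) and 1/r attenuation from geometry."""
    r = float(np.hypot(*(np.asarray(src_xy) - np.asarray(mic_xy))))
    return r / C * fs, 1.0 / max(r, 0.1)

def itd_by_correlation(left_ear, right_ear, fs):
    """Inter-aural time difference: cross-correlate one ear's composite
    signal against the other's and take the lag of the peak."""
    corr = np.correlate(left_ear, right_ear, mode="full")
    lag = np.argmax(corr) - (len(right_ear) - 1)   # +ve lag: left ear lags
    return lag / fs
```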

The results are very interesting!

Here are some images from the system, plus some small animations.

Spaced omni-directional microphones

In these images, the (virtual) signal was picked up by a pair of (virtual) omnidirectional microphones on either side of the origin, spaced 0.3m apart. This is neither a binaural recording (which would at least have the microphones a little closer together) nor the Blumlein Pair arrangement, but it does seem to be representative of some types of purist stereo recording.

The positions of the three sources during (virtual) recording are shown overlaid with the two speakers, plus the listener’s head and ears. Red indicates response to SRC0; green SRC1; and blue SRC2.


Effect of head rotation on perceived direction of sources based on inter-aural timing when listener is close to the ‘sweet spot’.


Effect of side-to-side movement of listener on perceived imaging based on inter-aural timing.


Compound movement of listener, including front-to-back movement and head rotation.


Effect of listener movement on perceived image based on inter-aural amplitudes.

Coincident directional microphones (Blumlein Pair)

Here, directional microphones are set at the origin at right angles to each other, as shown in the earlier diagram. They copy Blumlein’s description in the patent, i.e. output is proportional to the cosine of the angle of incidence.
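Expressed as code, the two channel gains are just a pair of cosines with the microphone axes at ±45°; a sketch, using the convention that negative azimuth means the source is to the left:

```python
import numpy as np

def blumlein_gains(azimuth_deg):
    """Coincident figure-of-eight microphones aimed at -45 and +45 degrees.
    Gain is the cosine of the angle between source and microphone axis;
    negative values are the rear lobe, picked up in anti-phase."""
    th = np.radians(azimuth_deg)
    left = np.cos(th + np.radians(45.0))    # left mic points at -45 degrees
    right = np.cos(th - np.radians(45.0))   # right mic points at +45 degrees
    return left, right

for az in (-45, -30, 0, 30, 45):
    l, r = blumlein_gains(az)
    print(f"{az:+3d} deg   L={l:+.3f}   R={r:+.3f}")
```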


Time-of-arrival based perception of direction as captured by a coincident pair of directional microphones (Blumlein Pair) and played back over stereo speakers, with compound movement of the listener.


A similar test, but showing the perceived locations of the three sources based on inter-aural volume level.

In no particular order, some observations on the results:

  • A stereo image based on time-of-arrival differences at the ears can be created with two spaced omni-directional microphones or coincident directional microphones. Note, the aim is not to ‘track’ the image with the user’s head movement (like headphones would), but to maintain stable positions in space even as the user turns away from ‘the stage’.
  • The Blumlein Pair gives a stable image with listener movement based on time-of-arrival. The image based on inter-aural amplitude may not be as stable, however.
  • Interaural timing can only give a direction, not distance.

  • A phantom mirror image of equal magnitude also accompanies the frontwards time-of-arrival-derived direction, but this would also be true of ‘real life’. The way this behaves with dynamic head movement isn’t necessarily correct; at some locations and listener orientations maybe the listener could be confused by this.

  • Relative volume at the two ears (as a ratio) gives a ‘blunt’ image that behaves differently from the time-of-arrival-based image when the listener moves or turns their head. The plot shows that the same ratio can be achieved for different combinations of distance and angle, so on its own it is ambiguous.

  • Even if the time-of-arrival image stays meaningful with listener movement, the amplitude-based image may not.

  • Combined with timing, relative interaural volume might provide some cues for distance (not necessarily the ‘true’ distance).

  • No doubt other cues combining indirect ‘ambient’ reflections in the recording, comb-filtering, dynamic phase shifts with head movement, head-related transfer function, etc. are also used by the listener and these all contribute to the perception of depth.

  • The cues may not all ‘hang together’, particularly in the situation of movement of the listener, but the human brain seems to make reasonable sense of them once the movement stops.

  • The Blumlein Pair does, indeed, create a time-of-arrival-based image from amplitude variations only. And this image is stable with movement of the listener – a truly remarkable result, I think.
  • Choice of microphone arrangement may influence the sound and stability of the image.
  • Maybe there is also an issue regarding the validity of different recording techniques when played back over headphones versus speakers. The Blumlein Pair gives no time-of-arrival cues when played over headphones.
  • The audio scene is generally limited to the region between the two speakers.
  • The simulation does not address ‘panpot’ stereo yet, although as noted earlier, the Blumlein microphone technique is doing something very similar.
  • In fact, over loudspeakers, the ‘panpot’ may actually be the most correct way of artificially placing a source in the stereo field, yielding a stable, time-of-arrival-based position (see the sketch below).
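A panpot does with a knob what the Blumlein pair does with geometry: it encodes position as pure level differences with no timing component. A minimal constant-power pan law, one common choice among several:

```python
import numpy as np

def panpot(mono, position):
    """Constant-power pan: position runs from -1 (hard left) to +1
    (hard right); left^2 + right^2 is constant at every setting."""
    th = (position + 1.0) * np.pi / 4.0    # map to 0 .. pi/2
    return np.cos(th) * mono, np.sin(th) * mono   # (left, right)
```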

Perhaps the thing that I find most exciting is that the animations really do seem to reflect what happens when I listen to certain recordings on a stereo system and shift position while concentrating on what I am hearing. I think that the directions of individual sources do indeed sometimes ‘flip’ or become ambiguous, and sometimes you need to ‘lock on’ to the image after moving, and from then on it seems stable and you can’t imagine it sounding any other way. Time-of-arrival and volume-based cues (which may be in conflict in certain listening positions), as well as the ‘mirror image’ time-of-arrival cue may be contributing to this confusion. These factors may differ with signal content e.g. the frequency ranges it covers.

It has occurred to me that in creating this simulation I might have been in danger of shattering my illusions about stereo, spoiling the experience forever, but in the end I think my enthusiasm remains intact. What looked like a defect with loudspeakers (the acoustic cross-coupling between channels) turns out to be the reason why it works so compellingly.

In an earlier post I suggested that maybe plain stereo from speakers was the optimal way to enjoy audio and I think I am more firmly persuaded of that now. Without having to wear special apparatus, have one’s ears moulded, make sure one’s face is visible to a tracking camera, or dedicate a large space to a central hot-seat, one or several listeners can enjoy a semi-‘holographic’ rendering of an acoustic recording that behaves in a logical way even as the listener turns their head. The system blends the listening room’s acoustics with the recording meaning that there is a two-way element to the experience whereby listeners can talk and move around and remain connected with the recording in a subtle, transparent way.

Conclusion

Stereo over speakers produces a seemingly realistic three-dimensional ‘image’ that remains stable with listener movement. How this works is perhaps more subtle than is sometimes thought.

The Blumlein Pair microphone arrangement records no timing differences between left and right, but by listening over loudspeakers, the directional volume variations are converted into time-of-arrival differences at the listener’s ears. The acoustic cross-coupling from each speaker to ‘the wrong ear’ is a necessary factor in this.

Some ‘purist’ microphone techniques may not be as valid as others when it comes to stability of the image or the positioning of sources within the field. Techniques that are appropriate for headphones may not be valid for speakers, and vice versa.