Image is Everything

I have a couple of audiophile friends for whom ‘imaging’ is very much a secondary hi-fi goal, but I wonder if this is because they’ve never really heard it from their audio systems.

What do we mean by the term anyway? My definition would be the (illusion of) precise placement of acoustic sources in three dimensions in front of the listener – including the acoustics of the recording venue(s). It isn’t a fragile effect that only appears at one infinitesimal position in space or collapses at the merest turn of the head, either.

It is something that I am finding is trivially easy for DSP-based active speakers. Why? Well, I think that it just falls out naturally from accurate matching between the channels and from phase- and time-corrected drivers. Logically, good imaging will only occur when everything in a system is working more-or-less correctly.

I can imagine all kinds of mismatches and errors that might occur with passive crossovers, exacerbated by the compromises that are forced on the designer such as having to use fewer drivers than ideal, or running the drivers outside their ideal frequency ranges.

Imaging is affected by the speaker’s interaction with the room, of course. The ultimate imaging accuracy may occur when we eliminate the room’s contribution completely, and sit in a very tight ‘sweet spot’, but this is not the most practical or pleasant listening situation. The room’s contribution may also enhance an illusion of a palpable image, so it is not desirable to eliminate it completely. Ultimately, we are striking a balance between direct sound and ambient reflections through speaker directivity and positioning relative to walls.

A real audiophile scientist would no doubt be interested in how exactly stereo imaging works, and whether listening tests could be devised to show the relative contributions of poor damping, phase errors, Doppler distortion, timing misalignment etc. Maybe we could design a better passive speaker as a result. But I would say: why bother? The DSP active version is objectively more correct, and now that we have finally progressed to such technology and can actually listen to it, it clearly doesn’t need to do anything but reproduce left and right correctly – no need for any other tricks or the forlorn hope of some accidental magic from natural, organic, passive technology.

An ‘excuse’ for poor imaging is that in many real musical situations, imaging is not nearly as sharp as can be obtained from a good audio system. This is true: if you go to a classical concert and consciously listen for where a solo brass instrument (for example) is coming from, you often can’t really tell. I presume this is because you are generally seated far from the stage with a lot of people in the way and much ‘ambience’ thrown in. I presume that the conductor is hearing much stronger ‘imaging’ than we are – and many recordings are made with the mics much closer than a typical person sitting in the auditorium; the sharper imaging in the recording may well be largely artificial.

However, to cite this as a reason for deliberately blurring the image in some arbitrary way is surely a red herring. The image heard by the audience member is still ‘coherent’ even if it is not sharp. And the ‘artificially imaged’ recording contains extra information that is allowing us to separate the various acoustic sources by a different mechanism than the one that might allow us to tease out the various sources in a mono recording, say. It reduces effort and vastly increases the clarity of the audio ‘scene’.

I think that good imaging due to superior time alignment and phase is going to be much more important than going to the Nth degree to obtain ultra-low harmonic distortion.

If we mess up the coherence between the channels we are getting the worst of all worlds: something that arbitrarily munges the various acoustic sources and their surroundings in response to signal content. An observation that is sometimes made is that the music “sticks to the speakers” rather than appearing in between. What are our brains to make of it? It must increase the effort of listening and blur the detail of what we are hearing.

Not only this, but good imaging is compelling. Solid voices and instruments that float in mid air grab the attention. The listener immediately understands that there is a lot more information trapped in a stereo recording than they ever knew.


15 thoughts on “Image is Everything”

  1. “I have a couple of audiophile friends for whom ‘imaging’ is very much a secondary hi-fi goal, but I wonder if this is because they’ve never really heard it from their audio systems.”

    Perhaps it’s because they’re more interested in music than “sound.”


      1. No argument from me.

        I pretty much break it down into those who are in love with music and those who are in love with sound. Or to put it another way, those for whom audio systems are a means to connect with music and those for whom music is a way to connect with their audio systems. I would consider the latter to be audiophiles and the former not.


        1. It’s tricky. If someone says “I love music itself, not just the sound of it”, I don’t entirely believe them. How do you separate the pure music from ‘the sound’?

          As Thomas Beecham said “The English may not like music, but they absolutely love the noise it makes”.


            1. Last night I accidentally played a live Rolling Stones blues performance. I wouldn’t normally choose to listen to it, but over the system I was listening on, it sounded strikingly ‘live’ – a combination of really gutsy bass and the aforementioned imaging. I enjoyed it immensely because of the illusion I was actually there. The ‘sound’ transformed the ‘music’. Without the imaging (“obsessing over the fly specks”?), I wouldn’t have enjoyed it. Does this make me a very shallow person? 🙂


  2. This image that we speak of is surely just a construct of the mixing desk. The vocalist is usually in a booth off to one side of the studio and often the drums off in another booth. Parts of the recording may have been created in a different studio. How many recordings are done with just two mikes?


    1. I am thinking of it like a composite image. Sure, it may have been made by sticking some cutouts on a backdrop, but I still want to see the component parts sharp and clearly delineated rather than blurred. It is an extra dimension that the listener can latch onto in order to separate the sources – I can see how it could be a more important factor than low harmonic distortion in that regard.

      I wouldn’t have mentioned it if I hadn’t heard it and thought “Aha!”. I don’t think of it as ‘a thing’ that is to be designed in or measured: it is just something that falls out of accurate-ish stereo reproduction and is degraded to a greater or lesser extent by traditional ways of doing things.


  3. “The time resolution of human hearing is 5 microseconds or better—which would correspond to a frequency of 200 kHz, requiring audio equipment ideally to have a flat response to that frequency” -David E Blackmer
    We make super tweeters that have a response of up to 90kHz, using an ultra-thin (4 micron) aluminium ribbon to deliver sound that arrives at the speed of real life, arriving at separate times at each ear. The ear latches onto the first delivery of sound, the differential is processed, and the brain can define its direction accurately, lulling the mind into a more convincing music reproduction.


    1. Many thanks for the comment, but I am not sure I agree 100% with the quote.

      If I were to reproduce a 3 kHz sine wave, for example, with a ‘jitter’ of 5 microseconds (i.e. if I couldn’t control the zero crossings and peaks to better than 5 µs), it would sound terrible. But an ordinary audio system has no problem in reproducing a very high quality sine wave. This is because the accuracy of the timing is not related to the sample rate: by sampling a bandwidth-limited signal we infer, and reproduce, the signal exactly – with just a minuscule randomised error related to the bit depth.

      The upshot is that an ordinary audio system can time impulses etc. to much better than 5 microseconds. The bandwidth of those signals is limited, however. To find fault with our ordinary audio systems, we would need for Mr. Blackmer to demonstrate that humans not only register timing to 5 microseconds, but that they can detect higher frequencies (in transients I suppose) than they are conventionally thought to be capable of. This, I think, is one of the claims for why the world needs MQA.
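
      The timing point above can be checked numerically. What follows is only a rough sketch under my own assumptions, not a model of any real converter: the function names (sample_delayed_sine, estimate_delay) are made up for illustration, and I have assumed a 44.1 kHz sample rate, 16-bit quantisation and a 3 kHz tone delayed by 5 microseconds. The delay is recovered from the phase of the sampled tone, even though it is much shorter than the gap between samples.

```python
import math


def sample_delayed_sine(freq_hz, delay_s, fs_hz, n_samples, bits=16):
    """Sample sin(2*pi*f*(t - delay)) at fs_hz and quantise to `bits` bits."""
    full_scale = 2 ** (bits - 1) - 1
    samples = []
    for n in range(n_samples):
        t = n / fs_hz
        x = math.sin(2 * math.pi * freq_hz * (t - delay_s))
        # Rounding to the nearest quantisation step models the bit depth.
        samples.append(round(x * full_scale) / full_scale)
    return samples


def estimate_delay(samples, freq_hz, fs_hz):
    """Estimate the delay of a sampled sine of known frequency from its phase.

    Correlates the samples against sin and cos at freq_hz. This assumes the
    samples span a whole number of cycles, so the two references are
    orthogonal over the block.
    """
    i_sum = 0.0
    q_sum = 0.0
    for n, x in enumerate(samples):
        theta = 2 * math.pi * freq_hz * n / fs_hz
        i_sum += x * math.sin(theta)
        q_sum += x * math.cos(theta)
    phase = math.atan2(-q_sum, i_sum)  # phase lag in radians
    return phase / (2 * math.pi * freq_hz)


# Illustrative values only (assumptions, not measurements).
FS = 44_100.0       # assumed sample rate in Hz
FREQ = 3_000.0      # test tone in Hz
TRUE_DELAY = 5e-6   # 5 microseconds
N = 441             # exactly 30 cycles of 3 kHz at 44.1 kHz

samples = sample_delayed_sine(FREQ, TRUE_DELAY, FS, N)
estimated = estimate_delay(samples, FREQ, FS)
sample_period = 1.0 / FS

print(f"sample period:   {sample_period * 1e6:.2f} us")
print(f"true delay:      {TRUE_DELAY * 1e6:.4f} us")
print(f"estimated delay: {estimated * 1e6:.4f} us")
```

      The sample period here is about 22.7 microseconds, yet the 5 microsecond shift survives sampling and quantisation. This only illustrates that timing accuracy is not set by the sample period; it says nothing either way about whether humans can hear content above the conventional bandwidth.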


      1. Forgive me if you’ve written about this elsewhere, but is it your view that “high resolution” audio (above-Redbook sample and bit rates) — or what may more appropriately be called “high bandwidth” audio — is an unnecessary (and potentially audibly harmful due to intermodulation distortion) waste of space?


          1. Yes, it’s a great article and their videos are extremely helpful. It certainly did a number on many of my previously held assumptions about digital audio. Here’s a quote from another good article on the subject:

            “Unless the Nyquist Theorem is ever disproved, it stands that any increase in sample rates cannot increase fidelity within the audible spectrum. At all. Extra data points yield no improvement.”



    1. You mean the panpotted cello (or whatever it is)? I don’t hear it as a spectacular image on my speakers, possibly because it’s basically a low frequency sound, and panpotting is not the most striking effect.

      The main sort of thing I’m talking about is either purists’ recordings (e.g. one I just listened to called Deux Poemes by Poulenc sung by someone called Sophie Karthauser) which may, or may not have been made with only two microphones, but certainly include elements which are true stereo – maybe the singer was recorded with two microphones and mixed with a couple more mics covering the piano. The result, anyway, is as though the singer is in front of you, a few feet back – absolutely no way to tell that the speakers are at the sides and she is in mid air.

      Or recordings that have, for example, a ‘backdrop’ with stereo effects (e.g. strong reverb) against which, a singer recorded in dry mono in the middle will stand out spectacularly. Or maybe the singer is recorded with a different stereo effect. If the system is working properly, the two contrast clearly to great effect. Try this one:

      It doesn’t sound like anything in particular in headphones to me, but played over the converted KEF speakers the start is quite spectacular with the voice, when it comes in, clearly floating over the more distant backdrop in a no-doubt artificial but still beautiful 3D effect.

      Just played Elton John Rocket Man and, again, the illusion of being in the studio is very compelling when the various instruments are so cleanly separated.

