Image is Everything

I have a couple of audiophile friends for whom ‘imaging’ is very much a secondary hi-fi goal, but I wonder if this is because they’ve never really heard it from their audio systems.

What do we mean by the term anyway? My definition would be the (illusion of) precise placement of acoustic sources in three dimensions in front of the listener – including the acoustics of the recording venue(s). It isn’t a fragile effect that only appears at one infinitesimal position in space or collapses at the merest turn of the head, either.

It is something that I am finding comes trivially easily with DSP-based active speakers. Why? Well, I think it just falls out naturally from accurate matching between the channels and phase- and time-corrected drivers. Logically, good imaging will only occur when everything in a system is working more-or-less correctly.
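To illustrate why coherence "just falls out" of the DSP approach: a complementary linear-phase FIR crossover pair sums back to a pure delay, so the woofer and tweeter feeds are time- and phase-coherent by construction, and a per-driver delay compensates for physical offsets. This is only a minimal sketch – the tap count, crossover frequency and window choice below are arbitrary assumptions, not anyone's actual product design:

```python
import numpy as np

def linear_phase_crossover(num_taps=255, fc=2000.0, fs=48000.0):
    """Design a complementary linear-phase FIR crossover pair.

    The low-pass is a windowed-sinc filter; the high-pass is its
    spectral complement (an impulse minus the low-pass), so the two
    outputs sum to a pure delay: flat magnitude, linear phase.
    """
    n = np.arange(num_taps) - (num_taps - 1) / 2
    h_lp = 2 * fc / fs * np.sinc(2 * fc / fs * n) * np.hamming(num_taps)
    h_lp /= h_lp.sum()                    # unity gain at DC
    delta = np.zeros(num_taps)
    delta[(num_taps - 1) // 2] = 1.0      # pure delay of (N-1)/2 samples
    h_hp = delta - h_lp                   # complementary high-pass
    return h_lp, h_hp

def align_driver(signal, delay_samples):
    """Time-align a driver by delaying its feed a whole number of
    samples (e.g. delaying a tweeter relative to a recessed woofer)."""
    return np.concatenate([np.zeros(delay_samples), signal])[:len(signal)]
```

Because the two filter outputs sum to an exact impulse, the acoustic sum at the crossover point keeps both its magnitude and its phase – something a passive network can only approximate.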

I can imagine all kinds of mismatches and errors that might occur with passive crossovers, exacerbated by the compromises that are forced on the designer such as having to use fewer drivers than ideal, or running the drivers outside their ideal frequency ranges.

Imaging is affected by the speaker’s interaction with the room, of course. The ultimate imaging accuracy may occur when we eliminate the room’s contribution completely, and sit in a very tight ‘sweet spot’, but this is not the most practical or pleasant listening situation. The room’s contribution may also enhance an illusion of a palpable image, so it is not desirable to eliminate it completely. Ultimately, we are striking a balance between direct sound and ambient reflections through speaker directivity and positioning relative to walls.

A real audiophile scientist would no doubt be interested in how exactly stereo imaging works, and whether listening tests could be devised to show the relative contributions of poor damping, phase errors, Doppler distortion, timing misalignment and so on. Maybe we could design a better passive speaker as a result. But I would say: why bother? The DSP active version is objectively more correct, and now that we have finally progressed to such technology and can actually listen to it, it is clear that a speaker needs to do nothing but reproduce left and right correctly – no need for any other tricks, or the forlorn hope of some accidental magic from natural, organic, passive technology.

An ‘excuse’ for poor imaging is that in many real musical situations, imaging is not nearly as sharp as can be obtained from a good audio system. This is true: if you go to a classical concert and consciously listen for where a solo brass instrument (for example) is coming from, you often can’t really tell. I presume this is because you are generally seated far from the stage with a lot of people in the way and much ‘ambience’ thrown in. I presume that the conductor is hearing much stronger ‘imaging’ than we are – and many recordings are made with the mics much closer than a typical person sitting in the auditorium; the sharper imaging in the recording may well be largely artificial.

However, to cite this as a reason for deliberately blurring the image in some arbitrary way is surely a red herring. The image heard by the audience member is still ‘coherent’ even if it is not sharp. And the ‘artificially imaged’ recording contains extra information that allows us to separate the various acoustic sources by a different mechanism than the one that might let us tease out the various sources in a mono recording, say. It reduces listening effort and vastly increases the clarity of the audio ‘scene’.

I think that good imaging due to superior time alignment and phase is going to be much more important than going to the Nth degree to obtain ultra-low harmonic distortion.

If we mess up the coherence between the channels we are getting the worst of all worlds: something that arbitrarily munges the various acoustic sources and their surroundings in response to signal content. An observation that is sometimes made is that the music “sticks to the speakers” rather than appearing in between. What are our brains to make of it? It must increase the effort of listening and blur the detail of what we are hearing.

Not only this, but good imaging is compelling. Solid voices and instruments that float in mid air grab the attention. The listener immediately understands that there is a lot more information trapped in a stereo recording than they ever knew.

Television’s first night


There was an interesting BBC programme last week which celebrated the 80th anniversary of the launch night of BBC television. It aimed to re-create the original event as closely as possible, even to the extent of building replicas of some of the technology in use at the time.

For those who don’t know the story, the BBC launched television in 1936 running two types of technology in parallel: John Logie Baird’s mechanical system and EMI’s vacuum tube-based electronic system. Baird’s system was used first, and then the whole thing was repeated using the electronic system. The original television receivers, of which only 300 had been sold by the launch, had a switch to select Baird or EMI mode – I hadn’t realised that, even on launch day, some receivers were displaying pictures on electronic picture tubes even when the camera at the transmitting end was Baird’s mechanical one.

The Baird mechanical system was incredible: for truly live images it had to use a “flying spot” camera, in which the scene (the face of a presenter sitting in a pitch-black booth) was raster-scanned with a high-intensity dot of light and the resulting reflected light picked up by a photo-sensor. To achieve 240 lines of resolution, two rotating discs were used: one a metre in diameter, spinning so fast that its edges were almost supersonic, and a synchronised slower disc with a spiral mask which selected one of several sets of dots on the main disc.
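A quick sanity check on “almost supersonic”: the rim speed of a disc is just its circumference times the rotation rate. The programme gave the diameter, but I don’t have the exact rotation rate, so the rpm below is an assumed round figure purely for illustration:

```python
import math

def rim_speed(diameter_m, rpm):
    """Linear speed of a disc's rim: circumference times revs per second."""
    return math.pi * diameter_m * rpm / 60.0

# Assumed figures: 1 m disc (from the programme), 6000 rpm (illustrative guess)
v = rim_speed(1.0, 6000)   # ~314 m/s
mach = v / 343.0           # vs. the speed of sound in air, ~343 m/s
```

At a plausible few thousand rpm, a one-metre disc is indeed grazing the speed of sound at its rim – Mach 0.9 or so at the assumed figure above.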

More general scenes of groups of performers and so on were recorded live to film which was developed in a portable ‘lab’ mounted beneath the camera, ready to be scanned by a flying spot scanner some 54 seconds later – this was effectively the first ever telecine machine. The transition from live to telecine sections required logistical coordination around the 54 second delay, meaning that the performers had to start 54 seconds before the live announcer stopped talking, and the announcer had to wait in silence after the performance ended before someone jabbed him in the ribs through a hole in the side of the booth and he could start talking again. (I found this whole thing baffling: why was it important that any of it was truly ‘live’? Why not just do it all delayed by 54 seconds? Perhaps, as was implied in the programme, the telecine images were not quite as crisp as the live ones…?)

Anyway, the writing was on the wall for the mechanical system, and the six month competition was terminated after only three months. My question is: why did it take so long? Why did people go to such heroic lengths to pursue a solution that was so obviously doomed? Perhaps men’s fascination with spinning discs in preference to electronic solutions is universal. I have no doubt that there were some diehards who thought that the mechanical system somehow captured a better picture than a soulless glass tube.

The Engineering Department of Cambridge University had the fun of developing the replica flying spot camera (although with only 60 lines of resolution as opposed to the original 240). Things got a bit fraught in the build up to the ‘launch’ however: a persistent mechanical howl from the disc mechanism threatened to ruin everything. It seemed to take several hours of effort and anguish before someone had the bright idea of applying a drop of oil…

None of the original presenters, performers or staff present at the launch night are still with us, but the BBC did manage to track down a 104 year old engineer who worked for Baird. The launch of television seems like so long ago, and yet this man was already 24 when it happened. He is still sharp as a pin and when Hugh Hunt of Cambridge University told him he was building a replica flying spot machine using an aluminium disc instead of the original steel, his brow furrowed immediately and he asked “Are you sure aluminium will be strong enough to withstand the centrifugal force?”.

I enjoyed seeing the old abandoned studios in Alexandra Palace, and Paul Marshall’s barn full of old TV equipment, including some of the earliest camera tubes in existence. He has built a working camera based on a genuine Iconoscope tube, using modern electronics to drive it, giving us a close re-creation of pre-war electronic TV pictures. I somehow find old TV equipment quite moving; TV was an important part of my childhood and I can’t help but think of the snippets of the golden past that might have been captured through those lenses.

Neural Adaptation

Just an interesting snippet regarding a characteristic of human hearing (and all our senses). It is called neural adaptation.

Neural adaptation or sensory adaptation is a change over time in the responsiveness of the sensory system to a constant stimulus. It is usually experienced as a change in the stimulus. For example, if one rests one’s hand on a table, one immediately feels the table’s surface on one’s skin. Within a few seconds, however, one ceases to feel the table’s surface. The sensory neurons stimulated by the table’s surface respond immediately, but then respond less and less until they may not respond at all; this is an example of neural adaptation. Neural adaptation is also thought to happen at a more central level such as the cortex.

Fast and slow adaptation
One has to distinguish fast adaptation from slow adaptation. Fast adaptation occurs immediately after stimulus presentation, i.e. within hundreds of milliseconds. Slow adaptation takes place over minutes, hours or even days. The two classes of neural adaptation may rely on very different physiological mechanisms.

Auditory adaptation, like perceptual adaptation in the other senses, is the process by which individuals adapt to sounds and noises. As research has shown, as time progresses individuals adapt to sounds and distinguish them less readily. Sensory adaptation tends to blend sounds into one variable sound, rather than a series of separate sounds. Moreover, after repeated perception, individuals adapt to sounds to the point where they no longer consciously perceive them, or rather, “block them out”.

What this says to me is that perceived sound characteristics are variable, depending on how long the person has been listening and to what sequence of ‘stimuli’. Our senses, to some extent, are change detectors, not ‘direct coupled’.
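That ‘change detector’ behaviour can be made concrete with a toy simulation. This leaky-integrator model is my own illustrative simplification – the time constant is arbitrary and it is not drawn from the physiology described above:

```python
import numpy as np

def adapting_response(stimulus, tau=0.5, dt=0.01):
    """Toy model of neural adaptation: the response is the difference
    between the stimulus and a slowly tracking internal state, so a
    constant stimulus fades away while a change in the stimulus
    produces a fresh, full-sized response."""
    state = 0.0
    response = np.empty(len(stimulus))
    for i, s in enumerate(stimulus):
        response[i] = s - state
        state += (s - state) * dt / tau   # state drifts toward the stimulus
    return response
```

Feed it a constant stimulus and the response decays towards nothing; step the stimulus to a new level and the response leaps back up – exactly the “blocking out” of a steady sound, and exactly why a sense behaves like a change detector rather than a direct-coupled meter.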

Something of a conundrum for listening-based audio equipment testing…? Our hearing begins to change the moment we start listening: it becomes desensitised with repeated exposure to a sound – and repeated exposure is one of the cornerstones of many types of listening-based testing.