The Definition of ‘High Fidelity’

How would various people define the term ‘high fidelity’?

Average person in the street

“Recreates the sound of being at the performance”

I think an imaginary typical person would probably say something like this, especially after being told that people pay as much as they earn in a year for a piece of wire.

Unfortunately, high fidelity audio doesn’t reproduce the actual sound of the performance unless, perhaps, through binaural recording and playback over headphones. This technique doesn’t pretend to maintain the illusion as you turn your head and move around, though.

And, of course, for a studio creation rather than live performance, there is no performance as such to recreate.

Average slightly technical person

“The speaker reproduces the recorded signal precisely”.

As I imagine it, the technically-literate layman’s definition of high fidelity would be more realistic and in fact correct, but incomplete because it does not specify how the speaker should interact with the acoustic environment.

Traditional audio enthusiast

“Low distortion, low noise, flat frequency response from the speaker”.

The typical audio enthusiast would translate the goal into audio-centric terms that aspire to nothing but reproducing the signal with the right frequency content on average – which ignores the unavoidable timing & phase distortion that occurs in traditional passive speakers. It also allows for horrors such as bass reflex resonators to further smear transients (as opposed to the perfect results they may give on steady state sinusoids).

Computer-literate audio enthusiast

“Low distortion, low noise, and a target frequency response at the listener’s ear”

The modern audio enthusiast who has discovered laptops, microphones and FFTs thinks that the smoothed, simplified frequency response measurement displayed on their screen is the way a human hears sound. It has to be, because the alternatives – the complex frequency domain representation and its equivalent, the time domain waveform – are visually incomprehensible.
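The point that a magnitude-only frequency response plot hides the time-domain behaviour can be made concrete. The following sketch (illustrative only, not tied to any particular measurement software) constructs two signals with numerically identical magnitude spectra, one a perfect transient and one with its phase randomised, and shows how different their waveforms are:

```python
import numpy as np

n = 256
rng = np.random.default_rng(0)

# Signal A: a single impulse -- a perfect transient.
a = np.zeros(n)
a[0] = 1.0

# Signal B: the same magnitude spectrum, but with randomised phase --
# the transient energy is smeared across the whole block.
spectrum = np.fft.rfft(a)                      # flat magnitude, zero phase
random_phase = np.exp(1j * rng.uniform(-np.pi, np.pi, spectrum.size))
random_phase[0] = 1.0                          # keep DC real
random_phase[-1] = 1.0                         # keep Nyquist real
b = np.fft.irfft(spectrum * random_phase, n)

# The magnitude responses are (numerically) identical...
assert np.allclose(np.abs(np.fft.rfft(a)), np.abs(np.fft.rfft(b)))

# ...yet the time-domain waveforms are completely different:
print("peak of A:", a.max())                   # all energy at one instant
print("peak of B:", round(b.max(), 3))         # far lower -- energy smeared in time
```

A smoothed magnitude plot would show these two signals as the same; only the complex spectrum or the waveform reveals the difference.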

My definition

“The recorded signal is reproduced precisely, from a small acoustic source with equal, controlled directivity at all frequencies”.

This definition is based on logical deductions.

The perceptive audio enthusiast would observe that they can always recognise voices, instruments and other musical sounds regardless of acoustics, and turn their heads towards those sounds. Therefore, they would deduce that humans have the ability to ‘focus’ on audio sources regardless of acoustics. Clearly, therefore, we don’t just hear the composite frequency response of source combined with the room but have other interesting hearing abilities, probably related to binaural hearing, head movements, and phase and timing.

If we can focus on the source of a sound, i.e. hear through the room, then the room is not a problem to be solved but simply something normal and natural that exists. It is puzzling to think that we can improve the sound of one thing (the room) by changing something we perceive as separate from it (the sound of the source).

If the frequency response of the source is modified because of some characteristic of the room (tantamount to changing the frequency response of a musical performer in a live venue), we will hear the source as not neutral. Thus ‘room correction’ based on EQ is illogical. Thus the idea of the ‘target frequency response’ is simply wrong.

If we use phase and timing in our hearing, and/or have unknown hearing abilities, there is no excuse for modifying the source’s phase and timing, arbitrarily or otherwise. Thus, if it is possible, the speaker should not modify the recording’s phase or timing. DSP makes this possible. But because of the laws of physics, this requires the speaker to look into the future, and that is only possible if we introduce a delay in the output, i.e. latency. For listening to recordings (as opposed to live monitoring) latency is acceptable.
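The latency argument can be sketched numerically. A linear-phase FIR filter has a symmetric impulse response, so it passes every frequency with the same pure delay of (N − 1)/2 samples: no phase or timing distortion, paid for in latency. This is a minimal windowed-sinc illustration (the sample rate, cutoff and tap count are assumptions, not any particular product’s DSP):

```python
import numpy as np

fs = 48000          # sample rate (assumed)
cutoff = 2000       # crossover-style cutoff in Hz (assumed)
N = 255             # odd tap count -> integer group delay

# Windowed-sinc low-pass: the textbook linear-phase FIR design.
n = np.arange(N) - (N - 1) / 2
h = np.sinc(2 * cutoff / fs * n) * np.hamming(N)
h /= h.sum()        # unity gain at DC

# Symmetry of the impulse response is what guarantees linear phase:
assert np.allclose(h, h[::-1])

# The group delay -- the 'look into the future' paid for as latency:
delay_samples = (N - 1) // 2
print("latency:", 1000 * delay_samples / fs, "ms")   # 127 samples at 48 kHz
```

Longer filters (needed for low-frequency phase correction) mean proportionally more latency, which is why this approach suits playback rather than live monitoring.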

The final part of the puzzle is how the ideal speaker should interact with the room. The speaker is not intended to recreate the exact acoustic characteristics of a literal musical instrument, but to reproduce the audio field that was picked up by a microphone – possibly a composite of many musical sources plus acoustics. There is only one logical ideal in terms of dispersion (i.e. the angle through which the sound emerges from the front of the speaker) and that is: uniform at all frequencies.

What the size of that constant dispersion angle should be is open to debate and the taste of the listener – as discussed in the Grimm LS1 design paper. Most people seem to prefer something that is a compromise between omni-directional and a super-directional beam.

This is exemplified by modern cardioid speakers such as the Kii Three or the D&D 8C. To quote the designer of the 8C:

No voicing required. Other loudspeakers usually require voicing. Based on listening to a lot of recordings, the tonal balance of the loudspeaker is changed so that most recordings sound good. Voicing is required to balance differences between direct and off-axis sound. The 8c has very even dispersion. It is the first loudspeaker I ever designed that did not benefit from voicing. The tonal balance is purely based on anechoic measurements.

Confusion may arise when a real-world speaker (almost all existing types until now) lacks this ideal uniform dispersion characteristic. In that situation, the reverberant sound fails to point back to the source: the frequency response of the reverberant sound does not correspond with that of the direct sound, and the listener perceives this thanks to their ability to ‘read’ the acoustic environment in terms of phase, timing and frequency response.

Of course a single musical instrument might have any dispersion characteristic, but if the recording is a composite of several musical sources in their acoustic environment, and a single, unvarying non-neutral dispersion characteristic is applied to all of them, it sounds false. Only neutral dispersion will do.

Some EQ can help here, but it is not true ‘correction’. All that can be done is to steer a middle course between neutral frequency response for the direct sound and the same for the reverberant sound. A commonly known version of this is baffle step compensation which is often applied as a frequency response ‘shelf’ whose frequency is defined by the speaker’s baffle width, and whose depth is dependent on speaker positioning and the room.
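Baffle step compensation can be sketched as a first-order low shelf. The step frequency is often estimated with the rule of thumb f3 ≈ 115 / baffle width (in metres); the shelf depth here (4 dB of a theoretical 6 dB maximum) is an assumed compromise that, as noted above, depends on speaker positioning and the room:

```python
import numpy as np

baffle_width = 0.25            # metres (assumed)
f3 = 115.0 / baffle_width      # ~460 Hz step frequency (rule of thumb)
boost_db = 4.0                 # low-frequency boost (assumed compromise)

def baffle_step_shelf_db(f, f3, boost_db):
    """Gain in dB of a first-order low shelf: +boost_db well below f3,
    0 dB well above it, with a smooth transition around f3."""
    g = 10 ** (boost_db / 20)
    # First-order shelf magnitude: interpolates between g (low f) and 1 (high f).
    mag = np.sqrt((g**2 * f3**2 + f**2) / (f3**2 + f**2))
    return 20 * np.log10(mag)

for f in (20.0, f3, 20000.0):
    db = baffle_step_shelf_db(np.array([f]), f3, boost_db)[0]
    print(f"{f:8.0f} Hz  {db:+.2f} dB")
```

Applied as a linear-phase filter, a shelf like this adjusts the tonal balance without touching the recording’s phase or timing.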

The required compensation cannot be deduced from an in-room measurement of the speaker, because that measurement inextricably shows a combination of the room and the speaker’s unknown dispersion characteristics interacting with it. Only some a priori knowledge of the speaker can help to formulate the optimum correction curve.

N.B. the goal is not a flat, or any other ‘target’, in-room response; the goal is minimal deviation from flat direct sound while achieving the most natural in-room sound possible. DSP allows this EQ curve to be applied without distorting the speaker’s phase or timing.

Stereo

It seems reasonable to extend the logic of accurate playback of the signal and uniform dispersion, from mono (one speaker), to stereo (two speakers).

But stereo is where obvious logic gives way to an element of “It has to be heard to be believed”. The operation of stereo is not obvious. Despite all the talk of the human ability to interpret the acoustic environment with miraculous accuracy, stereo relies on fooling human hearing into believing that a sound reproduced from two locations simultaneously is, in fact, coming from a phantom location. This simultaneous reproduction is something that does not occur in nature, hence the potential for this to work.

Aspects that might be potential ‘show stoppers’ include:

  1. Crosstalk from each speaker to ‘the wrong ear’
  2. Nausea-inducing collapse of the stereo image as the listener turns their head or moves off-centre
  3. Room reverberation from individual speakers not pointing back to the phantom stereo source and so sounding unnatural

It turns out that (1) is a fundamental part of the way stereo works over speakers – as opposed to headphones.

And this leads to a very benign situation regarding (2), where the stereo image remains stable and plausible with listener movement.

Because (1) and (2) lead to a counter-intuitively good result where the listener is simply unaware of the location of the speakers, a listening room with reasonable symmetry extends this effect to give a good result for (3) – effectively phantom reverberation. If one speaker were sitting next to a marble wall, floor and ceiling, and the other surrounded by cushions, maybe the result wouldn’t be so good. As it is, a reasonable listening setup does not give rise to any noticeably unnatural reverberation for stereo phantom images.

What High Fidelity Over Speakers Gives Us

The result of high fidelity stereo is remarkable, and could even be the ultimate way to listen to recorded music, being even better than the notion of perfect ‘holographic’ recreation of the listening venue.

The issue is one of compatibility with domestic life and the cognitive dissonance aspect of recreating a large space in your living room. Donning special apparatus in order to listen is a bit of a mood killer; having to sit in a restrictive central location likewise.

Not hearing one’s own voice or that of a companion while listening would seem weird and artificial. Hearing no acoustic link between one’s own voice and the musical performance would also seem peculiar: imagine listening over headphones to an organ playing in a cathedral and speaking to your companion over a lip mic.

Listening to stereo in a living room gets around all these issues naturally and elegantly. It’s good enough for really serious, critical listening, but is effortlessly compatible with more social, casual listening. The addition of the ambient reverberation of the listening room acts as a two-way bridge between the performance and the listeners.