What more do we want?

As I sit here listening to some big symphonic music playing on my ‘KEF’ DSP-based active crossover stereo system, I am struck by the thought: how could it be any better?

I sometimes read columns where people wonder about the future of audio, as though continuous progress is natural and inevitable – and as though we are accustomed to such progress. But it does occur to me that there is no reason why we cannot have reached the point of practical perfection already.

I think the desire for exotic improvements over what we have now has to be seen within the context of most people having not yet heard a good stereo system. They imagine that if the system they heard was expensive, it must therefore represent the state of the art, but in audio I think they could well be wrong. Some time ago, the audio industry and enthusiasts may even have subconsciously sniffed that they were reaching a plateau and begun to stall or reverse progress just to make life more interesting for themselves.

At the science fiction level, people dream of systems that reproduce live events exactly, including the acoustics of the performance venue. Even if this were possible, would it be worth it without the corresponding visuals (and smells, temperature, humidity, etc.)?

Something like it could probably be achieved using the techniques of the computer games industry: synthesis of the acoustics from first principles, headphones with head tracking, or maybe even some system of printed transducer array wall coverings that could create the necessary sound fields in mid-air if there was no furniture in the room (and knowing the audio industry, it would also supplement the system with some conventional subwoofers). My prediction is that you would try it a couple of times, find it a rather contrived, unnatural experience, and next time revert to your stereo system with two speakers.

On a more practical level, the increasing use of conventional DSP is predicted. We are now seeing the introduction of systems that aim to reduce the (supposedly) unwanted stereo crosstalk that occurs from stereo speakers. The idea is to send out a slightly attenuated antiphase impulse from one speaker for every impulse from the other speaker, that will cancel out the crosstalk at the ‘wrong ear’. It then needs to send out an anti-antiphase impulse from the other speaker to cancel out that impulse as it reaches the other ear, and so on. My gut instinct is that this will only work perfectly at one precise location, and at all other locations there will be ‘residue’ possibly worse than the crosstalk. In fact we don’t seem bothered by the crosstalk from ordinary stereo – I am not convinced we hear it as “colouration”. Maybe it results in a narrowing of the width of the ‘scene’, but with the benefit of increasing its stability. (Hand-waving justification of the status quo, maybe, but I have tried ambiophonic demonstrations, and I was eventually happy to go back to ordinary stereo).
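The chain of cancelling impulses described above can be put into a toy model. This is only a sketch and not any product's algorithm: it assumes each speaker reaches the far ear with a hypothetical gain `g` (less than 1) and an extra delay of `d` samples, it ignores the head, the room and any frequency dependence, and the function names are made up for illustration. It builds the alternating impulse trains and then checks what arrives at each ear, both at the design position and when the real delay is off by one sample.

```python
def canceller_feeds(g, d, terms):
    """Speaker feeds for a single left-channel impulse.

    g: assumed far-ear gain (0 < g < 1), d: assumed far-ear delay in samples,
    terms: how many correction impulses are sent after the original one.
    """
    length = terms * d + 1
    left = [0.0] * length
    right = [0.0] * length
    for k in range(terms + 1):
        # Impulse k has amplitude (-g)**k and alternates between the speakers.
        feed = left if k % 2 == 0 else right
        feed[k * d] = (-g) ** k
    return left, right


def ear_signals(left_feed, right_feed, g, d):
    """What each ear receives: near speaker direct, far speaker scaled and delayed."""
    n = len(left_feed) + d
    left_ear = [0.0] * n
    right_ear = [0.0] * n
    for i in range(len(left_feed)):
        left_ear[i] += left_feed[i]
        right_ear[i + d] += g * left_feed[i]
        right_ear[i] += right_feed[i]
        left_ear[i + d] += g * right_feed[i]
    return left_ear, right_ear


if __name__ == "__main__":
    g, d, terms = 0.7, 3, 4
    left, right = canceller_feeds(g, d, terms)

    # Listener exactly where the canceller expects them to be.
    l_ear, r_ear = ear_signals(left, right, g, d)
    print("design position, wrong-ear peak:", max(abs(x) for x in r_ear))

    # Same feeds, but the real far-ear delay is one sample longer.
    l_ear, r_ear = ear_signals(left, right, g, d + 1)
    print("off position, wrong-ear peak:", max(abs(x) for x in r_ear))
```

In this toy model the leftover at the design position shrinks as `g` to the power `terms + 1`, whereas a one-sample mismatch leaves impulses at the wrong ear as large as the original crosstalk. That is consistent with the gut instinct above, but it says nothing definite about real systems.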

Other predictions include the increasing use of automatic room correction, ultra-sophisticated tone controls and loudness profiles that allow the user to tailor every recording to their own preferences.

Tiny speakers will generate huge SPLs flat down to 20 Hz – the Devialet Phantom is the first example of this, along with the not-so-futuristic drawback of needing to burn huge amounts of energy to do it. Complete multi-channel surround envelopment will come from hidden speakers.

At the hardware fetish end, no doubt some people imagine that even higher sample rates and bit depths must result in audibly better quality. Some people probably think that miniaturised valves will transform the listening experience. High resolution vinyl is on the horizon. Who knows what metallurgical miracles await in the science of audio interconnects?

For the IT-oriented audiophile, what is left to do? Multi-room audio, streaming from the cloud, complete control from handheld devices are all here, to a level of sophistication and ease of use limited only by the ‘cognitive gap’ between computer people and normal human users that sometimes results in clunky user interfaces. The technology is not a limiting factor. Do you want the album artwork to dissolve as one track fades out and the new artwork to spiral in and a CGI gatefold sleeve to open as the new track fades in? The ability to talk to your device and search on artist, genre, label, composer, producer, key signature? Swipe with hand gestures like Minority Report? Trivial. There really is no limit to this sort of thing already.

In fact, for the real music lover, I don’t think there is anything left to do. Truth be told, we were most of the way there in 1968.

The basic test is: how much better do you want the experience of summoning invisible musicians to your living room to be? I can’t imagine many worthwhile improvements over what we have now. The sound achievable from a current neutral stereo system is already at ‘hologram’ level; the solidity of the phantom image is total – the speakers disappear. It isn’t a literal hologram that reproduces the acoustics in absolute terms, allowing you to walk around it, of course, but it is a plausible ‘hologram’ from any static listening position, allowing you to ‘walk around it’ in your mind, and it stays plausible as you turn your head.

It isn’t complete surround envelopment, but there is reverberation from your own room all around you, and it seems natural to sit down and face the music. You will hear fully-formed, discrete, musical parts emerging from an open, three dimensional space, with acoustics that may not be related to the space you are listening in. You have been transported to a different venue – if that is what the recording contains. In terms of volume and dynamics, a modern system can give you the same visceral dynamics as the real performance.

And all this is happening in your living room, but without any visuals of the performance – it is music that you are wanting to listen to after all. If the requirement is to experience a literal night at the opera, then short of a synthesised Star Trek type ‘holodeck’ experience you will be out of luck.

You could always watch a high resolution DVD of some performance or the BBC’s Proms programmes, for example, and such visuals may give you a different experience. They will, however, destroy the pure recreation of the acoustic space in front of you because, by necessity, the visuals jump around from location to location, scene to scene in order to maintain the interest level, and your attention will be split between the sound and the imagery. Anyway, a huge TV will cost you about £200 from Tescos these days so that aspect is pretty well covered, too.

The natural partner to a huge TV is multi-channel surround sound. Quadraphonic sound seemed like the next big thing in the 1970s, but didn’t take off at the time. We now have five or seven channel surround sound. Does this improve the musical experience? Some people say so, but that could just be the gimmick factor, or an inferior stereo system being jazzed up a bit. While the correlation between two good speakers produces an unambiguous ‘solution’ to the equations thereof, multiple sources referring to the same ‘impulse’ could result in no clear ‘solution’ – that is, a fuzzy and indistinct ‘hologram’ that our ears struggle to make sense of. Mr. Linkwitz surmises something similar in the case of the centre speaker, plus he finds it visually distracting; with just two speakers, the space between them becomes a virtual blank space in which it is easier to imagine the audio scene. Most recordings are stereo and are likely to remain that way with a large proportion of listeners using headphones. For these reasons, I am happy that stereo is the best way to carry on listening to music.

Can DSP improve the listening experience further? Hardly at all I would say. So-called ‘room correction’ cannot transform a terrible room into a great one, and it doesn’t even transform a so-so one into a slightly better one. It starts from a faulty assumption: that human hearing is just a frequency response analyser for which real acoustics (the room) are an error, rather than human hearing having a powerful acoustics interpreter at the front end. If you attempt to ‘fix’ the acoustics by changing the source you just end up with a strange-sounding source. At a pinch, the listener could listen in the near(er) field to get rid of the room, anyway.

I am convinced that the audiophile obsession with tailoring recordings to the listener’s exact requirements is a red herring: the listener doesn’t want total predictability, and a top notch system shouldn’t be messed about with. As a reviewer of the Kii Three said:

…the traditional kind of subjective analysis we speaker reviewers default to — describing the tonal balance and making a judgement about the competence of a monitor’s basic frequency response — is somehow rendered a little pointless with the Kii Three. It sounds so transparent and creates such fundamentally believable audio that thoughts of ‘dull’ or ‘bright’ seem somehow superfluous.

The user doesn’t have access to the individual elements of the recording. What can be done in terms of, say, reducing the volume of the hi-hats (or whatever) is crude and unnatural and bleeds over every other element of the recording. The only chance of reproducing a natural sound, maintaining the separation between fully-formed elements and reproducing a three dimensional ‘scene’, is for the system to be neutral. When this happens, the level of the hi-hats likely just becomes part of the performance. Audiophiles who, without any caveat, say they want DSP tone controls in order to fiddle about with recordings have already given up on that natural sound.

In summary, I see the way music was ‘consumed’ 40 or even 50 years ago as already pretty much at the pinnacle: two large speakers at one side or end of a comfortably-furnished living room, filling the space with beautiful sound – at once combining compatibility with domestic living and the ability to summon musicians to perform in the space in a comprehensible form that one or several people can enjoy without having to don special apparatus or sit in a super-critical location. And the fitted carpets of those times were great for the acoustics!

All that has happened in the meantime is just the ‘mopping up’ of the remaining niggles. We (can) now have better performance with respect to distortion, frequency response, dynamic range, and a more solid, holographic audio ‘scene’; no scratches and pops; instant selection of our choice of the world’s total music library. The incentives for the music lover to want anything more than this are surely extremely limited.

KEF Concord Step Response

I recently decided to measure my converted KEF Concords to check their time alignment. In theory, they should be time aligned because the individual drivers have been corrected for linear phase and then delayed appropriately based on distance to the listener, but I hadn’t quite ‘closed the loop’ by making a direct measurement.

In order to do this, I measured with a microphone at tweeter height and 1 m away from the speaker – just to make it the standard measurement position. I didn’t change anything about the normal crossover setup I have been using. I used REW to make the impulse response measurement, using a sweep from 10 Hz to 20 kHz with a duration of about 24 s. Without completely re-arranging the room I could manage about 3.5 ms before the first (major) reflection – it would be good to try it in a bigger room or even outdoors. I am curious about what sort of windowing people normally apply just before the main impulse: the choice influences how clean everything looks leading up to the impulse and, to some extent, how clean the leading edge is. Some of the Stereophile graphs look suspiciously ‘sharp’ at the start.
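For anyone wondering how a step response relates to the measured impulse response: it is simply the running sum of the impulse response samples. The sketch below shows that, plus two rough consequences of a 3.5 ms reflection-free window. The speed of sound (343 m/s) and the function names are my assumptions; REW does all of this itself, so this is for illustration only.

```python
from itertools import accumulate

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def step_response(impulse_response):
    """Running sum of the impulse response samples."""
    return list(accumulate(impulse_response))


def window_limits(reflection_free_seconds):
    """Rough limits implied by a reflection-free measurement window.

    Returns (lowest frequency the window can resolve in Hz,
             extra path length of the first reflection in metres).
    """
    lowest_hz = 1.0 / reflection_free_seconds
    extra_path_m = SPEED_OF_SOUND * reflection_free_seconds
    return lowest_hz, extra_path_m


if __name__ == "__main__":
    # A perfect impulse integrates to a perfect step.
    print(step_response([0, 0, 1, 0, 0]))

    lowest_hz, extra_path_m = window_limits(0.0035)
    print(f"resolves down to roughly {lowest_hz:.0f} Hz")
    print(f"first reflection travels about {extra_path_m:.1f} m further")
```

So with about 3.5 ms of clean data, anything below a few hundred Hz in the windowed result is not really resolved, which is one reason a bigger room or an outdoor measurement would be worth trying.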

This is the result I got:

[Figure: Concord step response]

I am assuming that the above graph shows that the time alignment of my speakers is pretty reasonable. In Stereophile’s article on measuring speakers they show a similar image:

Fig.11 shows a good step response produced by a time-coherent, three-way loudspeaker, with the outputs of the three drive-units adding in-phase at the microphone position. There are not that many speakers that produce this good a step response. Of the speakers I have measured for Stereophile, only about 10—models from Quad, Thiel, Dunlavy, Spica, and Vandersteen—have step responses this good.

Fig.12 shows a more typical step response, again of a three-way loudspeaker. This time there are actually three step responses apparent in the graph: a narrow, positive-going step response from the tweeter; the next, negative-going step is the midrange unit (as will be seen, it’s connected with opposite polarity to the tweeter); with finally a slow, wide positive pulse from the woofer.

Stereophile is the go-to publication for these sorts of things.

If you do a Google Image Search for ‘stereophile step response’ the results are quite interesting: true step responses are still quite rare. DSP should make it trivial, but for a passive speaker it can generally only be achieved using first order crossover filters, and these, of course, result in the drivers having to cope with substantial bleed of frequencies outside their comfort zone as well as being inflexible.
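Why first order in particular? Here is a minimal sketch, assuming ideal textbook filters normalised so that the crossover sits at w = 1, and ignoring the drivers' own responses and positions. A first-order low-pass and high-pass pair sums back to exactly 1 at every frequency, so the summed output keeps the input's waveform. A second-order Butterworth pair does not, in either polarity. The function names are illustrative.

```python
import math


def first_order_sum(w):
    """Low-pass plus high-pass, first order, at normalised frequency w."""
    s = 1j * w
    low = 1 / (1 + s)
    high = s / (1 + s)
    return low + high


def second_order_sum(w, invert_high=False):
    """Low-pass plus high-pass, second-order Butterworth, at normalised frequency w."""
    s = 1j * w
    den = 1 + math.sqrt(2) * s + s * s
    low = 1 / den
    high = (s * s) / den
    return low - high if invert_high else low + high


if __name__ == "__main__":
    for w in (0.1, 0.5, 1.0, 2.0, 10.0):
        first = first_order_sum(w)
        second = second_order_sum(w)
        second_inv = second_order_sum(w, invert_high=True)
        print(f"w={w:5.1f}  first order |H|={abs(first):.3f}  "
              f"second order |H|={abs(second):.3f}  "
              f"second order, tweeter inverted |H|={abs(second_inv):.3f}")
```

With DSP none of this is a constraint, because steeper filters can be made linear phase and the drivers delayed as required.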

Strangely, the Beolab 90’s step response looks nothing like a step, although extenuating circumstances are listed.

[Figure: Beolab 90 step response]

The Kii Three is more like it:

[Figure: Kii Three step response]

A new listening room

[Photo: the Concords in the extension]

Here are my KEF Concords in their new home. Yes, a room whose walls are 1/3 glass! Since that photo was taken, floor-to-ceiling curtains have been installed.

The room is about 6m x 3.5m and has a ceiling height of 2.4m. Apart from the glass, the walls and ceiling are plasterboard, and the concrete floor is carpeted wall-to-wall. There’s a bed and various bits of junk in the room.

To some people it may look like an acoustic nightmare, but it’s actually sounding good. I’ve got the speakers wider apart than shown in the photo. I did originally set the bass -3dB point at 38 Hz, but I think that was too low and it is now at 44 Hz. Apart from that, I haven’t made any provision for ‘room correction’ as such. I am using 5th order crossover filters and the depth of the baffle step compensation curves has been set by ear.

I am pleased to find that I am achieving the desirable effect of the end of the room appearing as a clear window (literally and metaphorically!) onto the performance, particularly with ‘purist’ classical recordings. There’s a nice level of clean bass and great imaging and detail higher up. It seems to work just fine with the curtains open or closed – when open the curtains are bunched up in the corners. Maybe looking out through the window does enhance the perception of front-to-back depth of the recording.

Beolab 50, Home HiFi Show 2017

For the first time in a while I have been to a hi-fi show, this time in Harrogate, North Yorkshire. It was arranged by the forum HiFi Wigwam, and there were both commercial and amateur exhibitors. It was fairly low key: not all that many exhibitors and not too many visitors on the day I was there (Saturday). I liked the venue, The Old Swan Hotel.

[Photo: the Beolab 50 demonstration room]

My main reason for going was to hear the Bang and Olufsen Beolab 90s, but those weren’t there. Instead the Beolab 50s were being demonstrated in a very large room as shown above. There was a technical problem: they couldn’t change the settings for the speakers because of Wi-Fi issues – it can only be done from a phone app (I think) and it needs to find the speakers on the network. So they were stuck on a fairly omni-directional setting and I could really hear this: I desperately needed them to be more focused. But anyway, very generously, the sales guy allowed me to play tracks off a memory stick I had brought, and gave me control of the volume.

My impression was of a beautifully clean, effortless sound, and incredible bass, but the setup was just ‘not right’ and everything sounded too diffuse and distant. Nevertheless, I enjoyed playing some good demo tracks and it was easy to hear that these speakers are not troubled by high volume levels – although I didn’t get anywhere near the volume they normally run demos at!

I must try to get a demo when they are set up properly – I expect great things from them. (I feel a bit of a fraud though, because at over £20,000 I won’t be buying them!).

My friend was very impressed by the looks of these speakers: solid-looking aluminium, fine wooden grilles, and a tweeter ‘pod’ that disappears when the speakers are inactive. In fact, he was very taken by the whole B&O ‘ethos’. Even the remote control for the system was a work of art, being made from a single piece of aluminium. And B&O do the best brochures of any hi-fi company, I think!

In the rest of the show, we heard some enormous horn speakers (I am not a fan), the KEF LS50 Wireless, some BBC-style LS5/9s, some early Harbeths, some Focal speakers, some tiny actives based on balanced mode radiators, and quite a few others. There were various vintage components from the 1970s onwards. My friend was quite taken with the sound of some very classic-looking Tannoys with concentric tweeters, and with anything that sounded good on a lower budget. I don’t have the brochure to hand, but may fill in some more details later.

Valves were on show of course, a few turntables, some outrageously inefficient Class A solid state amplifiers, and some active crossovers. In some setups, vinyl sounded OK, but because of the pops and clicks I often found myself wishing for digital sources!

There were the usual vinyl stalls, cables and accessories at eye watering prices, an interesting exhibition of photos of pop stars from the 60s and a great jukebox.

Acoustics-wise, the standard rooms were quite good, I thought, having higher ceilings than some other places.

Skruvsta

I am currently installing myself into a new room in our house extension. My KEFs will be housed there.

I did want to buy a vintage 1960s or 70s swivel armchair for listening to my stereo in the ultimate style, but they are expensive and/or shabby. I bought this one from Ikea for £75 instead. (I have no association with Ikea btw!)

[Photo: Ikea Skruvsta swivel chair]

It may seem like something hardly worth mentioning, but you don’t want a chair that has a high back because of acoustic reflections (as highlighted in the £3150 Lobster listening chair). This one is low but very comfortable, and you don’t have to fit the castors. It’s very light, and the adjustable height means that it can double up as an office chair even without the castors. Clearly it was designed for audio with its carefully shaped, acoustically-absorbent surface. Anyway, I find I can sit in it for long periods very comfortably, and the stereo sounds pretty good without having to spend three grand…

Audio – Literature Analogy

An audio recording is a bit like a book: created through artistic or intellectual endeavour, then ‘fixed’ as a collection of pure information and distributed to customers for them to ‘consume’ in their own environments. In the case of digital audio, a recording is literally the same as a book, being stored as numbers in a file; you could store a book as a WAV, or an audio recording as an MS Word file if you wanted.
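To show that the ‘book as a WAV’ idea is meant literally, here is a small sketch using Python’s standard `wave` module. It treats each byte of the text as one 8-bit mono sample; the sample rate and the function names are arbitrary choices of mine. The result is a valid WAV file that would play as noise, and the text comes back out unchanged.

```python
import io
import wave


def text_to_wav_bytes(text, sample_rate=8000):
    """Store text as the sample data of an 8-bit mono WAV file (in memory)."""
    payload = text.encode("utf-8")
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)            # mono
        wav.setsampwidth(1)            # one byte per sample
        wav.setframerate(sample_rate)  # arbitrary: it only affects playback speed
        wav.writeframes(payload)
    return buffer.getvalue()


def wav_bytes_to_text(wav_bytes):
    """Read the sample data back out and decode it as text."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        payload = wav.readframes(wav.getnframes())
    return payload.decode("utf-8")


if __name__ == "__main__":
    book = "It is a truth universally acknowledged..."
    wav_file = text_to_wav_bytes(book)
    print(len(wav_file), "bytes of WAV")
    print(wav_bytes_to_text(wav_file))
```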

In rendering the content to be read, there are things you could do to detract from the content of a book:

  • printed too big/too small
  • lighting too dim/too bright
  • inappropriate use of colour
  • blotchy printout
  • typeface varies with content, or randomly
  • corrupted: missing/duplicated/erroneous characters
  • peculiar paper
  • non-neutral typeface – difficult to read or inappropriate e.g. science fiction font for a Jane Austen novel
  • in the case of some ‘boutique’ printing, an appropriate analogy with unreliable ‘boutique’ hi-fi equipment might be a book that spontaneously becomes too hot to touch, or occasionally ruins valuable furniture.

The emotional or intellectual force of the book would actually be reduced because of these problems. In other words, it is not true to say that the quality of reproduction doesn’t matter.

However, there is a finite envelope of neutral, even ‘mundane’, reproduction which achieves an optimal result for the reader – after reading the book they can’t tell you anything about the quality of the printing; all they remember is the content, and the content was thrilling.

Maybe the author specifies the typeface. Some books may include fine illustrations or intricate frontispieces which are intrinsic to the book. In these cases, the reproduction needs to be particularly accurate in order to do justice to what the author has created.

Beyond this, is there anything that the printer can do to enhance the appeal of the book? Well, they can create a fancy binding that the reader notices before they start reading; they can use particularly high quality paper; they can print the characters with micron precision. But only a book collector or printing technology enthusiast would care about these refinements – they have no effect on the actual experience of reading the content, and could easily detract from it.

The manufacturers of the ink and the mains cable that powers the printing press could read lots of books in their spare time, attend evening classes in English Literature, study the physiology of the eye, get diplomas in grammar, and tell us in interviews with speciality magazines about how it all informs their craft. But clearly the results would do nothing whatsoever to change the reading experience.

The printer might decide to dabble in science for the first time since they left printing college. They could do scientific trials in aspects of book reproduction where lucky participants get to read snippets of text or passages from ‘typical’ books, responding with their perceptions of differences, preferences, or even ‘emotional stimulation level’ in aspects such as:

  • typeface
  • ink
  • reading light
  • paper texture and weight
  • reading room shape/dimensions/finishes

But the results would be rather obvious and predictable, with anything slightly interesting being clearly the result of fashion, novelty and human fickleness rather than being a universal law.

The only way to actually enhance the book would be to change its content. An algorithm that replaces certain words? Re-writes sections to make them longer or shorter? Clearly in the case of literature, such a thing would be meaningless and idiotic. It is not so different in the case of audio. There is nothing but the recording: there is no technology, effect or algorithm that can meaningfully enhance it.

Conclusion

Domestic hi-fi is no more than the equivalent of rendering the printed content of a book: it can be done adequately or badly, and beyond that there is no meaningful way of improving on it. People become deluded by the idea that the rendering technology can enhance the content – which is obviously ridiculous in the case of books, but less obvious with audio.

But this is not to say that hi-fi is, in itself, boring: achieving ‘adequate’ is not trivial.

Many people are simply not used to hearing adequate reproduction regardless of how much money they spend, so they are not aware that the experience vs. quality graph has a horizontal flat top. And needless to say, the audiophile quality vs. cost graph is more-or-less random, which makes it even more confusing.

The audio enthusiast would be much happier and richer if they got a sense of proportion of what matters, then put all their creativity (and money if they’ve got nothing else to spend it on) into building the equivalent of a pleasant reading room, comfy chair and attractive bookcases rather than a solid gold and diamond reading light.

[Last edited 30/05/17]

Reverberation of a point source, compared with a ‘distributed’ loudspeaker

Here’s a fascinating speaker:

[Image: CBT36 loudspeaker]

It uses many transducers arranged in a specific curve, driven in parallel and with ‘shading’ i.e. graduated volume settings along the curve, to reduce vertical dispersion but maintain wide dispersion in the horizontal. I can see how this might appear quite appealing for use in a non-ideal room with low ceilings or whatever.

It is a variation on the phased array concept, where the outputs of many transducers combine to produce a directional beam. It is effectively relying on differing path lengths from the different transducers producing phase cancellation or reinforcement in the air at different angles as you move off axis. All the individual wavefronts sum correctly at the listener’s ear to reproduce the signal accurately.

At a smaller scale, a single transducer of finite size can be thought of as many small transducers being driven simultaneously. At high frequencies (as the wavelengths being reproduced become short compared to the diameter of the transducer) differing path lengths from various parts of the transducer combine in the air to cause phase cancellation as you move off axis. This is known as beaming and is usually controlled in speaker design by using drivers of the appropriate size for the frequencies they are reproducing. Changes in directivity with frequency are regarded as undesirable in speaker design, because although the on-axis measurements can be perfect, the ‘room sound’ (reverberation) has the ‘wrong’ frequency response.
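The ‘many small transducers’ picture is easy to try numerically. The sketch below is a deliberately simplified model and not a simulation of the CBT or of any real driver: it assumes a straight line of equal, in-phase point sources spanning a given width, a listener far away, and a speed of sound of 343 m/s. It adds up the contributions with their path-length phase shifts and reports the level relative to on-axis.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def off_axis_level(frequency_hz, aperture_m, angle_deg, n_sources=50):
    """Far-field level (1.0 = on-axis) of a line of in-phase point sources."""
    k = 2 * math.pi * frequency_hz / SPEED_OF_SOUND  # wavenumber
    sin_theta = math.sin(math.radians(angle_deg))
    total = 0j
    for n in range(n_sources):
        # Position of this source along the line, centred on zero.
        x = (n / (n_sources - 1) - 0.5) * aperture_m
        # Path-length difference x*sin(theta) becomes a phase shift.
        total += cmath.exp(1j * k * x * sin_theta)
    return abs(total) / n_sources


if __name__ == "__main__":
    width = 0.2  # metres, roughly a large bass/mid driver
    for f in (200, 1000, 5000, 10000):
        level = off_axis_level(f, width, angle_deg=45)
        print(f"{f:6d} Hz at 45 degrees off axis: {level:.3f} of on-axis")
```

At low frequencies the wavelength is long compared with the width and the off-axis level is almost the same as on-axis. As the wavelength approaches and drops below the width, the contributions cancel off axis, which is the beaming described above.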

A large panel speaker suffers from beaming in the extreme, but with Quad electrostatics Peter Walker introduced a clever trick, where phase is shifted selectively using concentric circular electrodes as you move outwards from the centre of the panel. At the listener’s position, this simulates the effect of a point source emanating from some distance behind the panel, increasing the size of the ‘sweet spot’ and effectively reducing the high frequency beaming.

There are other ways of harnessing the power of phase cancellation and summation. Dipole speakers’ lower frequencies cancel out at the sides (and top and bottom) as the antiphase rear pressure waves meet those from the front. This is supposed to be useful acoustically, cutting down on unwanted reflections from floor, walls and ceiling. A dipole speaker may be realised by mounting a single driver on a panel of wood with a hole in it, but it behaves effectively as two transducers, one of which is in anti-phase to the other. Some people say they prefer the sound of such speakers over conventional box speakers.

This all works well in terms of the direct sound reaching the listener and, as in the CBT speaker above, may provide a very uniform dispersion with frequency compared to conventional speakers. But beyond the measurements of the direct sound, does the reverberation sound quite ‘right’? What if the overall level of reverberation doesn’t approximate the ‘liveness’ of the room that the listeners notice as they talk or shuffle their feet? If the vertical reflections are reduced but not the horizontal, does this sound unnatural?

Characterising a room from its sound

The interaction of a room and an acoustic source could be thought of as a collection of simultaneous equations – acoustics can be modelled and simulated for computer games, and it is possible for a computer to do the reverse and work out the size and shape of the room from the sound.  If the acoustic source is, in fact, multiple sources separated by certain distances, the computer can work that out, too.

Does the human hearing system do something similar? I would say “probably”. A human can work quite a lot out about a room from just its sound – you would certainly know whether you were in an anechoic chamber, a normal room or a cathedral. Even in a strange environment, a human rarely mistakes the direction and distance from which sound is coming. Head movements may play a part.

And this is where listening to a ‘distributed speaker’ in a room becomes a bit strange.

Stereo speakers can be regarded as a ‘distributed speaker’ when playing a centrally-placed sound. This is unavoidable – if we are using stereo as our system. Beyond that, what is the effect of spreading each speaker itself out, or deliberately creating phased ‘beams’ of sound?

Even though the combination of direct sounds adds up to the familiar sound at the listener’s position as though emanating from its original source, there is information within the reflections that is telling the listener that the acoustic source is really a radically different shape. Reverberation levels and directions may be ‘asymmetric’ with the apparent direct sound.

In effect, the direct sound says we are listening to this:

[Image: search result for ‘zoe wanamaker cassandra’]

but the reverberation says it is something different.

[Image: search result for ‘zoe wanamaker cassandra’]

Might there be audible side effects from this? In the case of the dipole speaker, for example, the rear (antiphase) signal reflects off the back wall and some of it does make its way forwards to the listener. In my experience, this comes through as a certain ‘phasiness’ but it doesn’t seem to bother other people.

From a normal listening distance, most musical sources are small and appear close to being a ‘point source’. If we are going to add some more reverberation, should it not appear to be emanating as much as possible from a point source?

It is easy to say that reverberation is so complex that it is just a wash of ‘ambience’ and nothing more; all we need to do is give it the right ‘colour’ i.e. frequency response. And one of the reasons for using a ‘distributed speaker’ may be to reduce the amount of reverberation anyway. But I don’t think we should overdo it: we surely want to listen in real rooms because of the reverberation, not despite it. What is the most side effect-free way to introduce this reverberation?

Clearly, some rooms are not ideal and offer too much of the wrong sort of reverberation. Maybe a ‘distributed speaker’ offers a solution, but is it as good as a conventional speaker in a suitable room? And is it really necessary, anyway? I think some people may be misguidedly attempting to achieve ‘perfect’ measurements by, effectively, eliminating the room from the sound even though their room is perfectly fine. How many people are intrigued by the CBT speaker above simply because it offers ‘better’ conventional in-room measurements, regardless of whether it is necessary?

Conclusion

‘Distributed speakers’ that use large, or multiple, transducers may achieve what they set out to do superficially, but are they free of side-effects?

I don’t have scientific proof, but I remain convinced that the ‘Rolls Royce’ of listening remains ‘point source’ monopole speakers in a large, carpeted, furnished room with a high ceiling. Box speakers with multiple drivers of different sizes are small and can be regarded as being very close to a single transducer, but are not so omnidirectional that they create too much reverberation. The acoustic ‘throw’ they produce is fairly ‘natural’. In other words, for stereo perfection, I think there is still a good chance that the types of rooms and speakers people were listening to in the 1970s remain optimal.

[Last edited 17.30 BST 09/05/17]

The Sound of a Symphony Orchestra

Last night I went to a symphony concert: Shostakovich’s 10th, preceded by Prokofiev’s Piano Concerto No. 2 at the West Road Concert Hall, Cambridge.

[Photo: the empty West Road Concert Hall]

We were sitting in the second row from the front – so quite close to the piano. I wish I had taken a photograph, but I was so paranoid about my phone ringing mid-performance that I left it turned off! The image above shows the empty venue.

We really enjoyed the concert. Chiyan Wong is an amazing piano soloist, and CCSO were spectacular. The sound was formidable from a large orchestra, and we got to hear the fairly new Steinway grand in great detail – the piano was removed during the interval, for the Shostakovich that followed.

Now, I do often listen to this sort of music with my system, but this was the first time I had been to a concert to hear this specific Russian ‘genre’. Of course I couldn’t help but make a mental comparison of the sound of the real thing versus the hi-fi facsimile that I am used to, as I was listening. And you know what? I have to say that a good hi-fi gives a pretty good rendition of the real sound.

The real thing was very loud, but also very rich – I have observed that ‘painfully loud’ is more a function of quality than volume; you need good bass to balance the rest of the spectrum. So this was very loud, but at no time painful. Bass from the orchestra was wonderful, but didn’t take me by surprise – I sometimes hear such bass from my system. (It did take me by surprise the first time I heard it from a hi-fi system, however!).

Some people cite piano as being the most difficult thing for a hi-fi system to reproduce. I don’t know where they get that from: I loved the sound of the piano, and I think a good system can reproduce it fairly easily.

I was struck by the homogeneity within the different sections of the orchestra. Listening to a recording of just a piano, or just the violins, would not tell you very much about an audio system. It is only when you hear a combination of the piano, the violins and the brass, say, that any ‘formant’ (i.e. fixed frequency response signature) within your system would show up.

As discussed previously, ‘imaging’ of the orchestra was not as pin sharp as you get in some recordings, but many purist recordings portray the true effect quite accurately. The width of the ‘soundstage’ of a stereo system is more-or-less right, and the room you are listening in enhances the recording’s ‘ambience’ around and behind you.

Of course the concert is a very special experience. The stereo version isn’t always as deep, open and spacious, nor is the envelopment as complete but, all in all, I think if you sit down in the right frame of mind to listen to a fine orchestral recording using a good hi-fi system, you are getting a very reasonable impression of the sound, excitement and visceral quality of the real thing. And that really is quite an amazing idea.

Room correction. What are we trying to achieve?

The short version…

The recent availability of DSP is leading some people to assume that speakers are, and have always been, ‘wrong’ unless EQ’ed to invert the room’s acoustics.

In fact, our audio ancestors didn’t get it wrong. Only a neutral speaker is ‘right’, and the acoustics of an average room are an enhancement to the sound. If we don’t like the sound of the room, we must change the room – not the sound from the speaker.

DSP gives us the tools to build a more neutral speaker than ever before.


There are endless discussions about room correction, and many different commercial products and methods. Some people seem to like certain results while others find them a little strange-sounding.

I am not actually sure what it is that people are trying to achieve. I can’t help but think that if someone feels the need for room correction, they have yet to hear a system that sounds so good that they wouldn’t dream of messing it up with another layer of their own ‘EQ’.

Another possibility is that they are making an unwarranted assumption based on the fact that there are large objective differences between the recorded waveform and what reaches the listener’s ears in a real room. That must mean that no matter how good it sounds, there’s an error. It could sound even better, right?

No.

A reviewer of the Kii Three found that that particularly neutral speaker sounded perfect straight out of the box.

“…the traditional kind of subjective analysis we speaker reviewers default to — describing the tonal balance and making a judgement about the competence of a monitor’s basic frequency response — is somehow rendered a little pointless with the Kii Three. It sounds so transparent and creates such fundamentally believable audio that thoughts of ‘dull’ or ‘bright’ seem somehow superfluous.”

The Kii Three does, however, offer a number of preset “contour” EQ options. As I shall describe later, I think that a variation on this is all that is required to refine the sound of any well-designed neutral speaker in most rooms.

A distinction is often made between correction of the bass and higher frequencies. If the room is large, and furnished copiously, there may be no problem to solve in either case, and this is the ideal situation. But some bass manipulation may be needed in many rooms. At a minimum, the person with sealed woofers needs the roll-off at the bottom end to start at about the right frequency for the room. This, in itself, is a form of ‘room correction’.

The controversial aspect is the question of whether we need ‘correction’ higher up. Should it be applied routinely (some people think so), as sparingly as possible, or not at all? And if people do hear an improvement, is that because the system is inherently correcting less-than-ideal speakers rather than the room?

Here are some ways of looking at the issue.

  1. Single room reflections give us echoes, while multiple reflections (of reflections) give us reverberation. Performing a frequency response measurement with a neutral transducer and analysing the result may show a non-flat FR at the listening position even when smoothed fairly heavily. This is just an aspect of statistics, and of the geometry and absorptivity of the various surfaces in the room. Some reflections will result in some frequencies summing in phase, to some extent, and others not.
  2. Experience tells us that we “hear through” the room to any acoustic source. Our hearing appears not to be just a frequency response analyser, but can separate direct sound from reflections. This is not a fanciful idea: adaptive software can learn to do the same thing.

The idea is also supported by some of the great and the good in audio.

Floyd Toole:

“…we humans manage to compensate for many of the temporal and timbral variations contributed by rooms and hear “through” them to appreciate certain essential qualities of sound sources within these spaces.”

Or Meridian’s Bob Stuart:

“Our brains are able to separate direct sound from the reverberation…”

  3. If we EQ the FR of the speaker to obtain a flat in-room measured response including the reflections in the measurement, it seems that we will subsequently “hear through” the reflections to a strangely-EQ’ed direct sound. It will, nevertheless, measure ‘perfectly’.
  4. Audio orthodoxy maintains that humans are supremely insensitive to phase distortion, and this is often compounded with the argument that room reflections completely swamp phase information so it is not worth worrying about. This denies the possibility that we “hear through” the room. Listening tests in the past that purportedly demonstrated our inability to hear the effects of phase have often been based on mono only, and didn’t compare distorted with undistorted phase examples – merely distorted versus differently distorted, played on the then available equipment.
  5. Contradicting (4), audiophiles traditionally fear crossovers because the phase shifts inherent in (non-DSP) crossovers are, they say, always audible. DSP, on the other hand, allows us to create crossovers without any phase shift, i.e. they are ‘transparent’.
  6. At a minimum, speaker drivers on their baffles should not ‘fight’ each other through the crossover – their phases should be aligned. The appropriate delays then ensure that they are not ‘fighting’ at the listener’s position. The next level in performance is to ensure that their phases are flat at all frequencies, i.e. linear phase. The result of this is the recorded waveform preserved in both frequency and time.
  7. Intuitively, genuine stereo imaging is likely to be a function of phase and timing. Preserving that phase and timing should probably be something we logically try to do. We could ‘second guess’ how it works using traditional rules of thumb, deciding not to preserve the phase and timing, but if it is effectively cost-free to do it, why not do it anyway?
  8. A ‘perfect’ response from many speaker/room combinations can be guaranteed using DSP (deconvolution with the impulse response at that point, not just playing with a graphic equaliser). Unfortunately, it will only be valid for a single point in space, and moving 1 mm from there will produce errors and unquantifiable sonic effects. Additionally, ‘perfect’ refers to the ‘anechoic chamber’ version of the recording, which may not be what most people are trying to achieve even if the measurements they think they seek mean precisely that.
  9. Room effects such as (moderate) reverberation are a major difference between listening with speakers versus headphones, and are actually desirable. ‘Room correction’ would be a bad thing if it literally removed the room from the sound. If that is the case, what exactly do we think ‘room correction’ is for?
  10. Even if the drivers are neutral (in an anechoic situation) and crossed over perfectly on axis, they are of finite size and mounted in a box or on a baffle that has a physical size and shape. This produces certain frequency-dependent dispersion characteristics which give different measured, and subjective, results in different rooms. Some questions are:
    • is this dispersion characteristic a ‘room effect’ or a ‘speaker effect’? Or both?
    • is there a simple objective measurement that says one result is better than any other?
    • is there just one ‘right’ result and all others are ‘wrong’?
  11. Should room correction attempt to correct the speaker as well? Or should we, in fact, only correct the speaker? Or just the room? If so, how would we separate room from speaker in our measurements? Can they, in fact, be separated?
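
As a rough numerical illustration of the first point in the list (a single reflection summing in and out of phase with the direct sound), here is a minimal Python sketch. The function name, the 1 m extra path length and the 0.5 reflection gain are all assumptions chosen for illustration; a real room has many reflections, each with its own frequency-dependent absorption.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def reflection_response_db(freq_hz, extra_path_m, reflection_gain):
    """Steady-state level (dB, relative to the direct sound alone) of the
    direct sound plus one delayed, attenuated reflection."""
    delay_s = extra_path_m / SPEED_OF_SOUND
    # Direct sound has unit amplitude; the reflection is a delayed, scaled copy.
    total = 1 + reflection_gain * cmath.exp(-2j * math.pi * freq_hz * delay_s)
    return 20 * math.log10(abs(total))


if __name__ == "__main__":
    extra_path_m = 1.0     # hypothetical: the reflection travels 1 m further
    reflection_gain = 0.5  # hypothetical: it arrives at half the amplitude
    for f in (100, 171.5, 343, 514.5, 686):
        level = reflection_response_db(f, extra_path_m, reflection_gain)
        print(f"{f:7.1f} Hz: {level:+.2f} dB")
```

With these assumed values the steady-state level swings between roughly +3.5 dB (reflection in phase) and -6 dB (reflection in antiphase) as the frequency changes, even though the source itself is perfectly flat. That is the kind of non-flat in-room FR a measurement would report.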

I think there is a formula that gives good results. It says:

  • Don’t rely on feedback from in-room measurements, but do ‘neutralise’ the speaker at the most elemental levels first. At every stage, go for the most neutral (and locally correctable) option e.g. sealed woofers, DSP-based linear phase crossovers with time alignment delays.
  • Simply avoid configurations that are going to give inherently weird results: two-way speakers, bass reflex, many types of passive crossover etc. These may not even be partially correctable in any meaningful way.
  • Phase and time alignment are sacrosanct. This is the secret ingredient. You can play with minor changes to the ‘tone colour’ separately, but your direct sound must always maintain the recording’s phase and time alignment. This implies that FIR filters must be used, thus allowing frequency response to be modified independently of phase.
  • By all means do all the good stuff regarding speaker placement, room treatments (the room is always ‘valid’), and avoiding objects and asymmetry around the speakers themselves.
  • Notionally, I propose that we wish to correct the speaker not the room. However, we are faced with a room and non-neutral speaker that are intertwined due to the fact that the speaker has multiple drivers of finite size and a physical presence (as opposed to being a point source with uniform directivity at all frequencies). The artefacts resulting from this are room-dependent and can never really be ‘corrected’ unambiguously. Luckily, a smooth EQ curve can make the sound subjectively near enough to transparent. To obtain this curve, predict the baffle step correction for each driver using modelling or a standard formula, with some trial-and-error regarding the depth required (4, 5, 6 dB?); this is a very smooth EQ curve. Or, possibly (I haven’t done this myself), make many FR measurements around the listening area, smooth and average them together, and partially invert this, again without altering phase and time alignment.
  • You are hearing the direct sound, plus separately-perceived ‘room ambience’. If you don’t like the sound of the ambience, you must change the room, not the direct sound.
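
The second option in the EQ bullet above (average many measurements, smooth, partially invert) might look something like the following sketch. It has not been checked against real measurements. The function names, the moving-average smoothing, the choice to average in dB and the `strength` parameter are all assumptions made for illustration.

```python
def smooth(values_db, half_width):
    """Moving-average smoothing over neighbouring frequency bins."""
    out = []
    for i in range(len(values_db)):
        lo = max(0, i - half_width)
        hi = min(len(values_db), i + half_width + 1)
        window = values_db[lo:hi]
        out.append(sum(window) / len(window))
    return out


def partial_inverse_eq(measurements_db, half_width=1, strength=0.5):
    """Average several in-room magnitude responses, smooth, partially invert.

    measurements_db: list of responses, each a list of dB values on the same
    frequency grid (hypothetical data format).
    strength: 1.0 would fully invert the smoothed average; smaller values
    give the shallower correction argued for in the text.
    Returns a magnitude-only EQ curve in dB; phase is deliberately untouched.
    """
    n_bins = len(measurements_db[0])
    average = [sum(m[i] for m in measurements_db) / len(measurements_db)
               for i in range(n_bins)]
    smoothed = smooth(average, half_width)
    # Remove the overall level so the EQ is centred on 0 dB.
    mean_level = sum(smoothed) / n_bins
    return [-strength * (s - mean_level) for s in smoothed]
```

The result is only a magnitude curve. To apply it without disturbing phase and time alignment, it would have to be realised as a linear-phase FIR filter rather than as a conventional minimum-phase equaliser.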

Is there any scientific evidence for these assertions? No more nor less than any other ‘room correction’ technique – just logical deduction based on subjective experience. Really, it is just a case of thinking about what we hear as we move around and between rooms, compared to what the simple in-room FR measurements show. Why do real musicians not need ‘correction’ when they play in different venues? Do we really want ‘headphone sound’ when listening in rooms? (If so, just wear headphones or sit closer to smaller speakers).

This does not say that neutral drivers alone are sufficient to guarantee good sound – I have observed that this is not the case. A simple baffle step correction applied to frequency response (but leaving phase and timing intact) can greatly improve the sound of a real loudspeaker in a room without affecting how sharply-imaged and dynamic it sounds. I surmise that frequency response can be regarded as ‘colour’ (or “chrominance” in old school video speak), independent of the ‘detail’ (or “luminance”) of phase and timing. We can work towards a frequency response that compensates for the combination of room and speaker dispersion effects to give the right subjective ‘colour’ as long as we maintain accurate phase and timing of the direct sound.
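
To make the idea of changing ‘colour’ without touching phase concrete, here is a minimal sketch of a linear-phase FIR shelf of the baffle-step kind. It uses a plain frequency-sampling design with a Hann window, which is only one of several ways to design such a filter. The shelf shape, the 400 Hz centre frequency, the 6 dB depth and the function names are all hypothetical.

```python
import math


def shelf_gain_db(freq_hz, step_db=6.0, centre_hz=400.0):
    """Smooth low shelf: +step_db well below centre_hz, 0 dB well above.
    The exact shape is an assumption, not a modelled baffle response."""
    return step_db / (1.0 + (freq_hz / centre_hz) ** 2)


def linear_phase_fir(num_taps=511, sample_rate=48000.0,
                     step_db=6.0, centre_hz=400.0):
    """Frequency-sampling design of a symmetric (linear-phase) FIR shelf."""
    assert num_taps % 2 == 1, "odd length keeps the delay a whole number of samples"
    half = num_taps // 2
    # Desired zero-phase magnitude, sampled on the DFT grid from 0 Hz upwards.
    mags = [10 ** (shelf_gain_db(k * sample_rate / num_taps, step_db, centre_hz) / 20)
            for k in range(half + 1)]
    taps = []
    for n in range(num_taps):
        m = n - half  # time index relative to the centre tap
        # The inverse DFT of a real, even spectrum reduces to a cosine sum.
        value = mags[0] + 2 * sum(
            mags[k] * math.cos(2 * math.pi * k * m / num_taps)
            for k in range(1, half + 1))
        # Hann window tapers the ends of the impulse response to reduce ripple.
        window = 0.5 + 0.5 * math.cos(math.pi * m / (half + 1))
        taps.append(window * value / num_taps)
    return taps


def zero_phase_response(taps, freq_hz, sample_rate=48000.0):
    """Gain of a symmetric FIR at freq_hz, with its constant delay removed."""
    half = len(taps) // 2
    return sum(h * math.cos(2 * math.pi * freq_hz * (n - half) / sample_rate)
               for n, h in enumerate(taps))
```

Because the taps are symmetric about the centre, the filter’s only effect on phase is a constant delay of (num_taps - 1) / 2 samples, the same at every frequency. The bass is lifted relative to the treble, but the shape of the waveform’s timing is left alone.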

We are not (necessarily) trying to flatten the in-room FR as measured at the listener’s position – the EQ we apply is very smooth and shallow – but the result will still be perceived as a flat FR. Many (most?) existing speakers inherently have this EQ built in whether their creators applied it deliberately, or via the ‘voicing’ they did when setting the speaker up for use in an average room.

In summary:

  • Humans “hear through” the room to the direct sound; the room is perceived as a separate ‘ambience’. Because of this, ‘no correction’ really is the correct strategy.
  • Simply flattening the FR at the listening position via EQ of the speaker output is likely to result in ‘peculiar’ perceived sound, even if the in-room measurements purport to say otherwise.
  • Speakers have to be as rigorously neutral as possible by design, rather than attempting to correct them by ‘global feedback’ in the room.
  • Final refinement is a speaker/room-dependent, smooth, shallow EQ curve that doesn’t touch phase and timing – only FIR filters can do this.

[Last updated 05/04/17]

The Secret Life of the Signal

Some people actually think of stereo imaging as a “parlour trick” that is very low on the list of desirable attributes that an audio system should have. They ‘rationalise’ this by saying that in the majority of recordings, any stereo image is an artificial illusion, created by the recording engineer either deliberately or by accident; it does not accurately represent the live event – because there may not even have been a single live event. So how can it matter if it is reproduced by the playback system or not? Perhaps it is even best to suppress it: muddle it up with some inter-channel crosstalk like vinyl does, or even listen in mono.

At the top of the list of desirable attributes for a hi-fi system, most audiophiles would put “timbre”, “tonality”, low distortion, clean reproduction at high volumes, dynamics, deep bass. All of these qualities can be experienced with a mono signal and a single speaker – in fact, in the Harman Corporation’s training for listening, monophonic reproduction is recommended when performing listening tests.

Because their effects are not so obvious in mono, phase and timing are regarded by many as supremely unimportant. I quote one industry luminary:

“Time domain does not enter my vocabulary…”

Sound is colour?

We know that our eyes respond to detail and colour in different ways. In the early days of colour TV (analogue) it was found that the signal could be broadcast within practical bandwidths because the colour (chrominance) information could be sent at lower resolution than the detail (luminance).

There is, perhaps, a parallel in hearing, too: that humans have separate mechanisms for responding to sound in the frequency and time domains. But the conventional hi-fi industry’s implicit view is that we only hear in the frequency domain: all the main measurements are in the frequency domain, and steady state signals are regarded as equivalent to real music. A speaker’s overall response to phase and timing is ignored almost totally or, at best, regarded as a secondary issue.

I think that this is symptomatic of an idea that pervades hi-fi: that the signal is ‘colour’. Sure, it varies as the music is playing, but the exact nature of that variation is treated as almost incidental, secondary to the accurate reproduction of colour; in testing, all that matters is whether a uniform colour is accurately reproduced.
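
A small sketch shows what a magnitude-only view of the signal cannot see. It builds two signals with identical magnitude spectra: a single click, and the same click with its phases scrambled. This says nothing about what is audible; it only shows that a measurement of ‘colour’ alone cannot tell the two apart. The function names and the 64-sample length are arbitrary choices for illustration, and the slow textbook DFT is used only to keep the example self-contained.

```python
import cmath
import math
import random


def dft(x):
    """Plain O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft_real(spectrum):
    """Inverse DFT, returning the real part of each sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def scramble_phase(x, seed=0):
    """Real signal with the same magnitude spectrum as x but random phases."""
    rng = random.Random(seed)
    n = len(x)
    spectrum = dft(x)
    out = list(spectrum)
    # DC (and the Nyquist bin, for even n) are left alone so they stay real.
    for k in range(1, (n + 1) // 2):
        phase = rng.uniform(-math.pi, math.pi)
        out[k] = abs(spectrum[k]) * cmath.exp(1j * phase)
        out[n - k] = out[k].conjugate()  # keeps the time signal real
    return idft_real(out)


if __name__ == "__main__":
    click = [1.0] + [0.0] * 63
    smeared = scramble_phase(click)
    print("peak of click:  ", max(abs(v) for v in click))
    print("peak of smeared:", max(abs(v) for v in smeared))
    print("largest magnitude-spectrum difference:",
          max(abs(abs(a) - abs(b)) for a, b in zip(dft(click), dft(smeared))))
```

Both signals carry the same energy at every frequency, yet one is a sharp transient and the other is spread across the whole window.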

There has, nevertheless, been some belated lip service paid to the importance of timing, with the hype around MQA (still usually being played over speakers with huge timing errors!), and a number of passive speakers with sloping front baffles for time alignment. Taken to its logical conclusion, we have these:

[Image: Wilson WAMM Master Chronosonic, final prototype]

Their creator says, though:

“It’s nice if you have phase coherence, but it is not necessary”

So they still fall short of the “straight wire with gain” ideal. The attitude is still that the signal is something we can take liberties with, not aspiring to absolute accuracy in the detail as long as we get a good neutral white and a deep black, and all uniform (‘steady state’) colours reproduced with the correct shading. It implies that we understand the signal and that it is trivial. Time alignment by moving the drivers backwards and forwards is an easy gimmick, however, so we can go that far.

Another Dimension

I think that with DSP-corrected drivers and crossovers, we are beginning to find that there is another dimension to the common or garden stereo signal; one that has been viewed as a secondary effect until now. Whether created accidentally or not, the majority of recordings contain ‘imaging’ that is so clear that it gives us access to the music in a way we were not aware of. It allows us to ‘walk around’ the scene in which the recording was made. If it is a composite, multitrack recording, it may not be a real scene that ever existed, but the individual elements are each small scenes in themselves, and they become clearly delineated. It is ‘compelling’.

I can do no better than quote a brand new review of the Kii Three written by a professional audio engineer, that echoes something I was saying a couple of weeks ago: imaging is not just a ‘trick’, but improves the separation of the acoustic sources in a way that goes beyond the traditional attributes of low distortion & colouration.

I think he also echoes something I said about believable imaging giving the speaker a ‘free pass’ in terms of measurements. As in my DIY post, he says that the speaker sounds so transparent and believable that there is no point in going any further in criticising the sound. A suggestion, perhaps, that conventional ‘in-room’ measurements and ‘room correction’, are shown up as the red herrings they are if a system sets out to be genuinely neutral by design, at source.

“Firstly, the traditional kind of subjective analysis we speaker reviewers default to — describing the tonal balance and making a judgement about the competence of a monitor’s basic frequency response — is somehow rendered a little pointless with the Kii Three. It sounds so transparent and creates such fundamentally believable audio that thoughts of ‘dull’ or ‘bright’ seem somehow superfluous.”

“… it is dominated by such a sense of realistic clarity, imaging, dynamics and detail that you begin almost to forget that there’s a speaker between you and the music.”

“…I’ve never heard anything anywhere near as adept at separating the elements of a mix and revealing exactly what is going on. I found myself endlessly fascinated, in particular, by the way the Kii Three presents vocals within a mix and ruthlessly reveals how good the performance was and how the voice was subsequently treated (or mistreated). Performance idiosyncrasies, microphone character, room sound, compression effects, reverb and delay techniques and pitch-correction artifacts that I’d never noticed before became blindingly obvious — it was addictive.”

“…One of the joys of auditioning new audio gear, especially speakers, is that I occasionally get to rediscover CDs or mixes that I thought I knew intimately. I can honestly say that with the Kii Three, every time I played some old familiar material I heard something significant in the way it performs…”

“…Low-latency mode …switch[es] off the system phase correction. It makes for a fascinating listening experience. …the change of phase response is clearly audible. The monitor loses a little of its imaging ability and overall precision in low-latency mode so that things sound a little less ‘together’.”

“The Kii Three is one of the finest speakers I’ve ever heard and undoubtedly the best I’ve ever had the privilege and pleasure of using in my own home.”