A difference doesn’t have to be audible to matter

A common view among scientifically-oriented audiophiles is that controlled, double blind listening tests are equivalent to objective measurements. Such people may be further subdivided into those who believe that ‘preference’ is a genuine indicator of what matters, and those who believe that only ‘difference’ can count as real science in listening tests.

I can think of many, many philosophical objections to the whole notion of assuming that listening tests are ‘scientific’, but just concentrating on that supposedly rigorous idea of ‘difference’ being scientific, we might suggest the following analogy:

Suppose there is a scene in a film that shows a thousand birds wheeling over a landscape. The emotional response is to see the scene as ‘magnificent’. In this case, the ‘magnificence’ stems from the complexity; the order emerging out of what looks like chaos; the amazing spectacle of so many similar creatures in one place. It would be reasonable, perhaps, to suggest that the ‘magnificence’ is more-or-less proportional to the number of birds.

Well, suppose we wish to stream that scene over the internet in high definition. The bandwidth required to do this would be prohibitive so we feed it into a lossy compression algorithm. One of the things it does is to remove noise and grain, and it finds the birds to be quite noise-like. So it removes a few of them, or fuses a few of them together into a single ‘blob’. Would the viewer identify the difference?

I suggest not. Within such complexity, they might only be able to see it if you pointed it out to them, and even after they knew where to look they might not see it the next time. But the ‘magnificence’ would have been diminished nevertheless. By turning up the compression ratio, we might remove more and more of the birds.

This sensation of ‘magnificence’ is not something you can put into words, and it is not something you are consciously aware of. But in this case, it would be reasonable to suggest that the ‘magnificence’ was being reduced progressively. The complexity would be such that the viewer wouldn’t consciously see the difference when asked to spot it, but clearly the emotional impact would be reduced or altered.

For all their pretensions to scientific rigour, double blind listening tests are fundamentally failing in what they purport to do. They can only access the listener’s conscious perception, while the main aim of listening to music is to affect the subconscious. Defects in audio hardware (distortion, non-flat frequency response, phase shifts, etc.) all tend to blur the separation between individual sources and in so doing reduce the complexity of what we are hearing – it becomes a flavoured paste rather than maintaining its original granularity and texture, but we cannot necessarily hear the difference consciously. Nevertheless, we can work out rationally that complexity is one of the things that we respond to emotionally. So even though we cannot hear a difference, the emotional impact is being affected anyway.

Asus Xonar U7 MkII

Xonar U7

I’ve had to replace my original Asus Xonar U7 USB DAC with a new one. For the last few months it has refused to reset properly and would only do so when warm air was blown onto it for a couple of minutes! (I don’t now know how I discovered that, but it always worked). I have a feeling that there was always something marginal about that design’s reset, having experienced a problem the first time I used it, but then it worked fine for a few years.

Archimago once mentioned something about the reset of that chipset being dodgy, too.

The new one is the MK II version and I had to change a single item in the software’s config file: its name as discovered by Alsa is ‘MKII’ as opposed to the previous ‘U7’. Also, the sequence of jacks on the back has changed, as seen by my software.

It sounds just the same as the Mk I and thankfully seems to reset every time.

Listening to Goodmans Goodwoods

Goodmans Goodwood

For a short while I have in my possession a pair of Goodmans Goodwood speakers similar to the pair above. As far as I know, they are completely original, yet in very nice condition.

Size-wise they look large by modern standards (a good thing, I say), being approximately 760 x 360 mm from the front, although fairly shallow front-to-back at 280mm. They are teak veneered and have slightly funky textured grille cloths. I imagine they looked quite retro even when they were made in 1973.

As you can see, they are three-way with the smaller drivers placed side by side rather than lined up vertically. Oddly (I always think), such speakers don’t come as a mirror image pair, both having the tweeter on the right.

I think they look fabulous and as a small boy I would have been absolutely thrilled by them. My brother and I had a criterion of 2 feet minimum height before a speaker could be considered truly great, so these would have been very much in the premier league.

So, given that they are going on for half a century old, how do they sound? The answer is: great. Clean, clear, deep, relaxed-yet-punchy, hi-fi. The sound is commendably free of colouration. If these were my only pair of speakers I would be perfectly happy with them.

My observations would be that the imaging isn’t as ridiculously solid and palpable as you get with the DSP active alternative, but unless you’re concentrating and listening for it, they’re fine. On loud, complex passages they might be slightly ‘congested’ – possibly. The bass doesn’t go very, very low, but it’s fine; they have plenty of body and warmth. Overall, perhaps they don’t quite have the sheer ‘slam’ of the DSP active alternative. I wouldn’t be as happy playing them as loud as my normal speakers can go – but that is very loud.

Of course, measurements-wise they’re not going to be perfect, and I’m sure that if I did some serious A-B comparisons I would hear differences. Maybe they won’t be as easy on the ears as my other speakers. But the general impression is good.

They are head and shoulders above many modern, expensive speakers I have heard. Of course they have several built-in advantages that would still be desirable features today:

  1. They are sealed rather than the bass reflex alternative, meaning the bass is tauter and more accurate;
  2. The baffles are wide, producing a lower baffle step frequency, making them closer to neutral in terms of dispersion;
  3. They are three-way, meaning the drivers aren’t stretched over excessively wide frequency spans, there’s less intermodulation including Doppler distortion, and there is an absence of beaming and less abrupt dispersion transition from driver to driver.

Hi-fi speakers could obviously be pretty good even 45 years ago. Modern alternatives may be smaller and in some ways better on paper, but they don’t always sound as nice.

Audio Objects

Some audio pessimists are convinced that because a stereo recording and reproduction system can only sample a couple of infinitesimal points within the overall ‘sound field’, it is futile to imagine that the result can be anything but a pale imitation of the real thing.

Others are convinced that although the efforts of recording engineers mean that the recording itself is passable, the problem is that speakers playing in a real room are not conveying it to their ears accurately enough. They attempt to alter what comes out of the speakers in order to compensate for the room.

And stereo itself when reproduced over speakers is assumed to be so flawed due to crosstalk to the ‘wrong’ ear that it can’t possibly work, and we must be deluding ourselves if we think it does.

These are assumptions made by people who cannot allow themselves to enjoy their audio systems. I suggest they are fixated on the wrong things and the situation is much better than they imagine. A different way to view the problem of audio is this:

It is a mistake to think that the aim of the system is to recreate the precise waveform that would have reached the listener’s ear at the actual performance. It is not practically achievable, would not necessarily reproduce a realistic perception of the actual performance in the context of the listener’s own room anyway, and also it is not necessary. Most people couldn’t even tell you which of two plausible versions of the waveform is absolutely correct, and that is because they’re not hearing a waveform; they’re hearing musical and acoustic ‘objects’. It is the relationship between those objects that is paramount.

An ‘object’ could be:

  • A voice
  • A choir
  • Silence
  • A sad note
  • A happy chord
  • Song lyrics
  • A violin
  • A rhythm
  • An orchestra
  • A concert hall
  • Tension

The primary aim of a hi-fi system (as opposed to a kitchen radio, for example) is to maintain the integrity of single objects and the separation of different objects.

The secondary aim of the hi-fi system is to present the objects in a plausible way that allows for the normal behaviour of the listener; the sound basically appearing to emanate from in front of the listener, separable by distance and direction, without strange acoustic sensations if they turn to talk to their neighbour.

And that’s it. Everything flows from there.

  • Harmonic distortion (and the corresponding intermodulation distortion) smears objects together.
  • Bumps and dips in the frequency and phase response of a speaker smear the objects together and punch holes in the integrity of the objects.
  • Noise smears itself over all the objects, obscuring their separation.
  • Limited bass damages the integrity of certain objects and smears those objects together.
  • Timing errors smear objects together. Resonators in speakers (e.g. bass reflex) that take time to ‘get going’ and time ‘to stop’ damage and smear the objects together.
  • Stereo obviously aids in separating objects. Just a pair of speakers provides a continuous spread of individual, separate acoustic sources. And stereo over speakers isn’t flawed; the crosstalk to the ‘wrong ear’ is how it produces the image in the first place.
  • Realistic volume helps to elevate objects above the noise floor, with a more natural sound due to our hearing’s volume-dependent frequency sensitivity.

So some objects make it out of a kitchen radio OK: a rhythm, a melody or the words of a song. But other objects may be severely damaged or smeared together. On a hi-fi system you might hear two separate guitars but on the radio they’re just a wash over the whole sound. On the hi-fi you hear a startling, deep bass note, but on the radio there’s nothing.

And the hi-fi system does things ‘without trying’ – which is why some people can’t believe it’s doing them. The stereo system with speakers automatically creates a two-way interaction between the listeners and the performance because both are subject to the listening room’s acoustics. This also solves the problem of how to cram a concert hall into the listener’s room as well as the more intimate performances. Is the aim for the musicians and venue to come to the listener or for the listener to go to the performance? The stereo system with speakers creates a hybrid: regard it as the listener’s room being transported to the venue and its end wall being opened up.

Life Without the Beatles

I don’t often go to the cinema these days, but I might just have to make an exception to see the new film called Yesterday.

There’s a thought-provoking article about the film in The Guardian today. Its best observation, I think, is that The Beatles had “exquisite taste”. Indeed, even if you were the most talented or highly-skilled creative person who had put in the requisite 10,000 hours as the Beatles undoubtedly did, where would you be without good taste in the first place?

I’m glad I didn’t sell my CDs

Since finding out that the latest Beatles remasters have been deliberately spoiled with the application of ‘gentle limiting’ I have been thinking that I am glad I didn’t sell my CDs when I went over to streaming.

In the early days of streaming, it seemed as though every different version of every album was available, which made the whole thing seem even better value. It seemed a no-brainer to ditch the CDs and enjoy access to the contents of all the obscure record shops in the world without having to leave your armchair.

Now, I think the streaming people are ‘consolidating’ their catalogues by losing the original albums and generally leaving only the ‘Remastered’ version – the one that has been optimised for listening to in cars and earbuds. And then, I suspect, they will begin to stop mentioning the fact that the album is remastered, and all that will be left will be compressed or ‘limited’ music – even the old stuff.

So I am glad I wasn’t organised enough to sell my several hundred CDs that have been sitting in cardboard boxes for the last 15 years. I don’t care whether they’re files or physical CDs – it’s not the format I’m bothered about – but at least I know that the recordings date from a time when recording engineers actually understood what digital audio was for.

Dynamic Range Compression versus ‘Limiting’

Archimago’s image showing the waveform envelope of a track from The Cranberries’ first album compared to their last

In his latest article, Archimago rightly criticises the recording industry for vandalising its own product, exemplified by a comparison between one of the Cranberries’ first recordings and their last.

Highly compressed albums consciously or unconsciously are heard as unnatural, and uncomfortable to listen to long term. Listeners lose interest quickly due to fatigue and move on or just play the music in the background since they don’t capture our attention due to the lack of dynamics.

This is quite right, except I don’t think this is just compression; it’s far worse than that.

Dynamic range compression means that quiet content is proportionately boosted in volume, and loud content is attenuated. It is done with a time constant so that locally, the waveform’s shape at any point is more-or-less intact. The ‘envelope’ is squashed, but dynamics still remain. To a first approximation, with simple dynamic range compression no information is lost because it can be reversed.

Old-fashioned DBX-style tape noise reduction was practical proof that dynamic range compression can be reversed: it relied upon dynamic compression when recording, and complementary expansion on playback. In the 1980s I built my own companding noise reduction systems (to other people’s designs) and even used analogue companding to make my own 8-bit delay line effects system sound as though it was 16-bit (that was the idea, anyway).
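The reversibility claim can be illustrated with a toy compander. This is only a minimal sketch in Python, not a model of DBX or of any real product: the peak follower, the gain law and all the names and constants (`encode`, `decode`, `_gain`, `DECAY`, `FLOOR`) are assumptions made up for the illustration. The trick is that the encoder derives its gain from the envelope of its own output, so the decoder, which only ever sees that output, can track exactly the same envelope and divide the gain back out.

```python
import math

DECAY = 0.99   # per-sample release of the peak follower (assumed value)
FLOOR = 0.01   # stops the gain growing without limit in silence (assumed)


def _gain(envelope):
    """Gain falls as the envelope of the compressed signal rises."""
    return 1.0 / math.sqrt(max(envelope, FLOOR))


def _follow(envelope, sample):
    """Peak follower: jump up instantly, decay slowly."""
    return max(abs(sample), envelope * DECAY)


def encode(samples):
    """Compress: the gain is driven by the envelope of the *output*."""
    out, env = [], 0.0
    for x in samples:
        y = x * _gain(env)
        env = _follow(env, y)
        out.append(y)
    return out


def decode(samples):
    """Expand: track the same envelope and divide the gain back out."""
    out, env = [], 0.0
    for y in samples:
        out.append(y / _gain(env))
        env = _follow(env, y)
    return out


def peak(samples):
    return max(abs(s) for s in samples)


# A quiet passage followed by a loud one, 100:1 in amplitude.
quiet = [0.01 * math.sin(2 * math.pi * n / 20) for n in range(200)]
loud = [1.00 * math.sin(2 * math.pi * n / 20) for n in range(200)]
original = quiet + loud

squashed = encode(original)
restored = decode(squashed)

ratio_in = peak(loud) / peak(quiet)
ratio_out = peak(squashed[200:]) / peak(squashed[:200])
worst_error = max(abs(a - b) for a, b in zip(original, restored))
```

Here `ratio_out` comes out well below `ratio_in`, so the envelope really has been squashed, yet `worst_error` is down at the level of floating-point rounding: the original can be recovered.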

Dynamic range compression is a necessary evil when it comes to making recordings suitable for playback in a wide variety of circumstances but it can also be used creatively in the production of music recordings. It is common to apply compression to individual tracks and sources within the multitrack recording as well as in the mastering of the whole track. More complex effects stray beyond simple compression where the characteristic is not just a straight line but constructed from different segments to follow a complex ‘law’ with adjustable characteristics. Even more flexible systems allow different compression characteristics to be used on different frequency ranges.

But what Archimago shows is not ‘compression’ and is more like ‘limiting’. This is an extreme form of the same process that the paragons of virtue at Abbey Road casually smeared over the digital remasters of Sgt. Pepper recently. But phew! In that case it was only “gentle” “digital limiting” that they were applying, totally unnecessarily, to an already 50-year-old release that was compatible with the limited dynamic range of analogue equipment, never mind the incredibly wide dynamic range capabilities of modern digital audio. Talk about vandalism!

Calling it dynamic range compression and expressing it in ‘DR’ ratings hides the true horror. It isn’t just that the dynamic range is being squashed; the shape of the waveform itself is being altered. Information is being lost and garbage added in the form of distortion of the waveform. Whether that is dressed up as ‘loudness optimisation’, ‘soft clipping’ or just straightforward flat top clipping (that, depending on where it is applied in the chain, could also produce ‘illegal’ digital samples that produce unpredictable values from DAC reconstruction filters) is a minor quibble.
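Both halves of that claim, information lost and garbage added, can be seen in the crudest case of flat top clipping. The following is a minimal Python sketch under my own assumptions (a pure sine wave, a clipping ceiling of 0.5, and helper names `hard_clip` and `harmonic_amplitude` invented for the purpose); real limiters are more elaborate, but the principle is the same.

```python
import math


def hard_clip(x, ceiling):
    """Flat-top clipping: anything beyond +/-ceiling is simply cut off."""
    return max(-ceiling, min(ceiling, x))


def harmonic_amplitude(samples, k):
    """Amplitude of the k-th sine harmonic over exactly one period."""
    n = len(samples)
    total = sum(s * math.sin(2 * math.pi * k * i / n)
                for i, s in enumerate(samples))
    return 2.0 * total / n


N = 1000
sine = [math.sin(2 * math.pi * i / N) for i in range(N)]
clipped = [hard_clip(s, 0.5) for s in sine]

# Information is lost: different inputs collapse onto the same output,
# so nothing downstream can tell them apart afterwards.
same_output = hard_clip(0.7, 0.5) == hard_clip(0.9, 0.5)

# Garbage is added: a third harmonic appears that was not in the input.
h3_before = harmonic_amplitude(sine, 3)
h3_after = harmonic_amplitude(clipped, 3)
```

Unlike the companding case, there is no complementary process that could undo this: once two different sample values have been mapped to the same ceiling, the original is gone.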

This ‘process’ is applied without a second thought and without reference to the content at any particular moment. It is clearly applied indiscriminately to the whole composite recording, suggesting that the recording engineers do their stuff and then “limiting” is applied over the top of it. Maybe this is even dignified with terms like “mastering” and “engineering”, but as Archimago says, it is simple vandalism.

The Definition of ‘High Fidelity’

How would various people define the term ‘high fidelity’?

Average person in the street

“Recreates the sound of being at the performance”

I think an imaginary typical person would probably say something like this, especially after being told that audiophiles pay as much as they earn in a year for a piece of wire.

Unfortunately, high fidelity audio doesn’t reproduce the actual sound of the performance unless, perhaps, through binaural recording and playback over headphones. This technique doesn’t pretend to maintain the illusion as you turn your head and move around, though.

And, of course, for a studio creation rather than live performance, there is no performance as such to recreate.

Average slightly technical person

“The speaker reproduces the recorded signal precisely”.

As I imagine it, the technically-literate layman’s definition of high fidelity would be more realistic and in fact correct, but incomplete because it does not specify how the speaker should interact with the acoustic environment.

Traditional audio enthusiast

“Low distortion, low noise, flat frequency response from the speaker”.

The typical audio enthusiast would translate the goal into audio-centric terms that aspire to nothing but reproducing the signal with the right frequency content on average – which ignores the unavoidable timing & phase distortion that occurs in traditional passive speakers. It also allows for horrors such as bass reflex resonators to further smear transients (as opposed to the perfect results they may give on steady state sinusoids).

Computer-literate audio enthusiast

“Low distortion, low noise, and a target frequency response at the listener’s ear”

The modern audio enthusiast who has discovered laptops, microphones and FFTs thinks that the smoothed, simplified frequency response measurement displayed on their screen is the way a human hears sound. It has to be, because the alternatives – the complex frequency domain representation and its equivalent, the time domain waveform – are visually incomprehensible.
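To see how much a magnitude-only plot can hide, here is a small Python sketch. The two signals are invented for the example: a sharp attack with a decaying tail, and the same samples played backwards. Their magnitude spectra are identical, so a frequency response display could not tell them apart, yet the waveforms (and, I would suggest, the sounds) are quite different.

```python
import cmath
import math


def dft(samples):
    """Plain discrete Fourier transform (slow, but dependency-free)."""
    n = len(samples)
    return [sum(s * cmath.exp(-2j * math.pi * k * i / n)
                for i, s in enumerate(samples))
            for k in range(n)]


# A sharp attack followed by a decaying tail...
forward = [0.8 ** i for i in range(32)]
# ...and the same samples played backwards: a swell that cuts off abruptly.
backward = forward[::-1]

mag_forward = [abs(c) for c in dft(forward)]
mag_backward = [abs(c) for c in dft(backward)]

largest_magnitude_gap = max(abs(a - b)
                            for a, b in zip(mag_forward, mag_backward))
largest_waveform_gap = max(abs(a - b) for a, b in zip(forward, backward))
```

The magnitude gap is at rounding-error level while the waveform gap is almost the full height of the signal; all of the difference lives in the phase, which the smoothed magnitude plot throws away.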

My definition

“The recorded signal is reproduced precisely, from a small acoustic source with equal, controlled directivity at all frequencies”.

This definition is based on logical deductions.

The perceptive audio enthusiast would observe that they can always recognise voices, instruments, musical sounds regardless of acoustics, and turn their heads towards those sounds. Therefore, they would deduce that humans have the ability to ‘focus’ on audio sources regardless of acoustics. Clearly, therefore, we don’t just hear the composite frequency response of source combined with the room but have other interesting hearing abilities, probably related to binaural hearing, head movements, and phase and timing.

If we can focus on the source of a sound i.e. hear through the room, the room is not a problem to be solved but simply something normal and natural that exists. It is puzzling to think that we can improve the sound of one thing (the room) by changing something we perceive as separate from it (the sound of the source).

If the frequency response of the source is modified because of some characteristic of the room (tantamount to changing the frequency response of a musical performer in a live venue), we will hear the source as sounding unnatural. Thus ‘room correction’ based on EQ is illogical. Thus the idea of the ‘target frequency response’ is simply wrong.

If we use phase and timing in our hearing, and/or have unknown hearing abilities, there is no excuse for modifying the source’s phase and timing, arbitrarily or otherwise. Thus, if it is possible, the speaker should not modify the recording’s phase or timing. DSP makes this possible. But because of the laws of physics, this would require the speaker to look into the future, and this is only possible if we introduce a delay in the output i.e. latency. For listening to recordings (as opposed to live monitoring) latency is acceptable.
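The link between ‘looking into the future’ and latency can be made concrete with a linear-phase FIR filter, one common DSP building block for this sort of job. The sketch below is a generic windowed-sinc low-pass in Python, not a description of any particular speaker’s processing; the sample rate, tap count, cutoff and function names are all assumptions chosen for illustration. The taps are symmetric about the centre, and the half that sits ahead of the centre is the ‘future’ the filter needs, which it obtains by delaying everything by half its length.

```python
import math


def linear_phase_lowpass(num_taps, cutoff_hz, sample_rate_hz):
    """Windowed-sinc low-pass filter with symmetric taps (linear phase)."""
    centre = (num_taps - 1) / 2
    fc = cutoff_hz / sample_rate_hz          # cutoff as a fraction of fs
    taps = []
    for n in range(num_taps):
        t = n - centre
        if t == 0:
            ideal = 2 * fc
        else:
            ideal = math.sin(2 * math.pi * fc * t) / (math.pi * t)
        hann = 0.5 - 0.5 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * hann)
    return taps


def latency_seconds(num_taps, sample_rate_hz):
    """A symmetric FIR delays every frequency by (N - 1) / 2 samples."""
    return (num_taps - 1) / 2 / sample_rate_hz


SAMPLE_RATE = 48_000      # assumed
NUM_TAPS = 4001           # assumed; odd, so the centre falls on a sample
taps = linear_phase_lowpass(NUM_TAPS, 2_000, SAMPLE_RATE)

is_symmetric = all(abs(a - b) < 1e-12 for a, b in zip(taps, reversed(taps)))
delay = latency_seconds(NUM_TAPS, SAMPLE_RATE)
```

With these assumed figures the delay works out at 2000 samples, or roughly 42 milliseconds: irrelevant when playing a recording, but noticeable for a musician monitoring their own performance.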

The final part of the puzzle is how the ideal speaker should interact with the room. The speaker is not intended to recreate the exact acoustic characteristics of a literal musical instrument, but to reproduce the audio field that was picked up by a microphone – possibly a composite of many musical sources plus acoustics. There is only one logical ideal in terms of dispersion (i.e. the angle through which the sound emerges from the front of the speaker) and that is: uniform at all frequencies.

What the size of that constant dispersion angle should be is open to debate and the taste of the listener – as discussed in the Grimm LS1 design paper. Most people seem to prefer something that is a compromise between omni-directional and a super-directional beam.

This is exemplified by modern cardioid speakers such as the Kii Three or the D&D 8C. To quote the designer of the 8C:

No voicing required. Other loudspeakers usually require voicing. Based on listening to a lot of recordings, the tonal balance of the loudspeaker is changed so that most recordings sound good. Voicing is required to balance differences between direct and off-axis sound. The 8c has very even dispersion. It is the first loudspeaker I ever designed that did not benefit from voicing. The tonal balance is purely based on anechoic measurements.

Where some confusion may appear is when a real world speaker (almost all existing types until now) does not possess this ideal uniform dispersion characteristic. In this situation, the reverberant sound fails to point back to the source. Effectively the frequency response of the reverberant sound does not correspond with that of the source, and the listener perceives this discrepancy due to their ability to ‘read’ the acoustic environment in terms of phase, timing and frequency response.

Of course a single musical instrument might have any dispersion characteristic, but if the recording is a composite of several musical sources in their acoustic environment, and a single, unvarying non-neutral dispersion characteristic is applied to all of them, it sounds false. Only neutral dispersion will do.

Some EQ can help here, but it is not true ‘correction’. All that can be done is to steer a middle course between neutral frequency response for the direct sound and the same for the reverberant sound. A commonly known version of this is baffle step compensation which is often applied as a frequency response ‘shelf’ whose frequency is defined by the speaker’s baffle width, and whose depth is dependent on speaker positioning and the room.
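As a rough illustration of such a shelf, here is a Python sketch. Two things in it are assumptions of mine rather than anything established above: the frequently quoted rule of thumb that the baffle step is centred at about 115 divided by the baffle width in metres, and the use of a simple first-order shelf. The depth is left as a parameter precisely because it depends on positioning and the room; the 4 dB used here is only a placeholder.

```python
import math


def baffle_step_frequency_hz(baffle_width_m):
    """Rule-of-thumb centre frequency of the baffle step (assumed formula)."""
    return 115.0 / baffle_width_m


def shelf_gain_db(freq_hz, centre_hz, depth_db):
    """First-order low shelf: +depth_db at low frequencies, 0 dB at high."""
    ratio = 10 ** (depth_db / 20)            # linear low-frequency boost
    f_pole = centre_hz / math.sqrt(ratio)    # boost starts to fall away here
    f_zero = centre_hz * math.sqrt(ratio)    # response levels off at 0 dB here
    magnitude = math.sqrt((freq_hz ** 2 + f_zero ** 2) /
                          (freq_hz ** 2 + f_pole ** 2))
    return 20 * math.log10(magnitude)


# Example: a 0.36 m wide baffle with an assumed 4 dB of compensation.
centre = baffle_step_frequency_hz(0.36)
gain_at_20_hz = shelf_gain_db(20, centre, 4.0)
gain_at_centre = shelf_gain_db(centre, centre, 4.0)
gain_at_10_khz = shelf_gain_db(10_000, centre, 4.0)
```

For the 0.36 m baffle the rule of thumb puts the shelf at a little over 300 Hz, with nearly the full boost in the bass, half of it (in dB) at the centre frequency, and next to nothing in the treble.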

The required compensation cannot be deduced from an in-room measurement of the speaker, because that measurement inextricably shows a combination of the room and the speaker’s unknown dispersion characteristics interacting with it. Only some a priori knowledge of the speaker can help to formulate the optimum correction curve.

N.B. the goal is not a flat, or any other ‘target’, in-room response; the goal is minimal deviation from flat direct sound while achieving the most natural in-room sound possible. DSP allows this EQ curve to be applied without distorting the speaker’s phase or timing.

Stereo

It seems reasonable to extend the logic of accurate playback of the signal and uniform dispersion, from mono (one speaker), to stereo (two speakers).

But stereo is where obvious logic gives way to an element of “It has to be heard to be believed”. The operation of stereo is not obvious. Despite all the talk of the human ability to interpret the acoustic environment, stereo relies on fooling human hearing into believing that a sound reproduced from two locations simultaneously is, in fact, coming from a phantom location. This simultaneous reproduction is something that does not occur in nature, hence the potential for this to work.

Aspects that might be potential ‘show stoppers’ include:

  1. Crosstalk from each speaker to ‘the wrong ear’
  2. Nausea-inducing collapse of the stereo image as the listener turns their head or moves off-centre
  3. Room reverberation from individual speakers not pointing back to the phantom stereo source and so sounding unnatural

It turns out that (1) is a fundamental part of the way stereo works over speakers – as opposed to headphones.

And this leads to a very benign situation regarding (2), where the stereo image remains stable and plausible with listener movement.

Because (1) and (2) lead to a counter-intuitively good result where the listener is simply unaware of the location of the speakers, a listening room with reasonable symmetry extends this effect to give a good result for (3) – effectively phantom reverberation. If one speaker was sitting next to a marble wall, floor and ceiling, and the other surrounded by cushions, maybe the result wouldn’t be so good. As it is, a reasonable listening setup does not give rise to any noticeably unnatural reverberation for stereo phantom images.

What High Fidelity Over Speakers Gives Us

The result of high fidelity stereo is remarkable, and could even be the ultimate way to listen to recorded music, being even better than the notion of perfect ‘holographic’ recreation of the listening venue.

The issue is one of compatibility with domestic life and the cognitive dissonance aspect of recreating a large space in your living room. Donning special apparatus in order to listen is a bit of a mood killer; having to sit in a restrictive central location likewise.

Not hearing one’s own voice or that of a companion while listening would seem weird and artificial. Hearing no acoustic link between one’s own voice and the musical performance would also seem peculiar: imagine listening over headphones to an organ playing in a cathedral and speaking to your companion over a lip mic.

Listening to stereo in a living room gets around all these issues naturally and elegantly. It’s good enough for really serious, critical listening, but is effortlessly compatible with more social, casual listening. The addition of the ambient reverberation of the listening room acts as a two-way bridge between the performance and the listeners.

Predictable audio

Various audiophiles and forums are very measurements-oriented. The implication is that any audio system or device can be approached as a blank sheet of paper, open to testing to reveal its performance, and that ‘anything is possible’: until you measure it, you just don’t know what it’s capable of, but after you make a set of basic measurements you will know all about it.

The truth, surely, is going to be that two brands of similar-sized two-way speakers with 8″ and 1″ drivers are going to behave very similarly to each other in terms of distortion, dispersion and so on. The exact choice of crossover frequencies and slopes may make differences, but they will be entirely predictable trade-offs. Making the speakers slim-fronted floor standers with 6″ and 1″ drivers will change the characteristics in entirely predictable ways.

The same, surely, will be true of DACs that use the same chipsets. Only a bad mistake will change the core performance – there is no ‘trade secret’ that will magically improve what goes on in a chip. Sure, measurements will be a check on the absence of that mistake, but we shouldn’t expect any amazing differences.

Amplifiers of the same type will all behave similarly to each other. If someone claims to have drastically improved the performance of an existing type of amplifier, scepticism is the order of the day. Do these innovations only work with a resistive load, are they stable with temperature, do they progressively shut the amplifier down on long organ notes? Basic measurements won’t necessarily show these limitations up.

So I would say that the most interesting aspects of audio are going to be the motivations of the designers, their obsessions and what they dismiss as unimportant. For me, a speaker that derives from the quote “I realised our product line-up had a gap for a simple passive two-way, ported compact monitor” is not going to be worth measuring. It simply won’t provide any surprises – unless it’s been really messed up badly.

This is something I didn’t know ten years ago. Back then, I somehow assumed that audio was so mysterious that it was, to use an American expression, a crapshoot. A two-way ported speaker might somehow hit just the right balance if designed by a genius and might measure and sound fundamentally better than a larger, more sophisticated speaker with DSP. The reason I was so mistaken was the amazing price range of audio of various types and the work of reviewers who simply weren’t capable of discriminating good sound from bad; who would use the same glowing vocabulary to describe the sound of both a small two-way monitor and a large three-way speaker when in reality they were miles apart.

There are no unpredictable miracles unless the design does something fundamental that has not been done before. The world’s best DAC and amplifier are not going to transform your audio experience; the world’s most expensive two-way passive direct radiator speaker is not going to sound as good as a competent three-way active DSP speaker. On the other hand, innovations like motion feedback and driver power compression elimination might sound different from anything you’ve heard before; as might phase and timing alignment, active cardioid response, BACCH etc.

Such radical innovations defy measurement unless the design fundamentals and implementation are known; and if you know the design fundamentals and implementation, the measurements are going to be predictable and yet will still not tell you how they will sound.

What has happened to pop music?

The BBC revived the tradition of the Christmas Day Top of the Pops again this year, and I witnessed it out of curiosity. Apart from watching Jools Holland’s programme when it’s on (mainly on fast forward), I have little contact with what the young folk might be into these days. What is the state of popular music in 2018?

Well, I can tell you, it’s appalling.

The first thing you notice is the vocal style that many of today’s artists are affecting to varying degrees. The missus and I are quite good at doing an impression: it involves moving your tongue back in your mouth and squeezing the vocals out through the smaller gap thus created. You also need to warble with your throat exaggeratedly. The kids seem to think it’s cool.

And the other thing you notice is the ‘tune’. On TOTP the other day, I would say that more than half the songs featured the same anodyne chord progression (if you can really dignify it with that term), just performed with slightly different production styles, rhythms and degrees of bombast – indistinguishable from the stuff you get in the Eurovision Song Contest. Truly, today’s music creators have lost the ability to dazzle the ears with a brilliant melody or even ‘riff’.

It really is just an ‘industry’ going through the motions, and kids indiscriminately lapping up whatever they are given. It isn’t just me getting old; it’s an absolute, objective decline in the ambition of pop music, musicians and listeners.