The Secret Science of Pop


In The Secret Science of Pop, evolutionary biologist Professor Armand Leroi tells us that he sees pop music as a direct analogy for natural selection. And he salivates at the prospect of a huge, complete, historical data set that can be analysed in order to test his theories.

He starts off by bringing in experts in data analysis from some prestigious universities, and has them crunch the numbers on the past 50 years of chart music, analysing the audio data for numerous characteristics including “rhythmic intensity” and “aggressiveness”. He plots a line on a giant computer monitor showing the rate of musical change based on an aggregate of these values. The line shows that the 60s were a time of revolution – although he claims that the Beatles were pretty average and “sat out” the revolution. Disco, and to a lesser extent punk, made the 70s a time of revolution too, but the 80s were not.

He is convinced that he is going to be able to use his findings to influence the production of new pop music. The results are not encouraging: no matter how he formulates his data he finds he cannot predict a song’s chart success with much better than random accuracy. The best correlation seems to be that a song’s closeness to a particular period’s “average” predicts high chart success. It is, he says, “statistically significant”.
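The programme didn’t spell out how “closeness to average” was calculated, so the following is only a guess at the general idea: treat each song as a list of feature scores, take the mean of every feature across the period, and rank songs by their distance from that mean. The feature values, the function name and the choice of plain Euclidean distance are all my own assumptions, not the researchers’ method.

```python
import math

def distance_from_average(song, songs):
    """Euclidean distance between one song's features and the period mean."""
    count = len(songs)
    # Mean of each feature across every song in the period.
    mean = [sum(s[i] for s in songs) / count for i in range(len(song))]
    return math.sqrt(sum((x - m) ** 2 for x, m in zip(song, mean)))

# Made-up feature vectors: (rhythmic intensity, aggressiveness).
period = [(0.2, 0.1), (0.5, 0.5), (0.8, 0.9), (0.5, 0.4)]

# Smallest distance first, so the "most average" song heads the list.
ranked = sorted(period, key=lambda s: distance_from_average(s, period))
most_average = ranked[0]
print(most_average)
```

Even in this toy form, the result depends entirely on which features are measured and how they are scaled, which is where the human judgement creeps back in.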

Armed with this insight he takes on the role of producer and attempts to make a song (a ballad) being recorded at Trevor Horn’s studio as average as possible by, amongst other things, adjusting its tempo and adding some rap. It doesn’t really work, and when he measures the results with his computer, he finds that he has manoeuvred the song away from average with this manual intervention.

He then shifts his attention to trying to find the stars of tomorrow by picking out the most average song from 1200 tracks that have been sent into BBC Radio 1 Introducing. The computer picks out a particular band who seem to have a very danceable track, and in the world’s least scientific experiment ever, he demonstrates that a BBC Radio 1 producer thinks it’s OK, too.

His final conclusion: “We failed spectacularly this time, but I am sure the answer is somewhere in the data if we can just find it”.

My immediate thoughts on this programme:

-An entertaining, interesting programme.

-The rule still holds: science is not valid in the field of aesthetic judgement.

-If your system cannot predict the future stars of the past, it is very unlikely to be able to predict the stars of the future.

-The choice of which aspects of songs to measure is purely subjective, based on the scientist’s own assumptions about what humans like about music. The chances of the scientist not tweaking the algorithms in order to reflect their own intuitions are very remote. To claim that “The computer picked the song with no human intervention” is stretching it! (This applies to any ‘science’ whose main output is based on computer modelling).

-The lure of data is irresistible to scientists but, as anyone who has ever experimented with anything but the simplest, most controlled, pattern recognition will tell you, there is always too much, and at the same time never enough, training data. It slowly dawns on you that although theoretically there may be multidimensional functions that really could spot what you are looking for, you are never going to present the training data in such a way that you find a function with 100%, or at least ‘human’ levels of, reliability.

-Add to that the myriad paradoxes of human consciousness, and of humans modifying their tastes temporarily in response to novelty and fashion – even to the data itself (the charts) – and the reality is that it is a wild goose chase.

(very relevant to a post from a few months ago)

The Trouble with Hobbies

Have you ever suddenly been inspired to embark on a brand new hobby?

Maybe you’ve never owned a boat before, but having seen one chug by on the river you have thought “I’d love to do that!”. A quick browse in the classified ads shows lots of boats that look fine, and they don’t cost all that much. Basically any boat would be great, and you could gradually do it up, even if it is a bit shabby now. In your mind’s eye, your family will love you when you are able to take them on spur-of-the-moment, cheap weekends messing about on the water, starting in a few weeks’ time.

From this high point where the world is your oyster, you begin to take the advice of the magazines and other experienced hobbyists. Before you have even owned a boat, you become aware of the hierarchy of boat owners, and the boats that would render you a laughing stock if you owned them. You become aware of the general consensus on different types of bilge pump – not something you ever wanted to know. You begin to form an idea of the boat you should really go for – and it is not one of the bargain basement jobs you first saw. You might just about be able to stretch to a boat that would put you in the lower echelons of boat ownership but, importantly, not on the very lowest rung. You could always, perhaps, move up from there over time.

It now turns into an all-consuming hobby with the goal of having a boat on the river at the end of the year. In the end it costs thousands, and your children have grown up and left home before your boat finally takes to the water. You hit a bridge and rip the top off your boat the first time you take it out. You feel sick and abandon the whole hobby (a true story).

That’s the nature of male hobbies. They start out as wonderful, spontaneous ideas, but can turn into nightmares – mainly due to the existence of other hobbyists! Audio is one of those hobbies, I think. Ridiculously, the prices paid for bits of audio knickknackery rival the costs of boats.

A person could be seized one day by the idea of hi-fi as a way to improve their life, buy an amp and some secondhand speakers off Gumtree for £100, and plug their tablet or laptop headphone socket into the amp using a £2 cable. Hey presto, a hi-fi system that will sound much better than what they had before, and which has tinker-ability via the buying and selling of speakers and the audio streaming/library software options; there is no urgency in changing the amp and tablet hardware as they are pretty much perfect in what they do. The speakers are almost like pieces of furniture, so the person can indulge their tastes in how they look as well as how they sound, and they can be restored using standard DIY skills – a nice mini-hobby.

But what if the person does the natural male thing, and starts to read the magazines and forums? Immediately they will realise that their tablet’s headphone output is a joke in the audio world. They need to spend at least a few hundred pounds on a half-decent ‘DAC’, plus a couple of hundred on a budget cable. And of course, this is only for convenience: real audio quality can only be had if they own a decent turntable and a special vibration-free shelf to put it on. Where do they go from there? They need to make a decision on which turntable and which cartridge to go for. They need to take a view on cables, power conditioners, valve or solid state amps, accessories like cable lifters and record cleaning machines. Each decision, they are assured by their fellow hobbyists, will result in “night and day” differences in the sound.

After some months agonising over it, they assemble a beginner’s system for about £3,000 – they will upgrade as budget allows. It sounds OK, but they know that even though the brand is a highly recommended one, the particular model of valve amplifier they could afford has “hints of a slightly reticent mid range” – one of the magazines said so – and if they listen carefully, perhaps they can hear that… But the more powerful 18 Watt model cost £800 more and they decided against it. Perhaps they made the wrong decision. The nightmare unfolds…


Here’s a bit of a discovery (for me, anyway – and as yet they have only a few thousand listens on Spotify). They’re a Canadian duo (twin sisters) called Tasseomancy, a name that refers to the art of reading tea leaves…

I imagine that fans of Kate Bush will love this. I really like “Dead Can Dance & Neil Young”, the opening track from their latest album Do Easy. The track Missoula is beautiful:

The near field listening chair

[Image: stereo-lays-an-egg]

I often read around various audio forums in order to try to understand the world of audiophilia. Some common themes seem to be:

  • Measurements at the listening position are important, and can be interpreted directly as good and bad. Good = flat (or, by rule of thumb, a slightly downward-tilted frequency response); bad = non-flat frequency response.
  • There is no limit to what can be justified in order to get ‘good’ measurements at the listening position. Floor-to-ceiling speaker arrays; huge panel speakers; multiple subwoofers; diffusers; absorbers; traps; DSP-based ‘room correction’.
  • It is acceptable for the system to sound poor everywhere in the room except for a single seat.
  • The size of the speakers may be such that they loom over the listener disconcertingly, but this is considered acceptable.

What is really happening is that audiophiles are trying to get the recorded signal beamed directly, and anechoically, to their ears while sitting in a room. This is the logic of seeking ‘perfect’ measurements at the listening position and what gives rise to speakers that dominate the room.

It occurs to me that the whole thing could be scaled down to a fraction of the size, and could give even better measurements, largely removing the influence of the room (which is what the logic of ‘perfect’ measurements is seeking).

Ultimately, the audiophile could sit in an armchair close to some pretty small speakers – that could even be mounted on the chair. By only being a couple of feet away from the listener, they can be relatively low powered, yet with more-than-adequate bass. Room reflections become far lower in proportion compared to the signal than with ordinary speakers, giving that ‘anechoic’ sound that audiophiles are (whether they know it or not) pursuing. The restriction to a single listening seat is no disadvantage as we have seen. Ambient volume becomes much lower, so this would be ideal for listening late at night without disturbing the neighbours.
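A rough sum shows why sitting close helps. Assuming the speaker behaves like a point source, the direct sound follows the inverse square law, while the reverberant sound is (as a simplification) about the same level everywhere in the room. Moving closer therefore raises the direct sound relative to the room by the figure calculated below. The 3 m “ordinary” listening distance is my own assumed figure; 0.6 m stands in for “a couple of feet”.

```python
import math

def direct_level_gain_db(far_m, near_m):
    """Rise in direct sound level (dB) when moving from far_m to near_m.

    Assumes a point source in free field: level changes by
    20*log10 of the distance ratio (6 dB per halving of distance).
    """
    return 20 * math.log10(far_m / near_m)

ORDINARY_DISTANCE_M = 3.0   # assumed typical sofa-to-speaker distance
NEAR_FIELD_DISTANCE_M = 0.6  # roughly "a couple of feet"

gain_db = direct_level_gain_db(ORDINARY_DISTANCE_M, NEAR_FIELD_DISTANCE_M)
print(f"Direct sound is about {gain_db:.1f} dB stronger relative to the room")
```

On these assumptions the direct sound gains about 14 dB over the reverberant field, and the same 14 dB can be taken off the amplifier power for the same level at the ear.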

Such chairs already have a precedent as in the image above (! time for a revival?). And some people already do prefer near field listening – those who sit at a desk with reasonable quality speakers either side of their PC monitor are experiencing something similar.

However, I don’t think the audiophiles with room-dominating speakers really are seeking “that anechoic sound”. It is only the idea that “speakers and room are a system”, with the assumption that the two should sum to the recorded signal, that is giving rise to huge, room-dominating systems and a single listening seat. The unfortunate audiophiles are constantly getting closer and closer to ‘perfection’ while their systems sound worse and worse. People back in the 1970s had a far better listening experience.

I think Siegfried Linkwitz is one of very few people who understand this:

A listening room is the modern equivalent to forest and savanna. We still use the now hardwired portions of the hearing process but adapt them to the new situation. We still can ignore the static background, in this case the room and the fixed loudspeakers, and automatically focus our attention on the direct sound, even when it creates an illusion….

Two-channel playback in a normal living space can provide an experience that is fully satisfying as loudspeakers and room disappear and the illusion of being transported to a different place and moment in time takes over.

The Secret Life of the Signal

Some people actually think of stereo imaging as a “parlour trick” that is very low on the list of desirable attributes that an audio system should have. They ‘rationalise’ this by saying that in the majority of recordings, any stereo image is an artificial illusion, created by the recording engineer either deliberately or by accident; it does not accurately represent the live event – because there may not even have been a single live event. So how can it matter if it is reproduced by the playback system or not? Perhaps it is even best to suppress it: muddle it up with some inter-channel crosstalk like vinyl does, or even listen in mono.

At the top of the list of desirable attributes for a hi-fi system, most audiophiles would put “timbre”, “tonality”, low distortion, clean reproduction at high volumes, dynamics, deep bass. All of these qualities can be experienced with a mono signal and a single speaker – in fact in the Harman Corporation’s training for listening, monophonic reproduction is recommended when performing listening tests.

Because their effects are not so obvious in mono, phase and timing are regarded by many as supremely unimportant. I quote one industry luminary:

Time domain does not enter my vocabulary…

Sound is colour?

We know that our eyes respond to detail and colour in different ways. In the early days of colour TV (analogue) it was found that the signal could be broadcast within practical bandwidths because the colour (chrominance) information could be sent at lower resolution than the detail (luminance).

There is, perhaps, a parallel in hearing, too: that humans have separate mechanisms for responding to sound in the frequency and time domains. But the conventional hi-fi industry’s implicit view is that we only hear in the frequency domain: all the main measurements are in the frequency domain, and steady state signals are regarded as equivalent to real music. A speaker’s overall response to phase and timing is ignored almost totally or, at best, regarded as a secondary issue.

I think that this is symptomatic of an idea that pervades hi-fi: that the signal is ‘colour’. Sure, it varies as the music is playing, but the exact nature of that variation is almost incidental; secondary in comparison to the importance of the accurate reproduction of colour, and that in testing, all that matters is whether a uniform colour is accurately reproduced.

There has, nevertheless, been some belated lip service paid to the importance of timing, with the hype around MQA (still usually being played over speakers with huge timing errors!), and a number of passive speakers with sloping front baffles for time alignment. Taken to its logical conclusion, we have these:


Their creator says, though:

It’s nice if you have phase coherence, but it is not necessary

So they still fall short of the “straight wire with gain” ideal. It still says that the signal is something we can take liberties with, not aspiring to absolute accuracy in the detail as long as we get a good neutral white and a deep black, and all uniform (‘steady state’) colours reproduced with the correct shading. It says that we understand the signal and it is trivial. Time alignment by moving the drivers backwards and forwards is an easy gimmick, so we can go that far, however.

Another Dimension

I think that with DSP-corrected drivers and crossovers, we are beginning to find that there is another dimension to the common or garden stereo signal; one that has been viewed as a secondary effect until now. Whether created accidentally or not, the majority of recordings contain ‘imaging’ that is so clear that it gives us access to the music in a way we were not aware of. It allows us to ‘walk around’ the scene in which the recording was made. If it is a composite, multitrack recording, it may not be a real scene that ever existed, but the individual elements are each small scenes in themselves, and they become clearly delineated. It is ‘compelling’.

I can do no better than quote a brand new review of the Kii Three written by a professional audio engineer, that echoes something I was saying a couple of weeks ago: imaging is not just a ‘trick’, but improves the separation of the acoustic sources in a way that goes beyond the traditional attributes of low distortion & colouration.

I think he also echoes something I said about believable imaging giving the speaker a ‘free pass’ in terms of measurements. As in my DIY post, he says that the speaker sounds so transparent and believable that there is no point in going any further in criticising the sound. A suggestion, perhaps, that conventional ‘in-room’ measurements and ‘room correction’, are shown up as the red herrings they are if a system sets out to be genuinely neutral by design, at source.

Firstly, the traditional kind of subjective analysis we speaker reviewers default to — describing the tonal balance and making a judgement about the competence of a monitor’s basic frequency response — is somehow rendered a little pointless with the Kii Three. It sounds so transparent and creates such fundamentally believable audio that thoughts of ‘dull’ or ‘bright’ seem somehow superfluous.

… it is dominated by such a sense of realistic clarity, imaging, dynamics and detail that you begin almost to forget that there’s a speaker between you and the music.

…I’ve never heard anything anywhere near as adept at separating the elements of a mix and revealing exactly what is going on. I found myself endlessly fascinated, in particular, by the way the Kii Three presents vocals within a mix and ruthlessly reveals how good the performance was and how the voice was subsequently treated (or mistreated). Performance idiosyncrasies, microphone character, room sound, compression effects, reverb and delay techniques and pitch-correction artifacts that I’d never noticed before became blindingly obvious — it was addictive.

…One of the joys of auditioning new audio gear, especially speakers, is that I occasionally get to rediscover CDs or mixes that I thought I knew intimately. I can honestly say that with the Kii Three, every time I played some old familiar material I heard something significant in the way it performs…

…Low-latency mode …switch[es] off the system phase correction. It makes for a fascinating listening experience. …the change of phase response is clearly audible. The monitor loses a little of its imaging ability and overall precision in low-latency mode so that things sound a little less ‘together’.

“The Kii Three is one of the finest speakers I’ve ever heard and undoubtedly the best I’ve ever had the privilege and pleasure of using in my own home.”

Vinyl sales overtake digital


It seems that a milestone was passed last week when UK vinyl sales hit £2.5m versus digital’s £2.1m. Vinyl has enjoyed eight straight years of growth.

It’s no skin off my nose, except where new recordings begin to be produced primarily with the vinyl release in mind. This is where dynamics are reduced, bass and treble attenuated, and stereo effects restricted while the recording is being made, rather than a special post-processed master being made for vinyl. We digital listeners are then forced to listen to the less dynamic version as well.

I just had a quick look to see if I could find an actual ‘Top Tips for Mastering Vinyl’ example for the above. The first site I looked at contained this:

Mastering for Vinyl

…For minimalist recordings, you want to try and minimize large phase differences between channels… This means that spaced omnis are really not such a good idea if you can avoid them.

If you can’t avoid them, try and put loud bass sources in the center of the soundstage, as close to the center mic as possible. Even if you are using coincident miking, this is a good idea.

In other words, once vinyl becomes a major consideration, actual recording techniques are dictated by the medium. In the example above, it is not crazy studio effects that are being limited, but the microphone placement used in minimalist recordings that you might have thought were not a problem.

Image is Everything

I have a couple of audiophile friends for whom ‘imaging’ is very much a secondary hi-fi goal, but I wonder if this is because they’ve never really heard it from their audio systems.

What do we mean by the term anyway? My definition would be the (illusion of) precise placement of acoustic sources in three dimensions in front of the listener – including the acoustics of the recording venue(s). It isn’t a fragile effect that only appears at one infinitesimal position in space or collapses at the merest turn of the head, either.

It is something that I am finding is trivially easy for DSP-based active speakers. Why? Well I think that it just falls out naturally from accurate matching between the channels and phase & time-corrected drivers. Logically, good imaging will only occur when everything in a system is working more-or-less correctly.
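As an illustration of how simple the time-correction part is in DSP, here is the sum for lining up two drivers whose acoustic centres sit at different depths. The nearer driver is simply delayed by the time sound takes to cover the gap. The 30 mm offset and 48 kHz sample rate are hypothetical figures of my own, and a real design would measure the offset rather than assume it.

```python
SPEED_OF_SOUND_M_S = 343.0  # dry air at about 20 °C

def alignment_delay_samples(offset_m, sample_rate_hz):
    """Delay (in samples) to apply to the driver that is nearer the listener."""
    delay_s = offset_m / SPEED_OF_SOUND_M_S
    return delay_s * sample_rate_hz

# Hypothetical case: tweeter's acoustic centre 30 mm in front of the woofer's.
delay = alignment_delay_samples(0.030, 48_000)
print(f"Delay the tweeter by about {delay:.2f} samples")
```

The answer is a fraction over four samples, which a DSP crossover can apply (including the fractional part, by interpolation) without any sloping baffles.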

I can imagine all kinds of mismatches and errors that might occur with passive crossovers, exacerbated by the compromises that are forced on the designer such as having to use fewer drivers than ideal, or running the drivers outside their ideal frequency ranges.

Imaging is affected by the speaker’s interaction with the room, of course. The ultimate imaging accuracy may occur when we eliminate the room’s contribution completely, and sit in a very tight ‘sweet spot’, but this is not the most practical or pleasant listening situation. The room’s contribution may also enhance an illusion of a palpable image, so it is not desirable to eliminate it completely. Ultimately, we are striking a balance between direct sound and ambient reflections through speaker directivity and positioning relative to walls.

A real audiophile scientist would no doubt be interested in how exactly stereo imaging works, and whether listening tests could be devised to show the relative contributions of poor damping, phase errors, Doppler distortion, timing misalignment etc. Maybe we could design a better passive speaker as a result. But I would say: why bother? The DSP active version is objectively more correct, and now that we have finally progressed to such technology and can actually listen to it, it clearly doesn’t need to do anything but reproduce left and right correctly – no need for any other tricks or the forlorn hope of some accidental magic from natural, organic, passive technology.

An ‘excuse’ for poor imaging is that in many real musical situations, imaging is not nearly as sharp as can be obtained from a good audio system. This is true: if you go to a classical concert and consciously listen for where a solo brass instrument (for example) is coming from, you often can’t really tell. I presume this is because you are generally seated far from the stage with a lot of people in the way and much ‘ambience’ thrown in. I presume that the conductor is hearing much stronger ‘imaging’ than we are – and many recordings are made with the mics much closer than a typical person sitting in the auditorium; the sharper imaging in the recording may well be largely artificial.

However, to cite this as a reason for deliberately blurring the image in some arbitrary way is surely a red herring. The image heard by the audience member is still ‘coherent’ even if it is not sharp. And the ‘artificially imaged’ recording contains extra information that is allowing us to separate the various acoustic sources by a different mechanism than the one that might allow us to tease out the various sources in a mono recording, say. It reduces effort and vastly increases the clarity of the audio ‘scene’.

I think that good imaging due to superior time alignment and phase is going to be much more important than going to the Nth degree to obtain ultra-low harmonic distortion.

If we mess up the coherence between the channels we are getting the worst of all worlds: something that arbitrarily munges the various acoustic sources and their surroundings in response to signal content. An observation that is sometimes made is that the music “sticks to the speakers” rather than appearing in between. What are our brains to make of it? It must increase the effort of listening and blur the detail of what we are hearing.

Not only this, but good imaging is compelling. Solid voices and instruments that float in mid air grab the attention. The listener immediately understands that there is a lot more information trapped in a stereo recording than they ever knew.

Television’s first night


There was an interesting BBC programme last week which celebrated the 80th anniversary of the launch night of BBC television. It aimed to re-create the original event as closely as possible, even to the extent of building replicas of some of the technology in use at the time.

For those who don’t know the story, the BBC launched television in 1936 running two types of technology in parallel: the Logie Baird mechanical system and EMI’s vacuum tube-based electronic system. Baird’s system was used first, and then the whole thing was repeated using the electronic system. The original television receivers, of which only 300 had been sold by the launch, had a switch to allow the receiver to be put into Baird or EMI mode – I hadn’t realised that, even on launch day, some receivers were using electronic picture tubes even if the Baird camera system wasn’t.

The Baird mechanical system was incredible: for truly live images it had to use a “flying spot” camera where the scene (the face of a presenter sitting in a pitch black booth) was raster-scanned with a high intensity dot of light and the resulting reflected light level picked up by a photo-sensor. In order to achieve 240 lines of resolution, two rotating discs were used; one a metre in diameter and spinning so fast its edges were almost supersonic, and a synchronised slower disc with a spiral mask which selected one of several sets of dots on the main disc.
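Out of curiosity, here is a back-of-envelope check on “almost supersonic”. The rim of a spinning disc travels one circumference per revolution, so for the one metre disc mentioned in the programme we can work out the rotation rate at which the edge would reach the speed of sound. The 343 m/s figure is for air at about 20 °C; the programme did not give the actual rotation speed, so none is assumed here.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # dry air at about 20 °C

def rim_speed_m_s(diameter_m, rpm):
    """Speed of the disc edge: one circumference per revolution."""
    return math.pi * diameter_m * rpm / 60

def rpm_for_rim_speed(diameter_m, speed_m_s):
    """Rotation rate at which the disc edge reaches the given speed."""
    return speed_m_s * 60 / (math.pi * diameter_m)

sonic_rpm = rpm_for_rim_speed(1.0, SPEED_OF_SOUND_M_S)
print(f"A 1 m disc goes supersonic at the rim at about {sonic_rpm:.0f} rpm")
```

So anything a little under roughly 6,500 rpm would fit the description.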

More general scenes of groups of performers and so on were recorded live to film which was developed in a portable ‘lab’ mounted beneath the camera, ready to be scanned by a flying spot scanner some 54 seconds later – this was effectively the first ever telecine machine. The transition from live to telecine sections required logistical coordination around the 54-second delay, meaning that the performers had to start 54 seconds before the live announcer stopped talking, and the announcer had to wait in silence after the performance ended before someone jabbed him in the ribs through a hole in the side of the booth and he could start talking again. (I found this whole thing baffling: why was it important that any of it was truly ‘live’? Why not just do it all delayed by 54 seconds? Perhaps, as was implied in the programme, the telecine images were not quite as crisp as the live..?).

Anyway, the writing was on the wall for the mechanical system, and the six month competition was terminated after only three months. My question is: why did it take so long? Why did people go to such heroic lengths to pursue a solution that was so obviously doomed? Perhaps men’s fascination with spinning discs in preference to electronic solutions is universal. I have no doubt that there were some diehards who thought that the mechanical system somehow captured a better picture than a soulless glass tube.

The Engineering Department of Cambridge University had the fun of developing the replica flying spot camera (although with only 60 lines of resolution as opposed to the original 240). Things got a bit fraught in the build up to the ‘launch’ however: a persistent mechanical howl from the disc mechanism threatened to ruin everything. It seemed to take several hours of effort and anguish before someone had the bright idea of applying a drop of oil…

None of the original presenters, performers or staff present at the launch night are still with us, but the BBC did manage to track down a 104-year-old engineer who worked for Baird. The launch of television seems like so long ago, and yet this man was already 24 when it happened. He is still sharp as a pin and when Hugh Hunt of Cambridge University told him he was building a replica flying spot machine using an aluminium disc instead of the original steel, his brow furrowed immediately and he asked “Are you sure aluminium will be strong enough to withstand the centrifugal force?”.

I enjoyed seeing the old abandoned studios in Alexandra Palace, and Paul Marshall’s barn full of old TV equipment, including some of the earliest camera tubes in existence. He has built a working camera based on a genuine Iconoscope tube, using modern electronics to drive it, giving us a close re-creation of pre-war electronic TV pictures. I somehow find old TV equipment quite moving; TV was an important part of my childhood and I can’t help but think of the snippets of the golden past that might have been captured through those lenses.