The Machine Learning delusion

This morning my personal biological computer detected a correlation between these two articles:

Sony’s SenseMe™ – A Superior Smart Shuffle

Machine learning: why we mustn’t be slaves to the algorithm

In the first article, the author is praising a “smart shuffle” algorithm that sequences tracks in your music collection with various themes such as “energetic, relax, upbeat”. It does this by analysing the music’s mood and tempo. It sounds amazing:

“I would never think of playing Steve Earl’s “Loretta” right after listening to the Boulder Philharmonic’s performance of “Olvidala,” or Ry Cooder’s “Crazy About an Automobile” followed by Doc and Merle Watson playing “Take Me Out to the Ballgame,” but I enjoyed not only the selections themselves but the way SensMe™ juxtaposes one after another, like a DJ who knows your collection better than you do…what will “he” play next? Surprise! It’s all good.”

And the algorithm’s effects go beyond mere music:

“SenseMe™ has brought domestic harmony – interesting selections for me and music with a similar mood for her. That’s better than marriage counseling!”

The author of the second article takes a more sceptical view. He notes the dumbness of Machine Learning™ algorithms, but says that

“…because these outputs are computer-generated, they are currently regarded with awe and amazement by bemused citizens …”

He quotes someone who is aware of the limitations:

“Machine learning is like a deep-fat fryer. If you’ve never deep-fried something before, you think to yourself: ‘This is amazing! I bet this would work on anything!’ And it kind of does. In our case, the deep fryer is a toolbox of statistical techniques. The names keep changing – it used to be unsupervised learning, now it’s called big data or deep learning or AI. Next year it will be called something else. But the core ideas don’t change. You train a computer on lots of data, and it learns to recognise structure.”

“But,” continues Cegłowski, “the fact that the same generic approach works across a wide range of domains should make you suspicious about how much insight it’s adding.”

I have been there. Machine learning is one of the most seductive branches of computer science, and in my experience is a very “easy sell” to people – I use it in my job in actual engineering applications where it can be eerily effective.

But if algorithms are so clever and know us so well, why are we using them only to shuffle the order of music? Why not cut out the middleman and get the computer to compose the music for us directly? The answer is obvious: it doesn’t work because we don’t know how the human brain works, and it is not predictable. By extension, the algorithms that purport to help us in matters of taste don’t actually work either. As the Guardian article says, all we are responding to is the novelty of the idea.

Pop and click remover, old electronics magazines

Just saw a short article about a new product that aims to remove the pops and clicks from vinyl records. It…

…digitizes the signal at 192/24 bit resolution and then uses a “non-destructive” real time program that removes pops and clicks without, the company claims, damaging the music.

…In addition to real-time, non-destructive click & pop Removal the SC-1 features user controllable click & pop removal “strength”, a pushbutton audiophile-grade “bypass” that lets you hear non-digitized versus digitized signal (for when you don’t need pop and click removal), iOS and Android mobile app control and 192/24 bit hi-res digital processing.

Of course it is highly ironic that a vinyl enthusiast should need the services of the digital world to improve the sound of his recordings. And it is obvious (surely) that the digital stream could be stored for later replay without needing to further degrade the original vinyl or wear out the multi-thousand dollar stylus that is no doubt being used. (Omitting to mention the most obvious idea of just listening to a digital recording…)

The aim of the product reminded me of a certain project in an old electronics magazine, a huge number of which I still have in a set of bookshelves that I haven’t touched since 1990 – the date of the last magazine I seem to have bought. Sifting through them, it is amazing how familiar the front covers still are – a measure of the intensity of youthful hobbies.


From Electronics Today International in April 1979, the project I remembered was a ‘Click Eliminator’ for vinyl records based on an analogue CCD delay line. The idea was to insert a few milliseconds of silence in place of the offensive click. Here’s how it worked:
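The principle can be approximated in a few lines of software. The sketch below is a digital analogy, not the ETI circuit: the detector (a jump between neighbouring samples larger than `threshold`), the function name and the window sizes are all assumptions made for illustration. The `pre` parameter stands in for the job done by the delay line, which gives the circuit time to start muting just before the click arrives.

```python
import math

def eliminate_clicks(samples, threshold, pre=2, post=6):
    """Replace a short window around each detected click with silence.

    Assumed detector: a click is flagged wherever the jump between
    neighbouring samples exceeds `threshold`.
    """
    out = list(samples)
    for i in range(1, len(samples)):
        if abs(samples[i] - samples[i - 1]) > threshold:
            start = max(0, i - pre)            # reach back, as the delay line allows
            stop = min(len(samples), i + post)
            for j in range(start, stop):
                out[j] = 0.0                   # insert silence
    return out

# A quiet 440 Hz tone sampled at 44.1 kHz, with one artificial click added.
rate = 44100
music = [0.1 * math.sin(2 * math.pi * 440 * n / rate) for n in range(200)]
music[100] += 0.9
cleaned = eliminate_clicks(music, threshold=0.3)
```

At 44.1 kHz a "few milliseconds" of silence would be a window of a hundred samples or more; the tiny window here just keeps the example readable.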


Electronics Today International was the magazine I would go to WH Smiths for on a Saturday, being terribly disappointed if the latest issue wasn’t in. I would say more than 50% of issues featured an audio or hi-fi project: from 1982 an active speaker project for example, or from 1986 “Can Valves make a comeback?” with an accompanying valve amp project. There were any number of MOSFET amps, phono pre-amps, tape noise reduction units. Electronic music featured prominently with projects for effects pedals and synthesisers galore. I devoured this stuff.

Other magazines included: Practical Electronics, Wireless World, Everyday Electronics, Elektor, Electronics and Music Maker, and one I didn’t recall, Hobby Electronics. I also bought any number of computer magazines. I have never thrown any away, so I have hundreds of them gathering dust.

Kii Three Review

Just saw this review of the Kii Threes by mastering engineer Bob Macc. He seems rather pleased with them:

…everything is just tight, accurate, and not smeared in time or pressed-sounding. Kick drums on these speakers are ridiculous in their tightness and accuracy. Acoustic and electric basses are the very definition of the word ‘articulate’. The time-coherency extends, of course, across the whole spectrum. Transients are, well, transients.

The imaging on these speakers is also absolutely unbelievable, in all dimensions. The front to back depth is unreal; room information is conveyed incredibly well. You’re there. In fact it might be the depth that astounded me more than anything else. The stereo image is absolutely enormous, involving, and everything sounds real. Drums pop out like drums do (or don’t, if they don’t). The main acoustic guitar in Holland and Habichuela’s ‘Hands’ was unbelievably real. The sound is huge, and absolutely pristine in all regards.

They are incredibly revealing – I heard things in tracks I know extremely well that I have never heard before. I heard micro-movement inside tracks from compression/sidechaining that I’ve never heard before. I heard mistakes in work by very famous engineers in tracks I’ve listened to a million times. I heard mistakes in my own work (tiny ones, I promise!) that I absolutely would not have allowed to pass had I heard them previously. That kind of says it all.

The whole time though, you’re thinking; ‘how do those tiny little speakers make all that sound?!’. With eyes closed and a good-sounding track playing, the room is absolutely full of sound. When you open your eyes, it’s almost as if the illusion is destroyed – there’s simply no way those little things can produce all that sound. But they do, and they do it easily and effortlessly.

I really am going to have to get in touch with the chap at Purité Audio (who posted this review on HiFi Wigwam) and see if he’ll let me have a listen…

Auditory Scene Analysis

There is a field of study called Auditory Scene Analysis (ASA) that postulates that humans interpret “scenes” using sound just as they do using vision. I am not sure that it necessarily has any particular bearing on the way that audio hardware should be designed: basically the scene is all the clearer if the reproduction of the audio is clean in terms of noise, channel separation, distortion, frequency response and (seemingly controversial to hi-fi folk) the time domain.

However, the seminal work in this field includes the following analogy for hearing:

Your friend digs two narrow channels up from the side of a lake. Each is a few feet long and a few inches wide and they are spaced a few feet apart. Halfway up each one, your friend stretches a handkerchief and fastens it to the sides of the channel. As the waves reach the side of the lake they travel up the channels and cause the two handkerchiefs to go into motion. You are allowed to look only at the handkerchiefs and from their motions to answer a series of questions: How many boats are there on the lake and where are they? Which is the most powerful one? Which one is closer? Is the wind blowing? Has any large object been dropped suddenly into the lake?

Of course, when we listen to reproduced music with an audio system we are, in effect, duplicating the motion of the handkerchiefs using two paddles in another lake (our listening room) and watching the motion of a new pair of handkerchiefs. Amazingly, it works! But the key to this is that the two lakes are well-defined linear systems. Our brains can ‘work back’ to the original sounds using a process akin to ‘blind deconvolution’.

If we want to, we can eliminate the ‘second lake’ by using headphones, or we can almost eliminate it by using an anechoic chamber. We could theoretically eliminate it at a single point in space by deconvolving the reproduced signal with the measured impulse response of the room at that point. Listening with headphones works OK, but listening to speakers in a dead acoustic sounds terrible – probably to do with ‘head related transfer function’ (HRTF) telling us that we are listening to a ‘real’ acoustic but with an absence of the expected acoustic cues when we move our heads. By adding the ‘second lake’ we create enough ‘real acoustic’ to overcome that.
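To make the 'deconvolving' idea concrete, here is a minimal sketch under strong assumptions: the room is treated as a perfectly linear system, its impulse response at the listening point is known exactly, and that response starts with the direct sound followed by weaker echoes. The impulse response and signal values are invented for illustration, and the inverse filter is the simplest recursive kind rather than anything a real product would necessarily use.

```python
def convolve(signal, impulse_response):
    """What arrives at the listening point: the signal plus its echoes."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, s in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[n + k] += s * h
    return out

def deconvolve(received, impulse_response, length):
    """Undo the room by recursive inverse filtering (needs impulse_response[0] != 0)."""
    h0 = impulse_response[0]
    recovered = []
    for n in range(length):
        acc = received[n]
        # Subtract the echoes of samples that have already been recovered.
        for k in range(1, min(n, len(impulse_response) - 1) + 1):
            acc -= impulse_response[k] * recovered[n - k]
        recovered.append(acc / h0)
    return recovered

# Invented example: direct sound followed by two decaying echoes.
room = [1.0, 0.0, 0.5, 0.0, 0.25]
original = [0.3, -0.2, 0.8, 0.1, -0.6, 0.4]
at_listener = convolve(original, room)
recovered = deconvolve(at_listener, room, len(original))
```

The recovery is only valid for the one impulse response that was measured, which is the 'single point in space' limitation in practice.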

But here is why ‘room correction’ is flawed. The logical conclusion of room correction is to simulate headphones, but this cannot be achieved – and is not what most listeners want anyway, even if they don’t know it. Instead, an incomplete ‘correction’ is implemented based on the idea of trying to make the motion of the two sets of ‘handkerchiefs’ closer to each other than they (in naive measurements) appear to be. If the idea of the brain ‘working back’ to the original sound is correct, it will ‘work back’ to a seemingly arbitrarily modified recording. Modifying the physical acoustics of the room is valid whereas modifying the signal is not.

I think the problem stems ultimately from an engineering tool (frequency domain measurement) proliferating due to cheap computing power. There is a huge difference in levels of understanding between the author of the ASA book and the audiophiles and manufacturers who think that the sound is improved by tweaking graphic equalisers in an attempt to compensate for delays that the brain has compensated for already.



How often do you stumble across an album by an artist you were completely unaware of, and find that it’s as good as anything you’ve heard in your life? It’s nice when it happens!

Ever heard of Jobriath? I hadn’t. It seems he was going to be the next big thing in 1973 but the world wasn’t quite ready for him. Anyway, I just listened to the album Jobriath which I have somehow managed to miss until now. Fantastic music with unusual arrangements and unexpected twists and turns – check out the piano part on Inside. Beautiful, fresh recording. Surely this is as good as David Bowie or Elton John. The highlight of the album for me is I’m a Man which, among its many virtues, uses a harpsichord to great effect.

The Man in the White Suit


There’s a brilliant film from the 1950s called The Man in the White Suit. It’s a satire on capitalism and the power of the unions, telling the story of how the two sides find themselves working together to oppose a new invention that threatens to make several industries redundant.

I wonder if there’s a tenuous resemblance between the film’s new wonder-fabric and the invention of digital audio? I hesitate to say that it’s exactly the same, because someone will point out that in the end, the wonder-fabric isn’t all it seems and falls apart, but I think they do have these similarities:

  1. The new invention is, for all practical purposes, ‘perfect’, and is immediately superior to everything that has gone before.
  2. It is cheap – very cheap – and can be mass-produced in large quantities.
  3. It has the properties of infinite lifespan, zero maintenance and non-obsolescence.
  4. It threatens the profits not only of the industry that invented it, but other related industries.

In the film it all turns a bit dark, with mobs on the streets and violence imminent. Only the invention’s catastrophic failure saves the day.

In the smaller worlds of audio and music, things are a little different. Digital audio shows no signs of failing, and it has taken quite a few years for someone to finally come up with a comprehensive, feasible strategy for monopolising the invention while also shutting the Pandora’s box that was opened when it was initially released without restrictions.

The new strategy is this:

  1. Spread rumours that the original invention was flawed.
  2. Re-package the invention as something brand new, with a vagueness that allows people to believe whatever they want about it.
  3. Deviate from the rigid mathematical conditions of the original invention, opening up possibilities for future innovations in filtering and “de-blurring”. The audiophile imagination is a potent force, so this may not be the last time you can persuade them to re-purchase their record collections, after all.
  4. Offer to protect the other, affected industries – for a fee.
  5. Appear to maintain compatibility with the original invention – for now – while substituting a more inconvenient version with inferior quality for unlicensed users.
  6. Through positive enticements, nudge users into voluntarily phasing out the original invention over several years.
  7. Introduce stronger protection once the window has been closed.

It’s a very clever strategy, I think. Point (2) is the master stroke.

Hi-Fi Sci-Fi

Last night I watched a BBC TV play from 1972 called The Stone Tape. An electronics company installs its R&D department in an old mansion, with the aim of developing “a new recording medium”. Tape is, apparently, “too delicate and it loses its memory”. They stumble upon a possible ready-made solution in a room in the oldest part of the house, which seems to have a ‘ghost’ – a Victorian maid frozen in time just before she fell to her death. What if it’s not a ghost, but a ‘recording’ of an event that has somehow become embedded in the stone itself? Maybe this could be “the big one” they have been looking for…

What I particularly liked about it was the idea that – hard to believe – there once was a time before the world went digital, when everything was still up for grabs. Digital computers do play a role in the story, but only as a way of “correlating” the experimental results in order to spot possible connections that a human might miss.

It’s also a well-observed portrayal of life in a certain kind of company – some of it seemed very familiar.

The pickle that listening-based ‘science’ gets us into

Just expanding on an earlier post: some thoughts on ‘audio science’ and its observation that human perception of sound is often influenced by our imagination. Blind testing doesn’t eliminate our imagination; it merely prevents it from biasing the result of the test. We can still imagine anything we like when switching between A and B – and under such conditions, the imagination is likely to flourish. In amongst these high levels of imaginary ‘noise’, audio ‘scientists’ think that the magical powers of statistical formulae can enable them to discern audible differences that the test subjects didn’t even know they had heard. Or they can confidently state that no difference was heard. Such confidence in the validity of their statistics brings to mind a study of the brain of a dead fish that, with the dumb application of dumb formulae, could be interpreted as responding to images flashed up in front of its eyes.
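For context, the sort of formula typically applied to a simple right-or-wrong blind trial is a one-sided binomial calculation. The sketch below assumes every trial is an independent answer with a 50% chance of being right by guessing alone; the function name and the 12-out-of-16 example are mine, not taken from any particular study.

```python
from math import comb

def chance_probability(correct, trials):
    """Probability of scoring at least `correct` out of `trials` by pure guessing."""
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials

# Example: 12 correct answers out of 16 trials.
p = chance_probability(12, 16)
print(round(p, 4))  # 2517/65536, about 0.0384
```

Note what the number does and does not say: it is the likelihood of that score arising from guessing alone, and it says nothing about what the listener was imagining while they answered.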

Outside the laboratory, there is an awkward shift when the people who espouse ‘audio science’ want to sell us their products, or even to buy something themselves. It is their implicit position that any demonstration of the product in a showroom or the customer’s own home is a sham. Customers – including themselves – are malleable creatures that imagine what they are persuaded to hear. Even the ‘science’ that has been used in the equipment’s creation and is promoted in advertising (or, indeed, is advertising) feeds back into how people perceive the sound. The audio scientist/objectivist is in a completely paradoxical position where they cannot even know whether they actually like something! They must acknowledge that they only think they like something on that particular day in that particular showroom, or in their own workshop as they tweak their crossover design. They could conduct their own blind listening tests to establish their preference scientifically, but how many of these would they have to run in order to cover every permutation of the variables when they change a setting? Far more than a lifetime’s worth. And as discussed before, there is no way to tell whether taking part in a listening trial affects our ability to discern differences, anyway.

The only way out of this impasse while maintaining the listening trial dogma is to argue that statistics from blind listening tests carried out by others can tell us what to like on the basis of sheer scale, and the probability that our hearing preferences are the same as everyone else’s. But do we then allow just anyone off the street to tell us what is best, or do we use “trained” listeners? The former would seem just silly, but the latter leaves the whole scheme open to accusations of incestuousness and circularity. In a deft move, a claim is made that trained listeners still register the same average preferences that ordinary people would over thousands of tests, but that they do it more clearly and decisively. Attempting to determine whether this is in fact the case would be such a circular absurdity that people can only accept it on faith. This is one of audio science’s self-deluding sleights of hand: being as rigorous as anyone could like in the execution of the actual trials, but basing the premises of the experiment and the conclusions to be drawn from it on the flimsiest of hand-waving. Of course, as this is science, anyone can challenge those conclusions or conduct their own experiments, but this just replaces one flimsy assertion with another.

Why is this all so difficult and farcical? Well, I think it is because science has no meaning in the world of ‘art’, so our troubles start there. Technically, perhaps it can be argued that we are only attempting to use science to create hardware for reproducing art – which doesn’t sound too difficult. But when we say “reproducing” do we mean “most accurate”, or “most preferred”? The fact that anyone would go to the lengths of using listening trials is a giveaway that they are not sure whether they know (a) how to determine “most accurate” objectively, (b) whether listeners actually want accuracy in the context of listening to music in their own homes, and (c) whether recordings created while monitoring on existing speakers should be reproduced accurately anyway.

At this point, the entire enterprise is doomed to circularity and farce. The human trial participant is subjected to reproduced ‘art’ (but not the original!) and either by directly registering preferences, or indirectly by registering differences, is assumed to be capable of determining the ‘best’ method of reproducing that art. ‘Art’ is the thing that no one can define – the thing that is supposed to affect us emotionally in ways that cannot be predicted. Using it as the stimulus to gauge human reaction to the hardware is not obviously compatible with science, is it?

In contrast, it is perfectly rational to admit that scientific experiments cannot tell us the best way to reproduce art. It is perfectly rational to simply work out on paper a likely way of doing it, then build it and listen to it. We will never know scientifically whether we actually like it, because this is beyond the remit of science. But this doesn’t stop us from enjoying it, anyway. In a normal setting we are not entirely slave to our imaginations – we can make a fair assessment of when something is obviously good or bad.

Rather than the (pseudo)scientific blind listening test, I think there is a much more fruitful test. It is the ultimate ‘sighted’ test that suppresses imaginary differences, and is only possible because of DSP – which can be used to simulate the characteristics of real world hardware in many ways. The test is this: while listening, be allowed to change whatever parameter you like using DSP and hear the result instantaneously. Change one variable and flick backwards and forwards between two values while listening. Or change several variables simultaneously if you like. Close your eyes while pressing the supplied ‘random’ button and see if you were right. Such a test would condense a lifetime’s worth of exhaustive listening trials into a few minutes or hours of ‘fun’ that is much more representative of normal listening than the dreary alternative. (For example, with my own system I can make instantaneous radical changes to the crossovers that other people can only achieve in a much more limited way with huge effort and long intervals of silent soldering in between.) It isn’t science. It won’t tell you definitively what you prefer, or what you are sensitive to in normal listening, but it will certainly put into context the scale of the changes you have to make in order to hear a ‘night and day’ difference. It allows an instantaneous comparison between various types of technology that could never be achieved otherwise. It could help lay to rest a few audio demons.
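The ‘random’ button part of this is simple enough to sketch. What follows is only an illustration of the bookkeeping, with no audio involved: the two crossover frequencies, the function names and the two stand-in ‘listeners’ are hypothetical, and a real system would load the hidden setting into the DSP at the point marked in the comment.

```python
import random

# Hypothetical pair of settings to flick between, e.g. crossover frequency in Hz.
SETTINGS = (2000, 3000)

def run_session(settings, listener, presses, rng):
    """Press the 'random' button `presses` times and count correct calls."""
    correct = 0
    for _ in range(presses):
        hidden = rng.choice(settings)  # the 'random' button picks a setting unseen
        # A real system would apply `hidden` to the DSP here (hypothetical hook).
        if listener(hidden, rng) == hidden:
            correct += 1
    return correct

def hears_it(hidden, rng):
    """Stand-in for a difference so large that it is always identified."""
    return hidden

def cannot_hear_it(hidden, rng):
    """Stand-in for an inaudible difference: the listener can only guess."""
    return rng.choice(SETTINGS)

obvious = run_session(SETTINGS, hears_it, 20, random.Random(1))
inaudible = run_session(SETTINGS, cannot_hear_it, 20, random.Random(1))
```

With a ‘night and day’ difference the score is 20 out of 20 every time; with an inaudible one it hovers around half, which is exactly the sense of scale the exercise is meant to give.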

KEF Concord in print

I just noticed that Ken Kessler’s lavish book on the history of KEF contains several pages on the Concord – the speaker I have been re-building in active form. He makes it sound like a much better speaker than I found it to be prior to conversion, but maybe I just had a bad pair.

The mark IV version looked subtly cheaper and less sophisticated than the III due to small details such as the badge, the base plinth (which was now plastic) and the texture of the all-round fabric. It had a removable plastic cap on the top of the enclosure, and it seems that this was to allow users to change the ‘sock’ for different colours, although no one ever bought anything but black and brown, leaving warehouses full of the other colours – how I would love to have some of them now!

There’s also a story of one of the bosses getting his wife to try one on as a boob tube…

UPDATE 07/10/16

I bought some KEF Celeste IV (the Concord’s smaller sister) for the original stands, in order to use them with my Concords. It isn’t all that straightforward to re-use the stands with my version III speakers, however – some engineering is going to be required. One Celeste tweeter wasn’t working so I replaced both with the ones from my Concords which are supposedly the same type. They now actually sound quite good – much better than I remember my Concords sounding.