Active crossover with Raspberry Pi?

I was a bit bored this afternoon and finally managed to put myself into the frame of mind to try transplanting my active crossover software onto a Raspberry Pi.

It turns out it works, but it’s a bit delicate: although CPU usage seems to be about 30% on average, extra activity on the RPi can cause glitches in the audio. But I have established in principle that the RPi can do it, and that the software can simply be transplanted from a PC to the RPi – quite an improbable result I think!

A future-proof DSP box?

What I’d like to do is build a box that implements my DSP ‘formula’, isn’t connected to the internet, takes in stereo S/PDIF, and gives out six channels of analogue.

Is this the way to get a future-proof DSP box that the Powers-That-Be can’t continually ‘upgrade’ into obsolescence? In other words, I would always be able to connect the latest PCs, streamers or Chromecast to it, without relying on the same box being the source of the stereo audio itself (which currently means that every time it is booted up it could stop working because of some trivial – or major – change that breaks the system). Witness only this week, where Spotify has ‘upgraded’ its system and consigned many dedicated smart speakers’ streaming capability to oblivion. The only way to keep up with such changes is to be an IT-support person, staying current with updates and potentially making changes to code.

To avoid this, surely there will always have to be cheap boxes that connect to the internet and give out S/PDIF or TOSLink, maintained by genuine IT-support people, rather than me having to do it. (Maybe not… It’s possible that, if fitment of MQA-capable chips becomes universal in all future consumer audio hardware, they could eventually decide it is viable to enable full data encryption and/or restrict access to unencrypted data to secure, licensed hardware only.)

It’s unfortunate, because it automatically means an extra layer of resampling in the system (because the DAC’s clock is not the same as the source’s clock), but I can persuade myself that it’s transparent. If the worst comes to the very worst in future, the box could also have analogue inputs, but I hope it doesn’t come to that.

This afternoon’s exercise was really just to see if it could be done with an even cheaper box than a fanless PC and, amazingly, it can! I don’t know if anyone else out there is like me, but while I understand the guts of something like DSP, it’s the peripheral stuff I am very hazy on. To me, to be able to take a system that runs on an Intel-based PC and make it run on a completely different processor and chipset without major changes is so unlikely that I find the whole thing quite pleasing.

[UPDATE 18/02/18] This may not be as straightforward as I thought. I have bought one of these for its S/PDIF input (TOSLink, actually). This works (being driven by a 30-year-old CD player for testing), but it has focused my mind on the problem of sample clock drift:

My own resampling algorithm?

S/PDIF runs at the sender’s own rate, and my DAC will run at a slightly different rate. Reconciling the two is a very specialised job, and I am no longer convinced that Linux/Alsa has a ready-made solution. I am feeling my way towards implementing my own resampling algorithm…!

At the moment, I regulate the sample rate of a dummy loopback driver that draws data from any music player app running on the Linux PC. Instead of this, I will need to read data in at the S/PDIF sample rate and store it in the circular buffer I currently use. The same mechanism that regulates the rate of the loopback driver will now control the rate at which data is drawn from this circular buffer for processing, and the values in between the stored samples will need to be interpolated by convolution with a windowed sinc kernel. It’s a horrendous amount of calculation that the CPU will have to do for each and every output sample – probably way beyond the capabilities of the Raspberry Pi, I’m afraid. This problem is solved in some sound cards by using dedicated hardware to do resampling, but if I want to make a general purpose solution to the problem, I will need to bite the bullet and try to do it in software. Hopefully my Intel Atom-based PC will be up to the job. It’s a good job that I know that high res doesn’t sound any different to 16/44.1, otherwise I could be setting myself up for needing a supercomputer.

[UPDATE 20/02/18] I couldn’t resist doing some tests and trials with my own resampling code.

Resampling Experiments

First, to get a feel for the problem and how much computing power it will take, I tried running some basic multiplies and adds on a Windows laptop, programmed in ‘C’. Using a small filter kernel size of 51, and assuming two sweeps per sample with pre-calculated kernels (then a trivial interpolation between the two results), it could only just keep up with stereo CD in real time. Disappointing, and a problem if the PC is having to do other stuff. But then I realised that the compiler had all optimisations turned off. Optimising for maximum speed, it was blistering! At least 20x real time.
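The kind of loop being timed is essentially the multiply-accumulate sweep below. This is a sketch rather than the actual test code, and the timing helper is just one way of measuring it:

```c
#include <time.h>

#define TAPS 51

/* One sweep of a pre-calculated kernel: the multiply-accumulate loop
   that dominates the resampler's cost. */
float mac_sweep(const float *buf, const float *kernel, int n)
{
    float acc = 0.0f;
    for (int i = 0; i < n; i++)
        acc += buf[i] * kernel[i];
    return acc;
}

/* Time `samples` output samples, each costing two kernel sweeps as in
   the text, and return the elapsed CPU seconds. */
double bench_seconds(long samples)
{
    static float kernel[TAPS], buf[TAPS];
    volatile float sink = 0.0f;   /* stop the optimiser deleting the work */
    for (int i = 0; i < TAPS; i++) { kernel[i] = 1.0f / TAPS; buf[i] = 1.0f; }

    clock_t t0 = clock();
    for (long n = 0; n < samples; n++) {
        sink += mac_sweep(buf, kernel, TAPS);   /* kernel 'before' the sample */
        sink += mac_sweep(buf, kernel, TAPS);   /* kernel 'after' the sample  */
    }
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}
```

Compiled with optimisation on (e.g. gcc -O2), 441,000 samples represents ten seconds of one CD channel, so the real-time factor per channel is roughly 10 divided by bench_seconds(441000).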

I tried it on a Raspberry Pi. Even that could keep up, at 3x real time.

There may be other tricks to try as well, including processor-specific optimisations, programming for ‘SIMD’ (apparently where the CPU performs identical calculations on vectors, i.e. arrays of values, simultaneously), or kicking off threads to work on parts of the calculation, with the operating system sharing the tasks optimally across the processor cores. Or maybe that’s what the optimisation is doing anyway.

There is also the possibility that for a larger (higher quality) kernel (say >256 values), an FFT might be a more economical way of doing the convolution.

Either way, it seems very promising.

Lanczos Kernel

I then wrote a basic system for testing the actual resampling in non-real time. This is based on the idea of effectively performing the job of a DAC reconstruction filter in software, and then being able to pick the reconstructed value at any non-integer sample time. To do this ‘properly’ it is necessary to sweep the samples on either side of the desired sample time with a sinc kernel, i.e. convolve them with it. Here’s where it gets interesting: the kernel’s element values can be calculated as though the kernel were centred on the exact non-integer sample time desired, even though its elements are aligned with the integer sample times.

It would be possible to calculate on-the-fly a new, exact kernel for every new sample, but this would be very processor intensive, involving many calculations. Instead, it is possible to pre-calculate a range of kernels that represent a few fractional positions between adjacent samples. In operation, the two kernels on either side of the desired non-integer sample time are swept and accumulated, and then linear interpolation between these two values is used to find the value representing the exact sample time.

You may be horrified at the thought of linear interpolation until you realise that several thousand kernels could be pre-calculated and stored in memory, so that the error of the linear interpolation would be extremely small indeed.

Of course a true sinc function would extend to plus and minus infinity, so for practical filtering it needs to be windowed i.e. shortened and tapered to zero at the edges. Apparently – and I am no mathematician – the best window is a widened duplicate of the sinc function’s central lobe, and this is known as the Lanczos Kernel.

Using this arrangement I have been resampling some floating-point sine waves at different pitches and examining the results in the program Audacity. When the spectrum is plotted, the results seem to be flawless.

The exact width (and therefore quality) of the kernel and how many filters to create are yet to be determined.

[Another update] I have put the resampling code into the active crossover program running on an Intel Atom fanless PC. It has no trouble performing the resampling in real time – much to my amazement – so I now have a fully functional system that can take in TOSLink (from a CD player at the moment) and generate six analogue output channels for the two KEF-derived three-way speakers. Not as truly ‘perfect’ as the previous system that controls the rate at which data arrives, but not far off.

[Update 01/03/18] Everything has worked out OK, including the re-sampling described in a later post. I actually had it working before I managed to grasp fully in my head how it worked! But the necessary mental adjustments have now been made.

However, I am finding that the number of platforms that provide S/PDIF or TOSLink outputs ‘out-of-the-box’ without problems is very small.

I would simply have bought a Chromecast Audio as the source, but apparently its Ogg Vorbis encoded lossy bit rate is limited to 256 kbps with Spotify as the source (which is what I might be planning to use for these tests), as opposed to the 320 kbps that it uses with a PC.

So I thought I could just use a cheap USB sound card with a PC, but found that with Linux it did a very stupid thing: it turned off the TOSLink output when no data was being written to it – which is, of course, a nightmare for the receiver software to deal with, especially if it is planning to base its resampling ratio on the received sample rate.

I then began messing around with old desktop machines and PCI sound cards. The Asus Xonar DS did the same ridiculous muting thing in Linux. The Creative X-Fi looked as though it was going to work, but then sent out 48 kHz when idling, and switched to the desired 44.1 kHz when sending music. Again, impossible for the receiver to deal with, and I could find no solution.

Only one permutation is working: a Creative X-Fi PCI card in a Windows 7 machine with a freeware driver and app, because Creative seemingly couldn’t be bothered to support anything after XP. The free app and driver are called ‘PAX’ and look like an original Creative app – my thanks to Robert McClelland. Using it, it is possible to ensure bit-perfect output, and in the Windows Control Panel app it is possible to force the output to 16 bit 44.1 kHz, which is exactly what I need.

[Update 03/03/18] The general situation with TOSLink, PCs and consumer grade sound cards is dire, as far as I can tell. I bought one of these ubiquitous devices thinking that Ubuntu/Linux/Alsa would, of course, just work with it and TOSLink.

USB 6 Channel 5.1 External SPDIF Optical Digital Sound Card Audio Adapter for PC

It is reputedly based on the CM6206. At least the TOSLink output stays on all the time with this card, but it doesn’t work properly at 44.1 kHz even though Alsa seems happy at both ends: if you listen to a 1 kHz sine wave played over this thing, it has a cyclic discontinuity somewhere – like it’s doing nearest-neighbour resampling from 48 to 44.1, or something like that…? As a receiver it seems to work fine.

With Windows, it automatically installs drivers, but Control Panel->Manage Audio Devices->Properties indicates that it will only do 48 kHz sample rate. Windows probably does its own resampling so that Spotify happily works with it, and if I run my application expecting a 48 kHz sample rate, it all works – but I don’t want that extra layer of resampling.

As mentioned earlier I also bought one of these from Maplin (now about to go out of business). It, too, is supposedly based on the CM6206:

Under Linux/Alsa I can make it work as a TOSLink receiver, but cannot make its output turn on, except for a brief flash when it is plugged in.

In Windows you have to install the driver (and large ‘app’ unfortunately) from the supplied CD. This then gives you the option to select various sample rates, etc. including the desired 44.1 kHz. Running Spotify, everything works except… when you pause, the TOSLink output turns off after a few seconds. Aaaaaghhh!

This really does seem very poor to me. The default should be that TOSLink stays on all the time, at a fixed, selected sample rate. Anything else is just a huge mess. Why are they turning it off? Some pathetic ‘environmental’ gesture? I may have to look into whether S/PDIF from other types of sound card is constantly running all the time, in which case a USB-S/PDIF sound card feeding a super-simple hardware-based S/PDIF-to-TOSLink converter would be a reliable solution – or simply use S/PDIF throughout, but I quite like the idea of the electrical isolation from TOSLink.

It’s not that I need this in order to listen to music, you understand – the original ‘bit perfect’ solution still works for now, and maybe always will – but I am just trying to make SPDIF/TOSLink work in principle so that I have a more general purpose, future-proof, system.


Data Over Sound

Just saw this mentioned. It’s interesting how an idea that, years ago, was just a method of harnessing existing technology, can re-appear as something funky and brand new. It joins those other technologies that aim to get data into our devices via cost-free, non-contact interfaces, such as QR Codes.

What is Chirp?

A Chirp™ is a sonic barcode. With Chirp technology, data and content can be encoded into a unique audio stream. Any device with a speaker can transmit a chirp and most devices with a microphone can decode them.

People of a certain age will be familiar with the use of audio cassettes as storage for their microcomputer programs back in the 1980s – I think I used reel-to-reel for a time.

I also remember, round about 1980, sending a computer program over the phone to a friend’s house by holding the phone close to the speaker and picking the sound up at the other end with a microphone. As I recall, our version wasn’t really very reliable or practical, but I think we did succeed in sending a short program. Obviously we were inspired by the audio coupler modems that we might have seen in films and documentaries.


SMPTE and MIDI timecodes can be recorded as audio signals on analogue tape and can survive multiple transfers and, I dare say, would be robust enough to work over a speaker-microphone link.

In the 1990s we were all familiar with ‘the sound of data’ when we used dial-up modems.

Over the years we have also had DTMF dialling, audio watermarking, Shazam, Siri, Alexa etc. and phone-based automated systems using speech recognition, all of which have to deal with extracting ‘data’ from noisy audio. You would think that the new audio barcodes should be pretty simple to make work reliably.

The Man in the White Suit


There’s a brilliant film from the 1950s called The Man in the White Suit. It’s a satire on capitalism, the power of the unions, and the story of how the two sides find themselves working together to oppose a new invention that threatens to make several industries redundant.

I wonder if there’s a tenuous resemblance between the film’s new wonder-fabric and the invention of digital audio? I hesitate to say that it’s exactly the same, because someone will point out that in the end, the wonder-fabric isn’t all it seems and falls apart, but I think they do have these similarities:

  1. The new invention is, for all practical purposes, ‘perfect’, and is immediately superior to everything that has gone before.
  2. It is cheap – very cheap – and can be mass-produced in large quantities.
  3. It has the properties of infinite lifespan, zero maintenance and non-obsolescence.
  4. It threatens the profits not only of the industry that invented it, but other related industries.

In the film it all turns a bit dark, with mobs on the streets and violence imminent. Only the invention’s catastrophic failure saves the day.

In the smaller worlds of audio and music, things are a little different. Digital audio shows no signs of failing, and it has taken quite a few years for someone to finally come up with a comprehensive, feasible strategy for monopolising the invention while also shutting the Pandora’s box that was opened when it was initially released without restrictions.

The new strategy is this:

  1. Spread rumours that the original invention was flawed.
  2. Re-package the invention as something brand new, with a vagueness that allows people to believe whatever they want about it.
  3. Deviate from the rigid mathematical conditions of the original invention, opening up possibilities for future innovations in filtering and “de-blurring”. The audiophile imagination is a potent force, so this may not be the last time you can persuade them to re-purchase their record collections, after all.
  4. Offer to protect the other, affected industries – for a fee.
  5. Appear to maintain compatibility with the original invention – for now – while substituting a more inconvenient version with inferior quality for unlicensed users.
  6. Through positive enticements, nudge users into voluntarily phasing out the original invention over several years.
  7. Introduce stronger protection once the window has been closed.

It’s a very clever strategy, I think. Point (2) is the master stroke.