I was a bit bored this afternoon and finally managed to put myself into the frame of mind to try transplanting my active crossover software onto a Raspberry Pi.
It turns out it works, but it’s a bit delicate: although CPU usage seems to be about 30% on average, extra activity on the RPi can cause glitches in the audio. But I have established in principle that the RPi can do it, and that the software can simply be transplanted from a PC to the RPi – quite an improbable result I think!
A future-proof DSP box?
What I’d like to do is: build a box that can implement my DSP ‘formula’, that isn’t connected to the internet, takes in stereo S/PDIF, and gives out six channels of analogue.
Is this the way to get a future-proof DSP box that the Powers-That-Be can’t continually ‘upgrade’ into obsolescence? In other words, I would always be able to connect the latest PCs, streamers or Chromecast to it, without relying on the same box having to be the source of the stereo audio itself (which currently means that every time it is booted up it could stop working because of some trivial, or major, change that breaks the system). Witness only this week, where Spotify has ‘upgraded’ its system and consigned many dedicated smart speakers’ streaming capability to oblivion. The only way to keep up with such changes is to be an IT-support person, staying current with updates and potentially making changes to code.
To avoid this, surely there will always have to be cheap boxes that connect to the internet and give out S/PDIF or TOSLink, maintained by genuine IT-support people, rather than me having to do it. (Maybe not: if fitment of MQA-capable chips becomes universal in all future consumer audio hardware, they could eventually decide it is viable to enable full data encryption and/or restrict access to unencrypted data to secure, licensed hardware only.)
Taking the audio in over S/PDIF is unfortunate in one respect: it automatically means an extra layer of resampling in the system (because the DAC’s clock is not the same as the source’s clock), but I can persuade myself that it’s transparent. If the worst comes to the very worst in future, the box could also have analogue inputs, but I hope it doesn’t come to that.
This afternoon’s exercise was really just to see if it could be done with an even cheaper box than a fanless PC and, amazingly, it can! I don’t know if anyone else out there is like me, but while I understand the guts of something like DSP, it’s the peripheral stuff I am very hazy on. To me, to be able to take a system that runs on an Intel-based PC and make it run on a completely different processor and chipset without major changes is so unlikely that I find the whole thing quite pleasing.
[UPDATE 18/02/18] This may not be as straightforward as I thought. I have bought one of these for its S/PDIF input (TOSLink, actually). This works (being driven by a 30-year-old CD player for testing), but it has focused my mind on the problem of sample clock drift.
My own resampling algorithm?
S/PDIF runs at the sender’s own rate, and my DAC will run at a slightly different rate. It is a very specialised thing to be able to reconcile the two, and I am no longer convinced that Linux/Alsa has a ready-made solution. I am feeling my way towards implementing my own resampling algorithm..!
At the moment, I regulate the sample rate of a dummy loopback driver that draws data from any music player app running on the Linux PC. Instead of this, I will need to read data in at the S/PDIF sample rate and store it in the circular buffer I currently use. The same mechanism that regulates the rate of the loopback driver will now control the rate at which data is drawn from this circular buffer for processing, and the output values will need to be interpolated between the stored values using convolution with a windowed sinc kernel. It’s a horrendous amount of calculation for the CPU to do for each and every output sample, and probably way beyond the capabilities of the Raspberry Pi, I’m afraid. This problem is solved in some sound cards by using dedicated hardware to do the resampling, but if I want a general-purpose solution, I will need to bite the bullet and try to do it in software. Hopefully my Intel Atom-based PC will be up to the job. It’s a good job I know that high res doesn’t sound any different to 16/44.1, otherwise I could be setting myself up for needing a supercomputer.
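As a rough illustration of the regulation idea, here is a minimal Python sketch. It is not the actual crossover code: the `RateRegulator` class, the proportional control law, the `gain` value and the `simulate` helper are all assumptions made for the example. The read pointer advances through the circular buffer by a fractional `step` per output sample, and `step` is nudged up or down according to how full the buffer is.

```python
class RateRegulator:
    """Hypothetical proportional controller: adjusts the read step so that
    the circular buffer's fill level settles instead of drifting away."""

    def __init__(self, target_fill, gain=1e-5):
        self.target_fill = target_fill  # desired number of buffered samples
        self.gain = gain                # assumed loop gain; would need tuning
        self.step = 1.0                 # input samples consumed per output sample

    def update(self, fill):
        # Buffer fuller than target -> source is running fast -> consume faster.
        error = fill - self.target_fill
        self.step = 1.0 + self.gain * error
        return self.step


def simulate(source_ratio, ticks, target_fill=4096.0, gain=1e-5):
    """Run the loop with a source that delivers source_ratio input samples
    per output tick (1.0001 means the source clock is 100 ppm fast)."""
    reg = RateRegulator(target_fill, gain)
    fill = target_fill
    for _ in range(ticks):
        step = reg.update(fill)
        fill += source_ratio - step  # samples written minus samples consumed
    return reg.step, fill
```

With a source running 100 ppm fast, the step should settle at about 1.0001, with the buffer holding a few samples more than the target. A real loop would probably smooth the fill measurement and add an integral term so that the fill returns exactly to the target.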
[UPDATE 20/02/18] I couldn’t resist doing some tests and trials with my own resampling code.
First, to get a feel for the problem and how much computing power it will take, I tried running some basic multiplies and adds on a Windows laptop programmed in ‘C’. If using a small filter kernel size of 51 and assuming two sweeps of two pre-calculated kernels per sample (then a trivial interpolation between), it could only just keep up with stereo CD in real time. Disappointing, and a problem if the PC is having to do other stuff. But then I realised that the compiler had all optimisations turned off. Optimising for maximum speed, it was blistering! At least 20x real time.
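To put a number on that workload, the short Python sketch below counts the multiply-accumulate operations per second. It reads the description above as one sweep of each of two kernels per output sample per channel, which is an assumption; the function name is made up for the example.

```python
def macs_per_second(taps, kernels_per_sample, channels, sample_rate):
    """Multiply-accumulates needed per second for direct convolution."""
    return taps * kernels_per_sample * channels * sample_rate


# 51-tap kernel, two kernel sweeps per sample, stereo, CD sample rate.
cd_load = macs_per_second(taps=51, kernels_per_sample=2,
                          channels=2, sample_rate=44100)
print(cd_load)
```

That comes to roughly nine million multiply-accumulates per second, which is a modest load for an optimised build and fits with the speed-up seen once optimisation was switched on.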
I tried on a Raspberry Pi. Even it could keep up at 3x real time.
There may be other tricks to try as well, including processor-specific optimisations and programming for ‘SIMD’ (apparently where the CPU does identical calculations on vectors i.e. arrays of values, simultaneously) or kicking off threads to work on parts of the calculation where the operating system is able to share the tasks optimally across the processor cores. Or maybe that’s what the optimisation is doing, anyway.
There is also the possibility that for a larger (higher quality) kernel (say >256 values), an FFT might be a more economical way of doing the convolution.
Either way, it seems very promising.
I then wrote a basic system for testing the actual resampling in non-real time. This is based on the idea of wanting to, effectively, perform the job of a DAC reconstruction filter in software, and then to be able to pick the reconstructed value at any non-integer sample time. To do this ‘properly’ it is necessary to sweep the samples on either side of the desired sample time with a sinc kernel, i.e. convolve them with it. Here’s where it gets interesting. The kernel’s element values can be calculated as though the sinc function were centred on the exact non-integer sample time desired, even though the elements themselves are aligned with the integer sample times.
It would be possible to calculate on-the-fly a new, exact kernel for every new sample, but this would be very processor intensive, involving many calculations. Instead, it is possible to pre-calculate a range of kernels that represent a few fractional positions between adjacent samples. In operation, the two kernels on either side of the desired non-integer sample time are swept and accumulated, and then linear interpolation between these two values is used to find the value representing the exact sample time.
You may be horrified at the thought of linear interpolation until you realise that several thousand kernels could be pre-calculated and stored in memory, so that the error of the linear interpolation would be extremely small indeed.
Of course a true sinc function would extend to plus and minus infinity, so for practical filtering it needs to be windowed, i.e. shortened and tapered to zero at the edges. Apparently (and I am no mathematician) a very good window is a widened duplicate of the sinc function’s central lobe, and the result is known as the Lanczos kernel.
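The scheme described above can be sketched in a few lines of pure Python. This is an illustration under stated assumptions, not the code used in the crossover. The function names, the half-width `a` and the number of pre-calculated phases are all hypothetical. The kernel is the standard Lanczos definition, sinc(x) multiplied by sinc(x/a) for |x| < a and zero elsewhere.

```python
import math


def sinc(x):
    """Normalised sinc: sin(pi*x) / (pi*x), with sinc(0) = 1."""
    if x == 0.0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)


def lanczos(x, a):
    """Sinc windowed by its own widened central lobe; zero for |x| >= a."""
    if abs(x) >= a:
        return 0.0
    return sinc(x) * sinc(x / a)


def build_kernel_bank(a, phases):
    """Pre-calculate phases + 1 kernels for fractional offsets
    0, 1/phases, ..., 1. Each kernel has 2*a taps, aligned with the
    integer sample positions -a+1 .. a relative to floor(t)."""
    bank = []
    for p in range(phases + 1):
        frac = p / phases
        bank.append([lanczos(k - frac, a) for k in range(-a + 1, a + 1)])
    return bank


def sweep(samples, base, kernel, a):
    """Multiply-accumulate one kernel over the samples around index base."""
    acc = 0.0
    for i, weight in enumerate(kernel):
        acc += weight * samples[base - a + 1 + i]
    return acc


def resample_at(samples, t, bank, a):
    """Estimate the signal at non-integer time t: sweep the two kernels
    that bracket the fractional position, then interpolate linearly."""
    phases = len(bank) - 1
    base = math.floor(t)
    pos = (t - base) * phases
    lo = int(pos)
    mix = pos - lo
    y_lo = sweep(samples, base, bank[lo], a)
    y_hi = sweep(samples, base, bank[min(lo + 1, phases)], a)
    return y_lo + mix * (y_hi - y_lo)
```

For example, `bank = build_kernel_bank(8, 1024)` followed by `resample_at(signal, 100.37, bank, 8)` estimates the value 0.37 of the way between samples 100 and 101. The caller has to keep `t` at least `a` samples away from either end of `signal`. A wider kernel and more phases trade memory and CPU time for accuracy.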
Using this arrangement I have been resampling some floating point sine waves at different pitches and examining the results in the program Audacity. The results when the spectrum is plotted seem to be flawless.
The exact width (and therefore quality) of the kernel and how many filters to create are yet to be determined.
[Another update] I have put the resampling code into the active crossover program running on an Intel Atom fanless PC. It has no trouble performing the resampling in real time – much to my amazement – so I now have a fully functional system that can take in TOSLink (from a CD player at the moment) and generate six analogue output channels for the two KEF-derived three-way speakers. Not as truly ‘perfect’ as the previous system that controls the rate at which data arrives, but not far off.
[Update 01/03/18] Everything has worked out OK, including the re-sampling described in a later post. I actually had it working before I managed to grasp fully in my head how it worked! But the necessary mental adjustments have been made, now.
However, I am finding that the number of platforms that provide S/PDIF or TOSLink outputs ‘out-of-the-box’ without problems is very small.
I would simply have bought a Chromecast Audio as the source, but apparently its lossy Ogg Vorbis bit rate is limited to 256 kbps with Spotify as the source (which is what I am likely to use for these tests), as opposed to the 320 kbps available with a PC.
So I thought I could just use a cheap USB sound card with a PC, but found that with Linux it did a very stupid thing: turned off the TOSLink output when no data was being written to it – which is, of course, a nightmare for the receiver software to deal with, especially if it is planning to base its resampling ratio on the received sample rate.
I then began messing around with old desktop machines and PCI sound cards. The Asus Xonar DS did the same ridiculous muting thing in Linux. The Creative X-Fi looked as though it was going to work, but then sent out 48 kHz when idling, and switched to the desired 44.1 kHz when sending music. Again, impossible for the receiver to deal with, and I could find no solution.
Only one permutation is working: Creative X-Fi PCI card in a Windows 7 machine with a freeware driver and app because Creative seemingly couldn’t be bothered to support anything after XP. The free app and driver is called ‘PAX’ and looks like an original Creative app – my thanks to Robert McClelland. Using it, it is possible to ensure bit perfect output, and in the Windows Control Panel app it is possible to force the output to 16 bit 44.1 kHz which is exactly what I need.
[Update 03/03/18] The general situation with TOSLink, PCs and consumer grade sound cards is dire, as far as I can tell. I bought one of these ubiquitous devices thinking that Ubuntu/Linux/Alsa would, of course, just work with it and TOSLink.
It is reputedly based on the CM6206. At least the TOSLink output stays on all the time with this card, but it doesn’t work properly at 44.1 kHz even though Alsa seems happy at both ends: if you listen to a 1 kHz sine wave played over this thing, it has a cyclic discontinuity somewhere, as if it’s doing nearest-neighbour resampling from 48 to 44.1 or something like that. As a receiver it seems to work fine.
With Windows, it automatically installs drivers, but Control Panel->Manage Audio Devices->Properties indicates that it will only do 48 kHz sample rate. Windows probably does its own resampling so that Spotify happily works with it, and if I run my application expecting a 48 kHz sample rate, it all works – but I don’t want that extra layer of resampling.
As mentioned earlier, I also bought one of these from Maplin (now about to go out of business). It, too, is supposedly based on the CM6206.
Under Linux/Alsa I can make it work as TOSLink receiver, but cannot make its output turn on except for a brief flash when plugging it in.
In Windows you have to install the driver (and large ‘app’ unfortunately) from the supplied CD. This then gives you the option to select various sample rates, etc. including the desired 44.1 kHz. Running Spotify, everything works except… when you pause, the TOSLink output turns off after a few seconds. Aaaaaghhh!
This really does seem very poor to me. The default should be that TOSLink stays on all the time, at a fixed, selected sample rate. Anything else is just a huge mess. Why are they turning it off? Some pathetic ‘environmental’ gesture? I may have to look into whether the electrical S/PDIF output from other types of sound card runs all the time, in which case a USB-S/PDIF sound card feeding a super-simple hardware-based S/PDIF-to-TOSLink converter would be a reliable solution. Alternatively I could simply use S/PDIF throughout, but I quite like the idea of the electrical isolation from TOSLink.
It’s not that I need this in order to listen to music, you understand (the original ‘bit perfect’ solution still works for now, and maybe always will), but I am just trying to make S/PDIF/TOSLink work in principle so that I have a more general-purpose, future-proof system.