How to re-sample an audio signal

As I mentioned earlier, I would like the flexibility of using digital audio data that originates outside the PC performing the DSP, and such data will necessarily have a different sample clock from the DAC. Something has got to give!

If the input were analogue, you would just sample it with an ADC locked to your DAC’s sample rate, and the source’s own sample rate wouldn’t matter to you. With a standard digital audio source (e.g. S/PDIF) you need to be able to do the same thing, but purely in software. The incoming sampled data points are notionally turned into a continuous waveform in memory by duplicating a DAC reconstruction filter in floating point maths. You can then sample that waveform wherever you want, at a rate locked to the DAC’s sample rate.

You still ‘eat’ the incoming data at the rate at which it arrives, but you vary, very slightly, the number of samples that you ‘decimate’ from it.

The control algorithm for locking this re-sampling to the DAC’s sample rate is not completely trivial, because the PC’s only knowledge of the sample rates of the DAC and S/PDIF is via notifications that large chunks of data have arrived or left, with unknown amounts of jitter. It is only possible to establish an accurate measure of relative sample rates with a very long time constant average. In reality the program never actually calculates the sample rate at all, but merely maintains a constant-ish difference between the read and write pointer positions of a circular buffer. It relies on adequate latency, and on the two sample rates being reasonably stable by virtue of being derived from crystal oscillators. The corrections will, in practice, be tiny and/or occasional.
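
To make that concrete, here is a minimal sketch in C of the kind of correction I mean – not the actual code, and all the names and constants are illustrative. The smoothed fill level of the circular buffer nudges the resampling ratio by a few parts per million at most, and the sample rates themselves are never calculated:

    /* Nudge the resampling ratio from the smoothed fill level of the
       circular buffer. Returns the ratio of output to input step size;
       1.0 means 'no correction'. */
    double update_ratio(long read_pos, long write_pos, long buf_len)
    {
        static double avg_fill = -1.0;
        const double target = 0.5;      /* keep the buffer about half full */
        const double tc = 0.00001;      /* very long time constant average */

        long fill = (write_pos - read_pos + buf_len) % buf_len;
        double f = (double)fill / (double)buf_len;

        if (avg_fill < 0.0) avg_fill = f;    /* first call: no history yet */
        avg_fill += tc * (f - avg_fill);     /* crawl towards the truth    */

        /* A fill error of the entire buffer would alter the ratio by only
           0.001 (1000 ppm); in practice the nudges are far smaller. */
        return 1.0 + 0.001 * (avg_fill - target);
    }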

How is the interesting problem of re-sampling solved?

Well, it’s pretty new to me, so in order to experiment with it I have created a program that runs on a PC and does the following:

  1. Synthesises a test signal as an array of floating point values at a notional sample rate of 44.1 kHz. This can be a sine wave, or a combination of sine waves at different frequencies.
  2. Plots the incoming waveform as time domain dots.
  3. Plots the waveform as it would appear when reconstructed with the sinc filter. This is a sanity check that the filter is doing approximately the right thing.
  4. Resamples the data at a different sample rate (the step size can be specified arbitrarily, e.g. 0.9992 or 1.033), using floating point maths. The method can be nearest-neighbour, linear interpolation, or sinc plus linear interpolation.
  5. Plots the resampled waveform as time domain dots.
  6. Passes the result into an FFT (65536 points), windowing the data with a raised cosine window (see the sketch after this list).
  7. Plots the resulting resampled spectrum in terms of frequency and amplitude in dB.
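
Incidentally, the ‘raised cosine’ window of step 6 is what is usually called a Hann window. A minimal sketch of applying it in place before the FFT (the FFT itself can come from any standard library; the function name here is mine):

    #include <math.h>

    #define FFT_SIZE 65536

    /* Multiply the buffer by a raised cosine (Hann) window in place,
       tapering the data to zero at both ends before the FFT,
       e.g. apply_raised_cosine(buf, FFT_SIZE); */
    static void apply_raised_cosine(double *x, int n)
    {
        for (int i = 0; i < n; i++)
            x[i] *= 0.5 * (1.0 - cos(2.0 * M_PI * i / (n - 1)));
    }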

This is an ideal test bed for experimenting with different algorithms and getting a feel for how accurate they are.

Nearest-neighbour and linear interpolation are pretty self-explanatory methods; the sinc method is similar to that described here:

https://www.dsprelated.com/freebooks/pasp/Windowed_Sinc_Interpolation.html

I haven’t completely reproduced (or necessarily understood) their method, but I was inspired by this image:

[Figure: ‘Waveforms’ image reproduced from the linked article, showing the sinc kernel as discrete sampled points]

The sinc function is the ideal ‘brick wall’ low pass filter and is calculated as sin(x*PI)/(x*PI). In theory it extends from minus to plus infinity, but for practical use it is windowed so that it tapers to zero at plus or minus the desired width – which should be as wide as practical.

The filter can be set at a lower cutoff frequency than Nyquist by stretching it out horizontally, and this would be necessary to avoid aliasing when re-sampling to an effectively lower sample rate.

If the kernel is slid along the incoming sample points and a point-by-point multiply and sum is performed, the result is the reconstructed waveform. What the above diagram shows is that the kernel can take the form of discrete sampled points, calculated as the values they would have if the kernel were centred at any arbitrary point.

So resampling is very easy: simply synthesise a sinc kernel in the form of sampled points based on the non-integer position you want to reconstruct, and multiply-and-add all the input samples it overlaps.

A complication is the necessity to shorten the filter to a practical length, which involves windowing it, i.e. multiplying it by a smooth function that tapers to zero at the edges. I did previously mention the Lanczos kernel, which apparently uses a widened copy of the central lobe of the sinc function as the window. But looking at it, I don’t know why this is supposed to be a good window function: it doesn’t taper gradually to zero, so at non-integer sample positions you would either have to extend it abruptly with zeroes, or accept non-zero values at its edges.

Instead, I have decided to use a simple raised cosine as the windowing function, and to reduce its width slightly to give me some leeway in the kernel’s position between input samples. At the extremities I ensure it is set to zero. It seems to give a purer output than my version of the Lanczos kernel.
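
Putting the above together, here is a rough C sketch of evaluating the reconstruction at an arbitrary non-integer position, using a sinc tapered by a raised cosine window. The names, the half-width of 25 (kernel width 50) and the treatment of out-of-range samples as silence are my illustrative choices, not necessarily what the real program does:

    #include <math.h>

    #define HALF_WIDTH 25                 /* kernel width 50 */

    /* Sinc tapered by a raised cosine that reaches zero at +/-HALF_WIDTH. */
    static double windowed_sinc(double x)
    {
        if (fabs(x) >= HALF_WIDTH) return 0.0;
        if (x == 0.0) return 1.0;
        double w = 0.5 * (1.0 + cos(M_PI * x / HALF_WIDTH));
        return w * sin(M_PI * x) / (M_PI * x);
    }

    /* Reconstruct the waveform at non-integer position 'pos' (measured in
       input samples) by centring the kernel there and multiply-and-adding
       the input samples it overlaps. */
    double resample_at(const double *in, long n, double pos)
    {
        long centre = (long)floor(pos);
        double sum = 0.0;
        for (long i = centre - HALF_WIDTH; i <= centre + HALF_WIDTH + 1; i++) {
            if (i < 0 || i >= n) continue;     /* treat outside as silence */
            sum += in[i] * windowed_sinc(pos - (double)i);
        }
        return sum;
    }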

Pre-calculating the kernel

Although it is very simple, calculating the kernel on-the-fly at every new position would be extremely costly in terms of computing power, so the obvious solution is to use lookup tables. The two pre-calculated kernels on either side of the desired sample position are each evaluated to give an output value, and linear interpolation between those two values then gives the value at the exact position. Because memory is plentiful in PCs there is no need to skimp on the number of pre-calculated kernels – you could use a thousand of them – so the errors associated with this linear interpolation can be reduced to negligible levels.
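
And a sketch of the lookup-table version, reusing windowed_sinc() from the previous snippet. The sizes are the ones mentioned above (kernel width 50, a thousand pre-calculated kernels); the names, layout and the assumption that the caller keeps enough samples either side are all mine:

    #define KWIDTH   50                   /* kernel width: 51 points incl. centre */
    #define NKERNELS 1000                 /* fractional positions pre-calculated  */

    static double kernels[NKERNELS + 1][KWIDTH + 1];

    /* Kernel k is a windowed sinc centred k/NKERNELS of a sample period
       to the right of the integer sample grid. */
    void build_kernels(void)
    {
        for (int k = 0; k <= NKERNELS; k++)
            for (int i = 0; i <= KWIDTH; i++)
                kernels[k][i] = windowed_sinc((double)(i - KWIDTH / 2)
                                              - (double)k / NKERNELS);
    }

    /* Value at integer sample 'n' plus fraction 'frac' (0 <= frac < 1).
       The two bracketing kernels are swept, then linearly interpolated.
       The caller must guarantee KWIDTH/2 samples of history and lookahead. */
    double interpolate(const double *in, long n, double frac)
    {
        double f = frac * NKERNELS;
        int k = (int)f;                   /* kernel to the 'left'        */
        double t = f - k;                 /* residual for linear interp. */
        double a = 0.0, b = 0.0;

        for (int i = 0; i <= KWIDTH; i++) {
            double s = in[n - KWIDTH / 2 + i];
            a += s * kernels[k][i];
            b += s * kernels[k + 1][i];
        }
        return a + t * (b - a);
    }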

The horizontal position of the raised cosine window follows the position of the centre of the kernel for all the versions that are calculated to lie in between the incoming sample points.

All that remains is to decide how wide the kernel needs to be for adequate accuracy in the reconstruction – and this is where my demo program comes in. I apologise that there now follows a whole load of similar-looking graphs, demonstrating the results with various signals and kernel sizes, etc.

1 kHz sine wave

First we can look at the standard test signal: a 1 kHz sine wave. In the following image, the original sine wave points are shown joined with straight lines at the top right, followed by how the points would look when emerging from a DAC that has a sinc-based reconstruction filter (in this case, the two images look very similar).

Next down among the three time domain waveforms comes the resampled waveform, its frequency shifted by a factor of 0.9 (a much larger ratio than we will use in practice). In this first example the resampling method is ‘nearest neighbour’. As you can see, the results are disastrous!

[Figure: 1 kHz sine wave, frequency shift 0.9, nearest neighbour interpolation]

The discrete steps in the output waveform are obvious, and the FFT shows huge spikes of distortion.

Linear interpolation is quite a bit better in terms of the FFT, and the time domain waveform at the bottom right looks much better.

[Figure: 1 kHz sine wave, frequency shift 0.9, linear interpolation]

However, the FFT magnitude display reveals that it is clearly not ‘hi-fi’.

Now, compare the results using sinc interpolation:

[Figure: 1 kHz sine wave, frequency shift 0.9, sinc interpolation, kernel width 50]

As you can see, the FFT plot is absolutely clean, indicating that this result is close to distortion-free.

Next we can look at something very different: a 20 kHz sine wave.

20 kHz sine wave

[Figure: 20 kHz sine wave, frequency shift 0.9, nearest neighbour interpolation]

With nearest neighbour resampling the results are again disastrous. At the right hand side, though, the middle of the three time domain plots shows something very interesting: even though the discrete points look nothing like a sine wave at this frequency, the reconstruction filter ‘rings’ in between the points, producing a perfect sine wave of absolutely uniform amplitude. This is what any normal DAC produces – something most people don’t realise; they often assume that digital audio falls apart at the top end, but it doesn’t: it is perfect.

Linear interpolation is better than nearest-neighbour, but pretty much useless for our purposes.

[Figure: 20 kHz sine wave, frequency shift 0.9, linear interpolation]

Sinc interpolation is much better!

[Figure: 20 kHz sine wave, frequency shift 0.9, sinc interpolation, kernel width 50]

However, there is an unwanted spike at the right hand side (note that the main signal is now at 18 kHz, having been shifted down by a factor of 0.9). This spike appears because of the inadequate width of the sinc kernel, which in this case has been set at 50 (with 500 pre-calculated versions of it at different time offsets between sample points).

If we increase the width of the kernel to 200 (actually 201 because the kernel is always symmetrical about a central point with value 1.0), we get this:

[Figure: 20 kHz sine wave, frequency shift 0.9, sinc interpolation, kernel width 200]

The spike is almost at acceptable levels. Increasing the width to 250 we get this:

[Figure: 20 kHz sine wave, frequency shift 0.9, sinc interpolation, kernel width 250]

And at 300 we get this:

[Figure: 20 kHz sine wave, frequency shift 0.9, sinc interpolation, kernel width 300]

Clearly the kernel width does need to be in this region for the highest quality.

For completeness, here is the system working on a more complex waveform comprising the sum of three frequencies – 14, 18 and 19 kHz, all at the same amplitude – with a frequency shift of 1.01.

14 kHz, 18 kHz, 19 kHz sum

Nearest neighbour:

[Figure: 14, 18, 19 kHz sine waves, nearest neighbour interpolation]

Linear interpolation:

[Figure: 14, 18, 19 kHz sine waves, linear interpolation]

Sinc interpolation with a kernel width of 50:

[Figure: 14, 18, 19 kHz sine waves, sinc interpolation, kernel width 50]

Kernel width increased to 250:

[Figure: 14, 18, 19 kHz sine waves, sinc interpolation, kernel width 250]

More evidence that the kernel width needs to be in this region.

Ready-made solutions

Re-sampling is often done in dedicated hardware like Analog Devices’ AD1896. Some advanced sound cards like the Creative X-Fi can re-sample everything internally to a common sample rate using powerful dedicated processors – this is the solution that makes connecting digital audio sources together almost as simple as analogue.

In theory, stuff like this goes on inside Linux already, in systems like JACK – apparently. But it just feels too fragile: I don’t know how to make sure it is working, and I don’t really have any handle on the quality of it. This is a tricky problem to solve by trial-and-error because a system can run for ages without any sign that clocks are drifting.

In Windows, there is a product called “Virtual Audio Cable” that I know performs re-sampling using methods along these lines.

There are libraries around that can supposedly do resampling, but the quality is unknown – one I looked at said “Not the best quality”, so I gave up on it.

I have a feeling that much of the code was developed at a time when processors were much less powerful than they are now and so the algorithms are designed for economy rather than quality.

Software-based sinc resampling in practice

I have grafted the code from my demo program into my active crossover application and set it running with TOSLink from a CD player going into a cheap USB sound card (Maplin), and the output going to a better multichannel sound card (the Xonar U7). The TOSLink data is being resampled in order to keep it aligned with the DAC’s sample rate. I have had it running for 20 hours without incident.

Originally, before developing the test bed program, I set the kernel size at 50, fearing that anything larger would stress the Intel Atom CPU. However, I now realise that a width of at least 250 is necessary, so with trepidation I upped it to this value. The CPU load trace went up a bit in the Ubuntu system monitor, but not much; the cores are still running cool. The power of modern CPUs is ridiculous! Remember that for each of the two (stereo) samples arriving at 44.1 kHz, the algorithm is performing 500 floating point multiply-and-add operations, yet it hardly breaks into a sweat. There are absolutely no clever efficiencies in the programming. Amazing.


Active crossover with Raspberry Pi?

I was a bit bored this afternoon and finally managed to put myself into the frame of mind to try transplanting my active crossover software onto a Raspberry Pi.

It turns out it works, but it’s a bit delicate: although CPU usage seems to be about 30% on average, extra activity on the RPi can cause glitches in the audio. But I have established that in principle the RPi can do it, and that the software can simply be transplanted from a PC to the RPi – quite an improbable result, I think!

A future-proof DSP box?

What I’d like to do is: build a box that can implement my DSP ‘formula’, that isn’t connected to the internet, takes in stereo S/PDIF, and gives out six channels of analogue.

Is this the way to get a future-proof DSP box that the Powers-That-Be can’t continually ‘upgrade’ into obsolescence? In other words, I would always be able to connect the latest PCs, streamers or a Chromecast to it, without relying on the same box having to be the source of the stereo audio itself (which currently means that every time it is booted up it could stop working because of some trivial – or major – change that breaks the system). Witness only this week, where Spotify has ‘upgraded’ its system and consigned many dedicated smart speakers’ streaming capability to oblivion. The only way to keep up with such changes is to be an IT-support person, staying current with updates and potentially making changes to code.

To avoid this, surely there will always have to be cheap boxes that connect to the internet and give out S/PDIF or TOSLink, maintained by genuine IT-support people rather than me having to do it. (Maybe not… it’s possible that if the fitment of MQA-capable chips becomes universal in future consumer audio hardware, they could eventually decide it is viable to enable full data encryption and/or restrict access to unencrypted data to secure, licensed hardware only.)

It’s unfortunate, because it automatically means an extra layer of resampling in the system (because the DAC’s clock is not the same as the source’s clock), but I can persuade myself that it’s transparent. If the worst comes to the very worst in future, the box could also have analogue inputs, but I hope it doesn’t come to that.

This afternoon’s exercise was really just to see if it could be done with an even cheaper box than a fanless PC and, amazingly, it can! I don’t know if anyone else out there is like me, but while I understand the guts of something like DSP, it’s the peripheral stuff I am very hazy on. To me, to be able to take a system that runs on an Intel-based PC and make it run on a completely different processor and chipset without major changes is so unlikely that I find the whole thing quite pleasing.

[UPDATE 18/02/18] This may not be as straightforward as I thought. I have bought one of these for its S/PDIF input (TOSLink, actually). This works (being driven by a 30-year old CD player for testing), but it has focused my mind on the problem of sample clock drift:

My own resampling algorithm?

S/PDIF runs at the sender’s own rate, and my DAC will run at a slightly different rate. It is a very specialised thing to be able to reconcile the two, and I am no longer convinced that Linux/ALSA has a ready-made solution. I am feeling my way towards implementing my own resampling algorithm!

At the moment, I regulate the sample rate of a dummy loopback driver that draws data from any music player app running on the Linux PC. Instead of this, I will need to read data in at the S/PDIF sample rate and store it in the circular buffer I currently use. The same mechanism that regulates the rate of the loopback driver will now control the rate at which data is drawn from this circular buffer for processing, and the values in between the stored samples will need to be interpolated using convolution with a windowed sinc kernel. It’s a horrendous amount of calculation for the CPU to do for each and every output sample – probably way beyond the capabilities of the Raspberry Pi, I’m afraid. Some sound cards solve this problem with dedicated resampling hardware, but if I want to make a general purpose solution I will need to bite the bullet and try to do it in software. Hopefully my Intel Atom-based PC will be up to the job. It’s a good job that I know that high res doesn’t sound any different to 16/44.1, otherwise I could be setting myself up for needing a supercomputer.

[UPDATE 20/02/18] I couldn’t resist doing some tests and trials with my own resampling code.

Resampling Experiments

First, to get a feel for the problem and how much computing power it will take, I tried running some basic multiplies and adds on a Windows laptop, programmed in ‘C’. Using a small filter kernel size of 51, and assuming two sweeps per output sample – one for each of the two pre-calculated kernels – followed by a trivial interpolation between the results, it could only just keep up with stereo CD in real time. Disappointing, and a problem if the PC is having to do other stuff. But then I realised that the compiler had all optimisations turned off. Optimising for maximum speed, it was blistering! At least 20x real time.

I tried it on a Raspberry Pi. Even that could keep up, at 3x real time.

There may be other tricks to try as well, including processor-specific optimisations and programming for ‘SIMD’ (apparently where the CPU does identical calculations on vectors i.e. arrays of values, simultaneously) or kicking off threads to work on parts of the calculation where the operating system is able to share the tasks optimally across the processor cores. Or maybe that’s what the optimisation is doing, anyway.

There is also the possibility that for a larger (higher quality) kernel (say >256 values), an FFT might be a more economical way of doing the convolution.

Either way, it seems very promising.

Lanczos Kernel

I then wrote a basic system for testing the actual resampling in non-real time. This is based on the idea of, effectively, performing the job of a DAC reconstruction filter in software, and then being able to pick the reconstructed value at any non-integer sample time. To do this ‘properly’ it is necessary to sweep the samples on either side of the desired sample time with a sinc kernel, i.e. convolve with it. Here’s where it gets interesting: the kernel’s element values can be calculated as those of a kernel centred on the exact non-integer sample time desired, even though the elements themselves are aligned with the integer sample times.

It would be possible to calculate on-the-fly a new, exact kernel for every new sample, but this would be very processor intensive, involving many calculations. Instead, it is possible to pre-calculate a range of kernels representing a few fractional positions between adjacent samples. In operation, the two kernels on either side of the desired non-integer sample time are swept and accumulated, and linear interpolation between these two values is then used to find the value representing the exact sample time.

You may be horrified at the thought of linear interpolation until you realise that several thousand kernels could be pre-calculated and stored in memory, so that the error of the linear interpolation would be extremely small indeed.

Of course a true sinc function would extend to plus and minus infinity, so for practical filtering it needs to be windowed i.e. shortened and tapered to zero at the edges. Apparently – and I am no mathematician – the best window is a widened duplicate of the sinc function’s central lobe, and this is known as the Lanczos Kernel.
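
For reference, the usual definition of the Lanczos kernel of half-width a – the sinc windowed by a stretched copy of its own central lobe – is straightforward to write down:

    #include <math.h>

    /* Lanczos kernel: sinc(x) multiplied by sinc(x/a), zero outside +/-a. */
    static double lanczos(double x, int a)
    {
        if (x == 0.0) return 1.0;
        if (fabs(x) >= (double)a) return 0.0;
        return (sin(M_PI * x) / (M_PI * x))
             * (sin(M_PI * x / a) / (M_PI * x / a));
    }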

Using this arrangement I have been resampling some floating point sine waves at different pitches and examining the results in the program Audacity. The results when the spectrum is plotted seem to be flawless.

The exact width (and therefore quality) of the kernel and how many filters to create are yet to be determined.

[Another update] I have put the resampling code into the active crossover program running on an Intel Atom fanless PC. It has no trouble performing the resampling in real time – much to my amazement – so I now have a fully functional system that can take in TOSLink (from a CD player at the moment) and generate six analogue output channels for the two KEF-derived three-way speakers. Not as truly ‘perfect’ as the previous system that controls the rate at which data arrives, but not far off.

[Update 01/03/18] Everything has worked out OK, including the re-sampling described in a later post. I actually had it working before I managed to grasp fully in my head how it worked! But the necessary mental adjustments have been made, now.

However, I am finding that the number of platforms that provide S/PDIF or TOSLink outputs ‘out-of-the-box’ without problems is very small.

I would simply have bought a Chromecast Audio as the source, but apparently its Ogg Vorbis encoded lossy bit rate is limited to 256 kbps with Spotify as the source (which is what I might be planning to use for these tests), as opposed to the 320 kbps that it uses with a PC.

So I thought I could just use a cheap USB sound card with a PC, but found that with Linux it did a very stupid thing: turned off the TOSLink output when no data was being written to it – which is, of course, a nightmare for the receiver software to deal with, especially if it is planning to base its resampling ratio on the received sample rate.

I then began messing around with old desktop machines and PCI sound cards. The Asus Xonar DS did the same ridiculous muting thing in Linux. The Creative X-Fi looked as though it was going to work, but then sent out 48 kHz when idling, and switched to the desired 44.1 kHz when sending music. Again, impossible for the receiver to deal with, and I could find no solution.

Only one permutation is working: a Creative X-Fi PCI card in a Windows 7 machine with a freeware driver and app, because Creative seemingly couldn’t be bothered to support anything after XP. The free driver and app are called ‘PAX’ and look like an original Creative app – my thanks to Robert McClelland. Using it, it is possible to ensure bit perfect output, and in the Windows Control Panel app it is possible to force the output to 16 bit 44.1 kHz, which is exactly what I need.

[Update 03/03/18] The general situation with TOSLink, PCs and consumer grade sound cards is dire, as far as I can tell. I bought one of these ubiquitous devices thinking that Ubuntu/Linux/ALSA would, of course, just work with it and TOSLink.

USB 6 Channel 5.1 External SPDIF Optical Digital Sound Card Audio Adapter for PC

It is reputedly based on the CM6206. At least the TOSLink output stays on all the time with this card, but it doesn’t work properly at 44.1 kHz even though ALSA seems happy at both ends: if you listen to a 1 kHz sine wave played over this thing, it has a cyclic discontinuity somewhere – as though it’s doing nearest neighbour resampling from 48 to 44.1, or something like that. As a receiver it seems to work fine.

With Windows, it automatically installs drivers, but Control Panel->Manage Audio Devices->Properties indicates that it will only do 48 kHz sample rate. Windows probably does its own resampling so that Spotify happily works with it, and if I run my application expecting a 48 kHz sample rate, it all works – but I don’t want that extra layer of resampling.

As mentioned earlier, I also bought one of these from Maplin (now about to go out of business). It, too, is supposedly based on the CM6206.

Under Linux/ALSA I can make it work as a TOSLink receiver, but cannot make its output turn on, except for a brief flash when plugging it in.

In Windows you have to install the driver (and large ‘app’ unfortunately) from the supplied CD. This then gives you the option to select various sample rates, etc. including the desired 44.1 kHz. Running Spotify, everything works except… when you pause, the TOSLink output turns off after a few seconds. Aaaaaghhh!

This really does seem very poor to me. The default should be that TOSLink stays on all the time, at a fixed, selected sample rate. Anything else is just a huge mess. Why are they turning it off? Some pathetic ‘environmental’ gesture? I may have to look into whether S/PDIF from other types of sound card is constantly running all the time, in which case a USB-S/PDIF sound card feeding a super-simple hardware-based S/PDIF-to-TOSLink converter would be a reliable solution – or simply use S/PDIF throughout, but I quite like the idea of the electrical isolation from TOSLink.

It’s not that I need this in order to listen to music, you understand – the original ‘bit perfect’ solution still works for now, and maybe always will – but I am just trying to make S/PDIF/TOSLink work in principle, so that I have a more general purpose, future-proof system.

The problem with IT…

…is that you can never rely on things staying the same. Here’s what happened to me last night.

By default I start Spotify when my Linux audio PC boots up. I often leave it running for days. Last night I was listening to something on Spotify (but I suspect it wouldn’t have mattered if it had been a CD or other source). I got a few glitches in the audio – something that never happens. This threatened to spoil my evening – I thought everything was perfect.

I immediately plugged in a keyboard and mouse to begin to investigate and it was at that moment that I noticed that the Intel Atom-based PC was red hot.

Using the Ubuntu system monitor app I could see that the processor cores were running close to flat out. Spotify was running, and on the default opening page was a snazzy animated advert referring to some artist I have no interest in. The basic appearance was a sparkly oscilloscope type display pulsing in time with the music. I had not seen anything like that on Spotify before. I had an inkling that this might be the problem and so I clicked to a more pedestrian page with my playlists on it. The CPU load went down drastically.

Yes, Spotify had decided they needed to jazz up their front page with animation and this had sent my CPU cores into meltdown. Now, my PC is the same chipset as loads of tablets out there. Maybe Ubuntu’s version of flash (or whatever ‘technology’ the animation was based on) is really inefficient or something, but it looks to me as though there is a strong possibility that this Spotify ‘innovation’ might have suddenly resulted in millions of tablets getting hot and their batteries flattening in minutes.

The animation is now gone from their front page. Will it return? I can’t now check whether any changes I make to Spotify’s opening behaviour (opening up minimised?) will prevent the issue.

This is the problem with modern computer-based stuff that is connected to the internet. It’s brilliant, but they can never stop meddling with things that work perfectly as they are.

[06/01/18] Of course it can get worse. Much worse. We now know that practically every computer in the world will need to be slowed down in order to patch over a security issue that has been designed into the processors at hardware level. At worst it could be a 50% slowdown. Will my audio PC cope? Will it now run permanently hot? I installed an update yesterday and it didn’t seem to cause a problem. Was this patch in it, or is the worst yet to come?

[04/02/18] I defaulted to Spotify opening up minimised when the PC is switched on. Everything still working, and the PC running cool.

But I would like to get to the point where I have a box that always works. I would like to be able to give my code to other people without needing to be an IT support person – believe me, I don’t know enough about that sort of thing.

It now seems to me that the only way to guarantee that a box will always be future-proof, without constant updates and the need for IT support, is to bite the bullet and accept that the system cannot be bit-perfect. Once that psychological hurdle is overcome, it becomes easy: send the data via S/PDIF, resample it in software (Linux will do this automatically if you let it), and Bob’s your uncle: a box that isn’t even attached to the internet, that takes in S/PDIF and gives you six analogue outputs or variations thereof; a box with a video monitor output and USB sockets, allowing you to change settings, import WAV files to define filters, etc., then disconnect the keyboard and mouse. Or a box that is accessible over a standard network in a web browser – or does that render it not future-proof? Presumably a very simple web interface will always be valid. I think this is going to be the direction I head in…

Software: the future of audio

Last night, on a whim, I decided that I would like my active crossover software to display some sort of indication of the output levels being sent to the DACs. This is quite important, and something that I should have tackled quite a while ago. Basically, we should be worried about clipping, and also ‘overs’ i.e. those interpolated samples that are generated by DAC reconstruction filters in between the recorded samples and which have the potential to clip even though the recording does not, directly. By messing around with various types of driver correction and so on, am I running the risk of clipping? Or, am I wasting DAC resolution by needlessly attenuating my DAC outputs too much?

Here is how easy it was to display the information in a useful and aesthetically pleasing way:

  • I created six vertical rectangular areas on the active crossover app’s screen – one bargraph for each DAC output.
  • I decided upon a linear percentage display (not dB) and an update rate of 10 Hz.
  • A timer was set to trigger at 10 Hz (the timer is provided by the GTK GUI library) and call the function to draw the six bargraphs.
  • In the output function for the DACs, I take the absolute value of each sample as I write it to the DAC and compare it to the maximum recorded so far for that channel (out of six channels). I overwrite the maximum if it is exceeded. There is a ‘mutex’ interlock around the maximum value to prevent the bargraph drawing function from accessing it at the same moment.
  • The bargraph drawing function for each channel accesses that maximum recorded value and saves it. The maximum value for that channel is then reset to zero. The saved value is compared against that bargraph’s previous displayed value. If it is greater, a coloured rectangle is drawn, directly proportional in length to the value. If it is less, the previous value is multiplied by 0.9 and the rectangle drawn to that height instead. With this simple system, we have a PPM-style display that shows signal peaks that slowly decay (sketched in code after this list).
  • The bargraph display function also records an absolute maximum for that channel, which doesn’t get reset. This value is displayed as a red horizontal line, thus showing the maximum output level for that particular listening session.
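
For what it’s worth, here is a minimal sketch in C of the two halves described above: the per-sample peak capture on the output path, and the 10 Hz decay-and-draw side. The names are mine, and the GTK timer and the actual rectangle drawing are left out:

    #include <math.h>
    #include <pthread.h>

    #define NUM_CHANNELS 6

    static double peak[NUM_CHANNELS];         /* reset at every redraw    */
    static double shown[NUM_CHANNELS];        /* currently drawn height   */
    static double session_max[NUM_CHANNELS];  /* red line, never reset    */
    static pthread_mutex_t peak_lock = PTHREAD_MUTEX_INITIALIZER;

    /* Called from the DAC output path for every sample written. */
    void note_sample(int ch, double sample)
    {
        double a = fabs(sample);
        pthread_mutex_lock(&peak_lock);
        if (a > peak[ch]) peak[ch] = a;
        pthread_mutex_unlock(&peak_lock);
    }

    /* Called per channel by the 10 Hz GUI timer; returns bar height 0..1. */
    double bargraph_height(int ch)
    {
        pthread_mutex_lock(&peak_lock);
        double p = peak[ch];
        peak[ch] = 0.0;                       /* reset for next interval  */
        pthread_mutex_unlock(&peak_lock);

        if (p > shown[ch]) shown[ch] = p;     /* jump up instantly        */
        else               shown[ch] *= 0.9;  /* PPM-style slow decay     */

        if (shown[ch] > session_max[ch]) session_max[ch] = shown[ch];
        return shown[ch];
    }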

The result is one of those attractive arrays of VU meters that dances in response to the incoming signal levels. The results were interesting, and will alert me to any future mis-steps with regard to clipping – it still doesn’t tackle the issue of ‘overs’ directly, however.

But the reason for mentioning it is to show the power and simplicity of engineering with software. To build a PPM meter in hardware and wire it all up would not be trivial, and would take days, weeks or months for a commercial product. In software, it takes less than an hour and a half to construct from scratch. Audio processing functions are equally simple to create and integrate within the system. It seems clear that once the basic DSP ‘engine’ is in place, complex audio systems can be put together like Lego. A perfectly capable three-way speaker can be built in days. It is not too hard to see how a three-way, six channel DSP system could simply be scaled up to create something like the Beolab 90.

Is this an exciting trend, or the end of everything that makes audio interesting? I think it is the former, but I can see that many traditionalists might disagree.

Active crossover running on fanless PC

[Image: Sumvision Cyclone Mini PC]

I bought a Sumvision Cyclone Mini PC for experimenting with running my active crossover software on a fanless PC. It’s no more than a tablet in a box, but it’s quad core, runs 64 bit Linux on an Intel Atom Bay Trail chipset and, presumably, can perform GFLOPS without dissipating more than a few watts – that’s really quite amazing, but it’s so easy to take such things for granted these days! It comes pre-loaded with Windows 8.1 and it was a pain to make it work with Linux. I relied heavily on a guide on the internet – thanks to the person who provided it.

Undoubtedly it will be much easier to install Ubuntu on one of these PCs in the future, when the Linux people have caught up with the hardware. The WiFi doesn’t work yet, so I am using a USB dongle; nor does the on-board audio, but I am using the Asus Xonar U7 for that anyway. Interestingly (to me, anyway) I was able to remove Pulse Audio (I think) from this version of Ubuntu without affecting the other system settings [I think this was a fluke: removing Pulseaudio properly is impossible, and what I really should do is merely set “autospawn=no” in /etc/pulse/client.conf and reboot].

Absurdly, once Linux was installed following the guide, it worked straight away with the Xonar U7 and my crossover software, plus 64 bit Spotify.

I am assuming I could plug in a common or garden USB DVD drive for playing CDs without a problem [tried this and it works fine].

While running the active crossover software and streaming from Spotify, according to psensor the core temperatures are stabilised at about 56-60 degrees C in an ambient room temperature of 22 degrees C, and overall CPU usage is about 18%.

UPDATE 13/09/15

Things haven’t worked out quite as smoothly as I thought: on the fanless PC I have been getting occasional glitches in the audio, in the form of a click audible within the music, perhaps once every 10 minutes on average. These don’t occur in silent sections of the music so I am assuming that it is a case of missing, or extra, samples rather than corrupted samples. I didn’t notice this when running the code on a Pentium IV based desktop machine.

As a result, I have made major changes to the software, reducing the number of threads from three to one (plus a default thread for the GUI – which is currently just a mute checkbox for each driver). There are suggestions on the web that the ALSA functions are not ‘thread safe’. So now, all the ALSA audio and DSP processing runs in a single thread and all ALSA calls are non-blocking. This arrangement dispenses with the necessity to lock various circular buffer pointers with mutexes when accessing them, so the code is now more stripped back and simpler to understand.

The main motivation for multi-threading originally was my assumption that the OS would assign threads to different cores, so that for coolest running it would be best to share the computation load across several threads. I therefore expected to see the CPU load on one of the cores go up as a result of combining three threads into one, but it doesn’t seem to have greatly affected the CPU load traces, nor the core temperatures.

No glitches so far.

[UPDATE 29/10/15]

Still had the glitch problem! It wasn’t happening on a P4 desktop minitower PC, but on the Sumvision I might get a glitch once in ten minutes. Nothing drastic, but I found myself on edge waiting for it. Really, any glitches are unacceptable, even if only one every three hours.

Is it related to input or output? As an experiment I modified the code to stream the incoming audio to a file while playing music. When I heard a glitch I noted down the time it occurred. Examining the data in the audio editor app Audacity I found a discontinuity in the waveform. Gotcha! In order to test for this problem reliably I created an audio file containing a continuously-repeating ramp waveform. In my program I added a check on consecutive samples to flag up any discontinuities. Sure enough, the problem only occurred occasionally, but it always happened eventually. Playing with threads etc. didn’t get rid of it.

In desperation I started to look at the open source code for the snd-aloop driver I am using as my bridge between audio player apps and my code. I found a mysterious system whereby there are separate ‘rate shifts’ (the programmable sample rate I am relying on in my code) for playback and capture. I don’t really understand this: unless playback and capture are locked together (at least on average), it seems to me that they must eventually diverge and cause audio discontinuities. I bodged the snd-aloop source code in order to precisely lock together the playback and capture ‘deltas’. This sort of thing is outside my comfort zone. I had to re-compile the snd-aloop driver and use the Linux command insmod to load it into the system.

It worked. I now get zero errors no matter how long the system is running. The difference between the Sumvision and the old P4 may be explained by the fact that the PCs’ clocks were quite a bit different, and much more rate shift was necessary in the Sumvision in order to synchronise with the sound card.

I still think I am making this harder than it needs to be. Do I need the snd-aloop driver at all? Can it all be done with ALSA plugins? One Linux guru said I should write my program as a plugin itself. It has occurred to me that at least I can now modify snd-aloop in order to make it work as I want: not with its own sample rate at all, but merely as a relay from the capture demand to the playback demand.

But the bottom line is that the system is now working perfectly on the Sumvision Cyclone.

02/12/16: It seems that there will be no ‘official’ version of Ubuntu for the Bay Trail and Cherry Trail chipsets, but someone called Ian Morrison (a.k.a. “Linuxium”) has created an installer and very kindly made it available. I found that his version of Ubuntu 16.04 seemed not to boot on the Sumvision Cyclone, but 16.10 appears to be fine. I haven’t transferred my software over to it yet but my second Sumvision Cyclone appears to be working fine, with Wi-Fi. Many thanks to Ian for this. I wouldn’t know where to start in creating such a thing.

UPDATE 30/01/17: I just spent quite a large proportion of my Christmas break worrying about, and trying to fix, an issue that arose after I put the latest version of Ubuntu (mentioned above) onto a Sumvision Cyclone and installed my active crossover software. Glitches were back!

I cannot tell you how many fruitless attempts I made to solve it. I narrowed it down to the DAC output, where some zero-value samples were being substituted in the analogue output, but with the overall timing remaining correct. There were no EPIPE (buffer underrun) errors.

I created a ‘glitch detector’ where I generated a waveform from the DAC’s analogue output and fed it via a cable into the microphone input, looking for excessive sample-to-sample amplitude changes. Glitches would always occur, but it seemed worse when loading web pages etc.

Finally, I think I hit upon the solution in this forum topic:

Problem with Bay Trail and new kernels

It seems that recent versions of the Linux kernel have changed something regarding ‘C-states’, related to the way the processor cores are dynamically put into low power modes when idle in order to reduce average power consumption. With the new, more aggressive, power saving, they take longer to start up again (flushing pipelines etc.), and this has been causing Bay Trail setups to freeze completely. It is still being discussed as a live issue on Intel forums. I think I have been suffering from another side effect of this misguided change.

There is a workaround, which is to specify a boot option to keep the cores relatively ‘alive’ at all times. (There may also be a BIOS setting that I could have changed, too). It seems to have fixed my problem completely.

If this turns out to be the issue, it highlights the fragile nature of any IT-based product. An innocuous update to the operating system can kill your product because of real time issues; no amount of testing by the OS people can eliminate the potential for problems in users’ own applications.

At one time, I naively thought that it was possible to put together an embedded PC-based system that could be ‘frozen’ and would always work, and could always be duplicated, but I have long given up hope on that. Embarking on any digital audio scheme based on a PC implies a commitment to constant maintenance, in a way no different from the constant maintenance you commit to when using, say, reel-to-reel tape recorders.

Linux Active Crossover is working!

[Update: now running on a fanless Bay Trail processor]

After a few evenings of half-hearted attempts to port my Windows code and make the changes needed to run on Linux, I finally got my head around what was needed, and it works! Unfortunately I’m not at the house where the amp and speakers are so I can’t try it ‘in anger’ but at least I can tell that I’m getting what sounds like correctly-filtered Spotify or CD from the three stereo outputs.

On a ten year old Dell GX520 it’s using about 16% of the CPU, and when you add in Spotify at about another 16%, plus the snd-aloop driver and all the other stuff going on in an internet-connected PC, it comes to about 40% CPU, which is a bit higher than I had hoped – there’s a tiny amount of fan noise. Maybe there is scope to improve the efficiency of the crossover software: at the moment I am reading and writing 32 bit integers to/from the sound cards (one is a dummy sound card, of course) but doing all the processing in floating point, which therefore involves converting each sample twice with a potentially expensive operation. Maybe this can be speeded up. And I can always find a faster, cooler PC, of course.
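
For what it’s worth, each of those conversions is only a multiply (plus a clamp on the way back out). A sketch, assuming signed 32 bit samples treated as fractions of full scale – the rounding or dithering policy is a separate question:

    #include <stdint.h>

    /* Incoming S32 sample to floating point, -1.0 .. just under +1.0. */
    static inline double s32_to_double(int32_t s)
    {
        return (double)s / 2147483648.0;              /* divide by 2^31 */
    }

    /* Floating point back to S32, clamped to the representable range. */
    static inline int32_t double_to_s32(double x)
    {
        if (x >  1.0 - 1.0 / 2147483648.0) x =  1.0 - 1.0 / 2147483648.0;
        if (x < -1.0) x = -1.0;
        return (int32_t)(x * 2147483648.0);
    }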

[13/07/15] In response to a comment, the point of all this is not just to implement basic crossover filtering, but to correct the drivers’ individual responses based on measurements, producing zero phase shift for each driver, and therefore perfect (or as close as possible) acoustic crossovers and zero overall phase shift. EQ such as baffle step correction is overlaid onto the filters’ responses without costing anything extra in CPU power. Individual driver delays are also added. I am not claiming this is unique, but nor is it commonplace. In terms of an active crossover it is the no-compromises version.

I have had this system working for a couple of years on a Windows PC, but Linux will be a cheaper and more elegant solution.

[UPDATE 18/07/15] I have it running with the speakers with a choice of two sound cards: Asus Xonar DS and Creative X-Fi. It’s just a case of changing a few characters in the xover config file.

The control loop algorithm for maintaining the average sample rate at input and output (and avoiding any resampling) is an interesting problem to solve, and I have had fun trying different algorithms based on PID loops and plotting the results out as graphs. The output sample rate is fixed, set by the card, and has to be inferred from the time between calls to send chunks of data to the output card, but there will be a level of jitter on this due to the other things that the multi-threaded program is doing. We know the precise sample rate at the input (the snd-aloop loopback driver) because we are setting it. The aim is to keep the difference between the number of samples read and the number of samples output to the DACs at a constant level, but as we are sending and receiving chunks of data the instantaneous figure is fluctuating all the time. I presume that similar calculations are performed in the adaptive resampling that would be usual when connecting together digital audio systems with differing sample rates – the difference being that resampling would affect the audio (subtly, but it undeniably would), while in my scheme the timing adjustments merely affect the fill level of a FIFO, the sample rate being rigidly fixed and defined by the DAC.

[UPDATE 31/07/15]

Feeling confident, I bought an Asus Xonar U7 USB 7.1 sound card. This is based on the CM6632A chipset. I got it working but… trying to set the format to signed 32 bit within my program failed when addressing the device as “hw”. It also failed with S24_3LE and various other sample formats. However, 16 bit was accepted. Consulting the web, people commonly seem to have this issue with both CM6631A and CM6632A on Linux, and their workaround is simply to use “plughw” instead. However, if the “hw” device rejects a format, then, supposedly, the hardware cannot support it. All the “plughw” device does is automatically allow the OS to convert samples from the format you are using into one that the card can use. So I have a feeling that the card is only running in 16 bit mode, regardless of what my code is sending it.

“If an application chooses a PCM parameter (sampling rate, channel count or sample format) which the hardware does not support, the hw plugin returns an error. Therefore the next most important plugin is the plug plugin which performs channel duplication, sample value conversion and resampling when necessary.”

http://www.volkerschatz.com/noise/alsa.html

[03/08/15 UPDATE] Got back to the house where my system lives after the weekend, and was able to try my Asus Xonar U7 again. This time it accepted S24_3LE! Could this be the issue with hot-plugging versus not hot-plugging that other people on the web have seen? I have a feeling that my previous tests were with the U7 hot-plugged into a PC that was already on. Anyway, I now seem to be in business with the U7 and it sounds good.

Linux-based active crossover: getting there

A few weeks ago I wrote about my desire to dump Windows and to go with Linux for audio. The aim is to create an active crossover system that is the best of all worlds:

  • completely flexible, programmable down to bit level (I am going to program it – or pretty much port my existing code from Windows)
  • powerful enough to implement any type of filtering (large FIRs in particular)
  • not dependent on specific hardware – can use a variety of low cost PCs including old PCs at the back of the cupboard, fanless, compact, low powered, dedicated DSP cards.
  • all libraries, drivers, compilers are open source; not beholden to commercial companies
  • capable of streaming from a variety of sources without sample conversion
  • not bogged down with continuous updates and anti-virus shenanigans

The goal is to use DSP to replace the passive crossovers that so degrade conventional speakers’ performance, not merely to use the PC as a ‘media hub’. The Linux-based audio system can do this, and despite its workaday image it represents the ultimate hi-fi source component. Hi-fi sustains an industry, and hordes of enthusiasts are prepared to spend real money on it. What an interesting thought, therefore, to realise that as a source there will never be any need for a better component than the ‘Linux box’. Here exists a system, a general purpose number cruncher, that is powerful enough for all audio applications, bristling with connectivity, easy to equip with digital to analogue converters whose raw fidelity has long surpassed the limits of human hearing, and yet (if you use an old, surplus PC) costs less than a Christmas cracker toy to own – unlike an equivalent Windows PC.

Details, details

Regardless, reading around the web on the subject, for my active crossover system I seem to either have unique requirements that no one has ever thought of, or my requirements are just so trivial as to be not even worth writing down by anyone. I am still not sure which it is…

On the face of it Linux seems to have audio covered and then some, but in amongst the fantastically comprehensive JACK solution I don’t really feel I know what is going on. It feels like overkill. Is the audio being resampled? I think I need a simpler solution.

Just to summarise the thinking behind my requirements:

  • I want to design my own DSP system rather than trying to adapt existing systems.
  • I want to be able to understand exactly what is going on.
  • Dedicated digital signal processing systems are relatively expensive, often not very powerful, and in order to get the most out of them they may entail a considerable learning curve without the effort being applicable elsewhere, whereas PCs running Linux are ridiculously powerful and cheap.
  • Linux can be installed on any PC for free, and there is no danger of The Powers That Be decreeing that it must be ‘upgraded’, with the high chance that the system will be broken by the upgrade. For example, the mandatory ‘upgrade’ from XP to Windows 7 broke my current system, entailing the fitting of a second sound card due to a change in functionality of a sound card driver. And it cost money.
  • I want the best of all worlds: to be able to program the system at low level as though it is a microcontroller sending samples to a DAC, but for it also to have nice GUIs, play CDs, run Spotify without the need for any other piece of hardware linked with a cable.
  • It would be nice if the system would run on any old PC e.g. fanless.
  • It would be nice to be able to use any sound card as the multichannel DAC.
  • I don’t want the system to resample the audio. This is the ‘killer’ requirement that, I think, most people never give a second thought to.

That last requirement is what the whole thing is about. It is nothing to do with conversion between 48 kHz and 44.1 kHz, or 96 kHz and 192 kHz, but is about the resampling that would be necessary in going from, say, 44.0999 kHz to 44.1001 kHz; if the source and DAC are at nominally the same sample rate but use separate crystal clocks, they will drift apart over time. This can be handled using adaptive resampling of the audio stream in software. But resampling would involve extra DSP, so even if I were happy that no audible degradation was occurring, it would be sapping more CPU power than necessary, or relying on a particular type of sound card that does its own resampling.

The alternative is to ensure that the source and DAC are synchronised in terms of their average sample rates. The DAC will have a fixed, rigid sample rate, so the only rate that can vary is the source and, if the source is a stream of bytes from an audio application (e.g. a media player program), this synchronisation can be arranged by requesting chunks of data from the source only when the DAC is ready to receive it. A First-In-First-Out (FIFO) buffer is loaded with these chunks of data, and the data is streamed out to the DAC continuously.

I would like to think I have now found the solution using Linux. I would be very grateful if any Linux gurus out there would care to correct me if I am wrong on any of this:

  • Linux has several (confusing) layers when it comes to handling audio. However, most audio applications will work directly with ALSA, which allows fairly low level programming.
  • Typical Linux distributions also come with Pulseaudio loaded and running. Pulseaudio is a higher level system than ALSA and has many nice features, but automatically performs resampling(?). Pulseaudio can be removed.
  • Another step up in sophistication is JACK, a very comprehensive system that requires a server program to be running all the time in the background. There is no obligation to set JACK running.
  • As with Windows, fitting a sound card into a Linux machine causes the driver for that sound card to be loaded automatically. ALSA can then ‘see’ the card and it can be referred to as “hw:3,1” where the ‘3’ is the card, and the ‘1’ is a device on the card, or using aliases e.g. “hw:DS,1” etc. – this is useful because the numeric designation may change between boot-ups.
  • “hw” devices are accessed directly, without any resampling, as opposed to “plughw” devices. Both options are usually available for most sound cards and their drivers. I am only considering the “hw” option.
  • Driver capabilities can be ascertained in detail by dumping the driver controls to a file using various methods, e.g. “alsactl store” etc.
  • Linux provides drivers that have been put together by enthusiasts based on sound card chipsets, so not all the facilities listed by the driver will necessarily be available for every card.
  • ALSA’s API allows real time streaming to and from ALSA devices, including multichannel frames. Taking data from a device is known as capture, and sending to a device is known as playback (or similar).
  • A device can be designated as the ALSA default, which most audio applications default to sending their output to. Applications like Spotify can only direct their output to the default device.
  • There is a ‘dummy’ driver available called snd-aloop. This can be loaded into the system at boot-up. To ALSA it appears as a sound card called Loopback, with eight capture devices and eight playback devices.
  • snd-aloop can be designated as the default device.
  • snd-aloop has a very desirable feature: its sample rate can be varied via a real time control. This control is accessible like the controls available on any sound card driver, and can simply be set from a terminal using a command such as “amixer cset numid=49 100010”, where 49 is the index of the control and 100010 is the value we are setting it to. The control can also be adjusted from inside your own program (a sketch follows this list).
  • Clearly, if a way can be found to compare the sample rates of the DAC and snd-aloop, then snd-aloop‘s sample rate can be adjusted occasionally to keep the source’s average sample rate the same as the DAC’s. N.B. this is not dynamically changing the pitch or timing of the stream – this is fixed and immoveable and set by the DAC – but merely ensures that the FIFO buffer’s capacity is not exceeded in either direction. If the source was not asynchronous (e.g. not a CD or on-demand streaming application whose data can be requested at any time) but a fixed rate stream with no way of locking the DAC to its sample rate via hardware, then this would not be possible, and adaptive re-sampling would be essential.
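
As a crude illustration of adjusting that control from inside a program, one lazy route is simply to shell out with the same command-line syntax (the numid of 49 is from my system and may well differ on yours; calling the ALSA control API directly, or pasting in amixer’s own source as I describe further down, is neater):

    #include <stdio.h>
    #include <stdlib.h>

    /* Set the snd-aloop rate-shift control: 100000 is nominal,
       100010 very slightly fast, 99990 very slightly slow. */
    static int set_loopback_rate(int value)
    {
        char cmd[64];
        snprintf(cmd, sizeof cmd, "amixer cset numid=49 %d", value);
        return system(cmd);
    }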

After a few days of wrestling with this, my experience is as follows:

  • Removing Pulseaudio from Ubuntu (“sudo apt-get remove pulseaudio --force” or similar) has side-effects, and the system loses many of its GUI-adjustable settings options because various Gnome-related dependencies are removed too. It doesn’t ‘break’ the system, merely makes it less useable. The solution can be as crude as re-installing Pulseaudio in order to make a settings change and then removing it again! I don’t know that it is essential to remove Pulseaudio, but it certainly feels better to do so.
  • Various audio apps are happy to play their outputs into snd-aloop, and my software can capture its output and process it quite happily.
  • The real core essentials of using the ALSA API for streaming are straightforward-ish, but documentation beyond a simple description of each function is sparse. In many cases, the ALSA source code is viewed as being sufficient documentation. As an example, try to find any information on how to modify an ALSA driver control without actually delving into an existing program like amixer to try and work it out. I find that most ‘third party’ tutorials seem to obscure the essentials with multiple equivalent options demonstrating all the different ways that a single task can be performed.
  • My ASUS Xonar sound card may yet turn out to be useful now that I don’t have to worry about using it as an input as well as an output: it is a high quality eight channel DAC that seems well-behaved in terms of lack of ‘thump’ at power-on and -off.
  • I found the easiest way to adjust the snd-aloop sample rate dynamically was by cutting and pasting the source code for the standard ALSA/Linux program amixer into my program (isn’t open source software great?) and passing the commands to it with the same syntax as I would use at the command line.
  • The system seems stable and robust when the PC is doing other things, e.g. opening up highly graphical web pages in a browser. No audible glitches at all, and no jump in the difference between my record and playback sample counters.
  • I am, as yet, unsure as to the best way to implement the control loop that will keep snd-aloop and the Asus Xonar in sync. With a snd-aloop rate setting of 100000, i.e. nominally neutral, there is a drift of about one sample every couple of seconds – roughly 11 ppm at 44.1 kHz – so an evening’s worth of listening could be assured without any adjustment at all by having a large enough FIFO and slightly-longer-than-desirable latency. I am currently keeping a count of the number of samples captured vs. the number of samples sent to the DAC and simply swapping between fixed ‘slightly slow’ (99995) and ‘slightly fast’ (100005) snd-aloop sample rates, triggered when the (heavily-averaged) difference hits either of two thresholds (sketched in code after this list).
  • In terms of the ALSA sample streaming I just use the ‘blocking method’ inside two separate threads: one for capture and one for playback.
  • It occurs to me that this system could be used to stream to an HDMI output, thence to an AV receiver with multiple output channels. I am not sure whether the PC locks to the AV receiver’s DAC sample rate via HDMI (is it bidirectional?), or whether the AV receiver resamples the data, or syncs itself to the incoming HDMI stream.
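For what it’s worth, here is a sketch of that threshold-swapping (‘bang-bang’) controller in C. It assumes the two sample counters are maintained by the blocking capture and playback threads mentioned above, and it calls the set_rate_shift() helper from the earlier sketch; the thresholds and averaging constant are illustrative, not tuned values:

    #include <unistd.h>

    int set_rate_shift(long value);              /* from the earlier sketch */

    extern volatile long long frames_captured;   /* kept by capture thread  */
    extern volatile long long frames_played;     /* kept by playback thread */

    void control_loop(void)
    {
        double avg = 0.0;
        const double k = 0.001;      /* heavy averaging: long time constant */

        for (;;) {
            long long diff = frames_captured - frames_played;
            avg = (1.0 - k) * avg + k * (double)diff;

            if (avg > 50.0)
                set_rate_shift(99995);    /* FIFO filling: slow the source */
            else if (avg < -50.0)
                set_rate_shift(100005);   /* FIFO emptying: speed it up    */

            sleep(1);            /* corrections are tiny and/or occasional */
        }
    }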

You may find it hard to get excited by this stuff, but not me: I feel that I own this system, whereas my recent experiences showed that with Windows the system is merely ‘under licence’ from Microsoft and the hardware vendors.

Trying Linux

UPDATED 16/03/15 Approximately every two years I find myself inspired to have a go with Linux. I install Ubuntu on an old PC and congratulate myself on having finally made the right choice. Everything works fine: all the devices are auto-detected correctly, and although the graphics and text are a bit lumpy, it looks as though it can do everything Windows can do. It never lasts. Within a short time I try to do something beyond the basic web surfing and word processing and it doesn’t quite work. So I go to the web, and of course there’s usually a solution buried in a forum somewhere, and it invariably involves editing a config file. But along the way I may have found several other ‘solutions’ that didn’t work, and for each I maybe edited a different file or changed something using some little app I’ve installed. At the end, even though the system may be working, I am never quite sure how I got there, nor confident I could reproduce the same working system on another PC.

Well, the time has come again, and I am typing this using the latest version of Ubuntu. Everything is wonderful so far, and even Spotify is running flawlessly. Specifically, though, I want to get my active crossover system working on Linux, not Windows. My experience with Windows 7 running on slightly older PCs is not good. I have a laptop approximately five years old which will grind almost to a halt for several minutes every day, performing some sort of scan of itself, and I don’t know enough to do anything about it. The desktop PC that I use for the active crossover is slightly better, but it, too, takes quite a while to ‘warm up’ and is also prone to the occasional glitch while playing music, due to deciding to update its anti-virus database – I am sure this was not a problem with Windows XP. In contrast, running Ubuntu on an older desktop PC without much RAM, the experience is one of ‘solidity’. I am not experiencing the operating system going AWOL for several seconds at a time.

But it comes at a price. I really, really don’t want to have to understand the details of any operating system, and Windows is good for the person who wants to dip into a bit of programming (a distinctly different activity from IT) without having to worry too much about the really low-level details. Windows feels as though it is ‘self-healing’: every time the PC is turned on it starts scanning itself, checking for inconsistencies, downloading updates. New hardware is detected automatically and the user never edits configuration files.

Ubuntu feels a little different. By all means correct me if I am wrong, but the impression I get is of a system that is dependent on lots of configuration files that are not hidden from the user. Of course these files get changed by the operating system itself (just as Windows must change its hidden configuration files), and there are little applications you can install that simplify changing the parameters of various sound cards, say (more on this later). But occasionally the configuration files must be edited by the user using a text editor. One typo, and the PC may refuse to boot!

As I mentioned, I am hoping to run my active crossover stuff on Linux, not Windows. In order to achieve this I must loop continuously doing the following:

  1. Extract a chunk of stereo audio from an ‘input port’ that receives data from my application of choice (media player, Spotify etc.)
  2. Assemble the data into fixed-size buffers to be FFT-ed.
  3. Process with FIR filters to produce a separate, filtered output for each driver.
  4. Inverse FFT.
  5. Squirt the results out to six or eight analogue channels, or if feeling ambitious, HDMI (that would be the dream!).

It’s a very specific, self-contained requirement. I can handle numbers 2 to 4, no problem. 1 and 5 are the tricky ones, and seem to be a lot trickier than they perhaps ought to be. They weren’t all that easy in Windows, either, but I eventually came up with a scheme that kind of worked. A structural sketch of the loop follows.
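To make the shape of the thing concrete, here is that loop sketched in C. The fft_fir_process() function is a placeholder standing in for steps 2 to 4, the device names are assumptions for illustration (whichever device receives the PC audio, and the multichannel DAC), the float format won’t suit every card, and error handling is omitted:

    #include <alsa/asoundlib.h>

    #define FRAMES 4096
    #define IN_CH  2
    #define OUT_CH 8

    /* placeholder for steps 2-4: buffer, FFT, FIR per driver, inverse FFT */
    void fft_fir_process(const float *in, float *out, int frames);

    int main(void)
    {
        snd_pcm_t *cap, *play;
        float in[FRAMES * IN_CH], out[FRAMES * OUT_CH];

        /* step 1: the 'input port' the media player plays into */
        snd_pcm_open(&cap, "hw:1,0", SND_PCM_STREAM_CAPTURE, 0);
        snd_pcm_set_params(cap, SND_PCM_FORMAT_FLOAT_LE,
                           SND_PCM_ACCESS_RW_INTERLEAVED,
                           IN_CH, 44100, 1, 500000);

        /* step 5: the multichannel DAC */
        snd_pcm_open(&play, "hw:0,0", SND_PCM_STREAM_PLAYBACK, 0);
        snd_pcm_set_params(play, SND_PCM_FORMAT_FLOAT_LE,
                           SND_PCM_ACCESS_RW_INTERLEAVED,
                           OUT_CH, 44100, 1, 500000);

        for (;;) {
            snd_pcm_readi(cap, in, FRAMES);      /* 1. extract a chunk   */
            fft_fir_process(in, out, FRAMES);    /* 2-4. filter          */
            snd_pcm_writei(play, out, FRAMES);   /* 5. squirt to the DAC */
        }
    }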

Here’s where it gets very specific: under XP I was able to use a single Creative X-Fi surround sound card as both the ‘receptacle’ for PC audio, which I could then access with my application, and as the multichannel DAC that my application could squirt its output to. Under Windows 7 the driver for the sound card was ‘updated’ and I could no longer access it as the receiver for general PC audio – I could still have used it for S/PDIF, analogue Line In etc., however. In the ideal world, the ‘receptacle’ would just be some software slaved to the output sample rate, but I don’t know how to create such a piece of software – it would appear to Windows as a driver, I would guess. I could have bought a piece of software called Virtual Audio Cable, but I could never be sure whether it would always be re-sampling the data, and I’d rather avoid that. In the end, I used a method that I knew would work: I slaved a ‘professional’ audio card to the X-Fi using S/PDIF from the X-Fi. The M Audio 2496 can slave its sample rate to S/PDIF (using settings in the M Audio-supplied configuration application), so I was able to send PC audio to the M Audio, and my application could extract data from its ‘mixer’ at the same sample rate. Keeping the input and output on separate cards like this has some advantages when it comes to making measurements of the system while it is working, I think.

As a start I will probably try to do the same thing under Linux. I am attempting to use an Asus Xonar as the multichannel DAC, and another M Audio card I had lying around as the slaved source. It’s almost certain that I could achieve the objective without a second sound card, but I really don’t know how to do it [update 30/06/15: maybe I do know how to do it now]. Linux audio seems to have several ‘layers’ that I don’t understand (but as yet I have no view of them as layers, more as spaghetti). Really, I would like not to have to know anything about them at all, but this seems unrealistic. I have established the following:

  • I can do lowish-level audio stuff using the ALSA API. I can refer to specific cards by names that I can bring up with certain command-line (shell) queries. Are these names guaranteed to stay the same between boots? I don’t think so, but there are ways of editing the config files to associate names of my choosing with specific cards – I think.
  • There is a highly comprehensive system called JACK that allows “JACK-aware” programs to have their audio routed via a user-configurable patchbay. It can handle re-sampling between separate cards transparently. Brilliant, but I don’t think Spotify is “JACK-aware” for example so I’m not bothering with it. [Update 30/06/15: I want to avoid any form of re-sampling anyway]
  • Ubuntu has PulseAudio installed already (I think) and using an application (that I had to install) called Pavucontrol I can direct Spotify, and presumably other apps, to send their outputs to any of the sound cards in the system. Does this get written to a file and saved when I exit it? I think so. PulseAudio may be the thing I need, possibly being capable of creating software “sources” and “sinks”. But is it always resampling the audio to match sample rates even when that is not needed? More investigation needed. [Update 30/06/15: Pulseaudio cannot be guaranteed not to resample. I have removed it from the machine].
  • I installed a little program called Mudita24 that gives me most of the functionality of the app that is supplied for M Audio cards under Windows. It will let me slave the M Audio to S/PDIF. But without a lot of rummaging around on the web, finding this solution was not obvious. Will the results be saved to a file so I don’t have to call this up every time? I don’t know. [Update 30/06/15: the M Audio-compatible drivers don’t seem to work properly. I have abandoned this idea].
  • I found a “minimal” example program that can send a sine wave to an output via ALSA. The program is anything but minimal: it allows the user to select from a large number of alternative sample rates, bit depths etc., and has copious error reporting. My version of “minimal” is much shorter (sketched below)! I adapted the program for eight channels, and am sending a separate frequency to each of the Xonar’s outputs. It seems to be working quite solidly, although I can’t yet be absolutely sure that the Xonar isn’t applying surround-sound processing to the signals. Question: should I be programming using ALSA or PulseAudio? [Update 30/06/15: the answer is most definitely ALSA only].
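For the record, my idea of “minimal” looks something like the sketch below: one format, one rate, everything hard-coded. Stereo is shown for brevity; the eight-channel version just raises the channel count and gives each channel its own frequency. Build with something like gcc sine.c -o sine -lasound -lm:

    #include <alsa/asoundlib.h>
    #include <math.h>

    int main(void)
    {
        snd_pcm_t *pcm;
        short buf[1024 * 2];                        /* 1024 stereo frames */
        double phase = 0.0;
        const double step = 2.0 * M_PI * 440.0 / 44100.0;   /* 440 Hz */

        snd_pcm_open(&pcm, "default", SND_PCM_STREAM_PLAYBACK, 0);
        snd_pcm_set_params(pcm, SND_PCM_FORMAT_S16_LE,
                           SND_PCM_ACCESS_RW_INTERLEAVED,
                           2, 44100, 1, 500000);    /* stereo, 0.5 s buffer */

        for (;;) {
            for (int i = 0; i < 1024; i++) {
                buf[2 * i] = buf[2 * i + 1] = (short)(10000.0 * sin(phase));
                phase += step;
            }
            if (snd_pcm_writei(pcm, buf, 1024) < 0)
                snd_pcm_prepare(pcm);               /* crude xrun recovery */
        }
    }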

I don’t mind if everything is low level, nor do I mind if the operating system handles everything for me. What I am not keen on is a hybrid, where the operating system does some things automatically, yet I must manually edit files (I haven’t had to do that yet, though) or install little apps myself. How are they all tied together? I don’t know.

UPDATE 10/03/15 Installed Ubuntu on my erratic Windows 7 laptop. On the hard drive I had to delete the ‘HP Tools’ partition to do it, as a disk with an MBR partition table can only have four primary partitions, apparently, and HP had used all four to install Windows – the things you learn, eh?

For the things I use the laptop for mainly, Ubuntu is knocking Windows 7 into a cocked hat. It actually responds instantly and doesn’t hang for tens of seconds with the disk light on constantly and the mouse pointer frozen. It’s taking some getting used to!

UPDATE 15/03/15 It is becoming clear to me that there is only one sensible solution for what I am trying to achieve (an active crossover / general DSP system under my control that can be applied to any source, including streaming) that is guaranteed not to resample the data, does not depend on sound card-specific features, and does not need two sound cards. Let me run this by you:

  • Media player apps need something that looks like a sound card to play into. Some apps will only play into whichever card is set as the default audio device.
  • If it’s a real sound card that’s being played into, I need to extract the data before it reaches the analogue outputs. This just may not be possible with many sound cards, and it is impossible to know without trying the card – no one cares about this issue normally.
  • I process the data into six or eight channels and then I need to squirt the results out to, effectively, some DACs (or HDMI). This is most likely a real, physical multi-channel sound card.
  • I believe that the media player’s sample rate is defined by the sound card it is playing into. If so, this is akin to asynchronous USB mode i.e. the media app is slaved to the sound card’s sample rate.
  • I would like to avoid sample rate conversion (and this would still be needed to convert between 44.09999 kHz and 44.10001 kHz, i.e. there is no such thing as “the same sample rate” unless both are derived from the same crystal oscillator – even that 0.02 Hz mismatch would drift a FIFO by 0.02 samples per second, or about 72 samples an hour).

There is a Linux driver called snd-aloop which can act as a virtual audio node, recognisable by media player apps as a sink, but also recognisable by other apps as a recording source. I could send media player output into this virtual device, recognise it as a source for my application, process the data and send the multi-channel audio to a consumer-level DAC card without it needing any special features (one way of making the loopback the default device is sketched below). However, there is a subtle problem: snd-aloop’s sample rate is derived from the system-wide “jiffies” count, so it will not match the sample rate of the DAC card even if both are nominally 44.1 kHz.
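As an aside, one common way of designating the loopback as the default device is a few lines in ~/.asoundrc (or the system-wide /etc/asound.conf); this assumes the card shows up with the name Loopback:

    pcm.!default {
        type hw
        card Loopback
    }
    ctl.!default {
        type hw
        card Loopback
    }

Pointing the default straight at the hw device also sidesteps ALSA’s plugin-layer conversions, although it does mean applications must use a format and rate the loopback accepts.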

I see just one sensible solution: I have to modify the aloop code so that, when the information is available, it gets its sample-rate synchronisation from the DAC card. I could either modify aloop and send it this synchronisation information via a ‘pipe’ or shared memory (if that’s possible) from my active crossover application, or I could make my active crossover application a virtual sound card driver itself. Either way, I would need to register the driver with the system so that it can be set up as the default audio device (using the usual GUI-based sound preferences).

To any Linux programmers out there: does this sound sensible and do-able?

More later.

Update 30/06/15: It seems that there is an updated version of the snd-aloop driver which incorporates a dynamically-adjustable sample rate via the ALSA PCM interface. This could be precisely what I need.