Linux Active Crossover is working!

[Update: now running on a fanless Bay Trail processor]

After a few evenings of half-hearted attempts to port my Windows code and make the changes needed to run on Linux, I finally got my head around what was needed, and it works! Unfortunately I’m not at the house where the amp and speakers are so I can’t try it ‘in anger’ but at least I can tell that I’m getting what sounds like correctly-filtered Spotify or CD from the three stereo outputs.

On a ten-year-old Dell GX520 it’s using about 16% of the CPU, and when you add in Spotify at about another 16% plus the snd-aloop driver and all the other stuff going on in an internet-connected PC, it comes to about 40% CPU, which is a bit higher than I had hoped – there’s a tiny amount of fan noise. Maybe there is scope to improve the efficiency of the crossover software: at the moment I am reading and writing 32-bit integers to/from the sound cards (one is a dummy sound card of course) but doing all the processing in floating point, which means converting each sample twice (integer to float on the way in, float to integer on the way out), and each conversion is potentially expensive. Maybe this can be sped up. And I can always find a faster, cooler PC of course.
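
To make the two conversions concrete, here is a minimal sketch in Python (chosen for brevity; it is not the crossover program's code). The function names, the scale factor of 2^31 and the choice to clip rather than wrap on overflow are all assumptions made for illustration.

```python
# Sketch only: names, scale factor and clipping policy are assumptions,
# not taken from the crossover program itself.
FULL_SCALE = 2 ** 31  # magnitude of the most negative signed 32-bit value


def int32_to_float(sample):
    """Map a signed 32-bit sample onto the range [-1.0, 1.0)."""
    return sample / FULL_SCALE


def float_to_int32(value):
    """Map a float back to signed 32-bit, clipping instead of wrapping."""
    scaled = round(value * FULL_SCALE)
    return max(-FULL_SCALE, min(FULL_SCALE - 1, scaled))
```

Every captured sample passes through the first function and every output sample through the second, once per channel, so with three stereo outputs the second conversion runs on six channels per input frame.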

[13/07/15] In response to a comment, the point of all this is not just to implement basic crossover filtering, but to correct the drivers’ individual responses based on measurements, producing zero phase shift for each driver, and therefore perfect (or as close as possible) acoustic crossovers and zero overall phase shift. EQ such as baffle step correction is overlaid onto the filters’ responses without costing anything extra in CPU power. Individual driver delays are also added. I am not claiming this is unique, but nor is it commonplace. In terms of an active crossover it is the no-compromises version.

I have had this system working for a couple of years on a Windows PC, but Linux will be a cheaper and more elegant solution.

[UPDATE 18/07/15] I have it running with the speakers with a choice of two sound cards: Asus Xonar DS and Creative X-Fi. It’s just a case of changing a few characters in the xover config file.

The control loop algorithm for maintaining the average sample rate at input and output (and avoiding any resampling) is an interesting problem to solve and I have had fun trying different algorithms based on PID loops and plotting the result out as a graph. The output sample rate is fixed, set by the card, and has to be inferred from the time between calls to send chunks of data to the output card but there will be a level of jitter on this due to the other things that the multi-threaded program is doing. We know the precise sample rate at the input (the snd-aloop loopback driver) because we are setting it. The aim is to keep the difference between number of samples read and number of samples output to the DACs at a constant level, but as we are sending and receiving chunks of data the instantaneous figure is fluctuating all the time. I presume that similar calculations are being performed in the adaptive resampling that would be usual when connecting together digital audio systems with differing sample rates – the difference being that this would affect the audio (subtly, but it undeniably would), while the aim of my scheme is that the timing adjustments merely affect the fill level of a FIFO, the sample rate being rigidly fixed and defined by the DAC.
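
As an illustration of the kind of loop described above, the following Python sketch simulates a proportional-integral controller (a PID loop without the derivative term) nudging the loopback rate setting so that the FIFO fill level stays at a target. Everything in it is an assumption for demonstration: the DAC rate of 44100.5 Hz, the gains `kp` and `ki`, the target fill, and the idealised jitter-free measurement. It is not one of the algorithms actually tried.

```python
def simulate_pi_loop(dac_rate=44100.5, nominal_rate=44100.0,
                     target_fill=4096.0, kp=0.5, ki=0.02,
                     dt=0.1, steps=6000):
    """Simulate a PI loop holding the FIFO fill near target_fill.

    The DAC rate is fixed and unknown to the controller; only the
    source rate setting (100000 = nominal) is adjusted.
    Returns (final setting, final fill error, worst fill error seen).
    """
    fill = target_fill
    integral = 0.0
    shift = 100000          # snd-aloop style setting, 100000 = neutral
    worst_error = 0.0
    for _ in range(steps):
        source_rate = nominal_rate * shift / 100000
        fill += (source_rate - dac_rate) * dt   # net samples gained this step
        error = fill - target_fill              # positive: FIFO too full
        integral += error * dt
        # Too full -> slow the source down; too empty -> speed it up.
        shift = 100000 - round(kp * error + ki * integral)
        worst_error = max(worst_error, abs(error))
    return shift, fill - target_fill, worst_error
```

In this idealised model the setting ends up hovering around the value that matches the DAC, and the fill level moves by only a few samples. A real loop would also have to average out the chunk-to-chunk jitter before feeding the error to the controller.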

[UPDATE 31/07/15]

Feeling confident, I bought an Asus Xonar U7 USB 7.1 sound card. This is based on the CM6632A chipset. I got it working but… trying to set the format to signed 32 bit within my program failed when addressing the device as “hw”. It also failed with S24_3LE and various other sample formats. However, 16 bit was accepted. Consulting the web, people commonly seem to have this issue with both CM6631A and CM6632A on Linux, and their workaround is simply to use “plughw” instead. However, if the “hw” device rejects a format, then, supposedly, the hardware cannot support it. All the “plughw” device does is automatically allow the OS to convert samples from the format you are using into one that the card can use. So I have a feeling that the card is only running in 16 bit mode, regardless of what my code is sending it.

If an application chooses a PCM parameter (sampling rate, channel count or sample format) which the hardware does not support, the hw plugin returns an error. Therefore the next most important plugin is the plug plugin which performs channel duplication, sample value conversion and resampling when necessary.

http://www.volkerschatz.com/noise/alsa.html

[03/08/15 UPDATE] Got back to the house where my system lives after the weekend, and was able to try my Asus Xonar U7 again. This time it accepted S24_3LE! Could this be the issue with hot-plugging versus not hot-plugging that other people on the web have seen? I have a feeling that my previous tests were with the U7 hot-plugged into a PC that was already on. Anyway, I now seem to be in business with the U7 and it sounds good.

Linux-based active crossover: getting there

A few weeks ago I wrote about my desire to dump Windows and to go with Linux for audio. The aim is to create an active crossover system that is the best of all worlds:

  • completely flexible, programmable down to bit level (I am going to program it – or pretty much port my existing code from Windows)
  • powerful enough to implement any type of filtering (large FIRs in particular)
  • not dependent on specific hardware – can use a variety of low cost PCs including old PCs at the back of the cupboard, fanless, compact, low powered, dedicated DSP cards.
  • all libraries, drivers, compilers are open source; not beholden to commercial companies
  • capable of streaming from a variety of sources without sample conversion
  • not bogged down with continuous updates and anti-virus shenanigans

The goal is to use DSP to replace the passive crossovers that so degrade conventional speakers’ performance, not merely to use the PC as a ‘media hub’. The Linux-based audio system can do this, and despite its workaday image represents the ultimate hi-fi source component. Hi-fi sustains an industry, and hordes of enthusiasts are prepared to spend real money on it. What an interesting thought, therefore, to realise that as a source there will never be any need for a better component than the ‘Linux box’. Here exists a system, a general purpose number cruncher that is powerful enough for all audio applications, bristling with connectivity, easy to equip with digital to analogue converters whose raw fidelity has long surpassed the limits of human hearing, and yet (if you use an old, surplus PC) costs less than a Christmas cracker toy to own – unlike an equivalent Windows PC.

Details, details

Regardless, reading around the web on the subject, for my active crossover system I seem to either have unique requirements that no one has ever thought of, or my requirements are just so trivial as to be not even worth writing down by anyone. I am still not sure which it is…

On the face of it Linux seems to have audio covered and then some, but in amongst the fantastically comprehensive JACK solution I don’t really feel I know what is going on. It feels like overkill. Is the audio being resampled? I think I need a simpler solution.

Just to summarise the thinking behind my requirements:

  • I want to design my own DSP system rather than trying to adapt existing systems.
  • I want to be able to understand exactly what is going on.
  • Dedicated digital signal processing systems are relatively expensive, often not very powerful, and in order to get the most out of them they may entail a considerable learning curve without the effort being applicable elsewhere, whereas PCs running Linux are ridiculously powerful and cheap.
  • Linux can be installed on any PC for free, and there is no danger of The Powers That Be decreeing that it must be ‘upgraded’, with the high chance that the system will be broken by the upgrade. For example, the mandatory ‘upgrade’ from XP to Windows 7 broke my current system, entailing the fitting of a second sound card due to a change in functionality of a sound card driver. And it cost money.
  • I want the best of all worlds: to be able to program the system at low level as though it is a microcontroller sending samples to a DAC, but for it also to have nice GUIs, play CDs, run Spotify without the need for any other piece of hardware linked with a cable.
  • It would be nice if the system would run on any old PC e.g. fanless.
  • It would be nice to be able to use any sound card as the multichannel DAC.
  • I don’t want the system to resample the audio. This is the ‘killer’ requirement that, I think, most people never give a second thought to.

That last requirement is what the whole thing is about. It is nothing to do with conversion between 48 kHz and 44.1 kHz, or 96 kHz and 192 kHz, but is about the resampling that would be necessary in going from 44.0999 kHz to 44.10001 kHz, for example; if the source and DAC are at nominally the same sample rate, but use separate crystal clocks they will drift apart over time. This can be handled using adaptive resampling of the audio stream in software. Resampling would involve extra DSP, so even if I was happy that no audible degradation was occurring, it would be sapping more CPU power than was necessary, or relying on a particular type of sound card that does its own resampling.
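
The size of the effect is easy to estimate. The sketch below (Python, with a hypothetical function name and an assumed 1024 samples of buffer slack) works out how long two free-running clocks take to drift apart by a given number of samples.

```python
def seconds_until_drift(source_hz, dac_hz, slack_samples):
    """Time for two free-running clocks to drift apart by slack_samples."""
    drift_per_second = abs(dac_hz - source_hz)  # samples gained or lost per second
    return slack_samples / drift_per_second


# The example rates from the text: 44.0999 kHz source, 44.10001 kHz DAC.
# The 1024-sample slack is an assumed figure, not a measured one.
time_to_overrun = seconds_until_drift(44099.9, 44100.01, 1024)
```

With those figures the clocks differ by about 0.11 samples per second, so 1024 samples of slack would last roughly two and a half hours before something has to give.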

The alternative is to ensure that the source and DAC are synchronised in terms of their average sample rates. The DAC will have a fixed, rigid sample rate, so the only rate that can vary is the source and, if the source is a stream of bytes from an audio application (e.g. a media player program), this synchronisation can be arranged by requesting chunks of data from the source only when the DAC is ready to receive it. A First-In-First-Out (FIFO) buffer is loaded with these chunks of data, and the data is streamed out to the DAC continuously.
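
The pull arrangement described above can be sketched as follows. This is an illustrative Python model only: `PullFifo`, `read_chunk` and `dac_ready` are hypothetical names, and a real implementation would sit on top of the sound card's blocking read and write calls rather than Python lists.

```python
from collections import deque


class PullFifo:
    """FIFO that asks the source for data only when the DAC side has drained it."""

    def __init__(self, read_chunk, chunk_size, capacity):
        self.read_chunk = read_chunk    # callable: read_chunk(n) -> list of n samples
        self.chunk_size = chunk_size
        self.capacity = capacity
        self.samples = deque()

    def dac_ready(self, n):
        """Called when the DAC wants n samples: top up the FIFO, then hand them over."""
        while len(self.samples) + self.chunk_size <= self.capacity:
            chunk = self.read_chunk(self.chunk_size)
            if not chunk:
                break                   # source has nothing more to give
            self.samples.extend(chunk)
        count = min(n, len(self.samples))
        return [self.samples.popleft() for _ in range(count)]
```

Because the source is only ever read from inside `dac_ready`, its average rate is dictated by the DAC's clock and no resampling is needed.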

I would like to think I have now found the solution using Linux. I would be very grateful if any Linux gurus out there would care to correct me if I am wrong on any of this:

  • Linux has several (confusing) layers when it comes to handling audio. However, most audio applications will work directly with ALSA, which allows fairly low level programming.
  • Typical Linux distributions also come with Pulseaudio loaded and running. Pulseaudio is a higher level system than ALSA and has many nice features, but automatically performs resampling(?). Pulseaudio can be removed.
  • Another step up in sophistication is JACK, a very comprehensive system that requires a server program to be running all the time in the background. There is no obligation to set JACK running.
  • As with Windows, fitting a sound card into a Linux machine causes the driver for that sound card to be loaded automatically. ALSA can then ‘see’ the card and it can be referred to as “hw:3,1” where the ‘3’ is the card, and the ‘1’ is a device on the card, or using aliases e.g. “hw:DS,1” etc. – this is useful because the numeric designation may change between boot-ups.
  • “hw” devices are accessed directly without any resampling, as opposed to “plughw” devices. Both options are usually available for most sound cards and their drivers. I am only considering the “hw” option.
  • Driver capabilities can be ascertained in detail by dumping the driver controls to a file using various methods e.g. “alsactl store” etc.
  • Linux provides drivers that have been put together by enthusiasts based on sound card chipsets, so not all the facilities listed by the driver will necessarily be available for every card.
  • ALSA’s API allows real time streaming to and from ALSA devices, including multichannel frames. Taking data from a device is known as capture, and sending to a device is known as playback (or similar).
  • A device can be designated as the ALSA default, which most audio applications default to sending their output to. Applications like Spotify can only direct their output to the default device.
  • There is a ‘dummy’ driver available called snd-aloop. This can be loaded into the system at boot-up. To ALSA it appears as a sound card called Loopback with eight capture devices and eight playback.
  • snd-aloop can be designated as the default device.
  • snd-aloop has a very desirable feature: its sample rate can be varied via a real time control. This control is accessible like the controls that are available on any sound card driver and can simply be set from a terminal using a command such as “amixer cset numid=49 100010” where 49 is the index of the control and 100010 is the value we are setting it to. The control can also be adjusted from inside your own program.
  • Clearly, if a way can be found to compare the sample rates of the DAC and snd-aloop, then snd-aloop’s sample rate can be adjusted occasionally to keep the source’s average sample rate the same as the DAC’s. N.B. this is not dynamically changing the pitch or timing of the stream – this is fixed and immoveable and set by the DAC – but merely ensures that the FIFO buffer’s capacity is not exceeded in either direction. If the source was not asynchronous (e.g. not a CD or on-demand streaming application whose data can be requested at any time) but a fixed rate stream with no way of locking the DAC to its sample rate via hardware, then this would not be possible, and adaptive re-sampling would be essential.
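
The amixer command quoted in the list above can also be issued from a program by shelling out to it. The Python sketch below just wraps that exact command line; the function names are hypothetical, the control index 49 is the one from the example and will differ between systems, and a card selector (amixer's `-c` option) may be needed if Loopback is not the card amixer addresses by default.

```python
import subprocess


def rate_shift_command(numid, value):
    """Build the amixer command line quoted in the text, as an argument list."""
    return ["amixer", "cset", "numid={}".format(numid), str(value)]


def set_rate_shift(numid, value):
    """Run the command; raises CalledProcessError if amixer reports failure."""
    subprocess.run(rate_shift_command(numid, value), check=True)
```

For example, `set_rate_shift(49, 100010)` would run the same command as typing “amixer cset numid=49 100010” in a terminal.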

After a few days of wrestling with this, my experience is as follows:

  • Removing Pulseaudio from Ubuntu (“sudo apt-get remove pulseaudio –force” or similar) has side-effects, and the system loses many of its GUI-adjustable settings options because various Gnome-related dependencies are removed too. It doesn’t ‘break’ the system; merely makes it less useable. The solution can be as crude as re-installing Pulseaudio in order to make a settings change and then removing it again! I don’t know that it is essential to remove Pulseaudio, but it certainly feels better to do so.
  • Various audio apps are happy to play their outputs into snd-aloop, and my software can capture its output and process it quite happily.
  • The real core essentials of using the ALSA API for streaming are straightforward-ish, but documentation beyond a simple description of each function is sparse. In many cases, the ALSA source code is viewed as being sufficient documentation. As an example, try to find any information on how to modify an ALSA driver control without actually delving into an existing program like amixer to try and work it out. I find that most ‘third party’ tutorials seem to obscure the essentials with multiple equivalent options demonstrating all the different ways that a single task can be performed.
  • My ASUS Xonar sound card may yet turn out to be useful now that I don’t have to worry about using it as an input as well as an output: it is a high quality eight channel DAC that seems well-behaved in terms of lack of ‘thump’ at power-on and -off.
  • I found the easiest way to adjust the snd-aloop sample rate dynamically was by cutting and pasting the source code for the standard ALSA/Linux program amixer into my program (isn’t open source software great?) and passing the commands to it with the same syntax as I would use at the command line.
  • The system seems stable and robust when the PC is doing other things i.e. opening up highly graphical web pages in a browser. No audible glitches at all and no jump in the difference between my record and playback sample counters.
  • I am, as yet, unsure as to the best way to implement the control loop that will keep snd-aloop and the Asus Xonar in sync. With a snd-aloop rate setting of 100000 i.e. nominally neutral, there is a drift of about one sample every couple of seconds (an evening’s worth of listening could be assured without any adjustment at all by having a large enough FIFO and slightly-longer-than-desirable latency…). I am currently keeping a count of the number of samples captured vs. the number of samples sent to the DAC and simply swapping between fixed ‘slightly slow’ (99995) and ‘slightly fast’ (100005) snd-aloop sample rates, triggered when the (heavily-averaged) difference hits either of two thresholds.
  • In terms of the ALSA sample streaming I just use the ‘blocking method’ inside two separate threads: one for capture and one for playback.
  • It occurs to me that this system could be used to stream to an HDMI output, thence to an AV receiver with multiple output channels. Not sure if the PC locks to the AV receiver’s DAC sample rate via HDMI (is it bidirectional?), or whether the AV receiver resamples the data, or syncs itself to the incoming HDMI stream.
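
The two-threshold switching scheme mentioned in the list above might look something like the following Python sketch. The class name, the threshold values and the exponential averaging are assumptions for illustration; only the two rate settings (99995 and 100005) come from the text.

```python
class TwoThresholdRate:
    """Swap between 'slightly slow' and 'slightly fast' source rate settings."""

    SLOW = 99995
    FAST = 100005

    def __init__(self, low, high, smoothing=0.01, start_rate=FAST):
        self.low = low              # averaged difference at or below this: go fast
        self.high = high            # averaged difference at or above this: go slow
        self.smoothing = smoothing  # small value = heavy averaging
        self.average = None
        self.rate = start_rate

    def update(self, captured, played):
        """Feed in the running sample counters; returns the rate setting to use."""
        difference = captured - played
        if self.average is None:
            self.average = float(difference)
        else:
            self.average += self.smoothing * (difference - self.average)
        if self.average >= self.high:
            self.rate = self.SLOW
        elif self.average <= self.low:
            self.rate = self.FAST
        return self.rate
```

Between the two thresholds the setting is left alone, so the rate only changes occasionally and the FIFO level saws slowly up and down inside the band.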

You may find it hard to get excited by this stuff, but I don’t: it’s a case of feeling that I own the system, whereas my recent experiences showed that with Windows the system is merely ‘under licence’ from Microsoft and the hardware vendors.

The Curse of Dimensionality

What has been called The curse of dimensionality is a phenomenon that has some relevance to Floyd Toole’s work and to many of the ideas that flourish in those audiophile forums that think of themselves as being at the scientific end of the spectrum. In a nutshell:

when the dimensionality increases, the size of the space increases so fast that the available data become sparse. This sparsity is problematic for any method that requires statistical significance.

Having followed a few online discussions in recent days, it seems clear to me that many people think that Floyd Toole’s experiments were/are controlling only a single variable: the speakers’ ‘directionality’. This is far from the truth*. And even if the tests had been genuinely controlling a single variable, the fixing of the other variables would have restricted the experiments to such a tiny subset of the overall problem space as to make them potentially meaningless**. In reality, the experiments created a few sparse pieces of data within a small fraction of the overall problem space.

In this particular case the research was attempting to confirm the answers to questions which, without doing any experiments at all, most people would have given the ‘correct’ answers to anyway:

“Which do you think is better? Flat on-axis frequency response or not? Flat off-axis response or not? Smooth off-axis response or not?”

Which is not to say that identifying the questions in the first place was not a significant achievement. It is just the pseudo-science of the experiments, the implication that the methodology can be extended to ‘solve’ all audio problems, and the conclusions that people draw from the results that I object to.


* Directionality was not a direct variable. Instead, a selection of existing speakers with varying characteristics were listened to, and their directionality measured. A ‘directionality index’ was created ‘heuristically’. Many other incidental variables related to the speakers were not controlled in the experiments.

** For example, if we restrict all the speakers to the same position in the room regardless of whether they work best there or not, restrict the choice of music to a few pieces of audiophile music, restrict the listening to mono, restrict SPL to the capabilities of the smallest speaker in the selection etc. etc. To test the entire ‘problem space’ meaningfully would require thousands of experiments – the curse of dimensionality.
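
To put a number on it, the short Python sketch below multiplies out a set of entirely hypothetical variable counts (none of these figures come from the experiments themselves) to show how quickly the number of distinct test conditions grows.

```python
import math

# Hypothetical numbers of settings per variable, for illustration only.
levels = {
    "speaker model": 10,
    "room position": 5,
    "music excerpt": 10,
    "mono or stereo": 2,
    "playback level": 3,
}

# One experiment per combination of settings.
conditions = math.prod(levels.values())
```

Even with these modest counts there are 3000 distinct conditions to test, before allowing for repeat trials or more than one listener.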