A few weeks ago I wrote about my desire to dump Windows and to go with Linux for audio. The aim is to create an active crossover system that is the best of all worlds:
- completely flexible, programmable down to bit level (I am going to program it – or pretty much port my existing code from Windows)
- powerful enough to implement any type of filtering (large FIRs in particular)
- not dependent on specific hardware – can use a variety of low cost PCs including old PCs at the back of the cupboard, fanless, compact, low powered, dedicated DSP cards.
- all libraries, drivers, compilers are open source; not beholden to commercial companies
- capable of streaming from a variety of sources without sample conversion
- not bogged down with continuous updates and anti-virus shenanigans
The goal is to use DSP to replace the passive crossovers that so degrade conventional speakers’ performance, not merely to use the PC as a ‘media hub’. The Linux-based audio system can do this and, despite its workaday image, represents the ultimate hi-fi source component. Hi-fi sustains an industry, and hordes of enthusiasts are prepared to spend real money on it. What an interesting thought, therefore, to realise that as a source there will never be any need for a better component than the ‘Linux box’. Here is a system, a general-purpose number cruncher that is powerful enough for all audio applications, bristling with connectivity, easy to equip with digital-to-analogue converters whose raw fidelity has long surpassed the limits of human hearing, and yet (if you use an old, surplus PC) costs less than a Christmas cracker toy to own, unlike an equivalent Windows PC.
Reading around the web on the subject, however, I seem either to have unique requirements for my active crossover system that no one has ever thought of, or requirements so trivial that nobody has thought them worth writing down. I am still not sure which it is…
On the face of it, Linux seems to have audio covered and then some, but amid the fantastically comprehensive JACK solution I don’t really feel I know what is going on. It feels like overkill. Is the audio being resampled? I think I need a simpler solution.
Just to summarise the thinking behind my requirements:
- I want to design my own DSP system rather than trying to adapt existing systems.
- I want to be able to understand exactly what is going on.
- Dedicated digital signal processing systems are relatively expensive, often not very powerful, and in order to get the most out of them they may entail a considerable learning curve without the effort being applicable elsewhere, whereas PCs running Linux are ridiculously powerful and cheap.
- Linux can be installed on any PC for free, and there is no danger of The Powers That Be decreeing that it must be ‘upgraded’, with the high chance that the system will be broken by the upgrade. For example, the mandatory ‘upgrade’ from XP to Windows 7 broke my current system, entailing the fitting of a second sound card due to a change in functionality of a sound card driver. And it cost money.
- I want the best of all worlds: to be able to program the system at low level as though it is a microcontroller sending samples to a DAC, but for it also to have nice GUIs, play CDs, run Spotify without the need for any other piece of hardware linked with a cable.
- It would be nice if the system would run on any old PC, e.g. a fanless one.
- It would be nice to be able to use any sound card as the multichannel DAC.
- I don’t want the system to resample the audio. This is the ‘killer’ requirement that, I think, most people never give a second thought to.
That last requirement is what the whole thing is about. It has nothing to do with conversion between 48 kHz and 44.1 kHz, or 96 kHz and 192 kHz; it is about the resampling that would be necessary in going from, for example, 44.0999 kHz to 44.10001 kHz. If the source and DAC run at nominally the same sample rate but use separate crystal clocks, they will drift apart over time. This can be handled by adaptive resampling of the audio stream in software, but resampling involves extra DSP, so even if I were happy that no audible degradation was occurring, it would either sap more CPU power than necessary or, if done in hardware, tie me to a particular type of sound card that does its own resampling.
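As a back-of-the-envelope illustration, taking the two example figures above as the actual source and DAC rates:

```latex
\begin{aligned}
\Delta f &= 44100.01\,\text{Hz} - 44099.9\,\text{Hz} = 0.11\,\text{Hz} \\
\frac{\Delta f}{f_s} &\approx \frac{0.11}{44100} \approx 2.5 \times 10^{-6} \quad (2.5\ \text{ppm}) \\
N_{1\,\text{h}} &= \Delta f \times 3600\,\text{s} \approx 396\ \text{samples} \approx 9\ \text{ms of audio at } 44.1\,\text{kHz}
\end{aligned}
```

That is one sample roughly every nine seconds: inaudible at any instant, but unbounded over time, so any finite FIFO between source and DAC would eventually overflow or run dry without some correction.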
The alternative is to ensure that the source and DAC are synchronised in terms of their average sample rates. The DAC will have a fixed, rigid sample rate, so the only rate that can vary is the source’s. If the source is a stream of bytes from an audio application (e.g. a media player program), this synchronisation can be arranged by requesting chunks of data from the source only when the DAC is ready to receive them. A First-In-First-Out (FIFO) buffer is loaded with these chunks of data, and the data is streamed out to the DAC continuously.
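A minimal sketch of such a FIFO in C follows, as a ring buffer of interleaved frames. The names (`fifo_t`, `fifo_push`, `fifo_pop`, `fifo_fill`), the 32-bit sample format and the locking scheme (a pthread mutex plus condition variables, on the assumption that capture and playback run in separate threads) are illustrative choices, not anything ALSA provides.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical FIFO of interleaved 32-bit frames between a capture thread
 * (fifo_push) and a playback thread (fifo_pop). Not part of any ALSA API. */
typedef struct {
    int32_t *buf;             /* capacity * channels samples */
    size_t capacity;          /* in frames */
    unsigned channels;
    size_t head, tail, fill;  /* read index, write index, frames stored */
    pthread_mutex_t lock;
    pthread_cond_t not_full, not_empty;
} fifo_t;

int fifo_init(fifo_t *f, size_t capacity, unsigned channels) {
    assert(capacity > 0 && channels > 0);
    f->buf = calloc(capacity * channels, sizeof *f->buf);
    if (f->buf == NULL) return -1;
    f->capacity = capacity;
    f->channels = channels;
    f->head = f->tail = f->fill = 0;
    pthread_mutex_init(&f->lock, NULL);
    pthread_cond_init(&f->not_full, NULL);
    pthread_cond_init(&f->not_empty, NULL);
    return 0;
}

void fifo_destroy(fifo_t *f) {
    pthread_cond_destroy(&f->not_empty);
    pthread_cond_destroy(&f->not_full);
    pthread_mutex_destroy(&f->lock);
    free(f->buf);
}

/* Blocks while the FIFO is full: the capture side waits for playback to make room. */
void fifo_push(fifo_t *f, const int32_t *frames, size_t n) {
    pthread_mutex_lock(&f->lock);
    for (size_t i = 0; i < n; i++) {
        while (f->fill == f->capacity)
            pthread_cond_wait(&f->not_full, &f->lock);
        memcpy(f->buf + f->tail * f->channels, frames + i * f->channels,
               f->channels * sizeof *f->buf);
        f->tail = (f->tail + 1) % f->capacity;
        f->fill++;
        pthread_cond_signal(&f->not_empty);
    }
    pthread_mutex_unlock(&f->lock);
}

/* Blocks while the FIFO is empty: the playback side waits for more data. */
void fifo_pop(fifo_t *f, int32_t *frames, size_t n) {
    pthread_mutex_lock(&f->lock);
    for (size_t i = 0; i < n; i++) {
        while (f->fill == 0)
            pthread_cond_wait(&f->not_empty, &f->lock);
        memcpy(frames + i * f->channels, f->buf + f->head * f->channels,
               f->channels * sizeof *f->buf);
        f->head = (f->head + 1) % f->capacity;
        f->fill--;
        pthread_cond_signal(&f->not_full);
    }
    pthread_mutex_unlock(&f->lock);
}

/* Current fill level in frames: the quantity a rate-control loop would watch. */
size_t fifo_fill(fifo_t *f) {
    pthread_mutex_lock(&f->lock);
    size_t n = f->fill;
    pthread_mutex_unlock(&f->lock);
    return n;
}
```

In use, the capture thread would call `fifo_push()` with each chunk read from the source, and the playback thread would call `fifo_pop()` before each write to the DAC. Copying frame by frame keeps the wrap-around logic simple at some cost in speed.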
I would like to think I have now found the solution using Linux. I would be very grateful if any Linux gurus out there would care to correct me if I am wrong on any of this:
- Linux has several (confusing) layers when it comes to handling audio. However, most audio applications will work directly with ALSA, which allows fairly low level programming.
- Typical Linux distributions also come with Pulseaudio loaded and running. Pulseaudio is a higher level system than ALSA and has many nice features, but automatically performs resampling(?). Pulseaudio can be removed.
- Another step up in sophistication is JACK, a very comprehensive system that requires a server program to be running all the time in the background. There is no obligation to set JACK running.
- As with Windows, fitting a sound card into a Linux machine causes the driver for that sound card to be loaded automatically. ALSA can then ‘see’ the card, and it can be referred to as “hw:3,1”, where the ‘3’ is the card and the ‘1’ is a device on the card, or by the card’s name, e.g. “hw:DS,1”. Names are useful because the numeric designation may change between boot-ups.
- “hw” devices are accessed directly, without any resampling, as opposed to “plughw” devices, which convert sample rate, format and channel count as needed. Both options are usually available for most sound cards and their drivers. I am only considering the “hw” option.
- Driver capabilities can be ascertained in detail by dumping the driver controls to a file, e.g. with “alsactrl store”.
- Linux provides drivers that have been put together by enthusiasts based on sound card chipsets, so not all the facilities listed by the driver will necessarily be available for every card.
- ALSA’s API allows real-time streaming to and from ALSA devices, including multichannel frames. Taking data from a device is known as capture, and sending data to a device is known as playback (SND_PCM_STREAM_CAPTURE and SND_PCM_STREAM_PLAYBACK in the API).
- A device can be designated as the ALSA default, which is where most audio applications send their output unless told otherwise. Applications like Spotify can only direct their output to the default device.
- There is a ‘dummy’ driver called snd-aloop, which can be loaded into the system at boot-up. To ALSA it appears as a sound card called Loopback with two devices, each offering eight playback and eight capture subdevices; audio played into a subdevice of one device can be captured from the matching subdevice of the other.
- snd-aloop can be designated as the default device.
- snd-aloop has a very desirable feature: its sample rate can be varied via a real-time control (named “PCM Rate Shift 100000”, where 100000 is nominal). This control is accessible like the controls on any sound card driver and can simply be set from a terminal using a command such as “amixer cset numid=49 100010”, where 49 is the control’s numeric ID (numid) on this system and 100010 is the value being set. The control can also be adjusted from inside your own program.
- Clearly, if a way can be found to compare the sample rates of the DAC and snd-aloop, then snd-aloop’s sample rate can be adjusted occasionally to keep the source’s average sample rate the same as the DAC’s. N.B. this does not dynamically change the pitch or timing of the stream (these are fixed and immovable, set by the DAC); it merely ensures that the FIFO buffer’s capacity is not exceeded in either direction. If the source were not asynchronous (e.g. not a CD or on-demand streaming application whose data can be requested at any time) but a fixed-rate stream with no way of locking the DAC to its sample rate via hardware, this would not be possible, and adaptive resampling would be essential.
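One way the comparison could feed back into the control is sketched below. Over a measurement interval, the ratio of frames captured from snd-aloop to frames played to the DAC estimates how fast the loopback runs relative to the DAC, so scaling the current setting by played/captured should bring the average rates level. The 80000 to 120000 range, and the assumption that the loopback’s rate scales linearly with the setting (larger is faster, consistent with 99995 being ‘slightly slow’), are my understanding of the snd-aloop driver and should be checked against its source; `rate_shift_for` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed limits of snd-aloop's "PCM Rate Shift 100000" control
 * (100000 = nominal; larger = faster). */
#define RATE_SHIFT_MIN 80000L
#define RATE_SHIFT_MAX 120000L

/* Hypothetical helper: given the frames captured from snd-aloop and played
 * to the DAC over the same (long) interval, return the setting that would
 * make the loopback's average rate match the DAC's, assuming the loopback
 * rate is proportional to the setting. Result is rounded and clamped. */
long rate_shift_for(long current, uint64_t captured, uint64_t played) {
    assert(current >= RATE_SHIFT_MIN && current <= RATE_SHIFT_MAX);
    if (captured == 0 || played == 0)
        return current;                /* nothing measured yet */
    /* new = current * played / captured, rounded to nearest */
    uint64_t q = ((uint64_t)current * played + captured / 2) / captured;
    if (q < (uint64_t)RATE_SHIFT_MIN) return RATE_SHIFT_MIN;
    if (q > (uint64_t)RATE_SHIFT_MAX) return RATE_SHIFT_MAX;
    return (long)q;
}
```

For example, if 100010 frames were captured while the DAC consumed 100000, the suggested setting drops from 100000 to 99990. This single proportional correction only matches average rates; it does not drain or refill any offset already accumulated in the FIFO, and the counts need to be taken over long intervals because both counters advance in period-sized steps.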
After a few days of wrestling with this, my experience is as follows:
- Removing Pulseaudio from Ubuntu (“sudo apt-get remove pulseaudio” or similar) has side-effects: the system loses many of its GUI-adjustable settings options because various Gnome-related dependencies are removed too. It doesn’t ‘break’ the system; it merely makes it less usable. The workaround can be as crude as re-installing Pulseaudio to make a settings change and then removing it again! I don’t know whether it is essential to remove Pulseaudio, but it certainly feels better to do so.
- Various audio apps are happy to play their outputs into snd-aloop, and my software can capture its output and process it quite happily.
- The real core essentials of using the ALSA API for streaming are straightforward-ish, but documentation beyond a simple description of each function is sparse. In many cases, the ALSA source code is viewed as being sufficient documentation. As an example, try to find any information on how to modify an ALSA driver control without actually delving into an existing program like amixer to try and work it out. I find that most ‘third party’ tutorials seem to obscure the essentials with multiple equivalent options demonstrating all the different ways that a single task can be performed.
- My ASUS Xonar sound card may yet turn out to be useful now that I don’t have to worry about using it as an input as well as an output: it is a high quality eight channel DAC that seems well-behaved in terms of lack of ‘thump’ at power-on and -off.
- I found the easiest way to adjust the snd-aloop sample rate dynamically was by cutting and pasting the source code for the standard ALSA/Linux program amixer into my program (isn’t open source software great?) and passing the commands to it with the same syntax as I would use at the command line.
- The system seems stable and robust when the PC is doing other things, e.g. opening highly graphical web pages in a browser: no audible glitches at all, and no jump in the difference between my record and playback sample counters.
- I am, as yet, unsure of the best way to implement the control loop that will keep snd-aloop and the Asus Xonar in sync. With a snd-aloop rate setting of 100000, i.e. nominally neutral, there is a drift of about one sample every couple of seconds (roughly 7,200 samples over a four-hour evening, about 0.16 s at 44.1 kHz, so an evening’s worth of listening could be assured without any adjustment at all by having a large enough FIFO and slightly-longer-than-desirable latency…). I am currently keeping a count of the number of samples captured vs. the number of samples sent to the DAC and simply swapping between fixed ‘slightly slow’ (99995) and ‘slightly fast’ (100005) snd-aloop sample rates, triggered when the (heavily averaged) difference hits either of two thresholds.
- In terms of the ALSA sample streaming I just use the ‘blocking method’ inside two separate threads: one for capture and one for playback.
- It occurs to me that this system could be used to stream to an HDMI output, and thence to an AV receiver with multiple output channels. I am not sure whether the PC locks to the AV receiver’s DAC sample rate via HDMI (is it bidirectional?), or whether the AV receiver resamples the data or syncs itself to the incoming HDMI stream.
You may find it hard to get excited by this stuff, but not me: it’s a case of feeling that I own the system, in contrast to my recent experiences, which showed that with Windows the system is merely ‘under licence’ from Microsoft and the hardware vendors.
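The threshold-switching scheme described in the list above can be sketched as a small controller. The ‘heavy averaging’ is assumed here to be an exponential moving average, since the text does not say which averaging is used; the struct, function names, smoothing factor and thresholds are hypothetical, and only the 99995/100005 settings come from the text.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical controller: a heavily averaged (captured - played) frame
 * difference switches snd-aloop between a fixed "slightly slow" and
 * "slightly fast" setting when it crosses one of two thresholds. */
typedef struct {
    double avg;        /* exponentially averaged frame difference */
    double alpha;      /* smoothing factor in (0, 1]; small = heavy averaging */
    double low, high;  /* switching thresholds, in frames */
    long slow, fast;   /* e.g. 99995 and 100005 */
    long current;      /* setting currently in force */
} drift_ctl_t;

void drift_ctl_init(drift_ctl_t *c, double alpha, double low, double high,
                    long slow, long fast, long initial) {
    assert(alpha > 0.0 && alpha <= 1.0 && low < high);
    c->avg = 0.5 * (low + high);  /* start in the middle of the band */
    c->alpha = alpha;
    c->low = low;
    c->high = high;
    c->slow = slow;
    c->fast = fast;
    c->current = initial;
}

/* Call periodically with the cumulative counters; returns the setting to
 * apply. Between the thresholds the previous choice is kept (hysteresis). */
long drift_ctl_update(drift_ctl_t *c, int64_t captured, int64_t played) {
    double diff = (double)(captured - played);
    c->avg += c->alpha * (diff - c->avg);
    if (c->avg > c->high)
        c->current = c->slow;  /* FIFO filling up: slow the source down */
    else if (c->avg < c->low)
        c->current = c->fast;  /* FIFO draining: speed the source up */
    return c->current;
}
```

The caller would only need to write the snd-aloop control when the returned value changes, for example through the amixer-derived code mentioned above. The two-level design never settles exactly; a proportional term would converge more smoothly, at the cost of writing the control more often.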