Room correction. What are we trying to achieve?

The short version…

The recent availability of DSP is leading some people to assume that speakers are, and have always been, ‘wrong’ unless EQ’ed to invert the room’s acoustics.

In fact, our audio ancestors didn’t get it wrong. Only a neutral speaker is ‘right’, and the acoustics of an average room are an enhancement to the sound. If we don’t like the sound of the room, we must change the room – not the sound from the speaker.

DSP gives us the tools to build a more neutral speaker than ever before.

There are endless discussions about room correction, and many different commercial products and methods. Some people seem to like certain results while others find them a little strange-sounding.

I am not actually sure what it is that people are trying to achieve. I can’t help but think that if someone feels the need for room correction, they have yet to hear a system that sounds so good that they wouldn’t dream of messing it up with another layer of their own ‘EQ’.

Another possibility is that they are making an unwarranted assumption based on the fact that there are large objective differences between the recorded waveform and what reaches the listener’s ears in a real room. That must mean that no matter how good it sounds, there’s an error. It could sound even better, right?


A reviewer of the Kii Three found that this particularly neutral speaker sounded perfect straight out of the box.

“…the traditional kind of subjective analysis we speaker reviewers default to — describing the tonal balance and making a judgement about the competence of a monitor’s basic frequency response — is somehow rendered a little pointless with the Kii Three. It sounds so transparent and creates such fundamentally believable audio that thoughts of ‘dull’ or ‘bright’ seem somehow superfluous.”

The Kii Three does, however, offer a number of preset “contour” EQ options. As I shall describe later, I think that a variation on this is all that is required to refine the sound of any well-designed neutral speaker in most rooms.

A distinction is often made between correction of the bass and higher frequencies. If the room is large, and furnished copiously, there may be no problem to solve in either case, and this is the ideal situation. But some bass manipulation may be needed in many rooms. At a minimum, the person with sealed woofers needs the roll-off at the bottom end to start at about the right frequency for the room. This, in itself, is a form of ‘room correction’.

The controversial aspect is the question of whether we need ‘correction’ higher up. Should it be applied routinely (some people think so), as sparingly as possible, or not at all? And if people do hear an improvement, is that because the system is inherently correcting less-than-ideal speakers rather than the room?

Here are some ways of looking at the issue.

  1. Single room reflections give us echoes, while multiple reflections (of reflections) give us reverberation. Performing a frequency response measurement with a neutral transducer and analysing the result may show a non-flat FR at the listening position even when smoothed fairly heavily. This is just an aspect of statistics, and of the geometry and absorptivity of the various surfaces in the room. Some reflections will result in some frequencies summing in phase, to some extent, and others not.
  2. Experience tells us that we “hear through” the room to any acoustic source. Our hearing appears not to be just a frequency response analyser, but can separate direct sound from reflections. This is not a fanciful idea: adaptive software can learn to do the same thing.

The idea is also supported by some of the great and the good in audio.

Floyd Toole:

“…we humans manage to compensate for many of the temporal and timbral variations contributed by rooms and hear “through” them to appreciate certain essential qualities of sound sources within these spaces.”

Or Meridian’s Bob Stuart:

“Our brains are able to separate direct sound from the reverberation…”

  3. If we EQ the FR of the speaker to obtain a flat in-room measured response including the reflections in the measurement, it seems that we will subsequently “hear through” the reflections to a strangely-EQ’ed direct sound. It will, nevertheless, measure ‘perfectly’.
  4. Audio orthodoxy maintains that humans are supremely insensitive to phase distortion, and this is often compounded with the argument that room reflections completely swamp phase information so it is not worth worrying about. This denies the possibility that we “hear through” the room. Listening tests in the past that purportedly demonstrated our inability to hear the effects of phase have often been based on mono only, and didn’t compare distorted with undistorted phase examples – merely distorted versus differently distorted, played on the then available equipment.
  5. Contradicting (4), audiophiles traditionally fear crossovers because the phase shifts inherent in (non-DSP) crossovers are, they say, always audible. DSP, on the other hand, allows us to create crossovers without any phase shift i.e. they are ‘transparent’.
  6. At a minimum, speaker drivers on their baffles should not ‘fight’ each other through the crossover – their phases should be aligned. The appropriate delays then ensure that they are not ‘fighting’ at the listener’s position. The next level in performance is to ensure that their phases are flat at all frequencies i.e. linear phase. The result of this is the recorded waveform preserved in both frequency and time.
  7. Intuitively, genuine stereo imaging is likely to be a function of phase and timing. Preserving that phase and timing should probably be something we logically try to do. We could ‘second guess’ how it works using traditional rules of thumb, deciding not to preserve the phase and timing, but if it is effectively cost-free to do it, why not do it anyway?
  8. A ‘perfect’ response from many speaker/room combinations can be guaranteed using DSP (deconvolution with the impulse response at that point, not just playing with a graphic equaliser). Unfortunately, it will only be valid for a single point in space, and moving 1 mm from there will produce errors and unquantifiable sonic effects. Additionally, ‘perfect’ refers to the ‘anechoic chamber’ version of the recording, which may not be what most people are trying to achieve even if the measurements they think they seek mean precisely that.
  9. Room effects such as (moderate) reverberation are a major difference between listening with speakers versus headphones, and are actually desirable. ‘Room correction’ would be a bad thing if it literally removed the room from the sound. If that is the case, what exactly do we think ‘room correction’ is for?
  10. Even if the drivers are neutral (in an anechoic situation) and crossed over perfectly on axis, they are of finite size and mounted in a box or on a baffle that has a physical size and shape. This produces certain frequency-dependent dispersion characteristics which give different measured, and subjective, results in different rooms. Some questions are:
    • is this dispersion characteristic a ‘room effect’ or a ‘speaker effect’, or both?
    • is there a simple objective measurement that says one result is better than any other?
    • is there just one ‘right’ result and all others are ‘wrong’?
  11. Should room correction attempt to correct the speaker as well? Or should we, in fact, only correct the speaker? Or just the room? If so, how would we separate room from speaker in our measurements? Can they, in fact, be separated?
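
To make the point about DSP crossovers without phase shift concrete, here is a minimal sketch of a two-way linear-phase split. It is an illustration under assumptions, not anyone's actual crossover: the function names, the 255-tap length and the 2 kHz crossover point are all hypothetical choices. The low-pass is a symmetric windowed-sinc FIR, and the high-pass is formed as "pure delay minus low-pass", so the two electrical outputs sum back to a delayed copy of the input.

```python
import math

def lowpass_fir(num_taps, cutoff_hz, sample_rate_hz):
    """Windowed-sinc low-pass FIR; symmetric taps give linear phase."""
    assert num_taps % 2 == 1, "odd length keeps the delay a whole number of samples"
    mid = (num_taps - 1) // 2
    fc = cutoff_hz / sample_rate_hz  # cutoff as a fraction of the sample rate
    taps = []
    for n in range(num_taps):
        k = n - mid
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        window = 0.5 - 0.5 * math.cos(2 * math.pi * n / (num_taps - 1))  # Hann
        taps.append(ideal * window)
    gain = sum(taps)
    return [t / gain for t in taps]  # normalise to unity gain at DC

def complementary_highpass(lp_taps):
    """High-pass = (pure delay) - low-pass, so both branches sum to a pure delay."""
    mid = (len(lp_taps) - 1) // 2
    return [(1.0 if n == mid else 0.0) - t for n, t in enumerate(lp_taps)]

# Hypothetical two-way split at 2 kHz, 44.1 kHz sample rate.
lp = lowpass_fir(255, 2000.0, 44100.0)
hp = complementary_highpass(lp)
summed = [a + b for a, b in zip(lp, hp)]  # a single 1.0 at the centre tap
```

This only shows that the filters themselves reconstruct the waveform. Real drivers add their own responses and offsets, which is where per-driver correction and delays come in.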

I think there is a formula that gives good results. It says:

  • Don’t rely on feedback from in-room measurements, but do ‘neutralise’ the speaker at the most elemental levels first. At every stage, go for the most neutral (and locally correctable) option e.g. sealed woofers, DSP-based linear phase crossovers with time alignment delays.
  • Simply avoid configurations that are going to give inherently weird results: two-way speakers, bass reflex, many types of passive crossover etc. These may not even be partially correctable in any meaningful way.
  • Phase and time alignment are sacrosanct. This is the secret ingredient. You can play with minor changes to the ‘tone colour’ separately, but your direct sound must always maintain the recording’s phase and time alignment. This implies that FIR filters must be used, thus allowing frequency response to be modified independently of phase.
  • By all means do all the good stuff regarding speaker placement, room treatments (the room is always ‘valid’), and avoiding objects and asymmetry around the speakers themselves.
  • Notionally, I propose that we wish to correct the speaker not the room. However, we are faced with a room and non-neutral speaker that are intertwined due to the fact that the speaker has multiple drivers of finite size and a physical presence (as opposed to being a point source with uniform directivity at all frequencies). The artefacts resulting from this are room-dependent and can never really be ‘corrected’ unambiguously. Luckily, a smooth EQ curve can make the sound subjectively near enough to transparent. To obtain this curve, predict the baffle step correction for each driver using modelling or a standard formula, with some trial-and-error regarding the depth required (4, 5, 6 dB?); this is a very smooth EQ curve. Or, possibly (I haven’t done this myself), make many FR measurements around the listening area, smooth and average them together, and partially invert this, again without altering phase and time alignment.
  • You are hearing the direct sound, plus separately-perceived ‘room ambience’. If you don’t like the sound of the ambience, you must change the room, not the direct sound.
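
The untried alternative in the list above (measure at several positions, average, smooth, partially invert) can be sketched as follows. This is a rough illustration only: the data are made up, the function names are hypothetical, the averaging is done directly on dB values for simplicity, and the smoothing is a plain moving average rather than the fractional-octave smoothing a measurement package would normally use.

```python
def average_db(measurements):
    """Point-by-point mean of several magnitude responses (dB) on a shared frequency grid."""
    count = len(measurements)
    return [sum(m[i] for m in measurements) / count for i in range(len(measurements[0]))]

def smooth(values, half_width):
    """Moving average over neighbouring points; the window shrinks at the edges."""
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half_width), min(len(values), i + half_width + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out

def partial_inverse(curve_db, fraction):
    """Correction in dB: invert only a fraction of the deviation from the mean level."""
    mean_level = sum(curve_db) / len(curve_db)
    return [-fraction * (v - mean_level) for v in curve_db]

# Hypothetical data: three microphone positions, five frequency points (dB).
positions = [
    [3.0, 1.0, 0.0, -1.0, -3.0],
    [5.0, 2.0, 0.0, -2.0, -5.0],
    [4.0, 0.0, 0.0, 0.0, -4.0],
]
avg = average_db(positions)
smoothed = smooth(avg, half_width=1)
correction_db = partial_inverse(smoothed, fraction=0.5)
```

The resulting correction curve would then be realised as a magnitude-only (linear-phase) filter so that phase and time alignment stay untouched.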

Is there any scientific evidence for these assertions? No more nor less than any other ‘room correction’ technique – just logical deduction based on subjective experience. Really, it is just a case of thinking about what we hear as we move around and between rooms, compared to what the simple in-room FR measurements show. Why do real musicians not need ‘correction’ when they play in different venues? Do we really want ‘headphone sound’ when listening in rooms? (If so, just wear headphones or sit closer to smaller speakers).

This does not say that neutral drivers alone are sufficient to guarantee good sound – I have observed that this is not the case. A simple baffle step correction applied to frequency response (but leaving phase and timing intact) can greatly improve the sound of a real loudspeaker in a room without affecting how sharply-imaged and dynamic it sounds. I surmise that frequency response can be regarded as ‘colour’ (or “chrominance” in old school video speak), independent of the ‘detail’ (or “luminance”) of phase and timing. We can work towards a frequency response that compensates for the combination of room and speaker dispersion effects to give the right subjective ‘colour’ as long as we maintain accurate phase and timing of the direct sound.
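
As a sketch of what "changing frequency response while leaving phase and timing intact" can look like, the code below builds a symmetric (linear-phase) FIR filter from a smooth shelving curve. Everything specific here is an assumption for illustration: the shelf shape is a simple made-up function rather than a modelled baffle response, the 500 Hz step frequency and 6 dB depth are arbitrary, and the frequency-sampling design with a Hann window is just one of several ways to get such a filter.

```python
import math

def shelf_gain_db(freq_hz, step_hz, depth_db):
    """Illustrative smooth shelf: 0 dB at low frequencies, -depth_db well above step_hz."""
    ratio_sq = (freq_hz / step_hz) ** 2
    return -depth_db * ratio_sq / (1.0 + ratio_sq)

def linear_phase_fir(num_taps, sample_rate_hz, gain_db_fn):
    """Frequency-sampling design: zero-phase target, centred and Hann-windowed."""
    assert num_taps % 2 == 1
    mid = (num_taps - 1) // 2
    # Target linear magnitude at DFT bins 0 .. mid.
    mags = [10 ** (gain_db_fn(k * sample_rate_hz / num_taps) / 20.0) for k in range(mid + 1)]
    taps = []
    for n in range(num_taps):
        m = n - mid
        # Inverse DFT of a real, even spectrum reduces to a cosine sum.
        value = mags[0] + 2.0 * sum(
            mags[k] * math.cos(2 * math.pi * k * m / num_taps) for k in range(1, mid + 1)
        )
        window = 0.5 - 0.5 * math.cos(2 * math.pi * (n + 1) / (num_taps + 1))
        taps.append(window * value / num_taps)
    return taps

def gain_db_at(taps, freq_hz, sample_rate_hz):
    """Magnitude of the filter's response at one frequency, in dB."""
    w = 2 * math.pi * freq_hz / sample_rate_hz
    re = sum(t * math.cos(w * n) for n, t in enumerate(taps))
    im = sum(t * math.sin(w * n) for n, t in enumerate(taps))
    return 20 * math.log10(math.hypot(re, im))

taps = linear_phase_fir(255, 44100.0, lambda f: shelf_gain_db(f, 500.0, 6.0))
```

Because the taps are symmetric about the centre, the filter delays every frequency by the same 127 samples. The shelf changes the 'colour' only. With so few taps the windowing blurs the curve slightly at the lowest frequencies, so a real design would likely use a longer filter.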

We are not (necessarily) trying to flatten the in-room FR as measured at the listener’s position – the EQ we apply is very smooth and shallow – but the result will still be perceived as a flat FR. Many (most?) existing speakers inherently have this EQ built in whether their creators applied it deliberately, or via the ‘voicing’ they did when setting the speaker up for use in an average room.

In summary:

  • Humans “hear through” the room to the direct sound; the room is perceived as a separate ‘ambience’. Because of this, ‘no correction’ really is the correct strategy.
  • Simply flattening the FR at the listening position via EQ of the speaker output is likely to result in ‘peculiar’ perceived sound, even if the in-room measurements purport to say otherwise.
  • Speakers have to be as rigorously neutral as possible by design, rather than attempting to correct them by ‘global feedback’ in the room.
  • Final refinement is a speaker/room-dependent, smooth, shallow EQ curve that doesn’t touch phase and timing – only FIR filters can do this.

[Last updated 05/04/17]


Software: the future of audio

Last night, on a whim, I decided that I would like my active crossover software to display some sort of indication of the output levels being sent to the DACs. This is quite important, and something that I should have tackled quite a while ago. Basically, we should be worried about clipping, and also ‘overs’ i.e. those interpolated samples that are generated by DAC reconstruction filters in between the recorded samples and which have the potential to clip even though the recording does not, directly. By messing around with various types of driver correction and so on, am I running the risk of clipping? Or, am I wasting DAC resolution by needlessly attenuating my DAC outputs too much?

Here is how easy it was to display the information in a useful and aesthetically pleasing way:

  • I created six vertical rectangular areas on the active crossover app’s screen – one bargraph for each DAC output.
  • I decided upon a linear percentage display (not dB) and an update rate of 10 Hz
  • A timer was set to trigger at 10 Hz (the timer is provided by the GTK GUI library) and call the function to draw the six bargraphs
  • In the output function for the DACs, I take the absolute value of each sample as I write it to the DAC and compare it to the maximum recorded so far for that channel (out of six channels). I overwrite the maximum if it is exceeded. There is a ‘mutex’ interlock around the maximum value to prevent the bargraph drawing function from accessing it at the same moment.
  • The bargraph drawing function for each channel accesses that maximum recorded value and saves it. The maximum value for that channel is then reset to zero. The saved value is compared against that bargraph’s previous displayed value. If it is greater, a coloured rectangle is drawn directly proportional in length to the value. If it is less, the previous value is multiplied by 0.9, and the rectangle drawn to that height, instead. With this simple system, we have a PPM-style display that shows signal peaks that slowly decay.
  • The bargraph display function also records an absolute maximum for that channel, which doesn’t get reset. This value is displayed as a red horizontal line, thus showing the maximum output level for that particular listening session.
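
The steps above can be sketched as a small class holding the state for one channel. This is an illustration in Python rather than the actual crossover code: the class and method names are hypothetical, levels are fractions of full scale, and the drawing itself (the GTK rectangle and the red line) is left out.

```python
import threading

class PeakMeter:
    """Peak-hold bargraph state for one DAC channel (levels are fractions of full scale)."""

    DECAY = 0.9  # per-redraw decay factor, as described in the list above

    def __init__(self):
        self._lock = threading.Lock()   # the 'mutex' around the running maximum
        self._peak_since_draw = 0.0
        self.displayed = 0.0            # current bar height
        self.session_max = 0.0          # position of the red line, never reset

    def on_sample(self, sample):
        """Called from the audio output path for every sample written to the DAC."""
        level = abs(sample)
        with self._lock:
            if level > self._peak_since_draw:
                self._peak_since_draw = level

    def on_redraw(self):
        """Called by the GUI timer (10 Hz in the text); returns the bar height to draw."""
        with self._lock:
            peak = self._peak_since_draw
            self._peak_since_draw = 0.0
        if peak > self.displayed:
            self.displayed = peak           # jump straight up to a new peak
        else:
            self.displayed *= self.DECAY    # otherwise decay slowly
        self.session_max = max(self.session_max, peak)
        return self.displayed
```

A six-channel display would simply keep six of these, one per DAC output, and call `on_redraw()` on each from the timer callback.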

The result is one of those attractive arrays of VU meters that dance in response to the incoming signal levels. The results were interesting, and will alert me to any future mis-steps with regard to clipping – it still doesn’t tackle the issue of ‘overs’ directly, however.
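
For what it is worth, 'overs' can be estimated in software by interpolating between the samples and looking at the peak of the interpolated waveform. The sketch below is one assumed way of doing that (4x windowed-sinc interpolation with hypothetical function and parameter names), not something the meter described here does. The test tone is the classic worst case: a sine at a quarter of the sample rate whose samples all sit at exactly full scale while the underlying waveform peaks about 3 dB higher.

```python
import math

def oversampled_peak(samples, factor=4, half_width=32):
    """Estimate the reconstructed peak by windowed-sinc interpolation between samples."""
    peak = 0.0
    for i in range(half_width, len(samples) - half_width):
        for step in range(factor):
            frac = step / factor  # position between sample i and sample i + 1
            value = 0.0
            for k in range(-half_width + 1, half_width + 1):
                x = frac - k  # distance from the interpolation point to sample i + k
                sinc = 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)
                window = 0.5 + 0.5 * math.cos(math.pi * x / half_width)  # Hann taper
                value += samples[i + k] * sinc * window
            peak = max(peak, abs(value))
    return peak

# Sine at fs/4 with a 45 degree phase offset, scaled so every sample is +/-1.0.
tone = [math.sqrt(2) * math.sin(math.pi * n / 2 + math.pi / 4) for n in range(128)]
sample_peak = max(abs(s) for s in tone)   # what a sample-based meter sees
true_peak = oversampled_peak(tone)        # noticeably above full scale
```

A sample-based bargraph would show this tone sitting exactly at 100%, yet a DAC's reconstruction filter would be asked to produce a waveform roughly 41% larger.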

But the reason for mentioning it is to show the power and simplicity of engineering with software. To build a PPM meter in hardware and wire it all up would not be trivial, and would take days, weeks or months for a commercial product. In software, it takes less than an hour and a half to construct it from scratch. Audio processing functions are equally simple to create and integrate within the system. It seems clear that once the basic DSP ‘engine’ is in place, complex audio systems can be put together like Lego. A perfectly capable three-way speaker can be built in days. It is not too hard to see how a three-way, six channel DSP system could simply be scaled up to create something like the Beolab 90.

Is this an exciting trend, or the end of everything that makes audio interesting? I think it is the former, but I can see that many traditionalists might disagree.

Active crossover running on fanless PC

I bought a Sumvision Cyclone Mini PC for experimenting with running my active crossover software on a fanless PC. It’s no more than a tablet in a box, but it’s quad core and runs 64 bit Linux on an Intel Atom Bay Trail chipset and, presumably, can perform GFLOPS without dissipating more than a few watts – that’s really quite amazing but it’s so easy to take such things for granted these days! It comes pre-loaded with Windows 8.1 and it was a pain to make it work with Linux. I relied heavily on a guide on the internet – thanks to the person who provided it.

Undoubtedly it will be much easier to install Ubuntu on one of these PCs in the future when the Linux people have caught up with the hardware. Neither the WiFi nor the on-board audio works yet, so I am using a USB dongle for WiFi and the Asus Xonar U7 for audio (which I would be using anyway). Interestingly (to me anyway) I was able to remove Pulse Audio (I think) from this version of Ubuntu without affecting the other system settings [I think this was a fluke: removing pulseaudio properly is impossible, and what I really should do is merely set “autospawn=no” in /etc/pulse/client.conf and reboot].

Absurdly, once Linux was installed following the guide, it worked straight away with the Xonar U7 and my crossover software, plus 64 bit Spotify.

I am assuming I could plug in a common or garden USB DVD drive for playing CDs without a problem [tried this and it works fine].

While running the active crossover software and streaming from Spotify, according to psensor the core temperatures are stabilised at about 56-60 degrees C in an ambient room temperature of 22 degrees C, and overall CPU usage is about 18%.

UPDATE 13/09/15

Things haven’t worked out quite as smoothly as I thought: on the fanless PC I have been getting occasional glitches in the audio, in the form of a click audible within the music, perhaps once every 10 minutes on average. These don’t occur in silent sections of the music so I am assuming that it is a case of missing, or extra, samples rather than corrupted samples. I didn’t notice this when running the code on a Pentium IV based desktop machine.

As a result, I have made major changes to the software, reducing the number of threads from three to one (plus a default thread for the GUI – which is currently just a mute checkbox for each driver). There are suggestions on the web that the ALSA functions are not ‘thread safe’. So now, all the ALSA audio and DSP processing runs in a single thread and all ALSA calls are non-blocking. This arrangement dispenses with the necessity to lock various circular buffer pointers with mutexes when accessing them, so the code is now more stripped back and simpler to understand.

The main motivation for multi-threading originally was that I assumed that the OS would assign threads to different cores, so for coolest running it would be best to share the computation load across several threads. Therefore I expected to see the CPU load on one of the cores go up as a result of combining three threads into one, but it doesn’t seem to have greatly affected the CPU load traces, nor the core temperatures.

No glitches so far.

[UPDATE 29/10/15]

Still had the glitch problem! It wasn’t happening on a P4 desktop minitower PC, but on the Sumvision I might get a glitch once in ten minutes. Nothing drastic, but I found myself on edge waiting for it. Really, any glitches are unacceptable, even if only one every three hours.

Is it related to input or output? As an experiment I modified the code to stream the incoming audio to a file while playing music. When I heard a glitch I noted down the time it occurred. Examining the data in the audio editor app Audacity I found a discontinuity in the waveform. Gotcha! In order to test for this problem reliably I created an audio file containing a continuously-repeating ramp waveform. In my program I added a check on consecutive samples to flag up any discontinuities. Sure enough, the problem only occurred occasionally, but it always happened eventually. Playing with threads etc. didn’t get rid of it.
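
The consecutive-sample check on a ramp can be sketched like this. It is an assumed reconstruction of the idea, not the actual code: the function name is hypothetical and the ramp is a plain integer sawtooth.

```python
def find_discontinuities(samples, step, period):
    """Return indices where a repeating ramp fails to advance by `step` (modulo `period`)."""
    bad = []
    for i in range(1, len(samples)):
        expected = (samples[i - 1] + step) % period
        if samples[i] != expected:
            bad.append(i)
    return bad

# Hypothetical test signal: an integer ramp 0..999 that repeats.
ramp = [n % 1000 for n in range(5000)]
glitched = ramp[:2000] + ramp[2003:]  # simulate three dropped samples

clean_report = find_discontinuities(ramp, step=1, period=1000)       # []
glitch_report = find_discontinuities(glitched, step=1, period=1000)  # [2000]
```

Because every sample of a ramp is predictable from the one before it, a single missing or repeated sample is caught immediately, which is much more reliable than listening for clicks in music.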

In desperation I started to look at the open source code for the snd-aloop driver I am using as my bridge between audio player apps and my code. I found a mysterious system whereby there are separate ‘rate shifts’ (the programmable sample rate I am relying on in my code) for playback and capture. I don’t really understand this: unless playback and capture are locked together (at least on average), it seems to me that they must eventually diverge and cause audio discontinuities. I bodged the snd-aloop source code in order to precisely lock together the playback and capture ‘deltas’. This sort of thing is outside my comfort zone. I had to re-compile the snd-aloop driver and use the Linux command insmod to load it into the system.

It worked. I now get zero errors no matter how long the system is running. The difference between the Sumvision and the old P4 may be explained by the fact that the PCs’ clocks were quite a bit different, and much more rate shift was necessary in the Sumvision in order to synchronise with the sound card.

I still think I am making this harder than it needs to be. Do I need the snd-aloop driver at all? Can it all be done with ALSA plugins? One Linux guru said I should write my program as a plugin itself. It has occurred to me that at least I can now modify snd-aloop in order to make it work as I want: not with its own sample rate at all, but merely as a relay from the capture demand to the playback demand.

But the bottom line is that the system is now working perfectly on the Sumvision Cyclone.

02/12/16: It seems that there will be no ‘official’ version of Ubuntu for the Bay Trail and Cherry Trail chipsets, but someone called Ian Morrison (a.k.a. “Linuxium”) has created an installer and very kindly made it available. I found that his version of Ubuntu 16.04 seemed not to boot on the Sumvision Cyclone, but 16.10 appears to be fine. I haven’t transferred my software over to it yet but my second Sumvision Cyclone appears to be working fine, with Wi-Fi. Many thanks to Ian for this. I wouldn’t know where to start in creating such a thing.

UPDATE 30/01/17: I just spent quite a large proportion of my Christmas break worrying about, and trying to fix, an issue that arose after I put the latest version of Ubuntu (mentioned above) onto a Sumvision Cyclone and installed my active crossover software. Glitches were back!

I cannot tell you how many fruitless attempts I made to solve it. I narrowed it down to the DAC output, where some zero-value samples were being substituted in the analogue output, but with the overall timing remaining correct. There were no EPIPE (buffer underrun) errors.

I created a ‘glitch detector’ where I generated a waveform from the DAC’s analogue output and fed it via a cable into the microphone input, looking for excessive sample-to-sample amplitude changes. Glitches would always occur, but it seemed worse when loading web pages etc.
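
A threshold for "excessive" sample-to-sample change can be derived rather than guessed if the test waveform is a sine: for amplitude A, frequency f and sample rate fs, consecutive samples can never differ by more than 2·A·sin(π·f/fs). The sketch below uses that bound with a safety margin. The choice of a 1 kHz sine, the 1.5x margin and the function names are assumptions for illustration. A real loop through an analogue cable would also need allowance for noise and gain error.

```python
import math

def max_step(amplitude, freq_hz, sample_rate_hz):
    """Largest possible difference between consecutive samples of a sine wave."""
    return 2.0 * amplitude * math.sin(math.pi * freq_hz / sample_rate_hz)

def detect_glitches(samples, limit):
    """Indices where the sample-to-sample change exceeds `limit`."""
    return [i for i in range(1, len(samples)) if abs(samples[i] - samples[i - 1]) > limit]

FS = 44100.0
clean = [0.5 * math.sin(2 * math.pi * 1000.0 * n / FS) for n in range(4410)]
limit = 1.5 * max_step(0.5, 1000.0, FS)  # 50% margin over the theoretical maximum

damaged = list(clean)
damaged[1000] = 0.0  # simulate a zero-value sample substituted into the stream
```

Running `detect_glitches(damaged, limit)` flags both edges of the substituted sample, while the clean waveform produces no reports.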

Finally, I think I hit upon the solution in this forum topic:

Problem with Bay Trail and new kernels

It seems that recent versions of the Linux kernel have changed something regarding ‘C-states’, related to the way the processor cores are dynamically put into low power modes when idle in order to reduce average power consumption. With the new, more aggressive, power saving, they take longer to start up again (flushing pipelines etc.), and this has been causing Bay Trail setups to freeze completely. It is still being discussed as a live issue on Intel forums. I think I have been suffering from another side effect of this misguided change.

There is a workaround, which is to specify a boot option to keep the cores relatively ‘alive’ at all times. (There may also be a BIOS setting that I could have changed, too). It seems to have fixed my problem completely.

If this turns out to be the issue, it highlights the fragile nature of any IT-based product. An innocuous update in the operating system can kill your product because of real time issues; there is no amount of testing that can be done by the OS people that can eliminate the potential for problems in users’ own applications.

At one time, I naively thought that it was possible to put together an embedded PC-based system that could be ‘frozen’ and would always work, and could always be duplicated, but I have long given up hope on that. Embarking on any digital audio scheme based on a PC implies a commitment to constant maintenance, in a way no different from the constant maintenance you commit to when using, say, reel-to-reel tape recorders.

Linux Active Crossover is working!

[Update: now running on a fanless Bay Trail processor]

After a few evenings of half-hearted attempts to port my Windows code and make the changes needed to run on Linux, I finally got my head around what was needed, and it works! Unfortunately I’m not at the house where the amp and speakers are so I can’t try it ‘in anger’ but at least I can tell that I’m getting what sounds like correctly-filtered Spotify or CD from the three stereo outputs.

On a ten year old Dell GX520 it’s using about 16% of the CPU, and when you add in Spotify at about another 16% plus the snd-aloop driver and all the other stuff going on in an internet-connected PC, it comes to about 40% CPU, which is a bit higher than I had hoped – there’s a tiny amount of fan noise. Maybe there is scope to improve the efficiency of the crossover software: at the moment I am reading and writing 32 bit integers to/from the sound cards (one is a dummy sound card of course) but doing all the processing in floating point which therefore involves converting each sample twice with a potentially expensive operation. Maybe this can be speeded up. And I can always find a faster, cooler PC of course.
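
The two conversions per sample mentioned above amount to a scale on the way in and a scale, round and clamp on the way out. A minimal sketch of the arithmetic, with hypothetical function names and full scale taken as 2^31:

```python
INT32_SCALE = 2 ** 31  # magnitude of the most negative 32 bit sample

def int32_to_float(sample):
    """Map a signed 32 bit sample onto the range -1.0 .. just under +1.0."""
    return sample / INT32_SCALE

def float_to_int32(value):
    """Scale back to integer, rounding and clamping so overshoots cannot wrap around."""
    scaled = int(round(value * INT32_SCALE))
    return max(-INT32_SCALE, min(INT32_SCALE - 1, scaled))
```

The clamp on output matters: without it, a filtered sample slightly above full scale would wrap to a large negative value instead of merely clipping.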

[13/07/15] In response to a comment, the point of all this is not just to implement basic crossover filtering, but to correct the drivers’ individual responses based on measurements, producing zero phase shift for each driver, and therefore perfect (or as close as possible) acoustic crossovers and zero overall phase shift. EQ such as baffle step correction is overlaid onto the filters’ responses without costing anything extra in CPU power. Individual driver delays are also added. I am not claiming this is unique, but nor is it commonplace. In terms of an active crossover it is the no-compromises version.

I have had this system working for a couple of years on a Windows PC, but Linux will be a cheaper and more elegant solution.

[UPDATE 18/07/15] I have it running with the speakers with a choice of two sound cards: Asus Xonar DS and Creative X-Fi. It’s just a case of changing a few characters in the xover config file.

The control loop algorithm for maintaining the average sample rate at input and output (and avoiding any resampling) is an interesting problem to solve and I have had fun trying different algorithms based on PID loops and plotting the result out as a graph. The output sample rate is fixed, set by the card, and has to be inferred from the time between calls to send chunks of data to the output card but there will be a level of jitter on this due to the other things that the multi-threaded program is doing. We know the precise sample rate at the input (the snd-aloop loopback driver) because we are setting it. The aim is to keep the difference between number of samples read and number of samples output to the DACs at a constant level, but as we are sending and receiving chunks of data the instantaneous figure is fluctuating all the time. I presume that similar calculations are being performed in the adaptive resampling that would be usual when connecting together digital audio systems with differing sample rates – the difference being that this would affect the audio (subtly, but it undeniably would), while the aim of my scheme is that the timing adjustments merely affect the fill level of a FIFO, the sample rate being rigidly fixed and defined by the DAC.
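
To give a flavour of the control problem, here is a toy simulation of a loop that nudges the source rate so that the FIFO fill level stays constant. It is a sketch under heavy assumptions and not the algorithm actually used: it is a PI loop rather than a full PID, the gains are arbitrary, the DAC rate is treated as known and perfectly regular (whereas in practice it has to be inferred from jittery callback timing), and all names are hypothetical.

```python
def simulate_rate_lock(dac_rate_hz, nominal_hz=44100.0, chunk=1024,
                       target_fill=4096.0, kp=2.0, ki=0.05, steps=2000):
    """PI loop adjusting the source rate so the FIFO fill level stays near target_fill."""
    fill = target_fill
    integral = 0.0
    source_rate = nominal_hz
    for _ in range(steps):
        period_s = chunk / dac_rate_hz          # time for the DAC to consume one chunk
        fill += source_rate * period_s - chunk  # samples produced minus samples consumed
        error = target_fill - fill
        integral += error
        source_rate = nominal_hz + kp * error + ki * integral
    return source_rate, fill

# Hypothetical DAC running about 50 ppm fast relative to the nominal 44.1 kHz.
final_rate, final_fill = simulate_rate_lock(44102.2)
```

In this idealised setting the source rate settles on the DAC's true rate and the fill level returns to its target. The integral term is what removes the steady offset that a purely proportional loop would leave.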

[UPDATE 31/07/15]

Feeling confident, I bought an Asus Xonar U7 USB 7.1 sound card. This is based on the CM6632A chipset. I got it working but… trying to set the format to signed 32 bit within my program failed when addressing the device as “hw”. It also failed with S24_3LE and various other sample formats. However, 16 bit was accepted. Consulting the web, people commonly seem to have this issue with both CM6631A and CM6632A on Linux, and their workaround is simply to use “plughw” instead. However, if the “hw” device rejects a format, then, supposedly, the hardware cannot support it. All the “plughw” device does is automatically allow the OS to convert samples from the format you are using into one that the card can use. So I have a feeling that the card is only running in 16 bit mode, regardless of what my code is sending it.

If an application chooses a PCM parameter (sampling rate, channel count or sample format) which the hardware does not support, the hw plugin returns an error. Therefore the next most important plugin is the plug plugin which performs channel duplication, sample value conversion and resampling when necessary.

[03/08/15 UPDATE] Got back to the house where my system lives after the weekend, and was able to try my Asus Xonar U7 again. This time it accepted S24_3LE! Could this be the issue with hot-plugging versus not hot-plugging that other people on the web have seen? I have a feeling that my previous tests were with the U7 hot-plugged into a PC that was already on. Anyway, I now seem to be in business with the U7 and it sounds good.

Linux-based active crossover: getting there

A few weeks ago I wrote about my desire to dump Windows and to go with Linux for audio. The aim is to create an active crossover system that is the best of all worlds:

  • completely flexible, programmable down to bit level (I am going to program it – or pretty much port my existing code from Windows)
  • powerful enough to implement any type of filtering (large FIRs in particular)
  • not dependent on specific hardware – can use a variety of low cost PCs including old PCs at the back of the cupboard, fanless, compact, low powered, dedicated DSP cards.
  • all libraries, drivers, compilers are open source; not beholden to commercial companies
  • capable of streaming from a variety of sources without sample conversion
  • not bogged down with continuous updates and anti-virus shenanigans

The goal is to use DSP to replace the passive crossovers that so degrade conventional speakers’ performance, not merely to use the PC as a ‘media hub’. The Linux-based audio system can do this, and despite its workaday image represents the ultimate hi-fi source component. Hi-fi sustains an industry, and hordes of enthusiasts are prepared to spend real money on it. What an interesting thought, therefore, to realise that as a source there will never be any need for a better component than the ‘Linux box’. Here exists a system, a general purpose number cruncher that is powerful enough for all audio applications, bristling with connectivity, easy to equip with digital to analogue converters whose raw fidelity has long surpassed the limits of human hearing, and yet (if you use an old, surplus PC) costs less than a Christmas cracker toy to own – unlike an equivalent Windows PC.

Details, details

Regardless, reading around the web on the subject, for my active crossover system I seem to either have unique requirements that no one has ever thought of, or my requirements are just so trivial as to be not even worth writing down by anyone. I am still not sure which it is…

On the face of it Linux seems to have audio covered and then some, but in amongst the fantastically comprehensive JACK solution I don’t really feel I know what is going on. It feels like overkill. Is the audio being resampled? I think I need a simpler solution.

Just to summarise the thinking behind my requirements:

  • I want to design my own DSP system rather than trying to adapt existing systems.
  • I want to be able to understand exactly what is going on.
  • Dedicated digital signal processing systems are relatively expensive, often not very powerful, and in order to get the most out of them they may entail a considerable learning curve without the effort being applicable elsewhere, whereas PCs running Linux are ridiculously powerful and cheap.
  • Linux can be installed on any PC for free, and there is no danger of The Powers That Be decreeing that it must be ‘upgraded’, with the high chance that the system will be broken by the upgrade. For example, the mandatory ‘upgrade’ from XP to Windows 7 broke my current system, entailing the fitting of a second sound card due to a change in functionality of a sound card driver. And it cost money.
  • I want the best of all worlds: to be able to program the system at low level as though it is a microcontroller sending samples to a DAC, but for it also to have nice GUIs, play CDs, run Spotify without the need for any other piece of hardware linked with a cable.
  • It would be nice if the system would run on any old PC e.g. fanless.
  • It would be nice to be able to use any sound card as the multichannel DAC.
  • I don’t want the system to resample the audio. This is the ‘killer’ requirement that, I think, most people never give a second thought to.

That last requirement is what the whole thing is about. It is nothing to do with conversion between 48 kHz and 44.1 kHz, or 96 kHz and 192 kHz, but is about the resampling that would be necessary in going from 44.09999 kHz to 44.10001 kHz, for example; if the source and DAC are at nominally the same sample rate but use separate crystal clocks, they will drift apart over time. This can be handled using adaptive resampling of the audio stream in software. Resampling would involve extra DSP, so even if I was happy that no audible degradation was occurring, it would be sapping more CPU power than was necessary, or relying on a particular type of sound card that does its own resampling.
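To put a number on that drift, here is a rough back-of-envelope sketch in Python. It takes 44.09999 kHz and 44.10001 kHz as the example pair of rates; the function names and the 4096-sample FIFO headroom are my own illustrative assumptions, not measurements from any real card.

```python
def seconds_per_slipped_sample(source_hz: float, dac_hz: float) -> float:
    """Time for two free-running clocks to drift apart by one whole sample."""
    return 1.0 / abs(source_hz - dac_hz)


def seconds_until_fifo_limit(source_hz: float, dac_hz: float,
                             headroom_samples: int) -> float:
    """Time before a FIFO with the given headroom over-runs or under-runs."""
    return headroom_samples * seconds_per_slipped_sample(source_hz, dac_hz)


# Example rates in Hz (44.09999 kHz and 44.10001 kHz).
source_hz = 44_099.99
dac_hz = 44_100.01

slip = seconds_per_slipped_sample(source_hz, dac_hz)
hours = seconds_until_fifo_limit(source_hz, dac_hz, headroom_samples=4096) / 3600

print(f"one sample of slip roughly every {slip:.0f} s")
print(f"4096 samples of headroom would last roughly {hours:.1f} hours")
```

The point is only that the mismatch is tiny but never zero: without resampling or some form of synchronisation, any finite buffer eventually runs out in one direction or the other.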

The alternative is to ensure that the source and DAC are synchronised in terms of their average sample rates. The DAC will have a fixed, rigid sample rate, so the only rate that can vary is the source and, if the source is a stream of bytes from an audio application (e.g. a media player program), this synchronisation can be arranged by requesting chunks of data from the source only when the DAC is ready to receive it. A First-In-First-Out (FIFO) buffer is loaded with these chunks of data, and the data is streamed out to the DAC continuously.
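As a minimal sketch of that 'pull' arrangement, assuming a source that will hand over a chunk whenever asked: the class and method names below are hypothetical, and a real system would sit on top of the sound card's API rather than a Python loop.

```python
from collections import deque
from typing import Callable, Deque, List

Chunk = List[int]


class CountingSource:
    """Stand-in for a media player: returns consecutive sample numbers on demand."""

    def __init__(self) -> None:
        self.next_sample = 0
        self.reads = 0

    def read(self, frames: int) -> Chunk:
        self.reads += 1
        start = self.next_sample
        self.next_sample += frames
        return list(range(start, start + frames))


class PullFifo:
    """FIFO that requests a chunk from the source only when it has room for one."""

    def __init__(self, read_chunk: Callable[[int], Chunk],
                 chunk_frames: int, capacity_chunks: int) -> None:
        self._read_chunk = read_chunk
        self._chunk_frames = chunk_frames
        self._capacity = capacity_chunks
        self._chunks: Deque[Chunk] = deque()

    def top_up(self) -> None:
        """Pull from the source until the FIFO is full; the source never pushes."""
        while len(self._chunks) < self._capacity:
            self._chunks.append(self._read_chunk(self._chunk_frames))

    def next_for_dac(self) -> Chunk:
        """Called at the DAC's pace: hand over the oldest chunk, then refill."""
        if not self._chunks:
            self.top_up()
        chunk = self._chunks.popleft()
        self.top_up()
        return chunk


source = CountingSource()
fifo = PullFifo(source.read, chunk_frames=4, capacity_chunks=3)

played: List[int] = []
for _ in range(5):  # five requests, each made when the 'DAC' is ready
    played.extend(fifo.next_for_dac())

print(played)
print("chunks requested from source:", source.reads)
```

Because every read is triggered by the DAC side, the source's average rate is locked to the DAC's by construction, and the FIFO can never be over-filled.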

I would like to think I have now found the solution using Linux. I would be very grateful if any Linux gurus out there would care to correct me if I am wrong on any of this:

  • Linux has several (confusing) layers when it comes to handling audio. However, most audio applications will work directly with ALSA, which allows fairly low level programming.
  • Typical Linux distributions also come with Pulseaudio loaded and running. Pulseaudio is a higher level system than ALSA and has many nice features, but automatically performs resampling(?). Pulseaudio can be removed.
  • Another step up in sophistication is JACK, a very comprehensive system that requires a server program to be running all the time in the background. There is no obligation to set JACK running.
  • As with Windows, fitting a sound card into a Linux machine causes the driver for that sound card to be loaded automatically. ALSA can then ‘see’ the card and it can be referred to as “hw:3,1” where the ‘3’ is the card, and the ‘1’ is a device on the card, or using aliases e.g. “hw:DS,1” etc. – this is useful because the numeric designation may change between boot-ups.
  • “hw” devices are accessed directly without any resampling, as opposed to “plughw” devices. Both options are usually available for most sound cards and their drivers. I am only considering the “hw” option.
  • Driver capabilities can be ascertained in detail by dumping the driver controls to a file using various methods e.g. “alsactl store” etc.
  • Linux provides drivers that have been put together by enthusiasts based on sound card chipsets, so not all the facilities listed by the driver will necessarily be available for every card.
  • ALSA’s API allows real time streaming to and from ALSA devices, including multichannel frames. Taking data from a device is known as capture, and sending to a device is known as playback (or similar).
  • A device can be designated as the ALSA default, which most audio applications default to sending their output to. Applications like Spotify can only direct their output to the default device.
  • There is a ‘dummy’ driver available called snd-aloop. This can be loaded into the system at boot-up. To ALSA it appears as a sound card called Loopback with eight capture devices and eight playback.
  • snd-aloop can be designated as the default device.
  • snd-aloop has a very desirable feature: its sample rate can be varied via a real time control. This control is accessible like the controls that are available on any sound card driver and can simply be set from a terminal using a command such as “amixer cset numid=49 100010” where 49 is the index of the control and 100010 is the value we are setting it to. The control can also be adjusted from inside your own program.
  • Clearly, if a way can be found to compare the sample rates of the DAC and snd-aloop, then snd-aloop’s sample rate can be adjusted occasionally to keep the source’s average sample rate the same as the DAC’s. N.B. this is not dynamically changing the pitch or timing of the stream – this is fixed and immoveable and set by the DAC – but merely ensures that the FIFO buffer’s capacity is not exceeded in either direction. If the source was not asynchronous (e.g. not a CD or on-demand streaming application whose data can be requested at any time) but a fixed rate stream with no way of locking the DAC to its sample rate via hardware, then this would not be possible, and adaptive re-sampling would be essential.
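The amixer command in the list above can also be issued from a program. The sketch below simply shells out to amixer from Python, which is one possible approach and not necessarily the best one. The control index 49 and the value 100010 are the examples from the text and will differ between machines; the helper names are hypothetical, and the optional “-c” card selector assumes the loopback card is addressable by an id such as “Loopback”.

```python
import subprocess
from typing import List, Optional


def build_cset_command(numid: int, value: int,
                       card: Optional[str] = None) -> List[str]:
    """Build the argument list for 'amixer cset numid=<numid> <value>'."""
    command = ["amixer"]
    if card is not None:
        command += ["-c", card]  # select a specific card by index or id
    command += ["cset", f"numid={numid}", str(value)]
    return command


def set_control(numid: int, value: int, card: Optional[str] = None) -> None:
    """Run amixer to set the control; raises CalledProcessError on failure."""
    subprocess.run(build_cset_command(numid, value, card),
                   check=True, stdout=subprocess.DEVNULL)


# Show the command that would be run, without touching any hardware.
print(" ".join(build_cset_command(49, 100010)))
```

Calling set_control(49, 100010) would then have the same effect as typing the command in a terminal, provided amixer is installed and the control exists at that index.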

After a few days of wrestling with this, my experience is as follows:

  • Removing Pulseaudio from Ubuntu (“sudo apt-get remove pulseaudio –force” or similar) has side-effects, and the system loses many of its GUI-adjustable settings options because various Gnome-related dependencies are removed too. It doesn’t ‘break’ the system; merely makes it less useable. The solution can be as crude as re-installing Pulseaudio in order to make a settings change and then removing it again! I don’t know that it is essential to remove Pulseaudio, but it certainly feels better to do so.
  • Various audio apps are happy to play their outputs into snd-aloop, and my software can capture its output and process it quite happily.
  • The real core essentials of using the ALSA API for streaming are straightforward-ish, but documentation beyond a simple description of each function is sparse. In many cases, the ALSA source code is viewed as being sufficient documentation. As an example, try to find any information on how to modify an ALSA driver control without actually delving into an existing program like amixer to try and work it out. I find that most ‘third party’ tutorials seem to obscure the essentials with multiple equivalent options demonstrating all the different ways that a single task can be performed.
  • My ASUS Xonar sound card may yet turn out to be useful now that I don’t have to worry about using it as an input as well as an output: it is a high quality eight channel DAC that seems well-behaved in terms of lack of ‘thump’ at power-on and -off.
  • I found the easiest way to adjust the snd-aloop sample rate dynamically was by cutting and pasting the source code for the standard ALSA/Linux program amixer into my program (isn’t open source software great?) and passing the commands to it with the same syntax as I would use at the command line.
  • The system seems stable and robust when the PC is doing other things i.e. opening up highly graphical web pages in a browser. No audible glitches at all and no jump in the difference between my record and playback sample counters.
  • I am, as yet, unsure as to the best way to implement the control loop that will keep snd-aloop and the Asus Xonar in sync. With a snd-aloop rate setting of 100000 i.e. nominally neutral, there is a drift of about one sample every couple of seconds (an evening’s worth of listening could be assured without any adjustment at all by having a large enough FIFO and slightly-longer-than-desirable latency…). I am currently keeping a count of the number of samples captured vs. the number of samples sent to the DAC and simply swapping between fixed ‘slightly slow’ (99995) and ‘slightly fast’ (100005) snd-aloop sample rates, triggered when the (heavily-averaged) difference hits either of two thresholds.
  • In terms of the ALSA sample streaming I just use the ‘blocking method’ inside two separate threads: one for capture and one for playback.
  • It occurs to me that this system could be used to stream to an HDMI output, thence to an AV receiver with multiple output channels. Not sure if the PC locks to the AV receiver’s DAC sample rate via HDMI (is it bidirectional?), or whether the AV receiver resamples the data, or syncs itself to the incoming HDMI stream.
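The two-threshold scheme described in the list above can be sketched as a small controller plus a toy simulation. The 99995 and 100005 settings are the ones from the text; the thresholds, the smoothing constant, the starting fill and the 0.5 Hz clock error are made-up values for illustration, and the real program's averaging may well differ.

```python
from typing import Optional, Tuple

NOMINAL_SETTING = 100_000
SLOW_SETTING = 99_995    # 'slightly slow' loopback rate
FAST_SETTING = 100_005   # 'slightly fast' loopback rate


class HysteresisRateController:
    """Swap between two fixed rate settings when the averaged fill hits a threshold."""

    def __init__(self, low_threshold: float, high_threshold: float,
                 smoothing: float = 0.05, initial_setting: int = FAST_SETTING) -> None:
        self.low = low_threshold
        self.high = high_threshold
        self.smoothing = smoothing
        self.setting = initial_setting
        self.average: Optional[float] = None

    def update(self, captured: int, played: int) -> int:
        """Feed in the running sample counters; returns the setting to use next."""
        fill = captured - played
        if self.average is None:
            self.average = float(fill)
        else:
            # Exponential moving average as a stand-in for 'heavily averaged'.
            self.average += self.smoothing * (fill - self.average)
        if self.average >= self.high:
            self.setting = SLOW_SETTING   # buffer filling up: slow the source
        elif self.average <= self.low:
            self.setting = FAST_SETTING   # buffer draining: speed the source up
        return self.setting


def simulate(seconds: int, dac_hz: float = 44_100.0,
             source_error_hz: float = 0.5) -> Tuple[float, float]:
    """One update per simulated second; returns (lowest, highest) FIFO fill seen."""
    controller = HysteresisRateController(low_threshold=1024, high_threshold=3072)
    captured, played = 2048.0, 0.0
    lowest = highest = captured - played
    for _ in range(seconds):
        captured += (dac_hz + source_error_hz) * controller.setting / NOMINAL_SETTING
        played += dac_hz
        controller.update(int(captured), int(played))
        fill = captured - played
        lowest, highest = min(lowest, fill), max(highest, fill)
    return lowest, highest


lowest, highest = simulate(10_000)
print(f"FIFO fill stayed between {lowest:.0f} and {highest:.0f} samples")
```

The gap between the two thresholds is what stops the setting from chattering back and forth: once the controller has switched, the averaged fill has to travel all the way to the other threshold before it switches again.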

You may find it hard to get excited by this stuff, but not me: it’s a case of feeling that I own the system rather than my recent experiences that showed that with Windows the system is merely ‘under licence’ from Microsoft and the hardware vendors.


It seems that there is a new smart interface for your music collection, mentioned here and here.


I’ll bet it is good if you like that sort of thing – but worth $119 a year? You decide.

Many’s the time with Spotify I have wished that it could simply display a full screen image of the album art while playing – not much to ask, but seemingly too difficult to arrange. Not to mention being able to sort search results, a useful facility that seemed to disappear with an update some time ago and is bitterly regretted by the users – but bizarrely lives on in the Linux version (I have been trying to work out the story behind why they thought it was a good idea to remove it, but can’t!). Clearly, it must be possible to do something better in the non-Spotify world, and I have every confidence in roon.

But something caught my eye in the various mentions around the web: people are enquiring about roon’s sound quality, and no one knows, or wants to give them a straight answer.

Well let me do it: the sound quality will be exactly what you can get / are getting right now. There is no mystery. Digital audio is not mysterious. It is just numbers. A new user interface is not going to change the numbers. And unless something is very wrong, it is not going to change how the numbers are sent to your DAC. OK?

Trying Linux

UPDATED 16/03/15 Approximately every two years I find myself inspired to have a go with Linux. I install Ubuntu on an old PC and congratulate myself on having finally made the right choice. Everything works fine: all the devices are auto-detected correctly, and although the graphics and text are a bit lumpy, it looks as though it can do everything Windows can do. It never lasts. Within a short time I try to do something beyond the basic web surfing and word processing and it doesn’t quite work. So I go to the web, and of course there’s usually a solution buried in a forum somewhere, and it invariably involves editing a config file. But along the way I may have found several other ‘solutions’ that didn’t work, and for each I maybe edited a different file or changed something using some little app I’ve installed. At the end, even though the system may be working, I am never quite sure how I got there, nor confident I could reproduce the same working system on another PC.

Well, the time has come again, and I am typing this using the latest version of Ubuntu. Everything is wonderful so far, and even Spotify is running flawlessly. Specifically, though, I want to get my active crossover system working on Linux, not Windows.

My experience with Windows 7 running on slightly older PCs is not good. I have a laptop approximately 5 years old which will grind almost to a halt for several minutes every day, performing some sort of scan of itself, and I don’t know enough to do anything about it. The desktop PC that I use for the active crossover is slightly better, but it, too, takes quite a while to ‘warm up’ and is also prone to the occasional glitch while playing music, due to deciding to update its anti-virus database – I am sure it was not a problem with Windows XP. In contrast, running Ubuntu on an older desktop PC without much RAM, the experience is one of ‘solidity’. I am not experiencing the operating system going AWOL for several seconds at a time.

But it comes at a price. I really, really don’t want to have to understand the details of any operating system, and Windows is good for the person who maybe wants to dip into a bit of programming (a distinctly different activity from IT) without having to worry too much about the really low level details. Windows feels as though it is ‘self-healing’. Every time the PC is turned on it starts scanning itself, checking for inconsistencies, downloading updates. New hardware is detected automatically and the user never edits configuration files. Ubuntu feels a little different. By all means correct me if I am wrong, but the impression I get is of a system that is dependent on lots of configuration files that are not hidden from the user. Of course these files get changed by the operating system itself (just as Windows must change its hidden configuration files) and there are little applications that you can install that simplify changing the parameters of various sound cards, say (more on this later). But occasionally the configuration files must be edited by the user using a text editor. One typo, and the PC may refuse to boot!

As I mentioned, I am hoping to run my active crossover stuff on Linux, not Windows. In order to achieve this I must loop continuously doing the following:

  1. Extract a chunk of stereo audio from an ‘input port’ that receives data from my application of choice (media player, Spotify etc.)
  2. Assemble the data into fixed-size buffers to be FFT-ed.
  3. Process with FIR filters to produce a separate, filtered output for each driver.
  4. Inverse FFT.
  5. Squirt the results out to six or eight analogue channels, or if feeling ambitious, HDMI (that would be the dream!).
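Steps 2 to 4 of the loop above amount to what is usually called fast convolution. The sketch below uses the overlap-add variant with a small pure-Python FFT so that it runs without any extra libraries; it is an illustration, not my actual crossover code. The two-tap ‘woofer’ and ‘tweeter’ filters are toy examples chosen because they sum back to the original signal, and the class name and block size are assumptions.

```python
import cmath
from typing import Dict, List, Sequence


def fft(values: Sequence[complex], inverse: bool = False) -> List[complex]:
    """Recursive radix-2 FFT (unscaled); len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return [complex(values[0])]
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def ifft_real(spectrum: Sequence[complex]) -> List[float]:
    """Inverse FFT, scaled by 1/n, keeping only the real part."""
    n = len(spectrum)
    return [v.real / n for v in fft(spectrum, inverse=True)]


class OverlapAddCrossover:
    """Filter fixed-size blocks through one FIR filter per driver via the FFT."""

    def __init__(self, filters: Dict[str, List[float]], block: int) -> None:
        longest = max(len(taps) for taps in filters.values())
        self.block = block
        self.size = 1
        while self.size < block + longest - 1:  # room for the full linear convolution
            self.size *= 2
        # Frequency response of each filter, computed once up front.
        self.responses = {name: fft(list(taps) + [0.0] * (self.size - len(taps)))
                          for name, taps in filters.items()}
        # Filter 'tail' that spills past the end of each block, per driver.
        self.tails = {name: [0.0] * (self.size - block) for name in filters}

    def process(self, samples: Sequence[float]) -> Dict[str, List[float]]:
        """Take exactly `block` input samples; return `block` samples per driver."""
        spectrum = fft(list(samples) + [0.0] * (self.size - self.block))
        outputs: Dict[str, List[float]] = {}
        for name, response in self.responses.items():
            full = ifft_real([x * h for x, h in zip(spectrum, response)])
            for i, carried in enumerate(self.tails[name]):
                full[i] += carried          # add the previous block's spill-over
            outputs[name] = full[:self.block]
            self.tails[name] = full[self.block:]
        return outputs


crossover = OverlapAddCrossover({"woofer": [0.5, 0.5], "tweeter": [0.5, -0.5]}, block=4)
signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
woofer: List[float] = []
tweeter: List[float] = []
for start in range(0, len(signal), 4):
    out = crossover.process(signal[start:start + 4])
    woofer += out["woofer"]
    tweeter += out["tweeter"]

print([round(w, 6) for w in woofer])
print([round(t, 6) for t in tweeter])
```

The input block is transformed once and then multiplied by each driver's response, so adding more output channels costs one extra multiply and inverse FFT per channel rather than a whole extra convolution.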

It’s a very specific, self-contained requirement. I can handle numbers 2 to 4, no problem. 1 and 5 are the tricky ones, and seem to be a lot trickier than they, perhaps, might be. They weren’t all that easy in Windows, either, but I eventually came up with a scheme that kind of worked.

Here’s where it gets very specific: under XP I was able to use a single Creative X-Fi surround sound card as both the ‘receptacle’ for PC audio which I could then access with my application, and also as the multichannel DAC that my application could squirt its output to. Under Windows 7 the driver for the sound card was ‘updated’ and I could no longer access it as the receiver for general PC audio – I could still have used it for S/PDIF, analogue Line In etc., however. In the ideal world, the ‘receptacle’ would just be some software slaved to the output sample rate, I think, but I don’t know how to create such a piece of software – it would appear to Windows to be a driver I would guess. I could buy a piece of software called Virtual Audio Cable but I could never be sure whether that would always be re-sampling the data, and I’d rather avoid that. In the end, I used a method that I knew would work: I slaved a ‘professional’ audio card to the X-Fi using S/PDIF from the X-Fi. The M Audio 2496 can slave its sample rate to the S/PDIF (using settings in the M Audio-supplied configuration application) so I was able to send PC audio to the M Audio and my application could extract data from its ‘mixer’ at the same sample rate. Keeping the input and output on separate cards like this has some advantages when it comes to making measurements of the system while it is working, I think.

As a start I will probably try to do the same thing under Linux. I am attempting to use an Asus Xonar as the multichannel DAC, and another M Audio card I had lying around as the slaved source. It’s almost certain that I could achieve the objective without a second sound card, but I really don’t know how to do it [update 30/06/15: maybe I do know how to do it now]. Linux audio seems to have several ‘layers’ that I don’t understand (but as yet I have no view of them as layers, more as spaghetti). Really, I would like not to have to know anything about them at all, but this seems unrealistic. I have established the following:

  • I can do lowish-level audio stuff using the ALSA API. I can refer to specific cards by names that I can bring up with certain command line (shell) queries. Are these names guaranteed to stay the same in between boots? I don’t think so, but there are ways of editing the config files to associate names I choose to specific cards – I think.
  • There is a highly comprehensive system called JACK that allows “JACK-aware” programs to have their audio routed via a user-configurable patchbay. It can handle re-sampling between separate cards transparently. Brilliant, but I don’t think Spotify is “JACK-aware” for example so I’m not bothering with it. [Update 30/06/15: I want to avoid any form of re-sampling anyway]
  • Ubuntu has PulseAudio installed already (I think) and using an application (that I had to install) called Pavucontrol I can direct Spotify, and presumably other apps, to send their outputs to any of the sound cards in the system. Does this get written to a file and saved when I exit it? I think so. PulseAudio may be the thing I need, possibly being capable of creating software “sources” and “sinks”. But is it always resampling the audio to match sample rates even when that is not needed? More investigation needed. [Update 30/06/15: Pulseaudio cannot be guaranteed not to resample. I have removed it from the machine].
  • I installed a little program called Mudita24 that gives me most of the functionality of the app that is supplied for M Audio cards under Windows. It will let me slave the M Audio to S/PDIF. But without a lot of rummaging around on the web, finding this solution was not obvious. Will the results be saved to a file so I don’t have to call this up every time? I don’t know. [Update 30/06/15: the M Audio-compatible drivers don’t seem to work properly. I have abandoned this idea].
  • I found a “minimal” example program that can send a sine wave to an output via ALSA. The program is anything but minimal and allows the user to select from a large number of alternative sample rates, bit depths etc. etc. and has copious error reporting. My version of “minimal” is much shorter! I adapted the program for eight channels, and am sending a separate frequency to each of the Xonar’s outputs. It seems to be working quite solidly. I can’t be absolutely sure that the Xonar isn’t applying surround sound processing to the signals yet, though. Question: should I be programming using ALSA or PulseAudio? [Update 30/06/15: answer is most definitely ALSA only].

I don’t mind if everything is low level, nor do I mind if the operating system handles everything for me. What I am not keen on is a hybrid between the operating system doing some things automatically, and yet having to manually edit files (I haven’t done that yet, though) or having to install little apps myself. How are they all tied together? I don’t know.

UPDATE 10/03/15 Installed Ubuntu on my erratic Windows 7 laptop. On the hard drive I had to delete the ‘HP Tools’ partition to do it, as a PC can only have four partitions, apparently, and HP had used all four to install Windows – the things you learn, eh?

For the things I use the laptop for mainly, Ubuntu is knocking Windows 7 into a cocked hat. It actually responds instantly and doesn’t hang for tens of seconds with the disk light on constantly and the mouse pointer frozen. It’s taking some getting used to!

UPDATE 15/03/15 It is becoming clear to me that there is only one sensible solution for what I am trying to achieve (an active crossover / general DSP system under my control that can be applied to any source including streaming) that is guaranteed not to resample the data, nor is dependent on sound card-specific features, or needs two sound cards. Let me run this by you:

  • Media player apps need something that looks like a sound card to play into. Some apps will only play into whichever card is set as the default audio device.
  • If it’s a real sound card that’s being played into, I need to extract the data before it reaches the analogue outputs. This just may not be possible with many sound cards, and it is impossible to know without trying the card – no one cares about this issue normally.
  • I process the data into six or eight channels and then I need to squirt the results out to, effectively, some DACs (or HDMI). This is most likely a real, physical multi-channel sound card.
  • I believe that the media player’s sample rate is defined by the sound card it is playing into. If so, this is akin to asynchronous USB mode i.e. the media app is slaved to the sound card’s sample rate.
  • I would like to avoid sample rate conversion (and this would still be needed to convert between 44.09999 kHz and 44.10001 kHz i.e. there is no such thing as “the same sample rate” unless they are derived from the same crystal oscillator).

There is a Linux driver called snd-aloop which can act as a virtual audio node, recognisable by media player apps as a sink, but also recognisable by other apps as a recording source. I could send media player output into this virtual device, recognise it as a source for my application, process the data and send the multi-channel audio to a consumer-level DAC card without it needing any special features. However, there is a subtle problem: aloop’s sample rate is derived from the system-wide “jiffies” count. It will not match the sample rate of the DAC card even if they are both nominally 44.1 kHz.

I see just one sensible solution: I have to modify the aloop code so that, when the information is available, it gets its sample rate synchronisation from the DAC card. I could either modify aloop and send it this synchronisation information via a ‘pipe’ or shared memory (if that’s possible) from my active crossover application, or I can make my active crossover application a virtual sound card driver itself. Either way, I would need to register the driver with the system so that it can be set up as the default audio device (using the usual GUI-based sound preferences).

To any Linux programmers out there: does this sound sensible and do-able?

More later.

Update 30/06/15: It seems that there is an updated version of the snd-aloop driver which incorporates a dynamically-adjustable sample rate via the ALSA PCM interface. This could be precisely what I need.