This article describes the ideas behind a system I designed and built myself. It started out as an “I wonder what it would sound like if…” exercise, and turned into the system I would choose to listen to even if I had the chance to swap it for commercial gear.
It includes the first speakers I ever built. Other people have heard them (see later) and said some extremely nice things about them online.
It does seem ridiculous, but I think that in the world of speakers an amateur can easily compete with commercial designs, because amateurs are free to use DSP and active amplification – methods that have yet to take hold in the commercial hi-fi (as opposed to professional audio) world.
Some aspects of my system that are not entirely typical of standard hi-fi systems are:
- Three-way speakers
  - typical audiophile speakers are two-way
- Large (12″) woofers in large enclosures
  - it is much more common to use smaller woofers, e.g. 6″, in slim enclosures
- Polypropylene cones used within their optimal frequency ranges
- Well-behaved, smooth, flat response
- No ports
  - most speakers use bass reflex to extend (or should that be ‘reapportion’?) bass response at the expense of other characteristics
- Drivers in close proximity to each other
  - many people believe the woofer should be close to the floor, and many tweeters have very large mounting flanges; I aimed to get the drivers closer together than this
- Active crossovers
  - most speakers are passive, i.e. the output of a single amplifier is split into multiple frequency bands using inductors and capacitors. In an active speaker, each driver has its own directly connected amplifier and the filtering occurs before the amplifier.
- DSP for
  - crossover filtering
    - infinitely adjustable, precise and repeatable
  - driver correction
    - drivers can be corrected for phase and amplitude responses
  - time alignment
    - arrival time at the listener’s ears can be adjusted for each driver
  - in-room EQ
    - compensation for the dispersion characteristics of the drivers and enclosure
- No sample rate conversion (resampling)
The system uses PCM audio only, so there are no analogue sources and no DSD. (It can be used directly with analogue sources via the line-level ADC inputs built into the DAC unit, but that is not something that interests me. I use the analogue inputs only for measurements via a microphone.)
High-resolution PCM sources could be played (at the cost of increased CPU load), but I use 44.1 kHz/16-bit material exclusively.
Further room correction could be provided by modifying the crossover/correction filters, but I am not yet persuaded that it is necessary or always desirable.
The system is Linux PC-based, and the PC performs as both the source and the DSP processor. Standard GUI-based multimedia apps can play CDs and downloads, and the standard streaming services all seem to have Linux apps. Spotify has its ‘Connect’ feature which means that I can access the system wirelessly from an iPad or phone – very neat.
[I have recently tried UPnP/DLNA as a way of interfacing the system with the outside world. It works. All I have to do is run a UPnP ‘renderer’ app on the Linux PC, with its output directed to the default dummy driver (see later), and then UPnP ‘control point’ apps running on iPads and the like can stream audio to it, or even direct audio streams from a separate server on the network. It really is ‘plug and play’. So far I have not managed to get gapless playback to work properly, but I have established that the system works in principle using free apps.]
In hardware terms, the system comprises:
- a fanless PC (Sumvision Cyclone, £100)
- a multi-channel DAC (Asus Xonar U7, £90)
- a six-channel AV amplifier (Sony STR-DB1070, second hand £140, but the same type has been seen on eBay for £40!)
- 2 × speakers comprising:
  - 12″ woofer driver type 902.222 (£50 per pair) in re-used Goodmans Magister enclosure (£20 per pair)
  - 4″ mid-range driver type Peerless SKO100 (£18 per pair) in re-used Acoustic Research bookshelf enclosure (£0)
  - 1″ tweeter type Monacor DT25N (£30 per pair) in the same enclosure; this tweeter may have some built-in ‘waveguide’ shaping of its dispersion
  - 2 × 80 µF film capacitors for tweeter protection (is this value too large for adequate protection? Not sure, but no disasters yet)
- some cables (£40):
  - Maplin heavy duty speaker cable
  - 4mm plugs from eBay (the nicer ones)
  - 3.5mm stereo jack cables from eBay
- miscellaneous materials (£30):
  - draught excluder and neoprene foam for enclosure sealing
  - MDF for new baffles
  - screws for MDF
  - crimp spade connectors for connecting to driver terminals
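On the question of whether 80 µF is too large: a capacitor in series with a resistive load forms a first-order high-pass filter with corner frequency f = 1/(2πRC). The sketch below assumes the tweeter can be treated as a plain 8 Ω resistor. That is an assumption: a real tweeter's impedance varies with frequency, so the result is only a rough guide.

```python
import math


def series_cap_corner_hz(capacitance_farads: float, load_ohms: float) -> float:
    """Corner (-3 dB) frequency of a series capacitor driving a resistive load."""
    return 1.0 / (2.0 * math.pi * load_ohms * capacitance_farads)


# Assumption: the tweeter is modelled as a nominal 8-ohm resistor.
corner = series_cap_corner_hz(80e-6, 8.0)
print(f"corner frequency: {corner:.0f} Hz")  # about 249 Hz
```

A corner of roughly 250 Hz, rolling off at only 6 dB/octave, blocks DC and deep bass but passes most of the midrange. Under this simple model, the capacitor mainly guards against gross faults such as a DC offset. It does little against moderate out-of-band energy.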
If I had to, I could probably get by with just a bog-standard, pragmatic hi-fi system: a PC, an amp and a pair of box speakers, but there’s no way I would want to pay more than a few hundred pounds for the privilege. The way I listen to music would change, and I think the range of music I listen to would be reduced.
At the other end of the scale, I would be intrigued to spend some time with a top quality system that used ‘rational’ methods to get closer to genuine high fidelity; the larger Meridian systems might fit the bill, for example, but I would find it embarrassing to spend £17000 on a hi-fi system.
DIY is an interesting ‘third way’. The aim is not to produce an imitation of a commercial product, and certainly not to tolerate performance that is inferior to something built by the professionals. For me, it is a direct way of exploring the myth and reality of what the experts tell us about hi-fi; in this regard, I would consider my foray into speaker design a great success.
At one time I thought I was interested in building amplifiers, and later I thought I was interested in making my own DACs, but I now realise how pointless those activities were. I was seduced by the idea of experimenting and creating, and by the thought that I could make the signal path simpler or purer and might possibly hear the difference, even though, as an engineer, I could not have told you why; I had bought into the whole “hi-fi is mysterious” thing. I had not considered that the basic ‘topology’ of a standard hi-fi system (source, single amp, passive speakers) might be the bottleneck.
Building speakers – in contrast to DACs and amplifiers – is very appealing because it involves messing about with woodwork and playing with acoustics. But second hand speakers from eBay are very cheap, so it raises the question of whether it can be worth building your own wooden boxes – sorry, I mean high performance acoustic enclosures – unless you are very serious about it. It occurred to me that one way to learn about speaker design would be to modify some existing speakers, and that the most versatile way of driving them would be active crossovers and DSP.
I duly modified a cheap pair of two-way floorstanders and wrote some simple PC-based DSP software running in Windows to do basic FFT-based crossover filtering, using a multichannel sound card to drive the amps. I wasn’t expecting anything spectacular, but even at this stage the sound was rather good. Thus, it seemed that it was going to be worth doing something more ambitious: three-way with unusually large sealed woofers (why not? – I had a vast budget of tens of pounds upwards and that is all it costs to buy the hardware needed for such a ‘high end’ configuration). Utilising a PC’s processing power, I could implement full DSP crossovers with driver correction and time alignment.
The full system
This has worked out much better than I might have imagined – I really had no idea of what was possible. It feeds my suspicion that much of the received wisdom in hi-fi is derived from expertise that was at the cutting edge in 1952 but can now be surpassed using digital hardware and software. It also suggests to me that size matters; that large amounts of design effort and hardware cost are spent in attempting to fix problems that appear only when you try to make speakers smaller than they should ideally be. You may disagree.
One notable aspect is the time domain behaviour of the system. Even so-called objectivists are convinced that the balance of the frequency magnitude response is all that matters, and to suggest otherwise is to commit audiophile heresy. Could it be that this conventional view is wrong, and that it explains some of the supposed mysteriousness of audio and the lack of correlation between orthodox measurements and perceived audio quality?
My DIY system suggests to me that if you get the frequency and time domains something like ‘correct’ by using linear phase crossovers and sealed woofers, then you have it all. Automatically. It also tells me that such a solution is robust, not balancing on a knife edge where the slightest modification to, say, a crossover frequency completely changes the sound.
Here’s how my system works:
- The source (CD, file or stream), the DSP and the DAC driver all reside in the same PC.
- Sample timing is driven by the external asynchronous USB DAC, i.e. the source is effectively slaved to the DAC. This avoids any need to resample the digital data.
- Crossover filtering is performed by FIR filters applied in the frequency domain (FFT-based fast convolution). This is, in effect, what is known as a convolution engine.
- Several functions can be overlaid onto a single FIR filter (crossover filter, driver correction, in-room EQ, room correction – although I don’t apply room correction).
- Filters are pre-calculated. In this case, I calculate a basic crossover shape and apply driver correction. I also add EQ in the form of baffle step correction calculated from a standard formula.
- Left and right channels of the incoming stereo stream are each filtered with three pre-defined FIR filters, producing signals for woofer, mid and tweeter.
- A separate delay is applied to each driver for time alignment, based approximately on the distances from the driver’s acoustic centre to the listener at the expected height. (It is interesting to adjust this while listening to noise through the system).
- Processing is carried out in 32-bit floating point. The results are fed to the DAC as 24-bit integers.
- The DAC has eight channels available, but I am using only six.
- The system introduces latency which is proportional to the size of the FIR filter being used. In my particular case, I am only using the speakers for listening to music recordings, so I find a large filter and a latency of 500 ms to be acceptable.
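The frequency-domain FIR filtering in the list above can be sketched with FFT-based overlap-add convolution. This is a minimal illustration assuming NumPy, with hypothetical class and function names; it is not the author's code. It also shows how several functions (crossover, driver correction, EQ) can be overlaid onto one FIR filter by convolving their impulse responses, which is equivalent to multiplying their frequency responses.

```python
from functools import reduce

import numpy as np


def combine_filters(*impulse_responses: np.ndarray) -> np.ndarray:
    """Overlay several FIR filters into one by convolving their impulse responses."""
    return reduce(np.convolve, impulse_responses)


class OverlapAddConvolver:
    """Streams fixed-size audio blocks through a long FIR filter using FFT overlap-add."""

    def __init__(self, fir: np.ndarray, block_size: int):
        self.block_size = block_size
        needed = block_size + len(fir) - 1               # length of one block's linear convolution
        self.fft_size = 1 << (needed - 1).bit_length()   # next power of two, so no circular wrap
        self.fir_spectrum = np.fft.rfft(fir, self.fft_size)
        self.tail = np.zeros(self.fft_size - block_size) # overlap carried between blocks

    def process(self, block: np.ndarray) -> np.ndarray:
        spectrum = np.fft.rfft(block, self.fft_size)
        out = np.fft.irfft(spectrum * self.fir_spectrum, self.fft_size)
        out[: len(self.tail)] += self.tail               # add overlap from earlier blocks
        self.tail = out[self.block_size:].copy()         # keep the new overlap for next time
        return out[: self.block_size]


# Usage with random stand-in filters, checked against direct convolution.
rng = np.random.default_rng(0)
fir = combine_filters(rng.standard_normal(10), rng.standard_normal(7))
signal = rng.standard_normal(64)
conv = OverlapAddConvolver(fir, block_size=16)
streamed = np.concatenate([conv.process(signal[i:i + 16]) for i in range(0, 64, 16)])
print(np.allclose(streamed, np.convolve(signal, fir)[:64]))  # True
```

In a real-time setting the block size adds buffering latency, and a linear-phase FIR of N taps adds a further (N − 1)/2 samples of delay. For scale, 500 ms at 44.1 kHz is 22,050 samples.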
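The list does not say which baffle step formula is used. A widely quoted rule of thumb puts the baffle step transition at roughly f ≈ 115 / W Hz, where W is the baffle width in metres, and the correction is a shelf of about 6 dB between low and high frequencies. The sketch below assumes that rule and a simple first-order shelf centred on that frequency. The 0.25 m width is a hypothetical value, not a measurement of the author's enclosures.

```python
import math


def baffle_step_hz(baffle_width_m: float) -> float:
    """Rule-of-thumb baffle step frequency (assumed formula: 115 / width in metres)."""
    return 115.0 / baffle_width_m


def shelf_gain_db(freq_hz: float, step_hz: float) -> float:
    """First-order shelf: 0 dB at low frequencies, -6 dB at high, -3 dB at step_hz.

    The treble is cut rather than the bass boosted; the lost level would be made
    up elsewhere in the gain structure.
    """
    s = 1j * freq_hz / step_hz
    h = (1 + s / math.sqrt(2)) / (1 + s * math.sqrt(2))
    return 20 * math.log10(abs(h))


step = baffle_step_hz(0.25)  # hypothetical 25 cm wide baffle
for f in (50, step, 5000):
    print(f"{f:7.0f} Hz: {shelf_gain_db(f, step):6.2f} dB")
```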
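The per-driver time alignment in the list above comes down to simple arithmetic. Each nearer driver is delayed by the difference between its distance and the farthest driver's distance, divided by the speed of sound. The sketch assumes 343 m/s and uses hypothetical distances. It rounds to whole samples, about 7.8 mm per sample at 44.1 kHz, whereas a real system might use fractional delays.

```python
SPEED_OF_SOUND_M_S = 343.0  # approximate value in air at around 20 °C


def alignment_delays(distances_m: dict[str, float], sample_rate: int = 44100) -> dict[str, int]:
    """Delay in whole samples for each driver so that all arrivals coincide at the listener."""
    farthest = max(distances_m.values())
    return {
        driver: round((farthest - distance) / SPEED_OF_SOUND_M_S * sample_rate)
        for driver, distance in distances_m.items()
    }


# Hypothetical acoustic-centre-to-ear distances, in metres.
print(alignment_delays({"woofer": 3.10, "mid": 3.00, "tweeter": 2.98}))
# {'woofer': 0, 'mid': 13, 'tweeter': 15}
```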
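The last step in the list, turning 32-bit floating-point results into 24-bit integers for the DAC, amounts to scaling, rounding, clipping and packing. The sketch below assumes NumPy, full scale at ±1.0 and little-endian packed 3-byte samples. The exact format a given driver expects (packed 24-bit or 24-in-32) is an assumption here, and no dither is applied.

```python
import numpy as np

INT24_MAX = 2**23 - 1  # 8388607
INT24_MIN = -(2**23)   # -8388608


def float_to_int24_bytes(samples: np.ndarray) -> bytes:
    """Scale floats in [-1, 1] to 24-bit integers and pack them as little-endian 3-byte words."""
    scaled = np.round(np.asarray(samples, dtype=np.float64) * INT24_MAX)
    clipped = np.clip(scaled, INT24_MIN, INT24_MAX).astype("<i4")  # clip rather than wrap around
    # Keep the three low-order bytes of each little-endian 32-bit word.
    return clipped.view(np.uint8).reshape(-1, 4)[:, :3].tobytes()


print(float_to_int24_bytes(np.array([0.0, 1.0, -1.0, 1.5], dtype=np.float32)).hex())
# 000000ffff7f010080ffff7f
```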
Why Write My Own Software?
I wrote my own software because I often find that it is easier and more interesting to create something myself than to learn to understand someone else’s system. It is ‘empowering’.
I wanted to bring the whole thing into one application, and to avoid the necessity of re-sampling the audio due to multiple sample clocks within the system. Another major reason was that I wanted to be able to make instantaneous changes to the settings and hear the result immediately. I could possibly have used existing packages like BruteFIR and RePhase to create an equivalent system, but the procedure would have been slow and convoluted, and I would never have really felt that I understood what was going on. With my own software I am able to, for example, change a single value for the bass-mid crossover frequency and have the software re-calculate both filters automatically and implement them instantaneously – it’s just so much more immediate.
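Recomputing both filters from a single crossover frequency can be illustrated with a complementary linear-phase pair. You design a low-pass FIR, then derive the high-pass as a unit impulse minus the low-pass, so the two always sum to a pure delay. This is a windowed-sinc sketch assuming NumPy. It is not the author's 'smooth filter' profile, which the post does not specify, and the tap count is a hypothetical choice.

```python
import numpy as np


def crossover_pair(crossover_hz: float, sample_rate: int = 44100, n_taps: int = 2047):
    """Return (lowpass, highpass) linear-phase FIRs that sum to a delayed unit impulse."""
    if n_taps % 2 == 0:
        raise ValueError("n_taps must be odd so the filter has an integer centre tap")
    centre = (n_taps - 1) // 2
    n = np.arange(n_taps) - centre
    cutoff = crossover_hz / sample_rate                         # cycles per sample
    lowpass = 2 * cutoff * np.sinc(2 * cutoff * n) * np.blackman(n_taps)
    lowpass /= lowpass.sum()                                    # exact unity gain at DC
    highpass = -lowpass
    highpass[centre] += 1.0                                     # delta minus low-pass
    return lowpass, highpass


# Changing one number regenerates both filters.
low, high = crossover_pair(500.0)
```

Both filters are symmetric, so they are linear phase, and their sum is exactly a delay of `centre` samples whatever crossover frequency is chosen. In this construction the slope is set by the window and tap count rather than by a nominal filter order.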
By writing my own software I am also able to include a graphical user interface with useful ‘gadgets’ such as the ability to mute individual drivers, and real time bargraphs of the six output levels.
I think the aim should be to produce an overall neutral output from the speakers, and nothing else. Note that I am not talking about neutrality at the listener’s ears: I think that aiming to achieve this is faulty logic. The room is going to do what it does, with reflections of the direct sound arriving at the listener’s ears some time later – this is desirable, and why we listen in rooms and not anechoic chambers. A simplistic measurement of frequency response at the listening position will show all manner of undulations because of this. However, the listener will not hear them as frequency response colouration at all, and so attempting to ‘correct’ this by processing the signal with a ‘graphic equaliser’ will sound wrong even if it results in what looks like a perfectly flat frequency response. We just need to do our best to compensate for colourations that may arise from the dispersion characteristics of the drivers in combination with the physical presence of the speaker enclosure itself; DSP allows us to do this as subtly as possible.
Three-way, sealed enclosures
Sealed enclosures are simple and can be made closer to correct in the time domain than bass reflex. A three-way speaker is much simpler, and more neutral, than a two-way, as it inherently reduces the effects of beaming and lobing thereby improving the uniformity of the off-axis response, keeps the drivers away from break-up, reduces Doppler distortion and other forms of intermodulation distortion etc. etc. All of these small advantages add up.
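One way to see the time-domain argument is that an idealised sealed box behaves like a second-order high-pass filter, while a typical bass reflex alignment behaves like a fourth-order one. The sketch below compares their group delay at the corner frequency. It is an idealisation that assumes Butterworth alignments and a hypothetical 40 Hz corner, and it uses only the standard library.

```python
import cmath
import math


def butterworth_highpass(order: int, w: float) -> complex:
    """Response of an even-order Butterworth high-pass at normalised frequency w (corner = 1)."""
    s = 1j * w
    h = complex(1.0)
    for k in range(order // 2):
        q = 1 / (2 * math.cos(math.pi * (2 * k + 1) / (2 * order)))  # Q of each 2nd-order section
        h *= s * s / (s * s + s / q + 1)
    return h


def group_delay_s(order: int, freq_hz: float, corner_hz: float, dw: float = 1e-5) -> float:
    """Group delay -dphi/domega by central difference; the phase of a ratio avoids wrapping."""
    w = freq_hz / corner_hz
    dphi = cmath.phase(butterworth_highpass(order, w + dw) / butterworth_highpass(order, w - dw))
    return -dphi / (2 * dw) / (2 * math.pi * corner_hz)


for order, label in ((2, "sealed (2nd order)"), (4, "reflex (4th order)")):
    print(f"{label}: {group_delay_s(order, 40.0, 40.0) * 1000:.1f} ms at 40 Hz")
```

Under these assumptions the fourth-order alignment has roughly 2.6 times the group delay of the second-order one at the corner.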
Low cost polypropylene drivers
I derived driver corrections from on-axis impulse response measurements of my drivers (near and far field as appropriate).
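The post does not describe how the corrections are computed from the impulse responses. One common approach, shown here only as an assumed illustration, is a regularised frequency-domain inverse. The delayed target is divided by the measured response, with a small constant (`beta`, a hypothetical tuning value) that stops the inverse from boosting deep nulls or out-of-band regions without limit. The sketch assumes NumPy, and the two-tap "measurement" is a toy stand-in.

```python
import numpy as np


def correction_filter(measured_ir: np.ndarray, n_taps: int = 1024, beta: float = 1e-3) -> np.ndarray:
    """Regularised inverse of a measured impulse response, delayed by n_taps // 2 samples."""
    H = np.fft.rfft(measured_ir, n_taps)
    C = np.conj(H) / (np.abs(H) ** 2 + beta)    # close to 1/H wherever |H|^2 is well above beta
    delay = n_taps // 2                         # centre the two-sided inverse in the window
    k = np.arange(len(C))
    C *= np.exp(-2j * np.pi * k * delay / n_taps)
    return np.fft.irfft(C, n_taps)


measured = np.array([1.0, 0.5])                 # toy "driver": a response with a short echo
corrected = np.convolve(measured, correction_filter(measured))
print(np.argmax(np.abs(corrected)), round(float(corrected.max()), 3))
```

Convolving the measurement with its correction yields a near-ideal delayed impulse. The regularisation is what keeps the filter well behaved when a real measurement has deep notches.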
The differences between ultra-high quality drivers and the ‘commodity’ polypropylene coned drivers that I used may lie primarily in the way they handle the extremes of frequency and power, which are avoided by going three-way instead of two. The costs of my drivers were £25 each for the woofers, £9 each for the mids and £15 each for the tweeters. I did originally intend to upgrade the drivers once I was confident I wasn’t going to blow them up, but I have never detected any weakness in their sound – believe me, this has been as great a surprise to me as it might be to you.
Crossovers are not a problem
Actively amplified DSP based on FIR filters results in a much simpler system than the passive alternative, because it allows each parameter to be adjusted independently of the others, and it allows each individual driver to be made neutral. Crossovers then become transparent, allowing a three-way (or even four-way) speaker to be built without side effects.
Conventional wisdom says that crossovers are a problem and therefore we must minimise the number of ‘ways’, but it is very liberating to be able to sidestep this convention.
I decided to simply ignore the fashion for “minimum phase”: my speakers are linear phase and proud. Each driver is corrected to be linear phase, the idea being that they can then be blended at the listening position without further thought. Conceptually, this may be directly equivalent to using an overall Linkwitz-Riley crossover (i.e. one where the drivers do not ‘fight’ each other even if there is an overall phase rotation), and pre-correcting the phase of the incoming signal prior to the crossover. But the individual drivers would still need individual phase correction, so my way is just that bit easier to understand and set up, I think…
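The Linkwitz-Riley alternative mentioned above can be checked numerically. The low-pass and high-pass outputs of an analogue fourth-order (LR4) crossover sum to a flat magnitude. The sum is an all-pass, however, whose phase rotates through 360° over the band and reaches 180° at the crossover frequency. A linear-phase crossover, by contrast, can be designed to sum to a pure delay. The sketch uses only the standard library, and the 1 kHz crossover is hypothetical.

```python
import cmath
import math


def lr4_responses(freq_hz: float, crossover_hz: float) -> tuple[complex, complex]:
    """Analogue 4th-order Linkwitz-Riley low-pass and high-pass responses."""
    s = 1j * freq_hz / crossover_hz
    butterworth2 = s * s + math.sqrt(2) * s + 1   # one 2nd-order Butterworth section
    low = 1 / butterworth2**2                     # LR4 is a squared Butterworth
    high = s**4 / butterworth2**2
    return low, high


for f in (250.0, 1000.0, 4000.0):                 # hypothetical 1 kHz crossover
    low, high = lr4_responses(f, 1000.0)
    total = low + high
    print(f"{f:6.0f} Hz  |sum| = {abs(total):.6f}  phase = {math.degrees(cmath.phase(total)):7.1f} deg")
```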
I originally expected to have to do a lot of experimentation with crossover shapes, slopes and frequencies because I so often saw references to the major sonic consequences of these. It didn’t work out that way: I quickly realised that once the individual drivers are corrected, and the drivers are well-behaved, and sufficient ‘ways’ are used, you can make massive changes with little audible consequence. In the end I settled on 4th order slopes with a calculated generic ‘smooth filter’ profile. If I instantaneously change from 2nd order to 8th order and/or change the crossover frequency by half an octave, I simply cannot hear the difference – but I suppose it might make a difference to the drivers in terms of distortion at very high output levels. In the end, the settings I chose were decided by a process of “on paper, that seems about right” rather than the anguished trade-offs that some people seem to get into.
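The point that slope matters mainly to the drivers rather than to the ear can be put in numbers. The filter order sets how much signal a driver still receives outside its intended band. The sketch uses the textbook Butterworth magnitude as a stand-in, since the 'smooth filter' profile in the post is not specified.

```python
import math


def butterworth_attenuation_db(order: int, octaves_from_crossover: float) -> float:
    """Attenuation of an nth-order Butterworth filter a given number of octaves past its corner."""
    ratio = 2.0 ** octaves_from_crossover
    return 10 * math.log10(1 + ratio ** (2 * order))


for order in (2, 4, 8):
    half, one = (butterworth_attenuation_db(order, x) for x in (0.5, 1.0))
    print(f"order {order}: {half:5.1f} dB at 1/2 octave, {one:5.1f} dB at 1 octave")
```

Going from 2nd to 8th order changes the out-of-band drive one octave away from about 12 dB down to about 48 dB down. This is consistent with the choice affecting driver excursion and distortion at high levels more than the audible balance.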
[My overall experience has been ‘cognitive dissonance’ between the pain and never-ending compromises that the expert speaker designers tell us to expect, versus a very benign reality – I think you have to work quite hard to make a bad pair of speakers when you use the ‘checklist’ at the top of this article!].
Enclosures and dispersion
And what of those aspects like dispersion and the off-axis versus on-axis response? Well, using the enclosures and drivers I selected (or had lying about), they are what they are, and DSP gives some limited flexibility to adjust them by varying the crossover points, levels and slopes.
Some people say that the ideal enclosure narrows towards the top (e.g. KEF 105 MK1, where each driver is housed in a separate enclosure), while others say that the baffle should be as wide as possible (e.g. Grimm LS1). Clearly, the driver dimensions combined with the baffle dimensions are going to result in a certain dispersion pattern versus frequency (and by going three-way we avoid driving the drivers into break-up, which would result in weirder dispersion characteristics among other undesirable effects). The ideal is, supposedly, constant directivity at all frequencies, or maybe a smoothly narrowing directivity with increasing frequency. I have not simulated or measured this aspect of my speakers – they are ‘near enough’.
The only way to genuinely improve on this, as far as I can tell, would be to model the system in great 3D detail and, crucially, have a way of ranking the various simulated results in terms of their desirability. Given these prerequisites, we could of course set a computer to the task of finding the best combination of drivers, locations, box shapes and volumes, time delays and crossover characteristics. It would be an automatic process and would not require the input of any sort of human ‘maestro’. But as it is, even in this day and age, speakers still come in all shapes and sizes.
I am convinced that a simple ‘feed forward’ technique is likely to work, with the most important aspect being that enough ‘ways’ are used to avoid extremes that cause the drivers to go weird, and that individually, using DSP, they are kept as neutral as possible including the time domain. Adjacent drivers can then be blended judiciously around the listening position for delays and levels. (Believe me, if the results of this didn’t sound any good, I would change my mind and ‘rationalise’ a different method, but it’s sounding pretty good so far…).
In my system the mids and tweeters are mounted in suitably solid 1980s Acoustic Research bookshelf enclosures, and the large woofers are in very large sealed ex-Goodmans enclosures. The existing baffles were butchered by jigsawing large openings in them. I made new ‘supplementary’ MDF baffles incorporating a bit of bracing, and fastened them onto the existing baffles using many wood screws and rudimentary gaskets to seal them – this method may have something in common with the “lossiness” of BBC-style speaker enclosures, which also feature screwed baffles.
The woofers are mounted towards the tops of their enclosures and this means they are as close as possible to the midrange drivers. The tweeters were chosen because they have smaller mounting flanges than many alternatives and so can be mounted very close to the midranges; the idea was to minimise lobing and get as close as possible to a single unified driver. Don’t some people say that raising the woofer like this will result in ‘floor bounce’? Shouldn’t the drivers be placed asymmetrically on their baffles to stagger the effects of edge diffraction? Possibly, but that’s not what I did this time, anyway.
The resulting sound is gloriously ‘non-digital’. To me, it sounds remarkably ‘right’, and provides that extra dynamic something that you often get from live music. Spaces are opened up. The various musical elements occupy their own well-separated positions – the imaging is rock solid.
There is ‘flavour’ and ‘body’, warmth and coolness, clarity without artificial edge. The system can go very loud indeed, and of course a neutral system really comes alive when music is played back at realistic volume. When the music stops my ears are not ‘tensed up’ – which I find I am aware of with many systems.
I notice that many audiophile systems at shows sound ‘desiccated’. This may be partly the choice of music, but I think it is also a function of ported bass, passive crossovers and often two-way speakers. My system (searching for a suitable description) releases the ‘moisture’ in the music – if you get what I mean. I think this is a function of neutrality including phase and timing, the presence of the deepest bass, and clean isolation between the various musical elements.
There are more descriptions of how it sounds to me, here.
What I have found with my DIY project is that an audio system can, for all practical purposes, be perceived as providing a clear, colourless window on the recording – and once heard there is no going back. I am confident that many commercial DSP-based systems will sound equally transparent, and that passive crossover-based systems cannot. As such, I no longer feel any curiosity regarding particular brands of DAC, amplifier, speaker and all the other bits and pieces like cables, isolation stands etc. – a system cannot sound better than ‘transparent’. My only interest in commercial equipment these days is in the styling, ‘philosophy’ and ‘heritage’ aspects – I will always have soft spots for Quad and Meridian, for example. Is that a good position to be in, as an audiophile? I think so.
Scalford Hall 2016
I exhibited my DIY system at the HiFi Wigwam audio show in 2016. My room was one of the smaller hotel rooms and I didn’t change the settings from the ones I use in my larger listening room at home. An audiophile friend (who has many pairs of speakers) heard them and was struck by how “flat” the frequency response sounded – which I took as a compliment.
People seemed to like the system – and I later found some comments on the web:
It was one of the few things that stopped me in my tracks, I just had to take a look at what was producing the sound emanating from the room. Tonal accuracy, realism, deep but tuneful bass are all traits I would associate with what I heard.
[Qwin in a comment in this blog].
An experiment like mine may demonstrate that in the age of digital audio, the main determinants of sound quality are design choices, not the prices of the components. There are commercial systems that use DSP in a similar way, but also have high end prices, exotic drivers and high tech CAD-designed enclosures. Perhaps a DIY experiment is the only way to isolate the significance of each factor.
The DSP setup needs a combination of accuracy and pragmatism. The aim is to make the speaker itself neutral, not to attempt to ‘correct’ the effects of the room.
(Also check out my other DIY project).
[Last edited 03/02/18]