Thoughts on DSD

jolon
16 min read · May 24, 2016


Sony and Philips introduced a new disc format called Super Audio CD (SACD) in 1999. It took a radically different approach to CDs (although it was backwards compatible through an optional CD layer). Despite very strong marketing efforts to bring SACD to the mainstream, it never really took off and seemed to appeal mainly to audiophiles. Since the launch of SACD we have gained high-res PCM downloads as well as higher resolution DSD downloads.

In this article I want to look at some of the arguments for and against DSD as a format, some of the issues various parties have raised about the mastering and playback chain, and my current thoughts on the format.

Table of Contents:

  1. The argument for DSD
  2. The problem with PCM
  3. The problem with DSD
  4. So how can we improve DSD?
  5. The DSD Playback Controversy
  6. Bits or frequency?
  7. Chord Hugo
  8. Back to Analogue
  9. Volume Control
  10. My thoughts and the future

The argument for DSD

To understand why DSD produces better sound we need to look at how analogue is converted to digital and back to analogue again.

Around 20 years ago most ADCs and DACs started to use 1 bit converters. That is, for each sample of the incoming analogue signal a single bit is sent out, either off or on. The overall effect is something known as PDM (Pulse Density Modulation). It is similar to a greyscale image displayed using purely black and white pixels. If you squint the grey levels are still visible. The key is that if the resolution is high enough it can actually reproduce all of the grey levels. The trade off is that instead of storing 8 bits per pixel to represent 256 grey levels, we may need to store an 8x higher resolution image made only of pixels which are either on or off.

The following figure describes the concept. We start with a small greyscale image (top left) and convert it to 1 bit (right). We can see that the quality degrades. If we scale the 1 bit image 2x (right again) we can see how much the image has degraded.

1 bit image conversion is an analogy to DSD

However if we scale the original greyscale image first (bottom left), and then convert the image to 1 bit (bottom right), the fidelity is much higher. We haven’t introduced any new data, we are just using more 1 bit pixels to represent the greyscale data. There is a point where a sufficiently high resolution 1 bit image can produce the same fidelity as the original 8 bit grey scale data.

Coming back to our ADCs and DACs, this is how they see the data: initially as a 1 bit stream, which is then converted to multibit data known as PCM (Pulse Code Modulation). All CDs use PCM. PCM is just like our 8 bit greyscale above, except that it uses 16 bits per sample on CDs. So in the recording or mastering studio analogue is first converted to 1 bit data, which is then converted to 16 bit PCM, which is then stored on a CD (or FLAC/WAV/AIFF etc.). When the CD is played back, the PCM data is read from the CD and, in the DAC, converted to 1 bit data before being converted to analogue, which can be achieved with a very simple low pass filter. This is effectively the same as squinting your eyes at the 1 bit image so that it looks greyscale.
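The 1 bit idea can be sketched in a few lines. This is only a toy: a first-order delta-sigma (PDM) modulator fed with a 1 kHz test tone at the DSD64 rate, with the "squint" modelled as a plain block average rather than a real analogue filter. Real converters use higher-order modulators, and the names `pdm_encode`, `block_average` and `worst_error` are my own illustrative choices.

```python
import math

DSD64_RATE = 64 * 44_100  # 2,822,400 one-bit samples per second


def pdm_encode(samples):
    """First-order delta-sigma: turn samples in [-1, 1] into a +1/-1 stream."""
    state = 0.0  # running difference between input and output so far
    bits = []
    for x in samples:
        state += x
        bit = 1.0 if state >= 0 else -1.0
        state -= bit  # feed the decision back so the error never piles up
        bits.append(bit)
    return bits


def block_average(values, width):
    """Crude low-pass 'squint': average consecutive blocks of `width` samples."""
    usable = len(values) - len(values) % width
    return [sum(values[i:i + width]) / width for i in range(0, usable, width)]


def worst_error(signal, bits, width):
    """Largest gap between the averaged bitstream and the averaged input."""
    pairs = zip(block_average(signal, width), block_average(bits, width))
    return max(abs(a - b) for a, b in pairs)


# About one millisecond of a 1 kHz tone at half of full scale.
tone = [0.5 * math.sin(2 * math.pi * 1_000 * n / DSD64_RATE) for n in range(2_822)]
stream = pdm_encode(tone)

for width in (8, 64):
    print(f"averaging {width:2d} bits: worst error {worst_error(tone, stream, width):.4f}")
```

In this toy model the worst-case error is bounded by 2 divided by the number of bits averaged, so averaging more bits gets closer to the original. That is the audio version of needing a higher resolution 1 bit image.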

The following image from a Positive Feedback article (Feb 2012) illustrates the process:

Conventional ADC/DAC process and benefits to PCM

The argument for DSD is that it bypasses an unnecessary conversion to PCM. But why is PCM so bad that we should avoid it?

The problem with PCM

To understand the problem with PCM we need to understand the Nyquist sampling theorem, which states that a signal can be perfectly reconstructed as long as it is sampled at more than twice the highest frequency it contains. Human hearing maxes out at about 20kHz (some can hear a little higher).
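A quick way to see why the limit matters is to sample a tone that sits above half the sample rate. In this small sketch (the 30 kHz test frequency and the `sample_tone` helper are just assumptions for illustration) a 30 kHz tone sampled at 44.1 kHz produces the same sample values as a 14.1 kHz tone, so after sampling the two cannot be told apart.

```python
import math

FS = 44_100  # CD sampling rate in Hz


def sample_tone(freq_hz, count, fs=FS):
    """Sample a cosine of the given frequency at `fs`."""
    return [math.cos(2 * math.pi * freq_hz * n / fs) for n in range(count)]


above_nyquist = sample_tone(30_000, 32)   # beyond the 22.05 kHz limit
alias = sample_tone(FS - 30_000, 32)      # 14.1 kHz, inside the limit

largest_gap = max(abs(a - b) for a, b in zip(above_nyquist, alias))
print(f"largest difference between the two sample sets: {largest_gap:.2e}")
```

The difference is nothing more than floating point rounding, which is why anything above half the sample rate has to be filtered out before sampling.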

That is why CDs have a sampling rate of 44.1kHz, twice 22.05kHz. The problem is that the PCM sampled data, which is discrete, must be converted back to a continuous signal. It turns out there are multiple ways of doing this and there are tradeoffs with each. If we simply stick a capacitor there as a low pass filter, we get a rolloff at the top end. The frequency response drops off towards 20kHz, so this isn’t really high fidelity, although it can sound pleasing and warm. The reality is that a good quality vinyl record and turntable can probably produce a flatter frequency response (possibly even a good quality tape player).

Another approach is to manipulate the digital data using digital filters so that it doesn’t roll off at 20kHz. The problem with this approach is that it introduces ringing into the signal. Post ringing is generally more pleasant but has a time smear effect. Pre ringing can result in a tighter sound but is very unnatural and can cause listening fatigue, making the sound harsh.
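To put a rough number on that rolloff, here is a sketch that assumes the simplest possible case: a single-pole RC low pass filter with its corner at 22.05kHz. Both the filter order and the corner frequency are my assumptions for illustration, not a description of any particular player.

```python
import math


def rc_gain_db(freq_hz, corner_hz):
    """Gain in dB of a first-order RC low-pass filter at `freq_hz`."""
    magnitude = 1 / math.sqrt(1 + (freq_hz / corner_hz) ** 2)
    return 20 * math.log10(magnitude)


CORNER_HZ = 22_050  # hypothetical corner at half the CD sample rate

for freq in (1_000, 10_000, 20_000):
    print(f"{freq:>6} Hz: {rc_gain_db(freq, CORNER_HZ):6.2f} dB")
```

Under these assumptions the response is already a couple of dB down at 20kHz, while a filter this gentle still does little to remove content above 22.05kHz.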
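The ringing is easy to see in the impulse response of such a filter, which is what a single click turns into. The sketch below builds a hypothetical 31 tap Hamming windowed-sinc low pass filter with a 20kHz cutoff at 44.1kHz. The tap count and window are arbitrary choices of mine, and real DAC filters are far longer.

```python
import math


def windowed_sinc_lowpass(num_taps, cutoff):
    """Linear-phase FIR low-pass taps; `cutoff` is in cycles per sample (< 0.5)."""
    centre = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        m = n - centre
        if m == 0:
            ideal = 2 * cutoff
        else:
            ideal = math.sin(2 * math.pi * cutoff * m) / (math.pi * m)
        hamming = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * hamming)
    return taps


taps = windowed_sinc_lowpass(num_taps=31, cutoff=20_000 / 44_100)
centre = len(taps) // 2

pre_ringing = taps[:centre]        # output that arrives before the main peak
post_ringing = taps[centre + 1:]   # output that arrives after it
print("pre :", [round(t, 3) for t in pre_ringing[-4:]])
print("post:", [round(t, 3) for t in post_ringing[:4]])
```

Because this kind of filter is symmetric, the wiggles before the peak mirror the ones after it. Filters designed to avoid pre ringing give up that symmetry and push all of the ringing to after the peak.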

Initially all CD DACs were multi-bit PCM DACs. This meant that the 16 bits were fed through a resistor network with no conversion and then through a capacitor low pass filter. This sounded somewhat pleasant, although early CD recordings were generally quite poor because they had a brick wall filter in the ADC.

Not long afterwards 1 bit DACs were introduced. These DACs use digital filters to produce a flatter frequency response and in the process convert the signal to 1 bit before outputting it. The output signal doesn’t necessarily need any low pass filtering.

So the problem with CD is the 44.1kHz sampling rate and the illusion that Nyquist’s theorem works perfectly fine in the real world. In reality it doesn’t.

This is not to say there is a problem with PCM though. A simple solution is to sample at a higher rate such as 96kHz. This is often known as a high-res recording. By sampling at 96kHz we avoid the issues of either frequency roll off or ringing.

Coming back to DSD, its advantage over CDs is that it samples at 2.8MHz. When it was released at the turn of the millennium, the sample rate was considered exceedingly high, and the sample rate alone was enough to create interest in trying the format.

But years later it turns out that DSD has its own issues…

The problem with DSD

2.8MHz (64 times 44.1kHz) is an exceedingly high sample rate. Why sample so high?

Well it turns out that humans don’t just hear frequencies but also perceive time cues in music. For example if you were standing in a forest and heard a twig snap, you would hear the leading edge of the sound immediately. Additionally the subtle reflections of sounds off objects, such as tree trunks in the forest, or within a room, provide a sense of space and location of objects. These cues can occur at much higher resolutions than the frequencies that we hear.

It turns out that DSD is much better at reproducing these effects. The effect is most apparent with acoustic recordings such as a live orchestra. The challenge with non-acoustic recordings (and even many acoustic recordings) is that there is a fair amount of editing involved. As it turns out it is very difficult to manipulate a 1 bit signal. Most mastering is done by first converting to PCM. One of the original issues with DSD is that many of the recordings were actually converted to PCM in the editing process and mastered back to DSD, losing most of the DSD benefits. The PCM mastering was probably done at 88.2kHz or higher, so the end result probably still sounded better than a normal CD.

However, that is not where the problem with DSD lies. It turns out that DSD has similar problems to CDs. You can see in the above image examples that the 1 bit image has lower fidelity than the 8 bit greyscale image. That’s because we need a higher resolution. But it turns out that the fidelity varies based on the frequency. For example, large areas of colour are ‘accurately’ reproduced in the 1 bit image, but if there is fine texture, the 1 bit version will be less accurate, because it can only be on or off on those regions, so we lose detail in a sense. The same also happens for the audio signal. As the frequency increases we actually get lower resolution. So how does that fit in with the concept above of accurate timing? Well the timing is very precise, but the amplitude of the signal isn’t.

Any deviation from the original signal we count as noise. So effectively we get increased noise in the higher frequencies, and it turns out that the noise in DSD on SACDs rises very quickly above 20kHz. In fact it becomes so high that it can cause issues for amplifier circuitry, so most DSD DACs filter the signal at around 50kHz. This makes a mockery of DSD being “high resolution”: it is only marginally better than a 48kHz DAT recording.
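The rising noise can be reproduced with a toy experiment. The sketch below assumes a first-order delta-sigma modulator and a short slice of a 1 kHz tone at the DSD64 rate, then compares the average noise power below 20kHz with the average above 100kHz using a plain DFT. Real SACD encoders use much higher-order noise shaping, so only the trend is meaningful here, not the absolute numbers. All names are illustrative.

```python
import cmath
import math

FS = 64 * 44_100   # DSD64 rate in Hz
N = 1024           # number of one-bit samples analysed


def pdm_encode(samples):
    """First-order delta-sigma modulator producing a +1/-1 stream."""
    state, bits = 0.0, []
    for x in samples:
        state += x
        bit = 1.0 if state >= 0 else -1.0
        state -= bit
        bits.append(bit)
    return bits


def mean_power(signal, first_bin, last_bin):
    """Average DFT power over bins first_bin..last_bin (inclusive)."""
    n = len(signal)
    twiddle = [cmath.exp(-2j * math.pi * m / n) for m in range(n)]
    total = 0.0
    for k in range(first_bin, last_bin + 1):
        coeff = sum(x * twiddle[(k * i) % n] for i, x in enumerate(signal))
        total += abs(coeff) ** 2
    return total / (last_bin - first_bin + 1)


tone = [0.5 * math.sin(2 * math.pi * 1_000 * i / FS) for i in range(N)]
bits = pdm_encode(tone)

# Everything in the bitstream that is not the original tone counts as noise.
# A Hann window keeps the strong high-frequency noise from leaking downwards.
hann = [0.5 - 0.5 * math.cos(2 * math.pi * i / N) for i in range(N)]
noise = [(b - x) * w for b, x, w in zip(bits, tone, hann)]

bin_hz = FS / N
audible = mean_power(noise, 1, int(20_000 / bin_hz))
ultrasonic = mean_power(noise, int(100_000 / bin_hz), N // 2)
print(f"mean noise power below 20 kHz : {audible:.3g}")
print(f"mean noise power above 100 kHz: {ultrasonic:.3g}")
```

The 1 bit stream is very noisy overall. The modulator simply pushes most of that noise above the audible band, which is why the playback filter has to remove it again.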

But it turns out that DSD still has a special quality to it, and most people that prefer DSD would indicate that it sounds better than even 24 bit 192kHz PCM.

So how can we improve DSD?

Anything in audio can simply be improved by increasing the sample rate. One solution to DSD’s issue is to increase the 2.8MHz sample rate. In audio parlance 64fs means 64 times the CD sampling frequency of 44.1kHz. So DSD on SACDs is 64fs, often now referred to as DSD64. More recently DACs have become available that can play back DSD128, DSD256, and even DSD512. Recordings are becoming available in these formats as well.
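The naming works out as simple multiplication. A small sketch (the helper name `dsd_rate_hz` is mine):

```python
CD_RATE_HZ = 44_100


def dsd_rate_hz(multiple):
    """One-bit sample rate for a DSD flavour given as a multiple of 44.1 kHz."""
    return multiple * CD_RATE_HZ


for multiple in (64, 128, 256, 512):
    print(f"DSD{multiple}: {dsd_rate_hz(multiple) / 1e6:.4f} MHz")
```

So the "2.8MHz" of DSD64 is really 2.8224MHz, and each step up doubles it.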

The biggest challenge with the higher sampling rate is that more bits are required. A DSD256 album can easily weigh in at 8GB. That may seem extreme in comparison to an iTunes album of about 100MB. But people who value sound quality would almost certainly find downloading and storing an 8GB file simpler than purchasing, storing, and playing back a vinyl record or even a CD. Another comparison could be people that download and keep movies. In my opinion DSD256 is not out of the realms of downloadability.

DSD512 could be another story though. 16GB for an album? But it turns out that we probably don’t need to go there.
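These sizes are easy to sanity check. The sketch below assumes a 45 minute stereo album stored uncompressed, with 1GB taken as 10^9 bytes. The album length and the helper name are my assumptions, and real files vary with length, channel count and any compression.

```python
CD_RATE_HZ = 44_100


def dsd_album_gb(multiple, minutes=45, channels=2):
    """Uncompressed size in GB (10**9 bytes) of a DSD album."""
    bits = multiple * CD_RATE_HZ * channels * minutes * 60
    return bits / 8 / 1e9


for multiple in (64, 128, 256, 512):
    print(f"DSD{multiple}: {dsd_album_gb(multiple):5.2f} GB")
```

Under those assumptions DSD256 comes out a little under 8GB and DSD512 a little under 16GB, in line with the figures above.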

DSD128 almost completely resolves DSD64’s issues, in a similar way to 96kHz solving 44.1kHz’s issues. There is another gain at DSD256 but it is much smaller than the jump from DSD64 to DSD128. In fact, even though DSD64 can sound amazing, the jump to DSD128 is very noticeable.

Okay DSD64 has some issues but sounds great. DSD128 sounds even better, so why not just settle on DSD64/128/256 and be done with it?

The DSD Playback Controversy

I mentioned above that most modern DACs are 1 bit and that DSD removes the unnecessary conversion to PCM.

Well it turns out that after 1 bit DACs became popular, multibit DACs became popular, but in a different form.

They have different names, but we can call them oversampling DACs. They take the PCM signal and oversample it to a very high frequency, and instead of converting it to a 1 bit signal, or retaining the 16 bit or 24 bit PCM signal, they convert it to a 5 or 6 bit PCM signal. Why do this and what is the downside?

The main reason this is done is because most people would admit that a non-oversampling resistor ladder DAC sounds the best (the initial one mentioned above for CD players). The problem is that you need a lot of resistors and they need to be produced to very high specifications. As a result resistor ladder DACs are very expensive. Some of the most expensive DACs in the world use resistor ladders. In fact one of the main motivations for moving to 1 bit DACs was to reduce cost.

To improve sound fidelity over 1 bit DACs a small resistor ladder was introduced. It’s cheaper than a full 16 bit or 24 bit resistor ladder, and better quality than a 1 bit DAC. Because it operates on 5 or 6 bit PCM, it also doesn’t need to run at as high a frequency as a 1 bit converter does.

Most DACs today are oversampling DACs. For example the ESS Sabre DAC chips are considered one of the best in the somewhat affordable price bracket. They are oversampling DACs.

The controversy is that DSD is meant to avoid PCM conversion, so what happens to a DSD signal when it gets into an oversampling DAC?

Well what happens is that it is converted to 5 or 6 bit PCM because that’s how the DAC works.

The purists cry foul, claiming that converting to PCM defeats the whole purpose of DSD, and ask how we can possibly be getting the true DSD experience. As a result some recent DAC chips now support two pathways: one for PCM data, and the other for DSD data, which avoids the PCM conversion. But is this necessary?

Bits or frequency?

The real question lies in what makes DSD sound good, is it the fact that it is 1 bit, or is it the fact that it is being sampled at a very high frequency? (or possibly both)

One way to look at this is purely in the digital domain. Can a DSD signal be converted to PCM and back (or vice versa) losslessly? The answer is generally no, in that you can’t convert a DSD64 signal to 24/192 PCM and back losslessly. But there is a point where it can be done losslessly. For example you can certainly convert a 1 bit 2.8MHz DSD signal to a 16 bit 2.8MHz PCM signal and back losslessly. The key here is the sampling rate.

Note that 1 bit DSD is effectively 1 bit PCM. In other words DSD is PCM, but at 1 bit. You can’t go lower than 1 bit. So simply introducing more bits will allow a lossless conversion back and forth.
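As a minimal sketch of that point, widening each bit to a 16 bit sample at the same rate and then narrowing it again gives back exactly the original bits. The function names and the choice of full-scale values are mine, and the bit pattern is arbitrary.

```python
def dsd_to_pcm16(bits):
    """Widen each DSD bit to a 16 bit sample at the same sample rate."""
    return [32767 if bit else -32768 for bit in bits]


def pcm16_to_dsd(samples):
    """Recover the original bits; valid only for samples made by dsd_to_pcm16."""
    return [1 if sample > 0 else 0 for sample in samples]


dsd_bits = [1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0]
pcm = dsd_to_pcm16(dsd_bits)
print(pcm16_to_dsd(pcm) == dsd_bits)  # True
```

Nothing is lost because the sample rate is untouched. The trouble only starts when the rate is reduced, since several 1 bit samples then have to be merged into one PCM sample.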

So the issue here is clearly not the number of bits (from a lossless perspective), the issue is the sampling frequency.

You can’t losslessly convert a 2.8MHz signal to 192kHz and back. There is a format called DXD which is actually a PCM format which is 24/384. It’s one of the highest PCM formats. The advantage of working in PCM is for editing.

I’ve heard 44.1, 88.2, 96, 176.4, 192, and 384 recordings as well as DSD64 and DSD128. I would definitely say that anything higher than 44.1 avoids the issues of harshness or roll off. However there is something that still sounds veiled or separated from reality with the PCM formats. DSD has an almost eerie feeling of presence to it (it doesn’t feel eerie, it just feels more as if it is really there, or that there is less hindering it from being there). I don’t think that 96kHz provides that feeling. 192kHz may just start to provide a hint of it; if I had to put a number on it, it could be 2%. DXD on the other hand really starts to provide some of that DSD effect. I would say that it has 25–30% of the DSD quality (if a number could be put on it). In other words, for me the DSD quality is related to the sampling rate, not the number of bits.

The big question is, what is the special sampling rate? Obviously 2.8MHz is special. 384kHz appears to be getting there. Is it possible to have PCM lower than 2.8MHz that still has that same special sound? In the same way that humans can’t hear over 20kHz, is there a lower limit to provide the DSD effect?

DSD128 clearly sounds better than DSD64. It samples at 5.6MHz, but the benefit is probably not due to the higher sampling rate as such but more to the limitations of DSD64. In fact it could be argued that the limitation is the 1 bit, not the sampling rate.

So why waffle on about this? The main reason is to understand how detrimental PCM conversion in a DAC is.

Chord Hugo

Chord produce some interesting DACs (with interesting names).

Rob Watts explained their approach to handling DSD in detail on head-fi.org.

In essence, their previous QuteHD DAC converts DSD to PCM, but it doesn’t reduce the sample rate, in fact the DAC upsamples to 2048fs (~90MHz!).

Their newer Chord Hugo DAC has a digital volume control. This is harder to implement at higher sample rates, so that volume control operates at 16fs (16 times 44.1kHz, or 705.6kHz). As a result, in the Hugo all DSD signals are downsampled to 705.6kHz. This is sacrilege for DSD purists. I haven’t heard a 700kHz+ PCM file (I don’t think any exist), and I haven’t heard any of Chord’s products, but it does raise the question as to where DSD’s benefits start and stop.

24/192 used to be considered the pinnacle of high res PCM and now we have 24/384. 24/192 only provided a very tiny essence of what is in DSD; in fact you can easily miss it, it is so minuscule. 24/384 on the other hand is very present, not in its fullness, but the jump between 192 and 384 is substantial (in my opinion), not quite the same as DSD64 and DSD128 or 44.1 and 96, but close. The question is how much more benefit do you get going above 700kHz? Maybe we get 90% there and maybe that’s enough?

The point is that the Hugo may well be downsampling DSD to PCM, but it is doing it at a PCM rate of 705.6kHz. That’s 16 times a CD’s sample rate, and nearly 4 times the rate of 24/192, which not long ago was considered very high, and possibly unnecessary.
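For reference, the arithmetic behind those rates is below. This is just multiplication of the figures quoted above, not anything taken from Chord’s documentation, and the variable names are mine.

```python
CD_RATE_HZ = 44_100

hugo_rate = 16 * CD_RATE_HZ      # rate quoted for the Hugo volume control
qutehd_rate = 2048 * CD_RATE_HZ  # upsampling rate quoted for the QuteHD

print(f"16fs   = {hugo_rate / 1e3:.1f} kHz")
print(f"2048fs = {qutehd_rate / 1e6:.2f} MHz")
print(f"16fs / 192 kHz = {hugo_rate / 192_000:.3f}")
```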

Back to Analogue

Okay, so maybe PCM at high enough sample rates is getting close to or even equivalent to DSD. But what about what happens in the DAC chip itself?

Some argue here that one of the advantages of DSD is that all of the processing on the signal has been done. With 1 bit DSD, you can literally put a capacitor on the end of the digital signal and it is instantly analogue. You can’t do that with PCM (although you can use a resistor ladder).

There is an argument that if all the DAC chip needs to do is pass through the 1 bit data without modifying it, then there is less electrical activity happening within the chip, so parasitic electrical noise is less likely to affect the analogue signal. This is a valid argument, but one I have very little experience with.

There are some DSD DACs which don’t use chips at all, such as the Lampizator, which essentially amplifies the DSD signal enough so it can be transmitted to the power amplifier and otherwise simply low pass filters it. No solid state electronics are involved, and nothing digital!

As a result there has been some interest in converting normal PCM files (e.g. CD files) to DSD on a computer and sending the DSD data to a DSD DAC, which should allow minimal conversion in the DAC itself and result in better analogue quality. This is all about reducing the electrical activity in the DAC itself.

It’s an interesting idea because theoretically a 1 bit DSD DAC can be very simple to implement, as long as all of the data being sent to it is in DSD format. Normally this is not the case as most audio is in PCM format, but if playback software can do the conversion on the fly (which JRiver, Foobar, etc. can do) it would mean that PCM DACs could become a thing of the past. Note that most consumer DACs don’t support DSD, so for example you can’t just play DSD files on a Mac or iPhone. I do wonder how long it will take for Apple or Microsoft to incorporate native DSD playback in their operating systems.

Volume Control

So what about volume control and other modifications of PCM signals?

It was thought that it is very difficult to edit DSD signals. However Signalyst’s HQPlayer is able to do volume control, EQ, and crossovers, all in native DSD. So it definitely is possible, but not many manufacturers seem to be using it.

The argument that we need to convert to PCM for volume control or other editing could become a weak one. But it looks like it will take some time for software to be rewritten to support the DSD pipeline.

My thoughts and the future

My thoughts are twofold:

  1. DSD is a fantastic format, that has been unleashed in the downloads era, with the increasing availability of DACs, and higher sample rate versions such as DSD128 and DSD256. Additionally it appears that editing DSD signals is now possible and hence there is little need for PCM conversion. So the future for DSD is looking very bright.
  2. The main reason DSD sounds good is the high sample rate, not the 1 bit format. It is possible that PCM at 700kHz and above has most of the DSD qualities. As a result, DACs that convert to PCM, and software for editing and mastering, may not have a significant impact on the sound quality if they remain above 700kHz. Also, the Chord Hugo came out 2 years ago, so it is possible that in the future they may be able to downsample to 32fs, which is 1.4MHz PCM.

I think the line between DSD and PCM will become increasingly blurred as sample rates increase. Will everything ultimately go 1 bit DSD? I don’t know, but I think it is a possibility.

Will we keep seeing higher sample rates? I think so. People purchased 24/192 when there isn’t a great improvement over 24/96. So my feeling is that ultimately DSD512 may become popular. But there is a point of diminishing returns. I find even DSD64 very listenable on a good quality system.

Another aspect that pushes the bandwidth envelope is multichannel audio. SACDs supported multichannel from the beginning. Many SACDs, even today, have a stereo track and a 5 or 6 channel multichannel track. That’s up to 8 channels! Most people don’t have a multichannel system so it could be argued that space is wasted. I’ve wondered in recent times whether the SACD format would’ve been better off using DSD128 from the get go, instead of trying to introduce multichannel. Having said that I have heard good things about multichannel, so maybe that will be the next evolution?

Ultimately we are in a very interesting era in audio. Many people don’t realise the fidelity that can be achieved with vinyl and even tapes. Nonetheless vinyl has clicks and pops, and tapes have hiss. CDs eradicated both of these but never actually attained true analogue quality. The solution has always been to increase the sample rate, but the issues are in the capturing, processing, delivery, and playback of such high sample rates.

It seems we are in a special era where we are reaching sample rates that can truly match analogue, whether it is DSD128 and higher, or 700kHz+ PCM.
