Can digital audio playback be improved by resolutions greater than 16/44.1?


Copied from another thread, courtesy of jgatie - full question was:

"Do you believe digital audio, outside of mastering/production techniques, can be improved by playback resolutions greater than 16/44.1?"


By the way, I am currently listening to my only Hi-res DD5.1 recording on my Sonos 5.1 system: "Selling England By The Pound" by Genesis. It sounds great to be surrounded by sound (the vocals stay up front while other instruments come from the rear, so it's genuine 5.1). The quality is excellent. I wouldn't know whether the hi-res makes a difference or not; it doesn't matter.
I had to smile. "Hi-res DD5.1" is pretty much an oxymoron. The maximum bitrate for DD/AC-3 is 640kbps. And that's for six channels (albeit the LFE is band-limited).
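
For scale, here is the underlying arithmetic as a quick sketch (the figures are just the published format specs, nothing measured; the variable names are mine):

```python
# Back-of-the-envelope: what "hi-res 5.1" would need vs. what AC-3 can carry.
ac3_ceiling_kbps = 640                     # DD/AC-3 maximum total bitrate

# Uncompressed 24-bit / 96 kHz PCM across 6 channels:
pcm_kbps = 24 * 96_000 * 6 // 1000         # = 13,824 kbps
print(f"24/96 x 6ch PCM: {pcm_kbps:,} kbps")
print(f"AC-3 ceiling:    {ac3_ceiling_kbps} kbps "
      f"(~{pcm_kbps // ac3_ceiling_kbps}x less)")
```

So even before arguing about audibility, a 640kbps AC-3 stream is carrying roughly a twentieth of the raw data of a 24/96 5.1 master.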

Does this all not help to illustrate the point that the 'quality' is in the mix and the mastering? And, let's not forget, the composition and the artists' rendering? Oh, and a bit of adult beverage to enhance the memory of the original live performances, though as I recall the scent in the air then was of something rather different....
Hi all,
I am the author of the meta-analysis paper being discussed, and I was asked to comment on some of the points made in this discussion.
The paper being referred to is available at http://www.aes.org/e-lib/browse.cfm?elib=18296 , and it links to additional resources with all the data and analysis.
Also, note that this was unfunded research. At no point has any of my research into high resolution audio or related topics ever been funded by industry or anything like that.
On to the specific comments:

“Dr. Reiss was the one making the PR statement… Dr. Reiss is lying in his PR statement” - I didn’t write the press release! Press releases are put forward by organisations with the aim of trying to get the press to cover their story, and as such are a combination of spin, marketing, opinion and fact. In this case, it was written by a press officer at my university, and then AES issued another similar one. The ‘advantage’ quote was based on a conversation that I had with the press officer, but it was not text directly from me (I just checked my email correspondence to confirm this). It most likely came from trying to translate the phrase ‘small but statistically significant ability of test subjects to discriminate’ to something that can be easily understood by a wide audience.

“explained by the presence of intermodulation distortion” – This was looked into in great detail; see the paper and supplemental material. First, note that intermodulation distortion in these studies would primarily arise from situations where the playback chain was not fully high resolution, e.g., putting high resolution content through an amplifier that distorts high frequencies. Anyway, quite a lot of studies did look into this and other possible distortions (see Oohashi 1991, Theiss 1997, Nishiguchi 2003, Hamasaki 2004, Jackson 2014, Jackson 2016) and took measures to ensure it wasn’t an issue. This includes most studies that found a strong ability to discriminate high resolution content. In contrast, some studies that claimed not to find a difference either made no mention of distortion or modulation (like Meyer 2007) or had low resolution equipment that might cause distortion (like Repp 2006).

“I have [yet] to come across even one for hi res that does a decent job of doing level matched blind AB, leave alone a full protocol ABX… may I be pointed to even one ABX done in line with well established principle” – see the paper. There were a lot of studies that did double blind, level matched ABX testing. Many of those studies reported strong results. They could all suffer issues, of course, but the point of the paper was to investigate all those studies.

“absolutely no evidence in the meta-analysis that there is an ‘advantage in its quality’” – I would not go as far as that. I claim neither that there is nor that there isn’t; ‘advantage’ is too subjective. However, many of the studies looked at preference, or at what sounded ‘closer to live’, or asked people to comment on subjective qualities of what they heard. They do suggest an advantage to audiophiles, but I would argue that the data is not rigorous or sufficient in this regard.

“his cherry picking of studies” – A strong motivation for doing the meta-analysis was to avoid cherry-picking studies. For this reason, I included all studies for which there was sufficient data for them to be used in meta-analysis. That way, I could try to avoid my own biases and judgement calls. Even if I thought a study was poor, its conclusions seemed flawed, or it disagreed with my own preconceptions, I included it as long as I could get the minimal data needed for meta-analysis.

“chance of him actually including studies that find there is no difference, like the M&M study … slim to none… disclusion of seminal studies on Hi-res audio like the M&M study” – I did include the M&M study (Meyer 2007)! See Sections 2.2 and 3.7 and Tables 2, 3, 4 and 5. I couldn’t include it in the Continuous results because Meyer and Moran never reported their participant results, even in summary form, and no longer had the data (I asked them), but I was able to use their study for Dichotomous results and it didn’t change the outcome.

‘explain why studies that did factor in the existence of IM distortion were left out, whereas studies that didn't consider IM were included’ – see previous points. I included every study with sufficient data; some of them considered IM and some didn’t. The Ashihara study (references 25, 60 and 61) was a detection threshold test, demonstrating only that IM could be heard and could be a factor in discrimination tests; they also did not report results in a form that could be used for meta-analysis.
I simply have to say thank you to @joshr for joining in on this conversation! It is fantastic to have someone with your familiarity with this topic and depth of research connect with such a community!

As I'm sure you have noticed, many of the most active participants of this forum have built fairly strong views on this topic (and other topics :P) and, as a result, may be perceived as being somewhat... aggressive... in their responses. 🙂 In my experience everyone on here means well and has a deep, honest interest in the discussion, and you have added significantly to that conversation.

Incidentally, having read much (with a lot yet to read): while I appreciate what people *may* be able to do in a testing environment, I stand by my original view that, in a consumer market, higher resolution does not improve audio playback in any way that makes any meaningful difference to actual listening in the real world. I go back to the discussion referenced in the What Hi-Fi article: people don't really notice differences between mp3s and lossless in their regular listening, to the point that they are not moving away from mp3s. I currently don't see the broader population demanding a move to higher-res, as I don't perceive the typical individual truly discerning material differences in a real-world application. (And this is ignoring the potential communication bandwidth challenges that I perceive make product releases/updates extremely problematic at this point in time.)
"Hi-res DD5.1" is pretty much an oxymoron. The maximum bitrate for DD/AC-3 is 640kbps. And that's for six channels (albeit the LFE is band-limited).

Indeed you are right, and thanks for pointing that out. I noted in a later post that the disc was actually DTS5.1, and I had to transcode it using a Playstation to feed the Playbar. So none of the original hi-res audio (it's 24/96) will make it through, and even if it did, the Sonos system would presumably refuse to play it.

The point of my story was not about hi-res; it was about the clear and positive effect of Trueplay tuning on a Sonos 5.1 setup. We also watched Dr Strange the other night with similarly excellent 5.1 sound.
Off topic, if you want a treat, check this guy out: https://youtu.be/-V7Dqf-FQL4

Thanks for the link - quite a talent!

Returning the favor, check out Another One, a song by Project RnL from Israel. The bass solo at the end is extraordinary.
So none of the original hi-res audio (it's 24/96) will make it through, and even if it did, the Sonos system would presumably refuse to play it.

I have tried playing 24/96 ALAC through my CONNECT:AMP to see what would happen. The result was just white noise.

It sounded FANTASTIC, though. 😉
Another excellent digest/commentary from Archimago: http://archimago.blogspot.co.uk/2017/04/musings-do-we-need-those-20khz.html
As a jazz fan, most of what I listen to was recorded during the golden age of jazz, the 1950s, on magnetic tape. These great recordings have an equivalent bit depth of perhaps 11 bits, and a bandwidth of 15kHz, at best. Anything beyond CD resolution will simply be a waste.

Most modern pop recordings have maybe 9dB of dynamic range. Again, "hi rez" would be a complete waste. Modern classical and jazz recordings, meticulously recorded, might have a tiny technical advantage at 24 bits, but I very much doubt anyone can hear a worthwhile difference.
This will make no difference to those on the other side of the divide, to people holding close to faith-based beliefs.
From Wikipedia:

Dithering eliminates the granularity of quantization error, giving very low distortion, but at the expense of a slightly raised noise floor. Measured using ITU-R 468 noise weighting, this is about 66 dB below alignment level, or 84 dB below digital full scale, which is somewhat lower than the microphone noise level on most recordings, and hence of no consequence in 16-bit audio.

Dither can also be used to increase the effective dynamic range. The perceived dynamic range of 16-bit audio can be 120 dB or more with noise-shaped dither, taking advantage of the frequency response of the human ear.

In other words, noise-shaped dither provides 16 bit recordings with all the resolution humans will ever need.
My answer is qualified in two ways:
1. Not unless the higher numbers - I think calling them resolutions is misleading - are in some way needed to capture what enhanced mastering/recording techniques contain.
2. There are many other ways that digital audio can/will improve, in the realms of speaker sound quality and room response capability.
Until of course the time comes when we get digital implants feeding sound signals directly to the brain!
I'm not the expert here, and I know a number of people on this board have disciplined and detailed opinions on this, but for fun, I'll weigh in early and then enjoy stalking from the sidelines...

From my experience, all I have read, and personal opinion, I'll answer this way:

Not in any way that makes any meaningful difference to actual listening in a real world.

I live in a real house, with an honest-to-God forced-air *furnace*... and a refrigerator that insists on humming and buzzing at odd times, and where people actually flush a toilet, and ... you get the point.

I enjoy music, yes. I love quality audio, yes. If people want to isolate amazing 'listening rooms' and convince themselves they can 'hear a difference' *Obviously!* when they listen to a "HiRes" file... great for them. Have at 'er all you like. But don't tell the rest of us we're stupid for listening to poor quality audio. Truth is, we're listening to pretty much all the human ear can discern now.

Sure, HiRes may sell a few extra files for product manufacturers (but I'm not convinced most of the original recording sources are of a format that can actually have anything but a bunch of empty 0s in the extra bit depth of the file), and I'm one that will be quick to point out that stats sell products... Maybe the world eventually needs to be producing equipment that can deliver better than the human ear can hear in a real environment... but I don't think we should allow ourselves to be fooled that it is real... many, many people don't care about facts... HiRes is better than 'normal res', and we obviously need TVs that display in finer grain than the eye can perceive from a natural watching distance. That is the art of marketing... and we all know that marketing and facts may have a 'loose' relationship.
many, many people don't care about facts... HiRes is better than 'normal res'.
I get what you are saying, but Hi Res music has lost the race: that race was won first by wireless multi-room audio supported via streaming, which in turn is now under attack for mind/market share from voice and home automation integration. People are willing to sacrifice even some sound quality for these features; the brain can fill in small sound quality gaps quite easily, but the lack of these features remains once the need for them is perceived, and brain games can't be a substitute.

If none of these developments had taken place, Hi Res would probably have been the next big thing, because people are always looking for one, and they would have drunk that Kool Aid for lack of an alternative.

Niche markets will always exist for everything of course - see vinyl and how that exists with little in its favour in terms of either user ease or sound quality!
Here's a thought that I'm surprised hasn't occurred to me before. It is standard practice in audio production to add dither (low-level random noise) to an audio signal when converting from 24 bit to 16 bit. It helps to mask the truncation error caused by rounding the low level 24 bit signal to the nearest 16 bit value. As I've mentioned elsewhere, 16 bit steps can be as large as 6dB at low volume. Truncation error is most noticeable when fading music to silence (and is presumably more obvious in headphones or with the furnace turned off 🙂 )

My thought is this: if adding low-level noise is required to reduce the audible effect of truncation to 16 bits, surely it would be preferable to retain more bits and eliminate the need to add random noise?

This link has more detail.
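
To make the mechanism concrete, here is a minimal sketch (my own toy example, not taken from the linked article; helper names are mine): quantize a very quiet tone to 16 bits with and without TPDF dither, and measure the third harmonic that plain rounding creates.

```python
import numpy as np

fs, f0 = 44_100, 1_000
t = np.arange(fs) / fs
x = 1e-4 * np.sin(2 * np.pi * f0 * t)      # quiet 1 kHz tone (~ -80 dBFS)
q = 1 / 2**15                              # one 16-bit quantization step

# Plain rounding: error is correlated with the signal -> harmonic distortion.
plain = np.round(x / q) * q

# TPDF dither: add +/-1 LSB triangular noise first -> error becomes benign hiss.
rng = np.random.default_rng(0)
d = (rng.random(fs) - rng.random(fs)) * q
dithered = np.round((x + d) / q) * q

def tone_level_db(y, freq):
    """Amplitude (dBFS) of the component at `freq`, via a Hann-windowed DFT."""
    w = np.hanning(len(y))
    spectrum = 2 * np.abs(np.fft.rfft(y * w)) / w.sum()
    return 20 * np.log10(spectrum[int(freq * len(y) / fs)] + 1e-12)

for name, y in [("plain rounding", plain), ("TPDF dithered", dithered)]:
    print(f"{name}: 3rd harmonic at {tone_level_db(y, 3 * f0):.1f} dBFS")
```

With plain rounding, the 3 kHz component should sit well above the floor; once dither is added it should vanish into a slightly raised, but benign, noise floor - which is exactly the masking effect described above.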
My limited understanding of the subject:
Used appropriately, 16 bits can give you 96dB of dynamic range; more than enough for almost everything but the Big Bang.

But because it’s so difficult to set levels for an unpredictable live performance, and because getting them wrong almost inevitably results in a sub-par recording, it’s more sensible to use 24 bits when capturing audio. There’s no harm done (relative to a 16 bit recording) in reducing 24 bit files to 16 bits - as long as it’s done nicely; nicely being a layman's term for doing dithering properly, I suppose. My guess is that doing this isn't rocket science to people in the industry, and doing it allows for a best-of-both-worlds approach, the other world being the one where living with 16 bits is dictated by the need to be compatible with the 16/44 format.

Will music recorded in 24 bit, rendered competently down to 16, sound audibly worse/different than that which can be played as recorded, in the 24 bit format? That is the question to be answered.
Undithered, 16 bits gives you roughly 90dB of usable range; dithering buys you the full 96dB. You are right about the recording process needing 24 bits. There IS harm done reducing this to 16 bits, which is why dithering is necessary. My argument stands - if dithering is necessary, then more than 16 bits is preferable on playback.
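
For reference, the raw arithmetic behind these figures, as a minimal check (the "roughly 90dB usable" part is the poster's estimate; only the headline numbers below fall straight out of the formula):

```python
import math

for bits in (16, 24):
    dr = 20 * math.log10(2 ** bits)        # full scale vs. one step, in dB
    print(f"{bits} bits: {dr:.1f} dB")     # ~96.3 dB and ~144.5 dB
# Rule of thumb: ~6.02 dB per bit (the textbook full-scale sine SNR adds 1.76 dB).
```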
Come to think of it, I don't think much subjective competency is involved here. Many here have said that if a hi res file is downsampled to a Sonos-compatible format and played on Sonos-fronted kit, it cannot be audibly distinguished from the same file played in its native format via hi-res-capable kit, all other things, like the speakers in use, being the same.

I haven't any ABX experience of this, but I haven't found anyone that has countered this claim via ABX either.
My argument stands - if dithering is necessary, then more than 16 bits is preferable on playback.

The counter argument is that if the dithering effects cannot be audibly picked up, why are more than 16 bits necessary?
The point is that if dithering is perceived to be necessary (which I think we agree on), wouldn't it be preferable to increase the bit depth rather than add random noise to the audio?
Given the constraint imposed by the format, I suppose there isn't the space to increase the bit depth. Preferable perhaps, but not essential then; and good engineering is that which achieves the required outcome with the least expenditure of resources.
Here is a Sep 2015 view on the subject from Sonos:
Sonos only supports up to CD quality, while others are capable of playing back 24-bit tracks. Giles Martin, sound leader at Sonos, claims this shouldn't make a difference though.

"You can't upscale audio, it doesn't work so it becomes a numbers game…I can make a call to shift us [to 24-bit], it isn't hard, but the problem is on the experiential side, it has to be right. You have to make sure things don't drop out or stop. Even with Tidal we are on the edge right now because the pipe needs to get bigger."

Martin added: "I refuse and the company refuses to play this numbers game where you go 'we're better than you', 'how much better', 'we are 8 better than you'. It doesn't make any sense as far as the consumer stuff goes and I think that's where we are at right now. I think it would be great if people listened to CDs right now and then say 'you know what, we need a bit more' and then they can experience it if they want and decide whether they can hear the difference or not as most people can’t."
And, if you read the often quoted xiph.org material on the subject, 24 bit for listening isn't even preferred to 16 bit; it just isn't necessary, and is therefore a wasted resource.

A quoted part from that site:
"Professionals use 24 bit samples in recording and production for headroom, noise floor, and convenience reasons.

16 bits is enough to span the real hearing range with room to spare. It does not span the entire possible signal range of audio equipment. The primary reason to use 24 bits when recording is to prevent mistakes; rather than being careful to center 16 bit recording-- risking clipping if you guess too high and adding noise if you guess too low-- 24 bits allows an operator to set an approximate level and not worry too much about it. Missing the optimal gain setting by a few bits has no consequences, and effects that dynamically compress the recorded range have a deep floor to work with.

An engineer also requires more than 16 bits during mixing and mastering. Modern work flows may involve literally thousands of effects and operations. The quantization noise and noise floor of a 16 bit sample may be undetectable during playback, but multiplying that noise by a few thousand times eventually becomes noticeable. 24 bits keeps the accumulated noise at a very low level. Once the music is ready to distribute, there's no reason to keep more than 16 bits."

For why there is no reason, see the material on the site and see if it convinces you :) - dither is addressed.
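
The "multiplying that noise by a few thousand times" point in the quote is easy to sanity-check. A minimal sketch, under the simplifying assumption that every operation contributes independent noise at the single-pass floor (real DAW behaviour is messier, and the helper name is mine):

```python
import math

def accumulated_floor_dbfs(bits: int, operations: int) -> float:
    """Noise floor after summing `operations` independent quantization-noise
    contributions, each at the single-pass floor of a `bits`-deep sample."""
    single_pass = -6.02 * bits                  # ~ -96 dBFS for 16 bits
    return single_pass + 10 * math.log10(operations)

for bits in (16, 24):
    print(f"{bits} bits after 1000 ops: {accumulated_floor_dbfs(bits, 1000):.0f} dBFS")
# 16 bits: ~ -66 dBFS, creeping toward audibility; 24 bits: ~ -114 dBFS, still inaudible.
```

That 30dB penalty from a thousand operations is why production stays at 24 bits even if distribution doesn't need to.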
All I know about 24 bit recordings is from my excellently produced and noticeably more expensive 24 bit recorded CDs from JVC, marketed as XRCDs - Xtra Range. Very good packaging, liner notes and mastering. But no better than the best CDs from other production houses for sound quality. They now sit in my NAS and sound just as good as they did when played through a high end SACD player with a pure copper chassis (sic), wired to a pre amp + amp, wired in turn to very good passive speakers, via thick speaker cables. But in place of all those boxes and messy wires and a component rack is a well-placed 1 pair + Sub, tuned to the room via Trueplay. The racks of CDs are gone, into a NAS that isn't to be seen.

This is digital audio at its best. Will it be improved? Of course - via better active speakers and better versions of Trueplay. Hopefully there is also still some scope for Sonos to further improve the sound quality from the existing hardware, improvements delivered free over the net, all the way from the US to India.

This to my mind is real progress, not to be shied from. IMO, it is close to magic. All that is missing is the glow of the valves :D.
Peter, more from xiph for you to ponder and/or research:
"16 bit audio can go considerably deeper than 96dB. With use of shaped dither, which moves quantization noise energy into frequencies where it's harder to hear, the effective dynamic range of 16 bit audio reaches 120dB in practice [13], more than fifteen times deeper than the 96dB claim.
16 bits is enough to store all we can hear, and will be enough forever."

So not just dither, but shaped dither!
Thanks Kumar. That is fascinating - I never realized dither could increase the dynamic range that much. Having thought about it for a bit, I now understand how this works. It's interesting that Monty's plot cuts off at about 10kHz. I got to wondering what happens beyond that, and also what happens if there is more than one frequency. So here is an extended version of his plot. You can see the noise is shifted to frequencies that most of us can't hear, but those with excellent/young hearing might perceive increased very-high-frequency noise. Multiple frequencies increase the noise level slightly (see here), so real music, with all frequencies present, might increase this further.

So, based on this, I still contend that if dithering is necessary, it would be preferable to use more bits and not add potentially audible noise. I concede this is all really low level stuff.

Cheers, Peter.
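
To put numbers on what those plots show, here is a minimal sketch (my own toy, using a plain first-order error-feedback shaper rather than the psychoacoustically tuned curves real mastering tools use; all function names are mine): it compares where the quantization noise ends up with flat TPDF dither versus shaped dither.

```python
import numpy as np

fs = 44_100
q = 1 / 2**15                                  # one 16-bit step
rng = np.random.default_rng(0)
x = 0.5 * np.sin(2 * np.pi * 1_000 * np.arange(fs) / fs)

def tpdf(n):
    """+/-1 LSB triangular (TPDF) dither."""
    return (rng.random(n) - rng.random(n)) * q

def quantize_flat(x):
    return np.round((x + tpdf(len(x))) / q) * q

def quantize_shaped(x):
    """First-order error feedback: push quantization error upward in frequency."""
    y, e = np.empty_like(x), 0.0
    d = tpdf(len(x))
    for n in range(len(x)):
        y[n] = np.round((x[n] + e + d[n]) / q) * q
        e = (x[n] + e) - y[n]                  # total quantizer error, fed to next sample
    return y

def band_noise_dbfs(y, lo, hi):
    """Approximate RMS of the quantization error within [lo, hi) Hz, in dBFS."""
    E = np.fft.rfft(y - x) / len(x)
    f = np.fft.rfftfreq(len(x), 1 / fs)
    band = (f >= lo) & (f < hi)
    return 20 * np.log10(np.sqrt(2 * np.sum(np.abs(E[band]) ** 2)) + 1e-15)

for name, y in [("flat TPDF", quantize_flat(x)), ("1st-order shaped", quantize_shaped(x))]:
    print(f"{name}: 0-10 kHz {band_noise_dbfs(y, 0, 10_000):.1f} dBFS, "
          f"10-22 kHz {band_noise_dbfs(y, 10_000, fs / 2):.1f} dBFS")
```

With the shaped version, the sub-10 kHz noise should read noticeably lower and the 10-22 kHz noise correspondingly higher - the same trade Peter describes, with the total noise power simply relocated to where hearing is least sensitive.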