
The 13.4.1 S2 update added hi-res (Ultra HD) and Dolby Atmos audio support from Amazon Music Unlimited. With this update, Sonos released this great article about hi-res audio and how you can listen to it on Sonos. It’s a very detailed and well-written article:

https://blog.sonos.com/en-us/hi-res-audio-guide

No claims about quality anywhere.

What about:

 Why would you want to listen to music in hi-res? Again, McAllister explains. “The benefit of listening to hi-res music is that you’re listening to the actual file from the studio,” he said. “No conversion had to take place to change the 24-bit track into a 16-bit track. Listening to a track at 24-bit is a guarantee that you’re hearing the audio exactly as it sounded in the studio. 

Implicit in the quote is that with 16 bit what you are hearing isn't exactly as it sounded in the studio...


Why can’t the music industry standardize these designations and require compliance? I’m not sure who would do it or how, but I just want transparency and honesty, not marketing speak.

 

An industry, or more accurately just an industry-related group, can create a standard, but they can never require compliance. All they can really do is market and educate the public on what the standard is and why it’s important, and make sure the standard is followed strictly by the products that claim to meet it. If the public doesn’t know or care about the standard, and it’s loosely enforced, then it’s pointless. All that takes a lot of money, and generally speaking, if it doesn’t help increase sales, why bother? I think the different music services’ primary means of competing is audio quality, so they don’t have a big interest in standards.

That said, standards pretty much existed and worked in the days of physical media. Once everything started going digital and ‘customers’ pirated music in whatever format and quality they wanted, things got all shot to hell. Even when you could start buying music digitally, the industry didn’t want you to know that the quality was worse than CD.

 

 


Ahh, I see, good catch!

 

This is the way it was always going to turn out.  The conversation continues until you get caught in a trap.  There was never going to be another outcome.


Does the Gen 2 Sub support Amazon Music high-res 24-bit audio? I don’t see anything about the compatibility of the subs. I did the update on the app and currently I am only seeing HD and no Ultra HD song tags (on songs which I know are UHD). My current setup uses two Fives with a Gen 2 Sub. Thanks.

From the article:

As of this post’s publish date, the following Sonos products are capable of playing 24-bit music at 48 kHz: Roam, Arc, Beam (both generations), Five, Sub (all generations), Move, One, One SL, Port, Amp, SYMFONISK Bookshelf, SYMFONISK Table Lamp, Play:5 (Gen 2), Connect (Gen 2), and Connect:Amp (Gen 2).


Cite what??

 

Actual evidence for your claim.  

 

If you are referring to my last statement about the audible difference between currently available HD and UltraHD tracks, that is more a conclusion, based on those fundamentals of digital signal processing which I tried to briefly summarize before, rather than a claim.

 

How about the claim that “digital audio with increased sampling rates of 96kHz or even 192kHz would indeed provide a very noticeable benefit as it allows for more precise positioning and depth of the sound sources.”


It makes me wonder if there was a certain amount of “everyone else is doing it, we’ll lose sales if we don’t too” in the decision. Sometimes, being the lone voice of reason in the wilderness isn’t the ideal position to be in when you’re trying to sell stuff. Of course, other times, it’s the perfect thing...so who knows?



 

You’ve clearly done enough research on the topic to know that what you’ve stated is far from accepted fact. You had to know that if you post a theory like this on a forum with others knowledgeable on the subject, it’s not going to be blindly accepted.


Ahh, I see, good catch!

 

This is the way it was always going to turn out.  The conversation continues until you get caught in a trap.  There was never going to be another outcome.

Yeah, internet forums seem to attract those flat-earthers who use negative results to prove the non-existence of any given phenomenon. I don’t really bother, especially if the most compelling counter-argument they present is that they have never heard about it, garnished with a hint at the many years of experience they have under their belt.

But for argument’s sake, you may want to read J. Robert Stuart’s paper “Coding for High-Resolution Audio Systems”, published in 2004 in the Journal of the Audio Engineering Society. Everything I have stated about sampling rate and high-frequency content you can more or less find in Chapter 5 of this paper, and in particular section 5.1, Psychoacoustic Data to Support Higher Sampling Rates: “...It has been suggested that perhaps higher sampling rates are preferred because, somehow, the human hearing system will resolve small time differences which might imply a wider bandwidth in a linear system. In considering this it is important to distinguish between perceiving separate events which are very close together in time (implying wide bandwidth and fine monaural temporal resolution) and those events which help build the auditory scene, for which the relative arrival times are either binaural or well separated. In the first case, wider bandwidth is required to discriminate acoustic events that are closer together in time. This seems to be an alternative statement of the problem to determine the maximum bandwidth necessary for audible transparency...Events in time can be discriminated to within very fine limits, and with a resolution very substantially smaller than the sampling period. This point is crucial because provided we treat all channels identically to ensure no skew of directional information, there is no direct relationship between the attainable temporal resolution and the sampling interval.”

So independent of whether you follow the author’s hypotheses and findings or not, it is a well-known paper from an AES Fellow, so you cannot really say that you never heard about this stuff.

Following this paper, in 2007 there was an article from Meyer and Moran, also published in the AES Journal, with the objective of finding out if there are any audible gains from hi-res audio playback, by doing some extensive, formalized testing. Their conclusion was that based on their test methodology they could not find any significant preference for hi-res audio over the CD standard, even when using high-end headphones or speaker systems. However, they very correctly noted that “it is very difficult to use negative results to prove the inaudibility of any given phenomenon or process”. The most intriguing part of this article, however, was their final note on high-resolution recordings: “Though our tests failed to substantiate the claimed advantages of high-resolution encoding for two-channel audio, one trend became obvious...throughout our testing: virtually all of the SACD and DVD-A recordings sounded better than most CDs - sometimes much better...Partly because [...] engineers and producers are given the freedom to produce recordings that sound as good as they can make them, without having to compress or equalize the signal to suit lesser systems.”

And here we go. I truly believe there are advances to be made in capturing and reproducing more accurately what our ears actually perceive in a concert hall. I am a big fan of innovation. And the proliferation of hi-res audio formats as we are witnessing right now is certainly one way to inspire more innovation to come forward in this field. Even if we are not (yet) experiencing it in the UltraHD tracks we get to listen to today.

 


Their conclusion was that based on their test methodology they could not find any significant preference for hi-res audio over the CD standard, even when using high-end headphones or speaker systems. However, they very correctly noted that “it is very difficult to use negative results to prove the inaudibility of any given phenomenon or process”. The most intriguing part of this article, however, was their final note on high-resolution recordings: “Though our tests failed to substantiate the claimed advantages of high-resolution encoding for two-channel audio, one trend became obvious...throughout our testing: virtually all of the SACD and DVD-A recordings sounded better than most CDs - sometimes much better...Partly because [...] engineers and producers are given the freedom to produce recordings that sound as good as they can make them, without having to compress or equalize the signal to suit lesser systems.”

 

 

How does one resolve the internal contradiction in the quoted excerpt, except by attributing the “sounded better” thing to better mastering, which the quote says is partly the reason for the better sound? And no one here disputes that better mastering can deliver audibly better sound. But if that is partly the reason, what is the rest of it? That is left hanging in the air...and where mastering is the same and sound levels are accurately matched, these differences have not survived in any published blind test where the better master is downsampled to CD format and compared against the original version.

We can of course wait for the black swan, but I for one am not holding my breath.

To repeat also: all this applies only to 2 channel audio, where all the information needed, to the extent audible, is captured by the bit depth and sampling frequency of the CD format.


Shame on me :)

I basically agree with the statement above. And while it is fairly obvious (at least to me) that there is no audible benefit in increasing the bit/sample resolution from 16 to 24, I don’t think there is a similar conclusion on the increase in sampling rate, yet. For 2D stereo recordings and stationary sound sources there are some studies out there suggesting that there is no audible benefit from increasing the sampling rate above 48kHz. This is where we might just stop the discussion with a simple “16bit/44.1kHz is good enough, period.”

OTOH the psychoacoustic properties around the detection of minimum audible angle and depth of a sound source from low to high frequencies are still very much under investigation and not entirely understood. All I was stating is that when the CD was developed, the focus was primarily on accurate reconstruction of the amplitude spectrum, while the phase spectrum was of lower importance.

 

Cite for the following definitive claim in your original post, please:

While it's true that the human ear (and brain) cannot hear the amplitude of frequencies above, say, 18kHz our two ears can extremely well detect phase differences between frequencies that are much higher! So while we cannot hear those frequencies as tones, we can detect the tiny differences in runtime which it takes those inaudible frequencies to arrive at the left and right ear respectively. In other words, our spatial location capabilities are of much higher resolution than our frequency hearing capabilities. Btw, this effect is heavily used by 3D sound systems like Dolby Atmos or THX.

 

Because in your above quoted post, you seem to say the claim is “still very much under investigation and not entirely understood.” So which is it, a definitive fact or something to be investigated?

Ahh, I see, good catch!

I think most audio and DSP engineers would agree with my statement that 44.1kHz is not enough to fully capture the audibly relevant phase properties of a complex music source, such as an orchestra in a concert hall. There are complex dynamic patterns of phase variations, and thus interaural phase differences, created by musicians moving their instruments as they perform, which are lost in a signal sampled at 44.1kHz. This I would call an accepted fact.

In a first, but not necessarily sufficient, step one could increase the sampling rate to capture more of this information. However, psychoacoustic experiments indicate, for example, that the perception of phase is non-linear across the spectrum. Also, these effects seem to be time-variant and dependent on the source. So increasing the resolution of the phase evenly across the spectrum by increasing the sampling frequency alone might not be sufficient to deliver the desired result. This is something to be investigated.


To take this even further, very few can reliably hear the difference between even the CD format and lossy 320 kbps, or the 256 kbps at which Apple lossy is encoded. The head of Apple Music is on record as saying that he and his team cannot pick between Apple lossless and Apple lossy for 2 channel audio, except perhaps on very high quality headphones. In that case, can Hi Res offer more?

Apple Spatial Audio, or Dolby Atmos are an audibly different species no doubt and the information content needed to deliver them is a different matter.


 

it's a product of small primes (2*2*3*3*5*5*7*7 = 44,100), which makes calculations easier
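As a quick sanity check on that factorization, here is a small Python sketch (the `prime_factors` helper is written here purely for illustration):

```python
def prime_factors(n):
    # Trial-division factorization; plenty fast for small n like 44100
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

print(prime_factors(44_100))  # [2, 2, 3, 3, 5, 5, 7, 7]
assert 2 * 2 * 3 * 3 * 5 * 5 * 7 * 7 == 44_100
```

Having only small prime factors means 44100 shares convenient integer ratios with other rates, which helped with the video-line-based recording gear of the era.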

Who would have thought that?! Interesting.


A small digression out of curiosity - why was an “odd” number - 44100 - selected in the first place? If 40000 needed a margin of safety, why not 48000? Or even 44000?


This is the way it was always going to turn out.  The conversation continues until you get caught in a trap.  There was never going to be another outcome.

 

What trap is that?  The poster contradicted their own earlier posts and I asked for an explanation.  Any trap the poster fell into was of their own making.


Ahh, I see, good catch!

I think most audio and DSP engineers would agree with my statement that 44.1kHz is not enough to fully capture the audibly relevant phase properties of a complex music source, such as an orchestra in a concert hall. There are complex dynamic patterns of phase variations, and thus interaural phase differences, created by musicians moving their instruments as they perform, which are lost in a signal sampled at 44.1kHz. This I would call an accepted fact.

 

 

Please cite where this is proven as “accepted fact”.  Just stating it does not make it so.  Quite frankly, I’ve been around this stuff for a long time and I’ve never heard an inkling about these supposed phase differences being lost at 44.1 kHz.  

 

In a first, but not necessarily sufficient, step one could increase the sampling rate to capture more of this information. However, psychoacoustic experiments indicate, for example, that the perception of phase is non-linear across the spectrum. Also, these effects seem to be time-variant and dependent on the source. So increasing the resolution of the phase evenly across the spectrum by increasing the sampling frequency alone might not be sufficient to deliver the desired result. This is something to be investigated.

 

So your definitive statements in your first post were mere bluster, thinking we’d accept your post at face value?  Sorry, this ain’t the forum for that type of BS.


My quick summary:

Regarding 16 vs 24 bit/sample resolution: As a streaming/transport format there will be no audible gain, as long as studios ultimately compress the dynamic range of their master recording to fit into the 96dB provided by a 16-bit representation of a signal. Also, as has been said before, I doubt anyone (even with golden ears) can hear the difference, as a 96dB SNR sounds "fantastic" while 144dB (which is what you theoretically get from 24 bit) is just overkill. However, as an internal format in studio as well as in listening equipment, 24 bit/sample resolution makes total sense for doing proper volume control and EQ in the digital domain. But this happens anyway, even if your transport format is "just" 16 bit/sample.
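The 96dB and 144dB figures come straight from the usual roughly-6.02dB-per-bit rule; a quick Python sketch:

```python
import math

def dynamic_range_db(bits):
    # Theoretical quantization dynamic range: about 6.02 dB per bit
    return 20 * math.log10(2 ** bits)

print(round(dynamic_range_db(16), 1))  # 96.3 dB for 16-bit CD audio
print(round(dynamic_range_db(24), 1))  # 144.5 dB for 24-bit audio
```

Dither and noise shaping change the detailed picture, but the headline numbers quoted in the thread match this simple calculation.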

Regarding sampling rate, the discussion is a little different though:

There seems to be common consensus that a sampling frequency of 44.1kHz is sufficient to accurately reproduce the audible frequency spectrum. In fact, according to the Nyquist Theorem this allows for reproducing frequencies up to 22.05kHz and only young children can hear frequencies above 20kHz while the hearing of an average adult Joe is capped at 18 or even just 16kHz. So, all good here? Well not quite...
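The Nyquist arithmetic above is easy to verify; a small Python sketch (the `alias` helper is illustrative only, showing what would happen to an out-of-band tone):

```python
fs = 44_100       # CD sample rate in Hz
nyquist = fs / 2
print(nyquist)    # 22050.0 Hz, the highest representable frequency

def alias(f, fs):
    # Frequency a tone at f folds down to when sampled at fs
    f = f % fs
    return min(f, fs - f)

print(alias(30_000, fs))  # an (inaudible) 30 kHz tone would fold down to 14100 Hz
```

This is also why anti-alias filtering before the converter matters: ultrasonic content that isn't removed doesn't vanish, it folds back into the audible band.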

Ever since the CD appeared in the 80's many audiophiles keep claiming that a good analog record still offers more accurate reproduction of the sound stage and more precise positioning and depth of the instruments. They are right!

This is because there is an (incorrect) notion that equates the frequency spectrum with only the amplitude spectrum but neglects the corresponding phase spectrum. While it's true that the human ear (and brain) cannot hear the amplitude of frequencies above, say, 18kHz our two ears can extremely well detect phase differences between frequencies that are much higher! So while we cannot hear those frequencies as tones, we can detect the tiny differences in runtime which it takes those inaudible frequencies to arrive at the left and right ear respectively. In other words, our spatial location capabilities are of much higher resolution than our frequency hearing capabilities. Btw, this effect is heavily used by 3D sound systems like Dolby Atmos or THX.

This is why digital audio with increased sampling rates of 96kHz or even 192kHz would indeed provide a very noticeable benefit as it allows for more precise positioning and depth of the sound sources.

I say "would" because every track from Amazon labeled "Ultra HD" which I have seen (or been listeing to) so far is just 24bit/44.1kHz. So it gives me the "useless" 24bit/sample resolution but falls short of providing higher sampling rates which could really make an audible difference.

 

As long as you get HD it’s “fantastic”; there is currently no audible difference to “Ultra HD”. My hope is they provide more and more content at higher sampling frequencies in the future. Then it will make a difference!
    

 

 

Cite?


This has been done to death all over the internet. However I’d make a few observations:

  • Bob Stuart is an originator of MQA, a ‘hi res’ format which has divided the industry. In part this is due to its lossy nature, in part because it’s a form of DRM extracting licence fees along the chain. 
  • The Meyer and Moran study has come in for criticism. An argument was that some of their ‘hi res’ content may not have actually had guaranteed high resolution provenance. 
  • The only large scale study, to my knowledge, is the one by Mark Waldrep referenced earlier. He too found fault with Meyer and Moran and, as an expert in high resolution recordings, took great care over the preparation of his test materials. By the sound of things @edchristoph has not delved into the full detail of this test which, as noted, concluded that ‘hi res’ added no perceptible fidelity improvement over Red Book.

The idea that there could be ‘something out there’ (unknown unknowns?) which Red Book fails to capture is a formula that’s been used down the ages by less than scrupulous salesmen to convince people to part with their money. Personally I’m not buying it. 


I think most audio and DSP engineers would agree with my statement that 44.1kHz is not enough to fully capture the audibly relevant phase properties of a complex music source, such as an orchestra in a concert hall. There are complex dynamic patterns of phase variations, and thus interaural phase differences, created by musicians moving their instruments as they perform, which are lost in a signal sampled at 44.1kHz. This I would call an accepted fact.

All of which just sums up to a scalar amplitude measured at each microphone, varying over time. This signal is fully and perfectly captured for all human audible frequencies (and beyond) by sampling at 44.1kHz.
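The point that sub-sample timing survives sampling can be demonstrated directly; here is a small Python sketch (using NumPy; the 1 kHz tone and 10 µs delay are arbitrary example values). A band-limited tone delayed by far less than one sample period still carries that delay in its 44.1 kHz samples, and it can be recovered from them:

```python
import numpy as np

fs = 44_100            # CD sample rate
f0 = 1_000.0           # example tone, well inside the audio band
tau = 10e-6            # 10 microsecond "interaural" delay, under half a sample
n = np.arange(4410)    # exactly 100 periods of f0, to avoid spectral leakage
t = n / fs

left = np.sin(2 * np.pi * f0 * t)
right = np.sin(2 * np.pi * f0 * (t - tau))

def phase(x):
    # Phase of a tone at f0, estimated by projecting onto cos/sin at f0
    c = x @ np.cos(2 * np.pi * f0 * t)
    s = x @ np.sin(2 * np.pi * f0 * t)
    return np.arctan2(c, s)

# The delay is recovered from plain 44.1 kHz samples, even though it is
# much smaller than the ~22.7 microsecond sample period.
tau_est = (phase(left) - phase(right)) / (2 * np.pi * f0)
print(tau_est)  # ~1e-05 seconds
```

This is exactly the "no direct relationship between the attainable temporal resolution and the sampling interval" point in the Stuart quote earlier in the thread: for band-limited signals, interchannel timing is encoded in the sample values, not quantized to the sample grid.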


I am not saying that current hi-res recordings are better sounding than CD-quality!

I am just saying that current 2D recordings in CD-quality are not adequately capturing/reproducing the binaural experience in a concert hall. You may come to the conclusion that for 2D audio formats, CD quality is as good as it gets and no further improvements are possible from there, fine. But this is also just an opinion, based on the negative results of studies performing formal listening tests. As stated above, negative results deliver no proof of the non-existence of some phenomenon. They just prove that in this case CD-quality could adequately capture what’s in the corresponding hi-res recording under test.

I for one believe that even for 2D audio there are advances to be made which are related to the reproduction of the phase spectrum, as “the human hearing system will resolve small time differences which might imply a wider bandwidth in a linear system”.

It’s perfectly fine if you have a different opinion.


Please cite an academic reference supporting this thesis, and not one from any individual with a commercial interest.


 

I am just saying that current 2D recordings in CD-quality are not adequately capturing/reproducing the binaural experience in a concert hall.

Those of us that have been to live gigs in even small venues know that home audio today is a very limited version of that experience, and not just for reasons of the sound of the music. But Hi Res 2 channel audio that is presently being marketed as such does not change that situation at all, in coming any closer to the real thing than where CD takes us.

Will that change in the future? I don’t know. What is visible of course are the changes being brought by Atmos/Spatial audio and similar, but that isn’t 2 channel audio as is commonly understood.  


I am not saying that current hi-res recordings are better sounding than CD-quality!

I am just saying that current 2D recordings in CD-quality are not adequately capturing/reproducing the binaural experience in a concert hall. You may come to the conclusion that for 2D audio formats, CD quality is as good as it gets and no further improvements are possible from there, fine. But this is also just an opinion, based on the negative results of studies performing formal listening tests. As stated above, negative results deliver no proof of the non-existence of some phenomenon. They just prove that in this case CD-quality could adequately capture what’s in the corresponding hi-res recording under test.

I for one believe that even for 2D audio there are advances to be made which are related to the reproduction of the phase spectrum, as “the human hearing system will resolve small time differences which might imply a wider bandwidth in a linear system”.

It’s perfectly fine if you have a different opinion.

 

And until your opinion is backed up with scientific experimental proof, I’ll continue to laugh at it, no matter how much BS cut-and-paste word salad you spew.


https://web.archive.org/web/20100410235208/http://www.cs.ucc.ie/~ianp/CS2511/HAP.html

“...Localization accuracy is 1 degree for sources in front of the listener and 15 degrees for sources to the sides. Humans can discern interaural time differences of 10 microseconds or less.”

https://www.ece.ucdavis.edu/cipic/spatial-sound/tutorial/psychoacoustics-of-spatial-hearing/#azimuth

“...under optimum conditions, much greater accuracy (on the order of 1°) is possible...This is rather remarkable, since it means that a change in arrival time of as little as 10 microseconds is perceptible. (For comparison, the sampling rate for audio CD’s is 44.1 kHz, which corresponds to a sampling interval of 22.7 microseconds. Thus, in some circumstances, less than a one-sample delay is perceptible.)”


https://web.archive.org/web/20100410235208/http://www.cs.ucc.ie/~ianp/CS2511/HAP.html

“...Localization accuracy is 1 degree for sources in front of the listener and 15 degrees for sources to the sides. Humans can discern interaural time differences of 10 microseconds or less.”

https://www.ece.ucdavis.edu/cipic/spatial-sound/tutorial/psychoacoustics-of-spatial-hearing/#azimuth

“...under optimum conditions, much greater accuracy (on the order of 1°) is possible...This is rather remarkable, since it means that a change in arrival time of as little as 10 microseconds is perceptible. (For comparison, the sampling rate for audio CD’s is 44.1 kHz, which corresponds to a sampling interval of 22.7 microseconds. Thus, in some circumstances, less than a one-sample delay is perceptible.)”

 

The accuracy with which this can be done depends on the circumstances. For speech in normally reverberant rooms, typical human accuracies are on the order of 10° to 20°. However, under optimum conditions, much greater accuracy (on the order of 1°) is possible if the problem is to decide merely whether or not a sound source moves. This is rather remarkable, since it means that a change in arrival time of as little as 10 microseconds is perceptible. (For comparison, the sampling rate for audio CD’s is 44.1 kHz, which corresponds to a sampling interval of 22.7 microseconds. Thus, in some circumstances, less than a one-sample delay is perceptible.)

The underlined text is what you removed from your quote. The context matters quite a bit. Obviously, the vast majority of homes, where Sonos speakers live, are not optimum conditions. As well, the greater accuracy is useful for determining whether the sound source moves, which is very debatable as being useful information when listening to music. And it’s not like that motion can’t be simulated at greater time intervals, as that certainly can occur with 2 channel audio even in SD. So this could only be possible in a highly controlled environment, maybe headphones, and where the audio source is trying to give the impression of movement.

And of course, your quote is from a footnote, and timing is not the only factor the ears/brain use for determining the location of a sound source, as the article states. ILD seems rather important to me. If you artificially modify timing in order to create a spatial illusion, how do you account for the difference in volume and frequency shift that each ear hears (especially without headphones)?
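For rough context on the numbers being argued about here, a small Python sketch using Woodworth's classic spherical-head approximation (the head radius and speed of sound are assumed textbook values, not from the linked articles):

```python
import math

HEAD_RADIUS = 0.0875     # metres, assumed average head radius
SPEED_OF_SOUND = 343.0   # m/s at room temperature

def itd_woodworth(azimuth_deg):
    # Woodworth's spherical-head model for the interaural time difference
    theta = math.radians(azimuth_deg)
    return HEAD_RADIUS / SPEED_OF_SOUND * (theta + math.sin(theta))

print(itd_woodworth(1) * 1e6)    # ~9 microseconds for a 1 degree shift up front
print(itd_woodworth(90) * 1e6)   # ~656 microseconds for a source fully to one side
print(1e6 / 44_100)              # ~22.7 microseconds per CD sample, for comparison
```

So the ~10 µs figure in the quoted footnote is consistent with a 1° shift under this model; whether a delay smaller than one sample period is thereby "lost" at 44.1 kHz is the separate question argued above.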

 


It seems we’re now in the realm of Dirac impulses which, to my mind, don’t have much to do with music. :rolling_eyes:

 

Why is it that none of this elaborate theorising is needed to identify/justify HD video streams and to pick them over DVD quality on any HD quality screen played on a HD capable player of any price point, by eyes that are not capable of 20/20 vision? Probably because in the case of audio, there is a frantic effort to justify something that doesn't exist in a practical sense for any domestic use case? 

Does anyone here remember wapping high?:grin: