"The Beginner’s Guide to Hi-Res Audio"

  • 7 December 2021
  • 92 replies
  • 2253 views

The 13.4.1 S2 update added hi-res (Ultra HD) and Dolby Atmos audio support from Amazon Music Unlimited. With this update, Sonos released this great article about hi-res audio and how you can listen to it on Sonos. It’s a very detailed and well-written article:

https://blog.sonos.com/en-us/hi-res-audio-guide





Audio formats reached that point a long time ago. It’s kind of crazy that a 40-year-old standard like Red Book still represents the pinnacle of audio reproduction, but it really does.

Which is why you see all the frantic marketing of the latest audio kit, year after year, by the odious to the credulous - the HiFi specialist media being just as culpable. And their pandering to the all too common human condition of not being satisfied with what is available and at hand in the home.

4K HDR video played back on my 65” OLED blows me away.  

OLED and HDR are very fine. However with average vision the 4K part is only likely to be relevant if you’re sitting 9 feet or less from the screen. https://www.rtings.com/tv/reviews/by-size/size-to-distance-relationship

In many typical living room situations a viewer quite probably wouldn’t be able to distinguish 4K from FHD. 

Agreed. And sitting close enough to pick the 4K on a large screen means turning to the left/right to catch all the action, which can get tiresome.

But the few times I have watched 4K streams on a 2011 make 50 inch plasma HDTV, the picture clarity is a marked improvement even on that TV as compared to HD streams. Perhaps the better mastering analogy applies to video as well. Of course, how much better than that they would be on a 4K capable screen, I don’t know. 

The difference isn’t as marked as it is in the case of DVD to HD, so this suggests that diminishing returns for video are also now in play, and I would be surprised to see any need for 8K and higher.
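For anyone who wants to sanity-check the 9-foot figure mentioned above, here is a rough sketch. It assumes the common rule of thumb that 20/20 vision resolves about 1 arcminute, a 16:9 panel, and a 65” diagonal. The function name and defaults are my own and are not taken from the rtings article.

```python
import math

def max_useful_distance_ft(diagonal_in, horizontal_pixels, acuity_arcmin=1.0):
    """Distance (feet) beyond which one pixel subtends less than the eye's acuity.

    Assumes a 16:9 panel and a simple 1-arcminute acuity rule of thumb.
    """
    width_in = diagonal_in * 16 / math.hypot(16, 9)    # panel width from the diagonal
    pixel_pitch_in = width_in / horizontal_pixels       # width of one pixel
    acuity_rad = math.radians(acuity_arcmin / 60)       # arcminutes -> radians
    return pixel_pitch_in / math.tan(acuity_rad) / 12   # inches -> feet

# Beyond fhd_limit, FHD pixels are already too small to resolve, so 4K adds nothing.
fhd_limit = max_useful_distance_ft(65, 1920)
# Beyond uhd_limit, even the 4K pixels themselves can no longer be resolved.
uhd_limit = max_useful_distance_ft(65, 3840)
print(f"65-inch FHD: {fhd_limit:.1f} ft, 4K: {uhd_limit:.1f} ft")
```

Under those assumptions the extra 4K detail only matters when sitting closer than the FHD figure, which lands in the same ballpark as the 9 feet quoted above.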

The higher the audio bit rate, the more data you have to move over your network. For folks who already have network problems with non-HD audio, it will likely make things worse.
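To put rough numbers on that, here is a back-of-the-envelope sketch of the raw (uncompressed) PCM bit rate for a few formats. The format list is only my illustration. Real streams are usually losslessly compressed, so the actual network traffic will be lower than these figures.

```python
def pcm_bitrate_kbps(sample_rate_hz, bit_depth, channels=2):
    """Raw PCM bit rate in kbit/s: samples per second x bits per sample x channels."""
    return sample_rate_hz * bit_depth * channels / 1000

# Illustrative formats (assumed for this example, not an official list)
formats = {
    "CD quality (16-bit / 44.1 kHz)": (44_100, 16),
    "24-bit / 48 kHz": (48_000, 24),
    "24-bit / 192 kHz": (192_000, 24),
}

for name, (rate, depth) in formats.items():
    print(f"{name}: {pcm_bitrate_kbps(rate, depth):.1f} kbit/s")
```

The ratio between the last and first entries is what matters for a struggling network: the same two channels need several times the throughput.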

Why is it that none of this elaborate theorising is needed to identify/justify HD video streams and to pick them over DVD quality on any HD quality screen played on a HD capable player of any price point, by eyes that are not capable of 20/20 vision? Probably because in the case of audio, there is a frantic effort to justify something that doesn't exist in a practical sense for any domestic use case? 

 

 

Interesting point. I really don’t have a clue where the line is between video resolution and what your eyes can see. Perhaps part of the reason is that resolution isn’t the only advancement involved with video: size of the TV, how black the blacks are, frame rate, etc. As well, in many cases customers can actually see the difference between TVs in a store much better than they can hear the difference between speakers.

The Hi Res audio push seems inevitable regardless of whether it makes any real difference or not. It has become just another spec tick box you have to have if you want to sell audio products in 2022. My guess is the increased cost of the bandwidth is so insignificant to streaming music services that they are like “why not?” If it hooks a few more customers it will be worth it.

 

 

 

I think a lot of the cost for higher bandwidth is further down the line with your service provider and local WiFi network.  This is why I would prefer to block resolutions beyond what I can hear.

 


Most listeners won’t know or care. But there is a certain population out there who will be lured in by the promises. Just like there used to be a certain population of people who would argue bitterly about megapixels in cameras long after we passed the limits of what really mattered. Some people just love to bicker about numbers, I guess.

 

 

Eh, I think most people always want more, and don’t really think about the limits of what they can possibly hear or use.   I wouldn’t say that I’m immune to that psychology either. 

 

 

@Kumar 

I could not agree more. Arguing about bigger numbers is meaningless if people can’t see or hear the difference. Every audio and video format eventually reaches a point of diminishing returns. And at some point they surpass the limits of what our eyes and ears can actually perceive, making these supposed quality increases strictly academic.

Audio formats reached that point a long time ago. It’s kind of crazy that a 40-year-old standard like Red Book still represents the pinnacle of audio reproduction, but it really does. Video formats, on the other hand, still have room to grow. Hi Res audio played back on my $3K sound system does not impress me at all. But 4K HDR video played back on my 65” OLED blows me away.

The Hi Res audio push seems inevitable regardless of whether it makes any real difference or not. It has become just another spec tick box you have to have if you want to sell audio products in 2022. My guess is the increased cost of the bandwidth is so insignificant to streaming music services that they are like “why not?” If it hooks a few more customers it will be worth it.

Most listeners won’t know or care. But there is a certain population out there who will be lured in by the promises. Just like there used to be a certain population of people who would argue bitterly about megapixels in cameras long after we passed the limits of what really mattered. Some people just love to bicker about numbers, I guess.

It does make me chuckle when I see people fretting about whether they are hearing Hi Res audio on their Sonos Roam or other tiny speakers. Hell, even on a Port connected to a really nice Denon/Aperion Audio setup I can’t hear a difference, so what chance do they have on what’s essentially a 200 dollar Bluetooth speaker?

 

You need trained ears and a $50,000 system!

I can guarantee that in my home, even late at night, no human, however trained, will be able to pick the difference in a blind listening test even on a USD 100,000 system. Because that still has to deliver sound after interacting with my room - with its acoustics and ambient sound levels - which is a typical domestic one.

Whereas the HD video I can pick on a cheap HD capable TV, even when I have left my glasses out of reach.

HD audio is just digital snake oil, consumed by the credulous, who need the HD mark to be visible on their player/app to even know that they are listening to HD audio. 

It seems we’re now in the realm of Dirac impulses which, to my mind, don’t have much to do with music. :rolling_eyes:

 

Why is it that none of this elaborate theorising is needed to identify/justify HD video streams and to pick them over DVD quality on any HD quality screen played on a HD capable player of any price point, by eyes that are not capable of 20/20 vision? Probably because in the case of audio, there is a frantic effort to justify something that doesn't exist in a practical sense for any domestic use case? 

Does anyone here remember wapping high?:grin:

 

It's like night and day!  

You need trained ears and a $50,000 system!

It seems we’re now in the realm of Dirac impulses which, to my mind, don’t have much to do with music. :rolling_eyes:

Indeed the first reference says “Localisation performance is better for non-musical sounds (e.g., clicks, percussive noises, etc.) than for musical tones”. Presumably this accounts for the 1 degree figure.

https://web.archive.org/web/20100410235208/http://www.cs.ucc.ie/~ianp/CS2511/HAP.html

“...Localization accuracy is 1 degree for sources in front of the listener and 15 degrees for sources to the sides. Humans can discern interaural time differences of 10 microseconds or less.”

https://www.ece.ucdavis.edu/cipic/spatial-sound/tutorial/psychoacoustics-of-spatial-hearing/#azimuth

“...under optimum conditions, much greater accuracy (on the order of 1°) is possible...This is rather remarkable, since it means that a change in arrival time of as little as 10 microseconds is perceptible. (For comparison, the sampling rate for audio CD’s is 44.1 kHz, which corresponds to a sampling interval of 22.7 microseconds. Thus, in some circumstances, less than a one-sample delay is perceptible.)”

 

You (and that paper) have a fundamental misunderstanding of what sample rate means and how it applies to digital sampling. A change in arrival time of 10 microseconds due to positioning has no relationship to digital sample rate. There is no “gap” in the data in which a phase shift can be missed just because 10 microseconds is shorter than the 22.7 microsecond sampling interval.

How, you say? Well, as shown by Nyquist-Shannon, a band-limited digital audio file converted back to analog doesn’t have gaps or stair steps or any of the other silly representations; it is actually EXACTLY the same as the original analog signal as captured by the listening device.

Let’s say this again: Below ½ the sample rate (the bandwidth limit), there is no data loss, none. Therefore, at 44.1 kHz, all audible sound is reproduced exactly as it was in analog form, and the ear hears all of it, including the phase shift. All increasing the bandwidth would do is increase the frequencies that are reproduced, and the ear doesn’t hear ANYTHING over 20 kHz, phase shifted or not.
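A quick way to convince yourself of this is the sketch below. It samples a 1 kHz tone at 44.1 kHz twice, once delayed by 10 microseconds (less than half a sample), and then recovers the delay from the samples alone using the phase of a single DFT bin. The tone frequency, window length and helper names are assumptions chosen for the example. It illustrates the point for one tone only and is not a proof of the sampling theorem.

```python
import cmath
import math

FS = 44_100      # CD sample rate in Hz (sampling interval 1/FS is about 22.7 microseconds)
DELAY = 10e-6    # 10 microsecond delay, i.e. less than one sample
FREQ = 1_000     # test tone in Hz (assumed for this example)
N = 441          # 10 ms of samples, exactly 10 cycles of the tone

def sample_tone(delay_s):
    """Sample a sine wave that has been delayed by delay_s seconds."""
    return [math.sin(2 * math.pi * FREQ * (n / FS - delay_s)) for n in range(N)]

def phase_at(samples, freq):
    """Phase (radians) of the single DFT bin at freq."""
    acc = sum(s * cmath.exp(-2j * math.pi * freq * n / FS)
              for n, s in enumerate(samples))
    return cmath.phase(acc)

def estimate_delay(reference, delayed, freq=FREQ):
    """Estimate the time delay between two sampled tones from their phase difference."""
    dphi = phase_at(reference, freq) - phase_at(delayed, freq)
    dphi = (dphi + math.pi) % (2 * math.pi) - math.pi   # wrap into [-pi, pi)
    return dphi / (2 * math.pi * freq)

left = sample_tone(0.0)      # e.g. the signal at one ear
right = sample_tone(DELAY)   # the same signal arriving 10 microseconds later
estimated = estimate_delay(left, right)
print(f"sampling interval: {1e6 / FS:.1f} us, recovered delay: {estimated * 1e6:.3f} us")
```

The delay comes back out of the sampled data even though it is shorter than the spacing between samples, because it is encoded in the sample values rather than in the sample positions.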

 

So read my Google Fu:

Strictly speaking, the theorem only applies to a class of mathematical functions having a Fourier transform that is zero outside of a finite region of frequencies. Intuitively we expect that when one reduces a continuous function to a discrete sequence and interpolates back to a continuous function, the fidelity of the result depends on the density (or sample rate) of the original samples. The sampling theorem introduces the concept of a sample rate that is sufficient for perfect fidelity for the class of functions that are band-limited to a given bandwidth, such that no actual information is lost in the sampling process. It expresses the sufficient sample rate in terms of the bandwidth for the class of functions. The theorem also leads to a formula for perfectly reconstructing the original continuous-time function from the samples.

 

https://en.wikipedia.org/wiki/Nyquist%E2%80%93Shannon_sampling_theorem
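For reference, the reconstruction formula mentioned at the end of that passage is usually written as the Whittaker-Shannon interpolation formula. Here $T$ is the sampling interval, $x[n] = x(nT)$ are the samples, and the signal is assumed to contain no energy at or above $1/(2T)$:

```latex
x(t) = \sum_{n=-\infty}^{\infty} x[n]\,\operatorname{sinc}\!\left(\frac{t - nT}{T}\right),
\qquad
\operatorname{sinc}(u) = \frac{\sin(\pi u)}{\pi u}
```

At 44.1 kHz, $T = 1/44100\ \text{s} \approx 22.7\ \mu\text{s}$, and the formula gives the value of $x(t)$ at every instant, not just at the sample times.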

 

You fell for the old audiophile nonsense that there are “gaps” in the data because each sample is a “slice in time”. It’s just not true. Sampling isn’t slicing the data, it is transforming the data, ALL of the data, into a format that can be stored digitally. The digital to analog step then transforms the data, ALL of the data, back to its original analog form. Nothing is lost, phase shifted or not.

https://web.archive.org/web/20100410235208/http://www.cs.ucc.ie/~ianp/CS2511/HAP.html

“...Localization accuracy is 1 degree for sources in front of the listener and 15 degrees for sources to the sides. Humans can discern interaural time differences of 10 microseconds or less.”

https://www.ece.ucdavis.edu/cipic/spatial-sound/tutorial/psychoacoustics-of-spatial-hearing/#azimuth

“...under optimum conditions, much greater accuracy (on the order of 1°) is possible...This is rather remarkable, since it means that a change in arrival time of as little as 10 microseconds is perceptible. (For comparison, the sampling rate for audio CD’s is 44.1 kHz, which corresponds to a sampling interval of 22.7 microseconds. Thus, in some circumstances, less than a one-sample delay is perceptible.)”

 

The accuracy with which this can be done depends on the circumstances. For speech in normally reverberant rooms, typical human accuracies are on the order of 10° to 20°. However, under optimum conditions, much greater accuracy (on the order of 1°) is possible if the problem is to decide merely whether or not a sound source moves. This is rather remarkable, since it means that a change in arrival time of as little as 10 microseconds is perceptible. (For comparison, the sampling rate for audio CD’s is 44.1 kHz, which corresponds to a sampling interval of 22.7 microseconds. Thus, in some circumstances, less than a one-sample delay is perceptible.)

The underlined text is what you removed from your quote. The context matters quite a bit. Obviously, the vast majority of homes, where Sonos speakers live, are not optimal conditions. As well, the greater accuracy is useful for determining whether the sound source moves, which is very debatable as being useful information when listening to music. And it’s not like that motion can’t be simulated at greater time intervals, as that certainly can occur with 2 channel audio even in SD. So this could only be possible in a highly controlled environment, maybe headphones, and where the audio source is trying to give the impression of movement.

And of course, your quote is from a footnote, and timing is not the only factor the ears/brain use for determining the location of a sound source, as the article states. ILD seems rather important to me. If you artificially modify timing in order to create a spatial illusion, how do you account for the difference in volume and frequency shift that each ear hears (especially without headphones)?

 

 

Software Engineers think of this kind of stuff all the time.  It’s what they do.

 

Indeed. On the other hand sales folks concentrate on not leaving any money on the table. They are good and have no shame… 

So you get: 1.5-meter Ethernet cable for $499.

https://www.networkworld.com/article/2281260/denon-s-outrageous-price-for-ethernet-cable.html

“The manufacturer is Denon, and the target customer is the "audio enthusiast." Apparently "audio enthusiast" is Denonese for "sucker."”

I am not saying that current hi-res recordings are better sounding than CD-quality!

I am just saying that current 2D recordings in CD-quality are not adequately capturing/reproducing the binaural experience in a concert hall. You may come to the conclusion that for 2D audio formats, CD quality is as good as it gets and no further improvements are possible from there, fine. But this is also just an opinion based on the negative results of studies performing formal listening tests. As stated above, negative results deliver no proof for the non-existence of some phenomenon. They just prove that in this case CD-quality could adequately capture what’s in the corresponding hi-res recording under test.

I for one believe that even for 2D audio there are advances to be made which are related to the reproduction of the phase spectrum as “the human hearing system will resolve small time differences which might imply a wider bandwidth in a linear system”.

It’s perfectly fine if you have a different opinion.

 

It sounds more like you’re saying that a live acoustic performance is not accurately reproduced by a 2 channel recording at CD quality. I would agree with that. However, that does not mean that hi res audio is the solution, particularly when it’s been tested and failed. The factor that seems to be forgotten is that room acoustics, reflections, absorptions, direction of audio, and likely visual cues come into play to affect what we hear. While you could reproduce some of that with psychoacoustic effects, timing modifications, etc., you still can’t quite reproduce it with 2 channels. Even then your brain is still aware that what it’s hearing is a recording rather than a live performance, and that surely factors in to some extent.

So it seems logical to me that instead of pushing higher resolution, it would make more sense to work on improving room acoustics, additional audio channels, speakers that operate closer to how instruments actually produce sound, etc. That’s not even realistic though, since environments are created for purposes beyond mimicking the acoustics of a concert hall. Sure, there is room for growth in audio reproduction, but saying that it needs to occur with higher resolution rather than by addressing other issues doesn’t make a ton of sense.

And your statement “Yeah, internet forums seem to attract those flat-earthers who use negative results to prove the non-existence of any given phenomenon.” is just silly in this context. You can absolutely prove that A doesn’t cause B when you repeatedly demonstrate that A doesn’t cause B. Negative results fail to prove anything only when we do not have the testing capability to examine the entire population. For example, just because we have not seen life on other planets does not prove that there is no life on other planets, because we cannot test the entire population of planets. Flat earth is very different, as we have proven that the earth is round, and flat-earthers believe that those who did the tests are lying about their results.

I am not saying that current hi-res recordings are better sounding than CD-quality!

I am just saying that current 2D recordings in CD-quality are not adequately capturing/reproducing the binaural experience in a concert hall. You may come to the conclusion that for 2D audio formats, CD quality is as good as it gets and no further improvements are possible from there, fine. But this is also just an opinion based on the negative results of studies performing formal listening tests. As stated above, negative results deliver no proof for the non-existence of some phenomenon. They just prove that in this case CD-quality could adequately capture what’s in the corresponding hi-res recording under test.

I for one believe that even for 2D audio there are advances to be made which are related to the reproduction of the phase spectrum as “the human hearing system will resolve small time differences which might imply a wider bandwidth in a linear system”.

It’s perfectly fine if you have a different opinion.

 

And until your opinion is backed up with scientific experimental proof, I’ll continue to laugh at it, no matter how much BS cut-and-paste word salad you spew.

Ahh, I see, good catch!

 

This is the way it was always going to turn out.  The conversation continues until you get caught in a trap.  There was never going to be another outcome.

Yeah, internet forums seem to attract those flat-earthers who use negative results to prove the non-existence of any given phenomenon. I don’t really bother, especially if the most compelling counter argument they present is that they have never heard about it, garnished with a hint at the many years of experience they have under their belt.

But for argument’s sake, you may want to read J. Robert Stuart’s paper “Coding for High-Resolution Audio Systems”, published 2004 in the Journal of the Audio Engineering Society. Everything I have stated about sampling rate and high frequency content you can more or less find in Chapter 5 of this paper, and in particular section 5.1, Psychoacoustic Data to Support Higher Sampling Rates: “...It has been suggested that perhaps higher sampling rates are preferred because, somehow, the human hearing system will resolve small time differences which might imply a wider bandwidth in a linear system. In considering this it is important to distinguish between perceiving separate events which are very close together in time (implying wide bandwidth and fine monaural temporal resolution) and those events which help build the auditory scene, for which the relative arrival times are either binaural or well separated. In the first case, wider bandwidth is required to discriminate acoustic events that are closer together in time. This seems to be an alternative statement of the problem to determine the maximum bandwidth necessary for audible transparency...Events in time can be discriminated to within very fine limits, and with a resolution very substantially smaller than the sampling period. This point is crucial because provided we treat all channels identically to ensure no skew of directional information, there is no direct relationship between the attainable temporal resolution and the sampling interval.”

So independent of whether you follow the author’s hypotheses and findings or not, it is a well known paper from an AES Fellow, so you cannot really say that you never heard about this stuff.

 

 

That AES Fellow has a direct financial relationship to the promotion of high resolution audio formats. You might as well consult Dr. Daffy on whether his Magic Elixir cures all ills.

 

Following this paper, in 2007 there was an article from Meyer and Moran, also published in the AES Journal, with the objective of finding out if there are any audible gains from high-res audio playback by doing some extensive, formalized testing. Their conclusion was that, based on their test methodology, they could not find any significant preference for hi-res audio over the CD standard, even when using high-end headphones or speaker systems. However, they very correctly noted that “it is very difficult to use negative results to prove the inaudibility of any given phenomenon or process”. The most intriguing part of this article, however, was their final note on high-resolution recordings: “Though our tests failed to substantiate the claimed advantages of high-resolution encoding for two-channel audio, one trend became obvious...throughout our testing: virtually all of the SACD and DVD-A recordings sounded better than most CD’s - sometimes much better...Partly because[...]engineers and producers are given the freedom to produce recordings that sound as good as they can make them, without having to compress or equalize the signal to suit lesser systems.”

And here we go. I truly believe there are advances to be made in capturing and reproducing more accurately what our ears actually perceive in a concert hall. I am a big fan of innovation. And the proliferation of Hi-Res audio formats as we are witnessing right now is certainly one way to inspire more innovation to come forward in this field. Even if we are not (yet) experiencing it in the UltraHD tracks we get to listen to today.

 

 

Are you actually trying to use Meyer and Moran’s results to suggest SACD and DVD-A recordings are superior to CD, when their results showed all differences were due to mastering and not higher resolution formats?  Pretty freaking bold move!  But as before, your bluster doesn’t work here.

Who would have thought that?! Interesting.

 

Software Engineers think of this kind of stuff all the time.  It’s what they do.

Example: A classic technique for sorting large files, the polyphase merge sort, divides the data up into separate files following the Fibonacci Sequence. There’s also a search technique based on the Fib. The search is proven to have O(log n) algorithmic complexity and the sort O(n log n), which is as good as you can get for comparison-based methods.
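As a rough illustration of the search technique mentioned above, here is a minimal Fibonacci search sketch. It assumes a sorted list; the function name is mine, and this is the textbook formulation rather than anything specific to the example in the post.

```python
def fibonacci_search(items, target):
    """Return the index of target in the sorted list items, or -1 if absent."""
    n = len(items)

    # Find the smallest Fibonacci number fib >= n, tracking its two predecessors.
    fib2, fib1 = 0, 1          # F(k-2), F(k-1)
    fib = fib2 + fib1          # F(k)
    while fib < n:
        fib2, fib1 = fib1, fib
        fib = fib2 + fib1

    offset = -1                # everything at or before offset is known to be < target
    while fib > 1:
        i = min(offset + fib2, n - 1)   # probe point, clamped to the list
        if items[i] < target:
            # Target is to the right: step down one Fibonacci number.
            fib, fib1 = fib1, fib2
            fib2 = fib - fib1
            offset = i
        elif items[i] > target:
            # Target is to the left: step down two Fibonacci numbers.
            fib, fib1 = fib2, fib1 - fib2
            fib2 = fib - fib1
        else:
            return i

    # At most one candidate element is left to check.
    if fib1 and offset + 1 < n and items[offset + 1] == target:
        return offset + 1
    return -1

data = [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
print(fibonacci_search(data, 13), fibonacci_search(data, 4))
```

The probe positions are chosen using only additions and subtractions of Fibonacci numbers, and the number of probes grows logarithmically with the list length.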

 

I am just saying that current 2D recordings in CD-quality are not adequately capturing/reproducing the binaural experience in a conert hall.

Those of us that have been to live gigs in even small venues know that home audio today is a very limited version of that experience, and not just for reasons of the sound of the music. But Hi Res 2 channel audio that is presently being marketed as such does not change that situation at all, in coming any closer to the real thing than where CD takes us.

Will that change in the future? I don’t know. What is visible of course are the changes being brought by Atmos/Spatial audio and similar, but that isn’t 2 channel audio as is commonly understood.  

Please cite an academic reference supporting this thesis, and not one from any individual with a commercial interest.

I think most audio and dsp engineers would agree with my statement that 44.1kHz is not enough to fully capture the audibly relevant phase properties of a complex music source, such as an orchestra in a concert hall. There are complex dynamic patterns of phase variants and thus interaural phase differences created by musicians moving their instruments as they perform which are lost in a signal sampled at 44.1kHz. This I would call an accepted fact.

All of which just sums up to a scalar amplitude measured at each microphone, varying over time. This signal is fully and perfectly captured for all human audible frequencies (and beyond) by sampling at 44.1kHz.
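To illustrate that claim, here is a small sketch that samples a band-limited test signal at 44.1 kHz and then rebuilds its value halfway between samples, using only the samples. The three test tones, the 10 ms window and the function names are assumptions picked so the signal is exactly periodic in the window, which lets a plain DFT do the reconstruction. Real converters use filters rather than this method.

```python
import cmath
import math

FS = 44_100   # CD sample rate in Hz
N = 441       # 10 ms window, so DFT bins sit FS / N = 100 Hz apart

# Hypothetical test signal: (frequency Hz, amplitude, phase rad), all below FS / 2
TONES = [(1_000, 1.0, 0.3), (7_500, 0.5, 1.1), (19_000, 0.25, 2.0)]

def analog(t):
    """The 'original' continuous signal, evaluated at any time t (seconds)."""
    return sum(a * math.sin(2 * math.pi * f * t + p) for f, a, p in TONES)

def dft(x):
    """Plain O(N^2) discrete Fourier transform."""
    n_pts = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_pts) for n in range(n_pts))
            for k in range(n_pts)]

samples = [analog(n / FS) for n in range(N)]   # all the sampler ever sees
SPECTRUM = dft(samples)

def reconstruct(t):
    """Rebuild the signal at an arbitrary time t from the samples alone."""
    half = N // 2
    total = 0j
    for k in range(-half, half + 1):           # bins from -FS/2 to +FS/2
        total += SPECTRUM[k % N] * cmath.exp(2j * math.pi * k * FS * t / N)
    return (total / N).real

# Compare at instants that fall exactly between two samples.
worst_error = max(abs(reconstruct((n + 0.5) / FS) - analog((n + 0.5) / FS))
                  for n in range(0, N, 20))
print(f"worst reconstruction error between samples: {worst_error:.2e}")
```

The mismatch between the rebuilt signal and the original at the in-between instants is down at floating-point rounding level, including for the 19 kHz component.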

This has been done to death all over the internet. However I’d make a few observations:

  • Bob Stuart is an originator of MQA, a ‘hi res’ format which has divided the industry. In part this is due to its lossy nature, in part because it’s a form of DRM extracting licence fees along the chain. 
  • The Meyer and Moran study has come in for criticism. An argument was that some of their ‘hi res’ content may not have actually had guaranteed high resolution provenance. 
  • The only large scale study, to my knowledge, is the one by Mark Waldrep referenced earlier. He too found fault with Meyer and Moran and, as an expert in high resolution recordings, took great care over the preparation of his test materials. By the sound of things @edchristoph has not delved into the full detail of this test which, as noted, concluded that ‘hi res’ added no perceptible fidelity improvement over Red Book.

The idea that there could be ‘something out there’ (unknown unknowns?) which Red Book fails to capture is a formula that’s been used down the ages by less than scrupulous salesmen to convince people to part with their money. Personally I’m not buying it. 

To take this even further, very few can reliably hear the difference between even the CD format and lossy 320k, or the 256k at which Apple’s lossy format is coded. The head of Apple Music is on record as saying that he/his team cannot pick between Apple lossless and Apple lossy for 2 channel audio, except perhaps on very high quality headphones. In that case, can Hi Res offer more?

Apple Spatial Audio, or Dolby Atmos are an audibly different species no doubt and the information content needed to deliver them is a different matter.