Question

Sonos support for Hi Res Audio




105 replies

By the way, for those who wish, here is a link to a series of videos with Monty's explanation of the sampling theorem and why higher sampling rates do not yield "finer slices" that "better approximate the original source", only higher frequencies that no human being can hear. In it, he matches analog waveforms with their digital counterparts, showing that the original wave is exactly and perfectly duplicated when sampled at the proper Nyquist rate. Measured with an oscilloscope, there is nothing which "distorts the original signal (and) generates audible differences when it comes to harmonics, imaging and other nuances that are not readily described by math". The input matches the output exactly, as measured electronically on an analog scope, a measurement far past the level at which human ears could ever hear a difference.

https://wiki.xiph.org/Videos/Digital_Show_and_Tell
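For anyone who wants to poke at the reconstruction claim numerically, here is a minimal sketch of Whittaker-Shannon (sinc) interpolation. It is not taken from the videos; the names `reconstruct`, `tone` and the window size `N` are made up for illustration, and because only a finite window of samples is used, the match is approximate rather than exact.

```python
import math

FS = 44_100.0          # CD sampling rate in Hz
T = 1.0 / FS           # sampling interval in seconds


def sinc(x):
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    if x == 0.0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)


def reconstruct(samples, first_index, t):
    """Whittaker-Shannon interpolation: sum of x[n] * sinc(t/T - n)."""
    return sum(
        x * sinc(t / T - (first_index + k))
        for k, x in enumerate(samples)
    )


def tone(freq_hz, t):
    """The 'analog' test signal: a unit-amplitude sine wave (an assumption)."""
    return math.sin(2.0 * math.pi * freq_hz * t)


N = 2000  # samples kept on each side of t = 0; a finite window, so only approximate
samples_1k = [tone(1_000.0, n * T) for n in range(-N, N + 1)]

# Evaluate halfway between two sample instants, where no sample was taken.
t_mid = 0.5 * T
error = abs(reconstruct(samples_1k, -N, t_mid) - tone(1_000.0, t_mid))
print(f"reconstruction error between samples: {error:.2e}")
```

The leftover error comes from cutting the infinite sum short, not from any "stairstep" between samples; widening the window should shrink it further.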
That web site gives a good explanation, but there is a lot hidden in the assumption of a bandlimited signal. Without this assumption, there are infinitely many analog signals that can fit exactly through the digital samples, but they all contain frequencies above the Nyquist. It is the bandlimited assumption that allows one to draw a stairstep, because it filters all other possibilities. So I disagree with Monty - if one assumes a bandlimited signal, it is perfectly reasonable to draw a stairstep.

Here's an analogy. Weather models use a spatial grid typically 50km square to perform their calculations. If you don't happen to live exactly at one of these grid points, you can still expect the weather forecast to apply to your location pretty well. Implicitly, a stairstep is drawn between the weather grid points on either side of your location. There may be slight differences due to trees, lakes, buildings, concrete etc, but the grid point forecast is pretty accurate and perfectly useful.

Cheers, Peter.

Well, that's exactly what Monty is saying. Bandlimiting the signal to 22.05 kHz is what allows one to exactly reproduce the original signal. Limiting the frequencies eliminates the infinite number of signals above the Nyquist, so there are no more stairsteps. There is absolutely no estimation being done as in your analogy; the exact wave is reproduced.

This is the very reason the courts made Sony stop using the stairstep diagrams.
Bandlimiting the signal to 22.05 kHz is what allows one to exactly reproduce the original signal.
If I may correct the above: " to exactly reproduce the original signal to within the band limits". Audibly this does not matter because signal content above 20 kHz can't be heard by humans.
My aha moment in this was when I finally realised that a sine wave can be perfectly copied if one has the information of just two points on it. Not being very math literate, I had wandered down the false path of thinking that the more points on the waveform that are known/captured, the less jaggy the reproduced sine wave. I now realise that information beyond the two necessary points is redundant and of no value.
But this jaggy/fine slices misunderstanding lends itself to profound sounding statements that audiophiles still use, for example: the soul of the music lives between the zeros and ones of any digital sampling based attempt to record it and can therefore never be fully captured except via analog methods; ergo, digital is always a compromise.
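The two-point idea can be sketched in a few lines, under one big assumption: the signal is a single sine wave whose frequency is already known. The function name `fit_sine` and the test values are hypothetical, chosen only for illustration.

```python
import math


def fit_sine(freq_hz, t0, x0, t1, x1):
    """Recover a, b in x(t) = a*cos(wt) + b*sin(wt) from two samples.

    Assumes the frequency is already known; that assumption does the real work.
    """
    w = 2.0 * math.pi * freq_hz
    c0, s0 = math.cos(w * t0), math.sin(w * t0)
    c1, s1 = math.cos(w * t1), math.sin(w * t1)
    det = c0 * s1 - s0 * c1  # equals sin(w * (t1 - t0))
    if abs(det) < 1e-12:
        raise ValueError("samples are a multiple of half a period apart")
    # Cramer's rule on the 2x2 system built from the two samples.
    a = (x0 * s1 - s0 * x1) / det
    b = (c0 * x1 - x0 * c1) / det
    return a, b


FS = 44_100.0               # sampling rate in Hz
FREQ = 5_000.0              # assumed known tone frequency in Hz
A_TRUE, B_TRUE = 0.3, -0.7  # arbitrary illustrative coefficients


def wave(t, a=A_TRUE, b=B_TRUE):
    """The sine wave being sampled."""
    w = 2.0 * math.pi * FREQ
    return a * math.cos(w * t) + b * math.sin(w * t)


# Two consecutive samples at the CD rate are enough to pin down a and b.
t0, t1 = 0.0, 1.0 / FS
a_fit, b_fit = fit_sine(FREQ, t0, wave(t0), t1, wave(t1))
print(a_fit, b_fit)
```

Once `a` and `b` are known, the wave can be evaluated at any instant, so extra samples of that same sine add nothing new. Without the known-frequency assumption, two points alone would not be enough.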

So they are not going to make any change that means that customers' existing kit is obsolete. And they're certainly not going to do it when there is no evidence whatsoever that HiRes sounds better - quite the reverse, in fact.

Makes sense. I do however see a situation where low res components aren't mass produced anymore and making a Hi Res capable hardware equipped product may be cheaper. Sonos may then choose to have these also capable of transcoding on the fly so that these can remain compatible with all existing kit, while still playing Hi Res files. But there doesn't seem to be a need for that capability unless Hi Res music content takes over in the manner that HD video content has.

I suspect that much of the Hi Res hardware made today is no more expensive to make than the Lo Res kind, and marketing makes a virtue of necessity by touting the capability.
Good insight, Kumar. Interestingly though, you have to assume a sine wave for this to work. If you don't know the function shape used, two points give you very little information. Sine waves are generally used when performing Fourier analysis to get the audio spectrum. You've got me thinking though, does a DAC operate in the time domain or the frequency domain? The two are equivalent but require different processing. Filtering out high frequencies in the frequency domain is the same as averaging in the time domain.

One other point I'll toss in here. There's been little or no discussion of oversampling. I believe this is where the original digital information is interpolated to 88.2kHz (two-times oversampling) or 4- or even 8- times so that the filter above 20kHz can be less steep (because the Nyquist is now at 44.1kHz or higher). Filters that are less steep have better phase characteristics. This is a potential minefield - perhaps I'll stop there.

Cheers, Peter.
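To put rough numbers on the oversampling point, here is a small arithmetic sketch. It assumes a 20 kHz audio band edge and that the digital interpolation filter has already removed the intermediate images, so the analog filter only has to deal with the first image around the new sample rate. The function name `oversampling_summary` is hypothetical.

```python
BASE_RATE_HZ = 44_100        # CD sampling rate
AUDIO_BAND_EDGE_HZ = 20_000  # assumed upper limit of the audio band


def oversampling_summary(factor):
    """Return (rate, nyquist, first_image_edge, transition_width) in Hz."""
    rate = BASE_RATE_HZ * factor
    nyquist = rate / 2
    # The lowest image of 0-20 kHz content sits just below the sample rate.
    first_image_edge = rate - AUDIO_BAND_EDGE_HZ
    # Room the analog filter has to roll off between passband and first image.
    transition_width = first_image_edge - AUDIO_BAND_EDGE_HZ
    return rate, nyquist, first_image_edge, transition_width


for factor in (1, 2, 4, 8):
    rate, nyquist, image, width = oversampling_summary(factor)
    print(f"{factor}x: rate={rate} Hz, Nyquist={nyquist:.0f} Hz, "
          f"first image from {image} Hz, transition band {width} Hz wide")
```

With no oversampling the filter has only about 4.1 kHz in which to roll off; at two times it has 48.2 kHz, which is why a much gentler slope will do.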

If I may correct the above: " to exactly reproduce the original signal to within the band limits".

Or should it be: " to exactly reproduce all of the original signal that lies within the band limits."
If the original analog signal is not band-limited, then frequencies higher than 20kHz will be aliased back into the audible range because the spectrum is symmetric about the Nyquist frequency. This aliasing is definitely not desirable. Oversampling can help here too.
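The folding described above can be sketched as follows. The function name `alias_frequency` is made up for illustration; the check at the end confirms that a 25 kHz tone and a 19.1 kHz tone give identical samples at 44.1 kHz.

```python
import math


def alias_frequency(freq_hz, sample_rate_hz):
    """Frequency at which a tone appears after sampling (folded about Nyquist)."""
    folded = freq_hz % sample_rate_hz
    if folded > sample_rate_hz / 2:
        folded = sample_rate_hz - folded
    return folded


FS = 44_100
print(alias_frequency(25_000, FS))  # a 25 kHz tone lands inside the audible band
print(alias_frequency(10_000, FS))  # a tone below Nyquist is left where it is

# Sample-by-sample, the out-of-band tone is indistinguishable from its alias.
samples_match = all(
    math.isclose(
        math.cos(2 * math.pi * 25_000 * n / FS),
        math.cos(2 * math.pi * 19_100 * n / FS),
        abs_tol=1e-9,
    )
    for n in range(100)
)
print(samples_match)
```

Once the samples are taken there is no way to tell the two tones apart, which is why the filtering has to happen before sampling.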
Peter, all this assumes recording and mastering uses oversampling so that filters are less steep and therefore more effective. Monty explains all this, along with dither, in the first article I listed. However, this has no bearing on playback, aside from the false audiophile argument that since it is beneficial in recording and mastering, it must be beneficial in playback.
you have to assume a sine wave for this to work. If you don't know the function shape used, two points gives you very little information. Sine waves are generally used when performing Fourier analysis to get the audio spectrum.
A fundamental and not a rhetorical question to the above:
Is it not the case that all sound can be broken down to a combination of sine waves? AFAIK, at its simplest the sound from a tuning fork is a sine wave, and my understanding of sound is on the lines that there is no other function to any sound wave; it is just an aggregation of sine waves.
And further to the above, something that talks of sound on the same lines:
http://www-users.math.umn.edu/~rogness/math1155/soundwaves/
by someone that seems to know what they are talking about.
wow.. music as an argument :)
My 0.2mhz.. SONOS was the first doing things right. synchro multi streaming, great interface and frankly..that is it. And it has been absolutely great! Venturing into hardware / speakers etc, isn't special, it is simply broadening the use case and base.
After +10 years (amazing feat!!!) competition is catching up. There are prettier devices and other interfaces popping up, with different/broader functionalities. SONOS needs to make a move to stay ahead. It needs to stay cool/sexy and all that. Faster product introductions, different models.. and what about piano black or ruby red devices.. Nothing technological at all, but style is a factor in buying in or buying more..
On the more, dare I say it.., high end side of the scale we have companies like Devialet creating a new reality in music reproduction. It's a flavour not all will appreciate, but what it does is simply astonishing. Oh and it plays 24 bits too. And why should it not? I've never bought purely on mathematical equations and never will. I have some high res and mostly the difference I perceive is in the mastering (and not all for the better I must say..). But it doesn't matter much.. I have it, I like it and simply want to play it. I listen to records, CDs, SACDs, full res and high res files. I love them all in their own way. You can discuss or share your flavours with me and enjoy it with me, but..don't argue with me. I do not need someone else's approval for my taste. You want my $? Then let me play what I want to play. Simple, no?
DTM 🙂
.don't argue with me. I do not need someone else's approval for my taste. You want my $? Then let me play what I want to play. Simple, no?

Fine, except that you may then not be interested in Sonos and vice versa.
The arguments are usually on the grounds of whether the perceived superiority that is claimed by Hi Res fans is more than psychological or more than mastering differences and on what is the size of the market that such claimants create based on their needs. No one here ever argues with someone that chooses something else because Sonos does not meet their needs and moves on, mainly because you need two sides to sustain an argument!
Well said. Hence, SONOS is not necessarily a one stop shop. Not many brands are. Complement to complete. Differentiation creates choice and the ability to assemble what you like (and can afford). Change is inevitable.
Jumping back in here...I really like how this thread has shifted to future directions and improvements. Thanks for all the inputs.

My original topic was concerning a house I am getting ready to spend $k's for its audio system. Five years ago when I built my current house, Sonos was the best option but now there are strong wireless options that are marketing 24b/192k support, e.g., Dynaudio, Klipsch, BlueSound, et al..

Today, I dropped by my favorite hi-end audio store and shared this thread with the owner who has sold Sonos for years. He was very intrigued as he also carries Dynaudio, Martin Logan and has other suppliers marketing the 24b/192k angle. So, we took our "golden ears" and sat in his most high end room and listened to Miles Davis' "Kind of Blue" in a variety of formats including a Tidal lossless stream via Sonos Connect and an Apple Music lossy stream.

We definitely could discern differences in imaging, tonal qualities, and clarity, but this was not a designed experiment, just some playing around, as we only had 30 minutes. We were both impressed by the Tidal lossless and what he said next was interesting. He feels 24b/192k is marketing hype but that it will become table-stakes for product offerings and 24b/96k will become the new gold standard. It's one man's opinion, but he's been in this business for nearly 40 years...

We are getting together in a few weeks for a more structured listening session.
So, we took our "golden ears" and sat in his most high end room and listened to Miles Davis' "Kind of Blue" in a variety of formats including a Tidal lossless stream via Sonos Connect and an Apple Music lossy stream.

We definitely could discern differences in imaging, tonal qualities, and clarity

On the other hand, I used Kind of Blue to determine whether I could box and later sell my high end SACD player. On test was the Connect in both digital mode via the DAC in the SACDP as well as in analog mode direct to a high end Quad amp. I also used KoB on 180gm vinyl, but it was not possible to structure that part of the test very well. As an outcome of that test I moved my front end in every room to one from Sonos, and liquidated my legacy hifi components in 2011/12.
The thing to remember about KoB is that there are so many versions/masters of it, that it can vitiate the results unless you are listening to the same master each time. You want to make sure that all tests are from one mastered source to have a test with just one variable; multi variable tests are pointless.
The other thing to remember is that 0.2dB sound level differences are enough to have the louder sounding version exhibit the kind of adjectives you use to describe the better sounding music. And matching sound levels down to less than that needs instruments; it can't be done by ear.
IMO, the differences, if any, are nowhere near the kind that Blu-ray exhibits over DVD. And being so small, if they even exist at all on objective structured testing, they can easily be accommodated in the brain so as to yield the same listening pleasure as before. Provided one is willing to let the brain do its thing. Send the brain on a wild goose chase after chimera, and it will also do that very well. The advantage in the former is access to a much wider range of music and all the conveniences that Sonos offers, as against using kit that is either more expensive and/or less featured, and limiting music content to just hi res while paying a higher price for it.
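For a sense of how small the 0.2 dB level difference mentioned above is, here is a quick conversion using the standard decibel definitions (20 log10 for amplitude, 10 log10 for power). The function names are just illustrative.

```python
def db_to_amplitude_ratio(db):
    """Voltage/amplitude ratio for a level difference in dB."""
    return 10 ** (db / 20)


def db_to_power_ratio(db):
    """Power ratio for a level difference in dB."""
    return 10 ** (db / 10)


amp = db_to_amplitude_ratio(0.2)
power = db_to_power_ratio(0.2)
print(f"0.2 dB = {100 * (amp - 1):.1f}% more amplitude, "
      f"{100 * (power - 1):.1f}% more power")
```

That is a little over 2% in signal voltage, which is why matching levels that closely calls for a meter rather than the volume knob and a pair of ears.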
A fundamental and not a rhetorical question to the above:
Is it not the case that all sound can be broken down to a combination of sine waves? AFAIK, at its simplest the sound from a tuning fork is a sine wave, and my understanding of sound is on the lines that there is no other function to any sound wave; it is just an aggregation of sine waves.


Any time series can be represented as a sum of orthogonal basis functions. They don't even really need to be orthogonal, but this makes the calculation easier. As you rightly point out, sine waves are the natural basis functions for representing audio because each one corresponds to a simple audible tone. I wanted to point out this assumption, but I'm not sure it's really useful to do so.
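A minimal sketch of what orthogonal basis functions buy you, assuming a 10 ms window at 44.1 kHz so that both a 1 kHz and a 2 kHz sine fit a whole number of cycles. The helper names (`sine`, `inner`, `amplitude_of`) and the mix amplitudes are made up for illustration.

```python
import math

FS = 44_100
N = 441  # 10 ms of samples: a whole number of cycles for 1 kHz and 2 kHz


def sine(freq_hz):
    """N samples of a unit-amplitude sine wave at the given frequency."""
    return [math.sin(2 * math.pi * freq_hz * n / FS) for n in range(N)]


def inner(u, v):
    """Inner (dot) product of two equal-length sample lists."""
    return sum(a * b for a, b in zip(u, v))


def amplitude_of(signal, freq_hz):
    """Project the signal onto one sine basis function to read off its amplitude."""
    basis = sine(freq_hz)
    return inner(signal, basis) / inner(basis, basis)


s1, s2 = sine(1_000), sine(2_000)
mix = [0.5 * a + 0.25 * b for a, b in zip(s1, s2)]

print(inner(s1, s2))             # close to zero: the two tones are orthogonal
print(amplitude_of(mix, 1_000))  # recovers the 1 kHz component
print(amplitude_of(mix, 2_000))  # recovers the 2 kHz component
```

Because the two sines are orthogonal, each projection picks out its own component and ignores the other, which is the easy calculation that orthogonality provides.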

"time series" is a sequence of data points varying with time. "orthogonal" is a general version of being at right angles. For example, you can't represent a 1kHz sine wave in terms of a 2kHz sine wave - they are totally unrelated. "basis functions" are a set of curves (like sine waves) that taken together can represent any possible time series.
All over my head Peter, the second paragraph above! All I was trying to put forth is my understanding that if all sound is just an aggregation of sine waves, each of which can be captured accurately by just two data points on it, then the sampling frequency of 44,100/second is enough to accurately capture all sound of frequencies up to 22 kHz. The jaggies/steps don't exist and a higher sampling frequency does not yield any more information that is necessary for accurate reproduction of those sounds. The use of the word sample is what can mislead to that line of thinking, because in common sense thinking, the more the samples, the better the accuracy of the conclusions obtained by sampling.

Aliasing/anti-aliasing is a separate matter that I don't know enough to describe, beyond understanding that this engineering is needed to stop frequencies present above 22 kHz from messing up the sound delivered from frequencies up to 22 kHz, a little beyond the audible range, when the sampling frequency is 44,100 samples/second.
I do however see a situation where low res components aren't mass produced anymore and making a Hi Res capable hardware equipped product may be cheaper..

Yes, but they might still choose to use it in 'MedRes' mode, and just ignore the HiRes capabilities. As you say, if we assume a big leap in capability, it may well prove possible to experiment with other solutions, but keeping HiRes files in perfect sync with MedRes could prove tricky.

I suspect that much of Hi Res hardware made today is no more expensive to make than the Lo res kind and marketing does the virtue out of a necessity thing to tout the capability.

I'm afraid that I'm very cynical about the price we pay for hitech goods, feeling that it's largely smoke and mirrors.
Hey Kumar - I think you understand very well at a practical level. The one point I'd argue with is about the jaggies/steps, by which I presume you mean steps in the analog signal when reconstructed from digital information. These steps come from mentally or visually joining the dots of data occurring 44,100 times per second. If you don't join the dots, it looks very spiky! But it doesn't matter how the dots are joined because all the information between dots is above the Nyquist, and therefore not audible. So join them any way you want, steps, jaggies, or leave as data spikes, it all sounds the same to us humans.
You want to make sure that all tests are from one mastered source to have a test with just one variable; multi variable tests are pointless.

Couldn't agree more, but in this type of test (not even blind, let alone double blind) it's very easy to be misled, anyway.
If you don't join the dots, it looks very spiky! But it doesn't matter how the dots are joined
Aren't they just dots if not joined?! It does matter how they are joined, because that is what drives the look of smooth v jaggy steps. Audibly it doesn't matter, I agree, but the stepped representation is a large part of the misunderstanding of digital sound capture and reproduction.
The question is, what is doing the joining, if at all? When I record the analog output of the Connect, I get 44,100 numbers per second. Nothing in between, but as we agree, it doesn't matter. I think the joining is in our minds, or when we draw diagrams.

Couldn't agree more, but in this type of test (not even blind, let alone double blind) it's very easy to be misled, anyway.

And it is extremely difficult to set up a test at home that can be relied on. Here's the thing though: I did my best to set up a good blind test with a friend on hand, and I could not reliably perceive differences. I then gave up trying to perfect the test in the area of sound level matching using an instrument, because I did not think that this would end up reversing the test outcomes, that improved level matching would show up differences hitherto not heard.
But if it was the other way around, if differences had been heard, I would have wanted to be sure that this wasn't because of testing protocol imperfections, before making equipment decisions. I would have felt the need to make sure that a 0.2dB difference in levels wasn't the cause of the heard differences. Unless of course the differences were day and night, as they are in the case of HD v DVD.
And it is extremely difficult to set up a test at home that can be relied on.

Very much so... Impossible without extra kit, unbiased helpers and a great deal of work...

Here's the thing though: I did my best to set up a good blind test with a friend on hand, and I could not reliably perceive differences. I then gave up trying to perfect the test in the area of sound level matching using an instrument, because I did not think that this would end up reversing the test outcomes, that improved level matching would show up differences hitherto not heard.

I took the same decision when trying to find out if I could distinguish between flac files and various mp3 files. When I couldn't distinguish between them in a blind test (using a disinterested helper), it seemed safe to assume that level matching wasn't an issue.