Frequent pauses leading to a silent device. Power cycling is the only workaround.

  • 11 November 2023
  • 22 replies
  • 219 views

(system setup description below).
Fairly recently (few months) I started having issues with my system. Basically a device starts pausing, then playing, then pausing. The pauses start being a few seconds every hour or so, but become longer and more frequent until eventually the device is just silent.

This happens whether playing music from a file share, streaming from Spotify or Apple Music or playing Radio.

This happens to any one of the devices. The only requirement is that it's left playing until it falls over.

The only way to resolve the issue is to power cycle it.

It is a daily occurrence that one device needs to be power cycled.

If any support staff are reading (hopefully not as it’s the weekend!) I've taken the liberty of submitting a diagnostic report whilst this problem was happening:

1673763072

My setup is:

I have a small house, but a large garden with two home offices in the garden and two Play:1 in the garden. Every building is well insulated with foil backed insulation. So Sonosnet does not work too well without a few wired units.

Two Play ones are wired directly but with wifi disabled. Of the rest one Connect Amp is wired but the rest are all using Sonosnet.

This is a "traditional" network (i.e. not a mesh system) with two managed Netgear 24 port switches and two unmanaged Netgear 16 port switches.

The two managed switches are cabled directly into the house router (Draytek) and have the unmanaged switches and three Draytek access points plugged into them.

  • 5 x Play 1 (two are paired)
  • 2 Connect Amps
  • 2 x Connect

All S2.

Network is mostly wired.

  • Draytek 2862 router
  • 2 x Netgear GS724T (STP is configured according to the Sonos support article but not needed now).
  • 3 x Draytek AP710 access points
  • 2 x Netgear GS116 unmanaged switches.

Thanks.
Chris


22 replies

Do the garden offices have an Ethernet connection? Where are the Play:1s with disabled radios located?

It would also be useful to see a screenshot of the Network Matrix, to be found at http://x.x.x.x:1400/support/review, where x.x.x.x is the IP of any player. Blur out the MAC addresses if you’re concerned about privacy.
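As a minimal sketch of how to get to that page, the helper below builds the URL for a given player. The address 192.168.1.50 is a placeholder assumed for illustration, and `network_matrix_url` is a hypothetical name; only the port and path come from the post above.

```python
from ipaddress import IPv4Address


def network_matrix_url(player_ip: str) -> str:
    """Build the support review URL for a player, as described above."""
    # IPv4Address raises ValueError if the address is malformed.
    ip = IPv4Address(player_ip)
    return f"http://{ip}:1400/support/review"


if __name__ == "__main__":
    # Placeholder address; substitute the IP of any player on your network.
    print(network_matrix_url("192.168.1.50"))
```

Open the printed URL in a browser on the same network to view the Network Matrix.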

Hi Ratty. Thanks for your reply (you probably do not remember but you helped me solve an STP issue many, many years ago!). Matrix attached. The Garden offices all have ethernet run to them. One garden office has a wired Connect (with wifi disabled) but the other has a Play one on Sonosnet. There are also a stereo pair of Play ones in the garden on Sonosnet.

 

“Office” is presumably the wired Connect. “Kitchen” is obviously wired too. “Patio 2” is the only other wired node by the looks of things. 

Two nodes look to be powered off. What happens if they’re on?

When the problems start to set in, do the switch port LEDs give any indication of abnormal traffic patterns?

 

I forgot the two undefined ones. These are phantom devices and do not exist. They’ve been there for some time. I think due to renaming some devices some years ago.

When I saw STP issues some years ago the port LEDs on both switches would be on solid. But with this current issue there is no noticeable change in the status of the lights.

Do you typically have one or multiple units playing? Are any players Grouped? There may be a transient source of interference. Does the Network Matrix change significantly over time?

When paired, the left unit becomes the “coordinator” for the pair. All data for the pair is managed by the coordinator. For Groups, the first player (used to build the Group) is the Group coordinator. When play is initiated a cache is built. Data from the cache is used to ride through small communication issues. If there are lots of issues the cache will shrink and eventually the unit will need to take action when its cache is exhausted. If (in the opinion of the player) the issue seems transient, the player will briefly mute. If the problem is severe the current track will be abandoned and the next track in the Queue will be attempted. A string of small issues can result in an apparently silent player as it patiently tries to work through a continuous string of small, transient issues. Obviously, real time services, such as net radio cannot skip to the next track.
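The cache behaviour described above can be illustrated with a toy model. This is only a sketch under simple assumptions (a fixed-size buffer, one unit of audio consumed per tick, a full cache at the start) and is not how the player firmware is actually implemented. The function name and parameters are hypothetical.

```python
def simulate_buffer(arrivals, capacity=10, drain=1):
    """Return (muted_ticks, final_level) for a toy playback buffer.

    arrivals: units of audio received in each tick (0 means a dropout).
    Each tick the player tries to consume `drain` units. If the buffer
    cannot supply them, the tick is counted as muted.
    """
    level = capacity  # assumption: the cache is full when playback starts
    muted = 0
    for received in arrivals:
        level = min(capacity, level + received)
        if level >= drain:
            level -= drain
        else:
            muted += 1
    return muted, level


if __name__ == "__main__":
    # One short dropout: the cache absorbs it and nothing is muted.
    print(simulate_buffer([1] * 5 + [0] * 3 + [1] * 5))
    # A continuous string of small dropouts: the cache shrinks, then
    # the player is muted for part of every cycle.
    print(simulate_buffer([0, 0, 1] * 10))
```

In this model a single short gap costs nothing audible, while a steady stream of small gaps eventually empties the cache, which matches the "pauses getting longer and more frequent" pattern only in a very rough sense.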

Keep a log of issues. I once had a very severe, intermittent issue. After a few days of logging it became obvious that days and occasional evenings were fine, while other evenings were nasty. Sunday afternoon was nearly impossible. I slipped a copy of my log under my neighbor’s door, along with a polite note. A couple days later the annotated log returned. Some evenings she was traveling overnight for business and Sunday afternoon she called Mom. She was using a cordless phone system that was known to wreck WiFi. I was able to completely resolve the issue by avoiding wireless SonosNet connections near her base station.
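A log like this is easier to read once it is grouped by day and hour. The sketch below assumes you note each dropout as an ISO 8601 timestamp; the sample entries are invented for illustration and `dropouts_by_slot` is a hypothetical helper.

```python
from collections import Counter
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def dropouts_by_slot(timestamps):
    """Count dropouts per (weekday name, hour of day) slot."""
    counts = Counter()
    for stamp in timestamps:
        when = datetime.fromisoformat(stamp)
        counts[(DAY_NAMES[when.weekday()], when.hour)] += 1
    return counts


if __name__ == "__main__":
    # Invented sample entries, not real data from this thread.
    log = ["2023-11-12T15:20", "2023-11-12T15:45", "2023-11-14T20:05"]
    for slot, count in dropouts_by_slot(log).most_common():
        print(slot, count)
```

Slots that keep appearing at the top of the list are the ones worth comparing against what else is happening nearby at those times.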

You may have an intermittent hardware issue that interferes with network communication. This may or may not be SONOS caused. In another situation network traffic was much slower than expected for a wired network. It was very puzzling. Eventually I discovered that a hidden network wire had been damaged during construction. SONOS noticed this and used its wireless SonosNet to provide the connection to a network switch. While SonosNet is solid, it is not very fast compared to a Gigabit wired connection.

Phantom devices should vanish when the whole system reboots, which will happen at the next firmware update if not before.

I can’t see anything obvious that suggests why it’s going wrong, but it feels like a possible interaction with the network infrastructure. The logs in your diagnostic could well offer a clue. If you don’t get a follow-up here early next week I suggest calling Sonos Support and getting them to parse the diagnostic. 

In the meantime I trust that you reserve fixed IP addresses for Sonos devices and anything that interacts with them. If not then it would be a great idea to do so.

Thanks. Interesting about the phantom devices. I wonder why they are there. I run my own DHCP/DNS server (DNSMasq). It has the IP address of each Sonos device along with its MAC address.
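For anyone setting up the same kind of reservation, dnsmasq ties a MAC address to a fixed IP with `dhcp-host` lines. The sketch below generates those lines from a list; the MAC addresses, host names and IPs are placeholders assumed for illustration, not values from this system.

```python
def dhcp_host_lines(reservations):
    """Render dnsmasq dhcp-host lines from (mac, name, ip) tuples."""
    return [
        f"dhcp-host={mac.lower()},{name},{ip}"
        for mac, name, ip in reservations
    ]


if __name__ == "__main__":
    # Placeholder values; replace with each player's real MAC and address.
    players = [
        ("00:11:22:AA:BB:01", "sonos-kitchen", "192.168.1.50"),
        ("00:11:22:AA:BB:02", "sonos-office", "192.168.1.51"),
    ]
    print("\n".join(dhcp_host_lines(players)))
```

The output can be pasted into the dnsmasq configuration, after which each player should be offered the same address on every lease renewal.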

Do you typically have one or multiple units playing? Are any players Grouped? There may be a transient source of interference. Does the Network Matrix change significantly over time?

 

Generally about two or three devices are grouped. Also there’s usually at least one other device playing on its own.

We cannot rule out the possibility of a hardware issue, or a network issue that is specific to one player. For the purpose of your logging, you could systematically avoid Grouping or using each player in turn. Submitting diagnostics could reduce your logging workload because the diagnostics log network issues for all players, unless the situation is so bad that a player cannot report. Even then, a missing report is a useful, though incomplete, data point.

I contacted Sonos support on Monday (20th). They asked me to connect the three wired Sonos devices direct to the router rather than the switches. I did this but after a day the problem returned and I had to power cycle the device (although I sent a diagnostic before doing so). Unfortunately it seems Sonos support are now quite busy and the chatbot is not available. They emailed me and said they would get back to me in the next seven days….! Fingers crossed.

Update…..! The initial support technician told me to wire all the speakers to the router. I did this but the problem persisted.

I submitted another diagnostic. This time the (excellent!) technician noted that many of the devices are suffering from low memory. Which fits the symptoms exactly as after a power cycle they function fine until the problem reoccurs.

Most of these devices are old and have less memory than the current models. He suggested that all the Connects and the Connect Amps should be wired with the wifi disabled. This would lighten the load on the problematic devices. He also suggested that the single wired "gateway" should be a Play One, which has more memory than the Connect/Connect Amps and so would be less likely to suffer the issue.

However, the only true solution would be to replace all the devices with current models, which have more memory. Unfortunately the Port and the Amp are considerably more expensive than the Connect and Connect Amps. Perhaps I need to bite the bullet and look for Black Friday offers today!

Interesting. I think that’s the first suggestion that even the 64MB units are beginning to run out of steam.

I understand the

Interesting. I think that’s the first suggestion that even the 64MB units are beginning to run out of steam.

Is it the Play:1 that only has 64MB of RAM?

I’ve plugged every Connect and Connect Amp into ethernet and disabled wifi on them. That leaves five Play:1s. I’ve been able to plug in two of those, so I’ve disabled wifi on one and left wifi enabled on the other. So hopefully that leaves just four Play:1s on Sonosnet with just one of those ethernet connected. Fingers crossed.

According to this chart Play:1 has at least 128MB.

Thanks. So it’s the Connect/Connect Amp that has 64MB.

A further update. After wiring the majority of the devices, including all the Connect/Connect Amps (and disabling wifi) a few days later the problem returns. So it’s looking like the Connect and Connect Amps are nearing the end of their life! That’s going to be quite an investment for me to replace them all. I’ve taken a diagnostic and will contact support with it. But it looks like an unworkaroundable issue :-(  I wonder if I can switch to S1? I use SMB stored flac files for a music library, streamed radio, Apple Music and Spotify.

Here is an article about downgrading:

https://support.sonos.com/en-gb/article/downgrade-a-sonos-product-from-s2-to-s1

I finally found time to follow this up with Sonos, with fresh diagnostic reports after their advice to wire as much as possible to reduce memory usage on the older devices.

After lots of back and forth about not wiring speakers as it causes a broadcast storm, I was able to inform the representative that only one wired speaker had wifi enabled and I had not seen a broadcast storm for years! ..AND that it was Sonos support that recommended I do this :-)

The representative then said "a system update is the best way to proceed. But we do not have an ETA".

I asked for confirmation that this was “..the memory issue on the older devices” and the rep confirmed that this was the case and repeated that  "..At this time, updating your system is the best way to alleviate this issue. We do not currently have an ETA.".

So it looks like Sonos will hopefully reduce the memory usage for these older devices with a firmware update.

If they can. Since the amount of ‘dead’ memory may be a variable, it is possible that they can’t make every device work again. I wouldn’t be expecting a ‘silver bullet’ that fixes every issue. 

Not sure what you mean by “dead memory”, but according to earlier comments from support, it’s just the 2nd generation Connect and Connect Amps that are problematic. Hopefully whatever memory leak (which it must be as this only happens over time and a power cycle fixes it!) is happening can be reduced on those devices.

I don’t think it’s a memory leak in the software, although Sonos isn’t saying. My supposition is that the electronic memory is going bad, causing available memory to run the software to drop. Not being familiar with their code base, I don’t know how well their Linux OS can optimize for failed memory. 

But, these are guesses. If we’re lucky, it is just a memory leak, something much more ‘fixable’. 
