ZP100 not booting





Thanks everyone for your posts. Using what is here and a Bus Pirate, I've been able to recover two ZP100s that were loading the diagnostic firmware.

I have a third that boots past the serial connection and gets an IP address from my router. The white light, however, keeps blinking, and it never gets to the point where I can add it to the rest of my Sonos system.

When booting, even though it grabs an IP and responds to pings, it doesn't serve up the web pages with status, etc.

A factory reset gets it blinking orange, but there it stays.

Is it possible to have it revert back to the diagnostic firmware so I can reflash it? Any other suggestions?

Thanks


Hi everyone, I recently bought a second-hand ZP100 and it's displaying exactly the issues enton describes. When I first got the unit home I added it to my existing system (without a factory reset) and all seemed OK for a while. I then noticed that some of my other units were dropping out randomly, and then the ZP100 disappeared, so I disconnected it and tried a factory reset. This resulted in a continuous orange/white flashing light that never turns green. If I power on normally I just get a flashing white light that never turns green. With the unit plugged into my router via Ethernet I can see its IP address but can't access any of the status web pages. Any suggestions about where to start? Or is it a lost cause?!
To all the contributors, thank you so much for all the trial and error and for documenting your solutions. I have a ZP100 with the same symptoms as most others, with 3.2-29243-diag firmware. I can access it via the default IP 169.254.1.1:1400/status, etc. I have ordered a USB to UART card, but in the meantime I finally finished reading the last few entries of this thread and found the URL solution. Yeah, it sure seemed to me that there ought to be a way.... BTW, I am not proficient in Linux.

@anapsix
I have the ZP100 connected to a local router configured for 169.254.x.x with internet access. My laptop is assigned 169.254.1.2. I am using XAMPP under Windows 7 as a web server on my laptop. Since I'm not a Linux guy, I don't understand your comment about the PHP server and running the command sudo php -S 0.0.0.00. I did, however, place the fw.upd file in the C:\xampp\apache/ directory, in the C:\xampp directory and also in the root C:\

I can run "http://169.254.1.1:1400/diag/cgi-bin/bin/echo yes we can" and indeed see the response "yes we can".

As I run each of the URL scripts you provided above, I don't see any browser feedback except for the very last url. The response is:
awk: cmd. line:1: Unexpected end of string
wget: server returned error 404: HTTP/1.1 404 Not Found
Read failure 0
WGET exited with 1
Upgrade failed: (11) upgrade file download failed
pull_upgrade failed

When I look at the apache logs, it shows:
169.254.1.1 - - [27/Apr/2016:17:38:09 -0500] "GET /fw.upd HTTP/1.1" 404 1056 "-" "Wget"

Should I be seeing some sort of echo response from the first 5 url scripts?

Any idea what my error means? Is my web server just not finding the file or is it a format issue?

Sounds like your ZP100 can reach your machine, but the file is in the wrong folder. I don't recall how XAMPP is set up, but I think you should have a folder called "htdocs" somewhere under c:\xampp; you should put the file in there. Some web servers also need a MIME type configured for specific suffixes (IIS, for example) in order to serve the file, but I don't think Apache cares about that.
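For anyone who would rather not set up XAMPP, here is a minimal sketch of serving the update file with Python's built-in web server instead. This is not what anyone in the thread used; it assumes Python 3.7 or newer is installed on the laptop. The function name `make_server`, the port 8080 and the folder path are all made up for illustration, and the host and port have to match whatever URL you hand to the ZP100. The only thing taken from the thread is that the unit requests `/fw.upd`.

```python
import functools
import http.server


def make_server(directory, host="0.0.0.0", port=8080):
    """Return an HTTP server that serves the files in `directory`.

    `directory` plays the role of Apache's htdocs folder: a request for
    /fw.upd is answered with <directory>/fw.upd, and anything that is
    not in the folder gets a 404, like the error seen in the Apache log.
    The default port 8080 is an assumption, not something from the thread.
    """
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    return http.server.ThreadingHTTPServer((host, port), handler)


# Example use (hypothetical folder; blocks until interrupted with Ctrl+C):
#   server = make_server(r"C:\firmware")
#   server.serve_forever()
```

Each request is logged to the console, so a 404 here means the same thing it did with Apache: the file is not in the folder being served.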
@jishi - Thanks. Placing the file under htdocs resolved the problem of finding the update file.

I am still having problems though. I still see no feedback until running the last URL script above, but this time, instead of text in the browser window, I receive a download of the run.sh file. Opening it as a text file I see:

awk: cmd. line:1: Unexpected end of string
upgrade
version 28.1-83040
compatible with Sonos Zone Player submodels 0-16 revisions 0-4294967294 (any region)
compatible with hardware feature set 1d
My hardware feature set is 0
Upgrade supports all my features
/-\|/-\|/- several lines of gibberish-\|/-\|/-\|/-\|/-\|/-\|
Upgrade file is good
Using new partition format mode
Destination section 0 generation 11
Operating in redundant partition mode (not changing partition table)
Executing upgrade script...failed
Upgrade failed: (35)
failure reading upgrade script file
pull_upgrade failed

No error logs from the Apache server; looks like it served the file OK.
Still investigating.
As a final note for the upgrade, I used a command way back in the thread to get the software download and it was 28.1-83040-1-1.upd. I later followed the instructions to use a path from my current working system, substituting the proper sonosid, etc. That downloaded a file named 31.8-24090-1-16.upd. I probably should have used that file for the upgrade but I stuck with the older 28.1 file.

After the successful upgrade, I tried to add the ZP100 to my current system and it really hosed everything up. It got to the software update step and gave error 1101. I reset the system and tried again. As soon as I plugged in the ZP100, the rest of the Sonos system went down again. I unplugged the rest of the system and set up the ZP100 as a new system; it updated to 39.1-26010 and worked well. I then did a factory reset and tried to add it back to my original system. I plugged everything back in, and everything was acting fine with the ZP100 plugged in waiting to be added. I added it to the system and the controller updated the rest of my system to 39.1-26010. ZP100 is working great!

One negative side effect (I think) is that when the system updated to 39.1, it is now giving me some sort of error (warning): "Some Sonos players are using the wireless connection from your range extender device. You will be unable to play music in a group of rooms including such a player. To ensure playback in all grouped rooms, you will need a Sonos BOOST or a player permanently wired to your router."

I don't know if this is a result of the update to 39.1 or if it's because I powered down all the Sonos equipment and maybe they associated differently when they restarted. I'll check with Sonos technical support later. Anyway, it's probably a subject for another thread, but I thought I would mention here the issue of updating the ZP100 to 28.1 and then trying to add it to a currently updated system. It might be better to get the latest download first.
Good to know. FYI for anyone else doing this, I think it may be possible to keep serial access after the upgrade by running
"mdputil -wfF 3" before upgrading. YMMV; I'm not responsible if you brick the device.
hello everyone! thank you for this thread! with your help, i managed to connect my sonos cpu board to my computer through uart and did some inspections.

the history in a nutshell was that my zp100 is a used unit, almost 9 or 10 years old. i've opened it multiple times to clean it or check the power supply... anyway i never had any problem with it.
back in the days it started to freeze randomly, independently of playing music or standby... there was no overheating...
so i decided to unplug it and let it rest for a few days, but now when i wanted to boot it up the white led was blinking infinitely. cleaned it, checked the voltages and temperatures, no problem. PLUS the ethernet switch works individually...

with the uart i got into the bootloader and did some tests. RAM is good, NAND has only one bad block, but it won't boot the kernel...
This is the output for the boot linux from nand:
code:

Rincon boot loader version 0.16-11080(ZP) (32M SDRAM). Press 'h' for help.
h - help
m - SDRAM test
i - print NAND device ID
n - NAND device scan
x - NAND device destructive test
y - NAND device dump first page
p - Program NAND device
b - Boot the Linux kernel from NAND device
d - Boot diagnostics from NAND device
> NAND ID is EC:75
32M NAND flash (Samsung K9F5608U0C) detected
NAND flash block 970 is bad
Section 0 is provisionally good, kernel on partition 1, generation 15
nand_load: bad page magic, page 54688
nand_load: file appears to extend past end of partition
Section 1 is no good
Attempting to boot kernel from partition 1


and there it freezes. i've never seen the linux booting through uart ever...(yet).

can i do something? if the nand is fried or empty... how in the world could it get empty or fried if it was working fine for like 9-10 years...?

thank you very much!

EDIT: i'm using uart without plugging in to mains power... am i doing it right?

EDIT2: when i don't scan my nand before booting, it sees my nand good:
code:

SDRAM test complete
Attempting to autoboot from NAND device
NAND ID is EC:75
32M NAND flash (Samsung K9F5608U0C) detected
NAND flash block 970 is bad
Section 0 is provisionally good, kernel on partition 1, generation 15
Section 1 is provisionally good, kernel on partition 4, generation 14
Attempting to boot kernel from partition 1

but then freezes anyway.
I see something on my usb voltage meter. when it starts to boot, it uses the cpu and draws around 750-800mA. the bootloader freezes, and a few seconds later the current falls back to 500mA... it's like the cpu tried something then froze...
any idea? 😞
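When comparing several captures like the two above, it can help to boil each one down to the few facts that matter: which blocks are bad, which sections the bootloader considers good, and which partition it tries to boot. Below is a rough sketch of a parser for that. The function name `summarize_boot_log` and the shape of the result are invented here, and the patterns are based only on the lines shown in this thread, so other bootloader versions may print something different.

```python
import re

# Patterns modelled on the bootloader lines quoted in this thread only.
BAD_BLOCK = re.compile(r"NAND flash block (\d+) is bad")
SECTION = re.compile(
    r"Section (\d+) is (provisionally good|no good)"
    r"(?:, kernel on partition (\d+), generation (\d+))?"
)
BOOT = re.compile(r"Attempting to boot kernel from partition (\d+)")


def summarize_boot_log(text):
    """Collect bad blocks, section states and the boot partition from a capture."""
    summary = {"bad_blocks": [], "sections": {}, "boot_partition": None}
    for line in text.splitlines():
        bad = BAD_BLOCK.search(line)
        section = SECTION.search(line)
        boot = BOOT.search(line)
        if bad:
            summary["bad_blocks"].append(int(bad.group(1)))
        elif section:
            number, state, partition, generation = section.groups()
            summary["sections"][int(number)] = {
                "good": state == "provisionally good",
                # Partition and generation are only printed for good sections.
                "partition": int(partition) if partition else None,
                "generation": int(generation) if generation else None,
            }
        elif boot:
            summary["boot_partition"] = int(boot.group(1))
    return summary
```

Fed the first capture, this reports block 970 as bad, section 0 as good (partition 1, generation 15), section 1 as no good, and a boot attempt from partition 1. Lines it does not recognise, such as the nand_load messages, are skipped.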
The Linux console is disabled by default in production builds of the ZP100, so not seeing anything is normal. It sounds like the kernel starts to boot and crashes. This could be due to a bunch of different things, but it sounds like the NAND is going (not uncommon on units of this age). I've occasionally had luck doing a factory reset and trying to upgrade the unit if you can get it to boot at all post-reset (I think this allows it to go through and mark any newly bad blocks on the install of the new upgrade, but that's just a guess).

If you are really brave (and have access to the right hardware), you can try replacing the nand chip with an equivalent of the same model. Make sure you dump the data off this one using a flash programmer.

The ethernet switch has its own internal logic, so it will work even if the unit is hosed.
so you say i should keep trying until it "accidentally" boots up, then do a factory reset and see what happens?
no chance for hardware problems (except nand)?
sadly i don't have jtag hardware at all.

but i don't understand... it was working fine for ages... i mean, my unit was given to me by a hotel (because the hotel replaced all the audio with crestron), so this thing played music 24/7 for ages and it also worked fine for 3 years with me. and it just suddenly freezes and dies without any warning sign. weird.

so you say i should keep trying until it "accidentally" boots up, then do a factory reset and see what happens?

Basically. When new software is installed it gets written around bad blocks on the NAND. Block 970 is part of the JFFS portion of the chip (RW user data), which should get cleared on a factory reset.


no chance for hardware problems (except nand)?

No idea, the things that I have had fail most commonly are NAND, RAM, and occasionally the DSP chip on the amplifier board. If the unit still won't boot after a factory reset, (and the SDRAM test passes), my guess would be that the issue is the amp board. If you have another opened unit somewhere you can try switching the computer board and see if it works then.

Another troubleshooting tool is to hook up a computer running wireshark directly to the unit when it boots and see if it's requesting an IP address. I've had units fail between when the kernel boots (and dhcpd is initialized) and when the webserver (anacapad) is initialized.


but i don't understand... it was working fine for ages... i mean, my unit was given to me by a hotel (because the hotel replaced all the audio with crestron), so this thing played music 24/7 for ages and it also worked fine for 3 years with me. and it just suddenly freezes and dies without any warning sign. weird.


The thing is old. 10 years is well beyond the expected lifespan of consumer electronic components, so things eventually start to fail.


thank you for your help! now i'm trying to boot it up through uart. still the same thing happens.
i could get it to flash amber-white to start the factory reset, but it got stuck there. i waited for 10 minutes then interrupted it...

the reason why i think that the linux part is defective is that it won't even boot when disconnected from the amp. or should it boot?...
i will check the amp board and try the wireshark and see what happens.
i don't really want to trash this thing from one day to the next.
Unfortunately the CPU board won't boot all the way if it's disconnected from the amp. My guess is the kernel panics when it tries to load the DSP module and there is nothing there, or that Anacapad has some hard-coded calls to the amp board. If the unit gets to the point of requesting an IP address, some folks have had luck restarting it into factory reset while constantly pinging it. No idea why that works, but I've been able to replicate that behaviour.
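If you want to try the constant-ping trick and also see exactly when the unit starts or stops answering, something like the following sketch could do it. It just wraps the operating system's own ping command; the function names are made up for this example, and the address passed to `watch` must be whatever your router actually hands the ZP100. Note that ping exit codes are not equally trustworthy on every platform, so treat the output as a hint rather than proof.

```python
import platform
import subprocess
import time


def ping_command(host, system=None):
    """Build a one-shot ping command for the current (or given) OS name."""
    system = system or platform.system()
    # Windows ping uses -n for the packet count; Unix-like pings use -c.
    count_flag = "-n" if system == "Windows" else "-c"
    return ["ping", count_flag, "1", host]


def ping_once(host, timeout=3.0):
    """Return True if one ping to `host` exits successfully within `timeout` seconds."""
    try:
        result = subprocess.run(
            ping_command(host),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def watch(host, interval=1.0, rounds=None):
    """Ping `host` repeatedly and print a line each time reachability changes."""
    last = None
    done = 0
    while rounds is None or done < rounds:
        up = ping_once(host)
        if up != last:
            state = "reachable" if up else "unreachable"
            print(time.strftime("%H:%M:%S"), host, state)
            last = up
        done += 1
        time.sleep(interval)
```

Calling `watch("192.168.1.50")` (a placeholder address) before powering the unit into factory reset keeps the pings going and timestamps the moment it comes up or drops off. Stop it with Ctrl+C.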
Sorry for being late.
I managed to start my device, I don't know how... So thank you poshul!
I plugged everything together, started wireshark... and it booted up... I also started to ping it, so maybe the ping or maybe wireshark listening on the ethernet port brought it to life.
I performed a factory reset and now it works and has not frozen for a day now.

What I discovered is that the LED is flickering again. It's definitely power related. When the amplifier turns on from standby, it starts to flicker. If I stop the music it flickers less but continues until the amplifier turns off again.
Has anyone had something similar? Do you know what it is? Maybe a capacitor? I hope it's not the DSP chip, but I don't think so.

Thank you for your help!
thank you for your suggestions buzz!

right now my unit works (with flickering led, power problems)
so i will use it a little while and then check all the boards.

the first reason is with a little usage the faulty part can show some physical signs of death and it can be recognized more easily...

the second reason is that i don't want to disassemble it: it takes a long time and then i have to reassemble it... 😃
So my ZP-100 has the flashing white LED failure.
When I got it from eBay, I first did a hard reset (plugging it in with Mute + Volume Up held down) and all was well.
It worked great for a few hours, then I could not see it in the Sonos app.
I found that when this happened, if I just let it cool and started over with the hard reset, I could get it working for a short while. I always did the setup with Ethernet only.
I then did a teardown and reflowed the logic parts with no luck. (I noticed lots of flux on the board and blobby solder; lower quality than I expected.)
I started measuring the DC voltages and they seemed fine.
Then I started pinging the ZP-100 at the IP that the router always gives it. The ping replies tracked the ZP's functionality: if I got a reply, the unit was working, and vice versa.
I tried some cold spray on the logic board and that made the functionality last a few minutes longer. I also noticed that the lights on the connected Ethernet port never changed their behavior whether the ZP was working or not.
So I tried the following and had some luck.
I sprayed my last few seconds of cold spray (the bottle was just about empty) on the Ethernet controller/transformer chip, quickly did a hard reset, and this time (for the first time) set it up in the way that prepares the ZP for a wifi installation.
What I found was that once the setup was done, the ZP shuts off the Ethernet controller (the connected Ethernet port goes dark) and switches to wifi. The unit stayed on and connected all night, the longest ever.
When I get home I will see if it is still connected and try a reboot.
Will let y'all know what happens.
So no joy. Not sure if I just got lucky with the cold spray or what, but my failure mode is back.
Going to try to swap out the RAM chips and see what happens.

Been thinking after reading all of these pages of info. And my thoughts always lead to, what is the root cause of these boot problems? Some seem to be able to reinstall firmware and it fixes it.
Is there a smoking gun in all of this?
All, a huge thanks for such an informative thread. Really useful and very interesting!

I've got a non-booting ZP100 and I get the following output, at which point it just sits there doing nothing. Does anyone have any ideas?


Thanks in advance for any thoughts/help.



Rincon boot loader version 0.16-11080(ZP) (32M SDRAM). Press 'h' for help.
SDRAM test...
Memory test iteration 0
Boot interrupted
> h
Rincon boot loader version 0.16-11080(ZP) (32M SDRAM). Press 'h' for help.
h - help
m - SDRAM test
i - print NAND device ID
n - NAND device scan
x - NAND device destructive test
y - NAND device dump first page
p - Program NAND device
b - Boot the Linux kernel from NAND device
d - Boot diagnostics from NAND device
> NAND ID is EC:75
32M NAND flash (Samsung K9F5608U0C) detected
Section 0 is provisionally good, kernel on partition 1, generation 39
Section 1 is provisionally good, kernel on partition 4, generation 40
Attempting to boot kernel from partition 4