We were hit by a huge storm Tuesday morning that knocked out power, and unfortunately our backup generator didn't come online, so our entire rack lost power. Everything came back fine except our PowerVault MD3220i, which no longer comes online or lights up its drives when powered back on. I have replaced both power supplies and still can't get it working. This unit had worked flawlessly since 2013.
I took out both controllers, removed and reinstalled their batteries, and reinserted the controllers.
I can ping the management IP of both controllers, and they show activity, but I can't connect to either of them from the Storage Manager application.
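Since ping works but Storage Manager can't connect, a quick check is whether anything is listening on the management port at all. The sketch below is a minimal Python TCP probe. It assumes Storage Manager reaches the controllers on TCP port 2463, the port commonly documented for the MD-series management interface; confirm that for this model and firmware before drawing conclusions.

```python
import socket

# Assumption: MD Storage Manager connects to each controller's management
# service on TCP port 2463. Verify against Dell's documentation.
MGMT_PORT = 2463


def port_open(host: str, port: int = MGMT_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, `port_open("10.0.0.122")` probes the address one controller picked up from DHCP in the serial log; the other controller's address isn't shown, so it has to be filled in by hand. A host that answers ping but refuses this port would suggest the management service itself isn't running.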
I connected the serial cable and am receiving the following during boot:
-=<###>=-
Instantiating /ram as rawFs, device = 0x1
Formatting /ram for DOSFS
Instantiating /ram as rawFs, device = 0x1
Formatting...Retrieved old volume params with %38 confidence:
Volume Parameters: FAT type: FAT32, sectors per cluster 0
0 FAT copies, 0 clusters, 0 sectors per FAT
Sectors reserved 0, hidden 0, FAT sectors 0
Root dir entries 0, sysId (null) , serial number f10000
Label:" " ...
Disk with 1024 sectors of 512 bytes will be formatted with:
Volume Parameters: FAT type: FAT12, sectors per cluster 1
2 FAT copies, 1010 clusters, 3 sectors per FAT
Sectors reserved 1, hidden 0, FAT sectors 6
Root dir entries 112, sysId VXDOS12 , serial number f10000
Label:" " ...
RTC Error: Real-time clock device is not working
OK.
Adding 14606 symbols for standalone.
Reset, Power-Up Diagnostics - Loop 1 of 1
3600 Processor DRAM
01 Data lines Passed
02 Address lines Passed
3300 NVSRAM
01 Data lines Passed
4410 Ethernet 82574 1
01 Register read Passed
02 Register address lines Passed
6D40 Bobcat
02 Flash Test Passed
3700 PLB SRAM
01 Data lines Passed
02 Address lines Passed
7000 SE iSCSI BE2 1
01 Register Read Test Passed
02 Register Address Lines Test Passed
03 Register Data Lines Test Passed
3900 Real-Time Clock
01 RT Clock Tick Passed
Diagnostic Manager exited normally.
Controller has been locked down due to Hardware errors:
================= EXCEPTION LOG =================
Serial number: 29T005W
Entry count: 8
Wrap-arounds: 0
First entry time:
Current Controller date/time: MAR-09-2017 06:51:58 AM
Current Local (User) date/time: MAR-09-2017 04:18:23 PM
---- Log Entry #0 (Core 0) DEC-11-2012 02:22:20 PM ----
WARNING: Reset by alternate controller
---- Log Entry #1 (Core 0) DEC-11-2012 02:46:05 PM ----
WARNING: Reset by alternate controller
---- Log Entry #2 (Core 0) DEC-11-2012 03:53:04 PM ----
WARNING: Reset by alternate controller
---- Log Entry #3 (Core 0) AUG-06-2013 09:01:28 PM ----
WARNING: Reset by alternate controller
---- Log Entry #4 (Core 0) NOV-15-2013 02:25:04 AM ----
11/15/13-10:06:49 (tNtbErrPolling): PANIC: PLX NTB Port 4 reg 0x000044a4 changed, original val 0x00000000, current val 0x00000010
Stack Trace for tNtbErrPolling:
0x0025ffac vxTaskEntry +0x5c : vkiTask (0x15000308)
0x0016844c vkiTask +0xec : ntbErrPolling ()
0x00143c20 ntbErrPolling+0x2a0: ntbRegCompare (0x4, 0xac8e10)
0x00143000 ntbRegCompare+0x100: _vkiCmnErr ()
0x00163544 _vkiCmnErr +0x104: 0x00163780 (0x585580, 0x4f8be0, 0xd838e0)
0x00163b04 vkiLogShow +0x544: psvJobAdd (0x1648a0, 0xd83a40, 0, 0)
0x00148c04 psvJobAdd +0x64 : msgQSend ()
0x00402714 msgQSend +0x61c: taskUnlock ()
---- Log Entry #5 (Core 0) NOV-15-2013 02:25:38 AM ----
WARNING: Reset by alternate controller
---- Log Entry #6 (Core 0) AUG-04-2014 09:28:29 PM ----
WARNING: Reset by alternate controller
---- Log Entry #7 (Core 0) DEC-15-2014 06:33:01 PM ----
12/16/14-02:43:26 (tNtbErrPolling): PANIC: PLX NTB Port 0 reg 0x00000364 changed, original val 0x00000000, current val 0x00000020
Stack Trace for tNtbErrPolling:
0x0026070c vxTaskEntry +0x5c : vkiTask (0x15000308)
0x00168b4c vkiTask +0xec : ntbErrPolling ()
0x00144308 ntbErrPolling+0x288: ntbRegCompare (0, 0xebf6a0)
0x00143700 ntbRegCompare+0x100: _vkiCmnErr ()
0x00163c44 _vkiCmnErr +0x104: 0x00163e80 (0x585f20, 0x4f9320, 0xd84290)
0x00164204 vkiLogShow +0x544: psvJobAdd (0x164fa0, 0xd843f8, 0, 0)
0x00149304 psvJobAdd +0x64 : msgQSend ()
0x00402e54 msgQSend +0x61c: taskUnlock ()
---- Log Entry #8 (Core 0) DEC-15-2014 06:33:39 PM ----
WARNING: Reset by alternate controller
---- Log Entry #9 (Core 0) DEC-15-2015 06:40:42 AM ----
Root Complex TLP header[0] 30008000
Root Complex TLP header[1] 01200033
Root Complex TLP header[2] 00000000
Root Complex TLP header[3] 00000000
PCI SERR Exception
PLX PCI-E Switch (Unit 0)
VID 0x10b5 DID 0x8632 B0:D0:F0
PCI Status = 0x4010
Bridge Secondary PCI Status = 0x4000
PLX PCI-E Bridge to Host Card (Unit 1)
VID 0x10b5 DID 0x8632 B1:D4:F0
PCI Status = 0x4010
PCI-E Device Status = 0x0005
PCI-E AER Uncorrectable Status = 0x00040000
Header Log 0 = 0x00000044
Header Log 1 = 0x00000044
Header Log 2 = 0x00000044
Header Log 3 = 0x20008080
PCI-E AER Correctable Status = 0x00000040
---- Log Entry #10 (Core 0) DEC-15-2015 06:40:45 AM ----
WARNING: Restart by watchdog time out
---- Log Entry #11 (Core 0) DEC-15-2015 06:41:19 AM ----
WARNING: Reset by alternate controller
---- Log Entry #12 (Core 0) AUG-06-2016 03:55:22 PM ----
WARNING: Reset by alternate controller
---- Log Entry #13 (Core 0) MAR-06-2017 09:43:27 PM ----
ERROR: Port 0 Bad TLP Count 1572864 exceeds threshold 16
ERROR: Port 0 Bad DLLP Count 1073743408 exceeds threshold 16
ERROR: Port 4 Bad TLP Count 1572864 exceeds threshold 16
ERROR: Port 4 Bad DLLP Count 1073743408 exceeds threshold 16
ERROR: Port 5 Bad TLP Count 1572864 exceeds threshold 16
ERROR: Port 5 Bad DLLP Count 1073743408 exceeds threshold 16
ERROR: Port 6 Bad TLP Count 1572864 exceeds threshold 16
ERROR: Port 6 Bad DLLP Count 1073743408 exceeds threshold 16
ERROR: Port 2/6 Rx Err Count 24 exceeds threshold 16
---- Log Entry #14 (Core 0) MAR-06-2017 09:43:27 PM ----
ERROR: Type-I Port 0 ECC correctable error threshold exceeded reg 0xf1a val 0x18
---- Log Entry #15 (Core 0) MAR-06-2017 09:46:14 PM ----
ERROR: Port 0 Bad TLP Count 1572864 exceeds threshold 16
ERROR: Port 0 Bad DLLP Count 1073743408 exceeds threshold 16
ERROR: Port 4 Bad TLP Count 1572864 exceeds threshold 16
ERROR: Port 4 Bad DLLP Count 1073743408 exceeds threshold 16
ERROR: Port 5 Bad TLP Count 1572864 exceeds threshold 16
ERROR: Port 5 Bad DLLP Count 1073743408 exceeds threshold 16
ERROR: Port 6 Bad TLP Count 1572864 exceeds threshold 16
ERROR: Port 6 Bad DLLP Count 1073743408 exceeds threshold 16
ERROR: Port 2/6 Rx Err Count 24 exceeds threshold 16
---- Log Entry #16 (Core 0) MAR-06-2017 09:46:14 PM ----
ERROR: Type-I Port 0 ECC correctable error threshold exceeded reg 0xf1a val 0x18
---- Log Entry #17 (Core 0) MAR-06-2017 09:47:35 PM ----
Faults are detected on all installed power supplies
---- Log Entry #18 (Core 0) MAR-06-2017 09:50:18 PM ----
ERROR: Port 0 Bad TLP Count 526208 exceeds threshold 16
ERROR: Port 4 Bad TLP Count 526208 exceeds threshold 16
ERROR: Port 5 Bad TLP Count 526208 exceeds threshold 16
ERROR: Port 6 Bad TLP Count 526208 exceeds threshold 16
ERROR: Port 0/4 Rx Err Count 128 exceeds threshold 16
---- Log Entry #19 (Core 0) MAR-06-2017 09:50:18 PM ----
ERROR: Type-I Port 0 ECC correctable error threshold exceeded reg 0x678 val 0x80
03/09/17-16:18:23 (tSystem): ERROR: FPGA FW is out of date
"Rhone03 rev17" currently in use
"Rhone03 rev20" available for update
Current date: 03/09/17 time: 16:18:23
Send <BREAK> for Service Interface or baud rate change
03/09/17-16:18:27 (tNetCfgInit): NOTE: eth0: LinkUp event
03/09/17-16:18:28 (tNetCfgInit): NOTE: Acquiring network parameters for interface gei0 using DHCP
03/09/17-16:18:37 (ipdhcpc): NOTE: netCfgDhcpReplyCallback :: received OFFER on interface gei0, unit 0
03/09/17-16:18:38 (ipdhcpc): NOTE: DHCP server: 10.0.0.1
03/09/17-16:18:38 (ipdhcpc): WARN: **WARNING** The DHCP Server did not assign a permanent IP for gei0.
03/09/17-16:18:38 (ipdhcpc): WARN: Network access to this controller may eventually fail.
03/09/17-16:18:38 (ipdhcpc): NOTE: DNS domain name: XXXXXX.com
03/09/17-16:18:38 (ipdhcpc): NOTE: DHCP client name: md3220i-mgmt
03/09/17-16:18:38 (ipdhcpc): NOTE: Client DNS name servers: 10.0.0.1
03/09/17-16:18:38 (ipdhcpc): NOTE: Client IP routers: 10.0.0.1
03/09/17-16:18:38 (ipdhcpc): NOTE: Assigned IP address: 10.0.0.122
03/09/17-16:18:38 (ipdhcpc): NOTE: Assigned subnet mask: 255.255.255.0
03/09/17-16:18:38 (tNetReset): NOTE: Network Ready
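The raw PCIe register values in Log Entry #9 can be decoded bit by bit. The Python sketch below does that using bit names as I read them from the PCI Local Bus and PCI Express Base specifications (PCI Status, PCIe Device Status, and the AER status registers). Only a subset of each register is listed, so check the tables against the spec before relying on them.

```python
from typing import Dict, List

# Bit positions as read from the PCI Local Bus and PCI Express Base
# specifications; only a subset of each register is listed here.
PCI_STATUS = {
    3: "Interrupt Status",
    4: "Capabilities List",
    8: "Master Data Parity Error",
    11: "Signaled Target Abort",
    12: "Received Target Abort",
    13: "Received Master Abort",
    14: "Signaled System Error",
    15: "Detected Parity Error",
}
DEVICE_STATUS = {
    0: "Correctable Error Detected",
    1: "Non-Fatal Error Detected",
    2: "Fatal Error Detected",
    3: "Unsupported Request Detected",
}
AER_UNCORRECTABLE = {
    4: "Data Link Protocol Error",
    5: "Surprise Down Error",
    12: "Poisoned TLP",
    13: "Flow Control Protocol Error",
    14: "Completion Timeout",
    15: "Completer Abort",
    16: "Unexpected Completion",
    17: "Receiver Overflow",
    18: "Malformed TLP",
    19: "ECRC Error",
    20: "Unsupported Request Error",
}
AER_CORRECTABLE = {
    0: "Receiver Error",
    6: "Bad TLP",
    7: "Bad DLLP",
    8: "REPLAY_NUM Rollover",
    12: "Replay Timer Timeout",
    13: "Advisory Non-Fatal Error",
}


def decode(value: int, table: Dict[int, str]) -> List[str]:
    """Name every set bit of value that appears in table, lowest bit first."""
    names = [name for bit, name in sorted(table.items()) if value & (1 << bit)]
    # Report set bits the table doesn't cover instead of silently dropping them.
    known = sum(1 << bit for bit in table)
    if value & ~known:
        names.append(f"unlisted bits 0x{value & ~known:x}")
    return names
```

With these tables, `decode(0x00040000, AER_UNCORRECTABLE)` gives `['Malformed TLP']`, `decode(0x00000040, AER_CORRECTABLE)` gives `['Bad TLP']`, `decode(0x0005, DEVICE_STATUS)` gives `['Correctable Error Detected', 'Fatal Error Detected']`, and `decode(0x4010, PCI_STATUS)` gives `['Capabilities List', 'Signaled System Error']`.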
I replaced both power supplies, so I'm not sure why the power supply fault is still there or how to clear it. I got into the VxWorks shell, but I'm not sure how to clear the lockdown so the controller boots normally again.
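One detail worth checking is when each exception log entry was recorded: the log goes back to DEC-11-2012, and the power supply fault is Log Entry #17, dated MAR-06-2017 in controller time (the header shows the controller clock differs from local time). The Python sketch below pulls out each entry's number, timestamp, and first message line so the entries can be sorted or filtered by date. The header pattern is taken from the log above; `parse_entries` is a hypothetical helper name.

```python
import re
from datetime import datetime
from typing import List, Tuple

# Matches headers such as "---- Log Entry #17 (Core 0) MAR-06-2017 09:47:35 PM ----".
HEADER = re.compile(
    r"^---- Log Entry #(\d+) \(Core \d+\) "
    r"(\w{3}-\d{2}-\d{4} \d{2}:\d{2}:\d{2} [AP]M) ----$"
)


def parse_entries(log_text: str) -> List[Tuple[int, datetime, str]]:
    """Return (entry number, timestamp, first message line) for each entry."""
    entries = []
    lines = log_text.splitlines()
    for i, line in enumerate(lines):
        m = HEADER.match(line.strip())
        if not m:
            continue
        # strptime matches month names and AM/PM case-insensitively ("MAR", "PM").
        stamp = datetime.strptime(m.group(2), "%b-%d-%Y %I:%M:%S %p")
        first = lines[i + 1].strip() if i + 1 < len(lines) else ""
        entries.append((int(m.group(1)), stamp, first))
    return entries
```

For example, `[e for e in parse_entries(log) if e[1] >= datetime(2017, 3, 6)]` keeps only the entries logged on or after March 6, 2017 (controller time).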
Does anyone have an idea how to clear this?