Hello,
I have a PowerVault MD3200i with dual RAID controllers. It was bought 5 or 6 years ago, so it is no longer under warranty.
Several weeks ago, because of electrical power supply maintenance in my datacenter, I had to properly shut down all my systems (servers, racks, network, storage, ...).
Once the maintenance was done, all systems were restarted.
Since then, the MD3200i has been running in an alarm state:
RAID Controller Module in slot 0 is REMOVED
- Symptoms:
In MDSM:
The Modular Disk Storage Manager client shows an empty slot for this controller in the rear view of the MD3200i and reports it as REMOVED even though the controller is inserted.
In my rack:
Physically, on the back of the device, this controller is in its slot, powered (some LEDs are on) and connected to the network (the LEDs on the management and iSCSI RJ45 ports are blinking).
I can ping its management interface at its own IP address.
Currently, all I/O is handled by the other RAID controller (in slot 1). Production workloads keep running, but with lower performance.
- Things tried to fix:
* Stop / restart the whole MD3200i ==> same problem / no change
* Eject / reinsert the faulty controller ==> same problem / no change
- Diagnostics tried:
I plugged the serial cable into the faulty controller and watched its boot sequence (after removing and reinserting it):
The controller does not seem to be in lockdown mode (which I have seen described in other forums), because it doesn't even finish booting: it hangs before starting the lockdown check sequence.
Here is a dump of the boot sequence:
------
-=<###>=-
Instantiating /ram as rawFs, device = 0x1
Formatting /ram for DOSFS
Instantiating /ram as rawFs, device = 0x1
Formatting...Retrieved old volume params with %38 confidence:
Volume Parameters: FAT type: FAT32, sectors per cluster 0
0 FAT copies, 0 clusters, 0 sectors per FAT
Sectors reserved 0, hidden 0, FAT sectors 0
Root dir entries 0, sysId (null) , serial number f10000
Label:" " ...
Disk with 1024 sectors of 512 bytes will be formatted with:
Volume Parameters: FAT type: FAT12, sectors per cluster 1
2 FAT copies, 1010 clusters, 3 sectors per FAT
Sectors reserved 1, hidden 0, FAT sectors 6
Root dir entries 112, sysId VXDOS12 , serial number f10000
Label:" " ...
RTC Error: Real-time clock device is not working
OK.
Adding 14588 symbols for standalone.
Reset, Power-Up Diagnostics - Loop 1 of 1
3600 Processor DRAM
01 Data lines Passed
02 Address lines Passed
3300 NVSRAM
01 Data lines Passed
4410 Ethernet 82574 1
01 Register read Passed
02 Register address lines Passed
6D40 Bobcat
02 Flash Test Passed
3700 PLB SRAM
01 Data lines Passed
02 Address lines Passed
7000 SE iSCSI BE2 1
01 Register Read Test Passed
02 Register Address Lines Test Passed
03 Register Data Lines Test Passed
3900 Real-Time Clock
01 RT Clock Tick Passed
Diagnostic Manager exited normally.
02/22/17-15:51:05 (tSystem): ERROR: FPGA FW is out of date
"Rhone03 rev17" currently in use
"Rhone03 rev20" available for update
02/22/17-15:51:09 (tNetCfgInit): NOTE: eth0: LinkUp event
02/22/17-15:51:10 (tNetCfgInit): NOTE: Network Ready
Bad File CRC after decompression
FAILED - ERROR READING FILE (errno = 0x610001)
Kernel initialization complete
Current date: 02/22/17 time: 15:51:21
Send <BREAK> for Service Interface or baud rate change
-------
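For anyone comparing this dump against a capture from a healthy controller, here is a minimal Python sketch that pulls out the lines worth a closer look. The pattern list is only an assumption based on the messages visible in the dump above (ERROR, FAILED, "Bad File CRC", "not working"); it is not an official list of controller error strings, and the function name is made up for illustration.

```python
import re

# Assumption: these markers are taken from the messages visible in the dump
# above; this is not an official catalogue of controller error strings.
SUSPICIOUS = re.compile(r"ERROR|FAILED|Bad File CRC|not working", re.IGNORECASE)


def find_suspicious_lines(log_text):
    """Return (line_number, text) pairs for log lines matching SUSPICIOUS."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if SUSPICIOUS.search(line):
            hits.append((number, line.strip()))
    return hits


if __name__ == "__main__":
    # A few lines copied from the boot dump, used here as sample input.
    sample = "\n".join([
        "02 Flash Test Passed",
        "02/22/17-15:51:05 (tSystem): ERROR: FPGA FW is out of date",
        "02/22/17-15:51:10 (tNetCfgInit): NOTE: Network Ready",
        "Bad File CRC after decompression",
        "FAILED - ERROR READING FILE (errno = 0x610001)",
        "Kernel initialization complete",
    ])
    for number, text in find_suspicious_lines(sample):
        print(f"{number}: {text}")
```

Run on the full dump, this would flag the RTC error, the out-of-date FPGA firmware message, and the CRC / file read failure that appears just before the boot stalls.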
In this state, nothing happens, even after waiting 30 to 60 minutes while watching the console.
I can press Ctrl+Break to get the Service Interface prompt:
'Press within 5 seconds: <S> for Service Interface, <BREAK> for baud rate'
But even when I press the "S" key, nothing happens; the expected menu is not displayed.
I would like to reflash the controller, but I don't know whether I can do it manually over the serial cable, without using the MDSM tool.
How can I do it, if it is possible?
Do you have any other ideas for fixing this faulty controller, assuming the issue really is software only?
Thanks for any help.
PS: I attached a partial SupportBundle (without trace-buffers.7z, which is too large) to this post.