[linux] Faulty disk in RAID - how to find the cause or the bad sector
patrik at foral.sk
Tue Jun 2 15:20:34 CEST 2009
> Thanks, that was exactly the problem.
> I am trying the self-test right now:
> smartctl -t long -d ata /dev/sda
> smartctl -t long -d ata /dev/sdb
> and the test is already running.
SMART results:
ID# ATTRIBUTE_NAME          VALUE WORST THRESH TYPE     RAW_VALUE
  1 Raw_Read_Error_Rate     200   200   051    Pre-fail 5
  3 Spin_Up_Time            185   184   021    Pre-fail 1750
  4 Start_Stop_Count        100   100   000    Old_age  16
  5 Reallocated_Sector_Ct   200   200   140    Pre-fail 0
  7 Seek_Error_Rate         200   200   000    Old_age  0
  9 Power_On_Hours          095   095   000    Old_age  4341
 10 Spin_Retry_Count        100   253   000    Old_age  0
 11 Calibration_Retry_Count 100   253   000    Old_age  0
 12 Power_Cycle_Count       100   100   000    Old_age  16
192 Power-Off_Retract_Count 200   200   000    Old_age  3
193 Load_Cycle_Count        200   200   000    Old_age  16
194 Temperature_Celsius     117   108   000    Old_age  26
196 Reallocated_Event_Count 200   200   000    Old_age  0
197 Current_Pending_Sector  200   200   000    Old_age  0
198 Offline_Uncorrectable   200   200   000    Old_age  0
199 UDMA_CRC_Error_Count    200   200   000    Old_age  0
200 Multi_Zone_Error_Rate   200   200   000    Old_age  0
...I dropped some columns so that each attribute fits on one line.
All values look OK to me except Raw_Read_Error_Rate. I am not sure
about that one, but from what I found on Google, the number in VALUE
is higher than the number in THRESH, so it should be OK.
The test output, however, contains several error records, which I am
attaching; I hope it is not a problem that it is a bit long:
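
The VALUE versus THRESH comparison can also be done mechanically instead of by eye. Below is a minimal sketch, assuming the trimmed seven-column layout used in this mail (ID, name, VALUE, WORST, THRESH, TYPE, RAW_VALUE); full `smartctl -A` output has more columns, so the awk field numbers would have to be adjusted. The function name `failing_attrs` is made up for illustration.

```shell
# Hypothetical helper: reads attribute rows on stdin in the trimmed layout
#   ID NAME VALUE WORST THRESH TYPE RAW_VALUE
# and prints the name of every attribute whose normalized VALUE is at or
# below its THRESH. Rows with THRESH 0 are skipped, because such
# attributes have no failure threshold to trip.
failing_attrs() {
    awk '$1 ~ /^[0-9]+$/ && ($5 + 0) > 0 && ($3 + 0) <= ($5 + 0) { print $2 }'
}

# Two rows from the table in this mail; nothing is printed,
# because both VALUEs are above their THRESH.
printf '%s\n' \
    '1 Raw_Read_Error_Rate 200 200 051 Pre-fail 5' \
    '5 Reallocated_Sector_Ct 200 200 140 Pre-fail 0' | failing_attrs
```

Empty output means no attribute has reached its threshold, which matches the reading that the attribute table itself looks fine.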
Error 18 occurred at disk power-on lifetime: 3302 hours (137 days + 14 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
04 51 00 34 cf f3 a3
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
ea 00 00 c6 f4 7f 66 08 49d+17:02:41.210 FLUSH CACHE EXIT
ea 00 00 c6 f4 7f 66 08 49d+17:02:30.772 FLUSH CACHE EXIT
ca 00 08 bf f4 7f 00 08 49d+17:02:30.772 WRITE DMA
ea 00 00 66 f9 00 5e 08 49d+17:02:30.772 FLUSH CACHE EXIT
ca 00 08 5f f9 00 00 08 49d+17:02:30.568 WRITE DMA
Error 17 occurred at disk power-on lifetime: 2109 hours (87 days + 21 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
10 51 08 bf f4 7f e0
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
ec 00 08 34 cf f3 00 08 49d+17:02:44.358 IDENTIFY DEVICE
ea 00 08 b7 83 25 00 08 49d+17:02:44.350 FLUSH CACHE EXIT
ec 00 08 b7 83 25 00 08 49d+17:02:44.147 IDENTIFY DEVICE
ec 00 08 6f 18 24 00 08 49d+17:02:44.137 IDENTIFY DEVICE
ec 00 00 34 cf f3 00 08 49d+17:02:44.127 IDENTIFY DEVICE
Error 16 occurred at disk power-on lifetime: 2109 hours (87 days + 21 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
04 51 08 34 cf f3 a3
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
ea 00 08 b7 83 25 00 08 49d+17:02:44.350 FLUSH CACHE EXIT
ec 00 08 b7 83 25 00 08 49d+17:02:44.147 IDENTIFY DEVICE
ec 00 08 6f 18 24 00 08 49d+17:02:44.137 IDENTIFY DEVICE
ec 00 00 34 cf f3 00 08 49d+17:02:44.127 IDENTIFY DEVICE
ea 00 00 c6 f4 7f 37 08 49d+17:02:44.119 FLUSH CACHE EXIT
Error 15 occurred at disk power-on lifetime: 2109 hours (87 days + 21 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
10 51 08 b7 83 25 e0
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
ec 00 08 6f 18 24 00 08 49d+17:02:44.137 IDENTIFY DEVICE
ec 00 00 34 cf f3 00 08 49d+17:02:44.127 IDENTIFY DEVICE
ea 00 00 c6 f4 7f 37 08 49d+17:02:44.119 FLUSH CACHE EXIT
ca 00 08 bf f4 7f 00 08 49d+17:02:44.119 WRITE DMA
ea 00 08 37 89 5e 00 08 49d+17:02:44.119 FLUSH CACHE EXIT
I admit I do not really understand how serious these errors are.
I assume, though, that it is nothing that could make the array drop
into degraded mode.
As for the logs, /var/log/kern.log is still empty. What else should I
check, or where could the problem be? (Debian Linux 4.0 / 2.6.18-6-486)
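
For anyone reading the register dumps above, the ER byte and the address registers can be decoded by hand. This is only a sketch under some assumptions: ER is taken to be the ATA error register (0x04 = ABRT, command aborted; 0x10 = IDNF, requested address not found), and SN/CL/CH plus the low nibble of DH are taken to hold a 28-bit LBA. That second assumption only holds when the LBA bit (0x40) in DH is set, as in the rows ending in e0; in the rows ending in a3 that bit is clear, so the address there is probably not meaningful. The helper names `decode_er` and `lba28` are invented for this example.

```shell
# Hypothetical helpers; arguments are the two-digit hex values from the log.

# Print the names of the bits set in the ATA error register (ER column).
decode_er() {
    er=$(( 0x$1 ))
    names=""
    if [ $(( er & 0x80 )) -ne 0 ]; then names="$names ICRC"; fi
    if [ $(( er & 0x40 )) -ne 0 ]; then names="$names UNC"; fi
    if [ $(( er & 0x10 )) -ne 0 ]; then names="$names IDNF"; fi
    if [ $(( er & 0x04 )) -ne 0 ]; then names="$names ABRT"; fi
    echo "${names# }"
}

# Assemble a 28-bit LBA from the SN, CL, CH and DH registers (in that order).
# Only the low nibble of DH belongs to the address.
lba28() {
    echo $(( ((0x$4 & 0x0f) << 24) | (0x$3 << 16) | (0x$2 << 8) | 0x$1 ))
}

# Registers of Error 17: ER=10, SN=bf CL=f4 CH=7f DH=e0
decode_er 10            # prints: IDNF
lba28 bf f4 7f e0       # prints: 8385727
```

Under these assumptions, Error 17 points at LBA 8385727 (0x7ff4bf), the same address as the preceding WRITE DMA command in the log (bf f4 7f), so that sector range would be the first place to look.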
--
Patrik Jan (pa3k)