We use the tw_cli utility to manage these controllers. Here is an example from a live server where disk 9 has failed (there is no RAID here, each disk stands on its own; replacing a disk in RAID-5 is covered below):
# tw_cli /c1 show

Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    SINGLE    OK             -       -       -       931.312   ON     OFF
u1    SINGLE    OK             -       -       -       931.312   ON     OFF
u2    SINGLE    OK             -       -       -       931.312   ON     OFF
u3    SINGLE    OK             -       -       -       931.312   ON     OFF
u4    SINGLE    OK             -       -       -       931.312   ON     OFF
u5    SINGLE    OK             -       -       -       931.312   ON     OFF
u6    SINGLE    OK             -       -       -       931.312   ON     OFF
u7    SINGLE    OK             -       -       -       931.312   ON     OFF
u8    SINGLE    OK             -       -       -       931.312   ON     OFF
u9    SINGLE    OK             -       -       -       931.312   ON     OFF
u10   SINGLE    OK             -       -       -       931.312   ON     OFF

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     931.51 GB   1953525168    9QJ0AE03
p1     OK               u1     931.51 GB   1953525168    9QJ0FGZR
p2     OK               u2     931.51 GB   1953525168    9QJ2G2YV
p3     OK               u3     931.51 GB   1953525168    9QJ0FG7T
p4     OK               u4     931.51 GB   1953525168    9QJ0C5LS
p5     OK               u5     931.51 GB   1953525168    9QJ0FGEM
p6     OK               u6     931.51 GB   1953525168    9QJ0FGE9
p7     OK               u7     931.51 GB   1953525168    9QJ08X84
p8     OK               u8     931.51 GB   1953525168    9QJ0EVB6
p9     NOT-PRESENT      -      -           -             -
p10    OK               u10    931.51 GB   1953525168    9QJ6NS9X
p11    OK               u9     931.51 GB   1953525168    9QJ07RGQ
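On a box with this many disks it is easy to miss a bad port when scanning the table by eye. A minimal sketch of filtering the port table for anything that is not OK (POSIX shell + awk; the sample lines are pasted from the output above, but on a live system you would pipe `tw_cli /c1 show` straight into the same awk):

```shell
# Sketch: flag every port whose Status column is not OK. Port lines start
# with pN, so the first-field match skips the unit table and the headers.
sample='p0     OK               u0     931.51 GB   1953525168    9QJ0AE03
p9     NOT-PRESENT      -      -           -             -
p11    OK               u9     931.51 GB   1953525168    9QJ07RGQ'

printf '%s\n' "$sample" | awk '$1 ~ /^p[0-9]+$/ && $2 != "OK" { print $1, $2 }'
# prints: p9 NOT-PRESENT
```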
After inserting the replacement disk, the following entry shows up in messages:
Dec 17 21:37:06 c-n200-u0297-251 kernel: twa1: INFO: (0x04: 0x001A): Drive inserted: port=9
and the picture changes:
# tw_cli /c1 show

Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    SINGLE    OK             -       -       -       931.312   ON     OFF
u1    SINGLE    OK             -       -       -       931.312   ON     OFF
u2    SINGLE    OK             -       -       -       931.312   ON     OFF
u3    SINGLE    OK             -       -       -       931.312   ON     OFF
u4    SINGLE    OK             -       -       -       931.312   ON     OFF
u5    SINGLE    OK             -       -       -       931.312   ON     OFF
u6    SINGLE    OK             -       -       -       931.312   ON     OFF
u7    SINGLE    OK             -       -       -       931.312   ON     OFF
u8    SINGLE    OK             -       -       -       931.312   ON     OFF
u9    SINGLE    OK             -       -       -       931.312   ON     OFF
u10   SINGLE    OK             -       -       -       931.312   ON     OFF

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     931.51 GB   1953525168    9QJ0AE03
p1     SMART-FAILURE    u1     931.51 GB   1953525168    9QJ0FGZR
p2     OK               u2     931.51 GB   1953525168    9QJ2G2YV
p3     OK               u3     931.51 GB   1953525168    9QJ0FG7T
p4     OK               u4     931.51 GB   1953525168    9QJ0C5LS
p5     OK               u5     931.51 GB   1953525168    9QJ0FGEM
p6     OK               u6     931.51 GB   1953525168    9QJ0FGE9
p7     OK               u7     931.51 GB   1953525168    9QJ08X84
p8     OK               u8     931.51 GB   1953525168    9QJ0EVB6
p9     OK               -      931.51 GB   1953525168    9QJ8JFYS
p10    OK               u10    931.51 GB   1953525168    9QJ6NS9X
p11    OK               u9     931.51 GB   1953525168    9QJ07RGQ
The disk itself is now visible, but still unavailable to the system (there are 11 units but 12 disks; unit u11 is missing). To make it available, it must be added as a unit. Since we have no RAID array on this controller (all the disks are collected into a RAIDZ2 pool), we add it as single:
# tw_cli /c1 add type=single disk=9
Creating new unit on controller /c1 ... Done. The new unit is /c1/u11.
Setting default Storsave policy to [balance] for the new unit ... Done.
Setting default Command Queuing policy for unit /c1/u11 to [on] ... Done.
Setting write cache = ON for the new unit ... Done.
Warning: You do not have a battery backup unit for /c1/u11 and the enabled
write cache (default) may cause data loss in the event of power failure.
Note.
If the previous command fails with the error
"Operation not allowed; retained cache data"
you must first remove every unused unit:
# tw_cli /c1/u9 remove
and then retry the command that adds the new disk.
Now the following can be seen in messages:
Dec 18 16:10:10 c-n200-u0297-251 kernel: da23 at twa1 bus 0 scbus1 target 11 lun 0
Dec 18 16:10:10 c-n200-u0297-251 kernel: da23: Fixed Direct Access SCSI-5 device
Dec 18 16:10:10 c-n200-u0297-251 kernel: da23: 100.000MB/s transfers
Dec 18 16:10:10 c-n200-u0297-251 kernel: da23: 953664MB (1953103872 512 byte sectors: 255H 63S/T 121575C)
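The kernel message above tells you which CAM device name (daN) the new unit received, which is exactly what zpool replace needs. A small sketch of pulling that name out of the log line (the message text is the one shown above; on a live system you would grep /var/log/messages instead of pasting the line):

```shell
# Sketch: extract the new daN device name from the kernel attach message.
msg='Dec 18 16:10:10 c-n200-u0297-251 kernel: da23 at twa1 bus 0 scbus1 target 11 lun 0'

dev=$(printf '%s\n' "$msg" | sed -n 's/.*kernel: \(da[0-9]*\) at twa.*/\1/p')
echo "$dev"
# prints: da23
```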
Check the state of the pool:
# zpool status
  pool: backup
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-2Q
  scan: resilvered 378G in 34h24m with 0 errors on Fri Nov 23 18:18:32 2012
config:

        NAME                      STATE     READ WRITE CKSUM
        backup                    DEGRADED     0     0     0
          raidz2-0                DEGRADED     0     0     0
            da0                   ONLINE       0     0     0
            da1                   ONLINE       0     0     0
            da2                   ONLINE       0     0     0
            da3                   ONLINE       0     0     0
            da4                   ONLINE       0     0     0
            da5                   ONLINE       0     0     0
            da6                   ONLINE       0     0     0
            da11                  ONLINE       0     0     0
            da7                   ONLINE       0     0     0
            da8                   ONLINE       0     0     0
            da9                   ONLINE       0     0     0
            da10                  ONLINE       0     0     0
            da12                  ONLINE       0     0     0
            da13                  ONLINE       0     0     0
            da14                  ONLINE       0     0     0
            da15                  ONLINE       0     0     0
            da16                  ONLINE       0     0     0
            da17                  ONLINE       0     0     0
            da18                  ONLINE       0     0     0
            da19                  ONLINE       0     0     0
            da20                  ONLINE       0     0     0
            11896710113252406751  UNAVAIL      0     0     0  was /dev/da21
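The long number in place of the device name is the pool member's GUID, and it is what we pass to zpool replace below. A sketch of extracting it instead of copying it by hand (the sample line is pasted from the status output above; live use would pipe `zpool status backup` through the same awk):

```shell
# Sketch: pull the numeric GUID of the UNAVAIL vdev out of zpool status output.
line='            11896710113252406751  UNAVAIL      0     0     0  was /dev/da21'

guid=$(printf '%s\n' "$line" | awk '$2 == "UNAVAIL" && $1 ~ /^[0-9]+$/ { print $1 }')
echo "$guid"
# prints: 11896710113252406751
```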
Perform the disk replacement:
# zpool replace backup 11896710113252406751 da23
# zpool status
  pool: backup
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Tue Dec 18 16:17:25 2012
        11.2M scanned out of 8.10T at 1.40M/s, (scan is slow, no estimated time)
        490K resilvered, 0.00% done
config:

        NAME                        STATE     READ WRITE CKSUM
        backup                      DEGRADED     0     0     0
          raidz2-0                  DEGRADED     0     0     0
            da0                     ONLINE       0     0     0
            da1                     ONLINE       0     0     0
            da2                     ONLINE       0     0     0
            da3                     ONLINE       0     0     0
            da4                     ONLINE       0     0     0
            da5                     ONLINE       0     0     0
            da6                     ONLINE       0     0     0
            da11                    ONLINE       0     0     0
            da7                     ONLINE       0     0     0
            da8                     ONLINE       0     0     0
            da9                     ONLINE       0     0     0
            da10                    ONLINE       0     0     0
            da12                    ONLINE       0     0     0
            da13                    ONLINE       0     0     0
            da14                    ONLINE       0     0     0
            da15                    ONLINE       0     0     0
            da16                    ONLINE       0     0     0
            da17                    ONLINE       0     0     0
            da18                    ONLINE       0     0     0
            da19                    ONLINE       0     0     0
            da20                    ONLINE       0     0     0
            replacing-21            UNAVAIL      0     0     0
              11896710113252406751  UNAVAIL      0     0     0  was /dev/da21
              da23                  ONLINE       0     0     0  (resilvering)
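Resilvering a pool this size takes many hours, so checking progress from a script is handy. A sketch of extracting the completion percentage from the scan line (the sample is the status fragment above; on a live system you would feed it `zpool status backup` output):

```shell
# Sketch: report the resilver completion percentage from the "scan:" block.
status='  scan: resilver in progress since Tue Dec 18 16:17:25 2012
    490K resilvered, 0.00% done'

printf '%s\n' "$status" | sed -n 's/.*resilvered, \(.*%\) done.*/\1/p'
# prints: 0.00%
```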
Replacing a disk in RAID-5
# tw_cli /c0 show

Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-5    DEGRADED       -       -       256K    1117.56   ON     OFF

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     372.61 GB   781422768     9QH054F5
p1     OK               u0     372.61 GB   781422768     KRFS02RAGV1BLC
p2     DEGRADED         u0     372.61 GB   781422768     H604P4EH
p3     OK               u0     372.61 GB   781422768     KRFS2CRAHNL8ED
After the physical swap, the RAID will not pick up the new disk on its own; you have to help it: remove the old port and run a rescan:
# tw_cli /c0/p2 remove
Removing port /c0/p2 ... Done.

# tw_cli /c0 rescan
Rescanning controller /c0 for units and drives ...Done.
Found the following unit(s): [none].
Found the following drive(s): [/c0/p2].

# tw_cli /c0 show

Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-5    DEGRADED       -       -       256K    1117.56   ON     OFF

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     372.61 GB   781422768     9QH054F5
p1     OK               u0     372.61 GB   781422768     KRFS02RAGV1BLC
p2     OK               -      465.76 GB   976773168     WD-WMAYP5324488
p3     OK               u0     372.61 GB   781422768     KRFS2CRAHNL8ED
Now start the rebuild and check the picture:
# tw_cli /c0/u0 start rebuild disk=2
Sending rebuild start request to /c0/u0 on 1 disk(s) [1] ... Done.

# tw_cli /c0 show

Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-5    REBUILDING     1%      -       256K    1117.56   ON     OFF

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     372.61 GB   781422768     9QH054F5
p1     OK               u0     372.61 GB   781422768     KRFS02RAGV1BLC
p2     DEGRADED         u0     465.76 GB   976773168     WD-WMAYP5324488
p3     OK               u0     372.61 GB   781422768     KRFS2CRAHNL8ED
The following message can be seen in the logs:
twa0: INFO: (0x04: 0x000B): Rebuild started: unit=0
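The rebuild percentage lives in the %RCmpl column of the unit table, so a small helper can report it for periodic checks (a sketch only: the helper name is ours, u0 is hardcoded because this controller has a single unit, and on the live box you would pipe `tw_cli /c0 show` into it instead of the pasted sample):

```shell
# Sketch: print the %RCmpl column for unit u0 from tw_cli show output.
rebuild_pct() {
    awk '$1 == "u0" { print $4 }'
}

sample='u0    RAID-5    REBUILDING     1%      -       256K    1117.56   ON     OFF'
printf '%s\n' "$sample" | rebuild_pct
# prints: 1%
```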