RAID system check procedures


(Step-by-step procedure for checking RAID status from an ICS computer.)

  1. Obtain the password for the stadm account on the console server from Aaron.
  2. Open a terminal window.
  3. Open a secure connection to “cserve” using the following command:
    ssh stadm@cserve
  4. Log in with the stadm password.
  5. After logging in, you can clear the screen (Command-K in the Mac OS X Terminal app).
    • this can be done at any time
  6. For a list of commands, type "help" and press ENTER.
    • press ‘q’ to exit command listing mode.
  7. To see the devices on the console server, type "listdev" and press ENTER.
  8. Use the “direct” command to connect to a raid device (either ‘raid-m’, ‘raid-j’, or ‘raid-w’).
    • usage: direct <raid_device> such as “direct raid-j”
  9. Press CTRL-D to reformat output (can be done at any time).
    • we are only interested in the OUTPUT section on the right-hand side of the screen
    • to copy the information, press Command-A and then Command-C
  10. To navigate the OUTPUT window, use the arrow keys or “a, z, s, x” as follows:
    • left-arrow key or ‘s’ = page up
    • right-arrow key or ‘x’ = page down
    • up-arrow key or ‘a’ = scroll up
    • down-arrow key or ‘z’ = scroll down
  11. Check for any errors and remaps, keeping note of:
    • which disk number is having errors/remaps
    • how frequently those errors/remaps occur
    • the date and time you perform the RAID check
    • any disks that have been offline (such as when a disk is replaced)
    • any disks that have been rebuilt
  12. Document the results for each RAID device by appending them, as described below, to that device's output file (one file per RAID). Always include the date and time (e.g., “2005-03-28 9:43am”).
    • open another terminal and connect to fablio (type "ssh fablio"), using your normal login password. The output files are stored on fablio under /space/stadm/raid_analysis/, so change to that directory (cd /space/stadm/raid_analysis/).
    • append the results of each RAID check to the existing output (text) file for that RAID so that each check can be compared with previous ones. Separate results from different days with a row of asterisks (*).
    • if no errors or remaps occur, simply state that no errors have occurred, along with the date/time stamp.
    • otherwise, copy the blocks of text showing any and all DISK activity from the original RAID OUTPUT and paste them into the text file, followed by the date/time stamp. On the next check, compare the output with the previous one to see whether there are any NEW entries. If there are, paste the new entries onto the end of the file, followed by a new date/time stamp.
    • NOTE: if a disk has been replaced, the RAID log will show that the disk went offline (it was temporarily pulled out of the RAID rack). This is usually followed by the disk rebuilding. Document which disks went offline and which were rebuilt. You only need to copy and paste results that occur after the rebuild.
  13. Exit by typing ESC followed by SHIFT-A.
  14. Repeat steps 8 – 13 for each raid.
  15. Exit ‘direct’ mode by typing “exit” and pressing ENTER.
  16. Close connection to cserve using the “logout” command.
  17. If any of the RAID systems have excessive or recurring errors, email Aaron describing the potential problem(s) and giving the location (directory path) of the output file from step 12 so that he can review it himself.
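The appending in step 12 can be partly scripted. The sketch below is a hypothetical helper, not part of the established procedure. It assumes a POSIX shell on fablio, that the text copied from the OUTPUT section has first been saved to a local file (called capture.txt here), and that each RAID's log is a plain text file in /space/stadm/raid_analysis/ (the name raid-j.txt is an assumption). The function names stamp, new_entries and record_check are made up for illustration. A line counts as "new" if it does not already appear, as a whole line, anywhere in the log, so this is only a rough stand-in for comparing the two outputs by eye.

```shell
# Hypothetical helpers for step 12 (names and file layout are assumptions).
# Usage: record_check LOGFILE CAPTUREFILE
#   LOGFILE      per-RAID log, e.g. raid-j.txt (assumed name)
#   CAPTUREFILE  text copied from the OUTPUT section and saved to a file

# Print a date/time stamp in the style "2005-03-28 9:43am".
stamp() {
    # %I gives a zero-padded 12-hour clock; drop the leading zero and lowercase AM/PM.
    LC_ALL=C date '+%Y-%m-%d %I:%M%p' | sed 's/ 0/ /' | tr 'AMP' 'amp'
}

# Print the capture lines that do not already appear, as whole lines, in the log.
new_entries() {
    log=$1
    capture=$2
    if [ -s "$log" ]; then
        # -F fixed strings, -x whole-line match, -v keep non-matching lines.
        grep -vxF -f "$log" "$capture" || true
    else
        # No previous checks yet: every captured line is new.
        cat "$capture"
    fi
}

# Append a row of asterisks, the new entries (or a "no new errors" note), then the stamp.
record_check() {
    log=$1
    capture=$2
    fresh=$(new_entries "$log" "$capture")
    printf '%s\n' '********************' >> "$log"
    if [ -n "$fresh" ]; then
        printf '%s\n' "$fresh" >> "$log"
    else
        printf '%s\n' 'No new errors or remaps.' >> "$log"
    fi
    stamp >> "$log"
}
```

For example, after saving these functions in a file such as raid_log.sh (a hypothetical name), running ". ./raid_log.sh" and then "record_check raid-j.txt capture.txt" from /space/stadm/raid_analysis/ would append one dated entry to raid-j.txt. Because the comparison is against the whole log rather than only the last check, a line that was logged once and later reappears is not repeated, so still read the raw OUTPUT before emailing Aaron.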

created by Joe Mount
2005-04-12
edited by Tyler King
2007-07-17