Setting up smartmontools on OpenSolaris

As a follow-up to the previous entries concerning my new storage server, I thought I’d talk about installing and configuring the smartmontools monitoring software in OpenSolaris. Like most open source software, it’s fairly easy to compile and install on OpenSolaris; it’s the automation part that differs a little from Linux, the platform smartmontools was developed for.

To get started, download the latest release of the smartmontools source and extract it to a temporary directory. Next, make sure you have the gcc-dev packages installed, otherwise compiling the source is going to be a challenge (if which gcc returns nothing, run pfexec pkg install gcc-dev). Now you can build and install the tools quite easily with the following commands.

  1. ./configure
  2. make
  3. pfexec make install
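The three steps above, together with the gcc check, can be wrapped in a small shell function. This is a sketch, not something shipped with smartmontools: build_smartmontools and run are hypothetical names, the function assumes it is called from the top of the extracted source tree, and DRY_RUN is a made-up convenience variable that prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch of the build steps as a shell function (not part of smartmontools).
# Call it from the top of the extracted source tree. DRY_RUN is a hypothetical
# convenience variable: when set, commands are printed instead of executed.

run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

build_smartmontools() {
    # Install the compiler first if "which gcc" would come up empty.
    if ! command -v gcc >/dev/null 2>&1; then
        run pfexec pkg install gcc-dev || return 1
    fi
    run ./configure || return 1
    run make || return 1
    run pfexec make install
}
```

Sourcing the file and running DRY_RUN=1 build_smartmontools shows the command sequence without touching the system; dropping DRY_RUN performs the build.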

At this point the smartd and smartctl binaries are installed under /usr/local, along with the manual pages and a sample configuration file, /usr/local/etc/smartd.conf. Only a couple of changes are needed in the configuration file, but a few notes are in order before proceeding. To begin with, as of today smartmontools does not support ATA disks on Solaris, so SCSI emulation is used instead. This gives us the basic health status, but it seems to prevent any detailed SMART data from being collected, and it may also be why I can’t run self-tests on my disks. All of this worked on Linux with the same disks, so I suspect the missing ATA support on Solaris is to blame. Secondly, before you can monitor your disks, you’ll need to know their device names. I found that zpool status works well for this.

$ zpool status
  pool: rpool
 state: ONLINE
 scrub: scrub completed after 0h6m with 0 errors on Sun Feb 15 02:21:37 2009
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          c0d0s0    ONLINE       0     0     0

errors: No known data errors

  pool: yubaba
 state: ONLINE
 scrub: scrub completed after 0h59m with 0 errors on Sun Feb 15 03:15:02 2009
config:

        NAME        STATE     READ WRITE CKSUM
        yubaba      ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c4t0d0  ONLINE       0     0     0
            c4t1d0  ONLINE       0     0     0
            c5t0d0  ONLINE       0     0     0
            c5t1d0  ONLINE       0     0     0

errors: No known data errors

Not only does it show which disks are in which pools, but it gives you the names of the disks that smartmontools expects, namely c4t0d0 and so on. Now we are ready to make changes to the smartd.conf file.
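If there are many disks, the names can be pulled out of the zpool status output mechanically. The function below is a hypothetical helper, sketched under the assumption that disk names follow the cXtYdZ pattern seen above (optionally with an sN or pN suffix, as with c0d0s0); pool names such as yubaba and vdev types such as raidz1 do not match and are skipped.

```shell
#!/bin/sh
# Sketch: print the disk device names from "zpool status" output read on stdin.
# Pool names, vdev types (raidz1, mirror) and status lines are skipped because
# only cXtYdZ-style names (optionally with an sN or pN suffix) are matched.
# On Solaris, /usr/bin/awk may be the older awk; set AWK=nawk if the regex fails.

list_pool_disks() {
    ${AWK:-awk} '$1 ~ /^c[0-9]+(t[0-9]+)?d[0-9]+([sp][0-9]+)?$/ { print $1 }'
}
```

For example, zpool status yubaba | list_pool_disks should print c4t0d0, c4t1d0, c5t0d0 and c5t1d0, one per line.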

The first change to make in smartd.conf is to comment out the DEVICESCAN line. DEVICESCAN is fine if you want to scan every disk in the system, but I found that smartmontools didn’t like my rpool disk, and it would only do anything at all once I declared the disk type as “scsi”. Next, tell smartd which disks to monitor. I added the following lines to the end of the smartd.conf file; each one names a raw disk device, sets the device type to SCSI (-d scsi), checks the SMART health status (-H), and mails any warnings to root (-m root):

/dev/rdsk/c4t0d0 -d scsi -H -m root
/dev/rdsk/c4t1d0 -d scsi -H -m root
/dev/rdsk/c5t0d0 -d scsi -H -m root
/dev/rdsk/c5t1d0 -d scsi -H -m root
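Both configuration edits can also be scripted. This is a sketch with hypothetical helper names: comment_out_devicescan and smartd_conf_lines write to stdout so the result can be reviewed first (the stock Solaris sed lacks GNU’s -i option), and SMARTD_OPTS is a made-up variable for swapping in other options, such as the -d sat,12 that several commenters report working. One commenter also reports that root pool disks needed the p0 device (for example c5t0d0p0), which the function accepts like any other name.

```shell
#!/bin/sh
# Sketch of the two smartd.conf edits as filter-style functions (hypothetical).
# Output goes to stdout so it can be reviewed before replacing the real file.

# Comment out an active DEVICESCAN line, leaving everything else untouched.
comment_out_devicescan() {
    sed 's/^DEVICESCAN/#DEVICESCAN/'
}

# Print one directive per disk name argument, using the options from the text.
# SMARTD_OPTS (hypothetical) overrides them, e.g. "-d sat,12 -H -m root".
smartd_conf_lines() {
    opts=${SMARTD_OPTS:-"-d scsi -H -m root"}
    for disk in "$@"; do
        printf '/dev/rdsk/%s %s\n' "$disk" "$opts"
    done
}
```

For example, comment_out_devicescan < /usr/local/etc/smartd.conf > /tmp/smartd.conf.new followed by smartd_conf_lines c4t0d0 c4t1d0 c5t0d0 c5t1d0 >> /tmp/smartd.conf.new builds a candidate file that can be checked and then copied over the original with pfexec cp.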

This seems to work: invoking pfexec smartd -q onecheck produced output like this:

$ pfexec smartd -q onecheck
smartd version 5.38 [i386-pc-solaris2.11] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

Opened configuration file /usr/local/etc/smartd.conf
Configuration file /usr/local/etc/smartd.conf parsed.
Device: /dev/rdsk/c4t0d0, opened
Device: /dev/rdsk/c4t0d0, is SMART capable. Adding to "monitor" list.
Device: /dev/rdsk/c4t1d0, opened
Device: /dev/rdsk/c4t1d0, is SMART capable. Adding to "monitor" list.
Device: /dev/rdsk/c5t0d0, opened
Device: /dev/rdsk/c5t0d0, is SMART capable. Adding to "monitor" list.
Device: /dev/rdsk/c5t1d0, opened
Device: /dev/rdsk/c5t1d0, is SMART capable. Adding to "monitor" list.
Monitoring 0 ATA and 4 SCSI devices
Device: /dev/rdsk/c4t0d0, opened SCSI device
Device: /dev/rdsk/c4t0d0, SMART health: passed
Device: /dev/rdsk/c4t1d0, opened SCSI device
Device: /dev/rdsk/c4t1d0, SMART health: passed
Device: /dev/rdsk/c5t0d0, opened SCSI device
Device: /dev/rdsk/c5t0d0, SMART health: passed
Device: /dev/rdsk/c5t1d0, opened SCSI device
Device: /dev/rdsk/c5t1d0, SMART health: passed
Started with '-q onecheck' option. All devices sucessfully checked once.
smartd is exiting (exit status 0)
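Rather than reading through output like this for every disk, the result can be checked mechanically. The function below is a hypothetical helper, not part of smartmontools; it relies only on the “SMART health: passed” wording visible in the output above and treats any other ending as a failure, since the exact wording for a failing disk is not shown here. It also fails when no health line appears at all, for instance when smartd could not register any device.

```shell
#!/bin/sh
# Sketch: decide from "smartd -q onecheck" output (read on stdin) whether every
# monitored disk reported "SMART health: passed". Fails if no health line appears.

all_disks_passed() {
    awk '/SMART health:/ { total++; if ($NF == "passed") ok++ }
         END { exit !(total > 0 && ok == total) }'
}
```

A possible use is pfexec smartd -q onecheck 2>&1 | all_disks_passed && echo "all disks passed"; the 2>&1 is there in case some messages go to standard error.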

So far so good, but what about having smartd start at boot and monitor the disk status continuously? On Linux you’d use an init script, but since this is OpenSolaris, we’ll use the Service Management Facility (SMF) instead. To do that, paste the following text into /var/svc/manifest/site/smartd.xml, change the file ownership to root:sys, and run pfexec svccfg -v import /var/svc/manifest/site/smartd.xml. Then check that the service is running (svcs smartd), and if not, enable it using pfexec svcadm enable smartd. The start and stop methods call the smartd init script installed by smartmontools, so make sure the path in the manifest matches your installation; readers have found it at /usr/local/etc/init.d/smartd instead.

<?xml version="1.0"?>
<!DOCTYPE service_bundle SYSTEM "/usr/share/lib/xml/dtd/service_bundle.dtd.1">
<service_bundle type="manifest" name="smartd">
  <service
     name="site/smartd"
     type="service"
     version="1">
    <single_instance/>
    <dependency
       name="filesystem-local"
       grouping="require_all"
       restart_on="none"
       type="service">
      <service_fmri value="svc:/system/filesystem/local:default"/>
    </dependency>
    <exec_method
       type="method"
       name="start"
       exec="/usr/local/etc/rc.d/init.d/smartd start"
       timeout_seconds="60">
      <method_context>
        <method_credential user="root" group="root"/>
      </method_context>
    </exec_method>
    <exec_method
       type="method"
       name="stop"
       exec="/usr/local/etc/rc.d/init.d/smartd stop"
       timeout_seconds="60">
    </exec_method>
    <instance name="default" enabled="true"/>
    <stability value="Unstable"/>
    <template>
      <common_name>
        <loctext xml:lang="C">
          SMART monitoring service (smartd)
        </loctext>
      </common_name>
      <documentation>
        <manpage title="smartd" section="1M" manpath="/usr/local/share/man"/>
      </documentation>
    </template>
  </service>
</service_bundle>
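The installation steps described before the manifest can be collected into one function. This is a sketch under stated assumptions: the manifest has been saved locally as smartd.xml (a hypothetical file name), install_smartd_service and run are hypothetical names, and DRY_RUN is a made-up variable that prints the commands instead of running them. The function also stops early if the init script named in the manifest’s exec attributes is missing.

```shell
#!/bin/sh
# Sketch of the SMF installation steps from the text as one function.
# Assumptions: the manifest was saved locally as smartd.xml (hypothetical name),
# and DRY_RUN (also hypothetical) prints commands instead of running them.

run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

install_smartd_service() {
    src=${1:-smartd.xml}
    dest=/var/svc/manifest/site/smartd.xml
    initscript=/usr/local/etc/rc.d/init.d/smartd

    # The manifest's start/stop methods call this script.
    if [ -z "${DRY_RUN:-}" ] && [ ! -x "$initscript" ]; then
        echo "missing $initscript: adjust the exec paths in the manifest" >&2
        return 1
    fi

    run pfexec cp "$src" "$dest" || return 1
    run pfexec chown root:sys "$dest" || return 1
    run pfexec svccfg -v import "$dest" || return 1

    # The manifest enables the instance; enable it explicitly if it is not online.
    if [ -z "${DRY_RUN:-}" ] && [ "$(svcs -H -o state smartd)" = "online" ]; then
        return 0
    fi
    run pfexec svcadm enable smartd
}
```

Enabling a service that is already enabled is harmless, so the final svcadm call is a safe fallback if the state check happens while the service is still starting.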

At this point we have a managed service that checks the health of our disks and, if anything comes up, sends an email to the root user. I would have liked to set up short and long self-tests as well, but I can live without them for now. In the meantime, I’ve got a weekly cron job that scrubs the data on the disks using zpool scrub, which identifies any data read errors on the disks and attempts to correct them automatically.
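A weekly scrub like this can be scheduled with entries in root’s crontab. The pool names come from the zpool status output shown earlier; the schedule (Sundays at 02:15) is only an assumption and can be adjusted. This is a configuration fragment in the numeric five-field format that Solaris cron expects, not a script to run.

```shell
# root's crontab: minute hour day-of-month month day-of-week (0 = Sunday) command
# Hypothetical schedule: scrub both pools every Sunday at 02:15.
15 2 * * 0 /usr/sbin/zpool scrub rpool
15 2 * * 0 /usr/sbin/zpool scrub yubaba
```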

This entry was posted in Computing, HowTo, Storage.

24 Responses to Setting up smartmontools on OpenSolaris

  1. Sebastian says:

    Hi!

    We tried to get smartmontools working on snv_124, but we can’t figure out how to enable the “scsi emulation” for disks to read SMART information from PATA and SATA disks. Is there any way to read disk information from this type of drive?

    Thanks in advance
    Sebastian

    • nlfiedler says:

      Good question, and as far as I know there is no proper ATA support in Solaris. Hopefully this will change some day.

  2. daniel says:

    It’s getting better: smartmontools 5.39+ with SCSI-to-ATA Translation

    packaging maybe
    http://jucr.opensolaris.org/review/packages/3055/

  3. Pingback: May Contain Blueberries » Should I have named it Barad-dûr? Or, my new server.

  4. Wonslung says:

    Thanks for this post. This is most excellent.

  5. If I had a penny for each time I came here! Great article.

  6. Vlad says:

    For the root pool devices (I have mirrored rpool) I had to add something like /dev/rdsk/c5t0d0p0 (note the p0) so the “file” in devfs is actually found. After that smartd is happy and able to check rpool disks as well.

  7. Pingback: Setting Up Smartmontools On Open Solaris | Jason Churchill

  8. James says:

    “no proper ATA support in Solaris” ?

    What do you actually mean by that? Do you actually mean that there’s some aspect of ATA/SCSI translation that is not being handled? If so, what is it?

    • nlfiedler says:

      Naturally I’ve never had the time to work out what shortcoming there is in Solaris, but there is a well known issue. It hasn’t been resolved for many years. I leave it as an exercise to the reader to find any one of the many references to the issue.

  9. Brian says:

    Am I missing something? I’ve followed the instructions carefully (there’s nothing very tricky here) and I can’t seem to get smartmontools to work at all. I’m getting this output:

    smartd 5.40 2010-10-16 r3189 [i386-pc-solaris2.11] (local build)
    Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

    Opened configuration file /usr/local/etc/smartd.conf
    Configuration file /usr/local/etc/smartd.conf parsed.
    Device: /dev/rdsk/c10d0s0, opened
    Device: /dev/rdsk/c10d0s0, failed Test Unit Ready [err=-25]
    Unable to register SCSI device /dev/rdsk/c10d0s0 at line 24 of file /usr/local/etc/smartd.conf
    Unable to register device /dev/rdsk/c10d0s0 (no Directive -d removable). Exiting.

    Can anyone shed some light on this? I’m sure I’m missing something simple, but I just can’t seem to figure out what it is.

  10. Tom says:

    Thanks so much for the clear and easy to follow guide. I’ve been banging my head on a problem for days and would appreciate if anyone has pointers.

    Everything in the guide works fine: smartd -q onecheck runs and checks the disks OK, and the specified files are copied over fine, but the service always goes into maintenance.
    Here is the command to start smartd and the resulting log file.

    zaphod@thebook:~/smartmontools-5.39.1$ svcadm enable smartd
    zaphod@thebook:~/smartmontools-5.39.1$ more /var/svc/log/site-smartd\:default.log
    [ Nov 2 20:27:34 Enabled. ]
    [ Nov 2 20:27:34 Executing start method ("/etc/init.d/smartd start"). ]
    smartd 5.39.1 2010-01-28 r3054 [i386-pc-solaris2.11] (local build)
    Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

    =======> UNRECOGNIZED ARGUMENT: start <=======

    Use smartd -h to get a usage summary

    [ Nov 2 20:27:34 Method "start" exited with status 1. ]

    To me "unrecognized command start" makes no sense, the xml file is copied in fine, and the other files are built and copied in OK also.

    If I start smartd directly (not through svcadm) it starts OK.

    Any idea how to debug this? I've really hit a wall here….

    Thanks so much.

    • OM says:

      In the XML given in the text, take out the “start” from
      exec="/usr/local/etc/rc.d/init.d/smartd start"

      • OM says:

        Wrong, simply check the correct path and script. In smartmontools I found the script at /usr/local/etc/init.d/smartd for example.

  11. Plouj says:

    By the way, I noticed that the smartmontools service in NexentaCore needs “start_smartd=yes” uncommented in /etc/default/smartmontools. Otherwise the SMF will go into maintenance mode because /etc/init.d/smartmontools will not spawn the smartd daemon.

  12. Antony Brooke-Wood says:

    @Brian, I have exactly the same issue. My drives are listed as:
    /dev/rdsk/c7d0s0
    /dev/rdsk/c9d0p0
    /dev/rdsk/c10d0p0

    For each of them, I have tried the following options:
    -d ata
    -d sat,12
    -d sat
    -d scsi

    I also tried adding the additional (slice?) parameter to the device listing:
    c9d0p0s0 (note the additional ‘s0’)

    I also experimented with the permissive options. The ‘best’ result I get is with either no -d setting or -d ata, which gives:
    smartctl 5.41 2011-06-09 r3365 [i386-pc-solaris2.11] (local build)
    Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

    Smartctl: Device Read Identity Failed: Inappropriate ioctl for device

    === START OF INFORMATION SECTION ===
    Device Model: [No Information Found]
    Serial Number: [No Information Found]
    Firmware Version: [No Information Found]
    Device is: Not in smartctl database [for details use: -P showall]
    ATA Version is: [No Information Found]
    ATA Standard is: [No Information Found]
    Local Time is: Thu Jun 30 00:39:17 2011 EST
    SMART support is: Ambiguous – ATA IDENTIFY DEVICE words 82-83 don’t show if SMART supported.
    SMART support is: Ambiguous – ATA IDENTIFY DEVICE words 85-87 don’t show if SMART is enabled.

    Not exactly helpful!

  13. Antony Brooke-Wood says:

    Ahh .. even tried rebuilding, which seems to fix it for some:
    http://lists.freebsd.org/pipermail/freebsd-stable/2006-April/024334.html

    Didn’t work for me

  14. Mafketel says:

    -d sat,12 works for me with this version smartctl 5.41 2011-06-09 r3365 [i386-pc-solaris2.11] (local build)
    on a motherboard that previously did not work

  15. Pingback: Monitoring de la température des “disk” de Nexenta en SNMP - Hypervisor.fr

  16. Gregg says:

    I think that some SATA drivers are not “bridging” correctly. I get ENOTTY returns from using “-d scsi” on my off-board (on-board is Intel chipset) Silicon Image chipset cards. I don’t see any success with “-d sat”, “-d sat,12” or “-d ata”.

  17. Alex says:

    Does not work for me on Solaris 11 with MSI Intel G31 motherboard:

    root@nas:/etc# smartd -d -q onecheck
    smartd 5.42 2011-10-20 r3458 [i386-pc-solaris2.11] (local build)
    Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

    Opened configuration file /etc/smartd.conf
    Configuration file /etc/smartd.conf parsed.
    Device: /dev/rdsk/c3d0s0, opened
    Device: /dev/rdsk/c3d0s0, Both 36 and 64 byte INQUIRY failed; skip device
    Unable to register SCSI device /dev/rdsk/c3d0s0 at line 149 of file /etc/smartd.conf
    Unable to register device /dev/rdsk/c3d0s0 (no Directive -d removable). Exiting.

  18. OM says:

    SunOS openindiana 5.11 oi_151a7, smartmontools 6.0 here: it works on an HP ProLiant ML110 G7 using “-d sat,12”

    Thanks for this blog, very useful.

  19. Bryan Iotti says:

    I would like your permission to upload a copy of your smartd.xml service manifest to the official OpenIndiana Wiki page about SMART monitoring, instead of just linking to your blog. Your manifest has proven invaluable when installing several servers, I think it should be more widely known and a part of the installation instructions for OI. Thank you for writing it!
