Step-by-step instructions to stress test, partition, format and permanently install a new Seagate ST4000DM000 4 TB drive with 4k sectors in Debian.

I picked up a pair of Seagate ST4000DM000 4 TB drives to replace an aging setup of multiple 2 TB drives.

When I installed those a couple of years ago, I had to give manual instructions to the formatting tools so that they reported sector sizes correctly and aligned the file system to the physical 4k sector boundaries. Now that I had new drives to install, I documented the process for future reference. It turned out to take fewer steps than last time, but with some added quirks due to the larger drives.

Note on 4k sectors

If you are using somewhat recent tools (≥mid-2010), this section is in practice of historical interest only and can be skipped. If you want a more extensive explanation, see ATA 4 KiB sector issues at the wiki; even more technical background and pretty pictures are found in this recap of Western Digital's own "Advanced format" motivation.

Drives used to have physical sectors of size 512 B, but as drives grew this led to addressing and performance problems because of the large number of sectors involved, each of which carries its own overhead for e.g. error correction. Manufacturers started using sectors of size 4096 B ("4k sectors") to avoid this, but the underlying systems still assumed 512 B during a transition period. This led to some issues: drives needed to "lie" about their sector size to be compatible with old operating systems and BIOS versions, but at the same time the user needed to use their correct 4k sectors for maximum performance.

To get maximum performance, every write should line up with the physical sectors of the drive, so the problem was two-fold:

  1. Make sure the OS knew that a sector was 4k
  2. Make sure the file system was aligned with the physical sectors to avoid overlapping logical/physical sectors.

To keep a logical sector from straddling two physical ones, the file system needed to align the start of its logical sectors at the physical sector boundaries (i.e. at a multiple of 8 of the legacy 512 B sectors) and then use 4k sectors throughout. This was possible with manual intervention and know-how, but it took a while before the tools themselves could detect this and act accordingly. E.g. fdisk ≥0.6 (released mid-2010) and GParted ≥0.6.0 (released 2010-06-18) automatically detect and support 4k drives, along with most other tools newer than this.
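
On a current Linux kernel, the sector sizes a drive reports can be read from sysfs, and the alignment rule above reduces to a divisibility check. The following is a minimal sketch, not part of the original procedure: the function names sector_sizes and is_4k_aligned are made up for this example, and sde is simply the device name used elsewhere in this article.

```shell
# Report the logical and physical sector sizes the kernel sees for a disk.
# Usage: sector_sizes sde
sector_sizes() {
    echo "logical:  $(cat "/sys/block/$1/queue/logical_block_size")"
    echo "physical: $(cat "/sys/block/$1/queue/physical_block_size")"
}

# Succeed if a partition start, given in legacy 512-byte sectors,
# falls on a 4k boundary (i.e. is a multiple of 8).
is_4k_aligned() {
    [ $(( $1 % 8 )) -eq 0 ]
}

is_4k_aligned 2048 && echo "start sector 2048 is aligned"
is_4k_aligned 63 || echo "start sector 63 is not aligned"
```

Sector 63 was the traditional first-partition start of older tools, which is exactly the case that needed manual correction; 2048 is what current tools propose by default.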

Check drive condition before deployment

Drive failures in normal use follow the common bathtub curve, so it is definitely worth the time to try to find out whether a new drive is headed for early failure. No test is better than reality, but the method described here is commonly used to weed out drives with these kinds of errors.

  1. Check the SMART data on the new drive and save the result (preferably with timestamps in some fashion):

    mkdir disk-1
    sudo smartctl -a /dev/sde | tee disk-1/smartctl-$(date +'%Y-%m-%d-%H-%M-%S')

  2. Write data to the entire disk:

    sudo dd if=/dev/zero of=/dev/sde bs=8M |\
        tee disk-1/dd-dev-zero-$(date +'%Y-%m-%d-%H-%M-%S') &&\
        touch disk-1/dd-dev-zero-finished-$(date +'%Y-%m-%d-%H-%M-%S')

    I did this on two identical Seagate ST4000DM000 4 TB drives over SATA-300 connections (hardware from 2009 — CPU: C2D E8400; chipset: Intel P43/ICH10). For the first drive it took 8 hours 50 minutes, and for the second it took 8 hours 30 minutes. This gives a data write speed of ~130 MB/s, which is reasonable.

    If you have multiple drives on separate channels, you can run the processes in parallel with no slowdown (until you choke the CPU, but on the hardware described above one process drew ~20% of one CPU core, so it probably won't be an issue).

    dd should not write anything to STDOUT (its own status messages go to STDERR and therefore show up in the terminal, not in the file), but I use tee in case of unexpected behavior. Since it is such a long process, I want to see right away if the tool has any messages for me. The final touch command creates a file named after the time the command finished, which might be interesting.

    Now we play the waiting game until dd is finished.

  3. Redo step 1 above (but don't overwrite the old files — running the exact same command will work because of the dynamically generated timestamp) and compare the values with the old files.

    If Reallocated_Sector_Ct has increased (or indeed if it was not 0 to begin with), this drive is not in good shape and will probably break down long before its intended lifespan. Explain your process and the data values to your place of purchase and ask for a replacement drive. They may or may not be happy to help (for enterprise customers they will most probably do it; for home users it varies from dealer to dealer).
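
The comparison in step 3 can be done by eye or with diff, but a small helper makes it harder to overlook a change. This is a sketch under the assumption that the saved files contain the usual smartctl attribute table, where the raw value is the tenth column; the function names realloc_count and check_realloc are hypothetical.

```shell
# Print the raw Reallocated_Sector_Ct value (10th column of the attribute
# table) from a saved "smartctl -a" report.
realloc_count() {
    awk '$2 == "Reallocated_Sector_Ct" { print $10 }' "$1"
}

# Compare a report taken before the write test with one taken after it.
# Returns non-zero if the attribute is missing, was not 0 to begin with,
# or has increased.
check_realloc() {
    before=$(realloc_count "$1")
    after=$(realloc_count "$2")
    if [ -z "$before" ] || [ -z "$after" ]; then
        echo "Reallocated_Sector_Ct not found in one of the reports"
        return 2
    fi
    if [ "$before" -ne 0 ] || [ "$after" -gt "$before" ]; then
        echo "SUSPECT: Reallocated_Sector_Ct before=$before after=$after"
        return 1
    fi
    echo "OK: Reallocated_Sector_Ct stayed at 0"
}
```

Call it with the two files written in steps 1 and 3, oldest first, e.g. check_realloc disk-1/smartctl-FIRST disk-1/smartctl-SECOND, where FIRST and SECOND stand for the generated timestamps.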

Format and partition the drive

  1. Previously I always used fdisk to create partition tables and partitions, but this tool only works with MBR style partition tables (sometimes referred to as DOS or MSDOS style), and these can only handle partitions ≤2 TiB (about 2.2 TB) with 512-byte sectors. Since I wanted 4 TB partitions this time, I could not use this utility.

    The GPT fdisk project provides a CLI for creating GPT partition tables, which are "more modern" and support large partitions. This time, however, I used GParted, which provides a GUI for partition editing and supports a wide range of partition tables (though MBR is currently still the default for compatibility reasons). Start it with:

    $ sudo gparted

    Choose the device in the list in the upper-right corner. Then go to Device → Create partition table, choose type "GPT" and apply.

    If you for some reason need to use the fdisk utility, here is how it used to be done (note the warning texts at startup describing what I said above):

    $ sudo fdisk /dev/sde
    Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
    Building a new DOS disklabel with disk identifier 0x9158bd86.
    Changes will remain in memory only, until you decide to write them.
    After that, of course, the previous content won't be recoverable.
    Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)
    WARNING: The size of this disk is 4.0 TB (4000787030016 bytes).
    DOS partition table format can not be used on drives for volumes
    larger than (2199023255040 bytes) for 512-byte sectors. Use parted(1) and GUID
    partition table format (GPT).
    The device presents a logical sector size that is smaller than
    the physical sector size. Aligning to a physical sector (or optimal
    I/O) size boundary is recommended, or performance may be impacted.
    Command (m for help): m
    Command action
       a   toggle a bootable flag
       b   edit bsd disklabel
       c   toggle the dos compatibility flag
       d   delete a partition
       l   list known partition types
       m   print this menu
       n   add a new partition
       o   create a new empty DOS partition table
       p   print the partition table
       q   quit without saving changes
       s   create a new empty Sun disklabel
       t   change a partition's system id
       u   change display/entry units
       v   verify the partition table
       w   write table to disk and exit
       x   extra functionality (experts only)
    Command (m for help): n
    Partition type:
       p   primary (0 primary, 0 extended, 4 free)
       e   extended
    Select (default p): p
    Partition number (1-4, default 1):
    Using default value 1
    First sector (2048-4294967295, default 2048):
    Using default value 2048
    Last sector, +sectors or +size{K,M,G} (2048-4294967294, default 4294967294):
    Using default value 4294967294
    Command (m for help): w
    The partition table has been altered!
    Calling ioctl() to re-read partition table.
    Syncing disks.

  2. Now create a file system on the partition. I will use Ext4 (but will probably migrate to Btrfs in the near future when more mature support is available in stock kernels).

    Via GParted
    Select the unallocated space on the device and create a new partition. Changes are not applied until "Apply" is hit in the main interface.
    Via CLI
    $ sudo mkfs.ext4 -m 0 /dev/sde1
    mke2fs 1.42.5 (29-Jul-2012)
    Filesystem label=
    OS type: Linux
    Block size=4096 (log=2)
    Fragment size=4096 (log=2)
    Stride=0 blocks, Stripe width=0 blocks
    134217728 inodes, 536870655 blocks
    0 blocks (0.00%) reserved for the super user
    First data block=0
    Maximum filesystem blocks=0
    16384 block groups
    32768 blocks per group, 32768 fragments per group
    8192 inodes per group
    Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
        4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
        102400000, 214990848, 512000000
    Allocating group tables: done
    Writing inode tables: done
    Creating journal (32768 blocks): done
    Writing superblocks and filesystem accounting information: done

    -m 0 in the CLI example reserves no space for the super user. This drive is meant to be used as a media storage device; if it is your main drive, you probably don't want to do this. Read up on what reserved space implies if the concept is unfamiliar. For Ext2/3/4 file systems this can also be changed afterwards from the CLI with the tune2fs utility:

    $ sudo tune2fs -m 0 /dev/sde1

  3. Add the partition and its mount point in /etc/fstab by adding the line:

    UUID=41af4c39-6505-4b64-a07b-c2188ab01fd4  /media/6-4000  ext4  defaults  0  2

    The UUID value is the name of the link in /dev/disk/by-uuid that points to the wanted partition.

    Create the mount directory if it doesn't exist.

  4. Mount the partition:

    $ sudo mount /dev/sde1

  5. Transfer ownership to the user that will use the partition:

    $ sudo chown myuser:mygroup /media/6-4000
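
The fstab line in step 3 can also be assembled from the command line instead of looking up the link name by hand. This is only a sketch: fstab_line is a hypothetical helper, and the blkid call in the comment assumes the blkid tool from util-linux is installed.

```shell
# Build an /etc/fstab line for an ext4 partition from its UUID and mount point,
# using the same options as in step 3.
fstab_line() {
    printf 'UUID=%s  %s  ext4  defaults  0  2\n' "$1" "$2"
}

# On a real system the UUID could be read with blkid (needs root):
#   uuid=$(sudo blkid -s UUID -o value /dev/sde1)
# Here the example value from step 3 is used instead.
uuid=41af4c39-6505-4b64-a07b-c2188ab01fd4
fstab_line "$uuid" /media/6-4000
```

The output is meant to be reviewed and then pasted into /etc/fstab by hand; appending to that file automatically is deliberately left out.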


Power saving engineering mishaps

For some reason, the "green" drives from both Western Digital and Seagate by default use a very aggressive head parking interval, i.e. the idle time before the heads are parked. The Seagate drives in my case seem to park after 8 seconds of inactivity. This could generate hundreds of thousands of parking operations in just months, which would certainly have a negative influence on the drives' longevity, and it gives off a noticeable and annoying sound. It is also not good practice to get used to clicking noises from hard drives, since they are often a good indicator of upcoming failure. The noise triggers my survival reflexes, and thus having the drives sound like this up to every 8 seconds is a bad recipe for a pleasant environment.
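
To put a number on that estimate, here is a back-of-the-envelope calculation. It is a worst case that assumes the drive is woken up again right after every park, which is an assumption made for this sketch and not a measurement; the variable names are made up for the example.

```shell
# Upper bound on head parks if the drive parked at every opportunity.
park_interval=8                                # idle seconds before parking, as observed above
parks_per_day=$(( 86400 / park_interval ))     # 86400 seconds in a day
parks_per_month=$(( parks_per_day * 30 ))      # assuming a 30-day month
echo "at most $parks_per_day parks per day, $parks_per_month per 30 days"
```

This prints a ceiling of 10800 parks per day and 324000 per 30 days, so even a drive that hits a fraction of that rate ends up in the hundreds of thousands within months.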

This can be remedied with hdparm, e.g.:

$ sudo hdparm -B 255 /dev/sde

The magic value 255 disables APM (Advanced Power Management) features on the drive. To make this "permanent", put the following section in /etc/hdparm.conf:

/dev/sde {
    apm = 255
}

Settings in this file should get reapplied on reboot.
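
One way to check whether the setting has any effect is to watch the SMART attribute Load_Cycle_Count before and after the change. The sketch below assumes the drive reports that attribute in the usual smartctl table layout (raw value in the tenth column); load_cycles and load_cycle_delta are hypothetical helper names.

```shell
# Print the raw Load_Cycle_Count value from "smartctl -A" output on stdin.
load_cycles() {
    awk '$2 == "Load_Cycle_Count" { print $10 }'
}

# Difference between two readings, e.g. taken an hour apart.
load_cycle_delta() {
    echo $(( $2 - $1 ))
}

# Typical use (needs root and a real drive, so it is left commented out):
#   first=$(sudo smartctl -A /dev/sde | load_cycles)
#   sleep 3600
#   second=$(sudo smartctl -A /dev/sde | load_cycles)
#   load_cycle_delta "$first" "$second"
```

With APM disabled, the difference over an idle hour should stay at or near zero; a value in the hundreds suggests the heads are still being parked.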

Won't the drive take damage from never being able to park the heads? No, I don't believe it will. Most drives don't park the heads when idle at all; all this seems to be is an attempt at a power saving feature. For some use cases the head parking might work well, but not for most, I would say. Definitely not for me.

Additional tips for large copy operations

If you are going to copy large amounts of data locally, don't use cp — use rsync. The latter can give you nice progress reports and supports resuming. Use it like:

rsync -avP /media/3-2000/sd /media/6-4000

Note that a trailing / on the source directory will alter the behavior of rsync! This is the biggest practical difference from cp. The above syntax will behave as cp -a, but if the source argument were given as /media/3-2000/sd/, with a trailing /, rsync would copy the contents of sd directly into /media/6-4000 instead of creating /media/6-4000/sd (probably not what you want). Read the rsync manual for more info.
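
The effect of the trailing slash is easiest to see in a throwaway directory. The following sketch assumes rsync is installed; demo_trailing_slash is a made-up function name, and the directory names dst-a and dst-b exist only for the demonstration.

```shell
# Show how a trailing slash on the source changes where rsync puts files.
# Everything happens in a temporary directory that is removed afterwards.
demo_trailing_slash() {
    work=$(mktemp -d)
    mkdir -p "$work/src/sd" "$work/dst-a" "$work/dst-b"
    touch "$work/src/sd/file.txt"

    rsync -a "$work/src/sd"  "$work/dst-a"   # no slash: copies the directory itself
    rsync -a "$work/src/sd/" "$work/dst-b"   # slash: copies only its contents

    # List the resulting files relative to the work directory.
    (cd "$work" && find dst-a dst-b -type f | sort)
    rm -rf "$work"
}
```

Running demo_trailing_slash lists dst-a/sd/file.txt for the first form and dst-b/file.txt for the second, which is the difference described above.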