Old 13th December 2010, 07:48 AM
SSD drives under Linux - discussion thread

UPDATE 3/2/2012
A concise, updated consolidation of this topic appears here.
You may wish to skip there directly.
---end of update --

I'd like to start a quality thread on how to manage SSD drives under
Linux. (Please take off-topic questions and quips elsewhere.)

The basic FAQ questions are answered here:

I'll skip the "why"s for now, but the practical ways to manage an SSD drive include the following steps:

1/ File System Alignment:

For performance you should align partitions to at least a multiple of
4KiB (8 sectors of 512 bytes each):

$ sudo fdisk -l /dev/sda
Disk /dev/sda: 120.0 GB, 120034123776 bytes
255 heads, 63 sectors/track, 14593 cylinders, total 234441648 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0000ac71

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *        2048    63490047    31744000   83  Linux
/dev/sda2        63490048    80283647     8396800   82  Linux swap / Solaris
/dev/sda3        80295936   173000703    46352384   83  Linux
Note: Each start sector is a multiple of 8. Gparted defaults to 1 MiB
boundary alignment (multiples of 2048 sectors), which works nicely.
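
The start sectors in the listing above can also be checked mechanically.
Here is a minimal sketch, assuming 512-byte logical sectors as in that
listing. The function name check_alignment is made up for illustration.

```shell
#!/bin/sh
# check_alignment START_SECTOR [SECTORS_PER_BOUNDARY]
# Prints "aligned" or "misaligned" for a partition start sector.
# The boundary defaults to 8 sectors (8 * 512 bytes = 4 KiB).
check_alignment() {
    start=$1
    boundary=${2:-8}
    if [ $((start % boundary)) -eq 0 ]; then
        echo "aligned"
    else
        echo "misaligned"
    fi
}

# Start sectors from the listing above, tested against 4 KiB (8 sectors)
# and 1 MiB (2048 sectors) boundaries.
for s in 2048 63490048 80295936; do
    echo "$s: 4KiB $(check_alignment "$s" 8), 1MiB $(check_alignment "$s" 2048)"
done
```

A partition starting at sector 63, the default of older partitioning
tools, would be reported as misaligned by the same check.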

Some sources claim that SSD file system alignment should be on flash erase block
boundaries (often 128KB or 256KB). However, manufacturers are almost silent on the
topic, and given the parallel RAID-like organization of recent SSD drives this advice
is questionable. Alignment to 4KiB boundaries makes a clear performance
improvement in many cases.


2/ Never use any defragmentation on an SSD file system.
Defragmentation assumes that contiguous reads/writes are faster than
non-contiguous ones, but this is not true on a random access medium.
Instead it generates pointless writes, which wear the drive.


3/ TRIM issues:

TRIM is a method for a computer to notify an SSD drive that certain
blocks are unused and available for reclamation. Using TRIM
dramatically improves write speed as a drive is used. Linux has had kernel
support for the TRIM command since 2.6.33, however it has undergone
several substantial revisions. The following notes apply to kernels 2.6.36 and
later.

Does your older SSD drive support TRIM? There is an easy way to tell:
# su -c 'hdparm -I /dev/sda' | grep TRIM
	   *	Data Set Management TRIM supported
	   *	Deterministic read data after TRIM
The asterisk indicates the feature is available. The first item "TRIM
supported" is the critical one. The second, "Deterministic read" means the
SSD drive will produce a fixed pattern (all zeros or all ones) if you read a block
that is unassigned (never written or in the TRIM free list).
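
The same check can be scripted. This is only a sketch: it assumes the
hdparm feature lines look like the two shown above, and the function name
trim_status is invented for this example.

```shell
#!/bin/sh
# trim_status: read `hdparm -I` output on stdin and summarize TRIM support.
# A leading "*" on a feature line marks the feature as available.
trim_status() {
    awk '
        /\*[ \t]*Data Set Management TRIM supported/ { trim = 1 }
        /\*[ \t]*Deterministic read/                 { det = 1 }
        END {
            if (!trim)    print "no TRIM"
            else if (det) print "TRIM, deterministic read"
            else          print "TRIM"
        }'
}

# Example with captured text. On a real system (as root) one would run:
#   hdparm -I /dev/sda | trim_status
printf '\t   *\tData Set Management TRIM supported\n' | trim_status
```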

SWAP space on Linux automatically supports TRIM operations when the underlying
drive supports TRIM. No configuration is needed. Early SWAP/TRIM support
(perhaps in 2.6.33 and prior) may cause horrible performance during suspend and
similar use.

The filesystems ext4, gfs2, nilfs2, btrfs and vfat all have some support for TRIM.
These filesystems can all be mounted with the "discard" option to allow
released space to be communicated to the drive. The "discard" option can
appear in the "mount -o ..." list or in the /etc/fstab file.
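
To see which mounted file systems are still missing the option, something
like the following sketch may help. It only looks at ext4 entries, and the
function name without_discard is hypothetical.

```shell
#!/bin/sh
# without_discard: read /proc/mounts-style lines on stdin and print the
# mount point of every ext4 file system whose options lack "discard".
# Fields are: device, mount point, type, options, dump, pass.
without_discard() {
    awk '$3 == "ext4" && ("," $4 ",") !~ /,discard,/ { print $2 }'
}

# Typical use on a live system:
#   without_discard < /proc/mounts
# A file system found this way could then be remounted (as root) with:
#   mount -o remount,discard /home
printf '/dev/sda3 /home ext4 rw,noatime 0 0\n' | without_discard
```

The remount only lasts until the next boot. Adding "discard" to the
options column in /etc/fstab makes it permanent.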

The mkfs.ext4 command (and mkfs.ext2, mkfs.ext3) should be executed without the
"-K" option. This option causes the mkfs.ext* command to NOT TRIM all the unused
blocks of the file system when it is initially created.

The btrfs mount takes the "ssd" and "ssd_spread" options. Consult the btrfs wiki for details.

The hdparm command supports TRIM with the "--trim-sector-ranges" and
"--trim-sector-ranges-stdin" options, however the use of these is considered
dangerous.
Before the 2.6.37 kernel, filesystems atop LVM or software RAID (mdadm) could
not support TRIM. Since 2.6.37, TRIM is supported for filesystems on LVM, and
software RAID can support TRIM.

It's unclear/doubtful that any file system atop an LVM block manager
or a RAID block manager can support TRIM at this time.
Therefore you should not use LVM or RAID (mdadm) atop an SSD for
optimal performance unless you investigate the issue.

The btrfs file system has a "ssd" mount option which does NOT appear
to manage TRIM at this time.

Users frequently ask "What about swap?". The swap manager does not
manage TRIM at all, however the good news is that no partition,
including swap, will ever allocate more flash than the size of the
partition. So even though an 8GB swap on a 60GB drive occupies a
considerable fraction of space, if the remaining 52GB is using ext4
or another file system with a well managed TRIM, then the amount of
"unused" space on the ext4 partitions is nearly the same as the amount
of free space that the drive controller recognizes.

If having 8GB (example) permanently allocated from swap is too
annoying, then we could create a script to run at boot time (in
rc.local for example) that would:
1/ remove swap with "swapoff /dev/sda2"
2/ TRIM the entire swap, for example, (see partitions above)
"hdparm --trim-sector-ranges 63490048:$((80283647+1-63490048)) /dev/sda"
Note: this is a very dangerous command
3/ re-create swap, "mkswap /dev/sda2"
4/ re-enable swap, "swapon /dev/sda2"
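
Step 2 above passes the whole partition as a single range. The ATA TRIM
command is generally understood to limit each range to 65535 sectors, so a
large partition would have to be split into many ranges. That limit is an
assumption here, and the function name trim_ranges is invented. The sketch
below only prints start:count pairs and never touches a drive.

```shell
#!/bin/sh
# trim_ranges FIRST_SECTOR LAST_SECTOR [MAX_PER_RANGE]
# Prints "start:count" pairs covering FIRST..LAST inclusive, each at most
# MAX_PER_RANGE sectors (default 65535, an assumed per-range limit).
# This only prints text; it does not issue any TRIM command.
trim_ranges() {
    start=$1
    last=$2
    max=${3:-65535}
    while [ "$start" -le "$last" ]; do
        count=$((last - start + 1))
        [ "$count" -gt "$max" ] && count=$max
        echo "$start:$count"
        start=$((start + count))
    done
}

# /dev/sda2 (swap) from the partition listing above, 16793600 sectors in
# total. Show only the final range to confirm it ends at sector 80283647.
trim_ranges 63490048 80283647 | tail -n 1
```

Check the printed ranges against the partition table by hand before feeding
them to anything that writes to the drive. A wrong start sector here
destroys data on a neighbouring partition.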


4/ Remove the elevator disk scheduler.

Normally rotating disks use the so-called elevator algorithm to access
blocks. It makes sense with a linear seeking head to seek
continuously from the innermost to the outermost cylinder and back, just as
an elevator goes from the lowest requested floor to the highest and
then down again. To do this all the pending disk I/O has to be sorted
into cylinder order. This operation makes no sense for an SSD, which is a
random access medium. Instead add the option "elevator=noop" to the kernel
line in /etc/grub.conf, like:
kernel /boot/vmlinuz.... elevator=noop

This saves a microscopic amount of CPU time by preventing the kernel from
sorting disk operations by cylinder. It may also cause operations to
complete more nearly in order, which can be an advantage.
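
The scheduler in use can also be read, and changed without a reboot,
through sysfs. A small sketch follows. It assumes the drive is sda and that
the kernel lists its schedulers with the active one in square brackets. The
function name active_scheduler is made up.

```shell
#!/bin/sh
# active_scheduler: read the contents of a queue/scheduler file on stdin
# and print the name shown in square brackets (the scheduler in use).
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# On a live system:
#   active_scheduler < /sys/block/sda/queue/scheduler
# To switch at runtime (as root; the change is lost at reboot):
#   echo noop > /sys/block/sda/queue/scheduler
echo "noop deadline [cfq]" | active_scheduler
```

The runtime setting applies to one drive only, which is handy if the same
machine also has rotating disks that should keep their scheduler.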


5/ Reduce SSD Write Activity:

A/ mount file systems with the "noatime" option.
"noatime" prevents the file system from updating file access times.
"noatime" includes "nodiratime" (prevents directory access time updates).

B/ Consider whether you need journals.

File system journals make file system reconstruction (fsck for
example) possible, and running without any journal makes catastrophic
file system failure far more likely. On the other hand, journals
require extra write operations. Sometimes it's easy enough to
reconstruct a file system that it's not worth the cost in terms of SSD
write operations. For example I have a procedure to reconstruct the
root file system of any of my several systems in under an hour. I
have a catastrophic file system error rarely (<<1/5yrs). I can afford
to not journal the rootfs of these systems. Further I have a backup
of each /home that is no more than 24 hours stale - so I can afford to
lose each system's /home too. If you don't create reliable and
regular backups - then don't even consider removing the journals.

At a minimum mount each SSD ext4 with the "data=writeback" option.
This means ext4 will not journal the data (just the metadata), and
any fsck will probably succeed, except for the loss of very recently
written data. For the root file system, setting the journal mode
requires adding an option to the kernel line in /boot/grub/grub.conf:
kernel /boot/vmlinuz-2.6..... elevator=noop rootflags=data=writeback
For other non-root SSD file systems we can add the option in /etc/fstab:
UUID=24941fe7-6... /home ext4 noatime,discard,data=writeback 1 2

Note: for Linux version 3.0 kernels, setting the ext3/4 root file system option
"data=..." in fstab will prevent the remount and is likely to cause boot failure.

For a more aggressive approach one can eliminate journalling
altogether by doing this to an unmounted or read-only ext4 file system:
tune2fs -O ^has_journal /dev/sda1
Then there will be no journal. While you are at it the command:
tune2fs -r 1024 /dev/sda1
will reduce the root-reserved blocks to 1024 (4MB with 4KiB blocks), which
is plenty IMO.
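
Both changes can be verified afterwards from the output of "tune2fs -l".
The helpers below are only a sketch with invented names. They assume the
usual "Filesystem features:" line in that output and a 4096-byte block
size for the arithmetic.

```shell
#!/bin/sh
# journal_state: read `tune2fs -l` output on stdin and report whether the
# has_journal feature is listed on the "Filesystem features:" line.
journal_state() {
    if grep '^Filesystem features:' | grep -qw 'has_journal'; then
        echo "journal present"
    else
        echo "no journal"
    fi
}

# reserved_bytes BLOCKS BLOCK_SIZE: size of the root-reserved area in bytes.
reserved_bytes() {
    echo $(($1 * $2))
}

# On a live system (as root):
#   tune2fs -l /dev/sda1 | journal_state
# 1024 reserved blocks of 4096 bytes each, in bytes:
reserved_bytes 1024 4096
```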

Note that the "commit=nnn" mount option appears to be ignored
since the recent kernel flush daemons were implemented.

C/ Get rid of /tmp
Put it on ramdisk. The X11 server makes a huge number of tiny
writes to /tmp and if these flush to disk at a high rate it creates a
load of writes. Adding this line to /etc/fstab does the trick.

none /tmp tmpfs defaults 0 0

D/ Send logs over the net.
man rsyslog.conf
Also look into logs for Xserver which can become large.

E/ Fix your Firefox to use memory caches.
Some nice notes on configuring Firefox to not use disk caches here. In about:config, set:
browser.cache.disk.enable false
create: (for 64MB of mem cache)
browser.cache.memory.capacity 65536


A brief buyer's note: Study the SSD drive controller carefully before purchasing. The controller has a dramatic impact on performance and also on drive reliability/lifespan.

Last edited by stevea; 2nd March 2012 at 08:50 PM. Reason: improve organization, include added information; root journal changes