I had a client who believed that something was wrong with our application because the tests performed worse on sequential writes on an SSD array than on a spinning-disk array. He said he wanted a “technical no-BS answer” — so I gave him one. Hopefully it can help someone else as well.
#1) It’s The Manufacturers’ Fault! — SSDs have their own drive interface, bridge, buffers, and gates that the manufacturer can configure in a number of ways. A read changes nothing on the drive, so its impact on the NAND flash memory is minimal. A write, however, causes the gates to block current to a chain of transistors, changing their values. Without going into the role of electrons in electric fields, this change creates significant wear on the drive’s memory cells; the larger the write, the greater the wear. Manufacturers therefore settled early on configurations that devote roughly 90–95% of an SSD’s interface bandwidth to reads and only 5–10% to writes, to limit simultaneous electrical changes. This limitation alone can slow a large-block sequential write stream.
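If you want to see how your own arrays compare, a quick (and admittedly crude) way is to time a sequential write and a sequential read of the same file. This is only a sketch of mine, not a proper benchmark: the OS page cache will inflate the read number unless you use direct I/O or drop caches, and the file size and block size are arbitrary choices.

```python
import os
import tempfile
import time

def measure_sequential(path, size_mb=16, block_kb=1024):
    """Time a sequential write, then a sequential read, of one file.
    Returns (write_mb_per_s, read_mb_per_s)."""
    block = b"\0" * (block_kb * 1024)
    blocks = (size_mb * 1024) // block_kb

    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(blocks):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())   # push data to the device, not just the page cache
    write_s = time.perf_counter() - start

    start = time.perf_counter()
    with open(path, "rb") as f:
        while f.read(block_kb * 1024):
            pass
    read_s = time.perf_counter() - start

    return size_mb / write_s, size_mb / read_s

if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        path = tmp.name
    try:
        w, r = measure_sequential(path)
        print(f"sequential write: {w:.0f} MB/s, sequential read: {r:.0f} MB/s")
    finally:
        os.remove(path)
```

For serious numbers you would reach for a dedicated tool such as fio, but even this sketch will show the read/write asymmetry on many drives.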
#2) You Cannot Change The Laws Of Physics, Jim! — When a hard drive has “free space” to use and wants to write to it, it can just lay down the magnetic signature. One step — easy peasy — magnetism rocks. But you can’t just change the orientation, relative location, or charge of a group of electrons directly. Without a supercollider, your only practical point of change is the amount of charge trapped in each cell, and that charge is altered by controlling the flow of electricity.
In an SLC (single-level cell) SSD, your capacity is lower, but each write modifies one electrical path holding one bit, flipping it between a 0 and a 1. These cells form a grid, and those grids contain your data. SLC drives usually show lower performance specs, but this can be misleading: they generally have to change more cells to move the same amount of data, since each cell can hold only two values: 1 (powered) or 0 (unpowered).
Now let’s take the far more common type of SSD, the multi-level cell (MLC) SSD. Here each cell holds two bits of information, Bit-A and Bit-B, each controlled by a separate electrical path. So instead of 0 or 1, that one cell can be 00, 01, 10, or 11. Needless to say, that packs a lot more data into the same number of cells, but writes involve a LOT more electrical changes, wearing the cells out far faster. (And YES — cells DO wear out! Every program/erase cycle stresses a cell; just because you can’t SEE moving parts without an electron microscope doesn’t mean wear doesn’t exist!)
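As a back-of-the-envelope illustration of that capacity difference (a toy model of my own, not real controller behavior), here is how many cells a write touches at one versus two bits per cell:

```python
def cells_needed(num_bits, bits_per_cell):
    """How many cells a write of num_bits must touch (ceiling division)."""
    return -(-num_bits // bits_per_cell)

# Storing one kilobyte (8,192 bits):
print(cells_needed(8192, 1))   # SLC, one bit per cell  -> 8192 cells
print(cells_needed(8192, 2))   # MLC, two bits per cell -> 4096 cells
```

Halving the cell count per byte is exactly why MLC wins on price per gigabyte — and why each MLC write involves more electrical activity per cell.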
So why do we care? Here’s how writes happen to these three types of media:
Hard disk —
- Change the magnetic state of each bit/byte/block in one pass. Done. (Yes, the arm needs to be moved and articulated…but we’re talking sequential writes here.)
SLC SSD —
- Examine the grid of cells needed
- If the grid contains any zeroes, send power to all of the cells required to change their values to ones. (TRIM helps here, by the way: it tells the drive ahead of time which blocks no longer hold valid data, so this reset can happen before the write arrives.)
- Now that the grid is “reset”, cut the power to the cells that need to go to zero.

Those three steps take time — especially in larger blocks, and ESPECIALLY when they have to go through a drive interface like SAS, which was built for hard disks!
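The three steps above can be sketched in code. This is a toy model of my own — a real controller works on pages and blocks, not individual bits — but it shows the examine/erase/program sequence:

```python
def slc_write(block, data):
    """Toy SLC erase-before-write. block/data are lists of 0/1 bit values."""
    assert len(block) == len(data)
    steps = ["examine"]                  # step 1: inspect the grid of cells
    if 0 in block:                       # any cell already programmed?
        block = [1] * len(block)         # step 2: reset the whole grid to ones
        steps.append("erase")
    block = list(data)                   # step 3: cut power where zeroes belong
    steps.append("program")
    return block, steps

print(slc_write([1, 0, 1, 1], [0, 1, 1, 0]))
# -> ([0, 1, 1, 0], ['examine', 'erase', 'program'])
```

Note that a grid which is already all ones skips the erase step — which is why writes to fresh (or TRIMmed) space are faster than rewrites.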
MLC SSD —
- Examine the grid of cells needed, which takes longer because of the extra bit per cell.
- If any value other than 11 exists, send power to all of the cells to change them to 11 in one burst.
- With the grid reset, cut the current to Bit-A of each cell that needs a value of zero.
- Now cut the current to Bit-B of each cell that needs a value of zero.
- Because the two bits sit so close together, verify the final value of each two-bit cell.
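Putting the three lists side by side, the argument boils down to step counts per block written. The numbers below are just a tally of the steps listed above — my own illustration of the argument, not measured device behavior:

```python
def write_steps(medium):
    """Steps per dirty-block sequential write, tallied from the lists above."""
    return {
        "hdd": 1,   # one magnetic pass
        "slc": 3,   # examine, erase, program
        "mlc": 5,   # examine, erase, program Bit-A, program Bit-B, verify
    }[medium]

for m in ("hdd", "slc", "mlc"):
    print(m, write_steps(m), "steps per block")
```

Every extra step is another round trip of commands through the drive interface, which is where a long sequential stream pays the price.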
So for write-intensive applications, especially sequential ones where the drive needs a contiguous grid of cells to take the data, SSDs tend to perform worse: it is simply more commands taking more time through a serial drive interface. The extra “gotcha” in all of this is that in write-heavy environments, SSDs will wear out far sooner than even 15k-rpm hard disks (especially the MLC SSDs)!
Why PCI-E Flash is faster than everything else, even for sequential writes: It sits directly on the data bus and does not have to go through a serial drive interface, allowing it to perform FAR more actions at one time.
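To see why being able to do more at once matters, here is a toy queueing calculation — the per-command time and the in-flight counts are numbers I made up purely for illustration:

```python
def total_time(num_cmds, cmd_time, in_flight):
    """Time to finish num_cmds when in_flight commands can run at once."""
    batches = -(-num_cmds // in_flight)   # ceiling division
    return batches * cmd_time

print(total_time(1000, 1.0, 1))    # one-at-a-time serial interface -> 1000.0
print(total_time(1000, 1.0, 32))   # 32 commands in flight on the bus -> 32.0
```

Same command count, same per-command cost — the only thing that changed is how many commands can be serviced simultaneously.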
The reason PCI-E flash is so much more expensive is that the card and host must handle the processing and sorting of all of those commands without a conventional drive controller in the path.