The hows and whys of SSDs

Robert Hallock 25 Sep, 2008 at 4:40pm ET Article published in Tech

Solid State Disks are poised to blow the doors off traditional storage media. As the inevitable end-game of the great bet on flash memory, they are coming in strengthening numbers to obliterate benchmarks, make or break companies, and free-fall in price. The revolution this nascent market is set to unleash will leave few questions as it makes a staggering rise to preeminence.

The history of flash

Flash began humbly in the laboratory of Dr. Fujio Masuoka as Toshiba sought to address the need for inexpensive non-volatile memory that could be easily reprogrammed. During early testing of the fledgling technology, a colleague by the name of Shoji Ariizumi commented that erasing data by using a sudden flash of electricity was similar to the flash of a camera. While the memory introduced at the 1984 IEEE Electron Devices Meeting was known as “NOR,” Ariizumi’s offhand comment gave the technology its household name.

Intel was also present at the 1984 IEDM and was quick to recognize the potential of NOR-type flash. The Santa Clara-based foundry was first to commercialize the visionary technology in 1988 with the release of the 256Kb IC which, at $20 USD, commanded the arresting sum of $640 USD per megabyte. In spite of the numbing price, the introduction of commercial NOR memory was a sensational success that spawned numerous companies in its wake. It was clear that Toshiba’s work had triggered something much larger than originally anticipated.

At 1989’s International Solid-State Circuits Conference, Dr. Fujio Masuoka and Toshiba once more excited onlookers with the introduction of NAND-type flash memory. NAND surpassed NOR with enhanced longevity, faster I/O, lower costs and a smaller footprint amongst its features. While NOR was first to market in commercial devices with SanDisk’s 1994 CompactFlash I devices, NAND came to stay just a year later with Toshiba’s 1995 release of the SmartMedia specification.

Though NAND remains the flash memory magnate, considerable research continues for both types of flash memory. Manufacturers hope that their research will meet the capacious demands of users while continuing to enhance the speed, cost, and reliability of flash-based devices.

How flash memory works

Flash memory’s radical approach to storage technology dispenses with mechanical components and represents information with the electron. To harness those electrons, solid state disks begin with the flash memory cell.

Each cell is composed of nine major components: the word line, bit line, control gate, floating gate, oxide layer, drain, source, ground, and substrate. These unbelievably tiny cells, millions of which are arranged in an electrically-connected grid, serve as the building blocks of today’s flash devices.

The structure of a NAND flash cell. The black lines represent current paths with or without wires.

All solid-state memory is designed to record states, or the strength of an electric current which represents a binary digit. Technologies like RAM lose their programmed information when the current is severed as they have no method to retain the electrons that represented information. Conversely, NAND can preserve its states by trapping the electrons with a process known as Fowler-Nordheim Tunneling.

The process begins by applying a positive voltage of approximately twelve volts to the word line and bit line. The positive charge on the bit line pulls a rush of electrons from the source to the drain as the current flows to ground. On the word line, the charge is sufficiently strong to tug a few electrons away from their race to the drain. While the oxide layers are typically a powerful insulator, the excited electrons are able to surmount this barrier and become trapped within the floating gate. These trapped electrons are how flash memory retains information after the power is removed.

The induction of an electric field (blue outline) along the lines excites electrons and forces them through the oxide layer to become trapped in the floating gate.

Reading information back out of a flash cell is done by a sensor that compares the charge of the trapped electrons against a steady current. If the charge in the gate exceeds fifty percent of the current’s strength, the cell is considered to be “closed” and represents a zero. If the current can move through the floating gate without being impeded by captured electrons, the gate is considered “open” and represents a binary one.
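The comparison described above can be sketched in a few lines of Python. This is only an illustration of the fifty percent rule as stated here; the function name read_cell and the normalized charge values are assumptions made for the example and do not describe any real sensing circuit.

```python
def read_cell(trapped_charge: float, reference: float = 1.0) -> int:
    """Return the bit a cell represents under the fifty percent rule.

    trapped_charge and reference are hypothetical normalized values,
    not measurements from real hardware.
    """
    # More than half the reference strength: the gate is "closed" and reads as 0.
    if trapped_charge > 0.5 * reference:
        return 0
    # Otherwise the current passes unimpeded: the gate is "open" and reads as 1.
    return 1


programmed = read_cell(0.8)  # heavily charged floating gate
erased = read_cell(0.1)      # nearly empty floating gate
print(programmed, erased)
```

Note the inversion: a cell full of trapped electrons reads as zero, and an empty one reads as one.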

Flash memory employs blocks composed of thousands of NAND cells. Each block uses a common word and bit line.

Harnessing flash cells for mass storage

NAND differs from memory technologies like NOR and DRAM because it is not byte-addressable: it cannot read or write one byte at a time. Because the complexity of wiring for byte-addressable memory scales with capacity, NAND’s destiny as mass storage meant a different approach. To overcome addressing restrictions, NAND cells are grouped by the hundreds into pages, and the pages into blocks. Each page shares a common set of word and bit lines, and pages are organized into blocks in four common configurations:

32 pages of 512 bytes for a 16 KiB block
64 pages of 2,048 bytes for a 128 KiB block
64 pages of 4,096 bytes for a 256 KiB block
128 pages of 4,096 bytes for a 512 KiB block

Even though data can be read and written by the page, up to the entire block can be read and written in a single pass. This means that a 4 KiB file could consume the entire block which may be up to 128 times the size of the file. Such wasted space, called slack space, will go unused until replaced by data that uses the space more efficiently.
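The block sizes and the worst-case slack figure above are simple arithmetic, and a short Python sketch makes them easy to check. The helper names block_size and slack_if_alone are hypothetical, and the sketch assumes the pessimistic case in which a small file is the only occupant of its block.

```python
KIB = 1024

# (pages per block, bytes per page) for the four configurations listed above
CONFIGS = [(32, 512), (64, 2048), (64, 4096), (128, 4096)]


def block_size(pages: int, page_bytes: int) -> int:
    """Total bytes in a block made of equally sized pages."""
    return pages * page_bytes


def slack_if_alone(file_bytes: int, block_bytes: int) -> int:
    """Bytes left unused if a file is the only occupant of its block(s)."""
    blocks_needed = -(-file_bytes // block_bytes)  # ceiling division
    return blocks_needed * block_bytes - file_bytes


for pages, page_bytes in CONFIGS:
    size = block_size(pages, page_bytes)
    slack = slack_if_alone(4 * KIB, size)
    print(f"{pages} pages x {page_bytes} B = {size // KIB} KiB block; "
          f"a lone 4 KiB file leaves {slack // KIB} KiB of slack")
```

In the largest configuration the block is 128 times the size of the 4 KiB file, which is where the figure in the text comes from.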

Writing data

While solid state disks are quite adept at locating blocks for I/O, NAND cells can’t write directly to those blocks. Instead, data is first written to an erase block and then merged with the existing contents of the drive to complete the write sequence. This merger process is rated with a write coefficient that compares the amount of data managed in DRAM, on the ATA bus, and in the drive’s buffers against the size of the actual write. Many of today’s solid state drives have a write coefficient of about 20:1, meaning 1GiB of written data forces the computer to manage 20GiB before the write can complete.

While mechanical disk performance is consistent, solid state drives struggle to write small data sizes.

The erase blocks set aside for write merging average 1MiB in size, meaning that a file must fit into or be divisible by 1MiB increments to achieve optimal write performance. The difference in write sizes can have staggering implications: transferring 32MiB of data in 1MiB chunks can exceed 80MiBps, or three times the performance of that same 32MiB written in 4KiB chunks. As the average write is less than 50KiB, many users are often underwhelmed by disappointing I/O.
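One way to see why small writes hurt is to model each write as being rounded up to whole 1MiB erase blocks. That rounding rule is an assumption made for this sketch, not a description of how any particular drive behaves, and the names data_touched and overhead_ratio are hypothetical.

```python
KIB = 1024
MIB = 1024 * KIB


def data_touched(write_bytes: int, erase_block_bytes: int = MIB) -> int:
    """Bytes handled if a write is rounded up to whole erase blocks (assumed model)."""
    blocks = -(-write_bytes // erase_block_bytes)  # ceiling division
    return blocks * erase_block_bytes


def overhead_ratio(write_bytes: int, erase_block_bytes: int = MIB) -> float:
    """How many bytes are handled per byte the user actually asked to write."""
    return data_touched(write_bytes, erase_block_bytes) / write_bytes


for size in (4 * KIB, 50 * KIB, MIB):
    print(f"{size // KIB} KiB write: {overhead_ratio(size):.2f}x data handled")
```

Under this simplified model a 50KiB write lands close to the 20:1 coefficient quoted earlier, though nothing in the article says that figure was derived this way.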

Researchers hope to refine NAND so small block performance eclipses that of mechanical drives, but the reality of NAND is that the tremendous jumps in speed will continue to grace the big block writes.

Reading data

Today’s solid state disks have similar initial read and write speeds, but read throughput can be over 25 percent faster with a proper block size. More impressively, the maximum write speeds are still climbing, having recently approached the 200MiBps barrier in just nine short months.

Though solid state read performance is better than writes, it continues to struggle with small data sizes.

NAND’s position as mass storage means that it abandons some of the features that make NOR faster in reads. The biggest loss is eXecute in Place (XIP), which allows code to be executed directly from flash. Instead, NAND must copy requested data to system RAM before it can be run.

It is plain that moving data to manipulate it is an inefficient mechanism, but the technique is not as devastating to read performance as it is to write. This is because reads rely on the tremendous speed of DRAM to achieve their scores. Though DDR2-SDRAM latency is a shade slower than NAND at 60ns, DRAM posts throughput surpassing 1100MiBps as today’s flash disks approach the 200MiBps barrier.

While this seems fast, sustained read performance suffers because every block of data must be read in its entirety, and in sequence. Just as humans can find a page in a book much faster than they can read it, solid state disks can locate data far faster than they can read it out.

Random read performance is significantly better. With a seek time of about 0.1ms, a drive can access over a thousand files per second, and burst throughput is at 250MiBps and climbing. This means that tasks like opening programs, opening small files and even on-demand file loads inside a game can be quite quick.

NAND’s true enemies

While it is easy to believe that design trade-offs have stunted NAND, we have only begun to scratch the surface of its potential. Flash memory’s biggest opponent is the present hard drive ecosystem, caught entirely unprepared for a technology of flash’s nature. That ecosystem is largely unchanged since the early nineties; the advent of Serial ATA and ever-increasing magnetic capacities have done little to alter the way in which we talk to hard drives.

Clustering

Today’s approach to mass storage assumes a mechanical drive which, without aid, is a single mass of bits that go undivided unlike NAND blocks. Operating systems rely on an abstraction layer known as a file system to logically divide this contiguous swath of data into smaller manageable pieces known as clusters. Operating systems are so reliant on file systems that stored data is simply unreadable without one.

While there are many prevalent file systems, the 1996 introduction of FAT32 with Windows 95 OSR2 was a great success that practically institutionalized the 4KiB cluster size. Such a size was chosen to alleviate the tremendous slack space suffered by FAT16’s 32KiB cluster size in an age of tiny files. This 4KiB cluster size has been carried forward into NTFS, the file system of choice for Windows 2000, XP and Vista.

We now live in a world where most would not bat a lash at a file size that may not have fit on a hard drive from the earliest years of FAT32’s life. Though it’s true that such large files would fit neatly into NAND’s 4KiB pages when clustered, this belies the larger point that flash-based devices can manipulate 1MiB of data as quickly as mechanical hard drives do 4KiB.

The solution to the problem is to increase the cluster size, for which there are several advantages:

Reduced file system complexity; fewer clusters mean less to organize.
Increased read and write speed as cluster size approaches parity with block size.
Decreased slack space if the system is primarily composed of large files.
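The trade-off behind these points can be illustrated with a small calculation. The sketch below counts clusters and slack for a made-up mix of files at two cluster sizes; the file sizes and the helper name cluster_stats are assumptions chosen only for the example.

```python
KIB = 1024
MIB = 1024 * KIB


def cluster_stats(file_sizes, cluster_bytes):
    """Return (total clusters used, total slack bytes) for the given files."""
    clusters = 0
    slack = 0
    for size in file_sizes:
        needed = -(-size // cluster_bytes)  # ceiling division
        clusters += needed
        slack += needed * cluster_bytes - size
    return clusters, slack


# Hypothetical mix: two small files and one large video
files = [2 * KIB, 10 * KIB, 700 * MIB]

for cluster in (4 * KIB, MIB):
    count, wasted = cluster_stats(files, cluster)
    print(f"{cluster // KIB} KiB clusters: {count} clusters, "
          f"{wasted // KIB} KiB of slack")
```

With this mix, 1MiB clusters cut the number of clusters to track from 179,204 to 702, but the two small files alone waste about 2MiB, which is why the benefit depends on what the drive mostly holds.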

Yet increased cluster size is not a magic bullet for solid state disks, as most people have a mix of information. Games often contain a myriad of small files and operating systems are the sum of small files almost as a rule; yet movies, music, archives and MMOs are perfect candidates for enlarged cluster sizes. More frustrating than the anchor of small clusters is the complicated process to get larger clusters under modern Windows operating systems. Such a feat requires premeditated use of programs like Acronis Disk Director which can increase cluster sizes prior to the installation of Windows. It is also possible to resize existing clusters, but such a procedure is accomplished with a frighteningly varied degree of success.

Hard Drive Controllers

Today’s drive controllers, like cluster sizes, were built for the relatively simple mechanical drive. They assume that the operating system continues to manage disk I/O and that data operations can be performed directly within the disk space. This approach ignores that flash drives do considerable self-management and are forced to make monumental exchanges of data due to the write coefficient.

Various approaches have managed to improve the bleak outlook on solid state drive control. Intel reports that its controller has reduced the write coefficient, a figure it calls write amplification, to just 1.1 times the size of the intended write. This alleviates the burden on the SATA bus, the DRAM subsystem, and the drive’s own techniques for placing clusters into storage.
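To put the two coefficients side by side, the arithmetic can be sketched as follows. The function name data_managed_gib is hypothetical, and the sketch simply multiplies the write size by the coefficient, which is the relationship the article describes.

```python
def data_managed_gib(write_gib: float, coefficient: float) -> float:
    """Data the system must shuffle to complete a write of the given size."""
    return write_gib * coefficient


typical = data_managed_gib(1, 20)   # the 20:1 coefficient cited earlier
intel = data_managed_gib(1, 1.1)    # the 1.1 figure attributed to Intel

print(f"20:1 coefficient: {typical:.1f} GiB managed per 1 GiB written")
print(f"1.1 coefficient:  {intel:.1f} GiB managed per 1 GiB written")
print(f"Reduction: roughly {typical / intel:.0f}x less data to manage")
```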

Operating Systems

But hardware controllers are only half of the equation. Windows, Linux and other operating systems are ultimately responsible for how the data gets to the controller for management, and most are not yet optimized for flash storage. Microsoft Windows is especially ill-equipped to communicate intelligently with today’s flash drives, much less their successors. Given that the primary test platform for flash disk review has been Windows, one wonders how much the early reputation flash had for poor performance can actually be attributed to the drives.

Not only is Windows guilty of being a poor traffic controller, but Windows-based systems are also particularly fond of heavy disk access. Fixated with indexing, swapping, buffering, caching and background optimizing, Windows is analogous to torture for today’s flash-based devices. This brand of drive interaction is another clear indicator that today’s drive ecosystem has been built around the radically dissimilar mechanical drive.

Limitations of solid state disks

Longevity of magnetic storage is rated in mean time between failures (MTBF), a figure that often exceeds one million hours of continuous usage. Western Digital’s prestigious Raptor 10k line offers a 1.2 million hour MTBF rating, good for almost 137 years of operation. Though the MTBF rating is misleading in that it does not factor in the inevitable end of product life, it is nevertheless a testament to the relative reliability of mechanical drives.
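The 137-year figure is a straight conversion of hours into years of continuous operation. A minimal sketch, assuming a 365-day year and using the hypothetical helper name mtbf_years:

```python
HOURS_PER_YEAR = 24 * 365  # assumes a 365-day year of nonstop operation


def mtbf_years(mtbf_hours: float) -> float:
    """Convert an MTBF rating in hours to years of continuous usage."""
    return mtbf_hours / HOURS_PER_YEAR


print(f"{mtbf_years(1_200_000):.1f} years")
```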

We know that mechanical drives rarely live a decade on from their purchase, much less a century, yet people are comfortable with their volatility because their date of death remains ambiguous. In opposition, the life of a solid state disk is not only clearly limited, but touted as a feature when it improves. Intel’s recent decision to offer solid state disks wowed the market with the promise that the drive could withstand up to 100,000 write cycles.

The write cycle, or the number of times a flash block may be erased and reliably programmed, is taxing for a flash drive. Pouring more than ten volts of electricity through such small and sensitive components takes a toll on the cells and their materials to such an extent that they simply wear out in the end. No longer capable of reliably capturing electrons through the Fowler-Nordheim process, the drive and its data degrade into disrepair.

In order to combat this effect, solid state disks come with a feature called “wear leveling” which intentionally distributes data erratically across the drive to ensure that no block receives undue usage. Wear leveling and reduced write amplification are just two parts of the bigger host of technologies that are ensuring the continued longevity of flash devices. While 100,000 cycles seems slight, it allows more than 100GiB of new information to be written to the disk every day for five years before approaching failure. By that measure, the average lifetime of the SSD is indeed longer than that of a conventional drive, a testament to the power of solid state.
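The idea behind wear leveling can be shown with a toy simulation. The sketch below is not how any shipping controller works; it assumes a simple "least-worn block first" policy and hypothetical function names, purely to contrast leveled and unleveled writes.

```python
def write_without_leveling(erase_counts, writes, target=0):
    """Every write lands on the same block, as if one file were rewritten in place."""
    for _ in range(writes):
        erase_counts[target] += 1
    return erase_counts


def write_with_leveling(erase_counts, writes):
    """Each write goes to whichever block has been erased the fewest times so far."""
    for _ in range(writes):
        least_worn = min(range(len(erase_counts)), key=erase_counts.__getitem__)
        erase_counts[least_worn] += 1
    return erase_counts


BLOCKS = 8  # tiny hypothetical drive
hot = write_without_leveling([0] * BLOCKS, 1000)
even = write_with_leveling([0] * BLOCKS, 1000)

print("worst block without leveling:", max(hot))
print("worst block with leveling:   ", max(even))
```

In this toy model the same thousand writes cost the busiest block 1,000 erase cycles without leveling and only 125 with it, which is the effect that stretches a fixed cycle budget across the whole drive.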

Not all are created equal

The conventional hard drive’s speed is tied very closely to its revolutions per minute (RPM). Drives like the Western Digital Raptor series excel because their rotational velocity of 10,000 RPM is almost forty percent faster than the more conventional 7200 RPM drive. Some drives, particularly in notebooks, may be as debilitatingly low as 3200 RPM.

Amongst flash drives, there is a similar distinction that is neither as tangible nor as linear: the cell type. Today’s NAND cell can be a single level cell (SLC) or a multi-level cell (MLC). Recall that the state of a NAND cell is determined by the strength of the charge captured in the floating gate. In single level cells there is a single voltage threshold that determines if the cell is programmed as a zero or a one. Multi-level cells have multiple thresholds, allowing each cell to capture two bits of information.
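The difference between the two cell types comes down to how many bands the charge level is sorted into. The following sketch uses made-up, evenly spaced thresholds on a normalized 0.0 to 1.0 scale; real devices use calibrated voltages, and the mapping from band to bit pattern is deliberately left out.

```python
from bisect import bisect_right

# Hypothetical normalized thresholds, not real device voltages
SLC_THRESHOLDS = [0.5]               # one threshold, two states
MLC_THRESHOLDS = [0.25, 0.5, 0.75]   # three thresholds, four states


def band(level: float, thresholds) -> int:
    """Index of the band a charge level falls into (0 is the lowest band)."""
    return bisect_right(thresholds, level)


def bits_per_cell(thresholds) -> int:
    """Bits that can be stored when the states are a power of two."""
    states = len(thresholds) + 1
    return states.bit_length() - 1


for name, thresholds in (("SLC", SLC_THRESHOLDS), ("MLC", MLC_THRESHOLDS)):
    print(f"{name}: {bits_per_cell(thresholds)} bit(s) per cell, "
          f"a charge of 0.6 falls in band {band(0.6, thresholds)}")
```

Because an MLC cell must distinguish four bands instead of two, each band is narrower, which gives some intuition for why such cells are slower to program and read reliably.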

The capacity of an MLC SSD can be up to twice that of an SLC drive that is otherwise identical and equipped with the same number of chips. At up to 250GiB, MLC drives can be spacious, but they can be on the order of two or three times slower than their premium SLC brethren. While exceptional performance from an MLC drive is not out of the question, it is important to identify the role the drive will play prior to purchase.

Winners and losers

As solid state disks grow in popularity, there is great opportunity for new growth in a storage industry that has typically suffered from razor-thin profit margins. The turgid but deliberate pace of conventional drive capacity has left a wash of similar products and bored consumers. In spite of NAS boxes and stylish portable storage, these technologies have only marginally increased the profits of their respective manufacturers.

Flash-based hard drives offer a chance for existing hard drive manufacturers to rebound in a market increasingly condensing under the Seagate brand. Though only a small fraction of the SSD’s cost is returned in profit, a prolonged spike in sales volume will do much to reinvigorate an ailing industry. Because not all flash cells or other internal components are created equal, manufacturers will also have new opportunities to offer premium products.

The solid state market is going to be one that’s vastly different from the conventional drive market. As almost every drive manufacturer runs its own plant, traditional drive companies are saddled with the intense cost of labor and a ballooning capital. The barrier to entry is much lower for solid state disks given that flash chips are produced in volume by only a handful of manufacturers. This means that a smaller company has the opportunity to purchase a stock of chips and support electronics and assemble it in a much smaller workplace that requires fewer employees. Names like Hama, Memoright and Mtron — names that few enthusiasts have ever heard of — are exploding into household names thanks to the low cost of flash disk production.

But not all companies are prepared to win in the arrival of the solid state disk. Companies that make a living off of managing and addressing issues with conventional drives may be driven to other industries, if not bankrupted by flash. Companies like Diskeeper Corporation have made their living off of industry-leading defragmentation software that would all but ruin a solid state disk. Given that flash devices have a limited number of writes and intentionally fragment their contents, each bit of data moved to a contiguous area just begs for a drive’s early death.

Consider also the popular SpinRite program which has achieved outstanding success in recovering data from mechanical storage. Its crowning feature is the ability to analyze the physical geometry of the hard disk from various angles to reconstruct the contents of information that is unreadable head-on. What happens to SpinRite when there is no grey area between a dead drive and a functional drive?

Winding down

The burgeoning solid state disk industry is a rather different animal from the hard drives we are accustomed to. Even while suffering an unready ecosystem and a consumer base slow to reconcile the new paradigm, flash is already charting an incredible course. Fourteen short months were enough to make a mediocre successor to conventional storage into an undeniable force that will only get better.

As flash disks set out to depose the magnetic disks to which we owe more than 30 years of storage, the price is falling at a criminal rate. The future’s low prices and ever-increasing performance will open the doors to a whole host of new consumers, which will virtually guarantee flash’s success. While new companies and new drive owners delight in the march of progress, we can only wonder what will happen to those firms which depended on a market that may all but evaporate by 2012.

Comments

25 Sep 2008 ~ 5:32pm Snarkasm Exceptional article as always, Thrax. Excellent coverage and breadth.
25 Sep 2008 ~ 6:11pm BuddyJ The width and girth of this treatise are exorbitant!
25 Sep 2008 ~ 11:17pm Winfrey These drives can really help out laptop performance IMO. Laptops have those really slow rotational speeds (usually 5400RPM) which cuts into performance more than you would think, especially high end ones.
26 Sep 2008 ~ 7:47am Zuntar SSD have a long way to go before I'll even consider one.
26 Nov 2008 ~ 7:11am Mario Recently, I started seriously looking at getting a solid state drive (SSD) as my primary boot drive. After careful consideration, I have concluded that they still are not ready for prime time from the enthusiast gamer's point of view. The two biggest deterrent factors are the cost of SSDs and their life expectancy. As of today, an Intel X25-M SATA Solid-State Drive costs $US595 in quantities of 1000. Another very disturbing issue is the fact that regular defragmentation of a solid state drive would dramatically decrease its life expectancy. As it stands, the earliest I see myself having an SSD is sometime around 2010.
26 Nov 2008 ~ 8:35am Thrax Fragmentation is not an issue. SSDs intentionally fragment files across the drive in a process called "wear leveling." Wear leveling assures that no one flash cell gets more work than others, thereby extending the life of the drive. If a file were stored in 100,000 places or in one contiguous block, an SSD would be able to load that file at the same speed.

Defragmentation is a cheap hack to sweep the performance limitations of mechanical drives under the rug. Defragging exists because there are performance penalties if the mechanical drive head needs to see files all over the disk.

Secondly, the longevity (MTBF) of the newest generation of Intel SSDs is as long or longer than traditional drives. Reliability has reached parity, it's not really a concern any more.

I do, however, agree that the price needs to come down.
5 Feb 2009 ~ 5:06pm james braselton HI THERE I KNOW WHY FLASH IS BETTER THEN A HARD DRIVE I STILL HAVE A FULLY WORKING COMADORE 64 I BET NO ONE ELSE HAS A COMADORE 64 AND GAMES FOR IT AND A BAUD 2400 MODEM OPTINAL AT THAT TIME SO 64 KB KILOBYETS VERSES A 64 GB SOLID STATE FLASH DRIVE USEING FLASH CHIPS
5 Feb 2009 ~ 5:10pm primesuspect Actually our friend Tim is looking for Commodore 64 stuff. I think you guys would get along well.
23 Mar 2009 ~ 6:33pm Celcho Thrax, you should be a research analyst on wall street... A shame there barely is one anymore. Excellent work, though, as always.
23 Mar 2009 ~ 7:13pm pigflipper Hey Ryan, forget your log in password?
23 Mar 2009 ~ 7:27pm Thrax Sup, Celcho!
23 Mar 2009 ~ 9:17pm primesuspect Celcho!
23 Mar 2009 ~ 9:44pm QCH Bump for an awesome article!!!
3 Apr 2009 ~ 3:22pm David What is 100GiB?

The article states you could write 100GiB per day for 5 years before approaching failure.
3 Apr 2009 ~ 3:29pm BuddyJ Hi David. Here ya go:
http://en.wikipedia.org/wiki/GiB
3 Apr 2009 ~ 3:35pm Thrax The article links to this wikipedia entry on one of the pages: http://en.wikipedia.org/wiki/KiB

Basically, the SI units kilo-, mega-, giga- all refer to powers of 1000. The word "gigabyte" suggests that it's composed of 1000 megabytes. But that's not how storage works, because storage is ACTUALLY based on powers of 1024. A gigabyte is ACTUALLY 1024 megabytes.

I wanted to be very clear about how much data the drive can write.

8 bits = 1 byte
1024 bytes = 1 kibibyte (1KiB)
1024 kibibytes = 1 mebibyte (1MiB)
1024 mebibytes = 1 gibibyte (1GiB)

This discrepancy is why a "250GB" hard drive (which you would think is 250,000 megabytes) shows up as roughly 238,400 mebibytes, or about 232.8 gibibytes, because the computer judges values in powers of 1024. So 250,000,000,000 bytes / 1024 / 1024 = 238,419 MiB (approximately).

It's confusing and stupid.
23 Feb 2010 ~ 12:33am rexrivera dude, you'r the man! hands down..!
5 May 2010 ~ 3:35am zew Is there an explanation on why erase can only be done on the whole block? And why can't a page be overwritten?
25 Nov 2010 ~ 6:25pm DWatt So in practical terms, what does "partition alignment" really mean? It's a concept that is getting a lot of play on the SSD forums. Is it necessary to get best performance from an SSD? Seems like there's no way to ensure that you have partition alignment in XP. Can you clone a previous OS partition to an SSD in Windows XP and get good performance? What about ideal cluster size?
25 Nov 2010 ~ 6:35pm Thrax Aligning partitions is a complicated process on Windows XP, but it's absolutely essential to get good performance. Of course, XP doesn't support the ATA TRIM command, so SSD performance on XP is pretty much doomed to decay unless the mfgr offers a garbage collection program.

Windows 7 is easily the best OS for SSDs at this time.

Cluster size should match the block size of the drive, usually 512k or 1MB.
6 Dec 2010 ~ 8:03pm jedihobbit It appears that I've added to my already software challenged self by winning a 32GB SSD from Zalman (who knows who really builds them!! :rolleyes2) the SSD0032S1 (http://www.zalman.com/ENG/product/Product_view.asp?idx=421).

So from what I've just read here if I plan to jump from XP to Win7 now is the time if I plan to use this thing? So if using as my primary is 32GB enough room for "everything"......meaning OS, apps, etc?

Figures, as I have a 300GB V'Raptor that was supposed to be my primary in the build.........;D
6 Dec 2010 ~ 8:17pm Thrax There's nothing that goes on C: that can't be installed or moved somewhere else: swap file, My Documents, the temp directory, applications, etc. 20GB is perfectly sufficient for Windows 7.
6 Dec 2010 ~ 8:29pm Garg

Thrax wrote:

There's nothing that goes on C: that can't be installed or moved somewhere else: swap file, My Documents, the temp directory, applications, etc. 20GB is perfectly sufficient for Windows 7.

I ran into issues with a 20GB partition for Win 7 x64 on my laptop, and eventually had to expand it. I'm not sure what I could have overlooked, but I moved everything I could think of (and all of the things listed above). My 4GB hibernate file was stuck on C, as far as I could tell.

I finally expanded the partition into a neighboring partition I had been using for Linux when I realized that Visual Studio insists on being installed onto the C drive, and I didn't have enough room left for it. Programming software made by programmers can't be run from the D drive, for reasons I won't ever understand.

At any rate, 32GB should be enough. 20 was just cutting it close after Windows kept accumulating bloat/updates.
6 Dec 2010 ~ 8:39pm jedihobbit

Thrax wrote:

There's nothing that goes on C: that can't be installed or moved somewhere else: swap file, My Documents, the temp directory, applications, etc. 20GB is perfectly sufficient for Windows 7.

Another noob question(s)

1. If the SSD is C: should I use the V'Raptor for the misc stuff as the 2 x 1TBs are for mirror??

2. OR??
6 Dec 2010 ~ 8:46pm Thrax

Gargoyle wrote:

I ran into issues with a 20GB partition for Win 7 x64 on my laptop, and eventually had to expand it. I'm not sure what I could have overlooked, but I moved everything I could think of (and all of the things listed above). My 4GB hibernate file was stuck on C, as far as I could tell.

I finally expanded the partition into a neighboring partition I had been using for Linux when I realized that Visual Studio insists on being installed onto the C drive, and I didn't have enough room left for it. Programming software made by programmers can't be run from the D drive, for reasons I won't ever understand.

At any rate, 32GB should be enough. 20 was just cutting it close after Windows kept accumulating bloat/updates.

Hibernate:
open administrative command prompt and issue this command: powercfg -h off

I have otherwise been able to dodge bloat since I installed this copy of Windows 7 in March.
6 Dec 2010 ~ 8:47pm Thrax

jedihobbit wrote:

Another noob question(s)

1. If the SSD is C: should I use the V'Raptor for the misc stuff as the 2 x 1TBs are for mirror??

2. OR??

Yes, use the raptor for mass storage, and applications you don't need super speedy loading on.
7 Dec 2010 ~ 8:02am RichD I think I kind of got the gist of the article although I have to confess some of the more complex technical stuff did lose me.

Is there a market for a hybrid drive which has a small amount of traditional drive storage that can be used to store small files and then a larger SSD for storing the large data files? I guess you would need to have a clever controller that looks at the file sizes and distributes accordingly? You would also need to have some sort of study into the total file size for small files and the total file size for large files so that you can gauge the ratio of traditional storage vs SSD.
7 Dec 2010 ~ 8:38am Thrax Hybrid HDDs were attempted in 2007 with the release of Windows Vista, but they flopped spectacularly. There's some technical merit for it (Seagate Momentus XT), but it's unlikely to ever hit the desktop.
7 Dec 2010 ~ 9:04am primesuspect RichD:

We were just talking about Seagate Momentus XT drives the other day. It's an interesting technology.