
The hows and whys of SSDs

 

Harnessing flash cells for mass storage

NAND differs from memory technologies like NOR and DRAM in that it is not byte-addressable: it cannot read and write one byte at a time. Because the wiring complexity of byte-addressable memory scales with capacity, NAND’s destiny as mass storage demanded a different approach. To overcome addressing restrictions, NAND cells are grouped by the thousands into pages, and the pages into blocks. Each page shares a common set of word and bit lines, and blocks come in four common configurations:

  • 32 pages of 512 bytes for a 16 KiB block
  • 64 pages of 2,048 bytes for a 128 KiB block
  • 64 pages of 4,096 bytes for a 256 KiB block
  • 128 pages of 4,096 bytes for a 512 KiB block

Even though data can be read and written by the page, up to an entire block can be read and written in a single pass. This means that a 4 KiB file could consume an entire block, which may be up to 128 times the size of the file. Such wasted space, called slack space, goes unused until it is replaced by data that uses the space more efficiently.
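The block sizes in the list above follow from pages per block times page size, and the 128-times figure comes from the largest layout. A minimal sketch of that arithmetic, assuming (as the paragraph does) that a small file occupies a whole block; the names `BLOCK_CONFIGS`, `block_size` and `slack_bytes` are hypothetical:

```python
# Block sizes and slack space for the four NAND block layouts listed above.
KIB = 1024

# (pages per block, bytes per page) for each configuration in the list
BLOCK_CONFIGS = [(32, 512), (64, 2048), (64, 4096), (128, 4096)]


def block_size(pages: int, page_bytes: int) -> int:
    """Bytes in one block: pages per block times bytes per page."""
    return pages * page_bytes


def slack_bytes(file_bytes: int, block_bytes: int) -> int:
    """Unused bytes left over when a file is rounded up to whole blocks."""
    blocks_used = -(-file_bytes // block_bytes)  # ceiling division on integers
    return blocks_used * block_bytes - file_bytes


if __name__ == "__main__":
    file_bytes = 4 * KIB  # the 4 KiB file from the paragraph above
    for pages, page_bytes in BLOCK_CONFIGS:
        size = block_size(pages, page_bytes)
        print(f"{pages} x {page_bytes} B = {size // KIB} KiB block: "
              f"{size // file_bytes}x the file, "
              f"{slack_bytes(file_bytes, size) // KIB} KiB of slack")
```

The last configuration reproduces the paragraph's numbers: a 512 KiB block is 128 times a 4 KiB file, leaving 508 KiB of slack.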

Writing data

While solid state disks are quite adept at locating blocks for I/O, NAND cells can’t write directly to the blocks they locate. Instead, data is first written to an erase block and then merged with the existing contents of the drive to complete the write sequence. This merger process is rated with a write coefficient that compares the amount of data managed in DRAM, on the ATA bus and in the drive’s buffers against the size of the actual write. Many of today’s solid state drives have a write coefficient of about 20:1, meaning that 1GiB of written data forces the computer to manage 20GiB before the write can happen.
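Read as a plain multiplier, the 20:1 figure means the data handled before a write completes is the write size times the coefficient. A minimal sketch of that reading; `data_managed` is a hypothetical helper, and it assumes the coefficient scales linearly with write size, which real drives need not do:

```python
# Data handled for one write under a given write coefficient.
GIB = 1024 ** 3


def data_managed(write_bytes: int, coefficient: float) -> float:
    """Bytes moved through DRAM, the ATA bus and drive buffers for one write,
    assuming the coefficient scales linearly with the write size."""
    return write_bytes * coefficient


if __name__ == "__main__":
    managed = data_managed(1 * GIB, 20)
    print(f"1 GiB written at 20:1 -> {managed / GIB:.0f} GiB managed")
```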

While mechanical disk performance is consistent, solid state drives struggle to write small data sizes.


The erase blocks set aside for write merging average 1MiB in size, meaning that writes should come in whole 1MiB increments to achieve optimal write performance. The difference in write sizes can have staggering implications: transferring 32MiB of data in 1MiB chunks can exceed 80MiBps, three times the performance of the same 32MiB written in 4KiB chunks. Because the average write is less than 50KiB, many users are underwhelmed by their drive’s I/O.
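A minimal sketch of the arithmetic, assuming each write is rounded up to whole, aligned 1MiB erase blocks and taking the paragraph’s 80MiBps and one-third figures at face value; the function names are hypothetical:

```python
# Erase-block fill and transfer time for the 1MiB vs 4KiB write comparison.
KIB = 1024
MIB = 1024 * KIB
ERASE_BLOCK = 1 * MIB  # average erase-block size cited above


def erase_blocks_needed(write_bytes: int) -> int:
    """Whole 1 MiB erase blocks a write touches (assumes aligned writes)."""
    if write_bytes <= 0:
        raise ValueError("write size must be positive")
    return -(-write_bytes // ERASE_BLOCK)  # ceiling division


def fill_ratio(write_bytes: int) -> float:
    """Fraction of the touched erase blocks that the write actually fills."""
    return write_bytes / (erase_blocks_needed(write_bytes) * ERASE_BLOCK)


def transfer_seconds(total_bytes: int, mib_per_s: float) -> float:
    """Time to move total_bytes at a sustained rate given in MiB per second."""
    return total_bytes / (mib_per_s * MIB)


if __name__ == "__main__":
    for size in (4 * KIB, 50 * KIB, 1 * MIB):
        print(f"{size // KIB:>5} KiB write fills {fill_ratio(size):.1%} "
              f"of its erase block")
    fast = transfer_seconds(32 * MIB, 80)       # 1 MiB chunks at ~80 MiBps
    slow = transfer_seconds(32 * MIB, 80 / 3)   # 4 KiB chunks at a third of that
    print(f"32 MiB: {fast:.1f} s in 1 MiB chunks vs {slow:.1f} s in 4 KiB chunks")
```

Under those assumptions, a typical sub-50KiB write fills under 5 percent of the erase block it lands in, which is one way to read why small writes disappoint.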

Researchers hope to refine NAND so that small-block performance eclipses that of mechanical drives, but the reality of NAND is that the tremendous jumps in speed will continue to favor big-block writes.

Reading data

Today’s solid state disks have similar initial read and write speeds, but read throughput can be over 25 percent faster with a proper block size. More impressively, maximum write speeds are still climbing, having recently approached the 200MiBps barrier in just nine short months.

Though solid state read performance is better than writes, it continues to struggle with small data sizes.


NAND’s role as mass storage meant abandoning some of the features that make NOR faster at reads. The biggest loss is eXecute in Place (XIP), which allows code to run directly from flash. Instead, NAND must copy requested data to system RAM before it can be run.

It is plain that moving data in order to manipulate it is inefficient, but the technique is not as devastating to read performance as it is to writes. This is because reads rely on the tremendous speed of DRAM to achieve their scores. Though DDR2-SDRAM latency, at 60ns, is a shade slower than NAND’s, DRAM posts throughput surpassing 1100MiBps while today’s flash disks approach the 200MiBps barrier.
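The DRAM figures suggest why the extra copy is tolerable for reads. A rough sketch using the paragraph’s 1100MiBps and 200MiBps numbers, assuming pessimistically that the copy into RAM does not overlap the flash read at all; the helper names are hypothetical:

```python
# Cost of copying flash data into DRAM, using the throughput figures above.
GIB_IN_MIB = 1024
DRAM_MIBPS = 1100.0   # DDR2-SDRAM throughput cited in the text
FLASH_MIBPS = 200.0   # flash disk throughput cited in the text


def seconds_to_move(mib: float, mib_per_s: float) -> float:
    """Time to move `mib` MiB at a sustained rate."""
    return mib / mib_per_s


def copy_share(flash_mibps: float, dram_mibps: float) -> float:
    """Fraction of total time spent on the DRAM copy, assuming the copy
    runs strictly after the flash read (no overlap)."""
    flash_t = 1.0 / flash_mibps
    dram_t = 1.0 / dram_mibps
    return dram_t / (flash_t + dram_t)


if __name__ == "__main__":
    print(f"DRAM throughput is {DRAM_MIBPS / FLASH_MIBPS:.1f}x the flash figure")
    print(f"1 GiB: {seconds_to_move(GIB_IN_MIB, FLASH_MIBPS):.2f} s from flash, "
          f"{seconds_to_move(GIB_IN_MIB, DRAM_MIBPS):.2f} s through DRAM")
    print(f"Copy share of total time: {copy_share(FLASH_MIBPS, DRAM_MIBPS):.1%}")
```

Even with no overlap, the copy accounts for roughly 15 percent of the total time under these figures, since DRAM moves data about 5.5 times faster than the flash disk.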

While this seems fast, sustained read performance suffers because every block of data must be read in its entirety, and in sequence. Just as a person can find a page in a book much faster than they can read it, a solid state disk can locate data far faster than it can read it through.

Random read performance is significantly better. With a seek time of about 0.1ms, these drives can access over a thousand files per second, and burst throughput is at 250MiBps and climbing. This means that tasks like opening programs, opening small files and even loading files on demand inside a game can be quite quick.
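At a sub-millisecond access time, the thousand-files-per-second figure has headroom. A sketch, assuming 0.1 millisecond per access purely for illustration, serial accesses with no queueing, and the paragraph’s 250MiBps burst rate for the transfer portion; the 64 KiB file size and the function names are hypothetical:

```python
# Random-access ceiling from access time, with and without transfer time.
MIB = 1024 * 1024


def max_accesses_per_second(access_time_s: float) -> float:
    """Upper bound on serial random accesses per second, ignoring transfer."""
    return 1.0 / access_time_s


def accesses_with_transfer(access_time_s: float, file_bytes: int,
                           mib_per_s: float) -> float:
    """Accesses per second when each access also transfers file_bytes."""
    transfer_s = file_bytes / (mib_per_s * MIB)
    return 1.0 / (access_time_s + transfer_s)


if __name__ == "__main__":
    t = 0.1e-3  # illustrative 0.1 ms access time
    print(f"{max_accesses_per_second(t):,.0f} accesses/s ceiling")
    print(f"{accesses_with_transfer(t, 64 * 1024, 250):,.0f} "
          f"64 KiB reads/s at 250 MiBps")
```

Even after paying for a 64 KiB transfer on every access, this model stays well above a thousand files per second.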

NAND’s true enemies

While it is easy to believe that design trade-offs have stunted NAND, we have only begun to scratch the surface of its potential. Flash memory’s biggest opponent is the present hard drive ecosystem, which was caught entirely unprepared for a technology like flash. The way we talk to hard drives has remained largely unchanged since the early nineties; neither the advent of Serial ATA nor ever-increasing magnetic capacities have done much to alter it.

Clustering

Today’s approach to mass storage assumes a mechanical drive which, unlike NAND with its blocks, is on its own a single undivided mass of bits. Operating systems rely on an abstraction layer known as a file system to logically divide this contiguous swath of data into smaller, manageable pieces known as clusters. Operating systems are so reliant on file systems that stored data is simply unreadable without one.

While there are many prevalent file systems, the 1996 introduction of FAT32 with Windows 95 OSR2 was a great success that practically institutionalized the 4KiB cluster size. Such a size was chosen to alleviate the tremendous slack space suffered by FAT16’s 32KiB cluster size in an age of tiny files. This 4KiB cluster size has been carried forward into NTFS, the file system of choice for Windows 2000, XP and Vista.
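The slack-space argument for moving from FAT16’s 32KiB clusters to 4KiB clusters can be sketched by rounding each file up to whole clusters. The file sizes below are a made-up sample of small files, not measured data, and `slack` is a hypothetical helper:

```python
# Total slack space for a set of files at a given cluster size.
KIB = 1024


def slack(file_sizes: list[int], cluster_bytes: int) -> int:
    """Total bytes wasted when each file is rounded up to whole clusters."""
    wasted = 0
    for size in file_sizes:
        clusters = -(-size // cluster_bytes)  # ceiling division
        wasted += clusters * cluster_bytes - size
    return wasted


if __name__ == "__main__":
    # Hypothetical sample of "tiny files" (sizes in bytes).
    files = [700, 2 * KIB, 3 * KIB + 100, 5 * KIB, 12 * KIB]
    for cluster in (32 * KIB, 4 * KIB):
        print(f"{cluster // KIB:>2} KiB clusters: "
              f"{slack(files, cluster) / KIB:.1f} KiB wasted")
```

With this sample, 32KiB clusters waste about 137KiB across five small files while 4KiB clusters waste about 9KiB.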

We now live in a world where most would not bat an eye at a file too large to fit on a hard drive from the earliest years of FAT32’s life. Though it’s true that such large files fit neatly into NAND’s 4KiB pages when clustered, this obscures the larger point that flash-based devices can manipulate 1MiB of data as quickly as mechanical hard drives do 4KiB.

The solution to the problem is to increase the cluster size, for which there are several advantages:

  • Reduced file system complexity; fewer clusters means less to organize.
  • Increased read and write speed as cluster size approaches parity with block size.
  • Decreased slack space if the system is primarily composed of large files.
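The first advantage is easy to quantify: the number of clusters a file system must organize falls in proportion to cluster size. A sketch for a hypothetical 64 GiB volume; the 64 KiB midpoint is only an example, and the count ignores metadata and reserved space:

```python
# Number of clusters a file system must track at various cluster sizes.
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def cluster_count(volume_bytes: int, cluster_bytes: int) -> int:
    """Clusters the file system must track for a volume (whole clusters only)."""
    return volume_bytes // cluster_bytes


if __name__ == "__main__":
    volume = 64 * GIB  # hypothetical 64 GiB SSD volume
    for cluster in (4 * KIB, 64 * KIB, 1 * MIB):
        print(f"{cluster // KIB:>5} KiB clusters: "
              f"{cluster_count(volume, cluster):,}")
```

The flip side is the condition in the last bullet: each small file can now waste up to a full cluster, so a 1MiB cluster only pays off when large files dominate.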

Yet increased cluster size is not a magic bullet for solid state disks, as most people store a mix of small and large files. Games often contain a myriad of small files, and operating systems are the sum of small files almost as a rule; yet movies, music, archives and MMOs are perfect candidates for enlarged cluster sizes. More frustrating than the anchor of small clusters is the complicated process of getting larger clusters under modern Windows operating systems. Such a feat requires premeditated use of programs like Acronis Disk Director, which can increase cluster sizes prior to the installation of Windows. It is also possible to resize existing clusters, but the success of such a procedure varies frighteningly.

Hard Drive Controllers

Today’s drive controllers, like cluster sizes, were built for the relatively simple mechanical drive. They assume that the operating system continues to manage disk I/O and that data operations can be performed directly within the disk space. This approach ignores that flash drives do considerable self-management and are forced to make monumental exchanges of data due to the write coefficient.

Various approaches have managed to improve the bleak outlook for solid state drive control. Intel, for example, has reduced the write coefficient, also known as write amplification, to just 1.1 times the size of the intended write. This lightens the burden on the SATA bus, the DRAM subsystem, and the drive’s own techniques for placing clusters into storage.
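Treating the write coefficient as a multiplier, the move from about 20 to 1.1 is larger than it first looks once you count only the data handled beyond the write itself. A minimal sketch under that assumption; `overhead_bytes` is a hypothetical helper:

```python
# Extra data handled beyond the write itself, for two write coefficients.
GIB = 1024 ** 3


def overhead_bytes(write_bytes: int, coefficient: float) -> float:
    """Data handled beyond the write itself: (coefficient - 1) x write size."""
    return write_bytes * (coefficient - 1)


if __name__ == "__main__":
    for coefficient in (20.0, 1.1):
        extra = overhead_bytes(1 * GIB, coefficient)
        print(f"{coefficient:>4}:1 -> {extra / GIB:.1f} GiB extra per GiB written")
```

Under this reading, the extra traffic per GiB written drops from 19 GiB to about 0.1 GiB, roughly 190 times less.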

Operating Systems

But hardware controllers are only half of the equation. Windows, Linux and other operating systems are ultimately responsible for how the data gets to the controller for management, and most are not yet optimized for flash storage. Microsoft Windows is especially ill-equipped to communicate intelligently with today’s flash drives, much less their successors. Given that Windows has been the primary test platform for flash disk reviews, one wonders how much of flash’s early reputation for poor performance can actually be attributed to the drives.

Not only is Windows a poor traffic controller, but Windows-based systems are also particularly fond of heavy disk access. Fixated on indexing, swapping, buffering, caching and background optimizing, Windows amounts to torture for today’s flash-based devices. This brand of drive interaction is another clear indicator that today’s drive ecosystem has been built around the radically dissimilar mechanical drive.



16 Comments:

  1. Snarkasm
    The Photographer.

    Exceptional article as always, Thrax. Excellent coverage and breadth.

  2. Buddy J
    Dept. of Propaganda

    The width and girth of this treatise are exorbitant!

  3. Winfrey
    kaishakunin

    These drives can really help out laptop performance IMO. Laptops have those really slow rotational speeds (usually 5400RPM), which cuts into performance more than you would think, especially on high end ones.

  4. Zuntar
    Modder extraordinaire

    SSDs have a long way to go before I'll even consider one.

  5. Mario
    Guest

    Recently, I started seriously looking at getting a solid state drive (SSD) as my primary boot drive. After careful consideration, I have concluded that they still are not ready for prime time from the enthusiast gamer's point of view. The two biggest deterrents are the cost of SSDs and their life expectancy. As of today, an Intel X25-M SATA Solid-State Drive costs $US595 in quantities of 1000. Another very disturbing issue is the fact that regular defragmentation of a solid state drive would dramatically decrease its life expectancy. As it stands, the earliest I see myself having an SSD is sometime around 2010.

  6. Fragmentation is not an issue. SSDs intentionally fragment files across the drive in a process called "wear leveling." Wear leveling assures that no one flash cell gets more work than others, thereby extending the life of the drive. If a file were stored in 100,000 places or in one contiguous block, an SSD would be able to load that file at the same speed.

    Defragmentation is a cheap hack to sweep the performance limitations of mechanical drives under the rug. Defragging exists because there are performance penalties if the mechanical drive head needs to seek to files all over the disk.

    Secondly, the longevity (MTBF) of the newest generation of Intel SSDs is as long as or longer than that of traditional drives. Reliability has reached parity; it's not really a concern any more.

    I do, however, agree that the price needs to come down.

  7. james braselton
    Guest

    HI THERE I KNOW WHY FLASH IS BETTER THAN A HARD DRIVE. I STILL HAVE A FULLY WORKING COMMODORE 64. I BET NO ONE ELSE HAS A COMMODORE 64 AND GAMES FOR IT AND A 2400 BAUD MODEM, OPTIONAL AT THAT TIME. SO 64 KB (KILOBYTES) VERSUS A 64 GB SOLID STATE FLASH DRIVE USING FLASH CHIPS

  8. primesuspect
    The Icrontic Guy

    Actually our friend Tim is looking for Commodore 64 stuff. I think you guys would get along well.

  9. Celcho
    Guest

    Thrax, you should be a research analyst on wall street... A shame there barely is one anymore. Excellent work, though, as always.

  10. pigflipper
    Shot Master

    Hey Ryan, forget your log in password?

  11. Sup, Celcho!

  12. primesuspect
    The Icrontic Guy
  13. QCH
    Guru

    Bump for an awesome article!!!

  14. David
    Guest

    What is 100GiB?

    The article states you could write 100GiB per day for 5 years before approaching failure.

  15. Buddy J
    Dept. of Propaganda
  16. The article links to this wikipedia entry on one of the pages: http://en.wikipedia.org/wiki/KiB

    Basically, the SI prefixes kilo-, mega-, giga- all refer to powers of 1000. The word "gigabyte" suggests that it's composed of 1000 megabytes. But that's not how storage works, because storage is ACTUALLY based on powers of 1024. A gigabyte is ACTUALLY 1024 megabytes.

    I wanted to be very clear about how much data the drive can write.

    8 bits = 1 byte
    1024 bytes = 1 kibibyte (1KiB)
    1024 kibibytes = 1 mebibyte (1MiB)
    1024 mebibytes = 1 gibibyte (1GiB)

    This discrepancy is why a "250GB" hard drive (which you would think is 250,000 megabytes) shows up as only about 232 gibibytes: the manufacturer counts 250,000,000,000 bytes, but the computer judges values in powers of 1024. So 250,000,000,000 / 1024 / 1024 / 1024 ≈ 232.8GiB, or about 238,419MiB.

    It's confusing and stupid.

