> we run a fairly comprehensive set of block-level tests using fio, consisting of both sequential and random asynchronous reads and writes straight to the disk. Then we throw a few timed runs of the venerable dd program at it.
Running dd as a benchmark is a major red flag. It shows that they didn't know what they were doing with fio and didn't trust its results. They later started using IOzone and a custom-written tool to accomplish things they should have done with fio in their initial testing.
They also did not mention pre-conditioning the drives or ensuring that their tests ran long enough to reach a steady state. That is one of the most important aspects of enterprise SSD testing, and they would have known it if they'd consulted any outside resources on the subject instead of making up their own testing guidelines from a position of extreme ignorance about the fundamentals of the hardware they were using and the details of their own workload.
They really should stop calling any of their tests "comprehensive".
This is not a comprehensive guide to testing SSDs, it’s the story of what the author went through when trying to test SSDs. It’s well written, and the author seemed to really engage with the topic, describing all the setbacks he hit and the research he did. I did not think he presented himself as an expert, just a software engineer tasked with upgrading their SSDs. And who knows, maybe this was only a 20% time project.
There are a lot of blog posts on HN that have much less actual content and where the authors have much less of a clue, yet often the response is overwhelmingly positive because someone took the time to write it up. You should really be more charitable here.
And to address the calls for “outside experts”: If everyone called in outside experts for everything hardware (or software) related, we software engineers would never get to do anything cool or learn some new framework. We’d just be watching an outside expert do their thing. And outside “experts” are not necessarily better; often they might just sell themselves better. And who is going to check their work if the knowledge is all outsourced?
I think it’s great that the BBC lets their engineers do this and learn along the way, and a place where that is possible sounds like a nice place to work. It’s not like they had any downtime or anything because of this.
You completely misinterpreted (and misquoted) me on that one. I wasn't implying that they should hire a consultant for this kind of thing, but they should at least have bothered to read anything about the methodology used by SSD reviewers or the industry standard storage testing methodology freely published by organizations like SNIA. It's clear the BBC guys didn't even spend an afternoon trying to read up on how to evaluate SSD performance; they just jumped in and started re-inventing the wheel, hitting all the foreseeable problems along the way. It looks like they now have a clue and have learned a lot from the process, but this is not how you should handle this kind of upgrade.
We're acutely aware that we've still got much to learn in this space, so if there are thoughts you have on how we could do better we're all ears.
Finally, while I assured you it wasn't a PR piece we're always looking for engineers in this area (and across the whole BBC) so if you'd be interested in helping us improve, get in touch.
> We also looked up whether our HBA used TRIM in its current configuration. It turns out, in RAID mode, the HBA did not support TRIM. We did do some trim-enabled testing with a different machine, but these results are hard to compare fairly. In any case, we can't currently enable TRIM on our production systems.
In our experience SSD write performance goes to sh*t if you don't regularly TRIM them.
Running fstrim once a day is enough to keep them healthy.
RAID cards not passing TRIM is a big problem for us too...
(Experience from day job at Hosting Provider)
Interesting, is that because of the load? It seems "modern" SSDs have GC good enough that TRIM isn't quite necessary anymore to ensure good performance under consumer loads.
> RAID cards not passing TRIM is a big problem for us too...
Are there NVMe RAID cards? I assume they'd necessarily pass the command along considering deallocate* is just one parameter/option of the DATA SET MANAGEMENT command, or do RAID cards just drop the entire command?
A drive has no way to tell whether the filesystem is using a given block or not; TRIM is the way for the filesystem to tell it. So I would imagine the GC you're referring to is working on the blocks marked with TRIM.
BTW, besides running fstrim from cron on Linux, you can also mount the drive with the discard flag, so the filesystem sends a TRIM command when files are deleted.
There are no hardware RAID solutions for NVMe, though there are now several hardware platforms supporting software RAID for NVMe devices in their motherboard firmware so you can boot from a NVMe RAID array. As with any other RAID solution, translating trim/unmap/deallocate commands takes a bit of effort, and less mature NVMe RAID solutions don't necessarily bother.
Around 15 years ago my company did a Linux distribution on CDs: KRUD. It was updated monthly, and we had something like 400 subscribers. For various reasons we burned these CDs in house on a cluster I built.
We would burn, eject, read and checksum, and if the read test succeeded we would ship it out. We found some users with some discs had problems reading them. We contacted these users and paid them to return the CDs and did further testing on them.
Our initial test used dd, and we found that the discs that were not obviously damaged in shipping would tend to pass tests on some of our CD-ROM drives but fail on others. Even when they did succeed, they would tend to take longer than normal.
I wrote a new test program that instead of using dd directly used SCSI read commands, and timed every one. It would then count the number of reads that were "slow" (like 2x normal) and those that were "really slow" (like 5x), and if these got over a certain threshold we would throw away the disc.
Being able to time the raw operations was incredibly useful, and seems like it could have shown the authors of this paper problems before being deployed to production.
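The pass/fail logic here is essentially threshold counting on per-read latencies. A minimal sketch of that idea in Python, with made-up thresholds (2x and 5x the median read time, and the rejection cutoffs) standing in for whatever the original tool actually used:

```python
# Classify a disc from its per-read timings. All thresholds are
# illustrative assumptions, not the original tool's values.
from statistics import median

def classify_disc(read_times, slow_factor=2.0, very_slow_factor=5.0,
                  max_slow=50, max_very_slow=5):
    """Return True if the disc passes, False if it should be discarded."""
    baseline = median(read_times)  # "normal" read time for this drive
    slow = sum(1 for t in read_times if t > slow_factor * baseline)
    very_slow = sum(1 for t in read_times if t > very_slow_factor * baseline)
    return slow <= max_slow and very_slow <= max_very_slow

# A disc whose reads are uniformly fast passes...
assert classify_disc([0.01] * 1000)
# ...while one with a burst of 6x-slow reads is rejected.
assert not classify_disc([0.01] * 990 + [0.06] * 10)
```

Using the median as the baseline keeps a handful of outliers from dragging "normal" upward, which matters when the whole point is to spot those outliers.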
Except, they didn't really seem to do very thorough testing of the drives. Running stress testing on a 1TB drive for an hour seems pretty short.
Also in my above job we did hosting. We found that if we burned in disks by reading/writing to them 10 times ("badblocks -svw -p 10"), we would almost never experience drive failures on the Hitachi drives we were using. If we didn't do this, the drives would have a fairly high chance of falling out of the RAID array in production.
As drive sizes increased from 20GB to 200GB to 1TB, these tests started taking weeks to complete. But, they were totally worth it.
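Back-of-the-envelope arithmetic shows why those burn-ins stretched into weeks. Assuming a sustained ~100 MB/s (a guess for drives of that era) and that badblocks -w sweeps the whole device once to write and once to read back each of its four test patterns, per pass:

```python
# Rough burn-in duration estimate. Throughput and the pass structure
# (4 patterns, write + read-back each) are assumptions for illustration.
def burn_in_hours(size_bytes, mb_per_s=100, passes=10, patterns=4):
    sweeps = passes * patterns * 2           # full-device write + read sweeps
    seconds = sweeps * size_bytes / (mb_per_s * 1e6)
    return seconds / 3600

print(burn_in_hours(20e9))   # 20 GB drive: a few hours
print(burn_in_hours(1e12))   # 1 TB drive: hundreds of hours, i.e. weeks
```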
Why don't operating systems treat SSDs more like flash memory, and why doesn't the file system cooperate with the underlying hardware instead of pretending it's a disk? For home use that may even work, but in a demanding environment the extra complexity will invariably fail.
This is a genuine question, I'm an amateur here.
1. Each OS that wants to use the drive needs a compatible implementation of the FTL. Consumer systems always have at least two operating systems in play (UEFI counts for these purposes). Enterprise systems are where you will actually find non-boot data-only drives.
2. Flash memory changes. The FTL needs very different parameters depending on whether you're using Toshiba flash or Samsung flash, and even depending on whether you're using last year's Toshiba flash or the stuff they're manufacturing today.
These aren't insurmountable problems, but they're enough to keep such products confined to a small niche. Instead, we're seeing a trend of SSDs accepting optional hints that allow them to perform the kinds of optimizations you'd expect from a fully host-managed SSD. The ATA TRIM command was just the tip of this iceberg.
The simple reason is that SSDs themselves expose a regular HD interface and handle a lot of the flash-memory-related work internally. For example, without TRIM support (which early SSDs did not have) there is no 'erase' command the OS can send to an SSD.
With that in mind, SSDs also have controllers on them that map the blocks the OS sees to actual flash blocks (scattered across the memory chips). So when the OS writes to block 1 it may actually write to block 15 internally on the SSD, and block 2 might land on block 4002. Combine this with caching and other various details on the SSD side, and it leaves little predictable behavior for the OS to exploit.
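That logical-to-physical indirection can be sketched as a toy flash translation layer. Real FTLs are vastly more complex (wear levelling, caching, parallel channels, actual garbage collection), so this is purely illustrative:

```python
# Toy FTL: logical block addresses map to arbitrary physical blocks,
# and every overwrite lands on a fresh physical block (copy-on-write),
# leaving the old copy stale until it is garbage-collected and erased.
class ToyFTL:
    def __init__(self, num_physical):
        self.free = list(range(num_physical))  # erased physical blocks
        self.mapping = {}                      # logical -> physical
        self.blocks = {}                       # physical -> data
        self.stale = []                        # old copies awaiting erase

    def write(self, lba, data):
        if lba in self.mapping:
            self.stale.append(self.mapping[lba])  # old copy is now garbage
        phys = self.free.pop(0)
        self.mapping[lba] = phys
        self.blocks[phys] = data

    def read(self, lba):
        return self.blocks[self.mapping[lba]]

ftl = ToyFTL(num_physical=16)
ftl.write(1, b"a")   # logical block 1 lands on physical block 0
ftl.write(2, b"b")
ftl.write(1, b"c")   # rewriting block 1 uses a *new* physical block
print(ftl.mapping)   # logical 1 no longer points at physical 0
print(ftl.stale)     # the first copy of block 1 is stale
```

The upshot for the OS: even if it knew the physical layout yesterday, today's writes have silently rearranged it.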
If they'd been starting from scratch, while thinking about modern SSDs, it's quite likely they wouldn't have built an application load tester using files containing only dots.
But as it was an existing system, it didn't get the same amount of attention.
Just to confirm, I have seen the behaviour described herein, with write-back caching making an enormous difference with the Samsung EVO product in particular.
https://youtu.be/RqaZTwW_X2o?t=511
For sustained write performance you need the 960 PRO.
For example on a 4kb sync write with 16 threads test, the 960 Evo cannot do more than 1000 iops. In comparison the Intel P4800X (Optane) does friggin 500 000 iops on the same test. That is a 500X difference.
https://forums.servethehome.com/index.php?threads/did-some-w...
I noticed they didn't mention any brands by name though, why is that?
[0] http://www.bbc.co.uk/editorialguidelines/guidelines/editoria...
EDIT: fixed policy name and added link
See, for example:
- The constant mention of speaking to people 'over Skype' on the News
- Publicization of Twitter hashtags on Questiontime and other programs
- Hours worth of Top Gear footage (and the entire Arctic Special) that were effectively Toyota Hilux advertisements
As a public service broadcaster in the UK, the BBC must be very careful about naming specific brands and products due to rules laid out in their legal remit (and fear of legal action if from a party that feels unfairly disadvantaged by a competitor getting a good mention or them getting a bad one).
This is why Blue Peter always uses "sticky backed plastic" instead of "sellotape" or "scotch tape", and why people on BBC shows "vacuum" where the common parlance is "to hoover" (Hoover being a brand name that got verbed, like Google -> to google).
Hang on! Surely, "to google" refers to performing a search specifically with Google, right?
We may be writing software, but without a working knowledge of hardware it's not worth much!