Mechanical hard disk drives (HDDs) and solid state drives (SSDs) are the two most common types of drives used as external storage for computers, and the CPU takes a long time to access the data they store. As the table below shows, reading a random 4 KB from an SSD takes 1,500 times longer than a main memory reference, and a seek on a mechanical disk takes 100,000 times longer.
Operation | Latency |
---|---|
L1 cache reference | 0.5 ns |
Branch mispredict | 5 ns |
L2 cache reference | 7 ns |
Mutex lock/unlock | 25 ns |
Main memory reference | 100 ns |
Compress 1K bytes with Zippy | 3,000 ns |
Send 1K bytes over 1 Gbps network | 10,000 ns |
Read 4K randomly from SSD* | 150,000 ns |
Read 1 MB sequentially from memory | 250,000 ns |
Round trip within same datacenter | 500,000 ns |
Read 1 MB sequentially from SSD* | 1,000,000 ns |
Disk seek | 10,000,000 ns |
Read 1 MB sequentially from disk | 20,000,000 ns |
Send packet CA->Netherlands->CA | 150,000,000 ns |
Although a disk seek takes only 10 ms, that is a very long time from the CPU's point of view. Scaling all of the times above by the same factor makes the difference easier to picture. If the CPU took 1 second to access the L1 cache, it would take about 3 minutes to access main memory, about 3.5 days to read random data from an SSD, more than 7 months for a disk seek, and over 9 years for a packet round trip between California and the Netherlands.
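The scaling above can be reproduced with a few lines of Python. This is a minimal sketch: the latencies are copied from the table, while the helper names `scaled_seconds` and `humanize` are made up for illustration.

```python
# Latencies in nanoseconds, copied from the table above.
LATENCY_NS = {
    "L1 cache reference": 0.5,
    "Main memory reference": 100,
    "Read 4K randomly from SSD": 150_000,
    "Disk seek": 10_000_000,
    "Send packet CA->Netherlands->CA": 150_000_000,
}


def scaled_seconds(latency_ns, base_ns=0.5):
    """Scale a latency so that base_ns (the L1 cache reference) becomes 1 second."""
    return latency_ns / base_ns


def humanize(seconds):
    """Render a duration in the largest unit that fits (years, days, minutes)."""
    for unit, size in (("years", 365 * 86400), ("days", 86400), ("minutes", 60)):
        if seconds >= size:
            return f"{seconds / size:.1f} {unit}"
    return f"{seconds:.1f} seconds"


for name, ns in LATENCY_NS.items():
    print(f"{name:35s} {humanize(scaled_seconds(ns))}")
```

Every latency is divided by the same 0.5 ns baseline, so the ratios between rows are unchanged; only the unit becomes something a person can picture.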
In computer architecture, the hard disk is a common input and output device. The operating system does not need a hard disk to boot: it can boot from a hard disk, a network device, or an external device, so a hard disk is not required for the computer to run.
As an external I/O device, the hard disk is expected to be much slower than the CPU cache and memory. Even so, a gap of thousands or even hundreds of thousands of times is hard to imagine or accept. In this article, we analyze why CPU access to the hard disk is so slow:
- CPU access to hard disk data is a complex process: the data must first be read from the disk into memory through I/O operations, and only then can the CPU access it in memory.
- Mechanical hard disks rely on mechanical structures to access data, which requires moving the mechanical arm inside the drive.
I/O operations
A CPU that wants to access data in the disk must first read the data in the disk into memory through I/O operations and then access the data stored in memory. Computers contain three common types of I/O operations - Programmed I/O, Interrupt-driven I/O, and Direct Memory Access - which we will describe in turn.
With programmed I/O, if we want to output Hello World on the screen, the CPU writes one character to the I/O device at a time. After each write, it polls the device's status and waits for the device to finish before writing the next character. This approach is simple, but it occupies the CPU for the entire operation, which can seriously waste computing resources in complex systems.
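The polling loop can be sketched as a small simulation. Everything here is hypothetical: `FakeDevice` stands in for a real I/O device, and `busy_polls` is an assumed number of status checks the device stays busy after each character.

```python
class FakeDevice:
    """Hypothetical device that stays busy for `busy_polls` status checks per character."""

    def __init__(self, busy_polls=3):
        self.busy_polls = busy_polls
        self.remaining = 0  # status checks left before the device is idle again
        self.output = []

    def ready(self):
        """Status register read: True once the device has finished its last character."""
        if self.remaining > 0:
            self.remaining -= 1
            return False
        return True

    def write(self, ch):
        """Accept one character and become busy again."""
        self.output.append(ch)
        self.remaining = self.busy_polls


def programmed_io(device, text):
    """Write text one character at a time, busy-waiting on the device status."""
    wasted_polls = 0
    for ch in text:
        while not device.ready():
            wasted_polls += 1  # the CPU can do nothing useful here
        device.write(ch)
    return wasted_polls


device = FakeDevice(busy_polls=3)
wasted = programmed_io(device, "Hello World")
print("".join(device.output), "| wasted polls:", wasted)
```

The `wasted_polls` counter is the point of the sketch: every iteration of the inner loop is CPU time spent only on checking whether the device is ready.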
Interrupt-driven I/O is a more efficient way to perform I/O operations. In programmed I/O, the CPU actively polls the device's state and waits for it to become idle. With interrupt-driven I/O, the device raises an interrupt when it becomes idle; the current process is suspended, its context is saved, and the OS runs the I/O device's interrupt handler:
- If there are no characters left to print, the interrupt handler stops and the suspended process resumes.
- If there are characters left to print, the handler copies the next character to the device and the suspended process resumes.
Interrupt-driven I/O lets the CPU handle other tasks while the device is busy, which improves CPU utilization and avoids wasting computing resources on polling. Compared to programmed I/O, it hands part of the work to the I/O device itself.
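The handler logic described above can be sketched as follows. This is a simplified, single-threaded model rather than real interrupt handling: the class `InterruptPrinter` and its methods are hypothetical, and the loop at the bottom simply pretends that the device raises one interrupt each time the CPU has done one unit of other work.

```python
class InterruptPrinter:
    """Hypothetical device model; `on_interrupt` plays the role of the interrupt handler."""

    def __init__(self, text):
        self.pending = list(text)  # characters still waiting to be printed
        self.output = []
        self.interrupts = 0

    def start(self):
        """The CPU writes only the first character, then returns to other work."""
        if self.pending:
            self.output.append(self.pending.pop(0))

    def on_interrupt(self):
        """Run when the device becomes idle. Returns False once nothing is left to print."""
        self.interrupts += 1
        if not self.pending:
            return False  # nothing to print: just resume the suspended process
        self.output.append(self.pending.pop(0))  # copy the next character to the device
        return True


printer = InterruptPrinter("Hello World")
printer.start()
other_work = 0
while True:
    other_work += 1  # the CPU does useful work while the device is busy
    if not printer.on_interrupt():
        break

print("".join(printer.output), "| interrupts:", printer.interrupts)
```

In this model the CPU never spins on a status check, but it still pays for one interrupt per character, which is the cost that DMA aims to reduce.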
Direct memory access uses a DMA controller to perform I/O operations. Interrupt-driven I/O triggers an interrupt for every character, which consumes a certain amount of CPU time. With a DMA controller, the CPU hands all the data in the buffer to the DMA controller at once, and the DMA controller takes care of writing the data to the I/O device character by character.
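A rough sketch of the same transfer under DMA is shown below. The `DmaController` class and its `transfer` method are hypothetical, and the transfer runs synchronously here for simplicity, whereas a real controller would work in parallel with the CPU.

```python
class DmaController:
    """Hypothetical DMA controller that copies a whole buffer to a device."""

    def __init__(self, device_output):
        self.device_output = device_output
        self.interrupts_raised = 0

    def transfer(self, buffer, on_complete):
        # The controller, not the CPU, moves the data character by character.
        for ch in buffer:
            self.device_output.append(ch)
        # A single interrupt signals that the whole buffer has been written.
        self.interrupts_raised += 1
        on_complete()


screen = []
done = []
dma = DmaController(screen)
# The CPU hands over the entire buffer once and is notified once.
dma.transfer("Hello World", on_complete=lambda: done.append(True))

print("".join(screen), "| interrupts:", dma.interrupts_raised)
```

For the same string, the CPU is interrupted once per buffer instead of once per character.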
While the DMA controller frees up the CPU and reduces the number of interrupts, it is slow compared to the CPU. If the DMA controller cannot drive the I/O device fast enough, the CPU may end up waiting for the DMA controller to raise its interrupt, in which case interrupt-driven I/O or programmed I/O can provide faster access.
By default, we use the DMA controller to perform I/O tasks, but programmed I/O and interrupt-driven I/O remain acceptable options. When the CPU often has to wait for the DMA controller, using interrupt-driven I/O or even polled programmed I/O can yield higher throughput. Either way, I/O is a relatively time-consuming and complex operation for a program.
Mechanical Hard Disk Drive
A mechanical hard disk drive (HDD) is an electromechanical, non-volatile data storage device that uses magnetic storage to store and retrieve data. When reading and writing data, the head attached to the drive's mechanical arm reads and writes bits on the surface of the platter.
Because the disk has a relatively complex mechanical structure, reads and writes take a long time, and the read and write performance of a database depends largely on the performance of the disk. Querying a random piece of data in a database stored on a mechanical hard disk may trigger random I/O on the disk, and reading data from the disk into memory is very expensive: an ordinary (non-SSD) disk has to go through queuing, seeking, rotation, and transfer to load the data, which takes about 10 ms in total.
We can use 10 ms as the order of magnitude for one random I/O when estimating database query times. The point is that random I/O can have a very large impact on database query performance. Sequential reads from disk can reach 40 MB/s, a gap of orders of magnitude compared with random I/O, so we should reduce the number of random I/O operations as much as possible to improve performance.
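A back-of-the-envelope estimate makes the gap concrete. The 10 ms and 40 MB/s figures come from the text; the 4 KB page size and the page count are assumptions chosen only for illustration, and the function names are hypothetical.

```python
RANDOM_IO_SECONDS = 0.010      # about 10 ms per random I/O (from the text)
SEQUENTIAL_BYTES_PER_S = 40e6  # 40 MB/s sequential read (from the text)
PAGE_BYTES = 4096              # assumption: each random I/O fetches one 4 KB page


def random_read_seconds(num_pages):
    """Every page costs a full random I/O."""
    return num_pages * RANDOM_IO_SECONDS


def sequential_read_seconds(num_pages):
    """One random I/O to reach the first page, then stream the rest sequentially."""
    return RANDOM_IO_SECONDS + num_pages * PAGE_BYTES / SEQUENTIAL_BYTES_PER_S


pages = 1000  # assumption: a query that touches 1,000 pages
rand = random_read_seconds(pages)
seq = sequential_read_seconds(pages)
print(f"random: {rand:.2f} s, sequential: {seq:.4f} s, ratio: {rand / seq:.0f}x")
```

Under these assumptions the same 1,000 pages take about 10 seconds with random I/O and about 0.11 seconds sequentially. The exact ratio depends on the page size, which is why the number of random I/O operations is the figure to watch.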
A solid state drive (SSD) is a computer storage device that uses flash memory for persistent storage. Unlike mechanical hard drives, SSDs contain no mechanical structure: all reads and writes are performed by circuitry, so SSDs can read and write much faster than HDDs.
Both mechanical hard drives and SSDs have been falling in price since their introduction. Mechanical hard drives are the primary external storage used in data centers today, and most general-purpose commercial servers use them as their main external storage. However, because SSDs can read and write tens of times faster than mechanical hard drives, more and more servers, especially database servers, are using SSDs instead. Mechanical hard drives are a mature technology with large capacity, but because of their mechanical structure they are susceptible to external interference such as vibration.
Summary
The hard drive is an external storage device that can store a large amount of data persistently, but the CPU cannot access the data on it directly. When the computer starts up, the operating system loads data from the hard drive into memory for the CPU to access. If the data the CPU wants is not in memory, reading it takes thousands or even hundreds of thousands of times longer, mainly for two reasons:
- The CPU must access data in external storage through I/O operations, and all of the available methods (programmed I/O, interrupt-driven I/O, and DMA) introduce additional overhead and take up CPU time.
- Mechanical hard drives access their data through mechanical structures, and each random I/O requires queuing, seeking, rotation, and data transfer, which takes about 10 ms.
As mentioned earlier in the article, a hard disk is not required for a computer to run: a computer can load the data needed for booting into memory from any external storage device, such as a disk or a CD, and boot normally. Still, hard disks are the most common external storage device today. Finally, here are some more open-ended related questions for interested readers to think about:
- Must data written to a hard drive be stored persistently and not lost?
- Why is data in memory cleared after a power failure and reboot?