December 22, 2016

Measuring SSD interrupt delays

by Mikhail Karpenko

Sometimes we need to test disks connected to a camera and find out whether a particular model is a good candidate for an in-camera stream recording application. Such disks should not only be fast enough in terms of write speed, but should also have a short ‘response time’ to write commands. This ‘response time’ is the time between a command being sent to the disk and the disk's response that the command has finished. It is related to the total write speed, but it can vary because of processes going on in the internal disk controller. Fluctuations in disk response time can be an important parameter for high-bandwidth streaming applications in embedded systems, because this value makes it possible to estimate the data buffer size needed during recording. It may be less critical for typical PC applications, as modern computers are equipped with large amounts of RAM. We did not find any parameter in the specifications of the disks we had that would give us a hint for the buffer size estimation, so we developed a small test program for this purpose.
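As a rough illustration of how the response time translates into a buffer size, the sketch below assumes the simplest possible model: while the disk is stalled, the stream keeps arriving at a constant rate and all of it has to be held in memory. The function name, the 100 MB/s stream rate and the 0.5 s stall are hypothetical values chosen for the example, not measurements.

```python
def min_buffer_bytes(stream_rate_bytes_per_s: float, worst_delay_s: float) -> float:
    """First-order estimate: data that accumulates while one write command stalls.

    Assumes a constant incoming stream rate and ignores any margin for
    back-to-back stalls, so a real buffer should probably be larger.
    """
    if stream_rate_bytes_per_s < 0 or worst_delay_s < 0:
        raise ValueError("rate and delay must be non-negative")
    return stream_rate_bytes_per_s * worst_delay_s


# Hypothetical example: a 100 MB/s stream and a 0.5 s worst-case stall.
needed = min_buffer_bytes(100e6, 0.5)
print(f"buffer needed: {needed / 1e6:.0f} MB")
```

The worst-case delay, rather than the average response time, drives the result, which is why the fluctuations matter more than the raw write speed.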

This program closely resembles camogm (the in-camera recording program) in its operation. It writes repeating blocks of data containing a counter value and then checks the consistency of the data written. The program works directly with the disk driver and collects some statistics during its operation. The disk driver, among other things, measures the time between two events: when a write command is issued and when the command completion interrupt from the controller is received. This time can be used to measure disk write speed, as the amount of data sent to the disk with each command is also known. In general, this time fluctuates slightly around its average value, given that the amount of data written with each command is almost the same. But long-run tests have shown that sometimes the interrupt return time after a write command can be much longer than the average time.
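The kind of post-processing described above can be sketched as follows. This is not the code of the driver or of the test program; it is a minimal illustration that assumes the per-command times are already available as a list in milliseconds. The names `find_delay_events` and `write_speed_mb_s`, and the rule "a delay is anything longer than `factor` times the median", are assumptions made for the example.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class DelayEvent:
    index: int             # position of the delayed write command in the log
    irq_time_ms: float     # measured command-to-interrupt time
    bytes_since_last: int  # data written since the previous delay event,
                           # including the delayed block itself


def write_speed_mb_s(block_bytes: int, irq_time_ms: float) -> float:
    """Write speed implied by one command: bytes sent / time to completion."""
    return (block_bytes / 1e6) / (irq_time_ms / 1e3)


def find_delay_events(irq_times_ms, block_bytes, factor=5.0):
    """Flag commands whose completion time exceeds factor * median time."""
    if not irq_times_ms:
        return []
    threshold = factor * median(irq_times_ms)
    events, written = [], 0
    for i, t in enumerate(irq_times_ms):
        written += block_bytes
        if t > threshold:
            events.append(DelayEvent(i, t, written))
            written = 0  # start counting again after each delay
    return events


# Hypothetical log: mostly ~12 ms completions with two long stalls.
log = [12.0, 11.5, 12.3, 800.0, 11.9, 12.1, 810.0]
for ev in find_delay_events(log, block_bytes=3_000_000):
    print(ev)
```

The median is used instead of the mean for the threshold so that the rare long stalls do not shift the reference value they are compared against.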

We decided to investigate this situation in a little more detail and tested two SSDs with our test program. The disks used for the tests were a SanDisk SD8SMAT128G1122 and a Crucial CT250MX200SSD6; both were connected to the camera's eSATA port over an M.2 SSD adapter. We had used these disks before, and they demonstrated different performance during recording. We ran camogm_test to write 3 MB blocks of data in cyclic mode. The program collected the delayed interrupt times reported by the driver as well as the amount of data written since the last delay event. The processed results of the test:

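[Figure: crucial-irq-distribution_bars_1 (IRQ time distribution, Crucial CT250MX200SSD6)]
[Figure: sandisk-irq-distribution_bars_1 (IRQ time distribution, SanDisk SD8SMAT128G1122)]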

The actual points of interest on these charts are circled in red; they show the delays that are noticeably different from the average values. Below is the same data in table form:

| Disk | Average IRQ reception time, ms | Standard deviation, ms | Average IRQ delay time, ms | Standard deviation, ms | Data recorded since last IRQ delay, GB | Standard deviation, GB |
|---|---|---|---|---|---|---|
| CT250MX200SSD6 (250 GB) | 11.9 | 1.1 | 804 | 12.7 | 499.7 | 111.7 |
| SD8SMAT128G1122 (128 GB) | 19.3 | 4.8 | 113 | 6.5 | 231.5 | 11.5 |

The delayed interrupt times of these disks are considerably different, although the difference in average interrupt times, which reflect disk write speeds, is not that big. It is interesting to note that the amount of data written to the disk between two consecutive interrupt delays is almost twice the total disk size. smartctl reported an increase in the Runtime_Bad_Block attribute for the CT250MX200SSD6 after each delay, but the delays occurred on different LBAs each time. Unfortunately, the SD8SMAT128G1122 does not have such an attribute in its smartctl output, so it is difficult to compare the two disks by this parameter.
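The "almost twice the total disk size" observation can be checked directly from the table values. The short sketch below only redoes that arithmetic; the dictionary layout and the function name are made up for the example, and the capacities are the nominal 250 GB and 128 GB figures.

```python
# Values copied from the results table above.
disks = {
    "CT250MX200SSD6": {"capacity_gb": 250, "data_between_delays_gb": 499.7},
    "SD8SMAT128G1122": {"capacity_gb": 128, "data_between_delays_gb": 231.5},
}


def capacities_between_delays(data_gb: float, capacity_gb: float) -> float:
    """How many full-disk capacities are written between two delay events."""
    return data_gb / capacity_gb


for name, d in disks.items():
    ratio = capacities_between_delays(d["data_between_delays_gb"], d["capacity_gb"])
    print(f"{name}: {ratio:.2f} disk capacities")
# CT250MX200SSD6: 2.00 disk capacities
# SD8SMAT128G1122: 1.81 disk capacities
```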

