November 6, 2010

Elphel cameras and 4096 (4K) sector drives – solved

by Andrey Filippov

We were assembling four of the nc353_369_hdd camera kits with internal 1.8″ HDDs. Unfortunately, none of them worked once assembled: the serial console showed unhandled interrupts leading to a kernel panic as soon as we tried to format the disk. The same happened on all four cameras with all four disks, so it could not be a problem with a particular board or drive.

Yes, the disks we purchased, 120GB Toshiba MK1231GAL, were new to us: we had never used them before, but the ones we had tested earlier were no longer available. So we looked at the specs and concluded that they should work: same PATA, same ZIF connectors.

When we found that there was a problem, I started debugging the drivers in the only way I’m familiar with (not counting the oscilloscope): adding a bunch of printk() calls throughout the code to find out where things start to go wrong. I found some errors reported during initialization (shown in dmesg as DriveReady SeekComplete Error). The driver was trying to specify the disk geometry with WIN_SPECIFY, a command that some modern disks do not support; there was even a note about it in the driver comments. Fixing that only made the dmesg output cleaner and did not solve the actual problem. Now everything in the debug output seemed to be OK up to the point where the system tried to read sector number 63, the sector used for the start of the first disk partition in the “DOS compatibility mode” (the default in fdisk). That was the command that caused the error and the unhandled interrupts, with the following debug output:

ow: data 0x20, reg 0x14000000 (0x2) 20->COUNT
ow: data 0x3f, reg 0x16000000 (0x3) 3F->LBAL
ow: data 0x0, reg 0x18000000 (0x4) 00->LBAM
ow: data 0x0, reg 0x1a000000 (0x5) 00->LMAH
ow: data 0xe0, reg 0x1c000000 (0x6) E0->DEVICE
ow: data 0x25, reg 0x1e000000 (0x7) 25->COMMAND "READ DMA EXT"
hda: dma_intr: status=0x80 { Busy }
ide: failed opcode was: unknown
hda: ide_ata_error()
ow: data 0xe1, reg 0x1e000000 (0x7)
ATA timeout reg 0x1e000000 := 0xe1
hda: DMA disabled
ow: data 0xe, reg 0x2c000000 (0x6)
ATA timeout reg 0x2c000000 := 0xe
ow: data 0xa, reg 0x2c000000 (0x6)
ATA timeout reg 0x2c000000 := 0xa
irq 65: nobody cared (try booting with the "irqpoll" option)

and eventually the kernel panic.

That “3F” (63) gave me a hint that the disk cannot read some sectors, and searching the Internet I immediately found that many modern hard drives have 4096-byte physical sectors instead of the old 512-byte ones (LBA still addresses the 512-byte “logical” sectors). In the ATA specs I found that ID word 106 holds the physical sector info, so I looked at the data read from these new drives. The value was 0x6003: the word is valid, each physical sector holds multiple logical sectors, the logical sector is 512 bytes and the physical one is 2^3 = 8 times larger, that is, 4096 bytes. I had never dealt with this problem before, and the 2-page PDF with the drive specs lists a lot of parameters but says nothing about 4K sectors. Maybe this drive came from the future, when nobody remembers that some ancient disks used to have tiny 512-byte sectors, and that parameter is of no importance to the users? The sector size is definitely a concern now, in 2010.
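To make the reading of 0x6003 concrete, here is a minimal userspace sketch of decoding ID word 106 according to the bit layout described above. The type and function names (sector_geometry, decode_word106, physical_sector_bytes) are hypothetical, chosen only for illustration; this is not code from the camera driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Decoded fields of ATA ID word 106 (hypothetical helper type). */
struct sector_geometry {
    bool     valid;            /* bits 15:14 == 01b: the word holds valid data */
    bool     multiple_logical; /* bit 13: several logical sectors per physical */
    bool     long_logical;     /* bit 12: logical sector longer than 256 words */
    unsigned log_per_phys;     /* logical sectors per physical sector */
};

struct sector_geometry decode_word106(uint16_t w)
{
    /* Default: treat the drive as a plain 512-byte-sector device. */
    struct sector_geometry g = { false, false, false, 1 };

    if ((w & 0xc000) != 0x4000)
        return g;                       /* word not valid, keep defaults */

    g.valid = true;
    g.multiple_logical = (w & (1u << 13)) != 0;
    g.long_logical = (w & (1u << 12)) != 0;
    if (g.multiple_logical)
        g.log_per_phys = 1u << (w & 0xf);   /* bits 3:0 hold the exponent */
    return g;
}

/* Physical sector size in bytes, assuming 512-byte logical sectors. */
unsigned physical_sector_bytes(uint16_t w)
{
    struct sector_geometry g = decode_word106(w);

    /* This sketch does not handle drives with long logical sectors. */
    assert(!g.long_logical);
    return 512u * g.log_per_phys;
}
```

For the value read from the Toshiba drives, physical_sector_bytes(0x6003) gives 512 × 8 = 4096, while a word without the 01b signature in bits 15:14 falls back to 512.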

Initially the problem seemed like a major one to me. I tried to follow the recommendations on the Internet (align partitions to physical sectors, format with a block size of 4096, …), but the discussion was rather recent, and the camera runs kernel 2.6.19, not the very latest. That kernel was heavily patched by Axis (the CPU manufacturer and author of arch/cris) and we made a lot of our own modifications on top of that, so upgrading the kernel would be rather difficult; we planned to use the latest one only with the next hardware. As far as I could tell from our computers, the kernel started to report the physical sector size in /sys somewhere between 2.6.29 (which our Ubuntu 9.04 computer has) and 2.6.32 (Ubuntu 9.10), and that was for SCSI disks.
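The alignment recommendation comes down to simple arithmetic: with 512-byte logical sectors, LBA 63 starts at byte 63 × 512 = 32256, which is 7 × 4096 + 3584, so it falls in the middle of a 4096-byte physical sector, while LBA 64 starts exactly on a boundary. A small sketch of that check follows; the helper names lba_is_aligned and align_lba_up are made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LOGICAL_SECTOR_BYTES 512u

/* True if the given LBA starts on a physical sector boundary. */
bool lba_is_aligned(uint64_t lba, unsigned phys_bytes)
{
    /* The physical size must be a whole number of logical sectors. */
    assert(phys_bytes >= LOGICAL_SECTOR_BYTES);
    assert(phys_bytes % LOGICAL_SECTOR_BYTES == 0);

    return lba % (phys_bytes / LOGICAL_SECTOR_BYTES) == 0;
}

/* Round an LBA up to the next physical sector boundary. */
uint64_t align_lba_up(uint64_t lba, unsigned phys_bytes)
{
    uint64_t per_phys;

    assert(phys_bytes >= LOGICAL_SECTOR_BYTES);
    assert(phys_bytes % LOGICAL_SECTOR_BYTES == 0);

    per_phys = phys_bytes / LOGICAL_SECTOR_BYTES;
    return (lba + per_phys - 1) / per_phys * per_phys;
}
```

With these helpers, lba_is_aligned(63, 4096) is false and align_lba_up(63, 4096) returns 64, the nearest aligned start for a first partition on such a drive.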
I looked at the file with the definitions of the ATA registers (hdreg.h) in the latest kernels, and it still had the same gap in the definitions:

unsigned short words104_125[22];/* reserved words 104-125 */

I modified that to include the physical sector definition, following the ATA documentation:

unsigned short words104_105[2]; /* reserved words 104-105 */
unsigned short phys_sector; /* Physical/Logical sector sizes
 * 15: Shall be ZERO
 * 14: Shall be ONE
 * 13: 1 = Device has multiple logical sectors per physical sector.
 * 12: 1 = Device Logical Sector Longer than 256 Words
 * 11:4 Reserved
 * 3:0 2^x logical sectors per physical sector
 */
unsigned short words107_125[19]; /* reserved words 107-125 */
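Splitting a reserved array like this is only safe if the total size stays the same, so that every later field keeps its offset. A sketch of how that could be checked at compile time is shown below; the two wrapper structs (gap_before, gap_after) and the function phys_sector_word_index are hypothetical and exist only to isolate the 22-word span for the check.

```c
#include <assert.h>
#include <stddef.h>

/* The original reserved gap: ID words 104..125. */
struct gap_before {
    unsigned short words104_125[22];
};

/* The same span with word 106 given a name. */
struct gap_after {
    unsigned short words104_105[2];
    unsigned short phys_sector;
    unsigned short words107_125[19];
};

/* 2 + 1 + 19 must still cover exactly 22 words. */
static_assert(sizeof(struct gap_after) == sizeof(struct gap_before),
              "splitting the reserved words must not change the layout");

/* Index of phys_sector in the ID array, given that the gap starts at word 104. */
size_t phys_sector_word_index(void)
{
    return 104 + offsetof(struct gap_after, phys_sector) / sizeof(unsigned short);
}
```

Here phys_sector_word_index() evaluates to 106, matching the word number given in the ATA documentation.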

The fact that this file had not been modified in the latest kernels led me to the wrong conclusion that modern kernels can read the sector size only from SCSI disks. My reasoning was that the SCSI emulation layer for PATA disks would still have to read that info from the physical device using ATA commands, and the file with the ATA definitions had no such field.
Stefan was able to find the code that does in fact read this data. It is in another file (drivers/ata/libata-scsi.c), and it does not use a single header file with register definitions but, in a somewhat “hacker” way I would say, just references “word 106” of the device ID array directly. That was something I did not expect from a mature code base; it can be a nightmare when direct physical addresses are scattered through many thousands of files instead of being defined in a single file for a particular device or standard:

u16 word_106 = dev->id[106];
u16 word_209 = dev->id[209];

if ((word_106 & 0xc000) == 0x4000) {
        /* Number and offset of logical sectors per physical sector */
        if (word_106 & (1 << 13))
                log_per_phys = word_106 & 0xf;
        if ((word_209 & 0xc000) == 0x4000) {
                u16 first = dev->id[209] & 0x3fff;
                if (first > 0)
                        lowest_aligned = (1 << log_per_phys) - first;
        }
}

But anyway, that was not a real solution for the camera (to say nothing of the fact that I failed to find that file myself): the camera uses an older kernel and does not use the SCSI layer for IDE devices. Porting it and making it work would just add an extra software layer in the camera, where the 200MHz CPU is already a bottleneck between the FPGA and the storage/network devices.

So I was ready to give up and buy new (actually old) disks with 512-byte blocks for the cameras when I looked in a book that had often helped me before: Linux Device Drivers, Third Edition, by Jonathan Corbet, Alessandro Rubini, Greg Kroah-Hartman. It said:
… Getting a device with a different sector size to work is not particularly hard; it is just a matter of taking care of a few details…
So I just followed that advice: I used the mentioned blk_queue_hardsect_size() function, looked at an example of its usage in the source of the CD driver (ide-cd.c), and added interpretation of the disk ID word 106 to ide-probe.c. That really was not particularly hard.

With that done, mkfs.ext2 immediately worked, with no more crashes. There were a few minor glitches in the busybox fdisk utility: it was possible to partition a 4K-sector disk with it, but the program suggested an incorrect total number of sectors for the partition size and reported an error if the partition did not end on a “cylinder” boundary. I fixed these, and now the disks attached to the camera can be partitioned with “enter-enter…”. I also made a cosmetic addition to busybox hdparm to report large sectors.

So it seems that Elphel cameras are now 4K-sector ready, starting with the 8.0.9 firmware.

One response to “Elphel cameras and 4096 (4K) sector drives – solved”

  1. Thanks a lot for this in depth kernel hack tutorial. It must have been a nightmare. I’ll have a look at the book myself.
