March 14, 2016

AHCI platform driver

by Mikhail Karpenko


In kernels prior to 2.6.x AHCI was only supported through PCI and hence required custom patches to support platform AHCI implementation. All modern kernels have SATA support as part of AHCI framework which significantly simplifies driver development. Platform drivers follow the standard driver model convention which is described in Documentation/driver-model/platform.txt in kernel source tree and provide methods called during discovery or enumeration in their platform_driver structure. This structure is used to register platform driver and is passed to module_platform_driver() helper macro which replaces module_init() and module_exit() functions. We redefined probe() and remove() methods of platform_driver in our driver to initialize/deinitialize resources defined in device tree and allocate/deallocate memory for driver specific structure. We also opted to resource-managed function devm_kzalloc() as it seems to be preferred way of resource allocation in modern drivers. The memory allocated with resource-managed function is associated with the device and will be freed automatically after driver is unloaded.


As Andrey has already pointed out in his post, current implementation of AHCI controller has several limitations and our platform driver is affected by two of them.
First, there is a deviation from AHCI specification which should be considered during platform driver implementation. The specification defines that host bus adapter uses system memory for the Command List Structure, Received FIS Structure and Command Tables. The common approach in platform drivers is to allocate a block of system memory with single dmam_alloc_coherent() call, set pointers to different structures inside this block and store these pointers in port specific structure ahci_port_priv. The first two of these structures in x393_sata are stored in the FPGA RAM blocks and mapped to register memory as it was easier to make them this way. Thus we need to allocate a block of system memory for Command Tables only and set other pointers to predefined addresses.
Second, and the most significant one from the driver’s point of view, proved to be single command slot implemented. Low level drivers assume that all 32 slots in Command List Structure are implemented and explicitly use the last slot for internal commands in ata_exec_internal_sg() function as shown in the following code snippet:
struct ata_queued_cmd *qc;
unsigned int tag, preempted_tag;
if (ap->ops->error_handler)
    tag = 0;
qc = __ata_qc_from_tag(ap, tag);

ATA_TAG_INTERNAL is defined in libata.h and reserved for internal commands. We wanted to keep all the code of our driver in our own sources and make as fewer changes to existing Linux drivers as possible to simplify further development and upgrade to newer kernels. So we decided that substitution of the command tag in our own code which handles command preparation would be the easiest way of fixing this issue.


Proper platform driver initialization requires that several structures to be prepared and passed to platform functions during driver probing. One of them is scsi_host_template and it serves as a direct interface between middle level drivers and low level drivers. Most AHCI drivers use default AHCI_SHT macro to fill the structure with predefined values. This structure contains a field called .can_queue which is of particular interest for us. The .can_queue field sets the maximum number of simultaneous commands the host bus adapter can accept and this is the way to tell middle level drivers that our controller has only one command slot. The scsi_host_template structure was redefined in our driver as follows:
static struct scsi_host_template ahci_platform_sht = {
    .can_queue = 1,
    .sg_tablesize = AHCI_MAX_SG,
    .dma_boundary = AHCI_DMA_BOUNDARY,
    .shost_attrs = ahci_shost_attrs,
    .sdev_attrs = ahci_sdev_attrs,

Unfortunately, ATA layer driver does not take into consideration the value we set in this template and uses hard coded tag value for its internal commands as I pointed out earlier, so we had to fix this in command preparation handler.
ata_port_operations is another important driver structure as it controls how the low level driver interfaces with upper layers. This structure is defined as follows:
static struct ata_port_operations ahci_elphel_ops = {
    .inherits = &ahci_ops,
    .port_start = elphel_port_start,
    .qc_prep = elphel_qc_prep,

The port start and command preparation handlers were redefined to add some implementation specific code. .port_start is used to allocate memory for Command Table and set pointers to Command List Structure and Received FIS Structure. We decided to use streaming DMA mapping instead of coherent DMA mapping used in generic AHCI driver as explained in Andrey’s article. .qc_prep is used to change the tag of current command and organize proper access to DMA mapped buffer.


We used debug code in the driver along with profiling code in the controller to estimate overall performance and found out that upper driver layers introduce significant delays in command execution sequence. The delay between last DMA transaction in a sequence of transactions and next command could be as high as 2 ms. There are various sources of overhead which could lead to delays, for instance, file system operations and context switches in the operating system. We will try to use read/write operations on a raw device to improve performance.


AHCI/SATA stack under GNU GPL
GitHub: AHCI driver source code

Leave a Reply

Your email address will not be published. Required fields are marked *

+ 1 = seven