mirror of https://github.com/Qortal/Brooklyn
You can not select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
865 lines
35 KiB
865 lines
35 KiB
=============================== |
|
Adjunct Processor (AP) facility |
|
=============================== |
|
|
|
|
|
Introduction |
|
============ |
|
The Adjunct Processor (AP) facility is an IBM Z cryptographic facility comprised |
|
of three AP instructions and from 1 up to 256 PCIe cryptographic adapter cards. |
|
The AP devices provide cryptographic functions to all CPUs assigned to a |
|
linux system running in an IBM Z system LPAR. |
|
|
|
The AP adapter cards are exposed via the AP bus. The motivation for vfio-ap |
|
is to make AP cards available to KVM guests using the VFIO mediated device |
|
framework. This implementation relies considerably on the s390 virtualization |
|
facilities which do most of the hard work of providing direct access to AP |
|
devices. |
|
|
|
AP Architectural Overview |
|
========================= |
|
To facilitate the comprehension of the design, let's start with some |
|
definitions: |
|
|
|
* AP adapter |
|
|
|
An AP adapter is an IBM Z adapter card that can perform cryptographic |
|
functions. There can be from 0 to 256 adapters assigned to an LPAR. Adapters |
|
assigned to the LPAR in which a linux host is running will be available to |
|
the linux host. Each adapter is identified by a number from 0 to 255; however, |
|
the maximum adapter number is determined by machine model and/or adapter type. |
|
When installed, an AP adapter is accessed by AP instructions executed by any |
|
CPU. |
|
|
|
The AP adapter cards are assigned to a given LPAR via the system's Activation |
|
Profile which can be edited via the HMC. When the linux host system is IPL'd |
|
in the LPAR, the AP bus detects the AP adapter cards assigned to the LPAR and |
|
creates a sysfs device for each assigned adapter. For example, if AP adapters |
|
4 and 10 (0x0a) are assigned to the LPAR, the AP bus will create the following |
|
sysfs device entries:: |
|
|
|
/sys/devices/ap/card04 |
|
/sys/devices/ap/card0a |
|
|
|
Symbolic links to these devices will also be created in the AP bus devices |
|
sub-directory:: |
|
|
|
/sys/bus/ap/devices/[card04] |
|
/sys/bus/ap/devices/[card04] |
|
|
|
* AP domain |
|
|
|
An adapter is partitioned into domains. An adapter can hold up to 256 domains |
|
depending upon the adapter type and hardware configuration. A domain is |
|
identified by a number from 0 to 255; however, the maximum domain number is |
|
determined by machine model and/or adapter type.. A domain can be thought of |
|
as a set of hardware registers and memory used for processing AP commands. A |
|
domain can be configured with a secure private key used for clear key |
|
encryption. A domain is classified in one of two ways depending upon how it |
|
may be accessed: |
|
|
|
* Usage domains are domains that are targeted by an AP instruction to |
|
process an AP command. |
|
|
|
* Control domains are domains that are changed by an AP command sent to a |
|
usage domain; for example, to set the secure private key for the control |
|
domain. |
|
|
|
The AP usage and control domains are assigned to a given LPAR via the system's |
|
Activation Profile which can be edited via the HMC. When a linux host system |
|
is IPL'd in the LPAR, the AP bus module detects the AP usage and control |
|
domains assigned to the LPAR. The domain number of each usage domain and |
|
adapter number of each AP adapter are combined to create AP queue devices |
|
(see AP Queue section below). The domain number of each control domain will be |
|
represented in a bitmask and stored in a sysfs file |
|
/sys/bus/ap/ap_control_domain_mask. The bits in the mask, from most to least |
|
significant bit, correspond to domains 0-255. |
|
|
|
* AP Queue |
|
|
|
An AP queue is the means by which an AP command is sent to a usage domain |
|
inside a specific adapter. An AP queue is identified by a tuple |
|
comprised of an AP adapter ID (APID) and an AP queue index (APQI). The |
|
APQI corresponds to a given usage domain number within the adapter. This tuple |
|
forms an AP Queue Number (APQN) uniquely identifying an AP queue. AP |
|
instructions include a field containing the APQN to identify the AP queue to |
|
which the AP command is to be sent for processing. |
|
|
|
The AP bus will create a sysfs device for each APQN that can be derived from |
|
the cross product of the AP adapter and usage domain numbers detected when the |
|
AP bus module is loaded. For example, if adapters 4 and 10 (0x0a) and usage |
|
domains 6 and 71 (0x47) are assigned to the LPAR, the AP bus will create the |
|
following sysfs entries:: |
|
|
|
/sys/devices/ap/card04/04.0006 |
|
/sys/devices/ap/card04/04.0047 |
|
/sys/devices/ap/card0a/0a.0006 |
|
/sys/devices/ap/card0a/0a.0047 |
|
|
|
The following symbolic links to these devices will be created in the AP bus |
|
devices subdirectory:: |
|
|
|
/sys/bus/ap/devices/[04.0006] |
|
/sys/bus/ap/devices/[04.0047] |
|
/sys/bus/ap/devices/[0a.0006] |
|
/sys/bus/ap/devices/[0a.0047] |
|
|
|
* AP Instructions: |
|
|
|
There are three AP instructions: |
|
|
|
* NQAP: to enqueue an AP command-request message to a queue |
|
* DQAP: to dequeue an AP command-reply message from a queue |
|
* PQAP: to administer the queues |
|
|
|
AP instructions identify the domain that is targeted to process the AP |
|
command; this must be one of the usage domains. An AP command may modify a |
|
domain that is not one of the usage domains, but the modified domain |
|
must be one of the control domains. |
|
|
|
AP and SIE |
|
========== |
|
Let's now take a look at how AP instructions executed on a guest are interpreted |
|
by the hardware. |
|
|
|
A satellite control block called the Crypto Control Block (CRYCB) is attached to |
|
our main hardware virtualization control block. The CRYCB contains three fields |
|
to identify the adapters, usage domains and control domains assigned to the KVM |
|
guest: |
|
|
|
* The AP Mask (APM) field is a bit mask that identifies the AP adapters assigned |
|
to the KVM guest. Each bit in the mask, from left to right (i.e. from most |
|
significant to least significant bit in big endian order), corresponds to |
|
an APID from 0-255. If a bit is set, the corresponding adapter is valid for |
|
use by the KVM guest. |
|
|
|
* The AP Queue Mask (AQM) field is a bit mask identifying the AP usage domains |
|
assigned to the KVM guest. Each bit in the mask, from left to right (i.e. from |
|
most significant to least significant bit in big endian order), corresponds to |
|
an AP queue index (APQI) from 0-255. If a bit is set, the corresponding queue |
|
is valid for use by the KVM guest. |
|
|
|
* The AP Domain Mask field is a bit mask that identifies the AP control domains |
|
assigned to the KVM guest. The ADM bit mask controls which domains can be |
|
changed by an AP command-request message sent to a usage domain from the |
|
guest. Each bit in the mask, from left to right (i.e. from most significant to |
|
least significant bit in big endian order), corresponds to a domain from |
|
0-255. If a bit is set, the corresponding domain can be modified by an AP |
|
command-request message sent to a usage domain. |
|
|
|
If you recall from the description of an AP Queue, AP instructions include |
|
an APQN to identify the AP queue to which an AP command-request message is to be |
|
sent (NQAP and PQAP instructions), or from which a command-reply message is to |
|
be received (DQAP instruction). The validity of an APQN is defined by the matrix |
|
calculated from the APM and AQM; it is the cross product of all assigned adapter |
|
numbers (APM) with all assigned queue indexes (AQM). For example, if adapters 1 |
|
and 2 and usage domains 5 and 6 are assigned to a guest, the APQNs (1,5), (1,6), |
|
(2,5) and (2,6) will be valid for the guest. |
|
|
|
The APQNs can provide secure key functionality - i.e., a private key is stored |
|
on the adapter card for each of its domains - so each APQN must be assigned to |
|
at most one guest or to the linux host:: |
|
|
|
Example 1: Valid configuration: |
|
------------------------------ |
|
Guest1: adapters 1,2 domains 5,6 |
|
Guest2: adapter 1,2 domain 7 |
|
|
|
This is valid because both guests have a unique set of APQNs: |
|
Guest1 has APQNs (1,5), (1,6), (2,5), (2,6); |
|
Guest2 has APQNs (1,7), (2,7) |
|
|
|
Example 2: Valid configuration: |
|
------------------------------ |
|
Guest1: adapters 1,2 domains 5,6 |
|
Guest2: adapters 3,4 domains 5,6 |
|
|
|
This is also valid because both guests have a unique set of APQNs: |
|
Guest1 has APQNs (1,5), (1,6), (2,5), (2,6); |
|
Guest2 has APQNs (3,5), (3,6), (4,5), (4,6) |
|
|
|
Example 3: Invalid configuration: |
|
-------------------------------- |
|
Guest1: adapters 1,2 domains 5,6 |
|
Guest2: adapter 1 domains 6,7 |
|
|
|
This is an invalid configuration because both guests have access to |
|
APQN (1,6). |
|
|
|
The Design |
|
========== |
|
The design introduces three new objects: |
|
|
|
1. AP matrix device |
|
2. VFIO AP device driver (vfio_ap.ko) |
|
3. VFIO AP mediated matrix pass-through device |
|
|
|
The VFIO AP device driver |
|
------------------------- |
|
The VFIO AP (vfio_ap) device driver serves the following purposes: |
|
|
|
1. Provides the interfaces to secure APQNs for exclusive use of KVM guests. |
|
|
|
2. Sets up the VFIO mediated device interfaces to manage a mediated matrix |
|
device and creates the sysfs interfaces for assigning adapters, usage |
|
domains, and control domains comprising the matrix for a KVM guest. |
|
|
|
3. Configures the APM, AQM and ADM in the CRYCB referenced by a KVM guest's |
|
SIE state description to grant the guest access to a matrix of AP devices |
|
|
|
Reserve APQNs for exclusive use of KVM guests |
|
--------------------------------------------- |
|
The following block diagram illustrates the mechanism by which APQNs are |
|
reserved:: |
|
|
|
+------------------+ |
|
7 remove | | |
|
+--------------------> cex4queue driver | |
|
| | | |
|
| +------------------+ |
|
| |
|
| |
|
| +------------------+ +----------------+ |
|
| 5 register driver | | 3 create | | |
|
| +----------------> Device core +----------> matrix device | |
|
| | | | | | |
|
| | +--------^---------+ +----------------+ |
|
| | | |
|
| | +-------------------+ |
|
| | +-----------------------------------+ | |
|
| | | 4 register AP driver | | 2 register device |
|
| | | | | |
|
+--------+---+-v---+ +--------+-------+-+ |
|
| | | | |
|
| ap_bus +--------------------- > vfio_ap driver | |
|
| | 8 probe | | |
|
+--------^---------+ +--^--^------------+ |
|
6 edit | | | |
|
apmask | +-----------------------------+ | 9 mdev create |
|
aqmask | | 1 modprobe | |
|
+--------+-----+---+ +----------------+-+ +----------------+ |
|
| | | |8 create | mediated | |
|
| admin | | VFIO device core |---------> matrix | |
|
| + | | | device | |
|
+------+-+---------+ +--------^---------+ +--------^-------+ |
|
| | | | |
|
| | 9 create vfio_ap-passthrough | | |
|
| +------------------------------+ | |
|
+-------------------------------------------------------------+ |
|
10 assign adapter/domain/control domain |
|
|
|
The process for reserving an AP queue for use by a KVM guest is: |
|
|
|
1. The administrator loads the vfio_ap device driver |
|
2. The vfio-ap driver during its initialization will register a single 'matrix' |
|
device with the device core. This will serve as the parent device for |
|
all mediated matrix devices used to configure an AP matrix for a guest. |
|
3. The /sys/devices/vfio_ap/matrix device is created by the device core |
|
4. The vfio_ap device driver will register with the AP bus for AP queue devices |
|
of type 10 and higher (CEX4 and newer). The driver will provide the vfio_ap |
|
driver's probe and remove callback interfaces. Devices older than CEX4 queues |
|
are not supported to simplify the implementation by not needlessly |
|
complicating the design by supporting older devices that will go out of |
|
service in the relatively near future, and for which there are few older |
|
systems around on which to test. |
|
5. The AP bus registers the vfio_ap device driver with the device core |
|
6. The administrator edits the AP adapter and queue masks to reserve AP queues |
|
for use by the vfio_ap device driver. |
|
7. The AP bus removes the AP queues reserved for the vfio_ap driver from the |
|
default zcrypt cex4queue driver. |
|
8. The AP bus probes the vfio_ap device driver to bind the queues reserved for |
|
it. |
|
9. The administrator creates a passthrough type mediated matrix device to be |
|
used by a guest |
|
10. The administrator assigns the adapters, usage domains and control domains |
|
to be exclusively used by a guest. |
|
|
|
Set up the VFIO mediated device interfaces |
|
------------------------------------------ |
|
The VFIO AP device driver utilizes the common interface of the VFIO mediated |
|
device core driver to: |
|
|
|
* Register an AP mediated bus driver to add a mediated matrix device to and |
|
remove it from a VFIO group. |
|
* Create and destroy a mediated matrix device |
|
* Add a mediated matrix device to and remove it from the AP mediated bus driver |
|
* Add a mediated matrix device to and remove it from an IOMMU group |
|
|
|
The following high-level block diagram shows the main components and interfaces |
|
of the VFIO AP mediated matrix device driver:: |
|
|
|
+-------------+ |
|
| | |
|
| +---------+ | mdev_register_driver() +--------------+ |
|
| | Mdev | +<-----------------------+ | |
|
| | bus | | | vfio_mdev.ko | |
|
| | driver | +----------------------->+ |<-> VFIO user |
|
| +---------+ | probe()/remove() +--------------+ APIs |
|
| | |
|
| MDEV CORE | |
|
| MODULE | |
|
| mdev.ko | |
|
| +---------+ | mdev_register_device() +--------------+ |
|
| |Physical | +<-----------------------+ | |
|
| | device | | | vfio_ap.ko |<-> matrix |
|
| |interface| +----------------------->+ | device |
|
| +---------+ | callback +--------------+ |
|
+-------------+ |
|
|
|
During initialization of the vfio_ap module, the matrix device is registered |
|
with an 'mdev_parent_ops' structure that provides the sysfs attribute |
|
structures, mdev functions and callback interfaces for managing the mediated |
|
matrix device. |
|
|
|
* sysfs attribute structures: |
|
|
|
supported_type_groups |
|
The VFIO mediated device framework supports creation of user-defined |
|
mediated device types. These mediated device types are specified |
|
via the 'supported_type_groups' structure when a device is registered |
|
with the mediated device framework. The registration process creates the |
|
sysfs structures for each mediated device type specified in the |
|
'mdev_supported_types' sub-directory of the device being registered. Along |
|
with the device type, the sysfs attributes of the mediated device type are |
|
provided. |
|
|
|
The VFIO AP device driver will register one mediated device type for |
|
passthrough devices: |
|
|
|
/sys/devices/vfio_ap/matrix/mdev_supported_types/vfio_ap-passthrough |
|
|
|
Only the read-only attributes required by the VFIO mdev framework will |
|
be provided:: |
|
|
|
... name |
|
... device_api |
|
... available_instances |
|
... device_api |
|
|
|
Where: |
|
|
|
* name: |
|
specifies the name of the mediated device type |
|
* device_api: |
|
the mediated device type's API |
|
* available_instances: |
|
the number of mediated matrix passthrough devices |
|
that can be created |
|
* device_api: |
|
specifies the VFIO API |
|
mdev_attr_groups |
|
This attribute group identifies the user-defined sysfs attributes of the |
|
mediated device. When a device is registered with the VFIO mediated device |
|
framework, the sysfs attribute files identified in the 'mdev_attr_groups' |
|
structure will be created in the mediated matrix device's directory. The |
|
sysfs attributes for a mediated matrix device are: |
|
|
|
assign_adapter / unassign_adapter: |
|
Write-only attributes for assigning/unassigning an AP adapter to/from the |
|
mediated matrix device. To assign/unassign an adapter, the APID of the |
|
adapter is echoed to the respective attribute file. |
|
assign_domain / unassign_domain: |
|
Write-only attributes for assigning/unassigning an AP usage domain to/from |
|
the mediated matrix device. To assign/unassign a domain, the domain |
|
number of the usage domain is echoed to the respective attribute |
|
file. |
|
matrix: |
|
A read-only file for displaying the APQNs derived from the cross product |
|
of the adapter and domain numbers assigned to the mediated matrix device. |
|
assign_control_domain / unassign_control_domain: |
|
Write-only attributes for assigning/unassigning an AP control domain |
|
to/from the mediated matrix device. To assign/unassign a control domain, |
|
the ID of the domain to be assigned/unassigned is echoed to the respective |
|
attribute file. |
|
control_domains: |
|
A read-only file for displaying the control domain numbers assigned to the |
|
mediated matrix device. |
|
|
|
* functions: |
|
|
|
create: |
|
allocates the ap_matrix_mdev structure used by the vfio_ap driver to: |
|
|
|
* Store the reference to the KVM structure for the guest using the mdev |
|
* Store the AP matrix configuration for the adapters, domains, and control |
|
domains assigned via the corresponding sysfs attributes files |
|
|
|
remove: |
|
deallocates the mediated matrix device's ap_matrix_mdev structure. This will |
|
be allowed only if a running guest is not using the mdev. |
|
|
|
* callback interfaces |
|
|
|
open: |
|
The vfio_ap driver uses this callback to register a |
|
VFIO_GROUP_NOTIFY_SET_KVM notifier callback function for the mdev matrix |
|
device. The open is invoked when QEMU connects the VFIO iommu group |
|
for the mdev matrix device to the MDEV bus. Access to the KVM structure used |
|
to configure the KVM guest is provided via this callback. The KVM structure, |
|
is used to configure the guest's access to the AP matrix defined via the |
|
mediated matrix device's sysfs attribute files. |
|
release: |
|
unregisters the VFIO_GROUP_NOTIFY_SET_KVM notifier callback function for the |
|
mdev matrix device and deconfigures the guest's AP matrix. |
|
|
|
Configure the APM, AQM and ADM in the CRYCB |
|
------------------------------------------- |
|
Configuring the AP matrix for a KVM guest will be performed when the |
|
VFIO_GROUP_NOTIFY_SET_KVM notifier callback is invoked. The notifier |
|
function is called when QEMU connects to KVM. The guest's AP matrix is |
|
configured via it's CRYCB by: |
|
|
|
* Setting the bits in the APM corresponding to the APIDs assigned to the |
|
mediated matrix device via its 'assign_adapter' interface. |
|
* Setting the bits in the AQM corresponding to the domains assigned to the |
|
mediated matrix device via its 'assign_domain' interface. |
|
* Setting the bits in the ADM corresponding to the domain dIDs assigned to the |
|
mediated matrix device via its 'assign_control_domains' interface. |
|
|
|
The CPU model features for AP |
|
----------------------------- |
|
The AP stack relies on the presence of the AP instructions as well as two |
|
facilities: The AP Facilities Test (APFT) facility; and the AP Query |
|
Configuration Information (QCI) facility. These features/facilities are made |
|
available to a KVM guest via the following CPU model features: |
|
|
|
1. ap: Indicates whether the AP instructions are installed on the guest. This |
|
feature will be enabled by KVM only if the AP instructions are installed |
|
on the host. |
|
|
|
2. apft: Indicates the APFT facility is available on the guest. This facility |
|
can be made available to the guest only if it is available on the host (i.e., |
|
facility bit 15 is set). |
|
|
|
3. apqci: Indicates the AP QCI facility is available on the guest. This facility |
|
can be made available to the guest only if it is available on the host (i.e., |
|
facility bit 12 is set). |
|
|
|
Note: If the user chooses to specify a CPU model different than the 'host' |
|
model to QEMU, the CPU model features and facilities need to be turned on |
|
explicitly; for example:: |
|
|
|
/usr/bin/qemu-system-s390x ... -cpu z13,ap=on,apqci=on,apft=on |
|
|
|
A guest can be precluded from using AP features/facilities by turning them off |
|
explicitly; for example:: |
|
|
|
/usr/bin/qemu-system-s390x ... -cpu host,ap=off,apqci=off,apft=off |
|
|
|
Note: If the APFT facility is turned off (apft=off) for the guest, the guest |
|
will not see any AP devices. The zcrypt device drivers that register for type 10 |
|
and newer AP devices - i.e., the cex4card and cex4queue device drivers - need |
|
the APFT facility to ascertain the facilities installed on a given AP device. If |
|
the APFT facility is not installed on the guest, then the probe of device |
|
drivers will fail since only type 10 and newer devices can be configured for |
|
guest use. |
|
|
|
Example |
|
======= |
|
Let's now provide an example to illustrate how KVM guests may be given |
|
access to AP facilities. For this example, we will show how to configure |
|
three guests such that executing the lszcrypt command on the guests would |
|
look like this: |
|
|
|
Guest1 |
|
------ |
|
=========== ===== ============ |
|
CARD.DOMAIN TYPE MODE |
|
=========== ===== ============ |
|
05 CEX5C CCA-Coproc |
|
05.0004 CEX5C CCA-Coproc |
|
05.00ab CEX5C CCA-Coproc |
|
06 CEX5A Accelerator |
|
06.0004 CEX5A Accelerator |
|
06.00ab CEX5C CCA-Coproc |
|
=========== ===== ============ |
|
|
|
Guest2 |
|
------ |
|
=========== ===== ============ |
|
CARD.DOMAIN TYPE MODE |
|
=========== ===== ============ |
|
05 CEX5A Accelerator |
|
05.0047 CEX5A Accelerator |
|
05.00ff CEX5A Accelerator |
|
=========== ===== ============ |
|
|
|
Guest3 |
|
------ |
|
=========== ===== ============ |
|
CARD.DOMAIN TYPE MODE |
|
=========== ===== ============ |
|
06 CEX5A Accelerator |
|
06.0047 CEX5A Accelerator |
|
06.00ff CEX5A Accelerator |
|
=========== ===== ============ |
|
|
|
These are the steps: |
|
|
|
1. Install the vfio_ap module on the linux host. The dependency chain for the |
|
vfio_ap module is: |
|
* iommu |
|
* s390 |
|
* zcrypt |
|
* vfio |
|
* vfio_mdev |
|
* vfio_mdev_device |
|
* KVM |
|
|
|
To build the vfio_ap module, the kernel build must be configured with the |
|
following Kconfig elements selected: |
|
* IOMMU_SUPPORT |
|
* S390 |
|
* ZCRYPT |
|
* S390_AP_IOMMU |
|
* VFIO |
|
* VFIO_MDEV |
|
* KVM |
|
|
|
If using make menuconfig select the following to build the vfio_ap module:: |
|
|
|
-> Device Drivers |
|
-> IOMMU Hardware Support |
|
select S390 AP IOMMU Support |
|
-> VFIO Non-Privileged userspace driver framework |
|
-> Mediated device driver frramework |
|
-> VFIO driver for Mediated devices |
|
-> I/O subsystem |
|
-> VFIO support for AP devices |
|
|
|
2. Secure the AP queues to be used by the three guests so that the host can not |
|
access them. To secure them, there are two sysfs files that specify |
|
bitmasks marking a subset of the APQN range as 'usable by the default AP |
|
queue device drivers' or 'not usable by the default device drivers' and thus |
|
available for use by the vfio_ap device driver'. The location of the sysfs |
|
files containing the masks are:: |
|
|
|
/sys/bus/ap/apmask |
|
/sys/bus/ap/aqmask |
|
|
|
The 'apmask' is a 256-bit mask that identifies a set of AP adapter IDs |
|
(APID). Each bit in the mask, from left to right (i.e., from most significant |
|
to least significant bit in big endian order), corresponds to an APID from |
|
0-255. If a bit is set, the APID is marked as usable only by the default AP |
|
queue device drivers; otherwise, the APID is usable by the vfio_ap |
|
device driver. |
|
|
|
The 'aqmask' is a 256-bit mask that identifies a set of AP queue indexes |
|
(APQI). Each bit in the mask, from left to right (i.e., from most significant |
|
to least significant bit in big endian order), corresponds to an APQI from |
|
0-255. If a bit is set, the APQI is marked as usable only by the default AP |
|
queue device drivers; otherwise, the APQI is usable by the vfio_ap device |
|
driver. |
|
|
|
Take, for example, the following mask:: |
|
|
|
0x7dffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff |
|
|
|
It indicates: |
|
|
|
1, 2, 3, 4, 5, and 7-255 belong to the default drivers' pool, and 0 and 6 |
|
belong to the vfio_ap device driver's pool. |
|
|
|
The APQN of each AP queue device assigned to the linux host is checked by the |
|
AP bus against the set of APQNs derived from the cross product of APIDs |
|
and APQIs marked as usable only by the default AP queue device drivers. If a |
|
match is detected, only the default AP queue device drivers will be probed; |
|
otherwise, the vfio_ap device driver will be probed. |
|
|
|
By default, the two masks are set to reserve all APQNs for use by the default |
|
AP queue device drivers. There are two ways the default masks can be changed: |
|
|
|
1. The sysfs mask files can be edited by echoing a string into the |
|
respective sysfs mask file in one of two formats: |
|
|
|
* An absolute hex string starting with 0x - like "0x12345678" - sets |
|
the mask. If the given string is shorter than the mask, it is padded |
|
with 0s on the right; for example, specifying a mask value of 0x41 is |
|
the same as specifying:: |
|
|
|
0x4100000000000000000000000000000000000000000000000000000000000000 |
|
|
|
Keep in mind that the mask reads from left to right (i.e., most |
|
significant to least significant bit in big endian order), so the mask |
|
above identifies device numbers 1 and 7 (01000001). |
|
|
|
If the string is longer than the mask, the operation is terminated with |
|
an error (EINVAL). |
|
|
|
* Individual bits in the mask can be switched on and off by specifying |
|
each bit number to be switched in a comma separated list. Each bit |
|
number string must be prepended with a ('+') or minus ('-') to indicate |
|
the corresponding bit is to be switched on ('+') or off ('-'). Some |
|
valid values are: |
|
|
|
- "+0" switches bit 0 on |
|
- "-13" switches bit 13 off |
|
- "+0x41" switches bit 65 on |
|
- "-0xff" switches bit 255 off |
|
|
|
The following example: |
|
|
|
+0,-6,+0x47,-0xf0 |
|
|
|
Switches bits 0 and 71 (0x47) on |
|
|
|
Switches bits 6 and 240 (0xf0) off |
|
|
|
Note that the bits not specified in the list remain as they were before |
|
the operation. |
|
|
|
2. The masks can also be changed at boot time via parameters on the kernel |
|
command line like this: |
|
|
|
ap.apmask=0xffff ap.aqmask=0x40 |
|
|
|
This would create the following masks:: |
|
|
|
apmask: |
|
0xffff000000000000000000000000000000000000000000000000000000000000 |
|
|
|
aqmask: |
|
0x4000000000000000000000000000000000000000000000000000000000000000 |
|
|
|
Resulting in these two pools:: |
|
|
|
default drivers pool: adapter 0-15, domain 1 |
|
alternate drivers pool: adapter 16-255, domains 0, 2-255 |
|
|
|
Securing the APQNs for our example |
|
---------------------------------- |
|
To secure the AP queues 05.0004, 05.0047, 05.00ab, 05.00ff, 06.0004, 06.0047, |
|
06.00ab, and 06.00ff for use by the vfio_ap device driver, the corresponding |
|
APQNs can either be removed from the default masks:: |
|
|
|
echo -5,-6 > /sys/bus/ap/apmask |
|
|
|
echo -4,-0x47,-0xab,-0xff > /sys/bus/ap/aqmask |
|
|
|
Or the masks can be set as follows:: |
|
|
|
echo 0xf9ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff \ |
|
> apmask |
|
|
|
echo 0xf7fffffffffffffffeffffffffffffffffffffffffeffffffffffffffffffffe \ |
|
> aqmask |
|
|
|
This will result in AP queues 05.0004, 05.0047, 05.00ab, 05.00ff, 06.0004, |
|
06.0047, 06.00ab, and 06.00ff getting bound to the vfio_ap device driver. The |
|
sysfs directory for the vfio_ap device driver will now contain symbolic links |
|
to the AP queue devices bound to it:: |
|
|
|
/sys/bus/ap |
|
... [drivers] |
|
...... [vfio_ap] |
|
......... [05.0004] |
|
......... [05.0047] |
|
......... [05.00ab] |
|
......... [05.00ff] |
|
......... [06.0004] |
|
......... [06.0047] |
|
......... [06.00ab] |
|
......... [06.00ff] |
|
|
|
Keep in mind that only type 10 and newer adapters (i.e., CEX4 and later) |
|
can be bound to the vfio_ap device driver. The reason for this is to |
|
simplify the implementation by not needlessly complicating the design by |
|
supporting older devices that will go out of service in the relatively near |
|
future and for which there are few older systems on which to test. |
|
|
|
The administrator, therefore, must take care to secure only AP queues that |
|
can be bound to the vfio_ap device driver. The device type for a given AP |
|
queue device can be read from the parent card's sysfs directory. For example, |
|
to see the hardware type of the queue 05.0004: |
|
|
|
cat /sys/bus/ap/devices/card05/hwtype |
|
|
|
The hwtype must be 10 or higher (CEX4 or newer) in order to be bound to the |
|
vfio_ap device driver. |
|
|
|
3. Create the mediated devices needed to configure the AP matrixes for the |
|
three guests and to provide an interface to the vfio_ap driver for |
|
use by the guests:: |
|
|
|
/sys/devices/vfio_ap/matrix/ |
|
--- [mdev_supported_types] |
|
------ [vfio_ap-passthrough] (passthrough mediated matrix device type) |
|
--------- create |
|
--------- [devices] |
|
|
|
To create the mediated devices for the three guests:: |
|
|
|
uuidgen > create |
|
uuidgen > create |
|
uuidgen > create |
|
|
|
or |
|
|
|
echo $uuid1 > create |
|
echo $uuid2 > create |
|
echo $uuid3 > create |
|
|
|
This will create three mediated devices in the [devices] subdirectory named |
|
after the UUID written to the create attribute file. We call them $uuid1, |
|
$uuid2 and $uuid3 and this is the sysfs directory structure after creation:: |
|
|
|
/sys/devices/vfio_ap/matrix/ |
|
--- [mdev_supported_types] |
|
------ [vfio_ap-passthrough] |
|
--------- [devices] |
|
------------ [$uuid1] |
|
--------------- assign_adapter |
|
--------------- assign_control_domain |
|
--------------- assign_domain |
|
--------------- matrix |
|
--------------- unassign_adapter |
|
--------------- unassign_control_domain |
|
--------------- unassign_domain |
|
|
|
------------ [$uuid2] |
|
--------------- assign_adapter |
|
--------------- assign_control_domain |
|
--------------- assign_domain |
|
--------------- matrix |
|
--------------- unassign_adapter |
|
----------------unassign_control_domain |
|
----------------unassign_domain |
|
|
|
------------ [$uuid3] |
|
--------------- assign_adapter |
|
--------------- assign_control_domain |
|
--------------- assign_domain |
|
--------------- matrix |
|
--------------- unassign_adapter |
|
----------------unassign_control_domain |
|
----------------unassign_domain |
|
|
|
4. The administrator now needs to configure the matrixes for the mediated |
|
devices $uuid1 (for Guest1), $uuid2 (for Guest2) and $uuid3 (for Guest3). |
|
|
|
This is how the matrix is configured for Guest1:: |
|
|
|
echo 5 > assign_adapter |
|
echo 6 > assign_adapter |
|
echo 4 > assign_domain |
|
echo 0xab > assign_domain |
|
|
|
Control domains can similarly be assigned using the assign_control_domain |
|
sysfs file. |
|
|
|
If a mistake is made configuring an adapter, domain or control domain, |
|
you can use the unassign_xxx files to unassign the adapter, domain or |
|
control domain. |
|
|
|
To display the matrix configuration for Guest1:: |
|
|
|
cat matrix |
|
|
|
This is how the matrix is configured for Guest2:: |
|
|
|
echo 5 > assign_adapter |
|
echo 0x47 > assign_domain |
|
echo 0xff > assign_domain |
|
|
|
This is how the matrix is configured for Guest3:: |
|
|
|
echo 6 > assign_adapter |
|
echo 0x47 > assign_domain |
|
echo 0xff > assign_domain |
|
|
|
In order to successfully assign an adapter: |
|
|
|
* The adapter number specified must represent a value from 0 up to the |
|
maximum adapter number configured for the system. If an adapter number |
|
higher than the maximum is specified, the operation will terminate with |
|
an error (ENODEV). |
|
|
|
* All APQNs that can be derived from the adapter ID and the IDs of |
|
the previously assigned domains must be bound to the vfio_ap device |
|
driver. If no domains have yet been assigned, then there must be at least |
|
one APQN with the specified APID bound to the vfio_ap driver. If no such |
|
APQNs are bound to the driver, the operation will terminate with an |
|
error (EADDRNOTAVAIL). |
|
|
|
No APQN that can be derived from the adapter ID and the IDs of the |
|
previously assigned domains can be assigned to another mediated matrix |
|
device. If an APQN is assigned to another mediated matrix device, the |
|
operation will terminate with an error (EADDRINUSE). |
|
|
|
In order to successfully assign a domain: |
|
|
|
* The domain number specified must represent a value from 0 up to the |
|
maximum domain number configured for the system. If a domain number |
|
higher than the maximum is specified, the operation will terminate with |
|
an error (ENODEV). |
|
|
|
* All APQNs that can be derived from the domain ID and the IDs of |
|
the previously assigned adapters must be bound to the vfio_ap device |
|
driver. If no domains have yet been assigned, then there must be at least |
|
one APQN with the specified APQI bound to the vfio_ap driver. If no such |
|
APQNs are bound to the driver, the operation will terminate with an |
|
error (EADDRNOTAVAIL). |
|
|
|
No APQN that can be derived from the domain ID and the IDs of the |
|
previously assigned adapters can be assigned to another mediated matrix |
|
device. If an APQN is assigned to another mediated matrix device, the |
|
operation will terminate with an error (EADDRINUSE). |
|
|
|
In order to successfully assign a control domain, the domain number |
|
specified must represent a value from 0 up to the maximum domain number |
|
configured for the system. If a control domain number higher than the maximum |
|
is specified, the operation will terminate with an error (ENODEV). |
|
|
|
5. Start Guest1:: |
|
|
|
/usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on \ |
|
-device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/$uuid1 ... |
|
|
|
7. Start Guest2:: |
|
|
|
/usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on \ |
|
-device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/$uuid2 ... |
|
|
|
7. Start Guest3:: |
|
|
|
/usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on \ |
|
-device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/$uuid3 ... |
|
|
|
When the guest is shut down, the mediated matrix devices may be removed. |
|
|
|
Using our example again, to remove the mediated matrix device $uuid1:: |
|
|
|
/sys/devices/vfio_ap/matrix/ |
|
--- [mdev_supported_types] |
|
------ [vfio_ap-passthrough] |
|
--------- [devices] |
|
------------ [$uuid1] |
|
--------------- remove |
|
|
|
:: |
|
|
|
echo 1 > remove |
|
|
|
This will remove all of the mdev matrix device's sysfs structures including |
|
the mdev device itself. To recreate and reconfigure the mdev matrix device, |
|
all of the steps starting with step 3 will have to be performed again. Note |
|
that the remove will fail if a guest using the mdev is still running. |
|
|
|
It is not necessary to remove an mdev matrix device, but one may want to |
|
remove it if no guest will use it during the remaining lifetime of the linux |
|
host. If the mdev matrix device is removed, one may want to also reconfigure |
|
the pool of adapters and queues reserved for use by the default drivers. |
|
|
|
Limitations |
|
=========== |
|
* The KVM/kernel interfaces do not provide a way to prevent restoring an APQN |
|
to the default drivers pool of a queue that is still assigned to a mediated |
|
device in use by a guest. It is incumbent upon the administrator to |
|
ensure there is no mediated device in use by a guest to which the APQN is |
|
assigned lest the host be given access to the private data of the AP queue |
|
device such as a private key configured specifically for the guest. |
|
|
|
* Dynamically modifying the AP matrix for a running guest (which would amount to |
|
hot(un)plug of AP devices for the guest) is currently not supported |
|
|
|
* Live guest migration is not supported for guests using AP devices.
|
|
|