.. SPDX-License-Identifier: GPL-2.0

======================
Memory Protection Keys
======================

Memory Protection Keys provide a mechanism for enforcing page-based
protections, but without requiring modification of the page tables when an
application changes protection domains.

Pkeys Userspace (PKU) is a feature which can be found on:

* Intel server CPUs, Skylake and later
* Intel client CPUs, Tiger Lake (11th Gen Core) and later
* Future AMD CPUs

Pkeys work by dedicating 4 previously reserved bits in each page table entry to
a "protection key", giving 16 possible keys.

Protections for each key are defined with a per-CPU user-accessible register
(PKRU). Each of these is a 32-bit register storing two bits (Access Disable
and Write Disable) for each of 16 keys.

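The PKRU layout described above can be sketched in plain C. This is a
minimal illustration, not kernel code: the names ``PKRU_AD``, ``PKRU_WD``,
``pkru_get_rights()`` and ``pkru_set_rights()`` are hypothetical. It assumes
the architectural layout in which key ``i`` owns bits ``2*i`` (Access
Disable) and ``2*i + 1`` (Write Disable) of the register.

```c
#include <stdint.h>

/* Hypothetical names for the two per-key bits held in PKRU. */
#define PKRU_AD            0x1u  /* Access Disable: blocks reads and writes */
#define PKRU_WD            0x2u  /* Write Disable: blocks writes only */
#define PKRU_BITS_PER_PKEY 2

/* Return the two rights bits that a PKRU value holds for pkey (0..15). */
static inline uint32_t pkru_get_rights(uint32_t pkru, unsigned int pkey)
{
	return (pkru >> (pkey * PKRU_BITS_PER_PKEY)) & (PKRU_AD | PKRU_WD);
}

/* Return a copy of pkru with the rights bits for pkey replaced by rights. */
static inline uint32_t pkru_set_rights(uint32_t pkru, unsigned int pkey,
				       uint32_t rights)
{
	unsigned int shift = pkey * PKRU_BITS_PER_PKEY;

	pkru &= ~((PKRU_AD | PKRU_WD) << shift);          /* clear old bits */
	return pkru | ((rights & (PKRU_AD | PKRU_WD)) << shift);
}
```

These helpers only compute register values. They do not touch the real PKRU,
so they can be used on any machine to reason about which bits a given key
occupies.
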
Being a CPU register, PKRU is inherently thread-local, potentially giving each
thread a different set of protections from every other thread.

There are two instructions (RDPKRU/WRPKRU) for reading and writing to the
register. The feature is only available in 64-bit mode, even though there is
theoretically space in the PAE PTEs. These permissions are enforced on data
access only and have no effect on instruction fetches.

Syscalls
========

There are 3 system calls which directly interact with pkeys::

        int pkey_alloc(unsigned long flags, unsigned long init_access_rights);
        int pkey_free(int pkey);
        int pkey_mprotect(unsigned long start, size_t len,
                          unsigned long prot, int pkey);

Before a pkey can be used, it must first be allocated with
pkey_alloc(). An application calls the WRPKRU instruction
directly in order to change access permissions to memory covered
with a key. In this example WRPKRU is wrapped by a C function
called pkey_set()::

        int real_prot = PROT_READ|PROT_WRITE;
        pkey = pkey_alloc(0, PKEY_DISABLE_WRITE);
        ptr = mmap(NULL, PAGE_SIZE, PROT_NONE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
        ret = pkey_mprotect(ptr, PAGE_SIZE, real_prot, pkey);
        ... application runs here

Now, if the application needs to update the data at 'ptr', it can
gain access, do the update, then remove its write access::

        pkey_set(pkey, 0); // clear PKEY_DISABLE_WRITE
        *ptr = foo; // assign something
        pkey_set(pkey, PKEY_DISABLE_WRITE); // set PKEY_DISABLE_WRITE again

Now when it frees the memory, it will also free the pkey since it
is no longer in use::

        munmap(ptr, PAGE_SIZE);
        pkey_free(pkey);

.. note:: pkey_set() is a wrapper for the RDPKRU and WRPKRU instructions.
          An example implementation can be found in
          tools/testing/selftests/x86/protection_keys.c.

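For orientation, a wrapper of this kind might look roughly like the sketch
below. It is not the selftest implementation: ``rdpkru()``, ``wrpkru()``,
``sketch_pkey_set()`` and ``cpu_has_ospke()`` are hypothetical names, the
"memory" clobber is a conservative choice made here, and the code assumes an
x86 compiler that provides ``<cpuid.h>`` (GCC or Clang). Executing RDPKRU or
WRPKRU on a system where the OS has not enabled protection keys raises an
invalid-opcode fault, so the sketch checks the CPUID OSPKE bit first.

```c
#include <cpuid.h>
#include <stdint.h>

/* CPUID.(EAX=7,ECX=0):ECX bit 4 (OSPKE) is set when the OS enabled pkeys. */
static inline int cpu_has_ospke(void)
{
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
		return 0;
	return (ecx >> 4) & 1;
}

/* RDPKRU (0F 01 EE): needs ECX = 0, returns PKRU in EAX, zeroes EDX. */
static inline uint32_t rdpkru(void)
{
	uint32_t eax, edx;

	__asm__ volatile(".byte 0x0f,0x01,0xee"
			 : "=a"(eax), "=d"(edx) : "c"(0));
	return eax;
}

/* WRPKRU (0F 01 EF): needs ECX = 0 and EDX = 0, writes EAX to PKRU. */
static inline void wrpkru(uint32_t pkru)
{
	__asm__ volatile(".byte 0x0f,0x01,0xef"
			 : : "a"(pkru), "c"(0), "d"(0) : "memory");
}

/* Hypothetical pkey_set(): replace the two rights bits of one key. */
static inline void sketch_pkey_set(int pkey, uint32_t rights)
{
	unsigned int shift = (unsigned int)pkey * 2;
	uint32_t pkru = rdpkru();

	pkru &= ~(0x3u << shift);            /* drop the old AD/WD bits */
	pkru |= (rights & 0x3u) << shift;    /* install the new ones */
	wrpkru(pkru);
}
```

Because the update is a read-modify-write of a thread-local register, it
changes the calling thread's view only; other threads keep their own PKRU
values.
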
Behavior
========

The kernel attempts to make protection keys consistent with the
behavior of a plain mprotect(). For instance if you do this::

        mprotect(ptr, size, PROT_NONE);
        something(ptr);

you can expect the same effects with protection keys when doing this::

        pkey = pkey_alloc(0, PKEY_DISABLE_ACCESS);
        pkey_mprotect(ptr, size, PROT_READ|PROT_WRITE, pkey);
        something(ptr);

That should be true whether something() is a direct access to 'ptr'
like::

        *ptr = foo;

or when the kernel does the access on the application's behalf like
with a read()::

        read(fd, ptr, 1);

The kernel will send a SIGSEGV in both cases, but si_code will be set
to SEGV_PKUERR when violating protection keys versus SEGV_ACCERR when
the plain mprotect() permissions are violated.
