.. _memory_hotplug:

==============
Memory hotplug
==============

Memory hotplug event notifier
=============================

Hotplugging events are sent to a notification queue.

There are six types of notification defined in ``include/linux/memory.h``:

MEM_GOING_ONLINE
  Generated before new memory becomes available in order to be able to
  prepare subsystems to handle memory. The page allocator is still unable
  to allocate from the new memory.

MEM_CANCEL_ONLINE
  Generated if MEM_GOING_ONLINE fails.

MEM_ONLINE
  Generated when memory has been successfully brought online. The callback
  may allocate pages from the new memory.

MEM_GOING_OFFLINE
  Generated to begin the process of offlining memory. Allocations are no
  longer possible from the memory but some of the memory to be offlined
  is still in use. The callback can be used to free memory known to a
  subsystem from the indicated memory block.

MEM_CANCEL_OFFLINE
  Generated if MEM_GOING_OFFLINE fails. Memory is available again from
  the memory block that we attempted to offline.

MEM_OFFLINE
  Generated after offlining memory is complete.

A callback routine can be registered by calling::

	hotplug_memory_notifier(callback_func, priority)

Callback functions with higher values of priority are called before callback
functions with lower values.

A callback function must have the following prototype::

	int callback_func(
	  struct notifier_block *self, unsigned long action, void *arg);

The first argument of the callback function (self) is a pointer to the block
of the notifier chain that points to the callback function itself.
The second argument (action) is one of the event types described above.
The third argument (arg) is a pointer to a struct memory_notify::

	struct memory_notify {
		unsigned long start_pfn;
		unsigned long nr_pages;
		int status_change_nid_normal;
		int status_change_nid_high;
		int status_change_nid;
	};

- start_pfn is the first page frame number of the memory being onlined or
  offlined.
- nr_pages is the number of pages being onlined or offlined.
- status_change_nid_normal is set to the node id when N_NORMAL_MEMORY of the
  nodemask is (or will be) set or cleared. If this is -1, the nodemask status
  is not changed.
- status_change_nid_high is set to the node id when N_HIGH_MEMORY of the
  nodemask is (or will be) set or cleared. If this is -1, the nodemask status
  is not changed.
- status_change_nid is set to the node id when N_MEMORY of the nodemask is
  (or will be) set or cleared. This means that a new (memoryless) node gets
  its first memory by onlining, or that a node loses all of its memory by
  offlining. If this is -1, the nodemask status is not changed.

If status_change_nid* >= 0, the callback should create/discard structures for
the node if necessary.

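
The handling of status_change_nid described above can be sketched as follows.
This is a user-space model rather than kernel code: the ``MEM_*`` and
``NOTIFY_*`` constants and the two structures are local stand-ins that are
assumed to mirror the kernel definitions so that the example compiles on its
own, and ``MAX_NODES``, ``node_state`` and ``example_mem_callback`` are
hypothetical names chosen for illustration.

.. code-block:: c

    #include <stdbool.h>
    #include <stddef.h>

    /* Stand-ins assumed to mirror include/linux/memory.h. */
    #define MEM_ONLINE          (1 << 0)
    #define MEM_GOING_OFFLINE   (1 << 1)
    #define MEM_OFFLINE         (1 << 2)
    #define MEM_GOING_ONLINE    (1 << 3)
    #define MEM_CANCEL_ONLINE   (1 << 4)
    #define MEM_CANCEL_OFFLINE  (1 << 5)

    /* Stand-ins assumed to mirror include/linux/notifier.h. */
    #define NOTIFY_DONE       0x0000
    #define NOTIFY_OK         0x0001
    #define NOTIFY_STOP_MASK  0x8000
    #define NOTIFY_BAD        (NOTIFY_STOP_MASK | 0x0002)

    struct notifier_block;  /* opaque in this model, unused by the callback */

    struct memory_notify {
        unsigned long start_pfn;
        unsigned long nr_pages;
        int status_change_nid_normal;
        int status_change_nid_high;
        int status_change_nid;
    };

    #define MAX_NODES 4                 /* hypothetical limit */
    bool node_state[MAX_NODES];         /* hypothetical per-node structure */

    int example_mem_callback(struct notifier_block *self,
                             unsigned long action, void *arg)
    {
        struct memory_notify *mn = arg;
        int nid = mn->status_change_nid;

        (void)self;

        if (nid < 0)
            return NOTIFY_OK;   /* N_MEMORY state of no node changes */

        switch (action) {
        case MEM_GOING_ONLINE:
            if (nid >= MAX_NODES)
                return NOTIFY_BAD;      /* cannot track this node: veto */
            node_state[nid] = true;     /* node is about to gain memory */
            break;
        case MEM_CANCEL_ONLINE:         /* onlining failed, undo the setup */
        case MEM_OFFLINE:               /* node has lost all of its memory */
            if (nid < MAX_NODES)
                node_state[nid] = false;
            break;
        default:
            break;
        }
        return NOTIFY_OK;
    }

In this sketch the per-node structure is created on MEM_GOING_ONLINE, so that
it already exists when the memory becomes usable, and it is discarded again on
MEM_CANCEL_ONLINE or MEM_OFFLINE.
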
The callback routine shall return one of the values
NOTIFY_DONE, NOTIFY_OK, NOTIFY_BAD, NOTIFY_STOP
defined in ``include/linux/notifier.h``.

NOTIFY_DONE and NOTIFY_OK have no effect on the further processing.

NOTIFY_BAD is used as a response to the MEM_GOING_ONLINE, MEM_GOING_OFFLINE,
MEM_ONLINE, or MEM_OFFLINE action to cancel hotplugging. It stops
further processing of the notification queue.

NOTIFY_STOP stops further processing of the notification queue.

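
The interaction between callback priority and the return values can be
illustrated with a small user-space model of a notification queue. It is a
sketch under stated assumptions, not the kernel implementation: the
``NOTIFY_*`` values and ``struct notifier_block`` are local stand-ins assumed
to mirror ``include/linux/notifier.h``, and ``model_register``,
``model_call_chain``, ``call_log`` and the three example callbacks are
hypothetical names.

.. code-block:: c

    #include <stddef.h>

    /* Stand-ins assumed to mirror include/linux/notifier.h. */
    #define NOTIFY_DONE       0x0000
    #define NOTIFY_OK         0x0001
    #define NOTIFY_STOP_MASK  0x8000
    #define NOTIFY_BAD        (NOTIFY_STOP_MASK | 0x0002)
    #define NOTIFY_STOP       (NOTIFY_STOP_MASK | NOTIFY_OK)

    struct notifier_block {
        int (*notifier_call)(struct notifier_block *self,
                             unsigned long action, void *arg);
        struct notifier_block *next;
        int priority;
    };

    /* Insert nb so that the list stays sorted by descending priority. */
    void model_register(struct notifier_block **head,
                        struct notifier_block *nb)
    {
        while (*head && (*head)->priority >= nb->priority)
            head = &(*head)->next;
        nb->next = *head;
        *head = nb;
    }

    /* Call each callback in list order; stop at the first return value
     * that carries the stop bit (NOTIFY_BAD or NOTIFY_STOP). */
    int model_call_chain(struct notifier_block *head,
                         unsigned long action, void *arg)
    {
        int ret = NOTIFY_DONE;

        for (; head; head = head->next) {
            ret = head->notifier_call(head, action, arg);
            if (ret & NOTIFY_STOP_MASK)
                break;
        }
        return ret;
    }

    int call_log[8];    /* hypothetical trace of which callbacks ran */
    int call_count;

    int low_prio_cb(struct notifier_block *self, unsigned long action,
                    void *arg)
    {
        (void)self; (void)action; (void)arg;
        call_log[call_count++] = 1;
        return NOTIFY_OK;
    }

    int veto_cb(struct notifier_block *self, unsigned long action,
                void *arg)
    {
        (void)self; (void)action; (void)arg;
        call_log[call_count++] = 2;
        return NOTIFY_BAD;      /* cancels and stops further processing */
    }

    int high_prio_cb(struct notifier_block *self, unsigned long action,
                     void *arg)
    {
        (void)self; (void)action; (void)arg;
        call_log[call_count++] = 3;
        return NOTIFY_OK;
    }

If the three callbacks are registered with priorities 0, 5 and 10
respectively, the model calls ``high_prio_cb`` first and ``veto_cb`` second;
because ``veto_cb`` returns NOTIFY_BAD, ``low_prio_cb`` is never called and
the chain result is NOTIFY_BAD.
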
Locking Internals
=================

When adding/removing memory that uses memory block devices (i.e. ordinary RAM),
the device_hotplug_lock should be held to:

- synchronize against online/offline requests (e.g. via sysfs). This way, memory
  block devices can only be accessed (.online/.state attributes) by user
  space once memory has been fully added. And when removing memory, we
  know nobody is in critical sections.
- synchronize against CPU hotplug and similar (e.g. relevant for ACPI and PPC)

In particular, there is a possible lock inversion that is avoided using
device_hotplug_lock when adding memory and user space tries to online that
memory faster than expected:

- device_online() will first take the device_lock(), followed by
  mem_hotplug_lock
- add_memory_resource() will first take the mem_hotplug_lock, followed by
  the device_lock() (while creating the devices, during bus_add_device()).

As the device is visible to user space before taking the device_lock(), this
can result in a lock inversion.

Onlining/offlining of memory should be done via device_online()/
device_offline() to make sure it is properly synchronized to actions
via sysfs. Holding device_hotplug_lock is advised (to e.g. protect online_type).

When adding/removing/onlining/offlining memory or adding/removing
heterogeneous/device memory, we should always hold the mem_hotplug_lock in
write mode to serialise memory hotplug (e.g. access to global/zone
variables).

In addition, mem_hotplug_lock (in contrast to device_hotplug_lock) in read
mode allows for a quite efficient get_online_mems/put_online_mems
implementation, so code accessing memory can protect from that memory
vanishing.