=========================================
I915 GuC Submission/DRM Scheduler Section
=========================================

Upstream plan
=============

For upstream the overall plan for landing GuC submission and integrating the
i915 with the DRM scheduler is:

* Merge basic GuC submission

  * Basic submission support for all gen11+ platforms
  * Not enabled by default on any current platforms, but can be enabled via
    the modparam enable_guc
  * Lots of rework will need to be done to integrate with the DRM scheduler,
    so there is no need to nitpick everything in the code; it just should be
    functional, have no major coding style / layering errors, and not regress
    execlists
  * Update IGTs / selftests as needed to work with GuC submission
  * Enable CI on supported platforms for a baseline
  * Rework / get CI healthy for GuC submission in place as needed

* Merge new parallel submission uAPI

  * The bonding uAPI is completely incompatible with GuC submission, plus it
    has severe design issues in general, which is why we want to retire it no
    matter what
  * New uAPI adds an I915_CONTEXT_ENGINES_EXT_PARALLEL context setup step
    which configures a slot with N contexts
  * After I915_CONTEXT_ENGINES_EXT_PARALLEL a user can submit N batches to
    a slot in a single execbuf IOCTL and the batches run on the GPU in
    parallel
  * Initially only for GuC submission, but execlists can be supported if
    needed

* Convert the i915 to use the DRM scheduler

  * GuC submission backend fully integrated with the DRM scheduler

    * All request queues removed from the backend (e.g. all backpressure
      handled in the DRM scheduler)
    * Resets / cancels hook into the DRM scheduler
    * Watchdog hooks into the DRM scheduler
    * Lots of complexity of the GuC backend can be pulled out once
      integrated with the DRM scheduler (e.g. the state machine gets
      simpler, locking gets simpler, etc.)

  * Execlists backend will do the minimum required to hook into the DRM
    scheduler

    * Legacy interface
    * Features like timeslicing / preemption / virtual engines would
      be difficult to integrate with the DRM scheduler, and these
      features are not required for GuC submission as the GuC does
      these things for us
    * ROI is low on fully integrating into the DRM scheduler
    * Fully integrating would add lots of complexity to the DRM
      scheduler

  * Port the i915 priority inheritance / boosting feature into the DRM
    scheduler

    * Used for i915 page flips; may be useful to other DRM drivers as
      well
    * Will be an optional feature in the DRM scheduler

  * Remove in-order completion assumptions from the DRM scheduler

    * Even when using the DRM scheduler the backends will handle
      preemption, timeslicing, etc., so it is possible for jobs to
      finish out of order

  * Pull out i915 priority levels and use DRM priority levels
  * Optimize the DRM scheduler as needed

TODOs for GuC submission upstream
=================================

* Need an update to GuC firmware / i915 to enable error state capture
* Open source tool to decode GuC logs
* Public GuC spec

New uAPI for basic GuC submission
=================================

No major changes are required to the uAPI for basic GuC submission. The only
change is a new scheduler attribute: I915_SCHEDULER_CAP_STATIC_PRIORITY_MAP.
This attribute indicates that the 2k i915 user priority levels are statically
mapped into 3 levels as follows:

* -1k to -1 Low priority
* 0 Medium priority
* 1 to 1k High priority

This is needed because the GuC only has 4 priority bands. The highest priority
band is reserved for the kernel. This mapping also aligns with the DRM
scheduler priority levels.

Spec references:
----------------

* https://www.khronos.org/registry/EGL/extensions/IMG/EGL_IMG_context_priority.txt
* https://www.khronos.org/registry/vulkan/specs/1.2-extensions/html/chap5.html#devsandqueues-priority
* https://spec.oneapi.com/level-zero/latest/core/api.html#ze-command-queue-priority-t

New parallel submission uAPI
============================

The existing bonding uAPI is completely broken with GuC submission because
whether a submission is a single-context or a parallel submit isn't known
until execbuf time, when it is activated via the I915_SUBMIT_FENCE. To submit
multiple contexts in parallel with the GuC, the context must be explicitly
registered with N contexts and all N contexts must be submitted in a single
command to the GuC. The GuC interfaces do not support dynamically changing
between N contexts as the bonding uAPI does, hence the need for a new parallel
submission interface. The legacy bonding uAPI is also quite confusing and not
at all intuitive. Furthermore, I915_SUBMIT_FENCE is by design a future fence,
so it is not really something we should continue to support.

The new parallel submission uAPI consists of 3 parts:

* Export engines logical mapping
* A 'set_parallel' extension to configure contexts for parallel
  submission
* Extend execbuf2 IOCTL to support submitting N BBs in a single IOCTL

Export engines logical mapping
------------------------------

Certain use cases require BBs to be placed on engine instances in logical
order (e.g. split-frame on gen11+). The logical mapping of engine instances
can change based on fusing. Rather than making UMDs aware of fusing, simply
expose the logical mapping with the existing query engine info IOCTL. Also,
the GuC submission interface currently only supports submitting multiple
contexts to engines in logical order, which is a new requirement compared to
execlists. Lastly, all current platforms have at most 2 engine instances, and
the logical order is the same as the uAPI order. This will change on platforms
with more than 2 engine instances.

A single bit will be added to drm_i915_engine_info.flags indicating that the
logical instance has been returned, and a new field,
drm_i915_engine_info.logical_instance, returns the logical instance.

A 'set_parallel' extension to configure contexts for parallel submission
------------------------------------------------------------------------

The 'set_parallel' extension configures a slot for parallel submission of N
BBs. It is a setup step that must be called before using any of the contexts.
See I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE or I915_CONTEXT_ENGINES_EXT_BOND for
similar existing examples. Once a slot is configured for parallel submission,
the execbuf2 IOCTL can be called submitting N BBs in a single IOCTL. Initially
only GuC submission is supported. Execlists support can be added later if
needed.

Add I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT and
drm_i915_context_engines_parallel_submit to the uAPI to implement this
extension.

.. kernel-doc:: Documentation/gpu/rfc/i915_parallel_execbuf.h
        :functions: drm_i915_context_engines_parallel_submit

Extend execbuf2 IOCTL to support submitting N BBs in a single IOCTL
-------------------------------------------------------------------

Contexts that have been configured with the 'set_parallel' extension can only
submit N BBs in a single execbuf2 IOCTL. The BBs are either the last N objects
in the drm_i915_gem_exec_object2 list, or the first N if I915_EXEC_BATCH_FIRST
is set. The number of BBs is implicit based on the slot submitted and how it
has been configured by 'set_parallel' or other extensions. No uAPI changes are
required to the execbuf2 IOCTL.