.. SPDX-License-Identifier: GPL-2.0

=================
KVM VCPU Requests
=================

Overview
========

KVM supports an internal API enabling threads to request a VCPU thread to
perform some activity.  For example, a thread may request a VCPU to flush
its TLB with a VCPU request.  The API consists of the following functions::

  /* Check if any requests are pending for VCPU @vcpu. */
  bool kvm_request_pending(struct kvm_vcpu *vcpu);

  /* Check if VCPU @vcpu has request @req pending. */
  bool kvm_test_request(int req, struct kvm_vcpu *vcpu);

  /* Clear request @req for VCPU @vcpu. */
  void kvm_clear_request(int req, struct kvm_vcpu *vcpu);

  /*
   * Check if VCPU @vcpu has request @req pending. When the request is
   * pending it will be cleared and a memory barrier, which pairs with
   * another in kvm_make_request(), will be issued.
   */
  bool kvm_check_request(int req, struct kvm_vcpu *vcpu);

  /*
   * Make request @req of VCPU @vcpu. Issues a memory barrier, which pairs
   * with another in kvm_check_request(), prior to setting the request.
   */
  void kvm_make_request(int req, struct kvm_vcpu *vcpu);

  /* Make request @req of all VCPUs of the VM with struct kvm @kvm. */
  bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req);

Typically a requester wants the VCPU to perform the activity as soon
as possible after making the request.  This means most requests
(kvm_make_request() calls) are followed by a call to kvm_vcpu_kick(),
and kvm_make_all_cpus_request() has the kicking of all VCPUs built
into it.

VCPU Kicks
----------

The goal of a VCPU kick is to bring a VCPU thread out of guest mode in
order to perform some KVM maintenance.  To do so, an IPI is sent, forcing
a guest mode exit.  However, a VCPU thread may not be in guest mode at the
time of the kick.  Therefore, depending on the mode and state of the VCPU
thread, there are two other actions a kick may take.  All three actions
are listed below:

1) Send an IPI.  This forces a guest mode exit.
2) Wake a sleeping VCPU.  Sleeping VCPUs are VCPU threads outside guest
   mode that wait on waitqueues.  Waking them removes the threads from
   the waitqueues, allowing the threads to run again.  This behavior
   may be suppressed; see KVM_REQUEST_NO_WAKEUP below.
3) Do nothing.  When the VCPU is not in guest mode and the VCPU thread is
   not sleeping, there is nothing to do.
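
The three kick actions listed above can be sketched as a small decision
function.  This is a userspace model for illustration only, not KVM's
implementation: the ``kick_*`` names, the two boolean state fields, and the
``no_wakeup`` parameter are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the three outcomes of a kick. */
enum kick_action { KICK_NOTHING, KICK_WAKE, KICK_IPI };

/* Hypothetical, simplified view of a VCPU thread's state. */
struct kick_vcpu_state {
	bool in_guest_mode;	/* thread is executing guest code */
	bool sleeping;		/* thread is waiting on a waitqueue */
};

/*
 * Pick the action a kick would take.  @no_wakeup models a request that
 * carries a "do not wake sleepers" flag.
 */
static enum kick_action kick_pick_action(const struct kick_vcpu_state *s,
					 bool no_wakeup)
{
	assert(s != NULL);

	if (s->in_guest_mode)
		return KICK_IPI;	/* 1) force a guest mode exit */
	if (s->sleeping && !no_wakeup)
		return KICK_WAKE;	/* 2) take it off the waitqueue */
	return KICK_NOTHING;		/* 3) nothing to do */
}
```

A sleeping VCPU with ``no_wakeup`` set falls through to the third case, which
matches the note that the wakeup may be suppressed.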

VCPU Mode
---------

VCPUs have a mode state, ``vcpu->mode``, that is used to track whether the
guest is running in guest mode or not, as well as some specific
outside guest mode states.  The architecture may use ``vcpu->mode`` to
ensure VCPU requests are seen by VCPUs (see "Ensuring Requests Are Seen"),
as well as to avoid sending unnecessary IPIs (see "IPI Reduction"), and
even to ensure IPI acknowledgements are waited upon (see "Waiting for
Acknowledgements").  The following modes are defined:

OUTSIDE_GUEST_MODE

  The VCPU thread is outside guest mode.

IN_GUEST_MODE

  The VCPU thread is in guest mode.

EXITING_GUEST_MODE

  The VCPU thread is transitioning from IN_GUEST_MODE to
  OUTSIDE_GUEST_MODE.

READING_SHADOW_PAGE_TABLES

  The VCPU thread is outside guest mode, but it wants the sender of
  certain VCPU requests, namely KVM_REQ_TLB_FLUSH, to wait until the VCPU
  thread is done reading the page tables.

VCPU Request Internals
======================

VCPU requests are simply bit indices of the ``vcpu->requests`` bitmap.
This means general bitops, like those documented in [atomic-ops]_, could
also be used, e.g. ::

  clear_bit(KVM_REQ_UNBLOCK & KVM_REQUEST_MASK, &vcpu->requests);

However, VCPU request users should refrain from doing so, as it would
break the abstraction.  The first 8 bits are reserved for architecture
independent requests; all additional bits are available for architecture
dependent requests.

Architecture Independent Requests
---------------------------------

KVM_REQ_TLB_FLUSH

  KVM's common MMU notifier may need to flush all of a guest's TLB
  entries, calling kvm_flush_remote_tlbs() to do so.  Architectures that
  choose to use the common kvm_flush_remote_tlbs() implementation will
  need to handle this VCPU request.

KVM_REQ_VM_DEAD

  This request informs all VCPUs that the VM is dead and unusable, e.g. due
  to a fatal error or because the VM's state has been intentionally
  destroyed.

KVM_REQ_UNBLOCK

  This request informs the vCPU to exit kvm_vcpu_block().  It is used, for
  example, from timer handlers that run on the host on behalf of a vCPU,
  or in order to update the interrupt routing and ensure that assigned
  devices will wake up the vCPU.

KVM_REQ_OUTSIDE_GUEST_MODE

  This "request" ensures the target vCPU has exited guest mode prior to the
  sender of the request continuing on.  No action needs to be taken by the
  target, and so no request is actually logged for the target.  This request
  is similar to a "kick", but unlike a kick it guarantees the vCPU has
  actually exited guest mode.  A kick only guarantees the vCPU will exit at
  some point in the future, e.g. a previous kick may have started the
  process, but there's no guarantee the to-be-kicked vCPU has fully exited
  guest mode.

KVM_REQUEST_MASK
----------------

VCPU requests should be masked by KVM_REQUEST_MASK before using them with
bitops.  This is because only the lower 8 bits are used to represent the
request's number.  The upper bits are used as flags.  Currently only two
flags are defined.

VCPU Request Flags
------------------

KVM_REQUEST_NO_WAKEUP

  This flag is applied to requests that only need immediate attention
  from VCPUs running in guest mode.  That is, sleeping VCPUs do not need
  to be awakened for these requests.  Sleeping VCPUs will handle the
  requests when they are awakened later for some other reason.

KVM_REQUEST_WAIT

  When requests with this flag are made with kvm_make_all_cpus_request(),
  then the caller will wait for each VCPU to acknowledge its IPI before
  proceeding.  This flag only applies to VCPUs that would receive IPIs.
  If, for example, the VCPU is sleeping, so no IPI is necessary, then
  the requesting thread does not wait.  This means that this flag may be
  safely combined with KVM_REQUEST_NO_WAKEUP.  See "Waiting for
  Acknowledgements" for more information about requests with
  KVM_REQUEST_WAIT.
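
The split between request number and flags can be illustrated with a small
userspace sketch.  The ``REQDEMO_*`` names are hypothetical, and the flag bit
positions (bits 8 and 9) are an assumption made for the example; the text
above only states that the lower 8 bits hold the request number and the upper
bits hold flags.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Assumed layout for illustration: number in bits 0-7, flags above. */
#define REQDEMO_MASK		0xffu
#define REQDEMO_NO_WAKEUP	(1u << 8)
#define REQDEMO_WAIT		(1u << 9)

/* Hypothetical request: number 0, waits for acks, does not wake sleepers. */
#define REQDEMO_EXAMPLE		(0u | REQDEMO_WAIT | REQDEMO_NO_WAKEUP)

/* Strip the flags so that only the bit index remains. */
static unsigned int reqdemo_nr(unsigned int req)
{
	return req & REQDEMO_MASK;
}

/* Bit to set in a requests bitmap word for @req. */
static unsigned long reqdemo_bit(unsigned int req)
{
	unsigned int nr = reqdemo_nr(req);

	assert(nr < sizeof(unsigned long) * CHAR_BIT);
	return 1UL << nr;
}

static bool reqdemo_no_wakeup(unsigned int req)
{
	return (req & REQDEMO_NO_WAKEUP) != 0;
}

static bool reqdemo_wait(unsigned int req)
{
	return (req & REQDEMO_WAIT) != 0;
}
```

Passing an unmasked value such as ``REQDEMO_EXAMPLE`` directly to a bitop
would address bit 768 rather than bit 0, which is why masking comes first.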

VCPU Requests with Associated State
===================================

Requesters that want the receiving VCPU to handle new state need to ensure
the newly written state is observable to the receiving VCPU thread's CPU
by the time it observes the request.  This means a write memory barrier
must be inserted after writing the new state and before setting the VCPU
request bit.  Additionally, on the receiving VCPU thread's side, a
corresponding read barrier must be inserted after reading the request bit
and before proceeding to read the new state associated with it.  See
scenario 3, Message and Flag, of [lwn-mb]_ and the kernel documentation
[memory-barriers]_.

The pair of functions kvm_check_request() and kvm_make_request() provides
the memory barriers, allowing this requirement to be handled internally by
the API.
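
The message-and-flag ordering described above can be modelled in userspace
with C11 atomics.  This is a sketch under stated assumptions, not the kernel
code: the ``msgflag_*`` names and the ``new_state`` payload are hypothetical,
and release/acquire orderings stand in for the kernel's write and read
barriers.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical VCPU model: a payload plus a one-word request bitmap. */
struct msgflag_vcpu {
	int new_state;		/* state associated with a request */
	atomic_ulong requests;	/* one bit per request number */
};

/* Requester side: publish the payload first, then set the request bit. */
static void msgflag_make_request(struct msgflag_vcpu *vcpu, unsigned int nr)
{
	assert(nr < sizeof(unsigned long) * CHAR_BIT);
	/* Release: earlier payload writes are ordered before the bit. */
	atomic_fetch_or_explicit(&vcpu->requests, 1UL << nr,
				 memory_order_release);
}

/* VCPU side: returns true, and clears the bit, if @nr was pending. */
static bool msgflag_check_request(struct msgflag_vcpu *vcpu, unsigned int nr)
{
	unsigned long bit;

	assert(nr < sizeof(unsigned long) * CHAR_BIT);
	bit = 1UL << nr;
	if (!(atomic_load_explicit(&vcpu->requests,
				   memory_order_relaxed) & bit))
		return false;
	/* Acquire: later payload reads are ordered after seeing the bit. */
	atomic_fetch_and_explicit(&vcpu->requests, ~bit,
				  memory_order_acquire);
	return true;
}
```

A requester would write ``new_state`` and then call ``msgflag_make_request()``;
the VCPU thread reads ``new_state`` only after ``msgflag_check_request()``
returns true.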

Ensuring Requests Are Seen
==========================

When making requests to VCPUs, we want to avoid the receiving VCPU
executing in guest mode for an arbitrarily long time without handling the
request.  We can be sure this won't happen as long as we ensure the VCPU
thread checks kvm_request_pending() before entering guest mode and that a
kick will send an IPI to force an exit from guest mode when necessary.
Extra care must be taken to cover the period after the VCPU thread's last
kvm_request_pending() check and before it has entered guest mode, as kick
IPIs will only trigger guest mode exits for VCPU threads that are in guest
mode or at least have already disabled interrupts in order to prepare to
enter guest mode.  This means that an optimized implementation (see "IPI
Reduction") must be certain when it's safe to not send the IPI.  One
solution, which all architectures except s390 apply, is to:

- set ``vcpu->mode`` to IN_GUEST_MODE between disabling the interrupts and
  the last kvm_request_pending() check;
- enable interrupts atomically when entering the guest.

This solution also requires memory barriers to be placed carefully in both
the requesting thread and the receiving VCPU.  With the memory barriers we
can exclude the possibility of a VCPU thread observing
!kvm_request_pending() on its last check and then not receiving an IPI for
the next request made of it, even if the request is made immediately after
the check.  This is done by way of the Dekker memory barrier pattern
(scenario 10 of [lwn-mb]_).  As the Dekker pattern requires two variables,
this solution pairs ``vcpu->mode`` with ``vcpu->requests``.  Substituting
them into the pattern gives::

  CPU1                                    CPU2
  =================                       =================
  local_irq_disable();
  WRITE_ONCE(vcpu->mode, IN_GUEST_MODE);  kvm_make_request(REQ, vcpu);
  smp_mb();                               smp_mb();
  if (kvm_request_pending(vcpu)) {        if (READ_ONCE(vcpu->mode) ==
                                              IN_GUEST_MODE) {
      ...abort guest entry...                 ...send IPI...
  }                                       }

As stated above, the IPI is only useful for VCPU threads in guest mode or
that have already disabled interrupts.  This is why this specific case of
the Dekker pattern has been extended to disable interrupts before setting
``vcpu->mode`` to IN_GUEST_MODE.  WRITE_ONCE() and READ_ONCE() are used to
pedantically implement the memory barrier pattern, guaranteeing the
compiler doesn't interfere with ``vcpu->mode``'s carefully planned
accesses.
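
The two sides of the pattern can be modelled in userspace with C11 atomics.
This is only a sketch: the ``dekker_*`` names are hypothetical, a sequentially
consistent fence stands in for smp_mb(), and interrupt disabling is not
modelled at all.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

enum dekker_mode { DEKKER_OUTSIDE_GUEST_MODE, DEKKER_IN_GUEST_MODE };

/* Hypothetical VCPU model holding the two Dekker variables. */
struct dekker_vcpu {
	atomic_int mode;
	atomic_ulong requests;
};

/* CPU1: returns true if guest entry may proceed, false if it aborts. */
static bool dekker_try_enter_guest(struct dekker_vcpu *vcpu)
{
	/* A real implementation disables interrupts before this store. */
	atomic_store_explicit(&vcpu->mode, DEKKER_IN_GUEST_MODE,
			      memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* models smp_mb() */
	if (atomic_load_explicit(&vcpu->requests,
				 memory_order_relaxed) != 0) {
		/* A request is pending: abort the entry. */
		atomic_store_explicit(&vcpu->mode, DEKKER_OUTSIDE_GUEST_MODE,
				      memory_order_relaxed);
		return false;
	}
	return true;
}

/* CPU2: sets the request bit; returns true if an IPI must be sent. */
static bool dekker_make_request(struct dekker_vcpu *vcpu, unsigned int nr)
{
	assert(nr < sizeof(unsigned long) * CHAR_BIT);
	atomic_fetch_or_explicit(&vcpu->requests, 1UL << nr,
				 memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* models smp_mb() */
	return atomic_load_explicit(&vcpu->mode, memory_order_relaxed) ==
	       DEKKER_IN_GUEST_MODE;
}
```

Because each side writes its own variable, issues a full fence, and then
reads the other variable, at least one side observes the other's write:
either the entry aborts or the requester sends the IPI.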

IPI Reduction
-------------

As only one IPI is needed to get a VCPU to check for any/all requests,
IPIs may be coalesced.  This is easily done by having the first
IPI-sending kick also change the VCPU mode to something !IN_GUEST_MODE.
The transitional state, EXITING_GUEST_MODE, is used for this purpose.
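
The coalescing described above can be sketched with a compare-and-exchange
on the mode.  This is a userspace model with hypothetical ``ipidemo_*``
names; it only illustrates why later kicks find the VCPU already out of
IN_GUEST_MODE and so skip the IPI.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

enum ipidemo_mode {
	IPIDEMO_OUTSIDE_GUEST_MODE,
	IPIDEMO_IN_GUEST_MODE,
	IPIDEMO_EXITING_GUEST_MODE,
};

/* Hypothetical VCPU model tracking only the mode. */
struct ipidemo_vcpu {
	atomic_int mode;
};

/*
 * Returns true only for the kick that moved the VCPU from IN_GUEST_MODE
 * to EXITING_GUEST_MODE; that kick is the one that sends the IPI.
 */
static bool ipidemo_kick_should_send_ipi(struct ipidemo_vcpu *vcpu)
{
	int expected = IPIDEMO_IN_GUEST_MODE;

	assert(vcpu != NULL);
	return atomic_compare_exchange_strong(&vcpu->mode, &expected,
					      IPIDEMO_EXITING_GUEST_MODE);
}
```

Two back-to-back kicks of a VCPU in guest mode therefore yield a single IPI:
the first call succeeds and the second sees EXITING_GUEST_MODE.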

Waiting for Acknowledgements
----------------------------

Some requests, those with the KVM_REQUEST_WAIT flag set, require IPIs to
be sent, and the acknowledgements to be waited upon, even when the target
VCPU threads are in modes other than IN_GUEST_MODE.  For example, one case
is when a target VCPU thread is in READING_SHADOW_PAGE_TABLES mode, which
is set after disabling interrupts.  To support these cases, the
KVM_REQUEST_WAIT flag changes the condition for sending an IPI from
checking that the VCPU is IN_GUEST_MODE to checking that it is not
OUTSIDE_GUEST_MODE.
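
The changed condition can be written as a small predicate.  This is a
userspace sketch with hypothetical ``ackdemo_*`` names, and the flag's bit
position is an assumption made for the example.

```c
#include <assert.h>
#include <stdbool.h>

enum ackdemo_mode {
	ACKDEMO_OUTSIDE_GUEST_MODE,
	ACKDEMO_IN_GUEST_MODE,
	ACKDEMO_EXITING_GUEST_MODE,
	ACKDEMO_READING_SHADOW_PAGE_TABLES,
};

/* Assumed flag position, for illustration only. */
#define ACKDEMO_REQUEST_WAIT	(1u << 9)

/* Decide whether a request made to a VCPU in @mode needs an IPI. */
static bool ackdemo_needs_ipi(enum ackdemo_mode mode, unsigned int req)
{
	assert(mode <= ACKDEMO_READING_SHADOW_PAGE_TABLES);

	/* Waiting requests: IPI anything that is not fully outside. */
	if (req & ACKDEMO_REQUEST_WAIT)
		return mode != ACKDEMO_OUTSIDE_GUEST_MODE;

	/* Ordinary requests: IPI only VCPUs that are in guest mode. */
	return mode == ACKDEMO_IN_GUEST_MODE;
}
```

With the flag set, a VCPU in READING_SHADOW_PAGE_TABLES mode receives an IPI;
without it, the same VCPU does not.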

Request-less VCPU Kicks
-----------------------

As the determination of whether or not to send an IPI depends on the
two-variable Dekker memory barrier pattern, request-less VCPU kicks are
almost never correct.  Without the assurance that a non-IPI generating
kick will still result in an action by the receiving VCPU, as the final
kvm_request_pending() check does for request-accompanying kicks, the kick
may not do anything useful at all.  If, for instance, a request-less kick
was made to a VCPU that was just about to set its mode to IN_GUEST_MODE,
meaning no IPI is sent, then the VCPU thread may continue its entry
without actually having done whatever it was the kick was meant to
initiate.

One exception is x86's posted interrupt mechanism.  In this case, however,
even the request-less VCPU kick is coupled with the same
local_irq_disable() + smp_mb() pattern described above; the ON bit
(Outstanding Notification) in the posted interrupt descriptor takes the
role of ``vcpu->requests``.  When sending a posted interrupt, PIR.ON is
set before reading ``vcpu->mode``; dually, in the VCPU thread,
vmx_sync_pir_to_irr() reads PIR after setting ``vcpu->mode`` to
IN_GUEST_MODE.

Additional Considerations
=========================

Sleeping VCPUs
--------------

VCPU threads may need to consider requests before and/or after calling
functions that may put them to sleep, e.g. kvm_vcpu_block().  Whether they
do or not, and, if they do, which requests need consideration, is
architecture dependent.  kvm_vcpu_block() calls kvm_arch_vcpu_runnable()
to check if it should awaken.  One reason to do so is to provide
architectures a function where requests may be checked if necessary.

References
==========

.. [atomic-ops] Documentation/atomic_bitops.txt and Documentation/atomic_t.txt
.. [memory-barriers] Documentation/memory-barriers.txt
.. [lwn-mb] https://lwn.net/Articles/573436/