.. SPDX-License-Identifier: GPL-2.0

==============================
Network Filesystem Caching API
==============================

Fscache provides an API by which a network filesystem can make use of local
caching facilities. The API is arranged around a number of principles:

 (1) A cache is logically organised into volumes and data storage objects
     within those volumes.

 (2) Volumes and data storage objects are represented by various types of
     cookie.

 (3) Cookies have keys that distinguish them from their peers.

 (4) Cookies have coherency data that allows a cache to determine if the
     cached data is still valid.

 (5) I/O is done asynchronously where possible.

To use this API, include::

	#include <linux/fscache.h>

.. This document contains the following sections:

	 (1) Overview
	 (2) Volume registration
	 (3) Data file registration
	 (4) Declaring a cookie to be in use
	 (5) Resizing a data file (truncation)
	 (6) Data I/O API
	 (7) Data file coherency
	 (8) Data file invalidation
	 (9) Write back resource management
	 (10) Caching of local modifications
	 (11) Page release and invalidation

Overview
========

The fscache hierarchy is organised on two levels from a network filesystem's
point of view. The upper level represents "volumes" and the lower level
represents "data storage objects". These are represented by two types of
cookie, hereafter referred to as "volume cookies" and "cookies".

A network filesystem acquires a volume cookie for a volume using a volume key,
which represents all the information that defines that volume (e.g. cell name
or server address, volume ID or share name). This must be rendered as a
printable string that can be used as a directory name (i.e. it must contain no
'/' characters and shouldn't begin with a '.'). The maximum name length is one
less than the maximum size of a filename component (allowing the cache backend
one char for its own purposes).

A filesystem would typically have a volume cookie for each superblock.

The filesystem then acquires a cookie for each file within that volume using an
object key. Object keys are binary blobs and only need to be unique within
their parent volume. The cache backend is responsible for rendering the binary
blob into something it can use and may employ hash tables, trees or whatever to
improve its ability to find an object. This is transparent to the network
filesystem.

A filesystem would typically have a cookie for each inode, and would acquire it
in iget and relinquish it when evicting the inode.

Once it has a cookie, the filesystem needs to mark the cookie as being in use.
This causes fscache to send the cache backend off to look up/create resources
for the cookie in the background, to check its coherency and, if necessary, to
mark the object as being under modification.

A filesystem would typically "use" the cookie in its file open routine and
unuse it in file release, and it needs to use the cookie around calls to
truncate the file locally. It *also* needs to use the cookie when the
pagecache becomes dirty and unuse it when writeback is complete. This is
slightly tricky, and provision is made for it.

When performing a read, write or resize on a cookie, the filesystem must first
begin an operation. This copies the resources into a holding struct and puts
extra pins into the cache to stop cache withdrawal from tearing down the
structures being used. The actual operation can then be issued and conflicting
invalidations can be detected upon completion.

The filesystem is expected to use netfslib to access the cache, but that's not
actually required and it can use the fscache I/O API directly.

Volume Registration
===================

The first step for a network filesystem is to acquire a volume cookie for the
volume it wants to access::

	struct fscache_volume *
	fscache_acquire_volume(const char *volume_key,
			       const char *cache_name,
			       const void *coherency_data,
			       size_t coherency_len);

This function creates a volume cookie with the specified volume key as its name
and notes the coherency data.

The volume key must be a printable string with no '/' characters in it. It
should begin with the name of the filesystem and should be no longer than 254
characters. It should uniquely represent the volume and will be matched with
what's stored in the cache.
  79. should begin with the name of the filesystem and should be no longer than 254
  80. characters. It should uniquely represent the volume and will be matched with
  81. what's stored in the cache.
  82. The caller may also specify the name of the cache to use. If specified,
  83. fscache will look up or create a cache cookie of that name and will use a cache
  84. of that name if it is online or comes online. If no cache name is specified,
  85. it will use the first cache that comes to hand and set the name to that.
  86. The specified coherency data is stored in the cookie and will be matched
  87. against coherency data stored on disk. The data pointer may be NULL if no data
  88. is provided. If the coherency data doesn't match, the entire cache volume will
  89. be invalidated.
  90. This function can return errors such as EBUSY if the volume key is already in
  91. use by an acquired volume or ENOMEM if an allocation failure occurred. It may
  92. also return a NULL volume cookie if fscache is not enabled. It is safe to
  93. pass a NULL cookie to any function that takes a volume cookie. This will
  94. cause that function to do nothing.
  95. When the network filesystem has finished with a volume, it should relinquish it
  96. by calling::
  97. void fscache_relinquish_volume(struct fscache_volume *volume,
  98. const void *coherency_data,
  99. bool invalidate);
  100. This will cause the volume to be committed or removed, and if sealed the
  101. coherency data will be set to the value supplied. The amount of coherency data
  102. must match the length specified when the volume was acquired. Note that all
  103. data cookies obtained in this volume must be relinquished before the volume is
  104. relinquished.

Data File Registration
======================

Once it has a volume cookie, a network filesystem can use it to acquire a
cookie for data storage::

	struct fscache_cookie *
	fscache_acquire_cookie(struct fscache_volume *volume,
			       u8 advice,
			       const void *index_key,
			       size_t index_key_len,
			       const void *aux_data,
			       size_t aux_data_len,
			       loff_t object_size);

This creates the cookie in the volume using the specified index key. The index
key is a binary blob of the given length and must be unique for the volume.
This is saved into the cookie. There are no restrictions on the content, but
its length shouldn't exceed about three quarters of the maximum filename length
to allow for encoding.
  119. This is saved into the cookie. There are no restrictions on the content, but
  120. its length shouldn't exceed about three quarters of the maximum filename length
  121. to allow for encoding.
  122. The caller should also pass in a piece of coherency data in aux_data. A buffer
  123. of size aux_data_len will be allocated and the coherency data copied in. It is
  124. assumed that the size is invariant over time. The coherency data is used to
  125. check the validity of data in the cache. Functions are provided by which the
  126. coherency data can be updated.
  127. The file size of the object being cached should also be provided. This may be
  128. used to trim the data and will be stored with the coherency data.
  129. This function never returns an error, though it may return a NULL cookie on
  130. allocation failure or if fscache is not enabled. It is safe to pass in a NULL
  131. volume cookie and pass the NULL cookie returned to any function that takes it.
  132. This will cause that function to do nothing.
  133. When the network filesystem has finished with a cookie, it should relinquish it
  134. by calling::
  135. void fscache_relinquish_cookie(struct fscache_cookie *cookie,
  136. bool retire);
  137. This will cause fscache to either commit the storage backing the cookie or
  138. delete it.

Marking A Cookie In-Use
=======================

Once a cookie has been acquired by a network filesystem, the filesystem should
tell fscache when it intends to use the cookie (typically done on file open)
and should say when it has finished with it (typically on file close)::

	void fscache_use_cookie(struct fscache_cookie *cookie,
				bool will_modify);
	void fscache_unuse_cookie(struct fscache_cookie *cookie,
				  const void *aux_data,
				  const loff_t *object_size);

The *use* function tells fscache that it will use the cookie and, additionally,
indicates if the user is intending to modify the contents locally. If not yet
done, this will trigger the cache backend to go and gather the resources it
needs to access/store data in the cache. This is done in the background, and
so may not be complete by the time the function returns.

The *unuse* function indicates that a filesystem has finished using a cookie.
It optionally updates the stored coherency data and object size and then
decreases the in-use counter. When the last user unuses the cookie, it is
scheduled for garbage collection. If not reused within a short time, the
resources will be released to reduce system resource consumption.

A cookie must be marked in-use before it can be accessed for read, write or
resize, and an in-use mark must be kept whilst there is dirty data in the
pagecache in order to avoid an oops due to trying to open a file during process
exit.

Note that in-use marks are cumulative: each time a cookie is marked in-use, it
must be unused once.
  162. exit.
  163. Note that in-use marks are cumulative. For each time a cookie is marked
  164. in-use, it must be unused.
  165. Resizing A Data File (Truncation)
  166. =================================
  167. If a network filesystem file is resized locally by truncation, the following
  168. should be called to notify the cache::
  169. void fscache_resize_cookie(struct fscache_cookie *cookie,
  170. loff_t new_size);
  171. The caller must have first marked the cookie in-use. The cookie and the new
  172. size are passed in and the cache is synchronously resized. This is expected to
  173. be called from ``->setattr()`` inode operation under the inode lock.

Data I/O API
============

To do data I/O operations directly through a cookie, the following functions
are available::

	int fscache_begin_read_operation(struct netfs_cache_resources *cres,
					 struct fscache_cookie *cookie);
	int fscache_read(struct netfs_cache_resources *cres,
			 loff_t start_pos,
			 struct iov_iter *iter,
			 enum netfs_read_from_hole read_hole,
			 netfs_io_terminated_t term_func,
			 void *term_func_priv);
	int fscache_write(struct netfs_cache_resources *cres,
			  loff_t start_pos,
			  struct iov_iter *iter,
			  netfs_io_terminated_t term_func,
			  void *term_func_priv);

The *begin* function sets up an operation, attaching the resources required to
the cache resources block from the cookie. Assuming it doesn't return an error
(for instance, it will return -ENOBUFS if given a NULL cookie, but otherwise do
nothing), then one of the other two functions can be issued.

The *read* and *write* functions initiate a direct-IO operation. Both take the
previously set up cache resources block, an indication of the start file
position, and an I/O iterator that describes the buffer and indicates the
amount of data.
  196. previously set up cache resources block, an indication of the start file
  197. position, and an I/O iterator that describes buffer and indicates the amount of
  198. data.
  199. The read function also takes a parameter to indicate how it should handle a
  200. partially populated region (a hole) in the disk content. This may be to ignore
  201. it, skip over an initial hole and place zeros in the buffer or give an error.
  202. The read and write functions can be given an optional termination function that
  203. will be run on completion::
  204. typedef
  205. void (*netfs_io_terminated_t)(void *priv, ssize_t transferred_or_error,
  206. bool was_async);
  207. If a termination function is given, the operation will be run asynchronously
  208. and the termination function will be called upon completion. If not given, the
  209. operation will be run synchronously. Note that in the asynchronous case, it is
  210. possible for the operation to complete before the function returns.
  211. Both the read and write functions end the operation when they complete,
  212. detaching any pinned resources.
  213. The read operation will fail with ESTALE if invalidation occurred whilst the
  214. operation was ongoing.
  215. Data File Coherency
  216. ===================
  217. To request an update of the coherency data and file size on a cookie, the
  218. following should be called::
  219. void fscache_update_cookie(struct fscache_cookie *cookie,
  220. const void *aux_data,
  221. const loff_t *object_size);
  222. This will update the cookie's coherency data and/or file size.
  223. Data File Invalidation
  224. ======================
  225. Sometimes it will be necessary to invalidate an object that contains data.
  226. Typically this will be necessary when the server informs the network filesystem
  227. of a remote third-party change - at which point the filesystem has to throw
  228. away the state and cached data that it had for an file and reload from the
  229. server.
  230. To indicate that a cache object should be invalidated, the following should be
  231. called::
  232. void fscache_invalidate(struct fscache_cookie *cookie,
  233. const void *aux_data,
  234. loff_t size,
  235. unsigned int flags);
  236. This increases the invalidation counter in the cookie to cause outstanding
  237. reads to fail with -ESTALE, sets the coherency data and file size from the
  238. information supplied, blocks new I/O on the cookie and dispatches the cache to
  239. go and get rid of the old data.
  240. Invalidation runs asynchronously in a worker thread so that it doesn't block
  241. too much.
  242. Write-Back Resource Management
  243. ==============================
  244. To write data to the cache from network filesystem writeback, the cache
  245. resources required need to be pinned at the point the modification is made (for
  246. instance when the page is marked dirty) as it's not possible to open a file in
  247. a thread that's exiting.
  248. The following facilities are provided to manage this:
  249. * An inode flag, ``I_PINNING_FSCACHE_WB``, is provided to indicate that an
  250. in-use is held on the cookie for this inode. It can only be changed if the
  251. the inode lock is held.
  252. * A flag, ``unpinned_fscache_wb`` is placed in the ``writeback_control``
  253. struct that gets set if ``__writeback_single_inode()`` clears
  254. ``I_PINNING_FSCACHE_WB`` because all the dirty pages were cleared.
  255. To support this, the following functions are provided::
  256. bool fscache_dirty_folio(struct address_space *mapping,
  257. struct folio *folio,
  258. struct fscache_cookie *cookie);
  259. void fscache_unpin_writeback(struct writeback_control *wbc,
  260. struct fscache_cookie *cookie);
  261. void fscache_clear_inode_writeback(struct fscache_cookie *cookie,
  262. struct inode *inode,
  263. const void *aux);
  264. The *set* function is intended to be called from the filesystem's
  265. ``dirty_folio`` address space operation. If ``I_PINNING_FSCACHE_WB`` is not
  266. set, it sets that flag and increments the use count on the cookie (the caller
  267. must already have called ``fscache_use_cookie()``).
  268. The *unpin* function is intended to be called from the filesystem's
  269. ``write_inode`` superblock operation. It cleans up after writing by unusing
  270. the cookie if unpinned_fscache_wb is set in the writeback_control struct.
  271. The *clear* function is intended to be called from the netfs's ``evict_inode``
  272. superblock operation. It must be called *after*
  273. ``truncate_inode_pages_final()``, but *before* ``clear_inode()``. This cleans
  274. up any hanging ``I_PINNING_FSCACHE_WB``. It also allows the coherency data to
  275. be updated.
  276. Caching of Local Modifications
  277. ==============================
  278. If a network filesystem has locally modified data that it wants to write to the
  279. cache, it needs to mark the pages to indicate that a write is in progress, and
  280. if the mark is already present, it needs to wait for it to be removed first
  281. (presumably due to an already in-progress operation). This prevents multiple
  282. competing DIO writes to the same storage in the cache.
  283. Firstly, the netfs should determine if caching is available by doing something
  284. like::
  285. bool caching = fscache_cookie_enabled(cookie);
  286. If caching is to be attempted, pages should be waited for and then marked using
  287. the following functions provided by the netfs helper library::
  288. void set_page_fscache(struct page *page);
  289. void wait_on_page_fscache(struct page *page);
  290. int wait_on_page_fscache_killable(struct page *page);
  291. Once all the pages in the span are marked, the netfs can ask fscache to
  292. schedule a write of that region::
  293. void fscache_write_to_cache(struct fscache_cookie *cookie,
  294. struct address_space *mapping,
  295. loff_t start, size_t len, loff_t i_size,
  296. netfs_io_terminated_t term_func,
  297. void *term_func_priv,
  298. bool caching)
  299. And if an error occurs before that point is reached, the marks can be removed
  300. by calling::
  301. void fscache_clear_page_bits(struct address_space *mapping,
  302. loff_t start, size_t len,
  303. bool caching)
  304. In these functions, a pointer to the mapping to which the source pages are
  305. attached is passed in and start and len indicate the size of the region that's
  306. going to be written (it doesn't have to align to page boundaries necessarily,
  307. but it does have to align to DIO boundaries on the backing filesystem). The
  308. caching parameter indicates if caching should be skipped, and if false, the
  309. functions do nothing.
  310. The write function takes some additional parameters: the cookie representing
  311. the cache object to be written to, i_size indicates the size of the netfs file
  312. and term_func indicates an optional completion function, to which
  313. term_func_priv will be passed, along with the error or amount written.
  314. Note that the write function will always run asynchronously and will unmark all
  315. the pages upon completion before calling term_func.
  316. Page Release and Invalidation
  317. =============================
  318. Fscache keeps track of whether we have any data in the cache yet for a cache
  319. object we've just created. It knows it doesn't have to do any reading until it
  320. has done a write and then the page it wrote from has been released by the VM,
  321. after which it *has* to look in the cache.
  322. To inform fscache that a page might now be in the cache, the following function
  323. should be called from the ``release_folio`` address space op::
  324. void fscache_note_page_release(struct fscache_cookie *cookie);
  325. if the page has been released (ie. release_folio returned true).
  326. Page release and page invalidation should also wait for any mark left on the
  327. page to say that a DIO write is underway from that page::
  328. void wait_on_page_fscache(struct page *page);
  329. int wait_on_page_fscache_killable(struct page *page);
  330. API Function Reference
  331. ======================
  332. .. kernel-doc:: include/linux/fscache.h