.. SPDX-License-Identifier: GPL-2.0

=================================
NETWORK FILESYSTEM HELPER LIBRARY
=================================

.. Contents:

 - Overview.
 - Buffered read helpers.
   - Read helper functions.
   - Read helper structures.
   - Read helper operations.
   - Read helper procedure.
   - Read helper cache API.


Overview
========

The network filesystem helper library is a set of functions designed to aid a
network filesystem in implementing VM/VFS operations. For the moment, that
just includes turning various VM buffered read operations into requests to read
from the server. The helper library, however, can also interpose other
services, such as local caching or local data encryption.

Note that the library module doesn't link against local caching directly, so
access must be provided by the netfs.


Buffered Read Helpers
=====================

The library provides a set of read helpers that handle the ->readpage(),
->readahead() and much of the ->write_begin() VM operations and translate them
into a common call framework.

The following services are provided:

 * Handles transparent huge pages (THPs).

 * Insulates the netfs from VM interface changes.

 * Allows the netfs to arbitrarily split reads up into pieces, even ones that
   don't match page sizes or page alignments and that may cross pages.

 * Allows the netfs to expand a readahead request in both directions to meet
   its needs.

 * Allows the netfs to partially fulfil a read, which will then be resubmitted.

 * Handles local caching, allowing cached data and server-read data to be
   interleaved for a single request.

 * Handles clearing of bufferage that isn't on the server.

 * Handles retrying of reads that failed, switching reads from the cache to the
   server as necessary.

 * In the future, this is a place where other services can be performed, such
   as local encryption of data to be stored remotely or in the cache.

From the network filesystem, the helpers require a table of operations. This
includes a mandatory method to issue a read operation along with a number of
optional methods.
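
The split between one mandatory method and several optional ones can be
pictured with a small userspace sketch. Everything in it (``demo_request``,
``demo_ops``, ``demo_helper_read()``) is hypothetical and far simpler than the
real structures; it only illustrates a helper rejecting a table that lacks the
mandatory method and skipping optional methods that are left NULL:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a read request. */
struct demo_request {
	long long start;
	size_t len;
	int issued;		/* number of calls made to ->issue_op() */
};

/* Hypothetical operations table: one optional and one mandatory method. */
struct demo_ops {
	void (*expand)(struct demo_request *req);	/* optional, may be NULL */
	void (*issue_op)(struct demo_request *req);	/* mandatory */
};

/* A trivial ->issue_op() that just counts how often it was called. */
void demo_issue(struct demo_request *req)
{
	req->issued++;
}

/* Helper side: returns false if the table cannot be used. */
bool demo_helper_read(struct demo_request *req, const struct demo_ops *ops)
{
	if (!ops || !ops->issue_op)
		return false;		/* the mandatory method is missing */
	if (ops->expand)
		ops->expand(req);	/* optional methods are called only if set */
	ops->issue_op(req);
	return true;
}
```

A filesystem that has no special needs would then fill in only the mandatory
member and leave the rest zeroed.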


Read Helper Functions
---------------------

Three read helpers are provided::

 * void netfs_readahead(struct readahead_control *ractl,
                        const struct netfs_read_request_ops *ops,
                        void *netfs_priv);
 * int netfs_readpage(struct file *file,
                      struct page *page,
                      const struct netfs_read_request_ops *ops,
                      void *netfs_priv);
 * int netfs_write_begin(struct file *file,
                         struct address_space *mapping,
                         loff_t pos,
                         unsigned int len,
                         unsigned int flags,
                         struct page **_page,
                         void **_fsdata,
                         const struct netfs_read_request_ops *ops,
                         void *netfs_priv);

Each corresponds to a VM operation, with the addition of a couple of parameters
for the use of the read helpers:

 * ``ops``

   A table of operations through which the helpers can talk to the filesystem.

 * ``netfs_priv``

   Filesystem private data (can be NULL).

Both of these values will be stored into the read request structure.

For ->readahead() and ->readpage(), the network filesystem should just jump
into the corresponding read helper; whereas for ->write_begin(), it may be a
little more complicated as the network filesystem might want to flush
conflicting writes or track dirty data and needs to put the acquired page if an
error occurs after calling the helper.

The helpers manage the read request, calling back into the network filesystem
through the supplied table of operations. Waits will be performed as
necessary before returning for helpers that are meant to be synchronous.

If an error occurs and netfs_priv is non-NULL, ops->cleanup() will be called to
deal with it. If some parts of the request are in progress when an error
occurs, the request will get partially completed if sufficient data is read.

Additionally, there is::

 * void netfs_subreq_terminated(struct netfs_read_subrequest *subreq,
                                ssize_t transferred_or_error,
                                bool was_async);

which should be called to complete a read subrequest. This is given the number
of bytes transferred or a negative error code, plus a flag indicating whether
the operation was asynchronous (ie. whether the follow-on processing can be
done in the current context, given this may involve sleeping).
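
The "bytes transferred or negative error code" convention can be illustrated
with a userspace sketch. The ``demo_slice`` type and ``demo_slice_terminated()``
function are hypothetical and are not the library's implementation; treating
a count larger than the outstanding length as ``-EIO`` is likewise an
assumption made for this sketch:

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>	/* ssize_t */

/* Hypothetical slice state; it only mirrors the len/transferred idea. */
struct demo_slice {
	size_t len;		/* total length of the slice */
	size_t transferred;	/* bytes obtained so far */
	int error;		/* 0 or a negative errno value */
};

/*
 * Fold a "bytes transferred or negative error" value into the slice.
 * Returns true once the whole slice has been transferred.
 */
bool demo_slice_terminated(struct demo_slice *s, ssize_t transferred_or_error)
{
	if (transferred_or_error < 0) {
		s->error = (int)transferred_or_error;
		return false;
	}
	if ((size_t)transferred_or_error > s->len - s->transferred) {
		s->error = -EIO;	/* more data reported than was asked for */
		return false;
	}
	s->transferred += (size_t)transferred_or_error;
	return s->transferred == s->len;
}
```

A short but successful read simply leaves the slice incomplete, which is the
case the helpers handle by reissuing the remainder.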


Read Helper Structures
----------------------

The read helpers make use of a couple of structures to maintain the state of
the read. The first is a structure that manages a read request as a whole::

    struct netfs_read_request {
            struct inode *inode;
            struct address_space *mapping;
            struct netfs_cache_resources cache_resources;
            void *netfs_priv;
            loff_t start;
            size_t len;
            loff_t i_size;
            const struct netfs_read_request_ops *netfs_ops;
            unsigned int debug_id;
            ...
    };

The above fields are the ones the netfs can use. They are:

 * ``inode``
 * ``mapping``

   The inode and the address space of the file being read from. The mapping
   may or may not point to inode->i_data.

 * ``cache_resources``

   Resources for the local cache to use, if present.

 * ``netfs_priv``

   The network filesystem's private data. The value for this can be passed in
   to the helper functions or set during the request. The ->cleanup() op will
   be called if this is non-NULL at the end.

 * ``start``
 * ``len``

   The file position of the start of the read request and the length. These
   may be altered by the ->expand_readahead() op.

 * ``i_size``

   The size of the file at the start of the request.

 * ``netfs_ops``

   A pointer to the operation table. The value for this is passed into the
   helper functions.

 * ``debug_id``

   A number allocated to this operation that can be displayed in trace lines
   for reference.


The second structure is used to manage individual slices of the overall read
request::

    struct netfs_read_subrequest {
            struct netfs_read_request *rreq;
            loff_t start;
            size_t len;
            size_t transferred;
            unsigned long flags;
            unsigned short debug_index;
            ...
    };

Each subrequest is expected to access a single source, though the helpers will
handle falling back from one source type to another. The members are:

 * ``rreq``

   A pointer to the read request.

 * ``start``
 * ``len``

   The file position of the start of this slice of the read request and the
   length.

 * ``transferred``

   The amount of data transferred so far of the length of this slice. The
   network filesystem or cache should start the operation this far into the
   slice. If a short read occurs, the helpers will call again, having updated
   this to reflect the amount read so far.

 * ``flags``

   Flags pertaining to the read. There are two of interest to the filesystem
   or cache:

   * ``NETFS_SREQ_CLEAR_TAIL``

     This can be set to indicate that the remainder of the slice, from
     transferred to len, should be cleared.

   * ``NETFS_SREQ_SEEK_DATA_READ``

     This is a hint to the cache that it might want to try skipping ahead to
     the next data (ie. using SEEK_DATA).

 * ``debug_index``

   A number allocated to this slice that can be displayed in trace lines for
   reference.
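
How ``start``, ``len`` and ``transferred`` relate after a short read can be
sketched in userspace as follows. The ``demo_subreq`` type and the three
functions are hypothetical; the real structure has more members, and the real
helpers clear pages rather than a flat buffer:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in holding only the three positional members. */
struct demo_subreq {
	long long start;	/* file position of the start of the slice */
	size_t len;		/* length of the slice */
	size_t transferred;	/* amount read so far */
};

/* File position at which the next attempt on this slice should begin. */
long long demo_resume_pos(const struct demo_subreq *s)
{
	return s->start + (long long)s->transferred;
}

/* Bytes still outstanding: either to be read again or to be cleared. */
size_t demo_remaining(const struct demo_subreq *s)
{
	assert(s->transferred <= s->len);
	return s->len - s->transferred;
}

/*
 * The clear-tail idea applied to a buffer that covers the whole slice:
 * everything from transferred up to len is zeroed.
 */
void demo_clear_tail(const struct demo_subreq *s, unsigned char *buf)
{
	memset(buf + s->transferred, 0, demo_remaining(s));
}
```

For a slice starting at 4096 with a length of 8 of which 5 bytes have arrived,
the next attempt would begin at position 4101 and cover the remaining 3 bytes.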


Read Helper Operations
----------------------

The network filesystem must provide the read helpers with a table of operations
through which it can issue requests and negotiate::

    struct netfs_read_request_ops {
            void (*init_rreq)(struct netfs_read_request *rreq, struct file *file);
            bool (*is_cache_enabled)(struct inode *inode);
            int (*begin_cache_operation)(struct netfs_read_request *rreq);
            void (*expand_readahead)(struct netfs_read_request *rreq);
            bool (*clamp_length)(struct netfs_read_subrequest *subreq);
            void (*issue_op)(struct netfs_read_subrequest *subreq);
            bool (*is_still_valid)(struct netfs_read_request *rreq);
            int (*check_write_begin)(struct file *file, loff_t pos, unsigned len,
                                     struct page *page, void **_fsdata);
            void (*done)(struct netfs_read_request *rreq);
            void (*cleanup)(struct address_space *mapping, void *netfs_priv);
    };

The operations are as follows:

 * ``init_rreq()``

   [Optional] This is called to initialise the request structure. It is given
   the file for reference and can modify the ->netfs_priv value.

 * ``is_cache_enabled()``

   [Required] This is called by netfs_write_begin() to ask if the file is being
   cached. It should return true if it is being cached and false otherwise.

 * ``begin_cache_operation()``

   [Optional] This is called to ask the network filesystem to call into the
   cache (if present) to initialise the caching state for this read. The netfs
   library module cannot access the cache directly, so the network filesystem
   should call something like fscache_begin_read_operation() to do this.

   The cache gets to store its state in ->cache_resources and must set a table
   of operations of its own there (though of a different type).

   This should return 0 on success and an error code otherwise. If an error is
   reported, the operation may proceed anyway, just without local caching (only
   out of memory and interruption errors cause failure here).

 * ``expand_readahead()``

   [Optional] This is called to allow the filesystem to expand the size of a
   readahead read request. The filesystem gets to expand the request in both
   directions, though it's not permitted to reduce it as the numbers may
   represent an allocation already made. If local caching is enabled, it gets
   to expand the request first.

   Expansion is communicated by changing ->start and ->len in the request
   structure. Note that if any change is made, ->len must be increased by at
   least as much as ->start is reduced.

 * ``clamp_length()``

   [Optional] This is called to allow the filesystem to reduce the size of a
   subrequest. The filesystem can use this, for example, to chop up a request
   that has to be split across multiple servers or to put multiple reads in
   flight.

   This should return true on success and false on error, in line with the
   bool return type in the table above.

 * ``issue_op()``

   [Required] The helpers use this to dispatch a subrequest to the server for
   reading. In the subrequest, ->start, ->len and ->transferred indicate what
   data should be read from the server.

   There is no return value; the netfs_subreq_terminated() function should be
   called to indicate whether or not the operation succeeded and how much data
   it transferred. The filesystem also should not deal with setting pages
   uptodate, unlocking them or dropping their refs - the helpers need to deal
   with this as they have to coordinate with copying to the local cache.

   Note that the helpers have the pages locked, but not pinned. It is possible
   to use the ITER_XARRAY iov iterator to refer to the range of the inode that
   is being operated upon without the need to allocate large bvec tables.

 * ``is_still_valid()``

   [Optional] This is called to find out if the data just read from the local
   cache is still valid. It should return true if it is still valid and false
   if not. If it's not still valid, it will be reread from the server.

 * ``check_write_begin()``

   [Optional] This is called from the netfs_write_begin() helper once it has
   allocated/grabbed the page to be modified to allow the filesystem to flush
   conflicting state before allowing it to be modified.

   It should return 0 if everything is now fine, -EAGAIN if the page should be
   regrabbed and any other error code to abort the operation.

 * ``done()``

   [Optional] This is called after the pages in the request have all been
   unlocked (and marked uptodate if applicable).

 * ``cleanup()``

   [Optional] This is called as the request is being deallocated so that the
   filesystem can clean up ->netfs_priv.
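
The arithmetic behind ->expand_readahead() and ->clamp_length() can be sketched
in userspace. The ``demo_range`` type, both functions and the granule and rsize
parameters are hypothetical, and treating a false return from the clamp as
failure is an assumption of this sketch. The expansion rounds the range
outwards, so the start only moves down and the end only moves up, which
satisfies the rule that the length grows by at least as much as the start is
reduced:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the start/len pair of a request or slice. */
struct demo_range {
	long long start;
	size_t len;
};

/* Round a range outwards to a power-of-two granule; it never shrinks. */
void demo_expand(struct demo_range *r, size_t granule)
{
	long long mask = (long long)granule - 1;
	long long end = r->start + (long long)r->len;

	assert(granule && (granule & (granule - 1)) == 0);
	r->start &= ~mask;		/* move the start down */
	end = (end + mask) & ~mask;	/* move the end up */
	r->len = (size_t)(end - r->start);
}

/* Limit a slice to at most rsize bytes; false reports a failure. */
bool demo_clamp(struct demo_range *r, size_t rsize)
{
	if (rsize == 0)
		return false;
	if (r->len > rsize)
		r->len = rsize;
	return true;
}
```

With a 4096-byte granule, a request for 3000 bytes at position 5000 becomes a
request for 4096 bytes at position 4096: the start drops by 904 and the length
grows by 1096.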


Read Helper Procedure
---------------------

The read helpers work by the following general procedure:

 * Set up the request.

 * For readahead, allow the local cache and then the network filesystem to
   propose expansions to the read request. This is then proposed to the VM.
   If the VM cannot fully perform the expansion, a partially expanded read will
   be performed, though this may not get written to the cache in its entirety.

 * Loop around slicing chunks off of the request to form subrequests:

   * If a local cache is present, it gets to do the slicing, otherwise the
     helpers just try to generate maximal slices.

   * The network filesystem gets to clamp the size of each slice if it is to be
     the source. This allows rsize and chunking to be implemented.

   * The helpers issue a read from the cache or a read from the server or just
     clear the slice as appropriate.

   * The next slice begins at the end of the last one.

   * As slices finish being read, they terminate.

 * When all the subrequests have terminated, the subrequests are assessed and
   any that are short or have failed are reissued:

   * Failed cache requests are issued against the server instead.

   * Failed server requests just fail.

   * Short reads against either source will be reissued against that source
     provided they have transferred some more data:

     * The cache may need to skip holes that it can't do DIO from.

     * If NETFS_SREQ_CLEAR_TAIL was set, a short read will be cleared to the
       end of the slice instead of reissuing.

 * Once the data is read, the pages that have been fully read/cleared:

   * Will be marked uptodate.

   * If a cache is present, will be marked with PG_fscache.

   * Will be unlocked.

 * Any pages that need writing to the cache will then have DIO writes issued.

 * Synchronous operations will wait for reading to be complete.

 * Writes to the cache will proceed asynchronously and the pages will have the
   PG_fscache mark removed when that completes.

 * The request structures will be cleaned up when everything has completed.
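
The slicing loop in the procedure above can be approximated by a userspace
sketch. It assumes a toy cache that holds the first ``cached_end`` bytes of the
file and a server that accepts reads of at most ``rsize`` bytes; the types,
names and the way a source is chosen are hypothetical simplifications, and
retrying, termination and page handling are left out:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical source types for a slice. */
enum demo_source { DEMO_ZERO, DEMO_SERVER, DEMO_CACHE };

struct demo_slice {
	long long start;
	size_t len;
	enum demo_source source;
};

/*
 * Slice [start, start + len) into at most max_slices pieces.
 *  - Bytes [0, cached_end) are assumed to be present in the cache.
 *  - Anything at or beyond i_size is cleared rather than read.
 *  - Server reads are clamped to rsize bytes.
 * Returns the number of slices written to out[].
 */
size_t demo_slice_request(long long start, size_t len, long long cached_end,
			  long long i_size, size_t rsize,
			  struct demo_slice *out, size_t max_slices)
{
	long long pos = start, end = start + (long long)len;
	size_t n = 0;

	assert(rsize > 0);
	while (pos < end && n < max_slices) {
		struct demo_slice *s = &out[n++];
		long long limit = end;

		s->start = pos;
		if (pos >= i_size) {
			s->source = DEMO_ZERO;
		} else if (pos < cached_end) {
			s->source = DEMO_CACHE;
			if (cached_end < limit)
				limit = cached_end;
			if (i_size < limit)
				limit = i_size;
		} else {
			s->source = DEMO_SERVER;
			if (i_size < limit)
				limit = i_size;
			if (limit - pos > (long long)rsize)
				limit = pos + (long long)rsize;
		}
		s->len = (size_t)(limit - pos);
		pos = limit;	/* the next slice begins where this one ended */
	}
	return n;
}
```

For a 16384-byte request at position 0 on a 10000-byte file with 4096 bytes
cached and an rsize of 4096, this yields one cache slice, two server slices
and a final slice that is simply cleared.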


Read Helper Cache API
---------------------

When implementing a local cache to be used by the read helpers, two things are
required: some way for the network filesystem to initialise the caching for a
read request and a table of operations for the helpers to call.

The network filesystem's ->begin_cache_operation() method is called to set up a
cache and this must call into the cache to do the work. If using fscache, for
example, the network filesystem would call::

    int fscache_begin_read_operation(struct netfs_read_request *rreq,
                                     struct fscache_cookie *cookie);

passing in the request pointer and the cookie corresponding to the file.

The netfs_read_request object contains a place for the cache to hang its
state::

    struct netfs_cache_resources {
            const struct netfs_cache_ops *ops;
            void *cache_priv;
            void *cache_priv2;
    };

This contains an operations table pointer and two private pointers. The
operation table looks like the following::

    struct netfs_cache_ops {
            void (*end_operation)(struct netfs_cache_resources *cres);

            void (*expand_readahead)(struct netfs_cache_resources *cres,
                                     loff_t *_start, size_t *_len, loff_t i_size);

            enum netfs_read_source (*prepare_read)(struct netfs_read_subrequest *subreq,
                                                   loff_t i_size);

            int (*read)(struct netfs_cache_resources *cres,
                        loff_t start_pos,
                        struct iov_iter *iter,
                        bool seek_data,
                        netfs_io_terminated_t term_func,
                        void *term_func_priv);

            int (*write)(struct netfs_cache_resources *cres,
                         loff_t start_pos,
                         struct iov_iter *iter,
                         netfs_io_terminated_t term_func,
                         void *term_func_priv);
    };

With a termination handler function pointer::

    typedef void (*netfs_io_terminated_t)(void *priv,
                                          ssize_t transferred_or_error,
                                          bool was_async);

The methods defined in the table are:

 * ``end_operation()``

   [Required] Called to clean up the resources at the end of the read request.

 * ``expand_readahead()``

   [Optional] Called at the beginning of a netfs_readahead() operation to allow
   the cache to expand a request in either direction. This allows the cache to
   size the request appropriately for the cache granularity.

   The function is passed pointers to the start and length in its parameters,
   plus the size of the file for reference, and adjusts the start and length
   appropriately.

 * ``prepare_read()``

   [Required] Called to configure the next slice of a request. ->start and
   ->len in the subrequest indicate where and how big the next slice can be;
   the cache gets to reduce the length to match its granularity requirements.

   It should return one of:

   * ``NETFS_FILL_WITH_ZEROES``
   * ``NETFS_DOWNLOAD_FROM_SERVER``
   * ``NETFS_READ_FROM_CACHE``
   * ``NETFS_INVALID_READ``

   to indicate whether the slice should just be cleared or whether it should be
   downloaded from the server or read from the cache - or whether slicing
   should be given up at the current point.

 * ``read()``

   [Required] Called to read from the cache. The start file offset is given
   along with an iterator to read to, which gives the length also. It can be
   given a hint requesting that it seek forward from that start position for
   data.

   Also provided is a pointer to a termination handler function and private
   data to pass to that function. The termination function should be called
   with the number of bytes transferred or an error code, plus a flag
   indicating whether the termination is definitely happening in the caller's
   context.

 * ``write()``

   [Required] Called to write to the cache. The start file offset is given
   along with an iterator to write from, which gives the length also.

   Also provided is a pointer to a termination handler function and private
   data to pass to that function. The termination function should be called
   with the number of bytes transferred or an error code, plus a flag
   indicating whether the termination is definitely happening in the caller's
   context.
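
A ->prepare_read() style decision can be sketched in userspace as below. The
enum is a local stand-in named after the four source values listed above, and
the toy cache model (data present for the first ``cached_end`` bytes, reads
trimmed at granule boundaries) is an assumption made purely for illustration;
a real cache would consult its own metadata:

```c
#include <stddef.h>

/* Local stand-in mirroring the four source values named in the text. */
enum demo_read_source {
	DEMO_FILL_WITH_ZEROES,
	DEMO_DOWNLOAD_FROM_SERVER,
	DEMO_READ_FROM_CACHE,
	DEMO_INVALID_READ,
};

/* Hypothetical slice: where it starts and how big it may be. */
struct demo_subreq {
	long long start;	/* assumed to be non-negative */
	size_t len;
};

/*
 * Decide where the slice should come from and, for cache reads, reduce
 * its length so that it stops at the next granule boundary or at the
 * end of the cached data, whichever comes first.
 */
enum demo_read_source demo_prepare_read(struct demo_subreq *s,
					long long i_size,
					long long cached_end,
					size_t granule)
{
	long long block_end;

	if (s->len == 0 || granule == 0)
		return DEMO_INVALID_READ;
	if (s->start >= i_size)
		return DEMO_FILL_WITH_ZEROES;
	if (s->start >= cached_end)
		return DEMO_DOWNLOAD_FROM_SERVER;

	block_end = (s->start / (long long)granule + 1) * (long long)granule;
	if (block_end > cached_end)
		block_end = cached_end;
	if ((long long)s->len > block_end - s->start)
		s->len = (size_t)(block_end - s->start);
	return DEMO_READ_FROM_CACHE;
}
```

Note that the length is only ever reduced, never increased, which matches the
description of the cache reducing the slice to fit its granularity.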

Note that these methods are passed a pointer to the cache resource structure,
not the read request structure as they could be used in other situations where
there isn't a read request structure as well, such as writing dirty data to the
cache.
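
The termination handler convention used by ->read() and ->write() can be
sketched in userspace as follows. The typedef has the same shape as
netfs_io_terminated_t but a hypothetical name, the toy ``demo_cache`` object
stands in for the cache's private state, and a plain buffer replaces the
iterator; passing false for the final flag to mark a synchronous completion is
an assumption of this sketch:

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>	/* ssize_t */

/* Same shape as the handler typedef in the text, hypothetical name. */
typedef void (*demo_io_terminated_t)(void *priv,
				     ssize_t transferred_or_error,
				     bool was_async);

/* Toy in-memory cache object. */
struct demo_cache {
	const unsigned char *data;
	size_t size;
};

/*
 * Copy up to len bytes from the toy cache at start_pos into buf and
 * report the outcome through term_func.
 */
int demo_cache_read(const struct demo_cache *c, long long start_pos,
		    unsigned char *buf, size_t len,
		    demo_io_terminated_t term_func, void *term_func_priv)
{
	ssize_t ret;

	if (start_pos < 0 || (size_t)start_pos > c->size) {
		ret = -EINVAL;
	} else {
		size_t avail = c->size - (size_t)start_pos;
		size_t n = len < avail ? len : avail;

		memcpy(buf, c->data + start_pos, n);
		ret = (ssize_t)n;
	}
	if (term_func)
		term_func(term_func_priv, ret, false);	/* completed synchronously */
	return ret < 0 ? (int)ret : 0;
}

/* Example handler: stores the result in the ssize_t that priv points to. */
void demo_record_result(void *priv, ssize_t transferred_or_error,
			bool was_async)
{
	(void)was_async;
	*(ssize_t *)priv = transferred_or_error;
}
```

Because the handler receives only an opaque private pointer, the same handler
type can serve callers that have no read request at all, which is the point
made in the note above.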