From: David Howells <dhowells@redhat.com>

The attached patch provides documentation for CacheFS.

Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
---

 25-akpm/Documentation/filesystems/cachefs.txt |  892 ++++++++++++++++++++++++++
 1 files changed, 892 insertions(+)

diff -puN /dev/null Documentation/filesystems/cachefs.txt
--- /dev/null	2003-09-15 06:40:47.000000000 -0700
+++ 25-akpm/Documentation/filesystems/cachefs.txt	2004-09-06 16:47:55.772872448 -0700
@@ -0,0 +1,892 @@
+			  ===========================
+			  CacheFS: Caching Filesystem
+			  ===========================
+
+========
+OVERVIEW
+========
+
+CacheFS is a general purpose cache for network filesystems, though it could be
+used for caching other things such as ISO9660 filesystems too.
+
+CacheFS uses a block device directly rather than a bunch of files under an
+already mounted filesystem. The reasons for this are discussed further on. If
+necessary, however, a file can be loopback mounted as a cache.
+
+CacheFS does not follow the idea of completely loading every netfs file opened
+into the cache before it can be operated upon, and then serving the pages out
+of CacheFS rather than the netfs because:
+
+ (1) It must be practical to operate without a cache.
+
+ (2) The size of any accessible file must not be limited to the size of the
+     cache.
+
+ (3) The combined size of all opened files (this includes mapped libraries)
+     must not be limited to the size of the cache.
+
+ (4) The user should not be forced to download an entire file just to do a
+     one-off access of a small portion of it.
+
+Instead, CacheFS serves data out of the cache in PAGE_SIZE chunks as and when
+requested by the netfs(s) using it.
+
+
+CacheFS provides the following facilities:
+
+ (1) More than one block device can be mounted as a cache.
+
+ (2) Caches can be mounted / unmounted at any time.
+
+ (3) The netfs is provided with an interface that allows either party to
+     withdraw caching facilities from a file (required for (2)).
+
+ (4) The interface to the netfs returns as few errors as possible, preferring
+     rather to let the netfs remain oblivious.
+
+ (5) Cookies are used to represent files and indexes to the netfs. The simplest
+     cookie is just a NULL pointer - indicating nothing cached there.
+
+ (6) The netfs is allowed to propose - dynamically - any index hierarchy it
+     desires, though it must be aware that the index search function is
+     recursive and stack space is limited.
+
+ (7) Data I/O is done direct to and from the netfs's pages. The netfs indicates
+     that page A is at index B of the data-file represented by cookie C, and
+     that it should be read or written. CacheFS may or may not start I/O on
+     that page, but if it does, a netfs callback will be invoked to indicate
+     completion.
+
+ (8) Cookies can be "retired" upon release. At this point CacheFS will mark
+     them as obsolete and the index hierarchy rooted at that point will get
+     recycled.
+
+ (9) The netfs provides a "match" function for index searches. In addition to
+     saying whether a match was made or not, this can also specify that an
+     entry should be updated or deleted.
+
+(10) All metadata modifications (this includes index contents) are performed
+     as journalled transactions. These are replayed on mounting.
+
+
+=============================================
+WHY A BLOCK DEVICE? WHY NOT A BUNCH OF FILES?
+=============================================
+
+CacheFS is backed by a block device rather than being backed by a bunch of
+files on a filesystem. This confers several advantages:
+
+ (1) Performance.
+
+     Going directly to a block device means that we can DMA directly to/from
+     the netfs's pages. If another filesystem were managing the backing
+     store, everything would have to be copied between pages. Whilst DirectIO
+     does exist, it doesn't appear easy to make use of in this situation.
+
+     New address space or file operations could be added to make it possible to
+     persuade a backing discfs to generate block I/O directly to/from disc
+     blocks under its control, but that then means the discfs has to keep track
+     of I/O requests to pages not under its control.
+
+     Furthermore, we only have to do one lot of readahead calculations, not
+     two; in the discfs backing case, the netfs would do one and the discfs
+     would do one.
+
+ (2) Memory.
+
+     Using a block device means that we have a lower memory usage - all data
+     pages belong to the netfs we're backing. If we used a filesystem, we would
+     have twice as many pages at certain points - one from the netfs and one
+     from the backing discfs. In the backing discfs model, under situations of
+     memory pressure, we'd have to allocate or keep around a discfs page to be
+     able to write out a netfs page; or else we'd need to be able to punch a
+     hole in the backing file.
+
+     Furthermore, whilst we have to keep a CacheFS inode around in memory for
+     every netfs inode we're backing, a backing discfs would have to keep the
+     dentry and possibly a file struct too.
+
+ (3) Holes.
+
+     The cache uses holes to indicate to the netfs that it hasn't yet
+     downloaded the data for that page.
+
+     Since CacheFS is its own filesystem, it can support holes in files
+     trivially. Running on top of another discfs would limit us to using ones
+     that can support holes.
+
+     Furthermore, it would have to be made possible to detect holes in a discfs
+     file, rather than just seeing zero filled blocks.
+
+ (4) Data Consistency.
+
+     Cachefs uses a pair of journals to keep track of the state of the cache
+     and all the pages contained therein. This means that it doesn't get into
+     an inconsistent state in the on-disc cache and it doesn't lose disc space.
+
+     CacheFS takes especial care between the allocation of a block and its
+     splicing into the on-disc pointer tree, and the data having been written
+     to disc. If power is interrupted and then restored, the journals are
+     replayed and if it is seen that a block was allocated but not written it
+     is then punched out. Were the cache backed by a discfs instead, I'm not
+     certain what would happen. It may well be possible to mark a discfs's
+     journal, if it has one, but how does the discfs deal with those marks?
+     This also limits consistent
+     caching to running on journalled discfs's where there's a function to
+     write extraordinary marks into the journal.
+
+     The alternative would be to keep flags in the superblock, and to
+     re-initialise the cache if it wasn't cleanly unmounted.
+
+     Knowing that your cache is in a good state is vitally important if you,
+     say, put /usr on AFS. Some organisations put everything barring /etc,
+     /sbin, /lib and /var on AFS and have an enormous cache on every
+     computer. Imagine if the power goes out and renders every cache
+     inconsistent, requiring all the computers to re-initialise their caches
+     when the power comes back on...
+
+ (5) Recycling.
+
+     Recycling is simple on CacheFS. It can just scan the metadata index to
+     look for inodes that require reclamation/recycling; and it can also build
+     up a list of the least recently used inodes so that they can be reclaimed
+     later to make space.
+
+     Doing this on a discfs would require a search going down through a nest
+     of directories, and would probably have to be done in userspace.
+
+ (6) Disc Space.
+
+     Whilst the block device does set a hard ceiling on the amount of space
+     available, CacheFS can guarantee that all that space will be available to
+     the cache. On a discfs-backed cache, the administrator would probably want
+     to set a cache size limit, but the system wouldn't be able to guarantee that
+     all that space would be available to the cache - not unless that cache was
+     on a partition of its own.
+
+     Furthermore, with a discfs-backed cache, if the recycler starts to reclaim
+     cache files to make space, the freed blocks may just be eaten directly by
+     userspace programs, potentially resulting in the entire cache being
+     consumed. Alternatively, netfs operations may end up being held up because
+     the cache can't get blocks on which to store the data.
+
+ (7) Users.
+
+     Users can't so easily go into CacheFS and run amok. The worst they can do
+     is cause bits of the cache to be recycled early. With a discfs-backed
+     cache, they can do all sorts of bad things to the files belonging to the
+     cache, and they can do this quite by accident.
+
+
+On the other hand, there would be some advantages to using a file-based cache
+rather than a blockdev-based cache:
+
+ (1) Having to copy to a discfs's page would mean that a netfs could just make
+     the copy and then assume its own page is ready to go.
+
+ (2) Backing onto a discfs wouldn't require a committed block device. You would
+     just nominate a directory and go from there. With CacheFS you have to
+     repartition or install an extra drive to make use of it in an existing
+     system (though the loopback device offers a way out).
+
+ (3) CacheFS requires the netfs to store a key in any pertinent index entry,
+     and it also permits a limited amount of arbitrary data to be stored there.
+
+     A discfs could be requested to store the netfs's data in xattrs, and the
+     filename could be used to store the key, though the key would have to be
+     rendered as text not binary. Likewise indexes could be rendered as
+     directories with xattrs.
+
+ (4) You could easily make your cache bigger if the discfs has plenty of space;
+     you could even go across multiple mountpoints.
+
+
+======================
+GENERAL ON-DISC LAYOUT
+======================
+
+The filesystem is divided into a number of parts:
+
+  0	+---------------------------+
+	|        Superblock         |
+  1	+---------------------------+
+	|      Update Journal       |
+	+---------------------------+
+	|     Validity Journal      |
+	+---------------------------+
+	|    Write-Back Journal     |
+	+---------------------------+
+	|                           |
+	|           Data            |
+	|                           |
+ END	+---------------------------+
+
+The superblock contains the filesystem ID tags and pointers to all the other
+regions.
+
+The update journal consists of a set of entries of sector size that keep track
+of what changes have been made to the on-disc filesystem, but not yet
+committed.
+
+The validity journal contains records of data blocks that have been allocated
+but not yet written. Upon journal replay, all these blocks will be detached
+from their pointers and recycled.
+
+The writeback journal keeps track of changes that have been made locally to
+data blocks, but that have not yet been committed back to the server. This is
+not yet implemented.
+
+The journals are replayed upon mounting to make sure that the cache is in a
+reasonable state.
+
+The data region holds a number of things:
+
+  (1) Index Files
+
+      These are files of entries used by CacheFS internally and by filesystems
+      that wish to cache data here (such as AFS) to keep track of what's in
+      the cache at any given time.
+
+      The first index file (inode 1) is special. It holds the CacheFS-specific
+      metadata for every file in the cache (including direct, single-indirect
+      and double-indirect block pointers).
+
+      The second index file (inode 2) is also special. It has an entry for
+      each filesystem that's currently holding data in this cache.
+
+      Every allocated entry in an index has an inode bound to it. This inode is
+      either another index file or it is a data file.
+
+  (2) Cached Data Files
+
+      These are caches of files from remote servers. Holes in these files
+      represent blocks not yet obtained from the server.
+
+  (3) Indirection Blocks
+
+      Should a file have more blocks than can be pointed to by the few
+      pointers in its storage management record, then indirection blocks will
+      be used to point to further data or indirection blocks.
+
+      Two levels of indirection are currently supported:
+
+	- single indirection
+	- double indirection
+
+  (4) Allocation Nodes and Free Blocks
+
+      The free blocks of the filesystem are kept in two single-branched
+      "trees". One tree is the blocks that are ready to be allocated, and the
+      other is the blocks that have just been recycled. When the former tree
+      becomes empty, the latter tree is decanted across.
+
+      Each tree is arranged as a chain of "nodes": each node points to the
+      next node in the chain (unless it's at the end) and also to up to 1022
+      free blocks.
+
+Note that all blocks are PAGE_SIZE in size. The blocks are numbered starting
+with the superblock at 0. Using 32-bit block pointers, a maximum of 0xffffffff
+blocks can be accessed, meaning that the maximum cache size is ~16TB for 4KB
+pages.
+
+
+========
+MOUNTING
+========
+
+Since CacheFS is actually a quasi-filesystem, it requires a block device behind
+it. The way to give it one is to mount it as cachefs type on a directory
+somewhere. The mounted filesystem will then present the user with a set of
+directories outlining the index structure resident in the cache. Indexes
+(directories) and files can be turfed out of the cache by the sysadmin through
+the use of rmdir and unlink.
+
+For instance, if a cache contains AFS data, the user might see the following:
+
+	root>mount -t cachefs /dev/hdg9 /cache-hdg9
+	root>ls -1 /cache-hdg9
+	afs
+	root>ls -1 /cache-hdg9/afs
+	cambridge.redhat.com
+	root>ls -1 /cache-hdg9/afs/cambridge.redhat.com
+	root.afs
+	root.cell
+
+However, a block device that's going to be used for a cache must be prepared
+before it can be mounted initially. This is done very simply by:
+
+	echo "cachefs___" >/dev/hdg9
+
+During the initial mount, the basic structure will be scribed into the cache,
+and then a background thread will "recycle" the as-yet unused data blocks.
+
+
+======================
+NETWORK FILESYSTEM API
+======================
+
+There is, of course, an API by which a network filesystem can make use of the
+CacheFS facilities. This is based around a number of principles:
+
+ (1) Every file and index is represented by a cookie. This cookie may or may
+     not have anything associated with it, but the netfs doesn't need to care.
+
+ (2) Barring the top-level index (one entry per cached netfs), the index
+     hierarchy for each netfs is structured according to the whim of the netfs.
+
+ (3) Any netfs page being backed by the cache must have a small token
+     associated with it (possibly pointed to by page->private) so that CacheFS
+     can keep track of it.
+
+This API is declared in <linux/cachefs.h>.
+
+
+NETWORK FILESYSTEM DEFINITION
+-----------------------------
+
+CacheFS needs a description of the network filesystem. This is specified using
+a record of the following structure:
+
+	struct cachefs_netfs {
+		const char			*name;
+		unsigned			version;
+		struct cachefs_netfs_operations	*ops;
+		struct cachefs_cookie		*primary_index;
+		...
+	};
+
+The first three fields should be filled in before registration, and the fourth
+will be filled in by the registration function; any other fields should just be
+ignored and are for internal use only.
+
+The fields are:
+
+ (1) The name of the netfs (used as the key in the toplevel index).
+
+ (2) The version of the netfs (if the name matches but the version doesn't, the
+     entire on-disc hierarchy for this netfs will be scrapped and begun
+     afresh).
+
+ (3) The operations table is defined as follows:
+
+	struct cachefs_netfs_operations {
+		struct cachefs_page *(*get_page_cookie)(struct page *page);
+	};
+
+     The functions here must all be present. Currently the only one is:
+
+     (a) get_page_cookie(): Get the token used to bind a page to a block in a
+         cache. This function should allocate it if it doesn't exist.
+
+	 Return ERR_PTR(-ENOMEM) if there's not enough memory and
+	 ERR_PTR(-ENODATA) if the page just shouldn't be cached.
+
+	 Otherwise, return a pointer to the token. Note that the netfs must
+	 keep track of the token itself (and free it later). page->private can
+	 be used for this (see below).
+
+ (4) The cookie representing the primary index will be allocated according to
+     another parameter passed into the registration function.
+
+For example, kAFS (linux/fs/afs/) uses the following definitions to describe
+itself:
+
+	static struct cachefs_netfs_operations afs_cache_ops = {
+		.get_page_cookie	= afs_cache_get_page_cookie,
+	};
+
+	struct cachefs_netfs afs_cache_netfs = {
+		.name			= "afs",
+		.version		= 0,
+		.ops			= &afs_cache_ops,
+	};
+
+
+INDEX DEFINITION
+----------------
+
+Indexes are used for two purposes:
+
+ (1) To speed up the finding of a file based on a series of keys (such as AFS's
+     "cell", "volume ID", "vnode ID").
+
+ (2) To make it easier to discard a subset of all the files cached based around
+     a particular key - for instance to mirror the removal of an AFS volume.
+
+However, since it's unlikely that any two netfs's are going to want to define
+their index hierarchies in quite the same way, CacheFS tries to impose as few
+restraints as possible on how an index is structured and where it is placed in
+the tree. The netfs can even mix indexes and data files at the same level, but
+it's not recommended.
+
+There are some limits on indexes:
+
+ (1) All entries in any given index must be the same size. An array of such
+     entries needn't fit exactly into a page, but they will not be laid across
+     a page boundary.
+
+     The netfs supplies a blob of data for each index entry, and CacheFS
+     provides an inode number and a flag.
+
+ (2) The entries in one index can be of a different size to the entries in
+     another index.
+
+ (3) The entry data must be journallable, and thus must be able to fit into an
+     update journal entry - this limits the maximum size to a little over 400
+     bytes at present.
+
+ (4) The index data must start with the key. The layout of the key is described
+     in the index definition, and this is used to display the key in some
+     appropriate way.
+
+ (5) The depth of the index tree should be judged with care as the search
+     function is recursive. Too many layers will run the kernel out of stack.
+
+To define an index, a structure of the following type should be filled out:
+
+	struct cachefs_index_def
+	{
+		uint8_t name[8];
+		uint16_t data_size;
+		struct {
+			uint8_t type;
+			uint16_t len;
+		} keys[4];
+
+		cachefs_match_val_t (*match)(void *target_netfs_data,
+					     const void *entry);
+
+		void (*update)(void *source_netfs_data, void *entry);
+	};
+
+This has the following fields:
+
+ (1) The name of the index (NUL terminated unless all 8 chars are used).
+
+ (2) The size of the data blob provided by the netfs.
+
+ (3) A definition of the key(s) at the beginning of the blob. The netfs is
+     permitted to specify up to four keys. The total length must not exceed the
+     data size. It is assumed that the keys will be laid end to end in order,
+     starting at the first byte of the data.
+
+     The type field specifies the way the data should be displayed. It can be
+     one of:
+
+	(*) CACHEFS_INDEX_KEYS_NOTUSED	- key field not used
+	(*) CACHEFS_INDEX_KEYS_BIN	- display byte-by-byte in hex
+	(*) CACHEFS_INDEX_KEYS_ASCIIZ	- NUL-terminated ASCII
+	(*) CACHEFS_INDEX_KEYS_IPV4ADDR	- display as IPv4 address
+	(*) CACHEFS_INDEX_KEYS_IPV6ADDR	- display as IPv6 address
+
+ (4) A function to compare an in-page-cache index entry blob with the data
+     passed to the cookie acquisition function. This function can also be used
+     to extract data from the blob and copy it into the netfs's structures.
+
+     The values this function can return are:
+
+	(*) CACHEFS_MATCH_FAILED - failed to match
+	(*) CACHEFS_MATCH_SUCCESS - successful match
+	(*) CACHEFS_MATCH_SUCCESS_UPDATE - successful match, entry needs update
+	(*) CACHEFS_MATCH_SUCCESS_DELETE - entry should be deleted
+
+     For example, in linux/fs/afs/vnode.c:
+
+	static cachefs_match_val_t
+	afs_vnode_cache_match(void *target, const void *entry)
+	{
+		const struct afs_cache_vnode *cvnode = entry;
+		struct afs_vnode *vnode = target;
+
+		if (vnode->fid.vnode != cvnode->vnode_id)
+			return CACHEFS_MATCH_FAILED;
+
+		if (vnode->fid.unique != cvnode->vnode_unique ||
+		    vnode->status.version != cvnode->data_version)
+			return CACHEFS_MATCH_SUCCESS_DELETE;
+
+		return CACHEFS_MATCH_SUCCESS;
+	}
+
+ (5) A function to initialise or update an in-page-cache index entry blob from
+     netfs data passed to CacheFS by the netfs. This function should not assume
+     that there's any data yet in the in-page-cache.
+
+     Continuing the above example:
+
+	static void afs_vnode_cache_update(void *source, void *entry)
+	{
+		struct afs_cache_vnode *cvnode = entry;
+		struct afs_vnode *vnode = source;
+
+		cvnode->vnode_id	= vnode->fid.vnode;
+		cvnode->vnode_unique	= vnode->fid.unique;
+		cvnode->data_version	= vnode->status.version;
+	}
+
+To finish the above example, the index definition for the "vnode" level is as
+follows:
+
+	struct cachefs_index_def afs_vnode_cache_index_def = {
+		.name		= "vnode",
+		.data_size	= sizeof(struct afs_cache_vnode),
+		.keys[0]	= { CACHEFS_INDEX_KEYS_BIN, 4 },
+		.match		= afs_vnode_cache_match,
+		.update		= afs_vnode_cache_update,
+	};
+
+The first element of struct afs_cache_vnode is the vnode ID.
+
+And for contrast, the cell index definition is:
+
+	struct cachefs_index_def afs_cache_cell_index_def = {
+		.name			= "cell_ix",
+		.data_size		= sizeof(afs_cell_t),
+		.keys[0]		= { CACHEFS_INDEX_KEYS_ASCIIZ, 64 },
+		.match			= afs_cell_cache_match,
+		.update			= afs_cell_cache_update,
+	};
+
+The cell index is the primary index for kAFS.
+
+
+NETWORK FILESYSTEM (UN)REGISTRATION
+-----------------------------------
+
+The first step is to declare the network filesystem to the cache. This also
+involves specifying the layout of the primary index (for AFS, this would be the
+"cell" level).
+
+The registration function is:
+
+	int cachefs_register_netfs(struct cachefs_netfs *netfs,
+				   struct cachefs_index_def *primary_idef);
+
+It just takes pointers to the netfs definition and the primary index
+definition. It returns 0 or an error as appropriate.
+
+For kAFS, registration is done as follows:
+
+	ret = cachefs_register_netfs(&afs_cache_netfs,
+				     &afs_cache_cell_index_def);
+
+The last step is, of course, unregistration:
+
+	void cachefs_unregister_netfs(struct cachefs_netfs *netfs);
+
+
+INDEX REGISTRATION
+------------------
+
+The second step is to inform CacheFS about part of an index hierarchy that can
+be used to locate files. This is done by requesting a cookie for each index in
+the path to the file:
+
+	struct cachefs_cookie *
+	cachefs_acquire_cookie(struct cachefs_cookie *iparent,
+			       struct cachefs_index_def *idef,
+			       void *netfs_data);
+
+This function creates an index entry in the index represented by iparent,
+loading the associated blob by calling iparent's update method with the
+supplied netfs_data.
+
+It also creates a new index inode, formatted according to the definition
+supplied in idef. The new cookie is then returned to the caller.
+
+Note that this function never returns an error - all errors are handled
+internally. It may also return CACHEFS_NEGATIVE_COOKIE. It is quite acceptable
+to pass this token back to this function as iparent (or even to the relinquish
+cookie, read page and write page functions - see below).
+
+Note also that no indexes are actually created on disc until a data file needs
+to be created somewhere down the hierarchy. Furthermore, an index may be
+created in several different caches independently at different times. This is
+all handled transparently, and the netfs doesn't see any of it.
+
+For example, with AFS, a cell would be added to the primary index. This index
+entry would have a dependent inode containing a volume location index for the
+volume mappings within this cell:
+
+	cell->cache =
+		cachefs_acquire_cookie(afs_cache_netfs.primary_index,
+				       &afs_vlocation_cache_index_def,
+				       cell);
+
+Then when a volume location was accessed, it would be entered into the cell's
+index and an inode would be allocated that acts as a volume type and hash chain
+combination:
+
+	vlocation->cache =
+		cachefs_acquire_cookie(cell->cache,
+				       &afs_volume_cache_index_def,
+				       vlocation);
+
+And then a particular flavour of volume (R/O for example) could be added to
+that index, creating another index for vnodes (AFS inode equivalents):
+
+	volume->cache =
+		cachefs_acquire_cookie(vlocation->cache,
+				       &afs_vnode_cache_index_def,
+				       volume);
+
+
+DATA FILE REGISTRATION
+----------------------
+
+The third step is to request a data file be created in the cache. This is
+almost identical to index cookie acquisition. The only difference is that a
+NULL index definition is passed.
+
+	vnode->cache =
+		cachefs_acquire_cookie(volume->cache,
+				       NULL,
+				       vnode);
+
+
+
+PAGE ALLOC/READ/WRITE
+---------------------
+
+The fourth step is to propose that a page be cached. There are two functions
+that are used to do this.
+
+Firstly, the netfs should ask CacheFS to examine the caches and read the
+contents cached for a particular page of a particular file if present, or else
+allocate space to store the contents if not:
+
+	typedef
+	void (*cachefs_rw_complete_t)(void *cookie_data,
+				      struct page *page,
+				      void *end_io_data,
+				      int error);
+
+	int cachefs_read_or_alloc_page(struct cachefs_cookie *cookie,
+				       struct page *page,
+				       cachefs_rw_complete_t end_io_func,
+				       void *end_io_data,
+				       unsigned long gfp);
+
+The cookie argument must specify a data file cookie, the page specified will
+have the data loaded into it (and is also used to specify the page number), and
+the gfp argument is used to control how any memory allocations made are
+satisfied.
+
+If the cookie indicates the inode is not cached:
+
+ (1) The function will return -ENOBUFS.
+
+Else if there's a copy of the page resident on disc:
+
+ (1) The function will submit a request to read the data off the disc directly
+     into the page specified.
+
+ (2) The function will return 0.
+
+ (3) When the read is complete, end_io_func() will be invoked with:
+
+     (*) The netfs data supplied when the cookie was created.
+
+     (*) The page descriptor.
+
+     (*) The data passed to the above function.
+
+     (*) An argument that's 0 on success or negative for an error.
+
+     If an error occurs, it should be assumed that the page contains no usable
+     data.
+
+Otherwise, if there's not a copy available on disc:
+
+ (1) A block may be allocated in the cache and attached to the inode at the
+     appropriate place.
+
+ (2) The validity journal will be marked to indicate this page does not yet
+     contain valid data.
+
+ (3) The function will return -ENODATA.
+
+
+Secondly, if the netfs changes the contents of the page (either due to an
+initial download or if a user performs a write), then the page should be
+written back to the cache:
+
+	int cachefs_write_page(struct cachefs_cookie *cookie,
+			       struct page *page,
+			       cachefs_rw_complete_t end_io_func,
+			       void *end_io_data,
+			       unsigned long gfp);
+
+The cookie argument must specify a data file cookie, the page specified should
+contain the data to be written (and is also used to specify the page number),
+and the gfp argument is used to control how any memory allocations made are
+satisfied.
+
+If the cookie indicates the inode is not cached then:
+
+ (1) The function will return -ENOBUFS.
+
+Else if there's a block allocated on disc to hold this page:
+
+ (1) The function will submit a request to write the data to the disc directly
+     from the page specified.
+
+ (2) The function will return 0.
+
+ (3) When the write is complete:
+
+     (a) Any associated validity journal entry will be cleared (the block now
+	 contains valid data as far as CacheFS is concerned).
+
+     (b) end_io_func() will be invoked with:
+
+	 (*) The netfs data supplied when the cookie was created.
+
+	 (*) The page descriptor.
+
+	 (*) The data passed to the above function.
+
+	 (*) An argument that's 0 on success or negative for an error.
+
+	 If an error happens, it can be assumed that the page has been
+	 discarded from the cache.
+
+
+PAGE UNCACHING
+--------------
+
+To uncache a page, this function should be called:
+
+	void cachefs_uncache_page(struct cachefs_cookie *cookie,
+				  struct page *page);
+
+This detaches the page specified from the data file indicated by the cookie and
+unbinds it from the underlying block.
+
+Note that pages can't be explicitly detached from a data file. The whole
+data file must be retired (see the relinquish cookie function below).
+
+Furthermore, note that this does not cancel the asynchronous read or write
+operation started by the read/alloc and write functions.
+
+
+INDEX AND DATA FILE UPDATE
+--------------------------
+
+To request an update of the index data for an index or data file, the following
+function should be called:
+
+	void cachefs_update_cookie(struct cachefs_cookie *cookie);
+
+This function will refer back to the netfs_data pointer stored in the cookie by
+the acquisition function to obtain the data to write into each revised index
+entry. The update method in the parent index definition will be called to
+transfer the data.
+
+
+INDEX AND DATA FILE UNREGISTRATION
+----------------------------------
+
+To get rid of a cookie, this function should be called:
+
+	void cachefs_relinquish_cookie(struct cachefs_cookie *cookie,
+				       int retire);
+
+If retire is non-zero, then the index or file will be marked for recycling, and
+all copies of it will be removed from all active caches in which it is present.
+
+If retire is zero, then the inode may be available again the next time the
+acquisition function is called.
+
+One very important note - relinquish must NOT be called unless all "child"
+indexes, files and pages have been relinquished first.
+
+
+PAGE TOKEN MANAGEMENT
+---------------------
+
+As previously mentioned, the netfs must keep a token associated with each page
+currently actively backed by the cache. This is used by CacheFS to go from a
+page to the internal representation of the underlying block and back again. It
+is particularly important for managing the withdrawal of a cache whilst it is
+in active service (eg: it got unmounted).
+
+The token is this:
+
+	struct cachefs_page {
+		...
+	};
+
+Note that all fields are for internal CacheFS use only.
+
+The token only needs to be allocated when CacheFS asks for it. It will do this
+by calling the get_page_cookie() method in the netfs definition ops table. Once
+allocated, the same token should be presented every time the method is called
+again for a particular page.
+
+The token should be retained by the netfs, and should be deleted only after the
+page has been uncached.
+
+One way to achieve this is to attach the token to page->private (and set the
+PG_private bit on the page) once allocated. Shortcut routines are provided by
+CacheFS to do this. Firstly, to retrieve if present and allocate if not:
+
+	struct cachefs_page *cachefs_page_get_private(struct page *page,
+						      unsigned gfp);
+
+Secondly to retrieve if present and BUG if not:
+
+	static inline
+	struct cachefs_page *cachefs_page_grab_private(struct page *page);
+
+To clean up the tokens, the netfs inode hosting the page should be provided
+with address space operations that circumvent the buffer-head operations for a
+page. For instance:
+
+	struct address_space_operations afs_fs_aops = {
+		...
+		.sync_page	= block_sync_page,
+		.set_page_dirty	= __set_page_dirty_nobuffers,
+		.releasepage	= afs_file_releasepage,
+		.invalidatepage	= afs_file_invalidatepage,
+	};
+
+	static int afs_file_invalidatepage(struct page *page,
+					   unsigned long offset)
+	{
+		struct afs_vnode *vnode = AFS_FS_I(page->mapping->host);
+		int ret = 1;
+
+		BUG_ON(!PageLocked(page));
+		if (!PagePrivate(page))
+			return 1;
+		cachefs_uncache_page(vnode->cache, page);
+		if (offset == 0)
+			return 1;
+		BUG_ON(!PageLocked(page));
+		if (PageWriteback(page))
+			return 0;
+		return page->mapping->a_ops->releasepage(page, 0);
+	}
+
+	static int afs_file_releasepage(struct page *page, int gfp_flags)
+	{
+		struct cachefs_page *token;
+		struct afs_vnode *vnode = AFS_FS_I(page->mapping->host);
+
+		if (PagePrivate(page)) {
+			cachefs_uncache_page(vnode->cache, page);
+			token = (struct cachefs_page *) page->private;
+			page->private = 0;
+			ClearPagePrivate(page);
+			if (token)
+				kfree(token);
+		}
+		return 0;
+	}
+
+
+INDEX AND DATA FILE INVALIDATION
+--------------------------------
+
+There is no direct way to invalidate an index subtree or a data file. To do
+this, the caller should relinquish and retire the cookie they have, and then
+acquire a new one.
_