The vnode is the focus of all file activity in NetBSD. There is a unique vnode allocated for each active file, directory, mounted-on file, fifo, domain socket, symbolic link and device. The kernel has no concept of a file's underlying structure and so it relies on the information stored in the vnode to describe the file. Thus, the vnode associated with a file holds all the administrative information pertaining to it.
When a process requests an operation on a file, the vfs(9) interface passes control to a file system type dependent function to carry out the operation. If the file system type dependent function finds that a vnode representing the file is not in main memory, it dynamically allocates a new vnode from the system main memory pool. Once allocated, the vnode is attached to the data structure pointer associated with the cause of the vnode allocation and it remains resident in the main memory until the system decides that it is no longer needed and can be recycled.
The vnode has the following structure:
struct vnode {
	struct uvm_object v_uobj;	/* the VM object */
	kcondvar_t v_cv;	/* synchronization */
	voff_t v_size;	/* size of file */
	voff_t v_writesize;	/* new size after write */
	int v_iflag;	/* VI_* flags */
	int v_vflag;	/* VV_* flags */
	int v_uflag;	/* VU_* flags */
	int v_numoutput;	/* # of pending writes */
	int v_writecount;	/* ref count of writers */
	int v_holdcnt;	/* page & buffer refs */
	int v_synclist_slot;	/* synclist slot index */
	struct mount *v_mount;	/* ptr to vfs we are in */
	int (**v_op)(void *);	/* vnode operations vector */
	TAILQ_ENTRY(vnode) v_freelist;	/* vnode freelist */
	struct vnodelst *v_freelisthd;	/* which freelist? */
	TAILQ_ENTRY(vnode) v_mntvnodes;	/* vnodes for mount point */
	struct buflists v_cleanblkhd;	/* clean blocklist head */
	struct buflists v_dirtyblkhd;	/* dirty blocklist head */
	TAILQ_ENTRY(vnode) v_synclist;	/* vnodes with dirty bufs */
	LIST_HEAD(, namecache) v_dnclist;	/* namecaches (children) */
	LIST_HEAD(, namecache) v_nclist;	/* namecaches (parent) */
	union {
		struct mount *vu_mountedhere;	/* ptr to vfs (VDIR) */
		struct socket *vu_socket;	/* unix ipc (VSOCK) */
		struct specnode *vu_specnode;	/* device (VCHR, VBLK) */
		struct fifoinfo *vu_fifoinfo;	/* fifo (VFIFO) */
		struct uvm_ractx *vu_ractx;	/* read-ahead ctx (VREG) */
	} v_un;
	enum vtype v_type;	/* vnode type */
	enum vtagtype v_tag;	/* type of underlying data */
	struct vnlock v_lock;	/* lock for this vnode */
	void *v_data;	/* private data for fs */
	struct klist v_klist;	/* notes attached to vnode */
};
Most members of the vnode structure should be treated as opaque and only manipulated using the proper functions. There are some rather common exceptions detailed throughout this page.
Files and file systems are inextricably linked with the virtual memory system and v_uobj contains the data maintained by the virtual memory system. For compatibility with code written before the integration of uvm(9) into NetBSD, C-preprocessor directives are used to alias the members of v_uobj.
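The aliasing described above can be illustrated with a small userland model. This is only a sketch: the structure layouts, the uo_refs and uo_lock field names, and the model_take_ref() helper are assumptions made for the example, not the definitions found in the kernel headers.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the VM object; field names are assumptions. */
struct model_uvm_object {
	int uo_lock;	/* placeholder for the object's lock */
	int uo_refs;	/* reference count kept by the VM object */
};

struct model_vnode {
	struct model_uvm_object v_uobj;	/* the VM object */
	long v_size;			/* size of file */
};

/* Aliases that let older code keep writing vp->v_usecount. */
#define v_usecount	v_uobj.uo_refs
#define v_interlock	v_uobj.uo_lock

/* Older-style code: the alias expands to vp->v_uobj.uo_refs. */
int
model_take_ref(struct model_vnode *vp)
{
	assert(vp != NULL);
	return ++vp->v_usecount;
}
```

If the kernel uses aliases of this kind, that would explain how members such as v_usecount and v_interlock can be referenced in this page without appearing in the structure listing above.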
Vnode flags are recorded by v_flag. Valid flags are:
VROOT
This vnode is the root of its file system.
VTEXT
This vnode is a pure text prototype.
VSYSTEM
This vnode is being used by the kernel; only used to skip quota files in vflush().
VISTTY
This vnode represents a tty; used when reading dead vnodes.
VEXECMAP
This vnode has executable mappings.
VWRITEMAP
This vnode might have PROT_WRITE user mappings.
VWRITEMAPDIRTY
This vnode might have dirty pages due to VWRITEMAP.
VLOCKSWORK
This vnode's file system supports locking.
VXLOCK
This vnode is currently locked to change underlying type.
VXWANT
A process is waiting for this vnode.
VBWAIT
Waiting for output associated with this vnode to complete.
VALIASED
This vnode has an alias.
VDIROP
This vnode is involved in a directory operation. This flag is used exclusively by LFS.
VLAYER
This vnode is on a layered file system.
VONWORKLST
This vnode is on syncer work-list.
VFREEING
This vnode is being freed.
VMAPPED
This vnode might have user mappings.
The VXLOCK flag is used to prevent multiple processes from entering the vnode reclamation code. It is also used as a flag to indicate that reclamation is in progress. The VXWANT flag is set by threads that wish to be awakened when reclamation is finished. Before v_flag can be modified, the v_interlock simplelock must be acquired. See lock(9) for details on the kernel locking API.
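The VXLOCK and VXWANT handshake can be sketched as a userland model. The bit values and the model_* names below are hypothetical, and the model leaves out both the v_interlock protection and the actual sleep and wakeup that the kernel would perform.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bit values; the real ones live in the kernel headers. */
#define MODEL_VXLOCK	0x0100	/* reclamation in progress */
#define MODEL_VXWANT	0x0200	/* somebody waits for reclamation */

struct model_vnode {
	int v_flag;
};

/*
 * Try to enter the reclamation code.  Returns true if the caller now
 * owns the reclamation; otherwise records that a waiter exists.
 */
bool
model_reclaim_enter(struct model_vnode *vp)
{
	assert(vp != NULL);
	if (vp->v_flag & MODEL_VXLOCK) {
		vp->v_flag |= MODEL_VXWANT;
		return false;
	}
	vp->v_flag |= MODEL_VXLOCK;
	return true;
}

/*
 * Finish reclamation.  Returns true if a waiter set VXWANT and would
 * have to be awakened.
 */
bool
model_reclaim_exit(struct model_vnode *vp)
{
	bool wanted;

	assert(vp != NULL);
	wanted = (vp->v_flag & MODEL_VXWANT) != 0;
	vp->v_flag &= ~(MODEL_VXLOCK | MODEL_VXWANT);
	return wanted;
}
```

A second caller of model_reclaim_enter() is turned away and leaves MODEL_VXWANT behind, so the owner learns from model_reclaim_exit() that a wakeup is due.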
Each vnode has three reference counts: v_usecount, v_writecount and v_holdcnt. The first is the number of active references within the kernel to the vnode. This count is maintained by vref(), vrele(), vrele_async(), and vput(). The second is the number of active references within the kernel to the vnode performing write access to the file. It is maintained by the open(2) and close(2) system calls. The third is the number of references within the kernel requiring the vnode to remain active and not be recycled. This count is maintained by vhold() and holdrele(). When both the v_usecount and v_holdcnt reach zero, the vnode is recycled to the freelist and may be reused for another file. The transition to and from the freelist is handled by getnewvnode(), ungetnewvnode() and vrecycle(). Access to v_usecount, v_writecount and v_holdcnt is also protected by the v_interlock simplelock.
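The rule that a vnode becomes recyclable only when both v_usecount and v_holdcnt reach zero can be shown with a userland model. The model_* functions and the on_freelist member are hypothetical stand-ins and do not reproduce the kernel's vref(), vrele(), vhold() or holdrele(); locking is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_vnode {
	int v_usecount;		/* active references */
	int v_holdcnt;		/* page & buffer refs */
	bool on_freelist;	/* hypothetical: eligible for reuse */
};

/* Recompute eligibility: both counts must be zero. */
void
model_update(struct model_vnode *vp)
{
	vp->on_freelist = (vp->v_usecount == 0 && vp->v_holdcnt == 0);
}

void
model_ref(struct model_vnode *vp)
{
	assert(vp != NULL);
	vp->v_usecount++;
	model_update(vp);
}

void
model_rele(struct model_vnode *vp)
{
	assert(vp != NULL && vp->v_usecount > 0);
	vp->v_usecount--;
	model_update(vp);
}

void
model_hold(struct model_vnode *vp)
{
	assert(vp != NULL);
	vp->v_holdcnt++;
	model_update(vp);
}

void
model_holdrele(struct model_vnode *vp)
{
	assert(vp != NULL && vp->v_holdcnt > 0);
	vp->v_holdcnt--;
	model_update(vp);
}
```

Dropping the last use reference while a hold is outstanding leaves the model vnode off the freelist; only the final model_holdrele() makes it eligible.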
The number of pending synchronous and asynchronous writes on the vnode is recorded in v_numoutput. It is used by fsync(2) to wait for all writes to complete before returning to the user. Its value must only be modified at splbio (see spl(9)). It does not track the number of dirty buffers attached to the vnode.
v_dnclist and v_nclist are used by namecache(9) to maintain the list of associated entries so that cache_purge(9) can purge them.
The link to the file system which owns the vnode is recorded by v_mount. See vfsops(9) for further information on file system mount status.
The v_op pointer points to its vnode operations vector. This vector describes what operations can be done to the file associated with the vnode. The system maintains one vnode operations vector for each file system type configured into the kernel. The vnode operations vector contains a pointer to a function for each operation supported by the file system. See vnodeops(9) for a description of vnode operations.
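Dispatch through an operations vector of type int (**)(void *) can be sketched in userland as follows. The operation indices, the toyfs_* functions, the argument structure and model_vop_call() are all hypothetical; the real vector layout and calling convention are described in vnodeops(9).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical operation indices into the vector. */
enum { MODEL_OP_OPEN, MODEL_OP_CLOSE, MODEL_OP_COUNT };

struct model_vnode {
	int (**v_op)(void *);	/* vnode operations vector */
	int opens;		/* hypothetical per-vnode state */
};

/* Each operation receives its arguments packed in one structure. */
struct model_args {
	struct model_vnode *a_vp;
};

int
toyfs_open(void *v)
{
	struct model_args *ap = v;

	ap->a_vp->opens++;
	return 0;
}

int
toyfs_close(void *v)
{
	struct model_args *ap = v;

	ap->a_vp->opens--;
	return 0;
}

/* One vector per file system type, shared by all of its vnodes. */
int (*toyfs_vnodeops[MODEL_OP_COUNT])(void *) = {
	[MODEL_OP_OPEN] = toyfs_open,
	[MODEL_OP_CLOSE] = toyfs_close,
};

/* Generic code calls through the vector without knowing the fs type. */
int
model_vop_call(struct model_vnode *vp, int op, void *args)
{
	assert(vp != NULL && op >= 0 && op < MODEL_OP_COUNT);
	if (vp->v_op[op] == NULL)
		return -1;	/* operation not supported in this model */
	return vp->v_op[op](args);
}
```

The caller only needs the vnode and an operation index; which function runs is decided by the vector the owning file system installed in v_op.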
When not in use, vnodes are kept on the freelist through v_freelist. The vnodes still reference valid files but may be reused to refer to a new file at any time. When a valid vnode which is on the freelist is used again, the user must call vget() to increment the reference count and retrieve it from the freelist. When a user wants a new vnode for another file, getnewvnode() is invoked to remove a vnode from the freelist and initialize it for the new file.
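The freelist behaviour described above can be modelled in userland with the TAILQ macros from <sys/queue.h>. The model_* functions, the id member and the on_freelist member are hypothetical, and the sketch simply returns NULL when the freelist is empty, which is a simplification of this model rather than a statement about getnewvnode().

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

struct model_vnode {
	int id;			/* hypothetical: file the vnode describes */
	int v_usecount;		/* active references */
	int on_freelist;	/* hypothetical: linked on the freelist? */
	TAILQ_ENTRY(model_vnode) v_freelist;
};

TAILQ_HEAD(model_vnodelst, model_vnode);

/* Drop a reference; an unused vnode goes to the tail of the freelist. */
void
model_release(struct model_vnodelst *fl, struct model_vnode *vp)
{
	assert(vp->v_usecount > 0);
	if (--vp->v_usecount == 0) {
		TAILQ_INSERT_TAIL(fl, vp, v_freelist);
		vp->on_freelist = 1;
	}
}

/* Reuse a still-valid vnode: take it off the freelist and reference it. */
void
model_vget(struct model_vnodelst *fl, struct model_vnode *vp)
{
	if (vp->on_freelist) {
		TAILQ_REMOVE(fl, vp, v_freelist);
		vp->on_freelist = 0;
	}
	vp->v_usecount++;
}

/* Take the oldest free vnode and initialize it for another file. */
struct model_vnode *
model_getnewvnode(struct model_vnodelst *fl, int new_id)
{
	struct model_vnode *vp = TAILQ_FIRST(fl);

	if (vp == NULL)
		return NULL;
	TAILQ_REMOVE(fl, vp, v_freelist);
	vp->on_freelist = 0;
	vp->id = new_id;
	vp->v_usecount = 1;
	return vp;
}
```

A vnode rescued with model_vget() keeps its identity, while one handed out by model_getnewvnode() is relabelled for a different file.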
The type of object the vnode represents is recorded by v_type. It is used by generic code to perform checks to ensure operations are performed on valid file system objects. Valid types are:
VNON
The vnode has no type.
VREG
The vnode represents a regular file.
VDIR
The vnode represents a directory.
VBLK
The vnode represents a block special device.
VCHR
The vnode represents a character special device.
VLNK
The vnode represents a symbolic link.
VSOCK
The vnode represents a socket.
VFIFO
The vnode represents a pipe.
VBAD
The vnode represents a bad file (not currently used).
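Two small userland sketches of how v_type might be consulted follow. The first shows the kind of validity check generic code can make before a directory operation; the second maps each type to the v_un member that the comments in the structure listing associate with it. The model_* function names are hypothetical and the enumeration order simply follows the list above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum vtype { VNON, VREG, VDIR, VBLK, VCHR, VLNK, VSOCK, VFIFO, VBAD };

/* Generic-code style check: directory operations need a VDIR vnode. */
int
model_check_dirop(enum vtype type)
{
	return (type == VDIR) ? 0 : ENOTDIR;
}

/* Name of the v_un member that is meaningful for a given type. */
const char *
model_vun_member(enum vtype type)
{
	switch (type) {
	case VDIR:
		return "vu_mountedhere";
	case VSOCK:
		return "vu_socket";
	case VCHR:
	case VBLK:
		return "vu_specnode";
	case VFIFO:
		return "vu_fifoinfo";
	case VREG:
		return "vu_ractx";
	case VNON:
	case VLNK:
	case VBAD:
		break;
	}
	return NULL;	/* no union member applies */
}
```

Because v_un is a union, reading a member that does not match v_type would interpret another member's storage, which is why a check of this kind comes first.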
Vnode tag types are used by external programs only (e.g., pstat(8)), and should never be inspected by the kernel. Their use is deprecated since new v_tag values cannot be defined for loadable file systems. The v_tag member is read-only. Valid tag types are:
VT_UFS
UNIX file system
VT_NFS
network file system
VT_MFS
memory file system
VT_MSDOSFS
FAT file system
VT_LFS
log-structured file system
VT_LOFS
loopback file system
VT_FDESC
file descriptor file system
VT_NULL
null file system layer
VT_UMAP
uid/gid remapping file system layer
VT_KERNFS
kernel interface file system
VT_PROCFS
process interface file system
VT_ISOFS
ISO 9660 file system(s)
VT_UNION
union file system
VT_ADOSFS
Amiga file system
VT_EXT2FS
Linux's ext2 file system
VT_FILECORE
filecore file system
VT_NTFS
Microsoft NT's file system
VT_VFS
virtual file system
VT_OVERLAY
overlay file system
VT_PTYFS
pseudo-terminal device file system
VT_TMPFS
efficient memory file system
VT_UDF
universal disk format file system
VT_SYSVBFS
System V boot file system
All vnode locking operations use v_lock. This lock is acquired by calling vn_lock(9) and released by calling VOP_UNLOCK(9). The reason for this asymmetry is that vn_lock(9) is a wrapper for VOP_LOCK(9) with extra checks, while the unlocking step usually does not need additional checks and thus has no wrapper.
The vnode locking operation is complicated because it is used for many purposes. Sometimes it is used to bundle a series of vnode operations (see vnodeops(9)) into an atomic group. Many file systems rely on it to prevent race conditions in updating file system type specific data structures rather than using their own private locks. The vnode lock can operate as a multiple-reader (shared-access) lock or a single-writer (exclusive-access) lock; however, many current file system implementations were written assuming only single-writer locking. Multiple-reader locking functions equivalently only in the presence of big-lock SMP locking or a uni-processor machine. The lock may be held while sleeping. While the v_lock is acquired, the holder is guaranteed that the vnode will not be reclaimed or invalidated. Most file system functions require that you hold the vnode lock on entry. See lock(9) for details on the kernel locking API.
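The asymmetry between a checking lock wrapper and a plain unlock, together with the shared and exclusive modes, can be sketched as a single-threaded userland model. Everything here is hypothetical: the model_* names, the reclaimed member, the choice of ENOENT for the extra check, and returning EBUSY where the kernel would normally sleep.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct model_vnode {
	int readers;	/* shared holders */
	bool writer;	/* exclusive holder present */
	bool reclaimed;	/* hypothetical: vnode no longer valid */
};

/* Plain lock operation: many readers or one writer. */
int
model_vop_lock(struct model_vnode *vp, bool exclusive)
{
	if (vp->writer)
		return EBUSY;
	if (exclusive) {
		if (vp->readers > 0)
			return EBUSY;
		vp->writer = true;
	} else {
		vp->readers++;
	}
	return 0;
}

/* Wrapper: performs an extra validity check, then locks. */
int
model_vn_lock(struct model_vnode *vp, bool exclusive)
{
	assert(vp != NULL);
	if (vp->reclaimed)
		return ENOENT;	/* assumed error for this model */
	return model_vop_lock(vp, exclusive);
}

/* Unlock needs no extra checks, hence no wrapper. */
void
model_vop_unlock(struct model_vnode *vp)
{
	assert(vp != NULL && (vp->writer || vp->readers > 0));
	if (vp->writer)
		vp->writer = false;
	else
		vp->readers--;
}
```

Two shared holders coexist in the model, an exclusive request is refused until both have unlocked, and a vnode marked reclaimed is rejected by the wrapper before the lock operation is even attempted.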
Each file system underlying a vnode allocates its own private area and hangs it from v_data.
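How a file system might hang its private area from v_data is sketched below as a userland model. The toyfs_node structure, the VTOTOY() accessor macro and the toyfs_attach() and toyfs_detach() functions are hypothetical, and malloc() stands in for whatever allocator a real file system would use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct model_vnode {
	void *v_data;	/* private data for fs */
};

/* Hypothetical per-file private area of a toy file system. */
struct toyfs_node {
	unsigned long tn_ino;		/* file number within the fs */
	struct model_vnode *tn_vnode;	/* back pointer to the vnode */
};

/* Accessor from the generic vnode to the fs-private node. */
#define VTOTOY(vp)	((struct toyfs_node *)(vp)->v_data)

/* Allocate the private area and hang it from v_data. */
int
toyfs_attach(struct model_vnode *vp, unsigned long ino)
{
	struct toyfs_node *tn;

	assert(vp != NULL && vp->v_data == NULL);
	tn = malloc(sizeof(*tn));
	if (tn == NULL)
		return -1;
	tn->tn_ino = ino;
	tn->tn_vnode = vp;
	vp->v_data = tn;
	return 0;
}

/* Release the private area when the vnode stops describing the file. */
void
toyfs_detach(struct model_vnode *vp)
{
	assert(vp != NULL);
	free(vp->v_data);
	vp->v_data = NULL;
}
```

Generic code never looks inside v_data; only the owning file system casts it back to its own type, here through VTOTOY().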
Most functions discussed in this page that operate on vnodes cannot be called from interrupt context. The members v_numoutput, v_holdcnt, v_dirtyblkhd, v_cleanblkhd, v_freelist, and v_synclist are modified in interrupt context and must be protected by splbio(9) unless it is certain that there is no chance an interrupt handler will modify them. The vnode lock must not be acquired within interrupt context.