Feb 2007
nullfs: I
The nullfs filesystem is a passthrough filesystem. When nullfs is
mounted, it literally copies each filesystem transaction to the
filesystem beneath it. If a file is deleted, nullfs simply passes the
request down to the lower filesystem. Likewise, if a file is created,
it does the same and passes along all of the data needed by the
filesystem underneath. Why is that a good thing? Where did nullfs
come from and why? What else, if anything, is it good for? This
series covers where nullfs comes from, how it can be leveraged, a
code walk and a bare implementation (nearly a blind copy).
Where nullfs came from is simple to answer, as quoted from Kirk McKusick:
The null filesystem was done in July 1992 by John Heideman when he was visiting Berkeley to add his stackable filesystem implementation to BSD. John is the person that built the framework and built nullfs to show others how to use it. Jan-Simon Pendry used that framework in February 1994 to build several new filesystem modules including the union filesystem, the kernel filesystem, the umap filesystem, and the portal filesystem.
Stackable filesystems can lie on top of each other (as the name implies) but, more importantly, they abstract the details of a regular filesystem.
A good way to see this is to look at layers. While there are many more layers in practice, an abstraction of the file layers might be:
device-driver kernel
fs
vfs
user-interface (shell)
The ten-thousand-foot view is simplistic; however, look where
the null layer fits in:
device-driver kernel
fs
vfs
nullfs-portion
user-interface (shell)
Other stackable filesystem layers, such as the union filesystem, sit in the same position:
device-driver kernel
fs
vfs
unionfs-portion
user-interface (shell)
Chapter 6, p. 231 of The Design and Implementation of the
4.4BSD Operating System (1996, McKusick, Bostic, Karels,
Quarterman) describes stackable filesystems succinctly: "... one
approach is to stack filesystems on top of one another [Rosenthal,
1990]."
In short, use an abstracted method that lets several different
filesystem types communicate via common API(s). The actual
mechanism is the vnode layer, discussed at length later.
Because the null layer does so little, it is an ideal starting point for filesystem layer design. The null layer passes object data (often pointers) up and down the stack unchanged. If a programmer wished to design a layer that only accepts a certain credential, the credential check could be placed in the middle of that pass-through path.
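The idea of putting work "between" the pass-through can be sketched in user-space C. This is only an illustration under stated assumptions: struct layer, struct fs_request and every function below are hypothetical stand-ins invented for this example, not the NetBSD vnode interface, which is covered later in the series.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the NetBSD vnode interface. */
struct layer;

struct fs_request {
    const char *op;     /* e.g. "create" or "remove" */
    const char *path;
    int         uid;    /* credential of the caller */
};

typedef int (*layer_op_fn)(struct layer *self, const struct fs_request *req);

struct layer {
    const char   *name;
    layer_op_fn   handle;
    struct layer *lower;    /* next layer down, NULL at the bottom */
    int           calls;    /* how many requests reached this layer */
};

/* Bottom layer: stands in for the real filesystem. */
int basefs_handle(struct layer *self, const struct fs_request *req)
{
    (void)req;
    self->calls++;
    return 0;
}

/* Null layer: does nothing except hand the request to the layer below. */
int null_handle(struct layer *self, const struct fs_request *req)
{
    self->calls++;
    return self->lower->handle(self->lower, req);
}

/* Credential layer: the same pass-through with one check in front of it. */
int cred_handle(struct layer *self, const struct fs_request *req)
{
    self->calls++;
    if (req->uid != 0)
        return -1;          /* rejected; the lower layers never see it */
    return self->lower->handle(self->lower, req);
}

/* Send a request into the top of a stack of layers. */
int layer_submit(struct layer *top, const struct fs_request *req)
{
    return top->handle(top, req);
}
```

Stacking a credential layer over a null layer over the base layer and submitting a request with uid 0 reaches all three; a request with any other uid stops at the credential layer. The only difference between null_handle and cred_handle is the check, which is why the null layer makes a convenient template.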
The null filesystem code that is used as an example is found in
the NetBSD kernel. [1] In NetBSD-4 the source files can be found in
~src/sys/miscfs/nullfs[2]. The
files are:
Makefile
files.nullfs
null.h
null_vfsops.c
null_vnops.c
The Makefile and files.nullfs file are pretty simple:
Makefile
INCSDIR= /usr/include/miscfs/nullfs
INCS= null.h
.include <bsd.kinc.mk>
This is a stock-looking makefile that points to the include directory and the header.
files.nullfs
deffs NULLFS
file miscfs/nullfs/null_vfsops.c nullfs
file miscfs/nullfs/null_vnops.c nullfs
The files.nullfs file tells the kernel build which C files to compile and what they belong to (in this case, nullfs).
In the header file can be found information about key data
structures needed for a filesystem implementation. Following is the
complete source to the null.h header file:
#include <miscfs/genfs/layer.h>
struct null_args {
struct layer_args la; /* generic layerfs args */
};
#define nulla_target la.target
#define nulla_export la.export
#ifdef _KERNEL
struct null_mount {
struct layer_mount lm; /* generic layerfs mount stuff */
};
#define nullm_vfs lm.layerm_vfs
#define nullm_rootvp lm.layerm_rootvp
#define nullm_export lm.layerm_export
#define nullm_flags lm.layerm_flags
#define nullm_size lm.layerm_size
#define nullm_tag lm.layerm_tag
#define nullm_bypass lm.layerm_bypass
#define nullm_alloc lm.layerm_alloc
#define nullm_vnodeop_p lm.layerm_vnodeop_p
#define nullm_node_hashtbl lm.layerm_node_hashtbl
#define nullm_node_hash lm.layerm_node_hash
#define nullm_hashlock lm.layerm_hashlock
struct null_node {
struct layer_node ln;
};
#define null_hash ln.layer_hash
#define null_lowervp ln.layer_lowervp
#define null_vnode ln.layer_vnode
#define null_flags ln.layer_flags
int null_node_create(struct mount *, struct vnode *,
struct vnode **);
#define MOUNTTONULLMOUNT(mp) ((struct null_mount *)((mp)->mnt_data))
#define VTONULL(vp) ((struct null_node *)(vp)->v_data)
#define NULLTOV(xp) ((xp)->null_vnode)
#ifdef NULLFS_DIAGNOSTIC
struct vnode *layer_checkvp(struct vnode *, char *, int);
#define NULLVPTOLOWERVP(vp) layer_checkvp((vp), __FILE__, __LINE__)
#else
#define NULLVPTOLOWERVP(vp) (VTONULL(vp)->null_lowervp)
#endif
#endif /* _KERNEL */
Digesting the header file all at once might be daunting. First, the
top of the file includes the genfs bits it needs. The genfs layer
header holds the structures, functions and macros for generic
layered filesystems:
~src/sys/miscfs/genfs/layer.h
#ifndef _MISCFS_GENFS_LAYER_H_
#define _MISCFS_GENFS_LAYER_H_
struct layer_args {
char *target; /* Target of loopback */
struct export_args30 _pad1; /* compat with old userland tools */
};
#ifdef _KERNEL
struct layer_node;
LIST_HEAD(layer_node_hashhead, layer_node);
struct layer_mount {
struct mount *layerm_vfs;
struct vnode *layerm_rootvp; /* Ref to root layer_node */
u_int layerm_flags; /* mount point layer flags */
u_int layerm_size; /* size of fs's struct node */
enum vtype layerm_tag; /* vtag of our vnodes */
int /* bypass routine for this mount */
(*layerm_bypass)(void *);
int (*layerm_alloc) /* alloc a new layer node */
(struct mount *, struct vnode *,
struct vnode **);
int (**layerm_vnodeop_p) /* ops for our nodes */
(void *);
struct layer_node_hashhead /* head of hash list for layer_nodes */
*layerm_node_hashtbl;
u_long layerm_node_hash; /* hash mask for hash chain */
struct simplelock layerm_hashlock; /* interlock for hash chain. */
};
#define LAYERFS_MFLAGS 0x00000fff /* reserved layer mount flags */
#define LAYERFS_MBYPASSDEBUG 0x00000001
struct layer_node {
LIST_ENTRY(layer_node) layer_hash; /* Hash list */
struct vnode *layer_lowervp; /* VREFed once */
struct vnode *layer_vnode; /* Back pointer */
unsigned int layer_flags; /* locking, etc. */
};
#define LAYERFS_RESFLAGS 0x00000fff /* flags reserved for layerfs */
#define LAYERFS_REMOVED 0x00000001 /* Did a remove on this node */
#define LAYERFS_UPPERLOCK(v, f, r) do { \
        if ((v)->v_vnlock == NULL) \
                r = lockmgr(&(v)->v_lock, (f), &(v)->v_interlock); \
        else \
                r = 0; \
} while (0)
#define LAYERFS_UPPERUNLOCK(v, f, r) do { \
        if ((v)->v_vnlock == NULL) \
                r = lockmgr(&(v)->v_lock, (f) | LK_RELEASE, &(v)->v_interlock); \
        else \
                r = 0; \
} while (0)
#define LAYERFS_UPPERISLOCKED(v, r) do { \
        if ((v)->v_vnlock == NULL) \
                r = lockstatus(&(v)->v_lock); \
        else \
                r = -1; \
} while (0)
#define LAYERFS_DO_BYPASS(vp, ap) \
        (*MOUNTTOLAYERMOUNT((vp)->v_mount)->layerm_bypass)((ap))
struct vnode *layer_checkvp(struct vnode *vp, const char *fil, int lno);
#define MOUNTTOLAYERMOUNT(mp) ((struct layer_mount *)((mp)->mnt_data))
#define VTOLAYER(vp) ((struct layer_node *)(vp)->v_data)
#define LAYERTOV(xp) ((xp)->layer_vnode)
#ifdef LAYERFS_DIAGNOSTIC
#define LAYERVPTOLOWERVP(vp) layer_checkvp((vp), __FILE__, __LINE__)
extern int layerfs_debug;
#else
#define LAYERVPTOLOWERVP(vp) (VTOLAYER(vp)->layer_lowervp)
#endif
#endif /* _KERNEL */
#endif /* _MISCFS_GENFS_LAYER_H_ */
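To see how the conversion macros near the end of layer.h tie an upper vnode to the vnode beneath it, here is a small user-space sketch. It assumes heavily simplified stand-ins: struct vnode is reduced to a v_data pointer plus a v_id field made up for the demo, struct layer_node omits its hash-list entry, and layer_node_attach is a hypothetical helper rather than a kernel function. Only the non-diagnostic macro definitions are reproduced.

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel types; only the fields used here. */
struct vnode {
    void *v_data;   /* per-filesystem private data */
    int   v_id;     /* hypothetical field, for the demo only */
};

struct layer_node {
    struct vnode *layer_lowervp;    /* vnode of the filesystem underneath */
    struct vnode *layer_vnode;      /* back pointer to our own vnode */
    unsigned int  layer_flags;
};

/* Same shape as the non-diagnostic macros in layer.h. */
#define VTOLAYER(vp)            ((struct layer_node *)(vp)->v_data)
#define LAYERTOV(xp)            ((xp)->layer_vnode)
#define LAYERVPTOLOWERVP(vp)    (VTOLAYER(vp)->layer_lowervp)

/* Hypothetical helper: tie an upper vnode to a lower one through a node. */
void layer_node_attach(struct layer_node *xp, struct vnode *upper,
    struct vnode *lower)
{
    xp->layer_lowervp = lower;
    xp->layer_vnode = upper;
    xp->layer_flags = 0;
    upper->v_data = xp;     /* the upper vnode's private data is the node */
}
```

After layer_node_attach(&node, &upper, &lower), VTOLAYER(&upper) yields &node, LAYERTOV(&node) leads back to &upper, and LAYERVPTOLOWERVP(&upper) yields &lower. That last hop is what a pass-through layer needs on every operation: given its own vnode, find the vnode of the filesystem below.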
Of importance are the layer_args, layer_mount and layer_node
structures; the null.h header wraps each of them and defines
aliases for their members in the null layer context. For example,
the layer_mount member:
struct mount *layerm_vfs;
is given a null-specific name with:
#define nullm_vfs lm.layerm_vfs
Note that a whole new data structure, null_mount, is
defined by wrapping the generic filesystem structure:
struct null_mount {
struct layer_mount lm; /* generic layerfs mount stuff */
};
Essentially, using the generic filesystem bits, one can construct a filesystem layer with relative ease.
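The wrap-and-alias pattern can be tried outside the kernel. The sketch below assumes cut-down structures: struct mount has a single made-up mnt_id field, struct layer_mount keeps only two of its members, and null_mount_init is a hypothetical helper. Only the macro style is taken from null.h.

```c
#include <stddef.h>

/* Simplified stand-ins: the real structures carry many more fields. */
struct mount {
    int mnt_id;     /* hypothetical field, for the demo only */
};

struct layer_mount {
    struct mount *layerm_vfs;
    unsigned int  layerm_flags;
};

/* The null layer wraps the generic structure ... */
struct null_mount {
    struct layer_mount lm;  /* generic layerfs mount stuff */
};

/* ... and names its members through macros, as null.h does. */
#define nullm_vfs   lm.layerm_vfs
#define nullm_flags lm.layerm_flags

/* Hypothetical helper showing that both spellings reach the same storage. */
void null_mount_init(struct null_mount *nmp, struct mount *vfs,
    unsigned int flags)
{
    nmp->nullm_vfs = vfs;       /* expands to nmp->lm.layerm_vfs */
    nmp->nullm_flags = flags;   /* expands to nmp->lm.layerm_flags */
}
```

Because lm is the first (and only) member of struct null_mount, a pointer to a null_mount and a pointer to its layer_mount refer to the same address. Generic layer code can therefore work on the layer_mount view while null-specific code uses the nullm_ names.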
Stackable filesystems give system programmers the ability
to rapidly prototype, design and, in some cases, deploy new
filesystems and/or new filesystem layers. The null
filesystem is a great template for getting started on filesystem
design. The next part(s) of the series cover a code walk of key
functions within the actual null layer code, hooking
into the kernel and an example implementation.