"Kerneld": An implementation of "kernel deamons"
                                or
   A truly automatic and invisible support for loadable modules in Linux

        Bjorn Ekwall <bj0rn@blox.se> in May 1995 and January + March 1996


Executive summary:

	- Read the section "Extensions", last in this document, and
	  decide if you would like to try them.  You don't have to...
	  (It's about getting the pid of the "trigger" process
	   and about persistent module storage)

	- Install the module utilities in this package.

	- Apply the the relevant patch to your kernel (for 1.2 kernels)

	- Enable CONFIG_KERNELD in "make config"

	- Make a kernel with "everything" (but the root fs, root disk)
	  as loadable modules.
	  Install the modules with "make modules_install"

	- A "make install" in this directory will put "kerneld" in /sbin

	- Add these lines at the top of /etc/rc:
		if [ -x /sbin/kerneld ]
		then
			/sbin/kerneld
		fi

	- Read the man page for modprobe to see if you want to create/change
	  the configuration file "/etc/conf.modules"

	  Example: I have these lines to complement the defaults in modprobe:
		alias scsi_hostadapter aha1542
	  	alias eth0 3c509
		alias eth1 de620
		options de620 bnc=1
	  ("modprobe -c" will show you the current modprobe configuration)

	- Remove any explicit "insmod" that you might have in your
	  startup scripts (in /etc/rc*/*).

	- Reboot and enjoy... i.e. mount some file systems, ifconfig eth0 ...

	- Forget about how to do "insmod" manually :-)


Introduction:

The aim of "kerneld" is to give the kernel a possibility to send requests
to the user level.  These requests will result in actions by user-level
programs.  The results of these actions will be reported back to the kernel,
if the kernel wants them...

The requests we have implemented now support "on demand loading" and
unloading of loadable kernel modules, since it is quite easy for the
kernel to notice when a process requests a functionality that is not
supplied in the standard kernel configuration, but might be implemented
as a loadable kernel module.
A kernel that is configured to use "kerneld" can thus be seen as a
"virtual" full-configured kernel.

The "kerneld" support can be seen as an "inverted" fork/exec, where the
kernel can decide, for each fork/exec, whether it wants to wait for an
exit status or not.
If the kernel selects to wait for an exit status, it can also select to
recieve data from "kerneld".

I decided to use the IPC message mechanism instead of sockets, since the
IPC "fits" better for this purpose IMHO.  If (=when) the net subsystem is
implemented as loadable modules, this decision will be seen as a wise one :-)

I just made a couple of small additions to the kernel IPC support,
and I think that we have something workable...


Status:

The kernel modifications are in a quite a good shape for now, and the
user level "kerneld" as well, although "kerneld" now only handles a few
different actions. It is easy to extend this if you want to...
Note that kerneld automatically removes unused modules and stacks that
it has loaded even without any explicit kernel request for this!


Modifications:

In "ipc/msg.c" there are some small additions, and an added function:

	int kerneld_send(int msgtype, int return_size, int msgsz,
		const char *text, const char *return_value);

We have also added the following (inline) kernel support functions:
	ksystem (const char *command, int wait_flag);
	request_module (const char *name);
	release_module (const char *name, int wait_flag);
	delayed_release_module (const char *name);
	cancel_release_module (const char *name);
	kerneld_route(const char *ip_route);

(See also: <linux/kerneld.h>)

For the "request_module()" call, the kernel will wait for "kerneld"
to finish its job. For the other "*module()" calls it will not wait,
unless there is a wait_flag that has been set.

Jacques Gelinas and I have added "request_module()" calls for _all_
filesystems, block and character devices, as well as the net drivers.
There is also a "request_module()" in "fs/exec.c" which will try to
load any missing binary format interpreter (such as iBCS).
Note that the kernel will now support _all_ filesystems, devices,
net drivers and binary formats, even those that aren't written yet!
All that is required is that the functionality is available as
a kernel loadable module and that modprobe can find it.
You might have to update "/etc/conf.modules" for the new module though...

The additional code is practically insignificant; just a couple of lines
in "fs/devices.c", "fs/super.c", "fs/exec.c", "net/inet/route.c"
and "net/inet/dev.c"...


"Theory:"

A specific kernel message queue is allocated for use as a channel between
the kernel and the user level.  When the user level daemon "kerneld"
opens a message queue with "msgget()", where the flag has the new bit
IPC_KERNELD set (defined in <linux/msg.h>), this queue will be marked
as being the "kerneld" message queue.
All sends and receives on this queue will be handled specially by the kernel.

The function "kerneld_send" will add a message from the kernel to the
queue, and this message will be picked up by the user level kernel daemon.
The action requested is encoded in the message type.
The traditional IPC message text array ("mtext") is now divided into two
separate fields; the "id" and "text" fields:

	struct kerneld_msg {
		long mtype;
		long id;
		char text[1];
	};

The field "id" is used by the kernel to store a sequence number, so that a
return status can be picked up by the correct, sleeping, kernel level process.

The user level kernel daemon is really only a "switch(mtype)", where we for
now use 1 as signifying a "system" command, 2 as a request for "insmod"
and 3 as a request for "rmmod", and so on.
The relevent command is executed, using "text" as a parameter.
If the "id" field is 0, it is interpreted by kerneld as an indication
that the kernel don't want any status information.
If the "id" field is non-zero, it will be used as the message type of the
return message, from kerneld to the kernel, and the "id" field will now
instead be used to store the exit value from the executed command.


If kerneld_send is called with "(return_size & KERNELD_WAIT) == 0", the
kernel will not wait for an "answer" from "kerneld".  Otherwise it will
wait for a message where the type is equal to the sequence number of the
original request.
If "return_size & ~KERNELD_WAIT > 0", and "return_value != NULL", the
returned information in the "text" field will be copied to the area
pointed to by "return_value".
In all cases where the KERNELD_WAIT flag is set, the status will
be returned to the caller of kerneld_send().
In this way it is possible to have concurrent requests to "kerneld".

Since the requests are encoded in the message type, it is possible
to have several, independent, kernel deamons, that each "listens"
for a specific message type.

The message types from the kernel to kerneld are limited to <= KERNELD_MAXCMD,
so that all sends from the kernel to kerneld will have a type less than that
value (defined in <linux/kerneld.h>).
The sequence numbers start at KERNELD_MINSEQ, which means that all
messages from kerneld back to the kernel will all have a type greater
or equal than this value, but still positive (defined in <linux/kerneld.h),
and thus easily distinguishable.

Note that more than one daemon can exist simultanuously, if the other
daemons only "listens" for _one_ specific message type, since the
"normal" kerneld is set up to only look for message types less than 256.

This makes the linux kernel somewhat "micro-kernelish", with server
processes that are called by the kernel through a well defined interface!
Note that the "text" field in the message can be any arbitrary bit
pattern, e.g. it is possible to send any type of structs back and forth
between the kernel and the daemon(s)!  The only requirement is that the
kernel and the kerneld processing agree on how to interpret the info.

	struct param parameter;
	struct retval return_value;
	int kerneld_function_number = KERNELD_KNOWS_MY_SPECIAL_MAGIC_FUNCTION;

	...

	status = kerneld_send(kerneld_function_number,
				sizeof(struct retval) | KERNELD_WAIT,
				sizeof(struct param),
				(char *) &parameter,
				(char *) &return_value);
	...


Check out kdsend.c and kdstat.c for a use of kerneld from user space...

Note that modules now can have a "hidden" autoclean attribute, that is set
on modules loaded by kerneld (via the "-k" option to insmod).
Modules marked "autoclean" will be subject to removal attempts every minute.
Since the timer is restarted every time a module is loaded by "kerneld",
a module will be able to stay "unused" in the kernel for max one minute.


Extensions:

The "pid" extension:
-------------------

  The kerneld protocol can be extended to also send the pid of the process
  that triggered the kerneld event.  The pid of the "triggering" process is
  then available to kerneld, and all programs that are started by kerneld,
  via the environment variable KERNELD_TRIGGER.
  If kerneld was triggered during an interrupt, the pid will be set to 0.

  The 2.0-kernels are prepared for this, the only thing you have to do is
  to uncomment the definition in linux/include/linux/kerneld.h, line 18:


--- linux/include/linux/kerneld.h.org	Mon Jun 10 18:16:31 1996
+++ linux/include/linux/kerneld.h	Mon Jun 10 18:17:24 1996
@@ -14,8 +14,8 @@
  * Uncomment the following line for the new kerneld protocol
  * This includes the pid of the kernel level requester into the kerneld header
  */
-/*
 #define NEW_KERNELD_PROTOCOL
+/*
  */
 #ifdef NEW_KERNELD_PROTOCOL
 #define OLDIPC_KERNELD 00040000   /* use the kerneld message channel */


  I have tried to make kerneld backwards compatible with un-patched kernels,
  but the patch will require a re-compile of all kerneld-type daemons...

  The reason for this extension is to make it possible to e.g. create
  "just-in-time" debuggers for Linux, that are started whenever a fault
  is detected by the kernel.

  Another use is to enable custom made "beeps", via the kdsound daemon in
  the GOODIES directory.  The command name, used to select the "beep", can
  easily be found in the "/proc/<pid>/cmdline" file, or via e.g.:
	set `ps hc $KERNELD_TRIGGER`
	case $5 in
	xbiff) ..... ;;
	bash) .... ;;
	*) ... ;;
	esac


The "persist" extension:
-----------------------

  In this release there is support for persistent module storage via kerneld.
  This means that a module can save its internal state, to be restored on
  the next invocation.

  Note: this is still experimental, since there is no agreed-upon syntax
    for the keys.  The key _could_ be created by prepending the module name
    to the key name, so that no name collisions would happen (unless we
    _want_ the to happen, of course).

  To enble this feature, apply the included "kerneld-persist-patch" to
  the kernel.  This will upgrade linux/include/linux/kerneld.h

  If you have libgdbm, and if you want to be able to control the
  behaviour of the persistent storage, you should modify the Makefile
  in this directory according to the instructions at the top.


  How it works:

  If a module wants to save some information that might be useful for
  when it is reloaded at a later time, a call to "set_persist" will
  make this possible.

  The syntax is:
	int set_persist(char *key, void *value, size_t size)

  where the key can be any string, the value can be anything (including
  structs and arrays) and size is the size of the value to be stored.
  (A negative return value indicates that the kernel couldn't send
  the message to kerneld.)

  This call will trigger a kerneld action, where the key and value
  is stored inside kerneld, as well as in a gdbm-file, if you have
  enabled this in the Makefile.

  It is possible to use the persistant storage even without gdbm,
  but the information will the be lost as soon as kerneld exits.

  The gdbm file will reside in /lib/modules/`uname -r`/persist.gdbm.

  The information in the gdbm file can be read and updated by ordinary user
  programs (with the right authorities), and any changes can be reloaded
  into the running kerneld by sending a SIGHUP to kerneld ("kill -1").

  Note that kerneld only keeps the gdbm file open as long as it uses it.
  The file will be closed after each update, so the gdbm limitation
  of only one writer at a time is not a big problem...

  Also note that kerneld will initialise itself from this gdbm file
  whenever kerneld starts to execute as well.  This makes the information
  even survive re-boots.


  If a module wants to get the saved persistent information, it could
  do a call to "get_persist", using the same key as was used when the
  information was stored.  See below for insmod support (which is better).

  The syntax is:
	int get_persist(char *key, void *value, size_t size)

  where the key can be any string, the value is a pointer to the memory
  where the return value should be stored, and size is the size of the
  value to be stored. The returned value will never be larger than this.
  The return value from the call tells how many bytes that were actually
  read, and can be used to verify that the correct data type was read.

  If the return value from get_persist is 0, then no value could be
  found for this key.
  A negative return value indicates that the kernel couldn't send
  the message to kerneld, so no persistent information was available.


  If, for some reason, a module wants to remove a saved entry, this call
  will make that possible:

	set_persist("the key", NULL, 0);


  How to use it in your modules:

  An example can be found in "check_persist.c", which shows all the
  normal ways of using the persistent storage.  Even though the example
  puts the call to set_persist in the init_module function, there is
  no limitation to where it can be put in a real module, since the
  kerneld action will be performed asynchronously, i.e. the kernel
  doesn't have to wait until kerneld has stored the value.

  It's really quite trivial to use, so go right ahead!

  Note that insmod now can conditionally set the values of any symbol
  from a key:

	insmod module.o symbol=?key:default_value

  If there is no persistent key with that name, the default_value (if any)
  will be used.
  Note that this makes it possible to drop most calls to "get_persist"!

  To administer the persistent values, a simple utility "admpersist" is
  included.  It can be used like this:

	admpersist		# shows all keys and their values
	admpersist a_key	# shows the value stored by "a_key"
	admpersist a_key=0x1234	# creates/updates the persistent value
	admpersist -d a_key	# deletes the persistent value

  More examples:

	admpersist -d key1 key2 key3	# delete these persistent values
	admpersist an_array=27,0x1b,033	# handle an integer array

  The admpersist utility can handle arrays similar to what insmod does, and
  understands integer values expressed in hex, octal or decimal.

  If you want kerneld to note the changes, do a "kill -1" on kerneld.


Bjorn Ekwall  <bj0rn@blox.se>  in June 1996