"Kerneld": An implementation of "kernel deamons" or A truly automatic and invisible support for loadable modules in Linux Bjorn Ekwall in May 1995 and January + March 1996 Executive summary: - Read the section "Extensions", last in this document, and decide if you would like to try them. You don't have to... (It's about getting the pid of the "trigger" process and about persistent module storage) - Install the module utilities in this package. - Apply the the relevant patch to your kernel (for 1.2 kernels) - Enable CONFIG_KERNELD in "make config" - Make a kernel with "everything" (but the root fs, root disk) as loadable modules. Install the modules with "make modules_install" - A "make install" in this directory will put "kerneld" in /sbin - Add these lines at the top of /etc/rc: if [ -x /sbin/kerneld ] then /sbin/kerneld fi - Read the man page for modprobe to see if you want to create/change the configuration file "/etc/conf.modules" Example: I have these lines to complement the defaults in modprobe: alias scsi_hostadapter aha1542 alias eth0 3c509 alias eth1 de620 options de620 bnc=1 ("modprobe -c" will show you the current modprobe configuration) - Remove any explicit "insmod" that you might have in your startup scripts (in /etc/rc*/*). - Reboot and enjoy... i.e. mount some file systems, ifconfig eth0 ... - Forget about how to do "insmod" manually :-) Introduction: The aim of "kerneld" is to give the kernel a possibility to send requests to the user level. These requests will result in actions by user-level programs. The results of these actions will be reported back to the kernel, if the kernel wants them... The requests we have implemented now support "on demand loading" and unloading of loadable kernel modules, since it is quite easy for the kernel to notice when a process requests a functionality that is not supplied in the standard kernel configuration, but might be implemented as a loadable kernel module. 
A kernel that is configured to use "kerneld" can thus be seen as a "virtual"
fully configured kernel.

The "kerneld" support can be seen as an "inverted" fork/exec, where the kernel
can decide, for each fork/exec, whether it wants to wait for an exit status
or not.
If the kernel chooses to wait for an exit status, it can also choose to
receive data from "kerneld".

I decided to use the IPC message mechanism instead of sockets, since the IPC
"fits" better for this purpose IMHO. If (=when) the net subsystem is
implemented as loadable modules, this decision will be seen as a wise one :-)

I just made a couple of small additions to the kernel IPC support,
and I think that we have something workable...

Status:

The kernel modifications are in quite good shape for now, and the user level
"kerneld" as well, although "kerneld" now only handles a few different
actions. It is easy to extend this if you want to...

Note that kerneld automatically removes unused modules and stacks that
it has loaded, even without any explicit kernel request for this!

Modifications:

In "ipc/msg.c" there are some small additions, and an added function:

        int kerneld_send(int msgtype, int return_size, int msgsz,
                const char *text, const char *return_value);

We have also added the following (inline) kernel support functions:

        ksystem (const char *command, int wait_flag);
        request_module (const char *name);
        release_module (const char *name, int wait_flag);
        delayed_release_module (const char *name);
        cancel_release_module (const char *name);
        kerneld_route(const char *ip_route);
        (See also: )

For the "request_module()" call, the kernel will wait for "kerneld" to finish
its job. For the other "*module()" calls it will not wait, unless a
wait_flag has been set.

Jacques Gelinas and I have added "request_module()" calls for _all_
filesystems, block and character devices, as well as the net drivers.
There is also a "request_module()" in "fs/exec.c" which will try to load
any missing binary format interpreter (such as iBCS).

Note that the kernel will now support _all_ filesystems, devices, net drivers
and binary formats, even those that aren't written yet!
All that is required is that the functionality is available as a
loadable kernel module and that modprobe can find it.
You might have to update "/etc/conf.modules" for the new module though...

The additional code is practically insignificant; just a couple of lines in
"fs/devices.c", "fs/super.c", "fs/exec.c", "net/inet/route.c"
and "net/inet/dev.c"...

"Theory:"

A specific kernel message queue is allocated for use as a channel between
the kernel and the user level.
When the user level daemon "kerneld" opens a message queue with "msgget()",
where the flag has the new bit IPC_KERNELD set (defined in ),
this queue will be marked as being the "kerneld" message queue.
All sends and receives on this queue will be handled specially by the kernel.

The function "kerneld_send" will add a message from the kernel to the queue,
and this message will be picked up by the user level kernel daemon.
The action requested is encoded in the message type.

The traditional IPC message text array ("mtext") is now divided into
two separate fields, the "id" and "text" fields:

        struct kerneld_msg {
                long mtype;
                long id;
                char text[1];
        };

The field "id" is used by the kernel to store a sequence number, so that
a return status can be picked up by the correct, sleeping, kernel level
process.

The user level kernel daemon is really only a "switch(mtype)", where we
for now use 1 as signifying a "system" command, 2 as a request for "insmod"
and 3 as a request for "rmmod", and so on.
The relevant command is executed, using "text" as a parameter.

If the "id" field is 0, it is interpreted by kerneld as an indication that
the kernel doesn't want any status information.
If the "id" field is non-zero, it will be used as the message type of the
return message, from kerneld to the kernel, and the "id" field will now
instead be used to store the exit value from the executed command.

If kerneld_send is called with "(return_size & KERNELD_WAIT) == 0",
the kernel will not wait for an "answer" from "kerneld".
Otherwise it will wait for a message where the type is equal to the
sequence number of the original request.

If "(return_size & ~KERNELD_WAIT) > 0", and "return_value != NULL",
the returned information in the "text" field will be copied to the area
pointed to by "return_value".
In all cases where the KERNELD_WAIT flag is set, the status will be
returned to the caller of kerneld_send().

In this way it is possible to have concurrent requests to "kerneld".

Since the requests are encoded in the message type, it is possible to
have several, independent, kernel daemons, that each "listen" for a
specific message type.

The message types from the kernel to kerneld are limited to
<= KERNELD_MAXCMD, so that all sends from the kernel to kerneld will have
a type no greater than that value (defined in ).
The sequence numbers start at KERNELD_MINSEQ, which means that all messages
from kerneld back to the kernel will have a type greater than or equal to
this value, but still positive (defined in
/cmdline" file, or via e.g.:

        set `ps hc $KERNELD_TRIGGER`
        case $5 in
        xbiff) ..... ;;
        bash) .... ;;
        *) ... ;;
        esac

The "persist" extension:
-----------------------

In this release there is support for persistent module storage via kerneld.
This means that a module can save its internal state, to be restored on
the next invocation.

Note: this is still experimental, since there is no agreed-upon syntax
for the keys. The key _could_ be created by prepending the module name
to the key name, so that no name collisions would happen
(unless we _want_ them to happen, of course).

To enable this feature, apply the included "kerneld-persist-patch" to the
kernel.
This will upgrade linux/include/linux/kerneld.h

If you have libgdbm, and if you want to be able to control the behaviour
of the persistent storage, you should modify the Makefile in this directory
according to the instructions at the top.

How it works:

If a module wants to save some information that might be useful when it
is reloaded at a later time, a call to "set_persist" will make this possible.
The syntax is:

        int set_persist(char *key, void *value, size_t size)

where the key can be any string, the value can be anything (including
structs and arrays) and size is the size of the value to be stored.
(A negative return value indicates that the kernel couldn't send
the message to kerneld.)

This call will trigger a kerneld action, where the key and value are stored
inside kerneld, as well as in a gdbm file, if you have enabled this in
the Makefile.
It is possible to use the persistent storage even without gdbm, but
the information will then be lost as soon as kerneld exits.

The gdbm file will reside in /lib/modules/`uname -r`/persist.gdbm.
The information in the gdbm file can be read and updated by ordinary
user programs (with the right permissions), and any changes can be reloaded
into the running kerneld by sending a SIGHUP to kerneld ("kill -1").

Note that kerneld only keeps the gdbm file open as long as it uses it.
The file will be closed after each update, so the gdbm limitation of
only one writer at a time is not a big problem...

Also note that kerneld will initialise itself from this gdbm file
whenever it starts up, so the information even survives reboots.

If a module wants to get the saved persistent information, it could do
a call to "get_persist", using the same key as was used when the
information was stored. See below for insmod support (which is better).
The syntax is:

        int get_persist(char *key, void *value, size_t size)

where the key can be any string, the value is a pointer to the memory
where the return value should be stored, and size is the size of the
value to be stored. The returned value will never be larger than this.

The return value from the call tells how many bytes were actually read,
and can be used to verify that the correct data type was read.
If the return value from get_persist is 0, then no value could be found
for this key.
A negative return value indicates that the kernel couldn't send the message
to kerneld, so no persistent information was available.

If, for some reason, a module wants to remove a saved entry, this call
will make that possible:

        set_persist("the key", NULL, 0);

How to use it in your modules:

An example can be found in "check_persist.c", which shows all the normal
ways of using the persistent storage.

Even though the example puts the call to set_persist in the init_module
function, there is no limitation on where it can be put in a real module,
since the kerneld action will be performed asynchronously, i.e. the kernel
doesn't have to wait until kerneld has stored the value.

It's really quite trivial to use, so go right ahead!

Note that insmod now can conditionally set the values of any symbol
from a key:

        insmod module.o symbol=?key:default_value

If there is no persistent key with that name, the default_value (if any)
will be used.
Note that this makes it possible to drop most calls to "get_persist"!

To administer the persistent values, a simple utility "admpersist" is
included.
It can be used like this:

        admpersist                 # shows all keys and their values
        admpersist a_key           # shows the value stored by "a_key"
        admpersist a_key=0x1234    # creates/updates the persistent value
        admpersist -d a_key        # deletes the persistent value

More examples:

        admpersist -d key1 key2 key3      # delete these persistent values
        admpersist an_array=27,0x1b,033   # handle an integer array

The admpersist utility can handle arrays similar to what insmod does,
and understands integer values expressed in hex, octal or decimal.

If you want kerneld to note the changes, do a "kill -1" on kerneld.

Bjorn Ekwall in June 1996
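
Illustration (not part of the kerneld package):

The integer array syntax accepted by admpersist, e.g. "27,0x1b,033", can
be parsed in user space roughly as sketched below. This is only a sketch
under assumptions: the function name "parse_int_list" is hypothetical and
the code is not taken from admpersist or insmod, whose real parsers may
differ. It relies on the standard C strtol() with base 0, which accepts
decimal, hex ("0x" prefix) and octal (leading "0") input.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not taken from admpersist): parse a list such as
 * "27,0x1b,033" into out[], storing at most max values.
 * Returns the number of values parsed, or -1 on a syntax or range error.
 */
int parse_int_list(const char *s, long *out, size_t max)
{
    size_t n = 0;

    assert(s != NULL && out != NULL);
    while (n < max) {
        char *end;
        long v;

        errno = 0;
        /* Base 0: "0x" prefix means hex, leading "0" octal, else decimal. */
        v = strtol(s, &end, 0);
        if (end == s || errno == ERANGE)
            return -1;          /* no digits, or value out of range */
        out[n++] = v;
        if (*end == '\0')
            return (int)n;      /* clean end of list */
        if (*end != ',')
            return -1;          /* unexpected character after a value */
        s = end + 1;            /* skip the comma */
    }
    return -1;                  /* more values than out[] can hold */
}
```

All three spellings in the admpersist array example above denote the
same value, 27, so that list parses to three identical integers.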