/* Copyright 2002 by Jos Visser
 * All rights reserved
 *
 * A cluster aware cron daemon...
 *
 * Sat Feb 2 08:57:45 CET 2002
 *
 * For the license information see the README file
 *
 * Send bug reports, bug fixes, enhancements, requests, flames, etc., and
 * I'll try to keep a version up to date.
 */

The story is simple, and goes like this:

Cron is a simple yet effective mechanism for scheduling Unix commands on a
regular basis. I think that everybody with any experience in the use of cron
can easily point out a number of weaknesses (especially those with exposure
to job scheduling mechanisms like JES2/JES3, OPC/A, Maestro or others), but
its universal presence and ease of use still make it the scheduler of choice
in many environments. Paul Vixie's cron is (in my humble opinion) one of the
best around...

The cron daemon has a system wide perspective. Each user has one crontab for
the entire system, and it is cron's job to ensure that that crontab is
honoured. It pursues this with an iron discipline.

There is one particular environment in which cron's system wide view of
things is too rigid: HA failover clusters. In failover HA environments,
packages (or whatever you like to call them) are switched from one node to
another, taking with them the file systems, processes, IP addresses and
other resources that are necessary to run the service on the new node.

Cron jobs have "traditionally" been somewhat of a headache in HA
environments. Because cron jobs are scheduled system wide, and not on a
package basis, we run into the following problems:

- When deploying the application in the cluster, the crontab(s) must be
  activated on each and every node. Although excruciatingly simple, my
  experience has shown that this is error prone.

- Cron jobs start running on nodes where packages are currently not
  present. This usually leads to failure of the cron job because it
  references commands or data that are stored in file systems that are
  currently not mounted. This messes up the cron log and confuses system
  monitoring tools (e.g. OpenView or OpenNMS).

Usually we solve these problems in one of the following ways:

1) Live with it, and adjust the monitoring software to not recognise this
   as an error.

2) Wrap the actual cron jobs in a script that checks for package presence.

3) (De)Install the package crontabs on the node in the package
   startup/shutdown scripts. This has the nasty property that if a node
   crashes, the package switches over but the crontab is not removed. One
   site I know runs cron jobs that clean up these "orphaned" crontabs.

I think I have a better solution for this problem: a package specific
(cluster aware) cron daemon! I "enhanced" (some would say "bloated") Paul
Vixie's cron software to recognise an alternative cron administration
directory. The system wide Vixie cron daemon stores its administration in:

- /var/cron          crontabs, log
- /var/run/cron.pid  PID file
- /etc/crontab       the old BSD system wide crontab

With my small changes, cron (and crontab) recognises the "-d" argument and
the CRONDIR environment variable. The cluster cron daemon requires one of
these methods to determine an alternative root for its crontabs, log and
PID file (cluster cron also ignores /etc/crontab). A cron daemon thus
started stores its entire configuration relative to this crondir. By
setting the crondir to a directory within a package (floating) file system,
and starting/stopping a cron instance at package startup/shutdown, all the
issues mentioned above are solved (a sketch of such a startup/shutdown
script appears at the end of this file).

The crontabs automatically travel with the package, and are automatically
picked up at the new node. When a node crashes, the package cron dies as
well, and is not restarted when the node comes back up (because it now runs
wherever the package is).

A potential security breach exists because cron runs with the root userid
and crontab is even setuid root. If an ordinary user could start crontab
with -d any_directory, he or she could fraudulently overwrite existing
files/directories to which he/she would normally have no access. To remedy
this, cluster cron performs some checks on the crondir specified to it
through either -d or the CRONDIR environment variable. That directory must:

- Exist.
- Be owned by root.
- Not be writable by anyone else but root.
- Contain a ``cluster_cron.auth'' file (which must be owned by root).

These checks should guard against inadvertent use of a directory that was
not set up specifically to be the root of a cluster cron. Please note that
on systems with the standard SysV chown semantics (anyone may chown, giving
away a file/directory of his/her own), it is possible for ordinary users to
set up a cluster cron directory of their own. I currently do not think that
this is a security breach though...

To build the cluster cron, tweak the Makefile and config.h to your liking
for your platform (according to the instructions in the README file) and
define -DCLUSTER_CRON in the Makefile's DEFS variable. Build and install
the cluster cron for your system (it goes to /opt/clustercron by default).
Then set and export the CRONDIR variable, or use cron/crontab with the -d
option, and off you go! Please note that the cluster cron binaries must
either be installed on every node, or be part of the package...
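By way of illustration, on most platforms the build boils down to something
like the following (the exact targets and variable names come from the
Makefile in the distribution, so treat this as a sketch):

    $ vi Makefile config.h       # follow the README; add -DCLUSTER_CRON to DEFS
    $ make
    $ su root -c 'make install'  # installs under /opt/clustercron by default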
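To prepare a crondir that passes the ownership checks described above,
something along the following lines should do. The path /pkg/mypkg is
purely hypothetical; substitute the mount point of a file system that fails
over together with your package:

    # Run as root on the node that currently holds the package.
    mkdir /pkg/mypkg/crondir
    chown root /pkg/mypkg/crondir
    chmod 755 /pkg/mypkg/crondir              # writable by root only
    touch /pkg/mypkg/crondir/cluster_cron.auth
    chown root /pkg/mypkg/crondir/cluster_cron.auth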
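And finally the package startup/shutdown sketch promised above. It assumes
that the cron binary ended up directly under /opt/clustercron and that the
package cron keeps its PID file as cron.pid below the crondir; check your
installation and adjust accordingly:

    #!/bin/sh
    # Hypothetical package start/stop hook for a cluster cron instance.
    # Assumptions: cron binary directly under /opt/clustercron, PID file
    # kept as cron.pid below the crondir.

    CRONDIR=/pkg/mypkg/crondir   # directory prepared as shown above
    export CRONDIR

    case "$1" in
    start)
        /opt/clustercron/cron    # picks up CRONDIR from the environment
        ;;
    stop)
        kill `cat "$CRONDIR/cron.pid"`
        ;;
    *)
        echo "usage: $0 start|stop" >&2
        exit 1
        ;;
    esac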
Questions? Comments? Enhancements? Flames? Please contact Jos Visser