/* Copyright 2002 by Jos Visser
 * All rights reserved
 *
 * A cluster aware cron daemon...
 *
 * Sat Feb 2 08:57:45 CET 2002
 *
 * For the license information see the README file
 *
 * Send bug reports, bug fixes, enhancements, requests, flames, etc., and
 * I'll try to keep a version up to date.
 */

The story is simple, and goes like this:

Cron is a simple yet effective mechanism for scheduling Unix commands on a
regular basis. I think that everybody with any experience in the use of cron
can easily point out a number of weaknesses (especially those with exposure
to job scheduling mechanisms like JES2/JES3, OPC/A, Maestro or others), but
its universal presence and ease of use still make it the scheduler of choice
in many environments. Paul Vixie's cron is (in my humble opinion) one of the
best around...

The cron daemon has a system wide perspective. Each user has one crontab for
the entire system, and it is cron's job to ensure that that crontab is
honoured. It pursues this with an iron discipline.

There is one particular environment in which cron's system wide view of
things is too rigid: HA failover clusters. In failover HA environments,
packages (or whatever you like to call them) are switched from one node to
another, taking with them the file systems, processes, IP addresses and
other resources that are necessary to run the service on the new node.

Cron jobs have "traditionally" been somewhat of a headache in HA
environments. Because cron jobs are scheduled system wide, and not on a
package basis, we run into the following problems:

- When deploying the application in the cluster, the crontab(s) must be
  activated on each and every node. Although excruciatingly simple, my
  experience has shown that this is error prone.

- Cron jobs start running on nodes where packages are currently not
  present. This usually leads to failure of the cron job because it
  references commands or data that are stored in file systems that are
  currently not mounted. This messes up the cron log and confuses system
  monitoring tools (e.g. OpenView or OpenNMS).

Usually we solve these problems in one of the following ways:

1) Live with it, and adjust the monitoring software to not recognise this
   as an error.

2) Wrap the actual cron jobs in a script that checks for package presence.

3) (De)Install the package crontabs on the node in the package
   startup/shutdown scripts. This has the nasty property that if a node
   crashes, the package switches over but the crontab is not removed. One
   site I know runs cron jobs that clean up these "orphaned" crontabs.

I think I have a better solution for this problem: a package specific
(cluster aware) cron daemon! I "enhanced" (some would say "bloated") Paul
Vixie's cron software to recognise an alternative cron administration
directory. The system wide Vixie cron daemon stores its administration in:

- /var/cron          crontabs, log
- /var/run/cron.pid  PID file
- /etc/crontab       the old BSD system wide crontab

With my small changes, cron (and crontab) recognises the "-d" argument and
the CRONDIR environment variable. The cluster cron daemon requires one of
these methods to determine an alternative root for its crontabs, log and
PID file (cluster cron also ignores /etc/crontab). A cron daemon thus
started stores its entire configuration relative to this crondir. By
setting the crondir to a directory within a package (floating) file system,
and starting/stopping a cron instance at package startup/shutdown, all the
issues mentioned above are solved (a sketch of such a startup/shutdown
script appears at the end of this file).

The crontabs automatically travel with the package, and are automatically
picked up at the new node. When a node crashes, the package cron dies as
well, and is not restarted when the node comes back up (because it now runs
wherever the package is).

A potential security breach exists because cron runs with the root userid
and crontab is even setuid root. If an ordinary user could start crontab
with -d any_directory, he or she could fraudulently overwrite existing
files/directories to which he/she would normally have no access. To remedy
this, cluster cron performs some checks on the crondir specified to it
through either -d or the CRONDIR environment variable. That directory must:

- Exist.
- Be owned by root.
- Not be writable by anyone else but root.
- Contain a ``cluster_cron.auth'' file (which must be owned by root).

These checks should guard against inadvertent use of a directory that was
not set up specifically to be the root of a cluster cron. Please note that
on systems with the standard SysV chown semantics (anyone may chown, giving
away a file/directory of his/her own), it is possible for ordinary users to
set up a cluster cron directory of their own. I currently do not think that
this is a security breach though...

To build the cluster cron, tweak the Makefile and config.h to your liking
for your platform (according to the instructions in the README file) and
define -DCLUSTER_CRON in the Makefile's DEFS variable. Build and install
the cluster cron for your system (it goes to /opt/clustercron by default).
Then set and export the CRONDIR variable, or use cron/crontab with the -d
option, and off you go! Please note that the cluster cron binaries must
either be installed on every node, or be part of the package...
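By way of illustration, on most platforms the build boils down to something
like the following (the exact targets and variable names come from the
Makefile in the distribution, so treat this as a sketch):

    $ vi Makefile config.h       # follow the README; add -DCLUSTER_CRON to DEFS
    $ make
    $ su root -c 'make install'  # installs under /opt/clustercron by default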
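To prepare a crondir that passes the ownership checks described above,
something along the following lines should do. The path /pkg/mypkg is
purely hypothetical; substitute the mount point of a file system that fails
over together with your package:

    # Run as root on the node that currently holds the package.
    mkdir /pkg/mypkg/crondir
    chown root /pkg/mypkg/crondir
    chmod 755 /pkg/mypkg/crondir              # writable by root only
    touch /pkg/mypkg/crondir/cluster_cron.auth
    chown root /pkg/mypkg/crondir/cluster_cron.auth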
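And finally the package startup/shutdown sketch promised above. It assumes
that the cron binary ended up directly under /opt/clustercron and that the
package cron keeps its PID file as cron.pid below the crondir; check your
installation and adjust accordingly:

    #!/bin/sh
    # Hypothetical package start/stop hook for a cluster cron instance.
    # Assumptions: cron binary directly under /opt/clustercron, PID file
    # kept as cron.pid below the crondir.

    CRONDIR=/pkg/mypkg/crondir   # directory prepared as shown above
    export CRONDIR

    case "$1" in
    start)
        /opt/clustercron/cron    # picks up CRONDIR from the environment
        ;;
    stop)
        kill `cat "$CRONDIR/cron.pid"`
        ;;
    *)
        echo "usage: $0 start|stop" >&2
        exit 1
        ;;
    esac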
Questions? Comments? Enhancements? Flames? Please contact Jos Visser