E.g. want errors to cause e-mails, but everything else to just go to logs? Use a timer to activate a service, and make systemd activate another service on failure.
Want to avoid double execution? That's the default: timers are usually used to activate another unit, and as long as that unit doesn't start something that double-forks, it won't be activated twice.
(Some) protection against the thundering herd is built in: you specify the level of accuracy (default 1m), and on boot each machine randomly selects a number of seconds by which to offset all timers on that host. You can set this per timer or for the entire host.
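Concretely, that wiring might look like the following pair of units. This is a sketch of the idea, not anyone's actual setup: the unit names, paths, and times are illustrative assumptions.

```ini
# nightly-report.service -- the job itself; on failure, systemd starts
# the notifier unit, so only errors generate mail.
[Unit]
Description=Nightly report
OnFailure=report-failure-mail.service

[Service]
Type=oneshot
ExecStart=/usr/local/bin/nightly-report

# nightly-report.timer (a separate file) -- AccuracySec is the coalescing
# window mentioned above; the per-host random offset falls within it.
[Unit]
Description=Run nightly report daily

[Timer]
OnCalendar=*-*-* 02:00:00
AccuracySec=1m

[Install]
WantedBy=timers.target
```

Here report-failure-mail.service would be another oneshot that mails the tail of the journal for the failed unit.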
And if you're using fleet, you can use fleet to automatically re-schedule cluster-wide jobs if a machine fails.
And the journal will capture all the output and timestamp it.
systemctl list-timers will show you which timers are scheduled, when they're scheduled to run next, how long is left until then, when they last ran, and how long ago that was:
$ systemctl list-timers
NEXT                          LEFT      LAST                          PASSED        UNIT
Sat 2015-10-17 01:30:15 UTC   51s left  Sat 2015-10-17 01:29:15 UTC   8s ago        motdgen.timer
Sat 2015-10-17 12:00:34 UTC   10h left  Sat 2015-10-17 00:00:33 UTC   1h 28min ago  rkt-gc.timer
Sun 2015-10-18 00:00:00 UTC   22h left  Sat 2015-10-17 00:00:00 UTC   1h 29min ago  logrotate.timer
Sun 2015-10-18 00:15:26 UTC   22h left  Sat 2015-10-17 00:15:26 UTC   1h 13min ago  systemd-tmpfiles-clean.timer
And the timer specification itself is extremely flexible. E.g. you can schedule a timer to run x seconds after a specific unit was activated, or x seconds after boot, or x seconds after the timer itself fired, or x seconds after another unit was deactivated. Or combinations.

"xkcd" = {
  description = "send latest xkcd comic";
  wants = [ "network.target" ];
  startAt = "Mon,Wed,Fri *:0/30";
  path = with pkgs; [ telegram-cli ];
  serviceConfig = {
    User = "rnhmjoj";
    Type = "oneshot";
    ExecStart = "${cabal}/bin/xkcd";
  };
} // basicEnv;

The biggest shortcoming of systemd timers is that there's no easy way to notify admins of failures, like standard cron does.
I tried to hack around this[0], but it still feels wrong.
Here's a small list of things we're getting out of it:
- concurrent run protection (& queue management via https://wiki.jenkins-ci.org/display/JENKINS/Concurrent+Run+B... )
- load balancing (e.g. max concurrent tasks) and remote execution with jenkins slaves [sounds complicated, but really jenkins just knows how to SSH]
- job timeouts. No more hanging jobs.
- failure notifications via slack/hipchat/email/whatever. [email only on status change via https://wiki.jenkins-ci.org/display/JENKINS/Email-ext+plugin ]
- log/history management: rotation & compression.
- fancy scheduling: e.g. run this job once every 24h, but if it fails keep retrying in 5 minute increments (https://wiki.jenkins-ci.org/display/JENKINS/Naginator+Plugin ). You could also use project dependencies for pipelines, but we've been staying away from that.
- monitoring: we use the datadog reporter & alert on time since last success. Given how mature Jenkins is, this likely translates to whatever system you're using just as well.
It's worked incredibly well for us. We migrated to Jenkins from crontabs with cronwrap (https://github.com/zomo/cronwrap). We're never going back.
Once I had a job that went astray and filled the disk with logs. Since Jenkins couldn't write to the disk anymore, it stopped working completely: no jobs, and more importantly, no notifications. Funny thing, there was a job to monitor free disk space, but the stray app wrote ~100GB in less than 15 minutes (damn SSDs :p).
Another time (times, actually), I had the OOM killer kill a Jenkins-related process. Being a JVM-based app that starts at about 1GB of RAM use doesn't help, I guess. This led Jenkins to hang on a job; the timeout didn't work, and I couldn't even stop the job manually. Other jobs wouldn't start, and again no notifications were sent.
I inherited a legacy application with tons of cron jobs running scripts on the production server. Instead of risking moving our jobs to Jenkins, we're simply using Jenkins's POST endpoint to post job results from the cron jobs themselves. It's not perfect, and doesn't give us all the goodies listed above, but it does give us more visibility into the jobs themselves until we can move them all off reliably. +1 from me if you are in a similar situation.
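A minimal sketch of that pattern as a crontab line. The job name, URL, token, and the idea of a dedicated "reporter" Jenkins job are my assumptions, not the poster's actual setup:

```
# The legacy job runs untouched; afterwards its exit status is posted to a
# parameterized Jenkins job that exists only to record results and notify.
30 2 * * * /opt/legacy/nightly.sh; curl -fsS -X POST "https://jenkins.example.com/job/cron-reporter/buildWithParameters?token=TOKEN&job=nightly&status=$?" >/dev/null
```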
We made sure that jenkins doesn't fiddle with the environment, so that everything was derived from the various networked user accounts.
Using @hourly, it spreads the load evenly over the hour to avoid resource-starvation spikes.
We keep Jenkins Job Builder configs (YAML) in a git repo to make sure that the delicate snowflake that is Jenkins is repeatable.
I also end up creating a lot of Twilio scripts which are either positive control or negative control for the call/SMS, depending on how critical the thing is that I'm monitoring. For example, one of my sites updates an /api/healthcheck result with a timestamp every five minutes if everything is going peachy, and another box polling that endpoint blows up my phone if it fails to get HTTP 200 and a timestamp within the last five minutes. (This works, but I swear I need to tweak it just a wee bit, as today I had my once quarterly woken-up-at-4-AM-because-gremlins-ate-a-single-HTTP-request.)
Healthchecks handles this a lot more sensibly. I might throw it on a linode and give it a shot. Thanks for releasing it.
Take all of the features he mentions and abstract them into a launch_from_cron.sh file. Make that file accept a script path as an argument and voilà: all of the safety added to cron without the need for code duplication or the massive-overhead solutions listed in these comments.
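One way such a wrapper could look. This is a sketch of the idea, not a production tool: the /tmp paths, the one-hour timeout, and the choice to print only on failure (so cron mails only failures) are my assumptions.

```shell
#!/bin/sh
# launch_from_cron.sh -- run a job with locking, logging, and a timeout.
# Usage: launch_from_cron.sh /path/to/job [args...]

run_guarded() {
    job=$1; shift
    base=$(basename "$job")
    lock="/tmp/$base.lock"
    log="/tmp/$base.log"

    # flock -n: silently skip this run if the previous one still holds the lock.
    # timeout 3600: kill hung jobs instead of letting them pile up.
    if flock -n "$lock" timeout 3600 "$job" "$@" >>"$log" 2>&1; then
        echo "$(date '+%F %T') OK $base" >>"$log"
    else
        status=$?
        echo "$(date '+%F %T') FAIL($status) $base" >>"$log"
        # Anything printed here goes to stdout, which cron turns into mail,
        # so the wrapper only speaks up on failure.
        tail -n 20 "$log"
        return "$status"
    fi
}

if [ $# -gt 0 ]; then
    run_guarded "$@"
fi
```

A crontab entry then shrinks to `launch_from_cron.sh /opt/jobs/report.sh`, with all the safety in one place.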
We've found that deploying cronjobs onto individual hosts is quite powerful, and helps us fill a niche between configuration management tools (like Puppet) and specialized coprocesses (like Smartstack). We have cronjobs for downloading code deploys, showing Sensu state within the motd, reconfiguring daemons (especially the Smartstack ones), and (of course) cleaning up unused data.
Of course, there's also the separate problem of scheduling and coordinating tasks across an entire cluster. In most cases we don't use our cron daemons for this, although we do have some jobs that run on multiple hosts and enforce mutual exclusion by grabbing a lock in Zookeeper.
I've been using it for two years now. This has replaced cron on about 200 nodes.
Not only does it do cron, it also helps deploy artefacts (integrated with Jenkins) through simple forms. We now have ops with zero Linux experience deploying code.
Instead use https://github.com/zimbatm/logmail. It's a `sendmail` replacement that forwards everything to syslog. Then forward all your syslogs to a central place and you can capture and analyze these messages.
The problem is that we don't have any way of alerting our monitoring systems from a cron job.
This is exactly what I've been implementing, a simple curl API call to our monitoring system when a cron job has run is all that we need. This puts the monitoring of cron into the same sphere as all other monitoring and puts the alert on a webpage where it can be found eventually by our 2nd line or our on-call personnel, instead of in someones mailbox.
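A sketch of what that crontab pattern can look like; the endpoint, check name, and schedule here are hypothetical:

```
# The curl only fires if the job succeeded; the monitoring side alerts
# when no check-in arrives within the expected window.
*/15 * * * * /usr/local/bin/sync_job && curl -fsS -m 10 "https://monitoring.example.com/api/checkin/sync_job" >/dev/null
```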
Edit: And you don't need a fancy REST based API for your monitoring system to do this, ye ol' nagios agent could do it with some hacks.
The hard part is having the discipline to fix all your cron jobs in this way, but adding || true is already tantamount to this.
http://www.robustperception.io/monitoring-batch-jobs-in-pyth... is the full Python version, and the simple version is a bash one-liner too.
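That article's pattern, squeezed into a single crontab line; the Pushgateway host, metric name, and schedule are my assumptions:

```
# On success, push a timestamp; alert when time() - my_batch_last_success
# grows too large. Note the backslash-escaped '%', which cron requires.
5 0 * * * /usr/local/bin/my_batch && echo "my_batch_last_success $(date +\%s)" | curl --data-binary @- http://pushgateway:9091/metrics/job/my_batch
```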
Also: /home/on_a_phone/parse_today.sh `date +%Y%m%d`
Will fail catastrophically because cron treats an unescaped '%' as a newline character for some silly reason. Have fun troubleshooting that one!
Side note - clean your damn leap second crons, Steve!
/home/on_a_phone/parse_today.sh $DATE
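The other common fix is to backslash-escape the percent signs in place; the schedule here is illustrative:

```
0 6 * * * /home/on_a_phone/parse_today.sh `date +\%Y\%m\%d`
```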
How to get cron to only send important emails, and not mail on every run?
You think maybe you should have just used "> /dev/null" and not "> /dev/null 2>&1"?
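The difference matters because cron mails whatever a job prints; leaving stderr unredirected is a crude but effective failure notification (job path and schedule are illustrative):

```
0 * * * * /usr/local/bin/task > /dev/null        # errors still reach cron mail
0 * * * * /usr/local/bin/task > /dev/null 2>&1   # failures vanish silently
```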
Why is this a full blog post?

15 * * * * ( flock -w 0 200 && sleep `perl -e 'print int(rand(60))'` && nice /command/to/run && date > /var/run/last_successful_run ) 2>&1 200> /var/run/cron_job_lock | while read line ; do echo `date` "$line" ; done > /path/to/the/log || true
TempleOS is designed like a C64.
I don't see cron as useful for a C64 user.
Chronos is the only one I'm aware of, but I don't believe it supports event-based tasks.
- Save result as json, which is queryable in the database
- Save enough info in the task result to retry the task from the result.
- Retry a task from its result in Django Admin
- Run a periodic task now (e.g., to test your cron task)
[1] https://github.com/resulto-admin/django-celery-fulldbresult