10. Etcd integration
PgQuartz can integrate with etcd, in which case it locks a key while it is running. This allows jobs to be scheduled across clustered nodes one at a time (a minimal code sketch of this pattern follows the list):

- any node will lock the key and run the steps
- all other nodes will wait until that node has finished its operations and released the key
- when the key is released, the next node will lock the key, run the steps and release the key

and so on, until either:

- all nodes have successfully finished running the steps, or
- the job's timeout has expired, after which all nodes that have not run yet will exit with an error code
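This coordination pattern can be sketched with the etcd Go client's concurrency package. The sketch below is illustrative only, not PgQuartz's actual implementation: the lock key, the timeouts and the runSteps placeholder are assumptions standing in for what PgQuartz derives from its job configuration.

    package main

    import (
    	"context"
    	"log"
    	"time"

    	clientv3 "go.etcd.io/etcd/client/v3"
    	"go.etcd.io/etcd/client/v3/concurrency"
    )

    // runSteps is a hypothetical stand-in for running the job's steps.
    func runSteps() {
    	time.Sleep(10 * time.Second)
    }

    func main() {
    	// Connect to the etcd cluster (see the endpoints option below).
    	cli, err := clientv3.New(clientv3.Config{
    		Endpoints:   []string{"localhost:2379"},
    		DialTimeout: 5 * time.Second,
    	})
    	if err != nil {
    		log.Fatal(err)
    	}
    	defer cli.Close()

    	// The session holds a lease that the client library keeps
    	// refreshing while this process is alive.
    	sess, err := concurrency.NewSession(cli)
    	if err != nil {
    		log.Fatal(err)
    	}
    	defer sess.Close()

    	// Every node contends for the same key; Lock blocks until this
    	// node holds the lock or the job timeout expires.
    	mtx := concurrency.NewMutex(sess, "/pgquartz/jobs/job1")
    	ctx, cancel := context.WithTimeout(context.Background(), 75*time.Second)
    	defer cancel()
    	if err := mtx.Lock(ctx); err != nil {
    		// The job timeout expired before this node got its turn.
    		log.Fatal("gave up waiting for the lock: ", err)
    	}

    	runSteps()

    	// Release the key so the next node can take its turn.
    	if err := mtx.Unlock(context.Background()); err != nil {
    		log.Fatal(err)
    	}
    }

Every node would run the same program; whichever acquires the lock first runs first, and the others block in Lock until the key is released or their timeout expires.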
10.1. Configuration options
10.1.1. endpoints
A list of endpoints can be set to point at all instances of etcd. Every endpoint should be formatted as hostname:port (e.g. server1:2379). When not set, this defaults to a single endpoint, localhost:2379.
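For example, to point PgQuartz at a three-node etcd cluster (hostnames are illustrative):

    etcdConfig:
      endpoints:
        - server1:2379
        - server2:2379
        - server3:2379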
10.1.2. lockKey
The key that will be locked for this job can be configured. When not set, it defaults to the job name. Note that the job name is derived from the yaml file that defines the job (e.g. /etc/pgquartz/jobs/job1.yaml would result in the job name job1).
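So for a job defined in /etc/pgquartz/jobs/job1.yaml, leaving lockKey unset is equivalent to configuring:

    etcdConfig:
      lockKey: job1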
10.1.3. lockTimeout
While running, PgQuartz automatically refreshes the lock, and when finished, PgQuartz automatically releases it. But there are circumstances in which etcd retains the lock until a preset lock timeout expires, after which etcd releases it (a sketch of this mechanism follows the lists below).
Examples of such circumstances are:

- PgQuartz dies a horrible death (e.g. killall -KILL pgquartz would be horrible)
- a network partition occurs between PgQuartz and etcd

And when the lock is released this way:

- the current PgQuartz job fails
- PgQuartz running on another node is released to run the job
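In terms of the Go sketch in section 10, the lock timeout corresponds to the TTL on the session's lease. A minimal sketch, assuming the same cli client as before:

    // Sketch only: with a 60-second TTL, if PgQuartz dies or is partitioned
    // away from etcd, the lease stops being refreshed and etcd releases the
    // lock roughly 60 seconds later, unblocking the next node in line.
    sess, err := concurrency.NewSession(cli, concurrency.WithTTL(60))
    if err != nil {
    	log.Fatal(err)
    }
    // A clean shutdown revokes the lease immediately instead of waiting
    // for the TTL to expire.
    defer sess.Close()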
10.2. Example
To explain the etcd integration, consider the following config:
10.2.1. Example config
steps:
  step 1:
    commands:
      - name: Run command 1.1
        type: shell
        inline: "sleep 10"
      - name: Run command 1.2
        type: shell
        inline: "date +%s > /tmp/beenhere.txt"
etcdConfig:
  endpoints:
    - localhost:2379
  lockKey: awesomeJob1
  lockTimeout: 1m
timeout: 75s
10.2.2. What does it do under normal circumstances?
As an example, assume the example config is run as a job on a cluster of 3 nodes:

- each node runs etcd on port 2379, and together they form an etcd cluster
- each node has this exact job definition, and PgQuartz is scheduled to run the job on every node at the same time

The following would happen:

- one of the nodes is the first to run PgQuartz and locks the key awesomeJob1
- that node waits 10 seconds and then writes the current epoch (seconds elapsed since January 1, 1970) into the file /tmp/beenhere.txt
- the other nodes wait until the job on the locking node has finished and the key has been released

When the job on the first node is finished:

- on this initial node, PgQuartz releases the lock and wraps up (runs checks and exits)
- one of the other 2 nodes notices the release and locks the key awesomeJob1, after which it waits 10 seconds and writes the current epoch into /tmp/beenhere.txt
- the last node waits until the job on the second node has finished and the key has been released

When the job on the second node is finished:

- on this second node, PgQuartz releases the lock and wraps up (runs checks and exits)
- the last node notices the release and locks the key awesomeJob1, after which it waits 10 seconds and writes the current epoch into /tmp/beenhere.txt

Note that:

- the job was scheduled to run at the same time on all nodes
- the job actually ran one node at a time, one after the other
- there is no predetermined order; in practice, small clock differences (e.g. NTP drift) decide the run order
- the last lines in the /tmp/beenhere.txt files on the three nodes would differ by 10 seconds (or 11 in some rare cases)
10.2.3. What happens on issues
If a network partition occurs between the node currently running the job and the other 2 nodes, the following could happen:

- if the network partition resolves within 60 seconds (the lockTimeout), all works out fine
- if the network partition lasts longer:
  - etcd releases the lock after 60 seconds
  - the job that was running exits with an error
  - the next node in line runs the job

In that case the second job might not succeed, and the third job certainly will not, due to the job timeout (75 seconds):

- the first node runs for 0-10 seconds
- the partition keeps the lock held for 60 seconds (the lockTimeout)
- the second node starts 60-70 seconds after job start and needs 10 seconds to run:
  - if it starts after waiting 65 seconds or more, it will not finish, but be timed out at the 75-second mark
  - if not, it will finish, leaving only 0-5 seconds for node 3 before it times out as well

If any issue occurs (hang, process kill, etc.), either PgQuartz cleanly releases the lock, or the behavior described above is what you will see.