A friend recently reminded me of the existence of chrony, a "versatile implementation of the Network Time Protocol (NTP)". The excellent introduction is worth quoting in full:
It can synchronise the system clock with NTP servers, reference clocks (e.g. GPS receiver), and manual input using wristwatch and keyboard. It can also operate as an NTPv4 (RFC 5905) server and peer to provide a time service to other computers in the network.
It is designed to perform well in a wide range of conditions, including intermittent network connections, heavily congested networks, changing temperatures (ordinary computer clocks are sensitive to temperature), and systems that do not run continuously, or run on a virtual machine.
Typical accuracy between two machines synchronised over the Internet is within a few milliseconds; on a LAN, accuracy is typically in tens of microseconds. With hardware timestamping, or a hardware reference clock, sub-microsecond accuracy may be possible.
Now that's already great documentation right there. What it is, why it's good, and what to expect from it. I want more. They have a very handy comparison table between chrony, ntp and openntpd.
My problem with OpenNTPd
Following concerns surrounding the security (and complexity) of the venerable ntp program, I switched to openntpd on all my computers a long time ago. I hadn't given it much thought since, until I recently noticed a lot of noise on one of my servers:
jan 18 10:09:49 curie ntpd[1069]: adjusting local clock by -1.604366s
jan 18 10:08:18 curie ntpd[1069]: adjusting local clock by -1.577608s
jan 18 10:05:02 curie ntpd[1069]: adjusting local clock by -1.574683s
jan 18 10:04:00 curie ntpd[1069]: adjusting local clock by -1.573240s
jan 18 10:02:26 curie ntpd[1069]: adjusting local clock by -1.569592s
You read that right: openntpd was constantly rewinding the clock, sometimes less than two minutes apart. The above log was taken while doing diagnostics, looking at the last 30 minutes of logs. So, on average, one 1.5-second rewind every 6 minutes!
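As a rough check on that cadence, here is a minimal Python sketch that parses the five log lines quoted above and computes the gaps between adjustments and the mean step. The regular expression and the helper name `parse_adjustments` are illustrative assumptions, not anything shipped with openntpd.

```python
import re
from datetime import datetime

# The five log lines quoted above, newest first.
LOG = """\
jan 18 10:09:49 curie ntpd[1069]: adjusting local clock by -1.604366s
jan 18 10:08:18 curie ntpd[1069]: adjusting local clock by -1.577608s
jan 18 10:05:02 curie ntpd[1069]: adjusting local clock by -1.574683s
jan 18 10:04:00 curie ntpd[1069]: adjusting local clock by -1.573240s
jan 18 10:02:26 curie ntpd[1069]: adjusting local clock by -1.569592s
"""

# Hypothetical pattern: capture the HH:MM:SS time and the signed step in seconds.
PATTERN = re.compile(
    r"(\d{2}:\d{2}:\d{2}) \S+ ntpd\[\d+\]: adjusting local clock by (-?\d+\.\d+)s"
)


def parse_adjustments(log):
    """Return (time, step_seconds) tuples sorted from oldest to newest."""
    entries = []
    for line in log.splitlines():
        match = PATTERN.search(line)
        if match:
            when = datetime.strptime(match.group(1), "%H:%M:%S")
            entries.append((when, float(match.group(2))))
    return sorted(entries)


entries = parse_adjustments(LOG)

# Seconds elapsed between consecutive adjustments, oldest first.
gaps = [int((b[0] - a[0]).total_seconds()) for a, b in zip(entries, entries[1:])]

# Average size of a single clock step, in seconds.
mean_step = sum(step for _, step in entries) / len(entries)

print("gaps (s):", gaps)  # [94, 62, 196, 91]
print("mean step (s):", round(mean_step, 3))
```

In this excerpt the adjustments are between 62 and 196 seconds apart, and each one rewinds the clock by roughly 1.58 seconds.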
That might be due to a dying real time clock (RTC) or some other hardware problem. I know for a fact that the CMOS battery on that computer (curie) died and I wasn't able to replace it (!). So that's partly garbage-in, garbage-out here. But still, I was curious to see how chrony would behave... (Spoiler: much better.)
But I also had trouble on another workstation, that one a much more recent machine (angela). First, it seems OpenNTPd would just fail at boot time:
anarcat@angela:~(main)$ sudo systemctl status openntpd
● openntpd.service - OpenNTPd Network Time Protocol
Loaded: loaded (/lib/systemd/system/openntpd.service; enabled; vendor pres>
Active: inactive (dead) since Sun 2022-01-23 09:54:03 EST; 6h ago
Docs: man:openntpd(8)
Process: 3291 ExecStartPre=/usr/sbin/ntpd -n $DAEMON_OPTS (code=exited, sta>
Process: 3294 ExecStart=/usr/sbin/ntpd $DAEMON_OPTS (code=exited, status=0/>
Main PID: 3298 (code=exited, status=0/SUCCESS)
CPU: 34ms
jan 23 09:54:03 angela systemd[1]: Starting OpenNTPd Network Time Protocol...
jan 23 09:54:03 angela ntpd[3291]: configuration OK
jan 23 09:54:03 angela ntpd[3297]: ntp engine ready
jan 23 09:54:03 angela ntpd[3297]: ntp: recvfrom: Permission denied
jan 23 09:54:03 angela ntpd[3294]: Terminating
jan 23 09:54:03 angela systemd[1]: Started OpenNTPd Network Time Protocol.
jan 23 09:54:03 angela systemd[1]: openntpd.service: Succeeded.
After a restart, somehow it worked, but it took a long time to sync the clock. At first, it would just not consider any peer at all:
anarcat@angela:~(main)$ sudo ntpctl -s all
0/20 peers valid, clock unsynced
peer
wt tl st next poll offset delay jitter
159.203.8.72 from pool 0.debian.pool.ntp.org
1 5 2 6s 6s ---- peer not valid ----
138.197.135.239 from pool 0.debian.pool.ntp.org
1 5 2 6s 7s ---- peer not valid ----
216.197.156.83 from pool 0.debian.pool.ntp.org
1 4 1 2s 9s ---- peer not valid ----
142.114.187.107 from pool 0.debian.pool.ntp.org
1 5 2 5s 6s ---- peer not valid ----
216.6.2.70 from pool 1.debian.pool.ntp.org
1 4 2 2s 8s ---- peer not valid ----
207.34.49.172 from pool 1.debian.pool.ntp.org
1 4 2 0s 5s ---- peer not valid ----
198.27.76.102 from pool 1.debian.pool.ntp.org
1 5 2 5s 5s ---- peer not valid ----
158.69.254.196 from pool 1.debian.pool.ntp.org
1 4 3 1s 6s ---- peer not valid ----
149.56.121.16 from pool 2.debian.pool.ntp.org
1 4 2 5s 9s ---- peer not valid ----
162.159.200.123 from pool 2.debian.pool.ntp.org
1 4 3 1s 6s ---- peer not valid ----
206.108.0.131 from pool 2.debian.pool.ntp.org
1 4 1 6s 9s ---- peer not valid ----
205.206.70.40 from pool 2.debian.pool.ntp.org
1 5 2 8s 9s ---- peer not valid ----
2001:678:8::123 from pool 2.debian.pool.ntp.org
1 4 2 5s 9s ---- peer not valid ----
2606:4700:f1::1 from pool 2.debian.pool.ntp.org
1 4 3 2s 6s ---- peer not valid ----
2607:5300:205:200::1991 from pool 2.debian.pool.ntp.org
1 4 2 5s 9s ---- peer not valid ----
2607:5300:201:3100::345c from pool 2.debian.pool.ntp.org
1 4 4 1s 6s ---- peer not valid ----
209.115.181.110 from pool 3.debian.pool.ntp.org
1 5 2 5s 6s ---- peer not valid ----
205.206.70.42 from pool 3.debian.pool.ntp.org
1 4 2 0s 6s ---- peer not valid ----
68.69.221.61 from pool 3.debian.pool.ntp.org
1 4 1 2s 9s ---- peer not valid ----
162.159.200.1 from pool 3.debian.pool.ntp.org
1 4 3 4s 7s ---- peer not valid ----
Then it would accept them, but still wouldn't sync the clock:
anarcat@angela:~(main)$ sudo ntpctl -s all
20/20 peers valid, clock unsynced
peer
wt tl st next poll offset delay jitter
159.203.8.72 from pool 0.debian.pool.ntp.org
1 8 2 5s 6s 0.672ms 13.507ms 0.442ms
138.197.135.239 from pool 0.debian.pool.ntp.org
1 7 2 4s 8s 1.260ms 13.388ms 0.494ms
216.197.156.83 from pool 0.debian.pool.ntp.org
1 7 1 3s 5s -0.390ms 47.641ms 1.537ms
142.114.187.107 from pool 0.debian.pool.ntp.org
1 7 2 1s 6s -0.573ms 15.012ms 1.845ms
216.6.2.70 from pool 1.debian.pool.ntp.org
1 7 2 3s 8s -0.178ms 21.691ms 1.807ms
207.34.49.172 from pool 1.debian.pool.ntp.org
1 7 2 4s 8s -5.742ms 70.040ms 1.656ms
198.27.76.102 from pool 1.debian.pool.ntp.org
1 7 2 0s 7s 0.170ms 21.035ms 1.914ms
158.69.254.196 from pool 1.debian.pool.ntp.org
1 7 3 5s 8s -2.626ms 20.862ms 2.032ms
149.56.121.16 from pool 2.debian.pool.ntp.org
1 7 2 6s 8s 0.123ms 20.758ms 2.248ms
162.159.200.123 from pool 2.debian.pool.ntp.org
1 8 3 4s 5s 2.043ms 14.138ms 1.675ms
206.108.0.131 from pool 2.debian.pool.ntp.org
1 6 1 0s 7s -0.027ms 14.189ms 2.206ms
205.206.70.40 from pool 2.debian.pool.ntp.org
1 7 2 1s 5s -1.777ms 53.459ms 1.865ms
2001:678:8::123 from pool 2.debian.pool.ntp.org
1 6 2 1s 8s 0.195ms 14.572ms 2.624ms
2606:4700:f1::1 from pool 2.debian.pool.ntp.org
1 7 3 6s 9s 2.068ms 14.102ms 1.767ms
2607:5300:205:200::1991 from pool 2.debian.pool.ntp.org
1 6 2 4s 9s 0.254ms 21.471ms 2.120ms
2607:5300:201:3100::345c from pool 2.debian.pool.ntp.org
1 7 4 5s 9s -1.706ms 21.030ms 1.849ms
209.115.181.110 from pool 3.debian.pool.ntp.org
1 7 2 0s 7s 8.907ms 75.070ms 2.095ms
205.206.70.42 from pool 3.debian.pool.ntp.org
1 7 2 6s 9s -1.729ms 53.823ms 2.193ms
68.69.221.61 from pool 3.debian.pool.ntp.org
1 7 1 1s 7s -1.265ms 46.355ms 4.171ms
162.159.200.1 from pool 3.debian.pool.ntp.org
1 7 3 4s 8s 1.732ms 35.792ms 2.228ms
It took a solid five minutes to sync the clock, even though the peers were considered valid within a few seconds:
jan 23 15:58:41 angela systemd[1]: Started OpenNTPd Network Time Protocol.
jan 23 15:58:58 angela ntpd[84086]: peer 142.114.187.107 now valid
jan 23 15:58:58 angela ntpd[84086]: peer 198.27.76.102 now valid
jan 23 15:58:58 angela ntpd[84086]: peer 207.34.49.172 now valid
jan 23 15:58:58 angela ntpd[84086]: peer 209.115.181.110 now valid
jan 23 15:58:59 angela ntpd[84086]: peer 159.203.8.72 now valid
jan 23 15:58:59 angela ntpd[84086]: peer 138.197.135.239 now valid
jan 23 15:58:59 angela ntpd[84086]: peer 162.159.200.123 now valid
jan 23 15:58:59 angela ntpd[84086]: peer 2607:5300:201:3100::345c now valid
jan 23 15:59:00 angela ntpd[84086]: peer 2606:4700:f1::1 now valid
jan 23 15:59:00 angela ntpd[84086]: peer 158.69.254.196 now valid
jan 23 15:59:01 angela ntpd[84086]: peer 216.6.2.70 now valid
jan 23 15:59:01 angela ntpd[84086]: peer 68.69.221.61 now valid
jan 23 15:59:01 angela ntpd[84086]: peer 205.206.70.40 now valid
jan 23 15:59:01 angela ntpd[84086]: peer 205.206.70.42 now valid
jan 23 15:59:02 angela ntpd[84086]: peer 162.159.200.1 now valid
jan 23 15:59:04 angela ntpd[84086]: peer 216.197.156.83 now valid
jan 23 15:59:05 angela ntpd[84086]: peer 206.108.0.131 now valid
jan 23 15:59:05 angela ntpd[84086]: peer 2001:678:8::123 now valid
jan 23 15:59:05 angela ntpd[84086]: peer 149.56.121.16 now valid
jan 23 15:59:07 angela ntpd[84086]: peer 2607:5300:205:200::1991 now valid
jan 23 16:03:47 angela ntpd[84086]: clock is now synced
That seems kind of odd. It was also frustrating to have very little information from ntpctl about the state of the daemon. I understand it's designed to be minimal, but it could inform me of its known offset, for example. It does tell me about the offset with the different peers, but not as clearly as one would expect. It's also unclear how it disciplines the RTC, if at all.
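To put numbers on the delay visible in the OpenNTPd startup log above, here is a small Python sketch that measures the time from service start to the first valid peer, to the last valid peer, and to the "clock is now synced" line. The timestamps are copied from that log; the helper `seconds_between` is a hypothetical name and assumes all events happen on the same day.

```python
from datetime import datetime


def seconds_between(start, end):
    """Whole seconds between two HH:MM:SS timestamps on the same day."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


# Timestamps copied from the log above.
started = "15:58:41"      # systemd: Started OpenNTPd
first_valid = "15:58:58"  # first "peer ... now valid"
all_valid = "15:59:07"    # last "peer ... now valid"
synced = "16:03:47"       # "clock is now synced"

to_first_peer = seconds_between(started, first_valid)  # 17
to_all_peers = seconds_between(started, all_valid)     # 26
to_sync = seconds_between(started, synced)             # 306

print(f"first peer valid after {to_first_peer}s")
print(f"all peers valid after {to_all_peers}s")
print(f"clock synced after {to_sync}s ({to_sync // 60}m{to_sync % 60:02d}s)")
```

So all twenty peers were valid within 26 seconds, yet the daemon waited another 280 seconds before declaring the clock synced.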
Compared to chrony
Now compare with chrony:
jan 23 16:07:16 angela systemd[1]: Starting chrony, an NTP client/server...
jan 23 16:07:16 angela chronyd[87765]: chronyd version 4.0 starting (+CMDMON +NTP +REFCLOCK +RTC +PRIVDROP +SCFILTER +SIGND +ASYNCDNS +NTS +SECHASH +IPV6 -DEBUG)
jan 23 16:07:16 angela chronyd[87765]: Initial frequency 3.814 ppm
jan 23 16:07:16 angela chronyd[87765]: Using right/UTC timezone to obtain leap second data
jan 23 16:07:16 angela chronyd[87765]: Loaded seccomp filter
jan 23 16:07:16 angela systemd[1]: Started chrony, an NTP client/server.
jan 23 16:07:21 angela chronyd[87765]: Selected source 206.108.0.131 (2.debian.pool.ntp.org)
jan 23 16:07:21 angela chronyd[87765]: System clock TAI offset set to 37 seconds
First, you'll notice there's none of that "clock synced" nonsense: it picks a source, and then... it's just done. That's because the clock on this computer is not drifting that much, and openntpd had (presumably) just synced it anyway. And indeed, if we look at detailed stats from the powerful chronyc client:
anarcat@angela:~(main)$ sudo chronyc tracking
Reference ID : CE6C0083 (ntp1.torix.ca)
Stratum : 2
Ref time (UTC) : Sun Jan 23 21:07:21 2022
System time : 0.000000311 seconds slow of NTP time
Last offset : +0.000807989 seconds
RMS offset : 0.000807989 seconds
Frequency : 3.814 ppm fast
Residual freq : -24.434 ppm
Skew : 1000000.000 ppm
Root delay : 0.013200894 seconds
Root dispersion : 65.357254028 seconds
Update interval : 1.4 seconds
Leap status : Normal
We see that we are nanoseconds away from NTP time. That was run very quickly after starting the server (literally in the same second as chrony picked a source), so stats are a bit weird (e.g. the Skew is huge). After a minute or two, it looks more reasonable:
Reference ID : CE6C0083 (ntp1.torix.ca)
Stratum : 2
Ref time (UTC) : Sun Jan 23 21:09:32 2022
System time : 0.000487002 seconds slow of NTP time
Last offset : -0.000332960 seconds
RMS offset : 0.000751204 seconds
Frequency : 3.536 ppm fast
Residual freq : +0.016 ppm
Skew : 3.707 ppm
Root delay : 0.013363549 seconds
Root dispersion : 0.000324015 seconds
Update interval : 65.0 seconds
Leap status : Normal
Now it's learning how good or bad the local clock is ("Frequency"), and is smoothly adjusting the System time to follow the average offset (RMS offset, more or less). You'll also notice the Update interval has risen, and will keep expanding as chrony learns more about the internal clock, so it doesn't need to constantly poll the NTP servers to sync the clock. In the above, we're 487 microseconds (less than a millisecond!) away from NTP time.
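For anyone who wants to pull those numbers out programmatically, here is a minimal Python sketch that parses the second tracking report above into a dictionary and converts the System time offset to microseconds. It only assumes the plain "Name : value" layout shown in this post; `parse_tracking` and `leading_float` are hypothetical helper names, not part of chrony.

```python
# The second tracking report quoted above, copied verbatim.
TRACKING = """\
Reference ID : CE6C0083 (ntp1.torix.ca)
Stratum : 2
Ref time (UTC) : Sun Jan 23 21:09:32 2022
System time : 0.000487002 seconds slow of NTP time
Last offset : -0.000332960 seconds
RMS offset : 0.000751204 seconds
Frequency : 3.536 ppm fast
Residual freq : +0.016 ppm
Skew : 3.707 ppm
Root delay : 0.013363549 seconds
Root dispersion : 0.000324015 seconds
Update interval : 65.0 seconds
Leap status : Normal
"""


def parse_tracking(text):
    """Map each field name to its raw string value."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only: values such as the
        # reference time contain colons of their own.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def leading_float(value):
    """Return the number at the start of a value like '3.536 ppm fast'."""
    return float(value.split()[0])


fields = parse_tracking(TRACKING)
system_time_us = leading_float(fields["System time"]) * 1e6
update_interval_s = leading_float(fields["Update interval"])

print(f"offset: {system_time_us:.0f} microseconds")
print(f"update interval: {update_interval_s} seconds")
```

Splitting on the first colon only is the one design choice worth noting, since the Ref time value itself contains colons.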
(People interested in the explanation of every single one of those fields can read the excellent chronyc manpage. That thing made me want to nerd out on NTP again!)
On the machine with the bad clock, chrony also did a 1.5 second adjustment, but just once, at startup:
jan 18 11:54:33 curie chronyd[2148399]: Selected source 206.108.0.133 (2.debian.pool.ntp.org)
jan 18 11:54:33 curie chronyd[2148399]: System clock wrong by -1.606546 seconds
jan 18 11:54:31 curie chronyd[2148399]: System clock was stepped by -1.606546 seconds
jan 18 11:54:31 curie chronyd[2148399]: System clock TAI offset set to 37 seconds
Then it would still struggle to keep the clock in sync, but not as badly as openntpd. Here's the offset a few minutes after the startup above:
System time : 0.000375352 seconds slow of NTP time
And again a few seconds later:
System time : 0.001793046 seconds slow of NTP time
I don't currently have access to that machine, and will update this post with the latest status. But so far I've had a very good experience with chrony on that machine, which is a testament to its resilience, and it also just works on my other machines.
Extras
On top of "just working" (as demonstrated above), I feel that chrony's feature set is so much superior... Here's an excerpt of the extras in chrony, taken from the comparison table:
- source frequency tracking
- source state restore from file
- temperature compensation
- ready for next NTP era (year 2036)
- replace unreachable / falseticker servers
- aware of jitter
- RTC drift tracking
- RTC trimming
- Restore time from file w/o RTC
- leap seconds correction, in slew mode
- drops root privileges
I even understand some of that stuff. I think.
So kudos to the chrony folks, I'm switching.
Caveats
One thing to keep in mind in the above, however, is that it's quite possible chrony does as bad a job as openntpd on that old machine, and just doesn't tell me about it. For example, here's another log sample from another server (marcos):
jan 23 11:13:25 marcos ntpd[1976694]: adjusting clock frequency by 0.451035 to -16.420273ppm
I get those basically every day, which seems to show that openntpd is at least trying to keep track of the hardware clock.
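To get a feel for what those ppm figures mean, here is a small Python sketch converting a frequency error in parts per million into seconds of drift per day. It assumes the logged rate stays constant over the whole day, which a real clock will not do exactly; `ppm_to_seconds_per_day` is a hypothetical helper name.

```python
SECONDS_PER_DAY = 86_400


def ppm_to_seconds_per_day(ppm):
    """Drift accumulated over one day at a constant frequency error.

    One ppm means the clock gains or loses one microsecond per second.
    The sign is dropped: only the magnitude of the drift is returned.
    """
    return abs(ppm) * 1e-6 * SECONDS_PER_DAY


# Figure from the openntpd log line on marcos.
marcos_drift = ppm_to_seconds_per_day(-16.420273)
# Frequency reported by chronyc tracking on angela.
angela_drift = ppm_to_seconds_per_day(3.536)

print(f"marcos: {marcos_drift:.2f} s/day")  # about 1.42 s/day
print(f"angela: {angela_drift:.2f} s/day")  # about 0.31 s/day
```

In other words, without correction a -16.4 ppm error would add up to roughly a second and a half per day.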
In other words, it's quite possible I have no idea what I'm talking about and you definitely need to take this article with a grain of salt. I'm not an NTP expert.
Switching to chrony
Because the default configuration in chrony (at least as shipped in Debian) is sane (good default peers, no open network by default), installing it is as simple as:
apt install chrony
And because it somehow conflicts with openntpd, that also takes care of removing that cruft.