Timekeeping in virtualized Linux
Time in a virtual machine is not one thing. User-space sees clocks. The kernel maintains timekeeping state. Hardware or the hypervisor provides counters and wall-clock seeds. Debugging time drift or missing clocksources requires keeping those layers separate.
The three-layer model
For this note, it is useful to split time into three layers.
| Layer | Examples | Question |
|---|---|---|
| user space | clock_gettime, date, timers | what does the application observe? |
| kernel | timekeeper, hrtimer, jiffies, clocksource | how does Linux maintain time? |
| hardware or hypervisor | RTC, TSC, HPET, ACPI PM timer, pvclock | what source feeds the kernel? |
The real kernel has more machinery, but this model prevents a common mistake: treating RTC, clocksource, NTP, and application clocks as the same problem.
Initial wall-clock time
At boot, the kernel needs an initial wall-clock value. A real-time clock device can provide calendar-like data: year, month, day, hour, minute, second. Linux converts that into internal timekeeping state.
After boot, the RTC is not the primary mechanism for every timestamp. The system keeps time by measuring elapsed time from a counter and combining that delta with the initial wall-clock value.
time now = boot wall-clock seed + elapsed counter delta
This is why a broken RTC and a bad clocksource produce different symptoms.
Elapsed time is the center
The important idea is that the operating system does not need to ask the RTC for the current calendar time on every timestamp. Once Linux has an initial wall-clock seed, it mainly needs elapsed time.
If a counter increases at a known rate, Linux can read the counter later and calculate the delta:
counter at boot = 20
counter later = 100
delta = 80 ticks
With the counter frequency, that becomes elapsed time. With elapsed time and the initial wall-clock seed, Linux can maintain wall-clock time, monotonic time, scheduler timestamps, and timer expirations.
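The arithmetic above can be sketched in a few lines. This is a minimal illustration, not kernel code: the function names, the 1 kHz frequency, and the seed value are hypothetical, and the real kernel avoids a division on every read by using precomputed multiply-and-shift factors.

```python
NSEC_PER_SEC = 1_000_000_000


def elapsed_ns(counter_then: int, counter_now: int, freq_hz: int) -> int:
    """Convert a counter delta into nanoseconds using integer arithmetic."""
    delta_ticks = counter_now - counter_then
    return delta_ticks * NSEC_PER_SEC // freq_hz


def wall_clock_ns(seed_ns: int, counter_at_seed: int,
                  counter_now: int, freq_hz: int) -> int:
    """time now = boot wall-clock seed + elapsed counter delta."""
    return seed_ns + elapsed_ns(counter_at_seed, counter_now, freq_hz)


# Counter values from the example above; the 1 kHz frequency is an assumption.
print(elapsed_ns(20, 100, freq_hz=1_000))  # 80 ticks at 1 kHz -> 80000000 ns
```

The same delta yields a different elapsed time if the assumed frequency is wrong, which is why a miscalibrated counter shows up as drift rather than as a wrong boot time.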
That is why clocksource quality matters so much. A clocksource is not a decorative device. It is the foundation under the timekeeping state that user space eventually observes.
Clocksource
Linux abstracts readable counters through the clocksource framework. A good clocksource should be monotonic, stable, fast to read, of high enough resolution, and not too expensive under virtualization.
Classic PC timers include PIT, HPET, and the ACPI PM timer. They are useful as fallback or compatibility devices, but they are often poor primary clocksources in virtual machines because reads may involve emulation.
What a good counter should provide
In plain terms, a useful clocksource should:
- not move backward under normal operation,
- not stop unexpectedly,
- not jump unless the kernel knows how to account for it,
- have a stable or known frequency,
- have enough resolution for the workload,
- be cheap to read,
- work across CPUs or be constrained so Linux can use it safely,
- and, ideally, support fast user-space reads through vDSO when appropriate.
No physical or virtual device is perfect. Linux therefore rates, validates, and sometimes rejects clocksources. The TSC can be excellent, but only if Linux trusts its frequency and stability in that environment.
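One way to picture validation is to compare a candidate counter against a second, independent reference over the same intervals. The sketch below is only an illustration of that idea, loosely in the spirit of a watchdog comparison. The function name, the input format (paired readings already converted to nanoseconds), and the 1 ms threshold are all assumptions, not kernel values.

```python
def find_problems(candidate_ns, reference_ns, max_skew_ns=1_000_000):
    """Flag intervals where the candidate moved backward or disagreed
    with the reference by more than max_skew_ns (hypothetical threshold)."""
    problems = []
    for i in range(1, min(len(candidate_ns), len(reference_ns))):
        cand_delta = candidate_ns[i] - candidate_ns[i - 1]
        ref_delta = reference_ns[i] - reference_ns[i - 1]
        if cand_delta < 0:
            problems.append(
                f"sample {i}: candidate moved backward by {-cand_delta} ns")
        elif abs(cand_delta - ref_delta) > max_skew_ns:
            problems.append(
                f"sample {i}: skew of {cand_delta - ref_delta} ns against reference")
    return problems


# Hypothetical readings taken every 10 ms; the candidate jumps at sample 2.
reference = [0, 10_000_000, 20_000_000, 30_000_000]
candidate = [0, 10_000_000, 520_000_000, 530_000_000]
for line in find_problems(candidate, reference):
    print(line)
```

A check like this can only say that two sources disagree. Deciding which one is wrong needs additional evidence, which is part of why validation logic is more involved than this sketch.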
Legacy devices in virtual machines
PC-compatible systems have long exposed devices such as:
| Device | Role | VM concern |
|---|---|---|
| PIT | old programmable interval timer | low resolution and usually emulated |
| HPET | high precision event timer | can be expensive when emulated |
| ACPI PM timer | platform timer | read path may be slow in a guest |
| RTC | wall-clock seed and persistent clock | lifecycle semantics depend on platform |
These devices are still useful for compatibility, fallback, and calibration. They are usually not what you want as the primary clocksource in a modern VM.
The reason is cost. A guest read of an emulated legacy device can require a VM exit or a hypervisor-assisted path. A time read is a common operation, so that cost matters.
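A rough way to see that cost from user space is to time a large number of clock reads. The sketch below is an assumption-laden microbenchmark: the function name and loop count are hypothetical, and the result includes Python interpreter overhead, so it is only meaningful for comparing the same guest under different clocksources.

```python
import time


def avg_read_cost_ns(reads: int = 200_000) -> float:
    """Average cost of one monotonic clock read, including interpreter overhead."""
    start = time.perf_counter_ns()
    for _ in range(reads):
        # On Linux this is backed by clock_gettime(CLOCK_MONOTONIC).
        time.monotonic_ns()
    end = time.perf_counter_ns()
    return (end - start) / reads


if __name__ == "__main__":
    print(f"average cost per read: {avg_read_cost_ns():.0f} ns")
```

If the average rises sharply after the guest switches to an emulated source, the read path is a likely suspect.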
TSC
The Time Stamp Counter is a CPU counter. On modern systems it is often the best clocksource because it is extremely fast to read. The catch is stability. Linux must know whether the TSC is invariant, synchronized enough across CPUs, and usable under the hypervisor's behavior.
A TSC that changes frequency, stops in deep idle states, jumps during migration, or differs between CPUs is dangerous as a primary clocksource. Modern hardware and hypervisors generally handle this much better than older systems, but the kernel still has validation paths for a reason.
Historically, TSC handling was one of the places where virtualization details leaked into the guest. Migration, CPU frequency behavior, nested virtualization, and the way the hypervisor presents CPU features can all affect whether Linux trusts it.
When the TSC is usable, it is often the best answer. When it is not provably usable, Linux may choose a paravirtual clocksource instead.
pvclock and paravirtual clocks
Virtualized guests need help. KVM and Xen can expose paravirtual clock information so the guest does not treat every time read as an emulated legacy-device access. In Linux, KVM guests commonly use kvm-clock; Xen guests use Xen clock support.
The point is not that pvclock is magical. The point is that the hypervisor and guest cooperate so Linux can compute time cheaply and consistently.
The paravirtual interface can provide data that lets the guest convert a counter value into nanoseconds without repeatedly reading slow legacy devices. This is why kvm-clock or Xen clock support can be a better primary clocksource than HPET or ACPI PM in a guest.
The relationship between pvclock and TSC is close. pvclock often uses TSC-derived data with hypervisor-provided conversion information. That is why the missing-tsc story in the kernel patch note eventually led into the KVM clock and TSC calibration code.
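The conversion can be sketched as follows. The parameter names follow the fields of the pvclock time structure described in the KVM documentation (tsc_timestamp, system_time, tsc_to_system_mul, tsc_shift), but this is a simplified illustration: it leaves out the version check that a real guest uses to retry when the hypervisor updates the structure mid-read, and it ignores the flags field.

```python
def pvclock_ns(tsc_now: int, tsc_timestamp: int, system_time: int,
               tsc_to_system_mul: int, tsc_shift: int) -> int:
    """Convert a TSC reading into guest nanoseconds from pvclock-style data."""
    delta = tsc_now - tsc_timestamp
    # tsc_shift pre-scales the delta so the 32-bit multiplier keeps precision.
    if tsc_shift >= 0:
        delta <<= tsc_shift
    else:
        delta >>= -tsc_shift
    # tsc_to_system_mul is a 32.32 fixed-point factor, hence the shift by 32.
    return system_time + ((delta * tsc_to_system_mul) >> 32)


# Hypothetical values: a 2 GHz TSC means each tick is 0.5 ns (mul = 2**31).
print(pvclock_ns(tsc_now=2_000, tsc_timestamp=1_000, system_time=0,
                 tsc_to_system_mul=1 << 31, tsc_shift=0))  # 500
```

The guest only needs a TSC read plus a few integer operations, which is what makes this path cheaper than trapping into an emulated device.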
Practical expectations
| Environment | Usually good | Usually poor |
|---|---|---|
| KVM guest | kvm-clock, validated TSC | HPET or ACPI PM as primary source |
| Xen guest | Xen clocksource, validated TSC on suitable hardware | emulated legacy timers |
| bare metal modern x86 | invariant TSC | PIT as primary source |
Always check the actual machine. Hypervisor version, CPU features, migration settings, and kernel version matter.
The tables below come from an earlier, more opinionated version of this note. They are useful as intuition, but they should not be read as universal rules for every current cloud or hypervisor release.
Older Xen-like environment
| Source | Read/write shape | vDSO-friendly | Practical note |
|---|---|---|---|
| paravirtual clock | read-oriented | often no | good baseline for old guests |
| TSC | read and sometimes virtualized write context | yes when trusted | risky on older hardware or migration paths |
| ACPI PM | read-only | no | fallback, not preferred |
| HPET | read-only | no | fallback, not preferred |
| jiffies | kernel tick state | no | not a precision time source |
The warning in the original note was mostly about older hardware and migration behavior. If a guest can move between hosts where TSC behavior is not consistent, blindly trusting TSC is unsafe.
Newer Xen-like environment
| Source | Read/write shape | vDSO-friendly | Practical note |
|---|---|---|---|
| paravirtual clock | read-oriented | environment-dependent | still useful |
| validated TSC | read path | yes | can be excellent when invariant and synchronized |
| ACPI PM | read-only | no | fallback |
| HPET | read-only | no | fallback |
| jiffies | kernel tick state | no | not a precision time source |
The shift is that newer CPU and hypervisor support can make TSC much more attractive.
KVM-like environment
| Source | Read/write shape | vDSO-friendly | Practical note |
|---|---|---|---|
| kvm-clock | paravirtual clocksource | yes on common setups | strong default in many guests |
| validated TSC | fast counter | yes | strong when Linux trusts it |
| ACPI PM | read-only | no | fallback and calibration reference |
| HPET | read-only | no | fallback and calibration reference |
| jiffies | kernel tick state | no | not a primary precision source |
In practice, the selected clocksource is a kernel decision. Do not force a source because a table says it is theoretically faster. First read what Linux selected and why.
Checking Linux state
The selected and available clocksources are exposed in sysfs:
cat /sys/devices/system/clocksource/clocksource0/current_clocksource
cat /sys/devices/system/clocksource/clocksource0/available_clocksource
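The same two files can be read from a script, which is handy when collecting state from many guests. This is a minimal sketch: the function name is hypothetical, and it returns None when the sysfs files are absent instead of raising.

```python
from pathlib import Path

SYSFS_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksources(base: Path = SYSFS_DIR):
    """Return (current, [available, ...]) or None if sysfs is not present."""
    current_file = base / "current_clocksource"
    available_file = base / "available_clocksource"
    if not current_file.exists() or not available_file.exists():
        return None
    current = current_file.read_text().strip()
    available = available_file.read_text().split()
    return current, available


if __name__ == "__main__":
    state = read_clocksources()
    if state is None:
        print("clocksource sysfs entries not found")
    else:
        current, available = state
        print(f"current: {current}")
        print(f"available: {', '.join(available)}")
        if "tsc" not in available:
            print("note: tsc is not listed as available")
```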
CPU flags can explain what Linux believed about the processor:
grep -m1 '^flags' /proc/cpuinfo
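The flags line is long, so it can help to pull out only the TSC-related entries. The sketch below assumes the x86 flag names constant_tsc, nonstop_tsc, tsc_reliable, tsc_known_freq, and hypervisor as they appear in /proc/cpuinfo. The function name is hypothetical, and a missing flag is a hint to investigate, not proof that the TSC is unusable.

```python
TSC_FLAGS = ("tsc", "constant_tsc", "nonstop_tsc",
             "tsc_reliable", "tsc_known_freq", "hypervisor")


def tsc_related_flags(cpuinfo_text: str) -> dict:
    """Map each TSC-related flag to True/False using the first flags line."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags") and ":" in line:
            flags = set(line.split(":", 1)[1].split())
            return {name: name in flags for name in TSC_FLAGS}
    return {}


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = ""
    for name, present in tsc_related_flags(text).items():
        print(f"{name}: {'yes' if present else 'no'}")
```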
Boot logs often explain clocksource decisions:
dmesg -T | grep -iE 'clocksource|tsc|kvm-clock|xen|hpet|acpi_pm'
User-space time state is visible through:
timedatectl status
If the problem is application-visible time, also check which clock the application uses. CLOCK_REALTIME, CLOCK_MONOTONIC, timerfd, scheduler time, and file timestamps do not all answer the same question.
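To see that these clocks answer different questions, it can help to read several of them side by side. This is a small sketch with a hypothetical function name. It skips any clock that the platform or Python build does not expose.

```python
import time

CLOCK_NAMES = ("CLOCK_REALTIME", "CLOCK_MONOTONIC", "CLOCK_BOOTTIME")


def read_clocks() -> dict:
    """Read each available clock once and return {name: seconds}."""
    readings = {}
    if not hasattr(time, "clock_gettime"):
        return readings  # not available on this platform
    for name in CLOCK_NAMES:
        clock_id = getattr(time, name, None)
        if clock_id is None:
            continue
        try:
            readings[name] = time.clock_gettime(clock_id)
        except OSError:
            continue
    return readings


if __name__ == "__main__":
    for name, value in read_clocks().items():
        print(f"{name}: {value:.6f} s")
```

CLOCK_REALTIME follows wall-clock adjustments, while CLOCK_MONOTONIC does not jump when the wall clock is set. An application that measures durations with the realtime clock can therefore report odd intervals after a correction.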
RTC is a different question
The RTC is mostly about the wall-clock seed and about time across power transitions. In a VM, RTC persistence depends on the hypervisor and lifecycle operation. Reboot, shutdown, suspend, stop/start, and migration can behave differently.
| Hypervisor case | Guest can read/write RTC | Reboot behavior | Stop/start behavior |
|---|---|---|---|
| KVM-style local VM | yes | usually persistent | depends on management layer |
| Xen-style guest | yes | depends on domain lifecycle | depends on toolstack |
| cloud VM | abstracted by platform | provider-specific | provider-specific |
Treat RTC observations as platform evidence, not universal law.
The original note emphasized a useful operational point: if the guest time is wrong after boot, do not blame only the RTC. After boot, the ongoing drift is more often about the selected clocksource, host/guest synchronization, NTP or chrony state, suspend/resume, or migration behavior.
NTP and correction
Clocksource and NTP solve different layers.
The clocksource gives Linux a local timeline. NTP or chrony compares the system to external time and adjusts it. If the local counter is unstable, NTP may continuously correct symptoms without removing the underlying source of drift.
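Drift is usually expressed as a rate. As a rough illustration, two offset measurements taken some time apart give an average drift in parts per million. The function name and the sample numbers are hypothetical.

```python
def drift_ppm(offset_start_s: float, offset_end_s: float,
              interval_s: float) -> float:
    """Average drift rate between two offset measurements, in ppm."""
    return (offset_end_s - offset_start_s) / interval_s * 1_000_000


# Hypothetical: the offset grew from 0 ms to 36 ms over one hour.
print(f"{drift_ppm(0.0, 0.036, 3600.0):.1f} ppm")
```

A rate that stays roughly constant suggests a frequency error that synchronization can steer out. A rate that changes abruptly points back at the clocksource or at a lifecycle event such as migration.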
Useful checks:
timedatectl status
chronyc tracking
chronyc sources -v
For incident timelines, this matters because a large correction can make log ordering confusing. When time is part of the incident, record both the synchronization state and the kernel clocksource state.
What can go wrong
Common failure shapes:
| Symptom | Possible layer |
|---|---|
| wrong time immediately after boot | RTC seed, platform lifecycle, image config |
| steady drift after boot | clocksource stability, NTP state, host issue |
| time jumps after resume or migration | hypervisor lifecycle, TSC scaling, synchronization |
| high cost in time reads | emulated legacy clocksource |
| tsc missing from available clocksources | kernel validation, CPU flags, hypervisor presentation |
| logs appear out of order | clock adjustment, different clocks, delayed logging |
The missing-tsc case in the kernel patch note belongs to the validation row. Linux had to decide whether a counter was trustworthy enough to expose.
Debugging checklist
When time appears wrong, ask:
- Is the application using realtime, monotonic, or another clock?
- What clocksource did Linux select?
- What clocksources were available?
- Did boot logs mark TSC unstable?
- Is NTP correcting a boot-time error after the fact?
- Did the VM migrate or resume?
- Is the hypervisor exposing paravirtual clock features?
The kernel patch note is one concrete example: a KVM guest did not list tsc as an available clocksource, so the investigation had to move from sysfs output into kernel validation logic.
See also
Related: Linux, kernel patch, linux troubleshooting.