virtualized time

Timekeeping in virtualized Linux

Time in a virtual machine is not one thing. User-space sees clocks. The kernel maintains timekeeping state. Hardware or the hypervisor provides counters and wall-clock seeds. Debugging time drift or missing clocksources requires keeping those layers separate.

The three-layer model

For this note, it is useful to split time into three layers.

Layer	Examples	Question
user space	`clock_gettime`, `date`, timers	what does the application observe?
kernel	timekeeper, hrtimer, jiffies, clocksource	how does Linux maintain time?
hardware or hypervisor	RTC, TSC, HPET, ACPI PM timer, pvclock	what source feeds the kernel?

The real kernel has more machinery, but this model prevents a common mistake: treating RTC, clocksource, NTP, and application clocks as the same problem.

Initial wall-clock time

At boot, the kernel needs an initial wall-clock value. A real-time clock device can provide calendar-like data: year, month, day, hour, minute, second. Linux converts that into internal timekeeping state.

After boot, the RTC is not the primary mechanism for every timestamp. The system keeps time by measuring elapsed time from a counter and combining that delta with the initial wall-clock value.

time now = boot wall-clock seed + elapsed counter delta

This is why a broken RTC and a bad clocksource produce different symptoms.

Elapsed time is the center

The important idea is that the operating system does not need to ask the RTC for the current calendar time on every timestamp. Once Linux has an initial wall-clock seed, it mainly needs elapsed time.

If a counter increases at a known rate, Linux can read the counter later and calculate the delta:

counter at boot = 20
counter later   = 100
delta           = 80 ticks

With the counter frequency, that becomes elapsed time. With elapsed time and the initial wall-clock seed, Linux can maintain wall-clock time, monotonic time, scheduler timestamps, and timer expirations.
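The arithmetic above can be sketched in a few lines. The counter values 20 and 100 come from the example; the 1 kHz frequency, the boot seed, and the function names are assumptions chosen only to make the numbers easy to follow, not kernel internals.

```python
def ticks_to_ns(delta_ticks: int, freq_hz: int) -> int:
    """Convert a counter delta to nanoseconds using integer math."""
    return delta_ticks * 1_000_000_000 // freq_hz


def wall_clock_ns(boot_seed_ns: int, counter_at_boot: int,
                  counter_now: int, freq_hz: int) -> int:
    """time now = boot wall-clock seed + elapsed counter delta."""
    delta = counter_now - counter_at_boot
    return boot_seed_ns + ticks_to_ns(delta, freq_hz)


# Assumed values: a 1 kHz counter and an arbitrary wall-clock seed.
FREQ_HZ = 1_000
BOOT_SEED_NS = 1_700_000_000 * 1_000_000_000

elapsed_ns = ticks_to_ns(100 - 20, FREQ_HZ)
now_ns = wall_clock_ns(BOOT_SEED_NS, 20, 100, FREQ_HZ)
print(elapsed_ns)  # 80 ticks at 1 kHz -> 80000000 ns (80 ms)
```

The same delta feeds both wall-clock and monotonic time; only the base value added to it differs.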

That is why clocksource quality matters so much. A clocksource is not a decorative device. It is the foundation under the timekeeping state that user space eventually observes.

Clocksource

Linux abstracts readable counters through the clocksource framework. A good clocksource should be monotonic, stable, fast to read, high enough in resolution, and not too expensive under virtualization.

Classic PC timers include PIT, HPET, and the ACPI PM timer. They are useful as fallback or compatibility devices, but they are often poor primary clocksources in virtual machines because reads may involve emulation.

What a good counter should provide

The original note listed the properties in plain terms. A useful clocksource should:

not move backward under normal operation,
not stop unexpectedly,
not jump unless the kernel knows how to account for it,
have a stable or known frequency,
have enough resolution for the workload,
be cheap to read,
work across CPUs or be constrained so Linux can use it safely,
and, ideally, support fast user-space reads through vDSO when appropriate.

No physical or virtual device is perfect. Linux therefore rates, validates, and sometimes rejects clocksources. The tsc can be excellent, but only if Linux trusts its frequency and stability in that environment.
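As a rough illustration of the first three properties, the sketch below scans a list of counter samples for backward steps, stalls, and unexpectedly large jumps. It is not how the kernel validates clocksources; the function name, the `max_step` threshold, and the sample values are hypothetical.

```python
from typing import Iterable, List, Tuple


def find_counter_anomalies(samples: Iterable[int],
                           max_step: int) -> List[Tuple[int, str]]:
    """Return (index, kind) for each suspicious step between samples."""
    anomalies: List[Tuple[int, str]] = []
    previous = None
    for index, value in enumerate(samples):
        if previous is not None:
            step = value - previous
            if step < 0:
                anomalies.append((index, "backward"))
            elif step == 0:
                # May be legitimate for a coarse counter sampled quickly.
                anomalies.append((index, "stalled"))
            elif step > max_step:
                anomalies.append((index, "jump"))
        previous = value
    return anomalies


# Hypothetical samples: one backward step, one stall, one large jump.
samples = [100, 110, 120, 115, 125, 125, 900]
report = find_counter_anomalies(samples, max_step=50)
print(report)
```

A real check would also need to account for counter wraparound and for reads taken on different CPUs.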

Legacy devices in virtual machines

PC-compatible systems have long exposed devices such as:

Device	Role	VM concern
PIT	old programmable interval timer	low resolution and usually emulated
HPET	high precision event timer	can be expensive when emulated
ACPI PM timer	platform timer	read path may be slow in a guest
RTC	wall-clock seed and persistent clock	lifecycle semantics depend on platform

These devices are still useful for compatibility, fallback, and calibration. They are usually not what you want as the primary clocksource in a modern VM.

The reason is cost. A guest read of an emulated legacy device can require a VM exit or a hypervisor-assisted path. A time read is a common operation, so that cost matters.
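One way to get a feel for read cost is to time many clock reads from user space. The sketch below assumes a Linux system with Python 3.7 or later; the function name and read count are arbitrary. The result includes interpreter overhead, so it is only useful for comparing the same guest under different clocksources, not as an absolute figure.

```python
import time


def average_read_cost_ns(clock_id: int, reads: int = 100_000) -> float:
    """Average wall time per clock read, including Python call overhead."""
    start = time.perf_counter_ns()
    for _ in range(reads):
        time.clock_gettime_ns(clock_id)
    end = time.perf_counter_ns()
    return (end - start) / reads


cost = average_read_cost_ns(time.CLOCK_MONOTONIC)
print(f"about {cost:.0f} ns per read (interpreter overhead included)")
```

A large increase in this number after a clocksource change suggests that reads have left the fast path.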

TSC

The Time Stamp Counter is a CPU counter. On modern systems it is often the best clocksource because it is extremely fast to read. The catch is stability. Linux must know whether the TSC is invariant, synchronized enough across CPUs, and usable under the hypervisor's behavior.

A TSC that changes frequency, stops in deep idle states, jumps during migration, or differs between CPUs is dangerous as a primary clocksource. Modern hardware and hypervisors generally handle this much better than older systems, but the kernel still has validation paths for a reason.

Historically, TSC handling was one of the places where virtualization details leaked into the guest. Migration, CPU frequency behavior, nested virtualization, and the way the hypervisor presents CPU features can all affect whether Linux trusts it.

When the TSC is usable, it is often the best answer. When it is not provably usable, Linux may choose a paravirtual clocksource instead.

pvclock and paravirtual clocks

Virtualized guests need help. KVM and Xen can expose paravirtual clock information so the guest does not treat every time read as an emulated legacy-device access. In Linux, KVM guests commonly use kvm-clock; Xen guests use Xen clock support.

The point is not that pvclock is magical. The point is that the hypervisor and guest cooperate so Linux can compute time cheaply and consistently.

The paravirtual interface can provide data that lets the guest convert a counter value into nanoseconds without repeatedly reading slow legacy devices. This is why kvm-clock or Xen clock support can be a better primary clocksource than HPET or ACPI PM in a guest.

The relationship between pvclock and TSC is close. pvclock often uses TSC-derived data with hypervisor-provided conversion information. That is why the missing-tsc story in the kernel patch note eventually led into KVM clock and TSC calibration code.
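The conversion can be sketched as follows. The field names are modeled loosely on the kernel's `pvclock_vcpu_time_info` structure, but this is a simplified illustration: the version and flags handling that a real guest needs for consistent reads is omitted, and the numeric values are made up.

```python
from dataclasses import dataclass


@dataclass
class PvclockSnapshot:
    tsc_timestamp: int      # TSC value when the hypervisor took the snapshot
    system_time: int        # guest time in ns at that same moment
    tsc_to_system_mul: int  # 32-bit fixed-point multiplier (ns per scaled tick)
    tsc_shift: int          # pre-scaling shift applied to the TSC delta


def pvclock_to_ns(snap: PvclockSnapshot, tsc_now: int) -> int:
    """Convert a TSC reading to nanoseconds using the snapshot data."""
    delta = tsc_now - snap.tsc_timestamp
    if snap.tsc_shift >= 0:
        delta <<= snap.tsc_shift
    else:
        delta >>= -snap.tsc_shift
    # The multiplier is a 0.32 fixed-point fraction, hence the shift by 32.
    return snap.system_time + ((delta * snap.tsc_to_system_mul) >> 32)


# Assumed example: a 2 GHz TSC means 0.5 ns per tick, i.e. mul = 2**31.
snap = PvclockSnapshot(tsc_timestamp=1_000, system_time=5_000_000,
                       tsc_to_system_mul=2**31, tsc_shift=0)
print(pvclock_to_ns(snap, tsc_now=3_000))  # 2000 ticks * 0.5 ns -> 5001000
```

The guest only needs a TSC read and a little arithmetic per timestamp; the hypervisor refreshes the snapshot when the conversion parameters change.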

Practical expectations

Environment	Usually good	Usually poor
KVM guest	`kvm-clock`, validated TSC	HPET or ACPI PM as primary source
Xen guest	Xen clocksource, validated TSC on suitable hardware	emulated legacy timers
bare metal modern x86	invariant TSC	PIT as primary source

Always check the actual machine. Hypervisor version, CPU features, migration settings, and kernel version matter.

The earlier version had more opinionated tables. They are useful as intuition, but they should not be read as universal rules for every current cloud or hypervisor release.

Older Xen-like environment

Source	Read/write shape	vDSO-friendly	Practical note
paravirtual clock	read-oriented	often no	good baseline for old guests
TSC	read and sometimes virtualized write context	yes when trusted	risky on older hardware or migration paths
ACPI PM	read-only	no	fallback, not preferred
HPET	read-only	no	fallback, not preferred
jiffies	kernel tick state	no	not a precision time source

The warning in the original note was mostly about older hardware and migration behavior. If a guest can move between hosts where TSC behavior is not consistent, blindly trusting TSC is unsafe.

Newer Xen-like environment

Source	Read/write shape	vDSO-friendly	Practical note
paravirtual clock	read-oriented	environment-dependent	still useful
validated TSC	read path	yes	can be excellent when invariant and synchronized
ACPI PM	read-only	no	fallback
HPET	read-only	no	fallback
jiffies	kernel tick state	no	not a precision time source

The shift is that newer CPU and hypervisor support can make TSC much more attractive.

KVM-like environment

Source	Read/write shape	vDSO-friendly	Practical note
`kvm-clock`	paravirtual clocksource	yes on common setups	strong default in many guests
validated TSC	fast counter	yes	strong when Linux trusts it
ACPI PM	read-only	no	fallback and calibration reference
HPET	read-only	no	fallback and calibration reference
jiffies	kernel tick state	no	not a primary precision source

In practice, the selected clocksource is a kernel decision. Do not force a source because a table says it is theoretically faster. First read what Linux selected and why.

Checking Linux state

The selected and available clocksources are exposed in sysfs:

cat /sys/devices/system/clocksource/clocksource0/current_clocksource
cat /sys/devices/system/clocksource/clocksource0/available_clocksource
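The same sysfs files can be read from a script, for example when collecting state across many guests. This is a minimal sketch: the function name and the returned dictionary keys are arbitrary choices, and the code returns `None` where the files are absent or unreadable.

```python
from pathlib import Path

SYSFS_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksources(base: Path = SYSFS_DIR):
    """Return the current and available clocksources, or None if unreadable."""
    try:
        current = (base / "current_clocksource").read_text().strip()
        available = (base / "available_clocksource").read_text().split()
    except OSError:
        return None
    return {
        "current": current,
        "available": available,
        "tsc_listed": "tsc" in available,
    }


state = read_clocksources()
print(state if state is not None else "clocksource sysfs files not found")
```

Recording `tsc_listed` separately makes the missing-tsc case easy to spot in collected data.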

CPU flags can explain what Linux believed about the processor:

grep -m1 '^flags' /proc/cpuinfo
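The flags line is long, so it helps to pull out only the TSC-related entries. The sketch below assumes an x86 guest and the flag names commonly shown in `/proc/cpuinfo` there; the sample text is hypothetical, and the presence of a flag is a hint about what the kernel saw, not proof that it trusts the TSC.

```python
TSC_FLAGS = ("tsc", "constant_tsc", "nonstop_tsc",
             "tsc_reliable", "tsc_known_freq", "hypervisor")


def parse_cpu_flags(cpuinfo_text: str) -> set:
    """Return the flag set from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            _, _, value = line.partition(":")
            return set(value.split())
    return set()


def tsc_flag_report(cpuinfo_text: str) -> dict:
    """Map each TSC-related flag name to whether it is present."""
    flags = parse_cpu_flags(cpuinfo_text)
    return {name: name in flags for name in TSC_FLAGS}


# Hypothetical cpuinfo excerpt for illustration.
sample = "processor\t: 0\nflags\t\t: fpu tsc msr constant_tsc hypervisor\n"
print(tsc_flag_report(sample))
```

On a live system, pass the contents of `/proc/cpuinfo` instead of the sample string.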

Boot logs often explain clocksource decisions:

dmesg -T | grep -iE 'clocksource|tsc|kvm-clock|xen|hpet|acpi_pm'

User-space time state is visible through:

timedatectl status

If the problem is application-visible time, also check which clock the application uses. CLOCK_REALTIME, CLOCK_MONOTONIC, timerfd, scheduler time, and file timestamps do not all answer the same question.
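To see the difference between two of those clocks, an application can sample both side by side. This sketch assumes Linux and Python 3.7 or later; the function and key names are arbitrary.

```python
import time


def sample_clocks() -> dict:
    """Read the realtime and monotonic clocks back to back."""
    return {
        "realtime_ns": time.clock_gettime_ns(time.CLOCK_REALTIME),
        "monotonic_ns": time.clock_gettime_ns(time.CLOCK_MONOTONIC),
    }


first = sample_clocks()
second = sample_clocks()
print("realtime delta :", second["realtime_ns"] - first["realtime_ns"])
print("monotonic delta:", second["monotonic_ns"] - first["monotonic_ns"])
```

The monotonic delta should never be negative. The realtime delta can be, if the wall clock is stepped between the two samples, which is why interval measurements should not use the realtime clock.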

RTC is a different question

The RTC is mostly about the wall-clock seed and about time across power transitions. In a VM, RTC persistence depends on the hypervisor and lifecycle operation. Reboot, shutdown, suspend, stop/start, and migration can behave differently.

Hypervisor case	Guest can read/write RTC	Reboot behavior	Stop/start behavior
KVM-style local VM	yes	usually persistent	depends on management layer
Xen-style guest	yes	depends on domain lifecycle	depends on toolstack
cloud VM	abstracted by platform	provider-specific	provider-specific

Treat RTC observations as platform evidence, not universal law.

The original note emphasized a useful operational point: if the guest time is wrong after boot, do not blame only the RTC. After boot, the ongoing drift is more often about the selected clocksource, host/guest synchronization, NTP or chrony state, suspend/resume, or migration behavior.

NTP and correction

Clocksource and NTP solve different layers.

The clocksource gives Linux a local timeline. NTP or chrony compares the system to external time and adjusts it. If the local counter is unstable, NTP may continuously correct symptoms without removing the underlying source of drift.
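A quick calculation shows why a small frequency error matters when nothing corrects it. The 50 ppm figure below is an assumed example, not a measurement from any particular guest, and the function name is arbitrary.

```python
def drift_seconds(frequency_error_ppm: float, interval_seconds: float) -> float:
    """Accumulated time error for a constant frequency error."""
    return frequency_error_ppm * 1e-6 * interval_seconds


per_day = drift_seconds(50, 86_400)
print(f"{per_day:.2f} s/day")  # 50 ppm over one day -> 4.32 s
```

NTP or chrony can hide an error of this size, but the correction itself then becomes part of what the system's timestamps reflect.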

Useful checks:

timedatectl status
chronyc tracking
chronyc sources -v

For incident timelines, this matters because a large correction can make log ordering confusing. When time is part of the incident, record both the synchronization state and the kernel clocksource state.
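Recording both kinds of state can be scripted. The sketch below is one possible shape, with hypothetical function and key names: it reads the current clocksource from sysfs and captures the output of `timedatectl status` and `chronyc tracking` when those commands are installed, leaving `None` where something is unavailable.

```python
import json
import shutil
import subprocess
import time
from pathlib import Path

CLOCKSOURCE_FILE = Path(
    "/sys/devices/system/clocksource/clocksource0/current_clocksource")
DEFAULT_COMMANDS = {
    "timedatectl": ["timedatectl", "status"],
    "chronyc_tracking": ["chronyc", "tracking"],
}


def run_if_present(argv, timeout=5.0):
    """Return a command's stdout, or None if it is missing or fails to run."""
    if shutil.which(argv[0]) is None:
        return None
    try:
        result = subprocess.run(argv, capture_output=True, text=True,
                                timeout=timeout)
    except (OSError, subprocess.SubprocessError):
        return None
    return result.stdout.strip()


def read_text_or_none(path):
    """Return stripped file contents, or None if the file is unreadable."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def time_snapshot(commands=None, clocksource_file=CLOCKSOURCE_FILE):
    """Collect clock readings, clocksource state, and sync tool output."""
    commands = DEFAULT_COMMANDS if commands is None else commands
    snapshot = {
        "realtime_ns": time.time_ns(),
        "monotonic_ns": time.monotonic_ns(),
        "current_clocksource": read_text_or_none(clocksource_file),
    }
    for name, argv in commands.items():
        snapshot[name] = run_if_present(argv)
    return snapshot


if __name__ == "__main__":
    print(json.dumps(time_snapshot(), indent=2))
```

Storing the monotonic reading next to the realtime reading makes it possible to reorder log entries later even if the wall clock was stepped.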

What can go wrong

Common failure shapes:

Symptom	Possible layer
wrong time immediately after boot	RTC seed, platform lifecycle, image config
steady drift after boot	clocksource stability, NTP state, host issue
time jumps after resume or migration	hypervisor lifecycle, TSC scaling, synchronization
high cost in time reads	emulated legacy clocksource
`tsc` missing from available clocksources	kernel validation, CPU flags, hypervisor presentation
logs appear out of order	clock adjustment, different clocks, delayed logging

The missing-tsc case in the kernel patch note belongs to the validation row. Linux had to decide whether a counter was trustworthy enough to expose.

Debugging checklist

When time appears wrong, ask:

Is the application using realtime, monotonic, or another clock?
What clocksource did Linux select?
What clocksources were available?
Did boot logs mark TSC unstable?
Is NTP correcting a boot-time error after the fact?
Did the VM migrate or resume?
Is the hypervisor exposing paravirtual clock features?

The kernel patch note is one concrete example: a KVM guest did not list tsc as an available clocksource, so the investigation had to move from sysfs output into kernel validation logic.
