vmm.dev

xv6 bootloader

Reading the xv6 bootloader

xv6 is useful because its boot path is small enough to read directly. The BIOS loads one sector, that sector switches the CPU from real mode to protected mode, a tiny C loader reads an ELF kernel image, and control finally moves to the kernel entry point.

The point of this note is to make that path explicit. Many kernel explanations begin after the machine is already executing kernel code. For low-level work, the more useful question is: how did the CPU get there?

[Figure: BIOS loads bootblock at 0x7c00; bootblock loads the ELF kernel and jumps to its entry in entry.S. Modes along the way: real mode -> protected mode -> kernel virtual address space.]

Files worth reading

The initial path is concentrated in a handful of files:

  • Makefile
  • bootasm.S
  • bootmain.c
  • sign.pl
  • elf.h
  • x86.h
  • kernel.ld
  • entry.S
  • memlayout.h

The flow is:

  1. BIOS loads the first disk sector at 0x7c00.
  2. bootasm.S starts in 16-bit real mode.
  3. The boot code disables interrupts and zeroes the segment registers.
  4. The loader enables A20.
  5. It installs a minimal GDT.
  6. It switches into protected mode.
  7. It calls bootmain.
  8. bootmain.c reads the kernel ELF image from disk.
  9. The program headers tell the loader where to place each segment.
  10. The loader jumps to the ELF entry point.

Start from the image target

A useful reading habit is to start from the thing being built. xv6 creates xv6.img by writing the boot block first and the kernel right after it.

xv6.img: bootblock kernel
	dd if=/dev/zero of=xv6.img count=10000
	dd if=bootblock of=xv6.img conv=notrunc
	dd if=kernel of=xv6.img seek=1 conv=notrunc

That tells us that bootblock occupies the first sector (sector 0) and kernel begins at sector 1: dd uses 512-byte blocks by default, so seek=1 skips exactly one sector. The boot sector does not need a filesystem. It knows that the kernel is immediately after it.
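
The layout reduces to simple arithmetic. A minimal sketch, assuming 512-byte sectors; the helper names are hypothetical and only illustrate how a byte offset in the kernel file maps to a disk sector:

```c
#include <stdint.h>

#define SECTSIZE 512u  /* assumed sector size; matches dd's default block size */

/* Hypothetical helper: byte offset of a given sector inside xv6.img. */
uint32_t sector_to_image_offset(uint32_t sector)
{
    return sector * SECTSIZE;
}

/* Hypothetical helper: which disk sector holds byte `offset` of the kernel
 * file. The +1 accounts for the boot block occupying sector 0. */
uint32_t kernel_offset_to_sector(uint32_t offset)
{
    return offset / SECTSIZE + 1;
}
```

With these definitions, kernel byte 0 lives in sector 1 and kernel byte 512 lives in sector 2.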

The boot block itself is linked for the address where BIOS will load it:

bootblock: bootasm.S bootmain.c
	$(CC) $(CFLAGS) -fno-pic -O -nostdinc -I. -c bootmain.c
	$(CC) $(CFLAGS) -fno-pic -nostdinc -I. -c bootasm.S
	$(LD) $(LDFLAGS) -N -e start -Ttext 0x7C00 -o bootblock.o bootasm.o bootmain.o
	$(OBJDUMP) -S bootblock.o > bootblock.asm
	$(OBJCOPY) -S -O binary -j .text bootblock.o bootblock
	./sign.pl bootblock

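The -e start option names the entry symbol. The -Ttext 0x7C00 option sets the address at which the text section is linked, so absolute references in the boot code are correct once the BIOS has loaded it at 0x7c00.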

Boot sector and signature

On PC-compatible BIOS systems, the first sector of the boot device is loaded at physical address 0x7c00. The CPU starts executing there in real mode.

[Figure: the boot sector at 0x0000:0x7c00 in low memory, below the 1 MiB edge.]

The last two bytes of the 512-byte sector must be 0x55 and 0xaa. xv6 appends that signature with sign.pl:

if($n > 510){
  print STDERR "boot block too large: $n bytes (max 510)\n";
  exit 1;
}

print SIG $buf;
print SIG "\0" x (510-$n);
print SIG "\x55\xAA";

This also shows the size pressure. The boot code must fit in 510 bytes plus the two-byte signature. xv6 therefore uses assembly only for the transition work and hands off to a tiny C loader as soon as it can.
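
The same padding and signing step can be sketched in C. This is an illustration of the sector layout, not a replacement for sign.pl; the function name and return convention are assumptions:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512
#define MAX_CODE    510   /* 512 bytes minus the two signature bytes */

/* Hypothetical helper: copy `n` bytes of boot code into a 512-byte sector,
 * zero-pad up to byte 510, and append the 0x55 0xAA signature.
 * Returns 0 on success, -1 if the code does not fit. */
int sign_boot_sector(const uint8_t *code, size_t n, uint8_t sector[SECTOR_SIZE])
{
    if (n > MAX_CODE)
        return -1;
    memset(sector, 0, SECTOR_SIZE);
    memcpy(sector, code, n);
    sector[510] = 0x55;
    sector[511] = 0xAA;
    return 0;
}
```

A 511-byte input is rejected, which matches the "boot block too large" check in the Perl script.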

Real mode address arithmetic

Real mode uses segment:offset addressing. The linear address is:

segment * 16 + offset
For example, segment 0x07c0 with offset 0x0200 gives (0x07c0 << 4) + 0x0200 = 0x07e00.

This is why boot examples constantly mention 0x7c00, segment registers, and low memory. Before protected mode and paging, the CPU is still living with these legacy rules.
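
The formula is small enough to check directly. A minimal sketch; the function name is an assumption made for illustration:

```c
#include <stdint.h>

/* Real-mode linear address: segment shifted left by 4 bits, plus offset.
 * The result can exceed 20 bits (up to 0x10FFEF), so it is held in 32 bits. */
uint32_t real_mode_linear(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}
```

Note that 0x0000:0x7c00 and 0x07c0:0x0000 name the same linear address, 0x7c00. Different segment:offset pairs can alias the same byte, which is one reason boot code sets its segment registers to known values early.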

The first assembly instructions

bootasm.S begins in 16-bit mode:

.code16
.globl start
start:
  cli
  xorw %ax,%ax
  movw %ax,%ds
  movw %ax,%es
  movw %ax,%ss

cli disables interrupts. The data and stack segment registers (ds, es, ss) are set to zero, so a real-mode offset is also the linear address and memory references are easy to reason about. This is early boot code, so predictable state matters more than abstraction.

Enabling A20

The original 8086 had 20 address lines, so addresses wrapped around at the 1 MiB boundary. Later PCs put a gate on address line 20 (A20): with the gate off, the old wraparound is preserved for compatibility, and software must turn it on to reach memory above 1 MiB.

[Figure: with a 20-bit address bus, addresses wrap at 1 MiB; with a 32-bit bus and the A20 line enabled, extended memory above 1 MiB is reachable.]
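
The wraparound can be modeled on a host. This sketch assumes that a disabled A20 gate simply forces address bit 20 to zero; the function name is hypothetical:

```c
#include <stdint.h>

/* Linear address as seen on the bus. With A20 disabled, address bit 20 is
 * forced to zero, so real-mode addresses at or above 1 MiB wrap to low memory. */
uint32_t bus_address(uint16_t segment, uint16_t offset, int a20_enabled)
{
    uint32_t linear = ((uint32_t)segment << 4) + offset;
    if (!a20_enabled)
        linear &= ~(1u << 20);  /* mask off address line 20 */
    return linear;
}
```

With the gate off, 0xFFFF:0x0010 lands on address 0 instead of 0x100000.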

xv6 enables A20 through the keyboard-controller path. That sounds strange today, but it is part of the legacy PC boot environment. The pattern is:

  1. wait for the controller input buffer to be clear,
  2. send a command,
  3. wait again,
  4. write the value that enables A20.
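
The four steps can be sketched in C against mocked port I/O so the sequence runs on a host. The port numbers (0x64, 0x60) and values (0xD1, 0xDF) are the conventional keyboard-controller ones and are not stated in the text above; every function and variable name here is an assumption made for illustration:

```c
#include <stdint.h>

#define KBC_STATUS_CMD 0x64   /* status (read) / command (write) port */
#define KBC_DATA       0x60   /* data port */
#define KBC_INPUT_FULL 0x02   /* status bit 1: controller input buffer busy */

/* Mock hardware: records writes and reports "busy" a fixed number of times. */
struct port_write { uint16_t port; uint8_t value; };
struct port_write write_log[8];
int write_count = 0;
int busy_polls = 2;

uint8_t mock_inb(uint16_t port)
{
    if (port == KBC_STATUS_CMD && busy_polls > 0) {
        busy_polls--;
        return KBC_INPUT_FULL;
    }
    return 0;
}

void mock_outb(uint16_t port, uint8_t value)
{
    if (write_count < 8) {
        write_log[write_count].port = port;
        write_log[write_count].value = value;
        write_count++;
    }
}

/* Steps 1 and 3: spin until the controller can accept a byte. */
void wait_input_clear(void)
{
    while (mock_inb(KBC_STATUS_CMD) & KBC_INPUT_FULL)
        ;
}

void enable_a20(void)
{
    wait_input_clear();
    mock_outb(KBC_STATUS_CMD, 0xD1);  /* step 2: "write output port" command */
    wait_input_clear();
    mock_outb(KBC_DATA, 0xDF);        /* step 4: output-port value with A20 set */
}
```

On real hardware the mocks would be the inb and outb instructions, and the busy loop has no timeout, which is acceptable only because boot code has nothing else to do.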

The important lesson is not the keyboard controller itself. The lesson is that boot code must sometimes perform old platform rituals before modern execution modes become available.

The GDT and protected mode

Protected mode uses segment descriptors instead of real-mode segment shifting. xv6 installs a minimal GDT with flat code and data segments.

code segment: base 0, limit 4 GiB
data segment: base 0, limit 4 GiB

With base 0 and a 4 GiB limit, a segment offset is the linear address. Once these descriptors are active, addressing matches the flat model that C code expects.
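
To make the descriptor layout concrete, here is a sketch that packs a flat 32-bit segment descriptor into its 8-byte form. The function name and the STA_* constants are written out here as assumptions; the bit positions follow the x86 descriptor format:

```c
#include <stdint.h>

#define STA_X 0x8  /* executable segment */
#define STA_W 0x2  /* writeable (data segments) */
#define STA_R 0x2  /* readable (code segments) */

/* Pack a present, ring-0, 4 KiB-granular, 32-bit segment descriptor.
 * `limit` is the highest valid byte offset (0xFFFFFFFF for 4 GiB). */
uint64_t gdt_descriptor(uint8_t type, uint32_t base, uint32_t limit)
{
    uint64_t d = 0;
    d |= (uint64_t)((limit >> 12) & 0xFFFF);               /* limit bits 15:0, in pages */
    d |= (uint64_t)(base & 0xFFFF) << 16;                  /* base bits 15:0 */
    d |= (uint64_t)((base >> 16) & 0xFF) << 32;            /* base bits 23:16 */
    d |= (uint64_t)(0x90 | (type & 0x0F)) << 40;           /* present, ring 0, code/data */
    d |= (uint64_t)(0xC0 | ((limit >> 28) & 0x0F)) << 48;  /* 4K granular, 32-bit, limit 19:16 */
    d |= (uint64_t)((base >> 24) & 0xFF) << 56;            /* base bits 31:24 */
    return d;
}
```

Under these assumptions the flat code segment, gdt_descriptor(STA_X | STA_R, 0, 0xFFFFFFFF), encodes as 0x00CF9A000000FFFF, and the flat data segment with STA_W encodes as 0x00CF92000000FFFF.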

The switch happens by setting the protected-mode enable bit in cr0, then doing a far jump so the code segment selector is reloaded:

movl %cr0, %eax
orl  $CR0_PE, %eax
movl %eax, %cr0
ljmp $(SEG_KCODE<<3), $start32

After that jump, execution continues in .code32.
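
The two constants in that sequence are easy to unpack. A small sketch, assuming SEG_KCODE is GDT index 1, SEG_KDATA is index 2, and CR0_PE is bit 0 of cr0; the function names are hypothetical:

```c
#include <stdint.h>

#define SEG_KCODE 1            /* assumed GDT index of the kernel code segment */
#define SEG_KDATA 2            /* assumed GDT index of the kernel data segment */
#define CR0_PE    0x00000001u  /* protection-enable bit of cr0 */

/* A selector is the GDT index shifted past the 3 low bits
 * (2 bits of requested privilege level, 1 table-indicator bit). */
uint16_t gdt_selector(uint16_t index)
{
    return (uint16_t)(index << 3);
}

/* Model of the orl instruction: set PE, leave every other bit unchanged. */
uint32_t cr0_with_protection(uint32_t cr0)
{
    return cr0 | CR0_PE;
}
```

So the far jump loads selector 0x08 into the code segment register, and the data segments later receive 0x10.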

Calling C code

Once in 32-bit mode, xv6 loads the data segment selectors and prepares a stack:

.code32
start32:
  movw $(SEG_KDATA<<3), %ax
  movw %ax,%ds
  movw %ax,%es
  movw %ax,%ss
  movl $start, %esp
  call bootmain

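The stack is small and temporary. Because %esp is set to start (0x7c00) and the stack grows downward, it occupies the memory just below the boot code. It only has to survive the loader's work.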

Disk reads

The C loader reads sectors using PIO IDE operations. This is not a general storage stack. It is just enough code to read the kernel from the known disk location.

static void
waitdisk(void)
{
  while((inb(0x1F7) & 0xC0) != 0x40)
    ;
}

The read path waits for the disk, selects a sector, issues a command, waits again, and reads words from the data port. The loader can be this direct because xv6's disk image layout is deliberately simple.
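
Two pieces of that path can be checked on a host: the status test used by waitdisk, and how a sector number is split across the controller's address registers. This is a sketch; the struct and function names are assumptions, and the port assignments in the comments follow the conventional primary IDE layout:

```c
#include <stdint.h>

#define IDE_BSY 0x80  /* status bit 7: drive busy */
#define IDE_RDY 0x40  /* status bit 6: drive ready */

/* Same test as waitdisk: ready only when BSY is clear and RDY is set. */
int disk_ready(uint8_t status)
{
    return (status & (IDE_BSY | IDE_RDY)) == IDE_RDY;
}

/* Bytes written to ports 0x1F3..0x1F6 for a 28-bit LBA sector number. */
struct lba_regs { uint8_t low, mid, high, device; };

struct lba_regs split_lba(uint32_t sector)
{
    struct lba_regs r;
    r.low    = (uint8_t)(sector);
    r.mid    = (uint8_t)(sector >> 8);
    r.high   = (uint8_t)(sector >> 16);
    /* Top 4 LBA bits plus 0xE0, which selects LBA mode and drive 0. */
    r.device = (uint8_t)(((sector >> 24) & 0x0F) | 0xE0);
    return r;
}
```

For sector 1, where the kernel image starts, the register values are 0x01, 0x00, 0x00, and 0xE0.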

ELF header check

bootmain first reads the beginning of the kernel image and checks the ELF magic.

elf = (struct elfhdr*)0x10000;
readseg((uchar*)elf, 4096, 0);
if(elf->magic != ELF_MAGIC)
  return;

Reading 4096 bytes is more than the ELF header itself. That is useful because the program-header table is nearby, and the loader needs it next.
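
The magic check compares the first four bytes of the file, 0x7F 'E' 'L' 'F', as one little-endian 32-bit word. A host-side sketch; the helper names are assumptions, and the bytes are assembled explicitly so the result does not depend on host byte order:

```c
#include <stdint.h>

#define ELF_MAGIC 0x464C457FU  /* "\x7FELF" read as a little-endian word */

/* Assemble four bytes into a little-endian 32-bit value. */
uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Nonzero if the buffer starts with the ELF magic number. */
int looks_like_elf(const uint8_t *image)
{
    return read_le32(image) == ELF_MAGIC;
}
```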

Program headers

The ELF program headers tell the loader what to put into memory. The important fields are:

Field    Meaning
off      offset in the file
paddr    physical load address
filesz   bytes present in the file
memsz    bytes required in memory
[Figure: ELF file (headers, program headers) -> paddr + size -> memory segments. bootmain loads each segment, then jumps to entry.]

The main loading loop is compact:

ph = (struct proghdr*)((uchar*)elf + elf->phoff);
eph = ph + elf->phnum;
for(; ph < eph; ph++){
  pa = (uchar*)ph->paddr;
  readseg(pa, ph->filesz, ph->off);
  if(ph->memsz > ph->filesz)
    stosb(pa + ph->filesz, 0, ph->memsz - ph->filesz);
}

filesz and memsz differ when a segment contains memory that should exist at runtime but does not occupy bytes in the file, such as BSS. xv6 zero-fills that extra memory with stosb.
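
The copy-then-zero behavior can be simulated with ordinary buffers. In this sketch a byte array stands in for physical memory and another for the kernel file; the struct is a simplified stand-in for xv6's proghdr, and the function name is hypothetical:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for struct proghdr; only the fields used here. */
struct seg {
    uint32_t off;     /* offset of the segment in the file */
    uint32_t paddr;   /* destination address (an index into `mem` here) */
    uint32_t filesz;  /* bytes to copy from the file */
    uint32_t memsz;   /* bytes the segment occupies in memory */
};

/* Copy filesz bytes from the file image, then zero-fill up to memsz. */
void load_segment(uint8_t *mem, const uint8_t *file, const struct seg *s)
{
    memcpy(mem + s->paddr, file + s->off, s->filesz);
    if (s->memsz > s->filesz)
        memset(mem + s->paddr + s->filesz, 0, s->memsz - s->filesz);
}
```

With filesz 3 and memsz 6, three bytes come from the file and the next three are zeroed, which is how a BSS region ends up cleared without occupying space on disk.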

Entry point problem

The loader finally calls the ELF entry point:

entry = (void(*)(void))(elf->entry);
entry();

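At first this looks suspicious. Kernel code is linked at high virtual addresses, but paging is not enabled at all when the boot loader jumps. xv6 handles this through its linker script and address macros: entry.S defines the ELF entry symbol _start as the physical address of entry, so elf->entry holds a low address that works without paging. memlayout.h defines the relationship: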

#define EXTMEM   0x100000
#define KERNBASE 0x80000000
#define KERNLINK (KERNBASE+EXTMEM)
#define V2P(a) (((uint) (a)) - KERNBASE)

The entry path is arranged so the first executed address is usable before the kernel turns on the full virtual mapping.
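
The address relationship can be checked with fixed-width integers. This sketch rewrites V2P as a function so it behaves the same on a 64-bit host; the lowercase names and the inverse p2v are introduced here for illustration:

```c
#include <stdint.h>

#define EXTMEM   0x100000u
#define KERNBASE 0x80000000u
#define KERNLINK (KERNBASE + EXTMEM)

/* Virtual to physical: subtract the kernel base. */
uint32_t v2p(uint32_t va) { return va - KERNBASE; }

/* Physical to virtual: add the kernel base (the inverse mapping). */
uint32_t p2v(uint32_t pa) { return pa + KERNBASE; }
```

Here v2p(KERNLINK) is 0x100000, the start of extended memory, which is where the loader places the kernel's first segment.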

Kernel linker script

The kernel linker script places the kernel at its virtual address while also describing the physical load address:

. = 0x80100000;
.text : AT(0x100000) {
  *(.text .stub .text.* .gnu.linkonce.t.*)
}

The AT address is the load address. The section's virtual address is where the kernel expects to run after paging is active. This split between load memory address and virtual memory address is one of the key bootloader ideas.

Why this matters

The xv6 loader is tiny, but it crosses the same conceptual boundaries as larger systems:

  • firmware hands control to a boot sector,
  • real mode gives way to protected mode,
  • the loader creates the minimal descriptor state needed for C,
  • the disk is read through hardware ports,
  • ELF metadata describes the kernel image,
  • physical loading is reconciled with virtual execution.

That is why this note is a useful companion to rust dos and krabs. DOS shows the same machine in an even smaller environment; KRaBs expands the bootloader idea into a Rust project.

See also

Related: xv6, krabs, rust dos.