vmm.dev

rust dos


Creating a DOS executable in Rust

I originally wrote this note while using FreeDOS as a small x86 real-mode programming lab, with the goal of making Rust emit a DOS COM program. The path is deliberately concrete: start from raw bytes with debug, move to a freestanding Rust binary, shape the output with a linker script, and finally wrap DOS and keyboard-controller operations in small Rust modules.

The companion notes are xv6 bootloader, where the same real-mode ground appears in a boot sector, and krabs, where the idea grows into a Rust bootloader.

Why Rust

The original motivation was simple: Rust has a strong toolchain, strong types, good performance, and a build system that is much easier to live with than hand-managed C object files. If Rust can target a freestanding environment, it should be possible to use it below the ordinary operating-system layer too.

That does not mean Rust removes the unsafe parts. Real-mode code, BIOS calls, DOS interrupts, port I/O, and binary layout all require explicit unsafe work. The value is that the unsafe surface can be made small and visible. Once the startup and I/O boundary is isolated, ordinary Rust modules can describe the rest of the program more clearly.

Modern note: stable Rust now has asm! support on x86 and x86_64, so the old #![feature(asm)] framing should be treated as historical. Rebuilding core with build-std is still part of the unstable Cargo workflow, so a fully custom no-OS target can still require nightly pieces depending on the exact build plan.

Why DOS

DOS is not used here because it is a modern application platform. It is useful because it is small and direct.

DOS runs in x86 real mode. There is no process isolation in the modern sense, no virtual memory, and very little standing between a program and the machine. That makes it dangerous for production and useful for learning. A bug can hang the system, but a correct eight-byte program can also be understood completely.

The important properties are:

  • COM programs are flat binaries.
  • A COM program starts at offset 0x100.
  • DOS services are called with software interrupts.
  • Hardware can be reached through I/O ports.
  • Addresses are visible as segment and offset pairs.

That is why this note sits between ordinary Rust and lower-level notes such as xv6 bootloader and krabs.

Preparing a FreeDOS image

The old setup used QEMU and a FreeDOS disk image on macOS.

brew install qemu
qemu-img create -f raw freedos.img 100M
wget http://www.freedos.org/download/download/FD12CD.iso
qemu-system-i386 freedos.img -cdrom FD12CD.iso -boot d
qemu-system-i386 freedos.img -boot c

After the installer finishes, booting the disk gives a DOS prompt.

Figure: the C:\> prompt running debug, a real-mode playground on a QEMU disk image.

This is the whole laboratory. It is small enough to inspect, easy to reset, and close enough to the hardware to make real-mode behavior visible.

First program with debug

DOS includes debug, an old but very useful tool for learning. It can assemble instructions, inspect registers, dump memory, disassemble bytes, set a file name, and write memory to disk.

Command   Meaning      Use
A         assemble     enter assembly instructions
D         dump         display memory in hex and ASCII
U         unassemble   disassemble memory
R         register     read or modify registers
N         name         set the output file name
W         write        write memory to a file
Q         quit         leave debug

Running r shows the register state. The output is a reminder that the program is running in a segmented 16-bit world.

AX=0000 BX=0000 CX=0000 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000
DS=083F ES=083F SS=083F CS=083F IP=0100 NV UP EI PL NZ NA PO NC
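CS and IP only name the next instruction together. In real mode the CPU forms the physical address as segment * 16 + offset. A minimal host-side sketch of that arithmetic follows; the function name linear_address is made up for illustration, and it assumes A20 is enabled, so it does not model the 8086 wraparound at 1 MiB.

```rust
/// Linear address formed from a real-mode segment:offset pair.
/// Sketch only: assumes A20 is enabled, so FFFF:0010 gives 0x100000
/// instead of wrapping around to 0 as on an 8086.
fn linear_address(segment: u16, offset: u16) -> u32 {
    // The segment is shifted left by 4 bits, i.e. multiplied by 16.
    (u32::from(segment) << 4) + u32::from(offset)
}

fn main() {
    // CS and IP from the register dump above.
    let next = linear_address(0x083F, 0x0100);
    // A different pair that names the same byte.
    let alias = linear_address(0x0840, 0x00F0);
    println!("083F:0100 -> {:#X}, 0840:00F0 -> {:#X}", next, alias);
}
```

Because the offset can absorb any multiple of 16, several segment:offset pairs alias one byte: 083F:0100 and 0840:00F0 both land at 0x84F0.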

COM files and offset 0x100

DOS has several executable formats. This note uses the COM format because it is the simplest. A COM file is loaded into one segment, and execution begins at offset 0x100.

The first 256 bytes belong to the Program Segment Prefix. DOS prepares that area before the program begins. It contains command-line and process information. For this experiment, the practical rule is simple: assemble the program at 0x100.

-a 100
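The 256 bytes that -a 100 skips over are the PSP. The following is a rough host-side model, assuming the documented layout: an INT 20h instruction at offset 0, the command-tail length byte at 0x80, and the tail text starting at 0x81 and ending with a carriage return. It shows how a program could read its arguments. The constants and the command_tail helper are hypothetical names for this sketch, not DOS APIs.

```rust
/// Size of the Program Segment Prefix that precedes a COM program.
const PSP_SIZE: usize = 0x100;
/// Offset of the command-tail length byte.
const TAIL_LEN: usize = 0x80;
/// Offset where the command-tail text starts (it ends with a carriage return).
const TAIL_TEXT: usize = 0x81;

/// Returns the command tail from a PSP image, excluding the trailing CR.
/// The length is clamped to the 127 bytes that fit before offset 0x100.
fn command_tail(psp: &[u8; PSP_SIZE]) -> &[u8] {
    let len = usize::from(psp[TAIL_LEN]).min(PSP_SIZE - TAIL_TEXT);
    &psp[TAIL_TEXT..TAIL_TEXT + len]
}

fn main() {
    // Fake PSP for the command line `hello a b`.
    let mut psp = [0u8; PSP_SIZE];
    psp[0..2].copy_from_slice(&[0xCD, 0x20]); // INT 20h at offset 0
    let tail = b" a b";
    psp[TAIL_LEN] = tail.len() as u8;
    psp[TAIL_TEXT..TAIL_TEXT + tail.len()].copy_from_slice(tail);
    psp[TAIL_TEXT + tail.len()] = b'\r';
    println!("{:?}", String::from_utf8_lossy(command_tail(&psp)));
}
```

On DOS itself the PSP occupies the first 256 bytes of the COM program's own segment, so the same bytes are reachable at offset 0x80 of that segment.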

Then enter a tiny program:

mov ah, 2
mov dl, 41
int 21
int 20
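
Figure: the same input in debug, where each prompt shows the segment:offset address of the next instruction.

-a 100
083F:0100 mov ah,2
083F:0102 mov dl,41
083F:0104 int 21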

The instructions mean:

  1. Put 2 in AH. This selects DOS interrupt 21h, function 2: output one character.
  2. Put 0x41 in DL (debug reads all numbers as hexadecimal, so 41 means 0x41). That is ASCII A.
  3. Execute int 21h. DOS prints the character.
  4. Execute int 20h. DOS terminates the program.

A dump of memory shows the machine code.

Figure: memory dump at 083F:0100 showing B4 02 B2 41 CD 21 CD 20. The bytes are both code and data, and CD is the x86 INT opcode.

The bytes are:

b4 02 b2 41 cd 21 cd 20

This is the important conceptual step. A program is bytes. Assembly text is one way to describe those bytes. A COM file is those bytes written to disk. The CPU gives those bytes meaning by decoding them as instructions.
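To make "a program is bytes" concrete, here is a tiny host-side assembler for the two instruction forms used above. It relies on the standard x86 encodings: mov r8, imm8 is opcode B0 plus the register number (AH is 4, DL is 2) followed by the immediate, and int imm8 is CD followed by the vector. The helper names are illustrative only.

```rust
/// x86 numbers the 8-bit registers 0..=7 in this order.
#[allow(dead_code)]
#[derive(Clone, Copy)]
enum Reg8 { Al = 0, Cl = 1, Dl = 2, Bl = 3, Ah = 4, Ch = 5, Dh = 6, Bh = 7 }

/// `mov r8, imm8`: opcode B0 + register number, then the immediate byte.
fn mov_r8_imm8(reg: Reg8, imm: u8) -> [u8; 2] {
    [0xB0 + reg as u8, imm]
}

/// `int imm8`: opcode CD, then the interrupt vector.
fn int_imm8(vector: u8) -> [u8; 2] {
    [0xCD, vector]
}

/// Assembles the same four instructions typed into debug.
fn hello_com() -> Vec<u8> {
    let mut code = Vec::new();
    code.extend_from_slice(&mov_r8_imm8(Reg8::Ah, 0x02)); // DOS function 2: print DL
    code.extend_from_slice(&mov_r8_imm8(Reg8::Dl, 0x41)); // 'A'
    code.extend_from_slice(&int_imm8(0x21)); // call DOS
    code.extend_from_slice(&int_imm8(0x20)); // terminate
    code
}

fn main() {
    let bytes: Vec<String> = hello_com().iter().map(|b| format!("{:02x}", b)).collect();
    println!("{}", bytes.join(" "));
}
```

Writing the returned bytes to a file named hello.com would produce the same eight-byte program that debug saves in the next step.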

Saving and running the COM file

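To save the program, set a file name, set CX to the number of bytes, write from 0x100, and quit. The w command writes the number of bytes held in BX:CX, so BX must also be 0, which it already is here.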

-n hello.com
-r cx
CX 0000
:8
-w 100
-q

Then run it from the DOS prompt:

C:\> hello
A

This is a complete DOS application. It has no runtime, no object format at execution time, no loader complexity beyond DOS loading a COM file at the expected offset. The program is small enough to account for every byte.

Moving toward Rust

The next question is whether Rust can produce the same kind of thing.

That requires removing assumptions that normal Rust binaries make:

  • no host operating system ABI,
  • no standard library,
  • no ordinary main entry path,
  • no default runtime startup,
  • a fixed load address,
  • and a flat binary output.

Figure: Rust no_std -> linker at 0x100 -> COM -> DOS int 21h, built from a custom target, a startup section, and objcopy to .COM.

The path is:

Rust source -> ELF object -> linked image at 0x100 -> flat COM binary

The ELF image is useful during the build because it gives the linker enough structure. The final DOS program is not meant to stay ELF; it is converted into a raw binary.

Installing the Rust pieces

The original article used nightly Rust and tools common in older bare-metal Rust workflows.

curl https://sh.rustup.rs -sSf | sh
rustup toolchain install nightly
rustup component add rust-src --toolchain nightly
cargo install cargo-xbuild
cargo install cargo-binutils
rustup component add llvm-tools-preview

Some of these names have changed over time: cargo-xbuild has been superseded by Cargo's own -Z build-std option, and the llvm-tools-preview component is now called llvm-tools. The conceptual requirements remain the same: build core, avoid std, provide a target specification, link with the layout you need, and convert the output.

Target specification

A target specification tells Rust and LLVM what kind of machine code to produce. A normal target such as i586-unknown-linux-gnu assumes a Linux ABI. DOS real mode does not match that, so the old experiment used a custom JSON target.

A simplified target spec looks like this:

{
  "llvm-target": "i586-unknown-none",
  "arch": "x86",
  "target-endian": "little",
  "target-pointer-width": "32",
  "target-c-int-width": "32",
  "os": "none",
  "executables": true,
  "linker-flavor": "ld.lld",
  "panic-strategy": "abort",
  "disable-redzone": true
}

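The key point is not the exact JSON. The key point is that the program does not target Linux, macOS, or Windows. It targets a freestanding x86 environment and then constrains the linked image to something DOS can run. Two details are simplified away here. First, rustc requires a data-layout field that matches the LLVM target. Second, a plain i586 triple makes LLVM emit 32-bit code. The byte-register moves and INT instructions in the first demo happen to encode identically in a 16-bit segment, but most other instructions do not. For that reason, real-mode Rust work generally relies on LLVM's code16 environment (for example i386-unknown-none-code16), which emits the operand-size and address-size prefixes real mode needs.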

In .cargo/config.toml, point Cargo at the target and pass the linker script:

[build]
target = "i586-rust_dos.json"

[target.i586-rust_dos]
rustflags = ["-C", "link-arg=-Tlinker.ld"]

no_std

By default, Rust links std, and std depends on an operating system. A DOS COM file does not provide the host interfaces std expects, so the crate starts with no_std.

#![no_std]

With no_std, Rust still has core: basic language items, primitive traits, slices, options, results, and other pieces that do not require an OS.

A panic handler is required because there is no standard runtime to provide one:

use core::panic::PanicInfo;

#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    loop {}
}

For a real DOS program you may want the panic handler to print a message and terminate through DOS, but an infinite loop is enough for the first minimal binary.

no_main and startup

A normal Rust binary expects the compiler and runtime to arrange a call into main. That is not what a COM file needs. The program needs its own entry symbol, placed at the beginning of the linked image.

#![no_main]

#[link_section = ".startup"]
#[no_mangle]
pub extern "C" fn _start() -> ! {
    main();
    exit();
}

The first version used the inline assembly available at that time. Modern Rust uses different syntax, but the idea is the same: put DOS function numbers and arguments into the expected registers, then execute int 21h.

pub fn exit() -> ! {
    unsafe {
        core::arch::asm!(
            "mov ah, 0x4c",
            "mov al, 0",
            "int 0x21",
            options(noreturn),
        );
    }
}
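The two mov instructions in exit fill the two halves of AX: AH takes the DOS function number and AL takes its byte argument. A small sketch of that packing, using a hypothetical dos_ax helper, shows why the pair is equivalent to a single mov ax, 0x4c00. It also shows how a variant of exit that accepts a return code could pass AX as one asm! operand. That variant stays in comments because it only makes sense on a DOS target.

```rust
/// Packs a DOS function number (AH) and a byte argument (AL) into AX.
fn dos_ax(function: u8, al: u8) -> u16 {
    (u16::from(function) << 8) | u16::from(al)
}

// Hypothetical DOS-target variant of `exit` that takes a return code.
// Commented out because it only makes sense when running under DOS:
//
// pub fn exit_with(code: u8) -> ! {
//     unsafe {
//         core::arch::asm!("int 0x21", in("ax") dos_ax(0x4C, code), options(noreturn));
//     }
// }

fn main() {
    // `mov ah, 0x4c` followed by `mov al, 0` leaves the same AX as `mov ax, 0x4c00`.
    println!("{:#06x}", dos_ax(0x4C, 0));
    // Exit code 1 would change only the low byte.
    println!("{:#06x}", dos_ax(0x4C, 1));
}
```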

For the first demonstration, output one character:

#[link_section = ".startup"]
#[no_mangle]
pub extern "C" fn _start() -> ! {
    unsafe {
        core::arch::asm!(
            "mov ah, 2",
            "mov dl, 0x41",
            "int 0x21",
            "int 0x20",
            options(noreturn),
        );
    }
}

Again, treat this as a conceptual sketch. The exact inline assembly constraints and register classes depend on the Rust version and target.

Linker script

The linker script makes the binary layout match DOS expectations. The essential rule is that code begins at 0x100.

ENTRY(_start)
SECTIONS
{
  . = 0x100;
  .startup : { *(.startup) }
  .text : { *(.text*) }
  .rodata : { *(.rodata*) }
  .data : { *(.data*) }
  .bss : { *(.bss*) }
}

That puts the custom startup section first, then normal code and data. This mirrors what debug did manually: place executable bytes at offset 0x100.
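The linker script can be modeled as a location counter that starts at 0x100 and advances past each output section in script order, rounding up to each section's alignment. The sketch below is a deliberate simplification: the Section type and sizes are invented for illustration, and real linkers handle far more. It also shows why .startup must come first. A COM file has no header, so DOS starts executing the first byte at 0x100, whatever ENTRY(_start) records in the ELF file.

```rust
/// An output section as the linker sees it (illustrative sizes only).
struct Section {
    name: &'static str,
    size: u32,
    align: u32, // must be a power of two
}

/// Rounds `addr` up to the next multiple of `align`.
fn align_up(addr: u32, align: u32) -> u32 {
    (addr + align - 1) & !(align - 1)
}

/// Models `. = origin;` followed by output sections in script order.
fn layout(sections: &[Section], origin: u32) -> Vec<(&'static str, u32)> {
    let mut dot = origin; // the linker's location counter
    let mut placed = Vec::new();
    for s in sections {
        dot = align_up(dot, s.align);
        placed.push((s.name, dot));
        dot += s.size;
    }
    placed
}

fn main() {
    let sections = [
        Section { name: ".startup", size: 5, align: 1 },
        Section { name: ".text", size: 8, align: 16 },
        Section { name: ".rodata", size: 3, align: 4 },
    ];
    for (name, addr) in layout(&sections, 0x100) {
        println!("{:<8} {:#06x}", name, addr);
    }
}
```

The gap between .startup and .text in this example is the kind of padding that later turns into zero bytes in the flat binary.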

Building and extracting a COM binary

Build the ELF image first:

cargo xbuild --release

Inspecting the result with objdump is useful:

cargo objdump --release -- -d

Then convert the linked ELF into a raw binary:

cargo objcopy --release -- -O binary target/i586-rust_dos/release/rust_dos.com

The result should be small. That smallness is useful. If the output is unexpectedly large, inspect the sections and symbols. In this kind of program, accidental formatting code or panic machinery can pull in more than expected.
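objcopy -O binary keeps only the loadable bytes and places each section at its address relative to the start of the image, filling gaps with zeros. The following is a simplified model of that step for a COM file, not objcopy's implementation. The LoadSection type is hypothetical. Sections without file contents, such as .bss, are assumed to be left out by the caller. The size limit assumes the commonly cited COM ceiling of one 64 KiB segment minus the 256-byte PSP, and because the stack shares that segment, the practical limit is lower.

```rust
/// A loadable section taken from the linked image (hypothetical type).
struct LoadSection<'a> {
    vaddr: u32,
    bytes: &'a [u8],
}

/// COM programs are loaded at offset 0x100 of their segment.
const COM_ORIGIN: u32 = 0x100;
/// One 64 KiB segment minus the 256-byte PSP.
const COM_MAX: usize = 0x1_0000 - 0x100;

/// Lays sections out relative to 0x100 and zero-fills the gaps, like a raw binary dump.
fn flatten(sections: &[LoadSection]) -> Result<Vec<u8>, String> {
    let mut image = Vec::new();
    for s in sections {
        let start = s
            .vaddr
            .checked_sub(COM_ORIGIN)
            .ok_or_else(|| format!("section at {:#x} is below 0x100", s.vaddr))? as usize;
        let end = start + s.bytes.len();
        if image.len() < end {
            image.resize(end, 0); // gaps between sections become zero bytes
        }
        image[start..end].copy_from_slice(s.bytes);
    }
    if image.len() > COM_MAX {
        return Err(format!("{} bytes will not fit in a COM file", image.len()));
    }
    Ok(image)
}

fn main() {
    let startup: [u8; 4] = [0xB4, 0x02, 0xB2, 0x41];
    let text: [u8; 4] = [0xCD, 0x21, 0xCD, 0x20];
    let image = flatten(&[
        LoadSection { vaddr: 0x100, bytes: &startup },
        LoadSection { vaddr: 0x104, bytes: &text },
    ]);
    println!("{:02x?}", image);
}
```

A size check like this is a cheap guard against the accidental growth described above.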

Copying the program into the DOS image

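On macOS the old flow mounted the DOS image, copied the COM file, then detached it. The volume name and /dev/disk number below come from that machine; use the ones hdiutil attach reports.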

hdiutil attach freedos.img
cp target/i586-rust_dos/release/rust_dos.com /Volumes/FREEDOS2016/
hdiutil detach /dev/disk2

Then boot the image:

qemu-system-i386 freedos.img -boot c

At the prompt:

C:\> rust_dos
A

At this point the Rust-generated COM program has matched the handwritten debug program.

Splitting startup from main

The next cleanup is to make _start small and move actual behavior into main.

#[link_section = ".startup"]
#[no_mangle]
pub extern "C" fn _start() -> ! {
    main();
    dos::exit(0)
}

fn main() {
    print!("hello from rust\r\n");
}

The startup function is still special. It is the ABI boundary. The rest of the code can look more like ordinary Rust.

DOS console module

A console module wraps interrupt 21h function 2.

pub mod console {
    pub fn putc(ch: u8) {
        unsafe {
            core::arch::asm!(
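                "mov ah, 2",
                "int 0x21",
                // DL carries the character. AH selects function 2 and DOS
                // may change AL, so AX is declared clobbered.
                in("dl") ch,
                out("ax") _,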
            );
        }
    }

    pub fn puts(text: &str) {
        for byte in text.bytes() {
            putc(byte);
        }
    }
}

A macro makes this tolerable:

#[macro_export]
macro_rules! print {
    ($text:expr) => {{
        $crate::dos::console::puts($text);
    }};
}

Now the higher-level program does not need to know about AH, DL, or int 21h.
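The print! macro above only accepts a string. One way to support format arguments is to implement core::fmt::Write for a writer that forwards bytes to a callback, inserting a \r before any \n that does not already follow one, since the DOS console expects CR LF. The sketch below takes that approach. The DosWriter type is hypothetical. On DOS the callback would be console::putc, and here a Vec stands in so the logic runs on a host.

```rust
use core::fmt::{self, Write};

/// Hypothetical console writer: hands each byte to `emit` (on DOS this
/// would be `console::putc`) and inserts `\r` before a `\n` that lacks one.
struct DosWriter<F: FnMut(u8)> {
    emit: F,
    last: u8, // previous byte written, used to avoid doubling "\r\n"
}

impl<F: FnMut(u8)> DosWriter<F> {
    fn new(emit: F) -> Self {
        DosWriter { emit, last: 0 }
    }
}

impl<F: FnMut(u8)> Write for DosWriter<F> {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        for byte in s.bytes() {
            if byte == b'\n' && self.last != b'\r' {
                (self.emit)(b'\r');
            }
            (self.emit)(byte);
            self.last = byte;
        }
        Ok(())
    }
}

fn main() {
    let mut out = Vec::new();
    {
        let mut w = DosWriter::new(|b| out.push(b));
        writeln!(w, "answer = {}", 42).unwrap();
    }
    println!("{:?}", String::from_utf8_lossy(&out));
}
```

The trade-off is size: core::fmt brings in formatting machinery, which is exactly the kind of growth worth watching in a COM file.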

Port I/O

To move beyond DOS services and touch hardware, the program needs port I/O. On x86 this uses in and out instructions.

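pub unsafe fn inb(port: u16) -> u8 {
    let value: u8;
    // `in al, dx` reads one byte from the I/O port whose number is in DX.
    core::arch::asm!("in al, dx", in("dx") port, out("al") value,
                     options(nomem, nostack, preserves_flags));
    value
}

pub unsafe fn outb(port: u16, value: u8) {
    // `out dx, al` writes AL to the I/O port whose number is in DX.
    core::arch::asm!("out dx, al", in("dx") port, in("al") value,
                     options(nomem, nostack, preserves_flags));
}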

This is exactly the kind of unsafe boundary Rust should make visible. The operation is inherently unsafe because the compiler cannot know whether a port exists, whether the device is ready, or whether the write will hang the machine.

Keyboard controller

The original article then used the keyboard controller as a small hardware experiment. The classic PC keyboard controller exposes status and data ports:

Port   Use
0x64   status (read) and command (write)
0x60   data

A simplified polling read looks like this:

const KBC_DATA: u16 = 0x60;
const KBC_STATUS: u16 = 0x64;

pub fn read_scan_code() -> u8 {
    loop {
        let status = unsafe { inb(KBC_STATUS) };
        // Bit 0 (output buffer full) means a byte is waiting at port 0x60.
        if status & 1 != 0 {
            return unsafe { inb(KBC_DATA) };
        }
    }
}
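Because the loop only reads two ports, the polling logic can be exercised off the hardware by passing the port read in as a closure. This is a test-oriented variant, not the original code: read_scan_code_with and the fake controller below are illustrative. As in the loop above, it checks status bit 0, which the controller sets when a byte is waiting at port 0x60.

```rust
const KBC_DATA: u16 = 0x60;
const KBC_STATUS: u16 = 0x64;
/// Status bit 0: the output buffer holds a byte for port 0x60.
const OUTPUT_FULL: u8 = 0x01;

/// The same polling loop as `read_scan_code`, but every port read goes
/// through `inb`, so a fake controller can stand in for the hardware.
fn read_scan_code_with(mut inb: impl FnMut(u16) -> u8) -> u8 {
    loop {
        if inb(KBC_STATUS) & OUTPUT_FULL != 0 {
            return inb(KBC_DATA);
        }
    }
}

fn main() {
    let mut status_reads = 0;
    // Fake controller: busy for two polls, then reports scan code 0x1E.
    let code = read_scan_code_with(|port| match port {
        KBC_STATUS => {
            status_reads += 1;
            if status_reads < 3 { 0 } else { OUTPUT_FULL }
        }
        KBC_DATA => 0x1E,
        _ => 0xFF,
    });
    println!("scan code {:#04x} after {} status reads", code, status_reads);
}
```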

The scan code is not yet a character. The program needs a map. The original implementation used small maps for plain, shift, control, and alt states, with only part of the US keyboard filled in.

static MAP_PLAIN: [u8; 128] = [
    0, 27, b'1', b'2', b'3', b'4', b'5', b'6',
    b'7', b'8', b'9', b'0', b'-', b'=', 8, b'\t',
    b'q', b'w', b'e', b'r', b't', b'y', b'u', b'i',
    b'o', b'p', b'[', b']', b'\n', 0, b'a', b's',
    // ...
];

A real keyboard driver must handle key release codes, modifier state, extended scan codes, layout differences, and interrupts. The demo stayed smaller: poll, translate, print.
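Here is a minimal sketch of the translate step, assuming scan code set 1 as in MAP_PLAIN. Bit 7 of a code marks a key release, and 0x2A and 0x36 are the left and right Shift keys. The map is only the first 32 entries shown above, copied into a hypothetical MAP_PLAIN_PARTIAL. The Keyboard type is also hypothetical. Shift affects only letters here, and holding both Shift keys at once is not tracked.

```rust
/// First 32 entries of the plain (unshifted) set-1 map shown above.
const MAP_PLAIN_PARTIAL: [u8; 32] = [
    0, 27, b'1', b'2', b'3', b'4', b'5', b'6',
    b'7', b'8', b'9', b'0', b'-', b'=', 8, b'\t',
    b'q', b'w', b'e', b'r', b't', b'y', b'u', b'i',
    b'o', b'p', b'[', b']', b'\n', 0, b'a', b's',
];

const LEFT_SHIFT: u8 = 0x2A;
const RIGHT_SHIFT: u8 = 0x36;
const RELEASE: u8 = 0x80; // bit 7 set marks a key release in set 1

#[derive(Default)]
struct Keyboard {
    shift: bool,
}

impl Keyboard {
    /// Feeds one scan code and returns a character for presses that map to one.
    fn feed(&mut self, code: u8) -> Option<u8> {
        let key = code & !RELEASE;
        let released = code & RELEASE != 0;
        if key == LEFT_SHIFT || key == RIGHT_SHIFT {
            self.shift = !released; // modifier state, no character
            return None;
        }
        if released {
            return None; // ignore ordinary key releases
        }
        let ch = *MAP_PLAIN_PARTIAL.get(usize::from(key))?;
        if ch == 0 {
            return None; // unmapped key such as Ctrl
        }
        Some(if self.shift { ch.to_ascii_uppercase() } else { ch })
    }
}

fn main() {
    let mut kb = Keyboard::default();
    // Shift down, 'q' down, 'q' up, Shift up, 'a' down.
    let codes = [LEFT_SHIFT, 0x10, 0x90, LEFT_SHIFT | RELEASE, 0x1E];
    let typed: String = codes.iter().filter_map(|&c| kb.feed(c)).map(char::from).collect();
    println!("{}", typed);
}
```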

Why Bochs helped

QEMU was convenient for installing and booting FreeDOS. For the keyboard-controller details, the old experiment found Bochs more useful because its legacy-device emulation exposed the behavior being tested more clearly.

brew install bochs
cargo xbuild --release
cargo objcopy -- -I elf32-i386 -O binary \
  target/i586-rust_dos/release/rust_dos \
  target/i586-rust_dos/release/rust_dos.com

Then copy the COM file into the DOS disk image and boot it under Bochs.

What this teaches

The route from debug to Rust is the important part:

  1. A DOS COM program is just bytes loaded at 0x100.
  2. DOS interrupts make the ABI visible through registers.
  3. no_std removes OS assumptions.
  4. no_main lets the program own startup.
  5. A linker script gives control over layout.
  6. objcopy turns a linked image into a flat binary.
  7. Small Rust modules can wrap unsafe interrupt and port-I/O boundaries.

That is not a production recommendation. It is a controlled way to understand binary layout, real-mode execution, interrupts, and hardware I/O while still using Rust for structure.

See also

Related: xv6, xv6 bootloader, krabs.