Creating a DOS executable in Rust
I originally wrote this note to turn FreeDOS into a small x86 real-mode programming lab and then make Rust emit a DOS COM program. The path is deliberately concrete: start from raw bytes with debug, move to a freestanding Rust binary, shape the output with a linker script, and finally wrap DOS and keyboard-controller operations in small Rust modules.
The companion notes are xv6 bootloader, where the same real-mode ground appears in a boot sector, and krabs, where the idea grows into a Rust bootloader.
Why Rust
The original motivation was simple: Rust has a strong toolchain, strong types, good performance, and a build system that is much easier to live with than hand-managed C object files. If Rust can target a freestanding environment, it should be possible to use it below the ordinary operating-system layer too.
That does not mean Rust removes the unsafe parts. Real-mode code, BIOS calls, DOS interrupts, port I/O, and binary layout all require explicit unsafe work. The value is that the unsafe surface can be made small and visible. Once the startup and I/O boundary is isolated, ordinary Rust modules can describe the rest of the program more clearly.
Modern note: stable Rust now has asm! support on x86 and x86_64, so the old #![feature(asm)] framing should be treated as historical. Rebuilding core with build-std is still part of the unstable Cargo workflow, so a fully custom no-OS target can still require nightly pieces depending on the exact build plan.
Why DOS
DOS is not used here because it is a modern application platform. It is useful because it is small and direct.
DOS runs in x86 real mode. There is no process isolation in the modern sense, no virtual memory, and very little standing between a program and the machine. That makes it dangerous for production and useful for learning. A bug can hang the system, but a correct eight-byte program can also be understood completely.
The important properties are:
- COM programs are flat binaries.
- A COM program starts at offset 0x100.
- DOS services are called with software interrupts.
- Hardware can be reached through I/O ports.
- Addresses are visible as segment and offset pairs.
That is why this note sits between ordinary Rust and lower-level notes such as xv6 bootloader and krabs.
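The segment and offset pairs in the list above combine by a fixed rule: the segment is shifted left by four bits and the offset is added. The sketch below is a host-side illustration only; the function names are hypothetical, and it ignores the wrap-around at 1 MiB on the original 8086.

```rust
/// Convert a real-mode segment:offset pair into a linear address.
/// Assumption: no 1 MiB wrap-around is modelled (A20 behavior is ignored).
pub fn linear_address(segment: u16, offset: u16) -> u32 {
    ((segment as u32) << 4) + offset as u32
}

/// Linear address of the first instruction of a COM program whose
/// Program Segment Prefix sits at the start of `segment`.
pub fn com_entry(segment: u16) -> u32 {
    linear_address(segment, 0x0100)
}
```

Different pairs can name the same byte: `1000:0000` and `0FFF:0010` both resolve to linear address `0x10000`.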
Preparing a FreeDOS image
The old setup used QEMU and a FreeDOS disk image on macOS.
brew install qemu
qemu-img create -f raw freedos.img 100M
wget http://www.freedos.org/download/download/FD12CD.iso
qemu-system-i386 freedos.img -cdrom FD12CD.iso -boot d
qemu-system-i386 freedos.img -boot c
After the installer finishes, booting the disk gives a DOS prompt.
This is the whole laboratory. It is small enough to inspect, easy to reset, and close enough to the hardware to make real-mode behavior visible.
First program with debug
DOS includes debug, an old but very useful tool for learning. It can assemble instructions, inspect registers, dump memory, disassemble bytes, set a file name, and write memory to disk.
| Command | Meaning | Use |
|---|---|---|
| A | assemble | enter assembly instructions |
| U | unassemble | disassemble memory |
| R | register | read or modify registers |
| N | name | set the output file name |
| W | write | write memory to a file |
| Q | quit | leave debug |
Running r shows the register state. The output is a reminder that the program is running in a segmented 16-bit world.
COM files and offset 0x100
DOS has several executable formats. This note uses the COM format because it is the simplest. A COM file is loaded into one segment, and execution begins at offset 0x100.
The first 256 bytes belong to the Program Segment Prefix. DOS prepares that area before the program begins. It contains command-line and process information. For this experiment, the practical rule is simple: assemble the program at 0x100.
-a 100
Then enter a tiny program:
mov ah, 2
mov dl, 41
int 21
int 20
The instructions mean:
- Put 2 in AH. This selects DOS interrupt 21h, function 2: output one character.
- Put 0x41 in DL. That is ASCII A.
- Execute int 21h. DOS prints the character.
- Execute int 20h. DOS terminates the program.
A dump of memory shows the machine code.
The bytes are:
b4 02 b2 41 cd 21 cd 20
This is the important conceptual step. A program is bytes. Assembly text is one way to describe those bytes. A COM file is those bytes written to disk. The CPU gives those bytes meaning by decoding them as instructions.
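To make the "a program is bytes" point concrete, here is a minimal host-side sketch that turns those eight bytes back into text. It is not a real disassembler: it knows only the three opcodes this program uses (`B4`, `B2`, `CD`), and the names `HELLO` and `decode` are made up for the illustration.

```rust
/// The eight bytes of the program entered with debug.
pub const HELLO: [u8; 8] = [0xB4, 0x02, 0xB2, 0x41, 0xCD, 0x21, 0xCD, 0x20];

/// Decode a byte stream that uses only `mov ah, imm8`, `mov dl, imm8`
/// and `int imm8`. Returns None for any other opcode or a truncated
/// instruction.
pub fn decode(bytes: &[u8]) -> Option<Vec<String>> {
    let mut out = Vec::new();
    let mut rest = bytes;
    // Each supported instruction is exactly two bytes: opcode + operand.
    while let [opcode, operand, tail @ ..] = rest {
        let text = match *opcode {
            0xB4 => format!("mov ah, {:#04x}", *operand),
            0xB2 => format!("mov dl, {:#04x}", *operand),
            0xCD => format!("int {:#04x}", *operand),
            _ => return None,
        };
        out.push(text);
        rest = tail;
    }
    // A leftover single byte means the stream ended mid-instruction.
    if rest.is_empty() { Some(out) } else { None }
}
```

Calling `decode(&HELLO)` yields the same four instructions that were typed into debug, written with explicit hexadecimal operands.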
Saving and running the COM file
To save the program, set a file name, set CX to the number of bytes, write from 0x100, and quit.
-n hello.com
-r cx
CX 0000
:8
-w 100
-q
Then run it from the DOS prompt:
C:\> hello
A
This is a complete DOS application. It has no runtime, no object format at execution time, no loader complexity beyond DOS loading a COM file at the expected offset. The program is small enough to account for every byte.
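Because the file holds nothing but those bytes, the same `hello.com` can also be produced on the host without debug. The following is a sketch under that assumption; `write_com` is a hypothetical helper and the output path is up to the caller.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// The complete contents of the COM file: no header, no metadata.
pub const HELLO_COM: [u8; 8] = [0xB4, 0x02, 0xB2, 0x41, 0xCD, 0x21, 0xCD, 0x20];

/// Write the program to `path`. DOS will load these bytes at offset
/// 0x100 of the program's segment and jump to the first one.
pub fn write_com(path: &Path) -> io::Result<()> {
    fs::write(path, HELLO_COM)
}
```

The resulting file is exactly eight bytes long, which matches the value placed in CX before the `w` command.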
Moving toward Rust
The next question is whether Rust can produce the same kind of thing.
That requires removing assumptions that normal Rust binaries make:
- no host operating system ABI,
- no standard library,
- no ordinary main entry path,
- no default runtime startup,
- a fixed load address,
- and a flat binary output.
The path is:
Rust source -> ELF object -> linked image at 0x100 -> flat COM binary
The ELF image is useful during the build because it gives the linker enough structure. The final DOS program is not meant to stay ELF; it is converted into a raw binary.
Installing the Rust pieces
The original article used nightly Rust and tools common in older bare-metal Rust workflows.
curl https://sh.rustup.rs -sSf | sh
rustup toolchain install nightly
rustup component add rust-src --toolchain nightly
cargo install cargo-xbuild
cargo install cargo-binutils
rustup component add llvm-tools-preview
Some of these names have changed over time, and modern Rust has stabilized parts of this workflow. The conceptual requirements remain the same: build core, avoid std, provide a target specification, link with the layout you need, and convert the output.
Target specification
A target specification tells Rust and LLVM what kind of machine code to produce. A normal target such as i586-unknown-linux-gnu assumes a Linux ABI. DOS real mode does not match that, so the old experiment used a custom JSON target.
A simplified target spec looks like this:
{
  "llvm-target": "i586-unknown-none",
  "arch": "x86",
  "target-endian": "little",
  "target-pointer-width": "32",
  "target-c-int-width": "32",
  "os": "none",
  "executables": true,
  "linker-flavor": "ld.lld",
  "panic-strategy": "abort",
  "disable-redzone": true
}
The key point is not the exact JSON. The key point is that the program does not target Linux, macOS, or Windows. It targets a freestanding x86 environment and then constrains the linked image to something DOS can run.
In .cargo/config.toml, point Cargo at the target and pass the linker script:
[build]
target = "i586-rust_dos.json"
[target.i586-rust_dos]
rustflags = ["-C", "link-arg=-Tlinker.ld"]
no_std
By default, Rust links std, and std depends on an operating system. A DOS COM file does not provide the host interfaces std expects, so the crate starts with no_std.
#![no_std]
With no_std, Rust still has core: basic language items, primitive traits, slices, options, results, and other pieces that do not require an OS.
A panic handler is required because there is no standard runtime to provide one:
use core::panic::PanicInfo;

#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    loop {}
}
For a real DOS program you may want the panic handler to print a message and terminate through DOS, but an infinite loop is enough for the first minimal binary.
no_main and startup
A normal Rust binary expects the compiler and runtime to arrange a call into main. That is not what a COM file needs. The program needs its own entry symbol, placed at the beginning of the linked image.
#![no_main]

#[link_section = ".startup"]
#[no_mangle]
pub extern "C" fn _start() -> ! {
    main();
    exit();
}
The first version used the inline assembly available at that time. Modern Rust uses different syntax, but the idea is the same: put DOS function numbers and arguments into the expected registers, then execute int 21h.
pub fn exit() -> ! {
    unsafe {
        core::arch::asm!(
            "mov ah, 0x4c",
            "mov al, 0",
            "int 0x21",
            options(noreturn),
        );
    }
}
For the first demonstration, output one character:
#[no_mangle]
pub extern "C" fn _start() -> ! {
    unsafe {
        core::arch::asm!(
            "mov ah, 2",
            "mov dl, 0x41",
            "int 0x21",
            "int 0x20",
            options(noreturn),
        );
    }
}
Again, treat this as a conceptual sketch. The exact inline assembly constraints and register classes depend on the Rust version and target.
Linker script
The linker script makes the binary layout match DOS expectations. The essential rule is that code begins at 0x100.
ENTRY(_start)

SECTIONS
{
    . = 0x100;
    .startup : { *(.startup) }
    .text : { *(.text*) }
    .rodata : { *(.rodata*) }
    .data : { *(.data*) }
    .bss : { *(.bss*) }
}
That puts the custom startup section first, then normal code and data. This mirrors what debug did manually: place executable bytes at offset 0x100.
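One consequence of linking at 0x100 is that every address in the linked image is offset from its position in the flat file by exactly that amount. The host-side sketch below shows the mapping; `COM_ORIGIN` and `file_offset` are hypothetical names, and it assumes the flat file starts with the first byte placed at 0x100.

```rust
/// Address at which DOS places the first byte of a COM file.
pub const COM_ORIGIN: u16 = 0x100;

/// Map a linked address to its byte offset inside the flat COM file.
/// Addresses below 0x100 belong to the Program Segment Prefix, which is
/// not part of the file, so they yield None.
pub fn file_offset(address: u16) -> Option<usize> {
    address.checked_sub(COM_ORIGIN).map(usize::from)
}
```

This is handy when comparing disassembly of the linked image with a hex dump of the COM file: a symbol reported at 0x0123 should appear at byte 0x23 of the file.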
Building and extracting a COM binary
Build the ELF image first:
cargo xbuild --release
Inspecting the result with objdump is useful:
cargo objdump --release -- -d
Then convert the linked ELF into a raw binary:
cargo objcopy --release -- -O binary target/i586-rust_dos/release/rust_dos.com
The result should be small. That smallness is useful. If the output is unexpectedly large, inspect the sections and symbols. In this kind of program, accidental formatting code or panic machinery can pull in more than expected.
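There is also a hard upper bound behind that advice: a COM program, its 256-byte Program Segment Prefix, and its stack all share one 64 KiB segment. A build script could check the converted file against that limit. This is a sketch only; `fits_in_com` and the `reserved_stack` parameter are hypothetical, and how much stack to reserve is a judgment call.

```rust
/// Size of one real-mode segment.
pub const SEGMENT_SIZE: usize = 0x1_0000;
/// Size of the Program Segment Prefix that precedes the image.
pub const PSP_SIZE: usize = 0x100;

/// Return true if an image of `image_len` bytes, loaded after the PSP,
/// still leaves `reserved_stack` bytes free in the same segment.
pub fn fits_in_com(image_len: usize, reserved_stack: usize) -> bool {
    image_len
        .saturating_add(PSP_SIZE)
        .saturating_add(reserved_stack)
        <= SEGMENT_SIZE
}
```

With no stack reserved, the largest image that passes is 0xFF00 bytes; the eight-byte program passes with a wide margin.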
Copying the program into the DOS image
On macOS the old flow mounted the DOS image, copied the COM file, then detached it.
hdiutil attach freedos.img
cp target/i586-rust_dos/release/rust_dos.com /Volumes/FREEDOS2016/
hdiutil detach /dev/disk2
Then boot the image:
qemu-system-i386 freedos.img -boot c
At the prompt:
C:\> rust_dos
A
At this point the Rust-generated COM program has matched the handwritten debug program.
Splitting startup from main
The next cleanup is to make _start small and move actual behavior into main.
#[link_section = ".startup"]
#[no_mangle]
pub extern "C" fn _start() -> ! {
    main();
    dos::exit(0)
}

fn main() {
    print!("hello from rust\r\n");
}
The startup function is still special. It is the ABI boundary. The rest of the code can look more like ordinary Rust.
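The `dos::exit(0)` call above takes a return code, which DOS function 4Ch expects in AL while AH holds the function number. A minimal sketch of how such a function might look follows. `terminate_ax` is a hypothetical helper, and the `cfg` gate is an assumption chosen so that the interrupt is only compiled for a bare x86 target, leaving the register packing checkable on the host.

```rust
/// Pack AH = 4Ch (terminate with return code) and AL = `code` into AX.
pub fn terminate_ax(code: u8) -> u16 {
    0x4C00 | code as u16
}

/// Terminate through DOS. Only built for a freestanding x86 target,
/// because executing `int 0x21` anywhere else is meaningless.
#[cfg(all(target_arch = "x86", target_os = "none"))]
pub fn exit(code: u8) -> ! {
    unsafe {
        core::arch::asm!(
            "int 0x21",
            in("ax") terminate_ax(code),
            options(noreturn),
        );
    }
}
```

Passing the value as a register operand, rather than writing `mov` instructions into the template, lets the compiler load AX itself and keeps the template down to the interrupt.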
DOS console module
A console module wraps interrupt 21h function 2.
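pub mod console {
    /// Print one byte through DOS interrupt 21h, function 2.
    pub fn putc(ch: u8) {
        unsafe {
            // AH = 2 selects character output and DL carries the byte.
            // Both are passed as operands so the compiler knows which
            // registers are in use; DOS may overwrite AL, so AX is
            // marked as clobbered.
            core::arch::asm!(
                "int 0x21",
                inout("ax") 0x0200u16 => _,
                in("dl") ch,
            );
        }
    }

    /// Print every byte of `text` in order.
    pub fn puts(text: &str) {
        for byte in text.bytes() {
            putc(byte);
        }
    }
}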
A macro makes this tolerable:
#[macro_export]
macro_rules! print {
    ($text:expr) => {{
        $crate::dos::console::puts($text);
    }};
}
Now the higher-level program does not need to know about AH, DL, or int 21h.
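One way to keep that separation testable is to put the "emit one byte" step behind a trait and implement `core::fmt::Write` on top of it. The sketch below is an assumption about how this could be structured, not the original code: `ByteSink`, `Console`, and `Recorder` are hypothetical names. On DOS the sink would call `putc`; here it records bytes so the logic can run on the host. Note that formatting support can enlarge the binary noticeably.

```rust
use core::fmt::{self, Write};

/// Hypothetical abstraction over "emit one byte to the console".
pub trait ByteSink {
    fn put(&mut self, byte: u8);
}

/// Console that forwards formatted text to any ByteSink.
pub struct Console<S: ByteSink>(pub S);

impl<S: ByteSink> Write for Console<S> {
    fn write_str(&mut self, text: &str) -> fmt::Result {
        for byte in text.bytes() {
            self.0.put(byte);
        }
        Ok(())
    }
}

/// Host-side sink that records bytes instead of calling DOS.
#[derive(Default)]
pub struct Recorder(pub Vec<u8>);

impl ByteSink for Recorder {
    fn put(&mut self, byte: u8) {
        self.0.push(byte);
    }
}
```

With `Write` in scope, `write!(console, "n={}\r\n", 42)` sends the formatted bytes through the sink one at a time, and the higher-level code still never sees a register name.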
Port I/O
To move beyond DOS services and touch hardware, the program needs port I/O. On x86 this uses in and out instructions.
pub unsafe fn inb(port: u16) -> u8 {
    let value: u8;
    core::arch::asm!("in al, dx", in("dx") port, out("al") value);
    value
}

pub unsafe fn outb(port: u16, value: u8) {
    core::arch::asm!("out dx, al", in("dx") port, in("al") value);
}
This is exactly the kind of unsafe boundary Rust should make visible. The operation is inherently unsafe because the compiler cannot know whether a port exists, whether the device is ready, or whether the write will hang the machine.
Keyboard controller
The original article then used the keyboard controller as a small hardware experiment. The classic PC keyboard controller exposes status and data ports:
| Port | Use |
|---|---|
| 0x64 | status and command |
| 0x60 | data |
A simplified polling read looks like this:
const KBC_DATA: u16 = 0x60;
const KBC_STATUS: u16 = 0x64;

pub fn read_scan_code() -> u8 {
    loop {
        let status = unsafe { inb(KBC_STATUS) };
        if status & 1 != 0 {
            return unsafe { inb(KBC_DATA) };
        }
    }
}
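The polling loop can be exercised without hardware if port input is abstracted. The following host-side sketch assumes a hypothetical `PortIn` trait and a fake controller, `FakeKbc`, that reports "not ready" for a few polls before offering a queued scan code. Bit 0 of the status port is the output-buffer-full flag that the loop above tests.

```rust
use std::collections::VecDeque;

/// Hypothetical abstraction over the x86 `in` instruction.
pub trait PortIn {
    fn inb(&mut self, port: u16) -> u8;
}

pub const KBC_DATA: u16 = 0x60;
pub const KBC_STATUS: u16 = 0x64;
/// Status bit 0: the controller has a byte waiting on the data port.
pub const OUTPUT_FULL: u8 = 1 << 0;

/// Spin until the status port reports data, then read the data port.
pub fn read_scan_code<P: PortIn>(ports: &mut P) -> u8 {
    loop {
        if ports.inb(KBC_STATUS) & OUTPUT_FULL != 0 {
            return ports.inb(KBC_DATA);
        }
    }
}

/// Fake controller for host tests.
pub struct FakeKbc {
    /// Number of status reads that report "nothing yet".
    pub idle_polls: u32,
    /// Scan codes waiting to be read.
    pub codes: VecDeque<u8>,
    /// How many times the status port was read.
    pub status_reads: u32,
}

impl PortIn for FakeKbc {
    fn inb(&mut self, port: u16) -> u8 {
        match port {
            KBC_STATUS => {
                self.status_reads += 1;
                if self.idle_polls > 0 {
                    self.idle_polls -= 1;
                    0
                } else if self.codes.is_empty() {
                    0
                } else {
                    OUTPUT_FULL
                }
            }
            KBC_DATA => self.codes.pop_front().unwrap_or(0),
            // Unmapped ports read as all ones in this fake.
            _ => 0xFF,
        }
    }
}
```

On the real target, the trait would be implemented by a zero-sized type whose `inb` wraps the unsafe port read, so the polling logic itself stays identical.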
The scan code is not yet a character. The program needs a map. The original implementation used small maps for plain, shift, control, and alt states, with only part of the US keyboard filled in.
static MAP_PLAIN: [u8; 128] = [
    0, 27, b'1', b'2', b'3', b'4', b'5', b'6',
    b'7', b'8', b'9', b'0', b'-', b'=', 8, b'\t',
    b'q', b'w', b'e', b'r', b't', b'y', b'u', b'i',
    b'o', b'p', b'[', b']', b'\n', 0, b'a', b's',
    // ...
];
A real keyboard driver must handle key release codes, modifier state, extended scan codes, layout differences, and interrupts. The demo stayed smaller: poll, translate, print.
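As a rough illustration of the first two items, the sketch below tracks shift state and ignores key releases for scan code set 1, where a release is the press code with bit 7 set and the shift keys are 0x2A and 0x36. It is an assumption-laden miniature, not the original implementation: `Decoder` is a hypothetical type, only five keys are mapped, and shifted digits and punctuation would need a separate map.

```rust
const LEFT_SHIFT: u8 = 0x2A;
const RIGHT_SHIFT: u8 = 0x36;
/// In scan code set 1, a release is the press code with bit 7 set.
const RELEASE_BIT: u8 = 0x80;

/// Minimal decoder that remembers whether a shift key is held.
#[derive(Default)]
pub struct Decoder {
    shift: bool,
}

impl Decoder {
    /// Feed one scan code. Returns a byte only for presses of the few
    /// keys this sketch knows about.
    pub fn feed(&mut self, code: u8) -> Option<u8> {
        let released = code & RELEASE_BIT != 0;
        let key = code & !RELEASE_BIT;

        if key == LEFT_SHIFT || key == RIGHT_SHIFT {
            self.shift = !released;
            return None;
        }
        if released {
            return None;
        }

        // Same positions as the corresponding entries in MAP_PLAIN.
        let plain = match key {
            0x10 => b'q',
            0x11 => b'w',
            0x1E => b'a',
            0x1F => b's',
            0x1C => b'\n',
            _ => return None,
        };
        Some(if self.shift { plain.to_ascii_uppercase() } else { plain })
    }
}
```

Feeding `0x2A, 0x1E, 0xAA, 0x1E` produces `A` followed by `a`: the shift press and release produce no output themselves but change how the next key is translated.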
Why Bochs helped
QEMU was convenient for installing and booting FreeDOS. For the keyboard-controller details, the old experiment found Bochs more useful because its legacy-device emulation exposed the behavior being tested more clearly.
brew install bochs
cargo xbuild --release
cargo objcopy -- -I elf32-i386 -O binary \
target/i586-rust_dos/release/rust_dos \
target/i586-rust_dos/release/rust_dos.com
Then copy the COM file into the DOS disk image and boot it under Bochs.
What this teaches
The route from debug to Rust is the important part:
- A DOS COM program is just bytes loaded at 0x100.
- DOS interrupts make the ABI visible through registers.
- no_std removes OS assumptions.
- no_main lets the program own startup.
- A linker script gives control over layout.
- objcopy turns a linked image into a flat binary.
- Small Rust modules can wrap unsafe interrupt and port-I/O boundaries.
That is not a production recommendation. It is a controlled way to understand binary layout, real-mode execution, interrupts, and hardware I/O while still using Rust for structure.
See also
Related: xv6, xv6 bootloader, krabs.