First of all, please note that this is not a nasm guide. I assume that you know nasm syntax; I will not explain any nasm-related things here.
Second, I expect that you will examine the source code of asmutils. This document is not intended to replace the source; its only goal is to accompany the asmutils source and explain some unclear points. Again, examine ALL the source code. Look at how command line parsing is done, how conditional assembly for different kernel versions is done, and so on -- I am not going to explain everything here.
Mostly this guide describes a set of macros I've developed for writing fast and readable code; they hide unneeded details from you and also take care of optimization.
You may also want to read the "Startup state of Linux/i386 ELF binary" and "Linux/i386 system calls" documents to get a better general (not asmutils-specific) understanding of the asmutils source code.
There are three macros that make section definition as simple as possible: CODESEG, DATASEG and UDATASEG (similar to TASM ideal mode syntax). The END macro marks the end of the file.
A program must have at least a CODESEG (.text) section; the other sections are optional. CODESEG is read-only, DATASEG and UDATASEG are read-write; i.e. you can place data in CODESEG as long as you do not change it. You can also define your own sections if you want, but this is very rarely needed. Each section (even if it is empty) will enlarge your file.
The START macro tells the linker the entry point, and MUST be present.
So, typical code will look like:
    %include "system.inc"

    CODESEG

    START:                  ;entry point

        your_code_here

    DATASEG

        your_data_here

    UDATASEG

        your_bss_here

    END
system.inc is vital and MUST be included into program code to do anything else; without this file you'll have to write in the usual boring way.
The CODESEG, DATASEG, UDATASEG, END, I_STRUC and I_END macros are here; others will be added.
It also contains the optimizing macros _mov (formerly __setreg32), _add and _sub, which perform register assignment, addition and subtraction. You can use these macros instead of mov, add, sub; if you care about size, this will produce quite good results (do not try to understand how they work :).
system.inc includes two other files, includes.inc and syscall.inc; you do not need to include them manually.
includes.inc stores generic constant definitions and structures (from libc headers) that are OS independent. If you add a defined constant, please do not forget to mention the header file it was taken from.
syscall.inc holds the system call macros. A lot of people have asked me for a description of them, so here are the general things to know about the sys_xxx macros:
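For example, sys_write EMPTY,EMPTY,1 (only the third parameter is set) expands to:

    _mov    edx,1
    __syscall write...

and sys_write eax,ebx,ecx (parameters already in registers) expands to:

    mov     edx,ecx
    mov     ecx,ebx
    mov     ebx,eax
    __syscall write...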
WARNING: *never* use __syscall macro in your program directly (of course the same applies to int 0x80 !!). This is a VERY BAD thing to do. This will MAKE YOUR CODE UNPORTABLE! So please use only sys_xxx macros!
If some system call is missing, you can add it to this file; it's simple, just look how others are done there; use sys_syscallname as macro name.
Note: there are two additional forms of sys_exit call: sys_exit_true is 'true' exit, and sys_exit_false is 'false' exit.
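To tie the section macros and the sys_xxx macros together, here is a minimal sketch of a complete program. It is only an illustration: it assumes that includes.inc defines the STDOUT constant, that sys_write takes its parameters in write(2) order (descriptor, buffer, length), and that sys_exit_true exits with status 0. The names msg and msg_len are made up for this example.

```nasm
%include "system.inc"

CODESEG

START:
	sys_write STDOUT, msg, msg_len	; assumed order: fd, buffer, length
	sys_exit_true			; 'true' exit (assumed to be status 0)

; The string is never modified, so it can stay in the read-only CODESEG
; and no DATASEG is needed.
msg	db	"Hello, world!", 0x0a
msg_len	equ	$ - msg			; hypothetical name: length of msg

END
```

Because the program uses only sys_xxx macros and never __syscall or int 0x80 directly, the same source should stay portable across the systems asmutils supports.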
elf.inc applies only to Linux. ELF macros are defined here. These macros can be (and are, by default) used to reduce the final size of an executable. Brian Raiter wrote README.elf and the comments in elf.inc, which contain all you need to know about what the macros do and how they work.

There is more good news: almost all of them (except ELF_BSTRUC and ELF_AT) are integrated into the existing program structure. To enable them you just need the line ELF_MACROS = y in the Makefile (enabled by default); this turns on automatic usage of these macros (and you do not have to include elf.inc). If you follow simple rules when writing a program, you will not have to maintain two different definitions for sections and structures, so you can compile the same source with and without these macros and get correct code in both cases. This is an experimental thing, but it seems to work well. The rules are simple: use the section order CODESEG, DATASEG, UDATASEG, END, and use I_STRUC and I_END to define structures in UDATASEG instead of istruc and iend (take any asmutils source as an example).

Alternatively, you can use the macros from elf.inc directly if you want, but then you can't compile your source using the usual nasm/ld procedure. If you want to go this way, take the time to read README.elf carefully (also read it if you want to understand how the macros work). Personally, I think the first way is simpler.
In fact, optimization ought to be done by the assembler.. but.. an optimizing assembler is a dream. So, I've taken care of it. By default code is optimized for size, and you can get up to a 20% smaller executable; speed optimization is in fact a fake, it is just the absence of size optimization :), though theoretically you can gain something on pentium processors.. To enable speed optimization set OPTIMIZE to SPEED in the Makefile. Optimization touches register assignment, addition and subtraction (the _mov, _add, _sub macros), and section alignment (the CODESEG, DATASEG macros). Optimization is a work in progress, so results may be better in future versions.
If you've gone crazy about binary size, you may want to use some of the things described below.
First of all, try to keep your program in one CODESEG (.text) section. Remember, every new section (even if it is empty) increases the size of the executable file. Unless you have read-write data, do not create a DATASEG (.data section); keep your data in CODESEG. Even if you have one or two variables with assigned initial values, first think about keeping them dynamically on the stack instead of creating a DATASEG. And if the initial value is zero, place the variable in the UDATASEG (.bss) section; it will be zeroed out by the kernel.
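As an illustration of the stack advice, here is a small fragment in plain nasm (no asmutils macros) that keeps an initialized counter on the stack instead of in a .data section. It is a sketch, not code taken from asmutils; the label count_down is made up, and the fragment is meant to be dropped into a code section rather than run on its own.

```nasm
bits 32

	push	byte 3		; counter = 3, stored on the stack
count_down:
	dec	dword [esp]	; modify the variable in place
	jnz	count_down	; loop until it reaches zero
	pop	eax		; release the slot; eax now holds 0
```

The variable costs a two-byte push and a one-byte pop, and no extra section is created.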
Use the _mov macro instead of the mov instruction (unless you are assigning one register to another); it tracks several special cases and produces smaller code.
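To give an idea of what "special cases" can mean, here is a simplified stand-in macro. It is NOT the real _mov from system.inc (which handles more cases and may choose different instructions); the name small_mov is hypothetical. It only shows how a macro can pick a shorter encoding when the source operand is a small constant.

```nasm
bits 32

; small_mov dest, src -- hypothetical illustration, not the asmutils _mov
%macro small_mov 2
  %ifnum %2			; is the source a plain number?
    %if %2 = 0
	xor	%1, %1		; 2 bytes instead of 5 (changes flags)
    %elif (%2 >= -128) && (%2 <= 127)
	push	byte %2		; 2 bytes
	pop	%1		; 1 byte: 3 bytes total instead of 5
    %else
	mov	%1, %2		; large constant: nothing to gain
    %endif
  %else
	mov	%1, %2		; register, label or memory operand
  %endif
%endmacro

	small_mov eax, 0	; becomes xor eax,eax
	small_mov ebx, 1	; becomes push byte 1 / pop ebx
	small_mov ecx, 0x12345	; stays mov ecx,0x12345
	small_mov edx, esi	; stays mov edx,esi
```

Note that the shorter forms have side effects a plain mov does not have (xor changes the flags, push/pop touch the stack), which is one reason to leave such decisions to a well-tested macro.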
Avoid using 16-bit registers (ax, bx, cx, etc.) unless you know exactly what you're doing. Every 16-bit instruction takes one more byte (the 0x66 operand-size prefix). For instance, inc ax will produce larger code than inc eax.
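The cost of the prefix is easy to see in the encodings. The fragment below is plain nasm for 32-bit code; the bytes in the comments are the standard IA-32 encodings, and you can check them yourself with a listing file (nasm -f bin -l).

```nasm
bits 32

	inc	eax		; 40        (1 byte)
	inc	ax		; 66 40     (2 bytes: prefix + opcode)
	add	eax, ebx	; 01 D8     (2 bytes)
	add	ax, bx		; 66 01 D8  (3 bytes: prefix + same opcode)
```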
Here are some assembly examples you can use instead of cmp instruction to produce smaller code:
    ;if eax < 0 (signed compare)
        test    eax,eax
        js      is_less

    ;if eax == 0
        test    eax,eax
        jz      is_zero

    ;if eax == 0
        or      eax,eax
        jz      is_zero

    ;if eax == 1 (and you no longer care about its value)
        dec     eax
        jz      is_one

    ;if eax == 2 (and you no longer care about its value)
        dec     eax
        dec     eax
        jz      is_two

    ;if eax == -1 (and you no longer care about its value)
        inc     eax
        jz      is_minus_one

    ;if eax == -2 (and you no longer care about its value)
        inc     eax
        inc     eax
        jz      is_minus_two

    ;if -128 <= value <= 127, you can use
        cmp     eax,byte value  ;or -value
    ;instead of
        cmp     eax,value
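For reference, this is how the sizes compare. The fragment is plain nasm for 32-bit code, and the bytes in the comments are the standard IA-32 encodings; the instructions are listed side by side only to compare their lengths, not as a meaningful sequence.

```nasm
bits 32

	cmp	eax, byte 0	; 83 F8 00  (3 bytes)
	test	eax, eax	; 85 C0     (2 bytes)

	cmp	eax, byte 1	; 83 F8 01  (3 bytes)
	dec	eax		; 48        (1 byte, but eax is modified)

	cmp	eax, byte -1	; 83 F8 FF  (3 bytes)
	inc	eax		; 40        (1 byte, but eax is modified)

	cmp	eax, 1000	; 3D E8 03 00 00  (5 bytes: the value does
				; not fit in a signed byte)
```

So the two-instruction dec/dec and inc/inc variants only break even on size against nothing shorter than a 3-byte cmp, and they are worth using only when the register value is no longer needed.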
Seek, and you may find more :)