I told you about the clib I released for my OS... I have now made a Linux version that can be used with C-style calls (args on the stack) or ASM-style calls (args in registers)...
I have modified it, adding a lot of conditional assembly... I have defined two flags for the call style, at the beginning of the source:
C_CALL is for C-style passing of args on the stack
ASM_CALL is for ASM-style passing of args through registers
The second pair of flags is:
SPEEDOPT for speed optimization
SIZEOPT for... guess what???
Yes, size optimization
For the moment, you have to choose between SIZEOPT and SPEEDOPT, not both at the same time...
The same goes for C_CALL and ASM_CALL.
To set the flags, just uncomment the related lines at the beginning of the source.
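The "choose exactly one of each pair" rule can be enforced at build time. In NASM this would be done with `%define`, `%ifdef` and `%error`; since I can't quote the real source, here is the same idea as a hypothetical C preprocessor mirror (the `ARGS_ON_STACK` and `OPTIMIZE_FOR_SIZE` names are made up for illustration):

```c
/* Hypothetical C mirror of the flag block: uncomment exactly one line per pair. */
#define C_CALL
/* #define ASM_CALL */
/* #define SPEEDOPT */
#define SIZEOPT

/* Refuse to build if both flags of a pair, or neither, are set. */
#if defined(C_CALL) == defined(ASM_CALL)
#error "define exactly one of C_CALL or ASM_CALL"
#endif
#if defined(SPEEDOPT) == defined(SIZEOPT)
#error "define exactly one of SPEEDOPT or SIZEOPT"
#endif

/* Each routine then picks its variant at build time. */
#ifdef C_CALL
enum { ARGS_ON_STACK = 1 };
#else
enum { ARGS_ON_STACK = 0 };
#endif

#ifdef SIZEOPT
enum { OPTIMIZE_FOR_SIZE = 1 };
#else
enum { OPTIMIZE_FOR_SIZE = 0 };
#endif
```

Comparing the two `defined` results catches both mistakes (two flags set, or none) with one test per pair.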
OK, here are the functions implemented in this first version:
itoa
strlen
memset
memcpy
memcpyl
fprintf
printf
malloc
calloc
free
First: the memory allocation trio (malloc, free, calloc) & MemInit
The memory allocation trio (malloc, free, calloc) was developed for my OS first. I have tested it extensively, and I am proud of it, because it is pretty fast & short... I think it is pretty solid too, I haven't found any bugs...
The memory allocation scheme uses a sort of linked list... each memory block is described by a 12-byte entry, structured like this:
struc MAB
.start resd 1   ; beginning of the block (divided by 32, 0 = first location of the mem space)
.size  resd 1   ; size of the mem block (divided by 32 too)
.pid   resd 1   ; pid of the process that allocated the block
endstruc
The allocated block sizes are multiples of 32 bytes. I could have used 1-byte granularity, but I think 32 bytes is good: every block stays aligned on a cache line boundary, and it helps limit memory fragmentation...
When you pass a memory size to malloc, it rounds it up to a multiple of 32 bytes, so it is fully compatible with the standard C malloc; the same goes for free & calloc...
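To make the descriptor layout and the 32-byte rounding concrete, here is a C view of the MAB entry. The helper names `bytes_to_units` and `mab_address` are hypothetical, and `mem_base` stands for the address passed to MemInit:

```c
#include <stddef.h>
#include <stdint.h>

#define MAB_UNIT 32u /* allocation granularity in bytes */

/* C view of the NASM MAB descriptor: three dwords, 12 bytes in all. */
typedef struct {
    uint32_t start; /* block start, in 32-byte units from the start of the mem space */
    uint32_t size;  /* block size, in 32-byte units */
    uint32_t pid;   /* pid of the process that allocated the block */
} Mab;

_Static_assert(sizeof(Mab) == 12, "MAB descriptor must be 12 bytes");

/* Round a byte request up to whole 32-byte units, without overflow. */
static size_t bytes_to_units(size_t nbytes) {
    return nbytes / MAB_UNIT + (nbytes % MAB_UNIT != 0);
}

/* Byte address of the block described by m, given the base passed to MemInit. */
static void *mab_address(void *mem_base, const Mab *m) {
    return (unsigned char *)mem_base + (size_t)m->start * MAB_UNIT;
}
```

So a 33-byte request costs 2 units (64 bytes), and a descriptor with `start = 3` points 96 bytes into the memory space.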
I have included the pid of the process that allocated each memory block, so that I can implement a sort of killmem(pid) function, which will free all the blocks allocated by a process, based on its pid...
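killmem(pid) doesn't exist yet, so this is only a guess at its shape. It is a hypothetical `killmem_sketch` that assumes the descriptors sit in a contiguous array (not the linked list the real allocator uses). It compacts the table in one pass, dropping every entry owned by `pid` and keeping the others in order:

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t start, size, pid; } Mab; /* same 12-byte layout as MAB */

/* Hypothetical: drop every descriptor owned by pid from a table of n entries,
   preserving the order of the survivors. Returns the new entry count. */
size_t killmem_sketch(Mab *table, size_t n, uint32_t pid) {
    size_t kept = 0;
    for (size_t i = 0; i < n; i++)
        if (table[i].pid != pid)
            table[kept++] = table[i]; /* slide survivors down over freed slots */
    return kept;
}
```

Because removing a descriptor is all it takes to free a block in this model, freeing a whole process costs a single pass over the table.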
I have timed a large number of allocations, and I can say that it is pretty fast...
Note that this first version has no multithreading lock to prevent different processes from allocating blocks at the same time...
But I will add it very soon...
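One common way to add that lock is a spinlock around the descriptor list. In assembly this is usually an `xchg` loop (with a memory operand, `xchg` is implicitly locked on x86). Here it is as a C11 sketch using `atomic_flag`. `locked_alloc` is a hypothetical wrapper, and the system `malloc` only stands in for an unlocked allocator:

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* One global lock guarding the allocator's descriptor list. */
static atomic_flag heap_lock = ATOMIC_FLAG_INIT;

static void heap_acquire(void) {
    /* test_and_set returns the previous value: spin while someone else holds it */
    while (atomic_flag_test_and_set_explicit(&heap_lock, memory_order_acquire))
        ;
}

static void heap_release(void) {
    atomic_flag_clear_explicit(&heap_lock, memory_order_release);
}

/* Hypothetical wrapper: run an unlocked allocator (alloc) under the lock. */
void *locked_alloc(void *(*alloc)(size_t), size_t n) {
    heap_acquire();
    void *p = alloc(n);
    heap_release();
    return p;
}
```

The acquire/release ordering makes sure one caller's list updates are visible to the next caller that takes the lock.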
OK... so to use these memory allocation routines:
You have to call MemInit first... The C syntax is MemInit(memaddr, memsize); look in the source for the ASM syntax...
This function passes the address (memaddr) & the size (memsize) of a physical memory block to the memory allocation core, and that block will be used as the available memory for malloc, free, & calloc...
For example, if you want to use my clib with your C program, you can allocate a 5-MByte block of memory with the standard malloc, then call MemInit with the address of the block returned by the C malloc, and its size...
That's all: you can now work only with my versions of malloc, free, calloc, just like you do with the system versions...
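To show the MemInit-then-malloc contract end to end, here is a toy model. It is not my code, and everything in it is a stated assumption. The `sketch_*` names are hypothetical. The descriptors live in a fixed, sorted array instead of a linked list. Allocation is first fit, the pid is always 0, and there is no locking:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define UNIT 32u        /* allocation granularity in bytes */
#define MAX_BLOCKS 256u /* hypothetical fixed size of the descriptor table */

typedef struct {
    uint32_t start; /* offset from the arena start, in 32-byte units */
    uint32_t size;  /* block size, in 32-byte units */
    uint32_t pid;   /* owner; always 0 in this single-process sketch */
} Mab;

static unsigned char *arena;  /* first 32-byte-aligned byte of the arena */
static uint32_t arena_units;  /* usable arena size, in 32-byte units */
static Mab table[MAX_BLOCKS]; /* used blocks, sorted by start */
static uint32_t used;         /* number of live entries in table */

/* Hand the allocator a block of raw memory (e.g. one from the system malloc). */
void sketch_meminit(void *memaddr, size_t memsize) {
    size_t lost = (UNIT - (uintptr_t)memaddr % UNIT) % UNIT; /* bytes skipped to align */
    used = 0;
    arena = memaddr;
    arena_units = 0;
    if (memsize > lost) {
        arena += lost;
        arena_units = (uint32_t)((memsize - lost) / UNIT);
    }
}

/* First fit: take the first gap between used blocks that is large enough. */
void *sketch_malloc(size_t n) {
    if (n == 0 || n > (size_t)arena_units * UNIT || used == MAX_BLOCKS)
        return NULL;
    uint32_t units = (uint32_t)(n / UNIT + (n % UNIT != 0)); /* round up to 32 bytes */
    uint32_t gap_start = 0;
    for (uint32_t i = 0; i <= used; i++) {
        uint32_t gap_end = (i < used) ? table[i].start : arena_units;
        if (gap_end - gap_start >= units) {
            if (i < used) /* shift later entries up to keep the table sorted */
                memmove(&table[i + 1], &table[i], (used - i) * sizeof(Mab));
            table[i] = (Mab){ gap_start, units, 0 };
            used++;
            return arena + (size_t)gap_start * UNIT;
        }
        if (i < used)
            gap_start = table[i].start + table[i].size;
    }
    return NULL;
}

/* Release a block; p must come from sketch_malloc or sketch_calloc. */
void sketch_free(void *p) {
    if (p == NULL)
        return;
    uint32_t start = (uint32_t)(((unsigned char *)p - arena) / UNIT);
    for (uint32_t i = 0; i < used; i++) {
        if (table[i].start == start) {
            if (i + 1 < used) /* close the hole in the table */
                memmove(&table[i], &table[i + 1], (used - i - 1) * sizeof(Mab));
            used--;
            return;
        }
    }
}

/* Allocate count * size zero-filled bytes, refusing on multiplication overflow. */
void *sketch_calloc(size_t count, size_t size) {
    if (size != 0 && count > SIZE_MAX / size)
        return NULL;
    void *p = sketch_malloc(count * size);
    if (p != NULL)
        memset(p, 0, count * size);
    return p;
}
```

Usage mirrors the description above: get a big block from the system malloc, pass it to `sketch_meminit`, then use only the `sketch_*` functions. Every returned pointer is 32-byte aligned because the arena start is aligned once, at init time.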
itoa: convert int to ascii
Here is the core of itoa. As you can see it's short: it converts a 32-bit value given in eax to an ASCII string stored at [edi], in binary, decimal, hexadecimal or octal.
You just give the wanted base in ecx (2, 8, 10, 16 for example).
I was using this algorithm to convert to decimal strings, and I had never noticed before that just by adding the cmp dl,'9' (and the add dl,0x27 after it) it processes hexadecimal too, so you get a universal int-to-ascii converter.
I use it in my fprintf, to save space...
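itoa:               ; in: eax = value, ecx = base (2, 8, 10, 16...), edi = output buffer
                    ; out: edi points just past the last digit (no 0 terminator)
                    ; clobbers eax and edx
sub edx,edx         ; edx:eax = value, ready for div
div ecx             ; eax = quotient, edx = remainder (lowest digit)
test eax,eax
jz short .print0    ; no higher digits left
push edx            ; save this digit
call itoa           ; print the higher digits first (recursion)
pop edx
.print0:
add dl,'0'          ; 0..9 -> '0'..'9'
cmp dl,'9'
jle short .print1
add dl,0x27         ; 10.. -> 'a'.. ('0' + 10 + 0x27 = 'a')
.print1:
mov [edi],dl        ; store the digit character
inc edi
ret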
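For readers who think in C, here is a line-by-line transliteration of the routine above. `itoa_rec` is a hypothetical name chosen to avoid clashing with non-standard `itoa` functions. Like the asm, it writes no terminator and returns the position just past the last digit:

```c
#include <stdint.h>

/* Write v in the given base, most significant digit first, starting at out.
   Returns a pointer just past the last digit (no '\0' is written). */
char *itoa_rec(uint32_t v, uint32_t base, char *out) {
    uint32_t q = v / base; /* div ecx: quotient */
    uint32_t r = v % base; /* div ecx: remainder = lowest digit */
    if (q != 0)
        out = itoa_rec(q, base, out); /* higher digits first */
    /* '0'..'9', then skip 0x27 to land on 'a' for digit 10 */
    *out++ = (char)(r <= 9 ? '0' + r : '0' + r + 0x27);
    return out;
}
```

The recursion depth equals the number of digits, so a 32-bit value in base 2 needs at most 32 stack frames.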
memset & strlen
The strlen & memset versions are the most optimized of all...
I took some ideas from the assembly journal for the size-optimized version of strlen & the speed-optimized version of memset...
The size-optimized version of strlen is only 10 bytes.
The speed-optimized version of strlen processes 1 character every 0.75 cycles, and it handles misaligned memory references automatically, without penalty... I think it is close to optimal speed...
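Fast strlen versions typically test four bytes per step with the well-known zero-byte trick. I'm not claiming this is the exact code in the clib, only an illustration of the technique. In asm you can safely read the whole aligned dword that contains the terminator, but C can't legally read past the end of an object. So this hypothetical `word_strlen` takes the buffer size and loads words with `memcpy`:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero iff at least one of the four bytes of v is 0x00. */
static int has_zero_byte(uint32_t v) {
    return ((v - 0x01010101u) & ~v & 0x80808080u) != 0;
}

/* Length of the string in buf, checking 4 bytes per step.
   cap is the buffer size; the string must be terminated within it. */
size_t word_strlen(const char *buf, size_t cap) {
    size_t i = 0;
    while (i + 4 <= cap) {
        uint32_t v;
        memcpy(&v, buf + i, 4); /* 4-byte load, no alignment or aliasing issues */
        if (has_zero_byte(v))
            break;              /* the terminator is somewhere in these 4 bytes */
        i += 4;
    }
    while (buf[i] != '\0')      /* pin down the exact byte */
        i++;
    return i;
}
```

The zero-byte test doesn't depend on byte order, so the same mask works on any endianness. Only the final byte scan finds the exact position.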
The speed-optimized version of memset is really fast too, & it avoids misaligned memory accesses as well... I think it is close to optimal speed too...
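"Avoiding misaligned accesses" usually means a head/body/tail split: write single bytes until the destination is dword aligned, fill whole dwords, then finish the leftover bytes. Here is a C sketch of that structure. `aligned_memset` is a hypothetical name, not the clib's routine:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* memset with a byte head, an aligned dword body and a byte tail. */
void *aligned_memset(void *dst, int c, size_t n) {
    unsigned char *p = dst;
    unsigned char b = (unsigned char)c;
    uint32_t word = b * 0x01010101u; /* replicate the byte into all 4 lanes */

    while (n > 0 && ((uintptr_t)p & 3u) != 0) { /* head: until 4-byte aligned */
        *p++ = b;
        n--;
    }
    while (n >= 4) {            /* body: aligned 4-byte stores */
        memcpy(p, &word, 4);
        p += 4;
        n -= 4;
    }
    while (n > 0) {             /* tail: 0..3 leftover bytes */
        *p++ = b;
        n--;
    }
    return dst;
}
```

At most 3 + 3 bytes go through the slow byte path; everything else is written as aligned dwords.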
fprintf - first version
I developed this fprintf for my OS and have used it extensively, and I think it has no bugs.
For the moment fprintf can process:
%s for strings
%d for decimal
%x for hexadecimal
%o for octal
%b for binary
\n for line feed (or you can use db 10 in assembler, it's the same)
For the moment that's all it handles, more will come...
I only use the C-style call for fprintf, because there can be many args, and passing them in registers was too hard...
The strings are null-terminated like in C:
fprintf(filedesc, string, args, ...)
fprintf can write to STDOUT or to a file, so I implemented printf as a macro that does fprintf(STDOUT, string, args, ...)
I think this fprintf version is compact & small. Tell me what you think of it, & whether it is worth implementing all the printf features, or keeping a compact version that just handles the most useful things...
My fprintf is compatible with the standard C one for the conversions it supports...
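The structure of such a compact formatter is a single loop over the format string plus one shared number converter, the same trick as itoa. Here is a hypothetical `mini_format`. It writes into a caller-supplied buffer instead of a file descriptor so it can be checked without I/O. It assumes the buffer is large enough, treats %d as a signed int, and copies "%%" or an unknown specifier character through unchanged:

```c
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* Write v in the given base (2..16), most significant digit first. */
static char *put_uint(char *out, unsigned v, unsigned base) {
    if (v >= base)
        out = put_uint(out, v / base, base);
    unsigned d = v % base;
    *out++ = (char)(d < 10 ? '0' + d : 'a' + d - 10);
    return out;
}

/* Handles %s %d %x %o %b (and %%). No bounds checking: out must be big enough.
   Returns the number of chars written, not counting the '\0'. */
int mini_format(char *out, const char *fmt, ...) {
    va_list ap;
    char *o = out;
    va_start(ap, fmt);
    for (const char *f = fmt; *f != '\0'; f++) {
        if (*f != '%' || f[1] == '\0') { /* plain char, or trailing '%' */
            *o++ = *f;
            continue;
        }
        switch (*++f) {
        case 's': {
            const char *s = va_arg(ap, const char *);
            size_t len = strlen(s);
            memcpy(o, s, len);
            o += len;
            break;
        }
        case 'd': {
            int v = va_arg(ap, int);
            unsigned u = (unsigned)v;
            if (v < 0) {
                *o++ = '-';
                u = 0u - u; /* magnitude, correct even for INT_MIN */
            }
            o = put_uint(o, u, 10);
            break;
        }
        case 'x': o = put_uint(o, va_arg(ap, unsigned), 16); break;
        case 'o': o = put_uint(o, va_arg(ap, unsigned), 8); break;
        case 'b': o = put_uint(o, va_arg(ap, unsigned), 2); break;
        default:  *o++ = *f; break; /* "%%" -> '%', unknown -> copied as-is */
        }
    }
    va_end(ap);
    *o = '\0';
    return (int)(o - out);
}
```

With a real file descriptor, the same loop would flush the buffer with a write at the end. A printf macro then only fixes the first argument, as described above.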
memcpy - the worst
I haven't found any good optimization for memcpy, because the source & dest must both be dword aligned for a speedup, and if they are misaligned by different amounts you can't bring both onto a dword boundary at the same time...
So I implemented memcpy with rep movsb, and I made a memcpyl with rep movsd, but the source & dest must be dword aligned or you won't feel any speed gain...
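The "can't synchronize" problem comes down to one test. Dword copies are only possible when source and dest have the same offset modulo 4, i.e. `((src ^ dst) & 3) == 0`. In that case a few head bytes align both at once. Here is a hypothetical C sketch of a memcpy that uses that check, falling back to bytes otherwise:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy n bytes; use 4-byte moves only when src and dst can be aligned together. */
void *sync_memcpy(void *dst, const void *src, size_t n) {
    unsigned char *d = dst;
    const unsigned char *s = src;
    if ((((uintptr_t)d ^ (uintptr_t)s) & 3u) == 0) { /* same misalignment */
        while (n > 0 && ((uintptr_t)d & 3u) != 0) { /* head: aligns both at once */
            *d++ = *s++;
            n--;
        }
        while (n >= 4) {                            /* body: aligned dwords */
            memcpy(d, s, 4);
            d += 4;
            s += 4;
            n -= 4;
        }
    }
    while (n > 0) {                                 /* tail, or the whole copy */
        *d++ = *s++;
        n--;
    }
    return dst;
}
```

When the misalignments differ, every dword access would be misaligned on one side, which is why the sketch, like memcpy above, copies byte by byte in that case.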
Code size for each flag combination:
SPEEDOPT & C_CALL   -> 790 bytes
SIZEOPT  & C_CALL   -> 618 bytes
SPEEDOPT & ASM_CALL -> 676 bytes
SIZEOPT  & ASM_CALL -> 504 bytes
So for the moment SIZEOPT saves 172 bytes (790 - 618 = 676 - 504 = 172)...
Pretty small, no???