GBA BIOS Notes
While high-level emulating (HLE) the BIOS for my emulator, I noticed a lack of information on timings and unexpected/undefined output. This page documents my (very limited) findings and gives me a place to rant about dumb things I found in the code.
All code is from the disassembly at https://github.com/PikalaxALT/gba_bios
SWI Functions
Number | Name | Inputs | Outputs |
---|---|---|---|
0x00 | SoftReset | none | reset |
0x01 | RegisterRamReset | r0 | |
0x0B | CpuSet | r0, r1, r2 | todo |
0x0C | CpuFastSet | r0, r1, r2 | |
0x0D | BiosChecksum | none | r0, |
Input and output registers are listed for expected behavior. If a register is not listed, it does not get changed. If a register is crossed out, that means it gets trashed or set to something useless/undefined. I will still do my best to document these.
CpuFastSet
Copies data from one location to another using `ldmia r0!, {r2-r9}; stmia r1!, {r2-r9}`, or sets it to the value of `[r0]` using `stmia r1!, {r2-r9}`.
Inputs
Register | Purpose |
---|---|
r0 | Source address |
r1 | Destination address |
r2 | Length and source mode |
- r2:
  - bits 0-20: The number of words to transfer. This will be rounded up to the nearest multiple of 8 if it is not 0.
  - bit 24: A value of 0 will do a copy; a value of 1 will do a set/fill as described above.
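The bit layout above can be sketched as a small decoder. This is only an illustration of the description here, not code from the BIOS; the function name `decode_cpufastset_control` is made up for the example.

```python
def decode_cpufastset_control(r2):
    """Split CpuFastSet's r2 into (word_count, is_fill), per the list above."""
    count = r2 & 0x1FFFFF            # bits 0-20: number of words
    if count != 0:
        count = (count + 7) & ~7     # round up to the next multiple of 8
    is_fill = bool(r2 & (1 << 24))   # bit 24: 0 = copy, 1 = set/fill
    return count, is_fill
```

For example, a request for 9 words with bit 24 set decodes to a 16-word fill.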
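The transfer will not be performed if the size, `sourceAddress & 0xE000000`, or `(size + sourceAddress) & 0xE000000` is 0. In other words, nothing is copied when the length is zero or when the source range starts or ends in the BIOS region.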
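A rough sketch of that check for an HLE implementation follows. It assumes the mask is applied to the source address, since a size built from a 21-bit word count can never set those bits, and that the size is the unrounded byte length. The names `cpufastset_args_valid` and `REGION_MASK` are made up for the example.

```python
REGION_MASK = 0x0E000000  # these address bits are all zero inside the BIOS region

def cpufastset_args_valid(r0, r2):
    """Return True if CpuFastSet would go ahead with the transfer."""
    size = (r2 & 0x1FFFFF) * 4           # byte length (assumption: not yet rounded up)
    if size == 0:
        return False                      # nothing to do
    end = (r0 + size) & 0xFFFFFFFF        # wrap like a 32-bit register
    if (r0 & REGION_MASK) == 0:
        return False                      # source starts in the BIOS region
    if (end & REGION_MASK) == 0:
        return False                      # source ends in the BIOS region
    return True
```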
Outputs
Register | Purpose |
---|---|
r0 | End of copied area |
r1 | End of written area |
r2 | 8th-to-last transferred value |
r3 | 7th-to-last transferred value |
r12 | (r12 & ~0xFE000000) + sourceAddress |
CpuFastSet does not push and pop all the registers it uses for transfers. As a result, registers 2 and 3 still contain some of the copied data.
In the case of a fill, r0 remains unchanged. Otherwise it’s the memory address right after the last copied word. r1 is always the address after the last written word. These could be used to fill a contiguous region of memory from different locations, or vice versa, with only one register change.
If the arguments are not valid, r0, r1, r2, and r3 will remain untouched. r12 will be unchanged only if the size is 0.
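Put together, the register side effects described in this section could be reproduced in an HLE roughly like this. It is a sketch under assumptions, not the BIOS code: it models only the routine's body (whatever the surrounding SWI dispatcher saves or restores is out of scope), it assumes the validity check masks the source address, and `hle_cpu_fast_set`, the `regs` list and the `mem` dictionary are made-up stand-ins for a real emulator's state.

```python
MASK32 = 0xFFFFFFFF

def hle_cpu_fast_set(regs, mem):
    """regs: list of 16 register values; mem: dict of word address -> 32-bit word."""
    src, ctrl = regs[0], regs[2]
    words = ctrl & 0x1FFFFF
    if words == 0:
        return False                        # size 0: nothing changes, r12 included
    regs[12] = (src + words * 4) & MASK32   # r12 is overwritten before the checks
    if (src & 0x0E000000) == 0 or (regs[12] & 0x0E000000) == 0:
        return False                        # rejected: r0-r3 stay untouched
    fill = bool(ctrl & (1 << 24))
    fill_value = mem.get(src, 0)            # only used for a fill
    for _ in range((words + 7) // 8):       # always whole blocks of 8 words
        if fill:
            block = [fill_value] * 8        # r0 is not advanced on a fill
        else:
            block = [mem.get((regs[0] + 4 * i) & MASK32, 0) for i in range(8)]
            regs[0] = (regs[0] + 32) & MASK32
        for i, value in enumerate(block):
            mem[(regs[1] + 4 * i) & MASK32] = value
        regs[1] = (regs[1] + 32) & MASK32
        regs[2], regs[3] = block[0], block[1]   # leftovers from the last block
    return True
```

After a copy, `regs[2]` and `regs[3]` hold the first two words of the final 8-word block, which is what the table calls the 8th-to-last and 7th-to-last transferred values.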
Time
Rants
While looking at this code there were multiple times I found something weird or wanted to slap the programmers who wrote it. In order to spare the people in the Emudev Discord server from me spamming #gba, I am keeping it all constrained to this section.
CpuSet could have been faster if the developers put in less work
While it wasn’t intended to be the method for fast copying, CpuSet would have been much better off copying the code literally two or three subroutines below it in CpuFastSet. The first thing I noticed was this when selecting between 16 and 32 bit transfers.
`r5` gets set to `0` only to be changed to something different if the 16 bit option is set. The `movs r5, #0` could be moved to `_0B78` and save a cycle in some cases, but there’s actually a better solution that I’ll get to soon.
You know how I said they should have copied CpuFastSet? Well they did, in a really weird way.
I guess `ldm`/`stm` are the same speed as `ldr`/`str`, but it’s still a weird decision. They can’t benefit from `ldm`/`stm`’s writeback on halfword transfers, so let’s see what they-what?
Let’s ignore the fact that different pieces of code in the same function doing an almost identical task were written with different styles and use the same registers for different purposes. The only way this can be explained is the author never learning about post-indexing. My solution from a quick modification of CpuFastSet looks like
I could probably save a couple cycles by switching back to the `bge` at the top of the loop. Most importantly this saves a cycle on each transfer, is more consistent with the other copy function, and can easily be used for all four cases with little modification. Remember that `movs r5, #0` earlier? That can be replaced with the `adds r5, r1, r4` below it to save the same number of cycles and cut two bytes.
Conclusion: This is all a conspiracy to artificially drive up business for big-CpuFastSet.
RegisterRamReset is both smart and dumb
- Depends on CpuFastSet’s strange outputs
- Missed a conditional
- Extra write to SIODATA2
- Extra write to (?)
- SOUNDBIAS (?)