GBA BIOS Notes

In the process of High Level Emulating (HLE) the BIOS for my emulator, I noticed a lack of information on timings and unexpedted/undefined output. This page documents my (very limited) findings and gives me a place to rant about dumb things I found in the code.

All code is from the disassembly at https://github.com/PikalaxALT/gba_bios


SWI Functions

Number Name Inputs Outputs
0x00 SoftReset none reset
0x01 RegisterRamReset none r0, r1, r2, r3
0x0B CpuSet r0, r1, r2 todo
0x0C CpuFastSet r0, r1, r2 r0, r1, r2, r3, r12
0x0D BiosChecksum none r0, r2, r3, r12

Input and output registers are listed for expected behavior. If a register is not listed, it does not get changed. If a register is crossed out, that means it gets trashed or set to something useless/undefined. I will still do my best to document these.

CpuFastSet

Copies data from one location to another using ldmia r0!, {r2-r9}; stmia r1!, {r2-r9} or sets it to the value of [r0] using stmia r1!, {r2-r9}

Inputs

Register Purpose
r0 Source address
r1 Destination address
r2 Length and source mode
  • r2:
    • bits 0-20: The number of words to transfer. This will be rounded up to the nearest multiple of 8 if it is not 0.
    • bit 24: A value of 0 will do a copy; a value of 1 will do a set/fill as described above.

The transfer will not be performed if the size, size & 0xE000000, or (size + sourceAddress) & 0xE000000 is 0.

Outputs

Register Purpose
r0 End of coppied area
r1 End of written area
r2 8th last transfered value
r3 7th last transfered value
r12 (r12 & ~0xFE000000) + sourceAddress

CpuFastSet does not push and pop all the registers it uses for transfers. As a result, registers 2 and 3 still contain some of the coppied data.

In the case of a fill, r0 remains unchanged. Otherwise it’s the memory address right after the last coppied word. r1 is always the address after the last written word. These could be used to fill a continuous region of memory from different locations or vice-versa with only one register change.

If the arguments are not valid r0, r1, r2, and r3 will remain untouched. r12 will be unchanged only if the size is 0.

Time


Rants

While looking at this code there were multiple times I found something weird or wanted to slap the programmers who wrote it. In order to spare the people in the Emudev Discord server from me spamming #gba, I am keeping it all constrained to this section.

CpuSet could have been faster if the developers put in less work

While it wasn’t intended to be the method for fast copying, CpuSet would have been much better off copying the code literally two or three subroutines below it in CpuFastSet. The first thing I noticed was this when selecting between 16 and 32 bit transfers.

movs r5, #0
lsrs r3, r2, #0x1b
bcc _0B78 ; 32 bit
adds r5, r1, r4
... ; 16 bit

r5 gets set to 0 only to be changed to something different if the 16 bit option is set. The movs r5, #0 could be moved to _0B78 and save a cycle in some cases, but there’s actually a better solution that I’ll get to soon.

You know how I said they should have coppied CpuFastSet? Well they did, in a really weird way.

	ldm r0!, {r3}
_0B66: ; Fixed source address
  cmp r1, r5
	bge _0B96
	stm r1!, {r3}
	b _0B66
_0B6E: ; Updated source address
	cmp r1, r5
	bge _0B96
	ldm r0!, {r3}
	stm r1!, {r3}
	b _0B6E

I guess ldm/stm are the same speed as ldr/str, but it’s still a weird decision. They can’t benefit from ldm/stm’s writeback on halfword transfers, so let’s see what they-what?

	ldrh r3, [r0]
_0B80:
	cmp r5, r4
	bge _0B96
	strh r3, [r1, r5]
	adds r5, r5, #2
	b _0B80
_0B8A:
	cmp r5, r4
	bge _0B96
	ldrh r3, [r0, r5]
	strh r3, [r1, r5]
	adds r5, r5, #2
	b _0B8A

Let’s ignore the fact that different pieces of code in the same function doing an almost identical task were written with different styles and use the same registers for different purposes. The only way this can be explained is the author never learning about post-indexing. My solution from a quick modification of CpuFastSet looks like

    adds r5, r1, r4
. . .
_0B8A:
    cmp r1, r5
    ldrlth r3, [r0], 2
    strlth r3, [r1], 2
    blt _0B8A
    b _0B96

I could probably save a couple cycles by switching back to the bge at the top of the loop. Most importantly this saves a cycle on each transfer, is more consistent with the other copy function, and can easily be used for all four cases with little modification. Remember that movs r5, #0 earlier? That can be replaced with the adds r5, r1, r4 below it to save the same number of cycles and cut two bytes.

Conclusion: This is all a conspiracy to artificially drive up business for big-CpuFastSet.

RegisterRamReset is both smart and dumb

  • Depends on CpuFastSet’s strange outputs
  • Missed a conditional
  • Extra write to SIODATA2
  • Extra write to (?)
  • SOUNDBIAS (?)
 Date: April 13, 2022
 Tags: 

Previous
⏪ Hello World

Next
Emudev Memes ⏩