FOR Paprikarma - Many thanks for passing this on to the right person
Hi Intelecto,
As a recent owner of a GP32, I bought it with the idea of playing old SNES software.
Let me introduce myself. I was a game developer/designer in the late '80s and early '90s. I worked on about twelve games for several machines such as the Amstrad and Oric (8-bit) and the Atari ST and Amiga (16/32-bit). I have fairly good skills in assembly coding, especially in optimization. 2D image processing is nothing abstract to me, nor is ARM coding (I spent a lot of time on the Acorn ARM RISC machines ten years ago).
I still have abundant documentation on the SNES platform too...
So, briefly...
I understand that Snes9xGP is a snes9x-based implementation.
You've done a great job porting snes9x to the GP console.
There are many things in the open-source snes9x core that could be optimized on a RISC platform (particularly in assembly, because the C compiler often doesn't use the full register set).
You recently did some groundwork on the main 65C816/SPC core emulation.
I've noticed that in the original snes9x source code, the main loop is heavily disrupted by checks for HBlank, SPC cycles left, NMI and other CPU flags. The main problem I've seen is the divide by 14, which the ARM cannot handle easily since it has no hardware divide instruction. So the way to go in this case is to normalize the SPC timing units to the CPU ones and avoid that divide. There is another thing to consider: the opcode call table (functions with heavy C stack-frame overhead in cycles) can be optimized. The following is an example of what I'm thinking of (taking care of the S3C2400's associative cache):
//
// (arm mode not thumb)
// ex : 'ORA' immediate opcode simulation (09)
// R12 = main SPC evaluator
// R11 = main CPU event evaluator (HBlank, flags)
// R10 = main 65C816 cycles left before event
// R9 = main 65C816 PrgCounter
// R8 = main 65C816 registers
// R7 = SPC PrgCounter
// R6 = SPC registers
// R5 = CPU to SPC switch (bit stream)
// R4 = Base 65C816 opcode jump table
// R3 = Base SPC opcode jump table
// R0-R2 = free registers (that's not so bad), maybe + R14?
65C016_jump_bootstrap:
LDRB R0,[R9],#1          ; Get the next 8-bit opcode from (R9:PC++)
ADD R15,R4,R0,LSL #8     ; Jump into the table (fixed 256 bytes per opcode processing)
; R4 is the base opcode jump table for the current M and X mode of the 65C816
->
// 256-byte aligned ORA code simulation page (M0X0)
LDRB R1,[R9],#1          ; Immediate value (post-indexed, so PC++)
LDR R0,[R8,#0x4]         ; Load the 65C816 accumulator A
ORRS R0,R0,R1            ; Process the operation (S suffix: ARM Z flag set when the result is zero)
STR R0,[R8,#0x4]         ; Save the 65C816 accumulator A
LDR R2,[R8,#0x8]         ; Load 65C816 flags
AND R2,R2,#0x7D          ; Remove N & Z flags (no S suffix, so the ARM Z flag from ORRS survives)
ORREQ R2,R2,#0x2         ; Put Z flag
AND R0,R0,#0x80          ; Sign
ORR R2,R2,R0             ; Put N flag
STR R2,[R8,#0x8]         ; Save 65C816 flags
;Timing specs
SUBS R10,R10,#2          ; 2 cycles
MOVLE R15,R11            ; No cycles left: return to event catcher
ADDS R5,R5,#0x48000000   ; CPU to SPC fractional time (carry set = time to switch to the SPC)
;Jump to another opcode
LDRCCB R0,[R9],#1        ; Get the next 8-bit opcode (65C816) from (R9:PC++)
ADDCC R15,R4,R0,LSL #8   ; Jump into the table (fixed 256 bytes per opcode processing)
; R4 is the base opcode jump table for the current M and X mode of the 65C816
; If it's time to switch to spc...
LDRCSB R0,[R7],#1        ; Get the next 8-bit opcode (SPC) from (R7:PC++)
ADDCS R15,R3,R0,LSL #8   ; Jump into the table (fixed 256 bytes per opcode processing)
; R3 is the base opcode jump table for the SPC
Hmm, something like this, to be compared with the assembly generated from the C code... Sorry, I don't have your sources to check it in depth...
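To make the timing idea a bit more concrete, here is a minimal C sketch of the carry trick that R5 plays in the listing: a 32-bit fractional accumulator that overflows whenever an SPC step is due, so no divide happens at run time. The 14:1 ratio, the `cpu_spc_clock` struct and the function name are only my assumptions for illustration, not snes9x code, and the increment is derived from that assumed ratio rather than taken from the listing.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: one SPC step is due every 14 CPU cycles (illustrative ratio). */
#define CPU_CYCLES_PER_SPC_STEP 14u

/* 2^32 / 14, rounded up. Folded by the compiler: no divide at run time. */
#define SPC_FRACTION_PER_CPU_CYCLE \
    ((uint32_t)((0x100000000ULL + CPU_CYCLES_PER_SPC_STEP - 1) / CPU_CYCLES_PER_SPC_STEP))

typedef struct {
    uint32_t fraction;   /* plays the role of R5 in the listing */
    uint32_t spc_steps;  /* SPC steps that became due (hypothetical counter) */
} cpu_spc_clock;

/* Account for one CPU opcode; returns 1 when the SPC should run next. */
static int cpu_spc_clock_advance(cpu_spc_clock *clk, uint32_t cpu_cycles)
{
    uint32_t before = clk->fraction;

    /* Keeps the increment below 2^32, so at most one carry per call. */
    assert(cpu_cycles < CPU_CYCLES_PER_SPC_STEP);

    clk->fraction += cpu_cycles * SPC_FRACTION_PER_CPU_CYCLE;

    /* Unsigned wrap-around is the C equivalent of the ARM carry flag. */
    if (clk->fraction < before) {
        clk->spc_steps++;
        return 1;
    }
    return 0;
}
```

Each opcode handler would call `cpu_spc_clock_advance(&clk, 2)` (or whatever its cycle count is) and branch to the SPC side when it returns 1. In assembly the whole function collapses into a single ADDS followed by conditional instructions.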
I know it's a huge job to convert the whole instruction set (even with cool macros). Can I help?
Thanks for replying to topoke@hotmail.com, whether or not you think this could fit the bill.