1

Is there any reason to NOT use pre-shifted hardware coordinates for sprites in Neo Geo games?

I have been optimising the homebrew game Neo Thunder, and it has annoyed me how much CPU time is spent converting the X and (especially) Y screen coordinates of sprites to hardware coordinates. This is done to make sure they are in the right form to update VRAM correctly. The shifts alone (7 places to the left) take 20 CPU cycles each. With a large number of sprites, quite a lot of time is "wasted" doing this.




I think using native hardware coordinates would also make implementing a sprite update buffer (where you save sprite updates for the vertical blank period) much simpler and faster, in the sense that a separate buffer would no longer be needed for many sprites. The X and Y values can now be taken *directly* from the bullet-list, alien-list etc. with no need for duplication or pointers.
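A minimal sketch of the idea in C, for anyone following along. The helper names (`screen_to_hw_x`, `hw_move_x`, `HW_SHIFT`) are made up for illustration and are not from Neo Thunder; the only thing taken from the post is the shift of 7 places. It shows that once a coordinate is stored pre-shifted, moving it by whole pixels only needs an add of a pre-shifted delta, so the per-frame shift goes away.

```c
#include <stdint.h>

/* Shift taken from the post: screen coordinates are shifted 7 places left. */
#define HW_SHIFT 7

/* Hypothetical helper: convert a screen X (in pixels) to a hardware word.
   This is the per-sprite conversion that pre-shifting is meant to avoid. */
static inline uint16_t screen_to_hw_x(int16_t x)
{
    return (uint16_t)((unsigned)(uint16_t)x << HW_SHIFT);
}

/* Hypothetical helper: move an already pre-shifted coordinate by dx pixels.
   The delta is shifted, not the coordinate, so a constant delta such as
   (1 << HW_SHIFT) can be folded in at compile time. */
static inline uint16_t hw_move_x(uint16_t hw_x, int16_t dx)
{
    return (uint16_t)(hw_x + ((unsigned)(uint16_t)dx << HW_SHIFT));
}
```

With this layout a 1 pixel move is an add of 128, which is outside the 1 to 8 range that ADDQ/SUBQ accept.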

Negatives (for Neo Thunder) :

- With the current version of NeoThunder, doing it this way would mean I can no longer use ADDQ, SUBQ to quickly make a bounding box for a bullet in the player bullets VS Aliens collision detection routine (the most time-intensive routine, which is written in inline assembler)

- I also won't be able to use the limit-check trick that works with unsigned numbers where you only need one limit check instead of 2.
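For readers who have not seen the limit-check trick, this is the form it usually takes in C. It is a generic sketch, not code from Neo Thunder, and the function names are invented. It assumes `lo <= hi`.

```c
#include <stdint.h>
#include <stdbool.h>

/* The obvious version: two comparisons. */
static inline bool in_range_two_checks(int16_t v, int16_t lo, int16_t hi)
{
    return v >= lo && v <= hi;
}

/* The trick: subtract the lower limit and compare as unsigned.
   Any v below lo wraps round to a large unsigned value, so a single
   comparison rejects both "too small" and "too large". */
static inline bool in_range_one_check(int16_t v, int16_t lo, int16_t hi)
{
    return (uint16_t)(v - lo) <= (uint16_t)(hi - lo);
}
```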


However I think overall it would be much faster to do it this way because I would save a lot of time throughout the program

Neo Thunder is maybe unique in that every sprite is only 1 character in height and has no sticky bits set - so I can keep these lower bits permanently set and they won't affect comparisons etc. But even in games that don't have this advantage, it would be fast to just OR in the correct settings. (actually thinking about it more, some comparisons will work anyway so the OR is not needed for those)

I don't *need* to do this in order to get the game running smoothly, but it has made me think about the best way to store X and Y values for future projects. And it is always good to save time if possible

What do people think about doing it like this? Are there downsides that I am not seeing?

Thank you 👍

2

I just looked up Sega Megadrive sprites and they are so much easier 🙂 No need for shifts, and y coords increase as they go down the screen. Only downside is that the top left of the screen is 128,128


3

I am 60% of the way through converting the game code to use "pre-shifted hardware coordinates" now, so I thought I would report back in case anyone else is interested in doing this in future.

Firstly I made a mistake about the range trick not being possible - it obviously is, since it actually works with *unsigned numbers*

Single range checks can be slightly trickier and slower though

e.g.

If (JoyLeft and (x <= -24))
... stop player movement to left

becomes something like

if (JoyLeft and ((x + 3072) >= 44800))

Please note this is equivalent to : (before pre-shifting)

if (JoyLeft and ((x + 24) >= 350))
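A quick way to convince yourself the two forms agree is to compare them for every possible X value. This is only a sketch: it assumes X is a 9-bit value (0 to 511) held in the top bits of a 16-bit word after the 7-place shift, and the function names are invented for the example. The constants 24, 350, 3072 and 44800 are the ones from the checks above.

```c
#include <stdint.h>
#include <stdbool.h>

/* Check on plain pixel values, wrapping at 512 as the hardware X does. */
static inline bool left_limit_pixels(uint16_t x_pixels)
{
    return ((x_pixels + 24u) & 511u) >= 350u;
}

/* Same check on pre-shifted values: 24 << 7 = 3072, 350 << 7 = 44800.
   The cast to uint16_t makes the add wrap at 65536, which corresponds
   to the 512-pixel wrap above. */
static inline bool left_limit_hw(uint16_t hw_x)
{
    return (uint16_t)(hw_x + 3072u) >= 44800u;
}
```

Note that both forms treat a pixel value of 488 (which is -24 after wrapping) as inside the limit, because 488 + 24 wraps to 0.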


The other main issue I correctly identified is that ADDQ, SUBQ (these can only be used with numbers 1-8) are no longer possible in many cases. Which I think adds 4 CPU cycles for each ADDI,SUBI instruction that is substituted for them. I am mostly using C but I believe the C Compiler uses ADDQ, SUBQ etc where it can

Some examples :

-It's common in NeoThunder to subtract 1 from every alien and background tile position to move them to the left

-Player bullets are moved by 5 pixels to the right each time.

-And as mentioned in my first post, bounding boxes for the player and enemy bullets are constructed this way. (4 operations per bullet)


So these were all suitable for ADDQ, SUBQ use

The extra time taken does add up for loops with a large number of elements



BUT the saving of doing it this new way, is very large. e.g. each Y screen coordinate to VRAM conversion takes 44 CPU cycles and each X coord takes 20 CPU cycles (just the left shift 7 places needed for X coords!)

With up to 250 individual sprites on screen this adds up to 21 display lines saved
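The arithmetic behind that figure can be sketched like this. The 44 and 20 cycle costs and the 250 sprites are from the post; the 768 CPU cycles per display line is an assumption (a 12 MHz 68000 and a 64 microsecond scanline), so treat the result as an estimate.

```c
#include <stdint.h>

enum {
    CYCLES_PER_Y_CONVERSION = 44,  /* from the post */
    CYCLES_PER_X_CONVERSION = 20,  /* from the post */
    CYCLES_PER_LINE = 768          /* ASSUMPTION: 12 MHz CPU, 64 us per line */
};

/* Total cycles spent converting X and Y for a given number of sprites. */
static inline uint32_t conversion_cycles(uint32_t sprites)
{
    return sprites * (CYCLES_PER_Y_CONVERSION + CYCLES_PER_X_CONVERSION);
}

/* The same cost expressed in display lines, rounded to the nearest line. */
static inline uint32_t cycles_to_lines(uint32_t cycles)
{
    return (cycles + CYCLES_PER_LINE / 2) / CYCLES_PER_LINE;
}
```

For 250 sprites this gives 16000 cycles, which rounds to 21 lines under the assumption above.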


*So far* I think I have made the correct decision to do this. But maybe I will still find a bigger problem, as I continue with the conversion

It's also been quite annoying doing this, since gaps in my understanding made me make several mistakes that caused bugs which took me a while to fix. But I am learning more as I go on.

4

I have realized there is a big issue when collision boxes are off the left side of the screen. e.g. x now becomes 500 instead of what was previously a screen coordinate of -12

I can't really think of a way round this, other than to add an offset to all x coords to make sure all checks are *always* done onscreen. Then when I update VRAM I will have to subtract this offset. This seems like the best way round to do it, because I don't want to slow down the Player Bullets VS Enemies collision detection routine by doing extra arithmetic in those loop(s). e.g. 60 enemies x 20 player bullets = 1200 checks, whereas changing x every time I update 200+ sprites is faster.
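A rough sketch of the offset idea, assuming a bias of 32 pixels (the value that ends up being used later in this thread) and the 7-place shift. The names `biased_hw_x`, `vram_x` and the `X_BIAS_*` constants are hypothetical.

```c
#include <stdint.h>

#define HW_SHIFT      7
#define X_BIAS_PIXELS 32                            /* assumed bias */
#define X_BIAS_HW     (X_BIAS_PIXELS << HW_SHIFT)   /* 4096 */

/* Game logic stores X with the bias added, so anything up to 32 pixels
   off the left edge still has a small positive value and compares
   sensibly against onscreen coordinates. */
static inline uint16_t biased_hw_x(int16_t screen_x)
{
    return (uint16_t)((unsigned)(uint16_t)(screen_x + X_BIAS_PIXELS) << HW_SHIFT);
}

/* Only the VRAM update removes the bias again; the wrap-around then
   happens here instead of inside the collision checks. */
static inline uint16_t vram_x(uint16_t biased)
{
    return (uint16_t)(biased - X_BIAS_HW);
}
```

With this, a screen x of -12 is stored as 20 << 7 for the collision code, and only becomes 500 << 7 at the moment it is written to VRAM.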

Does anyone have any better solutions to this? Thank you

5

FINAL UPDATE : This turned out to be a great way to speed up my program. I have tried to max it out and I am getting 340+ total sprites moving on screen, at 60fps with collision detection on too. Not bad!

Doing it this way - using "hardware coordinates" for everything - was tricky for me to fully understand at first and there are a few things it makes slower. But for the simple shooter I am working on - this gave a nice speed boost overall. Would recommend doing it if you need some extra speed in your game.

6

Thanks for sharing this. Just posting to say it's being read with interest.

7

Clearpaper (./6) :
Thanks for sharing this. Just posting to say it's being read with interest.

Good to hear it has been of interest. Appreciate you letting me know 👍

After finishing this conversion - I now take back what I said about the Megadrive hardware coordinates starting at 128,128 being bad! I now realize that's a better way of doing it. Since you don't get screen border issues like you do on the Neo Geo

Also just to be clear, when I said I was now getting 340+ sprites, that's total sprites (including background and the player-ship). There are 21 player bullets, 140 enemies and 153 enemy bullets. So the collision detection is only between these groups of sprites (+ the player-ship). The overall improvement I am getting is *also* down to a new collision detection routine I wrote (between player bullets and enemies). That is now written in C again but it's much more efficient than before.

The game engine is not actually maxed out yet though - I just ran out of sprite slots I had allocated myself. But I think it's close now to dropping a frame.

8

A final update to say I managed to get all 380 sprites in the game (at 60fps) using this technique! It helped a lot, that I had a sprite update X coord buffer that was *exactly* matched to the X coordinates of the objects. E.g. Update_SpriteX[380]. Then in that array elements 1-32 are the x coords of the background tiles, 33 to 53 are x coords of the player bullets etc. I would use this array *directly* whenever I updated or checked the x coord of any object to avoid duplication

This shows how the sprites are allocated (from my C program)

// ********* SPRITE ALLOCATIONS ***************
// Background starfield 1-32 (32 total)
// Player bullets 33 - 53 (21 total )
// Enemies 54 - 208 (155 total) Explosions *share* sprites 193 - 208 (16 max) but are overwritten (oldest to newest) if more enemies are needed on screen
// Player ship 209, 210 (2 sprites, nose and tail)
// Enemy bullets 211 - 380 (170 total)

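One way to write that allocation down in C so the ranges cannot drift apart is a set of named constants. This is a sketch only: the `SPR_*` names and the accessor are invented, and sizing the array at 381 so that index 0 is left unused is an assumption based on the sprite numbers starting at 1.

```c
#include <stdint.h>

/* Sprite ranges copied from the allocation comment; names are hypothetical. */
enum {
    SPR_STARFIELD_FIRST = 1,   SPR_STARFIELD_LAST = 32,
    SPR_PBULLET_FIRST   = 33,  SPR_PBULLET_LAST   = 53,
    SPR_ENEMY_FIRST     = 54,  SPR_ENEMY_LAST     = 208,
    SPR_EXPLOSION_FIRST = 193, SPR_EXPLOSION_LAST = 208, /* shared with enemies */
    SPR_PLAYER_FIRST    = 209, SPR_PLAYER_LAST    = 210,
    SPR_EBULLET_FIRST   = 211, SPR_EBULLET_LAST   = 380,
    SPR_COUNT           = 380
};

/* One X word per hardware sprite; element 0 is unused (ASSUMPTION). */
static uint16_t update_spriteX[SPR_COUNT + 1];

/* Game code reads and writes the buffer entry directly, so there is no
   second copy of the coordinate to keep in sync. */
static inline uint16_t *player_bullet_x(int i)
{
    return &update_spriteX[SPR_PBULLET_FIRST + i];
}
```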
Here is my current inline assembler routine to update all 380 sprites X positions. Please remember I had to add 32 to all my x values to avoid problems when sprites moved off the left side of the screen, and the coords would wrap round. So in this routine I had to subtract 32 (4096 in hardware coordinates) from each coordinate. It was a shame but I couldn't think of a way round this.

// *** UPDATE X FOR ALL 380 SPRITES (This is very brute force!)
update_spriteXP = update_spriteX; // This is needed because this old version of GCC uses the wrong address for update_spriteX otherwise!
asm volatile (
    "move.l %0, %%a0\n\t"              // load update_spriteX base address into a0. Must use move for pointer
    "addq.l #2,%%a0\n\t"               // add 2 bytes to start at spriteX[1], can remove this line if use a direct pointer instead
    "lea 0x3C0002, %%a1\n\t"           // Load address 0x3C0002 Data Port into a1
    "move.w #0x8401, -2(%%a1)\n\t"     // sprite 1 address to address port
    "move.w #4096, %%d1\n\t"           // USE THIS TO SPEED UP SUBTRACTION
    "move.w #1,2(%%a1)\n\t"            // Set vid_modulo to 1 (0x3C0004) SET THIS ONCE HERE
    ".rept 380\n\t"                    // Repeat the next instructions 380 times
    "move.w (%%a0)+,%%d0\n\t"
    "sub.w %%d1, %%d0\n\t"             // subtract 32 so all boundary calculations are done with onscreen coords
    "move.w %%d0,(%%a1)\n\t"           // Write x coord - 32 to address in a1
    ".endr\n\t"
    : // No output operands
    : "m" (update_spriteXP)            // Input operands: update_spriteX array
    : "d0","d1","a0","a1"              // Clobbered registers
);

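For anyone who finds the unrolled assembler hard to read, the inner loop does roughly the following in plain C. This is a readability sketch, not a replacement: it takes the data port as a pointer parameter so it can be tried on a PC, it leaves out the address and modulo port setup, and a compiled loop like this would most likely be slower than the `.rept` version. The function name and `X_BIAS_HW` are invented.

```c
#include <stdint.h>
#include <stddef.h>

#define X_BIAS_HW 4096u   /* 32 pixels << 7, as subtracted in the routine */

/* Write every buffered X word to the data port with the bias removed.
   On real hardware data_port would be the VRAM data port and the
   auto-increment (modulo) would step to the next sprite after each write. */
static void flush_sprite_x(volatile uint16_t *data_port,
                           const uint16_t *biased_x, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        *data_port = (uint16_t)(biased_x[i] - X_BIAS_HW);
    }
}
```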
I also had a smaller Sprite Y buffer (170 elements) that contained all the enemy bullet y positions. These were also updated every frame. And were also directly manipulated by the program routines. So again no duplication needed


I did *still* need a couple of separate sprite update buffers for the few objects where y positions and graphics tiles change though. I would just store these as VRAM address, update value.. (repeated)
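A buffer of "VRAM address, update value" pairs could look something like this. It is a sketch under assumptions: the struct, the fixed capacity of 64 and the function names are made up, and the two ports are passed in as pointers so the code can be run off-target.

```c
#include <stdint.h>
#include <stddef.h>

/* One queued write: where in VRAM, and what to put there. */
typedef struct {
    uint16_t vram_addr;
    uint16_t value;
} VramUpdate;

#define MAX_UPDATES 64   /* hypothetical capacity */

static VramUpdate pending[MAX_UPDATES];
static size_t pending_count;

/* Queue a write during the frame. Returns 0 if the buffer is full. */
static int queue_update(uint16_t vram_addr, uint16_t value)
{
    if (pending_count >= MAX_UPDATES) {
        return 0;
    }
    pending[pending_count].vram_addr = vram_addr;
    pending[pending_count].value = value;
    pending_count++;
    return 1;
}

/* Replay all queued writes, intended for the vertical blank period. */
static void flush_updates(volatile uint16_t *addr_port,
                          volatile uint16_t *data_port)
{
    for (size_t i = 0; i < pending_count; i++) {
        *addr_port = pending[i].vram_addr;
        *data_port = pending[i].value;
    }
    pending_count = 0;
}
```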

I will release all the code when I finish (optimizing Neo Thunder). It is a simple game, so it's easier to optimize, but many of the techniques would still be useful to build upon.

I also had to convert a couple of simpler (background update and explosion update) routines to 68000 to get to the full 60fps using all 380 sprites. I still want to convert one of the more demanding collision functions to 68000 before I finish because I am right on the limit of dropping a frame atm. Sometimes there is literally no time left at the end of the frame! It varies from 11 to 0 display lines left, with maximum objects on screen.