1

I only found out about this recently, thanks to the Neo Geo Dev Wiki page on Metal Slug 2 slowdown (https://wiki.neogeodev.org/index.php?title=Metal_Slug_2_-_Super_Vehicle-001/II), but you can write to the VRAM address port and the data port in a single instruction with a 32-bit MOVE.L. Apparently this is the cause of some of the slowdown in MS2, because that game ONLY writes to VRAM this way. Ironically(!), I was able to use the same method to speed up some of the VRAM writes in the game I am working on.

For example, say you have a contiguous list of sprite addresses and character tiles to be updated.

You might normally do:

.loop:
    move.w (a0)+,(a1)    ; Write sprite address to address port (12 cycles)
    move.w (a0)+,(a2)    ; Write tile (character) word to data port (12 cycles)
    dbra d0,.loop

Here is how you would do it with 32-bit writes:

.loop:
    move.l (a0)+,(a1)    ; One 32-bit write to the address and data ports: *both* sprite address *and* tile/character data (20 cycles)
    dbra d0,.loop

So you save 4 cycles per loop iteration (20 cycles instead of 12 + 12 = 24).
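
For anyone who wants to try it, here is a minimal sketch of the whole routine. It assumes Motorola-style syntax (e.g. vasm), and the names WriteUpdates, UpdateList and UPDATE_COUNT are made up for the example, as are the table values. The only hardware facts it relies on are that the address port (REG_VRAMADDR) is at $3C0000 and the data port (REG_VRAMRW) sits right after it at $3C0002, so the high word of each longword lands in the address port and the low word in the data port.

```asm
REG_VRAMADDR    equ $3C0000         ; VRAM address port
REG_VRAMRW      equ $3C0002         ; VRAM data port (the next word up)
UPDATE_COUNT    equ 2               ; hypothetical: number of entries in UpdateList

; Hypothetical routine: write every (address, data) pair in UpdateList to VRAM.
WriteUpdates:
        lea     UpdateList,a0       ; a0 = source table
        lea     REG_VRAMADDR,a1     ; a1 = address port; the data port is at 2(a1)
        moveq   #UPDATE_COUNT-1,d0  ; dbra loops d0+1 times
.loop:
        move.l  (a0)+,(a1)          ; high word -> $3C0000, low word -> $3C0002
        dbra    d0,.loop
        rts

; One longword per update: high word = VRAM address, low word = data word.
; The values below are placeholders only.
UpdateList:
        dc.w    $0040,$1234
        dc.w    $0042,$1235
```

The table has to be built in this interleaved address/data order for the trick to work, which is the same layout the word-sized loop above already reads.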


I also *assume* (not 100% sure!) that such loops unroll better with MOVE.L, because both 16-bit writes come at the end of the instruction (3 reads, then 2 writes). That should leave enough natural delay (16 cycles) to meet the VRAM restrictions before the address port is written again.

VRAM Timings from Dev Wiki :

[Image: VRAM-TIMINGS.jpg]

e.g.

    move.l (a0)+,(a1)    ; 20 cycles
    move.l (a0)+,(a1)    ; 20 cycles
    move.l (a0)+,(a1)    ; 20 cycles
    move.l (a0)+,(a1)    ; 20 cycles
    ; etc.

But with the standard word (16-bit) writes, you would need to insert a NOP as a delay before writing to the address port again (it requires a 16-cycle delay).

e.g.

    move.w (a0)+,(a1)    ; 12 cycles
    move.w (a0)+,(a2)    ; 12 cycles
    nop                  ; 4 cycles
    move.w (a0)+,(a1)    ; 12 cycles
    move.w (a0)+,(a2)    ; 12 cycles
    nop                  ; 4 cycles
    ; etc.

So a 32-bit write would be even more efficient in this situation, saving 8 cycles per iteration (20 cycles instead of 12 + 12 + 4 = 28). I haven't tried this method myself yet, though.

2

I was researching this more deeply yesterday, and it seems that the order of the read/write cycles in a MOVE.L (a0)+,(a1) instruction is probably:

Read (4 cycles), Read (4 cycles), Write (4 cycles), Write (4 cycles), Read (4 cycles) = 20 cycles total

This seems counterintuitive: why is there a "Read" at the end? It's because the read at the end is a prefetch (the 68000 has a small 3 word prefetch queue). It's also possible that one of the other reads is a prefetch too.

Assuming I am right (can anyone confirm?), the 2nd example in my previous post should still work, because there is still a 16-cycle delay before the VRAM address port is written again.

The 68000 is more complicated than I thought. I had just assumed that if the manual said 3 reads and 2 writes, they would be sequential!


The info on pre-fetch comes from here : https://pasti.fxatari.com/68kdocs/68kPrefetch.html

MOVE instructions. Most variants, except the ones noted below

1) Perform as many prefetch cycles as extension words are in the source operand (optional).

2) Read source operand (optional if source is register).

3) Perform as many prefetch cycles as extension words are in the destination operand (optional).

4) Writes memory operand.

5) Perform last prefetch cycle.
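
To make the ordering concrete, here are those steps applied to the two instructions from this thread, written as comments next to each instruction. It is only a sketch of my reading of the rule: it assumes 4 clocks per bus cycle with no wait states, and that a long write to (a1) puts the high word on the bus first, which is what I would expect but have not measured.

```asm
; MOVE.W (a0)+,(a1)      12 cycles, 2 reads / 1 write
;   step 1: source has no extension words   -> nothing
;   step 2: read source word                -> Read          (cycles  0-3)
;   step 3: dest has no extension words     -> nothing
;   step 4: write destination word          -> Write         (cycles  4-7)
;   step 5: last prefetch                   -> Read          (cycles  8-11)
        move.w  (a0)+,(a1)

; MOVE.L (a0)+,(a1)      20 cycles, 3 reads / 2 writes
;   step 1: source has no extension words   -> nothing
;   step 2: read source long (two words)    -> Read, Read    (cycles  0-7)
;   step 3: dest has no extension words     -> nothing
;   step 4: write destination long          -> Write, Write  (cycles  8-15)
;   step 5: last prefetch                   -> Read          (cycles 16-19)
        move.l  (a0)+,(a1)
```

Neither addressing mode here uses extension words, so steps 1 and 3 do nothing and the only prefetch is the final one.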

3

What's the point of the last prefetch though? 🤔
Highway Runners, my Outrun-style racing game, released on December 14, 2016! Feel free to support me :)

https://itunes.apple.com/us/app/highway-runners/id964932741

4

Brunni (./3) :
What's the point of the last prefetch though? 🤔

My understanding is that all of the prefetch cycles fetch words for the *next* instruction, the one that follows this one. That includes the last read too. I think most instructions spend more read cycles on prefetching than on reading the data they actually need themselves!

This is just what lets the 68000 run at the speed it does. Without prefetching it would be slower. It means the CPU can start reading and decoding the next instruction while it is still running the current one.

You can see here with NOP (which has no operands and is one word long).

The prefetch queue is re-filled during the execution of an instruction. Most instructions perform as many prefetch cycles as the total number of words in the instruction. For example NOP, the simplest instruction, it performs one prefetch at the address two words (four bytes) above the location of the NOP

5

I have managed to confirm this is correct now :

Cosmic Riko (./2) :
I was researching deeper into this yesterday , and it seems that the order of the read/write cycles in a MOVE.L (a0)+,(a1) instruction is probably :

Read (4 cycles), Read (4 cycles), Write (4 cycles), Write (4 cycles) Read (4 cycles) = 20 cycles total

A guy on an Atari ST forum showed me this document, which has accurate cycle and bus usage stats for each 68000 instruction (tested on real hardware):

https://gist.github.com/cbmeeks/e759c7061d61ec4ac354a7df44a4a8f1#file-yacht-txt


Incidentally, my 68000 manual (7th edition) has the wrong cycle times for a few instructions. I checked the 9th edition online and they are wrong there too. BUT strangely, they are correct in a very early edition of the manual from 1983!

On the Neo Geo Dev Wiki these are correct, but I did notice that another cycle time was wrong (ADDQ.B/W to memory was listed as 12+ when it should be 8+).

When I get time I will go through all the instructions on the wiki to make sure they are right. I'm sure most of them are.