For example, suppose you have a contiguous list of sprite addresses and character tiles to be updated.
You might normally do:
.loop:
move.w (a0)+,(a1) // Write sprite address to address port 12 cycles
move.w (a0)+,(a2) // Write tile (character) word to data port 12 cycles
dbra d0, .loop
Here is how you would do it with 32-bit writes:
.loop:
move.l (a0)+,(a1) // 32 bit write to address and data port of *both* sprite address *and* tile/character data = 20 cycles
dbra d0, .loop
So you save 4 cycles per iteration (20 cycles instead of 12 + 12 = 24).
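For the long write to land correctly, the source list has to hold the address word first and the tile word second in each pair, and the data port has to sit at the word directly after the address port. Here is a rough sketch of the setup under those assumptions. SpriteList, PAIR_COUNT, VRAM_ADDR_PORT and the data values are hypothetical names and numbers of mine, not from any real header:

```asm
// Hypothetical source list: one long per update = address word, then tile word
SpriteList:
    dc.w $0040, $1234           // made-up sprite address, made-up tile word
    dc.w $0080, $1235
    dc.w $00C0, $1236
PAIR_COUNT equ 3

    lea SpriteList,a0           // a0 = source list of address/tile pairs
    lea VRAM_ADDR_PORT,a1       // a1 = address port (assumes data port is at a1+2)
    move.w #PAIR_COUNT-1,d0     // dbra runs the loop d0+1 times
.loop:
    move.l (a0)+,(a1)           // high word -> address port, low word -> data port
    dbra d0, .loop
```

Note that d0 is loaded with the pair count minus one, since dbra loops until the counter goes below zero.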
I also *assume* (not 100% sure!) that such loops unroll better with MOVE.L, because both 16-bit writes come at the end of the instruction (3 reads, then 2 writes). That should leave enough natural delay (16 cycles) to meet the VRAM restrictions before the address port is written again.
VRAM Timings from Dev Wiki :

e.g.
move.l (a0)+,(a1) // 20 cycles
move.l (a0)+,(a1) // 20 cycles
move.l (a0)+,(a1) // 20 cycles
move.l (a0)+,(a1) // 20 cycles
etc
But with the standard way of word (16-bit) writes, you would need to insert a NOP delay before writing to the address port again (this requires a 16 cycle delay)
e.g.
move.w (a0)+,(a1) // 12 cycles
move.w (a0)+,(a2) // 12 cycles
nop // 4 cycles
move.w (a0)+,(a1) // 12 cycles
move.w (a0)+,(a2) // 12 cycles
nop // 4 cycles
etc
So a 32-bit write would be even more efficient in this situation, saving 8 cycles per iteration (20 cycles instead of 12 + 12 + 4 = 28). I haven't tried this method myself yet, though.
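If fully unrolling is too big, a partial unroll inside a dbra loop is a middle ground. This is only a sketch, equally untested, and it leans on the same unverified assumption that back-to-back MOVE.L writes meet the VRAM timing. It also assumes the pair count is a multiple of 4 and that d0 holds (pair count / 4) - 1; the .loop4 label is mine:

```asm
// Assumed: a0 = source list of address/tile pairs, a1 = address port
// (data port at a1+2), d0 = (pair count / 4) - 1
.loop4:
    move.l (a0)+,(a1)   // 20 cycles
    move.l (a0)+,(a1)   // 20 cycles
    move.l (a0)+,(a1)   // 20 cycles
    move.l (a0)+,(a1)   // 20 cycles
    dbra d0, .loop4     // 10 cycles while the branch is taken
```

That works out to 4 x 20 + 10 = 90 cycles per four pairs, or 22.5 cycles per pair, against 20 + 10 = 30 per pair for the one-write-per-iteration MOVE.L loop.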
