Razoola (./59):
Yup, it's a fast way and only marginally slower (2 scanlines) than using longwords, defo faster than the original method. I did not think it would be so close to longword speed in this instance. The method I mentioned, altering the buffer format and using longwords, is going to give a little more (45 or just into 46 scanlines), but there is the buffer format change to think about, which might cancel that gain out and more.
In my years of experience I have found that longwords give the maximum speed when copying memory around, but I did not realise that movem with words can sometimes get you almost as close. So I learned something from your initial idea, Dresbenboy. Maybe you learned that with movem it's sometimes good to use an address register for postincrement addressing rather than as storage. And blaster has been taken out of the world of move.l (a0)+,(a1)+ and introduced to the world of movem. We all gained something, and that's good for everyone.
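To put rough numbers on why word-sized movem lands so close to longword copies, here is a small Python cycle model. It is a sketch under stated assumptions: stock MC68000 timings with zero wait states (move.w (An)+,(An)+ = 12, move.l (An)+,(An)+ = 20, movem load via (An)+ = 12 + 4n for .w or 12 + 8n for .l, movem store to (An) = 8 + 4n or 8 + 8n), with loop overhead and pointer bookkeeping ignored. The function names are hypothetical. Under these assumptions both movem forms approach about 4 cycles per byte as the register count grows, because the 16-bit bus needs one 4-cycle read and one 4-cycle write per word; the .w form just spreads its fixed overhead over half as many bytes.

```python
"""Rough 68000 cycle model for memory copies (a sketch, not a measurement).

Assumptions: plain MC68000 timings with zero wait states; loop overhead and
pointer bookkeeping (e.g. an lea to advance the destination after a movem
store to (An)) are ignored.
"""

# Single move with post-increment on both sides: (cycles, bytes moved)
MOVE_TIMINGS = {
    "move.w (a0)+,(a1)+": (12, 2),
    "move.l (a0)+,(a1)+": (20, 4),
}


def move_cycles_per_byte(instr: str) -> float:
    """Cycles per byte for one unrolled move instruction."""
    cycles, nbytes = MOVE_TIMINGS[instr]
    return cycles / nbytes


def movem_cycles_per_byte(nregs: int, long: bool) -> float:
    """Cycles per byte for a movem load from (An)+ followed by a movem store to (An).

    Load: 12 + k*n cycles, store: 8 + k*n cycles, where k is 4 for .w and 8
    for .l, and n is the number of registers transferred. In practice fewer
    than 16 registers are free, since the source/destination pointers (and
    usually A7) are in use.
    """
    if not 1 <= nregs <= 16:
        raise ValueError("movem can transfer 1 to 16 registers")
    per_reg = 8 if long else 4
    nbytes = nregs * (4 if long else 2)
    cycles = (12 + per_reg * nregs) + (8 + per_reg * nregs)
    return cycles / nbytes


if __name__ == "__main__":
    for instr in MOVE_TIMINGS:
        print(f"{instr:22s} {move_cycles_per_byte(instr):.3f} cycles/byte")
    for n in (4, 8, 12):
        wd = movem_cycles_per_byte(n, long=False)
        lg = movem_cycles_per_byte(n, long=True)
        print(f"movem n={n:2d}: .w {wd:.3f}  .l {lg:.3f} cycles/byte")
```

With these numbers, movem.w breaks even with unrolled move.l (a0)+,(a1)+ at 10 registers (5 cycles per byte) and pulls ahead beyond that. Real Neo Geo code also pays for loop control, pointer updates and any wait states, so this model will not reproduce the scanline figures quoted above; it only shows the trend.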

I have only limited knowledge of the Neo Geo's hardware details, so I avoided the A7 variant.

BTW, I coded the DIFF demo in Easy68k, without any real HW or even a set-up emulator and development environment.
I come from the C64 (with crazy HW effects achieved by tricking the VIC's flip-flops etc.), the Amiga (hence my 68k experience) and x86 (since the 386). The Neo Geo is an interesting new experience for me!