TIGCCLIB sprite routines...

1

On 11/07/2009 at 17:49

This topic aims at discussing the extensions and optimizations to the TIGCCLIB sprite routines (currently, Sprite8, Sprite16 & Sprite32).

First, a little bit of history.
There have been at least three rounds of optimizations and extensions to the TIGCCLIB sprite routines:
[ul][li]a minor optimization on the address computation. I sent it to Kevin in May 2002, i.e. roughly when the same optimization was applied to ExtGraph.[/li]
[li]further optimization on the address computation + a change in the Sprite32 algorithm: like in ExtGraph, operations can be made on 1 long + 1 short instead of two longs, one of which has a shift count > 16. I sent the modified routines to Kevin in October 2003, i.e. roughly when the same optimizations were applied to ExtGraph.[/li]
[li]Joey Adams' (MrJoey / joeyadams) work on sprite routines: rewrite in assembly; new SPRT_RPLC (AND+OR the same sprite in a single call) drawing mode; new routines: generic (multi-mode) clipped routines, single-mode non-clipped and clipped routines.[/li][/ul]
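The Sprite32 change mentioned in the second item can be illustrated with a small portable C model. This is only a sketch, not the TIGCCLIB or ExtGraph code: the helper names (`xor16`, `xor32`, `row_two_longs`, `row_long_short`, `rows_agree`) are made up for illustration, XOR is picked arbitrarily as the drawing operation, and it assumes the row address is word-aligned so that the bit offset `s` is in the range 0..15.

```c
#include <stdint.h>
#include <string.h>

/* Big-endian helpers: the 68000 stores the most significant byte first. */
static void xor16(uint8_t *p, uint16_t v) {
    p[0] ^= (uint8_t)(v >> 8);
    p[1] ^= (uint8_t)v;
}
static void xor32(uint8_t *p, uint32_t v) {
    xor16(p, (uint16_t)(v >> 16));
    xor16(p + 2, (uint16_t)v);
}

/* Two longs: the second one needs a shift count of 32 - s, i.e. > 16. */
static void row_two_longs(uint8_t *addr, uint32_t data, unsigned s) {
    xor32(addr, data >> s);
    /* s == 0 is special-cased only because a 32-bit shift by 32 is undefined in C. */
    xor32(addr + 4, s ? data << (32 - s) : 0);
}

/* One long + one short: only the low s bits (s <= 15) spill past the first long. */
static void row_long_short(uint8_t *addr, uint32_t data, unsigned s) {
    xor32(addr, data >> s);
    xor16(addr + 4, (uint16_t)((data & 0xFFFFu) << (16 - s)));
}

/* Returns 1 if both variants produce the same bytes for every offset 0..15. */
static int rows_agree(uint32_t data) {
    for (unsigned s = 0; s < 16; s++) {
        uint8_t a[8], b[8];
        memset(a, 0x5A, sizeof a);
        memset(b, 0x5A, sizeof b);
        row_two_longs(a, data, s);
        row_long_short(b, data, s);
        if (memcmp(a, b, sizeof a) != 0) return 0;
    }
    return 1;
}
```

Both variants change the same pixels; the second one touches 6 bytes per row instead of 8 and never shifts by more than 16.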
A tiny subset (Sprite8, Sprite16, Sprite32 with SPRT_RPLC support) of Joey's work of extension+optimization was merged in GCC4TI Beta 10, after further optimization (conversion of branches to explicit short form; address computation; reordering of basic blocks; Sprite32 algorithm change).
And I've just noticed that we can squeeze two more bytes out of all three routines, by using an optimization I described in the S1P9 tutorial: subq.w #1; beq.s; addq.w #1 instead of cmpi.w #1; beq.s; tst.w.
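Why the shorter sequence is equivalent can be checked with a small C model. This is a sketch under assumptions: the post does not say which register or value is tested, nor what follows the `tst.w`, so the model assumes a 16-bit value in a data register and a second branch-if-zero after the `tst.w` / `addq.w`. The function names are invented for the example. The byte counts in the comments are the standard 68000 encodings: `cmpi.w #imm,Dn` is 4 bytes, while `subq.w`, `addq.w`, `tst.w` on a data register and `beq.s` are 2 bytes each.

```c
#include <stdint.h>

/* Which way the two-step dispatch goes. */
enum target { IS_ONE, IS_ZERO, OTHER };

/* cmpi.w #1,d0 ; beq.s ; tst.w d0   -> 4 + 2 + 2 = 8 bytes */
static enum target dispatch_cmpi(uint16_t d0) {
    if (d0 == 1) return IS_ONE;   /* cmpi.w #1,d0 / beq.s */
    if (d0 == 0) return IS_ZERO;  /* tst.w d0, then an assumed beq.s */
    return OTHER;
}

/* subq.w #1,d0 ; beq.s ; addq.w #1,d0 -> 2 + 2 + 2 = 6 bytes */
static enum target dispatch_subq(uint16_t d0) {
    d0 = (uint16_t)(d0 - 1);      /* subq.w #1: Z is set iff the old value was 1 */
    if (d0 == 0) return IS_ONE;   /* note: d0 is left decremented on this path */
    d0 = (uint16_t)(d0 + 1);      /* addq.w #1: restores d0, Z is set iff it is 0 */
    if (d0 == 0) return IS_ZERO;
    return OTHER;
}

/* Counts the 16-bit values for which the two sequences disagree. */
static int dispatch_mismatches(void) {
    int bad = 0;
    for (uint32_t v = 0; v <= 0xFFFFu; v++)
        if (dispatch_cmpi((uint16_t)v) != dispatch_subq((uint16_t)v)) bad++;
    return bad;
}
```

The one difference to keep in mind is that the value is destroyed on the "equal to 1" path, which only matters if that path still needs it.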

Joey's work, containing an extra-strong test procedure (exhaustive comparison against ExtGraph routines, with buffer overflow detection), can be downloaded at http://www.funsitelots.com/pub/Sprite_8_16_32_Stable.zip . Two ExtGraph routines were fixed in October 2005 thanks to Joey's work.
A very slightly accelerated and modularized (#defines to individually enable each of the 12 subsets among the 48 tested routines) version of the test program is available within http://www.funsitelots.com/pub/Sprite_8_16_32-20090626.tar.bz2 .

Where do we want to put the slider on the size optimization vs. speed optimization tradeoff?

Joey's routines tend to use an external routine for the address computation and/or clipping. This decreases code size if more than one routine is used (especially for clipped routines), but increases it if only one of them is used... And, when more than one drawing mode is used, I'm rather unconvinced that it makes real sense to use two or more routines of the same family (say, ClipSprite16{,AND,OR,RPLC,XOR}):
[ul][li]obviously, it hardly makes sense to use a generic routine and a specialized routine of the same family;[/li]
[li]if size is what matters (that would be the use case for TIGCCLIB, I'm told, though TIGCCLIB is not completely size-optimized), then the generic routine with inlined address computation and/or clipping should be used: the generic routine is smaller than two specialized routines, even if the address computation and/or clipping used by the specialized routines is externalized;[/li]
[li]if speed is what matters (that would be the use case for specialized libraries: ExtGraph and Genlib), specialized routines should obviously be used, and a function call is not a step in the direction of fulfilling the goal of making fast routines.[/li][/ul]
Pushing the address computation and/or clipping to an external routine is a tradeoff between size optimization and speed optimization, but I'm not sure I see the point of that particular tradeoff: in at least one common use case (only a generic routine used), it makes the code both larger and slower.
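For readers who have not looked at the routines, here is a rough C model of what "a generic routine with inlined address computation" means. It is a sketch, not the TIGCCLIB source (which is assembly): the name `sprite8_generic`, the `M_*` mode values and the 30-bytes-per-row plane layout are assumptions made for the example.

```c
#include <stdint.h>

#define ROW_BYTES 30  /* assumed: 240-pixel-wide plane, 1 bit per pixel */

/* Hypothetical mode values for this sketch; the real SPRT_* constants may differ. */
enum mode { M_XOR, M_OR, M_AND, M_RPLC };

/* Generic (multi-mode), non-clipped, 8-pixel-wide sprite.
 * The caller must keep the whole sprite inside the plane. */
static void sprite8_generic(int x, int y, int h, const uint8_t *sprite,
                            uint8_t *plane, enum mode mode) {
    /* Inlined address computation: y*30 computed as ((y << 4) - y) << 1. */
    uint8_t *addr = plane + (((y << 4) - y) << 1) + (x >> 3);
    unsigned shift = 8u - (unsigned)(x & 7);  /* position in a 16-bit window */

    for (; h > 0; h--, addr += ROW_BYTES) {
        uint16_t data = (uint16_t)((unsigned)*sprite++ << shift);
        uint16_t keep = (uint16_t)~(0xFFu << shift);  /* pixels outside the sprite */
        uint16_t dest = (uint16_t)((addr[0] << 8) | addr[1]);

        switch (mode) {
        case M_XOR:  dest ^= data; break;
        case M_OR:   dest |= data; break;
        case M_AND:  dest &= (uint16_t)(data | keep); break;
        case M_RPLC: dest = (uint16_t)((dest & keep) | data); break; /* AND, then OR */
        }
        addr[0] = (uint8_t)(dest >> 8);
        addr[1] = (uint8_t)dest;
    }
}
```

In the externalized variant, the first two statements of the function body would become a call to a shared helper, which costs a call and return on every sprite drawn.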

What do YOU think?

Member of the TI-Chess Team.
Co-maintainer of GCC4TI (GCC4TI online documentation), TIEmu and TILP.
Co-admin of TI-Planet.

2

On 13/07/2009 at 19:37

I'm going to try reformulating my previous post.

In Joey's implementation, the code that is functionally common (address computation and/or clipping) to the routines of a given family (Sprite8, ClipSprite16, etc.) is externalized to a routine called by the generic routine and all four specialized routines.

Let's describe several use cases, based on two axes: speed optimization vs. size optimization, and number of drawing modes used in the program:
[ul][li]size optimization matters, a single drawing mode used in the whole program: a specialized routine should be used, obviously;[/li]
[li]size optimization matters, multiple drawing modes used in the whole program: the generic routine should be used, obviously;[/li]
[li]speed optimization matters, a single drawing mode used in the whole program: a specialized routine should be used, obviously;[/li]
[li]speed optimization matters, multiple drawing modes used in the whole program: specialized routines should be used, probably.[/li][/ul]
In all four use cases, externalizing common code to a routine yields a slower program (which is not a step in the right direction for the last two use cases), and the program is larger in the first three use cases (which is not a step in the right direction for the first two use cases).
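The size side of this argument can be written down as a tiny model. It is a deliberate simplification with hypothetical names: all routines of a family are assumed to have the same body size, and none of the parameters are measured values.

```c
/* All sizes in bytes; every field is a placeholder, not a measurement. */
struct sizes {
    int body;    /* drawing loop of one routine, without the common code */
    int common;  /* address computation and/or clipping */
    int call;    /* extra bytes per routine to call an external helper */
};

/* n routines of one family, common code copied into each of them. */
static int footprint_inlined(struct sizes s, int n) {
    return n * (s.body + s.common);
}

/* n routines of one family, common code present once and called by each. */
static int footprint_externalized(struct sizes s, int n) {
    return n * (s.body + s.call) + s.common;
}
```

With a single routine (n = 1), the externalized layout is always larger, by exactly `call` bytes. It only becomes smaller when `common * (n - 1) > n * call`, which requires at least two routines of the same family in the program, i.e. the fourth use case.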

I completely understand that the tradeoffs in TIGCCLIB can be different from those made in fast libraries such as ExtGraph and Genlib (the two surviving libraries of graphical functions), but we could do better than:
[ul][li]either unconditionally externalizing the address computation and/or clipping (even in the generic routine), which adversely affects performance in all situations, and footprint in most of them;[/li]
[li]or unconditionally inlining the address computation and/or clipping, even in the specialized routines, which helps performance in all use cases, but hurts footprint in the fourth use case (I'm not sure about Genlib, but that's what ExtGraph does).[/li][/ul]
Since it doesn't make sense to use both the generic routine and one or more specialized routines, what if the following tradeoff were implemented?
[ul][li]inline the address computation and/or clipping in the generic routine;[/li]
[li]leave the address computation and/or clipping externalized for the specialized routines.[/li][/ul]
Doing so would fix the size of the generic routine, while limiting the footprint of multiple specialized routines, which would be a slightly better tradeoff IMO.
If speed and use of powerful graphical functions really matter, then people won't be using TIGCCLIB anyway, so we can leave the "high speed, high footprint" use cases to ExtGraph and Genlib.
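The specialized half of the proposed tradeoff could look roughly like this in C. Again, this is only a sketch: `sprite8_addr`, `sprite8_or` and `sprite8_xor` are hypothetical names, the 30-bytes-per-row layout is assumed, and the real routines are written in assembly.

```c
#include <stdint.h>

#define ROW_BYTES 30  /* assumed: 240-pixel-wide plane, 1 bit per pixel */

/* External helper shared by the specialized routines: the common code exists once. */
static uint8_t *sprite8_addr(int x, int y, uint8_t *plane, unsigned *shift) {
    *shift = 8u - (unsigned)(x & 7);
    return plane + y * ROW_BYTES + (x >> 3);
}

/* Specialized OR routine: small body, one call to the helper. */
static void sprite8_or(int x, int y, int h, const uint8_t *sprite, uint8_t *plane) {
    unsigned shift;
    uint8_t *addr = sprite8_addr(x, y, plane, &shift);
    for (; h > 0; h--, addr += ROW_BYTES) {
        unsigned data = (unsigned)*sprite++ << shift;
        addr[0] |= (uint8_t)(data >> 8);
        addr[1] |= (uint8_t)data;
    }
}

/* Specialized XOR routine: same shape, different operation. */
static void sprite8_xor(int x, int y, int h, const uint8_t *sprite, uint8_t *plane) {
    unsigned shift;
    uint8_t *addr = sprite8_addr(x, y, plane, &shift);
    for (; h > 0; h--, addr += ROW_BYTES) {
        unsigned data = (unsigned)*sprite++ << shift;
        addr[0] ^= (uint8_t)(data >> 8);
        addr[1] ^= (uint8_t)data;
    }
}
```

A program that only needs one of these pays for the helper call without any size benefit, which is why the generic routine would keep its own inline copy of the same computation.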

What do YOU think?


3

On 13/07/2009 at 22:16

My personal opinion is that I don't care at all about the TIGCCLIB sprite routines. I have never used them, nor do I plan to use them.

4

On 14/07/2009 at 16:57

I understand, because I think that integrating ExtGraph would be better. And someone who wants a powerful GFX static lib will use it, not the TIGCCLIB functions.

Nevertheless, I think that if I need speed, I won't care about the size of the routines I use. So I think the specialized routines shouldn't share any code with any other routine.

5

On 23/07/2009 at 09:35

If size is what matters, it doesn't make a lot of sense to use both a non-clipped routine (say, Sprite8) and its clipped sibling (say, ClipSprite8) in a given program. In such a case, only the clipped routine (which should have no subroutines, for both size and speed optimization) should be used.
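To show what "a clipped routine with no subroutines" involves, here is a C sketch of an 8-pixel-wide clipped routine with both the clipping and the address computation inline. It is not the committed GCC4TI code: the name `clip_sprite8_or` is hypothetical, a 240x128 plane with 30 bytes per row is assumed, and only OR mode is shown to keep it short (a generic routine would add the mode dispatch in the loop).

```c
#include <stdint.h>

#define ROW_BYTES 30   /* assumed plane layout: 240x128, 1 bit per pixel */
#define WIDTH     240
#define HEIGHT    128

/* Clipped OR-mode 8-pixel-wide sprite; no helper routine is called. */
static void clip_sprite8_or(int x, int y, int h, const uint8_t *sprite,
                            uint8_t *plane) {
    /* Trivial rejection: nothing of the sprite is visible. */
    if (h <= 0 || y >= HEIGHT || y + h <= 0 || x <= -8 || x >= WIDTH) return;

    /* Vertical clipping: skip rows above the plane, drop rows below it. */
    if (y < 0) { sprite -= y; h += y; y = 0; }
    if (y + h > HEIGHT) h = HEIGHT - y;

    /* Bias x by 8 so that the shift and column are computed on a positive value. */
    int xx = x + 8;                          /* 1..247 after rejection */
    int col = (xx >> 3) - 1;                 /* left byte column, -1..29 */
    unsigned shift = 8u - (unsigned)(xx & 7);
    uint8_t *row = plane + y * ROW_BYTES;

    for (; h > 0; h--, row += ROW_BYTES) {
        unsigned data = (unsigned)*sprite++ << shift;
        /* Horizontal clipping: skip whichever byte falls outside the row. */
        if (col >= 0) row[col] |= (uint8_t)(data >> 8);
        if (col + 1 < ROW_BYTES) row[col + 1] |= (uint8_t)data;
    }
}
```

Since this routine handles every position that the non-clipped one does, a size-conscious program can use it everywhere and leave the non-clipped sibling out.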


6

On 23/07/2009 at 12:43

I agree.
Clipped routines are better, since TIGCCLIB users are probably not advanced users.

7

On 24/07/2009 at 18:59

Generic clipped sprite routines with inline address computation & clipping committed at r1361.
