This text file and most others in this new GS game folder
for Super Mario Bros IIgs can be found at the program author's website URLs below:

http://www.d.umn.edu/~lscharen/code0998/

(standard item directory)

or

http://www.d.umn.edu/~lscharen/
(html document pages - WWW)
------------------------------------------------------------------------
This document describes a new type of compiled sprite I have constructed 
for my programming use.


------------------------------------------------------------------------

Everyone who has used compiled sprites has been impressed by their speed, 
but gritted their teeth over their numerous shortcomings, e.g. no clipping 
and difficult special effects.

To address these concerns I have created a type of compiled sprite that 
has the following features: 
• Self-clipping: define a minimum and maximum X and the sprite will 
  clip itself.
• Vertical flipping: useful for effects and for reducing code size.
• Independent settings for each scanline.
• Support for simple overlap management: useful for constructing 
  pseudo-backgrounds out of sprites without slowdown.


------------------------------------------------------------------------

First let's define a few things. The sprite itself is described as an 
array of pointers to scanline functions. Each scanline function is 
responsible for drawing one horizontal line of the sprite.


Sprite	dc	a4'scanline1'
	dc	a4'scanline2'
	dc	a4'scanline2'
	dc	a4'scanline5'
	dc	a4'scanline1'
	.
	.
	.

Notice that the same scanline can be used more than once. This is 
helpful when horizontal slices of the sprite are similar, e.g. a solid 
box.
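
For readers who think in C, here is a rough model of the idea (an 
illustration, not IIgs code): a sprite is just an array of pointers to 
scanline routines, and one routine can sit in several slots. The screen 
array, WIDTH, LINES and the two routines are made up for the example; 
real scanline routines take their arguments in registers, as described 
below.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define WIDTH 8   /* words per screen line in this model */
#define LINES 5   /* sprite height */

/* A scanline routine draws one horizontal line of the sprite. */
typedef void (*scanline_fn)(uint16_t *line);

/* Two made-up scanline routines with their opaque words built in. */
static void scanline1(uint16_t *line) { line[0] = 0x1111; line[1] = 0x1111; }
static void scanline2(uint16_t *line) { line[0] = 0x2222; }  /* word 1 untouched */

/* The sprite: one pointer per line, and entries may repeat. */
static const scanline_fn sprite[LINES] = {
    scanline1, scanline2, scanline2, scanline1, scanline1
};

/* Draw the sprite into a LINES x WIDTH screen, one line at a time. */
static void draw_sprite(uint16_t screen[LINES][WIDTH]) {
    for (int i = 0; i < LINES; i++) {
        assert(sprite[i] != NULL);   /* every slot must name a routine */
        sprite[i](screen[i]);
    }
}
```

Lines 1 and 2 share scanline2, so that routine's code exists only once.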

Now, what about the scanline functions? Each one contains a header and 
several data tables, and expects certain parameters to be passed in the 
registers:

X-reg:	byte offset of the rightmost word of the sprite to be drawn
Y-reg:	byte offset of the leftmost word of the sprite to be drawn
DP:	screen address of the left side of the sprite's scanline (word 0)
SP:	address of the rightmost side of the scanline

Note: X-reg must be >= Y-reg or else a crash is almost certain.
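
A sketch of how a caller might compute these four values from a 
per-line clip window. It assumes word-granular clipping (r_mask and 
l_mask left at zero), offsets counted in bytes from word 0 of the 
sprite, and 16-bit addresses within the bank being drawn to. The names 
entry_regs and setup_regs are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the entry registers for one scanline call. */
typedef struct {
    uint16_t x;   /* byte offset of the rightmost word to draw */
    uint16_t y;   /* byte offset of the leftmost word to draw */
    uint16_t dp;  /* screen address of sprite word 0 on this line */
    uint16_t sp;  /* address of the last byte of the rightmost word */
} entry_regs;

/*
 * line_addr: screen address of word column 0 of this scanline
 * col:       word column of the sprite's left edge (may be negative)
 * width:     sprite width in words
 * xmin/xmax: inclusive clip window, in word columns
 * Returns false when the line is completely clipped.
 */
static bool setup_regs(uint16_t line_addr, int col, int width,
                       int xmin, int xmax, entry_regs *r) {
    int first = col > xmin ? col : xmin;                          /* leftmost visible */
    int last  = col + width - 1 < xmax ? col + width - 1 : xmax;  /* rightmost visible */
    if (first > last)
        return false;
    r->y  = (uint16_t)(2 * (first - col));
    r->x  = (uint16_t)(2 * (last - col));
    r->dp = (uint16_t)(line_addr + 2 * col);       /* <$00,x hits sprite word x/2 */
    r->sp = (uint16_t)(line_addr + 2 * last + 1);  /* PHA stores high byte at S, low at S-1 */
    assert(r->x >= r->y);                          /* the rule the scanline code relies on */
    return true;
}
```

For example, a sprite 11 words wide at word column -3 on a line starting 
at $2000 (the first Super Hi-Res line), clipped to columns 0..79, gives 
Y = 6, X = 20, DP = $1FFA and SP = $200F: the three clipped words on the 
left are never touched because the run stops at Y.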


The header of the scanline function is as follows: 


Entry	anop			;entry point
	bra	begin		;jump into code
link	ds	4		;pointer to address of code to return to
r_mask	ds	2		;mask for rightmost word
l_mask	ds	2		;mask for leftmost word
tmp1	ds	2		;temporary storage
tmp2	ds	2
tmp3	ds	2

begin	lda	mask,x		;first do the rightmost word
	ora	r_mask		;combine with the clip mask
	bne	Need2Msk	;nonzero: must merge with the screen
	dex			;zero: whole word is opaque. DEX twice
	dex			;because we do a jmp (Tbl+2,x) later, and
	bra	patch		;we want it to do this first word as well

Need2Msk and	<$00,x		;AND it with the screen data
	sta	tmp1		;save the result
	lda	r_mask
	eor	#$FFFF		;invert the mask
	and	data,x		;clear the clipped part of the data
	ora	tmp1		;combine with the previous result
	pha			;put it on the screen
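
The Need2Msk path computes (screen AND (mask OR clip)) OR (data AND NOT 
clip). Modeled in C below, set bits in mask mean transparent and set 
bits in clip (r_mask or l_mask) mean clipped; like the listing, it 
assumes data is zero wherever mask is set. merge_edge_word is a 
hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Need2Msk in C. mask: set bits are transparent sprite pixels.
 * clip: r_mask or l_mask, set bits are clipped away. Both mean "keep screen".
 */
static uint16_t merge_edge_word(uint16_t screen, uint16_t data,
                                uint16_t mask, uint16_t clip) {
    assert((data & mask) == 0);                 /* table invariant assumed above */
    uint16_t keep = (uint16_t)(mask | clip);    /* lda mask,x / ora r_mask */
    if (keep == 0)
        return data;                            /* opaque: same result as the compiled PEA */
    uint16_t tmp1 = (uint16_t)(screen & keep);  /* and <$00,x / sta tmp1 */
    return (uint16_t)((data & (uint16_t)~clip) | tmp1); /* eor #$FFFF / and data,x / ora tmp1 */
}
```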


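At this point the rightmost word has been taken care of: if it needed 
masking, it has been merged with the screen and pushed; if not, X has 
been backed up one word so the compiled code will draw it. Now we need 
to set some dispatch vectors so we can jump into the speedy compiled 
code.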


patch	lda	Tbl,y		;patch the dispatch table here so we do
	sta	tmp1		;the left edge as a special case
	lda	Tbl+2,y		;patch this in case Xreg == Yreg
	sta	tmp2		;save these to restore later
	lda	#l_word
	sta	Tbl,y		;patch code to do the last word
	lda	#e_code		;and patch in the exit code
	sta	Tbl+2,y
	jmp	(Tbl+2,x)	;now jump into the compiled sprite code


Now our dispatch vectors are set. The routine l_word draws the last 
(leftmost) word of data with masking, the same way we did the rightmost 
word above, and e_code restores the patched table entries and exits 
cleanly. 

Now let's look at l_word and e_code.


l_word	tyx			;now clip the last word (need it in X)
	lda	mask,x		;same procedure as above
	ora	l_mask
	and	<$00,x
	sta	tmp3		;tmp1 and tmp2 are already in use
	lda	l_mask
	eor	#$FFFF
	and	data,x
	ora	tmp3
	pha
;	jmp	e_code		;not needed: we fall through to e_code

e_code	lda	tmp1
	sta	Tbl,y		;restore values in the table
	lda	tmp2
	sta	Tbl+2,y
	jmp	(link)		;now jump to wherever


In case you're wondering, the reason for both l_word and e_code is to 
handle the case where X-reg == Y-reg. In that case the dispatch code 
jumps directly to e_code.
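
A C model of the patch/run/restore pattern, under simplifying 
assumptions: each table entry draws one word and returns the index of 
the next word to its left (standing in for jmp (Tbl+n)), the left edge 
is patched to a masked routine that stops the run, and the saved entry 
is put back afterwards. The right-edge r_mask handling from the header 
is left out to keep the focus on the patching, and all names are 
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NWORDS 11

typedef struct {
    uint16_t *screen;        /* this line's screen words, sprite word 0 first */
    const uint16_t *data;    /* sprite data, zero where mask bits are set */
    const uint16_t *mask;    /* set bits = transparent */
    uint16_t l_mask;         /* clip mask for the leftmost word */
} line_ctx;

/* Each table entry draws word i and returns the next word to draw, or -1. */
typedef int (*word_fn)(line_ctx *c, int i);

/* Stand-in for one word of compiled code (PEA, PEI or masked). */
static int w_any(line_ctx *c, int i) {
    c->screen[i] = (uint16_t)((c->screen[i] & c->mask[i]) | c->data[i]);
    return i - 1;            /* continue one word to the left */
}

/* Left edge: honour l_mask as well, then stop (falls into e_code). */
static int l_word(line_ctx *c, int i) {
    uint16_t keep = (uint16_t)(c->mask[i] | c->l_mask);
    c->screen[i] = (uint16_t)((c->screen[i] & keep) |
                              (c->data[i] & (uint16_t)~c->l_mask));
    return -1;
}

static word_fn tbl[NWORDS];

static void init_tbl(void) {
    for (int k = 0; k < NWORDS; k++)
        tbl[k] = w_any;
}

/* Draw words right..left inclusive (right >= left, the X >= Y rule). */
static void draw_line(line_ctx *c, int right, int left) {
    assert(left >= 0 && right >= left && right < NWORDS);
    word_fn saved = tbl[left];   /* like sta tmp1 */
    tbl[left] = l_word;          /* patch the left edge */
    for (int i = right; i >= 0; )
        i = tbl[i](c, i);
    tbl[left] = saved;           /* like e_code: restore the table */
}
```

Because the table is restored on the way out, the same code can be run 
again with a different clip window for the next line.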

What follows is the data tables and such. This code is taken from test 
code I've compiled and worked with. Try to figure it out. :)


data	dc	h'1111 2222 3300 0000 0000 0044 4455 5566 0000 0077 8888'
mask	dc	h'0000 0000 00ff ffff ffff ff00 0000 0000 ffff ff00 0000'

Tbl	dc	i2'w_10,w_9,w_8,w_7,w_6,w_5,w_4,w_3,w_2,w_1,w_0'

w_0	pea	$8888
	jmp	(Tbl+18)	;this goes in right to left order
w_1	lda	<$12
	and	#$ff00
	ora	#$0077
	pha
	jmp	(Tbl+16)
w_2	pei	<$10		;transparent word
	jmp	(Tbl+14)
w_3	pea	$5566
	jmp	(Tbl+12)
w_4	pea	$4455
	jmp	(Tbl+10)
w_5	lda	<$0A
	and	#$ff00
	ora	#$0044
	pha			;masked word
	jmp	(Tbl+8)
w_6	pei	<$08
	jmp	(Tbl+6)
w_7	pei	<$06
	jmp	(Tbl+4)
w_8	lda	$04
	and	#$00ff
	ora	#$3300
	pha
	jmp	(Tbl+2)
w_9	pea	$2222
	jmp	(Tbl)
w_10	pea	$1111
	jmp	e_code		;this is the last word, exit now
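
The listing follows a simple rule per word: fully opaque words become 
PEA, fully transparent words become PEI of the screen word, and 
everything else becomes the LDA/AND/ORA/PHA sequence. A tiny generator 
that applies that rule and prints code in the listing's style might look 
like this. classify and emit_word are hypothetical, and the data and 
mask values are taken as 16-bit words written the same way as the 
immediates in the compiled code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef enum { WORD_PEA, WORD_PEI, WORD_MASKED } word_kind;

/* Classify one sprite word by its mask (set bits = transparent). */
static word_kind classify(uint16_t mask) {
    if (mask == 0x0000) return WORD_PEA;   /* fully opaque */
    if (mask == 0xFFFF) return WORD_PEI;   /* fully transparent */
    return WORD_MASKED;
}

/*
 * Format the code for word i (i = 0 is the leftmost word) into buf.
 * Labels count from the right, as in the listing (w_0 is the rightmost).
 */
static int emit_word(char *buf, size_t n, int i, int nwords,
                     uint16_t data, uint16_t mask) {
    assert(i >= 0 && i < nwords);
    int off = 2 * i;                       /* DP offset of word i */
    int label = nwords - 1 - i;
    char next[16];                         /* where this word's code jumps */
    if (i == 0)
        snprintf(next, sizeof next, "e_code");
    else
        snprintf(next, sizeof next, "(Tbl+%d)", off - 2);

    switch (classify(mask)) {
    case WORD_PEA:
        return snprintf(buf, n, "w_%d\tpea\t$%04X\n\tjmp\t%s\n",
                        label, (unsigned)data, next);
    case WORD_PEI:
        return snprintf(buf, n, "w_%d\tpei\t<$%02X\n\tjmp\t%s\n",
                        label, (unsigned)off, next);
    default:
        return snprintf(buf, n,
                        "w_%d\tlda\t<$%02X\n\tand\t#$%04X\n\tora\t#$%04X\n\tpha\n\tjmp\t%s\n",
                        label, (unsigned)off, (unsigned)mask, (unsigned)data, next);
    }
}
```

Run over the eleven words from i = 10 down to 0, it should reproduce the 
compiled code above, apart from upper-case hex and writing (Tbl+0) for 
(Tbl).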


The algorithmic idea behind this sprite is to handle the left and right 
edges as special cases, so that any data in the middle can be blitted 
one word at a time without any per-word clipping tests.

As you can see, there is a bit of overhead for each word, but it is 
still roughly 2 to 3 times faster than a bitmap (11 to 23 cycles per 
word vs. 33 cycles). Also, since the scanlines are stored in an array, 
one could have an independent table of offsets for each line for 
per-sprite wave effects, or for vertical flipping. And with the 
self-clipping, a different Xmin and Xmax can be defined for each line of 
the graphics screen, allowing for very complex borders.
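
Because each screen line just picks an entry from the scanline table, 
flipping and wave effects reduce to index arithmetic. A minimal sketch, 
assuming a hypothetical per-line offsets table measured in words; 
line_plan and plan_line are made-up names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    int entry;   /* which scanline-table entry to draw on this row */
    int col;     /* word column to draw it at */
} line_plan;

/* row: 0..height-1, counted down the screen from the sprite's top line. */
static line_plan plan_line(int row, int height, bool vflip,
                           int col, const int *offsets) {
    assert(row >= 0 && row < height);
    line_plan p;
    p.entry = vflip ? height - 1 - row : row;            /* flip: walk the table backwards */
    p.col = col + (offsets != NULL ? offsets[row] : 0);  /* wave: per-row shift */
    return p;
}
```

The resulting column, together with that screen line's Xmin and Xmax, is 
what would then be turned into the X, Y, DP and SP values for the call.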

If one were to use this system, a convenient structure, written in C, 
might be:


#define numLines 20

typedef struct sprite {
   int height;
   int width;
   void (*scanlines[numLines])(void);  /* scanline routines; entries may repeat */
   int offset[numLines];               /* per-line X offset (wave effects) */
   int Xmin[numLines];                 /* per-line clip window */
   int Xmax[numLines];
   } sprite;


My main motivation for developing this type of sprite was to have a 
technique to eliminate the erase/update part of a draw/erase/update 
loop. In my Super Mario Bros game, I had at one point just a draw/draw 
loop: the screen would scroll, and after the baseline of a sprite was 
reached, the sprite would be drawn on top of the newly scrolled area. 
This was good, but introduced noticeable flicker, especially for large 
sprites. If I could draw the sprites one scanline at a time, some 
tearing might occur, but no flicker. So this format was born. The extra 
features it contains are just extensions of its scanline-independent 
nature.
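
A toy C model of the scanline-at-a-time approach: each screen row is 
rebuilt completely (new background, then every sprite row that lands on 
it) before moving on to the next, so nothing is ever erased. Row data 
with 0 as transparent stands in for the compiled scanline routines, and 
all names and sizes are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define ROWS 8
#define COLS 4

typedef struct {
    int top;                        /* first screen row the sprite covers */
    int height;                     /* rows in the sprite */
    const uint16_t (*rows)[COLS];   /* row data; 0 = transparent in this toy */
} toy_sprite;

static void draw_frame(uint16_t screen[ROWS][COLS], const uint16_t *bg,
                       const toy_sprite *sprites, int nsprites) {
    assert(nsprites >= 0);
    for (int y = 0; y < ROWS; y++) {
        for (int x = 0; x < COLS; x++)       /* "scroll": new background for row y */
            screen[y][x] = bg[y];
        for (int s = 0; s < nsprites; s++) {
            int r = y - sprites[s].top;      /* sprite row landing on screen row y */
            if (r < 0 || r >= sprites[s].height)
                continue;
            for (int x = 0; x < COLS; x++)
                if (sprites[s].rows[r][x] != 0)
                    screen[y][x] = sprites[s].rows[r][x];
        }
    }
}
```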

I've tried to be careful to make sure that the same scanline can be used 
multiple times across several sprites. If one were to put some effort 
into it, sprites could dynamically change by changing entries in their 
scanline table. This could be used for some cool effects. 
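
A sketch of that idea, assuming hypothetical routines body, legs_a and 
legs_b: two sprites share the same body scanlines, and one of them is 
animated by rewriting a single slot in its table rather than generating 
new code.

```c
#include <assert.h>

#define H 3

typedef void (*scanline_fn)(void);

static int cur;          /* line being drawn (stands in for the DP/SP setup) */
static char drawn[H];    /* records which routine drew each line */

/* Hypothetical scanline routines. */
static void body(void)   { drawn[cur] = 'B'; }
static void legs_a(void) { drawn[cur] = 'a'; }
static void legs_b(void) { drawn[cur] = 'b'; }

/* Two sprites sharing the same body scanlines. */
static scanline_fn walker[H] = { body, body, legs_a };
static scanline_fn runner[H] = { body, body, legs_b };

/* Animate by rewriting one slot; no code is regenerated. */
static void step_walker(int frame) {
    assert(frame >= 0);
    walker[H - 1] = (frame & 1) ? legs_b : legs_a;
}

static void draw(scanline_fn sprite[H]) {
    for (cur = 0; cur < H; cur++)
        sprite[cur]();
}
```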

One point may need some explanation: the sprite exits via a 
jml (abs)
for two reasons. First, the SP is used to pass a parameter, and doing a 
js[lr] would corrupt that, not to mention use memory that's not yours. 
Second, since we are blitting to Bank 01, the DP and stack are in that 
bank, so rt[ls] are not usable. By doing a jml to and from the sprite, 
we can make a dispatch routine to call them on a per-scanline basis.

Also, if needed, the sprite format can be modified to draw to any bank 
and be callable via an rtl. Some extra fields in the header need to be 
filled out, and the PEA/PEI optimization is no longer possible, so I 
don't know if it'd be worth it unless the variable clipping is really 
needed.

I'm sure there are flaws and sub-optimal code in my sprites. If anyone 
knows of another way to implement the self-clipping without testing 
after each blitted word or special-casing everything, please get in 
touch with me. 

------------------------------------------------------------------------

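One last thing. I always thought transparency was cool, so here's a way 
to do it for a compiled sprite by averaging each sprite pixel with the 
screen pixel under it. It's not 100% correct (the low bit of every 
screen pixel is dropped), but it gives pretty good results: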


	lda	screen_data
	and	#$EEEE
	clc
	adc	#sprite_data
	ror
	sta	screen_data

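where sprite_data has the form %XXX0 XXX0 XXX0 XXX0, i.e. the low bit 
of every 4-bit pixel is clear. With the low bits clear in both operands, 
a carry out of one pixel lands in the free low bit of the next 4-bit 
field, and the ROR (which also rotates the final carry into bit 15) 
shifts every pixel sum back down, halving it exactly.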
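
The same arithmetic in C, as a way to check it: a 17-bit sum stands in 
for the 16-bit ADC plus carry, and the shift stands in for ROR. Keep in 
mind that in 320 mode the pixels are 4-bit palette indices, so averaging 
indices only looks like a color blend if the palette is arranged for 
it. prep_sprite and blend are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Prepare sprite pixels once: clear the low bit of every 4-bit pixel. */
static uint16_t prep_sprite(uint16_t pixels) {
    return (uint16_t)(pixels & 0xEEEE);
}

/* Model of: lda screen / and #$EEEE / clc / adc #sprite / ror / sta screen */
static uint16_t blend(uint16_t screen, uint16_t sprite) {
    assert((sprite & 0x1111) == 0);                       /* %XXX0 XXX0 XXX0 XXX0 */
    uint32_t sum = (uint32_t)(screen & 0xEEEE) + sprite;  /* bit 16 plays the carry */
    return (uint16_t)(sum >> 1);                          /* ROR brings carry into bit 15 */
}
```

For example, blend(0x2468, 0) gives 0x1234, each pixel halved, while 
blend(0x1111, 0) gives 0 because the low bits are dropped.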