6502's don't support division directly so the closest thing we've got are
the shift operators. The nasty part is that I'm trying to write a pixel
driver and need to divide the X coord by 7 to get the correct pixel byte. I
thought about using a table, but somehow, having 140 or even 280 different
values stored didn't seem appealing for addressing a meager 40 bytes. Is
this the fastest division by 7 routine possible?


;Divide by 7 Specific routine
;Divisor in A
;Dividend always 7
DIV7   TAX            ;clear SAVEA for use
       LDA    #$0
       STA    RESULT
       TXA
LDIV   PHA            ;save A onto the stack
       LSR            ;divide a by 8
       LSR
       LSR
       STA    SAVEA
       ADC    RESULT  ;add the results together
       STA    RESULT
       PLA
       AND    #$7     ;get the remainder
       ADC    SAVEA   ;add back the quotient
       CMP    #$7     ;are we done yet?
       BNE    TEST2
       INC    RESULT
       LDA    #0
       BEQ    DONE
TEST2  BCS    LDIV
DONE   TAX            ;the remainder goes in X
       LDA    RESULT  ;the result goes in A
       RTS

This would loop at most 3 times. Or is it better to just do this:

;General division routine
;Divisor in A
;Dividend in X
DIV    CLC
       STX    DIVISR
       PHA
       LDA    #0
       STA    RESULT
       PLA
       BNE    ADJUST
       BRK            ;division by zero error
ADJUST BMI    DIVIDE  ;loop till the high bit gets set
       ASL    DIVISR
       BCC    ADJUST
DIVIDE CMP    DIVISR
       BCC    NEXT    ;skip ahead if A is less
SET    PHA
       LDA    RESULT
       ORA    #1
       STA    RESULT
       PLA
       SBC    DIVISR
NEXT   ASL    RESULT
       CLC
       ROR    DIVISR
       BCC    DIVIDE
DONE   TAX            ;put the remainder in X
       LDA    RESULT  ;and the result in A
       RTS

In this case, the ADJUST and DIVIDE loops may loop up to 7 times for any
given number. It's definitely more utility, but it's longer and more time
consuming. Is there a faster way to handle the X address of a pixel short of
using a table?

R. King


http://rich12345.tripod.com/srccode/graphic.html

this above page has the whole sourcecode

following this post is the division routine.

it takes the x coordinate, subtracts 7 from
it until it is less than 7, and puts the
result in Y register/xbyte variable and the
remainder in A register/xbit variable.

xbyte refers to which byte to draw to,
xbit refers to which shape in your preshifted
table should be drawn.

Rich

SCX = screen X coordinate

start 	ldy #0
	lda scx
	clc
dloop 	cmp #7		; this is a cheap division
	bcc ddone	; that can be avoided
	sec
	sbc #7
	iny
	clc
	jmp dloop
ddone 	sta xbit	; division done
	sty xbyte


aiiadict@gmail.com writes:

> http://rich12345.tripod.com/srccode/graphic.html
> 
> this above page has the whole sourcecode
> 
> following this post is the division routine.
> 
> it takes the x coordinate, subtracts 7 from
> it until it is less than 7, and puts the
> result in Y register/xbyte variable and the
> remainder in A register/xbit variable.
> 
> xbyte refers to which byte to draw to,
> xbit refers to which shape in your preshifted
> table should be drawn.
> 
> Rich
> 
> SCX = screen X coordinate
> 
> start 	ldy #0
> 	lda scx
> 	clc
> dloop 	cmp #7		; this is a cheap division
> 	bcc ddone	; that can be avoided
> 	sec
> 	sbc #7
> 	iny
> 	clc
> 	jmp dloop
> ddone 	sta xbit	; division done
> 	sty xbyte
> 

The loop in the above routine can be shortened to three instructions:


start 	ldy #ff
	lda scx
	sec
dloop 	iny
	sbc #7		; this is a cheap division
	bcs dloop	; that can be avoided
        adc #7
 	sta xbit	; division done
	sty xbyte

Scott
-- 
Scott Hemphill	hemphill@alumni.caltech.edu
"This isn't flying.  This is falling, with style."  -- Buzz Lightyear


Ok. Just finished cycle counting.

In the "Divide by 7 specific" routine I gave, it measures out to around 135
cycles for calculating 255/7. For the super-short subtraction version given
below it comes up to 234 given the same assumptions and calculation. Is
there any way to get a result as fast as the routine I listed while making
it smaller like the one below?


"Scott Hemphill" <hemphill@hemphills.net> wrote in message
news:m3zmybq9ey.fsf@pearl.local...
> aiiadict@gmail.com writes:
>
<snipped>
>
> start ldy #ff
> lda scx
> sec
> dloop iny
> sbc #7 ; this is a cheap division
> bcs dloop ; that can be avoided
>         adc #7
>   sta xbit ; division done
> sty xbyte


"Ranando King" <rk@magictouchcorp.com> writes:

> Ok. Just finished cycle counting.
> 
> In the "Divide by 7 specific" routine I gave, it measures out to around 135
> cycles for calculating 255/7. For the super-short subtraction version given
> below it comes up to 234 given the same assumptions and calculation. Is
> there any way to get a result as fast as the routine I listed while making
> it smaller like the one below?

Do you really need the whole range 0-255 ?  Is it worst case performance
that needs to be minimized or average case?

> "Scott Hemphill" <hemphill@hemphills.net> wrote in message
> news:m3zmybq9ey.fsf@pearl.local...
> > aiiadict@gmail.com writes:
> >
> <snipped>
> >
> > start ldy #ff
> > lda scx
> > sec
> > dloop iny
> > sbc #7 ; this is a cheap division
> > bcs dloop ; that can be avoided
> >         adc #7
> >   sta xbit ; division done
> > sty xbyte

You can split the range in half (roughly) with a few more instructions:

        lda scx
        cmp #126      ; decimal 7*18
        bcc dentry1   ; branch if less than 126
        ldy #17       ; start with quotient 18-1
        sbc #126
        bcs dloop
dentry1 ldy #-1
        sec
dloop   iny
        sbc #7
        bcs dloop
        adc #7
        sta xbit
        sty xbyte

Scott
-- 
Scott Hemphill	hemphill@alumni.caltech.edu
"This isn't flying.  This is falling, with style."  -- Buzz Lightyear


> With the function I wrote, for 35 of the possible 255 values, the time is
> 53-61 cycles.

Here is a 22-byte routine that takes between 32 and 67 cycles
(about 50 average), not counting the final RTS.

* Integer division by seven
* Input: an unsigned byte (0..255) in A
* Output: Computes A/7
*         quotient (0..36) in X
*         remainder (0..6) in A

* I use (8a+b)/7 = a + (a+b)/7
* where a = higher five bits and b = lower three bits

DIVIDE7:
    tax
    and #$7
    sta TMP     ; low bits (b)
    txa
    lsr         ; replace with ror to use carry as 9th bit of A
    lsr
    lsr
    tax         ; high bits, divided by 8 (a)
    clc
    adc TMP     ; a + b. here we know carry is low.
.1  inx         ; the loop is executed between 1 and 6 times
    sbc #7      ; since A is at most 38
    bcs .1      
    adc #8      ; not 7 because of low carry at top of loop
    dex
    rts

Paul Guertin
pg@sff.net


Ranando King wrote:
> 6502's don't support division directly so the closest thing we've got are
> the shift operators. The nasty part is that I'm trying to write a pixel
> driver and need to divide the X coord by 7 to get the correct pixel byte. I
> thought about using a table, but somehow, having 140 or even 280 different
> values stored didn't seem appealing for addressing a meager 40 bytes. Is
> this the fastest division by 7 routine possible?

For speed, you would be better off using lookup tables. A while back I 
put out a sample block routine. If you want to avoid generating the 
tables yourself try this.

ftp://ftp.apple.asimov.net/pub/apple_II/documentation/zblock.source.txt

Of course you might be able to use the whole thing with a modification 
or two.

Cheers,
Mike T.


"Ranando King" <rk@magictouchcorp.com> writes:

> 6502's don't support division directly so the closest thing we've got are
> the shift operators. The nasty part is that I'm trying to write a pixel
> driver and need to divide the X coord by 7 to get the correct pixel byte. I
> thought about using a table, but somehow, having 140 or even 280 different
> values stored didn't seem appealing for addressing a meager 40 bytes. Is
> this the fastest division by 7 routine possible?
> 
> 
> ;Divide by 7 Specific routine
> ;Divisor in A
> ;Dividend always 7
> DIV7   TAX            ;clear SAVEA for use
>        LDA    #$0
>        STA    RESULT
>        TXA
> LDIV   PHA            ;save A onto the stack
>        LSR            ;divide a by 8
>        LSR
>        LSR
>        STA    SAVEA
>        ADC    RESULT  ;add the results together
>        STA    RESULT
>        PLA
>        AND    #$7     ;get the remainder
>        ADC    SAVEA   ;add back the quotient
>        CMP    #$7     ;are we done yet?
>        BNE    TEST2
>        INC    RESULT
>        LDA    #0
>        BEQ    DONE
> TEST2  BCS    LDIV
> DONE   TAX            ;the remainder goes in X
>        LDA    RESULT  ;the result goes in A
>        RTS

It seems to me that there must be room for improvement, because the
following code successfully calculates division by 7 with no loops.
The drawback is that it doesn't produce a remainder.

div7    sta     dvdend
        lsr
        lsr
        lsr
        adc     dvdend
        ror
        lsr
        lsr
        adc     dvdend
        ror
        lsr
        lsr
        sta     result
        rts

Scott
-- 
Scott Hemphill	hemphill@alumni.caltech.edu
"This isn't flying.  This is falling, with style."  -- Buzz Lightyear


Scott Hemphill <hemphill@hemphills.net> writes:

> It seems to me that there must be room for improvement, because the
> following code successfully calculates division by 7 with no loops.
> The drawback is that it doesn't produce a remainder.
> 
> div7    sta     dvdend
>         lsr
>         lsr
>         lsr
>         adc     dvdend
>         ror
>         lsr
>         lsr
>         adc     dvdend
>         ror
>         lsr
>         lsr
>         sta     result
>         rts

One way to get the remainder is to just multiply the result by 7 and
subtrace from the dividend:

div7    sta     dvdend
        lsr
        lsr
        lsr
        adc     dvdend
        ror
        lsr
        lsr
        adc     dvdend
        ror
        lsr
        lsr
        sta     result
        asl
        asl
        asl
        eor     #-1
        sec
        adc     result
        clc
        adc     dvdend
        tax
        lda     result
        rts        

Scott
-- 
Scott Hemphill	hemphill@alumni.caltech.edu
"This isn't flying.  This is falling, with style."  -- Buzz Lightyear


"Ranando King" <rk@magictouchcorp.com> writes:

> ... and that would be the type of answer I was looking for.
> Consistently fast, small code. Always comes out to about 61 cycles or less.

Although this code can be modified to work with carry as a 9th bit, Paul
Guertin's code looks like it will then be faster in the worst case.

It's interesting to compare the theories of operation of the posted division
methods.

Your code divides by 8 to get an approximate quotient, then subtracts seven
times that from the dividend to get a remainder, which needs to be divided
by seven.  You then repeat this operation in a loop, needing no more than
three tries to get the remainder down below 7.

Rich (aiiadict@gmail.com) and my first posted code divide by seven by
subtracting seven repeatedly, counting the number of times you can do this.

Paul Guertin's code is a hybrid between these two methods, dividing by 8
to get an approximate quotient and a remainder, then subtracting off 7
repeatedly to finalize the answer.

This last posting of mine divides by seven by multiplying by one seventh.
One seventh has the binary representation 0.00100100,10010010,01....
I use just the first three one bits, using 1/7 ~ (1/8+1/64+1/512),
calculating the low bits of the answer first.  There's also some magic
in using the carries from the LSR instructions in the ADCs to cause
rounding up in all the right places.

[snip]

> There is one more question though. Does anyone know where in memory the
> Applesoft math functions are?

Try $E7A0 through $F1D4.  I don't have the entry points handy.

Scott
-- 
Scott Hemphill	hemphill@alumni.caltech.edu
"This isn't flying.  This is falling, with style."  -- Buzz Lightyear


On Tue, 15 Feb 2005 14:33:45 -0600, "Ranando King" <rk@magictouchcorp.com>
wrote:

> > Although this code can be modified to work with carry as a 9th bit, Paul
> > Guertin's code looks like it will then be faster in the worst case.
> 
> True. That's why I'm using his division routine in the graphics module.

Glad to be of help.

If you're going to use the carry as a 9th bit, it may be faster
yet to use (64a+b+c)/7 = 9a+b + (a+b+c)/7. That is, split the
9-bit dividend into three 3-bit slices, form 9a+b and a+b+c using
three-position shifts, and count how many times you can decrement
a+b+c by 7, staying positive. Either with a loop as I did, or by
manually unrolling the loop (decision tree) if every cycle counts.

I'm a little busy right now and don't have to time to write a
proper implementation, but my guess is it should be significantly
faster if the dividend is often >255 (since in this case,
a+b+c <= 7+7+7 = 21 instead of a+b <= 63+7 =70. So three loops
maximum (or two if you exclude 511, the only number which makes
a+b+c=21) instead of ten.

I don't know if it is worth it if your numbers are always going
to be in 0..279, though. 279 only loops 6 times.

Come to think of it, the simplest way might be to test the state
of the carry bit at the very beginning of the routine. If C=0, use
my routine, and if C=1, use a modified version of my routine which
adds 36 to X and 4 to A before the loop (since 256 = 36*7 + 4).

Sorry for the unstructured rambling and no code.

> > It's interesting to compare the theories of operation of the posted
> division
> > methods.

Just for fun, here's a different method, which works but doesn't
seem to have any significant advantage over the ones already posted.
It takes between 67 and 73 cycles. The idea is quite simple.

   cmp #$e0     ; divide by 7*2^n, for n=6 down to 1
   bcc .1
   sbc #$e0
.1 rol tmp

   cmp #$70
   bcc .2
   sbc #$70
.2 rol tmp

   cmp #$38
   bcc .3
   sbc #$38
.3 rol tmp

   cmp #$1c
   bcc .4
   sbc #$1c
.4 rol tmp

   cmp #$e
   bcc .5
   sbc #$e
.5 rol tmp

   cmp #$7
   bcc .6
   sbc #$7
.6 rol tmp

   tax          ; remainder in x
   lda tmp
   and #$3f     ; quotient in a

Takes about 75 cycles worst case, 68 cycles average.

> > > There is one more question though. Does anyone know where in memory the
> > > Applesoft math functions are?

See http://cosmicwolf.com/AppleII/AppleSoft_Commented.htm

Paul Guertin
pg@sff.net


Scott Hemphill <hemphill@hemphills.net> writes:

> "Ranando King" <rk@magictouchcorp.com> writes:
> 
> > ... and that would be the type of answer I was looking for.
> > Consistently fast, small code. Always comes out to about 61 cycles or less.

OK, I've got another improvement.  It divides A by seven, returning the
quotient in A and the remainder in the lowest three bits of X.

div7    sta     dvdend
        lsr
        lsr
        lsr
        adc     dvdend
        ror
        lsr
        lsr
        adc     dvdend
        ror
        lsr
        lsr
	adc	dvdend
	tax			; save for remainder
	ror
	lsr
	lsr			; quotient is now in A
	bcc	end		; subtract one from X if carry set
	dex			; remainder is in lowest 3 bits of X
end:    rts        

It can be extended to work for (256*carry+A) < 448, but it loses its
elegance.  When you add the dividend back in each time, you need to keep
the carry bit if it resulted from the adc, but also set carry if it was set
upon entry to the routine.  448 is the upper limit because that's when
A + (A/7) overflows 9 bits.

div7    php
	sta     dvdend
        ror
        lsr
        lsr
        adc     dvdend
	bcs	:1
	plp
	php
:1      ror
        lsr
        lsr
        adc     dvdend
	bcs	:2
	plp
	php
:2      ror
        lsr
        lsr
	adc	dvdend
	bcs	:3
	plp
	php
:3	tax			; save for remainder
	ror
	lsr
	lsr			; quotient is now in A
	bcc	end		; subtract one from X if carry set
	dex			; remainder is in lowest 3 bits of X
end:    plp
	rts        


Scott
-- 
Scott Hemphill	hemphill@alumni.caltech.edu
"This isn't flying.  This is falling, with style."  -- Buzz Lightyear


On 15 Feb 2005 13:21:33 -0500, Scott Hemphill <hemphill@hemphills.net> wrote:

> This last posting of mine divides by seven by multiplying by one seventh.
> One seventh has the binary representation 0.00100100,10010010,01....
> I use just the first three one bits, using 1/7 ~ (1/8+1/64+1/512),
> calculating the low bits of the answer first.  There's also some magic
> in using the carries from the LSR instructions in the ADCs to cause
> rounding up in all the right places.

Indeed, good magic is happening with the carry bit in your routine.
It's very nice!

The same idea can be used to divide by 3, 15, or in general
2**k-1 for any k.

Paul Guertin
pg@sff.net


Paul Guertin <pg@sff.net> writes:

> On 15 Feb 2005 13:21:33 -0500, Scott Hemphill <hemphill@hemphills.net> wrote:
> 
> > This last posting of mine divides by seven by multiplying by one seventh.
> > One seventh has the binary representation 0.00100100,10010010,01....
> > I use just the first three one bits, using 1/7 ~ (1/8+1/64+1/512),
> > calculating the low bits of the answer first.  There's also some magic
> > in using the carries from the LSR instructions in the ADCs to cause
> > rounding up in all the right places.
> 
> Indeed, good magic is happening with the carry bit in your routine.
> It's very nice!
> 
> The same idea can be used to divide by 3, 15, or in general
> 2**k-1 for any k.
> 
> Paul Guertin
> pg@sff.net

But by extension, you can divide by any integer.  For example, 1/5 doesn't
have a denominator that fits the pattern 2**k-1, but 3/15 does.  One fifth
has the binary representation 0.00110011,00110011,....  The following
routine works:

div5	sta	dvdend
	lsr
	adc	dvdend
	ror
	lsr
	lsr
	adc	dvdend
	ror
	adc	dvdend
	ror
	lsr
	lsr
	adc	dvdend
	ror
	adc	dvdend
	tax		; save for remainder
	ror
	lsr
	lsr
			; if we didn't need the remainder
			;  we could return here
	tay		; save quotient
	txa		; retrieve last 3 bits shifted out
	and	#7	; calculate remainder
	ina
	lsr
	tax		; store remainder
	tya		; retrieve quotient
	rts	

Scott
-- 
Scott Hemphill	hemphill@alumni.caltech.edu
"This isn't flying.  This is falling, with style."  -- Buzz Lightyear


Paul Guertin <pg@sff.net> writes:

> On 15 Feb 2005 13:21:33 -0500, Scott Hemphill <hemphill@hemphills.net> wrote:
> 
> > This last posting of mine divides by seven by multiplying by one seventh.
> > One seventh has the binary representation 0.00100100,10010010,01....
> > I use just the first three one bits, using 1/7 ~ (1/8+1/64+1/512),
> > calculating the low bits of the answer first.  There's also some magic
> > in using the carries from the LSR instructions in the ADCs to cause
> > rounding up in all the right places.
> 
> Indeed, good magic is happening with the carry bit in your routine.
> It's very nice!

It's hard to prove that it all works, but it's easier to show that it works.
I used Mathematica to operate on all possible dividends simultaneously,
using the following code:

------------------------------------------------------------------------
adc[m_] := Block[{}, a+=m+c; c=Floor[a/256]; a=Mod[a,256]];
and[m_] := a=BitAnd[a,m];
asl := Block[{}, a*=2; c=Floor[a/256]; a=Mod[a,256]];
bit[m_] := BitAnd[a,m];
clc := c=0&/@c;
cmp[m_] := c=Floor[1+(a-m)/256];
cpx[m_] := c=Floor[1+(x-m)/256];
cpy[m_] := c=Floor[1+(y-m)/256];
dea := a=Mod[a-1,256];
dex := x=Mod[x-1,256];
dey := y=Mod[y-1,256];
eor[m_] := a=BitXor[a,m];
ina := a=Mod[a+1,256];
inx := x=Mod[x+1,256];
iny := y=Mod[y+1,256];
lda[m_] := a=m;
ldx[m_] := x=m;
ldy[m_] := y=m;
lsr := Block[{}, c=Mod[a,2]; a=Floor[a/2]];
ora[m_] := BitOr[a,m];
rol := Block[{}, a=2*a+c; c=Floor[a/256]; a=Mod[a,256]];
ror := Block[{}, a+=256*c; c=Mod[a,2]; a=Floor[a/2]];
sbc[m_] := Block[{}, a+=255-m+c; c=Floor[a/256]; a=Mod[a,256]];
sec := c=1&/@c;
tax := x=a;
tay := y=a;
txa := a=x;
tya := a=y;

div7 := Block[{dvdend},
  dvdend = a;
  lsr;
  lsr;
  lsr;
  adc[dvdend];
  ror;
  lsr;
  lsr;
  adc[dvdend];
  ror;
  lsr;
  lsr;
  adc[dvdend];
  tax;
  ror;
  lsr;
  lsr;
  x -= c;
  a];

a = Range[256]-1;  (* a is a list from 0..255 *)
q = Floor[a/7];    (* this is the quotient for each element in list *)
r = Mod[a,7];      (* this is the remainder for each element in list *)

div7;              (* list of carries will be intialized by first lsr *)

If[ a == q, Print["quotient is correct"]];
If[ BitAnd[x,7] == r, Print["remainder is correct"]];
------------------------------------------------------------------------

Scott
-- 
Scott Hemphill	hemphill@alumni.caltech.edu
"This isn't flying.  This is falling, with style."  -- Buzz Lightyear