6502's don't support division directly so the closest thing we've got are the shift operators. The nasty part is that I'm trying to write a pixel driver and need to divide the X coord by 7 to get the correct pixel byte. I thought about using a table, but somehow, having 140 or even 280 different values stored didn't seem appealing for addressing a meager 40 bytes. Is this the fastest division by 7 routine possible? ;Divide by 7 Specific routine ;Divisor in A ;Dividend always 7 DIV7 TAX ;clear SAVEA for use LDA #$0 STA RESULT TXA LDIV PHA ;save A onto the stack LSR ;divide a by 8 LSR LSR STA SAVEA ADC RESULT ;add the results together STA RESULT PLA AND #$7 ;get the remainder ADC SAVEA ;add back the quotient CMP #$7 ;are we done yet? BNE TEST2 INC RESULT LDA #0 BEQ DONE TEST2 BCS LDIV DONE TAX ;the remainder goes in X LDA RESULT ;the result goes in A RTS This would loop at most 3 times. Or is it better to just do this: ;General division routine ;Divisor in A ;Dividend in X DIV CLC STX DIVISR PHA LDA #0 STA RESULT PLA BNE ADJUST BRK ;division by zero error ADJUST BMI DIVIDE ;loop till the high bit gets set ASL DIVISR BCC ADJUST DIVIDE CMP DIVISR BCC NEXT ;skip ahead if A is less SET PHA LDA RESULT ORA #1 STA RESULT PLA SBC DIVISR NEXT ASL RESULT CLC ROR DIVISR BCC DIVIDE DONE TAX ;put the remainder in X LDA RESULT ;and the result in A RTS In this case, the ADJUST and DIVIDE loops may loop up to 7 times for any given number. It's definitely more utility, but it's longer and more time consuming. Is there a faster way to handle the X address of a pixel short of using a table? R. King http://rich12345.tripod.com/srccode/graphic.html this above page has the whole sourcecode following this post is the division routine. it takes the x coordinate, subtracts 7 from it until it is less than 7, and puts the result in Y register/xbyte variable and the remainder in A register/xbit variable. xbyte refers to which byte to draw to, xbit refers to which shape in your preshifted table should be drawn. Rich SCX = screen X coordinate start ldy #0 lda scx clc dloop cmp #7 ; this is a cheap division bcc ddone ; that can be avoided sec sbc #7 iny clc jmp dloop ddone sta xbit ; division done sty xbyte aiiadict@gmail.com writes: > http://rich12345.tripod.com/srccode/graphic.html > > this above page has the whole sourcecode > > following this post is the division routine. > > it takes the x coordinate, subtracts 7 from > it until it is less than 7, and puts the > result in Y register/xbyte variable and the > remainder in A register/xbit variable. > > xbyte refers to which byte to draw to, > xbit refers to which shape in your preshifted > table should be drawn. > > Rich > > SCX = screen X coordinate > > start ldy #0 > lda scx > clc > dloop cmp #7 ; this is a cheap division > bcc ddone ; that can be avoided > sec > sbc #7 > iny > clc > jmp dloop > ddone sta xbit ; division done > sty xbyte > The loop in the above routine can be shortened to three instructions: start ldy #ff lda scx sec dloop iny sbc #7 ; this is a cheap division bcs dloop ; that can be avoided adc #7 sta xbit ; division done sty xbyte Scott -- Scott Hemphill hemphill@alumni.caltech.edu "This isn't flying. This is falling, with style." -- Buzz Lightyear Ok. Just finished cycle counting. In the "Divide by 7 specific" routine I gave, it measures out to around 135 cycles for calculating 255/7. For the super-short subtraction version given below it comes up to 234 given the same assumptions and calculation. Is there any way to get a result as fast as the routine I listed while making it smaller like the one below? "Scott Hemphill" wrote in message news:m3zmybq9ey.fsf@pearl.local... > aiiadict@gmail.com writes: > > > start ldy #ff > lda scx > sec > dloop iny > sbc #7 ; this is a cheap division > bcs dloop ; that can be avoided > adc #7 > sta xbit ; division done > sty xbyte "Ranando King" writes: > Ok. Just finished cycle counting. > > In the "Divide by 7 specific" routine I gave, it measures out to around 135 > cycles for calculating 255/7. For the super-short subtraction version given > below it comes up to 234 given the same assumptions and calculation. Is > there any way to get a result as fast as the routine I listed while making > it smaller like the one below? Do you really need the whole range 0-255 ? Is it worst case performance that needs to be minimized or average case? > "Scott Hemphill" wrote in message > news:m3zmybq9ey.fsf@pearl.local... > > aiiadict@gmail.com writes: > > > > > > > start ldy #ff > > lda scx > > sec > > dloop iny > > sbc #7 ; this is a cheap division > > bcs dloop ; that can be avoided > > adc #7 > > sta xbit ; division done > > sty xbyte You can split the range in half (roughly) with a few more instructions: lda scx cmp #126 ; decimal 7*18 bcc dentry1 ; branch if less than 126 ldy #17 ; start with quotient 18-1 sbc #126 bcs dloop dentry1 ldy #-1 sec dloop iny sbc #7 bcs dloop adc #7 sta xbit sty xbyte Scott -- Scott Hemphill hemphill@alumni.caltech.edu "This isn't flying. This is falling, with style." -- Buzz Lightyear > With the function I wrote, for 35 of the possible 255 values, the time is > 53-61 cycles. Here is a 22-byte routine that takes between 32 and 67 cycles (about 50 average), not counting the final RTS. * Integer division by seven * Input: an unsigned byte (0..255) in A * Output: Computes A/7 * quotient (0..36) in X * remainder (0..6) in A * I use (8a+b)/7 = a + (a+b)/7 * where a = higher five bits and b = lower three bits DIVIDE7: tax and #$7 sta TMP ; low bits (b) txa lsr ; replace with ror to use carry as 9th bit of A lsr lsr tax ; high bits, divided by 8 (a) clc adc TMP ; a + b. here we know carry is low. .1 inx ; the loop is executed between 1 and 6 times sbc #7 ; since A is at most 38 bcs .1 adc #8 ; not 7 because of low carry at top of loop dex rts Paul Guertin pg@sff.net Ranando King wrote: > 6502's don't support division directly so the closest thing we've got are > the shift operators. The nasty part is that I'm trying to write a pixel > driver and need to divide the X coord by 7 to get the correct pixel byte. I > thought about using a table, but somehow, having 140 or even 280 different > values stored didn't seem appealing for addressing a meager 40 bytes. Is > this the fastest division by 7 routine possible? For speed, you would be better off using lookup tables. A while back I put out a sample block routine. If you want to avoid generating the tables yourself try this. ftp://ftp.apple.asimov.net/pub/apple_II/documentation/zblock.source.txt Of course you might be able to use the whole thing with a modification or two. Cheers, Mike T. "Ranando King" writes: > 6502's don't support division directly so the closest thing we've got are > the shift operators. The nasty part is that I'm trying to write a pixel > driver and need to divide the X coord by 7 to get the correct pixel byte. I > thought about using a table, but somehow, having 140 or even 280 different > values stored didn't seem appealing for addressing a meager 40 bytes. Is > this the fastest division by 7 routine possible? > > > ;Divide by 7 Specific routine > ;Divisor in A > ;Dividend always 7 > DIV7 TAX ;clear SAVEA for use > LDA #$0 > STA RESULT > TXA > LDIV PHA ;save A onto the stack > LSR ;divide a by 8 > LSR > LSR > STA SAVEA > ADC RESULT ;add the results together > STA RESULT > PLA > AND #$7 ;get the remainder > ADC SAVEA ;add back the quotient > CMP #$7 ;are we done yet? > BNE TEST2 > INC RESULT > LDA #0 > BEQ DONE > TEST2 BCS LDIV > DONE TAX ;the remainder goes in X > LDA RESULT ;the result goes in A > RTS It seems to me that there must be room for improvement, because the following code successfully calculates division by 7 with no loops. The drawback is that it doesn't produce a remainder. div7 sta dvdend lsr lsr lsr adc dvdend ror lsr lsr adc dvdend ror lsr lsr sta result rts Scott -- Scott Hemphill hemphill@alumni.caltech.edu "This isn't flying. This is falling, with style." -- Buzz Lightyear Scott Hemphill writes: > It seems to me that there must be room for improvement, because the > following code successfully calculates division by 7 with no loops. > The drawback is that it doesn't produce a remainder. > > div7 sta dvdend > lsr > lsr > lsr > adc dvdend > ror > lsr > lsr > adc dvdend > ror > lsr > lsr > sta result > rts One way to get the remainder is to just multiply the result by 7 and subtrace from the dividend: div7 sta dvdend lsr lsr lsr adc dvdend ror lsr lsr adc dvdend ror lsr lsr sta result asl asl asl eor #-1 sec adc result clc adc dvdend tax lda result rts Scott -- Scott Hemphill hemphill@alumni.caltech.edu "This isn't flying. This is falling, with style." -- Buzz Lightyear "Ranando King" writes: > ... and that would be the type of answer I was looking for. > Consistently fast, small code. Always comes out to about 61 cycles or less. Although this code can be modified to work with carry as a 9th bit, Paul Guertin's code looks like it will then be faster in the worst case. It's interesting to compare the theories of operation of the posted division methods. Your code divides by 8 to get an approximate quotient, then subtracts seven times that from the dividend to get a remainder, which needs to be divided by seven. You then repeat this operation in a loop, needing no more than three tries to get the remainder down below 7. Rich (aiiadict@gmail.com) and my first posted code divide by seven by subtracting seven repeatedly, counting the number of times you can do this. Paul Guertin's code is a hybrid between these two methods, dividing by 8 to get an approximate quotient and a remainder, then subtracting off 7 repeatedly to finalize the answer. This last posting of mine divides by seven by multiplying by one seventh. One seventh has the binary representation 0.00100100,10010010,01.... I use just the first three one bits, using 1/7 ~ (1/8+1/64+1/512), calculating the low bits of the answer first. There's also some magic in using the carries from the LSR instructions in the ADCs to cause rounding up in all the right places. [snip] > There is one more question though. Does anyone know where in memory the > Applesoft math functions are? Try $E7A0 through $F1D4. I don't have the entry points handy. Scott -- Scott Hemphill hemphill@alumni.caltech.edu "This isn't flying. This is falling, with style." -- Buzz Lightyear On Tue, 15 Feb 2005 14:33:45 -0600, "Ranando King" wrote: > > Although this code can be modified to work with carry as a 9th bit, Paul > > Guertin's code looks like it will then be faster in the worst case. > > True. That's why I'm using his division routine in the graphics module. Glad to be of help. If you're going to use the carry as a 9th bit, it may be faster yet to use (64a+b+c)/7 = 9a+b + (a+b+c)/7. That is, split the 9-bit dividend into three 3-bit slices, form 9a+b and a+b+c using three-position shifts, and count how many times you can decrement a+b+c by 7, staying positive. Either with a loop as I did, or by manually unrolling the loop (decision tree) if every cycle counts. I'm a little busy right now and don't have to time to write a proper implementation, but my guess is it should be significantly faster if the dividend is often >255 (since in this case, a+b+c <= 7+7+7 = 21 instead of a+b <= 63+7 =70. So three loops maximum (or two if you exclude 511, the only number which makes a+b+c=21) instead of ten. I don't know if it is worth it if your numbers are always going to be in 0..279, though. 279 only loops 6 times. Come to think of it, the simplest way might be to test the state of the carry bit at the very beginning of the routine. If C=0, use my routine, and if C=1, use a modified version of my routine which adds 36 to X and 4 to A before the loop (since 256 = 36*7 + 4). Sorry for the unstructured rambling and no code. > > It's interesting to compare the theories of operation of the posted > division > > methods. Just for fun, here's a different method, which works but doesn't seem to have any significant advantage over the ones already posted. It takes between 67 and 73 cycles. The idea is quite simple. cmp #$e0 ; divide by 7*2^n, for n=6 down to 1 bcc .1 sbc #$e0 .1 rol tmp cmp #$70 bcc .2 sbc #$70 .2 rol tmp cmp #$38 bcc .3 sbc #$38 .3 rol tmp cmp #$1c bcc .4 sbc #$1c .4 rol tmp cmp #$e bcc .5 sbc #$e .5 rol tmp cmp #$7 bcc .6 sbc #$7 .6 rol tmp tax ; remainder in x lda tmp and #$3f ; quotient in a Takes about 75 cycles worst case, 68 cycles average. > > > There is one more question though. Does anyone know where in memory the > > > Applesoft math functions are? See http://cosmicwolf.com/AppleII/AppleSoft_Commented.htm Paul Guertin pg@sff.net Scott Hemphill writes: > "Ranando King" writes: > > > ... and that would be the type of answer I was looking for. > > Consistently fast, small code. Always comes out to about 61 cycles or less. OK, I've got another improvement. It divides A by seven, returning the quotient in A and the remainder in the lowest three bits of X. div7 sta dvdend lsr lsr lsr adc dvdend ror lsr lsr adc dvdend ror lsr lsr adc dvdend tax ; save for remainder ror lsr lsr ; quotient is now in A bcc end ; subtract one from X if carry set dex ; remainder is in lowest 3 bits of X end: rts It can be extended to work for (256*carry+A) < 448, but it loses its elegance. When you add the dividend back in each time, you need to keep the carry bit if it resulted from the adc, but also set carry if it was set upon entry to the routine. 448 is the upper limit because that's when A + (A/7) overflows 9 bits. div7 php sta dvdend ror lsr lsr adc dvdend bcs :1 plp php :1 ror lsr lsr adc dvdend bcs :2 plp php :2 ror lsr lsr adc dvdend bcs :3 plp php :3 tax ; save for remainder ror lsr lsr ; quotient is now in A bcc end ; subtract one from X if carry set dex ; remainder is in lowest 3 bits of X end: plp rts Scott -- Scott Hemphill hemphill@alumni.caltech.edu "This isn't flying. This is falling, with style." -- Buzz Lightyear On 15 Feb 2005 13:21:33 -0500, Scott Hemphill wrote: > This last posting of mine divides by seven by multiplying by one seventh. > One seventh has the binary representation 0.00100100,10010010,01.... > I use just the first three one bits, using 1/7 ~ (1/8+1/64+1/512), > calculating the low bits of the answer first. There's also some magic > in using the carries from the LSR instructions in the ADCs to cause > rounding up in all the right places. Indeed, good magic is happening with the carry bit in your routine. It's very nice! The same idea can be used to divide by 3, 15, or in general 2**k-1 for any k. Paul Guertin pg@sff.net Paul Guertin writes: > On 15 Feb 2005 13:21:33 -0500, Scott Hemphill wrote: > > > This last posting of mine divides by seven by multiplying by one seventh. > > One seventh has the binary representation 0.00100100,10010010,01.... > > I use just the first three one bits, using 1/7 ~ (1/8+1/64+1/512), > > calculating the low bits of the answer first. There's also some magic > > in using the carries from the LSR instructions in the ADCs to cause > > rounding up in all the right places. > > Indeed, good magic is happening with the carry bit in your routine. > It's very nice! > > The same idea can be used to divide by 3, 15, or in general > 2**k-1 for any k. > > Paul Guertin > pg@sff.net But by extension, you can divide by any integer. For example, 1/5 doesn't have a denominator that fits the pattern 2**k-1, but 3/15 does. One fifth has the binary representation 0.00110011,00110011,.... The following routine works: div5 sta dvdend lsr adc dvdend ror lsr lsr adc dvdend ror adc dvdend ror lsr lsr adc dvdend ror adc dvdend tax ; save for remainder ror lsr lsr ; if we didn't need the remainder ; we could return here tay ; save quotient txa ; retrieve last 3 bits shifted out and #7 ; calculate remainder ina lsr tax ; store remainder tya ; retrieve quotient rts Scott -- Scott Hemphill hemphill@alumni.caltech.edu "This isn't flying. This is falling, with style." -- Buzz Lightyear Paul Guertin writes: > On 15 Feb 2005 13:21:33 -0500, Scott Hemphill wrote: > > > This last posting of mine divides by seven by multiplying by one seventh. > > One seventh has the binary representation 0.00100100,10010010,01.... > > I use just the first three one bits, using 1/7 ~ (1/8+1/64+1/512), > > calculating the low bits of the answer first. There's also some magic > > in using the carries from the LSR instructions in the ADCs to cause > > rounding up in all the right places. > > Indeed, good magic is happening with the carry bit in your routine. > It's very nice! It's hard to prove that it all works, but it's easier to show that it works. I used Mathematica to operate on all possible dividends simultaneously, using the following code: ------------------------------------------------------------------------ adc[m_] := Block[{}, a+=m+c; c=Floor[a/256]; a=Mod[a,256]]; and[m_] := a=BitAnd[a,m]; asl := Block[{}, a*=2; c=Floor[a/256]; a=Mod[a,256]]; bit[m_] := BitAnd[a,m]; clc := c=0&/@c; cmp[m_] := c=Floor[1+(a-m)/256]; cpx[m_] := c=Floor[1+(x-m)/256]; cpy[m_] := c=Floor[1+(y-m)/256]; dea := a=Mod[a-1,256]; dex := x=Mod[x-1,256]; dey := y=Mod[y-1,256]; eor[m_] := a=BitXor[a,m]; ina := a=Mod[a+1,256]; inx := x=Mod[x+1,256]; iny := y=Mod[y+1,256]; lda[m_] := a=m; ldx[m_] := x=m; ldy[m_] := y=m; lsr := Block[{}, c=Mod[a,2]; a=Floor[a/2]]; ora[m_] := BitOr[a,m]; rol := Block[{}, a=2*a+c; c=Floor[a/256]; a=Mod[a,256]]; ror := Block[{}, a+=256*c; c=Mod[a,2]; a=Floor[a/2]]; sbc[m_] := Block[{}, a+=255-m+c; c=Floor[a/256]; a=Mod[a,256]]; sec := c=1&/@c; tax := x=a; tay := y=a; txa := a=x; tya := a=y; div7 := Block[{dvdend}, dvdend = a; lsr; lsr; lsr; adc[dvdend]; ror; lsr; lsr; adc[dvdend]; ror; lsr; lsr; adc[dvdend]; tax; ror; lsr; lsr; x -= c; a]; a = Range[256]-1; (* a is a list from 0..255 *) q = Floor[a/7]; (* this is the quotient for each element in list *) r = Mod[a,7]; (* this is the remainder for each element in list *) div7; (* list of carries will be intialized by first lsr *) If[ a == q, Print["quotient is correct"]]; If[ BitAnd[x,7] == r, Print["remainder is correct"]]; ------------------------------------------------------------------------ Scott -- Scott Hemphill hemphill@alumni.caltech.edu "This isn't flying. This is falling, with style." -- Buzz Lightyear