In article , John B. Matthews wrote: >In article > , "John B. Matthews" wrote: > >> In article <370265cd.0405111045.7cd359b7@posting.google.com>, aiiadict@hotmail.com (Rich J.) wrote: >> > >> > 16bit/16 bit division with 16 bit result (or 8?), with remainder >> > 16 bit to 16 bit compare (greater than, less than, equal) >> > >> > all with 6502 instructions. >> > >> > Rich >> > >Egad! I'd have sworn Sweet 16 does multiply and divide. Instead, >see lines 522 & 543 of the system monitor listing > >http://members.buckeye-express.com/marksm/6502/MON.TXT > The Apple II monitor listing above is a good reference, but those routines are kinda oddball in that they've been over-optimized for space, not speed or simplicity. Just a warning. Lots of folks have posted 16-bit compare/add/subtract, and you should make sure you understand those routines before moving on to multiply and divide. I'm going to give three multiply routines of increasing complexity. (I realize you didn't ask for this). I'll do divide in a few days. The routines will assume some zero-page locations: # Multiply VAL1 by VAL2, put result in VAL3 # Assume all numbers are positive (or just treat as unsigned) VAL1 equ $00 VAL2 equ $02 VAL3 equ $04 1. Simple shift and add algorithm Algorithm 1: look at low bit of VAL1. If set, add VAL2 to VAL3. Then, always shift VAL2 left and VAL1 right. Routine does MULADD: VAL3 = VAL3 + VAL1*VAL2. Set VAL3=0 before starting to just get MUL. This is just a 16x16=16 bit multiplier. The upper 16-bits of the true 32-bit result are lost MUL LDY #$10 MULLOOP LSR VAL1+1 ROR VAL1 BCC MULSHFT CLC LDA VAL2 ADC VAL3 STA VAL3 LDA VAL2+1 ADe VAL3+1 STA VAL3+1 MULSHFT ASL VAL2 ROL VAL2+1 DEY BNE MULLOOP RTS The basic idea is to do multiply like the following 4x4-bit sample: 0101 (5) x 1101 (13) ----- 0101 0000 0101 +0101 --------- 01000001 (65) Make sure you understand how the above works. 2. 16x16 = 32 bit multiply Here's a version that is a little better--it does a 16x16=32 bit multiply and leaves the low 16 bit result in VAL1 (upper part in VAL3): MUL LDY #$10 MULLOOP LDA VAL1 LSR BCC MULSHFT CLC LDA VAL2 ADC VAL3 STA VAL3 LDA VAL2+1 ADC VAL3+1 STA VAL3+1 MULSHFT LSR VAL3+1 ROR VAL3 ROR VAL1+1 ROR VAL1 DEY BNE MULLOOP RTS This is only a little slower than the previous algorithm and you get 16x16=32 bit multiply. The result is placed with low 16 bits in VAL1 and upper 16 bits in VAL3. It still adds in VAL3. This algorithm is similar to the Apple II monitor routine, but not as convoluted (that algorithm is complicated by an effort to save a few bytes). 3. Difference of squares. I haven't written this one out in 6502 assembly, but here's the sample C code: // input: x, y 8-bit numbers // vars: a, b, a2, b2 are 16-bit numbers mul8x8=16 bit: a = x + y b = x - y; if(b < 0) [ b = -b; } a2 = table[a >> 1]; if(a & 1) { a2 = a2 + ((2*a - 1) >> 2); } b2 = table[b >> 1]; if(b & 1) { b2 = b2 + ((2*b - 1) >> 2); } result = (a2 - b2); The above produces an 8x8=16 bit multiply using two table lookups into a 256-entry table of 16-bit entries. The 512-byte table contains 256 16-bit entries. Initialize the table with: for(i = 0; i < 256; i++) { table[i] = i*i; } How this works is a little tricky, but the basic idea is that instead of multiplying x*y, instead let a = x+y and b = x-y. Then a^2 - b^2 = (x+y)*(x+y) - (x-y)*(x-y) = x^2 + 2xy + y^2 - (x^2 - 2xy + y^2) = 4xy. So by doing an add and a subtract, and then two table lookups, we have our product (times 4). As an example, let x=4, y = 9. Then a = 13 and b = 5. 13^2 = 169, and 5^2=25. Then 169-25 = 144. Divide by 4 to get 36. However, if x and y are 8-bit numbers, then x+y is a 9-bit number, requiring 512 entries of 16-bits each. But the table contains squares, and its easy to figure out the next entry based on the previous entry: (n-1)^2 = n^2 - 2n + 1. So the delta between n^2 and (n-1)^2 is 2n - 1. The table is then shrunk by an index bit to be just 256 entries by calculating the odd squares from the even squares. Removing the factor of 4 from each entry is the final step that makes all the numbers work out right. It's an exercise for the reader to show why this works. (Hint: we're only storing even squares). The code can be further optimized since ((2*b - 1) >> 2) = ((b - 1) >> 1). And b must always be 8-bit, allowing some optimizations. In general, this algorithm probably doesn't help 6502 all that much since you need to call this routine 4 times to get 16x16=32 bit multiply, or at least 3 times to get 16x16=16 bit multiply. And all the little fixups needed don't map well to 8-bit operations on a 6502. It's a lot of overhead, but I throw it out to show a different multiply algorithm. Kent Dickey