In article <2gj3o9F3d6lnU1@uni-berlin.de>, Kent Dickey wrote:
>In article , John B. Matthews wrote:
>>In article , "John B. Matthews" wrote:
>>> In article <370265cd.0405111045.7cd359b7@posting.google.com>, aiiadict@hotmail.com (Rich J.) wrote:
>>> >
>>> > 16bit/16 bit division with 16 bit result (or 8?), with remainder
>>> > 16 bit to 16 bit compare (greater than, less than, equal)
>>> >
>>> > all with 6502 instructions.
>>> >
>>> > Rich
>>>
>>
>>Egad! I'd have sworn Sweet 16 does multiply and divide. Instead,
>>see lines 522 & 543 of the system monitor listing
>>
>>http://members.buckeye-express.com/marksm/6502/MON.TXT
>>
>
>The Apple II monitor listing above is a good reference, but those routines
>are kinda oddball in that they've been over-optimized for space, not speed or
>simplicity. Just a warning.
>
[ snip]
>I'm going to give three multiply routines of increasing complexity.
>(I realize you didn't ask for this). I'll do divide in a few days.
OK, now for some divide routines.
Again, some zero-page locations are used:
NUM equ $00 ; Numerator
DEN equ $02 ; Denominator
REM equ $04 ; Remainder
QUOT equ $06 ; Quotient = result (for first routine only)
The idea is to do long division in binary. Here's an example:
1 0010
___________
101 ) 101 1101
-101
----
000
000 1
- 0000
----
1
11
- 000
---
11
110
-101
---
001
11
-000
---
11
Result = 10010, remainder = 11. And yup, 93/5 = 18 remainder 3.
To do division, you have to "line-up" the denominator with the numerator
and then start producing the quotient digits.
A simple approach would be to shift the denominator to the left until it
is >= the numerator, then do the compare/subtract/shift steps. Then
undo the shift at the end to fix up the remainder.
But this is inefficient and unnecessary. Instead, the numerator can
be shifted into the REM register one bit a time, and compared with the
normal denominator. If REM >= DEN, then shift a 1 into the QUOT,
otherwise shift in a zero. Repeat for all 16 bits.
Here's that code (again, unsigned only, or positive values only):
DIVIDE
LDY #$10
LDA #$00
STA REM+1
STA REM
STA QUOT+1
STA QUOT
DIVLOOP
ASL NUM
ROL NUM+1
ROL REM
ROL REM+1
SEC
LDA REM
SBC DEN
PHA
LDA REM+1
SBC DEN+1
BCC DIVSHIFT
STA REM+1
PLA
STA REM
PHA
DIVSHIFT
PLA ; toss
ROL QUOT
ROL QUOT+1
DIVLOOPEND
DEY
BNE DIVLOOP
RTS
The routine does the subtraction of REM-DEN and saves the result on the
stack and in ACC until it sees if it is < 0 or not. If it is >= 0, it
updates REM with the new values. In any case, the carry gets shifted
into QUOT indicating whether to set this bit in QUOT or not.
There are many inefficiencies in the above, I wrote it as a more straight-
forward form. First, the QUOT register is unnecessary. Note how the NUM
is shifting by one each time just as QUOT is. So we can have them share
the same memory. And the PHA/PLA can be replaced with TAX/TXA.
DIVIDE
LDY #$10
LDA #$00
STA REM+1
STA REM
ASL NUM
ROL NUM+1
DIVLOOP
ROL REM
ROL REM+1
SEC
LDA REM
SBC DEN
TAX
LDA REM+1
SBC DEN+1
BCC DIVSHIFT
STA REM+1
STX REM
DIVSHIFT
ROL NUM
ROL NUM+1
DIVLOOPEND
DEY
BNE DIVLOOP
RTS
There are some other optimizations to be made. The A2 monitor routine is
very similar to this except it eliminates the redundant ROL NUM's before
DIVLOOP and at DIVSHIFT.
Kent Dickey
In article <370265cd.0405161945.2ea7db95@posting.google.com>, Rich J. wrote:
>the method subtracts the 8 bit divisor from the 16 bit numerator until
>it is less than the divisor. the ammount left in the numerator when
>it is less than the divisor is the remainder.
>
>any advice appreciated (ps, I know I should read references. they
>aren't here. Internet searches come up will all sorts of results, but
>none are what I want specifically)
>
>the numerator is between 0 and 400
>the divisor is 20 (always)
>the quotient will always be 1byte (400/20 = 20)
>the remainder will of course always be one byte
>the numerator need not be left intact.
>
>the code below (written from memory of my battle with the IIe earlier)
>seems to give strange results sometimes. but I am understanding more
>about this now that I have typed it all out and thought about it.
>
>yuck. maybe not. when a borrow condition is found the code doesn't
>work. please save my hair. I am about to pull it out.
>
> ldy #0 ;initialize quotient
> start lda numLO
> cmp #divisor
> bmi testHI ; if low byte is less than divisor,
>see if we can \
> ;borrow from hibyte
> sec
> sbc divisor ;subtract divisor from low byte
> sta numLO
> lda numHI
>subHI iny ;increment quotient
> sbc #00 ;subtract 1 from HI byte if there
>was a borrow
> sta numHI
> bne start ;hibyte is still > 0, so keep on
>subtracting
> lda numLO ;hibyte is 0, see if Lobyte can still
>have divisor
> cmp divisor ;subtracted from it
> bmi done ;lobyte is less than divisor, end
>subtraction
> jmp start
>testHI lda numHI
> BEQ done
> sec
> jmp subHI
>
>done sty quotient
> lda numLO
> sta remainder
> rts
>
>numHI NOP
>numLO NOP
>divisor equ #20 ;I want to divide a 16 bit # by 20
>remainder nop
>quotient nop
>
>
>Rich
>aiiadict AT hotmail DOT com
>http://rich12345.tripod.com
As you said, the above code does not work.
First, a minor nit: I think you need to always code "cmp #divisor",
with the "#" everytime--otherwise you'll probably get cmp $14 which will
compare to zero-page location $14. Even if your assembler has special
syntax for this, I recommend staying compatible with most other
assemblers.
There are two main flaws in the above algorithm: BMI is not the function
you want to use to see if the CMP indicates the subtract was successful,
and if numLO < 0x14 but numHI != 0, the subtraction does not work correctly.
I think you may have some confusion about how SBC/CMP work, so I'll give
my explanation. In this case, you're doing a SBC #$14 (decimal 20).
It's useful to note that 6502 subtraction is always this:
SBC val is the same thing as ADC (val ^ 0xff)
In other words, a subtract just inverts all the bits in its operand and
does a normal add-with-carry. This means SBC #$14 is identical to ADC
#$EB. The other fact to note is that ((val ^ 0xff) + 1) = -val in
binary signed math (which is what the 6502 does). This is why SBC
requires carry to be set beforehand--just inverting the value does not
make it -value, it needs an extra 1 carried in. Once you see this, a
lot of confusion around subtraction should go away.
I followed your same basic flow but fixed the problems here:
ldy #0
start lda numLO
sec
sbc #20 ; skip the CMP, just do the subtract
bcs DoSub ; carry set = subtract is OK to do
ldx numHI ; else, check if numHI is 0
beq done ; if it is, we're done
DoSub sta numLO ; If we get here, subtract #$14 from num
lda numHI
sbc #0
sta numHI
iny ; increment quotient in Y
bne start ; Can use bne instead of JMP since Y < 255
done sty quotient
lda numLO
sta remainder
rts
The basic idea is to first see if numLO >= 20. If it is, go to DoSub to
do a subtraction. Otherwise, see if numHI != 0. If it is, go to DoSub;
otherwise, we're done.
An optimization just does the SBC instead of a CMP since they set the
carry bit the same but SBC also gives the result we need. Another trick
is to check numHI with a LDX to preserve the carry which DoSub will
need. As with almost any piece of code, lots of optimizations are still
possible (for instance, the SEC can be moved outside of the loop,
and numLO could be kept in the X register).
However, if you really only need to divide a maximum of 400 by a fixed
divider of 20, then more tricks can be used.
First, 20 can be factored 5*4. A divide by 4 is just a simple shift, so
that can be done outside the loop. This changes the problem to a new
range of a max of 100 divided by 5. Here's an algorithm to then do an
8-bit divide by 5 in the inner loop and handle the divide by 4 at the
beginning and end. The end divide-by-4 fixups just affects the remainder--
the main divide will give the correct quotient without the fixup at the end:
Divide_4_5
lsr NUM+1
ror NUM
ror NUM+1 ; stick bit at top of NUM+1, rotate out next bit
ror NUM
ror NUM+1 ; stick bit at top of NUM+1
LDY #$8
LDA #$00
ASL NUM
DIVLOOP
ROL
CMP #$5
BCC DIVSHIFT
SBC #$5
DIVSHIFT
ROL NUM
DIVLOOPEND
DEY
BNE DIVLOOP
asl num+1
rol
asl num+1
rol
sta REM
RTS
This converts NUM+1 and NUM into a single byte: NUM, and then the DIVLOOP
does an 8-bit divide. An optimization is to now save the REM location
in the accumulator for the inner loop which is faster and saves space.
The routine can handle NUM at the start being as high as 1023 ($3FF).
Overall, this routine is faster in the worst-case than the divide-by-
repeated-subtraction method earlier, but if you know you're numerator
values will often be small, then that routine could be faster for your
purposes.
You only need a single shift to get the range down from 400 max
to 200, and that would fit in a byte, so a "ror NUM; ror NUM+1" at the
top and a "asl num+1; rol" at the bottom can be removed to save space
if the CMP and SBC are changed to use #$0a instead.
Kent Dickey