In article <2gj3o9F3d6lnU1@uni-berlin.de>, Kent Dickey <kadickey@alumni.princeton.edu> wrote:
>In article <nospam-4C3750.23050311052004@news-server.woh.rr.com>, John B. Matthews <nospam@nospam.com> wrote:
>>In article <nospam-AAC5F1.22351411052004@news-server-fe-02.columbus.rr.com> , "John B. Matthews" <nospam@nospam.com> wrote:
>>> In article <370265cd.0405111045.7cd359b7@posting.google.com>, aiiadict@hotmail.com (Rich J.) wrote:
>>> >
>>> > 16bit/16 bit division with 16 bit result (or 8?), with remainder
>>> > 16 bit to 16 bit compare (greater than, less than, equal)
>>> > 
>>> > all with 6502 instructions.
>>> > 
>>> > Rich
>>> 
>>
>>Egad! I'd have sworn Sweet 16 does multiply and divide. Instead, 
>>see lines 522 & 543 of the system monitor listing
>>
>>http://members.buckeye-express.com/marksm/6502/MON.TXT
>>
>
>The Apple II monitor listing above is a good reference, but those routines
>are kinda oddball in that they've been over-optimized for space, not speed or
>simplicity.  Just a warning.
>
[ snip]

>I'm going to give three multiply routines of increasing complexity.
>(I realize you didn't ask for this).  I'll do divide in a few days.

OK, now for some divide routines.

Again, some zero-page locations are used:
NUM	equ	$00	; Numerator
DEN	equ	$02	; Denominator
REM	equ	$04	; Remainder
QUOT	equ	$06	; Quotient = result (for first routine only)

The idea is to do long division in binary.  Here's an example:
           1 0010
      ___________
  101 )  101 1101
        -101
        ----
         000

         000 1
        - 0000
          ----
             1

             11
          - 000
            ---
             11

             110
            -101
             ---
             001
               11
             -000
              ---
               11

Result = 10010, remainder = 11.  And yup, 93/5 = 18 remainder 3.

To do division, you have to "line-up" the denominator with the numerator
and then start producing the quotient digits.

A simple approach would be to shift the denominator to the left until it
is >= the numerator, then do the compare/subtract/shift steps.  Then
undo the shift at the end to fix up the remainder.

But this is inefficient and unnecessary.  Instead, the numerator can
be shifted into the REM register one bit a time, and compared with the
normal denominator.  If REM >= DEN, then shift a 1 into the QUOT,
otherwise shift in a zero.  Repeat for all 16 bits.

Here's that code (again, unsigned only, or positive values only):

DIVIDE
        LDY     #$10
        LDA     #$00
        STA     REM+1
        STA     REM
        STA     QUOT+1
        STA     QUOT
DIVLOOP 
        ASL     NUM
        ROL     NUM+1
        ROL     REM
        ROL     REM+1
        SEC     
        LDA     REM
        SBC     DEN
        PHA     
        LDA     REM+1
        SBC     DEN+1
        BCC     DIVSHIFT
        STA     REM+1
        PLA     
        STA     REM
        PHA     
DIVSHIFT
        PLA             ; toss
        ROL     QUOT
        ROL     QUOT+1
DIVLOOPEND
        DEY     
	BNE	DIVLOOP
	RTS

The routine does the subtraction of REM-DEN and saves the result on the
stack and in ACC until it sees if it is < 0 or not.  If it is >= 0, it
updates REM with the new values.  In any case, the carry gets shifted
into QUOT indicating whether to set this bit in QUOT or not.

There are many inefficiencies in the above, I wrote it as a more straight-
forward form.  First, the QUOT register is unnecessary.  Note how the NUM
is shifting by one each time just as QUOT is.  So we can have them share
the same memory.  And the PHA/PLA can be replaced with TAX/TXA. 


DIVIDE
        LDY     #$10
        LDA     #$00
        STA     REM+1
        STA     REM
        ASL     NUM
        ROL     NUM+1
DIVLOOP
        ROL     REM
        ROL     REM+1
        SEC
        LDA     REM
        SBC     DEN
        TAX
        LDA     REM+1
        SBC     DEN+1
        BCC     DIVSHIFT
        STA     REM+1
        STX     REM
DIVSHIFT
        ROL     NUM
        ROL     NUM+1
DIVLOOPEND
        DEY
        BNE     DIVLOOP
        RTS

There are some other optimizations to be made.  The A2 monitor routine is
very similar to this except it eliminates the redundant ROL NUM's before
DIVLOOP and at DIVSHIFT.

Kent Dickey


In article <370265cd.0405161945.2ea7db95@posting.google.com>, Rich J. <aiiadict@hotmail.com> wrote:
>the method subtracts the 8 bit divisor from the 16 bit numerator until
>it is less than the divisor.  the ammount left in the numerator when
>it is less than the divisor is the remainder.
>
>any advice appreciated (ps, I know I should read references.  they
>aren't here.  Internet searches come up will all sorts of results, but
>none are what I want specifically)
>
>the numerator is between 0 and 400
>the divisor is 20 (always)
>the quotient will always be 1byte (400/20 = 20)
>the remainder will of course always be one byte
>the numerator need not be left intact.   
>
>the code below (written from memory of my battle with the IIe earlier)
>seems to give strange results sometimes.  but I am understanding more
>about this now that I have typed it all out and thought about it.
>
>yuck.  maybe not.  when a borrow condition is found the code doesn't
>work.  please save my hair.  I am about to pull it out.
>
>      ldy #0                         ;initialize quotient
> start  lda numLO
>      cmp #divisor
>      bmi testHI                 ; if low byte is less than divisor,
>see if we can \
>                                          ;borrow from hibyte
>      sec                
>      sbc divisor                ;subtract divisor from low byte
>      sta numLO
>      lda numHI
>subHI      iny                    ;increment quotient
>      sbc #00                     ;subtract 1 from HI byte if there
>was a borrow
>      sta numHI
>      bne start                    ;hibyte is still > 0, so keep on
>subtracting
>      lda numLO                ;hibyte is 0, see if Lobyte can still
>have divisor
>      cmp divisor                ;subtracted from it 
>      bmi done                    ;lobyte is less than divisor, end
>subtraction
>      jmp start
>testHI lda numHI
>            BEQ done
>            sec
>            jmp subHI
>
>done sty quotient
>       lda numLO
>       sta remainder
>       rts
>
>numHI NOP
>numLO NOP
>divisor equ #20             ;I want to divide a 16 bit # by 20
>remainder nop
>quotient nop
>
>
>Rich
>aiiadict AT hotmail DOT com
>http://rich12345.tripod.com

As you said, the above code does not work.

First, a minor nit:  I think you need to always code "cmp #divisor",
with the "#" everytime--otherwise you'll probably get cmp $14 which will
compare to zero-page location $14.  Even if your assembler has special
syntax for this, I recommend staying compatible with most other
assemblers.

There are two main flaws in the above algorithm: BMI is not the function
you want to use to see if the CMP indicates the subtract was successful,
and if numLO < 0x14 but numHI != 0, the subtraction does not work correctly.

I think you may have some confusion about how SBC/CMP work, so I'll give
my explanation.  In this case, you're doing a SBC #$14 (decimal 20).
It's useful to note that 6502 subtraction is always this:

	SBC val   is the same thing as ADC (val ^ 0xff)

In other words, a subtract just inverts all the bits in its operand and
does a normal add-with-carry.  This means SBC #$14 is identical to ADC
#$EB.  The other fact to note is that ((val ^ 0xff) + 1) = -val in
binary signed math (which is what the 6502 does).  This is why SBC
requires carry to be set beforehand--just inverting the value does not
make it -value, it needs an extra 1 carried in.  Once you see this, a
lot of confusion around subtraction should go away.

I followed your same basic flow but fixed the problems here:

        ldy     #0
start   lda     numLO
        sec     
        sbc     #20		; skip the CMP, just do the subtract
        bcs     DoSub		; carry set = subtract is OK to do
        ldx     numHI		; else, check if numHI is 0
        beq     done		; if it is, we're done
DoSub   sta     numLO		; If we get here, subtract #$14 from num
        lda     numHI
        sbc     #0
        sta     numHI
        iny			; increment quotient in Y
        bne     start		; Can use bne instead of JMP since Y < 255
done    sty     quotient
        lda     numLO
        sta     remainder
        rts

The basic idea is to first see if numLO >= 20.  If it is, go to DoSub to
do a subtraction.  Otherwise, see if numHI != 0.  If it is, go to DoSub;
otherwise, we're done.

An optimization just does the SBC instead of a CMP since they set the
carry bit the same but SBC also gives the result we need.  Another trick
is to check numHI with a LDX to preserve the carry which DoSub will
need.  As with almost any piece of code, lots of optimizations are still
possible (for instance, the SEC can be moved outside of the loop,
and numLO could be kept in the X register).

However, if you really only need to divide a maximum of 400 by a fixed
divider of 20, then more tricks can be used.

First, 20 can be factored 5*4.  A divide by 4 is just a simple shift, so
that can be done outside the loop.  This changes the problem to a new
range of a max of 100 divided by 5.  Here's an algorithm to then do an
8-bit divide by 5 in the inner loop and handle the divide by 4 at the
beginning and end.  The end divide-by-4 fixups just affects the remainder--
the main divide will give the correct quotient without the fixup at the end:

Divide_4_5
        lsr     NUM+1
        ror     NUM
        ror     NUM+1   ; stick bit at top of NUM+1, rotate out next bit
        ror     NUM
        ror     NUM+1   ; stick bit at top of NUM+1
        LDY     #$8
        LDA     #$00
        ASL     NUM
DIVLOOP
        ROL     
        CMP     #$5
        BCC     DIVSHIFT
        SBC     #$5
DIVSHIFT
        ROL     NUM
DIVLOOPEND
        DEY
        BNE     DIVLOOP
        asl     num+1
        rol
        asl     num+1
        rol
        sta     REM
        RTS

This converts NUM+1 and NUM into a single byte: NUM, and then the DIVLOOP
does an 8-bit divide.  An optimization is to now save the REM location
in the accumulator for the inner loop which is faster and saves space.
The routine can handle NUM at the start being as high as 1023 ($3FF).

Overall, this routine is faster in the worst-case than the divide-by-
repeated-subtraction method earlier, but if you know you're numerator
values will often be small, then that routine could be faster for your
purposes.

You only need a single shift to get the range down from 400 max
to 200, and that would fit in a byte, so a "ror NUM; ror NUM+1" at the
top and a "asl num+1; rol" at the bottom can be removed to save space
if the CMP and SBC are changed to use #$0a instead.

Kent Dickey