Path: blue.weeg.uiowa.edu!news.uiowa.edu!hobbes.physics.uiowa.edu!math.ohio-state.edu!howland.reston.ans.net!torn!news.unb.ca!upei.ca!usenet
From: george@grover.psyc.upei.ca (George Taylor)
Newsgroups: comp.sys.apple2.programmer
Subject: Re: ASM multiply/divide
Date: 6 Jul 1994 18:59:53 GMT
Organization: University of Prince Edward Island, Charlottetown, PEI Canada
Lines: 28
Message-ID: <2veuv9$1u6@atlas.cs.upei.ca>
References:
Reply-To: yurik@io.org
NNTP-Posting-Host: grover.psyc.upei.ca
Sean Dockery writes
(whole bunch of divide/multiply routines).
Well, I just stumbled on this newsgroup by accident. Normally I read
comp.sys.cbm , being a 64 programmer. But, 6502 is my area of expertise,
and in 'our' world, we have much faster routines.
I have a multiply which takes only 40 cycles. this requires 1.5k of
tables, however. It works on a difference of squares algorithm.
4*a*b=(a+b)^2-(a-b)^2
You can derive the code from this formula...
Hint: put a in a zp location, and b in y, so you can do lda (adr),y to
read from 0-510 location table.
As for an algorithmic divide, an 8 bit divide can be done at 14 cycles per
bit at best. If you start using tables, the speed can be decreased by
about 50%.
I find it interesting to see how 'developed' coders are in a different
6502 world...
Btw, the difference of squares method is so powerful, I hear it saves even
a few cycles on an Amiga running with a 68000!
(as compared to a HARDWARE divide). Algorithms are in fact being replaced
by tables now in the design of some new math comprocessors.
(see the lastest transactions of IEEE for the floating point article).
Just out of curiousity, has anyone done solid vectors on the apple?
On the 64, you can plot an 80x80 solid cube, rotating in 3d, in a time of
about two frames per rotation ( in 4 colors).
That's around 40,000 cycles.
How fast is multiply on the 65816?
Later,
yurik/TEG
Path: blue.weeg.uiowa.edu!news.uiowa.edu!hobbes.physics.uiowa.edu!math.ohio-state.edu!howland.reston.ans.net!spool.mu.edu!torn!news.unb.ca!upei.ca!usenet
From: student@bert.psyc.upei.ca (Student Demonstations)
Newsgroups: comp.sys.apple2.programmer
Subject: Re: ASM multiply/divide
Date: 7 Jul 1994 10:19:10 GMT
Organization: University of Prince Edward Island, Charlottetown, PEI Canada
Lines: 45
Message-ID: <2vgkqu$g64@atlas.cs.upei.ca>
References: <2veuv9$1u6@atlas.cs.upei.ca>
Reply-To: yurik@io.org
NNTP-Posting-Host: bert.psyc.upei.ca
In article <2veuv9$1u6@atlas.cs.upei.ca> george@grover.psyc.upei.ca
(George Taylor) writes:
> As for an algorithmic divide, an 8 bit divide can be done at 14 cycles
per
> bit at best. If you start using tables, the speed can be decreased by
> about 50%.
I now follow up to myself with the code :)
8 bit unsigned divide
ldx #8
lda dividend
div cmp divisor
bcs s
sbc divisor
s rol quotient
rol
dex
bne div
timing in cycles:
overhead 5
"0" bit 20
"1" bit 18
range 148 to 164
average 156
unrolled, with dividend in A register, and last iteration optimized: 105
average of 14 cycles per bit.
dividend, quotient, and divisor are 8 bit values stored in zero page.
The remainder is left in A.
note: dividend/divisor must be < 2. So this is not a fully general
routine. To handle the full range, you must do a prefix of the dividend
to reduce it to the right range, then fix the quotient after. This is
true of any shift and subtract divide. True divides are much slower.
There are two faster routines I have written, but this is the fastest
algorithmic version. A faster version uses decision tree optimization and
requires over 256 bytes of code. The fastest version requires tables
which I cannot calculate without a computer.
I mean, I cannot even easily specify how to find them, since it is not a
formula.
-yurik
ps if anyone has some interesting math algorithms or graphics ideas,
please send me. Yes, I have already found the 14 cycle per point line
draw (not including plot) and the all integer ellipse and circle :)
Search my name in comp.sys.cbm for more interesting info.