Newsgroups: comp.sys.apple2
Path: news.weeg.uiowa.edu!news.uiowa.edu!hobbes.physics.uiowa.edu!moe.ksu.ksu.edu!crcnis1.unl.edu!wupost!uunet!comp.vuw.ac.nz!actrix!dempson
From: dempson@swell.actrix.gen.nz (David Empson)
Subject: Re: Lousy code generation (Re: MoriaGS 5.3.1 coming...)
Organization: Actrix Information Exchange
Date: Wed, 14 Apr 1993 13:26:32 GMT
Message-ID: <1993Apr14.132632.7719@actrix.gen.nz>
References: <1q03vsINN62k@gap.caltech.edu> <jpenne.734744316@ee.ualberta.ca> <1qg65mINNsm6@gap.caltech.edu>
Sender: dempson@actrix.gen.nz (David Empson)
Lines: 90

In article <1qg65mINNsm6@gap.caltech.edu> nathan@cco.caltech.edu (Nathan Mates) writes:
> In article <jpenne.734744316@ee.ualberta.ca> jpenne@ee.ualberta.ca (Jerry Penner) writes:
> >
> >Subtracting two numbers takes just as long (or short) as comparing
> >them.  I think you'll need to come up with a better example than that.
> >What else about MPW C's generated code is bloated?
>
>     Sorry to disagree with you. As the chart that came with my ORCA/M 1.1
> assembler says, the sbc #xxxx and cmp #xxxx operands both take the same amount
> of time. That I'll give you. But, look at what's necessary before subtraction
> as opposed to comparing the numbers: a sec instruction. 2 measley cycles.
> That's it...

[stuff deleted]

> 
> 720A: lda F0              (4+)
> 720C: sec                 (2)
> 720D: sbc #0003           (3)
> 7210: bvs 7215 {+03}      (2,3)
> 7212: eor #8000           (3)
> 7215: bmi 7230 {+19}      (2,3)
> 7217: lda xxxx
>    some other random code...
> 7230: lda yyyy

There's a very good reason for doing all of the strange code: the
value loaded from location F0 is a _signed_ integer.  The CMP
instruction only deals with unsigned values, so you have to use SBC if
you want to compare a signed integer.

Determining the relationship between signed integers after doing the
SBC involves examining the N, V and possibly Z flags.

The following table gives the relationships.  A is the accumulator, B
is the value in memory that was subtracted.

N V Z   Relationship
0 0 0   A > B
0 0 1   A = B
0 1 0   A < B
0 1 1   not possible
1 0 0   A < B
1 0 1   not possible
1 1 0   A > B
1 1 1   not possible

Most processors have special signed branch instructions which test the
various combinations of N, V and Z.  The 6502 and 65816 don't have any
such instructions, so the program has to simulate their operation.

The tests are as follows:

Branch if greater than:          Z = 0 and N = V
Branch if less than or equal:    Z = 1 or N <> V
Branch if less than:             N <> V
Branch if greater than or equal: N = V


The code you quoted is the fastest way to synthesize a "signed branch
if greater than or equal" operation on the 65816.


What this all boils down to is the original code: since the program
declared the variable as an int (signed), the compiler must use signed
comparison to work out if the variable is greater than or equal to 3.
If the program declared the variable as an unsigned int, then the
compiler could use unsigned comparison (the CMP instruction), which
would be much faster.  For this reason, if you're writing code in C on
a 6502-based computer, it is always best to use unsigned variables
whenever possible.

For your information: ORCA/C, ORCA/Pascal, MPW Pascal, TML Pascal and
all the other IIgs native C and Pascal compilers would have generated
exactly the same code for the comparison.

>    If you can point out something that I've missed in this, go ahead. The
> code works, I'll give it that, but it's rather wasteful of memory and
> processor time. I was half thinking of writing an optimization program to
> go through the code, scanning for things like this, and replacing it with
> the shorter code, but now that moria's coming out, why bother?

If the variable really can become negative, then "optimizing" this
code would introduce a bug into the program.  It would be a better
idea to get the source code and change the variables to unsigned where
appropriate.
-- 
David Empson                                                               
dempson@swell.actrix.gen.nz    <--- Note my new E-Mail address!            
Snail mail: P.O. Box 27-103, Wellington, New Zealand 
Path: news.weeg.uiowa.edu!news.uiowa.edu!hobbes.physics.uiowa.edu!zaphod.mps.ohio-state.edu!swrinde!elroy.jpl.nasa.gov!nntp-server.caltech.edu!toddpw
From: toddpw@cco.caltech.edu (Todd P. Whitesel)
Newsgroups: comp.sys.apple2
Subject: Re: Lousy code generation (Re: MoriaGS 5.3.1 coming...)
Date: 18 Apr 1993 12:20:19 GMT
Organization: California Institute of Technology, Pasadena
Lines: 55
Message-ID: <1qrh23INNmid@gap.caltech.edu>
References: <7cHh03RCcf3E00@amdahl.uts.amdahl.com> <1q03vsINN62k@gap.caltech.edu> <jpenne.734744316@ee.ualberta.ca> <1qg65mINNsm6@gap.caltech.edu>
NNTP-Posting-Host: punisher.caltech.edu

nathan@cco.caltech.edu (Nathan Mates) writes:
[stuff]

Depending on the type of data that's in $F0, only one of the code samples is
correct. Isn't anyone teaching today's assembly programmers about signed
comparisons?

The minimal LDA/CMP/BCS sequence assumes that both operands are UNSIGNED or
have the same sign. If they are signed and have different signs, this sequence
fails to compare them correctly. Try 1 >= -1 ($FFFF).

We are up against an inherent flaw in the 65xxx series ALU's here; there is no
direct test for signed comparisons. You need to compute the term
	(N XOR V)
or (in explicit, C-ish notation)
	(N && !V) || (V && !N)
which is easier if you use SBC because it computes V for you. CMP does not.
Computing V manually is a royal pain, so it is better to accomodate SBC and
live with it.

In this particular example, however, the compiler could have taken advantage
of the fact that one of the numbers is known to be positive. We can correctly
do a signed comparison of ($F0 >= #3) with the following:

	lda	$F0
	bmi	less-than
	cmp	#3
	bcs	g-or-e
less-than:

If you are comparing two unknown signed numbers, though, then you are stuck
using something more like this (I altered it to match the explanation better):

	lda	$F0
	sec
	sbc	$F2
	bvc	okay
	eor	#$8000
okay	bpl	g-or-e
less-than:

Start with ($F0 >= $F2), and shuffle it to ($F0 - $F2) >= 0. The first three
instructions compute ($F0 - $F2), and if no overflow occurred (V is clear) we
got a mathematically correct result so we skip down to the bottom and use the
BPL ( >= 0 ) test to branch. If overflow _did_ occur, then the N bit is -wrong-
and we need to invert it first, so we do that by EOR-ing with $8000.

The code example Nathan disassembled does exactly the same thing, only it
reverses the sense of both branches -- they use a BMI to test the opposite
of the branch condition and flip the N bit in the non-overflow case, which
takes more time and seems to be the common path to me, so I saved two cycles
by changing it to branch over the EOR in the non-overflow case instead.

Todd Whitesel
toddpw @ cco.caltech.edu