Newsgroups: comp.sys.apple2 Path: news.weeg.uiowa.edu!news.uiowa.edu!hobbes.physics.uiowa.edu!moe.ksu.ksu.edu!crcnis1.unl.edu!wupost!uunet!comp.vuw.ac.nz!actrix!dempson From: dempson@swell.actrix.gen.nz (David Empson) Subject: Re: Lousy code generation (Re: MoriaGS 5.3.1 coming...) Organization: Actrix Information Exchange Date: Wed, 14 Apr 1993 13:26:32 GMT Message-ID: <1993Apr14.132632.7719@actrix.gen.nz> References: <1q03vsINN62k@gap.caltech.edu> <1qg65mINNsm6@gap.caltech.edu> Sender: dempson@actrix.gen.nz (David Empson) Lines: 90 In article <1qg65mINNsm6@gap.caltech.edu> nathan@cco.caltech.edu (Nathan Mates) writes: > In article jpenne@ee.ualberta.ca (Jerry Penner) writes: > > > >Subtracting two numbers takes just as long (or short) as comparing > >them. I think you'll need to come up with a better example than that. > >What else about MPW C's generated code is bloated? > > Sorry to disagree with you. As the chart that came with my ORCA/M 1.1 > assembler says, the sbc #xxxx and cmp #xxxx operands both take the same amount > of time. That I'll give you. But, look at what's necessary before subtraction > as opposed to comparing the numbers: a sec instruction. 2 measley cycles. > That's it... [stuff deleted] > > 720A: lda F0 (4+) > 720C: sec (2) > 720D: sbc #0003 (3) > 7210: bvs 7215 {+03} (2,3) > 7212: eor #8000 (3) > 7215: bmi 7230 {+19} (2,3) > 7217: lda xxxx > some other random code... > 7230: lda yyyy There's a very good reason for doing all of the strange code: the value loaded from location F0 is a _signed_ integer. The CMP instruction only deals with unsigned values, so you have to use SBC if you want to compare a signed integer. Determining the relationship between signed integers after doing the SBC involves examining the N, V and possibly Z flags. The following table gives the relationships. A is the accumulator, B is the value in memory that was subtracted. N V Z Relationship 0 0 0 A > B 0 0 1 A = B 0 1 0 A < B 0 1 1 not possible 1 0 0 A < B 1 0 1 not possible 1 1 0 A > B 1 1 1 not possible Most processors have special signed branch instructions which test the various combinations of N, V and Z. The 6502 and 65816 don't have any such instructions, so the program has to simulate their operation. The tests are as follows: Branch if greater than: Z = 0 and N = V Branch if less than or equal: Z = 1 or N <> V Branch if less than: N <> V Branch if greater than or equal: N = V The code you quoted is the fastest way to synthesize a "signed branch if greater than or equal" operation on the 65816. What this all boils down to is the original code: since the program declared the variable as an int (signed), the compiler must use signed comparison to work out if the variable is greater than or equal to 3. If the program declared the variable as an unsigned int, then the compiler could use unsigned comparison (the CMP instruction), which would be much faster. For this reason, if you're writing code in C on a 6502-based computer, it is always best to use unsigned variables whenever possible. For your information: ORCA/C, ORCA/Pascal, MPW Pascal, TML Pascal and all the other IIgs native C and Pascal compilers would have generated exactly the same code for the comparison. > If you can point out something that I've missed in this, go ahead. The > code works, I'll give it that, but it's rather wasteful of memory and > processor time. I was half thinking of writing an optimization program to > go through the code, scanning for things like this, and replacing it with > the shorter code, but now that moria's coming out, why bother? If the variable really can become negative, then "optimizing" this code would introduce a bug into the program. It would be a better idea to get the source code and change the variables to unsigned where appropriate. -- David Empson dempson@swell.actrix.gen.nz <--- Note my new E-Mail address! Snail mail: P.O. Box 27-103, Wellington, New Zealand Path: news.weeg.uiowa.edu!news.uiowa.edu!hobbes.physics.uiowa.edu!zaphod.mps.ohio-state.edu!swrinde!elroy.jpl.nasa.gov!nntp-server.caltech.edu!toddpw From: toddpw@cco.caltech.edu (Todd P. Whitesel) Newsgroups: comp.sys.apple2 Subject: Re: Lousy code generation (Re: MoriaGS 5.3.1 coming...) Date: 18 Apr 1993 12:20:19 GMT Organization: California Institute of Technology, Pasadena Lines: 55 Message-ID: <1qrh23INNmid@gap.caltech.edu> References: <7cHh03RCcf3E00@amdahl.uts.amdahl.com> <1q03vsINN62k@gap.caltech.edu> <1qg65mINNsm6@gap.caltech.edu> NNTP-Posting-Host: punisher.caltech.edu nathan@cco.caltech.edu (Nathan Mates) writes: [stuff] Depending on the type of data that's in $F0, only one of the code samples is correct. Isn't anyone teaching today's assembly programmers about signed comparisons? The minimal LDA/CMP/BCS sequence assumes that both operands are UNSIGNED or have the same sign. If they are signed and have different signs, this sequence fails to compare them correctly. Try 1 >= -1 ($FFFF). We are up against an inherent flaw in the 65xxx series ALU's here; there is no direct test for signed comparisons. You need to compute the term (N XOR V) or (in explicit, C-ish notation) (N && !V) || (V && !N) which is easier if you use SBC because it computes V for you. CMP does not. Computing V manually is a royal pain, so it is better to accomodate SBC and live with it. In this particular example, however, the compiler could have taken advantage of the fact that one of the numbers is known to be positive. We can correctly do a signed comparison of ($F0 >= #3) with the following: lda $F0 bmi less-than cmp #3 bcs g-or-e less-than: If you are comparing two unknown signed numbers, though, then you are stuck using something more like this (I altered it to match the explanation better): lda $F0 sec sbc $F2 bvc okay eor #$8000 okay bpl g-or-e less-than: Start with ($F0 >= $F2), and shuffle it to ($F0 - $F2) >= 0. The first three instructions compute ($F0 - $F2), and if no overflow occurred (V is clear) we got a mathematically correct result so we skip down to the bottom and use the BPL ( >= 0 ) test to branch. If overflow _did_ occur, then the N bit is -wrong- and we need to invert it first, so we do that by EOR-ing with $8000. The code example Nathan disassembled does exactly the same thing, only it reverses the sense of both branches -- they use a BMI to test the opposite of the branch condition and flip the N bit in the non-overflow case, which takes more time and seems to be the common path to me, so I saved two cycles by changing it to branch over the EOR in the non-overflow case instead. Todd Whitesel toddpw @ cco.caltech.edu