Marvin Miller wrote:

> Are the TAG and Data rams the same chips? I understand the needs for
> different chip speeds between the two, but other than that, does the
> fact that one is called TAG and one is called DATA mean that the chips
> themselves are different?

No - apart from possible timing issues, both chips are standard 8Kx8 or 32Kx8 static RAMs. As far as functionality goes, the DATA RAM holds the actual data read from the computer's memory, while the TAG RAM holds information about the validity of the cached data.

My earlier theory was that the tag holds a bank number, with one value reserved to mean "invalid", and that the ASIC may use some additional internal RAM to store the extra address bits if you have less than 64 KB of cache installed. Another possibility is that with less than 64 KB of cache, it simply doesn't cache as many banks:

With 32 KB of cache, it would need to store one extra address bit in the tag RAM, leaving 7 bits for the bank number. Assuming it can cache fast RAM and ROM (banks $FC to $FF on a ROM 3) but not slow RAM (banks $E0 and $E1), and needs to reserve one value to mean "invalid", it would not be able to cache banks $7B through $7F (i.e. only 7.6875 MB of fast RAM would be cached).

With 16 KB of cache, it would need to store two extra address bits, leaving 6 bits for the bank number, and would not be able to cache banks $3B through $7F (i.e. only 3.6875 MB of fast RAM would be cached).

With 8 KB of cache, it would need to store three extra address bits, leaving 5 bits for the bank number, and would not be able to cache banks $1B through $7F (i.e. only 1.6875 MB of fast RAM would be cached).

It should be relatively easy to test this theory on a ZipGS with different cache sizes: write a machine code program which reads the same memory location a predetermined number of times (large enough to get a reasonably accurate figure from a stopwatch, or use the horizontal or vertical screen position to measure the period), repeat it for each bank, and note the time taken in each bank. If the reads slow down beyond a certain bank, and that boundary changes roughly in line with the numbers above, then we have a very likely theory of implementation.

The following program does the trick. It runs at $0300 under BASIC.SYSTEM (or even if you Ctrl-Reset from the "Check startup device" screen). I call it CACHETEST.
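(A note on the timebase before the listing: the program uses the horizontal video counter at $C02F as its stopwatch, since on real hardware it ticks at 1.023 MHz regardless of CPU speed. The counter reads $00 and then $40 through $7F - 65 states per scan line - which is why the code remaps $00 to $3F and adds $41 when the count wraps onto the next line. A minimal model of that conversion, my own sketch in C rather than anything in CACHETEST itself:

    /* Convert two reads of the $C02F horizontal counter into elapsed
       1 MHz cycles.  Assumes the counter reads $00 then $40-$7F (65
       states per scan line), with a vertical count bit in bit 7. */
    unsigned elapsed_cycles(unsigned first, unsigned second)
    {
        first  &= 0x7F;                  /* drop the vertical count bit  */
        second &= 0x7F;
        if (first  == 0) first  = 0x3F;  /* make the values consecutive, */
        if (second == 0) second = 0x3F;  /* running $3F through $7F      */
        if (second < first)
            second += 0x41;              /* wrapped onto the next line: +65 */
        return second - first;
    }

This only copes with a single wrap, so the stretch being measured must take under 65 cycles - which it does here even at normal speed.)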
       clc
       xce              ; Native mode
       rep  #$30        ; Go to 16-bit mode
       pha
       pha              ; Space for the result
       ldx  #$1D02
       jsl  $E10000     ; Call _TotalMem
       pla              ; Discard the low order word
       pla              ; Keep the high order word
       sep  #$30        ; Back to 8-bit mode (still native)
       tax              ; X has number of banks of RAM, including slow
       dex
       dex              ; Don't count the slow banks
       phx
clear: stz  $1000,x     ; Clear the rest of the table
       inx
       bne  clear
       sei              ; Don't interrupt me
       ldy  #$01        ; Flag: first pass to load code into cache
       ldx  #$01        ; Use bank 0 for this pass
banklp: dex             ; Move down one bank
       phx
       plb              ; Select the target data bank
       lda  $1234       ; Read an arbitrary location once - load cache
       lda  $E0C02F     ; Read the horizontal counter now
       xba              ; Hold it in B
       cmp  $1234       ; Read the same location several times
       cmp  $1234
       cmp  $1234
       cmp  $1234
       cmp  $1234
       cmp  $1234
       cmp  $1234
       cmp  $1234
       cmp  $1234
       cmp  $1234
       lda  $E0C02F     ; Read the horizontal counter again
       xba              ; Get back the initial value, hold final
       and  #$7F        ; Ignore the vertical count bit
       bne  nz1
       lda  #$3F        ; Adjust count value to be consecutive $3F-$7F
nz1:   pha
       xba              ; Get back the final value
       and  #$7F        ; Ignore the vertical count bit
       bne  nz2
       lda  #$3F        ; Repeat for the end value
nz2:   cmp  1,s         ; Did the value wrap around?
       bcs  nowrap
       adc  #$41        ; Yes - compensate
       sec              ; and set the carry for the next subtraction
nowrap: sbc  1,s        ; Get number of 1 MHz cycles
       sta  $001000,x   ; Store it in the table
       pla              ; Clean up the stack
       cpy  #$01        ; Was this the first pass?
       bne  notp1
       dey              ; Yes: go back and do it properly this time
       plx
       bra  banklp
notp1: cpx  #$00
       bne  banklp
       cli              ; Allow interrupts again
       sec
       xce              ; Emulation mode
       rts

Here it is in machine code. (Transcribed back after using the mini-assembler, and subsequently entered again by hand, so it should be right.)

300:18 FB C2 30 48 48 A2 02 1D 22 00 00 E1 68 68 E2
310:30 AA CA CA DA 9E 00 10 E8 D0 FA 78 A0 01 A2 01
320:CA DA AB AD 34 12 AF 2F C0 E0 EB CD 34 12 CD 34
330:12 CD 34 12 CD 34 12 CD 34 12 CD 34 12 CD 34 12
340:CD 34 12 CD 34 12 CD 34 12 AF 2F C0 E0 EB 29 7F
350:D0 02 A9 3F 48 EB 29 7F D0 02 A9 3F C3 01 B0 03
360:69 41 38 E3 01 9F 00 10 00 68 C0 01 D0 04 88 FA
370:80 AE E0 00 D0 AA 58 38 FB 60

The end result is a table at memory locations $1000 to $107F containing the number of 1 MHz cycles required to do the ten CMP instructions (plus a little overhead) for each bank, or $00 if that bank doesn't exist. The counts should be roughly the same for each bank, except where a cache/non-cache boundary is crossed, at which point they will jump to significantly larger values. If the IIgs were running at normal speed (1.023 MHz), I'd expect 48 cycles to elapse between the two references to the horizontal counter (ten 4-cycle absolute CMPs, the 3-cycle XBA, and the 5-cycle long LDA itself).

Now for some real tests. My system is a ROM 3 IIgs with a 4 MB memory card (5 MB total) and an 8 MHz ZipGS with 16K of cache.

At normal speed: all banks take 48 ($30) cycles.
At fast speed, Zip disabled: all banks take 20 ($14) cycles.
Zip enabled (8 MHz): banks $00 to $2F take 8 cycles; banks $30 to $4F take 14 or 15 ($0E or $0F) cycles.

I like it when my theories turn out to be more or less right. :-)

Just for a laugh, I tried this on Bernie to the Rescue. It appears to emulate the horizontal counter correctly if the Control Panel is set to "normal" speed (1 MHz): 48 cycles per loop. If the Control Panel is set to fast, it looks like the horizontal counter is tied to the emulated CPU frequency rather than 1 MHz: I get 18 or 19 cycles no matter what speed I tell Bernie to run at.

I have another IIgs with a 9 MHz/64K Zip, but it is packed away, so I can't do any further testing.
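For anyone who wants to try it without an assembler, the standard Monitor commands should do (from Applesoft with BASIC.SYSTEM running):

    CALL -151          enter the Monitor
    300:18 FB ...      key in the eight lines of the dump above
    300G               run CACHETEST
    1000.107F          display the table of cycle counts (in hex)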
Would some other people like to try this little program and provide some results? It might also be interesting to run it on TransWarp GS systems (it doesn't do anything specific to the Zip).

Now, to revise my theory slightly. It looks like a 16 K cache allows the Zip to cache 3 MB of RAM (48 banks), not 3.6875 MB (59 banks) as I expected. Further testing reveals that the Zip _is_ caching banks $E0 and $E1, as well as $FC through $FF. This leaves room for 10 unused values, one of which could be "not valid". There might be some extra details for caching the bank-switched memory areas in banks 0, 1, $E0 and $E1.

To summarise my revised theory:

With  8 KB cache, the ZipGS will only cache 1 MB of fast RAM.
With 16 KB cache, the ZipGS will cache 3 MB of fast RAM. [Confirmed]
With 32 KB cache, the ZipGS will cache 7 MB of fast RAM.
With 64 KB cache, the ZipGS should be able to cache all 8 MB.

(Mitch: this probably explains the slowdown you told me about, if you only have 32 KB of cache in your ZipGS.)
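As a sanity check on those numbers, here is a small sketch (in C; my own extrapolation, not anything from the Zip documentation) assuming the tag loses one bank bit for each halving of the cache below 64 KB, and that every configuration spends 2 tag values on slow RAM, 4 on ROM, and the same 10 spare values observed in the 16 KB case:

    #include <stdio.h>

    int main(void)
    {
        for (int cache_kb = 8; cache_kb <= 64; cache_kb *= 2) {
            int bank_bits = 8;                    /* full bank number at 64 KB */
            for (int kb = cache_kb; kb < 64; kb *= 2)
                bank_bits--;                      /* extra address bit in the tag */
            int values     = 1 << bank_bits;
            int fast_banks = values - 2 - 4 - 10; /* slow, ROM, spare values */
            if (fast_banks > 128)
                fast_banks = 128;                 /* only 8 MB of fast RAM exists */
            printf("%2d KB cache: %3d fast banks cached = %d MB\n",
                   cache_kb, fast_banks, fast_banks / 16);
        }
        return 0;
    }

This prints 1, 3, 7 and 8 MB for the four cache sizes, so the revised figures are at least self-consistent - assuming the 10 spare values hold for the other sizes.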